plant-disease-id/apps/web/tasks/production-ml-pipeline/01-plantvillage-class-inventory.md
2026-06-06 15:09:46 -04:00

01. PlantVillage Class Inventory and Knowledge Base Mapping

meta:

  • id: production-ml-pipeline-01
  • feature: production-ml-pipeline
  • priority: P0
  • depends_on: []
  • tags: [data, mapping, research]

objective:

  • Document all 38 PlantVillage model output classes
  • Map each class index to a definitive disease ID in the knowledge base
  • Identify which plants and diseases are missing from the KB and must be added
  • Produce a complete, authoritative mapping file that subsequent tasks consume

deliverables:

  • src/lib/ml/plantvillage-classes.ts — definitive mapping of all 38 class indices to structured metadata
  • Updated tasks/production-ml-pipeline/class-mapping-reference.md — human-readable reference document

steps:

  1. Document the canonical 38 PlantVillage class labels in order (indices 0–37):

    0:  Apple___Apple_scab
    1:  Apple___Black_rot
    2:  Apple___Cedar_apple_rust
    3:  Apple___healthy
    4:  Blueberry___healthy
    5:  Cherry_(including_sour)___Powdery_mildew
    6:  Cherry_(including_sour)___healthy
    7:  Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
    8:  Corn_(maize)___Common_rust_
    9:  Corn_(maize)___Northern_Leaf_Blight
    10: Corn_(maize)___healthy
    11: Grape___Black_rot
    12: Grape___Esca_(Black_Measles)
    13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
    14: Grape___healthy
    15: Orange___Haunglongbing_(Citrus_greening)
    16: Peach___Bacterial_spot
    17: Peach___healthy
    18: Pepper,_bell___Bacterial_spot
    19: Pepper,_bell___healthy
    20: Potato___Early_blight
    21: Potato___Late_blight
    22: Potato___healthy
    23: Raspberry___healthy
    24: Soybean___healthy
    25: Squash___Powdery_mildew
    26: Strawberry___Leaf_scorch
    27: Strawberry___healthy
    28: Tomato___Bacterial_spot
    29: Tomato___Early_blight
    30: Tomato___Late_blight
    31: Tomato___Leaf_Mold
    32: Tomato___Septoria_leaf_spot
    33: Tomato___Spider_mites Two-spotted_spider_mite
    34: Tomato___Target_Spot
    35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
    36: Tomato___Tomato_mosaic_virus
    37: Tomato___healthy
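
Every label above joins a plant name and a condition with a triple underscore. The sketch below shows one way to split a raw label on that separator. The `ParsedLabel` type and the `parseRawLabel` helper are hypothetical names introduced for illustration and are not part of the task's deliverables.

```typescript
// Hypothetical helper: splits a raw PlantVillage label into its plant and condition parts.
interface ParsedLabel {
  plant: string;      // text before the "___" separator, e.g. "Corn_(maize)"
  condition: string;  // text after the separator, e.g. "Common_rust_"
  isHealthy: boolean; // true when the condition is exactly "healthy"
}

function parseRawLabel(rawLabel: string): ParsedLabel {
  const separator = "___";
  const at = rawLabel.indexOf(separator);
  if (at === -1) {
    throw new Error(`Label has no "${separator}" separator: ${rawLabel}`);
  }
  const plant = rawLabel.slice(0, at);
  const condition = rawLabel.slice(at + separator.length);
  return { plant, condition, isHealthy: condition === "healthy" };
}
```

The raw strings are returned untouched, including quirks such as the trailing underscore in `Common_rust_` and the spelling `Haunglongbing`, because the label text has to match the model's output exactly.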
    
  2. For each class, determine the mapping target:

    • Healthy classes (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special "healthy" sentinel. These indicate that the model detected no disease.
    • Disease classes with exact KB match: map directly to existing disease ID.
      • 28 → bacterial-leaf-spot-tomato (Tomato Bacterial_spot ≈ bacterial-leaf-spot-tomato)
      • 29 → early-blight
      • 30 → late-blight
      • 32 → septoria-leaf-spot
      • 25 → squash-powdery-mildew
      • 26 → strawberry-leaf-scorch
      • 18 → pepper-bacterial-wilt (closest match to Pepper Bacterial_spot)
    • Disease classes needing new KB entries (no existing disease in our KB):
      • 0: Apple_scab → new disease apple-scab under plant apple
      • 1: Apple_black_rot → new disease apple-black-rot under plant apple
      • 2: Apple_cedar_apple_rust → new disease apple-cedar-apple-rust under plant apple
      • 5: Cherry_powdery_mildew → new disease cherry-powdery-mildew under plant cherry
      • 7: Corn_cercospora_leaf_spot → new disease corn-gray-leaf-spot under plant corn
      • 8: Corn_common_rust → new disease corn-common-rust under plant corn
      • 9: Corn_northern_leaf_blight → new disease corn-northern-leaf-blight under plant corn
      • 11: Grape_black_rot → new disease grape-black-rot under plant grape
      • 12: Grape_esca → new disease grape-esca under plant grape
      • 13: Grape_leaf_blight → new disease grape-leaf-blight under plant grape
      • 15: Orange_huanglongbing → new disease orange-citrus-greening under plant orange
      • 16: Peach_bacterial_spot → new disease peach-bacterial-spot under plant peach
      • 20: Potato_early_blight → new disease potato-early-blight under plant potato
      • 21: Potato_late_blight → new disease potato-late-blight under plant potato
      • 31: Tomato_leaf_mold → new disease tomato-leaf-mold under plant tomato
      • 33: Tomato_spider_mites → new disease tomato-spider-mites under plant tomato
      • 34: Tomato_target_spot → new disease tomato-target-spot under plant tomato
      • 35: Tomato_yellow_leaf_curl_virus → new disease tomato-yellow-leaf-curl-virus under plant tomato
      • 36: Tomato_mosaic_virus → new disease tomato-mosaic-virus under plant tomato
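
As a quick cross-check of the three groups in step 2, the sketch below transcribes the index lists and counts how many indices land in each group, flagging any index that is unassigned or assigned twice. The constant names and the `summarizeGroups` helper are hypothetical and introduced only for this illustration.

```typescript
// Index groups transcribed from step 2. All names here are hypothetical.
const HEALTHY_INDICES: readonly number[] = [3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37];

// Class index -> existing KB disease ID.
const EXISTING_KB_MATCHES: Readonly<Record<number, string>> = {
  18: "pepper-bacterial-wilt",
  25: "squash-powdery-mildew",
  26: "strawberry-leaf-scorch",
  28: "bacterial-leaf-spot-tomato",
  29: "early-blight",
  30: "late-blight",
  32: "septoria-leaf-spot",
};

const NEW_KB_ENTRY_INDICES: readonly number[] = [
  0, 1, 2, 5, 7, 8, 9, 11, 12, 13, 15, 16, 20, 21, 31, 33, 34, 35, 36,
];

interface GroupSummary {
  healthy: number;
  existingMatch: number;
  needsNewEntry: number;
  unassigned: number[];       // indices that appear in no group
  multiplyAssigned: number[]; // indices that appear in more than one group
}

function summarizeGroups(classCount: number): GroupSummary {
  const existing = Object.keys(EXISTING_KB_MATCHES).map(Number);
  const all = [...HEALTHY_INDICES, ...existing, ...NEW_KB_ENTRY_INDICES];
  const unassigned: number[] = [];
  const multiplyAssigned: number[] = [];
  for (let i = 0; i < classCount; i++) {
    const hits = all.filter((n) => n === i).length;
    if (hits === 0) unassigned.push(i);
    if (hits > 1) multiplyAssigned.push(i);
  }
  return {
    healthy: HEALTHY_INDICES.length,
    existingMatch: existing.length,
    needsNewEntry: NEW_KB_ENTRY_INDICES.length,
    unassigned,
    multiplyAssigned,
  };
}
```

Calling `summarizeGroups(38)` on these lists gives 12 healthy, 7 existing matches, and 19 new entries, with nothing unassigned or assigned twice.
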
  3. Create the mapping type and data structure in src/lib/ml/plantvillage-classes.ts:

    export interface PlantVillageClass {
      index: number;
      rawLabel: string;
      plantId: string;        // KB plant slug
      diseaseId: string | null; // null for healthy classes
      isHealthy: boolean;
      displayName: string;     // human-readable disease name
    }
    
    export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [ ... ];
    
  4. For each class, also record:

    • The PlantVillage plant name (e.g., "Tomato", "Apple")
    • The target KB plantId (e.g., "tomato", "apple")
    • The target KB diseaseId (e.g., "early-blight") or null for healthy
    • Whether the disease needs to be added to the KB (boolean flag for task 02)
  5. Verify the mapping covers all 38 indices with no gaps or duplicates.
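
To make steps 3 to 5 concrete, here is a minimal sketch with three sample entries and a coverage check. The `PlantVillageClass` interface is copied from step 3. The `PlantVillageClassDraft` extension, its `plantName` and `needsKbEntry` fields (one possible way to carry the extra step 4 data), the `displayName` values, and the `findIndexProblems` helper are all assumptions made for illustration.

```typescript
// Copied from step 3.
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Hypothetical extension for the additional fields listed in step 4.
interface PlantVillageClassDraft extends PlantVillageClass {
  plantName: string;     // PlantVillage plant name, e.g. "Tomato"
  needsKbEntry: boolean; // true when task 02 must add the disease to the KB
}

// Three sample entries only; displayName values are illustrative guesses.
const SAMPLE_CLASSES: readonly PlantVillageClassDraft[] = [
  { index: 0, rawLabel: "Apple___Apple_scab", plantName: "Apple", plantId: "apple",
    diseaseId: "apple-scab", isHealthy: false, displayName: "Apple scab", needsKbEntry: true },
  { index: 3, rawLabel: "Apple___healthy", plantName: "Apple", plantId: "apple",
    diseaseId: null, isHealthy: true, displayName: "Healthy", needsKbEntry: false },
  { index: 29, rawLabel: "Tomato___Early_blight", plantName: "Tomato", plantId: "tomato",
    diseaseId: "early-blight", isHealthy: false, displayName: "Early blight", needsKbEntry: false },
];

// Reports out-of-range, missing, and duplicated indices for an expected class count.
function findIndexProblems(
  classes: readonly PlantVillageClass[],
  expectedCount: number,
): string[] {
  const problems: string[] = [];
  const counts: number[] = [];
  for (let i = 0; i < expectedCount; i++) counts.push(0);
  for (const c of classes) {
    const inRange = Math.floor(c.index) === c.index && c.index >= 0 && c.index < expectedCount;
    if (!inRange) {
      problems.push(`index out of range: ${c.index}`);
    } else {
      counts[c.index] = (counts[c.index] ?? 0) + 1;
    }
  }
  counts.forEach((n, i) => {
    if (n === 0) problems.push(`missing index: ${i}`);
    if (n > 1) problems.push(`duplicate index: ${i}`);
  });
  return problems;
}
```

Run against the full 38-entry array, `findIndexProblems(PLANTVILLAGE_CLASSES, 38)` should return an empty list. Run against the three samples, it reports the 35 indices that are still missing.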

tests:

  • Unit: mapping has exactly 38 entries
  • Unit: indices 0–37 are all present, no gaps
  • Unit: each non-healthy entry has a non-null diseaseId
  • Unit: each healthy entry has null diseaseId and isHealthy=true
  • Unit: no duplicate diseaseIds across non-healthy entries
  • Unit: all plantIds are valid slugs (lowercase, kebab-case)
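
Two of the checks above, slug format and duplicate diseaseIds, reduce to small pure functions that the unit tests could call. This is a sketch under assumptions: the helper names are hypothetical, and the regular expression assumes that "kebab-case" means lowercase letters and digits separated by single hyphens, which may be stricter or looser than the KB's actual convention.

```typescript
// Assumed slug shape: lowercase alphanumeric segments joined by single hyphens.
const KEBAB_SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isKebabSlug(value: string): boolean {
  return KEBAB_SLUG.test(value);
}

// Returns every diseaseId that appears more than once among non-healthy entries.
function findDuplicateDiseaseIds(
  entries: readonly { diseaseId: string | null }[],
): string[] {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    if (entry.diseaseId !== null) {
      counts[entry.diseaseId] = (counts[entry.diseaseId] ?? 0) + 1;
    }
  }
  return Object.keys(counts).filter((id) => (counts[id] ?? 0) > 1);
}
```

Healthy entries carry a null diseaseId and are skipped, so the 12 healthy classes never count as duplicates of each other.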

acceptance_criteria:

  • src/lib/ml/plantvillage-classes.ts exports PLANTVILLAGE_CLASSES array with exactly 38 entries
  • Every index 0–37 maps to exactly one entry
  • 12 entries are healthy (isHealthy=true, diseaseId=null)
  • 26 entries are diseases with valid plantId and diseaseId
  • Each entry includes rawLabel, plantId, diseaseId, displayName
  • All new disease IDs follow kebab-case convention matching existing KB pattern
  • Reference document class-mapping-reference.md lists all 38 classes with their KB mappings
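
The human-readable reference could be generated from the mapping data rather than maintained by hand, so the two cannot drift apart. The sketch below renders a Markdown table from a list of rows. The column layout, the `ReferenceRow` type, and the `renderReferenceTable` helper are assumptions, since the task does not prescribe a format for class-mapping-reference.md.

```typescript
// Hypothetical row shape: the subset of mapping fields shown in the reference.
interface ReferenceRow {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null; // null for healthy classes
}

// Renders rows as a Markdown table, sorted by class index.
function renderReferenceTable(rows: readonly ReferenceRow[]): string {
  const header = [
    "| Index | PlantVillage label | KB plant | KB disease |",
    "| --- | --- | --- | --- |",
  ];
  const body = rows
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.index - b.index)
    .map((r) => `| ${r.index} | \`${r.rawLabel}\` | ${r.plantId} | ${r.diseaseId ?? "healthy"} |`);
  return header.concat(body).join("\n");
}
```

Sorting affects only the rendered table; the source array keeps the model's class order.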

validation:

  • npx vitest run src/lib/ml/plantvillage-classes.test.ts — all mapping tests pass
  • Manual review: each of the 26 disease entries maps to a plausible disease in our KB

notes:

  • This task produces the authoritative mapping consumed by task 02 (KB expansion) and task 03 (label mapping)
  • The PlantVillage class order is fixed by the model's training — do NOT reorder
  • "Tomato Bacterial_spot" maps to our existing bacterial-leaf-spot-tomato — this is the closest match, not a perfect one
  • "Pepper Bacterial_spot" maps to pepper-bacterial-wilt; bacterial wilt is a different disease from bacterial spot, so this is an imperfect mapping, but it is the closest available match
  • 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
  • Blueberry, Raspberry, and Soybean each have only a "healthy" class; they still need plant entries for context but no new disease entries
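
The plant-level notes can be derived from the label list itself. The sketch below groups the 38 raw labels by plant and reports which plants have only a "healthy" class. `RAW_LABELS` is transcribed from step 1; the `summarizePlants` helper and its return shape are hypothetical.

```typescript
// The 38 raw labels from step 1, in model order.
const RAW_LABELS: readonly string[] = [
  "Apple___Apple_scab", "Apple___Black_rot", "Apple___Cedar_apple_rust", "Apple___healthy",
  "Blueberry___healthy",
  "Cherry_(including_sour)___Powdery_mildew", "Cherry_(including_sour)___healthy",
  "Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot", "Corn_(maize)___Common_rust_",
  "Corn_(maize)___Northern_Leaf_Blight", "Corn_(maize)___healthy",
  "Grape___Black_rot", "Grape___Esca_(Black_Measles)",
  "Grape___Leaf_blight_(Isariopsis_Leaf_Spot)", "Grape___healthy",
  "Orange___Haunglongbing_(Citrus_greening)",
  "Peach___Bacterial_spot", "Peach___healthy",
  "Pepper,_bell___Bacterial_spot", "Pepper,_bell___healthy",
  "Potato___Early_blight", "Potato___Late_blight", "Potato___healthy",
  "Raspberry___healthy", "Soybean___healthy", "Squash___Powdery_mildew",
  "Strawberry___Leaf_scorch", "Strawberry___healthy",
  "Tomato___Bacterial_spot", "Tomato___Early_blight", "Tomato___Late_blight",
  "Tomato___Leaf_Mold", "Tomato___Septoria_leaf_spot",
  "Tomato___Spider_mites Two-spotted_spider_mite", "Tomato___Target_Spot",
  "Tomato___Tomato_Yellow_Leaf_Curl_Virus", "Tomato___Tomato_mosaic_virus",
  "Tomato___healthy",
];

// Groups labels by plant and lists plants whose only class is "healthy".
function summarizePlants(labels: readonly string[]): { plants: string[]; healthyOnly: string[] } {
  const hasDisease: Record<string, boolean> = {};
  const plants: string[] = []; // first-seen order
  for (const label of labels) {
    const parts = label.split("___");
    const plant = parts[0] ?? label;
    const condition = parts[1] ?? "";
    if (hasDisease[plant] === undefined) {
      hasDisease[plant] = false;
      plants.push(plant);
    }
    if (condition !== "healthy") hasDisease[plant] = true;
  }
  return { plants, healthyOnly: plants.filter((p) => hasDisease[p] === false) };
}
```

On this list the helper finds 14 distinct plants, of which Blueberry, Raspberry, and Soybean are healthy-only, matching the note above.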