# 01. PlantVillage Class Inventory and Knowledge Base Mapping

meta:
  id: production-ml-pipeline-01
  feature: production-ml-pipeline
  priority: P0
  depends_on: []
  tags: [data, mapping, research]

objective:
- Document all 38 PlantVillage model output classes
- Map each class index to a definitive disease ID in the knowledge base (KB)
- Identify which plants and diseases are missing from the KB and must be added
- Produce a complete, authoritative mapping file that subsequent tasks consume

deliverables:
- `src/lib/ml/plantvillage-classes.ts`: definitive mapping of all 38 class indices to structured metadata
- Updated `tasks/production-ml-pipeline/class-mapping-reference.md`: human-readable reference document

steps:
1. Document the canonical 38 PlantVillage class labels in order (index 0–37). Keep the labels verbatim, including dataset quirks such as the `Haunglongbing` spelling (index 15) and the trailing underscore in `Common_rust_` (index 8):
   ```
   0: Apple___Apple_scab
   1: Apple___Black_rot
   2: Apple___Cedar_apple_rust
   3: Apple___healthy
   4: Blueberry___healthy
   5: Cherry_(including_sour)___Powdery_mildew
   6: Cherry_(including_sour)___healthy
   7: Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
   8: Corn_(maize)___Common_rust_
   9: Corn_(maize)___Northern_Leaf_Blight
   10: Corn_(maize)___healthy
   11: Grape___Black_rot
   12: Grape___Esca_(Black_Measles)
   13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
   14: Grape___healthy
   15: Orange___Haunglongbing_(Citrus_greening)
   16: Peach___Bacterial_spot
   17: Peach___healthy
   18: Pepper,_bell___Bacterial_spot
   19: Pepper,_bell___healthy
   20: Potato___Early_blight
   21: Potato___Late_blight
   22: Potato___healthy
   23: Raspberry___healthy
   24: Soybean___healthy
   25: Squash___Powdery_mildew
   26: Strawberry___Leaf_scorch
   27: Strawberry___healthy
   28: Tomato___Bacterial_spot
   29: Tomato___Early_blight
   30: Tomato___Late_blight
   31: Tomato___Leaf_Mold
   32: Tomato___Septoria_leaf_spot
   33: Tomato___Spider_mites Two-spotted_spider_mite
   34: Tomato___Target_Spot
   35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
   36: Tomato___Tomato_mosaic_virus
   37: Tomato___healthy
   ```
2. For each class, determine the mapping target:
   - **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel, represented in the data structure as `diseaseId: null` with `isHealthy: true`. These indicate the model detected no disease.
   - **Disease classes with an existing KB match** (7 total, exact or closest available): map directly to the existing disease ID.
     - 28 → `bacterial-leaf-spot-tomato` (closest match to Tomato Bacterial_spot)
     - 29 → `early-blight`
     - 30 → `late-blight`
     - 32 → `septoria-leaf-spot`
     - 25 → `squash-powdery-mildew`
     - 26 → `strawberry-leaf-scorch`
     - 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
   - **Disease classes needing new KB entries** (19 total; no existing disease in our KB):
     - 0: Apple_scab → new disease `apple-scab` under plant `apple`
     - 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
     - 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
     - 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
     - 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
     - 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
     - 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
     - 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
     - 12: Grape_esca → new disease `grape-esca` under plant `grape`
     - 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
     - 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
     - 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
     - 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
     - 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
     - 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
     - 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
     - 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
     - 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
     - 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
   ```typescript
   export interface PlantVillageClass {
     index: number;
     rawLabel: string;
     plantId: string;          // KB plant slug
     diseaseId: string | null; // null for healthy classes
     isHealthy: boolean;
     displayName: string;      // human-readable disease name
   }

   export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [
     // ... one entry per class index (0–37), in model output order
   ];
   ```
4. For each class, also record:
   - The PlantVillage plant name (e.g., "Tomato", "Apple")
   - The target KB plantId (e.g., "tomato", "apple")
   - The target KB diseaseId (e.g., "early-blight"), or null for healthy
   - Whether the disease needs to be added to the KB (boolean flag for task 02)
5. Verify the mapping covers all 38 indices with no gaps or duplicates.

tests:
- Unit: mapping has exactly 38 entries
- Unit: indices 0–37 are all present, no gaps
- Unit: each non-healthy entry has a non-null diseaseId
- Unit: each healthy entry has null diseaseId and isHealthy=true
- Unit: no duplicate diseaseIds across non-healthy entries
- Unit: all plantIds are valid slugs (lowercase, kebab-case)

acceptance_criteria:
- `src/lib/ml/plantvillage-classes.ts` exports a `PLANTVILLAGE_CLASSES` array with exactly 38 entries
- Every index 0–37 maps to exactly one entry
- 12 entries are healthy (isHealthy=true, diseaseId=null)
- 26 entries are diseases with valid plantId and diseaseId
- Each entry includes rawLabel, plantId, diseaseId, displayName
- All new disease IDs follow the kebab-case convention of the existing KB
- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings

validation:
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`: all mapping tests pass
- Manual review: each of the 26 disease entries maps to a plausible disease in our KB

notes:
- This task produces the authoritative mapping consumed by task 02 (KB expansion) and task 03 (label mapping)
- The PlantVillage class order is fixed by the model's training; do NOT reorder
- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato`; this is the closest match, not a perfect one
- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt`; imperfect, but the closest available match
- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
- Blueberry, Raspberry, and Soybean have only a "healthy" class; they still need plant entries for context but no new disease entries
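
The checks listed under tests, together with the label format from step 1, could be prototyped along the following lines. This is a minimal sketch rather than the final implementation: `parseRawLabel`, `validateMapping`, `ParsedLabel`, and `MappingEntry` are hypothetical names introduced here for illustration, and `MappingEntry` is assumed to be the subset of `PlantVillageClass` fields that the checks need. The slug pattern is one plausible reading of "lowercase, kebab-case".

```typescript
// Hypothetical helpers for illustration; none of these names come from the task spec.

interface ParsedLabel {
  plant: string;      // PlantVillage plant name, e.g. "Tomato"
  condition: string;  // text after the "___" separator, e.g. "Early_blight"
  isHealthy: boolean; // true when the condition is exactly "healthy"
}

// Split a raw PlantVillage label on its first "___" separator.
function parseRawLabel(rawLabel: string): ParsedLabel {
  const sep = rawLabel.indexOf("___");
  if (sep < 0) throw new Error(`Unexpected label format: ${rawLabel}`);
  const plant = rawLabel.slice(0, sep);
  const condition = rawLabel.slice(sep + 3);
  return { plant, condition, isHealthy: condition === "healthy" };
}

// Assumed subset of PlantVillageClass used by the checks.
interface MappingEntry {
  index: number;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
}

// Assumption: a valid slug is lowercase alphanumeric words joined by single hyphens.
const SLUG = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Collect human-readable problems; an empty array means every check passed.
function validateMapping(
  entries: readonly MappingEntry[],
  expected = 38,
): string[] {
  const problems: string[] = [];
  if (entries.length !== expected) {
    problems.push(`expected ${expected} entries, got ${entries.length}`);
  }
  const seenIndex = new Set<number>();
  const seenDisease = new Set<string>();
  for (const e of entries) {
    if (seenIndex.has(e.index)) problems.push(`duplicate index ${e.index}`);
    seenIndex.add(e.index);
    if (!SLUG.test(e.plantId)) problems.push(`invalid plantId "${e.plantId}"`);
    if (e.isHealthy) {
      if (e.diseaseId !== null) {
        problems.push(`healthy index ${e.index} has a diseaseId`);
      }
    } else if (e.diseaseId === null) {
      problems.push(`disease index ${e.index} has null diseaseId`);
    } else {
      if (seenDisease.has(e.diseaseId)) {
        problems.push(`duplicate diseaseId "${e.diseaseId}"`);
      }
      seenDisease.add(e.diseaseId);
    }
  }
  // Gap check: every index in 0..expected-1 must appear.
  for (let i = 0; i < expected; i++) {
    if (!seenIndex.has(i)) problems.push(`missing index ${i}`);
  }
  return problems;
}
```

In a Vitest suite, each item in the returned array could be turned into its own assertion, or the whole result compared against an empty array. Returning a list of problems rather than throwing on the first failure is a design choice that makes the manual review step easier, since every mismatch shows up in one run.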