plant-disease-id/tasks/production-ml-pipeline/01-plantvillage-class-inventory.md
2026-06-08 16:42:04 -04:00

# 01. PlantVillage Class Inventory and Knowledge Base Mapping
meta:
id: production-ml-pipeline-01
feature: production-ml-pipeline
priority: P0
depends_on: []
tags: [data, mapping, research]
objective:
- Document all 38 PlantVillage model output classes
- Map each class index to a definitive disease ID in the knowledge base
- Identify which plants and diseases are missing from the KB and must be added
- Produce a complete, authoritative mapping file that subsequent tasks consume
deliverables:
- `src/lib/ml/plantvillage-classes.ts` — definitive mapping of all 38 class indices to structured metadata
- Updated `tasks/production-ml-pipeline/class-mapping-reference.md` — human-readable reference document
steps:
1. Document the canonical 38 PlantVillage class labels in order (indices 0-37):
```
0: Apple___Apple_scab
1: Apple___Black_rot
2: Apple___Cedar_apple_rust
3: Apple___healthy
4: Blueberry___healthy
5: Cherry_(including_sour)___Powdery_mildew
6: Cherry_(including_sour)___healthy
7: Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
8: Corn_(maize)___Common_rust_
9: Corn_(maize)___Northern_Leaf_Blight
10: Corn_(maize)___healthy
11: Grape___Black_rot
12: Grape___Esca_(Black_Measles)
13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
14: Grape___healthy
15: Orange___Haunglongbing_(Citrus_greening)
16: Peach___Bacterial_spot
17: Peach___healthy
18: Pepper,_bell___Bacterial_spot
19: Pepper,_bell___healthy
20: Potato___Early_blight
21: Potato___Late_blight
22: Potato___healthy
23: Raspberry___healthy
24: Soybean___healthy
25: Squash___Powdery_mildew
26: Strawberry___Leaf_scorch
27: Strawberry___healthy
28: Tomato___Bacterial_spot
29: Tomato___Early_blight
30: Tomato___Late_blight
31: Tomato___Leaf_Mold
32: Tomato___Septoria_leaf_spot
33: Tomato___Spider_mites Two-spotted_spider_mite
34: Tomato___Target_Spot
35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
36: Tomato___Tomato_mosaic_virus
37: Tomato___healthy
```
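Every raw label above joins a plant name and a condition with a triple underscore. As a minimal sketch (the `parseRawLabel` helper and `ParsedLabel` type are hypothetical names, not part of the deliverable), a label could be split like this:

```typescript
// Hypothetical helper: splits a raw PlantVillage label on its "___" separator.
interface ParsedLabel {
  plantName: string; // e.g. "Corn_(maize)"
  condition: string; // e.g. "healthy" or "Common_rust_"
  isHealthy: boolean;
}

function parseRawLabel(rawLabel: string): ParsedLabel {
  const separator = "___";
  const at = rawLabel.indexOf(separator);
  if (at < 0) {
    throw new Error(`Label has no "${separator}" separator: ${rawLabel}`);
  }
  const plantName = rawLabel.slice(0, at);
  // Keep the condition verbatim, including quirks such as a trailing underscore.
  const condition = rawLabel.slice(at + separator.length);
  return { plantName, condition, isHealthy: condition === "healthy" };
}
```

The raw label is kept verbatim rather than normalized, since spellings such as `Haunglongbing` and `Common_rust_` are part of the label strings listed above.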
2. For each class, determine the mapping target:
- **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel. These indicate the model detected no disease.
- **Disease classes with an existing KB match** (exact or closest available): map directly to the existing disease ID.
- 28 → `bacterial-leaf-spot-tomato` (Tomato Bacterial_spot ≈ bacterial-leaf-spot-tomato)
- 29 → `early-blight`
- 30 → `late-blight`
- 32 → `septoria-leaf-spot`
- 25 → `squash-powdery-mildew`
- 26 → `strawberry-leaf-scorch`
- 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
- **Disease classes needing new KB entries** (no existing disease in our KB):
- 0: Apple_scab → new disease `apple-scab` under plant `apple`
- 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
- 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
- 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
- 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
- 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
- 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
- 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
- 12: Grape_esca → new disease `grape-esca` under plant `grape`
- 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
- 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
- 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
- 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
- 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
- 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
- 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
- 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
- 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
- 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
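The three categories in step 2 can be expressed as a small lookup. This is a sketch only: `HEALTHY_INDICES`, `EXISTING_KB_MATCHES`, `MappingTarget`, and `resolveTarget` are hypothetical names, and the index lists are copied from the bullets above.

```typescript
// Indices of the healthy classes, copied from step 2.
const HEALTHY_INDICES: ReadonlySet<number> = new Set([
  3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37,
]);

// Classes that reuse an existing KB disease ID, copied from step 2.
const EXISTING_KB_MATCHES: ReadonlyMap<number, string> = new Map([
  [18, "pepper-bacterial-wilt"],
  [25, "squash-powdery-mildew"],
  [26, "strawberry-leaf-scorch"],
  [28, "bacterial-leaf-spot-tomato"],
  [29, "early-blight"],
  [30, "late-blight"],
  [32, "septoria-leaf-spot"],
]);

type MappingTarget =
  | { kind: "healthy" }
  | { kind: "existing"; diseaseId: string }
  | { kind: "new" }; // needs a new KB entry

function resolveTarget(index: number): MappingTarget {
  if (!Number.isInteger(index) || index < 0 || index > 37) {
    throw new RangeError(`Class index out of range: ${index}`);
  }
  if (HEALTHY_INDICES.has(index)) return { kind: "healthy" };
  const diseaseId = EXISTING_KB_MATCHES.get(index);
  if (diseaseId !== undefined) return { kind: "existing", diseaseId };
  return { kind: "new" };
}
```

Counting the result over all 38 indices gives a quick cross-check of the category sizes listed in this step.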
3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
```typescript
export interface PlantVillageClass {
  index: number; // model output index, 0 through 37
  rawLabel: string; // PlantVillage label, e.g. "Apple___Apple_scab"
  plantId: string; // KB plant slug
  diseaseId: string | null; // null for healthy classes
  isHealthy: boolean;
  displayName: string; // human-readable disease name
}

export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [
  // ... one entry per class, in index order ...
];
```
4. For each class, also record:
- The PlantVillage plant name (e.g., "Tomato", "Apple")
- The target KB plantId (e.g., "tomato", "apple")
- The target KB diseaseId (e.g., "early-blight") or null for healthy
- Whether the disease needs to be added to the KB (boolean flag for task 02)
5. Verify the mapping covers all 38 indices with no gaps or duplicates.
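Step 4 asks for two fields that the interface in step 3 does not carry. One possible shape, offered as a sketch rather than the final design, extends the interface. The names `PlantVillageClassRecord`, `plantName`, `needsKbEntry`, and `diseaseIdsToAdd` are hypothetical, and the `displayName` values are illustrative.

```typescript
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Hypothetical extension holding the extra step 4 fields.
interface PlantVillageClassRecord extends PlantVillageClass {
  plantName: string; // PlantVillage plant name, e.g. "Tomato"
  needsKbEntry: boolean; // true when task 02 must add the disease to the KB
}

// Three sample records only; the real array has 38 entries.
const SAMPLE_RECORDS: readonly PlantVillageClassRecord[] = [
  {
    index: 0, rawLabel: "Apple___Apple_scab", plantName: "Apple",
    plantId: "apple", diseaseId: "apple-scab", isHealthy: false,
    displayName: "Apple Scab", needsKbEntry: true,
  },
  {
    index: 3, rawLabel: "Apple___healthy", plantName: "Apple",
    plantId: "apple", diseaseId: null, isHealthy: true,
    displayName: "Healthy", needsKbEntry: false,
  },
  {
    index: 29, rawLabel: "Tomato___Early_blight", plantName: "Tomato",
    plantId: "tomato", diseaseId: "early-blight", isHealthy: false,
    displayName: "Early Blight", needsKbEntry: false,
  },
];

// Collects the disease IDs that task 02 would need to create.
function diseaseIdsToAdd(records: readonly PlantVillageClassRecord[]): string[] {
  const ids: string[] = [];
  for (const record of records) {
    if (record.needsKbEntry && record.diseaseId !== null) {
      ids.push(record.diseaseId);
    }
  }
  return ids;
}
```

Keeping the flag on each record lets task 02 derive its work list from the mapping instead of maintaining a second list by hand.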
tests:
- Unit: mapping has exactly 38 entries
- Unit: indices 0-37 are all present, no gaps
- Unit: each non-healthy entry has a non-null diseaseId
- Unit: each healthy entry has null diseaseId and isHealthy=true
- Unit: no duplicate diseaseIds across non-healthy entries
- Unit: all plantIds are valid slugs (lowercase, kebab-case)
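The unit checks above could share one validation helper that the Vitest file calls. The following is a sketch under assumptions: `validateClasses`, `EXPECTED_CLASS_COUNT`, and `SLUG_PATTERN` are hypothetical names, and the slug pattern is one reasonable reading of "lowercase, kebab-case".

```typescript
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

const EXPECTED_CLASS_COUNT = 38;
// Assumed slug rule: lowercase alphanumeric words joined by single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of human-readable problems; an empty list means the mapping passes.
function validateClasses(classes: readonly PlantVillageClass[]): string[] {
  const problems: string[] = [];
  if (classes.length !== EXPECTED_CLASS_COUNT) {
    problems.push(`expected ${EXPECTED_CLASS_COUNT} entries, got ${classes.length}`);
  }
  const seenIndices = new Set<number>();
  const seenDiseaseIds = new Set<string>();
  for (const c of classes) {
    if (seenIndices.has(c.index)) problems.push(`duplicate index ${c.index}`);
    seenIndices.add(c.index);
    if (!SLUG_PATTERN.test(c.plantId)) {
      problems.push(`invalid plantId slug at index ${c.index}`);
    }
    if (c.isHealthy) {
      if (c.diseaseId !== null) {
        problems.push(`healthy entry has diseaseId at index ${c.index}`);
      }
    } else if (c.diseaseId === null) {
      problems.push(`disease entry has null diseaseId at index ${c.index}`);
    } else {
      if (seenDiseaseIds.has(c.diseaseId)) {
        problems.push(`duplicate diseaseId ${c.diseaseId}`);
      }
      seenDiseaseIds.add(c.diseaseId);
    }
  }
  for (let i = 0; i < EXPECTED_CLASS_COUNT; i++) {
    if (!seenIndices.has(i)) problems.push(`missing index ${i}`);
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one means a single test failure can report every gap or duplicate at once.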
acceptance_criteria:
- `src/lib/ml/plantvillage-classes.ts` exports `PLANTVILLAGE_CLASSES` array with exactly 38 entries
- Every index 0-37 maps to exactly one entry
- 12 entries are healthy (isHealthy=true, diseaseId=null)
- 26 entries are diseases with valid plantId and diseaseId
- Each entry includes rawLabel, plantId, diseaseId, displayName
- All new disease IDs follow kebab-case convention matching existing KB pattern
- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings
validation:
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts` — all mapping tests pass
- Manual review: each of the 26 disease entries maps to a plausible disease in our KB
notes:
- This task produces the authoritative mapping consumed by task 02 (KB expansion) and task 03 (label mapping)
- The PlantVillage class order is fixed by the model's training — do NOT reorder
- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato` — this is the closest match, not a perfect one
- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt` — imperfect but closest available match
- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
- Blueberry, Raspberry, and Soybean have only a "healthy" class; they still need plant entries for context but no new disease entries