2026-06-06 15:09:46 -04:00
parent 78220d3568
commit 06295c83ca
56 changed files with 12018 additions and 440 deletions

@@ -0,0 +1,152 @@
# 01. PlantVillage Class Inventory and Knowledge Base Mapping
meta:
id: production-ml-pipeline-01
feature: production-ml-pipeline
priority: P0
depends_on: []
tags: [data, mapping, research]
objective:
- Document all 38 PlantVillage model output classes
- Map each class index to a definitive disease ID in the knowledge base
- Identify which plants and diseases are missing from the KB and must be added
- Produce a complete, authoritative mapping file that subsequent tasks consume
deliverables:
- `src/lib/ml/plantvillage-classes.ts` — definitive mapping of all 38 class indices to structured metadata
- Updated `tasks/production-ml-pipeline/class-mapping-reference.md` — human-readable reference document
steps:
1. Document the canonical 38 PlantVillage class labels in order (indices 0–37):
```
0: Apple___Apple_scab
1: Apple___Black_rot
2: Apple___Cedar_apple_rust
3: Apple___healthy
4: Blueberry___healthy
5: Cherry_(including_sour)___Powdery_mildew
6: Cherry_(including_sour)___healthy
7: Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
8: Corn_(maize)___Common_rust_
9: Corn_(maize)___Northern_Leaf_Blight
10: Corn_(maize)___healthy
11: Grape___Black_rot
12: Grape___Esca_(Black_Measles)
13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
14: Grape___healthy
15: Orange___Haunglongbing_(Citrus_greening)
16: Peach___Bacterial_spot
17: Peach___healthy
18: Pepper,_bell___Bacterial_spot
19: Pepper,_bell___healthy
20: Potato___Early_blight
21: Potato___Late_blight
22: Potato___healthy
23: Raspberry___healthy
24: Soybean___healthy
25: Squash___Powdery_mildew
26: Strawberry___Leaf_scorch
27: Strawberry___healthy
28: Tomato___Bacterial_spot
29: Tomato___Early_blight
30: Tomato___Late_blight
31: Tomato___Leaf_Mold
32: Tomato___Septoria_leaf_spot
33: Tomato___Spider_mites Two-spotted_spider_mite
34: Tomato___Target_Spot
35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
36: Tomato___Tomato_mosaic_virus
37: Tomato___healthy
```
2. For each class, determine the mapping target:
- **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel, stored as `isHealthy: true` with `diseaseId: null`. These indicate that the model detected no disease.
- **Disease classes with an existing KB match** (exact or closest): map directly to the existing disease ID.
- 28 → `bacterial-leaf-spot-tomato` (closest match to Tomato Bacterial_spot)
- 29 → `early-blight`
- 30 → `late-blight`
- 32 → `septoria-leaf-spot`
- 25 → `squash-powdery-mildew`
- 26 → `strawberry-leaf-scorch`
- 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
- **Disease classes needing new KB entries** (no existing disease in our KB):
- 0: Apple_scab → new disease `apple-scab` under plant `apple`
- 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
- 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
- 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
- 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
- 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
- 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
- 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
- 12: Grape_esca → new disease `grape-esca` under plant `grape`
- 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
- 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
- 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
- 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
- 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
- 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
- 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
- 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
- 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
- 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
```typescript
export interface PlantVillageClass {
  index: number;            // model output index, 0-37
  rawLabel: string;         // PlantVillage label exactly as emitted by the model
  plantId: string;          // KB plant slug
  diseaseId: string | null; // null for healthy classes
  isHealthy: boolean;
  displayName: string;      // human-readable disease name
}

export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [
  // ... one entry per class, in index order
];
```
4. For each class, also record:
- The PlantVillage plant name (e.g., "Tomato", "Apple")
- The target KB plantId (e.g., "tomato", "apple")
- The target KB diseaseId (e.g., "early-blight") or null for healthy
- Whether the disease needs to be added to the KB (boolean flag for task 02)
5. Verify the mapping covers all 38 indices with no gaps or duplicates.
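As a rough illustration of steps 1 to 4, the sketch below splits a raw label on its `___` separator and assembles one mapping entry. The helper names (`splitRawLabel`, `toEntry`), the partial `PLANT_SLUGS` table, and the `displayName` values are hypothetical; only the interface shape and the label/slug pairs come from this task.

```typescript
// Illustrative sketch; helper names are hypothetical, not part of the task spec.
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Plant-name-to-slug pairs taken from step 2; the remaining plants would be added here.
const PLANT_SLUGS: Record<string, string> = {
  "Apple": "apple",
  "Cherry_(including_sour)": "cherry",
  "Corn_(maize)": "corn",
  "Tomato": "tomato",
};

// PlantVillage labels separate the plant and the condition with a triple underscore.
function splitRawLabel(rawLabel: string): { plantName: string; condition: string } {
  const sep = rawLabel.indexOf("___");
  if (sep < 0) {
    throw new Error(`Unexpected label format: ${rawLabel}`);
  }
  return { plantName: rawLabel.slice(0, sep), condition: rawLabel.slice(sep + 3) };
}

// Builds one entry and rejects inconsistent input (healthy label with a disease ID, or the reverse).
function toEntry(
  index: number,
  rawLabel: string,
  diseaseId: string | null,
  displayName: string,
): PlantVillageClass {
  const { plantName, condition } = splitRawLabel(rawLabel);
  const plantId = PLANT_SLUGS[plantName];
  if (plantId === undefined) {
    throw new Error(`No KB plant slug for ${plantName}`);
  }
  const isHealthy = condition === "healthy";
  if (isHealthy !== (diseaseId === null)) {
    throw new Error(`Class ${index}: diseaseId must be null exactly when the label is healthy`);
  }
  return { index, rawLabel, plantId, diseaseId, isHealthy, displayName };
}

// Example entries; the display names are placeholders.
const cornRust = toEntry(8, "Corn_(maize)___Common_rust_", "corn-common-rust", "Common rust");
const appleHealthy = toEntry(3, "Apple___healthy", null, "Healthy");
```

Keeping `rawLabel` untouched (including quirks such as the trailing underscore in `Common_rust_`) lets later tasks match the model's labels byte for byte.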
tests:
- Unit: mapping has exactly 38 entries
- Unit: indices 0–37 are all present, no gaps
- Unit: each non-healthy entry has a non-null diseaseId
- Unit: each healthy entry has null diseaseId and isHealthy=true
- Unit: no duplicate diseaseIds across non-healthy entries
- Unit: all plantIds are valid slugs (lowercase, kebab-case)
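The checks above could be backed by a single validation helper that the unit tests call. The following is a minimal sketch, assuming the `PlantVillageClass` shape from step 3; `validateMapping`, its `expectedCount` parameter, and the slug regex are hypothetical and not part of the existing codebase.

```typescript
// Illustrative sketch; validateMapping is a hypothetical helper, not an existing API.
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Lowercase kebab-case: "corn", "bell-pepper"; rejects "Corn" and "corn_maize".
const KEBAB_SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of human-readable problems; an empty list means the mapping passes.
function validateMapping(
  classes: readonly PlantVillageClass[],
  expectedCount: number = 38,
): string[] {
  const problems: string[] = [];
  if (classes.length !== expectedCount) {
    problems.push(`expected ${expectedCount} entries, found ${classes.length}`);
  }

  const seenIndices = new Set<number>();
  const seenDiseaseIds = new Set<string>();
  for (const c of classes) {
    if (seenIndices.has(c.index)) problems.push(`duplicate index ${c.index}`);
    seenIndices.add(c.index);

    if (c.isHealthy && c.diseaseId !== null) {
      problems.push(`index ${c.index}: healthy entry has a diseaseId`);
    }
    if (!c.isHealthy && c.diseaseId === null) {
      problems.push(`index ${c.index}: disease entry has a null diseaseId`);
    }
    if (c.diseaseId !== null) {
      if (seenDiseaseIds.has(c.diseaseId)) problems.push(`duplicate diseaseId ${c.diseaseId}`);
      seenDiseaseIds.add(c.diseaseId);
    }
    if (!KEBAB_SLUG.test(c.plantId)) {
      problems.push(`index ${c.index}: plantId "${c.plantId}" is not a kebab-case slug`);
    }
  }

  // Every index from 0 to expectedCount - 1 must appear.
  for (let i = 0; i < expectedCount; i++) {
    if (!seenIndices.has(i)) problems.push(`missing index ${i}`);
  }
  return problems;
}
```

Returning all problems at once, rather than throwing on the first, makes a failing test report every broken entry in one run; each unit test can then assert on the relevant message or simply on an empty result.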
acceptance_criteria:
- `src/lib/ml/plantvillage-classes.ts` exports `PLANTVILLAGE_CLASSES` array with exactly 38 entries
- Every index 0–37 maps to exactly one entry
- 12 entries are healthy (isHealthy=true, diseaseId=null)
- 26 entries are diseases with valid plantId and diseaseId
- Each entry includes rawLabel, plantId, diseaseId, displayName
- All new disease IDs follow kebab-case convention matching existing KB pattern
- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings
validation:
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts` — all mapping tests pass
- Manual review: each of the 26 disease entries maps to a plausible disease in our KB
notes:
- This task produces the authoritative mapping consumed by task 02 (KB expansion) and task 03 (label mapping)
- The PlantVillage class order is fixed by the model's training — do NOT reorder
- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato` — this is the closest match, not a perfect one
- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt`; bacterial spot and bacterial wilt are distinct diseases, so this mapping is imperfect, but it is the closest available match
- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
- Blueberry, Raspberry, and Soybean have only a "healthy" class; they still need plant entries for context but no new disease entries
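The plant lists in these notes can be cross-checked against the canonical labels rather than maintained by hand. The sketch below is one way to do that. It assumes the KB already contains the four plants not named as new (`pepper`, `squash`, `strawberry`, `tomato`) and that a plant's slug is the leading alphabetic run of its label, lowercased; both are assumptions, and the helper names are hypothetical.

```typescript
// Illustrative sketch; helper names and the "existing plants" set are assumptions.
// The 38 canonical labels from step 1, in index order.
const RAW_LABELS: readonly string[] = [
  "Apple___Apple_scab", "Apple___Black_rot", "Apple___Cedar_apple_rust", "Apple___healthy",
  "Blueberry___healthy",
  "Cherry_(including_sour)___Powdery_mildew", "Cherry_(including_sour)___healthy",
  "Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot", "Corn_(maize)___Common_rust_",
  "Corn_(maize)___Northern_Leaf_Blight", "Corn_(maize)___healthy",
  "Grape___Black_rot", "Grape___Esca_(Black_Measles)",
  "Grape___Leaf_blight_(Isariopsis_Leaf_Spot)", "Grape___healthy",
  "Orange___Haunglongbing_(Citrus_greening)",
  "Peach___Bacterial_spot", "Peach___healthy",
  "Pepper,_bell___Bacterial_spot", "Pepper,_bell___healthy",
  "Potato___Early_blight", "Potato___Late_blight", "Potato___healthy",
  "Raspberry___healthy", "Soybean___healthy", "Squash___Powdery_mildew",
  "Strawberry___Leaf_scorch", "Strawberry___healthy",
  "Tomato___Bacterial_spot", "Tomato___Early_blight", "Tomato___Late_blight",
  "Tomato___Leaf_Mold", "Tomato___Septoria_leaf_spot",
  "Tomato___Spider_mites Two-spotted_spider_mite", "Tomato___Target_Spot",
  "Tomato___Tomato_Yellow_Leaf_Curl_Virus", "Tomato___Tomato_mosaic_virus", "Tomato___healthy",
];

// Assumed slug rule: "Corn_(maize)___..." -> "corn", "Pepper,_bell___..." -> "pepper".
function plantSlug(rawLabel: string): string {
  const match = rawLabel.match(/^[A-Za-z]+/);
  if (!match) throw new Error(`Cannot derive plant slug from ${rawLabel}`);
  return match[0].toLowerCase();
}

// Assumption: these four plants are already in the KB.
const ASSUMED_EXISTING_PLANTS = new Set<string>(["pepper", "squash", "strawberry", "tomato"]);

// Plants that appear in the labels but not in the KB, in first-seen order.
function newPlants(labels: readonly string[], existing: Set<string>): string[] {
  const slugs = new Set<string>(labels.map(plantSlug));
  return Array.from(slugs).filter((slug) => !existing.has(slug));
}

// Plants whose only class is "healthy", so they need a plant entry but no disease entry.
function healthyOnlyPlants(labels: readonly string[]): string[] {
  const hasDisease = new Map<string, boolean>();
  for (const label of labels) {
    const slug = plantSlug(label);
    const diseased = !label.endsWith("___healthy");
    hasDisease.set(slug, (hasDisease.get(slug) ?? false) || diseased);
  }
  return Array.from(hasDisease.entries())
    .filter(([, diseased]) => !diseased)
    .map(([slug]) => slug);
}

const plantsToAdd = newPlants(RAW_LABELS, ASSUMED_EXISTING_PLANTS);
const plantsWithoutDiseases = healthyOnlyPlants(RAW_LABELS);
```

Under those assumptions, `plantsToAdd` contains the ten plants listed above and `plantsWithoutDiseases` contains blueberry, raspberry, and soybean. If the KB's actual plant slugs differ from the assumed rule, `plantSlug` would need an explicit lookup table instead.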