02. Label Mapping Layer Implementation
meta:
- id: production-ml-pipeline-02
- feature: production-ml-pipeline
- priority: P0
- depends_on: [production-ml-pipeline-01]
- tags: [implementation, knowledge-base, tests-required]
objective:
- Expand the knowledge base to cover all PlantVillage plants and diseases
- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage class mapping from task 01
- Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
- The label layer must be the single source of truth for model-index → disease mapping
deliverables:
- Updated `src/data/plants.json`: 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
- Updated `src/data/diseases.json`: 19 new disease entries added for PlantVillage diseases not yet in KB
- `src/lib/ml/labels.ts`: fully rewritten to use PlantVillage class mapping
- `src/lib/ml/labels.test.ts`: updated to validate against new mapping
- `scripts/seed-plantvillage-kb.ts`: DB migration script to insert new plants and diseases into Turso
steps:
- Add 10 new plants to `src/data/plants.json`, each with proper metadata:

  ```jsonc
  // New plants needed (PlantVillage coverage):
  { "id": "apple", "commonName": "Apple", "scientificName": "Malus domestica", "family": "Rosaceae", "category": "fruit" },
  { "id": "cherry", "commonName": "Cherry", "scientificName": "Prunus avium", "family": "Rosaceae", "category": "fruit" },
  { "id": "corn", "commonName": "Corn (Maize)", "scientificName": "Zea mays", "family": "Poaceae", "category": "vegetable" },
  { "id": "grape", "commonName": "Grape", "scientificName": "Vitis vinifera", "family": "Vitaceae", "category": "fruit" },
  { "id": "orange", "commonName": "Orange", "scientificName": "Citrus sinensis", "family": "Rutaceae", "category": "fruit" },
  { "id": "peach", "commonName": "Peach", "scientificName": "Prunus persica", "family": "Rosaceae", "category": "fruit" },
  { "id": "potato", "commonName": "Potato", "scientificName": "Solanum tuberosum", "family": "Solanaceae", "category": "vegetable" },
  { "id": "blueberry", "commonName": "Blueberry", "scientificName": "Vaccinium corymbosum", "family": "Ericaceae", "category": "fruit" },
  { "id": "raspberry", "commonName": "Raspberry", "scientificName": "Rubus idaeus", "family": "Rosaceae", "category": "fruit" },
  { "id": "soybean", "commonName": "Soybean", "scientificName": "Glycine max", "family": "Fabaceae", "category": "vegetable" }
  ```

  - Add `imageUrl` for each (use Wikipedia pageimages, same pattern as `fill-plant-images.ts`)
  - Add `careSummary` for each
- Add 19 new diseases to `src/data/diseases.json`, each with full structured data:
  - Use the template-based approach from `scripts/disease-templates.ts` where possible
  - Source disease details from:
    - UW-Madison PDDC factsheets (pddc.wisc.edu)
    - Cornell Plant Clinic (plantclinic.cornell.edu)
    - University extension publications
  - Each disease must have: `id`, `plantId`, `name`, `scientificName`, `causalAgentType`, `description`, `symptoms` (≥3), `causes` (≥2), `treatment` (≥3), `prevention` (≥2), `lookalikeDiseaseIds`, `severity`, `prevalence`
  - New disease entries needed:
    - apple-scab, apple-black-rot, apple-cedar-apple-rust (plant: apple)
    - cherry-powdery-mildew (plant: cherry)
    - corn-gray-leaf-spot, corn-common-rust, corn-northern-leaf-blight (plant: corn)
    - grape-black-rot, grape-esca, grape-leaf-blight (plant: grape)
    - orange-citrus-greening (plant: orange)
    - peach-bacterial-spot (plant: peach)
    - potato-early-blight, potato-late-blight (plant: potato)
    - tomato-leaf-mold, tomato-spider-mites, tomato-target-spot, tomato-yellow-leaf-curl-virus, tomato-mosaic-virus (plant: tomato)
  - Use programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as `scripts/generate-full-kb.ts`
- Update `lookalikeDiseaseIds`: cross-reference the new diseases with each other and with existing tomato entries:
  - Apple scab ↔ Apple black rot (both cause leaf spots on apple)
  - Potato early blight ↔ Potato late blight (both affect potato foliage)
  - Grape black rot ↔ Grape esca (both cause fruit rot)
  - Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
  - Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage mapping:

  ```ts
  import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";

  // Total output classes from model
  export const NUM_CLASSES = 38;

  // Index 0–37 → disease lookup.
  // Healthy classes and out-of-range indices both return the "healthy" sentinel.
  export function getDiseaseIdForIndex(index: number): string {
    const entry = PLANTVILLAGE_CLASSES[index];
    if (!entry || entry.isHealthy) return "healthy";
    return entry.diseaseId;
  }

  // Returns "unknown" when the index is outside the table.
  export function getPlantIdForIndex(index: number): string {
    return PLANTVILLAGE_CLASSES[index]?.plantId ?? "unknown";
  }

  export function isHealthyClass(index: number): boolean {
    return PLANTVILLAGE_CLASSES[index]?.isHealthy ?? false;
  }

  // Disease ID → index (for reverse lookup); returns -1 when no class matches.
  export function getIndexForDiseaseId(diseaseId: string): number {
    const entry = PLANTVILLAGE_CLASSES.find((c) => c.diseaseId === diseaseId.toLowerCase());
    return entry?.index ?? -1;
  }
  ```

- Remove old assumptions: the old `labels.ts` assumed 95 classes (93 diseases + healthy + unknown). Delete all references to `diseases.json` index ordering from `labels.ts`. The mapping is now defined by `plantvillage-classes.ts`, not by JSON file order.
- Create DB migration script `scripts/seed-plantvillage-kb.ts`:
  - Read updated `src/data/plants.json` and `src/data/diseases.json`
  - Insert new plants and diseases into Turso DB using Drizzle ORM
  - Use UPSERT (INSERT OR REPLACE) to be idempotent
  - Log what was inserted/updated
- Run the migration to populate the DB with new data.
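
The lookalike cross-referencing step above implies that every link should exist in both directions. A minimal sketch of one way to enforce that, assuming the entries are held in memory as objects with `id` and `lookalikeDiseaseIds` fields; the `DiseaseRef` type and the `linkLookalikes` helper are hypothetical names for illustration, not existing code:

```ts
// Hypothetical minimal shape; the real KB entries carry many more fields.
interface DiseaseRef {
  id: string;
  lookalikeDiseaseIds: string[];
}

// Pairs taken from the cross-reference list in the steps above
// (only the pairs whose IDs are spelled out in this task).
const LOOKALIKE_PAIRS: Array<[string, string]> = [
  ["apple-scab", "apple-black-rot"],
  ["potato-early-blight", "potato-late-blight"],
  ["grape-black-rot", "grape-esca"],
];

// Adds each pair in both directions, skipping links that already exist.
// Returns the IDs that were not found so the caller can report them.
function linkLookalikes(
  diseases: DiseaseRef[],
  pairs: Array<[string, string]>,
): string[] {
  const byId = new Map<string, DiseaseRef>();
  for (const d of diseases) byId.set(d.id, d);

  const missing = new Set<string>();
  for (const [a, b] of pairs) {
    const da = byId.get(a);
    const db = byId.get(b);
    if (!da) missing.add(a);
    if (!db) missing.add(b);
    if (!da || !db) continue;
    if (!da.lookalikeDiseaseIds.includes(b)) da.lookalikeDiseaseIds.push(b);
    if (!db.lookalikeDiseaseIds.includes(a)) db.lookalikeDiseaseIds.push(a);
  }
  return Array.from(missing);
}
```

Returning the missing IDs rather than throwing lets a generator script log every unresolved reference in one pass before anything is written to `diseases.json`.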
tests:
- Unit: `labels.test.ts` validates all 38 indices map correctly
- Unit: `getDiseaseIdForIndex(29)` returns `"early-blight"`
- Unit: `getDiseaseIdForIndex(3)` returns `"healthy"` (Apple healthy class)
- Unit: `getIndexForDiseaseId("early-blight")` returns `29`
- Unit: `isHealthyClass(37)` returns `true` (Tomato healthy)
- Unit: `isHealthyClass(29)` returns `false` (Tomato Early_blight)
- Unit: `getPlantIdForIndex(0)` returns `"apple"`
- Unit: All 25 non-healthy diseaseIds resolve to real DB entries via `getDiseaseById()`
- Integration: `scripts/seed-plantvillage-kb.ts` runs without errors, inserts all 10 plants and 19 diseases
- Integration: After seeding, DB query for each new disease returns a complete record
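
The per-index checks above can also be expressed as table-level invariants. A minimal sketch, assuming each entry of `PLANTVILLAGE_CLASSES` exposes the `index`, `plantId`, `diseaseId`, and `isHealthy` fields that the `labels.ts` code reads. The `ClassEntry` type, the `checkClassTable` helper, and the four-entry excerpt are illustrative only; the excerpt assumes the apple classes occupy indices 0 to 3 in the order shown and that healthy entries carry `"healthy"` as their `diseaseId`, neither of which is confirmed by this task.

```ts
// Hypothetical entry shape, inferred from the fields labels.ts accesses.
interface ClassEntry {
  index: number;
  plantId: string;
  diseaseId: string;
  isHealthy: boolean;
}

// Returns human-readable problems; an empty array means the table passed.
function checkClassTable(table: ClassEntry[], expectedSize: number): string[] {
  const problems: string[] = [];
  if (table.length !== expectedSize) {
    problems.push(`expected ${expectedSize} entries, found ${table.length}`);
  }
  table.forEach((entry, position) => {
    // Array position must equal the model output index.
    if (entry.index !== position) {
      problems.push(`entry at position ${position} has index ${entry.index}`);
    }
    if (!entry.plantId) problems.push(`index ${position} has no plantId`);
    // Only non-healthy classes need a real disease ID.
    if (!entry.isHealthy && !entry.diseaseId) {
      problems.push(`index ${position} has no diseaseId`);
    }
  });
  return problems;
}

// Illustrative excerpt only; the real table has 38 entries.
const excerpt: ClassEntry[] = [
  { index: 0, plantId: "apple", diseaseId: "apple-scab", isHealthy: false },
  { index: 1, plantId: "apple", diseaseId: "apple-black-rot", isHealthy: false },
  { index: 2, plantId: "apple", diseaseId: "apple-cedar-apple-rust", isHealthy: false },
  { index: 3, plantId: "apple", diseaseId: "healthy", isHealthy: true },
];

const excerptProblems = checkClassTable(excerpt, excerpt.length);
```

In a real test file the same helper could be called with the imported table and `NUM_CLASSES`, with the test asserting that the returned array is empty.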
acceptance_criteria:
- `PLANTVILLAGE_CLASSES` (imported by `labels.ts` from `plantvillage-classes.ts`) has exactly 38 entries matching model output order
- 13 healthy indices correctly return "healthy" from `getDiseaseIdForIndex()`
- 25 disease indices correctly return valid diseaseIds
- All 10 new plants exist in `src/data/plants.json` with valid metadata and imageUrl
- All 19 new diseases exist in `src/data/diseases.json` with full structured data (symptoms, treatment, prevention, etc.)
- DB migration script runs successfully, all new data queryable from Turso
- Old `diseases.json` ordering assumption is completely removed from `labels.ts`
- All existing tests still pass (no regressions in browse, search, detail pages)
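
The "full structured data" criterion can be checked mechanically before seeding. A minimal sketch, using the field names and minimum list lengths from the "Each disease must have" requirement in the steps; the `validateDiseaseEntry` helper is a hypothetical name, and because the value types of fields such as `severity` and `prevalence` are not specified here, the sketch only checks that they are present:

```ts
// Entries are treated as loosely typed records because the exact
// value types in diseases.json are not specified in this task.
type DiseaseEntry = Record<string, unknown>;

// Fields that must simply be present and non-empty.
const REQUIRED_FIELDS: string[] = [
  "id", "plantId", "name", "scientificName", "causalAgentType",
  "description", "lookalikeDiseaseIds", "severity", "prevalence",
];

// List fields with the minimum item counts stated in the steps.
const MIN_LIST_LENGTHS: Record<string, number> = {
  symptoms: 3,
  causes: 2,
  treatment: 3,
  prevention: 2,
};

// Returns human-readable problems; an empty array means the entry is complete.
function validateDiseaseEntry(entry: DiseaseEntry): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = entry[field];
    if (value === undefined || value === null || value === "") {
      problems.push(`missing ${field}`);
    }
  }
  for (const field of Object.keys(MIN_LIST_LENGTHS)) {
    const min = MIN_LIST_LENGTHS[field];
    const value = entry[field];
    if (!Array.isArray(value) || value.length < min) {
      problems.push(`${field} needs at least ${min} items`);
    }
  }
  return problems;
}
```

An empty `lookalikeDiseaseIds` array passes the presence check, which suits diseases that have no lookalike listed in this task.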
validation:
- `npx vitest run src/lib/ml/labels.test.ts`
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`
- `npx tsx scripts/seed-plantvillage-kb.ts`: verify output shows correct inserts
- `npx vitest run`: full test suite passes
- Manual: query DB for each new plant/disease and verify complete data
notes:
- Disease data must come from authoritative sources (university extension services), not hand-written
- Use the same template-based generation approach from `scripts/generate-full-kb.ts` for consistency
- The `pepper-bacterial-wilt` disease already exists: map `Pepper___Bacterial_spot` to it even though it's not a perfect match (it's the closest available)
- Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage: add plant entries but no disease entries for these (they don't need new disease IDs since they always map to "healthy")
- Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases