plant-disease-id/apps/web/tasks/production-ml-pipeline/02-label-mapping-implementation.md
2026-06-06 15:09:46 -04:00

02. Label Mapping Layer Implementation

meta:
  id: production-ml-pipeline-02
  feature: production-ml-pipeline
  priority: P0
  depends_on: [production-ml-pipeline-01]
  tags: [implementation, knowledge-base, tests-required]

objective:

  • Expand the knowledge base to cover all PlantVillage plants and diseases
  • Rewrite src/lib/ml/labels.ts to use the PlantVillage class mapping from task 01
  • Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
  • The label layer must be the single source of truth for model-index → disease mapping

deliverables:

  • Updated src/data/plants.json — 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
  • Updated src/data/diseases.json — 19 new disease entries added for PlantVillage diseases not yet in KB
  • src/lib/ml/labels.ts — fully rewritten to use PlantVillage class mapping
  • src/lib/ml/labels.test.ts — updated to validate against new mapping
  • scripts/seed-plantvillage-kb.ts — DB migration script to insert new plants and diseases into Turso

steps:

  1. Add 10 new plants to src/data/plants.json — each with proper metadata:

    // New plants needed (PlantVillage coverage):
    { id: "apple", commonName: "Apple", scientificName: "Malus domestica", family: "Rosaceae", category: "fruit" }
    { id: "cherry", commonName: "Cherry", scientificName: "Prunus avium", family: "Rosaceae", category: "fruit" }
    { id: "corn", commonName: "Corn (Maize)", scientificName: "Zea mays", family: "Poaceae", category: "vegetable" }
    { id: "grape", commonName: "Grape", scientificName: "Vitis vinifera", family: "Vitaceae", category: "fruit" }
    { id: "orange", commonName: "Orange", scientificName: "Citrus sinensis", family: "Rutaceae", category: "fruit" }
    { id: "peach", commonName: "Peach", scientificName: "Prunus persica", family: "Rosaceae", category: "fruit" }
    { id: "potato", commonName: "Potato", scientificName: "Solanum tuberosum", family: "Solanaceae", category: "vegetable" }
    { id: "blueberry", commonName: "Blueberry", scientificName: "Vaccinium corymbosum", family: "Ericaceae", category: "fruit" }
    { id: "raspberry", commonName: "Raspberry", scientificName: "Rubus idaeus", family: "Rosaceae", category: "fruit" }
    { id: "soybean", commonName: "Soybean", scientificName: "Glycine max", family: "Fabaceae", category: "vegetable" }
    
    • Add imageUrl for each (use the Wikipedia PageImages API, same pattern as fill-plant-images.ts)
    • Add careSummary for each
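The imageUrl lookup can be sketched against the MediaWiki Action API with the PageImages extension (`prop=pageimages`), which English Wikipedia exposes. This is a minimal sketch, not the pattern used in fill-plant-images.ts. `buildPageImagesUrl`, `extractThumbnailUrl`, and the 640 px thumbnail size are hypothetical choices. The response type covers only the fields read here, using `formatversion=2`, where `query.pages` is an array.

```typescript
// Hypothetical helpers for looking up a plant's lead image on English Wikipedia.
// Assumes the MediaWiki Action API with the PageImages extension (prop=pageimages).

export function buildPageImagesUrl(title: string, thumbSize = 640): string {
  const params = new URLSearchParams({
    action: "query",
    prop: "pageimages",
    piprop: "thumbnail",
    pithumbsize: String(thumbSize),
    titles: title,
    redirects: "1", // follow redirects from scientific names to article titles
    format: "json",
    formatversion: "2", // pages come back as an array instead of an id-keyed object
  });
  return `https://en.wikipedia.org/w/api.php?${params.toString()}`;
}

// Minimal shape of the response fields this sketch reads.
export interface PageImagesResponse {
  query?: {
    pages?: Array<{
      title: string;
      missing?: boolean;
      thumbnail?: { source: string; width: number; height: number };
    }>;
  };
}

// Returns the thumbnail URL, or null when the page is missing or has no lead image.
export function extractThumbnailUrl(res: PageImagesResponse): string | null {
  const page = res.query?.pages?.[0];
  if (!page || page.missing) return null;
  return page.thumbnail?.source ?? null;
}
```

In a Node script this would be used as `extractThumbnailUrl(await (await fetch(buildPageImagesUrl("Vitis vinifera"))).json())`. Wikimedia's User-Agent policy asks API clients to send a descriptive User-Agent header. A null result means the article has no lead image, so that plant needs a manually chosen URL.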
  2. Add 19 new diseases to src/data/diseases.json — each with full structured data:

    • Use the template-based approach from scripts/disease-templates.ts where possible
    • Source disease details from:
      • UW-Madison PDDC factsheets (pddc.wisc.edu)
      • Cornell Plant Clinic (plantclinic.cornell.edu)
      • University extension publications
    • Each disease must have: id, plantId, name, scientificName, causalAgentType, description, symptoms (≥3), causes (≥2), treatment (≥3), prevention (≥2), lookalikeDiseaseIds, severity, prevalence
    • New disease entries needed:
      • apple-scab, apple-black-rot, apple-cedar-apple-rust (plant: apple)
      • cherry-powdery-mildew (plant: cherry)
      • corn-gray-leaf-spot, corn-common-rust, corn-northern-leaf-blight (plant: corn)
      • grape-black-rot, grape-esca, grape-leaf-blight (plant: grape)
      • orange-citrus-greening (plant: orange)
      • peach-bacterial-spot (plant: peach)
      • potato-early-blight, potato-late-blight (plant: potato)
      • tomato-leaf-mold, tomato-spider-mites, tomato-target-spot, tomato-yellow-leaf-curl-virus, tomato-mosaic-virus (plant: tomato)
    • Use a programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as scripts/generate-full-kb.ts
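The field requirements above can be enforced before anything is written to diseases.json. The sketch below assumes a `DiseaseEntry` shape built from the listed field names. The string types for `causalAgentType`, `severity`, and `prevalence` are placeholders, since the task does not define their allowed values. `validateDisease` is a hypothetical helper, not part of generate-full-kb.ts.

```typescript
// Hypothetical shape of a diseases.json entry, using the field names required above.
// Types for causalAgentType/severity/prevalence are placeholders (allowed values not specified).
export interface DiseaseEntry {
  id: string;
  plantId: string;
  name: string;
  scientificName: string;
  causalAgentType: string;
  description: string;
  symptoms: string[];
  causes: string[];
  treatment: string[];
  prevention: string[];
  lookalikeDiseaseIds: string[];
  severity: string;
  prevalence: string;
}

// Minimum list lengths from the task spec.
const MIN_ITEMS: Record<"symptoms" | "causes" | "treatment" | "prevention", number> = {
  symptoms: 3,
  causes: 2,
  treatment: 3,
  prevention: 2,
};

const REQUIRED_STRINGS = [
  "id", "plantId", "name", "scientificName", "causalAgentType", "description", "severity", "prevalence",
] as const;

// Returns human-readable problems; an empty array means the entry passes.
export function validateDisease(d: DiseaseEntry, knownPlantIds: Set<string>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_STRINGS) {
    if (typeof d[key] !== "string" || d[key].trim() === "") problems.push(`${d.id}: missing ${key}`);
  }
  for (const [key, min] of Object.entries(MIN_ITEMS) as Array<[keyof typeof MIN_ITEMS, number]>) {
    const items = d[key];
    if (!Array.isArray(items) || items.length < min) problems.push(`${d.id}: ${key} needs >= ${min} items`);
  }
  if (!Array.isArray(d.lookalikeDiseaseIds)) problems.push(`${d.id}: lookalikeDiseaseIds must be an array`);
  if (!knownPlantIds.has(d.plantId)) problems.push(`${d.id}: unknown plantId "${d.plantId}"`);
  return problems;
}
```

Running this over all 19 generated entries, and failing the generator on any non-empty result, keeps thin or partially scraped records out of the KB.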
  3. Update lookalikeDiseaseIds: cross-reference the new diseases with each other and with the related existing tomato entries (every link is bidirectional):

    • Apple scab ↔ Apple black rot (both cause leaf spots on apple)
    • Potato early blight ↔ Potato late blight (both affect potato foliage)
    • Grape black rot ↔ Grape esca (both cause fruit rot)
    • Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
    • Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
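Each pair above is meant to be bidirectional, so a generator can enforce symmetry mechanically instead of relying on both entries being edited by hand. This is a minimal sketch. It assumes entries carry `id` and `lookalikeDiseaseIds` as in diseases.json, and `symmetrizeLookalikes` is a hypothetical helper.

```typescript
// Minimal sketch: make lookalikeDiseaseIds symmetric, so "A lists B" implies "B lists A".
interface LookalikeEntry {
  id: string;
  lookalikeDiseaseIds: string[];
}

export function symmetrizeLookalikes<T extends LookalikeEntry>(entries: T[]): T[] {
  const known = new Set(entries.map((e) => e.id));
  const links = new Map<string, Set<string>>(
    entries.map((e): [string, Set<string>] => [e.id, new Set<string>()]),
  );
  for (const e of entries) {
    for (const other of e.lookalikeDiseaseIds) {
      // Skip self-references and IDs that don't exist in the KB.
      if (other === e.id || !known.has(other)) continue;
      links.get(e.id)!.add(other);
      links.get(other)!.add(e.id);
    }
  }
  // Return new objects with sorted IDs so the generated JSON diff is stable.
  return entries.map((e) => ({ ...e, lookalikeDiseaseIds: [...links.get(e.id)!].sort() }));
}
```

This sketch drops unknown IDs silently. A stricter variant could collect them and fail the run, which would also catch typos in hand-entered pairs.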
  4. Rewrite src/lib/ml/labels.ts to use the PlantVillage mapping:

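    import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
    
    // Total output classes from the model's final layer
    export const NUM_CLASSES = 38;
    
    // Fail at import time if the class table and the model disagree in size
    if (PLANTVILLAGE_CLASSES.length !== NUM_CLASSES) {
      throw new Error(`PLANTVILLAGE_CLASSES has ${PLANTVILLAGE_CLASSES.length} entries, expected ${NUM_CLASSES}`);
    }
    
    // Resolve an index to its table entry; out-of-range or non-integer indices
    // throw instead of silently falling through to "healthy"
    function entryAt(index: number) {
      const entry = Number.isInteger(index) ? PLANTVILLAGE_CLASSES[index] : undefined;
      if (!entry) throw new RangeError(`Model index ${index} is outside [0, ${NUM_CLASSES - 1}]`);
      return entry;
    }
    
    // Index 0-37 → disease lookup ("healthy" sentinel for healthy classes)
    export function getDiseaseIdForIndex(index: number): string {
      const entry = entryAt(index);
      return entry.isHealthy ? "healthy" : entry.diseaseId;
    }
    
    export function getPlantIdForIndex(index: number): string {
      return entryAt(index).plantId;
    }
    
    export function isHealthyClass(index: number): boolean {
      return entryAt(index).isHealthy;
    }
    
    // Disease ID → index (reverse lookup). Array position is the model index;
    // healthy entries are skipped so "healthy" never matches. Returns -1 if not a model class.
    export function getIndexForDiseaseId(diseaseId: string): number {
      const id = diseaseId.toLowerCase();
      return PLANTVILLAGE_CLASSES.findIndex((c) => !c.isHealthy && c.diseaseId === id);
    }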
    
  5. Remove old assumptions — the old labels.ts assumed 95 classes (93 diseases + healthy + unknown). Delete all references to diseases.json index ordering from labels.ts. The mapping is now defined by plantvillage-classes.ts, not by JSON file order.
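With the class table as the only source of ordering, turning a model output vector into a result reduces to an argmax plus a table lookup. The sketch below assumes the model emits raw logits; if the exported model already ends in softmax, its output can be used directly as the confidence. It also assumes entries look like the `ClassEntry` type shown. `predict` and both types are hypothetical, and in the app the table argument would be PLANTVILLAGE_CLASSES.

```typescript
// Minimal sketch: turn one model output vector into a labeled prediction.
// ClassEntry mirrors the fields labels.ts reads; the exact shape is an assumption.
export interface ClassEntry {
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
}

export interface Prediction {
  index: number;
  plantId: string;
  diseaseId: string; // "healthy" sentinel for healthy classes
  confidence: number; // softmax probability of the winning class
}

export function predict(logits: ArrayLike<number>, classes: readonly ClassEntry[]): Prediction {
  // Fail loudly if the model and the class table disagree, instead of mislabeling.
  if (classes.length === 0) throw new Error("empty class table");
  if (logits.length !== classes.length) {
    throw new Error(`model emits ${logits.length} classes, table has ${classes.length}`);
  }
  // Argmax over the logits.
  let best = 0;
  for (let i = 1; i < logits.length; i++) if (logits[i] > logits[best]) best = i;
  // Numerically stable softmax: subtract the max logit before exponentiating.
  const max = logits[best];
  let sum = 0;
  for (let i = 0; i < logits.length; i++) sum += Math.exp(logits[i] - max);
  const entry = classes[best];
  return {
    index: best,
    plantId: entry.plantId,
    diseaseId: entry.isHealthy || entry.diseaseId === null ? "healthy" : entry.diseaseId,
    confidence: 1 / sum, // exp(max - max) / sum
  };
}
```

The length check is the runtime counterpart of dropping the JSON-order assumption. A model with a different head size fails immediately instead of being mislabeled.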

  6. Create DB migration script scripts/seed-plantvillage-kb.ts:

    • Read updated src/data/plants.json and src/data/diseases.json
    • Insert new plants and diseases into Turso DB using Drizzle ORM
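    • Use an upsert (INSERT ... ON CONFLICT DO UPDATE, via Drizzle's onConflictDoUpdate) so reruns are idempotent; avoid INSERT OR REPLACE, which deletes and re-inserts conflicting rows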
    • Log what was inserted/updated
  7. Run the migration to populate the DB with new data.
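The "log what was inserted/updated" step needs to know which rows are new before the upserts run. This is a minimal sketch. It assumes rows from Turso have already been read into a map keyed by `id` and have the same shape as the JSON records (for example, JSON-array columns already parsed). `planSeed` and `canonical` are hypothetical helpers.

```typescript
// Serialize with object keys sorted, so key order does not affect equality.
function canonical(value: unknown): string {
  return JSON.stringify(value, (_key, v: unknown) =>
    v !== null && typeof v === "object" && !Array.isArray(v)
      ? Object.fromEntries(
          Object.entries(v as Record<string, unknown>).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)),
        )
      : v,
  );
}

export interface SeedPlan<T> {
  toInsert: T[];
  toUpdate: T[];
  unchanged: T[];
}

// Split incoming JSON rows into new, changed, and identical rows relative to the DB.
export function planSeed<T extends { id: string }>(
  incoming: readonly T[],
  existing: ReadonlyMap<string, T>,
): SeedPlan<T> {
  const plan: SeedPlan<T> = { toInsert: [], toUpdate: [], unchanged: [] };
  for (const row of incoming) {
    const current = existing.get(row.id);
    if (!current) plan.toInsert.push(row);
    else if (canonical(current) === canonical(row)) plan.unchanged.push(row);
    else plan.toUpdate.push(row);
  }
  return plan;
}
```

The script could then log the three counts and upsert `toInsert` and `toUpdate`, for example with `.onConflictDoUpdate({ target, set })` on Drizzle's insert builder. Unchanged rows are left alone, so a second run reports no writes.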

tests:

  • Unit: labels.test.ts validates all 38 indices map correctly
  • Unit: getDiseaseIdForIndex(29) returns "early-blight"
  • Unit: getDiseaseIdForIndex(3) returns "healthy" (Apple healthy class)
  • Unit: getIndexForDiseaseId("early-blight") returns 29
  • Unit: isHealthyClass(37) returns true (Tomato healthy)
  • Unit: isHealthyClass(29) returns false (Tomato Early_blight)
  • Unit: getPlantIdForIndex(0) returns "apple"
  • Unit: All 26 disease (non-healthy) indices return diseaseIds that resolve to real DB entries via getDiseaseById()
  • Integration: scripts/seed-plantvillage-kb.ts runs without errors, inserts all 10 plants and 19 diseases
  • Integration: After seeding, DB query for each new disease returns a complete record

acceptance_criteria:

  • PLANTVILLAGE_CLASSES (defined in plantvillage-classes.ts and imported by labels.ts) has exactly 38 entries matching model output order
  • 12 healthy indices correctly return "healthy" from getDiseaseIdForIndex()
  • 26 disease indices correctly return valid diseaseIds
  • All 10 new plants exist in src/data/plants.json with valid metadata and imageUrl
  • All 19 new diseases exist in src/data/diseases.json with full structured data (symptoms, treatment, prevention, etc.)
  • DB migration script runs successfully, all new data queryable from Turso
  • Old diseases.json ordering assumption is completely removed from labels.ts
  • All existing tests still pass (no regressions in browse, search, detail pages)
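The healthy/disease split can be checked directly from the raw PlantVillage class folder names. This sketch assumes the model's class order is the alphabetically sorted folder list of the common 38-class PlantVillage release, including its original spellings (such as `Haunglongbing` and the trailing underscore in `Common_rust_`). That order agrees with the indices used in the unit tests (3 = Apple healthy, 29 = Tomato early blight, 37 = Tomato healthy), but the authoritative list is the one produced in task 01. `summarize` is a hypothetical helper.

```typescript
// PlantVillage class folder names in sorted (assumed model index) order.
export const PLANTVILLAGE_RAW_CLASSES = [
  "Apple___Apple_scab", "Apple___Black_rot", "Apple___Cedar_apple_rust", "Apple___healthy",
  "Blueberry___healthy",
  "Cherry_(including_sour)___Powdery_mildew", "Cherry_(including_sour)___healthy",
  "Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot", "Corn_(maize)___Common_rust_",
  "Corn_(maize)___Northern_Leaf_Blight", "Corn_(maize)___healthy",
  "Grape___Black_rot", "Grape___Esca_(Black_Measles)", "Grape___Leaf_blight_(Isariopsis_Leaf_Spot)", "Grape___healthy",
  "Orange___Haunglongbing_(Citrus_greening)",
  "Peach___Bacterial_spot", "Peach___healthy",
  "Pepper,_bell___Bacterial_spot", "Pepper,_bell___healthy",
  "Potato___Early_blight", "Potato___Late_blight", "Potato___healthy",
  "Raspberry___healthy", "Soybean___healthy", "Squash___Powdery_mildew",
  "Strawberry___Leaf_scorch", "Strawberry___healthy",
  "Tomato___Bacterial_spot", "Tomato___Early_blight", "Tomato___Late_blight", "Tomato___Leaf_Mold",
  "Tomato___Septoria_leaf_spot", "Tomato___Spider_mites Two-spotted_spider_mite", "Tomato___Target_Spot",
  "Tomato___Tomato_Yellow_Leaf_Curl_Virus", "Tomato___Tomato_mosaic_virus", "Tomato___healthy",
] as const;

// Split "Plant___Condition" into its two halves.
function splitClass(name: string): { plant: string; condition: string } {
  const [plant = "", condition = ""] = name.split("___");
  return { plant, condition };
}

export function summarize(classes: readonly string[]) {
  const parsed = classes.map(splitClass);
  const healthy = parsed.filter((c) => c.condition === "healthy").length;
  // Plants whose only classes are "healthy" never map to a disease ID.
  const plants = new Set(parsed.map((c) => c.plant));
  const healthyOnly = [...plants].filter((p) =>
    parsed.every((c) => c.plant !== p || c.condition === "healthy"),
  );
  return { total: classes.length, healthy, disease: classes.length - healthy, healthyOnly };
}
```

For this list, `summarize` reports 38 classes, 12 healthy, 26 disease, and `Blueberry`, `Raspberry`, `Soybean` as the healthy-only plants.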

validation:

  • npx vitest run src/lib/ml/labels.test.ts
  • npx vitest run src/lib/ml/plantvillage-classes.test.ts
  • npx tsx scripts/seed-plantvillage-kb.ts — verify output shows correct inserts
  • npx vitest run — full test suite passes
  • Manual: query DB for each new plant/disease and verify complete data

notes:

  • Disease data must come from authoritative sources (university extension services), not hand-written
  • Use the same template-based generation approach from scripts/generate-full-kb.ts for consistency
  • The pepper-bacterial-wilt disease already exists — map Pepper___Bacterial_spot to it even though it's not a perfect match (it's the closest available)
  • Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage — add plant entries but no disease entries for these (they don't need new disease IDs since they always map to "healthy")
  • Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases