# 02. Label Mapping Layer Implementation

meta:
id: production-ml-pipeline-02
feature: production-ml-pipeline
priority: P0
depends_on: [production-ml-pipeline-01]
tags: [implementation, knowledge-base, tests-required]

objective:

- Expand the knowledge base to cover all PlantVillage plants and diseases
- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage class mapping from task 01
- Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
- The label layer must be the single source of truth for model-index → disease mapping

deliverables:

- Updated `src/data/plants.json`: 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
- Updated `src/data/diseases.json`: 19 new disease entries added for PlantVillage diseases not yet in the KB
- `src/lib/ml/labels.ts`: fully rewritten to use the PlantVillage class mapping
- `src/lib/ml/labels.test.ts`: updated to validate against the new mapping
- `scripts/seed-plantvillage-kb.ts`: DB migration script that inserts the new plants and diseases into Turso

steps:

1. **Add 10 new plants to `src/data/plants.json`**, each with proper metadata:

```typescript
// New plants needed (PlantVillage coverage):
[
  { id: "apple", commonName: "Apple", scientificName: "Malus domestica", family: "Rosaceae", category: "fruit" },
  { id: "cherry", commonName: "Cherry", scientificName: "Prunus avium", family: "Rosaceae", category: "fruit" },
  { id: "corn", commonName: "Corn (Maize)", scientificName: "Zea mays", family: "Poaceae", category: "vegetable" },
  { id: "grape", commonName: "Grape", scientificName: "Vitis vinifera", family: "Vitaceae", category: "fruit" },
  { id: "orange", commonName: "Orange", scientificName: "Citrus sinensis", family: "Rutaceae", category: "fruit" },
  { id: "peach", commonName: "Peach", scientificName: "Prunus persica", family: "Rosaceae", category: "fruit" },
  { id: "potato", commonName: "Potato", scientificName: "Solanum tuberosum", family: "Solanaceae", category: "vegetable" },
  { id: "blueberry", commonName: "Blueberry", scientificName: "Vaccinium corymbosum", family: "Ericaceae", category: "fruit" },
  { id: "raspberry", commonName: "Raspberry", scientificName: "Rubus idaeus", family: "Rosaceae", category: "fruit" },
  { id: "soybean", commonName: "Soybean", scientificName: "Glycine max", family: "Fabaceae", category: "vegetable" },
]
```

   - Add `imageUrl` for each (use Wikipedia pageimages, same pattern as `fill-plant-images.ts`)
   - Add `careSummary` for each

2. **Add 19 new diseases to `src/data/diseases.json`**, each with full structured data:
   - Use the template-based approach from `scripts/disease-templates.ts` where possible
   - Source disease details from:
     - UW-Madison PDDC factsheets (pddc.wisc.edu)
     - Cornell Plant Clinic (plantclinic.cornell.edu)
     - University extension publications
   - Each disease must have: `id`, `plantId`, `name`, `scientificName`, `causalAgentType`, `description`, `symptoms` (≥3), `causes` (≥2), `treatment` (≥3), `prevention` (≥2), `lookalikeDiseaseIds`, `severity`, `prevalence`
   - New disease entries needed:
     - `apple-scab`, `apple-black-rot`, `apple-cedar-apple-rust` (plant: apple)
     - `cherry-powdery-mildew` (plant: cherry)
     - `corn-gray-leaf-spot`, `corn-common-rust`, `corn-northern-leaf-blight` (plant: corn)
     - `grape-black-rot`, `grape-esca`, `grape-leaf-blight` (plant: grape)
     - `orange-citrus-greening` (plant: orange)
     - `peach-bacterial-spot` (plant: peach)
     - `potato-early-blight`, `potato-late-blight` (plant: potato)
     - `tomato-leaf-mold`, `tomato-spider-mites`, `tomato-target-spot`, `tomato-yellow-leaf-curl-virus`, `tomato-mosaic-virus` (plant: tomato)
   - Use a programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as `scripts/generate-full-kb.ts`
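
The field requirements above can be checked mechanically before a generated entry is written to `diseases.json`. This is a minimal sketch, assuming the list fields are string arrays and the remaining fields are strings; the `DiseaseEntry` type and `validateDiseaseEntry` helper are hypothetical, and the project's real schema may type `severity` or `prevalence` more strictly.

```typescript
// Hypothetical shape: field names come from the requirement list, types are assumed.
interface DiseaseEntry {
  id: string;
  plantId: string;
  name: string;
  scientificName: string;
  causalAgentType: string;
  description: string;
  symptoms: string[];
  causes: string[];
  treatment: string[];
  prevention: string[];
  lookalikeDiseaseIds: string[];
  severity: string;
  prevalence: string;
}

// Minimum number of items required for each list field.
const MIN_ITEMS = { symptoms: 3, causes: 2, treatment: 3, prevention: 2 } as const;

const REQUIRED_STRINGS = [
  "id", "plantId", "name", "scientificName",
  "causalAgentType", "description", "severity", "prevalence",
] as const;

// Return human-readable problems; an empty array means the entry passes.
function validateDiseaseEntry(entry: Partial<DiseaseEntry>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_STRINGS) {
    const value = entry[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing ${key}`);
    }
  }
  for (const key of Object.keys(MIN_ITEMS) as (keyof typeof MIN_ITEMS)[]) {
    const list = entry[key];
    if (!Array.isArray(list) || list.length < MIN_ITEMS[key]) {
      problems.push(`${key} needs at least ${MIN_ITEMS[key]} items`);
    }
  }
  // lookalikeDiseaseIds may be empty, but the field itself must be present.
  if (!Array.isArray(entry.lookalikeDiseaseIds)) {
    problems.push("missing lookalikeDiseaseIds");
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a generator script report every gap in a single run.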

3. **Update `lookalikeDiseaseIds`** by cross-referencing the new diseases with each other and with related existing entries:
   - Apple scab ↔ Apple black rot (both cause leaf spots on apple)
   - Potato early blight ↔ Potato late blight (both affect potato foliage)
   - Grape black rot ↔ Grape esca (both cause fruit rot)
   - Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
   - Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
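
Because each pairing above is bidirectional, the links are easiest to keep consistent when generated from groups rather than edited by hand. This is a minimal sketch with a hypothetical `linkLookalikes` helper; `early-blight` matches the ID used in the tests of this task, while `septoria-leaf-spot` is an assumed ID for the existing tomato entry.

```typescript
// Hypothetical helper: every member of a group becomes a lookalike of every other member.
type LookalikeMap = Record<string, string[]>;

function linkLookalikes(groups: string[][]): LookalikeMap {
  const links: Record<string, Set<string>> = {};
  for (const group of groups) {
    for (const id of group) {
      if (!links[id]) links[id] = new Set<string>();
      for (const other of group) {
        if (other !== id) links[id].add(other);
      }
    }
  }
  // Sort for stable output so regenerated JSON produces small diffs.
  const result: LookalikeMap = {};
  for (const id of Object.keys(links)) {
    result[id] = Array.from(links[id]).sort();
  }
  return result;
}

const lookalikes = linkLookalikes([
  ["apple-scab", "apple-black-rot"],
  ["potato-early-blight", "potato-late-blight"],
  ["grape-black-rot", "grape-esca"],
  ["early-blight", "septoria-leaf-spot", "tomato-target-spot"], // septoria ID is assumed
  ["tomato-leaf-mold", "septoria-leaf-spot"],
]);
// lookalikes["septoria-leaf-spot"] is
// ["early-blight", "tomato-leaf-mold", "tomato-target-spot"]
```

An ID that appears in several groups, such as the septoria entry here, accumulates the union of its partners.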

4. **Rewrite `src/lib/ml/labels.ts`** to use the PlantVillage mapping:

```typescript
import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";

// Total output classes from model
export const NUM_CLASSES = 38;

// Index 0–37 → disease lookup.
// Healthy classes and out-of-range indices both resolve to the "healthy" sentinel.
export function getDiseaseIdForIndex(index: number): string {
  const entry = PLANTVILLAGE_CLASSES[index];
  if (!entry || entry.isHealthy) return "healthy";
  return entry.diseaseId;
}

// Out-of-range indices resolve to "unknown"
export function getPlantIdForIndex(index: number): string {
  return PLANTVILLAGE_CLASSES[index]?.plantId ?? "unknown";
}

export function isHealthyClass(index: number): boolean {
  return PLANTVILLAGE_CLASSES[index]?.isHealthy ?? false;
}

// Disease ID → index (for reverse lookup); returns -1 when no class matches
export function getIndexForDiseaseId(diseaseId: string): number {
  const entry = PLANTVILLAGE_CLASSES.find((c) => c.diseaseId === diseaseId.toLowerCase());
  return entry?.index ?? -1;
}
```

5. **Remove old assumptions**: the old `labels.ts` assumed 95 classes (93 diseases + healthy + unknown). Delete all references to `diseases.json` index ordering from `labels.ts`. The mapping is now defined by `plantvillage-classes.ts`, not by JSON file order.
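
Once the mapping no longer follows JSON order, a structural check on the class table itself helps catch drift. This is a minimal sketch, assuming each entry carries the `index`, `plantId`, `diseaseId`, and `isHealthy` fields that `labels.ts` reads; the `ClassEntry` type and `checkClassTable` helper are hypothetical, since the real shape is defined in task 01.

```typescript
// Hypothetical entry shape, inferred from the fields labels.ts accesses.
interface ClassEntry {
  index: number;
  plantId: string;
  diseaseId: string;
  isHealthy: boolean;
}

// Return structural problems; an empty array means the table is consistent.
function checkClassTable(classes: ClassEntry[], expectedCount: number): string[] {
  const problems: string[] = [];
  if (classes.length !== expectedCount) {
    problems.push(`expected ${expectedCount} classes, found ${classes.length}`);
  }
  const seenDiseaseIds = new Set<string>();
  classes.forEach((entry, position) => {
    // Array position must equal the stored index, otherwise lookups by model index drift.
    if (entry.index !== position) {
      problems.push(`entry at position ${position} has index ${entry.index}`);
    }
    if (entry.isHealthy) return;
    // A repeated diseaseId would make the reverse lookup ambiguous.
    if (seenDiseaseIds.has(entry.diseaseId)) {
      problems.push(`duplicate diseaseId ${entry.diseaseId}`);
    }
    seenDiseaseIds.add(entry.diseaseId);
  });
  return problems;
}
```

Calling `checkClassTable(PLANTVILLAGE_CLASSES, NUM_CLASSES)` in a unit test and asserting an empty result would cover the entry-count and ordering requirements in one place.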

6. **Create DB migration script** `scripts/seed-plantvillage-kb.ts`:
   - Read the updated `src/data/plants.json` and `src/data/diseases.json`
   - Insert new plants and diseases into the Turso DB using Drizzle ORM
   - Use an upsert (e.g. `INSERT OR REPLACE`) so the script is idempotent
   - Log what was inserted or updated
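
For the idempotency requirement, one option is SQLite's `INSERT ... ON CONFLICT DO UPDATE`, which updates the existing row in place, whereas `INSERT OR REPLACE` deletes the conflicting row and inserts a new one. The sketch below only builds the statement text and its parameter list. It does not use Drizzle, and the `buildUpsert` helper and the table and column names are assumptions rather than the project's actual schema.

```typescript
// Hypothetical helper: build a parameterized SQLite upsert keyed on `id`.
interface UpsertStatement {
  sql: string;
  args: unknown[];
}

// `table` and the keys of `row` are interpolated into the SQL text, so they
// must come from trusted code (schema constants), never from user input.
function buildUpsert(table: string, row: Record<string, unknown>): UpsertStatement {
  const columns = Object.keys(row);
  if (columns.indexOf("id") === -1) {
    throw new Error("row must include an id column");
  }
  const placeholders = columns.map(() => "?").join(", ");
  // On conflict, overwrite every non-key column with the incoming value.
  const updates = columns
    .filter((column) => column !== "id")
    .map((column) => `${column} = excluded.${column}`)
    .join(", ");
  const conflictClause = updates.length > 0
    ? `ON CONFLICT(id) DO UPDATE SET ${updates}`
    : "ON CONFLICT(id) DO NOTHING";
  const sql =
    `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders}) ${conflictClause}`;
  return { sql, args: columns.map((column) => row[column]) };
}

const statement = buildUpsert("plants", { id: "apple", common_name: "Apple" });
// statement.sql:
// INSERT INTO plants (id, common_name) VALUES (?, ?) ON CONFLICT(id) DO UPDATE SET common_name = excluded.common_name
```

Running the same statement twice leaves one row per `id`, which is the property the seed script needs.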

7. **Run the migration** to populate the DB with new data.

tests:

- Unit: `labels.test.ts` validates all 38 indices map correctly
- Unit: `getDiseaseIdForIndex(29)` returns `"early-blight"`
- Unit: `getDiseaseIdForIndex(3)` returns `"healthy"` (Apple healthy class)
- Unit: `getIndexForDiseaseId("early-blight")` returns `29`
- Unit: `isHealthyClass(37)` returns `true` (Tomato healthy)
- Unit: `isHealthyClass(29)` returns `false` (Tomato Early_blight)
- Unit: `getPlantIdForIndex(0)` returns `"apple"`
- Unit: All 25 non-healthy diseaseIds resolve to real DB entries via `getDiseaseById()`
- Integration: `scripts/seed-plantvillage-kb.ts` runs without errors, inserts all 10 plants and 19 diseases
- Integration: After seeding, DB query for each new disease returns a complete record
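
The check that every non-healthy `diseaseId` resolves to a KB entry amounts to a set difference. This is a minimal sketch, assuming the KB IDs are available as a list of strings (for example the `id` fields from `diseases.json`); `LabelClass` and `findUnresolvedDiseaseIds` are hypothetical names, and the real test would go through `getDiseaseById()` instead.

```typescript
// Hypothetical minimal view of a class entry; only the fields this check needs.
interface LabelClass {
  diseaseId: string;
  isHealthy: boolean;
}

// List the diseaseIds of non-healthy classes that have no matching KB entry.
function findUnresolvedDiseaseIds(classes: LabelClass[], kbDiseaseIds: string[]): string[] {
  const known = new Set(kbDiseaseIds);
  return classes
    .filter((entry) => !entry.isHealthy && !known.has(entry.diseaseId))
    .map((entry) => entry.diseaseId);
}

const unresolved = findUnresolvedDiseaseIds(
  [
    { diseaseId: "early-blight", isHealthy: false },
    { diseaseId: "tomato-leaf-mold", isHealthy: false },
    { diseaseId: "healthy", isHealthy: true },
  ],
  ["early-blight"],
);
// unresolved is ["tomato-leaf-mold"]
```

Asserting that the result is empty gives a failure message that names the missing IDs, which is more useful than a bare count mismatch.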

acceptance_criteria:

- `PLANTVILLAGE_CLASSES` (as imported by `labels.ts`) has exactly 38 entries matching model output order
- 13 healthy indices correctly return "healthy" from `getDiseaseIdForIndex()`
- 25 disease indices correctly return valid diseaseIds
- All 10 new plants exist in `src/data/plants.json` with valid metadata and `imageUrl`
- All 19 new diseases exist in `src/data/diseases.json` with full structured data (symptoms, treatment, prevention, etc.)
- DB migration script runs successfully, and all new data is queryable from Turso
- Old `diseases.json` ordering assumption is completely removed from `labels.ts`
- All existing tests still pass (no regressions in browse, search, detail pages)

validation:

- `npx vitest run src/lib/ml/labels.test.ts`
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`
- `npx tsx scripts/seed-plantvillage-kb.ts` (verify that the output shows the expected inserts)
- `npx vitest run` (full test suite passes)
- Manual: query DB for each new plant/disease and verify complete data

notes:

- Disease data must come from authoritative sources (university extension services), not hand-written
- Use the same template-based generation approach from `scripts/generate-full-kb.ts` for consistency
- The `pepper-bacterial-wilt` disease already exists. Map `Pepper___Bacterial_spot` to it even though it is not a perfect match (it is the closest available)
- Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage. Add plant entries but no disease entries for these (they need no new disease IDs because they always map to "healthy")
- Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases