re-init
152
tasks/production-ml-pipeline/01-plantvillage-class-inventory.md
Normal file
@@ -0,0 +1,152 @@
|
||||
# 01. PlantVillage Class Inventory and Knowledge Base Mapping
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-01
|
||||
feature: production-ml-pipeline
|
||||
priority: P0
|
||||
depends_on: []
|
||||
tags: [data, mapping, research]
|
||||
|
||||
objective:
|
||||
|
||||
- Document all 38 PlantVillage model output classes
|
||||
- Map each class index to a definitive disease ID in the knowledge base
|
||||
- Identify which plants and diseases are missing from the KB and must be added
|
||||
- Produce a complete, authoritative mapping file that subsequent tasks consume
|
||||
|
||||
deliverables:
|
||||
|
||||
- `src/lib/ml/plantvillage-classes.ts` — definitive mapping of all 38 class indices to structured metadata
|
||||
- Updated `tasks/production-ml-pipeline/class-mapping-reference.md` — human-readable reference document
|
||||
|
||||
steps:
|
||||
|
||||
1. Document the canonical 38 PlantVillage class labels in order (index 0–37):
|
||||
|
||||
```
|
||||
0: Apple___Apple_scab
|
||||
1: Apple___Black_rot
|
||||
2: Apple___Cedar_apple_rust
|
||||
3: Apple___healthy
|
||||
4: Blueberry___healthy
|
||||
5: Cherry_(including_sour)___Powdery_mildew
|
||||
6: Cherry_(including_sour)___healthy
|
||||
7: Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
|
||||
8: Corn_(maize)___Common_rust_
|
||||
9: Corn_(maize)___Northern_Leaf_Blight
|
||||
10: Corn_(maize)___healthy
|
||||
11: Grape___Black_rot
|
||||
12: Grape___Esca_(Black_Measles)
|
||||
13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
|
||||
14: Grape___healthy
|
||||
15: Orange___Haunglongbing_(Citrus_greening)
|
||||
16: Peach___Bacterial_spot
|
||||
17: Peach___healthy
|
||||
18: Pepper,_bell___Bacterial_spot
|
||||
19: Pepper,_bell___healthy
|
||||
20: Potato___Early_blight
|
||||
21: Potato___Late_blight
|
||||
22: Potato___healthy
|
||||
23: Raspberry___healthy
|
||||
24: Soybean___healthy
|
||||
25: Squash___Powdery_mildew
|
||||
26: Strawberry___Leaf_scorch
|
||||
27: Strawberry___healthy
|
||||
28: Tomato___Bacterial_spot
|
||||
29: Tomato___Early_blight
|
||||
30: Tomato___Late_blight
|
||||
31: Tomato___Leaf_Mold
|
||||
32: Tomato___Septoria_leaf_spot
|
||||
33: Tomato___Spider_mites Two-spotted_spider_mite
|
||||
34: Tomato___Target_Spot
|
||||
35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
|
||||
36: Tomato___Tomato_mosaic_virus
|
||||
37: Tomato___healthy
|
||||
```
|
||||
|
||||
2. For each class, determine the mapping target:
|
||||
- **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel. These indicate the model detected no disease. The remaining 26 indices are disease classes.
|
||||
- **Disease classes with exact KB match**: map directly to existing disease ID.
|
||||
- 28 → `bacterial-leaf-spot-tomato` (Tomato Bacterial_spot ≈ bacterial-leaf-spot-tomato)
|
||||
- 29 → `early-blight`
|
||||
- 30 → `late-blight`
|
||||
- 32 → `septoria-leaf-spot`
|
||||
- 25 → `squash-powdery-mildew`
|
||||
- 26 → `strawberry-leaf-scorch`
|
||||
- 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
|
||||
- **Disease classes needing new KB entries** (no existing disease in our KB):
|
||||
- 0: Apple_scab → new disease `apple-scab` under plant `apple`
|
||||
- 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
|
||||
- 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
|
||||
- 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
|
||||
- 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
|
||||
- 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
|
||||
- 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
|
||||
- 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
|
||||
- 12: Grape_esca → new disease `grape-esca` under plant `grape`
|
||||
- 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
|
||||
- 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
|
||||
- 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
|
||||
- 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
|
||||
- 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
|
||||
- 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
|
||||
- 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
|
||||
- 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
|
||||
- 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
|
||||
- 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
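The raw labels listed in step 1 follow a `Plant___Condition` pattern, so the healthy/disease split in step 2 can be derived mechanically. The following is a minimal sketch, assuming a hypothetical helper named `parseRawLabel` that is not one of the task's deliverables. The KB slugs still have to be assigned by hand, since IDs such as `corn-gray-leaf-spot` cannot be derived from the label text.

```typescript
// Hypothetical helper (not in the original task): split a raw PlantVillage
// label such as "Tomato___Early_blight" into its plant and condition parts.
interface ParsedLabel {
  plant: string; // e.g. "Corn_(maize)"
  condition: string; // e.g. "Common_rust_" or "healthy"
  isHealthy: boolean;
}

function parseRawLabel(rawLabel: string): ParsedLabel {
  const separator = "___";
  const at = rawLabel.indexOf(separator);
  if (at < 0) {
    throw new Error(`Unexpected PlantVillage label (no "___"): ${rawLabel}`);
  }
  const plant = rawLabel.slice(0, at);
  const condition = rawLabel.slice(at + separator.length);
  return { plant, condition, isHealthy: condition === "healthy" };
}
```

Running this over all 38 labels and counting `isHealthy` gives a quick cross-check on the healthy index list above.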
|
||||
|
||||
3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
|
||||
|
||||
```typescript
export interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string; // KB plant slug
  diseaseId: string | null; // null for healthy classes
  isHealthy: boolean;
  displayName: string; // human-readable disease name
}

export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [ ... ];
```
|
||||
|
||||
4. For each class, also record:
|
||||
- The PlantVillage plant name (e.g., "Tomato", "Apple")
|
||||
- The target KB plantId (e.g., "tomato", "apple")
|
||||
- The target KB diseaseId (e.g., "early-blight") or null for healthy
|
||||
- Whether the disease needs to be added to the KB (boolean flag for task 02)
|
||||
|
||||
5. Verify the mapping covers all 38 indices with no gaps or duplicates.
|
||||
|
||||
tests:
|
||||
|
||||
- Unit: mapping has exactly 38 entries
|
||||
- Unit: indices 0–37 are all present, no gaps
|
||||
- Unit: each non-healthy entry has a non-null diseaseId
|
||||
- Unit: each healthy entry has null diseaseId and isHealthy=true
|
||||
- Unit: no duplicate diseaseIds across non-healthy entries
|
||||
- Unit: all plantIds are valid slugs (lowercase, kebab-case)
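The unit checks listed above can share one validation routine. This is a sketch under stated assumptions: the function name `validateClassMapping`, the `expectedCount` parameter, and the slug regular expression are illustrative choices, and the interface is repeated here only to keep the example self-contained.

```typescript
// Mirrors the PlantVillageClass interface from step 3 so the example stands alone.
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Assumed slug convention: lowercase alphanumerics separated by single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Hypothetical validator: returns a list of human-readable problems (empty = valid).
function validateClassMapping(
  classes: readonly PlantVillageClass[],
  expectedCount: number,
): string[] {
  const errors: string[] = [];
  if (classes.length !== expectedCount) {
    errors.push(`expected ${expectedCount} entries, got ${classes.length}`);
  }
  const seenDiseaseIds = new Set<string>();
  classes.forEach((entry, position) => {
    // index === array position rules out gaps, duplicates, and reordering at once
    if (entry.index !== position) {
      errors.push(`entry at position ${position} has index ${entry.index}`);
    }
    if (!SLUG_PATTERN.test(entry.plantId)) {
      errors.push(`invalid plantId slug: ${entry.plantId}`);
    }
    if (entry.isHealthy) {
      if (entry.diseaseId !== null) {
        errors.push(`healthy class ${entry.index} must have a null diseaseId`);
      }
      return;
    }
    if (entry.diseaseId === null) {
      errors.push(`disease class ${entry.index} is missing a diseaseId`);
      return;
    }
    if (seenDiseaseIds.has(entry.diseaseId)) {
      errors.push(`duplicate diseaseId ${entry.diseaseId}`);
    }
    seenDiseaseIds.add(entry.diseaseId);
  });
  return errors;
}
```

A test file could then assert that `validateClassMapping(PLANTVILLAGE_CLASSES, 38)` returns an empty array, which also produces a readable failure message when it does not.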
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
- `src/lib/ml/plantvillage-classes.ts` exports `PLANTVILLAGE_CLASSES` array with exactly 38 entries
|
||||
- Every index 0–37 maps to exactly one entry
|
||||
- 12 entries are healthy (isHealthy=true, diseaseId=null)
- 26 entries are diseases with valid plantId and diseaseId
|
||||
- Each entry includes rawLabel, plantId, diseaseId, displayName
|
||||
- All new disease IDs follow kebab-case convention matching existing KB pattern
|
||||
- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings
|
||||
|
||||
validation:
|
||||
|
||||
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts` — all mapping tests pass
|
||||
- Manual review: each of the 26 disease entries maps to a plausible disease in our KB
|
||||
|
||||
notes:
|
||||
|
||||
- This task produces the authoritative mapping consumed by task 02 (KB expansion and label mapping)
|
||||
- The PlantVillage class order is fixed by the model's training — do NOT reorder
|
||||
- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato` — this is the closest match, not a perfect one
|
||||
- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt` — imperfect but closest available match
|
||||
- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
|
||||
- Blueberry, Raspberry, Soybean only have "healthy" class — still need plant entries for context but no new disease entries
|
||||
149
tasks/production-ml-pipeline/02-label-mapping-implementation.md
Normal file
@@ -0,0 +1,149 @@
|
||||
# 02. Label Mapping Layer Implementation
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-02
|
||||
feature: production-ml-pipeline
|
||||
priority: P0
|
||||
depends_on: [production-ml-pipeline-01]
|
||||
tags: [implementation, knowledge-base, tests-required]
|
||||
|
||||
objective:
|
||||
|
||||
- Expand the knowledge base to cover all PlantVillage plants and diseases
|
||||
- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage class mapping from task 01
|
||||
- Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
|
||||
- The label layer must be the single source of truth for model-index → disease mapping
|
||||
|
||||
deliverables:
|
||||
|
||||
- Updated `src/data/plants.json` — 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
|
||||
- Updated `src/data/diseases.json` — 19 new disease entries added for PlantVillage diseases not yet in KB
|
||||
- `src/lib/ml/labels.ts` — fully rewritten to use PlantVillage class mapping
|
||||
- `src/lib/ml/labels.test.ts` — updated to validate against new mapping
|
||||
- `scripts/seed-plantvillage-kb.ts` — DB migration script to insert new plants and diseases into Turso
|
||||
|
||||
steps:
|
||||
|
||||
1. **Add 10 new plants to `src/data/plants.json`** — each with proper metadata:
|
||||
|
||||
```typescript
|
||||
// New plants needed (PlantVillage coverage):
|
||||
{ id: "apple", commonName: "Apple", scientificName: "Malus domestica", family: "Rosaceae", category: "fruit" }
|
||||
{ id: "cherry", commonName: "Cherry", scientificName: "Prunus avium", family: "Rosaceae", category: "fruit" }
|
||||
{ id: "corn", commonName: "Corn (Maize)", scientificName: "Zea mays", family: "Poaceae", category: "vegetable" }
|
||||
{ id: "grape", commonName: "Grape", scientificName: "Vitis vinifera", family: "Vitaceae", category: "fruit" }
|
||||
{ id: "orange", commonName: "Orange", scientificName: "Citrus sinensis", family: "Rutaceae", category: "fruit" }
|
||||
{ id: "peach", commonName: "Peach", scientificName: "Prunus persica", family: "Rosaceae", category: "fruit" }
|
||||
{ id: "potato", commonName: "Potato", scientificName: "Solanum tuberosum", family: "Solanaceae", category: "vegetable" }
|
||||
{ id: "blueberry", commonName: "Blueberry", scientificName: "Vaccinium corymbosum", family: "Ericaceae", category: "fruit" }
|
||||
{ id: "raspberry", commonName: "Raspberry", scientificName: "Rubus idaeus", family: "Rosaceae", category: "fruit" }
|
||||
{ id: "soybean", commonName: "Soybean", scientificName: "Glycine max", family: "Fabaceae", category: "vegetable" }
|
||||
```
|
||||
|
||||
- Add `imageUrl` for each (use Wikipedia pageimages, same pattern as `fill-plant-images.ts`)
|
||||
- Add `careSummary` for each
|
||||
|
||||
2. **Add 19 new diseases to `src/data/diseases.json`** — each with full structured data:
|
||||
- Use the template-based approach from `scripts/disease-templates.ts` where possible
|
||||
- Source disease details from:
|
||||
- UW-Madison PDDC factsheets (pddc.wisc.edu)
|
||||
- Cornell Plant Clinic (plantclinic.cornell.edu)
|
||||
- University extension publications
|
||||
- Each disease must have: `id`, `plantId`, `name`, `scientificName`, `causalAgentType`, `description`, `symptoms` (≥3), `causes` (≥2), `treatment` (≥3), `prevention` (≥2), `lookalikeDiseaseIds`, `severity`, `prevalence`
|
||||
- New disease entries needed:
|
||||
- apple-scab, apple-black-rot, apple-cedar-apple-rust (plant: apple)
|
||||
- cherry-powdery-mildew (plant: cherry)
|
||||
- corn-gray-leaf-spot, corn-common-rust, corn-northern-leaf-blight (plant: corn)
|
||||
- grape-black-rot, grape-esca, grape-leaf-blight (plant: grape)
|
||||
- orange-citrus-greening (plant: orange)
|
||||
- peach-bacterial-spot (plant: peach)
|
||||
- potato-early-blight, potato-late-blight (plant: potato)
|
||||
- tomato-leaf-mold, tomato-spider-mites, tomato-target-spot, tomato-yellow-leaf-curl-virus, tomato-mosaic-virus (plant: tomato)
|
||||
- Use programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as `scripts/generate-full-kb.ts`
|
||||
|
||||
3. **Update lookalikeDiseaseIds** — cross-reference within new diseases:
|
||||
- Apple scab ↔ Apple black rot (both cause leaf spots on apple)
|
||||
- Potato early blight ↔ Potato late blight (both affect potato foliage)
|
||||
- Grape black rot ↔ Grape esca (both cause fruit rot)
|
||||
- Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
|
||||
- Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
|
||||
|
||||
4. **Rewrite `src/lib/ml/labels.ts`** to use the PlantVillage mapping:
|
||||
|
||||
```typescript
import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";

// Total output classes from model
export const NUM_CLASSES = 38;

// Index 0–37 → disease lookup.
// Note: an out-of-range index also resolves to "healthy" here.
export function getDiseaseIdForIndex(index: number): string {
  const entry = PLANTVILLAGE_CLASSES[index];
  if (!entry || entry.isHealthy) return "healthy";
  // diseaseId is typed `string | null`; a non-healthy entry must carry an ID
  if (entry.diseaseId === null) {
    throw new Error(`Class ${index} is not healthy but has no diseaseId`);
  }
  return entry.diseaseId;
}

export function getPlantIdForIndex(index: number): string {
  return PLANTVILLAGE_CLASSES[index]?.plantId ?? "unknown";
}

export function isHealthyClass(index: number): boolean {
  return PLANTVILLAGE_CLASSES[index]?.isHealthy ?? false;
}

// Disease ID → index (for reverse lookup); returns -1 when not found
export function getIndexForDiseaseId(diseaseId: string): number {
  const entry = PLANTVILLAGE_CLASSES.find((c) => c.diseaseId === diseaseId.toLowerCase());
  return entry?.index ?? -1;
}
```
|
||||
|
||||
5. **Remove old assumptions** — the old labels.ts assumed 95 classes (93 diseases + healthy + unknown). Delete all references to `diseases.json` index ordering from labels.ts. The mapping is now defined by `plantvillage-classes.ts`, not by JSON file order.
|
||||
|
||||
6. **Create DB migration script** `scripts/seed-plantvillage-kb.ts`:
|
||||
- Read updated `src/data/plants.json` and `src/data/diseases.json`
|
||||
- Insert new plants and diseases into Turso DB using Drizzle ORM
|
||||
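- Use an upsert so the script is idempotent (in SQLite, `INSERT ... ON CONFLICT DO UPDATE` is usually preferable to `INSERT OR REPLACE`, which deletes the existing row and inserts a new one)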
|
||||
- Log what was inserted/updated
|
||||
|
||||
7. **Run the migration** to populate the DB with new data.
|
||||
|
||||
tests:
|
||||
|
||||
- Unit: `labels.test.ts` validates all 38 indices map correctly
|
||||
- Unit: `getDiseaseIdForIndex(29)` returns `"early-blight"`
|
||||
- Unit: `getDiseaseIdForIndex(3)` returns `"healthy"` (Apple healthy class)
|
||||
- Unit: `getIndexForDiseaseId("early-blight")` returns `29`
|
||||
- Unit: `isHealthyClass(37)` returns `true` (Tomato healthy)
|
||||
- Unit: `isHealthyClass(29)` returns `false` (Tomato Early_blight)
|
||||
- Unit: `getPlantIdForIndex(0)` returns `"apple"`
|
||||
- Unit: All 26 non-healthy diseaseIds resolve to real DB entries via `getDiseaseById()`
|
||||
- Integration: `scripts/seed-plantvillage-kb.ts` runs without errors, inserts all 10 plants and 19 diseases
|
||||
- Integration: After seeding, DB query for each new disease returns a complete record
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
- `PLANTVILLAGE_CLASSES` (defined in `plantvillage-classes.ts` and imported by labels.ts) has exactly 38 entries matching model output order
|
||||
- 12 healthy indices correctly return "healthy" from `getDiseaseIdForIndex()`
- 26 disease indices correctly return valid diseaseIds
|
||||
- All 10 new plants exist in `src/data/plants.json` with valid metadata and imageUrl
|
||||
- All 19 new diseases exist in `src/data/diseases.json` with full structured data (symptoms, treatment, prevention, etc.)
|
||||
- DB migration script runs successfully, all new data queryable from Turso
|
||||
- Old `diseases.json` ordering assumption is completely removed from labels.ts
|
||||
- All existing tests still pass (no regressions in browse, search, detail pages)
|
||||
|
||||
validation:
|
||||
|
||||
- `npx vitest run src/lib/ml/labels.test.ts`
|
||||
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`
|
||||
- `npx tsx scripts/seed-plantvillage-kb.ts` — verify output shows correct inserts
|
||||
- `npx vitest run` — full test suite passes
|
||||
- Manual: query DB for each new plant/disease and verify complete data
|
||||
|
||||
notes:
|
||||
|
||||
- Disease data must come from authoritative sources (university extension services), not hand-written
|
||||
- Use the same template-based generation approach from `scripts/generate-full-kb.ts` for consistency
|
||||
- The `pepper-bacterial-wilt` disease already exists; map `Pepper,_bell___Bacterial_spot` to it even though it's not a perfect match (it's the closest available)
|
||||
- Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage — add plant entries but no disease entries for these (they don't need new disease IDs since they always map to "healthy")
|
||||
- Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases
|
||||
170
tasks/production-ml-pipeline/03-model-loading-verification.md
Normal file
@@ -0,0 +1,170 @@
|
||||
# 03. TensorFlow.js Model Loading Verification and Fixes
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-03
|
||||
feature: production-ml-pipeline
|
||||
priority: P0
|
||||
depends_on: []
|
||||
tags: [implementation, model, tests-required]
|
||||
|
||||
objective:
|
||||
|
||||
- Verify the converted TF.js GraphModel loads successfully on the Node.js server
|
||||
- Fix input tensor format handling (NCHW pipeline input → NHWC model input)
|
||||
- Determine whether model output is logits or pre-computed softmax probabilities
|
||||
- Ensure inference produces valid [1, 38] output without errors
|
||||
- Install `@tensorflow/tfjs-node` for server-side native acceleration
|
||||
|
||||
deliverables:
|
||||
|
||||
- `src/lib/ml/model-loader.ts` — fixed and verified for real model loading
|
||||
- `src/lib/ml/model-loader.test.ts` — updated integration tests
|
||||
- `package.json` — `@tensorflow/tfjs-node` added as dependency (if needed)
|
||||
- `src/lib/ml/inference.ts` — fixed output interpretation (logits vs probabilities)
|
||||
- `src/lib/ml/inference.test.ts` — updated for real model inference
|
||||
|
||||
steps:
|
||||
|
||||
1. **Determine output interpretation** — inspect the graph topology to resolve whether `Identity:0` is pre-softmax logits or post-softmax probabilities:
|
||||
- The model graph contains a `Softmax` node at `StatefulPartitionedCall/mnv2_pv_original_1/dense_1/Softmax`
|
||||
- The output `Identity:0` may be after Softmax (probabilities) or before (logits)
|
||||
- Test: run inference on a zero tensor — if output sums to ~1.0, it's already probabilities; if output has negative values or doesn't sum to 1.0, it's logits
|
||||
- Fix: if output is already probabilities, remove the `softmaxFloat32()` call in `inference.ts` and use the raw output directly
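The zero-tensor test described above reduces to a simple numeric check on the output vector. A minimal sketch, assuming a hypothetical helper `looksLikeProbabilities` and a tolerance of 0.01 (neither is in the original task). A logits vector could pass by coincidence, so it is worth running the check on several different inputs before removing the softmax call.

```typescript
// Hypothetical heuristic: true when every value lies in [0, 1] and the values
// sum to approximately 1, i.e. the vector is consistent with softmax output.
function looksLikeProbabilities(
  output: ArrayLike<number>,
  tolerance: number = 0.01, // assumed tolerance, matching the 0.99–1.01 test range
): boolean {
  let sum = 0;
  for (let i = 0; i < output.length; i++) {
    const value = output[i];
    if (!Number.isFinite(value) || value < 0 || value > 1) return false;
    sum += value;
  }
  return Math.abs(sum - 1) <= tolerance;
}
```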
|
||||
|
||||
2. **Fix input tensor format** — the model expects NHWC `[1, 160, 160, 3]` but our pipeline produces NCHW `[3, 160, 160]`:
|
||||
|
||||
```typescript
|
||||
// Current code in model-loader.ts tryLoadTFJS():
|
||||
const inputTensor = tf
|
||||
.tensor4d(Array.from(tensor), [3, 160, 160])
|
||||
.transpose([1, 2, 0]) // [160, 160, 3]
|
||||
.expandDims(0); // [1, 160, 160, 3] NHWC
|
||||
```
|
||||
|
||||
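- Verify this transpose is correct: `[1, 2, 0]` moves channels last (CHW → HWC) and `expandDims(0)` adds the batch dimension. Also check the constructor: `tf.tensor4d` requires a 4-element shape, so a rank-3 `[3, 160, 160]` buffer needs `tf.tensor3d` (or `tf.tensor`) instead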
|
||||
- Verify the tensor values are in the expected range (ImageNet-normalized: roughly -2.5 to +2.5)
|
||||
- Alternative: reshape directly as `[1, 160, 160, 3]` if the identify endpoint produces NHWC data
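For the unit test on layout conversion, the transpose can also be expressed without TensorFlow.js, which makes the index arithmetic explicit. This is an illustrative sketch; the function name `chwToHwc` is an assumption and is not part of `model-loader.ts`.

```typescript
// Hypothetical pure-TypeScript equivalent of transpose([1, 2, 0]):
// converts a flat channel-first (CHW) buffer to channel-last (HWC).
function chwToHwc(
  chw: Float32Array,
  channels: number,
  height: number,
  width: number,
): Float32Array {
  if (chw.length !== channels * height * width) {
    throw new Error(
      `Expected ${channels * height * width} values, got ${chw.length}`,
    );
  }
  const hwc = new Float32Array(chw.length);
  for (let c = 0; c < channels; c++) {
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        // CHW offset: (c, y, x); HWC offset: (y, x, c)
        hwc[(y * width + x) * channels + c] = chw[(c * height + y) * width + x];
      }
    }
  }
  return hwc;
}
```

Comparing this function's output with the TF.js transpose on a small tensor is one way to confirm the `[1, 2, 0]` permutation.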
|
||||
|
||||
3. **Install `@tensorflow/tfjs-node`** for server-side native acceleration:
|
||||
|
||||
```bash
|
||||
npm install @tensorflow/tfjs-node
|
||||
```
|
||||
|
||||
- Browser tfjs works on server but is significantly slower (no native BLAS)
|
||||
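- `@tensorflow/tfjs-node` binds to the native TensorFlow C library, which is typically much faster than the pure-JavaScript CPU backend; benchmark on the target server rather than assuming a fixed speedup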
|
||||
- Verify native bindings install correctly (may need `@tensorflow/tfjs-node-gpu` for GPU, but CPU is fine for this use case)
|
||||
- Fallback chain remains: tfjs-node → tfjs (browser) → mock
|
||||
|
||||
4. **Verify model loads from filesystem**:
|
||||
|
||||
```typescript
|
||||
const model = await tf.loadGraphModel(`file://${MODEL_JSON_PATH}`);
|
||||
console.log("Model loaded:", model.inputs, model.outputs);
|
||||
// Expected:
|
||||
// inputs: [{ shape: [-1, 160, 160, 3], dtype: 'float32' }]
|
||||
// outputs: [{ shape: [-1, 38], dtype: 'float32' }]
|
||||
```
|
||||
|
||||
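- Verify `model.inputs[0].shape` matches `[-1, 160, 160, 3]` (the dynamic batch dimension may be reported as `-1` or `null`; accept either)
- Verify `model.outputs[0].shape` matches `[-1, 38]` (same caveat for the batch dimension)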
|
||||
- Verify the model can be invoked with `predict()` (a `GraphModel` exposes both `predict()` and `execute()`; `predict()` is sufficient for this single-input, single-output graph)
|
||||
|
||||
5. **Run inference smoke test**:
|
||||
|
||||
```typescript
// Create a test tensor (random values in [-1, 1))
const testTensor = new Float32Array(3 * 160 * 160);
for (let i = 0; i < testTensor.length; i++) {
  testTensor[i] = (Math.random() - 0.5) * 2;
}
// Interpret the buffer as NHWC for TF.js (layout does not matter for random data)
const input = tf.tensor4d(testTensor, [1, 160, 160, 3]);
// predict() is typed as Tensor | Tensor[] | NamedTensorMap; a single output is assumed here
const output = model.predict(input) as tf.Tensor;
const data = (await output.data()) as Float32Array;
console.log("Output shape:", output.shape);
console.log("Output sum:", data.reduce((a, b) => a + b, 0));
console.log("Output max:", Math.max(...data));
console.log("Output min:", Math.min(...data));
// Release the tensors once the values have been read
input.dispose();
output.dispose();
```
|
||||
|
||||
- Output should be [1, 38] with 38 float values
|
||||
- If values are probabilities: sum ≈ 1.0, all values ≥ 0
|
||||
- If values are logits: sum ≠ 1.0, may have negative values
|
||||
|
||||
6. **Fix `model-loader.ts` `getStatus()` to report real class count**:
|
||||
|
||||
```typescript
|
||||
getStatus(): ModelStatus {
|
||||
return {
|
||||
loaded: true,
|
||||
backend: "tfjs",
|
||||
modelId: MODEL_ID,
|
||||
numClasses: 38, // PlantVillage, not 95
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
7. **Add memory management** — dispose tensors after use to prevent memory leaks:
|
||||
|
||||
```typescript
|
||||
// In predict():
|
||||
tf.tidy(() => {
|
||||
const input = tf.tensor4d(...);
|
||||
const output = model.predict(input);
|
||||
return output.dataSync();
|
||||
});
|
||||
```
|
||||
|
||||
- Or manually dispose: `inputTensor.dispose()`, `outputTensor.dispose()`
|
||||
- Use `tf.memory()` to monitor tensor count during development
|
||||
|
||||
8. **Handle model load failures gracefully**:
|
||||
- If model files are corrupted, log the specific error
|
||||
- If tfjs-node native bindings fail, fall back to browser tfjs with a warning
|
||||
- Never crash the server on model load failure — fall back to mock mode with clear logging
|
||||
|
||||
tests:
|
||||
|
||||
- Integration: model loads from `public/models/plant-disease-classifier/model.json` without errors
|
||||
- Integration: `model.inputs[0].shape` is `[-1, 160, 160, 3]`
|
||||
- Integration: `model.outputs[0].shape` is `[-1, 38]`
|
||||
- Integration: inference on random tensor produces [38] float output
|
||||
- Integration: if output is probabilities, sum is within 0.99–1.01
|
||||
- Integration: `getStatus()` returns `{ loaded: true, backend: "tfjs", numClasses: 38 }`
|
||||
- Unit: `validateInput()` correctly rejects tensors with wrong length
|
||||
- Unit: NCHW → NHWC transpose produces correct layout
|
||||
- Performance: inference completes in < 500ms on a typical server (with tfjs-node)
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
- `getModel()` returns a model with `loaded: true` and `backend: "tfjs"`
|
||||
- `model.predict()` on a valid [1, 160, 160, 3] input returns [1, 38] output without errors
|
||||
- Output interpretation is correctly determined (logits vs probabilities) and handled
|
||||
- `@tensorflow/tfjs-node` is installed and used as primary backend
|
||||
- No memory leaks: tensor count stays stable after repeated inference calls
|
||||
- Fallback chain works: tfjs-node → tfjs → mock (each failure logs warning)
|
||||
- Model load time < 30 seconds on first request
|
||||
- Inference time < 500ms per image on server
|
||||
|
||||
validation:
|
||||
|
||||
- `npm install @tensorflow/tfjs-node` — native bindings install successfully
|
||||
- `npx vitest run src/lib/ml/model-loader.test.ts` — all loading tests pass
|
||||
- `npx vitest run src/lib/ml/inference.test.ts` — all inference tests pass
|
||||
- Manual: `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<existing-id>"}'` — returns real predictions (no `demo_mode: true`)
|
||||
- Check server logs for `[model-loader] Loaded TF.js model` (not mock fallback)
|
||||
|
||||
notes:
|
||||
|
||||
- The model file `best_mnv2_pv_original.keras` is the original Keras file — the TF.js conversion is already done (model.json + 3 weight shards)
|
||||
- The `.keras` file can be deleted after confirming TF.js works, saving ~27MB
|
||||
- `@tensorflow/tfjs-node` requires libtensorflow — it downloads automatically during npm install
|
||||
- The `file://` scheme for `tf.loadGraphModel` is handled by `@tensorflow/tfjs-node`; plain `@tensorflow/tfjs` loads URLs via fetch, so the fallback path needs to read the model files itself and pass a custom `tf.io.IOHandler` (for example, one built with `tf.io.fromMemory`) to `tf.loadGraphModel`
|
||||
- ImageNet normalization in `preprocessImageBuffer()` uses mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. Verify this against the training code before relying on it: Keras's `mobilenet_v2.preprocess_input` scales pixels to [-1, 1] rather than applying mean/std normalization, so a model trained from the Keras MobileNetV2 application may expect a different input range
|
||||
207
tasks/production-ml-pipeline/04-confidence-calibration.md
Normal file
@@ -0,0 +1,207 @@
|
||||
# 04. Confidence Calibration for PlantVillage Model
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-04
|
||||
feature: production-ml-pipeline
|
||||
priority: P1
|
||||
depends_on: [production-ml-pipeline-03]
|
||||
tags: [implementation, ml, tests-required]
|
||||
|
||||
objective:
|
||||
|
||||
- Implement proper confidence calibration for the PlantVillage model's softmax output
|
||||
- Replace the trivial `raw * 1.02` linear calibration with temperature scaling or entropy-based confidence
|
||||
- Produce meaningful confidence labels (high/medium/low) that correlate with actual correctness
|
||||
- Handle the "healthy" class output correctly (healthy predictions need different confidence interpretation)
|
||||
|
||||
deliverables:
|
||||
|
||||
- `src/lib/ml/confidence.ts` — rewritten calibration with temperature scaling
|
||||
- `src/lib/ml/calibration-params.ts` — calibration parameters (temperature, bias) for PlantVillage model
|
||||
- `src/lib/ml/confidence.test.ts` — updated tests for new calibration logic
|
||||
- `scripts/calibrate-model.ts` — script to compute optimal temperature from validation data
|
||||
|
||||
steps:
|
||||
|
||||
1. **Determine output type** — based on task 03's findings:
|
||||
- If model output is already softmax probabilities: use entropy-based confidence or inverse-softmax + temperature scaling
|
||||
- If model output is logits: apply temperature-scaled softmax directly
|
||||
|
||||
2. **Implement temperature scaling**:
|
||||
|
||||
```typescript
// src/lib/ml/confidence.ts
const DEFAULT_TEMPERATURE = 1.5; // Default for PlantVillage (typically 1.0–3.0)

export function temperatureScaledSoftmax(
  logits: Float32Array,
  temperature: number = DEFAULT_TEMPERATURE,
): Float32Array {
  const scaled = new Float32Array(logits.length);
  for (let i = 0; i < logits.length; i++) {
    scaled[i] = logits[i] / temperature;
  }
  return softmaxFloat32(scaled);
}
```
|
||||
|
||||
- Temperature > 1.0 softens the distribution (less confident, more uniform)
|
||||
- Temperature < 1.0 sharpens the distribution (more confident)
|
||||
- Temperature = 1.0 is standard softmax (no calibration)
|
||||
- Assumed starting range for MobileNetV2 on PlantVillage: 1.2–1.8. Treat this as a guess until the temperature has been fit on validation data (step 8)
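The effect of temperature is easy to see on a small logit vector. The sketch below assumes the project's `softmaxFloat32` is a standard max-subtracted softmax; the names `stableSoftmax` and `softmaxWithTemperature` are hypothetical stand-ins so the example runs on its own.

```typescript
// Assumed behaviour of softmaxFloat32: subtract the max before exponentiating
// so that large logits do not overflow.
function stableSoftmax(logits: Float32Array): Float32Array {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }
  const out = new Float32Array(logits.length);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    const e = Math.exp(logits[i] - max);
    out[i] = e;
    sum += e;
  }
  for (let i = 0; i < out.length; i++) {
    out[i] /= sum;
  }
  return out;
}

// Hypothetical standalone version of temperatureScaledSoftmax with input checking.
function softmaxWithTemperature(
  logits: Float32Array,
  temperature: number,
): Float32Array {
  if (!(temperature > 0)) {
    throw new Error(`temperature must be positive, got ${temperature}`);
  }
  const scaled = new Float32Array(logits.length);
  for (let i = 0; i < logits.length; i++) {
    scaled[i] = logits[i] / temperature;
  }
  return stableSoftmax(scaled);
}
```

For logits `[2, 1, 0]`, the top probability at T=2 is lower than at T=1, which is the softening described in the bullets above.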
|
||||
|
||||
3. **Implement entropy-based confidence**:
|
||||
|
||||
```typescript
export function computeEntropy(probabilities: Float32Array): number {
  let entropy = 0;
  for (let i = 0; i < probabilities.length; i++) {
    if (probabilities[i] > 1e-10) {
      entropy -= probabilities[i] * Math.log(probabilities[i]);
    }
  }
  return entropy;
}

export function entropyToConfidence(
  entropy: number,
  maxEntropy: number, // ln(numClasses)
): number {
  // Normalize entropy to [0, 1], then invert (low entropy = high confidence)
  const normalized = entropy / maxEntropy;
  return 1 - normalized;
}
```
|
||||
|
||||
- For 38 classes: `maxEntropy = Math.log(38) ≈ 3.64`
|
||||
- Entropy close to 0 → one class dominates → high confidence
|
||||
- Entropy close to max → uniform distribution → low confidence
|
||||
|
||||
4. **Implement combined calibration**:
|
||||
|
||||
```typescript
export function calibratePrediction(
  output: Float32Array,
  isLogits: boolean,
  temperature: number = DEFAULT_TEMPERATURE,
): ConfidenceResult {
  // Get probabilities: temperature applies only to logits; probabilities are used as-is
  const probs = isLogits ? temperatureScaledSoftmax(output, temperature) : output;

  // Get top prediction
  let maxIdx = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[maxIdx]) maxIdx = i;
  }
  const topProb = probs[maxIdx];

  // Compute entropy-based confidence
  const entropy = computeEntropy(probs);
  const maxEntropy = Math.log(probs.length);
  const entropyConfidence = entropyToConfidence(entropy, maxEntropy);

  // Combine: weighted average of top probability and entropy confidence, clamped to [0, 1]
  const adjusted = Math.min(1, Math.max(0, 0.7 * topProb + 0.3 * entropyConfidence));

  return {
    raw: topProb,
    adjusted,
    label: getConfidenceLabel(adjusted), // label uses the same clamped value that is returned
    entropy,
    classIndex: maxIdx,
  };
}
```
|
||||
|
||||
5. **Update `getConfidenceLabel` thresholds** for PlantVillage's 38-class output:
|
||||
|
||||
```typescript
|
||||
const CONFIDENCE_THRESHOLDS = {
|
||||
HIGH: 0.65, // Lowered from 0.8 — PlantVillage softmax is less peaked
|
||||
MEDIUM: 0.35, // Lowered from 0.5
|
||||
} as const;
|
||||
```
|
||||
|
||||
- With 38 classes, even correct predictions may have lower top probability
|
||||
- These thresholds should be tuned against a validation set (start with defaults, adjust after testing)
|
||||
|
||||
6. **Handle healthy class confidence**:
|
||||
- When the top prediction is a healthy class (index 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37), the confidence represents "how confident the model is the plant is healthy"
|
||||
- Healthy predictions with high confidence → "No disease detected" (good)
|
||||
- Healthy predictions with low confidence → "Uncertain — may have early symptoms"
|
||||
- Update `calibrateConfidence()` to accept a `isHealthy` flag and adjust label accordingly
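The thresholds from step 5 and the healthy-class handling above can be combined in a few lines. This is a sketch under assumptions: thresholds are treated as inclusive, a healthy prediction below "high" is reported as uncertain (the source only specifies the high and low cases), and `describePrediction` is a hypothetical name rather than the `calibrateConfidence()` signature.

```typescript
type ConfidenceLabel = "high" | "medium" | "low";

// Threshold values taken from step 5; inclusive comparison is an assumption.
const CONFIDENCE_THRESHOLDS = { HIGH: 0.65, MEDIUM: 0.35 } as const;

function getConfidenceLabel(adjusted: number): ConfidenceLabel {
  if (adjusted >= CONFIDENCE_THRESHOLDS.HIGH) return "high";
  if (adjusted >= CONFIDENCE_THRESHOLDS.MEDIUM) return "medium";
  return "low";
}

// Hypothetical helper: turn a calibrated score plus the healthy flag
// into the user-facing message.
function describePrediction(adjusted: number, isHealthy: boolean): string {
  const label = getConfidenceLabel(adjusted);
  if (!isHealthy) return `Disease detected (${label} confidence)`;
  // Assumption: anything below "high" on a healthy class is treated as uncertain.
  return label === "high"
    ? "No disease detected"
    : "Uncertain: may have early symptoms";
}
```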
|
||||
|
||||
7. **Create calibration parameter module**:
|
||||
|
||||
```typescript
|
||||
// src/lib/ml/calibration-params.ts
|
||||
export const PLANTVILLAGE_CALIBRATION = {
|
||||
temperature: 1.5,
|
||||
confidenceHigh: 0.65,
|
||||
confidenceMedium: 0.35,
|
||||
maxEntropy: Math.log(38),
|
||||
entropyWeight: 0.3,
|
||||
probabilityWeight: 0.7,
|
||||
} as const;
|
||||
```
|
||||
|
||||
8. **Create calibration script** `scripts/calibrate-model.ts`:
|
||||
- Load the model
|
||||
- Run inference on a set of labeled validation images (from PlantVillage validation split)
|
||||
- Compute optimal temperature using Nelder-Mead or grid search on negative log-likelihood
|
||||
- Output the optimal temperature value
|
||||
- This is optional — start with default 1.5 and refine later
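The grid-search variant of the calibration script might look roughly like the sketch below. It assumes the validation run has already produced raw logits and integer labels per image; `Sample`, `softmaxT`, `meanNll`, and `findTemperature` are hypothetical names, and the search range and step size are arbitrary defaults rather than recommended values.

```typescript
type Sample = { logits: number[]; label: number };

// Softmax with temperature, computed stably by subtracting the largest scaled logit.
function softmaxT(logits: number[], temperature: number): number[] {
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Mean negative log-likelihood of the true labels at a given temperature.
function meanNll(samples: Sample[], temperature: number): number {
  const total = samples.reduce((acc, s) => {
    const p = softmaxT(s.logits, temperature)[s.label] ?? 0;
    return acc - Math.log(Math.max(p, 1e-12)); // clamp to avoid log(0)
  }, 0);
  return total / samples.length;
}

// Grid search: evaluate NLL at evenly spaced temperatures and keep the minimum.
function findTemperature(samples: Sample[], min = 0.5, max = 5.0, step = 0.05): number {
  let best = min;
  let bestNll = Infinity;
  for (let t = min; t <= max + 1e-9; t += step) {
    const nll = meanNll(samples, t);
    if (nll < bestNll) {
      bestNll = nll;
      best = t;
    }
  }
  return best;
}
```

Grid search is used here only because it is short and easy to verify; Nelder-Mead or any one-dimensional minimizer would serve the same purpose, since temperature is a single scalar.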
|
||||
|
||||
9. **Update `InferenceResult` type** to include calibration metadata:
|
||||
```typescript
|
||||
export interface InferenceResult {
|
||||
predictions: RawPrediction[];
|
||||
inferenceTimeMs: number;
|
||||
calibration?: {
|
||||
temperature: number;
|
||||
entropy: number;
|
||||
entropyConfidence: number;
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
tests:
|
||||
|
||||
- Unit: `temperatureScaledSoftmax` with T=1.0 equals standard softmax
|
||||
- Unit: `temperatureScaledSoftmax` with T=2.0 produces more uniform distribution than T=1.0
|
||||
- Unit: `computeEntropy` of uniform distribution = `Math.log(38)` ≈ 3.64
|
||||
- Unit: `computeEntropy` of one-hot distribution = 0
|
||||
- Unit: `entropyToConfidence(0, maxEntropy)` = 1.0 (maximum confidence)
|
||||
- Unit: `entropyToConfidence(maxEntropy, maxEntropy)` = 0.0 (minimum confidence)
|
||||
- Unit: `calibratePrediction` with high-peak input returns high confidence
|
||||
- Unit: `calibratePrediction` with flat input returns low confidence
|
||||
- Unit: `getConfidenceLabel(0.7)` returns "high"
|
||||
- Unit: `getConfidenceLabel(0.4)` returns "medium"
|
||||
- Unit: `getConfidenceLabel(0.2)` returns "low"
|
||||
- Integration: calibration on known PlantVillage test image produces reasonable confidence
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
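- `calibratePrediction()` produces confidence scores that track prediction quality: on the validation set, predictions with higher adjusted confidence are correct more often than predictions with lower adjusted confidence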
|
||||
- Temperature scaling is implemented and configurable (default T=1.5)
|
||||
- Entropy-based confidence is implemented
|
||||
- Combined calibration (weighted probability + entropy) is the default
|
||||
- Healthy class predictions are handled correctly
|
||||
- Confidence thresholds are tuned for 38-class output (HIGH ≥ 0.65, MEDIUM ≥ 0.35)
|
||||
- All unit tests pass
|
||||
- Calibration parameters are documented and configurable
|
||||
|
||||
validation:
|
||||
|
||||
- `npx vitest run src/lib/ml/confidence.test.ts`
|
||||
- Manual: run identification on a known disease image → confidence should be "high" (≥ 0.65)
|
||||
- Manual: run identification on a random/unrelated image → confidence should be "low" (< 0.35)
|
||||
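- Check server logs: entropy values should lie between 0 and `Math.log(38)` ≈ 3.64; confident predictions should sit near the low end, and values close to 3.64 indicate a nearly uniform output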
|
||||
|
||||
notes:
|
||||
|
||||
- Temperature scaling is a post-hoc calibration method — it doesn't change the model, only the confidence interpretation
|
||||
- The default temperature of 1.5 is an untuned starting point for MobileNetV2 on PlantVillage. The optimal value depends on the specific training run and should be fitted on validation data.
|
||||
- If a validation set of PlantVillage images is available, run `scripts/calibrate-model.ts` to find the optimal temperature
|
||||
- The entropy-based approach works even without a validation set — it's a model-agnostic confidence measure
|
||||
- For healthy predictions, consider showing a different UI (e.g., "No disease detected" with confidence) rather than treating them as disease predictions
|
||||
279
tasks/production-ml-pipeline/05-pipeline-integration.md
Normal file
@@ -0,0 +1,279 @@
|
||||
# 05. Real Model Integration into Identification Pipeline
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-05
|
||||
feature: production-ml-pipeline
|
||||
priority: P0
|
||||
depends_on: [production-ml-pipeline-02, production-ml-pipeline-03, production-ml-pipeline-04]
|
||||
tags: [implementation, integration, tests-required]
|
||||
|
||||
objective:
|
||||
|
||||
- Wire the real TF.js model into the `/api/identify` endpoint
|
||||
- Replace demo/mock predictions with real model inference
|
||||
- Use the PlantVillage label mapping (task 02) to resolve class indices to disease IDs
|
||||
- Apply confidence calibration (task 04) to produce meaningful confidence scores
|
||||
- Remove the `demo_mode` fallback path
|
||||
- Handle healthy class predictions correctly (return "no disease detected" message)
|
||||
|
||||
deliverables:
|
||||
|
||||
- `src/app/api/identify/route.ts` — rewritten to use real model inference
|
||||
- `src/lib/ml/inference.ts` — updated to use calibration and return structured results
|
||||
- `src/lib/api/identify.ts` — client-side API updated for new response shape
|
||||
- `src/components/ResultsDashboard.tsx` — handle healthy predictions and remove demo mode badge
|
||||
- `src/components/HealthyResult.tsx` — new component for "no disease detected" state
|
||||
|
||||
steps:
|
||||
|
||||
1. **Rewrite `/api/identify` route handler** to use real inference:
|
||||
|
||||
```typescript
|
||||
export async function POST(request: NextRequest) {
|
||||
// 1. Parse request, validate imageId
|
||||
// 2. Load and preprocess image (existing code)
|
||||
// 3. Run inference with real model
|
||||
const { probabilities, inferenceTimeMs } = await runInference(tensor);
|
||||
|
||||
// 4. Calibrate confidence
|
||||
const calibrated = calibratePrediction(probabilities, false); // probabilities are already softmaxed, so isLogits is false
|
||||
|
||||
// 5. Map to disease using PlantVillage labels
|
||||
const diseaseId = getDiseaseIdForIndex(calibrated.classIndex);
|
||||
const isHealthy = isHealthyClass(calibrated.classIndex);
|
||||
|
||||
// 6. If healthy, return healthy result
|
||||
if (isHealthy && calibrated.adjusted > 0.5) {
|
||||
return NextResponse.json({
|
||||
healthy: true,
|
||||
plantId: getPlantIdForIndex(calibrated.classIndex),
|
||||
confidence: calibrated,
|
||||
metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
|
||||
});
|
||||
}
|
||||
|
||||
// 7. Get top-K predictions (not just top-1)
|
||||
const topK = getTopKFloat32(probabilities, 5);
|
||||
const predictions = await enrichPredictions(topK);
|
||||
|
||||
// 8. Return results
|
||||
return NextResponse.json({
|
||||
predictions,
|
||||
metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
|
||||
demo_mode: false, // or remove this field entirely
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
2. **Update `runInference()` to return calibrated results**:
|
||||
|
||||
```typescript
|
||||
export async function runInference(
|
||||
imageTensor: Float32Array,
|
||||
topK: number = 5,
|
||||
): Promise<InferenceResult> {
|
||||
const model = await getModel();
|
||||
const modelStatus = model.getStatus();
|
||||
|
||||
if (!modelStatus.loaded) {
|
||||
throw new Error("Model not loaded. Cannot run inference.");
|
||||
}
|
||||
|
||||
const { output, inferenceTimeMs } = await model.predict(imageTensor);
|
||||
|
||||
// Determine if output is logits or probabilities
|
||||
const isLogits = !isProbabilities(output);
|
||||
|
||||
// Apply calibration
|
||||
const calibration = calibratePrediction(output, isLogits);
|
||||
|
||||
// Get top-K predictions
|
||||
const probs = isLogits ? temperatureScaledSoftmax(output) : output;
|
||||
const topKPredictions = getTopKFloat32(probs, topK);
|
||||
|
||||
return {
|
||||
predictions: topKPredictions,
|
||||
inferenceTimeMs,
|
||||
calibration: {
|
||||
temperature: PLANTVILLAGE_CALIBRATION.temperature,
|
||||
entropy: calibration.entropy,
|
||||
entropyConfidence: calibration.entropyConfidence,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
function isProbabilities(output: Float32Array): boolean {
|
||||
const sum = output.reduce((a, b) => a + b, 0);
|
||||
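// Logits can sum to ~1 by coincidence, so also require every value to lie in [0, 1]
return Math.abs(sum - 1.0) < 0.01 && output.every((v) => v >= 0 && v <= 1);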
|
||||
}
|
||||
```
|
||||
|
||||
3. **Update `enrichPredictions()` to use new label mapping**:
|
||||
|
||||
```typescript
|
||||
async function enrichPredictions(
|
||||
topPredictions: Array<{ classIndex: number; probability: number }>,
|
||||
): Promise<PredictionResult[]> {
|
||||
const results: PredictionResult[] = [];
|
||||
|
||||
for (const pred of topPredictions) {
|
||||
// Skip healthy classes in top-K (they're handled separately)
|
||||
if (isHealthyClass(pred.classIndex)) continue;
|
||||
|
||||
const diseaseId = getDiseaseIdForIndex(pred.classIndex);
|
||||
const plantId = getPlantIdForIndex(pred.classIndex);
|
||||
|
||||
if (!diseaseId || diseaseId === "healthy") continue;
|
||||
|
||||
const disease = await getDiseaseById(diseaseId);
|
||||
if (!disease) continue;
|
||||
|
||||
// Use probability as raw confidence, calibrate with entropy
|
||||
const confidence = calibrateConfidence(pred.probability);
|
||||
|
||||
const plant = await getPlantById(disease.plantId).catch(() => null);
|
||||
|
||||
results.push({
|
||||
diseaseId,
|
||||
disease,
|
||||
confidence,
|
||||
lookalikes: disease.lookalikeDiseaseIds,
|
||||
plant: plant ?? null,
|
||||
});
|
||||
}
|
||||
|
||||
results.sort((a, b) => b.confidence.adjusted - a.confidence.adjusted);
|
||||
return results;
|
||||
}
|
||||
```
|
||||
|
||||
4. **Update response types** to support healthy result:
|
||||
|
||||
```typescript
|
||||
// src/lib/types.ts
|
||||
export interface IdentifyResponse {
|
||||
predictions?: PredictionResult[];
|
||||
healthy?: boolean;
|
||||
plantId?: string;
|
||||
confidence?: ConfidenceResult;
|
||||
metadata: InferenceMetadata;
|
||||
demo_mode?: boolean; // Remove or always false
|
||||
}
|
||||
```
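Because both `predictions` and `healthy` are optional in the interface above, callers have to check which shape they received. One possible alternative, sketched here under simplified placeholder types (`ConfidenceLike`, `PredictionLike`, and `MetadataLike` are hypothetical and carry fewer fields than the real `ConfidenceResult`, `PredictionResult`, and `InferenceMetadata`), is a discriminated union with a type guard.

```typescript
// Simplified placeholder shapes, for illustration only.
interface ConfidenceLike { adjusted: number }
interface PredictionLike { diseaseId: string; confidence: ConfidenceLike }
interface MetadataLike { model: string; inferenceTimeMs: number; imageId: string }

type HealthyResponse = {
  healthy: true;
  plantId: string;
  confidence: ConfidenceLike;
  metadata: MetadataLike;
};
type DiseaseResponse = {
  healthy?: false;
  predictions: PredictionLike[];
  metadata: MetadataLike;
};
type IdentifyResponseUnion = HealthyResponse | DiseaseResponse;

// Type guard: narrows the union so the compiler knows which fields exist.
function isHealthyResponse(r: IdentifyResponseUnion): r is HealthyResponse {
  return r.healthy === true;
}

function describeResponse(r: IdentifyResponseUnion): string {
  if (isHealthyResponse(r)) {
    return `healthy ${r.plantId} (${Math.round(r.confidence.adjusted * 100)}%)`;
  }
  const top = r.predictions[0];
  return top
    ? `${top.diseaseId} (${Math.round(top.confidence.adjusted * 100)}%)`
    : "no prediction";
}
```

With this shape, a component such as `ResultsDashboard` can branch on `isHealthyResponse(response)` and the compiler guarantees that `plantId` and `confidence` are present in the healthy branch.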
|
||||
|
||||
5. **Update `ResultsDashboard` component** to handle healthy result:
|
||||
|
||||
```tsx
|
||||
// If response.healthy === true, show HealthyResult component instead of prediction cards
|
||||
if (response?.healthy && response.confidence) {
|
||||
return <HealthyResult plantId={response.plantId} confidence={response.confidence} />;
|
||||
}
|
||||
```
|
||||
|
||||
6. **Create `HealthyResult` component** `src/components/HealthyResult.tsx`:
|
||||
|
||||
```tsx
|
||||
export default function HealthyResult({ plantId, confidence }: { plantId?: string; confidence: ConfidenceResult }) {
|
||||
const plant = usePlant(plantId); // fetch plant data
|
||||
return (
|
||||
<div className="...">
|
||||
<div className="text-6xl">🌿</div>
|
||||
<h2>No Disease Detected</h2>
|
||||
<p>
|
||||
The image appears healthy{plant ? ` (${plant.commonName})` : ""}. Confidence:{" "}
|
||||
{Math.round(confidence.adjusted * 100)}%
|
||||
</p>
|
||||
<p className="text-sm text-zinc-500">
|
||||
If symptoms persist, try uploading a clearer photo of the affected area.
|
||||
</p>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
7. **Remove demo mode logic**:
|
||||
- In `model-loader.ts`: remove `createMockModel()` fallback (or keep it but only for development)
|
||||
- In `route.ts`: remove `demo_mode: true` branch
|
||||
- In `ResultsDashboard.tsx`: remove "Demo mode" badge
|
||||
- In `src/lib/api/identify.ts`: remove `demo_mode` from response type
|
||||
|
||||
8. **Add error handling for model not loaded**:
|
||||
|
||||
```typescript
|
||||
const model = await getModel();
|
||||
if (!model.getStatus().loaded) {
|
||||
return NextResponse.json(
|
||||
{
|
||||
error: "Model not available",
|
||||
message: "ML model failed to load. Please try again later.",
|
||||
},
|
||||
{ status: 503 },
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
9. **Update client-side API** `src/lib/api/identify.ts`:
|
||||
|
||||
```typescript
|
||||
export interface IdentifyResponse {
|
||||
predictions?: PredictionResult[];
|
||||
healthy?: boolean;
|
||||
plantId?: string;
|
||||
confidence?: ConfidenceResult;
|
||||
metadata: InferenceMetadata;
|
||||
}
|
||||
```
|
||||
|
||||
10. **Add structured logging** for inference requests:
|
||||
```typescript
|
||||
console.log(
|
||||
JSON.stringify({
|
||||
event: "inference",
|
||||
imageId,
|
||||
modelId: MODEL_ID,
|
||||
inferenceTimeMs,
|
||||
topPrediction: predictions[0]?.diseaseId,
|
||||
confidence: predictions[0]?.confidence.adjusted,
|
||||
entropy: calibration?.entropy,
|
||||
}),
|
||||
);
|
||||
```
|
||||
|
||||
tests:
|
||||
|
||||
- Integration: POST `/api/identify` with valid imageId returns real predictions (no `demo_mode: true`)
|
||||
- Integration: response includes `predictions` array with valid diseaseIds from KB
|
||||
- Integration: confidence scores are calibrated (not raw softmax)
|
||||
- Integration: healthy predictions return `healthy: true` with plantId
|
||||
- Unit: `enrichPredictions()` skips healthy classes in top-K
|
||||
- Unit: `isProbabilities()` correctly identifies probability output
|
||||
- Unit: `runInference()` throws error if model not loaded
|
||||
- E2E: upload a tomato leaf image → get tomato disease predictions
|
||||
- E2E: upload a healthy plant image → get healthy result
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
- `/api/identify` returns real model predictions (not mock)
|
||||
- All diseaseIds in response are valid KB entries (verifiable via `getDiseaseById()`)
|
||||
- Confidence scores use temperature-scaled calibration (not raw softmax)
|
||||
- Healthy predictions return `{ healthy: true, plantId, confidence }` instead of disease predictions
|
||||
- Demo mode is completely removed from production path
|
||||
- Error handling: model not loaded → 503 response with clear message
|
||||
- Structured logging for every inference request
|
||||
- Client-side API handles new response shape (healthy vs predictions)
|
||||
|
||||
validation:
|
||||
|
||||
- `npx vitest run src/app/api/identify/identify.test.ts`
|
||||
- `npx vitest run src/lib/ml/inference.test.ts`
|
||||
- `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<test-id>"}'` — response has real predictions
|
||||
- Upload a test image via UI → see real disease names (not demo mode)
|
||||
- Check server logs: `event: "inference"` with valid modelId and inferenceTimeMs
|
||||
|
||||
notes:
|
||||
|
||||
- This task depends on tasks 02, 03, and 04 being complete. Do not start until all dependencies are met.
|
||||
- The `enrichPredictions()` function now skips healthy classes — they're handled by the healthy result path
|
||||
- If the model is not loaded, return 503 (Service Unavailable) instead of falling back to mock
|
||||
- Structured logging should be JSON for easy parsing by log aggregators
|
||||
- The `demo_mode` field can be removed entirely or kept as `false` for backwards compatibility
|
||||
284
tasks/production-ml-pipeline/06-plant-context-identification.md
Normal file
@@ -0,0 +1,284 @@
|
||||
# 06. Plant-Context-Aware Identification
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-06
|
||||
feature: production-ml-pipeline
|
||||
priority: P2
|
||||
depends_on: [production-ml-pipeline-05]
|
||||
tags: [implementation, ux, tests-required]
|
||||
|
||||
objective:
|
||||
|
||||
- Allow users to optionally specify which plant they're diagnosing before identification
|
||||
- Boost predictions for the selected plant's diseases (multiply confidence by plant-context factor)
|
||||
- Update the upload flow to include optional plant selection
|
||||
- Improve prediction accuracy when plant context is known
|
||||
|
||||
deliverables:
|
||||
|
||||
- `src/app/api/identify/route.ts` — accept optional `plantId` parameter
|
||||
- `src/lib/ml/plant-context.ts` — new module for plant-context scoring adjustment
|
||||
- `src/components/PlantSelector.tsx` — new component for optional plant selection
|
||||
- `src/app/upload/page.tsx` — integrate PlantSelector before upload
|
||||
- `src/lib/api/identify.ts` — client API updated to pass plantId
|
||||
|
||||
steps:
|
||||
|
||||
1. **Create plant-context scoring module** `src/lib/ml/plant-context.ts`:
|
||||
|
||||
```typescript
|
||||
import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
|
||||
|
||||
/**
|
||||
* Adjust prediction scores based on plant context.
|
||||
* If plantId is provided, boost predictions for diseases of that plant.
|
||||
*
|
||||
* @param predictions - Top-K predictions with classIndex and probability
|
||||
* @param plantId - Optional plant ID from user selection
|
||||
* @param boostFactor - Multiplier for matching plant diseases (default 1.5)
|
||||
* @returns Adjusted predictions with updated probabilities
|
||||
*/
|
||||
export function applyPlantContext(
|
||||
predictions: Array<{ classIndex: number; probability: number }>,
|
||||
plantId: string | null,
|
||||
boostFactor: number = 1.5,
|
||||
): Array<{ classIndex: number; probability: number; contextBoosted: boolean }> {
|
||||
if (!plantId) {
|
||||
return predictions.map((p) => ({ ...p, contextBoosted: false }));
|
||||
}
|
||||
|
||||
// Find which class indices belong to this plant
|
||||
const plantIndices = new Set(
|
||||
PLANTVILLAGE_CLASSES.filter((c) => c.plantId === plantId && !c.isHealthy).map(
|
||||
(c) => c.index,
|
||||
),
|
||||
);
|
||||
|
||||
return predictions.map((pred) => {
|
||||
const matchesPlant = plantIndices.has(pred.classIndex);
|
||||
return {
|
||||
classIndex: pred.classIndex,
|
||||
probability: matchesPlant
|
||||
? Math.min(1.0, pred.probability * boostFactor)
|
||||
: pred.probability,
|
||||
contextBoosted: matchesPlant,
|
||||
};
|
||||
});
|
||||
}
|
||||
```
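To illustrate the effect of the boost, the sketch below applies the same rule to two close predictions and then re-sorts, since a boosted class can overtake an unboosted one. `CLASSES` is a tiny hypothetical stand-in for `PLANTVILLAGE_CLASSES`, and `boostAndRank` is an illustrative name rather than part of the planned module. Note that boosted scores no longer sum to 1, so they are best treated as ranking scores rather than probabilities.

```typescript
// Tiny stand-in for PLANTVILLAGE_CLASSES (indices follow the canonical label order).
const CLASSES = [
  { index: 20, plantId: "potato", isHealthy: false }, // Potato___Early_blight
  { index: 29, plantId: "tomato", isHealthy: false }, // Tomato___Early_blight
  { index: 37, plantId: "tomato", isHealthy: true }, // Tomato___healthy
];

type Scored = { classIndex: number; probability: number };

// Same boost rule as applyPlantContext, followed by a re-sort by boosted score.
function boostAndRank(preds: Scored[], plantId: string, boostFactor = 1.5): Scored[] {
  const match = new Set(
    CLASSES.filter((c) => c.plantId === plantId && !c.isHealthy).map((c) => c.index),
  );
  return preds
    .map((p) => ({
      classIndex: p.classIndex,
      probability: match.has(p.classIndex)
        ? Math.min(1, p.probability * boostFactor)
        : p.probability,
    }))
    .sort((a, b) => b.probability - a.probability);
}

// Potato early blight narrowly leads before the user says the plant is a tomato.
const ranked = boostAndRank(
  [
    { classIndex: 20, probability: 0.45 },
    { classIndex: 29, probability: 0.4 },
  ],
  "tomato",
);
console.log(ranked.map((p) => p.classIndex));
```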
|
||||
|
||||
2. **Update `/api/identify` route** to accept `plantId`:
|
||||
|
||||
```typescript
|
||||
export async function POST(request: NextRequest) {
|
||||
const body = await request.json();
|
||||
const { imageId, plantId } = body; // plantId is optional
|
||||
|
||||
// ... existing preprocessing ...
|
||||
|
||||
const { probabilities, inferenceTimeMs } = await runInference(tensor);
|
||||
|
||||
// Get top-K predictions
|
||||
const topK = getTopKFloat32(probabilities, 5);
|
||||
|
||||
// Apply plant context if provided
|
||||
const adjusted = applyPlantContext(topK, plantId ?? null);
|
||||
|
||||
// Enrich with KB data
|
||||
const predictions = await enrichPredictions(adjusted);
|
||||
|
||||
return NextResponse.json({
|
||||
predictions,
|
||||
metadata: { model: MODEL_ID, inferenceTimeMs, imageId, plantContext: plantId ?? null },
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
3. **Update `IdentifyRequest` type**:
|
||||
|
||||
```typescript
|
||||
// src/lib/types.ts
|
||||
export interface IdentifyRequest {
|
||||
imageId: string;
|
||||
plantId?: string; // Optional plant context
|
||||
}
|
||||
```
|
||||
|
||||
4. **Create `PlantSelector` component** `src/components/PlantSelector.tsx`:
|
||||
|
||||
```tsx
|
||||
"use client";
|
||||
|
||||
import { useState, useEffect } from "react";
|
||||
|
||||
interface Plant {
|
||||
id: string;
|
||||
commonName: string;
|
||||
imageUrl?: string;
|
||||
}
|
||||
|
||||
export default function PlantSelector({
|
||||
value,
|
||||
onChange,
|
||||
}: {
|
||||
value: string | null;
|
||||
onChange: (plantId: string | null) => void;
|
||||
}) {
|
||||
const [plants, setPlants] = useState<Plant[]>([]);
|
||||
const [search, setSearch] = useState("");
|
||||
|
||||
useEffect(() => {
|
||||
fetch("/api/plants?limit=50")
|
||||
.then((r) => r.json())
|
||||
.then((data) => setPlants(data.items ?? []));
|
||||
}, []);
|
||||
|
||||
const filtered = plants.filter((p) =>
|
||||
p.commonName.toLowerCase().includes(search.toLowerCase()),
|
||||
);
|
||||
|
||||
return (
|
||||
<div className="...">
|
||||
<label>Plant (optional)</label>
|
||||
<input
|
||||
type="text"
|
||||
placeholder="Search plants..."
|
||||
value={search}
|
||||
onChange={(e) => setSearch(e.target.value)}
|
||||
/>
|
||||
{value && (
|
||||
<div className="...">
|
||||
Selected: {plants.find((p) => p.id === value)?.commonName}
|
||||
<button onClick={() => onChange(null)}>Clear</button>
|
||||
</div>
|
||||
)}
|
||||
<ul>
|
||||
{filtered.slice(0, 10).map((plant) => (
|
||||
<li key={plant.id} onClick={() => onChange(plant.id)}>
|
||||
{plant.commonName}
|
||||
</li>
|
||||
))}
|
||||
</ul>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
5. **Update upload page** to include plant selector:
|
||||
|
||||
```tsx
|
||||
// src/app/upload/page.tsx
|
||||
export default function UploadPage() {
|
||||
const [selectedPlant, setSelectedPlant] = useState<string | null>(null);
|
||||
|
||||
const handleUpload = useCallback(
|
||||
async (file: File) => {
|
||||
// 1. Upload image
|
||||
const uploadResponse = await uploadImage(file);
|
||||
|
||||
// 2. Identify with plant context
|
||||
const identifyResponse = await identifyPlant(uploadResponse.imageId, selectedPlant ?? undefined);
|
||||
|
||||
// 3. Navigate to results
|
||||
router.push(`/results/${uploadResponse.imageId}`);
|
||||
},
|
||||
[selectedPlant],
|
||||
);
|
||||
|
||||
return (
|
||||
<div>
|
||||
<PlantSelector value={selectedPlant} onChange={setSelectedPlant} />
|
||||
<ImageUpload onUpload={handleUpload} />
|
||||
</div>
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
6. **Update client-side API** to pass plantId:
|
||||
|
||||
```typescript
|
||||
// src/lib/api/identify.ts
|
||||
export async function identifyPlant(
|
||||
imageId: string,
|
||||
plantId?: string,
|
||||
): Promise<IdentifyResponse> {
|
||||
const body: IdentifyRequest = { imageId };
|
||||
if (plantId) body.plantId = plantId;
|
||||
|
||||
const response = await fetch("/api/identify", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify(body),
|
||||
});
|
||||
|
||||
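// Surface HTTP errors (e.g. 503 when the model is not loaded) instead of parsing an error body as a result
if (!response.ok) throw new Error(`Identify failed: ${response.status}`);
return response.json();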
|
||||
}
|
||||
```
|
||||
|
||||
7. **Update `PredictionResult` type** to include context boost info:
|
||||
|
||||
```typescript
|
||||
export interface PredictionResult {
|
||||
diseaseId: string;
|
||||
disease: Disease;
|
||||
confidence: ConfidenceResult;
|
||||
lookalikes: string[];
|
||||
plant: Plant | null;
|
||||
contextBoosted?: boolean; // true if boosted by plant context
|
||||
}
|
||||
```
|
||||
|
||||
8. **Update `ResultsDashboard`** to show context boost indicator:
|
||||
|
||||
```tsx
|
||||
{
|
||||
prediction.contextBoosted && (
|
||||
<span className="text-xs text-leaf-green-600">✓ Matches selected plant</span>
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
9. **Store plant context in results page** — pass plantId through URL or state:
|
||||
```typescript
|
||||
// src/app/results/[imageId]/page.tsx
|
||||
const plantId = searchParams.get("plant"); // optional
|
||||
const response = await identifyPlant(imageId, plantId ?? undefined); // searchParams.get() returns string | null
|
||||
```
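If the plant is passed through the URL, the upload page needs to build a matching link. A small sketch, assuming the query parameter is named `plant` as in the snippet above; `resultsUrl` is a hypothetical helper, not an existing function.

```typescript
// Hypothetical helper: builds the results URL, appending ?plant=... only when a plant is selected.
function resultsUrl(imageId: string, plantId: string | null): string {
  const base = `/results/${encodeURIComponent(imageId)}`;
  if (!plantId) return base;
  // URLSearchParams handles escaping of the parameter value.
  const params = new URLSearchParams({ plant: plantId });
  return `${base}?${params.toString()}`;
}

console.log(resultsUrl("abc123", null));
console.log(resultsUrl("abc123", "tomato"));
```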
|
||||
|
||||
tests:
|
||||
|
||||
- Unit: `applyPlantContext()` with no plantId returns predictions unchanged
|
||||
- Unit: `applyPlantContext()` with plantId="tomato" boosts tomato disease predictions
|
||||
- Unit: boosted probabilities are capped at 1.0
|
||||
- Unit: non-matching plant predictions are unchanged
|
||||
- Unit: `contextBoosted` flag is set correctly
|
||||
- Integration: POST `/api/identify` with plantId returns boosted predictions
|
||||
- Integration: POST `/api/identify` without plantId returns normal predictions
|
||||
- E2E: select "Tomato" in UI → upload tomato leaf → tomato diseases appear first
|
||||
|
||||
acceptance_criteria:
|
||||
|
||||
- Plant context is optional — identification works without it
|
||||
- When plantId is provided, predictions for that plant's diseases are boosted by 1.5x
|
||||
- Boosted probabilities are capped at 1.0
|
||||
- `contextBoosted` flag is set on boosted predictions
|
||||
- UI shows "Matches selected plant" indicator on boosted predictions
|
||||
- Plant selector component works (search, select, clear)
|
||||
- Upload flow includes optional plant selection step
|
||||
- Results page receives and displays plant context
|
||||
|
||||
validation:
|
||||
|
||||
- `npx vitest run src/lib/ml/plant-context.test.ts`
|
||||
- `npx vitest run src/components/PlantSelector.test.tsx`
|
||||
- Manual: select "Tomato" → upload image → tomato diseases appear with boost indicator
|
||||
- Manual: don't select plant → upload image → normal predictions (no boost)
|
||||
- Check API response: `predictions[0].contextBoosted` is true when plant matches
|
||||
|
||||
notes:
|
||||
|
||||
- Plant context is a scoring heuristic, not a hard filter. It boosts confidence but doesn't exclude other predictions.
|
||||
- The default boost factor is 1.5 — this can be tuned based on user feedback.
|
||||
- Plant selector is optional — users can skip it and get unboosted predictions.
|
||||
- The plant context feature is most useful when the user knows what plant they're diagnosing but the model is uncertain between multiple diseases.
|
||||
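- For PlantVillage, each plant has 0–9 disease classes (tomato has 9; blueberry, raspberry, and soybean have only a healthy class, so selecting them boosts nothing). The boost is therefore specific enough to be useful without being overly restrictive.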
|
||||
292
tasks/production-ml-pipeline/07-end-to-end-testing.md
Normal file
@@ -0,0 +1,292 @@
|
||||
# 07. End-to-End Integration Testing
|
||||
|
||||
meta:
|
||||
id: production-ml-pipeline-07
|
||||
feature: production-ml-pipeline
|
||||
priority: P1
|
||||
depends_on: [production-ml-pipeline-05]
|
||||
tags: [testing, integration, e2e]
|
||||
|
||||
objective:
|
||||
|
||||
- Create comprehensive end-to-end tests that validate the full pipeline from image upload to disease diagnosis
|
||||
- Verify real model inference produces valid, calibrated predictions
|
||||
- Test all code paths: normal flow, healthy result, error cases, plant context
|
||||
- Ensure all components work together correctly in a realistic scenario
|
||||
|
||||
deliverables:
|
||||
|
||||
- `tests/e2e/pipeline.test.ts` — full pipeline E2E tests
|
||||
- `tests/e2e/fixtures/` — test images and expected results
|
||||
- `tests/e2e/utils.ts` — test utilities (upload helper, identify helper)
|
||||
- Updated `vitest.config.ts` — E2E test configuration
|
||||
|
||||
steps:
|
||||
|
||||
1. **Create test fixtures** `tests/e2e/fixtures/`:
|
||||
- `tomato-early-blight.jpg` — known tomato early blight image (from PlantVillage test set)
|
||||
- `tomato-healthy.jpg` — known healthy tomato image
|
||||
- `unknown-plant.jpg` — unrelated image (should produce low confidence)
|
||||
- `invalid-image.txt` — non-image file (should fail validation)
|
||||
- `expected-results.json` — expected disease IDs and confidence ranges for each test image
|
||||
|
||||
2. **Create E2E test utilities** `tests/e2e/utils.ts`:
|
||||
|
||||
```typescript
|
||||
import fs from "fs/promises";
|
||||
import path from "path";
|
||||
|
||||
export async function uploadTestImage(
|
||||
filename: string,
|
||||
): Promise<{ imageId: string; previewUrl: string }> {
|
||||
const imagePath = path.join(__dirname, "fixtures", filename);
|
||||
const imageBuffer = await fs.readFile(imagePath);
|
||||
|
||||
const formData = new FormData();
|
||||
formData.append("image", new Blob([imageBuffer], { type: "image/jpeg" }), filename);
|
||||
|
||||
const response = await fetch("http://localhost:3000/api/upload", {
|
||||
method: "POST",
|
||||
body: formData,
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
throw new Error(`Upload failed: ${response.status}`);
|
||||
}
|
||||
|
||||
return response.json();
|
||||
}
|
||||
|
||||
export async function identifyImage(imageId: string, plantId?: string): Promise<any> {
|
||||
const response = await fetch("http://localhost:3000/api/identify", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ imageId, plantId }),
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
throw new Error(`Identify failed: ${response.status}`);
|
||||
}
|
||||
|
||||
return response.json();
|
||||
}
|
||||
```
|
||||
|
||||
3. **Write full pipeline E2E test** `tests/e2e/pipeline.test.ts`:
|
||||
|
||||
```typescript
|
||||
import { describe, it, expect, beforeAll } from "vitest";
|
||||
import { uploadTestImage, identifyImage } from "./utils";
|
||||
import expectedResults from "./fixtures/expected-results.json";
|
||||
|
||||
describe("End-to-End Pipeline", () => {
|
||||
describe("Normal flow: disease detection", () => {
|
||||
it("uploads a tomato early blight image and returns correct diagnosis", async () => {
|
||||
// 1. Upload
|
||||
const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
|
||||
expect(imageId).toBeDefined();
|
||||
|
||||
// 2. Identify
|
||||
const result = await identifyImage(imageId);
|
||||
|
||||
// 3. Verify response structure
|
||||
expect(result.predictions).toBeDefined();
|
||||
expect(result.predictions.length).toBeGreaterThan(0);
|
||||
expect(result.metadata).toBeDefined();
|
||||
expect(result.metadata.model).toBe("plant-classifier-v1");
|
||||
expect(result.metadata.inferenceTimeMs).toBeGreaterThan(0);
|
||||
expect(result.demo_mode).toBeFalsy();
|
||||
|
||||
// 4. Verify top prediction is early blight
|
||||
const topPrediction = result.predictions[0];
|
||||
expect(topPrediction.diseaseId).toBe("early-blight");
|
||||
expect(topPrediction.disease.name).toContain("Early Blight");
|
||||
expect(topPrediction.plant.id).toBe("tomato");
|
||||
|
||||
// 5. Verify confidence is calibrated
|
||||
expect(topPrediction.confidence.adjusted).toBeGreaterThan(0.5);
|
||||
expect(topPrediction.confidence.label).toBe("high");
|
||||
|
||||
// 6. Verify disease data is enriched
|
||||
expect(topPrediction.disease.symptoms.length).toBeGreaterThanOrEqual(3);
|
||||
expect(topPrediction.disease.treatment.length).toBeGreaterThanOrEqual(3);
|
||||
expect(topPrediction.disease.prevention.length).toBeGreaterThanOrEqual(2);
|
||||
});
|
||||
});
|
||||
|
||||
describe("Healthy result", () => {
|
||||
it("returns healthy result for healthy plant image", async () => {
|
||||
const { imageId } = await uploadTestImage("tomato-healthy.jpg");
|
||||
const result = await identifyImage(imageId);
|
||||
|
||||
// Should return healthy: true or top prediction is a healthy class
|
||||
if (result.healthy) {
|
||||
expect(result.healthy).toBe(true);
|
||||
expect(result.plantId).toBe("tomato");
|
||||
expect(result.confidence.adjusted).toBeGreaterThan(0.5);
|
||||
} else {
|
||||
// If not healthy result, confidence should be low
|
||||
const topPrediction = result.predictions[0];
|
||||
expect(topPrediction.confidence.adjusted).toBeLessThan(0.5);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Unknown image", () => {
|
||||
it("returns low confidence for unrelated image", async () => {
|
||||
const { imageId } = await uploadTestImage("unknown-plant.jpg");
|
||||
const result = await identifyImage(imageId);
|
||||
|
||||
// Should have predictions but with low confidence
|
||||
if (result.predictions) {
|
||||
const topPrediction = result.predictions[0];
|
||||
expect(topPrediction.confidence.adjusted).toBeLessThan(0.35); // "low" means below the MEDIUM threshold
|
||||
expect(topPrediction.confidence.label).toBe("low");
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe("Plant context", () => {
|
||||
it("boosts predictions when plantId is provided", async () => {
|
||||
const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
|
||||
|
||||
// Without plant context
|
||||
const resultNoContext = await identifyImage(imageId);
|
||||
const confidenceNoContext = resultNoContext.predictions[0].confidence.adjusted;
|
||||
|
||||
// With plant context
|
||||
const resultWithContext = await identifyImage(imageId, "tomato");
|
||||
const confidenceWithContext = resultWithContext.predictions[0].confidence.adjusted;
|
||||
|
||||
// Context should boost confidence (or at least not reduce it)
|
||||
expect(confidenceWithContext).toBeGreaterThanOrEqual(confidenceNoContext);
|
||||
|
||||
// Boosted prediction should have contextBoosted flag
|
||||
const boosted = resultWithContext.predictions.find((p) => p.contextBoosted);
|
||||
expect(boosted).toBeDefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe("Error cases", () => {
|
||||
it("returns 404 for non-existent imageId", async () => {
|
||||
const response = await fetch("http://localhost:3000/api/identify", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ imageId: "non-existent-id" }),
|
||||
});
|
||||
|
||||
expect(response.status).toBe(404);
|
||||
});
|
||||
|
||||
it("returns 400 for invalid image upload", async () => {
|
||||
const formData = new FormData();
|
||||
formData.append("image", new Blob(["not an image"], { type: "text/plain" }), "test.txt");
|
||||
|
||||
const response = await fetch("http://localhost:3000/api/upload", {
|
||||
method: "POST",
|
||||
body: formData,
|
||||
});
|
||||
|
||||
expect(response.status).toBe(400);
|
||||
});
|
||||
});
|
||||
|
||||
describe("Performance", () => {
|
||||
it("completes inference in under 500ms", async () => {
|
||||
const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
|
||||
|
||||
const start = Date.now();
|
||||
await identifyImage(imageId);
|
||||
const elapsed = Date.now() - start;
|
||||
|
||||
expect(elapsed).toBeLessThan(500);
|
||||
});
|
||||
});
|
||||
});
|
||||
```

4. **Create expected results fixture** `tests/e2e/fixtures/expected-results.json`:

```json
{
  "tomato-early-blight.jpg": {
    "expectedDiseaseId": "early-blight",
    "expectedPlantId": "tomato",
    "minConfidence": 0.6,
    "expectedConfidenceLabel": "high"
  },
  "tomato-healthy.jpg": {
    "expectedHealthy": true,
    "expectedPlantId": "tomato",
    "minConfidence": 0.5
  },
  "unknown-plant.jpg": {
    "maxConfidence": 0.5,
    "expectedConfidenceLabel": "low"
  }
}
```
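
One way the tests might consume this fixture is through a small comparison helper. The sketch below is an assumption rather than part of the existing suite: `checkAgainstFixture` and both interfaces are hypothetical names, and it only checks fields that already appear in the fixture and in the test assertions (`diseaseId`, `confidence.adjusted`, `confidence.label`).

```typescript
// Hypothetical shapes: the fixture keys come from expected-results.json,
// the prediction fields mirror those used in the E2E assertions.
interface ExpectedResult {
  expectedDiseaseId?: string;
  minConfidence?: number;
  maxConfidence?: number;
  expectedConfidenceLabel?: string;
}

interface TopPrediction {
  diseaseId: string;
  confidence: { adjusted: number; label: string };
}

// Returns human-readable mismatches; an empty array means the prediction
// satisfies every constraint present in the fixture entry.
function checkAgainstFixture(top: TopPrediction, expected: ExpectedResult): string[] {
  const problems: string[] = [];
  const { adjusted, label } = top.confidence;

  if (expected.expectedDiseaseId !== undefined && top.diseaseId !== expected.expectedDiseaseId) {
    problems.push(`diseaseId: expected ${expected.expectedDiseaseId}, got ${top.diseaseId}`);
  }
  if (expected.minConfidence !== undefined && adjusted < expected.minConfidence) {
    problems.push(`confidence ${adjusted} is below minConfidence ${expected.minConfidence}`);
  }
  // The E2E tests use toBeLessThan, so reaching maxConfidence counts as a failure.
  if (expected.maxConfidence !== undefined && adjusted >= expected.maxConfidence) {
    problems.push(`confidence ${adjusted} is not below maxConfidence ${expected.maxConfidence}`);
  }
  if (expected.expectedConfidenceLabel !== undefined && label !== expected.expectedConfidenceLabel) {
    problems.push(`label: expected ${expected.expectedConfidenceLabel}, got ${label}`);
  }
  return problems;
}
```

Returning a list of mismatches instead of a boolean lets a test print every failed constraint for an image in one run.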

5. **Update vitest config** to support E2E tests:

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // ... existing config ...
    include: ["src/**/*.test.ts", "src/**/*.test.tsx", "tests/**/*.test.ts"],
  },
});
```

6. **Add E2E test script** to `package.json`:

```json
{
  "scripts": {
    "test:e2e": "vitest run tests/e2e"
  }
}
```

7. **Document E2E test setup** in `tests/e2e/README.md`:
   - Requires dev server running (`npm run dev`)
   - Requires model files present (`public/models/plant-disease-classifier/`)
   - Requires test fixtures (download PlantVillage test images)
   - Run with `npm run test:e2e`

8. **Download test images** from PlantVillage dataset:
   - Use images from the PlantVillage test split (not training)
   - Place in `tests/e2e/fixtures/`
   - Document source and license

tests:

- E2E: full pipeline test (upload → identify → verify results)
- E2E: healthy result detection
- E2E: unknown image produces low confidence
- E2E: plant context boosts predictions
- E2E: error cases (404, 400)
- E2E: performance (< 500ms inference)

acceptance_criteria:

- All E2E tests pass with real model inference
- Test fixtures are documented and licensed appropriately
- E2E tests can be run with `npm run test:e2e`
- Tests cover: normal flow, healthy result, unknown image, plant context, errors, performance
- Test results are deterministic (no flaky tests)

validation:

- `npm run test:e2e`: all tests pass
- Manual: run tests against dev server and verify output
- Check test coverage: all major code paths are exercised

notes:

- E2E tests require the dev server to be running (`npm run dev`)
- Test images should come from the PlantVillage test split (not training), so the suite does not evaluate the model on images it was trained on
- If test images are not available, use synthetic test data (random tensors) for CI
- The performance test threshold (500ms) is generous; actual inference should be < 200ms with tfjs-node
- E2E tests are separate from unit tests; run them in CI after deployment to staging
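
For the synthetic-data fallback mentioned in the notes, a seeded generator keeps CI runs reproducible, which plain `Math.random()` would not. The following is a minimal sketch under stated assumptions: the 224 x 224 x 3 input shape is a guess at the model's input size, and `seededRandom` and `syntheticImageData` are hypothetical helpers that return a flat `Float32Array` for the suite's own tensor loader to wrap.

```typescript
// Small seeded PRNG (mulberry32-style recipe): the same seed always yields
// the same sequence of numbers in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Assumed input shape: 224 x 224 RGB with values scaled to [0, 1].
// Adjust the defaults to whatever the real model expects.
function syntheticImageData(
  seed: number,
  height = 224,
  width = 224,
  channels = 3,
): Float32Array {
  const next = seededRandom(seed);
  const data = new Float32Array(height * width * channels);
  for (let i = 0; i < data.length; i++) {
    data[i] = next();
  }
  return data;
}
```

Because the data depends only on the seed, a CI failure can be reproduced locally by reusing the same seed.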
405
tasks/production-ml-pipeline/08-production-hardening.md
Normal file
@@ -0,0 +1,405 @@
# 08. Production Hardening and Observability

meta:
id: production-ml-pipeline-08
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-07]
tags: [implementation, production, observability]

objective:

- Add comprehensive error handling at every layer of the pipeline
- Implement structured logging for observability
- Add rate limiting to prevent abuse
- Create a health endpoint that reports model status and inference metrics
- Ensure the system is production-ready with monitoring, cleanup, and resilience

deliverables:

- `src/app/api/health/route.ts`: enhanced health endpoint with model status
- `src/lib/middleware/rate-limit.ts`: rate limiting middleware
- `src/lib/middleware/error-handler.ts`: global error handler
- `src/lib/observability/logger.ts`: structured logger
- `src/lib/observability/metrics.ts`: inference metrics tracker
- `src/lib/cleanup.ts`: cleanup of old uploads
- Updated API routes with error handling and logging
- Updated `next.config.ts` with security headers

steps:

1. **Create structured logger** `src/lib/observability/logger.ts`:

```typescript
export interface LogEntry {
  timestamp: string;
  level: "debug" | "info" | "warn" | "error";
  event: string;
  data?: Record<string, any>;
  error?: { message: string; stack?: string };
}

export function log(level: LogEntry["level"], event: string, data?: Record<string, any>) {
  const entry: LogEntry = {
    timestamp: new Date().toISOString(),
    level,
    event,
    data,
  };

  // Error instances serialize to "{}" under JSON.stringify, so copy the
  // message and stack into `entry.error` at every log level.
  if (data?.error) {
    const { error, ...rest } = data;
    entry.data = rest;
    entry.error = {
      message: error.message ?? String(error),
      stack: error.stack,
    };
  }

  console.log(JSON.stringify(entry));
}

export const logger = {
  debug: (event: string, data?: any) => log("debug", event, data),
  info: (event: string, data?: any) => log("info", event, data),
  warn: (event: string, data?: any) => log("warn", event, data),
  error: (event: string, data?: any) => log("error", event, data),
};
```

2. **Create metrics tracker** `src/lib/observability/metrics.ts`:

```typescript
interface InferenceMetrics {
  totalInferences: number;
  totalErrors: number;
  avgInferenceTimeMs: number;
  lastInferenceAt: string | null;
  modelLoaded: boolean;
  modelLoadTimeMs: number | null;
}

class MetricsTracker {
  private metrics: InferenceMetrics = {
    totalInferences: 0,
    totalErrors: 0,
    avgInferenceTimeMs: 0,
    lastInferenceAt: null,
    modelLoaded: false,
    modelLoadTimeMs: null,
  };

  recordInference(inferenceTimeMs: number) {
    this.metrics.totalInferences++;
    this.metrics.lastInferenceAt = new Date().toISOString();
    // Running average
    this.metrics.avgInferenceTimeMs =
      (this.metrics.avgInferenceTimeMs * (this.metrics.totalInferences - 1) + inferenceTimeMs) /
      this.metrics.totalInferences;
  }

  recordError() {
    this.metrics.totalErrors++;
  }

  setModelStatus(loaded: boolean, loadTimeMs?: number) {
    this.metrics.modelLoaded = loaded;
    if (loadTimeMs !== undefined) {
      this.metrics.modelLoadTimeMs = loadTimeMs;
    }
  }

  getMetrics(): InferenceMetrics {
    return { ...this.metrics };
  }
}

export const metrics = new MetricsTracker();
```
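
The running-average update in `recordInference` can be checked in isolation. The snippet below is only an illustration: `updateRunningAverage` is a hypothetical extraction of that one expression, written here so the arithmetic can be verified without the class or the clock.

```typescript
// Incremental mean: given the average of the first `prevCount` samples,
// fold in one more sample without storing the full history.
function updateRunningAverage(prevAvg: number, prevCount: number, sample: number): number {
  return (prevAvg * prevCount + sample) / (prevCount + 1);
}

const samples = [100, 200, 300];
let avg = 0;
samples.forEach((sample, index) => {
  // `index` equals the number of samples already folded into `avg`.
  avg = updateRunningAverage(avg, index, sample);
});

console.log(avg); // 200, the same as (100 + 200 + 300) / 3
```

The intermediate values are 100, 150, and 200, which matches the plain arithmetic mean after each step.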

3. **Enhance health endpoint** `src/app/api/health/route.ts`:

```typescript
import { NextResponse } from "next/server";
import { getModel } from "@/lib/ml/model-loader";
import { metrics } from "@/lib/observability/metrics";

export async function GET() {
  const model = await getModel();
  const modelStatus = model.getStatus();

  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    model: {
      loaded: modelStatus.loaded,
      backend: modelStatus.backend,
      modelId: modelStatus.modelId,
      numClasses: modelStatus.numClasses,
      error: modelStatus.error,
    },
    metrics: metrics.getMetrics(),
    uptime: process.uptime(),
  });
}
```

4. **Create rate limiting middleware** `src/lib/middleware/rate-limit.ts`:

```typescript
import { NextRequest, NextResponse } from "next/server";

// Simple in-memory rate limiter (for production, use Redis or similar)
const requestCounts = new Map<string, { count: number; resetAt: number }>();

const RATE_LIMIT = {
  maxRequests: 10, // 10 requests per window
  windowMs: 60 * 1000, // 1 minute window
};

export function rateLimit(request: NextRequest): NextResponse | null {
  // x-forwarded-for may hold a comma-separated proxy chain; the first entry is the client
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0].trim() || "unknown";
  const now = Date.now();

  let record = requestCounts.get(ip);

  if (!record || now > record.resetAt) {
    record = { count: 0, resetAt: now + RATE_LIMIT.windowMs };
    requestCounts.set(ip, record);
  }

  record.count++;

  if (record.count > RATE_LIMIT.maxRequests) {
    return NextResponse.json(
      { error: "Rate limit exceeded", message: "Too many requests. Please try again later." },
      { status: 429 },
    );
  }

  return null; // No rate limit hit
}
```
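
The module-level `Map` and direct `Date.now()` calls make the "resets after window" unit test awkward to write, and entries for idle clients are never removed. One possible variation, sketched here as an assumption and not as the planned implementation, wraps the same fixed-window logic in a class with an injectable clock and a `prune` method. `FixedWindowLimiter` and its method names are hypothetical.

```typescript
interface WindowRecord {
  count: number;
  resetAt: number;
}

class FixedWindowLimiter {
  private records = new Map<string, WindowRecord>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can advance time without real waiting.
  constructor(maxRequests: number, windowMs: number, now: () => number = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true when the request is allowed, false when the key is over its limit.
  allow(key: string): boolean {
    const current = this.now();
    let record = this.records.get(key);
    if (!record || current > record.resetAt) {
      record = { count: 0, resetAt: current + this.windowMs };
      this.records.set(key, record);
    }
    record.count++;
    return record.count <= this.maxRequests;
  }

  // Drops expired windows so the map does not grow without bound.
  // Returns the number of entries removed.
  prune(): number {
    const current = this.now();
    let removed = 0;
    this.records.forEach((record, key) => {
      if (current > record.resetAt) {
        this.records.delete(key);
        removed++;
      }
    });
    return removed;
  }
}
```

A route-level wrapper could then call `allow(ip)` and translate `false` into the 429 response shown above, while a timer calls `prune()` periodically.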

5. **Create global error handler** `src/lib/middleware/error-handler.ts`:

```typescript
import { NextResponse } from "next/server";
import { logger } from "@/lib/observability/logger";

export function handleError(error: unknown, context: string): NextResponse {
  logger.error("unhandled_error", {
    context,
    error:
      error instanceof Error
        ? { message: error.message, stack: error.stack }
        : { message: String(error) },
  });

  return NextResponse.json(
    {
      error: "Internal server error",
      message: "An unexpected error occurred. Please try again later.",
      context,
    },
    { status: 500 },
  );
}
```

6. **Add error handling to `/api/upload`**:

```typescript
import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";

export async function POST(request: NextRequest) {
  // Rate limiting
  const rateLimitError = rateLimit(request);
  if (rateLimitError) return rateLimitError;

  try {
    logger.info("upload_start", { ip: request.headers.get("x-forwarded-for") });

    // ... existing upload logic ...

    logger.info("upload_success", { imageId, fileSize: buffer.length });
    return NextResponse.json({ imageId, tensorShape, previewUrl });
  } catch (error) {
    return handleError(error, "upload");
  }
}
```

7. **Add error handling to `/api/identify`**:

```typescript
import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";
import { metrics } from "@/lib/observability/metrics";

export async function POST(request: NextRequest) {
  const rateLimitError = rateLimit(request);
  if (rateLimitError) return rateLimitError;

  try {
    // Parse the body before logging so imageId and plantId are defined
    const { imageId, plantId } = await request.json();
    logger.info("identify_start", { imageId, plantId });

    const startTime = Date.now();

    // ... existing identify logic ...

    const inferenceTimeMs = Date.now() - startTime;
    metrics.recordInference(inferenceTimeMs);

    logger.info("identify_success", {
      imageId,
      inferenceTimeMs,
      topPrediction: predictions[0]?.diseaseId,
      confidence: predictions[0]?.confidence.adjusted,
    });

    return NextResponse.json({ predictions, metadata });
  } catch (error) {
    metrics.recordError();

    if (error instanceof Error && error.message.includes("not loaded")) {
      return NextResponse.json(
        {
          error: "Model not available",
          message: "ML model failed to load. Please try again later.",
        },
        { status: 503 },
      );
    }

    return handleError(error, "identify");
  }
}
```

8. **Add model status tracking to `model-loader.ts`**:

```typescript
import { logger } from "@/lib/observability/logger";
import { metrics } from "@/lib/observability/metrics";

async function loadModel(): Promise<PlantDiseaseModel> {
  const startTime = Date.now();

  try {
    const model = await tryLoadTFJS();
    if (model) {
      const loadTimeMs = Date.now() - startTime;
      metrics.setModelStatus(true, loadTimeMs);
      logger.info("model_loaded", { backend: "tfjs", loadTimeMs });
      return model;
    }
  } catch (error) {
    logger.warn("model_load_failed", { backend: "tfjs", error });
  }

  // ... fallback to mock ...
  metrics.setModelStatus(false);
  return createMockModel();
}
```

9. **Add cleanup for old uploads**:

```typescript
// src/lib/cleanup.ts
import fs from "fs/promises";
import path from "path";
import { logger } from "@/lib/observability/logger";

const UPLOADS_DIR = path.join(process.cwd(), "public", "uploads");
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

export async function cleanupOldUploads() {
  const files = await fs.readdir(UPLOADS_DIR);
  const now = Date.now();

  for (const file of files) {
    const filePath = path.join(UPLOADS_DIR, file);
    const stat = await fs.stat(filePath);

    if (now - stat.mtimeMs > MAX_AGE_MS) {
      await fs.unlink(filePath);
      logger.info("upload_cleaned", { file, ageMs: now - stat.mtimeMs });
    }
  }
}

// Run cleanup on server start and periodically. Failures (for example a
// missing uploads directory) are logged instead of left as unhandled rejections.
if (process.env.NODE_ENV === "production") {
  const runCleanup = () =>
    cleanupOldUploads().catch((error) => logger.error("upload_cleanup_failed", { error }));
  runCleanup();
  setInterval(runCleanup, 60 * 60 * 1000); // Every hour
}
```
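
The age check is the part of the cleanup most worth unit testing, and it is easier to test when separated from the filesystem calls. The sketch below assumes a hypothetical pure helper, `selectExpired`, that takes file names with their modification times and returns the names past the TTL; it is not part of the planned `cleanup.ts`.

```typescript
interface FileInfo {
  name: string;
  mtimeMs: number;
}

// Returns the names of files strictly older than maxAgeMs at time nowMs,
// mirroring the `now - stat.mtimeMs > MAX_AGE_MS` comparison above.
function selectExpired(files: FileInfo[], nowMs: number, maxAgeMs: number): string[] {
  return files.filter((file) => nowMs - file.mtimeMs > maxAgeMs).map((file) => file.name);
}

const HOUR_MS = 60 * 60 * 1000;
const now = 100 * HOUR_MS;
const expired = selectExpired(
  [
    { name: "fresh.jpg", mtimeMs: now - 1 * HOUR_MS },
    { name: "boundary.jpg", mtimeMs: now - 24 * HOUR_MS },
    { name: "stale.jpg", mtimeMs: now - 25 * HOUR_MS },
  ],
  now,
  24 * HOUR_MS,
);

console.log(expired); // ["stale.jpg"]
```

A file exactly 24 hours old is kept because the comparison is strict, so with an hourly schedule a file can survive for up to roughly 25 hours.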

10. **Update `next.config.ts`** with security headers:

```typescript
const nextConfig = {
  // ... existing config ...
  async headers() {
    return [
      {
        source: "/api/:path*",
        headers: [
          { key: "X-Content-Type-Options", value: "nosniff" },
          { key: "X-Frame-Options", value: "DENY" },
          { key: "X-XSS-Protection", value: "1; mode=block" },
        ],
      },
    ];
  },
};
```

11. **Add monitoring dashboard** (optional) `src/app/admin/metrics/page.tsx`:
    - Simple page showing inference metrics
    - Model status
    - Recent inference times
    - Error rate
    - Protected by authentication (admin only)

12. **Document production checklist** in `docs/production-checklist.md`:
    - Environment variables needed
    - Model deployment steps
    - Monitoring setup
    - Backup strategy
    - Rollback procedure

tests:

- Unit: rate limiter blocks after max requests
- Unit: rate limiter resets after window
- Unit: metrics tracker records inference correctly
- Unit: metrics tracker computes running average
- Unit: logger produces valid JSON output
- Integration: health endpoint returns model status and metrics
- Integration: rate limit returns 429 after max requests
- Integration: error handler catches unhandled errors and returns 500

acceptance_criteria:

- All API routes have rate limiting (10 requests per minute per IP)
- All API routes have structured logging (JSON format)
- Health endpoint reports model status, inference metrics, uptime
- Error handler catches all unhandled errors and returns 500 with clear message
- Old uploads are cleaned up automatically (24-hour TTL)
- Metrics tracker records inference time, error rate, model status
- Security headers are set (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
- Production checklist is documented

validation:

- `npx vitest run src/lib/middleware/rate-limit.test.ts`
- `npx vitest run src/lib/observability/metrics.test.ts`
- `curl http://localhost:3000/api/health`: returns model status and metrics
- `curl -X POST http://localhost:3000/api/identify ...` (11 times): the 11th request returns 429
- Check server logs: JSON-formatted log entries for all requests
- Wait 25 hours (24-hour TTL plus the hourly cleanup interval): old uploads are cleaned up

notes:

- Rate limiter uses in-memory storage; for multi-instance deployments, use Redis or similar
- Metrics are in-memory; for persistent metrics, use a time-series database
- Health endpoint should be monitored by uptime monitoring service (e.g., Pingdom, UptimeRobot)
- Cleanup runs every hour in production; adjust frequency based on upload volume
- Security headers are basic; consider adding CSP and HSTS for fuller hardening, and note that `X-XSS-Protection` is deprecated and ignored by current browsers
- Production checklist should be reviewed before each deployment
40
tasks/production-ml-pipeline/README.md
Normal file
@@ -0,0 +1,40 @@
# Production ML Pipeline

Objective: Get the plant disease identification ML pipeline to full production readiness with real model inference, proper class mapping, and production-grade error handling.

Status legend: [ ] todo, [~] in-progress, [x] done

## Tasks

- [ ] 01: PlantVillage class inventory and knowledge base mapping → `01-plantvillage-class-inventory.md`
- [ ] 02: Label mapping layer implementation → `02-label-mapping-implementation.md`
- [ ] 03: TensorFlow.js model loading verification and fixes → `03-model-loading-verification.md`
- [ ] 04: Confidence calibration for PlantVillage model → `04-confidence-calibration.md`
- [ ] 05: Real model integration into identification pipeline → `05-pipeline-integration.md`
- [ ] 06: Plant-context-aware identification → `06-plant-context-identification.md`
- [ ] 07: End-to-end integration testing → `07-end-to-end-testing.md`
- [ ] 08: Production hardening and observability → `08-production-hardening.md`

## Dependencies

- 01 → 02 (mapping data feeds label layer)
- 02 → 05 (labels feed pipeline)
- 03 → 05 (verified model loading feeds pipeline)
- 04 → 05 (calibration feeds pipeline)
- 05 → 06 (real model enables plant context)
- 05 → 07 (integrated pipeline enables e2e testing)
- 07 → 08 (tested pipeline enables production hardening)
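
The dependency list above implies a valid execution order, which can be derived mechanically. The following is an illustrative sketch only: `executionOrder` is a hypothetical helper that applies Kahn's topological sort to the edges copied from the list, breaking ties by task number.

```typescript
// Edges copied from the dependency list: [prerequisite, dependent].
const edges: Array<[string, string]> = [
  ["01", "02"], ["02", "05"], ["03", "05"], ["04", "05"],
  ["05", "06"], ["05", "07"], ["07", "08"],
];

// Kahn's algorithm: repeatedly emit a task whose prerequisites are all done.
function executionOrder(deps: Array<[string, string]>): string[] {
  const unmet = new Map<string, number>(); // task -> count of unmet prerequisites
  const dependents = new Map<string, string[]>();
  deps.forEach(([from, to]) => {
    if (!unmet.has(from)) unmet.set(from, 0);
    unmet.set(to, (unmet.get(to) ?? 0) + 1);
    dependents.set(from, [...(dependents.get(from) ?? []), to]);
  });

  const ready = Array.from(unmet.keys())
    .filter((task) => unmet.get(task) === 0)
    .sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const task = ready.shift() as string;
    order.push(task);
    (dependents.get(task) ?? []).forEach((next) => {
      const remaining = (unmet.get(next) ?? 0) - 1;
      unmet.set(next, remaining);
      if (remaining === 0) ready.push(next);
    });
    ready.sort(); // keep the lowest-numbered ready task first
  }
  if (order.length !== unmet.size) throw new Error("dependency cycle detected");
  return order;
}

console.log(executionOrder(edges).join(" -> "));
// 01 -> 02 -> 03 -> 04 -> 05 -> 06 -> 07 -> 08
```

Tasks 01, 03, and 04 have no prerequisites, so they could also be started in parallel; the numeric order is just one valid schedule.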

## Exit Criteria

- The feature is complete when:
  - Model loads successfully and produces real (non-mock) predictions
  - All 38 PlantVillage classes map to valid knowledge base disease IDs
  - End-to-end pipeline works: upload image → get real disease diagnoses with calibrated confidence
  - Confidence scores are meaningful (high confidence for clear cases, low for ambiguous)
  - Plant context optionally boosts relevant predictions
  - Full integration test suite passes
  - Error handling, logging, and monitoring in place
  - No demo mode fallback in production
  - Rate limiting and input sanitization active
  - Health endpoint reports model status and inference metrics