re-init

2026-06-08 16:42:04 -04:00
commit 8bda14ab63
179 changed files with 48104 additions and 0 deletions
--- a/tasks/production-ml-pipeline/01-plantvillage-class-inventory.md
+++ b/tasks/production-ml-pipeline/01-plantvillage-class-inventory.md
@@ -0,0 +1,152 @@
+# 01. PlantVillage Class Inventory and Knowledge Base Mapping
+
+meta:
+id: production-ml-pipeline-01
+feature: production-ml-pipeline
+priority: P0
+depends_on: []
+tags: [data, mapping, research]
+
+objective:
+
+- Document all 38 PlantVillage model output classes
+- Map each class index to a definitive disease ID in the knowledge base
+- Identify which plants and diseases are missing from the KB and must be added
+- Produce a complete, authoritative mapping file that subsequent tasks consume
+
+deliverables:
+
+- `src/lib/ml/plantvillage-classes.ts` — definitive mapping of all 38 class indices to structured metadata
+- Updated `tasks/production-ml-pipeline/class-mapping-reference.md` — human-readable reference document
+
+steps:
+
+1. Document the canonical 38 PlantVillage class labels in order (index 0–37):
+
+   ```
+   0:  Apple___Apple_scab
+   1:  Apple___Black_rot
+   2:  Apple___Cedar_apple_rust
+   3:  Apple___healthy
+   4:  Blueberry___healthy
+   5:  Cherry_(including_sour)___Powdery_mildew
+   6:  Cherry_(including_sour)___healthy
+   7:  Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
+   8:  Corn_(maize)___Common_rust_
+   9:  Corn_(maize)___Northern_Leaf_Blight
+   10: Corn_(maize)___healthy
+   11: Grape___Black_rot
+   12: Grape___Esca_(Black_Measles)
+   13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
+   14: Grape___healthy
+   15: Orange___Haunglongbing_(Citrus_greening)
+   16: Peach___Bacterial_spot
+   17: Peach___healthy
+   18: Pepper,_bell___Bacterial_spot
+   19: Pepper,_bell___healthy
+   20: Potato___Early_blight
+   21: Potato___Late_blight
+   22: Potato___healthy
+   23: Raspberry___healthy
+   24: Soybean___healthy
+   25: Squash___Powdery_mildew
+   26: Strawberry___Leaf_scorch
+   27: Strawberry___healthy
+   28: Tomato___Bacterial_spot
+   29: Tomato___Early_blight
+   30: Tomato___Late_blight
+   31: Tomato___Leaf_Mold
+   32: Tomato___Septoria_leaf_spot
+   33: Tomato___Spider_mites Two-spotted_spider_mite
+   34: Tomato___Target_Spot
+   35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
+   36: Tomato___Tomato_mosaic_virus
+   37: Tomato___healthy
+   ```
+
+2. For each class, determine the mapping target:
+   - **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel. These indicate the model detected no disease.
+   - **Disease classes with exact KB match**: map directly to existing disease ID.
+     - 28 → `bacterial-leaf-spot-tomato` (Tomato Bacterial_spot ≈ bacterial-leaf-spot-tomato)
+     - 29 → `early-blight`
+     - 30 → `late-blight`
+     - 32 → `septoria-leaf-spot`
+     - 25 → `squash-powdery-mildew`
+     - 26 → `strawberry-leaf-scorch`
+     - 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
+   - **Disease classes needing new KB entries** (no existing disease in our KB):
+     - 0: Apple_scab → new disease `apple-scab` under plant `apple`
+     - 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
+     - 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
+     - 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
+     - 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
+     - 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
+     - 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
+     - 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
+     - 12: Grape_esca → new disease `grape-esca` under plant `grape`
+     - 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
+     - 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
+     - 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
+     - 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
+     - 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
+     - 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
+     - 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
+     - 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
+     - 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
+     - 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
+
+3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
+
+   ```typescript
+   export interface PlantVillageClass {
+     index: number;
+     rawLabel: string;
+     plantId: string;        // KB plant slug
+     diseaseId: string | null; // null for healthy classes
+     isHealthy: boolean;
+     displayName: string;     // human-readable disease name
+   }
+
+   export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [ ... ];
+   ```
+
+4. For each class, also record:
+   - The PlantVillage plant name (e.g., "Tomato", "Apple")
+   - The target KB plantId (e.g., "tomato", "apple")
+   - The target KB diseaseId (e.g., "early-blight") or null for healthy
+   - Whether the disease needs to be added to the KB (boolean flag for task 02)
+
+5. Verify the mapping covers all 38 indices with no gaps or duplicates.
+
+tests:
+
+- Unit: mapping has exactly 38 entries
+- Unit: indices 0–37 are all present, no gaps
+- Unit: each non-healthy entry has a non-null diseaseId
+- Unit: each healthy entry has null diseaseId and isHealthy=true
+- Unit: no duplicate diseaseIds across non-healthy entries
+- Unit: all plantIds are valid slugs (lowercase, kebab-case)
+
+acceptance_criteria:
+
+- `src/lib/ml/plantvillage-classes.ts` exports `PLANTVILLAGE_CLASSES` array with exactly 38 entries
+- Every index 0–37 maps to exactly one entry
+- 12 entries are healthy (isHealthy=true, diseaseId=null)
+- 26 entries are diseases with valid plantId and diseaseId
+- Each entry includes rawLabel, plantId, diseaseId, displayName
+- All new disease IDs follow kebab-case convention matching existing KB pattern
+- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings
+
+validation:
+
+- `npx vitest run src/lib/ml/plantvillage-classes.test.ts` — all mapping tests pass
+- Manual review: each of the 26 disease entries maps to a plausible disease in our KB
+
+notes:
+
+- This task produces the authoritative mapping consumed by task 02 (KB expansion and label mapping)
+- The PlantVillage class order is fixed by the model's training — do NOT reorder
+- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato` — this is the closest match, not a perfect one
+- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt` — imperfect but closest available match
+- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
+- Blueberry, Raspberry, Soybean only have "healthy" class — still need plant entries for context but no new disease entries
--- a/tasks/production-ml-pipeline/02-label-mapping-implementation.md
+++ b/tasks/production-ml-pipeline/02-label-mapping-implementation.md
@@ -0,0 +1,149 @@
+# 02. Label Mapping Layer Implementation
+
+meta:
+id: production-ml-pipeline-02
+feature: production-ml-pipeline
+priority: P0
+depends_on: [production-ml-pipeline-01]
+tags: [implementation, knowledge-base, tests-required]
+
+objective:
+
+- Expand the knowledge base to cover all PlantVillage plants and diseases
+- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage class mapping from task 01
+- Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
+- The label layer must be the single source of truth for model-index → disease mapping
+
+deliverables:
+
+- Updated `src/data/plants.json` — 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
+- Updated `src/data/diseases.json` — 19 new disease entries added for PlantVillage diseases not yet in KB
+- `src/lib/ml/labels.ts` — fully rewritten to use PlantVillage class mapping
+- `src/lib/ml/labels.test.ts` — updated to validate against new mapping
+- `scripts/seed-plantvillage-kb.ts` — DB migration script to insert new plants and diseases into Turso
+
+steps:
+
+1. **Add 10 new plants to `src/data/plants.json`** — each with proper metadata:
+
+   ```typescript
+   // New plants needed (PlantVillage coverage):
+   { id: "apple", commonName: "Apple", scientificName: "Malus domestica", family: "Rosaceae", category: "fruit" }
+   { id: "cherry", commonName: "Cherry", scientificName: "Prunus avium", family: "Rosaceae", category: "fruit" }
+   { id: "corn", commonName: "Corn (Maize)", scientificName: "Zea mays", family: "Poaceae", category: "vegetable" }
+   { id: "grape", commonName: "Grape", scientificName: "Vitis vinifera", family: "Vitaceae", category: "fruit" }
+   { id: "orange", commonName: "Orange", scientificName: "Citrus sinensis", family: "Rutaceae", category: "fruit" }
+   { id: "peach", commonName: "Peach", scientificName: "Prunus persica", family: "Rosaceae", category: "fruit" }
+   { id: "potato", commonName: "Potato", scientificName: "Solanum tuberosum", family: "Solanaceae", category: "vegetable" }
+   { id: "blueberry", commonName: "Blueberry", scientificName: "Vaccinium corymbosum", family: "Ericaceae", category: "fruit" }
+   { id: "raspberry", commonName: "Raspberry", scientificName: "Rubus idaeus", family: "Rosaceae", category: "fruit" }
+   { id: "soybean", commonName: "Soybean", scientificName: "Glycine max", family: "Fabaceae", category: "vegetable" }
+   ```
+
+   - Add `imageUrl` for each (use Wikipedia pageimages, same pattern as `fill-plant-images.ts`)
+   - Add `careSummary` for each
+
+2. **Add 19 new diseases to `src/data/diseases.json`** — each with full structured data:
+   - Use the template-based approach from `scripts/disease-templates.ts` where possible
+   - Source disease details from:
+     - UW-Madison PDDC factsheets (pddc.wisc.edu)
+     - Cornell Plant Clinic (plantclinic.cornell.edu)
+     - University extension publications
+   - Each disease must have: `id`, `plantId`, `name`, `scientificName`, `causalAgentType`, `description`, `symptoms` (≥3), `causes` (≥2), `treatment` (≥3), `prevention` (≥2), `lookalikeDiseaseIds`, `severity`, `prevalence`
+   - New disease entries needed:
+     - apple-scab, apple-black-rot, apple-cedar-apple-rust (plant: apple)
+     - cherry-powdery-mildew (plant: cherry)
+     - corn-gray-leaf-spot, corn-common-rust, corn-northern-leaf-blight (plant: corn)
+     - grape-black-rot, grape-esca, grape-leaf-blight (plant: grape)
+     - orange-citrus-greening (plant: orange)
+     - peach-bacterial-spot (plant: peach)
+     - potato-early-blight, potato-late-blight (plant: potato)
+     - tomato-leaf-mold, tomato-spider-mites, tomato-target-spot, tomato-yellow-leaf-curl-virus, tomato-mosaic-virus (plant: tomato)
+   - Use programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as `scripts/generate-full-kb.ts`
+
+3. **Update lookalikeDiseaseIds** — cross-reference within new diseases:
+   - Apple scab ↔ Apple black rot (both cause leaf spots on apple)
+   - Potato early blight ↔ Potato late blight (both affect potato foliage)
+   - Grape black rot ↔ Grape esca (both cause fruit rot)
+   - Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
+   - Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
+
+4. **Rewrite `src/lib/ml/labels.ts`** to use the PlantVillage mapping:
+
+   ```typescript
+   import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
+
+   // Total output classes from model
+   export const NUM_CLASSES = 38;
+
+   // Index 0–37 → disease lookup
+   export function getDiseaseIdForIndex(index: number): string {
+     const entry = PLANTVILLAGE_CLASSES[index];
+     if (!entry || entry.isHealthy || entry.diseaseId === null) return "healthy"; // null check narrows diseaseId to string
+     return entry.diseaseId;
+   }
+
+   export function getPlantIdForIndex(index: number): string {
+     return PLANTVILLAGE_CLASSES[index]?.plantId ?? "unknown";
+   }
+
+   export function isHealthyClass(index: number): boolean {
+     return PLANTVILLAGE_CLASSES[index]?.isHealthy ?? false;
+   }
+
+   // Disease ID → index (for reverse lookup)
+   export function getIndexForDiseaseId(diseaseId: string): number {
+     const entry = PLANTVILLAGE_CLASSES.find((c) => c.diseaseId === diseaseId.toLowerCase());
+     return entry?.index ?? -1;
+   }
+   ```
+
+5. **Remove old assumptions** — the old labels.ts assumed 95 classes (93 diseases + healthy + unknown). Delete all references to `diseases.json` index ordering from labels.ts. The mapping is now defined by `plantvillage-classes.ts`, not by JSON file order.
+
+6. **Create DB migration script** `scripts/seed-plantvillage-kb.ts`:
+   - Read updated `src/data/plants.json` and `src/data/diseases.json`
+   - Insert new plants and diseases into Turso DB using Drizzle ORM
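+   - Use an upsert (Drizzle's `onConflictDoUpdate`, i.e. SQLite `INSERT ... ON CONFLICT DO UPDATE`) so reruns are idempotent; note that `INSERT OR REPLACE` deletes and reinserts the conflicting row instead of updating it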
+   - Log what was inserted/updated
+
+7. **Run the migration** to populate the DB with new data.
+
+tests:
+
+- Unit: `labels.test.ts` validates all 38 indices map correctly
+- Unit: `getDiseaseIdForIndex(29)` returns `"early-blight"`
+- Unit: `getDiseaseIdForIndex(3)` returns `"healthy"` (Apple healthy class)
+- Unit: `getIndexForDiseaseId("early-blight")` returns `29`
+- Unit: `isHealthyClass(37)` returns `true` (Tomato healthy)
+- Unit: `isHealthyClass(29)` returns `false` (Tomato Early_blight)
+- Unit: `getPlantIdForIndex(0)` returns `"apple"`
+- Unit: All 26 non-healthy diseaseIds resolve to real DB entries via `getDiseaseById()`
+- Integration: `scripts/seed-plantvillage-kb.ts` runs without errors, inserts all 10 plants and 19 diseases
+- Integration: After seeding, DB query for each new disease returns a complete record
+
+acceptance_criteria:
+
+- `PLANTVILLAGE_CLASSES` (defined in `plantvillage-classes.ts` and imported by labels.ts) has exactly 38 entries matching model output order
+- 12 healthy indices correctly return "healthy" from `getDiseaseIdForIndex()`
+- 26 disease indices correctly return valid diseaseIds
+- All 10 new plants exist in `src/data/plants.json` with valid metadata and imageUrl
+- All 19 new diseases exist in `src/data/diseases.json` with full structured data (symptoms, treatment, prevention, etc.)
+- DB migration script runs successfully, all new data queryable from Turso
+- Old `diseases.json` ordering assumption is completely removed from labels.ts
+- All existing tests still pass (no regressions in browse, search, detail pages)
+
+validation:
+
+- `npx vitest run src/lib/ml/labels.test.ts`
+- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`
+- `npx tsx scripts/seed-plantvillage-kb.ts` — verify output shows correct inserts
+- `npx vitest run` — full test suite passes
+- Manual: query DB for each new plant/disease and verify complete data
+
+notes:
+
+- Disease data must come from authoritative sources (university extension services), not hand-written
+- Use the same template-based generation approach from `scripts/generate-full-kb.ts` for consistency
+- The `pepper-bacterial-wilt` disease already exists. Map `Pepper,_bell___Bacterial_spot` to it even though it's not a perfect match (it's the closest available)
+- Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage — add plant entries but no disease entries for these (they don't need new disease IDs since they always map to "healthy")
+- Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases
--- a/tasks/production-ml-pipeline/03-model-loading-verification.md
+++ b/tasks/production-ml-pipeline/03-model-loading-verification.md
@@ -0,0 +1,170 @@
+# 03. TensorFlow.js Model Loading Verification and Fixes
+
+meta:
+id: production-ml-pipeline-03
+feature: production-ml-pipeline
+priority: P0
+depends_on: []
+tags: [implementation, model, tests-required]
+
+objective:
+
+- Verify the converted TF.js GraphModel loads successfully on the Node.js server
+- Fix input tensor format handling (NCHW pipeline input → NHWC model input)
+- Determine whether model output is logits or pre-computed softmax probabilities
+- Ensure inference produces valid [1, 38] output without errors
+- Install `@tensorflow/tfjs-node` for server-side native acceleration
+
+deliverables:
+
+- `src/lib/ml/model-loader.ts` — fixed and verified for real model loading
+- `src/lib/ml/model-loader.test.ts` — updated integration tests
+- `package.json` — `@tensorflow/tfjs-node` added as dependency (if needed)
+- `src/lib/ml/inference.ts` — fixed output interpretation (logits vs probabilities)
+- `src/lib/ml/inference.test.ts` — updated for real model inference
+
+steps:
+
+1. **Determine output interpretation** — inspect the graph topology to resolve whether `Identity:0` is pre-softmax logits or post-softmax probabilities:
+   - The model graph contains a `Softmax` node at `StatefulPartitionedCall/mnv2_pv_original_1/dense_1/Softmax`
+   - The output `Identity:0` may be after Softmax (probabilities) or before (logits)
+   - Test: run inference on a zero tensor — if output sums to ~1.0, it's already probabilities; if output has negative values or doesn't sum to 1.0, it's logits
+   - Fix: if output is already probabilities, remove the `softmaxFloat32()` call in `inference.ts` and use the raw output directly
+
+2. **Fix input tensor format** — the model expects NHWC `[1, 160, 160, 3]` but our pipeline produces NCHW `[3, 160, 160]`:
+
+   ```typescript
+   // Current code in model-loader.ts tryLoadTFJS():
+   const inputTensor = tf
+     .tensor4d(Array.from(tensor), [3, 160, 160])
+     .transpose([1, 2, 0]) // [160, 160, 3]
+     .expandDims(0); // [1, 160, 160, 3] NHWC
+   ```
+
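+   - Verify this transpose is correct (CHW → HWC, then batched to NHWC). Note that `tf.tensor4d` requires a rank-4 shape, so the rank-3 shape `[3, 160, 160]` needs `tf.tensor3d` instead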
+   - Verify the tensor values are in the expected range (ImageNet-normalized: roughly -2.1 to +2.6 for the mean/std used in `preprocessImageBuffer()`)
+   - Alternative: reshape directly as `[1, 160, 160, 3]` if the identify endpoint produces NHWC data
+
+3. **Install `@tensorflow/tfjs-node`** for server-side native acceleration:
+
+   ```bash
+   npm install @tensorflow/tfjs-node
+   ```
+
+   - Browser tfjs works on server but is significantly slower (no native BLAS)
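+   - `@tensorflow/tfjs-node` binds to the native TensorFlow C library, which is typically much faster on CPU; measure the actual speedup for this model rather than assuming a figure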
+   - Verify native bindings install correctly (may need `@tensorflow/tfjs-node-gpu` for GPU, but CPU is fine for this use case)
+   - Fallback chain remains: tfjs-node → tfjs (browser) → mock
+
+4. **Verify model loads from filesystem**:
+
+   ```typescript
+   const model = await tf.loadGraphModel(`file://${MODEL_JSON_PATH}`);
+   console.log("Model loaded:", model.inputs, model.outputs);
+   // Expected:
+   // inputs: [{ shape: [-1, 160, 160, 3], dtype: 'float32' }]
+   // outputs: [{ shape: [-1, 38], dtype: 'float32' }]
+   ```
+
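+   - Verify `model.inputs[0].shape` matches `[-1, 160, 160, 3]` (the batch dimension is dynamic; a GraphModel is expected to report it as `-1` rather than `null`, consistent with the tests below)
+   - Verify `model.outputs[0].shape` matches `[-1, 38]`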
+   - Verify inference through `predict()` works (GraphModel exposes both `predict()` and `execute()`; `predict()` is sufficient for this single-input, single-output model)
+
+5. **Run inference smoke test**:
+
+   ```typescript
+   // Create a test tensor (random normalized values)
+   const testTensor = new Float32Array(3 * 160 * 160);
+   for (let i = 0; i < testTensor.length; i++) {
+     testTensor[i] = (Math.random() - 0.5) * 2;
+   }
+   // Reshape to NHWC for TF.js
+   const input = tf.tensor4d(
+     Array.from(testTensor),
+     [1, 160, 160, 3], // NHWC
+   );
+   const output = model.predict(input) as tf.Tensor; // predict() is typed as Tensor | Tensor[] | NamedTensorMap
+   const data = await output.data();
+   console.log("Output shape:", output.shape);
+   console.log(
+     "Output sum:",
+     data.reduce((a, b) => a + b, 0),
+   );
+   console.log("Output max:", Math.max(...data));
+   console.log("Output min:", Math.min(...data));
+   ```
+
+   - Output should be [1, 38] with 38 float values
+   - If values are probabilities: sum ≈ 1.0, all values ≥ 0
+   - If values are logits: sum ≠ 1.0, may have negative values
+
+6. **Fix `model-loader.ts` `getStatus()` to report real class count**:
+
+   ```typescript
+   getStatus(): ModelStatus {
+     return {
+       loaded: true,
+       backend: "tfjs",
+       modelId: MODEL_ID,
+       numClasses: 38,  // PlantVillage, not 95
+     };
+   }
+   ```
+
+7. **Add memory management** — dispose tensors after use to prevent memory leaks:
+
+   ```typescript
+   // In predict():
+   const probs = tf.tidy(() => {
+     const input = tf.tensor4d(...);
+     const output = model.predict(input) as tf.Tensor;
+     return output.dataSync(); // the typed array stays valid after tidy disposes the tensors
+   });
+   ```
+
+   - Or manually dispose: `inputTensor.dispose()`, `outputTensor.dispose()`
+   - Use `tf.memory()` to monitor tensor count during development
+
+8. **Handle model load failures gracefully**:
+   - If model files are corrupted, log the specific error
+   - If tfjs-node native bindings fail, fall back to browser tfjs with a warning
+   - Never crash the server on model load failure — fall back to mock mode with clear logging
+
+tests:
+
+- Integration: model loads from `public/models/plant-disease-classifier/model.json` without errors
+- Integration: `model.inputs[0].shape` is `[-1, 160, 160, 3]`
+- Integration: `model.outputs[0].shape` is `[-1, 38]`
+- Integration: inference on random tensor produces `[1, 38]` float output
+- Integration: if output is probabilities, sum is within 0.99–1.01
+- Integration: `getStatus()` returns `{ loaded: true, backend: "tfjs", numClasses: 38 }`
+- Unit: `validateInput()` correctly rejects tensors with wrong length
+- Unit: NCHW → NHWC transpose produces correct layout
+- Performance: inference completes in < 500ms on a typical server (with tfjs-node)
+
+acceptance_criteria:
+
+- `getModel()` returns a model with `loaded: true` and `backend: "tfjs"`
+- `model.predict()` on a valid [1, 160, 160, 3] input returns [1, 38] output without errors
+- Output interpretation is correctly determined (logits vs probabilities) and handled
+- `@tensorflow/tfjs-node` is installed and used as primary backend
+- No memory leaks: tensor count stays stable after repeated inference calls
+- Fallback chain works: tfjs-node → tfjs → mock (each failure logs warning)
+- Model load time < 30 seconds on first request
+- Inference time < 500ms per image on server
+
+validation:
+
+- `npm install @tensorflow/tfjs-node` — native bindings install successfully
+- `npx vitest run src/lib/ml/model-loader.test.ts` — all loading tests pass
+- `npx vitest run src/lib/ml/inference.test.ts` — all inference tests pass
+- Manual: `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<existing-id>"}'` — returns real predictions (no `demo_mode: true`)
+- Check server logs for `[model-loader] Loaded TF.js model` (not mock fallback)
+
+notes:
+
+- The model file `best_mnv2_pv_original.keras` is the original Keras file — the TF.js conversion is already done (model.json + 3 weight shards)
+- The `.keras` file can be deleted after confirming TF.js works, saving ~27MB
+- `@tensorflow/tfjs-node` requires libtensorflow — it downloads automatically during npm install
+- The `file://` protocol for `loadGraphModel` works with `@tensorflow/tfjs-node` but may not work with browser tfjs (which uses fetch). If using the browser tfjs fallback, read the model files from disk and pass a custom `tf.io.IOHandler` to `tf.loadGraphModel`
+- ImageNet normalization in `preprocessImageBuffer()` uses mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. Verify this matches what the PlantVillage model expects: Keras's `mobilenet_v2.preprocess_input` scales pixels to [-1, 1] instead, so a Keras-trained MobileNetV2 may not use mean/std normalization
--- a/tasks/production-ml-pipeline/04-confidence-calibration.md
+++ b/tasks/production-ml-pipeline/04-confidence-calibration.md
@@ -0,0 +1,207 @@
+# 04. Confidence Calibration for PlantVillage Model
+
+meta:
+id: production-ml-pipeline-04
+feature: production-ml-pipeline
+priority: P1
+depends_on: [production-ml-pipeline-03]
+tags: [implementation, ml, tests-required]
+
+objective:
+
+- Implement proper confidence calibration for the PlantVillage model's softmax output
+- Replace the trivial `raw * 1.02` linear calibration with temperature scaling or entropy-based confidence
+- Produce meaningful confidence labels (high/medium/low) that correlate with actual correctness
+- Handle the "healthy" class output correctly (healthy predictions need different confidence interpretation)
+
+deliverables:
+
+- `src/lib/ml/confidence.ts` — rewritten calibration with temperature scaling
+- `src/lib/ml/calibration-params.ts` — calibration parameters (temperature, bias) for PlantVillage model
+- `src/lib/ml/confidence.test.ts` — updated tests for new calibration logic
+- `scripts/calibrate-model.ts` — script to compute optimal temperature from validation data
+
+steps:
+
+1. **Determine output type** — based on task 03's findings:
+   - If model output is already softmax probabilities: use entropy-based confidence or inverse-softmax + temperature scaling
+   - If model output is logits: apply temperature-scaled softmax directly
+
+2. **Implement temperature scaling**:
+
+   ```typescript
+   // src/lib/ml/confidence.ts
+   const DEFAULT_TEMPERATURE = 1.5; // Default for PlantVillage (typically 1.0–3.0)
+
+   export function temperatureScaledSoftmax(
+     logits: Float32Array,
+     temperature: number = DEFAULT_TEMPERATURE,
+   ): Float32Array {
+     const scaled = new Float32Array(logits.length);
+     for (let i = 0; i < logits.length; i++) {
+       scaled[i] = logits[i] / temperature;
+     }
+     return softmaxFloat32(scaled);
+   }
+   ```
+
+   - Temperature > 1.0 softens the distribution (less confident, more uniform)
+   - Temperature < 1.0 sharpens the distribution (more confident)
+   - Temperature = 1.0 is standard softmax (no calibration)
+   - The right temperature depends on the specific training run; treat 1.5 as a placeholder until it is fit on validation data (step 8)
+
+3. **Implement entropy-based confidence**:
+
+   ```typescript
+   export function computeEntropy(probabilities: Float32Array): number {
+     let entropy = 0;
+     for (let i = 0; i < probabilities.length; i++) {
+       if (probabilities[i] > 1e-10) {
+         entropy -= probabilities[i] * Math.log(probabilities[i]);
+       }
+     }
+     return entropy;
+   }
+
+   export function entropyToConfidence(
+     entropy: number,
+     maxEntropy: number, // ln(numClasses)
+   ): number {
+     // Normalize entropy to [0, 1], then invert (low entropy = high confidence)
+     const normalized = entropy / maxEntropy;
+     return 1 - normalized;
+   }
+   ```
+
+   - For 38 classes: `maxEntropy = Math.log(38) ≈ 3.64`
+   - Entropy close to 0 → one class dominates → high confidence
+   - Entropy close to max → uniform distribution → low confidence
+
+4. **Implement combined calibration**:
+
+   ```typescript
+   export function calibratePrediction(
+     output: Float32Array,
+     isLogits: boolean,
+     temperature: number = DEFAULT_TEMPERATURE,
+   ): ConfidenceResult {
+     // Get probabilities (apply softmax if logits, or use directly if already probabilities)
+     const probs = isLogits ? temperatureScaledSoftmax(output, temperature) : output;
+
+     // Get top prediction
+     let maxIdx = 0;
+     for (let i = 1; i < probs.length; i++) {
+       if (probs[i] > probs[maxIdx]) maxIdx = i;
+     }
+     const topProb = probs[maxIdx];
+
+     // Compute entropy-based confidence
+     const entropy = computeEntropy(probs);
+     const maxEntropy = Math.log(probs.length);
+     const entropyConfidence = entropyToConfidence(entropy, maxEntropy);
+
+     // Combine: weighted average of top probability and entropy confidence
+     const adjusted = 0.7 * topProb + 0.3 * entropyConfidence;
+
+     return {
+       raw: topProb,
+       adjusted: Math.min(1, Math.max(0, adjusted)),
+       label: getConfidenceLabel(adjusted),
+       entropy,
+       classIndex: maxIdx,
+     };
+   }
+   ```
+
+5. **Update `getConfidenceLabel` thresholds** for PlantVillage's 38-class output:
+
+   ```typescript
+   const CONFIDENCE_THRESHOLDS = {
+     HIGH: 0.65, // Lowered from 0.8 — PlantVillage softmax is less peaked
+     MEDIUM: 0.35, // Lowered from 0.5
+   } as const;
+   ```
+
+   - With 38 classes, even correct predictions may have lower top probability
+   - These thresholds should be tuned against a validation set (start with defaults, adjust after testing)
+
+6. **Handle healthy class confidence**:
+   - When the top prediction is a healthy class (index 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37), the confidence represents "how confident the model is the plant is healthy"
+   - Healthy predictions with high confidence → "No disease detected" (good)
+   - Healthy predictions with low confidence → "Uncertain — may have early symptoms"
+   - Update `calibrateConfidence()` to accept an `isHealthy` flag and adjust the label accordingly
+
+7. **Create calibration parameter module**:
+
+   ```typescript
+   // src/lib/ml/calibration-params.ts
+   export const PLANTVILLAGE_CALIBRATION = {
+     temperature: 1.5,
+     confidenceHigh: 0.65,
+     confidenceMedium: 0.35,
+     maxEntropy: Math.log(38),
+     entropyWeight: 0.3,
+     probabilityWeight: 0.7,
+   } as const;
+   ```
+
+8. **Create calibration script** `scripts/calibrate-model.ts`:
+   - Load the model
+   - Run inference on a set of labeled validation images (from PlantVillage validation split)
+   - Compute optimal temperature using Nelder-Mead or grid search on negative log-likelihood
+   - Output the optimal temperature value
+   - This is optional — start with default 1.5 and refine later
+
+9. **Update `InferenceResult` type** to include calibration metadata:
+   ```typescript
+   export interface InferenceResult {
+     predictions: RawPrediction[];
+     inferenceTimeMs: number;
+     calibration?: {
+       temperature: number;
+       entropy: number;
+       entropyConfidence: number;
+     };
+   }
+   ```
+
+tests:
+
+- Unit: `temperatureScaledSoftmax` with T=1.0 equals standard softmax
+- Unit: `temperatureScaledSoftmax` with T=2.0 produces more uniform distribution than T=1.0
+- Unit: `computeEntropy` of uniform distribution = `Math.log(38)` ≈ 3.64
+- Unit: `computeEntropy` of one-hot distribution = 0
+- Unit: `entropyToConfidence(0, maxEntropy)` = 1.0 (maximum confidence)
+- Unit: `entropyToConfidence(maxEntropy, maxEntropy)` = 0.0 (minimum confidence)
+- Unit: `calibratePrediction` with high-peak input returns high confidence
+- Unit: `calibratePrediction` with flat input returns low confidence
+- Unit: `getConfidenceLabel(0.7)` returns "high"
+- Unit: `getConfidenceLabel(0.4)` returns "medium"
+- Unit: `getConfidenceLabel(0.2)` returns "low"
+- Integration: calibration on known PlantVillage test image produces reasonable confidence
+
+acceptance_criteria:
+
+- `calibratePrediction()` produces meaningful confidence scores that correlate with prediction quality
+- Temperature scaling is implemented and configurable (default T=1.5)
+- Entropy-based confidence is implemented
+- Combined calibration (weighted probability + entropy) is the default
+- Healthy class predictions are handled correctly
+- Confidence thresholds are tuned for 38-class output (HIGH ≥ 0.65, MEDIUM ≥ 0.35)
+- All unit tests pass
+- Calibration parameters are documented and configurable
+
+validation:
+
+- `npx vitest run src/lib/ml/confidence.test.ts`
+- Manual: run identification on a known disease image → confidence should be "high" (> 0.65)
+- Manual: run identification on a random/unrelated image → confidence should be "low" (< 0.35)
+- Check server logs: entropy values should fall between 0 and ln(38) ≈ 3.64, near 0 for confident predictions and near the maximum for near-uniform outputs
+
+notes:
+
+- Temperature scaling is a post-hoc calibration method — it doesn't change the model, only the confidence interpretation
+- The default temperature of 1.5 is a reasonable starting point for MobileNetV2 on PlantVillage. Optimal value depends on the specific training run.
+- If a validation set of PlantVillage images is available, run `scripts/calibrate-model.ts` to find the optimal temperature
+- The entropy-based approach works even without a validation set — it's a model-agnostic confidence measure
+- For healthy predictions, consider showing a different UI (e.g., "No disease detected" with confidence) rather than treating them as disease predictions
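+
+The unit tests above pin down the expected behavior of the calibration helpers. A minimal, self-contained sketch of those helpers follows. The signatures are assumptions inferred from the test names; the project's real implementations may take different parameters or combine the scores differently.
+
+```typescript
+// Sketch only: signatures are inferred from the test list, not copied from the real module.
+function temperatureScaledSoftmax(logits: Float32Array, temperature = 1.5): Float32Array {
+  // Subtract the max logit before exponentiating for numerical stability.
+  let max = -Infinity;
+  for (let i = 0; i < logits.length; i++) max = Math.max(max, logits[i]);
+
+  const out = new Float32Array(logits.length);
+  let sum = 0;
+  for (let i = 0; i < logits.length; i++) {
+    out[i] = Math.exp((logits[i] - max) / temperature);
+    sum += out[i];
+  }
+  for (let i = 0; i < out.length; i++) out[i] /= sum;
+  return out;
+}
+
+// Shannon entropy in nats: 0 for a one-hot distribution, Math.log(n) for a uniform one.
+function computeEntropy(probs: Float32Array): number {
+  let entropy = 0;
+  for (let i = 0; i < probs.length; i++) {
+    if (probs[i] > 0) entropy -= probs[i] * Math.log(probs[i]);
+  }
+  return entropy;
+}
+
+// Map entropy onto [0, 1]: 1 means fully peaked, 0 means uniform.
+function entropyToConfidence(entropy: number, maxEntropy: number): number {
+  return Math.min(1, Math.max(0, 1 - entropy / maxEntropy));
+}
+```
+
+Dividing the logits by T > 1 before the softmax flattens the distribution, which is why a higher temperature lowers the reported confidence without changing which class ranks first.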
--- a/tasks/production-ml-pipeline/05-pipeline-integration.md
+++ b/tasks/production-ml-pipeline/05-pipeline-integration.md
@@ -0,0 +1,279 @@
+# 05. Real Model Integration into Identification Pipeline
+
+meta:
+id: production-ml-pipeline-05
+feature: production-ml-pipeline
+priority: P0
+depends_on: [production-ml-pipeline-02, production-ml-pipeline-03, production-ml-pipeline-04]
+tags: [implementation, integration, tests-required]
+
+objective:
+
+- Wire the real TF.js model into the `/api/identify` endpoint
+- Replace demo/mock predictions with real model inference
+- Use the PlantVillage label mapping (task 02) to resolve class indices to disease IDs
+- Apply confidence calibration (task 04) to produce meaningful confidence scores
+- Remove the `demo_mode` fallback path
+- Handle healthy class predictions correctly (return "no disease detected" message)
+
+deliverables:
+
+- `src/app/api/identify/route.ts` — rewritten to use real model inference
+- `src/lib/ml/inference.ts` — updated to use calibration and return structured results
+- `src/lib/api/identify.ts` — client-side API updated for new response shape
+- `src/components/ResultsDashboard.tsx` — handle healthy predictions and remove demo mode badge
+- `src/components/HealthyResult.tsx` — new component for "no disease detected" state
+
+steps:
+
+1. **Rewrite `/api/identify` route handler** to use real inference:
+
+   ```typescript
+   export async function POST(request: NextRequest) {
+     // 1. Parse request, validate imageId
+     // 2. Load and preprocess image (existing code)
+     // 3. Run inference with the real model. `runInference()` (step 2) applies
+     //    temperature scaling and returns the top-K classes (assumed sorted by probability).
+     const { predictions: topK, inferenceTimeMs } = await runInference(tensor, 5);
+     const top = topK[0];
+
+     // 4. Calibrate confidence for the top-1 class
+     const calibrated = calibrateConfidence(top.probability);
+
+     // 5. Check whether the top class is a PlantVillage "healthy" class
+     const isHealthy = isHealthyClass(top.classIndex);
+
+     // 6. If healthy, return healthy result
+     if (isHealthy && calibrated.adjusted > 0.5) {
+       return NextResponse.json({
+         healthy: true,
+         plantId: getPlantIdForIndex(top.classIndex),
+         confidence: calibrated,
+         metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
+       });
+     }
+
+     // 7. Enrich the top-K predictions (not just top-1) with KB data
+     const predictions = await enrichPredictions(topK);
+
+     // 8. Return results
+     return NextResponse.json({
+       predictions,
+       metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
+       demo_mode: false, // or remove this field entirely
+     });
+   }
+   ```
+
+2. **Update `runInference()` to return calibrated results**:
+
+   ```typescript
+   export async function runInference(
+     imageTensor: Float32Array,
+     topK: number = 5,
+   ): Promise<InferenceResult> {
+     const model = await getModel();
+     const modelStatus = model.getStatus();
+
+     if (!modelStatus.loaded) {
+       throw new Error("Model not loaded. Cannot run inference.");
+     }
+
+     const { output, inferenceTimeMs } = await model.predict(imageTensor);
+
+     // Determine if output is logits or probabilities
+     const isLogits = !isProbabilities(output);
+
+     // Apply calibration
+     const calibration = calibratePrediction(output, isLogits);
+
+     // Get top-K predictions
+     const probs = isLogits ? temperatureScaledSoftmax(output) : output;
+     const topKPredictions = getTopKFloat32(probs, topK);
+
+     return {
+       predictions: topKPredictions,
+       inferenceTimeMs,
+       calibration: {
+         temperature: PLANTVILLAGE_CALIBRATION.temperature,
+         entropy: calibration.entropy,
+         entropyConfidence: calibration.entropyConfidence,
+       },
+     };
+   }
+
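+   function isProbabilities(output: Float32Array): boolean {
+     // Probabilities all lie in [0, 1] and sum to ~1. Checking the sum alone is
+     // not enough: logits can sum to 1 by coincidence while containing negatives.
+     let sum = 0;
+     for (let i = 0; i < output.length; i++) {
+       if (output[i] < 0 || output[i] > 1) return false;
+       sum += output[i];
+     }
+     return Math.abs(sum - 1.0) < 0.01;
+   }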
+   ```
+
+3. **Update `enrichPredictions()` to use new label mapping**:
+
+   ```typescript
+   async function enrichPredictions(
+     topPredictions: Array<{ classIndex: number; probability: number }>,
+   ): Promise<PredictionResult[]> {
+     const results: PredictionResult[] = [];
+
+     for (const pred of topPredictions) {
+       // Skip healthy classes in top-K (they're handled separately)
+       if (isHealthyClass(pred.classIndex)) continue;
+
+       const diseaseId = getDiseaseIdForIndex(pred.classIndex);
+       const plantId = getPlantIdForIndex(pred.classIndex);
+
+       if (!diseaseId || diseaseId === "healthy") continue;
+
+       const disease = await getDiseaseById(diseaseId);
+       if (!disease) continue;
+
+       // Use probability as raw confidence, calibrate with entropy
+       const confidence = calibrateConfidence(pred.probability);
+
+       const plant = await getPlantById(disease.plantId).catch(() => null);
+
+       results.push({
+         diseaseId,
+         disease,
+         confidence,
+         lookalikes: disease.lookalikeDiseaseIds,
+         plant: plant ?? null,
+       });
+     }
+
+     results.sort((a, b) => b.confidence.adjusted - a.confidence.adjusted);
+     return results;
+   }
+   ```
+
+4. **Update response types** to support healthy result:
+
+   ```typescript
+   // src/lib/types.ts
+   export interface IdentifyResponse {
+     predictions?: PredictionResult[];
+     healthy?: boolean;
+     plantId?: string;
+     confidence?: ConfidenceResult;
+     metadata: InferenceMetadata;
+     demo_mode?: boolean; // Remove or always false
+   }
+   ```
+
+5. **Update `ResultsDashboard` component** to handle healthy result:
+
+   ```tsx
+   // If response.healthy === true, show HealthyResult component instead of prediction cards
+   if (response?.healthy) {
+     return <HealthyResult plantId={response.plantId} confidence={response.confidence} />;
+   }
+   ```
+
+6. **Create `HealthyResult` component** `src/components/HealthyResult.tsx`:
+
+   ```tsx
+   export default function HealthyResult({ plantId, confidence }) {
+     const plant = usePlant(plantId); // fetch plant data
+     return (
+       <div className="...">
+         <div className="text-6xl">🌿</div>
+         <h2>No Disease Detected</h2>
+         <p>
+           The image appears healthy{plant ? ` (${plant.commonName})` : ""}. Confidence:{" "}
+           {Math.round(confidence.adjusted * 100)}%
+         </p>
+         <p className="text-sm text-zinc-500">
+           If symptoms persist, try uploading a clearer photo of the affected area.
+         </p>
+       </div>
+     );
+   }
+   ```
+
+7. **Remove demo mode logic**:
+   - In `model-loader.ts`: remove `createMockModel()` fallback (or keep it but only for development)
+   - In `route.ts`: remove `demo_mode: true` branch
+   - In `ResultsDashboard.tsx`: remove "Demo mode" badge
+   - In `src/lib/api/identify.ts`: remove `demo_mode` from response type
+
+8. **Add error handling for model not loaded**:
+
+   ```typescript
+   const model = await getModel();
+   if (!model.getStatus().loaded) {
+     return NextResponse.json(
+       {
+         error: "Model not available",
+         message: "ML model failed to load. Please try again later.",
+       },
+       { status: 503 },
+     );
+   }
+   ```
+
+9. **Update client-side API** `src/lib/api/identify.ts`:
+
+   ```typescript
+   export interface IdentifyResponse {
+     predictions?: PredictionResult[];
+     healthy?: boolean;
+     plantId?: string;
+     confidence?: ConfidenceResult;
+     metadata: InferenceMetadata;
+   }
+   ```
+
+10. **Add structured logging** for inference requests:
+    ```typescript
+    console.log(
+      JSON.stringify({
+        event: "inference",
+        imageId,
+        modelId: MODEL_ID,
+        inferenceTimeMs,
+        topPrediction: predictions[0]?.diseaseId,
+        confidence: predictions[0]?.confidence.adjusted,
+        entropy: calibration?.entropy,
+      }),
+    );
+    ```
+
+tests:
+
+- Integration: POST `/api/identify` with valid imageId returns real predictions (no `demo_mode: true`)
+- Integration: response includes `predictions` array with valid diseaseIds from KB
+- Integration: confidence scores are calibrated (not raw softmax)
+- Integration: healthy predictions return `healthy: true` with plantId
+- Unit: `enrichPredictions()` skips healthy classes in top-K
+- Unit: `isProbabilities()` correctly identifies probability output
+- Unit: `runInference()` throws error if model not loaded
+- E2E: upload a tomato leaf image → get tomato disease predictions
+- E2E: upload a healthy plant image → get healthy result
+
+acceptance_criteria:
+
+- `/api/identify` returns real model predictions (not mock)
+- All diseaseIds in response are valid KB entries (verifiable via `getDiseaseById()`)
+- Confidence scores use temperature-scaled calibration (not raw softmax)
+- Healthy predictions return `{ healthy: true, plantId, confidence }` instead of disease predictions
+- Demo mode is completely removed from production path
+- Error handling: model not loaded → 503 response with clear message
+- Structured logging for every inference request
+- Client-side API handles new response shape (healthy vs predictions)
+
+validation:
+
+- `npx vitest run src/app/api/identify/identify.test.ts`
+- `npx vitest run src/lib/ml/inference.test.ts`
+- `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<test-id>"}'` — response has real predictions
+- Upload a test image via UI → see real disease names (not demo mode)
+- Check server logs: `event: "inference"` with valid modelId and inferenceTimeMs
+
+notes:
+
+- This task depends on tasks 02, 03, and 04 being complete. Do not start until all dependencies are met.
+- The `enrichPredictions()` function now skips healthy classes — they're handled by the healthy result path
+- If the model is not loaded, return 503 (Service Unavailable) instead of falling back to mock
+- Structured logging should be JSON for easy parsing by log aggregators
+- The `demo_mode` field can be removed entirely or kept as `false` for backwards compatibility
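+
+Because the response now has two shapes, client code benefits from narrowing it once, up front. The sketch below shows one way to do that. `IdentifyOutcome` and `classifyResponse` are hypothetical names, and the interfaces are simplified stand-ins for the project's real types.
+
+```typescript
+// Simplified stand-ins for the project's types; only the fields used here are modeled.
+interface ConfidenceResult {
+  adjusted: number;
+  label: "high" | "medium" | "low";
+}
+interface PredictionResult {
+  diseaseId: string;
+  confidence: ConfidenceResult;
+}
+interface IdentifyResponse {
+  predictions?: PredictionResult[];
+  healthy?: boolean;
+  plantId?: string;
+  confidence?: ConfidenceResult;
+}
+
+// Hypothetical discriminated union so UI code can switch on `kind`.
+type IdentifyOutcome =
+  | { kind: "healthy"; plantId: string | null; confidence: ConfidenceResult | null }
+  | { kind: "diseases"; predictions: PredictionResult[] }
+  | { kind: "empty" };
+
+function classifyResponse(response: IdentifyResponse): IdentifyOutcome {
+  if (response.healthy === true) {
+    return {
+      kind: "healthy",
+      plantId: response.plantId ?? null,
+      confidence: response.confidence ?? null,
+    };
+  }
+  if (response.predictions && response.predictions.length > 0) {
+    return { kind: "diseases", predictions: response.predictions };
+  }
+  // Possible when every top-K class was a healthy class that enrichPredictions()
+  // skipped, but the healthy confidence did not exceed 0.5.
+  return { kind: "empty" };
+}
+```
+
+The `empty` branch is worth handling explicitly in `ResultsDashboard`, since a low-confidence healthy prediction can otherwise yield an empty predictions array with no message for the user.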
--- a/tasks/production-ml-pipeline/06-plant-context-identification.md
+++ b/tasks/production-ml-pipeline/06-plant-context-identification.md
@@ -0,0 +1,284 @@
+# 06. Plant-Context-Aware Identification
+
+meta:
+id: production-ml-pipeline-06
+feature: production-ml-pipeline
+priority: P2
+depends_on: [production-ml-pipeline-05]
+tags: [implementation, ux, tests-required]
+
+objective:
+
+- Allow users to optionally specify which plant they're diagnosing before identification
+- Boost predictions for the selected plant's diseases (multiply confidence by plant-context factor)
+- Update the upload flow to include optional plant selection
+- Improve prediction accuracy when plant context is known
+
+deliverables:
+
+- `src/app/api/identify/route.ts` — accept optional `plantId` parameter
+- `src/lib/ml/plant-context.ts` — new module for plant-context scoring adjustment
+- `src/components/PlantSelector.tsx` — new component for optional plant selection
+- `src/app/upload/page.tsx` — integrate PlantSelector before upload
+- `src/lib/api/identify.ts` — client API updated to pass plantId
+
+steps:
+
+1. **Create plant-context scoring module** `src/lib/ml/plant-context.ts`:
+
+   ```typescript
+   import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
+
+   /**
+    * Adjust prediction scores based on plant context.
+    * If plantId is provided, boost predictions for diseases of that plant.
+    *
+    * @param predictions - Top-K predictions with classIndex and probability
+    * @param plantId - Optional plant ID from user selection
+    * @param boostFactor - Multiplier for matching plant diseases (default 1.5)
+    * @returns Adjusted predictions with updated probabilities
+    */
+   export function applyPlantContext(
+     predictions: Array<{ classIndex: number; probability: number }>,
+     plantId: string | null,
+     boostFactor: number = 1.5,
+   ): Array<{ classIndex: number; probability: number; contextBoosted: boolean }> {
+     if (!plantId) {
+       return predictions.map((p) => ({ ...p, contextBoosted: false }));
+     }
+
+     // Find which class indices belong to this plant
+     const plantIndices = new Set(
+       PLANTVILLAGE_CLASSES.filter((c) => c.plantId === plantId && !c.isHealthy).map(
+         (c) => c.index,
+       ),
+     );
+
+     return predictions.map((pred) => {
+       const matchesPlant = plantIndices.has(pred.classIndex);
+       return {
+         classIndex: pred.classIndex,
+         probability: matchesPlant
+           ? Math.min(1.0, pred.probability * boostFactor)
+           : pred.probability,
+         contextBoosted: matchesPlant,
+       };
+     });
+   }
+   ```
+
+2. **Update `/api/identify` route** to accept `plantId`:
+
+   ```typescript
+   export async function POST(request: NextRequest) {
+     const body = await request.json();
+     const { imageId, plantId } = body; // plantId is optional
+
+     // ... existing preprocessing ...
+
+     // runInference() already returns the top-K classes (see task 05, step 2)
+     const { predictions: topK, inferenceTimeMs } = await runInference(tensor, 5);
+
+     // Apply plant context if provided
+     const adjusted = applyPlantContext(topK, plantId ?? null);
+
+     // Enrich with KB data. enrichPredictions() must copy `contextBoosted`
+     // onto each PredictionResult so the UI can show the indicator.
+     const predictions = await enrichPredictions(adjusted);
+
+     return NextResponse.json({
+       predictions,
+       metadata: { model: MODEL_ID, inferenceTimeMs, imageId, plantContext: plantId ?? null },
+     });
+   }
+   ```
+
+3. **Update `IdentifyRequest` type**:
+
+   ```typescript
+   // src/lib/types.ts
+   export interface IdentifyRequest {
+     imageId: string;
+     plantId?: string; // Optional plant context
+   }
+   ```
+
+4. **Create `PlantSelector` component** `src/components/PlantSelector.tsx`:
+
+   ```tsx
+   "use client";
+
+   import { useState, useEffect } from "react";
+
+   interface Plant {
+     id: string;
+     commonName: string;
+     imageUrl?: string;
+   }
+
+   export default function PlantSelector({
+     value,
+     onChange,
+   }: {
+     value: string | null;
+     onChange: (plantId: string | null) => void;
+   }) {
+     const [plants, setPlants] = useState<Plant[]>([]);
+     const [search, setSearch] = useState("");
+
+     useEffect(() => {
+       fetch("/api/plants?limit=50")
+         .then((r) => r.json())
+         .then((data) => setPlants(data.items ?? []));
+     }, []);
+
+     const filtered = plants.filter((p) =>
+       p.commonName.toLowerCase().includes(search.toLowerCase()),
+     );
+
+     return (
+       <div className="...">
+         <label>Plant (optional)</label>
+         <input
+           type="text"
+           placeholder="Search plants..."
+           value={search}
+           onChange={(e) => setSearch(e.target.value)}
+         />
+         {value && (
+           <div className="...">
+             Selected: {plants.find((p) => p.id === value)?.commonName}
+             <button onClick={() => onChange(null)}>Clear</button>
+           </div>
+         )}
+         <ul>
+           {filtered.slice(0, 10).map((plant) => (
+             <li key={plant.id} onClick={() => onChange(plant.id)}>
+               {plant.commonName}
+             </li>
+           ))}
+         </ul>
+       </div>
+     );
+   }
+   ```
+
+5. **Update upload page** to include plant selector:
+
+   ```tsx
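+   // src/app/upload/page.tsx
+   "use client";
+
+   import { useCallback, useState } from "react";
+   import { useRouter } from "next/navigation";
+
+   export default function UploadPage() {
+     const router = useRouter();
+     const [selectedPlant, setSelectedPlant] = useState<string | null>(null);
+
+     const handleUpload = useCallback(
+       async (file: File) => {
+         // 1. Upload image
+         const uploadResponse = await uploadImage(file);
+
+         // 2. Identify with plant context (null means "no plant selected")
+         await identifyPlant(uploadResponse.imageId, selectedPlant ?? undefined);
+
+         // 3. Navigate to results, carrying the plant context in the URL (see step 9)
+         const query = selectedPlant ? `?plant=${encodeURIComponent(selectedPlant)}` : "";
+         router.push(`/results/${uploadResponse.imageId}${query}`);
+       },
+       [selectedPlant, router],
+     );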
+
+     return (
+       <div>
+         <PlantSelector value={selectedPlant} onChange={setSelectedPlant} />
+         <ImageUpload onUpload={handleUpload} />
+       </div>
+     );
+   }
+   ```
+
+6. **Update client-side API** to pass plantId:
+
+   ```typescript
+   // src/lib/api/identify.ts
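+   export async function identifyPlant(
+     imageId: string,
+     plantId?: string | null, // null: cleared selection or missing URL param
+   ): Promise<IdentifyResponse> {
+     const body: IdentifyRequest = { imageId };
+     if (plantId) body.plantId = plantId;
+
+     const response = await fetch("/api/identify", {
+       method: "POST",
+       headers: { "Content-Type": "application/json" },
+       body: JSON.stringify(body),
+     });
+
+     // Surface 4xx/5xx (e.g. the 503 "Model not available" case) instead of
+     // returning an error payload typed as IdentifyResponse
+     if (!response.ok) {
+       throw new Error(`Identify failed: ${response.status}`);
+     }
+
+     return response.json();
+   }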
+   ```
+
+7. **Update `PredictionResult` type** to include context boost info:
+
+   ```typescript
+   export interface PredictionResult {
+     diseaseId: string;
+     disease: Disease;
+     confidence: ConfidenceResult;
+     lookalikes: string[];
+     plant: Plant | null;
+     contextBoosted?: boolean; // true if boosted by plant context
+   }
+   ```
+
+8. **Update `ResultsDashboard`** to show context boost indicator:
+
+   ```tsx
+   {prediction.contextBoosted && (
+     <span className="text-xs text-leaf-green-600">✓ Matches selected plant</span>
+   )}
+   ```
+
+9. **Store plant context in results page** — pass plantId through URL or state:
+   ```typescript
+   // src/app/results/[imageId]/page.tsx
+   const plantId = searchParams.get("plant"); // optional
+   const response = await identifyPlant(imageId, plantId);
+   ```
+
+tests:
+
+- Unit: `applyPlantContext()` with no plantId returns predictions unchanged
+- Unit: `applyPlantContext()` with plantId="tomato" boosts tomato disease predictions
+- Unit: boosted probabilities are capped at 1.0
+- Unit: non-matching plant predictions are unchanged
+- Unit: `contextBoosted` flag is set correctly
+- Integration: POST `/api/identify` with plantId returns boosted predictions
+- Integration: POST `/api/identify` without plantId returns normal predictions
+- E2E: select "Tomato" in UI → upload tomato leaf → tomato diseases appear first
+
+acceptance_criteria:
+
+- Plant context is optional — identification works without it
+- When plantId is provided, predictions for that plant's diseases are boosted by 1.5x
+- Boosted probabilities are capped at 1.0
+- `contextBoosted` flag is set on boosted predictions
+- UI shows "Matches selected plant" indicator on boosted predictions
+- Plant selector component works (search, select, clear)
+- Upload flow includes optional plant selection step
+- Results page receives and displays plant context
+
+validation:
+
+- `npx vitest run src/lib/ml/plant-context.test.ts`
+- `npx vitest run src/components/PlantSelector.test.tsx`
+- Manual: select "Tomato" → upload image → tomato diseases appear with boost indicator
+- Manual: don't select plant → upload image → normal predictions (no boost)
+- Check API response: `predictions[0].contextBoosted` is true when plant matches
+
+notes:
+
+- Plant context is a scoring heuristic, not a hard filter. It boosts confidence but doesn't exclude other predictions.
+- The default boost factor is 1.5 — this can be tuned based on user feedback.
+- Plant selector is optional — users can skip it and get unboosted predictions.
+- The plant context feature is most useful when the user knows what plant they're diagnosing but the model is uncertain between multiple diseases.
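+- For PlantVillage, each plant has 0–9 disease classes (Tomato has 9; Blueberry, Raspberry, and Soybean have only a `healthy` class), so the boost is specific enough to be useful without being overly restrictive. For plants with no disease classes, `applyPlantContext()` boosts nothing.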
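+
+A small worked example makes the "heuristic, not a hard filter" point concrete. The sketch below reimplements the boost against a three-entry class table and re-ranks afterwards. `boostAndRank` and `CLASSES` are hypothetical names for illustration, and the tomato class indices are assumed rather than taken from this task.
+
+```typescript
+// Illustrative subset of the class table; the real PLANTVILLAGE_CLASSES has 38 entries.
+// Index 21 comes from the label list in task 01; the tomato indices are assumptions.
+const CLASSES = [
+  { index: 21, plantId: "potato", isHealthy: false }, // Potato___Late_blight
+  { index: 29, plantId: "tomato", isHealthy: false }, // assumed: Tomato___Early_blight
+  { index: 30, plantId: "tomato", isHealthy: false }, // assumed: Tomato___Late_blight
+];
+
+type Scored = { classIndex: number; probability: number };
+
+function boostAndRank(predictions: Scored[], plantId: string | null, boostFactor = 1.5) {
+  const plantIndices = new Set(
+    CLASSES.filter((c) => c.plantId === plantId && !c.isHealthy).map((c) => c.index),
+  );
+  return predictions
+    .map((p) => {
+      const matches = plantId !== null && plantIndices.has(p.classIndex);
+      return {
+        ...p,
+        probability: matches ? Math.min(1.0, p.probability * boostFactor) : p.probability,
+        contextBoosted: matches,
+      };
+    })
+    .sort((a, b) => b.probability - a.probability); // re-rank after boosting
+}
+
+// Potato late blight leads before the boost; tomato late blight leads after it.
+const ranked = boostAndRank(
+  [
+    { classIndex: 21, probability: 0.5 },
+    { classIndex: 30, probability: 0.375 },
+  ],
+  "tomato",
+);
+```
+
+Here the tomato class moves from 0.375 to 0.5625 and overtakes the potato class at 0.5, which stays in the list. Boosted values no longer sum to 1, so they are better read as ranking scores than as probabilities.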
--- a/tasks/production-ml-pipeline/07-end-to-end-testing.md
+++ b/tasks/production-ml-pipeline/07-end-to-end-testing.md
@@ -0,0 +1,292 @@
+# 07. End-to-End Integration Testing
+
+meta:
+id: production-ml-pipeline-07
+feature: production-ml-pipeline
+priority: P1
+depends_on: [production-ml-pipeline-05]
+tags: [testing, integration, e2e]
+
+objective:
+
+- Create comprehensive end-to-end tests that validate the full pipeline from image upload to disease diagnosis
+- Verify real model inference produces valid, calibrated predictions
+- Test all code paths: normal flow, healthy result, error cases, plant context
+- Ensure all components work together correctly in a realistic scenario
+
+deliverables:
+
+- `tests/e2e/pipeline.test.ts` — full pipeline E2E tests
+- `tests/e2e/fixtures/` — test images and expected results
+- `tests/e2e/utils.ts` — test utilities (upload helper, identify helper)
+- Updated `vitest.config.ts` — E2E test configuration
+
+steps:
+
+1. **Create test fixtures** `tests/e2e/fixtures/`:
+   - `tomato-early-blight.jpg` — known tomato early blight image (from PlantVillage test set)
+   - `tomato-healthy.jpg` — known healthy tomato image
+   - `unknown-plant.jpg` — unrelated image (should produce low confidence)
+   - `invalid-image.txt` — non-image file (should fail validation)
+   - `expected-results.json` — expected disease IDs and confidence ranges for each test image
+
+2. **Create E2E test utilities** `tests/e2e/utils.ts`:
+
+   ```typescript
+   import fs from "fs/promises";
+   import path from "path";
+
+   export async function uploadTestImage(
+     filename: string,
+   ): Promise<{ imageId: string; previewUrl: string }> {
+     const imagePath = path.join(__dirname, "fixtures", filename);
+     const imageBuffer = await fs.readFile(imagePath);
+
+     const formData = new FormData();
+     formData.append("image", new Blob([imageBuffer], { type: "image/jpeg" }), filename);
+
+     const response = await fetch("http://localhost:3000/api/upload", {
+       method: "POST",
+       body: formData,
+     });
+
+     if (!response.ok) {
+       throw new Error(`Upload failed: ${response.status}`);
+     }
+
+     return response.json();
+   }
+
+   export async function identifyImage(imageId: string, plantId?: string): Promise<any> {
+     const response = await fetch("http://localhost:3000/api/identify", {
+       method: "POST",
+       headers: { "Content-Type": "application/json" },
+       body: JSON.stringify({ imageId, plantId }),
+     });
+
+     if (!response.ok) {
+       throw new Error(`Identify failed: ${response.status}`);
+     }
+
+     return response.json();
+   }
+   ```
+
+3. **Write full pipeline E2E test** `tests/e2e/pipeline.test.ts`:
+
+   ```typescript
+   import { describe, it, expect } from "vitest";
+   import { uploadTestImage, identifyImage } from "./utils";
+   import expectedResults from "./fixtures/expected-results.json";
+
+   describe("End-to-End Pipeline", () => {
+     describe("Normal flow: disease detection", () => {
+       it("uploads a tomato early blight image and returns correct diagnosis", async () => {
+         // 1. Upload
+         const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
+         expect(imageId).toBeDefined();
+
+         // 2. Identify
+         const result = await identifyImage(imageId);
+
+         // 3. Verify response structure
+         expect(result.predictions).toBeDefined();
+         expect(result.predictions.length).toBeGreaterThan(0);
+         expect(result.metadata).toBeDefined();
+         expect(result.metadata.model).toBe("plant-classifier-v1");
+         expect(result.metadata.inferenceTimeMs).toBeGreaterThan(0);
+         expect(result.demo_mode).toBeFalsy();
+
+         // 4. Verify top prediction matches the fixture expectations
+         const expected = expectedResults["tomato-early-blight.jpg"];
+         const topPrediction = result.predictions[0];
+         expect(topPrediction.diseaseId).toBe(expected.expectedDiseaseId);
+         expect(topPrediction.disease.name).toContain("Early Blight");
+         expect(topPrediction.plant.id).toBe(expected.expectedPlantId);
+
+         // 5. Verify confidence is calibrated
+         expect(topPrediction.confidence.adjusted).toBeGreaterThanOrEqual(expected.minConfidence);
+         expect(topPrediction.confidence.label).toBe(expected.expectedConfidenceLabel);
+
+         // 6. Verify disease data is enriched
+         expect(topPrediction.disease.symptoms.length).toBeGreaterThanOrEqual(3);
+         expect(topPrediction.disease.treatment.length).toBeGreaterThanOrEqual(3);
+         expect(topPrediction.disease.prevention.length).toBeGreaterThanOrEqual(2);
+       });
+     });
+
+     describe("Healthy result", () => {
+       it("returns healthy result for healthy plant image", async () => {
+         const { imageId } = await uploadTestImage("tomato-healthy.jpg");
+         const result = await identifyImage(imageId);
+
+         // Should return healthy: true or top prediction is a healthy class
+         if (result.healthy) {
+           expect(result.healthy).toBe(true);
+           expect(result.plantId).toBe("tomato");
+           expect(result.confidence.adjusted).toBeGreaterThan(0.5);
+         } else {
+           // If not healthy result, confidence should be low
+           const topPrediction = result.predictions[0];
+           expect(topPrediction.confidence.adjusted).toBeLessThan(0.5);
+         }
+       });
+     });
+
+     describe("Unknown image", () => {
+       it("returns low confidence for unrelated image", async () => {
+         const { imageId } = await uploadTestImage("unknown-plant.jpg");
+         const result = await identifyImage(imageId);
+
+         // Should have predictions but with low confidence ("low" means < 0.35)
+         if (result.predictions?.length) {
+           const topPrediction = result.predictions[0];
+           expect(topPrediction.confidence.adjusted).toBeLessThan(0.35);
+           expect(topPrediction.confidence.label).toBe("low");
+         }
+       });
+     });
+
+     describe("Plant context", () => {
+       it("boosts predictions when plantId is provided", async () => {
+         const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
+
+         // Without plant context
+         const resultNoContext = await identifyImage(imageId);
+         const confidenceNoContext = resultNoContext.predictions[0].confidence.adjusted;
+
+         // With plant context
+         const resultWithContext = await identifyImage(imageId, "tomato");
+         const confidenceWithContext = resultWithContext.predictions[0].confidence.adjusted;
+
+         // Context should boost confidence (or at least not reduce it)
+         expect(confidenceWithContext).toBeGreaterThanOrEqual(confidenceNoContext);
+
+         // Boosted prediction should have contextBoosted flag
+         const boosted = resultWithContext.predictions.find(
+           (p: { contextBoosted?: boolean }) => p.contextBoosted,
+         );
+         expect(boosted).toBeDefined();
+       });
+     });
+
+     describe("Error cases", () => {
+       it("returns 404 for non-existent imageId", async () => {
+         const response = await fetch("http://localhost:3000/api/identify", {
+           method: "POST",
+           headers: { "Content-Type": "application/json" },
+           body: JSON.stringify({ imageId: "non-existent-id" }),
+         });
+
+         expect(response.status).toBe(404);
+       });
+
+       it("returns 400 for invalid image upload", async () => {
+         const formData = new FormData();
+         formData.append("image", new Blob(["not an image"], { type: "text/plain" }), "test.txt");
+
+         const response = await fetch("http://localhost:3000/api/upload", {
+           method: "POST",
+           body: formData,
+         });
+
+         expect(response.status).toBe(400);
+       });
+     });
+
+     describe("Performance", () => {
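+       it("completes inference in under 500ms", async () => {
+         const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
+
+         // Warm up so one-time model loading is not counted
+         await identifyImage(imageId);
+
+         // Assert on the server-reported inference time rather than the HTTP
+         // round trip, which also includes image loading and network overhead
+         const result = await identifyImage(imageId);
+         expect(result.metadata.inferenceTimeMs).toBeLessThan(500);
+       });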
+     });
+   });
+   ```
+
+4. **Create expected results fixture** `tests/e2e/fixtures/expected-results.json`:
+
+   ```json
+   {
+     "tomato-early-blight.jpg": {
+       "expectedDiseaseId": "early-blight",
+       "expectedPlantId": "tomato",
+       "minConfidence": 0.65,
+       "expectedConfidenceLabel": "high"
+     },
+     "tomato-healthy.jpg": {
+       "expectedHealthy": true,
+       "expectedPlantId": "tomato",
+       "minConfidence": 0.5
+     },
+     "unknown-plant.jpg": {
+       "maxConfidence": 0.35,
+       "expectedConfidenceLabel": "low"
+     }
+   }
+   ```
+
+5. **Update vitest config** to support E2E tests:
+
+   ```typescript
+   // vitest.config.ts
+   import { defineConfig } from "vitest/config";
+
+   export default defineConfig({
+     test: {
+       // ... existing config ...
+       include: ["src/**/*.test.ts", "src/**/*.test.tsx", "tests/**/*.test.ts"],
+     },
+   });
+   ```
+
+6. **Add E2E test script** to `package.json`:
+
+   ```json
+   {
+     "scripts": {
+       "test:e2e": "vitest run tests/e2e"
+     }
+   }
+   ```
+
+7. **Document E2E test setup** in `tests/e2e/README.md`:
+   - Requires dev server running (`npm run dev`)
+   - Requires model files present (`public/models/plant-disease-classifier/`)
+   - Requires test fixtures (download PlantVillage test images)
+   - Run with `npm run test:e2e`
+
+8. **Download test images** from PlantVillage dataset:
+   - Use images from the PlantVillage test split (not training)
+   - Place in `tests/e2e/fixtures/`
+   - Document source and license
+
+tests:
+
+- E2E: full pipeline test (upload → identify → verify results)
+- E2E: healthy result detection
+- E2E: unknown image produces low confidence
+- E2E: plant context boosts predictions
+- E2E: error cases (404, 400)
+- E2E: performance (< 500ms inference)
+
+acceptance_criteria:
+
+- All E2E tests pass with real model inference
+- Test fixtures are documented and licensed appropriately
+- E2E tests can be run with `npm run test:e2e`
+- Tests cover: normal flow, healthy result, unknown image, plant context, errors, performance
+- Test results are deterministic (no flaky tests)
+
+validation:
+
+- `npm run test:e2e` — all tests pass
+- Manual: run tests against dev server and verify output
+- Check test coverage: all major code paths are exercised
+
+notes:
+
+- E2E tests require the dev server to be running (`npm run dev`)
+- Test images should be from PlantVillage test split (not training) to avoid overfitting concerns
+- If test images are not available, use synthetic test data (random tensors) for CI
+- Performance test threshold (500ms) is generous — actual inference should be < 200ms with tfjs-node
+- E2E tests are separate from unit tests — run them in CI after deployment to staging
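+
+If synthetic tensors are used in CI, seeding the generator keeps runs reproducible, in line with the "no flaky tests" criterion. A minimal sketch follows. `createSeededRandom` and `createSyntheticTensor` are hypothetical helpers, and the 224×224×3 input shape is an assumption (typical for MobileNetV2) that should be checked against the actual model.
+
+```typescript
+// Small seeded PRNG (mulberry32-style) so the same seed always yields the same sequence.
+function createSeededRandom(seed: number): () => number {
+  let state = seed >>> 0;
+  return () => {
+    state = (state + 0x6d2b79f5) >>> 0;
+    let t = state;
+    t = Math.imul(t ^ (t >>> 15), t | 1);
+    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  };
+}
+
+// Hypothetical CI helper: a flat tensor of pseudo-random "pixel" values in [0, 1].
+// The default shape is an assumption, not read from the model.
+function createSyntheticTensor(
+  seed: number,
+  height = 224,
+  width = 224,
+  channels = 3,
+): Float32Array {
+  const random = createSeededRandom(seed);
+  const tensor = new Float32Array(height * width * channels);
+  for (let i = 0; i < tensor.length; i++) tensor[i] = random();
+  return tensor;
+}
+```
+
+Synthetic input only exercises the plumbing (shapes, response structure, error paths). Accuracy assertions such as the expected disease ID still need the real fixture images.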
--- a/tasks/production-ml-pipeline/08-production-hardening.md
+++ b/tasks/production-ml-pipeline/08-production-hardening.md
@@ -0,0 +1,405 @@
+# 08. Production Hardening and Observability
+
+meta:
+id: production-ml-pipeline-08
+feature: production-ml-pipeline
+priority: P1
+depends_on: [production-ml-pipeline-07]
+tags: [implementation, production, observability]
+
+objective:
+
+- Add comprehensive error handling at every layer of the pipeline
+- Implement structured logging for observability
+- Add rate limiting to prevent abuse
+- Create a health endpoint that reports model status and inference metrics
+- Ensure the system is production-ready with monitoring, cleanup, and resilience
+
+deliverables:
+
+- `src/app/api/health/route.ts` — enhanced health endpoint with model status
+- `src/lib/middleware/rate-limit.ts` — rate limiting middleware
+- `src/lib/middleware/error-handler.ts` — global error handler
+- `src/lib/observability/logger.ts` — structured logger
+- `src/lib/observability/metrics.ts` — inference metrics tracker
+- Updated API routes with error handling and logging
+- Updated `next.config.ts` with security headers
+
+steps:
+
+1. **Create structured logger** `src/lib/observability/logger.ts`:
+
+   ```typescript
+   export interface LogEntry {
+     timestamp: string;
+     level: "debug" | "info" | "warn" | "error";
+     event: string;
+     data?: Record<string, any>;
+     error?: { message: string; stack?: string };
+   }
+
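+   export function log(level: LogEntry["level"], event: string, data?: Record<string, any>) {
+     // Pull `error` out of `data`: Error objects serialize to "{}" under JSON.stringify,
+     // so message and stack are copied into plain fields instead.
+     const { error, ...rest } = data ?? {};
+     const entry: LogEntry = {
+       timestamp: new Date().toISOString(),
+       level,
+       event,
+       data: Object.keys(rest).length > 0 ? rest : undefined,
+     };
+
+     // Attach error details at any level (e.g. a warn for a failed model load)
+     if (error !== undefined) {
+       entry.error =
+         error instanceof Error
+           ? { message: error.message, stack: error.stack }
+           : { message: String(error?.message ?? error), stack: error?.stack };
+     }
+
+     console.log(JSON.stringify(entry));
+   }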
+
+   export const logger = {
+     debug: (event: string, data?: any) => log("debug", event, data),
+     info: (event: string, data?: any) => log("info", event, data),
+     warn: (event: string, data?: any) => log("warn", event, data),
+     error: (event: string, data?: any) => log("error", event, data),
+   };
+   ```
+
+2. **Create metrics tracker** `src/lib/observability/metrics.ts`:
+
+   ```typescript
+   interface InferenceMetrics {
+     totalInferences: number;
+     totalErrors: number;
+     avgInferenceTimeMs: number;
+     lastInferenceAt: string | null;
+     modelLoaded: boolean;
+     modelLoadTimeMs: number | null;
+   }
+
+   class MetricsTracker {
+     private metrics: InferenceMetrics = {
+       totalInferences: 0,
+       totalErrors: 0,
+       avgInferenceTimeMs: 0,
+       lastInferenceAt: null,
+       modelLoaded: false,
+       modelLoadTimeMs: null,
+     };
+
+     recordInference(inferenceTimeMs: number) {
+       this.metrics.totalInferences++;
+       this.metrics.lastInferenceAt = new Date().toISOString();
+       // Running average
+       this.metrics.avgInferenceTimeMs =
+         (this.metrics.avgInferenceTimeMs * (this.metrics.totalInferences - 1) + inferenceTimeMs) /
+         this.metrics.totalInferences;
+     }
+
+     recordError() {
+       this.metrics.totalErrors++;
+     }
+
+     setModelStatus(loaded: boolean, loadTimeMs?: number) {
+       this.metrics.modelLoaded = loaded;
+       if (loadTimeMs !== undefined) {
+         this.metrics.modelLoadTimeMs = loadTimeMs;
+       }
+     }
+
+     getMetrics(): InferenceMetrics {
+       return { ...this.metrics };
+     }
+   }
+
+   export const metrics = new MetricsTracker();
+   ```
+
+3. **Enhance health endpoint** `src/app/api/health/route.ts`:
+
+   ```typescript
+   import { NextResponse } from "next/server";
+   import { getModel } from "@/lib/ml/model-loader";
+   import { metrics } from "@/lib/observability/metrics";
+
+   export async function GET() {
+     const model = await getModel();
+     const modelStatus = model.getStatus();
+
+     return NextResponse.json({
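+       // Report "degraded" when the real model is not loaded, so monitors can tell the difference
+       status: modelStatus.loaded ? "ok" : "degraded",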
+       timestamp: new Date().toISOString(),
+       model: {
+         loaded: modelStatus.loaded,
+         backend: modelStatus.backend,
+         modelId: modelStatus.modelId,
+         numClasses: modelStatus.numClasses,
+         error: modelStatus.error,
+       },
+       metrics: metrics.getMetrics(),
+       uptime: process.uptime(),
+     });
+   }
+   ```
+
+4. **Create rate limiting middleware** `src/lib/middleware/rate-limit.ts`:
+
+   ```typescript
+   import { NextRequest, NextResponse } from "next/server";
+
+   // Simple in-memory rate limiter (for production, use Redis or similar).
+   // Note: entries are never evicted, so the map grows with the number of distinct clients.
+   const requestCounts = new Map<string, { count: number; resetAt: number }>();
+
+   const RATE_LIMIT = {
+     maxRequests: 10, // 10 requests per window
+     windowMs: 60 * 1000, // 1 minute window
+   };
+
+   export function rateLimit(request: NextRequest): NextResponse | null {
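+     // x-forwarded-for may hold a comma-separated proxy chain; the first entry is the client
+     const ip = request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() || "unknown";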
+     const now = Date.now();
+
+     let record = requestCounts.get(ip);
+
+     if (!record || now > record.resetAt) {
+       record = { count: 0, resetAt: now + RATE_LIMIT.windowMs };
+       requestCounts.set(ip, record);
+     }
+
+     record.count++;
+
+     if (record.count > RATE_LIMIT.maxRequests) {
+       return NextResponse.json(
+         { error: "Rate limit exceeded", message: "Too many requests. Please try again later." },
+         { status: 429 },
+       );
+     }
+
+     return null; // No rate limit hit
+   }
+   ```
+
+5. **Create global error handler** `src/lib/middleware/error-handler.ts`:
+
+   ```typescript
+   import { NextResponse } from "next/server";
+   import { logger } from "@/lib/observability/logger";
+
+   export function handleError(error: unknown, context: string): NextResponse {
+     logger.error("unhandled_error", {
+       context,
+       error:
+         error instanceof Error
+           ? { message: error.message, stack: error.stack }
+           : { message: String(error) },
+     });
+
+     return NextResponse.json(
+       {
+         error: "Internal server error",
+         message: "An unexpected error occurred. Please try again later.",
+         context,
+       },
+       { status: 500 },
+     );
+   }
+   ```
+
+6. **Add error handling to `/api/upload`**:
+
+   ```typescript
+   import { NextRequest, NextResponse } from "next/server";
+   import { rateLimit } from "@/lib/middleware/rate-limit";
+   import { handleError } from "@/lib/middleware/error-handler";
+   import { logger } from "@/lib/observability/logger";
+
+   export async function POST(request: NextRequest) {
+     // Rate limiting
+     const rateLimitError = rateLimit(request);
+     if (rateLimitError) return rateLimitError;
+
+     try {
+       logger.info("upload_start", { ip: request.headers.get("x-forwarded-for") });
+
+       // ... existing upload logic ...
+
+       logger.info("upload_success", { imageId, fileSize: buffer.length });
+       return NextResponse.json({ imageId, tensorShape, previewUrl });
+     } catch (error) {
+       return handleError(error, "upload");
+     }
+   }
+   ```
+
+7. **Add error handling to `/api/identify`**:
+
+   ```typescript
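+   import { NextRequest, NextResponse } from "next/server";
+   import { rateLimit } from "@/lib/middleware/rate-limit";
+   import { handleError } from "@/lib/middleware/error-handler";
+   import { logger } from "@/lib/observability/logger";
+   import { metrics } from "@/lib/observability/metrics";
+
+   export async function POST(request: NextRequest) {
+     const rateLimitError = rateLimit(request);
+     if (rateLimitError) return rateLimitError;
+
+     try {
+       // Assumes the JSON body carries `imageId` and an optional `plantId`;
+       // parse once here and reuse the values in the existing logic below
+       const { imageId, plantId } = await request.json();
+       logger.info("identify_start", { imageId, plantId });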
+
+       const startTime = Date.now();
+
+       // ... existing identify logic ...
+
+       const inferenceTimeMs = Date.now() - startTime;
+       metrics.recordInference(inferenceTimeMs);
+
+       logger.info("identify_success", {
+         imageId,
+         inferenceTimeMs,
+         topPrediction: predictions[0]?.diseaseId,
+         confidence: predictions[0]?.confidence.adjusted,
+       });
+
+       return NextResponse.json({ predictions, metadata });
+     } catch (error) {
+       metrics.recordError();
+
+       if (error instanceof Error && error.message.includes("not loaded")) {
+         return NextResponse.json(
+           {
+             error: "Model not available",
+             message: "ML model failed to load. Please try again later.",
+           },
+           { status: 503 },
+         );
+       }
+
+       return handleError(error, "identify");
+     }
+   }
+   ```
+
+8. **Add model status tracking to `model-loader.ts`**:
+
+   ```typescript
+   import { logger } from "@/lib/observability/logger";
+   import { metrics } from "@/lib/observability/metrics";
+
+   async function loadModel(): Promise<PlantDiseaseModel> {
+     const startTime = Date.now();
+
+     try {
+       const model = await tryLoadTFJS();
+       if (model) {
+         const loadTimeMs = Date.now() - startTime;
+         metrics.setModelStatus(true, loadTimeMs);
+         logger.info("model_loaded", { backend: "tfjs", loadTimeMs });
+         return model;
+       }
+     } catch (error) {
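+       // Pass the message as a string: a raw Error would serialize to "{}"
+       logger.warn("model_load_failed", {
+         backend: "tfjs",
+         error: error instanceof Error ? error.message : String(error),
+       });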
+     }
+
+     // ... fallback to mock ...
+     metrics.setModelStatus(false);
+     return createMockModel();
+   }
+   ```
+
+9. **Add cleanup for old uploads**:
+
+   ```typescript
+   // src/lib/cleanup.ts
+   import fs from "fs/promises";
+   import path from "path";
+   import { logger } from "@/lib/observability/logger";
+
+   const UPLOADS_DIR = path.join(process.cwd(), "public", "uploads");
+   const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours
+
+   export async function cleanupOldUploads() {
+     const files = await fs.readdir(UPLOADS_DIR);
+     const now = Date.now();
+
+     for (const file of files) {
+       const filePath = path.join(UPLOADS_DIR, file);
+       const stat = await fs.stat(filePath);
+
+       // Skip subdirectories: unlink only removes files
+       if (stat.isFile() && now - stat.mtimeMs > MAX_AGE_MS) {
+         await fs.unlink(filePath);
+         logger.info("upload_cleaned", { file, ageMs: now - stat.mtimeMs });
+       }
+     }
+   }
+
+   // Run cleanup on server start and periodically
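+   if (process.env.NODE_ENV === "production") {
+     // Log failures (e.g. a missing uploads directory) instead of leaving an unhandled rejection
+     const runCleanup = () =>
+       cleanupOldUploads().catch((error: unknown) => logger.error("upload_cleanup_failed", { error }));
+     runCleanup();
+     setInterval(runCleanup, 60 * 60 * 1000); // Every hour
+   }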
+   ```
+
+10. **Update `next.config.ts`** with security headers (rate limiting itself lives in `src/lib/middleware/rate-limit.ts`):
+
+    ```typescript
+    const nextConfig = {
+      // ... existing config ...
+      async headers() {
+        return [
+          {
+            source: "/api/:path*",
+            headers: [
+              { key: "X-Content-Type-Options", value: "nosniff" },
+              { key: "X-Frame-Options", value: "DENY" },
+              { key: "X-XSS-Protection", value: "1; mode=block" },
+            ],
+          },
+        ];
+      },
+    };
+    ```
+
+11. **Add monitoring dashboard** (optional) `src/app/admin/metrics/page.tsx`:
+    - Simple page showing inference metrics
+    - Model status
+    - Recent inference times
+    - Error rate
+    - Protected by authentication (admin only)
+
+12. **Document production checklist** in `docs/production-checklist.md`:
+    - Environment variables needed
+    - Model deployment steps
+    - Monitoring setup
+    - Backup strategy
+    - Rollback procedure
+
+tests:
+
+- Unit: rate limiter blocks after max requests
+- Unit: rate limiter resets after window
+- Unit: metrics tracker records inference correctly
+- Unit: metrics tracker computes running average
+- Unit: logger produces valid JSON output
+- Integration: health endpoint returns model status and metrics
+- Integration: rate limit returns 429 after max requests
+- Integration: error handler catches unhandled errors and returns 500
+
+acceptance_criteria:
+
+- All API routes have rate limiting (10 requests per minute per IP)
+- All API routes have structured logging (JSON format)
+- Health endpoint reports model status, inference metrics, uptime
+- Error handler catches all unhandled errors and returns 500 with clear message
+- Old uploads are cleaned up automatically (24-hour TTL)
+- Metrics tracker records inference time, error rate, model status
+- Security headers are set (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
+- Production checklist is documented
+
+validation:
+
+- `npx vitest run src/lib/middleware/rate-limit.test.ts`
+- `npx vitest run src/lib/observability/metrics.test.ts`
+- `curl http://localhost:3000/api/health` — returns model status and metrics
+- `curl -X POST http://localhost:3000/api/identify ...` (11 times) — 11th request returns 429
+- Check server logs: JSON-formatted log entries for all requests
+- After 24 hours (the configured TTL) plus up to one hourly cleanup cycle: old uploads are cleaned up
+
+notes:
+
+- Rate limiter uses in-memory storage — for multi-instance deployments, use Redis or similar
+- Metrics are in-memory — for persistent metrics, use a time-series database
+- Health endpoint should be monitored by uptime monitoring service (e.g., Pingdom, UptimeRobot)
+- Cleanup runs every hour in production — adjust frequency based on upload volume
+- Security headers are basic — consider adding CSP, HSTS for full security hardening
+- Production checklist should be reviewed before each deployment
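The two rate-limiter unit tests listed for task 08 ("blocks after max requests", "resets after window") are simpler to write when the counting logic is separated from `NextRequest` and takes the current time as a parameter. A minimal sketch under that assumption follows. `FixedWindowLimiter` and its methods are hypothetical names, not part of the existing code. The sketch also evicts expired entries, which the in-memory map in step 4 does not do.

```typescript
// Hypothetical framework-free core for a fixed-window rate limiter.
interface WindowRecord {
  count: number;
  resetAt: number;
}

class FixedWindowLimiter {
  private records = new Map<string, WindowRecord>();
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  check(key: string, now: number = Date.now()): boolean {
    let record = this.records.get(key);
    if (!record || now > record.resetAt) {
      // Start a fresh window for this key
      record = { count: 0, resetAt: now + this.windowMs };
      this.records.set(key, record);
    }
    record.count++;
    return record.count <= this.maxRequests;
  }

  // Drops expired windows and returns how many were removed.
  prune(now: number = Date.now()): number {
    let removed = 0;
    this.records.forEach((record, key) => {
      if (now > record.resetAt) {
        this.records.delete(key);
        removed++;
      }
    });
    return removed;
  }

  get size(): number {
    return this.records.size;
  }
}
```

A route-level wrapper could call `check(ip)` and return the 429 response when it yields `false`. Passing `now` explicitly lets the "resets after window" test run without fake timers.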
--- a/tasks/production-ml-pipeline/README.md
+++ b/tasks/production-ml-pipeline/README.md
@@ -0,0 +1,40 @@
+# Production ML Pipeline
+
+Objective: Get the plant disease identification ML pipeline to full production readiness with real model inference, proper class mapping, and production-grade error handling.
+
+Status legend: [ ] todo, [~] in-progress, [x] done
+
+## Tasks
+
+- [ ] 01 — PlantVillage class inventory and knowledge base mapping → `01-plantvillage-class-inventory.md`
+- [ ] 02 — Label mapping layer implementation → `02-label-mapping-implementation.md`
+- [ ] 03 — TensorFlow.js model loading verification and fixes → `03-model-loading-verification.md`
+- [ ] 04 — Confidence calibration for PlantVillage model → `04-confidence-calibration.md`
+- [ ] 05 — Real model integration into identification pipeline → `05-pipeline-integration.md`
+- [ ] 06 — Plant-context-aware identification → `06-plant-context-identification.md`
+- [ ] 07 — End-to-end integration testing → `07-end-to-end-testing.md`
+- [ ] 08 — Production hardening and observability → `08-production-hardening.md`
+
+## Dependencies
+
+- 01 → 02 (mapping data feeds label layer)
+- 02 → 05 (labels feed pipeline)
+- 03 → 05 (verified model loading feeds pipeline)
+- 04 → 05 (calibration feeds pipeline)
+- 05 → 06 (real model enables plant context)
+- 05 → 07 (integrated pipeline enables e2e testing)
+- 07 → 08 (tested pipeline enables production hardening)
+
+## Exit Criteria
+
+- The feature is complete when:
+  - Model loads successfully and produces real (non-mock) predictions
+  - All 38 PlantVillage classes map to valid knowledge base disease IDs
+  - End-to-end pipeline works: upload image → get real disease diagnoses with calibrated confidence
+  - Confidence scores are meaningful (high confidence for clear cases, low for ambiguous)
+  - Plant context optionally boosts relevant predictions
+  - Full integration test suite passes
+  - Error handling, logging, and monitoring in place
+  - No demo mode fallback in production
+  - Rate limiting and input sanitization active
+  - Health endpoint reports model status and inference metrics
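The Dependencies list in this README defines a partial order over the eight tasks. A minimal sketch of how a valid execution order could be derived from it, and how a cycle would be caught if the list is edited later, is shown below. `taskOrder` is a hypothetical helper, not part of the repository, and the edges are copied from the list as written.

```typescript
// Edges copied from the Dependencies list: [prerequisite, dependent].
const edges: Array<[string, string]> = [
  ["01", "02"], ["02", "05"], ["03", "05"], ["04", "05"],
  ["05", "06"], ["05", "07"], ["07", "08"],
];

// Kahn's algorithm; ties are broken by task number so the result is stable.
function taskOrder(tasks: string[], deps: Array<[string, string]>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t, 0);
    dependents.set(t, []);
  }
  for (const [from, to] of deps) {
    const list = dependents.get(from);
    if (list === undefined || !indegree.has(to)) {
      throw new Error("unknown task in dependency " + from + " -> " + to);
    }
    list.push(to);
    indegree.set(to, (indegree.get(to) ?? 0) + 1);
  }

  // Tasks with no unmet prerequisites, kept sorted by task number
  const ready = tasks.filter((t) => indegree.get(t) === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const current = ready.shift() as string;
    order.push(current);
    for (const next of dependents.get(current) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) {
        ready.push(next);
        ready.sort();
      }
    }
  }
  if (order.length !== tasks.length) {
    throw new Error("dependency cycle detected");
  }
  return order;
}
```

With the edges above, tasks 01, 03, and 04 have no prerequisites and can start in parallel; task 05 becomes available only once 02, 03, and 04 are done.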