2026-06-08 16:42:04 -04:00
commit 8bda14ab63
179 changed files with 48104 additions and 0 deletions


@@ -0,0 +1,152 @@
# 01. PlantVillage Class Inventory and Knowledge Base Mapping
meta:
id: production-ml-pipeline-01
feature: production-ml-pipeline
priority: P0
depends_on: []
tags: [data, mapping, research]
objective:
- Document all 38 PlantVillage model output classes
- Map each class index to a definitive disease ID in the knowledge base
- Identify which plants and diseases are missing from the KB and must be added
- Produce a complete, authoritative mapping file that subsequent tasks consume
deliverables:
- `src/lib/ml/plantvillage-classes.ts` — definitive mapping of all 38 class indices to structured metadata
- Updated `tasks/production-ml-pipeline/class-mapping-reference.md` — human-readable reference document
steps:
1. Document the canonical 38 PlantVillage class labels in order (index 0-37):
```
0: Apple___Apple_scab
1: Apple___Black_rot
2: Apple___Cedar_apple_rust
3: Apple___healthy
4: Blueberry___healthy
5: Cherry_(including_sour)___Powdery_mildew
6: Cherry_(including_sour)___healthy
7: Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
8: Corn_(maize)___Common_rust_
9: Corn_(maize)___Northern_Leaf_Blight
10: Corn_(maize)___healthy
11: Grape___Black_rot
12: Grape___Esca_(Black_Measles)
13: Grape___Leaf_blight_(Isariopsis_Leaf_Spot)
14: Grape___healthy
15: Orange___Haunglongbing_(Citrus_greening)
16: Peach___Bacterial_spot
17: Peach___healthy
18: Pepper,_bell___Bacterial_spot
19: Pepper,_bell___healthy
20: Potato___Early_blight
21: Potato___Late_blight
22: Potato___healthy
23: Raspberry___healthy
24: Soybean___healthy
25: Squash___Powdery_mildew
26: Strawberry___Leaf_scorch
27: Strawberry___healthy
28: Tomato___Bacterial_spot
29: Tomato___Early_blight
30: Tomato___Late_blight
31: Tomato___Leaf_Mold
32: Tomato___Septoria_leaf_spot
33: Tomato___Spider_mites Two-spotted_spider_mite
34: Tomato___Target_Spot
35: Tomato___Tomato_Yellow_Leaf_Curl_Virus
36: Tomato___Tomato_mosaic_virus
37: Tomato___healthy
```
2. For each class, determine the mapping target:
- **Healthy classes** (12 total: indices 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37): map to a special `"healthy"` sentinel. These indicate the model detected no disease.
- **Disease classes with an existing KB entry** (exact or closest match): map directly to the existing disease ID.
- 28 → `bacterial-leaf-spot-tomato` (Tomato Bacterial_spot ≈ bacterial-leaf-spot-tomato)
- 29 → `early-blight`
- 30 → `late-blight`
- 32 → `septoria-leaf-spot`
- 25 → `squash-powdery-mildew`
- 26 → `strawberry-leaf-scorch`
- 18 → `pepper-bacterial-wilt` (closest match to Pepper Bacterial_spot)
- **Disease classes needing new KB entries** (no existing disease in our KB):
- 0: Apple_scab → new disease `apple-scab` under plant `apple`
- 1: Apple_black_rot → new disease `apple-black-rot` under plant `apple`
- 2: Apple_cedar_apple_rust → new disease `apple-cedar-apple-rust` under plant `apple`
- 5: Cherry_powdery_mildew → new disease `cherry-powdery-mildew` under plant `cherry`
- 7: Corn_cercospora_leaf_spot → new disease `corn-gray-leaf-spot` under plant `corn`
- 8: Corn_common_rust → new disease `corn-common-rust` under plant `corn`
- 9: Corn_northern_leaf_blight → new disease `corn-northern-leaf-blight` under plant `corn`
- 11: Grape_black_rot → new disease `grape-black-rot` under plant `grape`
- 12: Grape_esca → new disease `grape-esca` under plant `grape`
- 13: Grape_leaf_blight → new disease `grape-leaf-blight` under plant `grape`
- 15: Orange_huanglongbing → new disease `orange-citrus-greening` under plant `orange`
- 16: Peach_bacterial_spot → new disease `peach-bacterial-spot` under plant `peach`
- 20: Potato_early_blight → new disease `potato-early-blight` under plant `potato`
- 21: Potato_late_blight → new disease `potato-late-blight` under plant `potato`
- 31: Tomato_leaf_mold → new disease `tomato-leaf-mold` under plant `tomato`
- 33: Tomato_spider_mites → new disease `tomato-spider-mites` under plant `tomato`
- 34: Tomato_target_spot → new disease `tomato-target-spot` under plant `tomato`
- 35: Tomato_yellow_leaf_curl_virus → new disease `tomato-yellow-leaf-curl-virus` under plant `tomato`
- 36: Tomato_mosaic_virus → new disease `tomato-mosaic-virus` under plant `tomato`
3. Create the mapping type and data structure in `src/lib/ml/plantvillage-classes.ts`:
```typescript
export interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string; // KB plant slug
  diseaseId: string | null; // null for healthy classes
  isHealthy: boolean;
  displayName: string; // human-readable disease name
}
export const PLANTVILLAGE_CLASSES: readonly PlantVillageClass[] = [ ... ];
```
4. For each class, also record:
- The PlantVillage plant name (e.g., "Tomato", "Apple")
- The target KB plantId (e.g., "tomato", "apple")
- The target KB diseaseId (e.g., "early-blight") or null for healthy
- Whether the disease needs to be added to the KB (boolean flag for task 02)
5. Verify the mapping covers all 38 indices with no gaps or duplicates.
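Recording the PlantVillage plant name and a display name (step 4) is mostly string handling on the raw label. A minimal sketch, assuming every label follows the `<Plant>___<Condition>` convention seen in the list above with `_` standing in for spaces; `parseRawLabel` and `ParsedLabel` are hypothetical names, not part of the planned module.

```typescript
// Hypothetical helper: split a raw PlantVillage label into readable parts.
// Assumes the "<Plant>___<Condition>" convention used by all 38 labels.
export interface ParsedLabel {
  plantName: string; // e.g. "Corn (maize)"
  condition: string; // e.g. "Common rust" or "healthy"
  isHealthy: boolean;
}

export function parseRawLabel(rawLabel: string): ParsedLabel {
  const parts = rawLabel.split("___");
  if (parts.length !== 2) {
    throw new Error(`Unexpected PlantVillage label format: ${rawLabel}`);
  }
  // Underscores stand in for spaces; some labels also end with a stray "_".
  const clean = (s: string) => s.replace(/_/g, " ").replace(/\s+/g, " ").trim();
  const plantName = clean(parts[0]);
  const condition = clean(parts[1]);
  return { plantName, condition, isHealthy: condition === "healthy" };
}
```

For example, `parseRawLabel("Corn_(maize)___Common_rust_")` yields `{ plantName: "Corn (maize)", condition: "Common rust", isHealthy: false }`. The KB `plantId` (e.g. `corn`) still has to be assigned by hand, since the raw plant names do not slugify cleanly.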
tests:
- Unit: mapping has exactly 38 entries
- Unit: indices 0-37 are all present, no gaps
- Unit: each non-healthy entry has a non-null diseaseId
- Unit: each healthy entry has null diseaseId and isHealthy=true
- Unit: no duplicate diseaseIds across non-healthy entries
- Unit: all plantIds are valid slugs (lowercase, kebab-case)
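The unit tests above reduce to a handful of invariants on the array. A minimal sketch of a checker that collects every violation instead of stopping at the first, assuming the `PlantVillageClass` shape from step 3 (redeclared so the snippet stands alone); `validateMapping` and the kebab-case regex are hypothetical.

```typescript
interface PlantVillageClass {
  index: number;
  rawLabel: string;
  plantId: string;
  diseaseId: string | null;
  isHealthy: boolean;
  displayName: string;
}

// Lowercase alphanumeric words joined by single hyphens, e.g. "corn-gray-leaf-spot".
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Returns human-readable problems; an empty array means the mapping is valid.
export function validateMapping(
  classes: readonly PlantVillageClass[],
  expectedCount = 38,
): string[] {
  const problems: string[] = [];
  if (classes.length !== expectedCount) {
    problems.push(`expected ${expectedCount} entries, got ${classes.length}`);
  }
  // Each position must hold its own index: this catches gaps, duplicates and reordering.
  classes.forEach((c, i) => {
    if (c.index !== i) problems.push(`entry at position ${i} has index ${c.index}`);
    if (!KEBAB_CASE.test(c.plantId)) problems.push(`bad plantId "${c.plantId}" at ${i}`);
    if (c.isHealthy && c.diseaseId !== null) problems.push(`healthy entry ${i} has a diseaseId`);
    if (!c.isHealthy && !c.diseaseId) problems.push(`disease entry ${i} has no diseaseId`);
  });
  const ids = classes.filter((c) => !c.isHealthy).map((c) => c.diseaseId);
  const dupes = ids.filter((id, i) => ids.indexOf(id) !== i);
  if (dupes.length > 0) {
    problems.push(`duplicate diseaseIds: ${Array.from(new Set(dupes)).join(", ")}`);
  }
  return problems;
}
```

Returning a list rather than throwing lets one test assert `validateMapping(PLANTVILLAGE_CLASSES)` is empty while still printing every problem when it is not.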
acceptance_criteria:
- `src/lib/ml/plantvillage-classes.ts` exports `PLANTVILLAGE_CLASSES` array with exactly 38 entries
- Every index 0-37 maps to exactly one entry
- 12 entries are healthy (isHealthy=true, diseaseId=null)
- 26 entries are diseases with valid plantId and diseaseId
- Each entry includes rawLabel, plantId, diseaseId, displayName
- All new disease IDs follow kebab-case convention matching existing KB pattern
- Reference document `class-mapping-reference.md` lists all 38 classes with their KB mappings
validation:
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts` — all mapping tests pass
- Manual review: each of the 26 disease entries maps to a plausible disease in our KB
notes:
- This task produces the authoritative mapping consumed by task 02 (KB expansion and label mapping layer)
- The PlantVillage class order is fixed by the model's training — do NOT reorder
- "Tomato Bacterial_spot" maps to our existing `bacterial-leaf-spot-tomato` — this is the closest match, not a perfect one
- "Pepper Bacterial_spot" maps to `pepper-bacterial-wilt` — imperfect but closest available match
- 10 new plants must be added to the KB: apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean
- Blueberry, Raspberry, Soybean only have "healthy" class — still need plant entries for context but no new disease entries


@@ -0,0 +1,149 @@
# 02. Label Mapping Layer Implementation
meta:
id: production-ml-pipeline-02
feature: production-ml-pipeline
priority: P0
depends_on: [production-ml-pipeline-01]
tags: [implementation, knowledge-base, tests-required]
objective:
- Expand the knowledge base to cover all PlantVillage plants and diseases
- Rewrite `src/lib/ml/labels.ts` to use the PlantVillage class mapping from task 01
- Ensure every model output index resolves to a valid KB disease or the "healthy" sentinel
- The label layer must be the single source of truth for model-index → disease mapping
deliverables:
- Updated `src/data/plants.json` — 10 new PlantVillage plants added (apple, blueberry, cherry, corn, grape, orange, peach, potato, raspberry, soybean)
- Updated `src/data/diseases.json` — 19 new disease entries added for PlantVillage diseases not yet in KB
- `src/lib/ml/labels.ts` — fully rewritten to use PlantVillage class mapping
- `src/lib/ml/labels.test.ts` — updated to validate against new mapping
- `scripts/seed-plantvillage-kb.ts` — DB migration script to insert new plants and diseases into Turso
steps:
1. **Add 10 new plants to `src/data/plants.json`** — each with proper metadata:
```typescript
// New plants needed (PlantVillage coverage):
const NEW_PLANTS = [
  { id: "apple", commonName: "Apple", scientificName: "Malus domestica", family: "Rosaceae", category: "fruit" },
  { id: "cherry", commonName: "Cherry", scientificName: "Prunus avium", family: "Rosaceae", category: "fruit" },
  { id: "corn", commonName: "Corn (Maize)", scientificName: "Zea mays", family: "Poaceae", category: "vegetable" },
  { id: "grape", commonName: "Grape", scientificName: "Vitis vinifera", family: "Vitaceae", category: "fruit" },
  { id: "orange", commonName: "Orange", scientificName: "Citrus sinensis", family: "Rutaceae", category: "fruit" },
  { id: "peach", commonName: "Peach", scientificName: "Prunus persica", family: "Rosaceae", category: "fruit" },
  { id: "potato", commonName: "Potato", scientificName: "Solanum tuberosum", family: "Solanaceae", category: "vegetable" },
  { id: "blueberry", commonName: "Blueberry", scientificName: "Vaccinium corymbosum", family: "Ericaceae", category: "fruit" },
  { id: "raspberry", commonName: "Raspberry", scientificName: "Rubus idaeus", family: "Rosaceae", category: "fruit" },
  { id: "soybean", commonName: "Soybean", scientificName: "Glycine max", family: "Fabaceae", category: "vegetable" },
];
```
- Add `imageUrl` for each (use Wikipedia pageimages, same pattern as `fill-plant-images.ts`)
- Add `careSummary` for each
2. **Add 19 new diseases to `src/data/diseases.json`** — each with full structured data:
- Use the template-based approach from `scripts/disease-templates.ts` where possible
- Source disease details from:
- UW-Madison PDDC factsheets (pddc.wisc.edu)
- Cornell Plant Clinic (plantclinic.cornell.edu)
- University extension publications
- Each disease must have: `id`, `plantId`, `name`, `scientificName`, `causalAgentType`, `description`, `symptoms` (≥3), `causes` (≥2), `treatment` (≥3), `prevention` (≥2), `lookalikeDiseaseIds`, `severity`, `prevalence`
- New disease entries needed:
- apple-scab, apple-black-rot, apple-cedar-apple-rust (plant: apple)
- cherry-powdery-mildew (plant: cherry)
- corn-gray-leaf-spot, corn-common-rust, corn-northern-leaf-blight (plant: corn)
- grape-black-rot, grape-esca, grape-leaf-blight (plant: grape)
- orange-citrus-greening (plant: orange)
- peach-bacterial-spot (plant: peach)
- potato-early-blight, potato-late-blight (plant: potato)
- tomato-leaf-mold, tomato-spider-mites, tomato-target-spot, tomato-yellow-leaf-curl-virus, tomato-mosaic-virus (plant: tomato)
- Use programmatic approach: write a generator script that pulls from UW-Madison PDDC / Cornell factsheets and Wikipedia, following the same pattern as `scripts/generate-full-kb.ts`
3. **Update lookalikeDiseaseIds**: cross-reference the new diseases with each other and with related existing diseases:
- Apple scab ↔ Apple black rot (both cause leaf spots on apple)
- Potato early blight ↔ Potato late blight (both affect potato foliage)
- Grape black rot ↔ Grape esca (both cause dark spotting and shriveling of berries)
- Tomato early blight ↔ Tomato septoria leaf spot ↔ Tomato target spot (all cause leaf lesions)
- Tomato leaf mold ↔ Tomato septoria leaf spot (both cause leaf spots in humid conditions)
4. **Rewrite `src/lib/ml/labels.ts`** to use the PlantVillage mapping:
```typescript
import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
// Total output classes from model (must equal PLANTVILLAGE_CLASSES.length)
export const NUM_CLASSES = 38;
// Index 0-37 → disease lookup; healthy classes return the "healthy" sentinel
export function getDiseaseIdForIndex(index: number): string {
  const entry = PLANTVILLAGE_CLASSES[index];
  // An out-of-range index is a caller bug, not a healthy plant
  if (!entry) throw new RangeError(`Class index ${index} is outside 0-${NUM_CLASSES - 1}`);
  if (entry.isHealthy || entry.diseaseId === null) return "healthy";
  return entry.diseaseId;
}
export function getPlantIdForIndex(index: number): string {
  return PLANTVILLAGE_CLASSES[index]?.plantId ?? "unknown";
}
export function isHealthyClass(index: number): boolean {
  return PLANTVILLAGE_CLASSES[index]?.isHealthy ?? false;
}
// Disease ID → index (for reverse lookup); returns -1 when not found
export function getIndexForDiseaseId(diseaseId: string): number {
  const entry = PLANTVILLAGE_CLASSES.find((c) => c.diseaseId === diseaseId.toLowerCase());
  return entry?.index ?? -1;
}
```
5. **Remove old assumptions** — the old labels.ts assumed 95 classes (93 diseases + healthy + unknown). Delete all references to `diseases.json` index ordering from labels.ts. The mapping is now defined by `plantvillage-classes.ts`, not by JSON file order.
6. **Create DB migration script** `scripts/seed-plantvillage-kb.ts`:
- Read updated `src/data/plants.json` and `src/data/diseases.json`
- Insert new plants and diseases into Turso DB using Drizzle ORM
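- Use a true upsert (`INSERT ... ON CONFLICT DO UPDATE`, e.g. Drizzle's `onConflictDoUpdate()`) to be idempotent; `INSERT OR REPLACE` deletes and re-inserts the conflicting row rather than updating it in place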
- Log what was inserted/updated
7. **Run the migration** to populate the DB with new data.
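Step 3 writes lookalike pairs as bidirectional (↔), so each link should end up on both disease records even if the generator only emits it on one side. A minimal sketch of a symmetrizing pass, assuming records carry `id` and `lookalikeDiseaseIds` as listed in step 2; `symmetrizeLookalikes` is a hypothetical helper, not part of the existing generator scripts.

```typescript
interface DiseaseRecord {
  id: string;
  lookalikeDiseaseIds: string[];
}

// If A lists B as a lookalike, make sure B also lists A.
// Ids with no matching record are kept in place and reported in unknownIds.
export function symmetrizeLookalikes(diseases: DiseaseRecord[]): {
  diseases: DiseaseRecord[];
  unknownIds: string[];
} {
  const links = new Map<string, Set<string>>();
  for (const d of diseases) links.set(d.id, new Set(d.lookalikeDiseaseIds));
  const unknown = new Set<string>();
  for (const d of diseases) {
    for (const other of d.lookalikeDiseaseIds) {
      const back = links.get(other);
      if (back === undefined) {
        unknown.add(other); // referenced id has no record in this set
      } else if (other !== d.id) {
        back.add(d.id); // add the reverse link
      }
    }
  }
  return {
    diseases: diseases.map((d) => ({
      ...d,
      // Drop self-references and sort for stable JSON diffs
      lookalikeDiseaseIds: Array.from(links.get(d.id)!).filter((id) => id !== d.id).sort(),
    })),
    unknownIds: Array.from(unknown).sort(),
  };
}
```

Running it over the merged existing and new disease lists (rather than the new ones alone) lets links such as `early-blight` ↔ `tomato-target-spot` land on the existing record too.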
tests:
- Unit: `labels.test.ts` validates all 38 indices map correctly
- Unit: `getDiseaseIdForIndex(29)` returns `"early-blight"`
- Unit: `getDiseaseIdForIndex(3)` returns `"healthy"` (Apple healthy class)
- Unit: `getIndexForDiseaseId("early-blight")` returns `29`
- Unit: `isHealthyClass(37)` returns `true` (Tomato healthy)
- Unit: `isHealthyClass(29)` returns `false` (Tomato Early_blight)
- Unit: `getPlantIdForIndex(0)` returns `"apple"`
- Unit: All 26 non-healthy diseaseIds resolve to real DB entries via `getDiseaseById()`
- Integration: `scripts/seed-plantvillage-kb.ts` runs without errors, inserts all 10 plants and 19 diseases
- Integration: After seeding, DB query for each new disease returns a complete record
acceptance_criteria:
- `PLANTVILLAGE_CLASSES` (imported by labels.ts) has exactly 38 entries matching model output order, and `NUM_CLASSES` equals its length
- 12 healthy indices correctly return "healthy" from `getDiseaseIdForIndex()`
- 26 disease indices correctly return valid diseaseIds
- All 10 new plants exist in `src/data/plants.json` with valid metadata and imageUrl
- All 19 new diseases exist in `src/data/diseases.json` with full structured data (symptoms, treatment, prevention, etc.)
- DB migration script runs successfully, all new data queryable from Turso
- Old `diseases.json` ordering assumption is completely removed from labels.ts
- All existing tests still pass (no regressions in browse, search, detail pages)
validation:
- `npx vitest run src/lib/ml/labels.test.ts`
- `npx vitest run src/lib/ml/plantvillage-classes.test.ts`
- `npx tsx scripts/seed-plantvillage-kb.ts` — verify output shows correct inserts
- `npx vitest run` — full test suite passes
- Manual: query DB for each new plant/disease and verify complete data
notes:
- Disease data must come from authoritative sources (university extension services), not hand-written
- Use the same template-based generation approach from `scripts/generate-full-kb.ts` for consistency
- The `pepper-bacterial-wilt` disease already exists. Map `Pepper,_bell___Bacterial_spot` (index 18) to it even though it's not a perfect match (it's the closest available)
- Blueberry, Raspberry, and Soybean only have "healthy" classes in PlantVillage — add plant entries but no disease entries for these (they don't need new disease IDs since they always map to "healthy")
- Total disease count after this task: 93 (existing) + 19 (new) = 112 diseases


@@ -0,0 +1,170 @@
# 03. TensorFlow.js Model Loading Verification and Fixes
meta:
id: production-ml-pipeline-03
feature: production-ml-pipeline
priority: P0
depends_on: []
tags: [implementation, model, tests-required]
objective:
- Verify the converted TF.js GraphModel loads successfully on the Node.js server
- Fix input tensor format handling (NCHW pipeline input → NHWC model input)
- Determine whether model output is logits or pre-computed softmax probabilities
- Ensure inference produces valid [1, 38] output without errors
- Install `@tensorflow/tfjs-node` for server-side native acceleration
deliverables:
- `src/lib/ml/model-loader.ts` — fixed and verified for real model loading
- `src/lib/ml/model-loader.test.ts` — updated integration tests
- `package.json`: `@tensorflow/tfjs-node` added as dependency (if needed)
- `src/lib/ml/inference.ts` — fixed output interpretation (logits vs probabilities)
- `src/lib/ml/inference.test.ts` — updated for real model inference
steps:
1. **Determine output interpretation** — inspect the graph topology to resolve whether `Identity:0` is pre-softmax logits or post-softmax probabilities:
- The model graph contains a `Softmax` node at `StatefulPartitionedCall/mnv2_pv_original_1/dense_1/Softmax`
- The output `Identity:0` may be after Softmax (probabilities) or before (logits)
- Test: run inference on a zero tensor — if output sums to ~1.0, it's already probabilities; if output has negative values or doesn't sum to 1.0, it's logits
- Fix: if output is already probabilities, remove the `softmaxFloat32()` call in `inference.ts` and use the raw output directly
2. **Fix input tensor format** — the model expects NHWC `[1, 160, 160, 3]` but our pipeline produces NCHW `[3, 160, 160]`:
```typescript
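// model-loader.ts tryLoadTFJS(), corrected: the pipeline output is 3-D [C, H, W],
// so it must be built with tensor3d (tensor4d rejects a 3-element shape).
// Wrap in tf.tidy() (step 7) so the intermediate tensors are disposed.
const inputTensor = tf
  .tensor3d(tensor, [3, 160, 160]) // [3, 160, 160] NCHW without batch
  .transpose([1, 2, 0]) // [160, 160, 3]
  .expandDims(0); // [1, 160, 160, 3] NHWC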
```
- Verify this transpose is correct (NCHW → NHWC)
- Verify the tensor values are in the expected range (ImageNet-normalized: roughly -2.5 to +2.5)
- Alternative: reshape directly as `[1, 160, 160, 3]` if the identify endpoint produces NHWC data
3. **Install `@tensorflow/tfjs-node`** for server-side native acceleration:
```bash
npm install @tensorflow/tfjs-node
```
- Browser tfjs works on server but is significantly slower (no native BLAS)
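- `@tensorflow/tfjs-node` binds to the libtensorflow C library, which is typically much faster than the pure-JS CPU backend (often by an order of magnitude or more; measure on the target server)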
- Verify native bindings install correctly (may need `@tensorflow/tfjs-node-gpu` for GPU, but CPU is fine for this use case)
- Fallback chain remains: tfjs-node → tfjs (browser) → mock
4. **Verify model loads from filesystem**:
```typescript
const model = await tf.loadGraphModel(`file://${MODEL_JSON_PATH}`);
console.log("Model loaded:", model.inputs, model.outputs);
// Expected:
// inputs: [{ shape: [-1, 160, 160, 3], dtype: 'float32' }]
// outputs: [{ shape: [-1, 38], dtype: 'float32' }]
```
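- Verify `model.inputs[0].shape` matches `[-1, 160, 160, 3]` (the same `-1` batch dimension shown in the expected output above)
- Verify `model.outputs[0].shape` matches `[-1, 38]`
- Verify the model exposes `predict()`: a GraphModel provides both `predict()` and `execute()`, and `predict()` is the simpler call for this single-input, single-output model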
5. **Run inference smoke test**:
```typescript
// Create a test tensor (random normalized values)
const testTensor = new Float32Array(3 * 160 * 160);
for (let i = 0; i < testTensor.length; i++) {
testTensor[i] = (Math.random() - 0.5) * 2;
}
// Reshape to NHWC for TF.js
const input = tf.tensor4d(
Array.from(testTensor),
[1, 160, 160, 3], // NHWC
);
const output = model.predict(input);
const data = await output.data();
console.log("Output shape:", output.shape);
console.log(
"Output sum:",
data.reduce((a, b) => a + b, 0),
);
console.log("Output max:", Math.max(...data));
console.log("Output min:", Math.min(...data));
```
- Output should be [1, 38] with 38 float values
- If values are probabilities: sum ≈ 1.0, all values ≥ 0
- If values are logits: sum ≠ 1.0, may have negative values
6. **Fix `model-loader.ts` `getStatus()` to report real class count**:
```typescript
getStatus(): ModelStatus {
  return {
    loaded: true,
    backend: "tfjs",
    modelId: MODEL_ID,
    numClasses: 38, // PlantVillage, not 95
  };
}
```
7. **Add memory management** — dispose tensors after use to prevent memory leaks:
```typescript
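// In predict(): tidy() disposes every tensor created inside the callback;
// dataSync() copies the scores out so nothing needs to survive the tidy scope
const scores = tf.tidy(() => {
  const input = tf.tensor4d(...);
  const output = model.predict(input) as tf.Tensor;
  return output.dataSync() as Float32Array;
});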
```
- Or manually dispose: `inputTensor.dispose()`, `outputTensor.dispose()`
- Use `tf.memory()` to monitor tensor count during development
8. **Handle model load failures gracefully**:
- If model files are corrupted, log the specific error
- If tfjs-node native bindings fail, fall back to browser tfjs with a warning
- Never crash the server on model load failure — fall back to mock mode with clear logging
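The `transpose([1, 2, 0])` in step 2 is plain index arithmetic, which also gives the NCHW → NHWC unit test a reference to compare against. A minimal sketch in plain TypeScript (no TF.js), assuming a single image stored channel-major as `[C, H, W]`; `nchwToNhwc` is a hypothetical helper.

```typescript
// Reorder a channel-major [C, H, W] buffer into channel-last [H, W, C] order,
// i.e. what transpose([1, 2, 0]) does for a single image.
export function nchwToNhwc(
  src: Float32Array,
  channels: number,
  height: number,
  width: number,
): Float32Array {
  const expected = channels * height * width;
  if (src.length !== expected) {
    throw new RangeError(`expected ${expected} values, got ${src.length}`);
  }
  const dst = new Float32Array(src.length);
  for (let c = 0; c < channels; c++) {
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        // NCHW offset: c*H*W + y*W + x  ->  NHWC offset: (y*W + x)*C + c
        dst[(y * width + x) * channels + c] = src[c * height * width + y * width + x];
      }
    }
  }
  return dst;
}
```

For a 3-channel, 1x2 image stored as `[r0, r1, g0, g1, b0, b1]`, the result is `[r0, g0, b0, r1, g1, b1]`. The reordered buffer can then go straight into `tf.tensor4d(nhwc, [1, 160, 160, 3])`, avoiding the intermediate transposed tensor.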
tests:
- Integration: model loads from `public/models/plant-disease-classifier/model.json` without errors
- Integration: `model.inputs[0].shape` is `[-1, 160, 160, 3]`
- Integration: `model.outputs[0].shape` is `[-1, 38]`
- Integration: inference on random tensor produces [38] float output
- Integration: if output is probabilities, sum is within 0.99-1.01
- Integration: `getStatus()` returns `{ loaded: true, backend: "tfjs", numClasses: 38 }`
- Unit: `validateInput()` correctly rejects tensors with wrong length
- Unit: NCHW → NHWC transpose produces correct layout
- Performance: inference completes in < 500ms on a typical server (with tfjs-node)
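Step 1's zero-tensor check and the probability-sum test both come down to a small predicate on the output vector. A minimal sketch, assuming the scores have already been copied out of the tensor (for example via `await output.data()`); `classifyModelOutput` is hypothetical, and the 0.01 tolerance is chosen to match the 0.99-1.01 window above.

```typescript
export type OutputKind = "probabilities" | "logits";

// Softmax output is in [0, 1] and sums to ~1; anything else is treated as raw
// logits that still need a softmax. NaN values fail the sum check below.
export function classifyModelOutput(
  output: ArrayLike<number>,
  tolerance = 0.01,
): OutputKind {
  let sum = 0;
  for (let i = 0; i < output.length; i++) {
    const v = output[i];
    if (v < 0 || v > 1) return "logits";
    sum += v;
  }
  return Math.abs(sum - 1) <= tolerance ? "probabilities" : "logits";
}
```

One vector that happens to sum to 1 is weak evidence on its own, so it is worth repeating the check on several inputs and confirming against the `Softmax` node in the graph, as step 1 describes.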
acceptance_criteria:
- `getModel()` returns a model with `loaded: true` and `backend: "tfjs"`
- `model.predict()` on a valid [1, 160, 160, 3] input returns [1, 38] output without errors
- Output interpretation is correctly determined (logits vs probabilities) and handled
- `@tensorflow/tfjs-node` is installed and used as primary backend
- No memory leaks: tensor count stays stable after repeated inference calls
- Fallback chain works: tfjs-node → tfjs → mock (each failure logs warning)
- Model load time < 30 seconds on first request
- Inference time < 500ms per image on server
validation:
- `npm install @tensorflow/tfjs-node` — native bindings install successfully
- `npx vitest run src/lib/ml/model-loader.test.ts` — all loading tests pass
- `npx vitest run src/lib/ml/inference.test.ts` — all inference tests pass
- Manual: `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<existing-id>"}'` — returns real predictions (no `demo_mode: true`)
- Check server logs for `[model-loader] Loaded TF.js model` (not mock fallback)
notes:
- The model file `best_mnv2_pv_original.keras` is the original Keras file — the TF.js conversion is already done (model.json + 3 weight shards)
- The `.keras` file can be deleted after confirming TF.js works, saving ~27MB
- `@tensorflow/tfjs-node` requires libtensorflow — it downloads automatically during npm install
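- The `file://` protocol for `loadGraphModel` works with `@tensorflow/tfjs-node` but may not work with browser tfjs (which loads over `fetch`). For the browser tfjs fallback, read `model.json` and the weight shards from disk and pass a custom `IOHandler` (an object with a `load()` method, or one built with `tf.io.fromMemory()`) to `tf.loadGraphModel()`
- `preprocessImageBuffer()` applies ImageNet mean/std normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]). Verify this matches what the PlantVillage model was trained with: MobileNetV2 backbones are commonly initialized from ImageNet weights, but Keras's `tf.keras.applications.mobilenet_v2.preprocess_input` scales pixels to [-1, 1] (`x / 127.5 - 1`) instead of using mean/std, so a model trained through that Keras pipeline expects the [-1, 1] range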
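The two preprocessing conventions produce noticeably different input ranges, which is why the training code is worth checking. A minimal sketch of both per-pixel transforms for an RGB value in 0-255; the function names are hypothetical, the constants are the ImageNet mean/std quoted above, and the second function reproduces the `x / 127.5 - 1` scaling of Keras's MobileNetV2 `preprocess_input`.

```typescript
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];

// Mean/std normalization (the convention preprocessImageBuffer() currently uses).
export function imagenetNormalize(value255: number, channel: 0 | 1 | 2): number {
  return (value255 / 255 - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel];
}

// Keras MobileNetV2 convention: scale to [-1, 1], no per-channel statistics.
export function kerasMobileNetV2Normalize(value255: number): number {
  return value255 / 127.5 - 1;
}
```

For a black pixel the red channel comes out at about -2.12 under ImageNet normalization versus -1.0 under the Keras convention, so using the wrong one shifts every value the model sees.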


@@ -0,0 +1,207 @@
# 04. Confidence Calibration for PlantVillage Model
meta:
id: production-ml-pipeline-04
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-03]
tags: [implementation, ml, tests-required]
objective:
- Implement proper confidence calibration for the PlantVillage model's softmax output
- Replace the trivial `raw * 1.02` linear calibration with temperature scaling or entropy-based confidence
- Produce meaningful confidence labels (high/medium/low) that correlate with actual correctness
- Handle the "healthy" class output correctly (healthy predictions need different confidence interpretation)
deliverables:
- `src/lib/ml/confidence.ts` — rewritten calibration with temperature scaling
- `src/lib/ml/calibration-params.ts` — calibration parameters (temperature, bias) for PlantVillage model
- `src/lib/ml/confidence.test.ts` — updated tests for new calibration logic
- `scripts/calibrate-model.ts` — script to compute optimal temperature from validation data
steps:
1. **Determine output type** — based on task 03's findings:
- If model output is already softmax probabilities: use entropy-based confidence or inverse-softmax + temperature scaling
- If model output is logits: apply temperature-scaled softmax directly
2. **Implement temperature scaling**:
```typescript
// src/lib/ml/confidence.ts
// softmaxFloat32 is the existing softmax helper (currently called from inference.ts)
const DEFAULT_TEMPERATURE = 1.5; // Starting default for PlantVillage (plausible range 1.0-3.0)
export function temperatureScaledSoftmax(
  logits: Float32Array,
  temperature: number = DEFAULT_TEMPERATURE,
): Float32Array {
  if (!(temperature > 0)) throw new RangeError("temperature must be positive");
  const scaled = new Float32Array(logits.length);
  for (let i = 0; i < logits.length; i++) {
    scaled[i] = logits[i] / temperature;
  }
  return softmaxFloat32(scaled);
}
```
- Temperature > 1.0 softens the distribution (less confident, more uniform)
- Temperature < 1.0 sharpens the distribution (more confident)
- Temperature = 1.0 is standard softmax (no calibration)
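- A plausible starting range for MobileNetV2 on PlantVillage is 1.2-1.8; the actual optimum depends on the training run and should come from the calibration script (step 8)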
3. **Implement entropy-based confidence**:
```typescript
export function computeEntropy(probabilities: Float32Array): number {
  let entropy = 0;
  for (let i = 0; i < probabilities.length; i++) {
    if (probabilities[i] > 1e-10) {
      entropy -= probabilities[i] * Math.log(probabilities[i]);
    }
  }
  return entropy;
}
export function entropyToConfidence(
  entropy: number,
  maxEntropy: number, // ln(numClasses)
): number {
  // Normalize entropy to [0, 1], then invert (low entropy = high confidence)
  const normalized = entropy / maxEntropy;
  return 1 - normalized;
}
```
- For 38 classes: `maxEntropy = Math.log(38) ≈ 3.64`
- Entropy close to 0 → one class dominates → high confidence
- Entropy close to max → uniform distribution → low confidence
4. **Implement combined calibration**:
```typescript
export function calibratePrediction(
  output: Float32Array,
  isLogits: boolean,
  temperature: number = DEFAULT_TEMPERATURE,
): ConfidenceResult {
  // Get probabilities (apply softmax if logits, or use directly if already probabilities)
  const probs = isLogits ? temperatureScaledSoftmax(output, temperature) : output;
  // Get top prediction
  let maxIdx = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[maxIdx]) maxIdx = i;
  }
  const topProb = probs[maxIdx];
  // Compute entropy-based confidence
  const entropy = computeEntropy(probs);
  const maxEntropy = Math.log(probs.length);
  const entropyConfidence = entropyToConfidence(entropy, maxEntropy);
  // Combine: weighted average of top probability and entropy confidence, clamped to [0, 1]
  const adjusted = Math.min(1, Math.max(0, 0.7 * topProb + 0.3 * entropyConfidence));
  return {
    raw: topProb,
    adjusted,
    label: getConfidenceLabel(adjusted), // label derived from the clamped value
    entropy,
    entropyConfidence, // consumed by InferenceResult.calibration (step 9)
    classIndex: maxIdx,
  };
}
```
5. **Update `getConfidenceLabel` thresholds** for PlantVillage's 38-class output:
```typescript
const CONFIDENCE_THRESHOLDS = {
HIGH: 0.65, // Lowered from 0.8 — PlantVillage softmax is less peaked
MEDIUM: 0.35, // Lowered from 0.5
} as const;
```
- With 38 classes, even correct predictions may have lower top probability
- These thresholds should be tuned against a validation set (start with defaults, adjust after testing)
6. **Handle healthy class confidence**:
- When the top prediction is a healthy class (index 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37), the confidence represents "how confident the model is the plant is healthy"
- Healthy predictions with high confidence → "No disease detected" (good)
- Healthy predictions with low confidence → "Uncertain — may have early symptoms"
- Update `calibrateConfidence()` to accept an `isHealthy` flag and adjust the label accordingly
7. **Create calibration parameter module**:
```typescript
// src/lib/ml/calibration-params.ts
export const PLANTVILLAGE_CALIBRATION = {
temperature: 1.5,
confidenceHigh: 0.65,
confidenceMedium: 0.35,
maxEntropy: Math.log(38),
entropyWeight: 0.3,
probabilityWeight: 0.7,
} as const;
```
8. **Create calibration script** `scripts/calibrate-model.ts`:
- Load the model
- Run inference on a set of labeled validation images (from PlantVillage validation split)
- Compute optimal temperature using Nelder-Mead or grid search on negative log-likelihood
- Output the optimal temperature value
- This is optional — start with default 1.5 and refine later
9. **Update `InferenceResult` type** to include calibration metadata:
```typescript
export interface InferenceResult {
  predictions: RawPrediction[];
  inferenceTimeMs: number;
  calibration?: {
    temperature: number;
    entropy: number;
    entropyConfidence: number;
  };
}
```
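Step 8 leaves the optimizer open (Nelder-Mead or grid search). For a single scalar, a grid search over T that minimizes mean negative log-likelihood is short enough to sketch. This assumes logits are available; for probability outputs, `Math.log(p)` can stand in for them because softmax ignores a constant shift. `fitTemperature`, `Sample`, and the grid bounds are hypothetical choices, not values derived from the PlantVillage model.

```typescript
interface Sample {
  logits: ArrayLike<number>; // raw scores, or Math.log(p) for probability outputs
  label: number; // index of the true class
}

// log softmax(logits / T)[label], computed stably via the log-sum-exp trick
function logProbAt(logits: ArrayLike<number>, label: number, temperature: number): number {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) max = Math.max(max, logits[i] / temperature);
  let sumExp = 0;
  for (let i = 0; i < logits.length; i++) sumExp += Math.exp(logits[i] / temperature - max);
  return logits[label] / temperature - max - Math.log(sumExp);
}

// Mean negative log-likelihood of the true labels at temperature T
export function meanNll(samples: readonly Sample[], temperature: number): number {
  let total = 0;
  for (const s of samples) total -= logProbAt(s.logits, s.label, temperature);
  return total / samples.length;
}

// Grid search over T in [min, max]; T = 1 (no calibration) is the baseline to beat
export function fitTemperature(
  samples: readonly Sample[],
  min = 0.5,
  max = 5,
  step = 0.05,
): { temperature: number; nll: number } {
  let best = { temperature: 1, nll: meanNll(samples, 1) };
  const steps = Math.round((max - min) / step);
  for (let k = 0; k <= steps; k++) {
    const temperature = min + k * step; // integer counter avoids float drift
    const nll = meanNll(samples, temperature);
    if (nll < best.nll) best = { temperature, nll };
  }
  return best;
}
```

As a sanity check: a toy overconfident model that always scores class 0 at +3 over class 1, but is right only 3 times in 4, is best calibrated at T = 3 / ln 3 ≈ 2.73, where its predicted 0.75 matches the observed accuracy; the grid search lands within one step of that.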
tests:
- Unit: `temperatureScaledSoftmax` with T=1.0 equals standard softmax
- Unit: `temperatureScaledSoftmax` with T=2.0 produces more uniform distribution than T=1.0
- Unit: `computeEntropy` of uniform distribution = `Math.log(38)` ≈ 3.64
- Unit: `computeEntropy` of one-hot distribution = 0
- Unit: `entropyToConfidence(0, maxEntropy)` = 1.0 (maximum confidence)
- Unit: `entropyToConfidence(maxEntropy, maxEntropy)` = 0.0 (minimum confidence)
- Unit: `calibratePrediction` with high-peak input returns high confidence
- Unit: `calibratePrediction` with flat input returns low confidence
- Unit: `getConfidenceLabel(0.7)` returns "high"
- Unit: `getConfidenceLabel(0.4)` returns "medium"
- Unit: `getConfidenceLabel(0.2)` returns "low"
- Integration: calibration on known PlantVillage test image produces reasonable confidence
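Step 6 asks for healthy predictions to be worded differently from disease predictions. A minimal sketch of one way to do that, assuming the 0.65 / 0.35 thresholds from step 5; `describePrediction`, its result shape, and the message strings (adapted from step 6) are hypothetical placeholders, and `getConfidenceLabel` is redeclared so the snippet stands alone.

```typescript
type ConfidenceLabel = "high" | "medium" | "low";

const THRESHOLDS = { HIGH: 0.65, MEDIUM: 0.35 } as const;

export function getConfidenceLabel(adjusted: number): ConfidenceLabel {
  if (adjusted >= THRESHOLDS.HIGH) return "high";
  if (adjusted >= THRESHOLDS.MEDIUM) return "medium";
  return "low";
}

// A confident "healthy" is good news; an unconfident one should not be
// presented as a clean bill of health.
export function describePrediction(
  adjusted: number,
  isHealthy: boolean,
): { label: ConfidenceLabel; message: string } {
  const label = getConfidenceLabel(adjusted);
  if (!isHealthy) {
    return { label, message: `Disease detected (${label} confidence)` };
  }
  return label === "high"
    ? { label, message: "No disease detected" }
    : { label, message: "Uncertain: may have early symptoms" };
}
```

Keeping the threshold logic in one function means the healthy and disease paths can never disagree about what "high" means.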
acceptance_criteria:
- `calibratePrediction()` produces meaningful confidence scores that correlate with prediction quality
- Temperature scaling is implemented and configurable (default T=1.5)
- Entropy-based confidence is implemented
- Combined calibration (weighted probability + entropy) is the default
- Healthy class predictions are handled correctly
- Confidence thresholds are tuned for 38-class output (HIGH ≥ 0.65, MEDIUM ≥ 0.35)
- All unit tests pass
- Calibration parameters are documented and configurable
validation:
- `npx vitest run src/lib/ml/confidence.test.ts`
- Manual: run identification on a known disease image → confidence should be "high" (≥ 0.65)
- Manual: run identification on a random/unrelated image → confidence should be "low" (< 0.35)
- Check server logs: entropy values should fall between 0 and ln(38) ≈ 3.64, near 0 for confident predictions and near the maximum for flat outputs
notes:
- Temperature scaling is a post-hoc calibration method — it doesn't change the model, only the confidence interpretation
- The default temperature of 1.5 is a reasonable starting point for MobileNetV2 on PlantVillage. Optimal value depends on the specific training run.
- If a validation set of PlantVillage images is available, run `scripts/calibrate-model.ts` to find the optimal temperature
- The entropy-based approach works even without a validation set — it's a model-agnostic confidence measure
- For healthy predictions, consider showing a different UI (e.g., "No disease detected" with confidence) rather than treating them as disease predictions
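When task 03 finds the model already emits probabilities, step 1's "inverse-softmax + temperature scaling" amounts to taking logs, dividing by T, and re-normalizing, which is the same as raising each probability to 1/T and renormalizing. A minimal sketch, assuming a probability vector as `Float32Array`; `rescaleProbabilities` is hypothetical and the `1e-12` floor (to avoid `log(0)`) is an arbitrary choice.

```typescript
// Temperature-scale a probability vector: softmax(log(p) / T).
// Softmax is shift-invariant, so log(p) works in place of the unknown logits.
export function rescaleProbabilities(probs: Float32Array, temperature: number): Float32Array {
  if (!(temperature > 0)) throw new RangeError("temperature must be positive");
  const scaled = new Float64Array(probs.length);
  let max = -Infinity;
  for (let i = 0; i < probs.length; i++) {
    scaled[i] = Math.log(Math.max(probs[i], 1e-12)) / temperature;
    if (scaled[i] > max) max = scaled[i];
  }
  let sum = 0;
  for (let i = 0; i < scaled.length; i++) {
    scaled[i] = Math.exp(scaled[i] - max); // subtract max for numerical stability
    sum += scaled[i];
  }
  const out = new Float32Array(probs.length);
  for (let i = 0; i < out.length; i++) out[i] = scaled[i] / sum;
  return out;
}
```

With T = 1 the input comes back unchanged (up to rounding); with T = 2, `[0.7, 0.2, 0.1]` softens to roughly `[0.52, 0.28, 0.20]`.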


@@ -0,0 +1,279 @@
# 05. Real Model Integration into Identification Pipeline
meta:
id: production-ml-pipeline-05
feature: production-ml-pipeline
priority: P0
depends_on: [production-ml-pipeline-02, production-ml-pipeline-03, production-ml-pipeline-04]
tags: [implementation, integration, tests-required]
objective:
- Wire the real TF.js model into the `/api/identify` endpoint
- Replace demo/mock predictions with real model inference
- Use the PlantVillage label mapping (task 02) to resolve class indices to disease IDs
- Apply confidence calibration (task 04) to produce meaningful confidence scores
- Remove the `demo_mode` fallback path
- Handle healthy class predictions correctly (return "no disease detected" message)
deliverables:
- `src/app/api/identify/route.ts` — rewritten to use real model inference
- `src/lib/ml/inference.ts` — updated to use calibration and return structured results
- `src/lib/api/identify.ts` — client-side API updated for new response shape
- `src/components/ResultsDashboard.tsx` — handle healthy predictions and remove demo mode badge
- `src/components/HealthyResult.tsx` — new component for "no disease detected" state
steps:
1. **Rewrite `/api/identify` route handler** to use real inference:
```typescript
export async function POST(request: NextRequest) {
  // 1. Parse request, validate imageId
  // 2. Load and preprocess image (existing code)
  // 3. Run inference with real model; runInference() (step 2) already applies
  //    calibration and returns top-K predictions over calibrated probabilities
  const { predictions: topK, inferenceTimeMs, calibration } = await runInference(tensor, 5);
  // 4. Map the top-1 class using PlantVillage labels
  const top = topK[0];
  const isHealthy = isHealthyClass(top.classIndex);
  // 5. If healthy, return healthy result
  if (isHealthy && top.probability > 0.5) {
    return NextResponse.json({
      healthy: true,
      plantId: getPlantIdForIndex(top.classIndex),
      confidence: { probability: top.probability, ...calibration },
      metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
    });
  }
  // 6. Enrich top-K predictions (healthy classes are skipped inside enrichPredictions)
  const predictions = await enrichPredictions(topK);
  // 7. Return results
  return NextResponse.json({
    predictions,
    metadata: { model: MODEL_ID, inferenceTimeMs, imageId },
    demo_mode: false, // or remove this field entirely
  });
}
```
2. **Update `runInference()` to return calibrated results**:
```typescript
export async function runInference(
imageTensor: Float32Array,
topK: number = 5,
): Promise<InferenceResult> {
const model = await getModel();
const modelStatus = model.getStatus();
if (!modelStatus.loaded) {
throw new Error("Model not loaded. Cannot run inference.");
}
const { output, inferenceTimeMs } = await model.predict(imageTensor);
// Determine if output is logits or probabilities
const isLogits = !isProbabilities(output);
// Apply calibration
const calibration = calibratePrediction(output, isLogits);
// Get top-K predictions
const probs = isLogits ? temperatureScaledSoftmax(output) : output;
const topKPredictions = getTopKFloat32(probs, topK);
return {
predictions: topKPredictions,
inferenceTimeMs,
calibration: {
temperature: PLANTVILLAGE_CALIBRATION.temperature,
entropy: calibration.entropy,
entropyConfidence: calibration.entropyConfidence,
},
};
}
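// Treat the output as probabilities only if every value lies in [0, 1] and the
// values sum to ~1; logits can be negative or larger than 1
function isProbabilities(output: Float32Array): boolean {
  if (!output.every((v) => v >= 0 && v <= 1)) return false;
  const sum = output.reduce((a, b) => a + b, 0);
  return Math.abs(sum - 1.0) < 0.01;
}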
```
3. **Update `enrichPredictions()` to use new label mapping**:
```typescript
async function enrichPredictions(
topPredictions: Array<{ classIndex: number; probability: number }>,
): Promise<PredictionResult[]> {
const results: PredictionResult[] = [];
for (const pred of topPredictions) {
// Skip healthy classes in top-K (they're handled separately)
if (isHealthyClass(pred.classIndex)) continue;
const diseaseId = getDiseaseIdForIndex(pred.classIndex);
if (!diseaseId || diseaseId === "healthy") continue;
const disease = await getDiseaseById(diseaseId);
if (!disease) continue;
// Use probability as raw confidence, calibrate with entropy
const confidence = calibrateConfidence(pred.probability);
const plant = await getPlantById(disease.plantId).catch(() => null);
results.push({
diseaseId,
disease,
confidence,
lookalikes: disease.lookalikeDiseaseIds,
plant: plant ?? null,
});
}
results.sort((a, b) => b.confidence.adjusted - a.confidence.adjusted);
return results;
}
```
4. **Update response types** to support healthy result:
```typescript
// src/lib/types.ts
export interface IdentifyResponse {
predictions?: PredictionResult[];
healthy?: boolean;
plantId?: string;
confidence?: ConfidenceResult;
metadata: InferenceMetadata;
demo_mode?: boolean; // Remove or always false
}
```
5. **Update `ResultsDashboard` component** to handle healthy result:
```tsx
// If response.healthy === true, show HealthyResult component instead of prediction cards
if (response?.healthy) {
return <HealthyResult plantId={response.plantId} confidence={response.confidence} />;
}
```
6. **Create `HealthyResult` component** `src/components/HealthyResult.tsx`:
```tsx
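"use client";
import type { ConfidenceResult } from "@/lib/types";
interface HealthyResultProps {
  plantId?: string;
  confidence?: ConfidenceResult;
}
export default function HealthyResult({ plantId, confidence }: HealthyResultProps) {
  // usePlant is a data-fetching hook for plant details (to be implemented)
  const plant = usePlant(plantId);
  return (
    <div className="...">
      <div className="text-6xl">🌿</div>
      <h2>No Disease Detected</h2>
      <p>
        The image appears healthy{plant ? ` (${plant.commonName})` : ""}.
        {confidence && ` Confidence: ${Math.round(confidence.adjusted * 100)}%`}
      </p>
      <p className="text-sm text-zinc-500">
        If symptoms persist, try uploading a clearer photo of the affected area.
      </p>
    </div>
  );
}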
```
7. **Remove demo mode logic**:
- In `model-loader.ts`: remove `createMockModel()` fallback (or keep it but only for development)
- In `route.ts`: remove `demo_mode: true` branch
- In `ResultsDashboard.tsx`: remove "Demo mode" badge
- In `src/lib/api/identify.ts`: remove `demo_mode` from response type
8. **Add error handling for model not loaded**:
```typescript
const model = await getModel();
if (!model.getStatus().loaded) {
return NextResponse.json(
{
error: "Model not available",
message: "ML model failed to load. Please try again later.",
},
{ status: 503 },
);
}
```
9. **Update client-side API** `src/lib/api/identify.ts`:
```typescript
export interface IdentifyResponse {
predictions?: PredictionResult[];
healthy?: boolean;
plantId?: string;
confidence?: ConfidenceResult;
metadata: InferenceMetadata;
}
```
10. **Add structured logging** for inference requests:
```typescript
console.log(
JSON.stringify({
event: "inference",
imageId,
modelId: MODEL_ID,
inferenceTimeMs,
topPrediction: predictions[0]?.diseaseId,
confidence: predictions[0]?.confidence.adjusted,
entropy: calibration?.entropy,
}),
);
```
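
Task 04's calibration helpers are not reproduced in this task. As a rough sketch of what `temperatureScaledSoftmax()` and the `entropy`/`entropyConfidence` fields might compute, assuming a single scalar temperature and entropy normalized by the log of the class count (all function names below are hypothetical stand-ins):

```typescript
// Hypothetical stand-ins for the task 04 calibration helpers (sketch only).

/** Softmax of logits / temperature; subtracting the max keeps exp() from overflowing. */
function softmaxWithTemperature(logits: Float32Array, temperature: number): Float32Array {
  if (temperature <= 0) throw new Error("temperature must be positive");
  let max = -Infinity;
  for (const v of logits) max = Math.max(max, v / temperature);
  const out = new Float32Array(logits.length);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    out[i] = Math.exp(logits[i] / temperature - max);
    sum += out[i];
  }
  for (let i = 0; i < out.length; i++) out[i] /= sum;
  return out;
}

/** Shannon entropy divided by ln(numClasses): 0 for a one-hot output, 1 for uniform. */
function normalizedEntropy(probs: Float32Array): number {
  if (probs.length < 2) return 0;
  let h = 0;
  for (const p of probs) if (p > 0) h -= p * Math.log(p);
  return h / Math.log(probs.length);
}

/** The k most probable classes, highest first. */
function topK(probs: Float32Array, k: number): Array<{ classIndex: number; probability: number }> {
  return Array.from(probs, (probability, classIndex) => ({ classIndex, probability }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k);
}
```

A temperature above 1 flattens the distribution and one below 1 sharpens it; deriving `entropyConfidence` as `1 - normalizedEntropy(probs)` is one plausible choice, not necessarily what task 04 implements.
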
tests:
- Integration: POST `/api/identify` with valid imageId returns real predictions (no `demo_mode: true`)
- Integration: response includes `predictions` array with valid diseaseIds from KB
- Integration: confidence scores are calibrated (not raw softmax)
- Integration: healthy predictions return `healthy: true` with plantId
- Unit: `enrichPredictions()` skips healthy classes in top-K
- Unit: `isProbabilities()` correctly identifies probability output
- Unit: `runInference()` throws error if model not loaded
- E2E: upload a tomato leaf image → get tomato disease predictions
- E2E: upload a healthy plant image → get healthy result
acceptance_criteria:
- `/api/identify` returns real model predictions (not mock)
- All diseaseIds in response are valid KB entries (verifiable via `getDiseaseById()`)
- Confidence scores use temperature-scaled calibration (not raw softmax)
- Healthy predictions return `{ healthy: true, plantId, confidence }` instead of disease predictions
- Demo mode is completely removed from production path
- Error handling: model not loaded → 503 response with clear message
- Structured logging for every inference request
- Client-side API handles new response shape (healthy vs predictions)
validation:
- `npx vitest run src/app/api/identify/identify.test.ts`
- `npx vitest run src/lib/ml/inference.test.ts`
- `curl -X POST http://localhost:3000/api/identify -H "Content-Type: application/json" -d '{"imageId":"<test-id>"}'` — response has real predictions
- Upload a test image via UI → see real disease names (not demo mode)
- Check server logs: `event: "inference"` with valid modelId and inferenceTimeMs
notes:
- This task depends on tasks 02, 03, and 04 being complete. Do not start until all dependencies are met.
- The `enrichPredictions()` function now skips healthy classes — they're handled by the healthy result path
- If the model is not loaded, return 503 (Service Unavailable) instead of falling back to mock
- Structured logging should be JSON for easy parsing by log aggregators
- The `demo_mode` field can be removed entirely or kept as `false` for backwards compatibility
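
The client now receives one of two shapes: a healthy result or a list of disease predictions. A minimal sketch of how `ResultsDashboard` could branch on them with a TypeScript type guard, assuming the `IdentifyResponse` fields from step 4 (the trimmed interfaces and the `describeResponse` helper are illustrative, not the project's actual types):

```typescript
// Trimmed, illustrative versions of the step 4 types.
interface ConfidenceResult {
  adjusted: number;
  label?: string;
}
interface PredictionResult {
  diseaseId: string;
  confidence: ConfidenceResult;
}
interface IdentifyResponse {
  predictions?: PredictionResult[];
  healthy?: boolean;
  plantId?: string;
  confidence?: ConfidenceResult;
  metadata: { model: string; inferenceTimeMs: number; imageId: string };
}
type HealthyResponse = IdentifyResponse & {
  healthy: true;
  plantId: string;
  confidence: ConfidenceResult;
};

/** Narrows a response to the healthy shape returned by /api/identify. */
function isHealthyResponse(res: IdentifyResponse): res is HealthyResponse {
  return res.healthy === true && typeof res.plantId === "string" && res.confidence !== undefined;
}

/** Hypothetical helper: a one-line summary the UI could render. */
function describeResponse(res: IdentifyResponse): string {
  if (isHealthyResponse(res)) {
    return `healthy ${res.plantId} (${Math.round(res.confidence.adjusted * 100)}%)`;
  }
  const top = res.predictions?.[0];
  return top ? `${top.diseaseId} (${Math.round(top.confidence.adjusted * 100)}%)` : "no prediction";
}
```

Checking `healthy` first matters because a healthy response carries no `predictions` array at all.
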

@@ -0,0 +1,284 @@
# 06. Plant-Context-Aware Identification
meta:
id: production-ml-pipeline-06
feature: production-ml-pipeline
priority: P2
depends_on: [production-ml-pipeline-05]
tags: [implementation, ux, tests-required]
objective:
- Allow users to optionally specify which plant they're diagnosing before identification
- Boost predictions for the selected plant's diseases (multiply their probabilities by a plant-context boost factor)
- Update the upload flow to include optional plant selection
- Improve prediction accuracy when plant context is known
deliverables:
- `src/app/api/identify/route.ts` — accept optional `plantId` parameter
- `src/lib/ml/plant-context.ts` — new module for plant-context scoring adjustment
- `src/components/PlantSelector.tsx` — new component for optional plant selection
- `src/app/upload/page.tsx` — integrate PlantSelector before upload
- `src/lib/api/identify.ts` — client API updated to pass plantId
steps:
1. **Create plant-context scoring module** `src/lib/ml/plant-context.ts`:
```typescript
import { PLANTVILLAGE_CLASSES } from "./plantvillage-classes";
/**
* Adjust prediction scores based on plant context.
* If plantId is provided, boost predictions for diseases of that plant.
*
* @param predictions - Top-K predictions with classIndex and probability
* @param plantId - Optional plant ID from user selection
* @param boostFactor - Multiplier for matching plant diseases (default 1.5)
* @returns Adjusted predictions with updated probabilities
*/
export function applyPlantContext(
predictions: Array<{ classIndex: number; probability: number }>,
plantId: string | null,
boostFactor: number = 1.5,
): Array<{ classIndex: number; probability: number; contextBoosted: boolean }> {
if (!plantId) {
return predictions.map((p) => ({ ...p, contextBoosted: false }));
}
// Find which class indices belong to this plant
const plantIndices = new Set(
PLANTVILLAGE_CLASSES.filter((c) => c.plantId === plantId && !c.isHealthy).map(
(c) => c.index,
),
);
return predictions.map((pred) => {
const matchesPlant = plantIndices.has(pred.classIndex);
return {
classIndex: pred.classIndex,
probability: matchesPlant
? Math.min(1.0, pred.probability * boostFactor)
: pred.probability,
contextBoosted: matchesPlant,
};
});
}
```
2. **Update `/api/identify` route** to accept `plantId`:
```typescript
export async function POST(request: NextRequest) {
  const body = (await request.json()) as IdentifyRequest;
  const { imageId } = body;
  // plantId is optional; a missing or empty value means "no context"
  const plantId = body.plantId || null;
  // ... existing preprocessing ...
  // runInference() (task 05) returns top-K predictions as { classIndex, probability }
  const { predictions: topK, inferenceTimeMs } = await runInference(tensor, 5);
  // ... healthy-result check from task 05 ...
  // Apply plant context if provided
  const adjusted = applyPlantContext(topK, plantId);
  // Enrich with KB data; enrichPredictions() must copy `contextBoosted` onto each result
  const predictions = await enrichPredictions(adjusted);
  return NextResponse.json({
    predictions,
    metadata: { model: MODEL_ID, inferenceTimeMs, imageId, plantContext: plantId },
  });
}
```
3. **Update `IdentifyRequest` type**:
```typescript
// src/lib/types.ts
export interface IdentifyRequest {
imageId: string;
plantId?: string; // Optional plant context
}
```
4. **Create `PlantSelector` component** `src/components/PlantSelector.tsx`:
```tsx
"use client";
import { useState, useEffect } from "react";
interface Plant {
id: string;
commonName: string;
imageUrl?: string;
}
export default function PlantSelector({
value,
onChange,
}: {
value: string | null;
onChange: (plantId: string | null) => void;
}) {
const [plants, setPlants] = useState<Plant[]>([]);
const [search, setSearch] = useState("");
useEffect(() => {
  fetch("/api/plants?limit=50")
    .then((r) => (r.ok ? r.json() : { items: [] }))
    .then((data) => setPlants(data.items ?? []))
    .catch(() => setPlants([])); // network error: fall back to an empty list
}, []);
const filtered = plants.filter((p) =>
p.commonName.toLowerCase().includes(search.toLowerCase()),
);
return (
<div className="...">
<label>Plant (optional)</label>
<input
type="text"
placeholder="Search plants..."
value={search}
onChange={(e) => setSearch(e.target.value)}
/>
{value && (
<div className="...">
Selected: {plants.find((p) => p.id === value)?.commonName}
<button onClick={() => onChange(null)}>Clear</button>
</div>
)}
<ul>
  {filtered.slice(0, 10).map((plant) => (
    <li key={plant.id}>
      {/* a button keeps each option keyboard-accessible */}
      <button type="button" onClick={() => onChange(plant.id)}>
        {plant.commonName}
      </button>
    </li>
  ))}
</ul>
</div>
);
}
```
5. **Update upload page** to include plant selector:
```tsx
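// src/app/upload/page.tsx
"use client";
import { useCallback, useState } from "react";
import { useRouter } from "next/navigation";
// ... plus existing imports: PlantSelector, ImageUpload, uploadImage, identifyPlant ...
export default function UploadPage() {
  const router = useRouter();
  const [selectedPlant, setSelectedPlant] = useState<string | null>(null);
  const handleUpload = useCallback(
    async (file: File) => {
      // 1. Upload image
      const { imageId } = await uploadImage(file);
      // 2. Identify with plant context (undefined means no context)
      await identifyPlant(imageId, selectedPlant ?? undefined);
      // 3. Navigate to results, carrying the plant context in the URL (read in step 9)
      const query = selectedPlant ? `?plant=${encodeURIComponent(selectedPlant)}` : "";
      router.push(`/results/${imageId}${query}`);
    },
    [router, selectedPlant],
  );
  return (
    <div>
      <PlantSelector value={selectedPlant} onChange={setSelectedPlant} />
      <ImageUpload onUpload={handleUpload} />
    </div>
  );
}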
```
6. **Update client-side API** to pass plantId:
```typescript
// src/lib/api/identify.ts
export async function identifyPlant(
  imageId: string,
  plantId?: string | null,
): Promise<IdentifyResponse> {
  const body: IdentifyRequest = { imageId };
  if (plantId) body.plantId = plantId; // omit the field when no plant is selected
  const response = await fetch("/api/identify", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Identify failed: ${response.status}`);
  }
  return response.json();
}
```
7. **Update `PredictionResult` type** to include context boost info:
```typescript
export interface PredictionResult {
diseaseId: string;
disease: Disease;
confidence: ConfidenceResult;
lookalikes: string[];
plant: Plant | null;
contextBoosted?: boolean; // true if boosted by plant context
}
```
8. **Update `ResultsDashboard`** to show context boost indicator:
```tsx
{prediction.contextBoosted && (
  <span className="text-xs text-leaf-green-600">✓ Matches selected plant</span>
)}
```
9. **Store plant context in results page** — pass plantId through URL or state:
```typescript
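// src/app/results/[imageId]/page.tsx (client component)
import { useSearchParams } from "next/navigation";
// Inside the component: read the optional plant context written by the upload page
const plantId = useSearchParams().get("plant") ?? undefined;
// Inside the effect or handler that loads the results:
const response = await identifyPlant(imageId, plantId);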
```
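
Step 9 leaves open whether the plant context travels via the URL or via state. If the URL is used, a pair of small helpers keeps the upload page and the results page agreeing on the `plant` query parameter. This is a sketch assuming the `/results/[imageId]` route and the `plant` parameter from step 9; `buildResultsUrl` and `readPlantContext` are hypothetical names:

```typescript
/** Builds /results/<imageId>, adding ?plant=<plantId> only when a plant was selected. */
function buildResultsUrl(imageId: string, plantId: string | null): string {
  const path = `/results/${encodeURIComponent(imageId)}`;
  if (!plantId) return path;
  const params = new URLSearchParams({ plant: plantId });
  return `${path}?${params.toString()}`;
}

/** Reads the optional plant context back; an empty value counts as "no context". */
function readPlantContext(search: string): string | undefined {
  const plant = new URLSearchParams(search).get("plant");
  return plant ? plant : undefined;
}
```

`URLSearchParams` handles the encoding in both directions, so plant IDs containing spaces or punctuation round-trip unchanged.
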
tests:
- Unit: `applyPlantContext()` with no plantId returns predictions unchanged
- Unit: `applyPlantContext()` with plantId="tomato" boosts tomato disease predictions
- Unit: boosted probabilities are capped at 1.0
- Unit: non-matching plant predictions are unchanged
- Unit: `contextBoosted` flag is set correctly
- Integration: POST `/api/identify` with plantId returns boosted predictions
- Integration: POST `/api/identify` without plantId returns normal predictions
- E2E: select "Tomato" in UI → upload tomato leaf → tomato diseases appear first
acceptance_criteria:
- Plant context is optional — identification works without it
- When plantId is provided, predictions for that plant's diseases are boosted by 1.5x
- Boosted probabilities are capped at 1.0
- `contextBoosted` flag is set on boosted predictions
- UI shows "Matches selected plant" indicator on boosted predictions
- Plant selector component works (search, select, clear)
- Upload flow includes optional plant selection step
- Results page receives and displays plant context
validation:
- `npx vitest run src/lib/ml/plant-context.test.ts`
- `npx vitest run src/components/PlantSelector.test.tsx`
- Manual: select "Tomato" → upload image → tomato diseases appear with boost indicator
- Manual: don't select plant → upload image → normal predictions (no boost)
- Check API response: `predictions[0].contextBoosted` is true when plant matches
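
How much the boost can help depends on how many disease classes each plant has. A sketch that counts them from the canonical 38 PlantVillage labels (the `Plant___Condition` naming listed in task 01), assuming that a condition named `healthy` marks the healthy class; `diseaseClassesPerPlant` is a hypothetical helper:

```typescript
// Canonical PlantVillage labels in model output order (index 0-37).
const LABELS: string[] = [
  "Apple___Apple_scab", "Apple___Black_rot", "Apple___Cedar_apple_rust", "Apple___healthy",
  "Blueberry___healthy",
  "Cherry_(including_sour)___Powdery_mildew", "Cherry_(including_sour)___healthy",
  "Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot", "Corn_(maize)___Common_rust_",
  "Corn_(maize)___Northern_Leaf_Blight", "Corn_(maize)___healthy",
  "Grape___Black_rot", "Grape___Esca_(Black_Measles)",
  "Grape___Leaf_blight_(Isariopsis_Leaf_Spot)", "Grape___healthy",
  "Orange___Haunglongbing_(Citrus_greening)",
  "Peach___Bacterial_spot", "Peach___healthy",
  "Pepper,_bell___Bacterial_spot", "Pepper,_bell___healthy",
  "Potato___Early_blight", "Potato___Late_blight", "Potato___healthy",
  "Raspberry___healthy", "Soybean___healthy", "Squash___Powdery_mildew",
  "Strawberry___Leaf_scorch", "Strawberry___healthy",
  "Tomato___Bacterial_spot", "Tomato___Early_blight", "Tomato___Late_blight",
  "Tomato___Leaf_Mold", "Tomato___Septoria_leaf_spot",
  "Tomato___Spider_mites Two-spotted_spider_mite", "Tomato___Target_Spot",
  "Tomato___Tomato_Yellow_Leaf_Curl_Virus", "Tomato___Tomato_mosaic_virus", "Tomato___healthy",
];

/** Counts non-healthy classes per plant; plants with only a healthy class map to 0. */
function diseaseClassesPerPlant(labels: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const label of labels) {
    const [plant, condition] = label.split("___");
    const current = counts.get(plant) ?? 0;
    counts.set(plant, condition === "healthy" ? current : current + 1);
  }
  return counts;
}
```

This gives 9 for Tomato, 3 each for Apple, Corn, and Grape, and 0 for Blueberry, Raspberry, and Soybean, for which `applyPlantContext()` boosts nothing.
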
notes:
- Plant context is a scoring heuristic, not a hard filter. It boosts confidence but doesn't exclude other predictions.
- The default boost factor is 1.5 — this can be tuned based on user feedback.
- Plant selector is optional — users can skip it and get unboosted predictions.
- The plant context feature is most useful when the user knows what plant they're diagnosing but the model is uncertain between multiple diseases.
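- In PlantVillage, each plant has at most 9 disease classes (Tomato has 9; Apple, Corn, and Grape have 3 each; Blueberry, Raspberry, and Soybean have none), so the boost targets a small, specific set of classes without being overly restrictive. Selecting a plant with no disease classes leaves all predictions unboosted.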

@@ -0,0 +1,292 @@
# 07. End-to-End Integration Testing
meta:
id: production-ml-pipeline-07
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-05]
tags: [testing, integration, e2e]
objective:
- Create comprehensive end-to-end tests that validate the full pipeline from image upload to disease diagnosis
- Verify real model inference produces valid, calibrated predictions
- Test all code paths: normal flow, healthy result, error cases, plant context
- Ensure all components work together correctly in a realistic scenario
deliverables:
- `tests/e2e/pipeline.test.ts` — full pipeline E2E tests
- `tests/e2e/fixtures/` — test images and expected results
- `tests/e2e/utils.ts` — test utilities (upload helper, identify helper)
- Updated `vitest.config.ts` — E2E test configuration
steps:
1. **Create test fixtures** `tests/e2e/fixtures/`:
- `tomato-early-blight.jpg` — known tomato early blight image (from PlantVillage test set)
- `tomato-healthy.jpg` — known healthy tomato image
- `unknown-plant.jpg` — unrelated image (should produce low confidence)
- `invalid-image.txt` — non-image file (should fail validation)
- `expected-results.json` — expected disease IDs and confidence ranges for each test image
2. **Create E2E test utilities** `tests/e2e/utils.ts`:
```typescript
import fs from "fs/promises";
import path from "path";
export async function uploadTestImage(
filename: string,
): Promise<{ imageId: string; previewUrl: string }> {
const imagePath = path.join(__dirname, "fixtures", filename);
const imageBuffer = await fs.readFile(imagePath);
const formData = new FormData();
formData.append("image", new Blob([imageBuffer], { type: "image/jpeg" }), filename);
const response = await fetch("http://localhost:3000/api/upload", {
method: "POST",
body: formData,
});
if (!response.ok) {
throw new Error(`Upload failed: ${response.status}`);
}
return response.json();
}
export async function identifyImage(imageId: string, plantId?: string): Promise<any> {
const response = await fetch("http://localhost:3000/api/identify", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ imageId, plantId }),
});
if (!response.ok) {
throw new Error(`Identify failed: ${response.status}`);
}
return response.json();
}
```
3. **Write full pipeline E2E test** `tests/e2e/pipeline.test.ts`:
```typescript
import { describe, it, expect } from "vitest";
import { uploadTestImage, identifyImage } from "./utils";
import expectedResults from "./fixtures/expected-results.json";
describe("End-to-End Pipeline", () => {
describe("Normal flow: disease detection", () => {
it("uploads a tomato early blight image and returns correct diagnosis", async () => {
// 1. Upload
const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
expect(imageId).toBeDefined();
// 2. Identify
const result = await identifyImage(imageId);
// 3. Verify response structure
expect(result.predictions).toBeDefined();
expect(result.predictions.length).toBeGreaterThan(0);
expect(result.metadata).toBeDefined();
expect(result.metadata.model).toBe("plant-classifier-v1");
expect(result.metadata.inferenceTimeMs).toBeGreaterThan(0);
expect(result.demo_mode).toBeFalsy();
// 4. Verify top prediction against expected-results.json
const expected = expectedResults["tomato-early-blight.jpg"];
const topPrediction = result.predictions[0];
expect(topPrediction.diseaseId).toBe(expected.expectedDiseaseId);
expect(topPrediction.disease.name).toContain("Early Blight");
expect(topPrediction.plant.id).toBe(expected.expectedPlantId);
// 5. Verify confidence is calibrated
expect(topPrediction.confidence.adjusted).toBeGreaterThanOrEqual(expected.minConfidence);
expect(topPrediction.confidence.label).toBe(expected.expectedConfidenceLabel);
// 6. Verify disease data is enriched
expect(topPrediction.disease.symptoms.length).toBeGreaterThanOrEqual(3);
expect(topPrediction.disease.treatment.length).toBeGreaterThanOrEqual(3);
expect(topPrediction.disease.prevention.length).toBeGreaterThanOrEqual(2);
});
});
describe("Healthy result", () => {
it("returns healthy result for healthy plant image", async () => {
const { imageId } = await uploadTestImage("tomato-healthy.jpg");
const result = await identifyImage(imageId);
// Should return healthy: true or top prediction is a healthy class
if (result.healthy) {
expect(result.healthy).toBe(true);
expect(result.plantId).toBe("tomato");
expect(result.confidence.adjusted).toBeGreaterThan(0.5);
} else {
// If not healthy result, confidence should be low
const topPrediction = result.predictions[0];
expect(topPrediction.confidence.adjusted).toBeLessThan(0.5);
}
});
});
describe("Unknown image", () => {
it("returns low confidence for unrelated image", async () => {
const { imageId } = await uploadTestImage("unknown-plant.jpg");
const result = await identifyImage(imageId);
// Should have predictions but with low confidence (guard against an empty list)
if (result.predictions?.length) {
const topPrediction = result.predictions[0];
expect(topPrediction.confidence.adjusted).toBeLessThan(0.5);
expect(topPrediction.confidence.label).toBe("low");
}
});
});
describe("Plant context", () => {
it("boosts predictions when plantId is provided", async () => {
const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
// Without plant context
const resultNoContext = await identifyImage(imageId);
const confidenceNoContext = resultNoContext.predictions[0].confidence.adjusted;
// With plant context
const resultWithContext = await identifyImage(imageId, "tomato");
const confidenceWithContext = resultWithContext.predictions[0].confidence.adjusted;
// Context should boost confidence (or at least not reduce it)
expect(confidenceWithContext).toBeGreaterThanOrEqual(confidenceNoContext);
// Boosted prediction should have contextBoosted flag
const boosted = resultWithContext.predictions.find((p) => p.contextBoosted);
expect(boosted).toBeDefined();
});
});
describe("Error cases", () => {
it("returns 404 for non-existent imageId", async () => {
const response = await fetch("http://localhost:3000/api/identify", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ imageId: "non-existent-id" }),
});
expect(response.status).toBe(404);
});
it("returns 400 for invalid image upload", async () => {
const formData = new FormData();
formData.append("image", new Blob(["not an image"], { type: "text/plain" }), "test.txt");
const response = await fetch("http://localhost:3000/api/upload", {
method: "POST",
body: formData,
});
expect(response.status).toBe(400);
});
});
describe("Performance", () => {
it("completes inference in under 500ms", async () => {
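const { imageId } = await uploadTestImage("tomato-early-blight.jpg");
// Warm-up request so one-time model loading is excluded from the measurement
await identifyImage(imageId);
const start = Date.now();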
await identifyImage(imageId);
const elapsed = Date.now() - start;
expect(elapsed).toBeLessThan(500);
});
});
});
```
4. **Create expected results fixture** `tests/e2e/fixtures/expected-results.json`:
```json
{
"tomato-early-blight.jpg": {
"expectedDiseaseId": "early-blight",
"expectedPlantId": "tomato",
"minConfidence": 0.6,
"expectedConfidenceLabel": "high"
},
"tomato-healthy.jpg": {
"expectedHealthy": true,
"expectedPlantId": "tomato",
"minConfidence": 0.5
},
"unknown-plant.jpg": {
"maxConfidence": 0.5,
"expectedConfidenceLabel": "low"
}
}
```
5. **Update vitest config** to support E2E tests:
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
export default defineConfig({
test: {
// ... existing config ...
include: ["src/**/*.test.ts", "src/**/*.test.tsx", "tests/**/*.test.ts"],
},
});
```
6. **Add E2E test script** to `package.json`:
```json
{
"scripts": {
"test:e2e": "vitest run tests/e2e"
}
}
```
7. **Document E2E test setup** in `tests/e2e/README.md`:
- Requires dev server running (`npm run dev`)
- Requires model files present (`public/models/plant-disease-classifier/`)
- Requires test fixtures (download PlantVillage test images)
- Run with `npm run test:e2e`
8. **Download test images** from PlantVillage dataset:
- Use images from the PlantVillage test split (not training)
- Place in `tests/e2e/fixtures/`
- Document source and license
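
The `expected-results.json` fixture can drive the assertions instead of hard-coding them in each test. A minimal sketch of a pure checker that returns every mismatch at once; the `ExpectedResult` shape mirrors the fixture in step 4, while `checkResult` and the trimmed `IdentifyResult` type are hypothetical:

```typescript
interface ExpectedResult {
  expectedDiseaseId?: string;
  expectedPlantId?: string;
  expectedHealthy?: boolean;
  minConfidence?: number;
  maxConfidence?: number;
  expectedConfidenceLabel?: string;
}
interface Confidence {
  adjusted: number;
  label?: string;
}
interface IdentifyResult {
  healthy?: boolean;
  plantId?: string;
  confidence?: Confidence;
  predictions?: Array<{ diseaseId: string; plant: { id: string } | null; confidence: Confidence }>;
}

/** Returns human-readable mismatches; an empty array means the result meets expectations. */
function checkResult(result: IdentifyResult, expected: ExpectedResult): string[] {
  const errors: string[] = [];
  const top = result.predictions?.[0];
  // Healthy results carry confidence at the top level; disease results on the top prediction
  const confidence = result.healthy ? result.confidence : top?.confidence;
  const plantId = result.healthy ? result.plantId : top?.plant?.id;
  if (expected.expectedHealthy !== undefined && Boolean(result.healthy) !== expected.expectedHealthy) {
    errors.push(`healthy: expected ${expected.expectedHealthy}, got ${Boolean(result.healthy)}`);
  }
  if (expected.expectedDiseaseId && top?.diseaseId !== expected.expectedDiseaseId) {
    errors.push(`diseaseId: expected ${expected.expectedDiseaseId}, got ${top?.diseaseId}`);
  }
  if (expected.expectedPlantId && plantId !== expected.expectedPlantId) {
    errors.push(`plantId: expected ${expected.expectedPlantId}, got ${plantId}`);
  }
  if (expected.minConfidence !== undefined && !(confidence && confidence.adjusted >= expected.minConfidence)) {
    errors.push(`confidence below ${expected.minConfidence}`);
  }
  if (expected.maxConfidence !== undefined && confidence && confidence.adjusted >= expected.maxConfidence) {
    errors.push(`confidence not below ${expected.maxConfidence}`);
  }
  if (expected.expectedConfidenceLabel && confidence?.label !== expected.expectedConfidenceLabel) {
    errors.push(`label: expected ${expected.expectedConfidenceLabel}, got ${confidence?.label}`);
  }
  return errors;
}
```

In a Vitest test this would read `expect(checkResult(result, expectedResults[file])).toEqual([])`, which lists all mismatches in a single failure message.
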
tests:
- E2E: full pipeline test (upload → identify → verify results)
- E2E: healthy result detection
- E2E: unknown image produces low confidence
- E2E: plant context boosts predictions
- E2E: error cases (404, 400)
- E2E: performance (< 500ms inference)
acceptance_criteria:
- All E2E tests pass with real model inference
- Test fixtures are documented and licensed appropriately
- E2E tests can be run with `npm run test:e2e`
- Tests cover: normal flow, healthy result, unknown image, plant context, errors, performance
- Test results are deterministic (no flaky tests)
validation:
- `npm run test:e2e` — all tests pass
- Manual: run tests against dev server and verify output
- Check test coverage: all major code paths are exercised
notes:
- E2E tests require the dev server to be running (`npm run dev`)
- Test images should come from the PlantVillage test split (not training), so the tests measure generalization rather than recall of images the model was trained on
- If test images are not available, use synthetic test data (random tensors) for CI
- Performance test threshold (500ms) is generous — actual inference should be < 200ms with tfjs-node
- E2E tests are separate from unit tests — run them in CI after deployment to staging
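
A single wall-clock sample makes the 500ms check sensitive to noise. One way to keep it stable, sketched under the assumption that a few extra requests per run are acceptable (task 08's rate limiter caps each IP at 10 requests per minute), is to discard a warm-up call and compare the median of several runs against the threshold. `median` and `medianDurationMs` are hypothetical helpers:

```typescript
/** Median of a list of numbers (mean of the two middle values for even lengths). */
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Runs `fn` once to warm up, then `runs` timed times; returns the median duration in ms. */
async function medianDurationMs(fn: () => Promise<unknown>, runs = 5): Promise<number> {
  await fn(); // warm-up: excludes one-time costs such as model loading
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    durations.push(performance.now() - start);
  }
  return median(durations);
}
```

In the performance test this would read `expect(await medianDurationMs(() => identifyImage(imageId), 3)).toBeLessThan(500)`.
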

@@ -0,0 +1,405 @@
# 08. Production Hardening and Observability
meta:
id: production-ml-pipeline-08
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-07]
tags: [implementation, production, observability]
objective:
- Add comprehensive error handling at every layer of the pipeline
- Implement structured logging for observability
- Add rate limiting to prevent abuse
- Create a health endpoint that reports model status and inference metrics
- Ensure the system is production-ready with monitoring, cleanup, and resilience
deliverables:
- `src/app/api/health/route.ts` — enhanced health endpoint with model status
- `src/lib/middleware/rate-limit.ts` — rate limiting middleware
- `src/lib/middleware/error-handler.ts` — global error handler
- `src/lib/observability/logger.ts` — structured logger
- `src/lib/observability/metrics.ts` — inference metrics tracker
- Updated API routes with error handling and logging
- Updated `next.config.ts` with security headers (rate limiting is implemented in `rate-limit.ts`)
steps:
1. **Create structured logger** `src/lib/observability/logger.ts`:
```typescript
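export interface LogEntry {
  timestamp: string;
  level: "debug" | "info" | "warn" | "error";
  event: string;
  data?: Record<string, unknown>;
  error?: { message: string; stack?: string };
}
// JSON.stringify(new Error("x")) yields "{}", so errors are serialized explicitly
function serializeError(err: unknown): { message: string; stack?: string } {
  if (err instanceof Error) return { message: err.message, stack: err.stack };
  if (typeof err === "object" && err !== null && "message" in err) {
    const { message, stack } = err as { message: unknown; stack?: unknown };
    return { message: String(message), stack: typeof stack === "string" ? stack : undefined };
  }
  return { message: String(err) };
}
export function log(level: LogEntry["level"], event: string, data?: Record<string, unknown>) {
  // Pull `error` out of data at any level, e.g. logger.warn("model_load_failed", { error })
  const { error, ...rest }: Record<string, unknown> = data ?? {};
  const entry: LogEntry = {
    timestamp: new Date().toISOString(),
    level,
    event,
    data: Object.keys(rest).length > 0 ? rest : undefined,
  };
  if (error !== undefined) {
    entry.error = serializeError(error);
  }
  console.log(JSON.stringify(entry));
}
export const logger = {
  debug: (event: string, data?: Record<string, unknown>) => log("debug", event, data),
  info: (event: string, data?: Record<string, unknown>) => log("info", event, data),
  warn: (event: string, data?: Record<string, unknown>) => log("warn", event, data),
  error: (event: string, data?: Record<string, unknown>) => log("error", event, data),
};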
```
2. **Create metrics tracker** `src/lib/observability/metrics.ts`:
```typescript
interface InferenceMetrics {
totalInferences: number;
totalErrors: number;
avgInferenceTimeMs: number;
lastInferenceAt: string | null;
modelLoaded: boolean;
modelLoadTimeMs: number | null;
}
class MetricsTracker {
private metrics: InferenceMetrics = {
totalInferences: 0,
totalErrors: 0,
avgInferenceTimeMs: 0,
lastInferenceAt: null,
modelLoaded: false,
modelLoadTimeMs: null,
};
recordInference(inferenceTimeMs: number) {
this.metrics.totalInferences++;
this.metrics.lastInferenceAt = new Date().toISOString();
// Running average
this.metrics.avgInferenceTimeMs =
(this.metrics.avgInferenceTimeMs * (this.metrics.totalInferences - 1) + inferenceTimeMs) /
this.metrics.totalInferences;
}
recordError() {
this.metrics.totalErrors++;
}
setModelStatus(loaded: boolean, loadTimeMs?: number) {
this.metrics.modelLoaded = loaded;
if (loadTimeMs !== undefined) {
this.metrics.modelLoadTimeMs = loadTimeMs;
}
}
getMetrics(): InferenceMetrics {
return { ...this.metrics };
}
}
export const metrics = new MetricsTracker();
```
3. **Enhance health endpoint** `src/app/api/health/route.ts`:
```typescript
import { NextResponse } from "next/server";
import { getModel } from "@/lib/ml/model-loader";
import { metrics } from "@/lib/observability/metrics";
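export async function GET() {
  // A failed load shows up as a "degraded" status instead of a 500 from this endpoint
  const model = await getModel().catch(() => null);
  const modelStatus = model?.getStatus();
  const loaded = modelStatus?.loaded ?? false;
  return NextResponse.json({
    status: loaded ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    model: {
      loaded,
      backend: modelStatus?.backend ?? null,
      modelId: modelStatus?.modelId ?? null,
      numClasses: modelStatus?.numClasses ?? null,
      error: modelStatus?.error ?? (model ? null : "Model failed to load"),
    },
    metrics: metrics.getMetrics(),
    uptime: process.uptime(),
  });
}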
```
4. **Create rate limiting middleware** `src/lib/middleware/rate-limit.ts`:
```typescript
import { NextRequest, NextResponse } from "next/server";
// Simple in-memory rate limiter (for production, use Redis or similar)
const requestCounts = new Map<string, { count: number; resetAt: number }>();
const RATE_LIMIT = {
maxRequests: 10, // 10 requests per window
windowMs: 60 * 1000, // 1 minute window
};
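export function rateLimit(request: NextRequest): NextResponse | null {
  // x-forwarded-for can hold a comma-separated proxy chain; the first entry is the client.
  // Trust it only behind a proxy that sets it, since clients can spoof the header.
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() || "unknown";
  const now = Date.now();
  let record = requestCounts.get(ip);
  if (!record || now > record.resetAt) {
    record = { count: 0, resetAt: now + RATE_LIMIT.windowMs };
    requestCounts.set(ip, record);
  }
  record.count++;
  if (record.count > RATE_LIMIT.maxRequests) {
    return NextResponse.json(
      { error: "Rate limit exceeded", message: "Too many requests. Please try again later." },
      {
        status: 429,
        // Tell clients how many seconds remain in the current window
        headers: { "Retry-After": String(Math.ceil((record.resetAt - now) / 1000)) },
      },
    );
  }
  return null; // No rate limit hit
}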
```
5. **Create global error handler** `src/lib/middleware/error-handler.ts`:
```typescript
import { NextResponse } from "next/server";
import { logger } from "@/lib/observability/logger";
export function handleError(error: unknown, context: string): NextResponse {
logger.error("unhandled_error", {
context,
error:
error instanceof Error
? { message: error.message, stack: error.stack }
: { message: String(error) },
});
return NextResponse.json(
{
error: "Internal server error",
message: "An unexpected error occurred. Please try again later.",
context,
},
{ status: 500 },
);
}
```
6. **Add error handling to `/api/upload`**:
```typescript
import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";
export async function POST(request: NextRequest) {
// Rate limiting
const rateLimitError = rateLimit(request);
if (rateLimitError) return rateLimitError;
try {
logger.info("upload_start", { ip: request.headers.get("x-forwarded-for") });
// ... existing upload logic ...
logger.info("upload_success", { imageId, fileSize: buffer.length });
return NextResponse.json({ imageId, tensorShape, previewUrl });
} catch (error) {
return handleError(error, "upload");
}
}
```
7. **Add error handling to `/api/identify`**:
```typescript
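import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";
import { metrics } from "@/lib/observability/metrics";
import type { IdentifyRequest } from "@/lib/types";
export async function POST(request: NextRequest) {
  const rateLimitError = rateLimit(request);
  if (rateLimitError) return rateLimitError;
  try {
    // Parse the body first so imageId and plantId are defined when logging
    const { imageId, plantId } = (await request.json()) as IdentifyRequest;
    logger.info("identify_start", { imageId, plantId });
    const startTime = Date.now();
    // ... existing identify logic (produces `predictions` and `metadata`) ...
    // Wall-clock time for the whole request: preprocessing, inference, and enrichment
    const elapsedMs = Date.now() - startTime;
    metrics.recordInference(elapsedMs);
    logger.info("identify_success", {
      imageId,
      elapsedMs,
      topPrediction: predictions[0]?.diseaseId,
      confidence: predictions[0]?.confidence.adjusted,
    });
    return NextResponse.json({ predictions, metadata });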
} catch (error) {
metrics.recordError();
if (error instanceof Error && error.message.includes("not loaded")) {
return NextResponse.json(
{
error: "Model not available",
message: "ML model failed to load. Please try again later.",
},
{ status: 503 },
);
}
return handleError(error, "identify");
}
}
```
8. **Add model status tracking to `model-loader.ts`**:
```typescript
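import { metrics } from "@/lib/observability/metrics";
import { logger } from "@/lib/observability/logger";
async function loadModel(): Promise<PlantDiseaseModel> {
  const startTime = Date.now();
  try {
    const model = await tryLoadTFJS();
    if (model) {
      const loadTimeMs = Date.now() - startTime;
      metrics.setModelStatus(true, loadTimeMs);
      logger.info("model_loaded", { backend: "tfjs", loadTimeMs });
      return model;
    }
  } catch (error) {
    logger.warn("model_load_failed", { backend: "tfjs", error });
  }
  metrics.setModelStatus(false);
  // Keep the mock model for development only; task 05 removes demo mode from production
  if (process.env.NODE_ENV !== "production") {
    return createMockModel();
  }
  // The "not loaded" wording lets /api/identify map this failure to a 503
  throw new Error("Model not loaded: TF.js model failed to load");
}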
```
9. **Add cleanup for old uploads**:
```typescript
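// src/lib/cleanup.ts
import fs from "fs/promises";
import path from "path";
import { logger } from "@/lib/observability/logger";
const UPLOADS_DIR = path.join(process.cwd(), "public", "uploads");
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours
export async function cleanupOldUploads() {
  let files: string[];
  try {
    files = await fs.readdir(UPLOADS_DIR);
  } catch (error) {
    // Nothing to clean if the uploads directory has not been created yet
    if ((error as NodeJS.ErrnoException).code === "ENOENT") return;
    throw error;
  }
  const now = Date.now();
  for (const file of files) {
    const filePath = path.join(UPLOADS_DIR, file);
    const stat = await fs.stat(filePath);
    // Only delete regular files past the TTL (skip subdirectories)
    if (stat.isFile() && now - stat.mtimeMs > MAX_AGE_MS) {
      await fs.unlink(filePath);
      logger.info("upload_cleaned", { file, ageMs: now - stat.mtimeMs });
    }
  }
}
// Catch failures so a cleanup error is logged instead of crashing the process
function runCleanup() {
  cleanupOldUploads().catch((error) => logger.error("upload_cleanup_failed", { error }));
}
// Run cleanup on server start and periodically
if (process.env.NODE_ENV === "production") {
  runCleanup();
  setInterval(runCleanup, 60 * 60 * 1000); // Every hour
}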
```
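10. **Update `next.config.ts`** with security headers (rate limiting is applied in the route handlers, not here):
```typescript
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // ... existing config ...
  async headers() {
    return [
      {
        // Apply to every API route.
        source: "/api/:path*",
        headers: [
          { key: "X-Content-Type-Options", value: "nosniff" },
          { key: "X-Frame-Options", value: "DENY" },
          // Deprecated legacy header, kept to satisfy the acceptance criteria.
          { key: "X-XSS-Protection", value: "1; mode=block" },
        ],
      },
    ];
  },
};

export default nextConfig;
```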
11. **Add monitoring dashboard** (optional) `src/app/admin/metrics/page.tsx`:
- Simple page showing inference metrics
- Model status
- Recent inference times
- Error rate
- Protected by authentication (admin only)
12. **Document production checklist** in `docs/production-checklist.md`:
- Environment variables needed
- Model deployment steps
- Monitoring setup
- Backup strategy
- Rollback procedure
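The dashboard in step 11 and the health endpoint both need recent inference times, a running average, the error rate, and model status. The earlier step's `metrics` module is not shown in this section, so the following is only a hypothetical sketch of an in-memory tracker that uses the same method names seen above (`recordInference`, `recordError`, `setModelStatus`). The `snapshot()` method, the 50-sample window, and the error-rate definition (errors divided by inferences plus errors) are assumptions.

```typescript
// Hypothetical in-memory metrics tracker; the real `@/lib/observability/metrics`
// module may differ in names and shape.

interface MetricsSnapshot {
  modelLoaded: boolean;
  modelLoadTimeMs: number | null;
  totalInferences: number;
  totalErrors: number;
  averageInferenceMs: number;
  errorRate: number; // assumption: errors / (inferences + errors)
  recentInferenceMs: number[];
  uptimeMs: number;
}

class MetricsTracker {
  private totalInferences = 0;
  private totalErrors = 0;
  private averageInferenceMs = 0;
  private recent: number[] = [];
  private modelLoaded = false;
  private modelLoadTimeMs: number | null = null;
  private readonly recentLimit: number;
  private readonly startedAt: number;

  constructor(recentLimit = 50, startedAt = Date.now()) {
    this.recentLimit = recentLimit;
    this.startedAt = startedAt;
  }

  recordInference(ms: number): void {
    this.totalInferences += 1;
    // Incremental running mean, so every sample need not be stored.
    this.averageInferenceMs += (ms - this.averageInferenceMs) / this.totalInferences;
    // Keep only the most recent samples for the dashboard.
    this.recent.push(ms);
    if (this.recent.length > this.recentLimit) this.recent.shift();
  }

  recordError(): void {
    this.totalErrors += 1;
  }

  setModelStatus(loaded: boolean, loadTimeMs?: number): void {
    this.modelLoaded = loaded;
    this.modelLoadTimeMs = loadTimeMs ?? null;
  }

  snapshot(now: number = Date.now()): MetricsSnapshot {
    const requests = this.totalInferences + this.totalErrors;
    return {
      modelLoaded: this.modelLoaded,
      modelLoadTimeMs: this.modelLoadTimeMs,
      totalInferences: this.totalInferences,
      totalErrors: this.totalErrors,
      averageInferenceMs: this.averageInferenceMs,
      errorRate: requests === 0 ? 0 : this.totalErrors / requests,
      recentInferenceMs: [...this.recent],
      uptimeMs: now - this.startedAt,
    };
  }
}
```

Passing `now` and `startedAt` explicitly keeps the uptime and averaging logic deterministic in unit tests.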
tests:
- Unit: rate limiter blocks after max requests
- Unit: rate limiter resets after window
- Unit: metrics tracker records inference correctly
- Unit: metrics tracker computes running average
- Unit: logger produces valid JSON output
- Integration: health endpoint returns model status and metrics
- Integration: rate limit returns 429 after max requests
- Integration: error handler catches unhandled errors and returns 500
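The two rate-limiter unit tests are easiest to write when the window logic takes the current time as a parameter. The sketch below is a hypothetical fixed-window limiter core (10 requests per 60 seconds per key, matching the acceptance criteria); `FixedWindowLimiter` and `check` are illustrative names, not the `rateLimit` helper from the earlier step.

```typescript
// Hypothetical fixed-window limiter core, free of Next.js types so tests can
// drive it with a fake clock.

interface WindowState {
  count: number;
  windowStart: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests = 10, windowMs = 60_000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed, false if it should receive a 429. */
  check(key: string, now: number = Date.now()): boolean {
    const state = this.windows.get(key);
    // Start a fresh window for unseen keys or once the previous window has expired.
    if (!state || now - state.windowStart >= this.windowMs) {
      this.windows.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (state.count >= this.maxRequests) return false;
    state.count += 1;
    return true;
  }
}
```

In a route handler the key would typically be the client IP (for example, taken from the `x-forwarded-for` header behind a proxy). Entries are never evicted in this sketch; a long-running version would periodically prune keys whose window has expired.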
acceptance_criteria:
- All API routes have rate limiting (10 requests per minute per IP)
- All API routes have structured logging (JSON format)
- Health endpoint reports model status, inference metrics, uptime
- Error handler catches all unhandled errors and returns 500 with clear message
- Old uploads are cleaned up automatically (24-hour TTL)
- Metrics tracker records inference time, error rate, model status
- Security headers are set (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
- Production checklist is documented
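"Structured logging (JSON format)" has one common pitfall: `JSON.stringify(new Error("x"))` yields `"{}"` because `message` and `stack` are non-enumerable. A minimal sketch of a one-object-per-line logger that handles this is shown below. `formatLog`, the field names (`timestamp`, `level`, `event`), and the `error` level are assumptions, not the logger defined in the earlier step.

```typescript
// Hypothetical structured logger: one JSON object per line.
type Level = "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Convert Error instances into plain objects so their details survive JSON.stringify.
function serialize(value: unknown): unknown {
  if (value instanceof Error) {
    return { name: value.name, message: value.message, stack: value.stack };
  }
  return value;
}

function formatLog(level: Level, event: string, fields: Fields = {}, now: Date = new Date()): string {
  const safeFields: Fields = {};
  for (const [key, value] of Object.entries(fields)) {
    safeFields[key] = serialize(value);
  }
  // Reserved keys are spread last so caller fields cannot overwrite them.
  return JSON.stringify({ ...safeFields, timestamp: now.toISOString(), level, event });
}

const logger = {
  info: (event: string, fields?: Fields) => console.log(formatLog("info", event, fields)),
  warn: (event: string, fields?: Fields) => console.warn(formatLog("warn", event, fields)),
  error: (event: string, fields?: Fields) => console.error(formatLog("error", event, fields)),
};
```

For example, `logger.warn("model_load_failed", { backend: "tfjs", error })` then emits the error's name, message, and stack instead of an empty object.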
validation:
- `npx vitest run src/lib/middleware/rate-limit.test.ts`
- `npx vitest run src/lib/observability/metrics.test.ts`
- `curl http://localhost:3000/api/health` — returns model status and metrics
- `curl -X POST http://localhost:3000/api/identify ...` (11 times) — 11th request returns 429
- Check server logs: JSON-formatted log entries for all requests
- Back-date a test upload's modification time by more than 24 hours (or wait 25 hours): it is removed on the next hourly cleanup run
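Waiting a full day to validate the 24-hour TTL is impractical. One way to make it unit-testable, sketched here as an assumption rather than as part of the plan, is to move the "is this upload expired?" decision out of `cleanupOldUploads` into a pure function that takes `now` as a parameter. `cleanupOldUploads` would then build the entries from `fs.stat` and unlink only the returned names. `selectExpiredUploads` and `UploadEntry` are hypothetical names. (For a manual check, `fs.promises.utimes` can back-date a real file's mtime.)

```typescript
// Hypothetical helper: decides which uploads are expired without touching the
// filesystem, so the TTL can be tested with a fixed "now".

interface UploadEntry {
  name: string;
  mtimeMs: number;
  isFile: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function selectExpiredUploads(
  entries: UploadEntry[],
  now: number,
  maxAgeMs: number = DAY_MS,
): string[] {
  return entries
    // Same rule as the cleanup job: regular files strictly older than the TTL.
    .filter((entry) => entry.isFile && now - entry.mtimeMs > maxAgeMs)
    .map((entry) => entry.name);
}
```

A file exactly 24 hours old is kept, which matches the strict `>` comparison in the cleanup code.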
notes:
- Rate limiter uses in-memory storage — for multi-instance deployments, use Redis or similar
- Metrics are in-memory — for persistent metrics, use a time-series database
- Health endpoint should be monitored by an uptime monitoring service (e.g., Pingdom, UptimeRobot)
- Cleanup runs every hour in production — adjust frequency based on upload volume
- Security headers are basic — consider adding CSP, HSTS for full security hardening
- Production checklist should be reviewed before each deployment

@@ -0,0 +1,40 @@
# Production ML Pipeline
Objective: Get the plant disease identification ML pipeline to full production readiness with real model inference, proper class mapping, and production-grade error handling.
Status legend: [ ] todo, [~] in-progress, [x] done
## Tasks
- [ ] 01 — PlantVillage class inventory and knowledge base mapping → `01-plantvillage-class-inventory.md`
- [ ] 02 — Label mapping layer implementation → `02-label-mapping-implementation.md`
- [ ] 03 — TensorFlow.js model loading verification and fixes → `03-model-loading-verification.md`
- [ ] 04 — Confidence calibration for PlantVillage model → `04-confidence-calibration.md`
- [ ] 05 — Real model integration into identification pipeline → `05-pipeline-integration.md`
- [ ] 06 — Plant-context-aware identification → `06-plant-context-identification.md`
- [ ] 07 — End-to-end integration testing → `07-end-to-end-testing.md`
- [ ] 08 — Production hardening and observability → `08-production-hardening.md`
## Dependencies
- 01 → 02 (mapping data feeds label layer)
- 02 → 05 (labels feed pipeline)
- 03 → 05 (verified model loading feeds pipeline)
- 04 → 05 (calibration feeds pipeline)
- 05 → 06 (real model enables plant context)
- 05 → 07 (integrated pipeline enables e2e testing)
- 07 → 08 (tested pipeline enables production hardening)
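The dependency list forms a small directed acyclic graph. The following hypothetical helper (not part of the task files) derives a valid execution order from those edges with Kahn's algorithm and throws if a future edit accidentally introduces a cycle. Ties are broken by lowest task number, which is an arbitrary choice for determinism.

```typescript
// Edges copied from the Dependencies list: [prerequisite, dependent].
const DEPENDENCIES: Array<[string, string]> = [
  ["01", "02"], ["02", "05"], ["03", "05"], ["04", "05"],
  ["05", "06"], ["05", "07"], ["07", "08"],
];
const TASKS = ["01", "02", "03", "04", "05", "06", "07", "08"];

// Kahn's algorithm: repeatedly schedule a task whose prerequisites are all done.
function executionOrder(tasks: string[], deps: Array<[string, string]>): string[] {
  const indegree = new Map<string, number>(tasks.map((t): [string, number] => [t, 0]));
  const dependents = new Map<string, string[]>(tasks.map((t): [string, string[]] => [t, []]));
  for (const [before, after] of deps) {
    dependents.get(before)!.push(after);
    indegree.set(after, (indegree.get(after) ?? 0) + 1);
  }
  const ready = tasks.filter((t) => indegree.get(t) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    ready.sort(); // deterministic: lowest task number first
    const task = ready.shift()!;
    order.push(task);
    for (const next of dependents.get(task) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any task left unscheduled is part of a cycle.
  if (order.length !== tasks.length) throw new Error("Dependency cycle detected");
  return order;
}
```

For the current edges this returns tasks 01 through 08 in numeric order; tasks 01, 03, and 04 have no prerequisites and could also proceed in parallel.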
## Exit Criteria
The feature is complete when:
- Model loads successfully and produces real (non-mock) predictions
- All 38 PlantVillage classes map to valid knowledge base disease IDs
- End-to-end pipeline works: upload image → get real disease diagnoses with calibrated confidence
- Confidence scores are meaningful (high confidence for clear cases, low for ambiguous)
- Plant context optionally boosts relevant predictions
- Full integration test suite passes
- Error handling, logging, and monitoring in place
- No demo mode fallback in production
- Rate limiting and input sanitization active
- Health endpoint reports model status and inference metrics