04. Confidence Calibration for PlantVillage Model

meta:

id: production-ml-pipeline-04
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-03]
tags: [implementation, ml, tests-required]

objective:

Implement proper confidence calibration for the PlantVillage model's softmax output
Replace the trivial raw * 1.02 linear calibration with temperature scaling or entropy-based confidence
Produce meaningful confidence labels (high/medium/low) that correlate with actual correctness
Handle the "healthy" class output correctly (healthy predictions need different confidence interpretation)

deliverables:

src/lib/ml/confidence.ts — rewritten calibration with temperature scaling
src/lib/ml/calibration-params.ts — calibration parameters (temperature, bias) for PlantVillage model
src/lib/ml/confidence.test.ts — updated tests for new calibration logic
scripts/calibrate-model.ts — script to compute optimal temperature from validation data

steps:

Determine output type — based on task 03's findings:
- If model output is already softmax probabilities: use entropy-based confidence or inverse-softmax + temperature scaling
- If model output is logits: apply temperature-scaled softmax directly
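
As a rough sketch of this decision, the helpers below check whether an output vector already looks like probabilities and, if so, recover logits from it. Both function names are hypothetical and not part of the existing codebase, and the sum-to-one check is only a heuristic. The recovery works because ln(p) equals the original logits up to an additive constant, and that constant cancels inside softmax, so softmax(ln(p) / T) matches temperature scaling applied to the original logits.

```typescript
// Hypothetical helpers for illustration; names are not from the existing codebase.

/** Heuristic: treat the vector as probabilities if all values lie in [0, 1] and sum to ~1. */
function looksLikeProbabilities(output: Float32Array, tolerance = 1e-3): boolean {
  let sum = 0;
  for (let i = 0; i < output.length; i++) {
    const v = output[i];
    if (!(v >= 0 && v <= 1)) return false; // also rejects NaN
    sum += v;
  }
  return Math.abs(sum - 1) <= tolerance;
}

/**
 * Inverse softmax: recover logits up to an additive constant via z_i = ln(p_i).
 * Probabilities are floored at epsilon so that exact zeros do not produce -Infinity.
 */
function probabilitiesToLogits(probs: Float32Array, epsilon = 1e-12): Float32Array {
  const logits = new Float32Array(probs.length);
  for (let i = 0; i < probs.length; i++) {
    logits[i] = Math.log(Math.max(probs[i], epsilon));
  }
  return logits;
}
```

The recovered logits can then be passed through the same temperature-scaled softmax as raw logits.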

Implement temperature scaling:

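// src/lib/ml/confidence.ts
// Note: softmaxFloat32 is not defined in this task; it is assumed to be an existing helper.
const DEFAULT_TEMPERATURE = 1.5; // Default for PlantVillage (typically 1.0–3.0)

export function temperatureScaledSoftmax(
  logits: Float32Array,
  temperature: number = DEFAULT_TEMPERATURE,
): Float32Array {
  // Guard against division by zero and sign flips from non-positive temperatures
  if (!(temperature > 0)) {
    throw new RangeError("temperature must be a positive number");
  }
  const scaled = new Float32Array(logits.length);
  for (let i = 0; i < logits.length; i++) {
    scaled[i] = logits[i] / temperature;
  }
  return softmaxFloat32(scaled);
}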

Temperature > 1.0 softens the distribution (less confident, more uniform)
Temperature < 1.0 sharpens the distribution (more confident)
Temperature = 1.0 is standard softmax (no calibration)
Typical value for MobileNetV2 on PlantVillage: 1.2–1.8
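
The snippet above calls softmaxFloat32 without showing it. The following is a minimal sketch of what such a helper might look like, assuming it takes and returns a Float32Array; the real helper may differ. It subtracts the maximum logit before exponentiating so that large logits do not overflow, and the last lines illustrate the softening effect of a higher temperature on made-up logits.

```typescript
// Sketch only: assumed shape of the softmaxFloat32 helper used by temperatureScaledSoftmax.
function softmaxFloat32(logits: Float32Array): Float32Array {
  const out = new Float32Array(logits.length);
  if (logits.length === 0) return out;

  // Subtract the max logit before exponentiating to avoid overflow.
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }

  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    out[i] = Math.exp(logits[i] - max);
    sum += out[i];
  }
  for (let i = 0; i < logits.length; i++) {
    out[i] /= sum;
  }
  return out;
}

// Illustrative logits (not from the model): compare T = 1.0 with T = 2.0.
const exampleLogits = new Float32Array([4, 2, 1]);
const atT1 = softmaxFloat32(exampleLogits);
const atT2 = softmaxFloat32(exampleLogits.map((z) => z / 2));
console.log(atT1[0] > atT2[0]); // true: the higher temperature lowers the top probability
```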

Implement entropy-based confidence:

export function computeEntropy(probabilities: Float32Array): number {
  let entropy = 0;
  for (let i = 0; i < probabilities.length; i++) {
    if (probabilities[i] > 1e-10) {
      entropy -= probabilities[i] * Math.log(probabilities[i]);
    }
  }
  return entropy;
}

export function entropyToConfidence(
  entropy: number,
  maxEntropy: number, // ln(numClasses)
): number {
  // Normalize entropy to [0, 1], then invert (low entropy = high confidence)
  const normalized = entropy / maxEntropy;
  return 1 - normalized;
}

For 38 classes: maxEntropy = Math.log(38) ≈ 3.64
Entropy close to 0 → one class dominates → high confidence
Entropy close to max → uniform distribution → low confidence
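
Written out, the two functions above compute the Shannon entropy and its normalized complement. The worked example uses a made-up distribution (0.9 on one class, the remaining 0.1 spread evenly over the other 37 classes), chosen only to show the scale of the numbers.

```latex
H(p) = -\sum_{i=1}^{K} p_i \ln p_i, \qquad
c_{\text{entropy}} = 1 - \frac{H(p)}{\ln K}, \qquad K = 38

% Illustrative distribution: p_1 = 0.9, \; p_2 = \dots = p_{38} = 0.1 / 37
H(p) = -0.9 \ln 0.9 - 0.1 \ln\frac{0.1}{37} \approx 0.095 + 0.591 = 0.686

c_{\text{entropy}} \approx 1 - \frac{0.686}{3.638} \approx 0.811
```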

Implement combined calibration:

export function calibratePrediction(
  output: Float32Array,
  isLogits: boolean,
  temperature: number = DEFAULT_TEMPERATURE,
): ConfidenceResult {
  // Get probabilities: temperature-scaled softmax for logits. Existing probabilities are
  // used as-is, so `temperature` has no effect when isLogits is false.
  const probs = isLogits ? temperatureScaledSoftmax(output, temperature) : output;

  // Get top prediction
  let maxIdx = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[maxIdx]) maxIdx = i;
  }
  const topProb = probs[maxIdx];

  // Compute entropy-based confidence
  const entropy = computeEntropy(probs);
  const maxEntropy = Math.log(probs.length);
  const entropyConfidence = entropyToConfidence(entropy, maxEntropy);

  // Combine: weighted average of top probability and entropy confidence
  const adjusted = 0.7 * topProb + 0.3 * entropyConfidence;

  return {
    raw: topProb,
    adjusted: Math.min(1, Math.max(0, adjusted)),
    label: getConfidenceLabel(adjusted),
    entropy,
    classIndex: maxIdx,
  };
}

Update getConfidenceLabel thresholds for PlantVillage's 38-class output:
```
const CONFIDENCE_THRESHOLDS = {
  HIGH: 0.65, // Lowered from 0.8 — PlantVillage softmax is less peaked
  MEDIUM: 0.35, // Lowered from 0.5
} as const;
```
- With 38 classes, even correct predictions may have lower top probability
- These thresholds should be tuned against a validation set (start with defaults, adjust after testing)

Handle healthy class confidence:
- When the top prediction is a healthy class (index 3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37), the confidence represents "how confident the model is the plant is healthy"
- Healthy predictions with high confidence → "No disease detected" (good)
- Healthy predictions with low confidence → "Uncertain — may have early symptoms"
- Update calibrateConfidence() to accept an isHealthy flag and adjust the label accordingly
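
One possible shape for the threshold and healthy-class handling is sketched below. The index list and thresholds come from this task; `isHealthyClass` and `describePrediction` are hypothetical names, the message strings are placeholders, and treating "medium" healthy predictions the same as "low" ones is an assumption, since the task only specifies the high and low cases.

```typescript
type ConfidenceLabel = "high" | "medium" | "low";

// Healthy class indices as listed in this task; they assume the model's label ordering.
const HEALTHY_CLASS_INDICES: ReadonlySet<number> = new Set([
  3, 4, 6, 10, 14, 17, 19, 22, 23, 24, 27, 37,
]);

function isHealthyClass(classIndex: number): boolean {
  return HEALTHY_CLASS_INDICES.has(classIndex);
}

// Thresholds from this task: HIGH >= 0.65, MEDIUM >= 0.35.
function getConfidenceLabel(adjusted: number): ConfidenceLabel {
  if (adjusted >= 0.65) return "high";
  if (adjusted >= 0.35) return "medium";
  return "low";
}

// Hypothetical helper: turns a label plus the healthy flag into a user-facing message.
function describePrediction(label: ConfidenceLabel, isHealthy: boolean): string {
  if (!isHealthy) return `Disease detected (${label} confidence)`;
  if (label === "high") return "No disease detected";
  // Assumption: anything below "high" on a healthy class is surfaced as uncertain.
  return "Uncertain: may have early symptoms";
}
```

Keeping the healthy check separate from the numeric label means the thresholds can be retuned without touching the messaging logic.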

Create calibration parameter module:

// src/lib/ml/calibration-params.ts
export const PLANTVILLAGE_CALIBRATION = {
  temperature: 1.5,
  confidenceHigh: 0.65,
  confidenceMedium: 0.35,
  maxEntropy: Math.log(38),
  entropyWeight: 0.3,
  probabilityWeight: 0.7,
} as const;

Create calibration script scripts/calibrate-model.ts:
- Load the model
- Run inference on a set of labeled validation images (from PlantVillage validation split)
- Compute optimal temperature using Nelder-Mead or grid search on negative log-likelihood
- Output the optimal temperature value
- This is optional — start with default 1.5 and refine later
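
The grid-search variant of the temperature fit could look roughly like the sketch below. It assumes the validation logits and labels are already in memory; the `LabeledLogits` type, both function names, and the search range of 0.5 to 5.0 are illustrative choices, not requirements of this task. Model loading and image inference are left out.

```typescript
interface LabeledLogits {
  logits: Float32Array; // raw model output for one validation image
  label: number; // ground-truth class index
}

/** Mean negative log-likelihood of the true labels at a given temperature. */
function meanNll(samples: LabeledLogits[], temperature: number): number {
  let total = 0;
  for (const { logits, label } of samples) {
    // Log-softmax with max subtraction for numerical stability.
    let max = -Infinity;
    for (let i = 0; i < logits.length; i++) {
      max = Math.max(max, logits[i] / temperature);
    }
    let sumExp = 0;
    for (let i = 0; i < logits.length; i++) {
      sumExp += Math.exp(logits[i] / temperature - max);
    }
    const logProb = logits[label] / temperature - max - Math.log(sumExp);
    total -= logProb;
  }
  return total / samples.length;
}

/** Grid search over [minT, maxT]; returns the temperature with the lowest mean NLL. */
function findOptimalTemperature(
  samples: LabeledLogits[],
  minT = 0.5,
  maxT = 5.0,
  step = 0.05,
): number {
  if (samples.length === 0) throw new Error("need at least one validation sample");
  let bestT = 1.0;
  let bestNll = Infinity;
  const steps = Math.round((maxT - minT) / step);
  for (let k = 0; k <= steps; k++) {
    const t = minT + k * step;
    const nll = meanNll(samples, t);
    if (nll < bestNll) {
      bestNll = nll;
      bestT = t;
    }
  }
  return bestT;
}
```

A grid is used here because the objective has a single parameter and is cheap to evaluate; Nelder-Mead would give a finer answer but adds little for a one-dimensional search.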

Update InferenceResult type to include calibration metadata:

export interface InferenceResult {
  predictions: RawPrediction[];
  inferenceTimeMs: number;
  calibration?: {
    temperature: number;
    entropy: number;
    entropyConfidence: number;
  };
}

tests:

Unit: temperatureScaledSoftmax with T=1.0 equals standard softmax
Unit: temperatureScaledSoftmax with T=2.0 produces more uniform distribution than T=1.0
Unit: computeEntropy of uniform distribution = Math.log(38) ≈ 3.64
Unit: computeEntropy of one-hot distribution = 0
Unit: entropyToConfidence(0, maxEntropy) = 1.0 (maximum confidence)
Unit: entropyToConfidence(maxEntropy, maxEntropy) = 0.0 (minimum confidence)
Unit: calibratePrediction with high-peak input returns high confidence
Unit: calibratePrediction with flat input returns low confidence
Unit: getConfidenceLabel(0.7) returns "high"
Unit: getConfidenceLabel(0.4) returns "medium"
Unit: getConfidenceLabel(0.2) returns "low"
Integration: calibration on known PlantVillage test image produces reasonable confidence

acceptance_criteria:

calibratePrediction() produces meaningful confidence scores that correlate with prediction quality
Temperature scaling is implemented and configurable (default T=1.5)
Entropy-based confidence is implemented
Combined calibration (weighted probability + entropy) is the default
Healthy class predictions are handled correctly
Confidence thresholds are tuned for 38-class output (HIGH ≥ 0.65, MEDIUM ≥ 0.35)
All unit tests pass
Calibration parameters are documented and configurable
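
The first criterion can be made measurable with a calibration metric. The sketch below computes expected calibration error (ECE) with equal-width bins, which is one common choice; this task does not prescribe a metric, and the `ScoredPrediction` type, the function name, and the 10-bin default are assumptions for illustration.

```typescript
interface ScoredPrediction {
  confidence: number; // adjusted confidence in [0, 1]
  correct: boolean; // whether the top prediction matched the ground-truth label
}

/**
 * Expected calibration error: the weighted average, over confidence bins, of
 * |mean confidence - accuracy|. Lower is better; 0 means perfectly calibrated.
 */
function expectedCalibrationError(preds: ScoredPrediction[], numBins = 10): number {
  if (preds.length === 0) return 0;
  const binConfidence = new Array<number>(numBins).fill(0);
  const binCorrect = new Array<number>(numBins).fill(0);
  const binCount = new Array<number>(numBins).fill(0);

  for (const { confidence, correct } of preds) {
    // A confidence of exactly 1.0 falls into the last bin.
    const bin = Math.min(numBins - 1, Math.max(0, Math.floor(confidence * numBins)));
    binConfidence[bin] += confidence;
    binCorrect[bin] += correct ? 1 : 0;
    binCount[bin] += 1;
  }

  let ece = 0;
  for (let b = 0; b < numBins; b++) {
    if (binCount[b] === 0) continue;
    const avgConfidence = binConfidence[b] / binCount[b];
    const accuracy = binCorrect[b] / binCount[b];
    ece += (binCount[b] / preds.length) * Math.abs(avgConfidence - accuracy);
  }
  return ece;
}
```

Comparing this value before and after calibration on the same validation set gives a concrete check that the new scores track correctness more closely.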

validation:

npx vitest run src/lib/ml/confidence.test.ts
Manual: run identification on a known disease image → confidence should be "high" (≥ 0.65)
Manual: run identification on a random/unrelated image → confidence should be "low" (< 0.35)
Check server logs: entropy values should lie between 0 and ln(38) ≈ 3.64, near 0 for confident predictions and approaching 3.64 for uncertain ones

notes:

Temperature scaling is a post-hoc calibration method — it doesn't change the model, only the confidence interpretation
The default temperature of 1.5 is a reasonable starting point for MobileNetV2 on PlantVillage. Optimal value depends on the specific training run.
If a validation set of PlantVillage images is available, run scripts/calibrate-model.ts to find the optimal temperature
The entropy-based approach works even without a validation set — it's a model-agnostic confidence measure
For healthy predictions, consider showing a different UI (e.g., "No disease detected" with confidence) rather than treating them as disease predictions
