Phase 5 — Browser Model & Hybrid Integration

Blocked by: Phase 4 (server inference pipeline)
Est. time: 2-3 days
Machine: Any (development on Strix Halo or M3 Pro)

Objective

Train a lightweight browser-compatible model (TF.js) and implement the hybrid routing logic: fast first pass in-browser, server fallback when confidence is low.

Hybrid Flow

User uploads image
        │
        ▼
┌──────────────────────┐
│ Browser:             │
│ EfficientNet-Lite    │  ← ~5MB TF.js model in browser
│ (TF.js)              │     Predicts species + top-5 diseases
│                      │
│ Species confidence?  │
│ ┌────┴────┐          │
│ │ ≥90%    │ <90%     │
│ └────┬────┘          │
│      │               │
│  Show result         │
│  (instant) │         │
└────────────┼─────────┘
             │ (background if ≥90%,
             │  foreground if <90%)
             ▼
┌──────────────────────┐
│ Server:              │
│ Full Swin-Tiny       │  ← Only when browser is uncertain
│ (ONNX Runtime)       │     or user requests "detailed analysis"
│                      │
│ Returns enriched     │
│ results with full    │
│ treatment info       │
└──────────────────────┘

Steps

5.1 Train lightweight browser model

Use the hierarchical training data to train an EfficientNet-Lite0 model that outputs both species and disease predictions:

import timm
import torch.nn as nn

# Train in PyTorch first (for accuracy), then convert for TF.js
# num_classes=0 drops the ImageNet classifier, so the backbone returns pooled features
backbone = timm.create_model('efficientnet_lite0', pretrained=True, num_classes=0)
species_head = nn.Linear(backbone.num_features, 320)     # 320 species
disease_head = nn.Linear(backbone.num_features, 11_499)  # 11,499 diseases, flat
# Or use hierarchical with just top-50 diseases per species

# Training: 10 epochs frozen backbone, 10 epochs fine-tune
# Target: <5MB model size, runs in <100ms on mobile device

Export to TF.js:

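# Convert PyTorch → ONNX → TensorFlow SavedModel → TF.js
# 1. In Python: torch.onnx.export(model, dummy_input, "browser_model.onnx")
#    (tf2onnx converts TensorFlow to ONNX, so it cannot read a PyTorch checkpoint)
# 2. ONNX → SavedModel, assuming the onnx-tf package is installed
onnx-tf convert -i browser_model.onnx -o browser_model/
tensorflowjs_converter --input_format=tf_saved_model browser_model/ browser_tfjs/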

Model size target: < 5MB (EfficientNet-Lite0 is ~4.7MB with INT8 quantization).
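
As a rough sanity check on the size target, the weight payload can be estimated as parameter count times bytes per weight. The sketch below is illustrative arithmetic only: `estimateWeightMB` and `linearHeadParams` are hypothetical helpers, the ~4.7M parameter count is inferred from the ~4.7MB INT8 figure above (one byte per weight), and the 1280-wide feature vector feeding each head is an assumption about the backbone. Treat the results as estimates, not measurements.

```typescript
// Hypothetical helper: estimate serialized weight size in megabytes (1 MB = 1e6 bytes).
// Ignores file-format overhead (model.json, metadata), so real files are slightly larger.
function estimateWeightMB(paramCount: number, bytesPerWeight: number): number {
  return (paramCount * bytesPerWeight) / 1e6;
}

// Parameters in one linear head: a weight matrix plus one bias per class.
function linearHeadParams(features: number, classes: number): number {
  return features * classes + classes;
}

// Assumption: ~4.7M parameters, inferred from the ~4.7MB INT8 figure above.
const BACKBONE_PARAMS = 4700000;
// Assumption: 1280 pooled features feed each classification head.
const FEATURES = 1280;

const int8MB = estimateWeightMB(BACKBONE_PARAMS, 1);
const float32MB = estimateWeightMB(BACKBONE_PARAMS, 4);
const speciesHeadMB = estimateWeightMB(linearHeadParams(FEATURES, 320), 1);
const flatDiseaseHeadMB = estimateWeightMB(linearHeadParams(FEATURES, 11499), 1);

console.log({ int8MB, float32MB, speciesHeadMB, flatDiseaseHeadMB });
```

Under these assumptions a flat 11,499-way disease head would by itself exceed the 5MB budget at one byte per weight, so the disease head likely needs to be much smaller than the flat variant to hit the target.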

5.2 Browser inference integration

// src/lib/ml/inference.ts — Updated with hybrid routing

export type InferenceSource = "browser" | "server";
export type InferenceMode = "quick" | "detailed";

export async function identifyPlant(
  image: HTMLImageElement | File,
  mode: InferenceMode = "quick",
): Promise<InferenceResult> {
  // 1. Run browser model (always, it's fast)
  const browserResult = await runBrowserInference(image);

  // 2. Decide: is this confident enough?
  if (mode === "quick" && browserResult.topConfidence >= 0.9) {
    // Browser alone is sufficient
    return {
      ...browserResult,
      source: "browser",
      inferenceTimeMs: browserResult.inferenceTimeMs,
    };
  }

  // 3. Fall back to server for detailed analysis
  const serverResult = await runServerInference(image);

  return {
    ...serverResult,
    source: "server",
    browserConfidence: browserResult.topConfidence,
    serverConfidence: serverResult.topConfidence,
  };
}

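async function runBrowserInference(
  image: HTMLImageElement | File, // same input type that identifyPlant accepts
): Promise<BrowserResult> {
  const model = await getBrowserModel(); // Lazy load EfficientNet-Lite
  const tensor = await preprocessBrowser(image); // TF.js preprocessing (must also decode File inputs)
  const output = model.predict(tensor); // predict() is synchronous in TF.js
  tensor.dispose(); // input no longer needed; parseOutput should dispose the output once read
  return parseOutput(output);
}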
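
`parseOutput` is not defined in this document. One way it could work, sketched here on plain number arrays rather than TF.js tensors so that it runs anywhere, is to apply a softmax to the raw scores and keep the top-k entries. The names `softmax`, `topK`, and the `Prediction` shape are assumptions for illustration, as is the premise that the model emits raw logits rather than probabilities.

```typescript
interface Prediction {
  index: number;      // class index into the label list
  confidence: number; // softmax probability in [0, 1]
}

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((v) => v / sum);
}

// Return the k most probable classes, highest confidence first.
function topK(probs: number[], k: number): Prediction[] {
  return probs
    .map((confidence, index) => ({ index, confidence }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
}

// Example with six made-up disease logits.
const diseaseProbs = softmax([2.0, 0.5, 4.0, 1.0, 0.1, 3.0]);
const top5 = topK(diseaseProbs, 5);
console.log(top5.map((p) => p.index)); // [2, 5, 0, 3, 1]
```

The first entry's `confidence` would play the role of `topConfidence` in the routing check above.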

5.3 UI integration

// src/components/ImageUpload.tsx — Updated

import { useState } from 'react';

function ImageUpload() {
  const [result, setResult] = useState<InferenceResult | null>(null);
  // setMode would be wired to a "detailed analysis" control (not shown)
  const [mode, setMode] = useState<InferenceMode>('quick');
  const [loading, setLoading] = useState(false);

  async function handleUpload(image: File) {
    // identifyPlant runs the browser model first and only awaits the
    // server when confidence is low or the mode is 'detailed'.
    // The loading flag must be set before awaiting: once identifyPlant
    // resolves, any server call has already finished.
    setLoading(true);
    try {
      setResult(await identifyPlant(image, mode));
    } finally {
      setLoading(false);
    }
  }

  return (
    <div>
      <ImageUploader onUpload={handleUpload} />
      {loading && <p>Analyzing...</p>}
      {result && (
        <div>
          <ResultCard result={result} />
          <ConfidenceBadge
            confidence={result.topConfidence}
            source={result.source}  // "browser" or "server"
          />
        </div>
      )}
    </div>
  );
}

5.4 User-facing indication

Show a subtle badge indicating which model made the prediction:

Source	Badge	UX
Browser (high conf)	✅ Instant ID	Green badge, "Analyzed on device"
Server (full model)	🧠 Detailed Analysis	Blue badge, "Deep analysis"
Server (fallback)	🔄 Upgraded	Yellow badge, "Upgraded for accuracy"
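
The three rows above can be told apart using the result's source plus the mode the user asked for. The following is a minimal sketch of that mapping; `badgeFor` and the `Badge` shape are hypothetical names, and the rule "server result in quick mode means fallback" is an assumption drawn from the routing logic in 5.2.

```typescript
type InferenceSource = "browser" | "server";
type InferenceMode = "quick" | "detailed";

interface Badge {
  label: string;
  color: "green" | "blue" | "yellow";
  caption: string;
}

// Map the result's source and the requested mode onto the three table rows.
function badgeFor(source: InferenceSource, mode: InferenceMode): Badge {
  if (source === "browser") {
    return { label: "✅ Instant ID", color: "green", caption: "Analyzed on device" };
  }
  if (mode === "detailed") {
    // Server was used because the user asked for the full model.
    return { label: "🧠 Detailed Analysis", color: "blue", caption: "Deep analysis" };
  }
  // Server was used because browser confidence fell below the threshold.
  return { label: "🔄 Upgraded", color: "yellow", caption: "Upgraded for accuracy" };
}

console.log(badgeFor("server", "quick").caption);
```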

5.5 Progressive enhancement

The system should degrade gracefully:

Scenario	Behavior
Offline	Browser model only (may be less accurate for unusual diseases)
Slow network	Browser model shows results immediately, server updates in background
Server down	Browser model alone, with note: "Limited to quick analysis"
New disease (not in browser model)	Server model handles it, browser shows "could be unusual"
No camera / file	Error message, "Upload an image to identify"
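
The offline, slow-network, and server-down rows share one pattern: publish the browser result first, then replace it only if the server answers in time. The sketch below shows that pattern under stated assumptions. `identifyProgressively`, `withTimeout`, the `Result` shape, and the `onUpdate` callback are hypothetical, and the 1500ms default simply mirrors the hybrid server path target rather than a tested value.

```typescript
interface Result {
  label: string;
  topConfidence: number;
  source: "browser" | "server";
  note?: string;
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("server timeout")), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Show the browser result first, then upgrade it if the server answers in time.
async function identifyProgressively(
  runBrowser: () => Promise<Result>,
  runServer: () => Promise<Result>,
  onUpdate: (result: Result) => void,
  serverTimeoutMs = 1500,
): Promise<Result> {
  const browserResult = await runBrowser();
  onUpdate(browserResult); // immediate on-device answer
  try {
    const serverResult = await withTimeout(runServer(), serverTimeoutMs);
    onUpdate(serverResult); // server is authoritative when it responds
    return serverResult;
  } catch {
    // Offline, server down, or too slow: keep the browser result and say so.
    const degraded: Result = { ...browserResult, note: "Limited to quick analysis" };
    onUpdate(degraded);
    return degraded;
  }
}
```

In a React component, `onUpdate` could simply be the state setter, so the card re-renders once for the browser result and again if the server result arrives.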

Edge Cases & Gotchas

Model loading race: If the browser model hasn't loaded yet, show a loading spinner rather than falling through to server. Start loading the model on page mount so it is usually ready before the first upload.
Discrepancy between browser and server: If browser and server disagree on the top prediction, show both with confidence bars. The server model is authoritative.
Retina / high-DPI images: TF.js may handle these differently from ONNX. Ensure preprocessing (resize, normalize) produces identical tensors.
Cache busting: When the model is updated, increment a version hash in the URL to avoid stale cached models.
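Memory: The ~5MB figure is the download size; runtime memory is higher once weights and activations are allocated. Older phones may struggle, so dispose of input and output tensors after each inference (tensor.dispose() or tf.tidy()). Call model.dispose() only when the model is no longer needed, since disposing it after every inference forces a reload.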
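
Two of the gotchas above, the loading race and cache busting, can be handled with small helpers. This is a minimal sketch, not the project's loader: `memoizeLoader`, `versionedModelUrl`, `MODEL_VERSION`, and the `/models/browser/` path are all hypothetical. The version is placed in the path rather than a query string on the assumption that weight shard files are fetched relative to `model.json`, so a new directory changes their URLs as well.

```typescript
// Hypothetical version tag; bump it whenever new model files are published.
const MODEL_VERSION = "v1";

// Build a versioned model URL so an updated model never hits a stale cache entry.
function versionedModelUrl(baseUrl: string, version: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmed}/${encodeURIComponent(version)}/model.json`;
}

// Wrap a loader so concurrent callers share one in-flight load instead of racing.
function memoizeLoader<M>(load: () => Promise<M>): () => Promise<M> {
  let pending: Promise<M> | null = null;
  return () => {
    if (pending === null) {
      pending = load().catch((error) => {
        pending = null; // allow a retry after a failed load
        throw error;
      });
    }
    return pending;
  };
}

// Stand-in loader; a real one might call tf.loadGraphModel(url) from TF.js.
let loadCount = 0;
const getBrowserModel = memoizeLoader(async () => {
  loadCount += 1;
  return { url: versionedModelUrl("/models/browser/", MODEL_VERSION) };
});

// Both calls share one load, e.g. page mount plus an early upload.
void Promise.all([getBrowserModel(), getBrowserModel()]).then(([a, b]) => {
  console.log(a === b, loadCount, a.url);
});
```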

Performance Targets

Metric	Target
Browser model load time (warm)	< 1s
Browser model inference	< 100ms
Server model inference (warm)	< 200ms
Hybrid fast path (browser only)	< 200ms total
Hybrid server path	< 1.5s total (including network)
Model file size (browser)	< 5MB
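
To check measurements against these targets, a small timing wrapper may be enough. The sketch below is an assumption-laden illustration: `BUDGET_MS`, `timed`, and `overBudget` are hypothetical names, and `Date.now()` is used for portability even though a higher-resolution timer would give more precise numbers.

```typescript
// Time budgets in milliseconds, copied from the targets table.
const BUDGET_MS = {
  modelLoadWarm: 1000,
  browserInference: 100,
  serverInference: 200,
  hybridFastPath: 200,
  hybridServerPath: 1500,
} as const;

type Stage = keyof typeof BUDGET_MS;

// Run an async step and report how long it took.
async function timed<T>(run: () => Promise<T>): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await run();
  return { value, elapsedMs: Date.now() - start };
}

// True when a measured duration exceeds the target for that stage.
function overBudget(stage: Stage, elapsedMs: number): boolean {
  return elapsedMs > BUDGET_MS[stage];
}

// Example: time a stand-in inference step and compare it to its budget.
void timed(async () => "rose").then(({ value, elapsedMs }) => {
  console.log(value, elapsedMs, overBudget("browserInference", elapsedMs));
});
```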

Verification

Browser model loads in Chrome, Firefox, Safari (desktop + mobile)
Browser model inference completes in < 100ms on mid-range phone
Hybrid routing works: conf ≥90% → browser result, conf <90% → server result
Server fallback fires within 200ms of browser model completing
UI shows source badge ("Instant ID" vs "Detailed Analysis" vs "Upgraded")
Offline mode: browser model works without network
Server degraded: system still works with browser model only
No memory leaks on repeated inferences (10+ images in succession)
Identical image produces same top prediction on browser and server (within margin)
All existing tests pass with hybrid pipeline
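
The browser/server consistency check in this list could be automated with a small comparison helper. The sketch below is one possible shape for it: `comparePredictions`, `TopPrediction`, and `Comparison` are hypothetical names, and the default `margin` of 0.1 is an illustrative tolerance, since the document does not define what "within margin" means numerically.

```typescript
interface TopPrediction {
  label: string;
  confidence: number; // probability in [0, 1]
}

interface Comparison {
  agree: boolean;         // both models picked the same top label
  withinMargin: boolean;  // confidence gap is no larger than `margin`
  shown: TopPrediction[]; // what the UI should display
}

// Assumption: `margin` is an illustrative tolerance, not a project value.
function comparePredictions(
  browser: TopPrediction,
  server: TopPrediction,
  margin = 0.1,
): Comparison {
  const agree = browser.label === server.label;
  const withinMargin = Math.abs(browser.confidence - server.confidence) <= margin;
  // On disagreement show both, server first because it is authoritative.
  const shown = agree ? [server] : [server, browser];
  return { agree, withinMargin, shown };
}

const sample = comparePredictions(
  { label: "tomato: early blight", confidence: 0.92 },
  { label: "tomato: early blight", confidence: 0.95 },
);
console.log(sample.agree, sample.withinMargin);
```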
