08. Production Hardening and Observability

meta:

id: production-ml-pipeline-08
feature: production-ml-pipeline
priority: P1
depends_on: [production-ml-pipeline-07]
tags: [implementation, production, observability]

objective:

Add comprehensive error handling at every layer of the pipeline
Implement structured logging for observability
Add rate limiting to prevent abuse
Create a health endpoint that reports model status and inference metrics
Ensure the system is production-ready with monitoring, cleanup, and resilience

deliverables:

src/app/api/health/route.ts — enhanced health endpoint with model status
src/lib/middleware/rate-limit.ts — rate limiting middleware
src/lib/middleware/error-handler.ts — global error handler
src/lib/observability/logger.ts — structured logger
src/lib/observability/metrics.ts — inference metrics tracker
Updated API routes with error handling and logging
Updated next.config.ts with security headers

steps:

Create structured logger src/lib/observability/logger.ts:

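export interface LogEntry {
  timestamp: string;
  level: "debug" | "info" | "warn" | "error";
  event: string;
  data?: Record<string, unknown>;
  error?: { message: string; stack?: string };
}

// Error objects serialize to "{}" under JSON.stringify, so copy the fields explicitly
function serializeError(error: unknown): LogEntry["error"] {
  if (error instanceof Error) return { message: error.message, stack: error.stack };
  if (typeof error === "object" && error !== null && "message" in error) {
    const { message, stack } = error as { message: unknown; stack?: unknown };
    return { message: String(message), stack: typeof stack === "string" ? stack : undefined };
  }
  return { message: String(error) };
}

export function log(level: LogEntry["level"], event: string, data?: Record<string, unknown>) {
  const { error, ...rest } = data ?? {};
  const entry: LogEntry = {
    timestamp: new Date().toISOString(),
    level,
    event,
    data: Object.keys(rest).length > 0 ? rest : undefined,
  };

  // Lift data.error to the top level at any log level (warn entries carry errors too)
  if (error !== undefined) {
    entry.error = serializeError(error);
  }

  // One JSON object per line so log collectors can parse each entry
  console.log(JSON.stringify(entry));
}

export const logger = {
  debug: (event: string, data?: Record<string, unknown>) => log("debug", event, data),
  info: (event: string, data?: Record<string, unknown>) => log("info", event, data),
  warn: (event: string, data?: Record<string, unknown>) => log("warn", event, data),
  error: (event: string, data?: Record<string, unknown>) => log("error", event, data),
};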
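
One detail worth checking when errors are logged as JSON: JSON.stringify on an Error yields {} because message and stack are non-enumerable. The sketch below shows one way to copy those fields explicitly before serializing. The helper names serializeError and formatEntry are hypothetical and used only for illustration; they are not part of the pipeline code.

```typescript
// Illustrative sketch only: serializeError and formatEntry are hypothetical helpers.
type Level = "debug" | "info" | "warn" | "error";

// Copy the fields we care about, since Error properties are non-enumerable.
function serializeError(err: unknown): { message: string; stack?: string } {
  return err instanceof Error
    ? { message: err.message, stack: err.stack }
    : { message: String(err) };
}

// Build one JSON log line; an `error` key in data is lifted to the top level.
function formatEntry(level: Level, event: string, data: Record<string, unknown> = {}): string {
  const { error, ...rest } = data;
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    event,
    data: rest,
    ...(error !== undefined ? { error: serializeError(error) } : {}),
  });
}

// Naive serialization loses the error details entirely.
const naive = JSON.stringify({ error: new Error("boom") });

// The explicit copy keeps them.
const logLine = formatEntry("warn", "model_load_failed", {
  backend: "tfjs",
  error: new Error("boom"),
});
const parsedLine = JSON.parse(logLine);
```

A unit test for "logger produces valid JSON output" could follow the same shape: capture the emitted line, parse it, and assert on the parsed fields.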

Create metrics tracker src/lib/observability/metrics.ts:

interface InferenceMetrics {
  totalInferences: number;
  totalErrors: number;
  avgInferenceTimeMs: number;
  lastInferenceAt: string | null;
  modelLoaded: boolean;
  modelLoadTimeMs: number | null;
}

class MetricsTracker {
  private metrics: InferenceMetrics = {
    totalInferences: 0,
    totalErrors: 0,
    avgInferenceTimeMs: 0,
    lastInferenceAt: null,
    modelLoaded: false,
    modelLoadTimeMs: null,
  };

  recordInference(inferenceTimeMs: number) {
    this.metrics.totalInferences++;
    this.metrics.lastInferenceAt = new Date().toISOString();
    // Running average
    this.metrics.avgInferenceTimeMs =
      (this.metrics.avgInferenceTimeMs * (this.metrics.totalInferences - 1) + inferenceTimeMs) /
      this.metrics.totalInferences;
  }

  recordError() {
    this.metrics.totalErrors++;
  }

  setModelStatus(loaded: boolean, loadTimeMs?: number) {
    this.metrics.modelLoaded = loaded;
    if (loadTimeMs !== undefined) {
      this.metrics.modelLoadTimeMs = loadTimeMs;
    }
  }

  getMetrics(): InferenceMetrics {
    return { ...this.metrics };
  }
}

export const metrics = new MetricsTracker();
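
The running average in recordInference can be checked by hand with a few samples. The sketch below restates the update as a standalone function (runningAverage, a hypothetical name) next to an algebraically equivalent incremental form, and feeds both the samples 100, 200, and 300 ms.

```typescript
// Illustrative sketch: both functions are hypothetical standalone versions of the update.
// `count` is the number of samples including the new one.
function runningAverage(prevAvg: number, count: number, sample: number): number {
  return (prevAvg * (count - 1) + sample) / count;
}

// Equivalent incremental form: move the average toward the sample by 1/count.
function runningAverageIncremental(prevAvg: number, count: number, sample: number): number {
  return prevAvg + (sample - prevAvg) / count;
}

const samples = [100, 200, 300];
let avgA = 0;
let avgB = 0;
samples.forEach((sample, i) => {
  avgA = runningAverage(avgA, i + 1, sample);
  avgB = runningAverageIncremental(avgB, i + 1, sample);
});
// Intermediate values are 100, then 150, then 200 for both forms.
```

Either form keeps only the previous average and the count, so the tracker never has to store individual samples.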

Enhance health endpoint src/app/api/health/route.ts:

import { NextResponse } from "next/server";
import { getModel } from "@/lib/ml/model-loader";
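import { metrics } from "@/lib/observability/metrics";

// Evaluate on every request: GET route handlers were statically cached by default before Next.js 15
export const dynamic = "force-dynamic";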

export async function GET() {
  const model = await getModel();
  const modelStatus = model.getStatus();

  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    model: {
      loaded: modelStatus.loaded,
      backend: modelStatus.backend,
      modelId: modelStatus.modelId,
      numClasses: modelStatus.numClasses,
      error: modelStatus.error,
    },
    metrics: metrics.getMetrics(),
    uptime: process.uptime(),
  });
}
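
The health payload exposes raw counters, so a consumer that wants an error rate has to derive it. A minimal sketch of that derivation follows. It assumes totalInferences counts only successful inferences (recordInference runs after the identify logic succeeds), and it reports "degraded" when the model is not loaded. Both the summarizeHealth name and the "degraded" status are hypothetical and not part of the spec.

```typescript
// Illustrative sketch: field names mirror InferenceMetrics; summarizeHealth is hypothetical.
interface Counters {
  totalInferences: number; // assumed to count successful inferences only
  totalErrors: number;
  modelLoaded: boolean;
}

function summarizeHealth(c: Counters): { status: "ok" | "degraded"; errorRate: number } {
  const attempts = c.totalInferences + c.totalErrors;
  // Avoid dividing by zero before any request has been served.
  const errorRate = attempts === 0 ? 0 : c.totalErrors / attempts;
  return { status: c.modelLoaded ? "ok" : "degraded", errorRate };
}

const healthy = summarizeHealth({ totalInferences: 9, totalErrors: 1, modelLoaded: true });
const fallback = summarizeHealth({ totalInferences: 0, totalErrors: 0, modelLoaded: false });
```

Here healthy has an error rate of 1 in 10, and fallback reports "degraded" with a rate of 0 because nothing has been attempted yet.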

Create rate limiting middleware src/lib/middleware/rate-limit.ts:

import { NextRequest, NextResponse } from "next/server";

// Simple in-memory rate limiter (for production, use Redis or similar)
const requestCounts = new Map<string, { count: number; resetAt: number }>();

const RATE_LIMIT = {
  maxRequests: 10, // 10 requests per window
  windowMs: 60 * 1000, // 1 minute window
};

export function rateLimit(request: NextRequest): NextResponse | null {
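  // x-forwarded-for may be a comma-separated proxy chain; the first entry is the client
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() || "unknown";
  const now = Date.now();

  let record = requestCounts.get(ip);

  if (!record || now > record.resetAt) {
    // Drop expired windows so the map does not grow without bound
    requestCounts.forEach((value, key) => {
      if (now > value.resetAt) requestCounts.delete(key);
    });
    record = { count: 0, resetAt: now + RATE_LIMIT.windowMs };
    requestCounts.set(ip, record);
  }

  record.count++;

  if (record.count > RATE_LIMIT.maxRequests) {
    // Tell clients how long to wait before retrying
    const retryAfterSec = Math.ceil((record.resetAt - now) / 1000);
    return NextResponse.json(
      { error: "Rate limit exceeded", message: "Too many requests. Please try again later." },
      { status: 429, headers: { "Retry-After": String(retryAfterSec) } },
    );
  }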

  return null; // No rate limit hit
}
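
The fixed-window logic is easier to unit test when the clock is passed in rather than read from Date.now(). The sketch below is a framework-free version under that assumption. createLimiter and clientIp are hypothetical helpers, and the timestamps are arbitrary millisecond values chosen for the example.

```typescript
// Illustrative sketch: createLimiter and clientIp are hypothetical, framework-free helpers.
interface Decision {
  allowed: boolean;
  retryAfterSec: number; // 0 when the request is allowed
}

function createLimiter(maxRequests: number, windowMs: number) {
  const windows = new Map<string, { count: number; resetAt: number }>();
  return function check(key: string, now: number): Decision {
    let record = windows.get(key);
    // Start a fresh window on first sight of the key or once the old window has expired.
    if (!record || now > record.resetAt) {
      record = { count: 0, resetAt: now + windowMs };
      windows.set(key, record);
    }
    record.count++;
    if (record.count > maxRequests) {
      return { allowed: false, retryAfterSec: Math.ceil((record.resetAt - now) / 1000) };
    }
    return { allowed: true, retryAfterSec: 0 };
  };
}

// x-forwarded-for may hold "client, proxy1, proxy2"; take the first entry.
function clientIp(forwardedFor: string | null): string {
  return forwardedFor?.split(",")[0]?.trim() || "unknown";
}

const check = createLimiter(10, 60000);
const key = clientIp("203.0.113.7, 10.0.0.1");

// Eleven requests at t = 1000 ms: the first ten pass, the eleventh is blocked.
const results: boolean[] = [];
for (let i = 0; i < 11; i++) results.push(check(key, 1000).allowed);

const blocked = check(key, 31000); // still inside the window that ends at 61000
const afterReset = check(key, 62000); // past 61000, so a new window starts
```

Injecting the timestamp lets the "blocks after max requests" and "resets after window" unit tests run without fake timers or real waiting.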

Create global error handler src/lib/middleware/error-handler.ts:

import { NextResponse } from "next/server";
import { logger } from "@/lib/observability/logger";

export function handleError(error: unknown, context: string): NextResponse {
  logger.error("unhandled_error", {
    context,
    error:
      error instanceof Error
        ? { message: error.message, stack: error.stack }
        : { message: String(error) },
  });

  return NextResponse.json(
    {
      error: "Internal server error",
      message: "An unexpected error occurred. Please try again later.",
      context,
    },
    { status: 500 },
  );
}

Add error handling to /api/upload:

import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";

export async function POST(request: NextRequest) {
  // Rate limiting
  const rateLimitError = rateLimit(request);
  if (rateLimitError) return rateLimitError;

  try {
    logger.info("upload_start", { ip: request.headers.get("x-forwarded-for") });

    // ... existing upload logic ...

    logger.info("upload_success", { imageId, fileSize: buffer.length });
    return NextResponse.json({ imageId, tensorShape, previewUrl });
  } catch (error) {
    return handleError(error, "upload");
  }
}

Add error handling to /api/identify:

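import { NextRequest, NextResponse } from "next/server";
import { rateLimit } from "@/lib/middleware/rate-limit";
import { handleError } from "@/lib/middleware/error-handler";
import { logger } from "@/lib/observability/logger";
import { metrics } from "@/lib/observability/metrics";

export async function POST(request: NextRequest) {
  const rateLimitError = rateLimit(request);
  if (rateLimitError) return rateLimitError;

  try {
    // imageId and plantId come from the existing request parsing, which must run before this line
    logger.info("identify_start", { imageId, plantId });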

    const startTime = Date.now();

    // ... existing identify logic ...

    const inferenceTimeMs = Date.now() - startTime;
    metrics.recordInference(inferenceTimeMs);

    logger.info("identify_success", {
      imageId,
      inferenceTimeMs,
      topPrediction: predictions[0]?.diseaseId,
      confidence: predictions[0]?.confidence.adjusted,
    });

    return NextResponse.json({ predictions, metadata });
  } catch (error) {
    metrics.recordError();

    if (error instanceof Error && error.message.includes("not loaded")) {
      return NextResponse.json(
        {
          error: "Model not available",
          message: "ML model failed to load. Please try again later.",
        },
        { status: 503 },
      );
    }

    return handleError(error, "identify");
  }
}
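
Matching on error.message.includes("not loaded") works but is brittle, since any unrelated error whose message happens to contain that phrase would also map to 503. One alternative is a dedicated error class. The sketch below assumes the model loader could throw such a class; ModelNotLoadedError and statusForError are hypothetical names, not existing pipeline code.

```typescript
// Illustrative sketch: ModelNotLoadedError and statusForError are hypothetical names.
class ModelNotLoadedError extends Error {
  constructor(message = "Model not loaded") {
    super(message);
    this.name = "ModelNotLoadedError";
    // Keeps instanceof working when compiling to ES5 targets.
    Object.setPrototypeOf(this, ModelNotLoadedError.prototype);
  }
}

// Map an error to the HTTP status the route should return.
function statusForError(error: unknown): 503 | 500 {
  return error instanceof ModelNotLoadedError ? 503 : 500;
}

const unavailable = statusForError(new ModelNotLoadedError());
// A substring check would misclassify this one as 503; the class check does not.
const unexpected = statusForError(new Error("cache entry not loaded from disk"));
```

With this approach the catch block would branch on instanceof instead of inspecting the message text.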

Add model status tracking to model-loader.ts:

import { metrics } from "@/lib/observability/metrics";
import { logger } from "@/lib/observability/logger";

async function loadModel(): Promise<PlantDiseaseModel> {
  const startTime = Date.now();

  try {
    const model = await tryLoadTFJS();
    if (model) {
      const loadTimeMs = Date.now() - startTime;
      metrics.setModelStatus(true, loadTimeMs);
      logger.info("model_loaded", { backend: "tfjs", loadTimeMs });
      return model;
    }
  } catch (error) {
    logger.warn("model_load_failed", { backend: "tfjs", error });
  }

  // ... fallback to mock ...
  metrics.setModelStatus(false);
  return createMockModel();
}

Add cleanup for old uploads:

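// src/lib/cleanup.ts
import fs from "fs/promises";
import path from "path";
import { logger } from "@/lib/observability/logger";

const UPLOADS_DIR = path.join(process.cwd(), "public", "uploads");
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

export async function cleanupOldUploads() {
  const files = await fs.readdir(UPLOADS_DIR);
  const now = Date.now();

  for (const file of files) {
    const filePath = path.join(UPLOADS_DIR, file);
    const stat = await fs.stat(filePath);

    // Only delete regular files older than the TTL
    if (stat.isFile() && now - stat.mtimeMs > MAX_AGE_MS) {
      await fs.unlink(filePath);
      logger.info("upload_cleaned", { file, ageMs: now - stat.mtimeMs });
    }
  }
}

// Run cleanup on server start and periodically.
// This block only executes if the module is imported somewhere at server startup.
if (process.env.NODE_ENV === "production") {
  // Catch rejections so a failed run (e.g. a missing uploads directory) is logged, not left unhandled
  const runCleanup = () =>
    cleanupOldUploads().catch((error) => logger.error("upload_cleanup_failed", { error }));

  runCleanup();
  setInterval(runCleanup, 60 * 60 * 1000); // Every hour
}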
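
The age check is the part of the cleanup most worth unit testing, and it can be separated from the filesystem calls. A minimal sketch follows, assuming a hypothetical pure helper selectExpired that receives file names and modification times and returns the names to delete.

```typescript
// Illustrative sketch: selectExpired is a hypothetical pure helper, separate from the fs calls.
interface FileInfo {
  name: string;
  mtimeMs: number;
}

// Return the names of files strictly older than maxAgeMs at time `now`.
function selectExpired(files: FileInfo[], now: number, maxAgeMs: number): string[] {
  return files.filter((f) => now - f.mtimeMs > maxAgeMs).map((f) => f.name);
}

const HOUR_MS = 60 * 60 * 1000;
const nowMs = 100 * HOUR_MS; // arbitrary reference time for the example
const expired = selectExpired(
  [
    { name: "old.jpg", mtimeMs: nowMs - 25 * HOUR_MS },
    { name: "edge.jpg", mtimeMs: nowMs - 24 * HOUR_MS }, // exactly 24h old: kept (strict >)
    { name: "new.jpg", mtimeMs: nowMs - 1 * HOUR_MS },
  ],
  nowMs,
  24 * HOUR_MS,
);
```

Only old.jpg is selected. Because the comparison is strict, a file that is exactly 24 hours old survives until the next run.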

Update next.config.ts with security headers:

const nextConfig = {
  // ... existing config ...
  async headers() {
    return [
      {
        source: "/api/:path*",
        headers: [
          { key: "X-Content-Type-Options", value: "nosniff" },
          { key: "X-Frame-Options", value: "DENY" },
          { key: "X-XSS-Protection", value: "1; mode=block" },
        ],
      },
    ];
  },
};

Add monitoring dashboard (optional) src/app/admin/metrics/page.tsx:
- Simple page showing inference metrics
- Model status
- Recent inference times
- Error rate
- Protected by authentication (admin only)

Document production checklist in docs/production-checklist.md:
- Environment variables needed
- Model deployment steps
- Monitoring setup
- Backup strategy
- Rollback procedure

tests:

Unit: rate limiter blocks after max requests
Unit: rate limiter resets after window
Unit: metrics tracker records inference correctly
Unit: metrics tracker computes running average
Unit: logger produces valid JSON output
Integration: health endpoint returns model status and metrics
Integration: rate limit returns 429 after max requests
Integration: error handler catches unhandled errors and returns 500

acceptance_criteria:

All API routes have rate limiting (10 requests per minute per IP)
All API routes have structured logging (JSON format)
Health endpoint reports model status, inference metrics, uptime
Error handler catches all unhandled errors and returns 500 with clear message
Old uploads are cleaned up automatically (24-hour TTL)
Metrics tracker records inference time, error count, and model status
Security headers are set (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
Production checklist is documented

validation:

npx vitest run src/lib/middleware/rate-limit.test.ts
npx vitest run src/lib/observability/metrics.test.ts
curl http://localhost:3000/api/health — returns model status and metrics
curl -X POST http://localhost:3000/api/identify ... (11 times) — 11th request returns 429
Check server logs: JSON-formatted log entries for all requests
Wait 25 hours (24-hour TTL plus the hourly cleanup interval), or temporarily lower MAX_AGE_MS: old uploads are cleaned up

notes:

Rate limiter uses in-memory storage — for multi-instance deployments, use Redis or similar
Metrics are in-memory — for persistent metrics, use a time-series database
Health endpoint should be monitored by uptime monitoring service (e.g., Pingdom, UptimeRobot)
Cleanup runs every hour in production — adjust frequency based on upload volume
Security headers are basic — consider adding CSP, HSTS for full security hardening
Production checklist should be reviewed before each deployment
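
As a starting point for the CSP and HSTS note above, the entries below could be appended to the headers array in next.config.ts. This is a sketch only. The constant name extraSecurityHeaders is hypothetical, and the values are common defaults rather than a policy vetted for this application. HSTS in particular should only be enabled once the site is served exclusively over HTTPS.

```typescript
// Illustrative sketch: values are common starting points, not a vetted policy for this app.
const extraSecurityHeaders: { key: string; value: string }[] = [
  // HSTS: ask browsers to use HTTPS for two years, including subdomains.
  { key: "Strict-Transport-Security", value: "max-age=63072000; includeSubDomains" },
  // CSP for JSON API responses: nothing needs to load, and nothing may frame them.
  { key: "Content-Security-Policy", value: "default-src 'none'; frame-ancestors 'none'" },
];
```

Pages that render HTML would need a less restrictive Content-Security-Policy than the one shown for /api routes.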
