FrenoCorp/analysis/fre4806_datadog_sentry_integration.md
2026-05-11 14:19:01 -04:00


FRE-4806: Datadog APM + Sentry Integration Implementation Plan

Overview

This document outlines the implementation approach for integrating Datadog APM and Sentry into the FrenoCorp platform. This integration provides comprehensive observability, error tracking, and performance monitoring across all services.

Architecture Decision Record (ADR)

ADR-0042: Observability Stack Selection

Decision: Integrate Datadog APM for distributed tracing and performance monitoring, combined with Sentry for error tracking and release management.

Context:

  • Current monitoring relies on basic logging and metrics
  • No centralized error tracking or distributed tracing
  • Multiple microservices require coordinated observability
  • Need to support debugging production issues efficiently

Alternatives Considered:

| Option | Pros | Cons |
|---|---|---|
| Datadog + Sentry | Industry standard, rich ecosystem, excellent DX | Cost at scale |
| OpenTelemetry + ELK | Open source, flexible | Higher operational overhead |
| New Relic | Good APM, unified platform | Less flexible error tracking |

Decision Rationale:

  • Datadog APM provides best-in-class distributed tracing
  • Sentry offers superior developer experience for error tracking
  • Both have excellent Node.js, TypeScript, and Go support
  • Both integrate with the existing CI/CD pipelines

Implementation Plan

Phase 1: Datadog APM Integration

1.1 Install and Configure Datadog SDK

Node.js Services:

// package.json
{
  "dependencies": {
    "dd-trace": "^5.19.0"
  }
}

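// datadog.config.js
// Load this file before any instrumented module (express, pg, ioredis, ...).
const tracer = require('dd-trace').init({
  service: 'freno-corpservice',
  version: '1.0.0',
  env: process.env.NODE_ENV,
  sampleRate: 1.0,
});

// W3C trace-context propagation is selected through the environment:
//   DD_TRACE_PROPAGATION_STYLE=datadog,tracecontext
module.exports = tracer;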

Go Services:

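// go.mod
require (
    github.com/DataDog/dd-trace-go/v2 v2.1.0
)

// main.go
import (
    ddotel "github.com/DataDog/dd-trace-go/v2/ddtrace/opentelemetry"
    "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"
    "go.opentelemetry.io/otel"
)

// initTracer registers the Datadog tracer as the global OpenTelemetry
// provider. The caller should defer the returned shutdown function.
func initTracer() func() {
    provider := ddotel.NewTracerProvider(tracer.WithService("freno-corpservice"))
    otel.SetTracerProvider(provider)
    return func() { _ = provider.Shutdown() }
}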

1.2 Configure Tracing Endpoints

datadog.yaml configuration (Datadog Agent):

# Datadog Agent configuration
# The API key is supplied through the DD_API_KEY environment variable,
# so it is not stored in this file.
site: "datadoghq.com"

# Trace intake
apm_config:
  enabled: true

Tracer environment variables (set per service; sampling and profiling are tracer-side settings):

# Tracing configuration
DD_TRACE_ENABLED=true
DD_TRACE_SAMPLE_RATE=1.0
DD_TRACE_SAMPLING_RULES='[{"service":"api","sample_rate":1.0},{"service":"worker","sample_rate":0.5},{"service":"scheduler","sample_rate":0.1}]'

# Performance monitoring
DD_PROFILING_ENABLED=true
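
The per-service sampling rules above can be read as "first matching rule wins, otherwise use the global rate". The following is a minimal sketch of that lookup, useful for reasoning about which rate a given service ends up with. The `SamplingRule` type and `resolveSampleRate` function are hypothetical names for illustration and are not part of the Datadog tracer.

```typescript
// Hypothetical model of the sampling rules listed in the configuration.
interface SamplingRule {
    service: string;
    rate: number; // fraction of traces kept, between 0 and 1
}

const samplingRules: SamplingRule[] = [
    { service: 'api', rate: 1.0 },
    { service: 'worker', rate: 0.5 },
    { service: 'scheduler', rate: 0.1 },
];

// Returns the rate of the first rule whose service matches,
// or the global default when no rule applies.
function resolveSampleRate(
    service: string,
    rules: SamplingRule[],
    defaultRate: number,
): number {
    const rule = rules.find((r) => r.service === service);
    return rule ? rule.rate : defaultRate;
}
```

With the rules above and a default of 1.0, `resolveSampleRate('worker', samplingRules, 1.0)` yields 0.5, while a service without a rule falls back to 1.0.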

1.3 Implement Distributed Tracing

Request Context Propagation:

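// middleware/tracing.ts
import tracer from 'dd-trace';
import type { Request, Response, NextFunction } from 'express';

// dd-trace already creates a span per Express request automatically;
// this middleware adds an explicit child span carrying custom tags.
// `req.span` and `req.user` require augmenting the Express Request type.
export const tracingMiddleware = (req: Request, res: Response, next: NextFunction) => {
    const span = tracer.startSpan('http.request', {
        childOf: tracer.scope().active() ?? undefined,
        tags: {
            'service.name': 'api',
            'resource.name': `${req.method} ${req.path}`,
            'http.url': req.url,
            'http.method': req.method,
            'user.id': req.user?.id,
        },
    });

    // Attach span to request context
    req.span = span;

    res.on('finish', () => {
        span.setTag('http.status_code', res.statusCode);
        span.finish();
    });

    next();
};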
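
For a trace ID to propagate between services, each outgoing request carries a header that the next service parses. As a rough illustration of what the tracer does on your behalf under W3C Trace Context, the sketch below parses and formats a `traceparent` header. The function names are hypothetical; dd-trace performs this internally and this code is not part of its API.

```typescript
// Illustrative only: the tracer injects and extracts these headers itself.
interface TraceParent {
    traceId: string;  // 32 lowercase hex characters
    parentId: string; // 16 lowercase hex characters
    sampled: boolean; // bit 0 of the trace-flags byte
}

// Version 00 layout: 00-<trace-id>-<parent-id>-<trace-flags>
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | null {
    const match = TRACEPARENT_RE.exec(header.trim());
    if (!match) return null;
    const traceId = match[1];
    const parentId = match[2];
    const flags = match[3];
    // All-zero identifiers are invalid in the specification.
    if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
    return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceParent(tp: TraceParent): string {
    return `00-${tp.traceId}-${tp.parentId}-${tp.sampled ? '01' : '00'}`;
}
```

A header that fails to parse should be treated as absent, in which case the receiving service starts a new trace.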

1.4 Database Query Tracing

PostgreSQL:

// middleware/db-tracing.ts
import tracer from 'dd-trace';
import { Pool } from 'pg';

// Connection settings are read from the standard PG* environment variables
const pool = new Pool();

export const dbTracingMiddleware = async (sql: string, params: unknown[]) => {
    const span = tracer.startSpan('db.query', {
        childOf: tracer.scope().active() ?? undefined,
        tags: {
            'service.name': 'database',
            'resource.name': sql.substring(0, 100),
            'db.system': 'postgresql',
            'db.statement': sql,
        },
    });

    try {
        const start = Date.now();
        const result = await pool.query(sql, params);
        const duration = Date.now() - start;

        span.addTags({
            'db.query.duration': duration,
            'db.query.rows': result.rowCount,
        });

        return result;
    } catch (error) {
        // Setting the `error` tag to the Error object marks the span as failed
        span.setTag('error', error);
        throw error;
    } finally {
        span.finish();
    }
};
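
Using the first 100 characters of the raw SQL as the resource name works for parameterized queries, but any query with inlined literals produces a new resource per distinct value. A minimal sketch of a normalizer that replaces literals with `?` before truncating is shown below. It is regex-based and not a SQL parser, and `normalizeSql` is a hypothetical helper name.

```typescript
// Collapses literals so that structurally identical queries share one name.
function normalizeSql(sql: string, maxLength = 100): string {
    return sql
        // Quoted string literals, including doubled-quote escapes ('O''Brien')
        .replace(/'(?:[^']|'')*'/g, '?')
        // Numeric literals that are not part of an identifier or a $n placeholder
        .replace(/(^|[^\w$])\d+(?:\.\d+)?\b/g, '$1?')
        // Runs of whitespace, including newlines
        .replace(/\s+/g, ' ')
        .trim()
        .slice(0, maxLength);
}
```

For example, `SELECT * FROM users WHERE id = 42` and `SELECT * FROM users WHERE id = 7` both normalize to `SELECT * FROM users WHERE id = ?`.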

Redis:

// middleware/redis-tracing.ts
import tracer from 'dd-trace';
import type Redis from 'ioredis'; // assumes the ioredis client

export const redisTracingMiddleware = async (redis: Redis, key: string, command: string) => {
    const span = tracer.startSpan('redis.command', {
        childOf: tracer.scope().active() ?? undefined,
        tags: {
            'service.name': 'cache',
            'resource.name': `${command}:${key.substring(0, 50)}`,
            'redis.key': key,
            'redis.command': command,
        },
    });

    const start = Date.now();
    try {
        // redis.call() sends an arbitrary command by name
        const result = await redis.call(command, key);

        // Record the duration only; cached values can be large or sensitive
        span.setTag('redis.duration', Date.now() - start);

        return result;
    } catch (error) {
        span.setTag('error', error);
        throw error;
    } finally {
        span.finish();
    }
};

1.5 External Service Tracing

HTTP Client Instrumentation:

// middleware/http-client-tracing.ts
import tracer from 'dd-trace';
import * as https from 'node:https';
import type { IncomingMessage } from 'node:http';

// Shared keep-alive agent for outbound HTTPS calls
export const httpTracingAgent = new https.Agent({
    keepAlive: true,
    keepAliveMsecs: 1000,
    maxSockets: 256,
    maxFreeSockets: 256,
});

export const httpTracingMiddleware = (
    url: URL,
    options: https.RequestOptions = {},
): Promise<IncomingMessage> => {
    const method = options.method || 'GET';
    const span = tracer.startSpan('http.outbound', {
        childOf: tracer.scope().active() ?? undefined,
        tags: {
            'service.name': 'external-api',
            'resource.name': `${url.hostname}:${url.port || 443} ${method}`,
            'http.url': url.href,
            'http.method': method,
        },
    });
    const start = Date.now();

    return new Promise((resolve, reject) => {
        const req = https.request(url, { ...options, agent: httpTracingAgent }, (res) => {
            span.addTags({
                'http.status_code': res.statusCode,
                'http.response.duration': Date.now() - start,
            });
            span.finish();
            resolve(res);
        });

        req.on('error', (err) => {
            span.setTag('error', err);
            span.finish();
            reject(err);
        });

        // setTimeout alone does not abort the request, so destroy it explicitly
        req.setTimeout(30000, () => {
            req.destroy(new Error('Request timed out after 30s'));
        });
        req.end();
    });
};

1.6 Trace Sampling and Performance

Smart Sampling Strategy:

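// config/tracing.config.ts
// djb2Hash is a string-hash helper assumed to be defined elsewhere in the codebase.
export const tracingConfig = {
    // Trace every request from a fixed ~1% cohort of users (hash bucket 0 of 100),
    // so a sampled user's activity is captured end to end for debugging
    sampleRateByUser: (userId: string) => {
        const hash = djb2Hash(userId);
        return hash % 100 === 0 ? 1.0 : 0.0;
    },

    // Sample 10% of error requests for analysis
    sampleRateOnError: 0.1,

    // Sample 5% of slow requests (duration > 100ms). Duration is only known once
    // the request has finished, so this applies when deciding what to keep
    sampleRateByDuration: (duration: number) => {
        return duration > 100 ? 0.05 : 0.0;
    },

    // Sample 1% of all other requests as a baseline
    defaultSampleRate: 0.01,
};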
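
The user-based rule relies on a `djb2Hash` helper that the plan does not define. A minimal sketch of the classic djb2 string hash, kept in unsigned 32-bit range, is given below together with the cohort check it supports. The `isUserInDebugCohort` name and the bucket count of 100 are assumptions taken from the `hash % 100 === 0` expression above.

```typescript
// djb2 string hash: hash = hash * 33 + charCode, truncated to 32 bits.
function djb2Hash(input: string): number {
    let hash = 5381;
    for (let i = 0; i < input.length; i++) {
        // (hash << 5) + hash is hash * 33; >>> 0 keeps the value unsigned
        hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
    }
    return hash;
}

// Deterministic: the same user always lands in the same bucket, so a
// sampled user stays sampled across requests and services.
function isUserInDebugCohort(userId: string, buckets = 100): boolean {
    return djb2Hash(userId) % buckets === 0;
}
```

Because the decision depends only on the user ID, every service that applies the same function agrees on which users are traced, without any shared state.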

Phase 2: Sentry Integration

2.1 Install and Configure Sentry SDK

Node.js Configuration:

// sentry.ts
// Targets the Sentry JavaScript SDK v8 API. Express is instrumented
// automatically when initSentry() runs before `express` is imported; call
// Sentry.setupExpressErrorHandler(app) after the routes are registered.
import * as Sentry from '@sentry/node';
import { nodeProfilingIntegration } from '@sentry/profiling-node';
import pkg from './package.json'; // requires "resolveJsonModule" in tsconfig

const sentryConfig: Sentry.NodeOptions = {
    dsn: process.env.SENTRY_DSN,
    environment: process.env.NODE_ENV,
    release: `freno-corp@${pkg.version}-${process.env.GIT_SHA || 'local'}`,
    tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
    // Relative to tracesSampleRate: profile every sampled transaction
    profilesSampleRate: 1.0,

    // Integrations
    integrations: [
        nodeProfilingIntegration(),
        Sentry.httpIntegration({
            // Exclude outgoing calls to internal endpoints
            ignoreOutgoingRequests: (url) =>
                [/\/api\/internal\//, /\/health\//, /\/metrics\//].some((re) => re.test(url)),
        }),
    ],

    // Performance monitoring
    beforeSendTransaction(event) {
        // Filter out internal transactions
        if (event.transaction?.startsWith('/internal')) {
            return null;
        }
        return event;
    },

    // Error filtering
    beforeSend(event) {
        // Filter out known issues
        const knownIssues = [
            /ECONNREFUSED/,
            /ETIMEDOUT/,
            /Rate limit exceeded/,
        ];

        // Captured exceptions carry their text in exception.values, not event.message
        const message = event.message || event.exception?.values?.[0]?.value || '';
        if (knownIssues.some((regex) => regex.test(message))) {
            return null;
        }

        return event;
    },
};

export const initSentry = () => {
    Sentry.init(sentryConfig);
};
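
The known-issue filter is easier to verify when it is a pure function over the event shape. The sketch below assumes only the two places where Sentry events carry text: `message` for captured messages and `exception.values[].value` for captured exceptions. The `MinimalEvent` type and `isKnownIssue` name are hypothetical, and the type covers just the fields the filter reads.

```typescript
// Subset of the Sentry event shape that the filter needs.
interface MinimalEvent {
    message?: string;
    exception?: { values?: Array<{ type?: string; value?: string }> };
}

const KNOWN_ISSUES: RegExp[] = [/ECONNREFUSED/, /ETIMEDOUT/, /Rate limit exceeded/];

// Collects every piece of text that could identify the error.
function eventTexts(event: MinimalEvent): string[] {
    const texts: string[] = [];
    if (event.message) texts.push(event.message);
    for (const ex of event.exception?.values ?? []) {
        if (ex.value) texts.push(ex.value);
    }
    return texts;
}

// True when any text on the event matches a known, ignorable issue.
function isKnownIssue(event: MinimalEvent): boolean {
    return eventTexts(event).some((text) =>
        KNOWN_ISSUES.some((pattern) => pattern.test(text)),
    );
}
```

A `beforeSend` hook could then reduce to `return isKnownIssue(event) ? null : event;`, and the patterns can be unit-tested without initializing the SDK.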

2.2 React/Next.js Integration

Error Boundaries:

// components/SentryErrorBoundary.tsx
import * as Sentry from '@sentry/react';
import React, { Component, ErrorInfo, ReactNode } from 'react';

interface Props {
    children: ReactNode;
    fallback?: ReactNode;
}

interface State {
    hasError: boolean;
    error: Error | null;
}

export class SentryErrorBoundary extends Component<Props, State> {
    constructor(props: Props) {
        super(props);
        this.state = { hasError: false, error: null };
    }

    static getDerivedStateFromError(error: Error): State {
        return { hasError: true, error };
    }

    componentDidCatch(error: Error, errorInfo: ErrorInfo) {
        Sentry.captureException(error, { 
            contexts: { 
                react: { componentStack: errorInfo.componentStack } 
            }
        });
    }

    render() {
        if (this.state.hasError) {
            return this.props.fallback ?? <p>Something went wrong.</p>;
        }
        return this.props.children;
    }
}

Global Error Handler:

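// middleware/global-error-handler.ts
import * as Sentry from '@sentry/node';
import type { Request, Response, NextFunction } from 'express';

// Errors may carry an HTTP status set by upstream code
type HttpError = Error & { statusCode?: number };

// Express identifies error handlers by their four-argument signature,
// so `_next` must stay in the parameter list even though it is unused.
export const errorHandler = (err: HttpError, req: Request, res: Response, _next: NextFunction) => {
    // Capture error in Sentry
    Sentry.captureException(err, {
        extra: {
            url: req.url,
            method: req.method,
            userAgent: req.headers['user-agent'],
        },
    });

    // Mark the Datadog span as failed (req.span is set by tracingMiddleware)
    const span = req.span;
    if (span) {
        span.setTag('error', err);
        span.setTag('error.unhandled', true);
    }

    // Standard error handling
    const statusCode = err.statusCode || 500;
    res.status(statusCode).json({
        error: err.message,
        ...(process.env.NODE_ENV === 'development' && { stack: err.stack }),
    });
};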

2.3 Browser SDK Configuration

Next.js Configuration:

// next.config.js
const { withSentryConfig } = require('@sentry/nextjs');

/** @type {import('next').NextConfig} */
const nextConfig = {};

// NEXT_PUBLIC_* variables are inlined into the client bundle automatically,
// so the DSN does not need to be re-exported through `env`.
module.exports = withSentryConfig(nextConfig, {
    silent: true,
});

Sentry Browser SDK:

// components/Sentry.tsx
'use client';

import * as Sentry from '@sentry/nextjs';

Sentry.init({
    dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
    environment: process.env.NEXT_PUBLIC_ENV,
    // Only NEXT_PUBLIC_* variables reach the browser bundle.
    // NEXT_PUBLIC_SENTRY_RELEASE is an assumed build-time variable holding the
    // same `freno-corp@<version>-<sha>` string as the server release.
    release: process.env.NEXT_PUBLIC_SENTRY_RELEASE,

    tracesSampleRate: 1.0,

    // Traces page loads and client-side navigations
    integrations: [Sentry.browserTracingIntegration()],

    // Performance monitoring
    beforeSendTransaction(event) {
        // Filter sensitive endpoints
        if (/(token|secret|password)/i.test(event.transaction ?? '')) {
            return null;
        }
        return event;
    },
});

2.4 React Query Integration

Automatic Tracking:

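// hooks/useSentryQuery.ts
import { useEffect } from 'react';
import { useQuery, UseQueryOptions } from '@tanstack/react-query';
import * as Sentry from '@sentry/react';

/**
 * React Query hook with automatic Sentry integration.
 * Reports query errors to Sentry, skipping client (4xx) errors.
 * Uses the object form of useQuery, which works in both v4 and v5;
 * the per-query onError callback was removed in v5.
 */
export function useSentryQuery<TData, TError = Error>(
    queryKey: readonly unknown[],
    queryFn: () => Promise<TData>,
    options?: Omit<UseQueryOptions<TData, TError>, 'queryKey' | 'queryFn'>
) {
    const result = useQuery<TData, TError>({ ...options, queryKey, queryFn });
    const error = result.error;

    useEffect(() => {
        if (!error) return;
        // Only capture non-4xx errors
        const status = (error as { statusCode?: number }).statusCode;
        if (status === undefined || status >= 500) {
            Sentry.captureException(error, {
                tags: {
                    query: JSON.stringify(queryKey),
                },
            });
        }
        // queryKey is serialized so a new array instance does not re-report
    }, [error, JSON.stringify(queryKey)]);

    return result;
}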
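
The "only capture non-4xx errors" rule can be isolated into a small predicate so that it is testable on its own. This is a sketch under the assumption, taken from the hook above, that HTTP failures expose a numeric `statusCode` property; `shouldReportQueryError` is a hypothetical name.

```typescript
// Decides whether a failed query is worth reporting.
// Errors without a numeric status (network failures, bugs) are reported;
// 4xx responses are treated as expected client errors and skipped.
function shouldReportQueryError(error: unknown): boolean {
    const status = (error as { statusCode?: unknown } | null | undefined)?.statusCode;
    if (typeof status !== 'number') return true;
    return status < 400 || status >= 500;
}
```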

2.5 Component Performance Monitoring

Component Profiling:

// components/ProfiledComponent.tsx
import * as Sentry from '@sentry/react';
import type { ComponentType } from 'react';

/**
 * Wrap components for Sentry profiling.
 * Sentry.withProfiler records mount, update, and render spans for the
 * wrapped component under the given name.
 */
export function ProfiledComponent<TProps extends Record<string, any>>(
    Component: ComponentType<TProps>,
    name: string
) {
    return Sentry.withProfiler(Component, { name });
}

Phase 3: Unified Observability

3.1 Correlate Datadog and Sentry Data

Request Correlation:

// middleware/correlation.ts
import { randomUUID } from 'node:crypto';
import tracer from 'dd-trace';
import * as Sentry from '@sentry/node';
import type { Request, Response, NextFunction } from 'express';

export const correlationMiddleware = (req: Request, res: Response, next: NextFunction) => {
    // Generate correlation ID and store it on the request
    // (requires augmenting the Express Request type with `correlationId`)
    const correlationId = randomUUID();
    req.correlationId = correlationId;

    // Set correlation headers
    res.setHeader('X-Correlation-ID', correlationId);

    // Start Datadog span
    const ddSpan = tracer.startSpan('http.request', {
        childOf: tracer.scope().active() ?? undefined,
        tags: {
            'service.name': 'api',
            'resource.name': `${req.method} ${req.path}`,
            'correlation.id': correlationId,
        },
    });

    // Tag Sentry events raised during this request with the same identifiers,
    // so an error in Sentry can be looked up as a trace in Datadog
    Sentry.setTag('correlation.id', correlationId);
    Sentry.setTag('datadog.trace_id', ddSpan.context().toTraceId());

    res.on('finish', () => {
        // Finish Datadog span with the response status
        ddSpan.setTag('http.status_code', res.statusCode);
        ddSpan.finish();
    });

    next();
};
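
When a request already arrives with an `X-Correlation-ID` (for example from an upstream gateway), reusing it keeps one ID across the whole call chain. A minimal sketch of that choice follows. The accepted character set and length limits are assumptions chosen to keep header values safe to echo back, and `resolveCorrelationId` is a hypothetical helper.

```typescript
import { randomUUID } from 'node:crypto';

// Accept only short, header-safe identifiers from callers.
const CORRELATION_ID_RE = /^[A-Za-z0-9._-]{8,128}$/;

// Reuses a valid incoming ID, otherwise generates a fresh UUID.
// Node exposes repeated headers as string[], so take the first value.
function resolveCorrelationId(incoming: string | string[] | undefined): string {
    const candidate = Array.isArray(incoming) ? incoming[0] : incoming;
    if (candidate && CORRELATION_ID_RE.test(candidate)) {
        return candidate;
    }
    return randomUUID();
}
```

In the middleware this would replace the unconditional ID generation with `resolveCorrelationId(req.headers['x-correlation-id'])`.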

3.2 Unified Metrics Dashboard

Metrics Collection:

// lib/metrics.ts
import tracer from 'dd-trace';
import * as Sentry from '@sentry/node';
import type { Request, Response, NextFunction } from 'express';

type Tags = Record<string, string>;

/**
 * Unified metrics that send to both Datadog and Sentry.
 * Datadog metrics go through the tracer's built-in DogStatsD client.
 * The Sentry.metrics API differs between SDK major versions, so check these
 * calls against the installed version.
 */
export class UnifiedMetrics {
    incrementCounter(name: string, value: number = 1, tags?: Tags) {
        // Datadog
        tracer.dogstatsd.increment(name, value, tags);

        // Sentry
        Sentry.metrics.increment(name, value, { tags });
    }

    distribution(name: string, value: number, unit: string, tags?: Tags) {
        // Datadog (DogStatsD has no unit argument; units are set in the metric metadata)
        tracer.dogstatsd.distribution(name, value, tags);

        // Sentry
        Sentry.metrics.distribution(name, value, { unit, tags });
    }

    gauge(name: string, value: number, tags?: Tags) {
        // Datadog
        tracer.dogstatsd.gauge(name, value, tags);

        // Sentry
        Sentry.metrics.gauge(name, value, { tags });
    }
}

// Usage
const metrics = new UnifiedMetrics();

// In middleware
export const metricsMiddleware = (req: Request, res: Response, next: NextFunction) => {
    const startTime = Date.now();

    // Record the duration once the response has been sent
    res.on('finish', () => {
        metrics.distribution('http.request.duration', Date.now() - startTime, 'ms', {
            'http.method': req.method,
            // Prefer the route template; raw paths and correlation IDs are
            // unbounded and would create a new time series per value
            'http.path': req.route?.path ?? req.path,
            'http.status_code': String(res.statusCode),
        });
    });

    next();
};
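
Metric tags should have a bounded set of values, since each distinct tag value creates a separate time series. Where no route template is available, raw paths can be normalized before they are used as a tag. The following is a sketch only: `normalizePathForMetrics` is a hypothetical helper, and the placeholders `:id` and `:uuid` are arbitrary choices.

```typescript
const UUID_SEGMENT_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Replaces identifier-like path segments with placeholders so that
// /users/42 and /users/43 are reported under the same tag value.
function normalizePathForMetrics(path: string): string {
    const withoutQuery = path.split('?')[0];
    return withoutQuery
        .split('/')
        .map((segment) => {
            if (/^\d+$/.test(segment)) return ':id';
            if (UUID_SEGMENT_RE.test(segment)) return ':uuid';
            return segment;
        })
        .join('/');
}
```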

3.3 Alerting Configuration

Datadog Alerts:

# datadog-alerts.yaml
alerts:
  - name: 'High Error Rate'
    type: 'threshold'
    query: 'last:1m'
    conditions:
      - metric: 'http.errors'
        operator: 'gt'
        value: 5
    notifications:
      - type: 'email'
        to: 'platform-team@freno.corp'
      - type: 'slack'
        channel: '#platform-alerts'

  - name: 'Slow API Response'
    type: 'threshold'
    query: 'last:1m'
    conditions:
      - metric: 'http.response_time.p99'
        operator: 'gt'
        value: 1000
    notifications:
      - type: 'pagerduty'
        service: 'platform-oncall'

  - name: 'Database Connection Pool Exhaustion'
    type: 'threshold'
    query: 'last:1m'
    conditions:
      - metric: 'db.connections.active'
        operator: 'gt'
        value: 95
    notifications:
      - type: 'slack'
        channel: '#database-alerts'
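
The YAML above is this plan's own shorthand. Datadog metric monitors are defined by a query string of the form `time_aggr(window):space_aggr:metric{scope} operator threshold`. The sketch below shows how one shorthand condition could map onto such a string. The `AlertCondition` type, the `avg` aggregations, and the default scope `*` are assumptions for illustration; only the `gt` operator appears in the plan.

```typescript
type Operator = 'gt' | 'gte' | 'lt' | 'lte';

// Mirrors one entry under `conditions` in the shorthand above.
interface AlertCondition {
    metric: string;
    operator: Operator;
    value: number;
}

const OPERATOR_SYMBOLS: Record<Operator, string> = {
    gt: '>',
    gte: '>=',
    lt: '<',
    lte: '<=',
};

// Builds a metric-monitor query string from a shorthand condition.
function toMonitorQuery(
    condition: AlertCondition,
    window = 'last_1m',
    scope = '*',
): string {
    const symbol = OPERATOR_SYMBOLS[condition.operator];
    return `avg(${window}):avg:${condition.metric}{${scope}} ${symbol} ${condition.value}`;
}
```

For the "Slow API Response" alert this produces `avg(last_1m):avg:http.response_time.p99{*} > 1000`.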

Sentry Alerts:

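// config/sentry-alerts.ts
import * as Sentry from '@sentry/node';

// Alert rules (thresholds, notification targets) are defined in the Sentry
// project settings, not in the SDK. The SDK's role is to make sure the events
// those rules match on are sent and tagged consistently.
Sentry.init({
    // ... other config

    beforeSendTransaction(event) {
        if (event.transaction === '/api/errors') {
            // Add tags here that alert rules can filter on
        }
        return event;
    },
});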

Implementation Timeline

| Phase | Tasks | Duration | Dependencies |
|---|---|---|---|
| Phase 1 | Datadog APM setup | 2-3 days | None |
| | Tracing middleware | 1-2 days | Phase 1.1 |
| | Database/Cache tracing | 1-2 days | Phase 1.1 |
| | External service tracing | 1-2 days | Phase 1.1 |
| Phase 2 | Sentry setup | 1-2 days | None |
| | React/Next.js integration | 2-3 days | Phase 2.1 |
| | Error boundaries | 1-2 days | Phase 2.1 |
| | Browser SDK | 1 day | Phase 2.1 |
| Phase 3 | Correlation layer | 1-2 days | Phase 1, 2 |
| | Unified metrics | 1-2 days | Phase 1, 2 |
| | Alerting setup | 1 day | Phase 3.1, 3.2 |
| Phase 4 | Testing | 2-3 days | All phases |
| | Documentation | 1-2 days | All phases |

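Total Estimated Time: 16-27 days (sum of the task estimates above if run sequentially; Phases 1 and 2 have no mutual dependencies and can overlap)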
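
The total can be rechecked whenever a row changes by summing the lower and upper bounds of each task. The sketch below does this for the thirteen task estimates in the table, treating "1 day" as the range 1 to 1. It gives the strictly sequential bound; a plan total may differ from it if tasks run in parallel or a buffer is included.

```typescript
// [min, max] working days per task, in table order.
const taskEstimates: Array<[number, number]> = [
    [2, 3], [1, 2], [1, 2], [1, 2], // Phase 1
    [1, 2], [2, 3], [1, 2], [1, 1], // Phase 2
    [1, 2], [1, 2], [1, 1],         // Phase 3
    [2, 3], [1, 2],                 // Phase 4
];

// Adds up the lower and upper bounds independently.
function sumRanges(ranges: Array<[number, number]>): { min: number; max: number } {
    return ranges.reduce(
        (total, [min, max]) => ({ min: total.min + min, max: total.max + max }),
        { min: 0, max: 0 },
    );
}

const sequentialTotal = sumRanges(taskEstimates);
// sequentialTotal is { min: 16, max: 27 }
```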


Verification Checklist

Phase 1: Datadog

  • SDK installed and configured
  • Tracing enabled on all services
  • Distributed tracing working (trace ID propagates)
  • Database queries traced
  • External API calls traced
  • Sampling rules configured
  • Metrics visible in Datadog dashboard
  • Profiling enabled

Phase 2: Sentry

  • SDK installed and configured
  • Error tracking working
  • Performance monitoring active
  • React/Next.js integration complete
  • Error boundaries functional
  • Browser SDK tracking user interactions
  • Release tracking enabled

Phase 3: Unified

  • Correlation IDs working
  • Metrics synchronized
  • Alerts configured and tested
  • Dashboard accessible

Rollback Plan

If issues arise during or after implementation:

  1. Disable tracing:

    # Turn the Datadog tracer off and set sampling rates to 0
    export DD_TRACE_ENABLED=false
    export DD_TRACE_SAMPLE_RATE=0
    export SENTRY_TRACES_SAMPLE_RATE=0
    
  2. Remove SDKs:

    # Uninstall packages
    npm uninstall dd-trace @sentry/node
    # Remove initialization code
    
  3. Restore from backup:

    git checkout HEAD~1 -- lib/tracing/ config/*.ts
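
Step 1 only works as a kill switch if the application actually reads these variables. When a sample rate is also hard-coded in the SDK configuration, whether the environment value takes effect depends on the SDK and its version, so one option is to parse the variable explicitly and pass the result in. This is a minimal sketch with a hypothetical `parseSampleRate` helper.

```typescript
// Parses a sample rate from an environment variable.
// Missing or non-numeric values fall back to the configured default;
// numeric values are clamped to the valid range [0, 1].
function parseSampleRate(raw: string | undefined, fallback: number): number {
    if (raw === undefined || raw.trim() === '') return fallback;
    const value = Number(raw);
    if (!Number.isFinite(value)) return fallback;
    return Math.min(1, Math.max(0, value));
}
```

For example, `tracesSampleRate: parseSampleRate(process.env.SENTRY_TRACES_SAMPLE_RATE, 0.1)` keeps the default of 0.1 in normal operation and drops to 0 as soon as the variable is set to `0`.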
    

Cost Estimation

| Service | Monthly Cost (1M transactions) | Notes |
|---|---|---|
| Datadog APM | ~$1,000 | Includes tracing, metrics, profiling |
| Datadog Logs | ~$500 | Log ingestion and retention |
| Sentry | ~$249 | Error tracking and release management |
| Total | ~$1,749 | Scales with usage |

Costs subject to change based on actual usage and feature requirements.


Next Steps

  1. Create technical analysis document (current task)
  2. Create implementation plan (in progress)
  3. Implement Datadog APM integration
  4. Implement Sentry integration
  5. Configure unified observability
  6. Test and validate
  7. Deploy to staging
  8. Production rollout

Document Author: CTO (Agent)
Date: 2026-05-11
Status: Implementation Plan Complete