Debugging Production Issues in Frontend Applications


Table of Contents

  1. Introduction
  2. Understanding Production vs Development Environments
  3. Setting Up Observability Before Issues Arise
  4. Reproducing Production Bugs Locally
  5. Debugging JavaScript Errors
  6. Network and API Debugging
  7. Performance Debugging
  8. State Management Debugging
  9. CSS and Layout Issues in Production
  10. Cross-Browser and Cross-Device Debugging
  11. Debugging Deployed Builds
  12. Incident Response Workflow
  13. Post-Mortem and Prevention
  14. Tools Reference

Introduction

Debugging production issues in frontend applications is one of the most challenging aspects of software engineering. Unlike development environments, production systems face real user traffic, minified code, CDN caching layers, diverse device configurations, and constraints that make bugs hard to reproduce and diagnose.

This guide walks through systematic strategies — from prevention and observability to active debugging and incident response — to help you resolve production issues confidently and quickly.


Understanding Production vs Development Environments

Before debugging, understand why production behaves differently:

Factor | Development | Production
Code | Unminified, source maps available | Minified, often no source maps
Environment variables | .env.local, verbose | Secrets in CI/CD, restricted
Build optimizations | None or minimal | Tree shaking, code splitting, caching
Error verbosity | Full stack traces | Often swallowed or sanitized
Network | Localhost, no CDN | CDN, edge caching, real latency
User data | Mocked or seeded | Real, unexpected edge cases
Browser/Device | Your machine | Hundreds of configurations

Understanding this gap is the first step. Many bugs only occur in production because of minification, async race conditions under load, missing environment variables, or third-party integrations behaving differently.


Setting Up Observability Before Issues Arise

The best time to set up debugging infrastructure is before a production incident. Reactive debugging without observability is like navigating in the dark.

Error Monitoring

Integrate an error tracking service such as Sentry, Datadog RUM, or Bugsnag into your app. At minimum, capture:

  • Unhandled JavaScript exceptions
  • Unhandled promise rejections
  • Console errors (optionally)
  • User context (anonymized session ID, browser, OS)

// Example: Sentry initialization (React)
import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: process.env.REACT_APP_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.REACT_APP_VERSION,
  // SDK v8 and later; on v7, use `new Sentry.BrowserTracing()` instead
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.2, // trace 20% of transactions
});

Tip: Tag errors with a release version that matches your Git SHA or package version. This lets you track which deployment introduced a regression.
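
If an SDK is not wired in yet, the first two items in the capture list can be approximated with the browser's standard `error` and `unhandledrejection` events. The following is a minimal sketch, not a replacement for an error tracker: `installGlobalErrorHandlers` and the `report` callback are hypothetical names introduced here for illustration, and SDKs such as Sentry install equivalent handlers for you.

```javascript
// Hypothetical helper: forwards uncaught errors and unhandled promise
// rejections from an event target (normally `window`) to a `report` callback.
function installGlobalErrorHandlers(target, report) {
  const onError = (event) => {
    report({
      type: "error",
      message: event.message ?? String(event.error),
      error: event.error ?? null,
    });
  };

  const onRejection = (event) => {
    const reason = event.reason;
    report({
      type: "unhandledrejection",
      message: reason instanceof Error ? reason.message : String(reason),
      error: reason instanceof Error ? reason : null,
    });
  };

  target.addEventListener("error", onError);
  target.addEventListener("unhandledrejection", onRejection);

  // Return a cleanup function so the handlers can be removed again.
  return () => {
    target.removeEventListener("error", onError);
    target.removeEventListener("unhandledrejection", onRejection);
  };
}

// Browser usage (guarded so the snippet also loads outside a browser).
if (typeof window !== "undefined") {
  installGlobalErrorHandlers(window, (payload) => console.error(payload));
}
```

Passing the target in, rather than hard-coding `window`, keeps the helper testable without a browser.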

Performance Monitoring

Use Real User Monitoring (RUM) to capture:

  • Core Web Vitals (LCP, INP, CLS; INP replaced FID as a Core Web Vital in March 2024)
  • Time to First Byte (TTFB)
  • Long tasks and JavaScript execution time
  • Resource load timings

Tools: Sentry Performance, Datadog RUM, SpeedCurve, Grafana Faro.

Logging Strategy

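Avoid leaving console.log calls in production code: they can expose internal details to anyone who opens DevTools, and objects passed to the console can stay referenced and keep memory from being freed. Instead, use a structured logging utility: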

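import * as Sentry from "@sentry/react";

const isProd = process.env.NODE_ENV === "production";

const logger = {
  info: (msg, meta = {}) => {
    if (isProd) {
      // sendToLogService is a placeholder for your log aggregator client
      sendToLogService({ ...meta, level: "info", message: msg });
    } else {
      console.info(msg, meta);
    }
  },
  error: (msg, error, meta = {}) => {
    console.error(msg, error);
    // Group under the original error; attach the message and metadata as extras
    Sentry.captureException(error, { extra: { message: msg, ...meta } });
  },
};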

Reproducing Production Bugs Locally

Environment Parity

Production bugs that don't reproduce locally are often caused by:

  • Different environment variables — check .env.production vs .env.local
  • Different API endpoints — production APIs may behave differently
  • Build-time differences — run npm run build && npx serve -s build locally
  • Different Node/package versions — use .nvmrc and lock files (package-lock.json / yarn.lock)

Always test with a production build locally before concluding a bug is environment-specific.
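
To make the first bullet concrete, the contents of two env files can be compared key by key. This is a rough sketch under stated assumptions: `parseEnv` and `diffEnv` are hypothetical helpers, and the parser only handles simple `KEY=value` lines (no quoting, multi-line values, or variable expansion).

```javascript
// Parse simple KEY=value lines; blank lines and # comments are skipped.
function parseEnv(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Report keys present on only one side, and keys whose values differ.
function diffEnv(first, second) {
  const firstKeys = Object.keys(first);
  const secondKeys = Object.keys(second);
  return {
    onlyInFirst: firstKeys.filter((k) => !has(second, k)),
    onlyInSecond: secondKeys.filter((k) => !has(first, k)),
    differing: firstKeys.filter((k) => has(second, k) && first[k] !== second[k]),
  };
}

// Example with inline file contents (values are illustrative).
const local = parseEnv("API_URL=http://localhost:3000\n# dev only\nDEBUG=true\n");
const prod = parseEnv("API_URL=https://api.example.com\nSENTRY_DSN=abc\n");
const envDiff = diffEnv(local, prod);
console.log(envDiff);
```

Only print key names, not values, if you run something like this against real secrets.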

Feature Flags and Configuration

If your app uses feature flags (LaunchDarkly, Unleash, custom toggles), replicate the production flag state in your local environment. A flag enabled for 10% of users might be what's causing the bug for that cohort.
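
Percentage rollouts are usually deterministic per user, which is why the same cohort keeps hitting the bug. The sketch below illustrates the general idea with an FNV-1a hash; it is an assumption for illustration and not how LaunchDarkly, Unleash, or any specific provider computes buckets. `isInRollout` is a hypothetical name.

```javascript
// 32-bit FNV-1a string hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned
}

// Map (flag, user) to a stable bucket in 0..99 and compare to the rollout %.
function isInRollout(userId, flagKey, percentage) {
  const bucket = fnv1a(`${flagKey}:${userId}`) % 100;
  return bucket < percentage;
}

// The same user always gets the same answer for the same flag.
console.log(isInRollout("user-123", "new-checkout", 10));
```

When reproducing locally, forcing the flag on for your own user is more reliable than hoping to land in the affected bucket.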


Debugging JavaScript Errors

Source Maps

Source maps translate minified production code back into readable source code. Configure your bundler to upload source maps to your error tracker without exposing them publicly:

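// vite.config.js
export default {
  build: {
    // "hidden" still writes .map files but omits the sourceMappingURL comment
    // from the bundles, so browsers are not pointed at the maps
    sourcemap: "hidden",
  },
};

# Upload to Sentry after build, then exclude the .map files from the deploy
npx @sentry/cli releases files "$RELEASE" upload-sourcemaps ./dist \
  --url-prefix '~/assets'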

Never serve source maps publicly in production — they expose your full application source code.

Stack Trace Analysis

When reading a production stack trace:

  1. Identify the topmost frame in your own code — ignore framework internals.
  2. Look for async boundaries — errors in Promise chains often show truncated traces.
  3. Check the error message carefully: "Cannot read properties of undefined" usually means a value was missing or null somewhere upstream.
  4. Use breadcrumbs — error trackers capture user actions leading up to the crash.
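
Step 1 above can be sketched as a small filter over the stack string. This assumes V8-style frames (lines starting with `at `, as produced by Chrome and Node); Firefox and Safari use a different format. `firstAppFrame` and the vendor pattern are hypothetical and would need adjusting to your bundle layout.

```javascript
// Return the first stack frame that does not look like vendor/framework code.
function firstAppFrame(stack, vendorPattern = /node_modules|webpack-internal/) {
  const frames = String(stack)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "));
  return frames.find((frame) => !vendorPattern.test(frame)) ?? null;
}

// Illustrative stack trace (paths are made up).
const sampleStack = [
  "TypeError: Cannot read properties of undefined (reading 'id')",
  "    at commitHookEffectListMount (/app/node_modules/react-dom/cjs/react-dom.development.js:23150:26)",
  "    at renderUserProfile (/app/src/Profile.js:12:20)",
  "    at App (/app/src/App.js:5:3)",
].join("\n");

console.log(firstAppFrame(sampleStack));
```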

Runtime Errors vs Logic Errors

  • Runtime errors (exceptions, crashes): captured by error monitoring automatically.
  • Logic errors (wrong data rendered, incorrect calculations, silent failures): require logging, assertions, and user reports to surface.

For logic errors, add invariant checks at critical data boundaries:

function renderUserProfile(user) {
  if (!user?.id) {
    logger.error("renderUserProfile called with invalid user", new Error("InvalidUser"), { user });
    return null;
  }
  // ...
}

Network and API Debugging

Failed Requests

Intercept and log all failed network requests:

// Axios interceptor example
axios.interceptors.response.use(
  (response) => response,
  (error) => {
    logger.error("API request failed", error, {
      url: error.config?.url,
      method: error.config?.method,
      status: error.response?.status,
    });
    return Promise.reject(error);
  }
);

In your error tracker, attach the request URL, method, status code, and response body (sanitized) to every network error.
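
Sanitizing a response body before attaching it can be sketched as a recursive redaction pass. The key pattern and the `sanitize` helper are assumptions for illustration; a real deny-list should come from your own data classification.

```javascript
// Keys whose values should never leave the browser (illustrative list).
const SENSITIVE_KEYS = /pass(word)?|token|secret|authorization|cookie|email/i;

// Recursively copy a value, replacing sensitive fields with a marker.
function sanitize(value, depth = 0) {
  if (depth > 5) return "[Truncated]"; // guard against very deep payloads
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, depth + 1));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.test(key)
          ? [key, "[Redacted]"]
          : [key, sanitize(inner, depth + 1)]
      )
    );
  }
  return value;
}

const cleaned = sanitize({
  user: { name: "Ada", email: "ada@example.com" },
  token: "abc123",
  items: [{ sku: "A1", password: "hunter2" }],
});
console.log(cleaned);
```

The function returns a copy, so the original response object is left untouched for the calling code.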

CORS Issues

CORS errors in production are frequently caused by:

  • Forgetting to add the production domain to the API's allowed origins
  • HTTP vs HTTPS mismatches
  • Missing headers on preflight (OPTIONS) requests

Check the browser's Network tab for blocked preflight requests and compare Access-Control-Allow-Origin headers between environments.
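
The first two causes come down to origins being compared as exact scheme + host + port strings. A minimal sketch of that comparison, using the standard `URL` API; `isOriginAllowed` is a hypothetical helper and the domains are placeholders, not any framework's CORS implementation.

```javascript
// Normalize a URL or origin string to its origin, or null if unparsable.
function normalizeOrigin(value) {
  try {
    return new URL(value).origin;
  } catch {
    return null;
  }
}

// An origin is allowed only if scheme, host, and port all match an entry.
function isOriginAllowed(requestOrigin, allowedOrigins) {
  const origin = normalizeOrigin(requestOrigin);
  if (origin === null || origin === "null") return false;
  return allowedOrigins.some((allowed) => normalizeOrigin(allowed) === origin);
}

const allowedOrigins = ["https://app.example.com"];
console.log(isOriginAllowed("https://app.example.com", allowedOrigins)); // true
console.log(isOriginAllowed("http://app.example.com", allowedOrigins));  // false: scheme differs
```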

Payload and Response Inspection

Use HAR (HTTP Archive) exports from the browser DevTools Network tab to share exact request/response data with backend teams without needing to reproduce the issue live.
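
A HAR file is plain JSON with requests under `log.entries`, so a short script can pull out the failing calls before you share it. A sketch, assuming the export has already been parsed with `JSON.parse`; `summarizeFailures` is a hypothetical helper and the sample data is made up.

```javascript
// List entries that failed: HTTP status >= 400, or status 0, which
// typically indicates the request never received a response.
function summarizeFailures(har) {
  const entries = har?.log?.entries ?? [];
  return entries
    .filter((e) => e.response.status === 0 || e.response.status >= 400)
    .map((e) => ({
      method: e.request.method,
      url: e.request.url,
      status: e.response.status,
      timeMs: Math.round(e.time),
    }));
}

// Tiny illustrative HAR structure.
const sampleHar = {
  log: {
    entries: [
      { request: { method: "GET", url: "https://api.example.com/me" }, response: { status: 200 }, time: 81.4 },
      { request: { method: "POST", url: "https://api.example.com/cart" }, response: { status: 500 }, time: 1203.7 },
      { request: { method: "GET", url: "https://cdn.example.com/app.js" }, response: { status: 0 }, time: 12 },
    ],
  },
};
const failures = summarizeFailures(sampleHar);
console.log(failures);
```

HAR exports can contain cookies and authorization headers, so review or strip them before sending the file to anyone.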


Performance Debugging

Core Web Vitals

Poor Core Web Vitals directly affect user experience and SEO. Use these tools to diagnose them:

Metric | Tool | Common Causes
LCP (Largest Contentful Paint) | PageSpeed Insights, WebPageTest | Large images, slow server response, render-blocking resources
INP (Interaction to Next Paint) | Chrome DevTools Performance | Long JavaScript tasks, event handler bottlenecks
CLS (Cumulative Layout Shift) | Lighthouse | Images without dimensions, dynamically injected content

In the Chrome DevTools Performance panel:

  1. Record a page load or interaction.
  2. Look for Long Tasks (main-thread tasks longer than 50 ms, which DevTools flags in red).
  3. Drill into the flame chart to find the responsible function.
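
When triaging RUM data, it helps to bucket raw values the same way the tools do. The sketch below uses the thresholds Google publishes for Core Web Vitals (LCP 2.5 s / 4 s, INP 200 ms / 500 ms, CLS 0.1 / 0.25); `rateVital` is a hypothetical helper name.

```javascript
// Published "good" and "poor" boundaries for each Core Web Vital.
const VITAL_THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 }, // milliseconds
  INP: { good: 200, poor: 500 },   // milliseconds
  CLS: { good: 0.1, poor: 0.25 },  // unitless score
};

// Classify a measured value as good, needs-improvement, or poor.
function rateVital(name, value) {
  const thresholds = VITAL_THRESHOLDS[name];
  if (!thresholds) throw new Error(`Unknown metric: ${name}`);
  if (value <= thresholds.good) return "good";
  if (value <= thresholds.poor) return "needs-improvement";
  return "poor";
}

console.log(rateVital("LCP", 3100)); // "needs-improvement"
```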

Memory Leaks

Signs of memory leaks in production: increasing RAM over time, degraded performance after extended use, eventual browser tab crashes.

Diagnose with Chrome DevTools Memory panel:

  1. Take a heap snapshot baseline.
  2. Perform the suspected leaking action several times.
  3. Take another snapshot and compare — look for detached DOM nodes and growing object counts.

Common frontend memory leak sources:

  • Event listeners not removed on component unmount
  • Intervals/timeouts not cleared
  • Global caches that grow unbounded
  • Closure references holding large objects

// React: clean up subscriptions and listeners
useEffect(() => {
  const handler = (e) => { /* ... */ };
  window.addEventListener("resize", handler);
  return () => window.removeEventListener("resize", handler); // critical
}, []);
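
Of the leak sources listed above, unbounded global caches are the easiest to fix structurally: give the cache a size limit. A minimal least-recently-used sketch built on `Map` insertion order; `BoundedCache` is a hypothetical class, not a library API.

```javascript
// A small LRU cache: Map preserves insertion order, so the first key
// is always the least recently used one.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}

const cache = new BoundedCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" is now the most recently used
cache.set("c", 3); // evicts "b"
```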

Bundle Size Analysis

Large bundles cause slow initial loads. After a production regression in load time, analyze your bundle:

# Webpack: generate a stats file, then open the analyzer
npx webpack --profile --json > stats.json
npx webpack-bundle-analyzer stats.json

# Vite
npx vite-bundle-visualizer

Look for:

  • Accidental duplication of libraries
  • Large dependencies that could be lazy-loaded
  • Entire utility libraries imported instead of specific functions (e.g., import _ from 'lodash' vs import debounce from 'lodash/debounce')
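
Lazy-loading a large dependency usually means wrapping a dynamic `import()` so it runs only on first use. A small sketch of that wrapper; `lazyOnce` is a hypothetical helper, and the module path in the usage comment is a placeholder.

```javascript
// Wrap a loader so it runs at most once; later calls reuse the same promise.
// If loading fails, the cache is cleared so the next call can retry.
function lazyOnce(loader) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = loader().catch((err) => {
        promise = null;
        throw err;
      });
    }
    return promise;
  };
}

// Typical usage in an app (placeholder path):
//   const loadChart = lazyOnce(() => import("./HeavyChart.js"));
//   button.addEventListener("click", async () => {
//     const { renderChart } = await loadChart();
//     renderChart();
//   });

// Self-contained demonstration with a fake loader.
let loadCount = 0;
const loadFake = lazyOnce(() => {
  loadCount += 1;
  return Promise.resolve({ name: "heavy-module" });
});
const firstCall = loadFake();
const secondCall = loadFake();
```

Bundlers such as webpack and Vite split dynamically imported modules into separate chunks, so the heavy code stays out of the initial bundle.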

State Management Debugging

State bugs are often invisible in monitoring tools. Strategies to surface them:

Redux / Zustand / Pinia:

  • Enable Redux DevTools in development and use time-travel debugging to replay state transitions.
  • Log state snapshots around critical user actions in production.

React Query / SWR:

  • Check staleTime and cacheTime configurations (cacheTime was renamed gcTime in TanStack Query v5); stale data is a common production-only issue.
  • Log cache keys and query states when data appears incorrect.

General:

  • Instrument state transitions with structured logs that include before/after snapshots (sanitized for PII).
  • Add assertions on impossible state combinations:

// Example: assert cart state invariant
if (cart.items.length > 0 && cart.total === 0) {
  logger.error("Cart invariant violated: items present but total is zero", new Error("CartInvariant"), { cart });
}
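
Instrumenting state transitions with before/after snapshots can be sketched as a wrapper around any reducer-style function. `withTransitionLog` and the `log` callback are hypothetical; in a real app the snapshots should be sanitized for PII before they leave the browser.

```javascript
// Wrap a reducer so every state change is reported with before/after state.
function withTransitionLog(reducer, log) {
  return (state, action) => {
    const next = reducer(state, action);
    if (next !== state) {
      log({ action: action.type, before: state, after: next });
    }
    return next;
  };
}

// Illustrative reducer and usage.
function counterReducer(state, action) {
  switch (action.type) {
    case "increment":
      return { ...state, count: state.count + 1 };
    default:
      return state; // unknown actions leave state untouched
  }
}

const transitions = [];
const loggedReducer = withTransitionLog(counterReducer, (entry) => transitions.push(entry));

let counterState = { count: 0 };
counterState = loggedReducer(counterState, { type: "increment" });
counterState = loggedReducer(counterState, { type: "noop" });
```

Because the wrapper compares by reference, actions that return the same state object produce no log entry, which keeps the volume manageable.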

CSS and Layout Issues in Production

CSS bugs that only appear in production are usually caused by:

  • CSS purging removing classes that were dynamically computed (e.g., Tailwind's text-${color}-500 pattern)
  • CSS specificity conflicts with third-party stylesheets loaded in production
  • Different browser rendering on devices not tested locally

Debugging approaches:

  1. Use browser DevTools (inspect element, computed styles) directly on the production URL.
  2. For purging issues (Tailwind/PurgeCSS), check your content or safelist configuration.
  3. For third-party conflicts, use the browser's Styles panel to find which stylesheet is overriding your rules.
  4. Use BrowserStack or LambdaTest to inspect on real devices and browsers.
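
For the purging case, the usual fix is to keep complete class names in the source so the purger can find them as literal strings. A minimal sketch; the color map and the `textColorClass` helper are illustrative.

```javascript
// Complete class names appear literally in the source, so a purger that
// scans files for class strings will keep them.
const TEXT_COLOR_CLASSES = {
  red: "text-red-500",
  green: "text-green-500",
  blue: "text-blue-500",
};

// Look up the class for a color, falling back to a neutral default.
function textColorClass(color) {
  return Object.prototype.hasOwnProperty.call(TEXT_COLOR_CLASSES, color)
    ? TEXT_COLOR_CLASSES[color]
    : "text-gray-500";
}

console.log(textColorClass("red")); // "text-red-500"
```

This replaces string interpolation such as `text-${color}-500`, which never appears in the source as a full class name and is therefore removed from the production stylesheet.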

Cross-Browser and Cross-Device Debugging

Production users don't all use Chrome on a MacBook. Approach cross-environment bugs systematically:

  1. Identify affected environments from your error tracker's browser/OS breakdown.
  2. Use Can I use (caniuse.com) to check whether a web API you rely on has gaps in the affected browsers.
  3. Test on real devices via BrowserStack, Sauce Labs, or physical device labs.
  4. Check polyfills — confirm that your Babel/transpiler config targets the browsers you're supporting.

For iOS Safari-specific bugs (common because Safari's WebKit engine behaves differently from the Chromium engine most developers test on):

  • Use Safari's Web Inspector via a Mac connected to an iPhone/iPad.
  • Be alert to differences in position: fixed, 100vh, scroll event behavior, and certain CSS properties.
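
Checking for API gaps can also happen at runtime, so the affected sessions report what they lack. A small feature-detection sketch; `missingFeatures` is a hypothetical helper, and the feature list is only an example.

```javascript
// Return the names that are not available on the given global object.
function missingFeatures(globalObject, featureNames) {
  return featureNames.filter((name) => !(name in globalObject));
}

// In a browser you would pass `window`; `globalThis` works in any runtime.
const gaps = missingFeatures(globalThis, [
  "IntersectionObserver",
  "ResizeObserver",
  "structuredClone",
]);
console.log(gaps); // names missing in the current environment
```

Attaching the result to your error tracker's context makes it easier to tell a missing polyfill from a genuine logic bug.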

Debugging Deployed Builds

CI/CD Pipeline Artifacts

If a bug was introduced in a specific deployment:

  1. Identify the deploy using your release tracking (Sentry releases, deploy markers in Datadog).
  2. Diff the commits between the last known-good and the broken release.
  3. Download the build artifact from CI (GitHub Actions, CircleCI, etc.) and serve it locally to reproduce.
  4. Bisect if necessary — deploy intermediate commits to a staging environment to isolate the regression.
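
Bisecting is a binary search over the commits between the good and the broken release. The sketch below assumes the commits are ordered oldest to newest, that the last one is known to be broken, and that once the bug appears it stays. `findFirstBad` and `isBad` are hypothetical; in practice `isBad` would be "deploy this commit to staging and check", and `git bisect` automates the same search on a local checkout.

```javascript
// Binary search for the first commit where isBad(commit) is true.
// Invariant: commits[hi] is bad; everything before commits[lo] is good.
function findFirstBad(commits, isBad) {
  let lo = 0;
  let hi = commits.length - 1;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(commits[mid])) {
      hi = mid; // regression is at mid or earlier
    } else {
      lo = mid + 1; // regression is after mid
    }
  }
  return commits[lo];
}

// Illustrative run: the regression was introduced in commit "e".
const commits = ["a", "b", "c", "d", "e", "f", "g", "h"];
let checks = 0;
const firstBad = findFirstBad(commits, (commit) => {
  checks += 1;
  return commit >= "e";
});
console.log(firstBad, checks);
```

Each check halves the remaining range, so eight candidate commits need only three staging deploys instead of up to eight.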

Rollback Strategies

When a production incident is severe, the fastest fix is usually to roll back or switch the feature off rather than to write new code:

Strategy | Speed | Risk
Re-deploy last known-good artifact | Fast | Low (exact previous build)
Revert commit and redeploy | Medium | Low (clean Git history)
Feature flag disable | Instant | Low (no deployment needed)
Hotfix branch | Slow | Medium (new code under pressure)

Always prefer re-deploying a previous artifact over a hotfix under pressure. Hotfixes written during incidents often introduce new bugs.


Incident Response Workflow

When a production issue is reported, follow a disciplined process:

1. TRIAGE
   ├── Confirm the issue is real (not a single user)
   ├── Assess severity (% of users affected, business impact)
   └── Assign an incident commander

2. COMMUNICATE
   ├── Post in the incident channel immediately
   ├── Set a status page update if user-facing
   └── Set an update cadence (every 15-30 min)

3. INVESTIGATE
   ├── Check error tracker for spike in errors
   ├── Check recent deployments
   ├── Check external service status (APIs, CDN, auth provider)
   └── Narrow down affected users/browsers/routes

4. MITIGATE
   ├── Roll back if a deployment is the cause
   ├── Disable feature via feature flag
   └── Apply a targeted hotfix only if rollback is impossible

5. RESOLVE
   ├── Confirm metrics return to baseline
   ├── Close the incident
   └── Schedule a post-mortem

Post-Mortem and Prevention

Every significant production incident deserves a blameless post-mortem. Document:

  • Timeline of detection, investigation, and resolution
  • Root cause (not just the symptom)
  • Impact (users affected, duration, revenue if applicable)
  • What worked well in the response
  • Action items with owners and deadlines

Common preventive actions after frontend incidents:

  • Add a regression test for the exact scenario that broke
  • Improve alerting thresholds so similar issues are caught faster
  • Add canary/staged rollouts so new deployments only reach 5-10% of users first
  • Improve source map uploads if stack traces were unreadable
  • Add synthetic monitoring (Playwright or Cypress in CI + production) for critical user flows

Tools Reference

Category | Tool | Purpose
Error Tracking | Sentry, Bugsnag, Rollbar | Capture and group JS exceptions
RUM / Performance | Datadog RUM, Sentry Performance, SpeedCurve | Real user metrics, Core Web Vitals
Logging | Datadog Logs, Logtail, Grafana Loki | Structured log aggregation
Feature Flags | LaunchDarkly, Unleash, Flagsmith | Instant kill switches
Bundle Analysis | webpack-bundle-analyzer, vite-bundle-visualizer | Identify large dependencies
Cross-browser Testing | BrowserStack, LambdaTest, Sauce Labs | Test on real devices/browsers
Network Inspection | Chrome DevTools, Charles Proxy, Proxyman | Inspect HTTP traffic
Deployment | GitHub Actions, CircleCI, Vercel, Netlify | CI/CD with artifact management
Incident Management | PagerDuty, Opsgenie, Linear | Alerting and incident tracking

Key Takeaway: Effective production debugging is 80% preparation (observability, source maps, logging) and 20% investigation. The teams that resolve incidents fastest are those who invested in monitoring before the incident occurred.
