- Introduction
- Understanding Production vs Development Environments
- Setting Up Observability Before Issues Arise
- Reproducing Production Bugs Locally
- Debugging JavaScript Errors
- Network and API Debugging
- Performance Debugging
- State Management Debugging
- CSS and Layout Issues in Production
- Cross-Browser and Cross-Device Debugging
- Debugging Deployed Builds
- Incident Response Workflow
- Post-Mortem and Prevention
- Tools Reference
Debugging production issues in frontend applications is one of the most challenging aspects of software engineering. Unlike development environments, production systems face real user traffic, minified code, CDN caching layers, diverse device configurations, and constraints that make bugs hard to reproduce and diagnose.
This guide walks through systematic strategies — from prevention and observability to active debugging and incident response — to help you resolve production issues confidently and quickly.
Before debugging, understand why production behaves differently:
| Factor | Development | Production |
|---|---|---|
| Code | Unminified, source maps available | Minified, often no source maps |
| Environment variables | `.env.local`, verbose | Secrets in CI/CD, restricted |
| Build optimizations | None or minimal | Tree shaking, code splitting, caching |
| Error verbosity | Full stack traces | Often swallowed or sanitized |
| Network | Localhost, no CDN | CDN, edge caching, real latency |
| User data | Mocked or seeded | Real, unexpected edge cases |
| Browser/Device | Your machine | Hundreds of configurations |
Understanding this gap is the first step. Many bugs only occur in production because of minification, async race conditions under load, missing environment variables, or third-party integrations behaving differently.
The best time to set up debugging infrastructure is before a production incident. Reactive debugging without observability is like navigating in the dark.
Integrate an error tracking service such as Sentry, Datadog RUM, or Bugsnag into your app. At minimum, capture:
- Unhandled JavaScript exceptions
- Unhandled promise rejections
- Console errors (optionally)
- User context (anonymized session ID, browser, OS)
```javascript
// Example: Sentry initialization (React)
import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: process.env.REACT_APP_SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.REACT_APP_VERSION,
  // SDK v7 syntax; v8 and later use Sentry.browserTracingIntegration() instead.
  integrations: [new Sentry.BrowserTracing()],
  tracesSampleRate: 0.2, // sample 20% of transactions for performance tracing
});
```

Tip: Tag errors with a `release` version that matches your Git SHA or package version. This lets you track which deployment introduced a regression.
Use Real User Monitoring (RUM) to capture:
- Core Web Vitals (LCP, FID/INP, CLS)
- Time to First Byte (TTFB)
- Long tasks and JavaScript execution time
- Resource load timings
Tools: Sentry Performance, Datadog RUM, SpeedCurve, Grafana Faro.
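Avoid raw `console.log` calls in production: they can expose internal data to anyone who opens DevTools, and logged objects can stay in memory while the console holds references to them. Instead, use a structured logging utility: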
```javascript
import * as Sentry from "@sentry/react";

// sendToLogService is a placeholder for your log aggregator's client.
const logger = {
  info: (msg, meta = {}) => {
    if (process.env.NODE_ENV !== "production") {
      console.info(msg, meta);
    } else {
      // Send to log aggregator in production
      sendToLogService({ level: "info", message: msg, ...meta });
    }
  },
  error: (msg, error, meta = {}) => {
    console.error(msg, error);
    Sentry.captureException(error, { extra: { message: msg, ...meta } });
  },
};
```

Production bugs that don't reproduce locally are often caused by:
- Different environment variables: check `.env.production` vs `.env.local`
- Different API endpoints: production APIs may behave differently
- Build-time differences: run `npm run build && npx serve -s build` locally
- Different Node/package versions: use `.nvmrc` and lock files (`package-lock.json` / `yarn.lock`)
Always test with a production build locally before concluding a bug is environment-specific.
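For the first cause in the list above, comparing the two env files by key is often quicker than reading them side by side. The following is a minimal sketch, assuming simple `KEY=value` files; `parseEnv` and `diffEnv` are hypothetical helpers, not part of any dotenv library, and the parser handles only comments, blank lines, and surrounding quotes.

```javascript
// Parse the text of a .env file into a Map of key to value.
// Handles blank lines, "#" comments, and optional surrounding quotes only.
function parseEnv(text) {
  const vars = new Map();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

// Report keys present in only one file and keys whose values differ.
// Only key names are returned, so the result is safe to paste into a ticket.
function diffEnv(localText, productionText) {
  const local = parseEnv(localText);
  const production = parseEnv(productionText);
  const onlyLocal = [...local.keys()].filter((k) => !production.has(k));
  const onlyProduction = [...production.keys()].filter((k) => !local.has(k));
  const changed = [...local.keys()].filter(
    (k) => production.has(k) && production.get(k) !== local.get(k)
  );
  return { onlyLocal, onlyProduction, changed };
}
```

A key that appears under `onlyLocal` is a likely candidate for a variable that was never configured in the production build.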
If your app uses feature flags (LaunchDarkly, Unleash, custom toggles), replicate the production flag state in your local environment. A flag enabled for 10% of users might be what's causing the bug for that cohort.
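To see why a percentage rollout affects a stable cohort rather than random page loads, it helps to look at how bucketing typically works. The sketch below is illustrative only: `fnv1a`, `bucketFor`, and `isEnabled` are hypothetical names, and real flag services use their own hashing and targeting rules, so do not expect these buckets to match your provider's.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units: small and deterministic.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a user to a stable bucket in the range 0..99 for a given flag.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user sees the feature when their bucket falls below the rollout percentage.
function isEnabled(flagKey, userId, rolloutPercent) {
  return bucketFor(flagKey, userId) < rolloutPercent;
}
```

Because the bucket depends only on the flag key and user ID, the same user gets the same answer on every visit. To reproduce a cohort-specific bug locally, force the flag on for your test user instead of relying on the percentage.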
Source maps translate minified production code back into readable source code. Configure your bundler to upload source maps to your error tracker without exposing them publicly:
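```javascript
// vite.config.js
export default {
  build: {
    // "hidden" generates source maps but omits the sourceMappingURL comment
    // from the bundles, so browsers are not pointed at the maps.
    sourcemap: "hidden",
  },
};
```

```bash
# Upload to Sentry after build (keep maps off the CDN).
# Assumes Vite's default output, where bundles and maps land in dist/assets.
npx @sentry/cli releases files $RELEASE upload-sourcemaps ./dist/assets \
  --url-prefix '~/assets'
```

Never serve source maps publicly in production: they expose your full application source code. Remove the `.map` files from the build output after uploading them and before the deploy step.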
When reading a production stack trace:
- Identify the topmost frame in your own code and ignore framework internals.
- Look for async boundaries: errors in `Promise` chains often show truncated traces.
- Check the error message carefully: `Cannot read properties of undefined` usually means a null data issue upstream.
- Use breadcrumbs: error trackers capture user actions leading up to the crash.
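The first step in the list above can be automated when you triage many traces. This is a rough sketch, assuming a V8-style stack (`at fn (url:line:col)`) that has already been symbolicated with source maps; `firstAppFrame` and its default ignore patterns are hypothetical and would need adjusting for your bundle layout and for Firefox or Safari stack formats.

```javascript
// Return the first stack frame that does not match any "ignore" pattern,
// or null when every frame belongs to a dependency or the runtime.
function firstAppFrame(stack, ignore = [/node_modules/, /\(node:/, /<anonymous>/]) {
  const frames = stack
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("at "));
  const match = frames.find((frame) => !ignore.some((pattern) => pattern.test(frame)));
  return match ?? null;
}
```

Error trackers usually do this for you by marking frames as "in app"; the helper is mainly useful for traces pasted into tickets or chat.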
Errors fall into two broad classes, and each surfaces differently:
- Runtime errors (exceptions, crashes): captured by error monitoring automatically.
- Logic errors (wrong data rendered, incorrect calculations, silent failures): require logging, assertions, and user reports to surface.
For logic errors, add invariant checks at critical data boundaries:
```javascript
function renderUserProfile(user) {
  if (!user?.id) {
    logger.error("renderUserProfile called with invalid user", new Error("InvalidUser"), { user });
    return null;
  }
  // ...
}
```

Intercept and log all failed network requests:
```javascript
// Axios interceptor example
import axios from "axios";

axios.interceptors.response.use(
  (response) => response,
  (error) => {
    logger.error("API request failed", error, {
      url: error.config?.url,
      method: error.config?.method,
      // undefined when no response arrived (network failure, CORS block, timeout)
      status: error.response?.status,
    });
    return Promise.reject(error);
  }
);
```

In your error tracker, attach the request URL, method, status code, and response body (sanitized) to every network error.
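What "sanitized" means depends on your data, but a key-based redaction pass is a common starting point. The sketch below is an assumption-laden example: `sanitize` and the `SENSITIVE_KEY` pattern are hypothetical, and the pattern should be extended to cover whatever your API actually returns.

```javascript
// Keys whose values should never reach a log or error tracker.
const SENSITIVE_KEY = /pass(word)?|token|secret|authorization|cookie|email/i;

// Recursively copy a value, redacting sensitive keys, truncating long strings,
// and capping depth so large response bodies cannot bloat the error payload.
function sanitize(value, depth = 0) {
  if (depth > 5) return "[truncated]";
  if (Array.isArray(value)) return value.map((item) => sanitize(item, depth + 1));
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[redacted]" : sanitize(inner, depth + 1);
    }
    return out;
  }
  if (typeof value === "string" && value.length > 200) return value.slice(0, 200) + "...";
  return value;
}
```

In the interceptor, this would wrap the body before logging, for example `body: sanitize(error.response?.data)`. Key-based redaction misses sensitive values stored under innocuous names, so treat it as a baseline rather than a guarantee.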
CORS errors in production are frequently caused by:
- Forgetting to add the production domain to the API's allowed origins
- HTTP vs HTTPS mismatches
- Missing headers on preflight (`OPTIONS`) requests
Check the browser's Network tab for blocked preflight requests and compare Access-Control-Allow-Origin headers between environments.
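The origin comparison the browser performs is strict, which is why scheme mismatches and trailing slashes cause failures. A small sketch of that comparison follows; `checkCorsOrigin` is a hypothetical diagnostic helper that covers only the `Access-Control-Allow-Origin` check, not allowed methods or headers.

```javascript
// Explain whether a page origin would pass the Access-Control-Allow-Origin check.
// pageOrigin: URL of the page making the request.
// allowOriginHeader: header value copied from the Network tab (or null if absent).
function checkCorsOrigin(pageOrigin, allowOriginHeader, { withCredentials = false } = {}) {
  if (allowOriginHeader == null || allowOriginHeader === "") {
    return { allowed: false, reason: "Access-Control-Allow-Origin header is missing" };
  }
  if (allowOriginHeader === "*") {
    return withCredentials
      ? { allowed: false, reason: "wildcard origin is not allowed for credentialed requests" }
      : { allowed: true, reason: "wildcard origin" };
  }
  const origin = new URL(pageOrigin).origin; // scheme + host + port, no path
  if (allowOriginHeader === origin) {
    return { allowed: true, reason: "exact origin match" };
  }
  let reason = `header "${allowOriginHeader}" does not match origin "${origin}"`;
  try {
    const header = new URL(allowOriginHeader);
    const page = new URL(origin);
    if (header.host === page.host && header.protocol !== page.protocol) {
      reason = `scheme mismatch: page uses ${page.protocol} but header allows ${header.protocol}`;
    }
  } catch {
    // Header is not a parseable URL (for example "null"); keep the generic reason.
  }
  return { allowed: false, reason };
}
```

Running this against header values captured from staging and production makes the difference between environments explicit.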
Use HAR (HTTP Archive) exports from the browser DevTools Network tab to share exact request/response data with backend teams without needing to reproduce the issue live.
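HAR files are plain JSON, so you can filter them before sharing. The sketch below assumes the standard HAR layout (`log.entries[]` with `request`, `response`, and `time`); `failedRequests` is a hypothetical helper, and treating status `0` as a failure reflects how Chrome typically records blocked or aborted requests.

```javascript
// List requests in a parsed HAR object that failed or never completed.
function failedRequests(har) {
  return har.log.entries
    .filter((entry) => entry.response.status === 0 || entry.response.status >= 400)
    .map((entry) => ({
      method: entry.request.method,
      url: entry.request.url,
      status: entry.response.status,
      timeMs: Math.round(entry.time), // total elapsed time for the request
    }));
}
```

Pass it the result of `JSON.parse` on the exported file. HAR exports can include cookies and authorization headers, so strip or redact those before attaching a file to a ticket.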
Poor Core Web Vitals directly affect user experience and SEO. Use these tools to diagnose them:
| Metric | Tool | Common Causes |
|---|---|---|
| LCP (Largest Contentful Paint) | PageSpeed Insights, WebPageTest | Large images, slow server response, render-blocking resources |
| INP (Interaction to Next Paint) | Chrome DevTools Performance | Long JavaScript tasks, event handler bottlenecks |
| CLS (Cumulative Layout Shift) | Lighthouse | Images without dimensions, dynamically injected content |
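When RUM data arrives as raw numbers, it helps to bucket each value with the published Core Web Vitals thresholds (LCP 2.5 s and 4 s, INP 200 ms and 500 ms, CLS 0.1 and 0.25). A minimal sketch, where `rateVital` and the `THRESHOLDS` table are illustrative names:

```javascript
// Boundaries between "good", "needs-improvement", and "poor" for each metric.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 }, // milliseconds
  INP: { good: 200, poor: 500 },   // milliseconds
  CLS: { good: 0.1, poor: 0.25 },  // unitless layout shift score
};

// Classify a single measurement for the named metric.
function rateVital(name, value) {
  const limits = THRESHOLDS[name];
  if (!limits) throw new Error(`Unknown metric: ${name}`);
  if (value <= limits.good) return "good";
  if (value <= limits.poor) return "needs-improvement";
  return "poor";
}
```

Core Web Vitals are assessed at the 75th percentile of page loads, so apply the rating to that percentile of your RUM data rather than to an average.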
In the Chrome DevTools Performance panel:
- Record a page load or interaction.
- Look for Long Tasks: main-thread tasks longer than 50 ms, which DevTools flags in red.
- Drill into the flame chart to find the responsible function.
Signs of memory leaks in production: increasing RAM over time, degraded performance after extended use, eventual browser tab crashes.
Diagnose with Chrome DevTools Memory panel:
- Take a heap snapshot baseline.
- Perform the suspected leaking action several times.
- Take another snapshot and compare — look for detached DOM nodes and growing object counts.
Common frontend memory leak sources:
- Event listeners not removed on component unmount
- Intervals/timeouts not cleared
- Global caches that grow unbounded
- Closure references holding large objects
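For the unbounded-cache item in the list above, one common mitigation is a size cap with least-recently-used eviction. The following is a minimal sketch; `BoundedCache` is a hypothetical class, not from any library, and it relies on `Map` preserving insertion order.

```javascript
// A cache that never holds more than maxEntries items.
// The least recently used entry is evicted when the cap is exceeded.
class BoundedCache {
  constructor(maxEntries) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError("maxEntries must be a positive integer");
    }
    this.maxEntries = maxEntries;
    this.map = new Map(); // iterates oldest first
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

A cap like this trades occasional cache misses for a predictable memory ceiling, which matters most in tabs that stay open for hours.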
```javascript
// React: clean up subscriptions and listeners (inside a component or custom hook)
useEffect(() => {
  const handler = (e) => { /* ... */ };
  window.addEventListener("resize", handler);
  return () => window.removeEventListener("resize", handler); // critical
}, []);
```

Large bundles cause slow initial loads. After a production regression in load time, analyze your bundle:
```bash
# Webpack Bundle Analyzer
# (generate stats.json first: npx webpack --profile --json > stats.json)
npx webpack-bundle-analyzer stats.json

# Vite
npx vite-bundle-visualizer
```

Look for:
- Accidental duplication of libraries
- Large dependencies that could be lazy-loaded
- Entire utility libraries imported instead of specific functions (e.g., `import _ from 'lodash'` vs `import debounce from 'lodash/debounce'`)
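To catch these regressions before they ship, some teams enforce a size budget in CI. The sketch below assumes a webpack-style stats object with an `assets` array of `{ name, size }` entries (size in bytes); `assetsOverBudget` and the budget value are hypothetical.

```javascript
// Return JavaScript assets larger than budgetBytes, biggest first.
function assetsOverBudget(stats, budgetBytes) {
  return stats.assets
    .filter((asset) => asset.name.endsWith(".js") && asset.size > budgetBytes)
    .sort((a, b) => b.size - a.size)
    .map((asset) => ({ name: asset.name, sizeKB: Math.round(asset.size / 1024) }));
}
```

A CI step could fail the build when the returned array is non-empty. The sizes in stats files are uncompressed, so pick a budget with that in mind.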
State bugs are often invisible in monitoring tools. Strategies to surface them:
Redux / Zustand / Pinia:
- Enable Redux DevTools in development and use time-travel debugging to replay state transitions.
- Log state snapshots around critical user actions in production.
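One low-risk way to log state around actions is to record which top-level keys changed rather than the values themselves, which keeps PII out of the logs. This is a sketch for a Redux-style reducer with plain-object state; `withTransitionLog` and `changedKeys` are hypothetical helpers, and `log` stands in for whatever logging function you use.

```javascript
// List top-level keys whose value differs (by reference) between two states.
function changedKeys(before, after) {
  const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...keys].filter((key) => before[key] !== after[key]);
}

// Wrap a reducer so each state-changing action is logged with the keys it touched.
function withTransitionLog(reducer, log) {
  return (state, action) => {
    const next = reducer(state, action);
    // Skip the initial call (state is undefined) and actions that changed nothing.
    if (state !== undefined && next !== state) {
      log({ action: action.type, changed: changedKeys(state, next) });
    }
    return next;
  };
}
```

The resulting log reads as a compact trail of action types and affected keys, which is often enough to spot the transition that produced an impossible state.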
React Query / SWR:
- Check `staleTime` and `cacheTime` configurations (`cacheTime` is named `gcTime` in TanStack Query v5): stale data is a common production-only issue.
- Log cache keys and query states when data appears incorrect.
General:
- Instrument state transitions with structured logs that include before/after snapshots (sanitized for PII).
- Add assertions on impossible state combinations:
```javascript
// Example: assert cart state invariant
if (cart.items.length > 0 && cart.total === 0) {
  logger.error("Cart invariant violated: items present but total is zero", new Error("CartInvariant"), { cart });
}
```

CSS bugs that only appear in production are usually caused by:
- CSS purging removing classes that were dynamically computed (e.g., Tailwind's `text-${color}-500` pattern)
- CSS specificity conflicts with third-party stylesheets loaded in production
- Different browser rendering on devices not tested locally
Debugging approaches:
- Use browser DevTools (inspect element, computed styles) directly on the production URL.
- For purging issues (Tailwind/PurgeCSS), check your `content` or `safelist` configuration.
- For third-party conflicts, use the browser's Styles panel to find which stylesheet is overriding your rules.
- Use BrowserStack or LambdaTest to inspect on real devices and browsers.
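For the purging case, an alternative to safelisting is to avoid building class names by string interpolation at all. Purge tools scan source files for complete class strings, so a lookup table of literals keeps every class visible to them. A minimal sketch, with hypothetical names and colors:

```javascript
// Every class name appears as a complete literal, so the purge step can find it.
const TEXT_COLOR_CLASS = {
  red: "text-red-500",
  green: "text-green-500",
  blue: "text-blue-500",
};

// Look up the class for a color, falling back to a literal default.
function textColorClass(color, fallback = "text-gray-500") {
  return Object.prototype.hasOwnProperty.call(TEXT_COLOR_CLASS, color)
    ? TEXT_COLOR_CLASS[color]
    : fallback;
}
```

The table costs a few extra lines but makes the full set of generated classes explicit, and an unexpected color value degrades to the fallback instead of to an unstyled element.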
Production users don't all use Chrome on a MacBook. Approach cross-environment bugs systematically:
- Identify affected environments from your error tracker's browser/OS breakdown.
- Use caniuse.com (Can I Use) to check whether a web API you're using has gaps in affected browsers.
- Test on real devices via BrowserStack, Sauce Labs, or physical device labs.
- Check polyfills — confirm that your Babel/transpiler config targets the browsers you're supporting.
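Alongside these checks, a runtime feature probe can confirm which APIs are actually missing in an affected browser and can be attached to error reports. The sketch below uses a hypothetical `missingFeatures` helper that walks dotted paths from a root object (`globalThis` by default).

```javascript
// Return the dotted paths from "required" that cannot be resolved on root.
function missingFeatures(required, root = globalThis) {
  return required.filter((path) => {
    let current = root;
    for (const part of path.split(".")) {
      if (current == null || !(part in Object(current))) return true; // missing
      current = current[part];
    }
    return false; // every segment resolved
  });
}
```

For example, `missingFeatures(["IntersectionObserver", "navigator.clipboard", "Array.prototype.at"])` returns whichever of those the current browser lacks. Presence checks cannot detect APIs that exist but behave differently, so they complement real-device testing rather than replace it.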
For iOS Safari-specific bugs (common due to WebKit's distinct rendering engine):
- Use Safari's Web Inspector via a Mac connected to an iPhone/iPad.
- Be alert to differences in `position: fixed`, `100vh`, `scroll` event behavior, and certain CSS properties.
If a bug was introduced in a specific deployment:
- Identify the deploy using your release tracking (Sentry releases, deploy markers in Datadog).
- Diff the commits between the last known-good and the broken release.
- Download the build artifact from CI (GitHub Actions, CircleCI, etc.) and serve it locally to reproduce.
- Bisect if necessary — deploy intermediate commits to a staging environment to isolate the regression.
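The bisect step is a binary search over the commit range, so the number of test deployments grows with the logarithm of the range rather than its length. A minimal sketch, assuming the first commit is known good, the last is known bad, and the regression persists once introduced; `findFirstBadCommit` and `isBad` are hypothetical names.

```javascript
// commits is ordered oldest to newest: commits[0] is known good and the
// last element is known bad. isBad(commit) reports whether the bug reproduces.
function findFirstBadCommit(commits, isBad) {
  if (commits.length < 2) {
    throw new Error("need at least one known-good and one known-bad commit");
  }
  let good = 0;
  let bad = commits.length - 1;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid; // regression is at mid or earlier
    } else {
      good = mid; // regression is after mid
    }
  }
  return commits[bad];
}
```

In practice `isBad` means deploying a commit to staging and running a check, so it would be asynchronous and slow; the search logic is the same. When the bug reproduces in a local build, `git bisect` performs this search for you.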
When a production incident is severe, the fastest fix is usually to roll back or switch off the change rather than to write new code:
| Strategy | Speed | Risk |
|---|---|---|
| Re-deploy last known-good artifact | Fast | Low — exact previous build |
| Revert commit and redeploy | Medium | Low — clean Git history |
| Feature flag disable | Instant | Low — no deployment needed |
| Hotfix branch | Slow | Medium — new code under pressure |
Always prefer re-deploying a previous artifact over a hotfix under pressure. Hotfixes written during incidents often introduce new bugs.
When a production issue is reported, follow a disciplined process:
```text
1. TRIAGE
├── Confirm the issue is real (not a single user)
├── Assess severity (% of users affected, business impact)
└── Assign an incident commander
2. COMMUNICATE
├── Post in the incident channel immediately
├── Set a status page update if user-facing
└── Set an update cadence (every 15-30 min)
3. INVESTIGATE
├── Check error tracker for spike in errors
├── Check recent deployments
├── Check external service status (APIs, CDN, auth provider)
└── Narrow down affected users/browsers/routes
4. MITIGATE
├── Roll back if a deployment is the cause
├── Disable feature via feature flag
└── Apply a targeted hotfix only if rollback is impossible
5. RESOLVE
├── Confirm metrics return to baseline
├── Close the incident
└── Schedule a post-mortem
```
Every significant production incident deserves a blameless post-mortem. Document:
- Timeline of detection, investigation, and resolution
- Root cause (not just the symptom)
- Impact (users affected, duration, revenue if applicable)
- What worked well in the response
- Action items with owners and deadlines
Common preventive actions after frontend incidents:
- Add a regression test for the exact scenario that broke
- Improve alerting thresholds so similar issues are caught faster
- Add canary/staged rollouts so new deployments only reach 5-10% of users first
- Improve source map uploads if stack traces were unreadable
- Add synthetic monitoring (Playwright or Cypress in CI + production) for critical user flows
| Category | Tool | Purpose |
|---|---|---|
| Error Tracking | Sentry, Bugsnag, Rollbar | Capture and group JS exceptions |
| RUM / Performance | Datadog RUM, Sentry Performance, SpeedCurve | Real user metrics, Core Web Vitals |
| Logging | Datadog Logs, Logtail, Grafana Loki | Structured log aggregation |
| Feature Flags | LaunchDarkly, Unleash, Flagsmith | Instant kill switches |
| Bundle Analysis | webpack-bundle-analyzer, vite-bundle-visualizer | Identify large dependencies |
| Cross-browser Testing | BrowserStack, LambdaTest, Sauce Labs | Test on real devices/browsers |
| Network Inspection | Chrome DevTools, Charles Proxy, Proxyman | Inspect HTTP traffic |
| Deployment | GitHub Actions, CircleCI, Vercel, Netlify | CI/CD with artifact management |
| Incident Management | PagerDuty, Opsgenie, Linear | Alerting and incident tracking |
Key Takeaway: Effective production debugging is 80% preparation (observability, source maps, logging) and 20% investigation. The teams that resolve incidents fastest are those who invested in monitoring before the incident occurred.