Sentry Across the SDLC: Exception Tracking That Actually Helps
I've been running Sentry in production at IBM for a few years now across a mix of Go microservices and Python applications. This post is about how I actually use it — not just "install the SDK and ship it," but how it fits into the whole development lifecycle from local work through CI, staging, and production. There's a lot of tooling advice that describes the happy path. This is more honest than that.
The Problem Sentry Solves
Log files are reactive. A user files a ticket, you SSH into a server (or pull logs from your aggregator), and you grep for something vaguely related to the timestamp in the ticket. If you're lucky, there's a stack trace buried in the output. If you're not, there's a generic error message and you start bisecting your code looking for what could have produced it. This is a miserable way to debug.
Sentry is different in a specific and important way: it captures exceptions at the moment they occur, along with the full stack trace, the state of every variable in the call chain, breadcrumbs leading up to the event (HTTP requests made, database queries run, log messages emitted), the user who was affected, and the environment the code was running in. You don't go looking for the error. The error comes to you, fully annotated.
But the more significant difference — the one that changes how you think about software quality — is that Sentry shows you bugs users haven't reported yet. In a typical app, only a small fraction of affected users actually file support tickets. They just leave, or they work around the problem, or they assume it's their fault. Sentry shows you the full picture: this exception has occurred 847 times in the last 7 days, affecting 134 users, and exactly zero of them opened a ticket. Without Sentry you would have no idea this was happening.
That shift from reactive debugging to proactive awareness is the whole product, really. The SDKs are just the mechanism.
Python SDK Setup
The Python SDK is sentry-sdk from PyPI. Basic initialization looks like this:
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration
sentry_sdk.init(
    dsn="https://your-key@sentry.io/your-project-id",
    environment="production",
    release="myapp@1.4.2",  # or a git SHA — more on this later
    integrations=[
        FlaskIntegration(),
        SqlalchemyIntegration(),
    ],
    traces_sample_rate=0.1,
    send_default_pii=False,
    before_send=scrub_pii,  # your own hook — see below
    debug=False,
)
The FlaskIntegration (or DjangoIntegration, same pattern) does several things automatically: it captures unhandled exceptions and attaches the full request context — URL, method, headers, POST data — to every event. It also captures the authenticated user if you're using Flask-Login or Django's auth system. The SqlalchemyIntegration adds database query breadcrumbs, which is enormously useful when you're looking at an exception and want to know what queries ran immediately before it.
traces_sample_rate enables Sentry's performance monitoring — it samples that fraction of requests as full distributed traces. At 0.1 you're capturing 10% of traffic, which is usually enough signal in production without blowing up your quota. In staging I run this at 1.0.
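A flat rate is a blunt instrument, though. The SDK also accepts a traces_sampler callback that decides the rate per transaction, which lets you skip health checks and always trace critical paths. The route names below are my own examples, not anything Sentry prescribes:

```python
def traces_sampler(sampling_context):
    """Per-transaction sampling decision; return a rate between 0 and 1."""
    # The transaction name (the route, for web frameworks) lives in
    # sampling_context["transaction_context"]["name"].
    name = sampling_context.get("transaction_context", {}).get("name", "")
    if name == "/healthz":
        return 0.0  # never trace health checks
    if name.startswith("/api/checkout"):
        return 1.0  # always trace the critical path
    return 0.1      # everything else: 10%

# Passed to init in place of a flat rate:
# sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
```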
The before_send hook is a function that receives the event dict before it's transmitted. You can inspect it, modify it, or return None to drop the event entirely. Use it to strip sensitive data — passwords, tokens, personal fields from request bodies — before anything leaves your application. This is not optional if you're handling user data.
Go SDK Setup
The Go SDK is github.com/getsentry/sentry-go. Initialization is similar in spirit:
import (
    "os"

    "github.com/getsentry/sentry-go"
)

func initSentry() error {
    return sentry.Init(sentry.ClientOptions{
        Dsn:              "https://your-key@sentry.io/your-project-id",
        Environment:      os.Getenv("APP_ENV"),
        Release:          os.Getenv("GIT_SHA"),
        TracesSampleRate: 0.1,
        Debug:            false,
    })
}
Go doesn't have the automatic framework integrations that Python does, so you wire things up more explicitly. For a gRPC service, I wrap the error handling in an interceptor:
func sentryUnaryInterceptor(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (resp interface{}, err error) {
    hub := sentry.GetHubFromContext(ctx)
    if hub == nil {
        hub = sentry.CurrentHub().Clone()
        ctx = sentry.SetHubOnContext(ctx, hub)
    }
    defer func() {
        if r := recover(); r != nil {
            hub.RecoverWithContext(ctx, r)
            sentry.Flush(2 * time.Second)
            panic(r) // re-panic after capturing
        }
    }()
    hub.Scope().SetTag("grpc.method", info.FullMethod)
    resp, err = handler(ctx, req)
    if err != nil {
        hub.CaptureException(err)
    }
    return resp, err
}
For manual captures elsewhere in the code:
if err := doSomething(); err != nil {
    sentry.WithScope(func(scope *sentry.Scope) {
        scope.SetUser(sentry.User{ID: userID, Email: userEmail})
        scope.SetTag("operation", "payment.process")
        // Scope.AddBreadcrumb takes a breadcrumb limit, not a hint
        scope.AddBreadcrumb(&sentry.Breadcrumb{
            Category: "payment",
            Message:  "attempted charge for order " + orderID,
            Level:    sentry.LevelInfo,
        }, 100)
        sentry.CaptureException(err)
    })
}
The sentry.Flush(2 * time.Second) call matters in short-lived goroutines and Lambda-style handlers — Sentry sends events asynchronously and you need to give it a moment to drain before the process exits.
Local Development
Most developers I've worked with either disable Sentry locally or never configure it at all. I think this is a mistake. Running Sentry in your local environment means that when you trigger an exception while testing, you get the same rich context you'd have in production: the full trace, the variables, the breadcrumbs. It's faster than reading a stack trace printed to your terminal, and it trains you to look at errors the way your future production self will.
Set environment: "development" and either use a separate dev project in Sentry, or use the same project and filter by environment in the UI. The overhead is negligible locally and the signal is real.
Two settings I always enable locally:
sentry_sdk.init(
    dsn="...",
    environment="development",
    debug=True,              # prints what Sentry is sending to stdout
    traces_sample_rate=1.0,  # capture everything locally
)
debug=True logs each event to stdout as it's sent. This is useful for confirming that your before_send hook is actually running and stripping what you think it's stripping. It also makes it obvious when exceptions are being swallowed somewhere and never reaching Sentry.
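When I'm iterating on the hook itself, I sometimes wrap it so it announces every event it drops. The wrapper below is my own sketch, not part of the SDK — before_send is just a plain function, so it composes easily:

```python
def log_drops(hook):
    """Wrap a before_send hook so it prints whenever it drops an event."""
    def wrapped(event, hint):
        result = hook(event, hint)
        if result is None:
            print(f"before_send dropped event {event.get('event_id', '?')}")
        return result
    return wrapped

# Example: a hook that drops everything, wrapped for visibility.
log_drops(lambda event, hint: None)({"event_id": "abc123"}, None)
# prints: before_send dropped event abc123

# In local init: sentry_sdk.init(..., before_send=log_drops(scrub_pii))
```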
For JavaScript frontends, configure source maps in your build pipeline and upload them to Sentry. Without source maps, stack traces in minified JS are useless. The @sentry/webpack-plugin or @sentry/vite-plugin handles this at build time.
In CI
The release field in your Sentry initialization is not just a label. It's the linchpin of Sentry's Releases feature. When you set release to your git SHA or tag, Sentry tracks which issues were first seen in which release and which releases are deployed to which environments.
The important behavior: if you mark an issue as "resolved" in Sentry and then it reappears in a subsequent release, Sentry automatically marks it as "regressed." This is a real quality signal — it means a fix you thought was shipped has broken again, and it surfaces without anyone having to manually track that.
In CI, set the release to the git SHA:
# GitHub Actions example
- name: Build and test
  env:
    SENTRY_RELEASE: ${{ github.sha }}
    SENTRY_ENVIRONMENT: staging
Then use the Sentry CLI to register the release and associate commits:
sentry-cli releases new "$SENTRY_RELEASE"
sentry-cli releases set-commits "$SENTRY_RELEASE" --auto
sentry-cli releases finalize "$SENTRY_RELEASE"
--auto pulls the commit history from the local git repo and uploads it to Sentry. This is what enables commit-level blame in the Sentry UI — when an issue first appears in release abc123, Sentry can show you which commits touched the files in the stack trace. Combined with the GitHub integration (which links directly to the relevant line in your repo), this cuts the time from "Sentry alert" to "found the offending commit" down to seconds.
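Put together as a single CI step, it looks something like this — the org, project, and secret names here are placeholders for your own:

```yaml
- name: Register Sentry release
  env:
    SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
    SENTRY_ORG: my-org            # placeholder org slug
    SENTRY_PROJECT: my-project    # placeholder project slug
    SENTRY_RELEASE: ${{ github.sha }}
  run: |
    sentry-cli releases new "$SENTRY_RELEASE"
    sentry-cli releases set-commits "$SENTRY_RELEASE" --auto
    sentry-cli releases finalize "$SENTRY_RELEASE"
```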
Staging Environment
Staging is where I'm most aggressive with Sentry's performance monitoring. In production I sample at 10% to control costs and overhead. In staging I run traces_sample_rate=1.0 — full capture on every request — and I simulate realistic load before promoting a release.
What this buys you: Sentry's transaction waterfall will show you exactly where time is going across your service calls. The patterns to look for before hitting production:
- N+1 queries. The SQLAlchemy integration will show individual queries as spans in the transaction. If you're making 50 queries in a loop where one should do, it's visible immediately.
- Slow external calls. HTTP calls to downstream services that are taking 800ms in staging will be 800ms in production too.
- High p95 latency on specific endpoints. The mean looks fine; the 95th percentile is where users are suffering.
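The mean-versus-p95 gap is worth seeing with actual numbers. With a distribution where most requests are fast and a few are very slow — which is what tail latency usually looks like — the mean hides the suffering entirely:

```python
import statistics

# Synthetic latencies in milliseconds: 95 fast requests, 5 slow ones
latencies = [100] * 95 + [3000] * 5

mean = statistics.fmean(latencies)
p95 = statistics.quantiles(latencies, n=100)[94]  # 95th percentile cut point

print(f"mean={mean:.0f}ms p95={p95:.0f}ms")
# The mean is 245ms — looks tolerable. The p95 is in the multi-second
# range, which is what 1 in 20 of your users is actually experiencing.
```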
I also configure Sentry to alert on new issues in the staging environment before they reach production. A Slack notification for any new issue in staging is low-noise and high-value — it catches regressions during the QA cycle rather than after deployment.
Production
Production configuration is about being deliberate with sampling and obsessive about privacy.
sentry_sdk.init(
    dsn="...",
    environment="production",
    release=os.getenv("GIT_SHA"),
    traces_sample_rate=0.1,     # 10% of requests
    profiles_sample_rate=0.01,  # relative to traced requests; plan-dependent
    send_default_pii=False,
    before_send=scrub_pii,
)
profiles_sample_rate enables continuous profiling — Sentry captures wall-clock stack traces during sampled transactions. This shows you where CPU time is actually going within a request, which is different from where latency is going. It has real overhead, so keep the rate low unless you're investigating a specific performance problem.
Alert rules I run in production:
- New issue created — Slack notification to the team channel. Not paging, just awareness.
- Issue volume spike — If a single issue generates more than 100 events in 10 minutes, page via PagerDuty. This catches cascading failures early.
- Error rate on specific transactions — Alert if the error rate on /api/checkout exceeds 1%. Transaction-level alerts let you catch degradation on critical paths without being buried in noise from lower-priority flows.
The before_send hook in production is non-negotiable for us. We strip anything that looks like a password, API key, or email address from event payloads before transmission. The exact implementation depends on your data model, but the pattern is:
def scrub_pii(event, hint):
    # Strip request body fields that may contain PII
    if "request" in event and "data" in event["request"]:
        sensitive = {"password", "token", "credit_card", "ssn"}
        event["request"]["data"] = {
            k: "[Filtered]" if k.lower() in sensitive else v
            for k, v in event["request"]["data"].items()
        }
    return event
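Because the hook is just a function over a dict, you can sanity-check it without sending anything. The snippet repeats the hook so it runs standalone:

```python
def scrub_pii(event, hint):
    # Same hook as above, repeated so this snippet runs on its own
    if "request" in event and "data" in event["request"]:
        sensitive = {"password", "token", "credit_card", "ssn"}
        event["request"]["data"] = {
            k: "[Filtered]" if k.lower() in sensitive else v
            for k, v in event["request"]["data"].items()
        }
    return event

event = {"request": {"data": {"password": "hunter2", "qty": 3}}}
scrubbed = scrub_pii(event, None)
print(scrubbed["request"]["data"])
# → {'password': '[Filtered]', 'qty': 3}
```

This is also an easy thing to pin down in a unit test, so a refactor can't silently stop filtering a field.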
Issue Triage Workflow
We run a triage review at the start of each sprint. The workflow:
- New issues this sprint — filter by "first seen: last 7 days." These are regressions or newly exposed code paths. Each one gets looked at by a developer before the sprint planning session ends.
- Volume x impact ranking — Sentry's default issue list sorts by event count. I also look at "users affected" — an issue affecting 5 users but occurring 1000 times is probably a single bad actor or a retry loop. An issue affecting 400 users occurring 400 times is a real user-facing bug.
- Ignore or suppress deliberately — Sentry has an "Ignore" status for known issues you're not going to fix immediately (third-party library noise, known edge cases in legacy code). Using this keeps your "unresolved" queue meaningful. An unresolved queue full of noise you're ignoring mentally trains you to stop looking at it.
- Assign ownership — Sentry supports assigning issues to users or teams. We use this in combination with CODEOWNERS: if a stack trace touches a file owned by a team, Sentry can auto-assign the issue to that team. This eliminates the "whose bug is this" conversation.
- GitHub/JIRA integration — From the Sentry issue view, one click creates a GitHub issue or JIRA ticket pre-populated with the stack trace, event count, affected users, and relevant commits. This is the link between "Sentry shows us something is wrong" and "it's in the sprint backlog."
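The volume-versus-users heuristic is mechanical enough to encode. This ranking function is my own sketch — the dict shape and field names are assumptions for illustration, not Sentry's API:

```python
def triage_rank(issues):
    """Sort issues: widespread bugs first, retry-loop noise last.

    Each issue is a dict with 'events' and 'users' counts (assumed shape).
    """
    def key(issue):
        # Primary sort: most users affected first.
        # Tiebreaker: a high events-per-user ratio suggests one client
        # stuck in a retry loop rather than a widespread bug.
        noise_ratio = issue["events"] / max(issue["users"], 1)
        return (-issue["users"], noise_ratio)
    return sorted(issues, key=key)

issues = [
    {"id": "A", "events": 1000, "users": 5},   # likely a retry loop
    {"id": "B", "events": 400, "users": 400},  # real user-facing bug
]
print([i["id"] for i in triage_rank(issues)])  # → ['B', 'A']
```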
Performance Monitoring
Distributed tracing in a polyglot microservices environment is where Sentry earns its keep beyond simple exception tracking. The Python SDK and the Go SDK both support propagating trace context across service boundaries.
In Go, you start a transaction explicitly:
transaction := sentry.StartTransaction(ctx, "order.process")
defer transaction.Finish()
span := transaction.StartChild("db.query")
// ... run your query
span.Finish()
In Python (with Flask), the FlaskIntegration starts transactions automatically for each request. For outgoing HTTP calls, wrapping with sentry_sdk.start_span() adds the span to the current transaction.
When these traces are connected — HTTP headers carry the sentry-trace header from service to service — the Sentry UI shows you a waterfall: the full end-to-end time for a user request, broken down by which service was doing what at each moment. If a particular gRPC call is taking 400ms of a 600ms total response time, that's where you investigate. Without distributed tracing, you know the total is slow but you're guessing about where.
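The header itself is nothing exotic: a 32-character hex trace id, a 16-character hex span id, and an optional sampled flag, joined by dashes. A minimal parser — my own sketch for illustration, not the SDK's implementation:

```python
def parse_sentry_trace(header):
    """Parse a sentry-trace header: '<trace_id>-<span_id>[-<sampled>]'."""
    parts = header.strip().split("-")
    if len(parts) not in (2, 3) or len(parts[0]) != 32 or len(parts[1]) != 16:
        raise ValueError(f"malformed sentry-trace header: {header!r}")
    # The third segment, when present, is '1' (sampled) or '0' (not sampled)
    sampled = (parts[2] == "1") if len(parts) == 3 else None
    return {"trace_id": parts[0], "span_id": parts[1], "sampled": sampled}

hdr = "771a43a4192642f0b136d5159a501700-0c5e2a2c26a54a5e-1"
print(parse_sentry_trace(hdr))
```

The receiving service uses the trace id to attach its own transaction to the same trace, which is how the waterfall gets stitched together across languages.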
The integration with Sentry's issues view is also useful here: if a transaction fails, the associated error event is linked to the transaction trace. You can go from "this transaction was slow and then errored" to "here's the exact exception that caused it" in two clicks.
What Sentry Doesn't Replace
This is worth saying explicitly because I've seen teams try to use Sentry as their only observability tool, and it doesn't work.
Sentry is exception tracking. It is very good at answering "what is my application code doing wrong." It does not replace:
- Structured logging. You still need logs for audit trails, operational queries ("show me all requests from this user in the last hour"), and debugging issues that don't produce exceptions. Sentry doesn't capture "normal" application log output — it captures events. Use a log aggregator (Loki, Elasticsearch, CloudWatch Logs) alongside Sentry, not instead of it.
- Infrastructure metrics. Sentry doesn't tell you CPU utilization, memory pressure, disk I/O, or database connection pool exhaustion. Prometheus and Grafana (or Datadog, or whatever you use) own that layer. When a Sentry alert fires, the infrastructure metrics tell you whether the problem is in your code or in the environment your code is running in.
- Uptime monitoring. Sentry will tell you about exceptions in running code. It won't tell you your service is down. Run separate synthetic monitors.
Sentry's sweet spot is the gap between "the infrastructure looks fine" and "users are unhappy." That gap is usually filled with bad application code, and Sentry is excellent at surfacing exactly that.
Self-Hosting
We run a self-hosted Sentry instance internally. The open-source version of Sentry (available on GitHub) is fully capable — it includes exception tracking, performance monitoring, releases, alerts, and integrations. We run it because our compliance requirements don't allow sending application event data (which can include stack traces containing business logic and occasionally PII despite our filtering) to third-party SaaS.
Self-hosting Sentry is not trivial — it runs on Docker Compose with a reasonably large set of services (Kafka, Redis, Postgres, ClickHouse, Celery workers, Snuba). Plan for operational overhead. If you don't have compliance constraints, the hosted version at sentry.io is significantly easier to operate and the team can focus on using Sentry rather than running it.
Updated March 2026: Sentry has shipped significant capabilities since this was written. Session Replay (capturing anonymized DOM recordings of user sessions leading up to an error) is now generally available and genuinely useful for frontend issues. Profiling has reached GA and the overhead story has improved considerably. Crons monitoring (formerly called Check-ins) lets you track scheduled job execution and alert on missed runs — a use case Sentry didn't cover well in 2021. The Codecov integration ties test coverage data to the stack traces in error events, showing you whether the code path that errored has test coverage. The core SDK patterns described in this post — initialization, environments, releases, before_send — are unchanged.
