Instrument AI Application
From Black Box to Clear View
Your AI app works. A customer types "find me running shoes under $120", and your assistant answers. But what actually happened in between? Which model ran, what it was told, what it cost, how long it took, whether the answer was any good - and, when something goes wrong, where?
By default, all of that is invisible. An AI app is a black box: you see what goes in and what comes out, and nothing of the journey between. Instrumentation is how you open the box. This guide explains what it is, how it works with Kubit, how you add it, and - most importantly - what it gives you back.
| Without instrumentation | With instrumentation |
|---|---|
| You send a question in and get an answer back. Everything in between is a black box - no idea which model ran, what it was told, what it cost, or where it slowed down. | You see the whole chain: retrieve products → call model → check inventory → write answer, each step tagged with its model, latency, tokens, and cost. |
You only see the edges - until instrumentation shows you every step.
What instrumentation actually is
"Instrumentation" sounds heavy. The idea is simple: your app emits a small record of what it did, as it does it. Think of it as a flight recorder for every interaction - quietly noting each step so you can play it back later.
When your shopping assistant handles one request, instrumentation captures that request as a trace: the complete story of that single interaction, start to finish. Inside the trace are spans - one for each step. Retrieving matching products is a span. Calling the language model is a span. Looking up inventory with a tool is a span. Each span records a few attributes - facts about that step, like which model ran, how many tokens it used, how long it took, and the prompt and response themselves. And because real conversations span multiple turns, related traces can be grouped into a session tied to a user.
That's the whole vocabulary: trace (one interaction), span (one step inside it), attribute (a fact about a step), session/user (who, across time). You'll see these exact words throughout Kubit and across the wider observability world - they aren't Kubit inventions, they come from an open standard (more on that below).
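To make the vocabulary concrete, here is a minimal sketch of how the four terms could relate as plain data. The class names, field names, and the `group_by_session` helper are illustrative assumptions, not Kubit's or OpenTelemetry's actual types.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step inside an interaction."""
    name: str
    duration_ms: int
    attributes: dict = field(default_factory=dict)  # facts about this step


@dataclass
class Trace:
    """One complete interaction, made of spans."""
    trace_id: str
    session_id: str  # hypothetical field: ties related traces together
    user_id: str
    spans: list = field(default_factory=list)


def group_by_session(traces):
    """Group related traces into sessions, keyed by session id."""
    sessions = defaultdict(list)
    for t in traces:
        sessions[t.session_id].append(t)
    return dict(sessions)


# Two turns of the same conversation, each captured as its own trace.
turn_1 = Trace("t1", "s1", "u42", [Span("call_model", 850, {"gen_ai.request.model": "gpt-4o"})])
turn_2 = Trace("t2", "s1", "u42", [Span("call_model", 700, {"gen_ai.request.model": "gpt-4o"})])
sessions = group_by_session([turn_1, turn_2])
```

Both traces land under the single session `s1`, which is the "who, across time" grouping described above.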
Inside one trace - "Find me running shoes under $120":
| # | Span (step) | Type | Time | Key attributes |
|---|---|---|---|---|
| 1 | Retrieve product matches | retrieval | 40 ms | 12 products matched |
| 2 | Call model | generation | 850 ms | input 1,210 tok · output 95 tok · $0.004 |
| 3 | Check inventory | tool | 60 ms | 9 of 12 in stock |
| 4 | Write answer to user | generation | 120 ms | - |
One trace, four spans. Each row is a step; the columns are its attributes.
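Once a trace is data, questions like "where did the time go?" become a few lines of code. The sketch below uses the timings from the table; the span names are illustrative labels, and it assumes the four steps run one after another.

```python
# Each span from the table as (name, type, duration in ms).
# The names are illustrative labels, not identifiers from a real SDK.
spans = [
    ("retrieve_products", "retrieval", 40),
    ("call_model", "generation", 850),
    ("check_inventory", "tool", 60),
    ("write_answer", "generation", 120),
]

# Assumes the steps are sequential, so durations simply add up.
total_ms = sum(duration for _, _, duration in spans)

# The slowest span is the first place to look when a request feels slow.
slowest = max(spans, key=lambda s: s[2])
share = slowest[2] / total_ms
```

Here the model call accounts for most of the request's time, which is the kind of answer a trace view surfaces at a glance.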
How it works with Kubit
Once your app is instrumented, the records have to go somewhere. With Kubit, the flow is four steps:
Your AI app → Instrumentation → Kubit → Insights
| Stage | What happens |
|---|---|
| Your AI app | The shopping assistant runs exactly as it does today. |
| Instrumentation | Records each step of every request as a trace. |
| Kubit | Receives the traces over a secure connection, then cleans and organizes them. |
| Insights | You explore them - for debugging and for product analytics. |
Here's the part worth being precise about, because it's what gives you confidence in what Kubit is and isn't doing: the Kubit SDK is a delivery service, not a surveillance system. It does not crawl your codebase, hook into your servers, or quietly capture things you didn't ask it to. It takes the traces your instrumentation already produces and ships them - over an encrypted connection, authenticated with your own key - to Kubit, where they're cleaned up and organized into the views you'll actually use.
In other words: something creates the spans, and the Kubit SDK transports them. That distinction matters for the next section, because what creates the spans is your choice.
How you add it: pick your effort level
There are three ways to get your app emitting spans. They differ mainly in how much code you write - and the Kubit SDK rides underneath all three as the shared transport, so the destination is the same no matter which you choose.
| Path | Effort | What you do | Best for |
|---|---|---|---|
| Auto-instrumentation | Lowest | Drop in a library - OpenLLMetry, OpenInference, the Vercel AI SDK, or LangChain - that emits spans for you | Most teams; the fastest start |
| Use your existing tool | Low | Already trace with Langfuse or Braintrust? Point it at Kubit too | Teams already tracing |
| Manual | Highest | Hand-write spans and tag them with the attributes you choose | Custom steps; full control |
All three feed the Kubit SDK, which filters out non-LLM noise, optionally masks sensitive fields, and ships everything securely to Kubit.
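The filter-then-mask idea can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: the noise rule (spans carrying only `http.*` or `net.*` attributes), the email-only redaction, and the function names. The Kubit SDK's real rules and configuration may differ.

```python
import re

# Assumed noise rule: a span whose attributes are all transport-level.
NOISE_PREFIXES = ("http.", "net.")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_noise(span):
    """True when every attribute key starts with a noisy prefix."""
    keys = list(span["attributes"])
    return bool(keys) and all(key.startswith(NOISE_PREFIXES) for key in keys)


def mask(span, fields=("gen_ai.prompt", "gen_ai.completion")):
    """Return a copy of the span with email addresses redacted in text fields."""
    attrs = dict(span["attributes"])
    for key in fields:
        if isinstance(attrs.get(key), str):
            attrs[key] = EMAIL.sub("[email]", attrs[key])
    return {**span, "attributes": attrs}


def prepare_for_export(spans):
    """Drop noise first, then mask what remains."""
    return [mask(s) for s in spans if not is_noise(s)]


spans = [
    {"name": "GET /healthz", "attributes": {"http.method": "GET"}},
    {
        "name": "recommend_products",
        "attributes": {
            "gen_ai.request.model": "gpt-4o",
            "gen_ai.prompt": "Email the list to sam@example.com",
        },
    },
]
outgoing = prepare_for_export(spans)
```

Masking returns a copy, so the in-process span is left untouched and only the outgoing version is redacted.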
Auto-instrumentation is where most teams begin: a library wraps your model calls and emits a span for each one with no changes to your business logic - usually a few lines at startup. If you already use a tracing tool like Langfuse or Braintrust, you don't have to rip it out; those tools speak the same open standard, so you can point them at Kubit as an additional destination and keep your setup running side by side. And for custom steps no library knows about, manual instrumentation lets you write spans yourself - with Kubit's SDK that can be as little as a one-line configure() call plus a span around the operation you want to see.
A closer look: writing spans by hand
Not a developer? You can safely skip to the next section. For everyone else, here's what manual instrumentation actually looks like - it's smaller than you'd think.
Writing a span by hand is three moves: start a span, attach attributes, end it. Anything you wrap in a span becomes a step in Kubit. Here's a single model call, instrumented by hand in Python:
    from kubit_otel import configure
    from opentelemetry import trace

    # One-time setup at startup: point the SDK at your Kubit workspace.
    configure(api_key="rg.v1.…", service_name="shopping-assistant")
    tracer = trace.get_tracer("shopping-assistant")

    # Around the work you want to see:
    with tracer.start_as_current_span("recommend_products") as span:
        span.set_attribute("gen_ai.request.model", "gpt-4o")
        span.set_attribute("gen_ai.prompt", user_question)
        span.set_attribute("gen_ai.completion", answer)
        span.set_attribute("gen_ai.usage.input_tokens", 1210)
        span.set_attribute("gen_ai.usage.output_tokens", 95)
The with block opens the span and closes it automatically when the work finishes - so timing is recorded for you. The attribute names aren't arbitrary: they follow the GenAI semantic conventions, an open naming standard Kubit understands out of the box. A handful map straight to the fields you see in the product:
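| You set… | Kubit shows it as… |
|---|---|
| gen_ai.request.model | model |
| gen_ai.prompt | input |
| gen_ai.completion | output |
| gen_ai.usage.input_tokens, gen_ai.usage.output_tokens | token usage |
| | cost |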
The same call in TypeScript looks like this - note you call end() yourself instead of using a with block:
    import { configure } from "@kubit-ai/otel";
    import { trace } from "@opentelemetry/api";

    configure({ apiKey: "rg.v1.…", serviceName: "shopping-assistant" });
    const tracer = trace.getTracer("shopping-assistant");

    const span = tracer.startSpan("recommend_products");
    span.setAttribute("gen_ai.request.model", "gpt-4o");
    span.setAttribute("gen_ai.prompt", userQuestion);
    span.setAttribute("gen_ai.completion", answer);
    span.setAttribute("gen_ai.usage.input_tokens", 1210);
    span.setAttribute("gen_ai.usage.output_tokens", 95);
    span.end();
Spans don't have to be model calls. Wrap any step - a database query, a retrieval, a tool call - and tag it with whatever facts matter to you. Nest spans inside one another and you reproduce the trace structure from earlier: a parent for the whole turn, a child for each step.
    with tracer.start_as_current_span("assistant_turn") as turn:
        turn.set_attribute("user.id", user_id)

        with tracer.start_as_current_span("retrieve_products") as retrieval:
            products = search_catalog(user_question)
            retrieval.set_attribute("products.matched", len(products))

        with tracer.start_as_current_span("recommend_products") as gen:
            gen.set_attribute("gen_ai.request.model", "gpt-4o")
            gen.set_attribute("gen_ai.prompt", user_question)
            gen.set_attribute("gen_ai.completion", answer)
That's the same shape as the trace in the anatomy table above, trimmed to two of its four steps and built by hand. Notice the custom attributes (user.id, products.matched): they ride along too, ready to filter and segment on inside Kubit. That's the upside of manual instrumentation - you decide precisely what each step records.
You don't need to decide perfectly up front - the three paths mix freely, so you can start with auto-instrumentation and add manual spans later. When you're ready for the step-by-step wiring (install commands, the endpoint, your API key, and code for Python, Node, and more), the OTel integration guide is the hands-on recipe.
What this means for your data
Instrumentation means sending data about your app to a service, so it's fair to ask exactly what that involves. Four things are worth knowing:
It's an open standard - no lock-in. Kubit is built on OpenTelemetry, the vendor-neutral industry standard for telemetry. There's no proprietary black-box agent buried in your app and nothing your team can't inspect. The same instrumentation can point anywhere - you're never trapped.
You control what leaves your infrastructure. You hold the API key and decide what ships. The SDK automatically filters out non-LLM noise, and you can mask or redact sensitive fields before anything leaves your servers. As an added safety net, Kubit applies baseline redaction for common sensitive patterns on its side too.
It won't slow your app down. Instrumentation is asynchronous and batched - it runs off to the side of your app's hot path, not inside the user's request. Your customers won't wait on it, and it won't add latency to your responses.
It captures only what you instrument. A span carries the operation, its timing, the model, the prompt and response, token counts, cost, and any attributes you add - and nothing else. There's no scraping of your wider codebase and no capture of data you didn't choose to record.
Put simply: you decide what to record, you can redact before it ships, it travels securely under your own key, and it stays out of your users' way.
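"Asynchronous and batched" is easier to trust once you see the pattern. The sketch below is a simplified stand-in built on the standard library: the request path only enqueues a span, and a background worker groups spans into batches. The names, the batch size, and the `sent_batches` list (standing in for a network call) are all assumptions; production exporters typically also flush on a timer and cap the queue size.

```python
import queue
import threading

BATCH_SIZE = 3
_pending = queue.Queue()
sent_batches = []  # stands in for the network call in this sketch


def record(span):
    """Called on the request path: enqueue and return immediately."""
    _pending.put(span)


def _export_loop():
    """Background worker: collect spans into batches and 'send' them."""
    batch = []
    while True:
        item = _pending.get()
        if item is None:  # sentinel: flush what is left and stop
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            sent_batches.append(batch)
            batch = []
    if batch:
        sent_batches.append(batch)


worker = threading.Thread(target=_export_loop, daemon=True)
worker.start()

for i in range(7):
    record({"name": f"span-{i}"})

_pending.put(None)  # shut down: ask the worker to flush
worker.join()
```

The caller of `record` never waits on the export itself, which is why this style of instrumentation stays off the user's request path.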
What you actually get
Opening the black box pays off on two levels.
The floor: observability. Because AI is non-deterministic - the same question can produce different answers - you need to see why it behaved the way it did. With traces flowing into Kubit, you can replay any interaction step by step, find the slow span, see what the model was actually told, track cost per request, and catch quality problems before your customers report them. This is the classic value of instrumentation: turning "it's acting weird" into "here's exactly what happened."
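Tracking cost per request, for instance, comes down to arithmetic over the token counts a span already carries. In this sketch the price table is a hypothetical placeholder (check your provider's current price list), and `span_cost` is an illustrative helper rather than a Kubit function.

```python
# Hypothetical prices in USD per one million tokens; these are assumptions
# for the example, not quoted figures.
PRICE_PER_MILLION = {"gpt-4o": {"input": 2.50, "output": 10.00}}


def span_cost(attributes):
    """Estimate the cost of one model call from its token-usage attributes."""
    price = PRICE_PER_MILLION[attributes["gen_ai.request.model"]]
    return (
        attributes["gen_ai.usage.input_tokens"] * price["input"]
        + attributes["gen_ai.usage.output_tokens"] * price["output"]
    ) / 1_000_000


cost = span_cost({
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 1210,
    "gen_ai.usage.output_tokens": 95,
})
```

Under these assumed prices, the call with 1,210 input tokens and 95 output tokens comes to a little under half a cent.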
The ceiling: product analytics. This is where Kubit goes beyond debugging. Those same traces describe how real people use your AI product - so Kubit turns them into the product-analytics views you'd expect for any modern app: funnels, user journeys, retention, and segments, built on top of your AI interactions. For the shopping assistant, that means questions like: Of everyone who asked the assistant for a recommendation, how many got a useful answer? How many added an item to the cart? How many bought?
Shopping-assistant conversion funnel, built from traces:
Asked the assistant → Got a useful answer → Added to cart → Purchased
The same traces that help you debug also reveal where users drop off - and a retention view shows who comes back week over week.
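As a rough illustration of how a funnel like this could be derived from trace data, the sketch below counts how many users reach each stage in order. The stage names, the sample users, and the `funnel` helper are all made up for the example; Kubit builds these views for you.

```python
STAGES = ["asked_assistant", "useful_answer", "added_to_cart", "purchased"]

# Hypothetical input: the set of stages each user reached, derived from traces.
reached = {
    "u1": {"asked_assistant", "useful_answer", "added_to_cart", "purchased"},
    "u2": {"asked_assistant", "useful_answer"},
    "u3": {"asked_assistant"},
    "u4": {"asked_assistant", "useful_answer", "added_to_cart"},
}


def funnel(reached_by_user, stages):
    """Count users who completed every stage up to and including each one."""
    counts = []
    remaining = list(reached_by_user.values())
    for stage in stages:
        remaining = [s for s in remaining if stage in s]
        counts.append((stage, len(remaining)))
    return counts


counts = funnel(reached, STAGES)

# Conversion relative to the first stage (sample data has at least one user).
top = counts[0][1]
rates = [(stage, n / top) for stage, n in counts]
```

Each stage keeps only the users who also passed every earlier stage, so the counts can only shrink from one step to the next, and the drop between steps shows where users fall off.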
Debugging tells you why one interaction behaved a certain way. Product analytics tells you how thousands of them add up to a product that grows - or doesn't. Instrumentation is the one foundation that gives you both.
Get started
You're three steps from the clear view:
1. Create your Kubit workspace and grab your API key.
2. Add instrumentation using whichever path fits - auto-instrumentation is the fastest start.
3. Point it at Kubit and watch your first traces arrive.
When you're ready to wire it up, head to the OTel integration guide for the exact code and configuration. Your black box is about to become a clear view.