Masking sensitive data in Kubit OTel SDK
This cookbook shows how to redact sensitive content (PII, secrets, regulated data) from your traces before they leave your application. Masking is supported in both the Python and Node.js SDKs, and the public interface is the same in each:
Language | Package | Helper import |
|---|---|---|
Python | `kubit_otel` | `kubit_otel.mask` |
Node.js | `@kubit-ai/otel` | `@kubit-ai/otel/mask` |
1. What masking is — and what it is not
A mask is a function you write that runs on every span the SDK is about to export. It receives the span, you mutate it in place to remove or rewrite sensitive fields, and you return it.
Mask is… | Mask is not… |
|---|---|
A transform: rewrites fields on the span | A filter: to drop a span use `should_export_span` / `shouldExportSpan` |
Synchronous: no `await` | A place for I/O, network calls, or remote PII detection |
Opt-in: zero cost if you don't configure one | Applied automatically: you must pass it explicitly |
Local-only: runs in your process | A server-side scrubber: nothing in the cloud touches your payload |
Where it runs in the pipeline
```text
your code emits span
        │
        ▼
filter (drops noise spans)
        │
        ▼
mask (your function)   ◀── runs here, on the producer thread
        │
        ▼
batch queue
        │
        ▼
network → Kubit
```
Two important consequences:
The filter runs first, so your mask is never called on spans that are about to be dropped as noise.
The mask runs before the batch queue, so un-masked spans never sit in memory waiting to be flushed.
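The ordering above can be illustrated with a minimal, SDK-free sketch. Everything in it is a hypothetical stand-in (`FakeSpan`, `should_export`, the `healthcheck` naming rule, and the `on_end` function are not Kubit or OpenTelemetry APIs); it only shows that the filter decides first and that nothing reaches the queue un-masked.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an ended span; not an SDK type."""
    name: str
    attributes: dict = field(default_factory=dict)


def should_export(span: FakeSpan) -> bool:
    # Filter stage (assumed rule): drop noise spans before any masking work.
    return not span.name.startswith("healthcheck")


def mask(span: FakeSpan) -> FakeSpan:
    # Mask stage: rewrite sensitive fields in place, then return the span.
    if "gen_ai.prompt" in span.attributes:
        span.attributes["gen_ai.prompt"] = "[REDACTED]"
    return span


batch_queue: deque = deque()
mask_calls: list = []


def on_end(span: FakeSpan) -> None:
    if not should_export(span):
        return  # dropped by the filter: the mask never sees this span
    mask_calls.append(span.name)
    batch_queue.append(mask(span))  # only masked spans are ever queued


on_end(FakeSpan("healthcheck /ping", {"gen_ai.prompt": "secret"}))
on_end(FakeSpan("chat", {"gen_ai.prompt": "secret"}))
```

After these two calls, the mask has run once (for `chat`), and the single queued span carries the redacted prompt.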
2. Configure a mask
Python
```python
from kubit_otel import configure
from kubit_otel.mask import set_attr


def mask(span):
    set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span


configure(api_key="rg.v1.xxx", mask=mask)
```
You can also pass mask= to attach() (when another OTel-based library has already registered a provider) and to KubitSpanProcessor(...) (when you construct a TracerProvider yourself).
Node.js
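```typescript
import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
    return span;
  },
});
```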
You can also pass mask to new KubitSpanProcessor({ ... }) when you build your own NodeTracerProvider.
3. The helper API
Three helpers cover the surface most masks need. They encapsulate the OTel private-attribute access so your code stays portable across Kubit OTel SDK upgrades.
Helper | What it does |
|---|---|
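`set_attr` / `setAttr` | Overwrite or add an attribute. Works on a span or an event. |
`delete_attr` / `deleteAttr` | Remove an attribute. No-op if it isn't there. Works on a span or an event. |
`mask_events` / `maskEvents` | Iterate the span's events. Return the event to keep it, return `None` / `null` to drop it. |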
You do not need to reach into span._attributes (Python) or (span as any).attributes (Node) yourself — the helpers do this for you, correctly.
4. Recipes
Recipe A — Scrub a regex pattern from a prompt attribute
OTel GenAI v1 (and OpenLLMetry / Traceloop) put the prompt on the span as the gen_ai.prompt attribute.
Python
```python
import re

from kubit_otel import configure
from kubit_otel.mask import set_attr

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def mask(span):
    prompt = (span.attributes or {}).get("gen_ai.prompt")
    if isinstance(prompt, str):
        set_attr(span, "gen_ai.prompt", CARD_RE.sub("[REDACTED CC]", prompt))
    return span


configure(api_key="rg.v1.xxx", mask=mask)
```
Node
```typescript
import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const CARD_RE = /\b(?:\d[ -]*?){13,19}\b/g;

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    const attrs = span.attributes as Record<string, unknown>;
    const prompt = attrs["gen_ai.prompt"];
    if (typeof prompt === "string") {
      setAttr(span, "gen_ai.prompt", prompt.replace(CARD_RE, "[REDACTED CC]"));
    }
    return span;
  },
});
```
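One thing to be aware of: the pattern in this recipe matches any run of 13 to 19 digits, so it also redacts values such as millisecond timestamps. If that is too aggressive for your data, one common refinement is to redact only matches that pass a Luhn checksum. The sketch below uses only the standard library; `luhn_ok` and `scrub_cards` are hypothetical helper names, not part of the SDK.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub_cards(text: str) -> str:
    """Redact only the candidate matches that look like real card numbers."""
    def repl(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        return "[REDACTED CC]" if luhn_ok(digits) else match.group(0)

    return CARD_RE.sub(repl, text)


card_text = scrub_cards("pay with 4111 1111 1111 1111 now")
timestamp_text = scrub_cards("ts=1700000000000")
plain_regex_on_timestamp = CARD_RE.sub("[REDACTED CC]", "ts=1700000000000")
```

Inside a mask you would pass the result of `scrub_cards(prompt)` to `set_attr` instead of calling `CARD_RE.sub` directly. The trade-off is that a Luhn check lets through malformed or mistyped card numbers, so whether it is appropriate depends on how strict your redaction policy needs to be.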
Recipe B — Drop a whole class of events
OTel GenAI v2 puts user messages, assistant messages, and tool calls on the span as events (one event per message), not attributes. To drop an entire class — say, anything tagged gen_ai.user.message — use mask_events.
Python
```python
from kubit_otel import configure
from kubit_otel.mask import mask_events


def mask(span):
    mask_events(span, lambda e: None if e.name == "gen_ai.user.message" else e)
    return span


configure(api_key="rg.v1.xxx", mask=mask)
```
Node
```typescript
import { configure } from "@kubit-ai/otel";
import { maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => (e.name === "gen_ai.user.message" ? null : e));
    return span;
  },
});
```
Recipe C — Rewrite a field on each event
Keep the events, but redact a single attribute on each one.
Python
```python
from kubit_otel import configure
from kubit_otel.mask import set_attr, mask_events


def scrub_email(event):
    body = (event.attributes or {}).get("gen_ai.event.content")
    if isinstance(body, str) and "@" in body:
        set_attr(event, "gen_ai.event.content", "[REDACTED EMAIL]")
    return event


def mask(span):
    mask_events(span, scrub_email)
    return span


configure(api_key="rg.v1.xxx", mask=mask)
```
Node
```typescript
import { configure } from "@kubit-ai/otel";
import { setAttr, maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => {
      const body = (e.attributes ?? {})["gen_ai.event.content"];
      if (typeof body === "string" && body.includes("@")) {
        setAttr(e, "gen_ai.event.content", "[REDACTED EMAIL]");
      }
      return e;
    });
    return span;
  },
});
```
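The recipe above replaces the whole message body whenever it contains an `@`, which also hits things like @mentions. If you would rather keep the surrounding text and redact only the addresses, a regex substitution is one option. This is a sketch under that assumption: `scrub_emails` is a hypothetical helper, and the pattern is the same email expression used in the Collector example later in this cookbook.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def scrub_emails(body):
    """Replace each email address in a string; pass non-strings through untouched."""
    if not isinstance(body, str):
        return body
    return EMAIL_RE.sub("[REDACTED EMAIL]", body)


redacted = scrub_emails("mail jane.doe@example.com about @mentions")
untouched = scrub_emails(42)
```

In the event callback you would then call `set_attr(event, "gen_ai.event.content", scrub_emails(body))`. The pattern is deliberately simple and will not catch every valid address form, so treat it as a starting point.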
Recipe D — Remove a request/response body wholesale
Some auto-instrumentation stamps full HTTP request and response bodies on the span. If those are off-limits, delete them.
Python
```python
from kubit_otel.mask import delete_attr


def mask(span):
    delete_attr(span, "http.request.body")
    delete_attr(span, "http.response.body")
    return span
```
Node
```typescript
import { deleteAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  deleteAttr(span, "http.request.body");
  deleteAttr(span, "http.response.body");
  return span;
};
```
Recipe E — Allow-list by span name or framework
A mask runs on every span the SDK ships, not just your top-level LLM calls. Child spans, tool-call spans, framework coordination spans, retries - they all flow through it. If you only know the shape of your top-level calls, narrow your work by name or by instrumentation scope.
Python
```python
from kubit_otel.mask import set_attr


def mask(span):
    scope = getattr(span.instrumentation_scope, "name", "") or ""
    if scope.startswith("opentelemetry.instrumentation.openai"):
        # only scrub for spans coming from this instrumentor
        set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span
```
Node
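```typescript
import { setAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  const scope = span.instrumentationScope?.name ?? "";
  if (scope.startsWith("@traceloop/")) {
    // only scrub spans coming from this instrumentation scope
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
  }
  return span;
};
```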
5. What happens when a mask fails
The mask is the only thing standing between sensitive data and the network, so the SDK is fail-closed: if your function raises an exception or returns None / null, the original (un-masked) span never ships.
Instead, a tombstone ships in its place:
The trace structure stays intact: trace ID, span ID, parent, start/end time, kind, name, and resource are all preserved - so child spans still attach to a real parent and the trace tree isn't broken.
All payload-bearing fields are wiped: attributes, events, links, status description.
`status` is forced to `ERROR`.
A single attribute, `kubit.sdk.mask_error`, names the cause: the exception class name (e.g. `"AttributeError"`, `"TypeError"`) when the mask raises, or the literal `"returned_none"` (Python) or `"returned_null"` (Node) when it returns null.
The full exception (with traceback / stack) is logged at error level to the SDK's internal logger (`kubit_otel` in Python, the shared logger in Node). Set KUBIT_OTEL_LOG_LEVEL=error (or a more verbose level) to see it.
The practical takeaway: you can find the bug because the failure is logged and is visible in your trace tree, but you can't accidentally leak the data the mask was supposed to redact.
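The fail-closed behaviour described in this section can be sketched in a few lines. This is an illustration of the semantics only, not the SDK's implementation: `FakeSpan`, `tombstone`, and `apply_mask` are hypothetical names, and the stand-in span carries far fewer fields than a real one.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a span; a real span has many more fields."""
    name: str
    trace_id: int
    span_id: int
    status: str = "OK"
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def tombstone(span: FakeSpan, cause: str) -> FakeSpan:
    # Keep the identity fields so the trace tree stays intact; wipe the payload.
    return FakeSpan(
        name=span.name,
        trace_id=span.trace_id,
        span_id=span.span_id,
        status="ERROR",
        attributes={"kubit.sdk.mask_error": cause},
    )


def apply_mask(span: FakeSpan, mask) -> FakeSpan:
    try:
        result = mask(span)
    except Exception as exc:
        return tombstone(span, type(exc).__name__)  # the mask raised
    if result is None:
        return tombstone(span, "returned_none")  # the mask forgot to return
    return result


original = FakeSpan("chat", trace_id=1, span_id=2, attributes={"gen_ai.prompt": "secret"})

raised = apply_mask(original, lambda s: s.attributes["missing"].upper())
returned_nothing = apply_mask(original, lambda s: None)
```

In both failure cases the span that would ship keeps its name and IDs, has status `ERROR`, and carries only the `kubit.sdk.mask_error` attribute, so the original prompt is never exported.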
6. Authoring guidelines
A few rules that make masks safer and more maintainable.
Do
Keep the function pure and fast. It runs on the producer thread, on every exported span.
Be defensive about types. `span.attributes` and `event.attributes` can be `None` / `undefined`, and values can be of any OTel-allowed type (string, number, bool, or homogeneous array).
Author for every span shape, or scope your logic with an allow-list (see Recipe E). Don't assume "my LLM spans": child tool calls, retries, and framework coordination spans all flow through the same mask.
Prefer `delete_attr` / `deleteAttr` over `set_attr(..., "")`. An empty string still ships and still uses bytes.
Unit-test your mask. The helpers work on plain `ReadableSpan` / `TimedEvent` instances you fabricate in a test, so you don't have to spin up the whole pipeline.
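As an example of the "be defensive about types" rule, here is a minimal sketch of a scrubber that tolerates missing attribute maps and every value shape listed above. The names `scrub_value` and `scrubbed_attributes` are hypothetical, the `sk-...` token pattern is an assumed secret format chosen for illustration, and the span is a plain `SimpleNamespace` stand-in rather than a real `ReadableSpan`.

```python
import re
from types import SimpleNamespace

SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")  # assumed token shape, for illustration


def scrub_value(value):
    """Scrub strings and arrays of strings; leave other value types alone."""
    if isinstance(value, str):
        return SECRET_RE.sub("[REDACTED]", value)
    if isinstance(value, (list, tuple)):
        return type(value)(scrub_value(v) for v in value)
    return value  # numbers, bools, None


def scrubbed_attributes(span) -> dict:
    """Return a scrubbed copy of the attributes, tolerating a missing map."""
    attrs = getattr(span, "attributes", None) or {}
    return {key: scrub_value(val) for key, val in attrs.items()}


span = SimpleNamespace(
    attributes={
        "note": "key sk-abcdefgh1234",
        "args": ["x", "sk-ABCDEFGH"],
        "tokens": 3,
        "cached": True,
    }
)
clean = scrubbed_attributes(span)
empty = scrubbed_attributes(SimpleNamespace(attributes=None))
```

In a real mask you would write each changed value back with `set_attr` rather than building a new dictionary; the point here is only the type handling.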
Don't
Don't do I/O or network calls. Don't call a remote PII detector. Don't await anything — the contract is synchronous and Promises / coroutines will not be awaited.
Don't return `None` / `null` to mean "drop the span." That's treated as a programming error and produces a tombstone. To drop a span use `should_export_span` / `shouldExportSpan` instead.
Don't reach into private attributes (`span._attributes`, `event._attributes`, `(span as any).attributes`) yourself. The helpers handle this and stay compatible across OTel SDK upgrades.
Don't try to rewrite `kubit.sdk.name` or `kubit.sdk.version`; they are re-stamped after your mask runs.
7. Where masking does and does not apply
Entry point | Inherits masking? |
|---|---|
`configure()` | ✅ Yes |
`attach()` | ✅ Yes |
`KubitSpanProcessor` | ✅ Yes |
Bare exporter | ❌ No: apply the helpers in your processor |
Masking lives on the processor so that dropped spans never enter the batch queue. If you've built your own SpanProcessor around the raw exporter, you're responsible for calling the mask helpers in your onEnd / on_end implementation.
8. Verifying it works
The simplest end-to-end check: configure the SDK against an in-memory exporter, end a span, and assert on what came out.
Python
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from kubit_otel import KubitSpanProcessor
from kubit_otel.mask import set_attr

memory = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(memory))
provider.add_span_processor(
    KubitSpanProcessor(
        api_key="rg.v1.xxx",
        # set_attr mutates the span; the tuple trick returns the span itself
        mask=lambda s: (set_attr(s, "gen_ai.prompt", "[REDACTED]"), s)[1],
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("test")
with tracer.start_as_current_span("chat") as span:
    span.set_attribute("gen_ai.prompt", "secret")

assert memory.get_finished_spans()[0].attributes["gen_ai.prompt"] == "[REDACTED]"
```
Node
```typescript
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { InMemorySpanExporter, SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { trace } from "@opentelemetry/api";
import { KubitSpanProcessor } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const memory = new InMemorySpanExporter();
const provider = new NodeTracerProvider({
  spanProcessors: [
    new SimpleSpanProcessor(memory),
    new KubitSpanProcessor({
      apiKey: "rg.v1.xxx",
      mask: (s) => {
        setAttr(s, "gen_ai.prompt", "[REDACTED]");
        return s;
      },
    }),
  ],
});
provider.register();

const tracer = trace.getTracer("test");
tracer.startActiveSpan("chat", (span) => {
  span.setAttribute("gen_ai.prompt", "secret");
  span.end();
});

console.log(memory.getFinishedSpans()[0].attributes["gen_ai.prompt"]);
// → "[REDACTED]"
```
9. Not using our SDKs?
If you ship spans via stock OpenTelemetry (Java, Kotlin, Go, Ruby, or just the bare OTLP exporter in Python/Node), you have three good options. Pick based on where you want the redaction to run.
Where it runs | How | Best for |
|---|---|---|
In your app, in code | Wrap the OTLP exporter in a custom SpanProcessor | Languages without our SDK; teams that want redaction next to business code |
In an OTel Collector (sidecar / gateway) | redaction processor | Standard PII (cards, tokens, emails) managed as YAML, not code |
In an OTel Collector | transform processor (OTTL) | Surgical rewrites the redaction processor can't express |
Trade-off: the Collector options are language-agnostic and keep app code clean, but spans leave your process before being scrubbed - so the Collector must run inside your trust boundary. If the data must never cross your process boundary, do the redaction in-app (or use one of our SDKs).
Custom SpanProcessor (any language)
The pattern is the same in every SDK: wrap a BatchSpanProcessor, scrub the span as it ends, and fail closed if your scrub raises. Python sketch:
```python
import re

from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.sdk.trace.export import BatchSpanProcessor

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


class MaskingSpanProcessor(SpanProcessor):
    """Scrubs spans, then forwards them to the wrapped batch processor."""

    def __init__(self, inner: BatchSpanProcessor):
        self._inner = inner

    def on_start(self, span, parent_context=None):
        self._inner.on_start(span, parent_context)

    def on_end(self, span):
        try:
            prompt = (span.attributes or {}).get("gen_ai.prompt")
            if isinstance(prompt, str):
                # span.attributes is a read-only view, so write to the backing store
                span._attributes["gen_ai.prompt"] = CARD_RE.sub("[REDACTED CC]", prompt)
        except Exception:
            return  # fail closed: drop rather than leak
        self._inner.on_end(span)

    def shutdown(self):
        self._inner.shutdown()

    def force_flush(self, timeout_millis=30000):
        return self._inner.force_flush(timeout_millis)
```
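Java equivalent: implement onEnding(ReadWriteSpan); at the time of writing this hook lives on the experimental ExtendedSpanProcessor interface rather than on SpanProcessor itself. Go equivalent: OnEnd receives a ReadOnlySpan, so instead of mutating in place, wrap sdktrace.SpanProcessor and hand the inner processor a wrapper span whose Attributes() returns the scrubbed values. Two rules to keep from this SDK's design: mask before the batch queue, and fail closed when the scrub raises.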
OTel Collector — redaction processor
```yaml
# Fragment: receivers and the batch processor definition are omitted.
processors:
  redaction:
    allow_all_keys: true      # keep keys; only scrub values
    redact_all_types: true    # check numeric attrs too
    blocked_key_patterns: [".*token.*", ".*api_key.*", ".*password.*"]
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"   # Visa
      - "(5[1-5][0-9]{14})"         # MasterCard
      - "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
    summary: info

exporters:
  otlphttp/kubit:
    traces_endpoint: https://otel.kubit.ai/v1/traces
    headers:
      # block style: "${...}" is not a valid plain scalar inside a { } flow mapping
      x-api-key: ${env:KUBIT_API_KEY}

service:
  pipelines:
    traces:
      processors: [redaction, batch]
      exporters: [otlphttp/kubit]
```
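allowed_values takes precedence over blocked_values for carve-outs. hash_function swaps the fixed mask for a deterministic hash if you still want to group by the value downstream; the supported values are md5, sha1, and sha3 in recent Collector releases, so confirm the list against the processor README for the version you run.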
OTel Collector — transform processor (OTTL)
For rewrites the redaction processor can't express:
```yaml
processors:
  transform/scrub:
    error_mode: ignore
    trace_statements:
      - delete_key(span.attributes, "http.request.body")
      - delete_key(span.attributes, "http.response.body")
      - replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
      - truncate_all(span.attributes, 4096)
      - set(span.attributes["gen_ai.prompt"], "[REDACTED]") where instrumentation_scope.name == "my-internal-llm"
```
redaction and transform compose — run transform first for surgical edits, then redaction for pattern-based scrubbing on whatever survives.
Or: don't put it on the span
The simplest option if your code is the source of the sensitive value: redact before calling setAttribute(...). No new infrastructure, and it defends against bugs in both the mask and the exporter.
10. FAQ
Can I write an async mask?
No. OpenTelemetry's onEnd / on_end contract is synchronous in both SDKs. Promises / coroutines are not awaited and passing one is undefined behaviour.
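The reason an async mask cannot work is visible in plain Python: calling an `async def` function does not run its body, it only creates a coroutine object. A synchronous caller that never awaits it gets back that object instead of a span, and the redaction never happens. The sketch below uses a dictionary as a hypothetical stand-in for a span.

```python
import inspect


async def async_mask(span):
    # This body only runs if something awaits the coroutine.
    span["gen_ai.prompt"] = "[REDACTED]"
    return span


span = {"gen_ai.prompt": "secret"}

result = async_mask(span)  # a synchronous caller would stop here
is_coroutine = inspect.iscoroutine(result)
prompt_after_call = span["gen_ai.prompt"]

result.close()  # tidy up so Python does not warn about a never-awaited coroutine
```

Here `is_coroutine` is `True` and `prompt_after_call` is still `"secret"`, which is exactly the outcome a mask must never produce.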
Can I drop a span from the mask?
No — that's a filter, not a transform. Use should_export_span / shouldExportSpan to drop spans, and use the mask only to rewrite or delete fields on spans that are otherwise kept.
Does the mask see resource attributes or span links?
Not in this release. Neither typically carries PII; if you need them, open a feature request.
Does Kubit scrub anything server-side?
Yes — a small, always-on baseline runs on every ingested record as a safety net for a few high-confidence patterns:
credit-card numbers (Luhn-validated)
US Social Security numbers
well-known API keys and access tokens from common providers (cloud, source-control, chat, LLM vendors, …)
Scope is limited to message-body and tool-call payload fields — the free-text content of prompts, completions, tool arguments, and tool responses. Span identifiers, names, attributes, and other metadata are deliberately left alone so trace structure stays intact.
Treat the baseline as a backstop, not a substitute. The SDK-side mask is still the right place for anything that:
you do not want crossing your process boundary in the first place,
is specific to your domain (internal IDs, customer names, addresses), or
lives on a span attribute rather than a message body.
Does masking add latency to my LLM calls?
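Effectively no. The mask runs synchronously when a span ends, on the thread that ended it (the producer thread), after the instrumented operation has already finished. It does not delay the LLM call itself; the only cost it adds is whatever CPU your function uses, which is why the mask should stay fast and free of I/O.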