
Masking sensitive data in Kubit OTel SDK

This cookbook shows how to mask sensitive content — PII, secrets, regulated data — from your traces before they leave your application. Masking is supported in both SDKs and the public interface is identical:

| Language | Package | Helper import |
|---|---|---|
| Python | kubit-otel (≥ 0.7.3) | from kubit_otel.mask import set_attr, delete_attr, mask_events |
| Node.js | @kubit-ai/otel (≥ 0.7.3) | import { setAttr, deleteAttr, maskEvents } from "@kubit-ai/otel/mask" |

1. What masking is — and what it is not

A mask is a function you write that runs on every span the SDK is about to export. It receives the span, you mutate it in place to remove or rewrite sensitive fields, and you return it.

| Mask is… | Mask is not… |
|---|---|
| A transform: rewrites fields on the span | A filter: to drop a span, use should_export_span / shouldExportSpan |
| Synchronous: no await, no Promises | A place for I/O, network calls, or remote PII detection |
| Opt-in: zero cost if you don't configure one | Applied automatically: you must pass it explicitly |
| Local-only: runs in your process | A server-side scrubber: the mask runs entirely in your process, before export |

Where it runs in the pipeline

your code emits span
        │
        ▼
filter (drops noise spans)
        │
        ▼
mask (your function)   ◀── runs here, on the producer thread
        │
        ▼
batch queue
        │
        ▼
network → Kubit

Two important consequences:

  • The filter runs first, so your mask is never called on spans that are about to be dropped as noise.

  • The mask runs before the batch queue, so un-masked spans never sit in memory waiting to be flushed.
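The ordering above can be sketched in plain Python. This is only an illustration of the filter, then mask, then queue sequence: the Span class and the process function are hypothetical stand-ins, not SDK types or the SDK's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an ended span (not an SDK type)."""
    name: str
    attributes: dict = field(default_factory=dict)


def process(spans, should_export: Callable[[Span], bool],
            mask: Callable[[Span], Optional[Span]]) -> list:
    """Filter first, then mask, then enqueue. Mirrors the ordering only."""
    queue = []
    for span in spans:
        if not should_export(span):
            continue              # dropped as noise: the mask is never called
        queue.append(mask(span))  # only masked spans ever reach the queue
    return queue


seen_by_mask = []


def mask(span: Span) -> Span:
    seen_by_mask.append(span.name)
    span.attributes["gen_ai.prompt"] = "[REDACTED]"
    return span


queue = process(
    [Span("healthcheck", {"gen_ai.prompt": "ping"}),
     Span("chat", {"gen_ai.prompt": "my card is 4111..."})],
    should_export=lambda s: s.name != "healthcheck",
    mask=mask,
)
```

In this sketch the mask only ever sees the "chat" span, and the queue holds nothing but already-masked spans.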

2. Configure a mask

Python

from kubit_otel import configure
from kubit_otel.mask import set_attr

def mask(span):
    set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span

configure(api_key="rg.v1.xxx", mask=mask)

You can also pass mask= to attach() (when another OTel-based library has already registered a provider) and to KubitSpanProcessor(...) (when you construct a TracerProvider yourself).

Node.js

import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
    return span;
  },
});

You can also pass mask to new KubitSpanProcessor({ ... }) when you build your own NodeTracerProvider.

3. The helper API

Three helpers cover the surface most masks need. They encapsulate the OTel private-attribute access so your code stays portable across Kubit OTel SDK upgrades.

| Helper | What it does |
|---|---|
| set_attr / setAttr | Overwrite or add an attribute. Works on a span or an event. |
| delete_attr / deleteAttr | Remove an attribute. No-op if it isn't there. Works on a span or an event. |
| mask_events / maskEvents | Iterate the span's events. Return the event to keep it, return None / null to drop it. |

You do not need to reach into span._attributes (Python) or (span as any).attributes (Node) yourself — the helpers do this for you, correctly.

4. Recipes

Recipe A — Scrub a regex pattern from a prompt attribute

OTel GenAI v1 (and OpenLLMetry / Traceloop) put the prompt on the span as the gen_ai.prompt attribute.

Python

import re
from kubit_otel import configure
from kubit_otel.mask import set_attr

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

def mask(span):
    prompt = (span.attributes or {}).get("gen_ai.prompt")
    if isinstance(prompt, str):
        set_attr(span, "gen_ai.prompt", CARD_RE.sub("[REDACTED CC]", prompt))
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node

import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const CARD_RE = /\b(?:\d[ -]*?){13,19}\b/g;

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    const attrs = span.attributes as Record<string, unknown>;
    const prompt = attrs["gen_ai.prompt"];
    if (typeof prompt === "string") {
      setAttr(span, "gen_ai.prompt", prompt.replace(CARD_RE, "[REDACTED CC]"));
    }
    return span;
  },
});
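The pattern in Recipe A matches any run of 13 to 19 digits, so order numbers or timestamps of that length would be redacted too. One possible refinement, sketched below, is to redact only the runs that pass the Luhn checksum. The luhn_ok and redact_cards names are hypothetical helpers, not part of the SDK, and the trade-off is that a sensitive number that fails the checksum is left in place.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")  # same pattern as Recipe A


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace only the digit runs that pass the Luhn check."""
    def repl(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))  # strip spaces and dashes
        return "[REDACTED CC]" if luhn_ok(digits) else match.group(0)
    return CARD_RE.sub(repl, text)
```

Inside a mask, the result of redact_cards(prompt) would be passed to set_attr in place of the plain CARD_RE.sub call.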

Recipe B — Drop a whole class of events

OTel GenAI v2 puts user messages, assistant messages, and tool calls on the span as events (one event per message), not attributes. To drop an entire class — say, anything tagged gen_ai.user.message — use mask_events.

Python

from kubit_otel import configure
from kubit_otel.mask import mask_events

def mask(span):
    mask_events(span, lambda e: None if e.name == "gen_ai.user.message" else e)
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node

import { configure } from "@kubit-ai/otel";
import { maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => (e.name === "gen_ai.user.message" ? null : e));
    return span;
  },
});

Recipe C — Rewrite a field on each event

Keep the events, but redact a single attribute on each one.

Python

from kubit_otel import configure
from kubit_otel.mask import set_attr, mask_events

def scrub_email(event):
    body = (event.attributes or {}).get("gen_ai.event.content")
    if isinstance(body, str) and "@" in body:
        set_attr(event, "gen_ai.event.content", "[REDACTED EMAIL]")
    return event

def mask(span):
    mask_events(span, scrub_email)
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node

import { configure } from "@kubit-ai/otel";
import { setAttr, maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => {
      const body = (e.attributes ?? {})["gen_ai.event.content"];
      if (typeof body === "string" && body.includes("@")) {
        setAttr(e, "gen_ai.event.content", "[REDACTED EMAIL]");
      }
      return e;
    });
    return span;
  },
});
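Recipe C replaces the whole message body whenever it contains an "@". If you would rather keep the surrounding text, a narrower variant is to replace only the address itself. The sketch below assumes a pragmatic email pattern (not a full RFC 5322 parser); scrub_emails is a hypothetical helper, not an SDK function.

```python
import re

# Assumed pattern: good enough for common addresses, not a complete email grammar.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def scrub_emails(body):
    """Redact each address in a string; pass non-strings through untouched."""
    if not isinstance(body, str):
        return body
    return EMAIL_RE.sub("[REDACTED EMAIL]", body)
```

In the event callback, the scrubbed string would then be written back with set_attr(event, "gen_ai.event.content", scrub_emails(body)).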

Recipe D — Remove a request/response body wholesale

Some auto-instrumentation stamps full HTTP request and response bodies on the span. If those are off-limits, delete them.

Python

from kubit_otel.mask import delete_attr

def mask(span):
    delete_attr(span, "http.request.body")
    delete_attr(span, "http.response.body")
    return span

Node

import { deleteAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  deleteAttr(span, "http.request.body");
  deleteAttr(span, "http.response.body");
  return span;
};

Recipe E — Allow-list by span name or framework

A mask runs on every span the SDK ships, not just your top-level LLM calls. Child spans, tool-call spans, framework coordination spans, and retries all flow through it. If you only know the shape of your top-level calls, narrow your work by name or by instrumentation scope.

Python

from kubit_otel.mask import set_attr

def mask(span):
    scope = getattr(span.instrumentation_scope, "name", "") or ""
    if scope.startswith("opentelemetry.instrumentation.openai"):
        # only scrub for spans coming from this instrumentor
        set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span

Node

import { setAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  const scope = span.instrumentationScope?.name ?? "";
  if (scope.startsWith("@traceloop/")) {
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
  }
  return span;
};
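When more than one instrumentor needs its own rule, the if-chain can grow into a small lookup table. The sketch below shows only the dispatch step, using plain dicts as a stand-in for span attributes. The SCRUBBERS table, the scrubber functions, and the second scope prefix are hypothetical; in a real mask the scrubbers would call set_attr / delete_attr on the span instead of mutating a dict.

```python
def redact_prompt(attrs: dict) -> None:
    """Stand-in scrubber: overwrite the prompt if present."""
    if "gen_ai.prompt" in attrs:
        attrs["gen_ai.prompt"] = "[REDACTED]"


def drop_http_bodies(attrs: dict) -> None:
    """Stand-in scrubber: remove HTTP bodies if present."""
    attrs.pop("http.request.body", None)
    attrs.pop("http.response.body", None)


# Hypothetical allow-list: instrumentation-scope prefix -> scrubber.
SCRUBBERS = {
    "opentelemetry.instrumentation.openai": redact_prompt,
    "opentelemetry.instrumentation.requests": drop_http_bodies,
}


def scrubber_for(scope_name: str):
    """Return the scrubber for the first matching prefix, or None."""
    for prefix, scrub in SCRUBBERS.items():
        if scope_name.startswith(prefix):
            return scrub
    return None  # unknown scope: leave the span alone


attrs = {"gen_ai.prompt": "secret", "http.request.body": "{...}"}
scrub = scrubber_for("opentelemetry.instrumentation.openai.v1")
if scrub is not None:
    scrub(attrs)
```

Spans from scopes that are not in the table pass through untouched, which is the allow-list behaviour this recipe describes.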

5. What happens when a mask fails

The mask is the only thing standing between sensitive data and the network, so the SDK is fail-closed: if your function raises an exception or returns None / null, the original (un-masked) span never ships.

Instead, a tombstone ships in its place:

  • The trace structure stays intact: trace ID, span ID, parent, start/end time, kind, name, and resource are all preserved, so child spans still attach to a real parent and the trace tree isn't broken.

  • All payload-bearing fields are wiped: attributes, events, links, and status description.

  • status is forced to ERROR.

  • A single attribute, kubit.sdk.mask_error, names the cause:

    • the exception class name (e.g. "AttributeError", "TypeError") for a raise

    • the literal "returned_none" (Python) or "returned_null" (Node) for a null return

The full exception (with traceback / stack) is logged at error level to the SDK's internal logger (kubit_otel in Python, the shared logger in Node). Set KUBIT_OTEL_LOG_LEVEL=error (or a more verbose level) to see it.

The practical takeaway: you can find the bug because the failure is logged and is visible in your trace tree, but you can't accidentally leak the data the mask was supposed to redact.
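The fail-closed behaviour described in this section can be modelled in a few lines. This is a sketch of the idea only, not the SDK's code: the Span dataclass, tombstone, and apply_mask are hypothetical names, and the stand-in span carries fewer fields than a real one.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an ended span (not an SDK type)."""
    name: str
    trace_id: int
    span_id: int
    status: str = "OK"
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def tombstone(span: Span, cause: str) -> Span:
    """Keep identity fields, wipe the payload, record why the mask failed."""
    return replace(span, status="ERROR", events=[],
                   attributes={"kubit.sdk.mask_error": cause})


def apply_mask(span: Span, mask: Callable[[Span], Optional[Span]]) -> Span:
    """Run the mask; on a raise or a None return, ship a tombstone instead."""
    try:
        result = mask(span)
    except Exception as exc:
        return tombstone(span, type(exc).__name__)
    if result is None:
        return tombstone(span, "returned_none")
    return result


def buggy_mask(span: Span) -> Span:
    # Bug: assumes the prompt is a string; an int has no .strip()
    span.attributes["gen_ai.prompt"] = span.attributes["gen_ai.prompt"].strip()
    return span


original = Span("chat", trace_id=1, span_id=2, attributes={"gen_ai.prompt": 123})
shipped = apply_mask(original, buggy_mask)
```

Here shipped keeps the name and IDs of the original span, but its only attribute is kubit.sdk.mask_error with the value "AttributeError".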

6. Authoring guidelines

A few rules that make masks safer and more maintainable.

Do

  • Keep the function pure and fast. It runs on the producer thread, on every exported span.

  • Be defensive about types. span.attributes and event.attributes can be None / undefined, and values can be of any OTel-allowed type (string, number, bool, or homogeneous array).

  • Author for every span shape, or scope your logic with an allow-list (see Recipe E). Don't assume "my LLM spans" — child tool calls, retries, and framework coordination spans all flow through the same mask.

  • Prefer delete_attr / deleteAttr over set_attr(..., ""). An empty string still ships and still uses bytes.

  • Unit-test your mask. The helpers work on plain ReadableSpan / TimedEvent instances you fabricate in a test — you don't have to spin up the whole pipeline.
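As an illustration of the "be defensive about types" rule, a scrubber can accept every OTel-allowed value shape and only touch the strings. The sketch below uses an assumed US SSN pattern as the example; scrub_value is a hypothetical helper, not an SDK function.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # assumed example pattern


def scrub_value(value, pattern=SSN_RE, token="[REDACTED]"):
    """Scrub strings and sequences of strings; leave other types untouched."""
    if isinstance(value, str):
        return pattern.sub(token, value)
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type, scrubbing only the string items.
        return type(value)(
            pattern.sub(token, v) if isinstance(v, str) else v for v in value
        )
    return value  # numbers, bools, None: nothing to scrub
```

Because the function never assumes a string, a numeric or missing attribute cannot make the mask raise and turn the span into a tombstone.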

Don't

  • Don't do I/O or network calls. Don't call a remote PII detector. Don't await anything — the contract is synchronous and Promises / coroutines will not be awaited.

  • Don't return None / null to mean "drop the span." That's treated as a programming error and produces a tombstone. To drop a span use should_export_span / shouldExportSpan instead.

  • Don't reach into private attributes (span._attributes, event._attributes, (span as any).attributes) yourself. The helpers handle this and stay compatible across OTel SDK upgrades.

  • Don't try to rewrite kubit.sdk.name or kubit.sdk.version — they are re-stamped after your mask runs.
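Because an async mask is never awaited, it can be worth rejecting one before it is ever configured. A minimal guard, assuming you wrap your mask yourself before passing it to the SDK (require_sync is a hypothetical helper, not an SDK function):

```python
import inspect


def require_sync(mask):
    """Reject coroutine functions up front instead of failing on every span."""
    if inspect.iscoroutinefunction(mask):
        raise TypeError("mask must be a plain function, not 'async def'")
    return mask


def sync_mask(span):
    return span


async def async_mask(span):
    return span


checked = require_sync(sync_mask)   # returned unchanged
try:
    require_sync(async_mask)
    rejected = False
except TypeError:
    rejected = True                 # the async version is refused
```

The guard only catches functions declared with async def; a plain function that returns a coroutine or Promise-like object would still slip through.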

7. Where masking does and does not apply

| Entry point | Inherits masking? |
|---|---|
| configure(..., mask=...) | ✅ Yes |
| attach(..., mask=...) (Python only) | ✅ Yes |
| KubitSpanProcessor(..., mask=...) | ✅ Yes |
| Bare KubitExporter wrapped in your own SpanProcessor | ❌ No: apply the helpers in your processor |

Masking lives on the processor so that un-masked spans never enter the batch queue. If you've built your own SpanProcessor around the raw exporter, you're responsible for calling the mask helpers in your onEnd / on_end implementation.

8. Verifying it works

The simplest end-to-end check: configure the SDK against an in-memory exporter, end a span, and assert on what came out.

Python

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from kubit_otel import KubitSpanProcessor
from kubit_otel.mask import set_attr

memory = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(memory))
provider.add_span_processor(
    KubitSpanProcessor(
        api_key="rg.v1.xxx",
        mask=lambda s: (set_attr(s, "gen_ai.prompt", "[REDACTED]"), s)[1],
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("test")
with tracer.start_as_current_span("chat") as span:
    span.set_attribute("gen_ai.prompt", "secret")

assert memory.get_finished_spans()[0].attributes["gen_ai.prompt"] == "[REDACTED]"

Node

import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { InMemorySpanExporter, SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { trace } from "@opentelemetry/api";
import { KubitSpanProcessor } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const memory = new InMemorySpanExporter();
const provider = new NodeTracerProvider({
  spanProcessors: [
    new SimpleSpanProcessor(memory),
    new KubitSpanProcessor({
      apiKey: "rg.v1.xxx",
      mask: (s) => {
        setAttr(s, "gen_ai.prompt", "[REDACTED]");
        return s;
      },
    }),
  ],
});
provider.register();

const tracer = trace.getTracer("test");
tracer.startActiveSpan("chat", (span) => {
  span.setAttribute("gen_ai.prompt", "secret");
  span.end();
});

console.log(memory.getFinishedSpans()[0].attributes["gen_ai.prompt"]);
// → "[REDACTED]"

9. Not using our SDKs?

If you ship spans via stock OpenTelemetry (Java, Kotlin, Go, Ruby, or just the bare OTLP exporter in Python/Node), you have three good options. Pick based on where you want the redaction to run.

| Where it runs | How | Best for |
|---|---|---|
| In your app, in code | Wrap the OTLP exporter in a custom SpanProcessor whose onEnd mutates the span before delegating to a BatchSpanProcessor | Languages without our SDK; teams that want redaction next to business code |
| In an OTel Collector (sidecar / gateway) | redaction processor: config-only, pattern-driven | Standard PII (cards, tokens, emails) managed as YAML, not code |
| In an OTel Collector | transform processor with OTTL | Surgical rewrites: delete_key, replace_pattern, truncate_all, conditional where |

Trade-off: the Collector options are language-agnostic and keep app code clean, but spans leave your process before being scrubbed, so the Collector must run inside your trust boundary. If the data must never cross your process boundary, do the redaction in-app (or use one of our SDKs).

Custom SpanProcessor (any language)

The pattern is identical across SDKs — wrap a BatchSpanProcessor, mutate the span in onEnd, fail closed if your scrub raises. Python sketch:

import re
from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.sdk.trace.export import BatchSpanProcessor

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

class MaskingSpanProcessor(SpanProcessor):
    def __init__(self, inner: BatchSpanProcessor):
        self._inner = inner

    def on_start(self, span, parent_context=None):
        self._inner.on_start(span, parent_context)

    def on_end(self, span):
        try:
            prompt = (span.attributes or {}).get("gen_ai.prompt")
            if isinstance(prompt, str):
                # span.attributes is a read-only view, so write through the private mapping
                span._attributes["gen_ai.prompt"] = CARD_RE.sub("[REDACTED CC]", prompt)
        except Exception:
            return  # fail closed: drop rather than leak
        self._inner.on_end(span)

    def shutdown(self):
        self._inner.shutdown()

    def force_flush(self, timeout_millis=30000):
        # same parameter name as the base class, so keyword callers keep working
        return self._inner.force_flush(timeout_millis)

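Java equivalent: implement onEnding(ReadWriteSpan), which is defined on the experimental ExtendedSpanProcessor interface rather than on SpanProcessor itself; plain onEnd only receives a read-only span. Go equivalent: OnEnd receives a ReadOnlySpan, so wrap the sdktrace.SpanProcessor (or the exporter) and pass a wrapper that returns redacted attributes downstream. Two rules to keep from this SDK's design: mask before the batch queue, and fail closed when the scrub raises.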

OTel Collector — redaction processor

receivers:
  otlp:                      # assumed: your apps send OTLP to this Collector
    protocols:
      grpc:
      http:

processors:
  batch:
  redaction:
    allow_all_keys: true     # keep keys; only scrub values
    redact_all_types: true   # check numeric attrs too
    blocked_key_patterns: [".*token.*", ".*api_key.*", ".*password.*"]
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"   # Visa
      - "(5[1-5][0-9]{14})"         # MasterCard
      - "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"   # email
    summary: info

exporters:
  otlphttp/kubit:
    traces_endpoint: https://otel.kubit.ai/v1/traces
    headers:
      x-api-key: ${env:KUBIT_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [redaction, batch]
      exporters: [otlphttp/kubit]

allowed_values takes precedence over blocked_values for carve-outs. hash_function: md5|sha1|sha256 swaps the fixed mask for a deterministic hash if you still want to group by the value downstream.
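The same trade-off, a stable token instead of a fixed mask, can also be made inside an in-app mask. The sketch below uses a keyed hash (HMAC-SHA256) so that low-entropy values are harder to guess than with a bare hash. The pseudonymize name, the "h:" prefix, and the 16-character truncation are illustrative assumptions, not SDK or Collector behaviour.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Deterministic keyed hash: the same input always yields the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"h:{digest[:16]}"  # truncated for readability; assumption, tune as needed


token_a = pseudonymize("customer-42", b"rotate-me")
token_b = pseudonymize("customer-42", b"rotate-me")
token_c = pseudonymize("customer-43", b"rotate-me")
```

Equal inputs map to equal tokens, so grouping by the value downstream still works, while the raw value never leaves the process.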

OTel Collector — transform processor (OTTL)

For rewrites the redaction processor can't express:

processors:
  transform/scrub:
    error_mode: ignore
    trace_statements:
      - delete_key(span.attributes, "http.request.body")
      - delete_key(span.attributes, "http.response.body")
      - replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
      - truncate_all(span.attributes, 4096)
      - set(span.attributes["gen_ai.prompt"], "[REDACTED]") where instrumentation_scope.name == "my-internal-llm"

redaction and transform compose — run transform first for surgical edits, then redaction for pattern-based scrubbing on whatever survives.

Or: don't put it on the span

The simplest option if your code is the source of the sensitive value: redact before calling setAttribute(...). No new infrastructure, and it defends against bugs in both the mask and the exporter.
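A minimal sketch of redacting at the source. The key pattern is an assumed format chosen for illustration, and RecordingSpan is a hypothetical stand-in that only mimics the set_attribute method of a real OTel span so the example runs without the SDK.

```python
import re

# Assumed key format, for illustration only.
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Scrub the value before it is ever attached to a span."""
    return API_KEY_RE.sub("[REDACTED KEY]", text)


class RecordingSpan:
    """Hypothetical stand-in exposing the same set_attribute call as an OTel span."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


span = RecordingSpan()
user_input = "use key sk-abc123def456 please"
span.set_attribute("gen_ai.prompt", redact(user_input))  # raw value never reaches the span
```

With a real tracer, the only line that changes is how span is obtained; the redact call stays in front of set_attribute.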

10. FAQ

Can I write an async mask?

No. OpenTelemetry's onEnd / on_end contract is synchronous in both SDKs. Promises / coroutines are not awaited and passing one is undefined behaviour.

Can I drop a span from the mask?

No — that's a filter, not a transform. Use should_export_span / shouldExportSpan to drop spans, and use the mask only to rewrite or delete fields on spans that are otherwise kept.

Not in this release. Neither typically carries PII; if you need them, open a feature request.

Does Kubit scrub anything server-side?

Yes — a small, always-on baseline runs on every ingested record as a safety net for a few high-confidence patterns:

  • credit-card numbers (Luhn-validated)

  • US Social Security numbers

  • well-known API keys and access tokens from common providers (cloud, source-control, chat, LLM vendors, …)

Scope is limited to message-body and tool-call payload fields — the free-text content of prompts, completions, tool arguments, and tool responses. Span identifiers, names, attributes, and other metadata are deliberately left alone so trace structure stays intact.

Treat the baseline as a backstop, not a substitute. The SDK-side mask is still the right place for anything that:

  • you do not want crossing your process boundary in the first place,

  • is specific to your domain (internal IDs, customer names, addresses), or

  • lives on a span attribute rather than a message body.

Does masking add latency to my LLM calls?

Very little. The mask runs synchronously on the producer thread when a span ends, before the span enters the batch queue; the network export itself happens later in the background, off the request path. The only cost it adds is whatever CPU your function itself uses, so keep it fast.
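Since the cost is whatever CPU your function uses, it is easy to measure before shipping. The sketch below times the regex work from Recipe A on a sample payload with timeit. It operates on a plain dict rather than a real span, so the figure approximates the scrubbing logic only; the sample data and function name are illustrative.

```python
import re
import timeit

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")  # same pattern as Recipe A

# Illustrative payload: a card number followed by about 2 KB of filler text.
sample = {"gen_ai.prompt": "please charge 4111 1111 1111 1111 " + "x" * 2000}


def scrub_prompt(attrs: dict) -> dict:
    """Return a copy of attrs with card-like digit runs redacted in the prompt."""
    prompt = attrs.get("gen_ai.prompt")
    if isinstance(prompt, str):
        attrs = {**attrs, "gen_ai.prompt": CARD_RE.sub("[REDACTED CC]", prompt)}
    return attrs


runs = 1000
seconds = timeit.timeit(lambda: scrub_prompt(sample), number=runs)
per_call_us = seconds / runs * 1e6
print(f"{per_call_us:.1f} microseconds per span")
```

Running this against payloads that look like your own traffic gives a rough per-span budget to compare against your span volume.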