Masking sensitive data

This cookbook shows how to mask sensitive content (PII, secrets, regulated data) in your traces before they leave your application. Masking is supported in both SDKs, and the public interface is the same apart from naming conventions (snake_case in Python, camelCase in Node.js):

| Language | Package | Helper import |
| --- | --- | --- |
| Python | `kubit-otel` (≥ 0.7.3) | `from kubit_otel.mask import set_attr, delete_attr, mask_events` |
| Node.js | `@kubit-ai/otel` (≥ 0.7.3) | `import { setAttr, deleteAttr, maskEvents } from "@kubit-ai/otel/mask"` |

1. What masking is — and what it is not

A mask is a function you write that runs on every span the SDK is about to export. It receives the span, you mutate it in place to remove or rewrite sensitive fields, and you return it.

| Mask is… | Mask is not… |
| --- | --- |
| A transform: rewrites fields on the span | A filter: to drop a span, use `should_export_span` / `shouldExportSpan` |
| Synchronous: no `await`, no Promises | A place for I/O, network calls, or remote PII detection |
| Opt-in: zero cost if you don't configure one | Applied automatically: you must pass it explicitly |
| Local-only: runs in your process | A server-side scrubber: the mask itself never runs in the cloud |

Where it runs in the pipeline


your code emits span
        │
        ▼
   filter (drops noise spans)
        │
        ▼
   mask (your function) ◀── runs here, on the producer thread
        │
        ▼
   batch queue
        │
        ▼
   network → Kubit

Two important consequences:

The filter runs first, so your mask is never called on spans that are about to be dropped as noise.
The mask runs before the batch queue, so un-masked spans never sit in memory waiting to be flushed.
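
The ordering above can be modelled in a few lines of plain Python. This is only an illustration of the documented sequence (filter, then mask, then queue); the dict-based `Span` shape and the `export_pipeline`, `keep`, and `redact_prompt` names are hypothetical stand-ins, not the SDK's internals.

```python
from collections import deque
from typing import Callable, Deque, Dict

Span = Dict[str, object]  # hypothetical stand-in for a real span object


def export_pipeline(
    span: Span,
    should_export: Callable[[Span], bool],
    mask: Callable[[Span], Span],
    queue: Deque[Span],
) -> None:
    """Model of the documented order: filter, then mask, then enqueue."""
    if not should_export(span):
        return  # dropped as noise; the mask is never called
    queue.append(mask(span))  # only the masked span is ever queued


calls = []  # records which spans the mask actually saw


def redact_prompt(span: Span) -> Span:
    calls.append(span["name"])
    span["gen_ai.prompt"] = "[REDACTED]"
    return span


def keep(span: Span) -> bool:
    return span["name"] != "healthcheck"


queue: Deque[Span] = deque()
export_pipeline({"name": "healthcheck", "gen_ai.prompt": "x"}, keep, redact_prompt, queue)
export_pipeline({"name": "chat", "gen_ai.prompt": "secret"}, keep, redact_prompt, queue)

print(calls)        # ['chat']
print(list(queue))  # [{'name': 'chat', 'gen_ai.prompt': '[REDACTED]'}]
```

The filtered span never reaches the mask, and the queue only ever holds the masked copy of the prompt.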

2. Configure a mask

Python


from kubit_otel import configure
from kubit_otel.mask import set_attr

def mask(span):
    set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span

configure(api_key="rg.v1.xxx", mask=mask)

You can also pass mask= to attach() (when another OTel-based library has already registered a provider) and to KubitSpanProcessor(...) (when you construct a TracerProvider yourself).

Node.js


import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
    return span;
  },
});

You can also pass mask to new KubitSpanProcessor({ ... }) when you build your own NodeTracerProvider.

3. The helper API

Three helpers cover the surface most masks need. They encapsulate the OTel private-attribute access so your code stays portable across Kubit OTel SDK upgrades.

| Helper | What it does |
| --- | --- |
| `set_attr` / `setAttr` | Overwrite or add an attribute. Works on a span or an event. |
| `delete_attr` / `deleteAttr` | Remove an attribute. No-op if it isn't there. Works on a span or an event. |
| `mask_events` / `maskEvents` | Iterate the span's events. Return the event to keep it, return `None` / `null` to drop it. |

You do not need to reach into span._attributes (Python) or (span as any).attributes (Node) yourself — the helpers do this for you, correctly.

4. Recipes

Recipe A — Scrub a regex pattern from a prompt attribute

OTel GenAI v1 (and OpenLLMetry / Traceloop) put the prompt on the span as the gen_ai.prompt attribute.

Python


import re
from kubit_otel import configure
from kubit_otel.mask import set_attr

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

def mask(span):
    prompt = (span.attributes or {}).get("gen_ai.prompt")
    if isinstance(prompt, str):
        set_attr(span, "gen_ai.prompt", CARD_RE.sub("[REDACTED CC]", prompt))
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node


import { configure } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const CARD_RE = /\b(?:\d[ -]*?){13,19}\b/g;

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    const attrs = (span.attributes ?? {}) as Record<string, unknown>;
    const prompt = attrs["gen_ai.prompt"];
    if (typeof prompt === "string") {
      setAttr(span, "gen_ai.prompt", prompt.replace(CARD_RE, "[REDACTED CC]"));
    }
    return span;
  },
});
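
Before relying on a pattern like this, it helps to check what it does and does not match. The snippet below exercises the same regular expression on a few sample strings using only the standard library; the `scrub` helper and the sample inputs are illustrative, not part of the SDK.

```python
import re

# Same pattern as the recipe above.
CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def scrub(text: str) -> str:
    return CARD_RE.sub("[REDACTED CC]", text)


print(scrub("pay with 4111 1111 1111 1111 today"))  # pay with [REDACTED CC] today
print(scrub("pay with 4111-1111-1111-1111 today"))  # pay with [REDACTED CC] today
print(scrub("order 12345 shipped"))                 # order 12345 shipped
# Caveat: any run of 13 to 19 digits matches, including a millisecond timestamp.
print(scrub("ts=1700000000000"))                    # ts=[REDACTED CC]
```

The pattern is deliberately broad, so expect it to redact long numeric IDs and timestamps as well as card numbers.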

Recipe B — Drop a whole class of events

OTel GenAI v2 puts user messages, assistant messages, and tool calls on the span as events (one event per message), not attributes. To drop an entire class — say, anything tagged gen_ai.user.message — use mask_events.

Python


from kubit_otel import configure
from kubit_otel.mask import mask_events

def mask(span):
    mask_events(span, lambda e: None if e.name == "gen_ai.user.message" else e)
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node


import { configure } from "@kubit-ai/otel";
import { maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => (e.name === "gen_ai.user.message" ? null : e));
    return span;
  },
});

Recipe C — Rewrite a field on each event

Keep the events, but redact a single attribute on each one.

Python


from kubit_otel import configure
from kubit_otel.mask import set_attr, mask_events

def scrub_email(event):
    body = (event.attributes or {}).get("gen_ai.event.content")
    if isinstance(body, str) and "@" in body:
        set_attr(event, "gen_ai.event.content", "[REDACTED EMAIL]")
    return event

def mask(span):
    mask_events(span, scrub_email)
    return span

configure(api_key="rg.v1.xxx", mask=mask)

Node


import { configure } from "@kubit-ai/otel";
import { setAttr, maskEvents } from "@kubit-ai/otel/mask";

configure({
  apiKey: "rg.v1.xxx",
  mask: (span) => {
    maskEvents(span, (e) => {
      const body = (e.attributes ?? {})["gen_ai.event.content"];
      if (typeof body === "string" && body.includes("@")) {
        setAttr(e, "gen_ai.event.content", "[REDACTED EMAIL]");
      }
      return e;
    });
    return span;
  },
});

Recipe D — Remove a request/response body wholesale

Some auto-instrumentation stamps full HTTP request and response bodies on the span. If those are off-limits, delete them.

Python


from kubit_otel.mask import delete_attr

def mask(span):
    delete_attr(span, "http.request.body")
    delete_attr(span, "http.response.body")
    return span

Node


import { deleteAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  deleteAttr(span, "http.request.body");
  deleteAttr(span, "http.response.body");
  return span;
};

Recipe E — Allow-list by span name or framework

A mask runs on every span the SDK ships, not just your top-level LLM calls. Child spans, tool-call spans, framework coordination spans, and retries all flow through it. If you only know the shape of your top-level calls, narrow your work by name or by instrumentation scope.

Python


from kubit_otel.mask import set_attr

def mask(span):
    scope = getattr(span.instrumentation_scope, "name", "") or ""
    if scope.startswith("opentelemetry.instrumentation.openai"):
        # only scrub for spans coming from this instrumentor
        set_attr(span, "gen_ai.prompt", "[REDACTED]")
    return span

Node


import { setAttr } from "@kubit-ai/otel/mask";

const mask = (span) => {
  const scope = span.instrumentationScope?.name ?? "";
  if (scope.startsWith("@traceloop/")) {
    setAttr(span, "gen_ai.prompt", "[REDACTED]");
  }
  return span;
};

5. What happens when a mask fails

The mask is the only thing standing between sensitive data and the network, so the SDK is fail-closed: if your function raises an exception or returns None / null, the original (un-masked) span never ships.

Instead, a tombstone ships in its place:

- The trace structure stays intact: trace ID, span ID, parent, start/end time, kind, name, and resource are all preserved, so child spans still attach to a real parent and the trace tree isn't broken.
- All payload-bearing fields are wiped: attributes, events, links, and status description.
- Status is forced to ERROR.
- A single attribute, kubit.sdk.mask_error, names the cause:
  - the exception class name (e.g. "AttributeError", "TypeError") for a raise
  - the literal "returned_none" (Python) or "returned_null" (Node) for a null return

The full exception (with traceback / stack) is logged at error level to the SDK's internal logger (`kubit_otel` in Python, the shared logger in Node). Set KUBIT_OTEL_LOG_LEVEL=error (or a more verbose level) to see it.

The practical takeaway: you can find the bug because the failure is logged and is visible in your trace tree, but you can't accidentally leak the data the mask was supposed to redact.
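
The fail-closed behaviour described in this section can be modelled with plain dictionaries. This is a sketch of the documented contract, not the SDK's implementation: the `apply_mask_fail_closed` function, the dict-based span, and the key names in `STRUCTURAL_KEYS` are all hypothetical.

```python
from typing import Callable, Dict, Optional

Span = Dict[str, object]  # hypothetical stand-in for a real span

# Fields this section says survive on a tombstone (key names are illustrative).
STRUCTURAL_KEYS = ("trace_id", "span_id", "parent_id", "start", "end", "kind", "name", "resource")


def apply_mask_fail_closed(span: Span, mask: Callable[[Span], Optional[Span]]) -> Span:
    """Return the masked span, or a tombstone if the mask raises or returns None."""
    try:
        result = mask(span)
        cause = "returned_none" if result is None else None
    except Exception as exc:  # any failure must not leak the original span
        result, cause = None, type(exc).__name__
    if cause is None:
        return result
    # Keep only the structural fields; every payload-bearing field is dropped.
    tombstone = {k: span[k] for k in STRUCTURAL_KEYS if k in span}
    tombstone["status"] = "ERROR"
    tombstone["attributes"] = {"kubit.sdk.mask_error": cause}
    return tombstone


span = {"trace_id": "t1", "span_id": "s1", "name": "chat",
        "attributes": {"gen_ai.prompt": "secret"}}


def buggy_mask(s):
    return s["attributes"]["missing"].upper()  # raises KeyError


print(apply_mask_fail_closed(dict(span), buggy_mask)["attributes"])
# {'kubit.sdk.mask_error': 'KeyError'}
print(apply_mask_fail_closed(dict(span), lambda s: None)["attributes"])
# {'kubit.sdk.mask_error': 'returned_none'}
```

In both failure cases the secret prompt is gone from the result, while the IDs needed to keep the trace tree connected are still present.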

6. Authoring guidelines

A few rules that make masks safer and more maintainable.

Do

Keep the function pure and fast. It runs on the producer thread, on every exported span.
Be defensive about types. span.attributes and event.attributes can be None / undefined, and values can be of any OTel-allowed type (string, number, bool, or homogeneous array).
Author for every span shape, or scope your logic with an allow-list (see Recipe E). Don't assume "my LLM spans" — child tool calls, retries, and framework coordination spans all flow through the same mask.
Prefer delete_attr / deleteAttr over set_attr(..., ""). An empty string still ships and still uses bytes.
Unit-test your mask. The helpers work on plain ReadableSpan / TimedEvent instances you fabricate in a test — you don't have to spin up the whole pipeline.
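
The "be defensive about types" rule can be sketched as a small pure function that a mask could call for each attribute value. It uses only the standard library; `scrub_value`, `scrub_attributes`, and the email pattern are illustrative names and choices, not SDK APIs.

```python
import re
from typing import Any

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_value(value: Any) -> Any:
    """Redact emails in strings and string arrays; pass other types through."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED EMAIL]", value)
    if isinstance(value, (list, tuple)):
        # OTel arrays are homogeneous, but check each item anyway.
        return type(value)(scrub_value(v) if isinstance(v, str) else v for v in value)
    return value  # None, bool, int, float: nothing to scrub


def scrub_attributes(attributes):
    """Return scrubbed copies of the values; tolerates attributes being None."""
    return {k: scrub_value(v) for k, v in (attributes or {}).items()}


print(scrub_attributes(None))  # {}
print(scrub_attributes({"user": "ann@example.com", "retries": 3, "cc": ("bob@example.com", "n/a")}))
# {'user': '[REDACTED EMAIL]', 'retries': 3, 'cc': ('[REDACTED EMAIL]', 'n/a')}
```

Because the function never assumes a string, a numeric or boolean attribute cannot make the mask raise and turn the span into a tombstone.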

Don't

Don't do I/O or network calls. Don't call a remote PII detector. Don't await anything — the contract is synchronous and Promises / coroutines will not be awaited.
Don't return None / null to mean "drop the span." That's treated as a programming error and produces a tombstone. To drop a span use should_export_span / shouldExportSpan instead.
Don't reach into private attributes (`span._attributes`, event._attributes, (span as any).attributes) yourself. The helpers handle this and stay compatible across OTel SDK upgrades.
Don't try to rewrite kubit.sdk.name or kubit.sdk.version — they are re-stamped after your mask runs.

7. Where masking does and does not apply

| Entry point | Inherits masking? |
| --- | --- |
| `configure(..., mask=...)` | ✅ Yes |
| `attach(..., mask=...)` (Python only) | ✅ Yes |
| `KubitSpanProcessor(..., mask=...)` | ✅ Yes |
| Bare `KubitExporter` wrapped in your own `SpanProcessor` | ❌ No: apply the helpers in your processor |

Masking lives on the processor so that un-masked spans never enter the batch queue. If you've built your own SpanProcessor around the raw exporter, you're responsible for calling the mask helpers in your onEnd / on_end implementation.

8. Verifying it works

The simplest end-to-end check: configure the SDK against an in-memory exporter, end a span, and assert on what came out.

Python


from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from kubit_otel import KubitSpanProcessor
from kubit_otel.mask import set_attr

memory = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(memory))
provider.add_span_processor(
    KubitSpanProcessor(
        api_key="rg.v1.xxx",
        mask=lambda s: (set_attr(s, "gen_ai.prompt", "[REDACTED]"), s)[1],
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("test")
with tracer.start_as_current_span("chat") as span:
    span.set_attribute("gen_ai.prompt", "secret")

assert memory.get_finished_spans()[0].attributes["gen_ai.prompt"] == "[REDACTED]"

Node


import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { InMemorySpanExporter, SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { trace } from "@opentelemetry/api";
import { KubitSpanProcessor } from "@kubit-ai/otel";
import { setAttr } from "@kubit-ai/otel/mask";

const memory = new InMemorySpanExporter();
const provider = new NodeTracerProvider({
  spanProcessors: [
    new SimpleSpanProcessor(memory),
    new KubitSpanProcessor({
      apiKey: "rg.v1.xxx",
      mask: (s) => { setAttr(s, "gen_ai.prompt", "[REDACTED]"); return s; },
    }),
  ],
});
provider.register();

const tracer = trace.getTracer("test");
tracer.startActiveSpan("chat", (span) => {
  span.setAttribute("gen_ai.prompt", "secret");
  span.end();
});

console.log(memory.getFinishedSpans()[0].attributes["gen_ai.prompt"]);
// → "[REDACTED]"

9. Not using our SDKs?

If you ship spans via stock OpenTelemetry (Java, Kotlin, Go, Ruby, or just the bare OTLP exporter in Python/Node), you have three good options. Pick based on where you want the redaction to run.

| Where it runs | How | Best for |
| --- | --- | --- |
| In your app, in code | Wrap the OTLP exporter in a custom `SpanProcessor` whose `onEnd` mutates the span before delegating to a `BatchSpanProcessor` | Languages without our SDK; teams that want redaction next to business code |
| In an OTel Collector (sidecar / gateway) | `redaction` processor: config-only, pattern-driven | Standard PII (cards, tokens, emails) managed as YAML, not code |
| In an OTel Collector | `transform` processor with OTTL | Surgical rewrites: `delete_key`, `replace_pattern`, `truncate_all`, conditional `where` |

Trade-off: the Collector options are language-agnostic and keep app code clean, but spans leave your process before being scrubbed, so the Collector must run inside your trust boundary. If the data must never cross your process boundary, do the redaction in-app (or use one of our SDKs).

Custom SpanProcessor (any language)

The pattern is identical across SDKs — wrap a BatchSpanProcessor, mutate the span in onEnd, fail closed if your scrub raises. Python sketch:


import re
from opentelemetry.sdk.trace import SpanProcessor
from opentelemetry.sdk.trace.export import BatchSpanProcessor

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

class MaskingSpanProcessor(SpanProcessor):
    """Scrubs card numbers from gen_ai.prompt, then delegates to the wrapped processor."""

    def __init__(self, inner: BatchSpanProcessor):
        self._inner = inner

    def on_start(self, span, parent_context=None):
        self._inner.on_start(span, parent_context)

    def on_end(self, span):
        try:
            prompt = (span.attributes or {}).get("gen_ai.prompt")
            if isinstance(prompt, str):
                # Stock OTel has no public setter on an ended span, so this
                # writes to the SDK's private mapping and may break on upgrade.
                span._attributes["gen_ai.prompt"] = CARD_RE.sub("[REDACTED CC]", prompt)
        except Exception:
            return  # fail closed: drop rather than leak
        self._inner.on_end(span)

    def shutdown(self):
        self._inner.shutdown()

    def force_flush(self, timeout_millis=30000):
        return self._inner.force_flush(timeout_millis)

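Other SDKs differ in where a span is still writable. In Java, onEnd receives a read-only ReadableSpan; the onEnding(ReadWriteSpan) hook that allows mutation lives on the experimental ExtendedSpanProcessor interface. In Go, OnEnd receives a ReadOnlySpan, so wrap it in a type that embeds the span and overrides Attributes() before handing it to the inner processor. Two rules to keep from this SDK's design: mask before the batch queue, and fail closed when the scrub raises.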

OTel Collector — `redaction` processor


receivers:
  otlp:                             # assumed OTLP receiver; adjust to how your apps send spans
    protocols:
      grpc:
      http:
processors:
  redaction:
    allow_all_keys: true            # keep keys; only scrub values
    redact_all_types: true          # check numeric attrs too
    blocked_key_patterns: [".*token.*", ".*api_key.*", ".*password.*"]
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"   # Visa
      - "(5[1-5][0-9]{14})"         # MasterCard
      - "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"   # email
    summary: info
  batch:
exporters:
  otlphttp/kubit:
    traces_endpoint: https://otel.kubit.ai/v1/traces
    headers:
      x-api-key: ${env:KUBIT_API_KEY}   # block style: braces are not valid inside a YAML flow mapping
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [redaction, batch]
      exporters: [otlphttp/kubit]

allowed_values takes precedence over blocked_values for carve-outs. Setting hash_function (for example md5 or sha1) swaps the fixed mask for a deterministic hash if you still want to group by the value downstream.
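
It is worth testing value patterns against sample data before deploying them. The sketch below checks the three blocked_values patterns from the config above with Python's `re` module. The Collector is written in Go and uses Go's regexp engine, but these patterns only use syntax common to both. It assumes a pattern may match anywhere in the value; `matching_patterns` and the sample values are illustrative.

```python
import re

# Patterns copied from the blocked_values list above (YAML escaping removed).
BLOCKED_VALUES = {
    "visa": r"4[0-9]{12}(?:[0-9]{3})?",
    "mastercard": r"(5[1-5][0-9]{14})",
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
}


def matching_patterns(value: str) -> list:
    """Names of the patterns that match anywhere in the value (assumed semantics)."""
    return [name for name, pat in BLOCKED_VALUES.items() if re.search(pat, value)]


print(matching_patterns("4111111111111111"))      # ['visa']
print(matching_patterns("5500000000000004"))      # ['mastercard']
print(matching_patterns("mail ann@example.com"))  # ['email']
# Caveat: these card patterns expect contiguous digits.
print(matching_patterns("4111 1111 1111 1111"))   # []
```

The last case shows that a card number written with spaces slips past these two card patterns, so add a separator-tolerant pattern if your data contains formatted numbers.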

OTel Collector — `transform` processor (OTTL)

For rewrites the redaction processor can't express:


processors:
  transform/scrub:
    error_mode: ignore
    trace_statements:
      - delete_key(span.attributes, "http.request.body")
      - delete_key(span.attributes, "http.response.body")
      - replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
      - truncate_all(span.attributes, 4096)
      - set(span.attributes["gen_ai.prompt"], "[REDACTED]")
          where instrumentation_scope.name == "my-internal-llm"

redaction and transform compose — run transform first for surgical edits, then redaction for pattern-based scrubbing on whatever survives.

Or: don't put it on the span

The simplest option if your code is the source of the sensitive value: redact before calling setAttribute(...). No new infrastructure, and it defends against bugs in both the mask and the exporter.
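
A minimal sketch of redacting at the source, assuming the sensitive value is a bearer token embedded in a string. `FakeSpan` is a stand-in with the same `set_attribute(key, value)` shape as an OTel span so the example runs without any dependencies; `redact` and `TOKEN_RE` are hypothetical names.

```python
import re

# Keeps the "Bearer " prefix and replaces the credential that follows it.
TOKEN_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/-]+=*")


def redact(text: str) -> str:
    """Hypothetical helper: scrub bearer tokens before the value is recorded."""
    return TOKEN_RE.sub(r"\1[REDACTED]", text)


class FakeSpan:
    """Stand-in with the same set_attribute(key, value) shape as an OTel span."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


span = FakeSpan()
command = "curl -H 'Authorization: Bearer abc.def-123' https://example.com"
span.set_attribute("process.command_line", redact(command))  # raw value never reaches the span

print(span.attributes["process.command_line"])
# curl -H 'Authorization: Bearer [REDACTED]' https://example.com
```

Because the raw string is never attached to the span, nothing downstream has to get the redaction right.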

10. FAQ

Can I write an async mask?

No. OpenTelemetry's onEnd / on_end contract is synchronous in both SDKs. Promises / coroutines are not awaited and passing one is undefined behaviour.

Can I drop a span from the mask?

No — that's a filter, not a transform. Use should_export_span / shouldExportSpan to drop spans, and use the mask only to rewrite or delete fields on spans that are otherwise kept.

Does the mask see resource attributes or span links?

Not in this release. Neither typically carries PII; if you need them, open a feature request.

Does Kubit scrub anything server-side?

Yes — a small, always-on baseline runs on every ingested record as a safety net for a few high-confidence patterns:

credit-card numbers (Luhn-validated)
US Social Security numbers
well-known API keys and access tokens from common providers (cloud, source-control, chat, LLM vendors, …)

Scope is limited to message-body and tool-call payload fields — the free-text content of prompts, completions, tool arguments, and tool responses. Span identifiers, names, attributes, and other metadata are deliberately left alone so trace structure stays intact.

Treat the baseline as a backstop, not a substitute. The SDK-side mask is still the right place for anything that:

you do not want crossing your process boundary in the first place,
is specific to your domain (internal IDs, customer names, addresses), or
lives on a span attribute rather than a message body.
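
The Luhn check mentioned in the baseline list is also useful inside your own mask, because it filters out long digit runs that are not card numbers. The sketch below is a standard Luhn implementation paired with the card pattern from Recipe A. It is not Kubit's server-side code, and `luhn_valid` and `scrub_cards` are illustrative names.

```python
import re

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_valid(candidate: str) -> bool:
    """True if the digits in the candidate pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scrub_cards(text: str) -> str:
    """Redact only the digit runs that pass the Luhn check."""
    return CARD_RE.sub(
        lambda m: "[REDACTED CC]" if luhn_valid(m.group()) else m.group(), text
    )


print(scrub_cards("card 4111 1111 1111 1111"))  # card [REDACTED CC]
print(scrub_cards("ts=1700000000000"))          # ts=1700000000000
```

The timestamp survives because it fails the checksum, while the well-known test card number is still redacted.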

Does masking add latency to my LLM calls?

Effectively no. The mask runs when a span ends, after the instrumented call has already returned, so it does not delay the LLM request itself. It does run synchronously on the producer thread, so the only cost it adds is whatever CPU time your function itself uses.
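
To put a number on that CPU cost, you can time the mask logic on a representative payload. This is a rough sketch using only the standard library; `mask_prompt` works on a plain dict rather than a real span, and `mean_cost_us` and the sample payload are illustrative.

```python
import re
import time

CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def mask_prompt(attributes: dict) -> dict:
    """Dict-based stand-in for the body of a mask function."""
    prompt = attributes.get("gen_ai.prompt")
    if isinstance(prompt, str):
        attributes["gen_ai.prompt"] = CARD_RE.sub("[REDACTED CC]", prompt)
    return attributes


def mean_cost_us(fn, make_input, runs: int = 1000) -> float:
    """Average wall-clock cost of one call, in microseconds."""
    inputs = [make_input() for _ in range(runs)]  # build inputs outside the timed loop
    start = time.perf_counter()
    for item in inputs:
        fn(item)
    return (time.perf_counter() - start) / runs * 1e6


def sample() -> dict:
    return {"gen_ai.prompt": "card 4111 1111 1111 1111 " + "lorem ipsum " * 200}


cost = mean_cost_us(mask_prompt, sample)
print(f"{cost:.1f} microseconds per span")
```

Run it with prompts of the size you see in production; the result varies by machine and payload, so treat it as an estimate rather than a guarantee.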
