The existing Langfuse integration covered Chat / Embedding / Rerank / VLM /
ASR generations plus the HTTP + asynq spans, but the agent's own execution
tree was invisible: tool calls never appeared, multi-round ReAct iterations
were flat under the HTTP trace, and there was no single node representing
"one agent run".
This change adds three levels of agent-side spans:
- agent.execute: wraps AgentEngine.Execute; records query preview,
  knowledge bases, allowed tools, final-answer length and totals on
  finish.
- agent.round.<N>: wraps each ReAct iteration; records finish_reason,
  tool-call count, token usage and duration.
- agent.tool.<name>: wraps each tool invocation; records arguments,
  success, duration, output preview (rune-safe, 4KB cap), error, data
  keys and image count.
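The rune-safe, 4KB-capped output preview can be illustrated with a minimal sketch. The helper name truncateRuneSafe and its signature are assumptions made for illustration (the change itself names a helper truncateForLangfuse, whose exact behaviour is not shown here); the point is only that a byte cap must never split a multi-byte UTF-8 character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxPreviewBytes mirrors the 4KB cap mentioned above.
const maxPreviewBytes = 4 * 1024

// truncateRuneSafe (hypothetical name) returns s cut to at most maxBytes
// bytes, backing up so the cut never lands inside a UTF-8 sequence.
func truncateRuneSafe(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// s[cut] is the first byte that would be dropped; move left until it
	// is the start of a rune, so s[:cut] ends on a rune boundary.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	// "é" is two bytes, so a 2-byte cap keeps only "h".
	fmt.Println(truncateRuneSafe("héllo", 2))
	fmt.Println(len(truncateRuneSafe("short output", maxPreviewBytes)))
}
```

A plain s[:maxBytes] slice would be shorter to write but can leave half a character at the end of the preview, which some backends reject or render as a replacement glyph.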
To keep the loop's many exit paths (natural stop, stuck loop, empty-content
retry, final_answer, context cancellation) span-safe, the iteration body was
extracted into runReActIteration with a single defer span.Finish() and an
iterOutcome sentinel driving the outer loop. database_query arguments are
redacted (keys only) to avoid leaking raw SQL into the observability
backend, mirroring the existing UI hint policy.
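The single-defer pattern described above can be sketched as follows. This is a simplified illustration, not the code from the change: fakeSpan, runIteration, runLoop and the outcome constants are hypothetical stand-ins, and the real runReActIteration handles more exit paths than this example models.

```go
package main

import "fmt"

// iterOutcome tells the outer loop what to do after one iteration.
type iterOutcome int

const (
	outcomeContinue iterOutcome = iota // run another round
	outcomeDone                        // natural stop or final answer
	outcomeAborted                     // e.g. context cancellation
)

func (o iterOutcome) String() string {
	switch o {
	case outcomeContinue:
		return "continue"
	case outcomeDone:
		return "done"
	case outcomeAborted:
		return "aborted"
	default:
		return fmt.Sprintf("iterOutcome(%d)", int(o))
	}
}

// fakeSpan stands in for a tracing span; it only counts Finish calls.
type fakeSpan struct {
	name     string
	finished int
}

// Finish is nil-safe so callers can defer it unconditionally.
func (s *fakeSpan) Finish() {
	if s == nil {
		return
	}
	s.finished++
}

// runIteration handles one round. The single defer closes the span on
// every return path, so new exit paths cannot leak an open span.
func runIteration(sp *fakeSpan, finishReason string) iterOutcome {
	defer sp.Finish()
	switch finishReason {
	case "stop", "final_answer":
		return outcomeDone
	case "cancelled":
		return outcomeAborted
	default:
		return outcomeContinue
	}
}

// runLoop drives rounds until an iteration reports a terminal outcome.
func runLoop(reasons []string) ([]*fakeSpan, iterOutcome) {
	spans := make([]*fakeSpan, 0, len(reasons))
	outcome := outcomeContinue
	for i, reason := range reasons {
		sp := &fakeSpan{name: fmt.Sprintf("agent.round.%d", i+1)}
		spans = append(spans, sp)
		outcome = runIteration(sp, reason)
		if outcome != outcomeContinue {
			break
		}
	}
	return spans, outcome
}

func main() {
	spans, outcome := runLoop([]string{"tool_calls", "tool_calls", "stop", "unreached"})
	fmt.Println(len(spans), outcome)
}
```

Returning a small sentinel instead of using break/continue inside the iteration body is what allows the body to live in its own function, which in turn is what makes a single defer sufficient.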
Adds unit tests for the new helpers (truncateForLangfuse, argKeys, dataKeys,
finishToolSpan nil-safety, iterOutcome.String).
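As an illustration of the keys-only redaction applied to database_query arguments, here is a minimal sketch. The function name redactToKeys and the map-based argument shape are assumptions for this example; the change's own argKeys helper may differ in signature and ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// redactToKeys (hypothetical name) returns only the argument names, sorted
// for stable output, so values such as raw SQL never leave the process.
func redactToKeys(args map[string]any) []string {
	keys := make([]string, 0, len(args))
	for k := range args {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	args := map[string]any{
		"sql":   "SELECT email FROM users WHERE id = 42",
		"limit": 10,
	}
	// Only the key names are recorded on the span.
	fmt.Println(redactToKeys(args))
}
```

Sorting is not required for redaction itself; it is included here because Go map iteration order is randomised and a stable order makes span metadata easier to compare and test.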
Previously the Langfuse integration only traced in-process HTTP requests
(chat / search / eval), so file uploads and every downstream asynq task
(document parse, chunk embedding, OCR/VLM, summary / question gen, wiki
ingest, datasource sync, etc.) produced either disconnected shallow
traces or no observation at all.
This change threads one trace end-to-end:
- tracer: add SPAN observation type and StartSpan; add ResumeTrace so a
  worker can attach to an upstream trace without emitting a duplicate
  trace-create; StartGeneration now auto-picks parentObservationId from
  ctx so nested trace -> span -> generation trees render correctly.
- types.TracingContext + LangfuseTracingCarrier: embed on all 17 asynq
  payloads so trace_id / parent_obs_id / user_id / session_id serialise
  into every job.
- langfuse.InjectTracing: injected at 28 enqueue sites before json.Marshal
  so the HTTP-layer trace survives the Redis hop.
- langfuse.AsynqMiddleware: mux.Use hook that peeks the payload, either
  resumes the upstream trace or opens a standalone asynq.<type> trace
  for scheduled jobs, and wraps the handler in a SPAN with task metadata
  (id / queue / retry / payload_bytes) plus ERROR level on failure.
- GinMiddleware.shouldTrace: whitelist ingestion / knowledge-mutation /
  FAQ / wiki / datasource endpoints so the root trace actually starts.
- Tests: tracer_test.go covers span nesting, error status, and
  ResumeTrace no-trace-create guarantee; asynq_test.go covers
  InjectTracing round-trip, middleware resume path, and standalone
  trace fallback.
- Docs: docs/Langfuse集成.md now lists the covered task types
  and documents the cross-process propagation model.
No behavioural change when Langfuse is disabled: all new code paths are
no-ops, and the carrier fields stay empty, so omitempty drops them from
the serialised payload.
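The carrier idea can be sketched with an embedded struct whose fields use omitempty. The JSON key names come from the description above; the Go field names, the documentParsePayload type and the encode/decode helpers are assumptions for illustration and are not the actual definitions in internal/types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TracingContext carries trace identifiers across the queue hop.
// Field names and tags are assumed for this sketch.
type TracingContext struct {
	TraceID     string `json:"trace_id,omitempty"`
	ParentObsID string `json:"parent_obs_id,omitempty"`
	UserID      string `json:"user_id,omitempty"`
	SessionID   string `json:"session_id,omitempty"`
}

// documentParsePayload is a hypothetical task payload. Embedding the
// carrier promotes its fields into the same JSON object.
type documentParsePayload struct {
	DocumentID string `json:"document_id"`
	TracingContext
}

// encode marshals a payload the way an enqueue site would.
func encode(p documentParsePayload) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

// decode unmarshals a payload the way a worker middleware would.
func decode(s string) (documentParsePayload, error) {
	var p documentParsePayload
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	// Tracing disabled: carrier fields are empty and omitted entirely.
	plain, _ := encode(documentParsePayload{DocumentID: "d1"})
	fmt.Println(plain)

	// Tracing enabled: identifiers are set before marshalling.
	traced, _ := encode(documentParsePayload{
		DocumentID:     "d1",
		TracingContext: TracingContext{TraceID: "t1", ParentObsID: "o1"},
	})
	fmt.Println(traced)
}
```

Because the fields are omitted when empty, payloads produced with tracing disabled are byte-for-byte what they were before the carrier was embedded, which keeps older workers and queued jobs compatible.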
Closes #620, #497. Add opt-in Langfuse observability covering all five
model types (chat, embedding, rerank, VLM, ASR) with HTTP-request-scoped
traces and Docker Compose support (both cloud and self-hosted).
Core package internal/tracing/langfuse:
- HTTP client with batched async ingestion (non-blocking in request path)
- Sampling, environment / release tagging, and graceful fallback when
  LANGFUSE_* env vars are absent (wrappers become no-ops)
- Gin middleware opens one trace per traced request and finishes it after
  the handler chain returns, attaching method / path / user / session
- Trace context is stored under a typed key exported from internal/types
  so logger.CloneContext can preserve it across handler / goroutine
  boundaries (otherwise each LLM call auto-created an orphan trace,
  fragmenting one request into many)
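To illustrate why a shared typed key matters, here is a minimal sketch of a context clone that detaches from the request lifetime while keeping the trace value. All names here (ctxKey, traceKey, cloneContext, demo) are hypothetical, and the assumption that the clone starts from context.Background is mine; the real logger.CloneContext may copy more values or work differently.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a dedicated key type, so values cannot collide with keys
// defined by other packages.
type ctxKey string

// traceKey is a hypothetical exported-style key shared by the tracing
// code and the context-cloning helper.
const traceKey ctxKey = "langfuse.trace"

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey, id)
}

func traceIDFrom(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(traceKey).(string)
	return id, ok
}

// cloneContext builds a context that outlives the request but still
// carries the trace identifier, because it knows the shared key.
func cloneContext(src context.Context) context.Context {
	dst := context.Background()
	if id, ok := traceIDFrom(src); ok {
		dst = withTraceID(dst, id)
	}
	return dst
}

// demo cancels the request context and reports what the clone still sees.
func demo() (string, bool, error) {
	reqCtx, cancel := context.WithCancel(context.Background())
	reqCtx = withTraceID(reqCtx, "trace-123")
	bg := cloneContext(reqCtx)
	cancel() // the HTTP handler has returned
	id, ok := traceIDFrom(bg)
	return id, ok, bg.Err()
}

func main() {
	id, ok, err := demo()
	fmt.Println(id, ok, err)
}
```

If the key were private to the tracing package, the cloning helper could not copy the value, and any model call made from the cloned context would find no trace and start a new one.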
Per-model generation wrappers (opt-in via NewChat/NewEmbedder/...):
- chat: captures prompt, streaming output, token usage + TTFT
- embedding: approximates tokens when the provider omits usage
- rerank: previews query/docs, summarizes results to keep payload small
- vlm: records image count and total bytes, never uploads raw pixels
- asr: records file size and audio duration, never uploads audio bytes
Async title generation (GenerateTitleAsync) now forwards the trace key
into the goroutine so title calls appear under the parent chat trace.
Docker Compose:
- LANGFUSE_* env passthrough on the `app` service for cloud deployments
- Optional `langfuse` profile spins up a self-hosted Langfuse stack that
  reuses WeKnora's existing PostgreSQL (separate database via an idempotent
  init container that fixes ICU collation drift) and Redis (separate DB
  number), adding only ClickHouse, MinIO, web and worker containers
- web/worker entrypoints URL-encode DB_PASSWORD / REDIS_PASSWORD at start
  to avoid Prisma P1013 when passwords contain @ / # / etc.
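The password-encoding step can be illustrated in Go, although the entrypoints themselves are container start scripts and are not shown here. The helper name buildDSN, the user name, host and database below are made up for the example; the sketch only demonstrates why reserved characters in a password must be percent-encoded before being placed in a connection URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN (hypothetical helper) assembles a connection URL and lets
// net/url percent-encode the credentials, so characters such as '@'
// and '#' cannot be mistaken for URL delimiters.
func buildDSN(scheme, user, password, host, db string) string {
	u := url.URL{
		Scheme: scheme,
		User:   url.UserPassword(user, password),
		Host:   host,
		Path:   "/" + db,
	}
	return u.String()
}

func main() {
	// Example values only; none of these come from the real deployment.
	fmt.Println(buildDSN("postgresql", "langfuse", "p@ss#1", "postgres:5432", "langfuse"))
}
```

Without the encoding, the '@' inside the password would be read as the end of the credentials and the '#' as the start of a fragment, which is the kind of malformed connection string that Prisma reports as P1013.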
Docs: docs/Langfuse集成.md covers cloud vs self-hosted, per-model usage
strategy, code map, and resource footprint.