Commit Graph

3 Commits

Author SHA1 Message Date
wizardchen
c12296aa88 feat(observability): instrument agent ReAct loop and tool calls in Langfuse
The existing Langfuse integration covered Chat / Embedding / Rerank / VLM /
ASR generations plus the HTTP + asynq spans, but the agent's own execution
tree was invisible: tool calls never appeared, multi-round ReAct iterations
were flat under the HTTP trace, and there was no single node representing
"one agent run".

This change adds three levels of agent-side spans:

  - agent.execute       — wraps AgentEngine.Execute, records query preview,
                           knowledge bases, allowed tools, final-answer length
                           and totals on finish.
  - agent.round.<N>     — wraps each ReAct iteration; records finish_reason,
                           tool-call count, token usage and duration.
  - agent.tool.<name>   — wraps each tool invocation; records arguments,
                           success, duration, output preview (rune-safe, 4KB
                           cap), error, data keys and image count.
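
The rune-safe output preview can be sketched as below. This is a minimal illustration, not the code from the commit: `truncatePreview` is a hypothetical stand-in for `truncateForLangfuse`, and it assumes the 4KB cap is counted in bytes.

```go
package main

import "unicode/utf8"

// maxPreviewBytes mirrors the 4KB cap from the commit message
// (assumed here to be a byte limit, not a rune limit).
const maxPreviewBytes = 4 * 1024

// truncatePreview returns s cut to at most limit bytes without splitting a
// multi-byte UTF-8 sequence. Hypothetical stand-in for truncateForLangfuse.
func truncatePreview(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Step back until cut points at the first byte of a rune, so the
	// returned prefix never ends in the middle of a character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}
```

A caller would pass `maxPreviewBytes` as the limit before attaching the tool output to the span.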

To keep the loop's many exit paths (natural stop, stuck loop, empty-content
retry, final_answer, context cancellation) span-safe, the iteration body was
extracted into runReActIteration with a single defer span.Finish() and an
iterOutcome sentinel driving the outer loop. database_query arguments are
redacted (keys only) to avoid leaking raw SQL into the observability
backend, mirroring the existing UI hint policy.
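
A minimal sketch of the pattern described above: one function per iteration, one deferred `Finish`, and an outcome value that tells the outer loop what to do. The `span` and `step` types, the outcome names, and the 1-based round numbering are all assumptions made for illustration; only `iterOutcome` and the `agent.round.<N>` naming come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// iterOutcome tells the outer loop what to do after one iteration.
// The concrete values are hypothetical.
type iterOutcome int

const (
	outcomeContinue iterOutcome = iota // run another round
	outcomeStop                        // natural stop or final answer
	outcomeAbort                       // cancellation or error
)

func (o iterOutcome) String() string {
	switch o {
	case outcomeContinue:
		return "continue"
	case outcomeStop:
		return "stop"
	case outcomeAbort:
		return "abort"
	}
	return fmt.Sprintf("iterOutcome(%d)", int(o))
}

// span is a hypothetical stand-in for a tracing span; it counts Finish
// calls so the "exactly once" property can be checked.
type span struct {
	name     string
	finished int
}

func (s *span) Finish() { s.finished++ }

// step is a hypothetical, pre-computed result of one round.
type step struct {
	finalAnswer bool
	err         error
}

var errCancelled = errors.New("context cancelled")

// runIteration has several return paths but a single deferred Finish,
// so every exit closes the span.
func runIteration(sp *span, st step) (iterOutcome, error) {
	defer sp.Finish()
	if st.err != nil {
		return outcomeAbort, st.err
	}
	if st.finalAnswer {
		return outcomeStop, nil
	}
	return outcomeContinue, nil
}

// runLoop drives iterations until one reports a non-continue outcome.
func runLoop(steps []step) ([]*span, iterOutcome, error) {
	var spans []*span
	for i, st := range steps {
		sp := &span{name: fmt.Sprintf("agent.round.%d", i+1)}
		spans = append(spans, sp)
		out, err := runIteration(sp, st)
		if out != outcomeContinue {
			return spans, out, err
		}
	}
	return spans, outcomeContinue, nil
}
```

Keeping the exit decision in a return value rather than in `break` / `return` statements inside the loop body is what allows a single `defer` to cover every path.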

Adds unit tests for the new helpers (truncateForLangfuse, argKeys, dataKeys,
finishToolSpan nil-safety, iterOutcome.String).
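
The keys-only redaction of database_query arguments might look roughly like the following. The signature of `argKeys`, the `spanArguments` helper, and the `redacted_keys` field name are assumptions; the commit message only states that the keys are kept and the values are dropped.

```go
package main

import (
	"encoding/json"
	"sort"
)

// argKeys returns the sorted top-level keys of a JSON object, or nil when
// raw is not a JSON object. The signature is an assumption.
func argKeys(raw string) []string {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		return nil
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// spanArguments is a hypothetical helper that decides what to record on a
// tool span: raw arguments for most tools, key names only for
// database_query so raw SQL never reaches the observability backend.
func spanArguments(tool, raw string) any {
	if tool == "database_query" {
		return map[string]any{"redacted_keys": argKeys(raw)}
	}
	return raw
}
```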
2026-04-24 19:58:08 +08:00
wizardchen
beca2b89a3 feat(observability): extend Langfuse tracing across asynq pipeline
Previously the Langfuse integration only traced in-process HTTP requests
(chat / search / eval), so file uploads and every downstream asynq task
(document parse, chunk embedding, OCR/VLM, summary / question gen, wiki
ingest, datasource sync, etc.) produced either disconnected shallow
traces or no observation at all.

This change threads one trace end-to-end:

- tracer: add SPAN observation type and StartSpan; add ResumeTrace so a
  worker can attach to an upstream trace without emitting a duplicate
  trace-create; StartGeneration now auto-picks parentObservationId from
  ctx so nested trace -> span -> generation trees render correctly.
- types.TracingContext + LangfuseTracingCarrier: embed on all 17 asynq
  payloads so trace_id / parent_obs_id / user_id / session_id serialise
  into every job.
- langfuse.InjectTracing: injected at 28 enqueue sites before json.Marshal
  so the HTTP-layer trace survives the Redis hop.
- langfuse.AsynqMiddleware: mux.Use hook that peeks the payload, either
  resumes the upstream trace or opens a standalone asynq.<type> trace
  for scheduled jobs, and wraps the handler in a SPAN with task metadata
  (id / queue / retry / payload_bytes) plus ERROR level on failure.
- GinMiddleware.shouldTrace: whitelist ingestion / knowledge-mutation /
  FAQ / wiki / datasource endpoints so the root trace actually starts.
- Tests: tracer_test.go covers span nesting, error status, and
  ResumeTrace no-trace-create guarantee; asynq_test.go covers
  InjectTracing round-trip, middleware resume path, and standalone
  trace fallback.
- Docs: docs/Langfuse集成.md now lists the covered task types
  and documents the cross-process propagation model.
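
The "auto-pick the parent from ctx" behaviour in the tracer bullet can be illustrated with a small sketch. Everything here (`recorder`, `observation`, `obsKey`, the ID format) is hypothetical and stands in for the real tracer; it only shows how storing the current observation ID in the context yields a trace -> span -> generation tree.

```go
package main

import (
	"context"
	"fmt"
)

// obsKey is a hypothetical context key holding the current observation ID.
type obsKey struct{}

// observation is a simplified record of a span or generation.
type observation struct {
	ID       string
	ParentID string
	Kind     string
}

// recorder is a hypothetical in-memory tracer.
type recorder struct {
	next   int
	events []observation
}

// start opens an observation whose parent is whatever observation is
// already stored in ctx, then returns a child context carrying the new ID.
func (r *recorder) start(ctx context.Context, kind string) (context.Context, observation) {
	r.next++
	o := observation{ID: fmt.Sprintf("%s-%d", kind, r.next), Kind: kind}
	if parent, ok := ctx.Value(obsKey{}).(string); ok {
		o.ParentID = parent
	}
	r.events = append(r.events, o)
	return context.WithValue(ctx, obsKey{}, o.ID), o
}

// demoNesting opens a span and then a generation under it.
func demoNesting() (span, gen observation) {
	r := &recorder{}
	ctx, span := r.start(context.Background(), "span")
	_, gen = r.start(ctx, "generation")
	return span, gen
}
```

Because the parent is read from the context rather than passed explicitly, call sites that already thread `ctx` get correct nesting without signature changes.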

No behavioural change when Langfuse is disabled (all new code paths are
no-ops and carriers serialise to empty strings with omitempty).
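
A sketch of how an embedded carrier with omitempty tags behaves, using only encoding/json. The field names follow the commit message; the JSON tags, the `parsePayload` type, and both helper functions are assumptions made for illustration.

```go
package main

import "encoding/json"

// TracingContext carries trace identifiers inside a task payload.
// The JSON tags are assumed, not taken from the repository.
type TracingContext struct {
	TraceID     string `json:"trace_id,omitempty"`
	ParentObsID string `json:"parent_obs_id,omitempty"`
	UserID      string `json:"user_id,omitempty"`
	SessionID   string `json:"session_id,omitempty"`
}

// parsePayload is a hypothetical task payload that embeds the carrier, so
// the tracing fields are flattened into the same JSON object.
type parsePayload struct {
	TracingContext
	DocumentID string `json:"document_id"`
}

// marshalWithTracing sets the carrier and serialises the payload, the way
// an enqueue site would do before handing bytes to the queue.
func marshalWithTracing(p parsePayload, tc TracingContext) ([]byte, error) {
	p.TracingContext = tc
	return json.Marshal(p)
}

// peekTracing decodes only the tracing fields from a raw payload, as a
// worker-side middleware might before deciding whether to resume a trace.
func peekTracing(raw []byte) (TracingContext, bool) {
	var tc TracingContext
	if err := json.Unmarshal(raw, &tc); err != nil {
		return TracingContext{}, false
	}
	return tc, tc.TraceID != ""
}
```

With an empty carrier the payload serialises exactly as it did before the fields existed, which is the sense in which a disabled integration leaves the wire format unchanged.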
2026-04-24 13:16:47 +08:00
wizardchen
492e92580b feat(observability): integrate Langfuse for LLM token tracking and tracing
Closes #620 #497. Add opt-in Langfuse observability covering all five
model types (chat, embedding, rerank, VLM, ASR) with HTTP-request-scoped
traces and Docker Compose support (both cloud and self-hosted).

Core package internal/tracing/langfuse:
- HTTP client with batched async ingestion (non-blocking in request path)
- Sampling, environment / release tagging, and graceful fallback when
  LANGFUSE_* env vars are absent (wrappers become no-ops)
- Gin middleware opens one trace per traced request and finishes it after
  the handler chain returns, attaching method / path / user / session
- Trace context is stored under a typed key exported from internal/types
  so logger.CloneContext can preserve it across handler / goroutine
  boundaries (otherwise each LLM call auto-created an orphan trace,
  fragmenting one request into many)
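
Why the typed key matters can be shown with a small sketch. `contextKey`, `traceKey`, and `cloneContext` are hypothetical; the real `logger.CloneContext` may work differently. The assumption illustrated is that a clone helper copies a known list of keys into a fresh context, so a value stored under a key the helper does not know about is lost.

```go
package main

import "context"

// contextKey is a hypothetical typed key type shared between packages.
type contextKey string

// traceKey is the hypothetical key under which the active trace is stored.
const traceKey contextKey = "langfuse_trace"

// cloneContext builds a fresh context, detached from the source's
// cancellation, that carries over only the listed keys.
func cloneContext(src context.Context, keys ...contextKey) context.Context {
	dst := context.Background()
	for _, k := range keys {
		if v := src.Value(k); v != nil {
			dst = context.WithValue(dst, k, v)
		}
	}
	return dst
}

// demoClone reports whether the trace survives a clone that knows the key
// and whether it is dropped by one that does not.
func demoClone() (kept, dropped bool) {
	parent, cancel := context.WithCancel(
		context.WithValue(context.Background(), traceKey, "trace-1"))
	cancel() // the request context ends; the clone must stay usable

	withKey := cloneContext(parent, traceKey)
	withoutKey := cloneContext(parent)

	kept = withKey.Value(traceKey) == "trace-1" && withKey.Err() == nil
	dropped = withoutKey.Value(traceKey) == nil
	return kept, dropped
}
```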

Per-model generation wrappers (opt-in via NewChat/NewEmbedder/...):
- chat: captures prompt, streaming output, token usage + TTFT
- embedding: approximates tokens when the provider omits usage
- rerank: previews query/docs, summarizes results to keep payload small
- vlm: records image count and total bytes, never uploads raw pixels
- asr: records file size and audio duration, never uploads audio bytes
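
The commit message does not say how embedding tokens are approximated. As a sketch of the fallback shape only, the example below assumes a rough heuristic of one token per four characters; both function names and the heuristic itself are assumptions, not the repository's method.

```go
package main

import "unicode/utf8"

// estimateTokens applies an assumed heuristic of roughly one token per
// four characters, rounding up per input text.
func estimateTokens(texts []string) int {
	total := 0
	for _, t := range texts {
		n := utf8.RuneCountInString(t)
		total += (n + 3) / 4
	}
	return total
}

// promptTokens prefers the provider-reported count and falls back to the
// estimate only when the provider omitted usage (reported <= 0).
func promptTokens(reported int, texts []string) (tokens int, estimated bool) {
	if reported > 0 {
		return reported, false
	}
	return estimateTokens(texts), true
}
```

Returning the `estimated` flag lets the caller mark approximated counts in span metadata so they are not mistaken for provider-reported usage.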

Async title generation (GenerateTitleAsync) now forwards the trace key
into the goroutine so title calls appear under the parent chat trace.

Docker Compose:
- LANGFUSE_* env passthrough on the `app` service for cloud deployments
- Optional `langfuse` profile spins up a self-hosted Langfuse stack that
  reuses WeKnora's existing PostgreSQL (separate database via an idempotent
  init container that fixes ICU collation drift) and Redis (separate DB
  number), adding only ClickHouse, MinIO, web and worker containers
- web/worker entrypoints URL-encode DB_PASSWORD / REDIS_PASSWORD at start
  to avoid Prisma P1013 when passwords contain @ / # / etc.
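
The entrypoints themselves are presumably shell scripts; the Go sketch below only illustrates the underlying problem and fix, namely that reserved characters in a password must be percent-encoded before being placed in a connection URL. The `databaseURL` helper, the `postgresql` scheme, and the example values are assumptions.

```go
package main

import "net/url"

// databaseURL builds a connection URL and lets net/url percent-encode the
// password, so characters such as @ or # cannot be misread as URL
// delimiters. Hypothetical helper for illustration.
func databaseURL(user, password, host, db string) string {
	u := url.URL{
		Scheme: "postgresql",
		User:   url.UserPassword(user, password),
		Host:   host,
		Path:   "/" + db,
	}
	return u.String()
}
```

For example, `databaseURL("langfuse", "p@ss#1", "postgres:5432", "langfuse")` yields `postgresql://langfuse:p%40ss%231@postgres:5432/langfuse`, whereas naive string concatenation would leave the `@` and `#` unescaped and produce an invalid URL.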

Docs: docs/Langfuse集成.md covers cloud vs self-hosted, per-model usage
strategy, code map, and resource footprint.
2026-04-24 10:29:19 +08:00