WeKnora

pub_soft/WeKnora

mirror of https://github.com/Tencent/WeKnora.git synced 2026-06-04 13:30:32 +08:00

Commit Graph

Author

SHA1

Message

Date

wizardchen

c12296aa88

feat(observability): instrument agent ReAct loop and tool calls in Langfuse

The existing Langfuse integration covered Chat / Embedding / Rerank / VLM /
ASR generations plus the HTTP + asynq spans, but the agent's own execution
tree was invisible: tool calls never appeared, multi-round ReAct iterations
were flat under the HTTP trace, and there was no single node representing
"one agent run".

This change adds three levels of agent-side spans:

  - agent.execute       — wraps AgentEngine.Execute, records query preview,
                           knowledge bases, allowed tools, final-answer length
                           and totals on finish.
  - agent.round.<N>     — wraps each ReAct iteration; records finish_reason,
                           tool-call count, token usage and duration.
  - agent.tool.<name>   — wraps each tool invocation; records arguments,
                           success, duration, output preview (rune-safe, 4KB
                           cap), error, data keys and image count.
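
A "rune-safe" cap means the preview is cut on a UTF-8 character boundary rather than mid-sequence. The following is a minimal sketch of that idea, not the actual truncateForLangfuse helper: the name truncatePreview and its signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncatePreview returns at most maxBytes bytes of s without splitting a
// multi-byte UTF-8 sequence. Hypothetical helper, named for this sketch only.
func truncatePreview(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// Step back while the byte at the cut is a continuation byte, so the
	// cut always lands on the first byte of a rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	// Each of these characters is 3 bytes, so a 4-byte cap keeps only one.
	out := truncatePreview("日本語のテキスト", 4)
	fmt.Printf("%q valid=%v\n", out, utf8.ValidString(out))
}
```

A 4KB cap would correspond to calling such a helper with maxBytes set to 4096.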

To keep the loop's many exit paths (natural stop, stuck loop, empty-content
retry, final_answer, context cancellation) span-safe, the iteration body was
extracted into runReActIteration with a single defer span.Finish() and an
iterOutcome sentinel driving the outer loop. database_query arguments are
redacted (keys only) to avoid leaking raw SQL into the observability
backend, mirroring the existing UI hint policy.
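
The pattern described above (one function per iteration, one deferred Finish, and an outcome value that steers the outer loop) can be sketched as follows. This is an illustration under assumptions, not the code from this commit: the span type, the outcome names, and the demo helper are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// span is a hypothetical stand-in for a tracing span.
type span struct {
	name     string
	finished int // number of Finish calls, so leaks or double finishes show up
}

func (s *span) Finish() { s.finished++ }

// iterOutcome tells the outer loop what to do after one iteration.
type iterOutcome int

const (
	outcomeContinue iterOutcome = iota
	outcomeStop
	outcomeCancelled
)

func (o iterOutcome) String() string {
	switch o {
	case outcomeContinue:
		return "continue"
	case outcomeStop:
		return "stop"
	case outcomeCancelled:
		return "cancelled"
	default:
		return fmt.Sprintf("iterOutcome(%d)", int(o))
	}
}

// runIteration owns exactly one span. The single deferred Finish runs on
// every return path, so an early exit cannot leave the span open.
func runIteration(ctx context.Context, round int, done func(int) bool, spans *[]*span) iterOutcome {
	sp := &span{name: fmt.Sprintf("agent.round.%d", round)}
	*spans = append(*spans, sp)
	defer sp.Finish()

	if ctx.Err() != nil {
		return outcomeCancelled
	}
	if done(round) {
		return outcomeStop
	}
	return outcomeContinue
}

// runLoop drives iterations until one reports something other than continue.
func runLoop(ctx context.Context, maxRounds int, done func(int) bool) (iterOutcome, []*span) {
	var spans []*span
	last := outcomeContinue
	for round := 1; round <= maxRounds; round++ {
		if last = runIteration(ctx, round, done, &spans); last != outcomeContinue {
			break
		}
	}
	return last, spans
}

// demo runs the loop with a fresh context, optionally cancelled up front.
func demo(cancelFirst bool, stopAt int) (iterOutcome, []*span) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return runLoop(ctx, 5, func(r int) bool { return r == stopAt })
}

func main() {
	outcome, spans := demo(false, 3)
	fmt.Println(outcome, len(spans))
}
```

Because the outer loop only inspects the returned outcome, adding another exit path to the iteration body does not require another Finish call.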

Adds unit tests for the new helpers (truncateForLangfuse, argKeys, dataKeys,
finishToolSpan nil-safety, iterOutcome.String).

2026-04-24 19:58:08 +08:00

wizardchen

beca2b89a3

feat(observability): extend Langfuse tracing across asynq pipeline

Previously the Langfuse integration only traced in-process HTTP requests
(chat / search / eval), so file uploads and every downstream asynq task
(document parse, chunk embedding, OCR/VLM, summary / question gen, wiki
ingest, datasource sync, etc.) produced either disconnected shallow
traces or no observation at all.

This change threads one trace end-to-end:

- tracer: add SPAN observation type and StartSpan; add ResumeTrace so a
  worker can attach to an upstream trace without emitting a duplicate
  trace-create; StartGeneration now auto-picks parentObservationId from
  ctx so nested trace -> span -> generation trees render correctly.
- types.TracingContext + LangfuseTracingCarrier: embed on all 17 asynq
  payloads so trace_id / parent_obs_id / user_id / session_id serialise
  into every job.
- langfuse.InjectTracing: injected at 28 enqueue sites before json.Marshal
  so the HTTP-layer trace survives the Redis hop.
- langfuse.AsynqMiddleware: mux.Use hook that peeks the payload, either
  resumes the upstream trace or opens a standalone asynq.<type> trace
  for scheduled jobs, and wraps the handler in a SPAN with task metadata
  (id / queue / retry / payload_bytes) plus ERROR level on failure.
- GinMiddleware.shouldTrace: whitelist ingestion / knowledge-mutation /
  FAQ / wiki / datasource endpoints so the root trace actually starts.
- Tests: tracer_test.go covers span nesting, error status, and
  ResumeTrace no-trace-create guarantee; asynq_test.go covers
  InjectTracing round-trip, middleware resume path, and standalone
  trace fallback.
- Docs: docs/Langfuse集成.md now lists the covered task types
  and documents the cross-process propagation model.
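
The resume-or-standalone decision made by the worker middleware can be sketched without the asynq library. Everything below is an assumption for illustration: the handler type, the carrier and traceInfo structs, the ID generator, and the task type names are hypothetical, and the real middleware hooks in through mux.Use rather than a plain function wrapper.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

type traceKey struct{}

// carrier holds only the field this sketch peeks at in the payload.
type carrier struct {
	TraceID string `json:"trace_id,omitempty"`
}

type traceInfo struct {
	ID      string
	Name    string
	Resumed bool // true when the trace was started upstream
}

type spanRecord struct {
	Trace traceInfo
	Level string // "DEFAULT" or "ERROR"
}

// handler is a simplified, hypothetical task handler signature.
type handler func(ctx context.Context, taskType string, payload []byte) error

// tracingMiddleware peeks at the payload, resumes the upstream trace when a
// trace ID is present, and otherwise opens a standalone one.
func tracingMiddleware(newID func() string, report func(spanRecord), next handler) handler {
	return func(ctx context.Context, taskType string, payload []byte) error {
		var c carrier
		_ = json.Unmarshal(payload, &c) // payloads without tracing fields leave c empty
		info := traceInfo{ID: c.TraceID, Name: "asynq." + taskType, Resumed: c.TraceID != ""}
		if !info.Resumed {
			info.ID = newID()
		}
		err := next(context.WithValue(ctx, traceKey{}, info), taskType, payload)
		level := "DEFAULT"
		if err != nil {
			level = "ERROR"
		}
		report(spanRecord{Trace: info, Level: level})
		return err
	}
}

// runTask wires the middleware around a trivial handler and returns the
// span that was reported for it.
func runTask(taskType, payload string, fail bool) spanRecord {
	var got spanRecord
	h := tracingMiddleware(
		func() string { return "generated-id" },
		func(r spanRecord) { got = r },
		func(context.Context, string, []byte) error {
			if fail {
				return fmt.Errorf("task failed")
			}
			return nil
		},
	)
	_ = h(context.Background(), taskType, []byte(payload))
	return got
}

func main() {
	fmt.Printf("%+v\n", runTask("document:parse", `{"trace_id":"abc123"}`, false))
	fmt.Printf("%+v\n", runTask("datasource:sync", `{}`, true))
}
```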

No behavioural change when Langfuse is disabled (all new code paths are
no-ops and carriers serialise to empty strings with omitempty).
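
The omitempty behaviour is what keeps payloads unchanged when tracing is off. A minimal sketch, assuming a carrier struct with the four fields named above embedded in a task payload; TracingCarrier, ParsePayload and document_id are hypothetical names, not the types from this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TracingCarrier is a hypothetical stand-in for the embedded tracing fields.
type TracingCarrier struct {
	TraceID     string `json:"trace_id,omitempty"`
	ParentObsID string `json:"parent_obs_id,omitempty"`
	UserID      string `json:"user_id,omitempty"`
	SessionID   string `json:"session_id,omitempty"`
}

// ParsePayload is a hypothetical task payload that embeds the carrier, so the
// tracing fields are promoted into the same JSON object.
type ParsePayload struct {
	TracingCarrier
	DocumentID string `json:"document_id"`
}

func encode(p ParsePayload) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", fmt.Errorf("marshal payload: %w", err)
	}
	return string(b), nil
}

func decode(s string) (ParsePayload, error) {
	var p ParsePayload
	if err := json.Unmarshal([]byte(s), &p); err != nil {
		return p, fmt.Errorf("unmarshal payload: %w", err)
	}
	return p, nil
}

func main() {
	// Tracing disabled: the carrier is empty and adds nothing to the JSON.
	plain, _ := encode(ParsePayload{DocumentID: "doc-1"})
	fmt.Println(plain) // {"document_id":"doc-1"}

	// Tracing enabled: fields are set before marshalling and survive the hop.
	p := ParsePayload{DocumentID: "doc-1"}
	p.TraceID, p.ParentObsID = "t1", "o1"
	traced, _ := encode(p)
	fmt.Println(traced) // {"trace_id":"t1","parent_obs_id":"o1","document_id":"doc-1"}
}
```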

2026-04-24 13:16:47 +08:00

wizardchen

492e92580b

feat(observability): integrate Langfuse for LLM token tracking and tracing

Closes #620 #497. Add opt-in Langfuse observability covering all five
model types (chat, embedding, rerank, VLM, ASR) with HTTP-request-scoped
traces and Docker Compose support (both cloud and self-hosted).

Core package internal/tracing/langfuse:
- HTTP client with batched async ingestion (non-blocking in request path)
- Sampling, environment / release tagging, and graceful fallback when
  LANGFUSE_* env vars are absent (wrappers become no-ops)
- Gin middleware opens one trace per traced request and finishes it after
  the handler chain returns, attaching method / path / user / session
- Trace context is stored under a typed key exported from internal/types
  so logger.CloneContext can preserve it across handler / goroutine
  boundaries (otherwise each LLM call auto-created an orphan trace,
  fragmenting one request into many)
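
The graceful fallback can be pictured as a constructor that hands back a no-op implementation when credentials are missing, so call sites never need to check whether tracing is on. The sketch below is an assumption-laden illustration: the Tracer interface and both implementations are hypothetical, and LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are assumed variable names, since the commit message only says LANGFUSE_*.

```go
package main

import "fmt"

// Tracer is a hypothetical minimal tracing interface for this sketch.
type Tracer interface {
	Enabled() bool
	StartTrace(name string) (finish func())
}

// noopTracer satisfies Tracer but does nothing.
type noopTracer struct{}

func (noopTracer) Enabled() bool            { return false }
func (noopTracer) StartTrace(string) func() { return func() {} }

// recordingTracer stands in for a real client and just records events.
type recordingTracer struct{ events []string }

func (t *recordingTracer) Enabled() bool { return true }

func (t *recordingTracer) StartTrace(name string) func() {
	t.events = append(t.events, fmt.Sprintf("start %s", name))
	return func() { t.events = append(t.events, fmt.Sprintf("finish %s", name)) }
}

// NewTracer falls back to the no-op tracer when either assumed credential
// variable is empty. getenv is injected so the lookup is easy to test.
func NewTracer(getenv func(string) string) Tracer {
	if getenv("LANGFUSE_PUBLIC_KEY") == "" || getenv("LANGFUSE_SECRET_KEY") == "" {
		return noopTracer{}
	}
	return &recordingTracer{}
}

func main() {
	env := map[string]string{} // empty: tracing is not configured
	t := NewTracer(func(k string) string { return env[k] })
	finish := t.StartTrace("POST /chat") // safe to call either way
	finish()
	fmt.Println("tracing enabled:", t.Enabled())
}
```

In a real program the lookup function would typically be os.Getenv.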

Per-model generation wrappers (opt-in via NewChat/NewEmbedder/...):
- chat: captures prompt, streaming output, token usage + TTFT
- embedding: approximates tokens when the provider omits usage
- rerank: previews query/docs, summarizes results to keep payload small
- vlm: records image count and total bytes, never uploads raw pixels
- asr: records file size and audio duration, never uploads audio bytes
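
For the embedding case, the commit message does not say how tokens are approximated. The sketch below shows one possible fallback purely as an assumption: it prefers the provider-reported count and otherwise estimates roughly one token per four characters. The heuristic, the usage struct and resolveUsage are all hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// usage is a hypothetical record of prompt tokens for one embedding call.
type usage struct {
	PromptTokens int
	Estimated    bool // true when the count was not reported by the provider
}

// resolveUsage prefers the provider-reported count and falls back to an
// estimate. The 4-characters-per-token ratio is an assumed heuristic, not a
// value taken from the commit.
func resolveUsage(reported int, inputs []string) usage {
	if reported > 0 {
		return usage{PromptTokens: reported}
	}
	runes := 0
	for _, in := range inputs {
		runes += utf8.RuneCountInString(in)
	}
	return usage{PromptTokens: (runes + 3) / 4, Estimated: true} // round up
}

func main() {
	fmt.Printf("%+v\n", resolveUsage(12, []string{"ignored"}))
	fmt.Printf("%+v\n", resolveUsage(0, []string{"hello world"}))
}
```

Marking estimated counts explicitly keeps them distinguishable from provider-reported usage when the numbers are later aggregated.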

Async title generation (GenerateTitleAsync) now forwards the trace key
into the goroutine so title calls appear under the parent chat trace.
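
Forwarding a trace key into a goroutine usually means copying the value into a context that outlives the request. A minimal sketch, assuming a string trace ID stored under a typed key; traceKey, detach and traceIDInGoroutine are hypothetical names and do not reproduce logger.CloneContext or GenerateTitleAsync.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// traceKey is a typed context key, so no other package can collide with it.
type traceKey struct{}

// detach returns a context that carries the parent's trace ID but not its
// cancellation, which suits work that outlives the HTTP request.
func detach(parent context.Context) context.Context {
	ctx := context.Background()
	if id, ok := parent.Value(traceKey{}).(string); ok {
		ctx = context.WithValue(ctx, traceKey{}, id)
	}
	return ctx
}

// traceIDInGoroutine reports which trace ID a background goroutine observes,
// with and without forwarding the key from the request context.
func traceIDInGoroutine(requestTraceID string, forward bool) string {
	reqCtx, cancel := context.WithCancel(
		context.WithValue(context.Background(), traceKey{}, requestTraceID))

	bg := context.Background()
	if forward {
		bg = detach(reqCtx)
	}
	cancel() // the request ends before the background work runs

	var wg sync.WaitGroup
	var seen string
	wg.Add(1)
	go func() {
		defer wg.Done()
		seen, _ = bg.Value(traceKey{}).(string)
	}()
	wg.Wait()
	return seen
}

func main() {
	fmt.Printf("forwarded: %q\n", traceIDInGoroutine("trace-1", true))
	fmt.Printf("not forwarded: %q\n", traceIDInGoroutine("trace-1", false))
}
```

Without the forwarded key the goroutine sees no trace ID, which is the situation where a tracer would have to start an unrelated trace.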

Docker Compose:
- LANGFUSE_* env passthrough on the `app` service for cloud deployments
- Optional `langfuse` profile spins up a self-hosted Langfuse stack that
  reuses WeKnora's existing PostgreSQL (separate database via an idempotent
  init container that fixes ICU collation drift) and Redis (separate DB
  number), adding only ClickHouse, MinIO, web and worker containers
- web/worker entrypoints URL-encode DB_PASSWORD / REDIS_PASSWORD at start
  to avoid Prisma P1013 when passwords contain @ / # / etc.
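
The reason encoding matters is that characters such as @ and # are delimiters inside a connection URL. The entrypoints themselves are container startup scripts and are not shown here; the sketch below only illustrates the encoding step in Go, with a hypothetical helper name and made-up credentials.

```go
package main

import (
	"fmt"
	"net/url"
)

// databaseURL builds a connection URL with the credentials percent-encoded.
// Hypothetical helper for illustration; host and database are placeholders.
func databaseURL(user, password, host, db string) string {
	u := url.URL{
		Scheme: "postgresql",
		User:   url.UserPassword(user, password),
		Host:   host,
		Path:   "/" + db,
	}
	return u.String()
}

// passwordFromURL parses a connection URL and returns the decoded password.
func passwordFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse connection URL: %w", err)
	}
	pw, _ := u.User.Password()
	return pw, nil
}

func main() {
	raw := databaseURL("postgres", "p@ss#1", "postgres:5432", "langfuse")
	fmt.Println(raw) // postgresql://postgres:p%40ss%231@postgres:5432/langfuse

	pw, err := passwordFromURL(raw)
	fmt.Println(pw, err) // p@ss#1 <nil>
}
```

Left unencoded, the @ in the password would be read as the end of the credentials and the # as the start of a fragment, which makes the URL invalid for any parser.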

Docs: docs/Langfuse集成.md covers cloud vs self-hosted, per-model usage
strategy, code map, and resource footprint.

2026-04-24 10:29:19 +08:00

3 Commits