docs(cli): README / AGENTS.md / CHANGELOG + CI parity test

Wire-contract documentation and the CI check that keeps it honest.

* cli/README.md gains a verbatim --help block (top-level + subtrees), an Exit codes table covering 0/1/2/3/4/5/6/7/10/124/130, a "Status vs check" verb-pair subtable, and a "doc wait" paragraph spelling out the four exit codes (0 / 1 / 124 / 130). The api passthrough note trims storage provider out of the deep-config list now that kb create --storage-provider is a polished flag.

* cli/AGENTS.md becomes the contributor guide: build/test, CRUD flag conventions, the status/check verb pattern, long-poll wait commands, the SetAgentHelp pattern, and a full Error code reference with 35 typed codes mapped to namespaces, exit codes, retryable / hint guidance. Reference section is bracketed by HTML markers so a CI parity test can keep it in sync with AllCodes().

* cli/internal/cmdutil/errors_doc_test.go enforces parity: every code in AllCodes() must appear in AGENTS.md inside the markers, and AGENTS.md must not reference codes that no longer exist. Fails CI if a new typed code is added without documentation.

* CHANGELOG.md gets the v0.6 entry:
  - BREAKING: --json / --no-stream / WEKNORA_SDK_DEBUG / kb create --name
  - Added: --format / --jq / doc wait / --log-level / kb-and-agent status & check / multi-id delete / api --paginate / MCP schema extension / SetAgentHelp / signal-aware ctx / kb create --storage-provider / new operation.* namespace
  - Changed: multi-id partial-failure exit code, doc upload FlagError, --log-level FlagError, multi-id stdout cleanup, README / AGENTS.md changes
  - Migration from v0.5: a section walking every BREAKING through its v0.6 replacement
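
The parity check described in this message can be pictured with a small sketch. This is not the contents of errors_doc_test.go, which this excerpt does not show; it is a minimal illustration assuming the reference table keeps each typed code in backticks in the first cell of a row. The helper names `documentedCodes` and `parityDiff`, the sample table, and the `registered` slice (a stand-in for AllCodes()) are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

const (
	startMarker = "<!-- ERROR_REFERENCE_START -->"
	endMarker   = "<!-- ERROR_REFERENCE_END -->"
)

// codeCell matches the first cell of a table row such as
// "| `auth.forbidden` | 3 | no | ... |" and captures the typed code.
var codeCell = regexp.MustCompile("(?m)^\\|\\s*`([a-z_]+\\.[a-z_]+)`\\s*\\|")

// documentedCodes returns the set of codes listed between the two markers.
func documentedCodes(doc string) (map[string]bool, error) {
	start := strings.Index(doc, startMarker)
	end := strings.Index(doc, endMarker)
	if start < 0 || end < 0 || end < start {
		return nil, fmt.Errorf("reference markers missing or out of order")
	}
	section := doc[start+len(startMarker) : end]
	found := map[string]bool{}
	for _, m := range codeCell.FindAllStringSubmatch(section, -1) {
		found[m[1]] = true
	}
	return found, nil
}

// parityDiff reports codes that are registered but undocumented (missing)
// and codes that are documented but no longer registered (stale).
func parityDiff(registered []string, documented map[string]bool) (missing, stale []string) {
	reg := map[string]bool{}
	for _, c := range registered {
		reg[c] = true
		if !documented[c] {
			missing = append(missing, c)
		}
	}
	for c := range documented {
		if !reg[c] {
			stale = append(stale, c)
		}
	}
	sort.Strings(missing)
	sort.Strings(stale)
	return missing, stale
}

func main() {
	// Made-up two-row table; the real input would be cli/AGENTS.md.
	doc := startMarker + "\n" +
		"| Code | Exit |\n|---|---|\n" +
		"| `auth.forbidden` | 3 |\n" +
		"| `local.removed_code` | 1 |\n" +
		endMarker + "\n"
	// Hypothetical stand-in for the AllCodes() registry.
	registered := []string{"auth.forbidden", "operation.timeout"}

	documented, err := documentedCodes(doc)
	if err != nil {
		panic(err)
	}
	missing, stale := parityDiff(registered, documented)
	fmt.Println("undocumented codes:", missing)
	fmt.Println("stale table rows:", stale)
}
```

A real test would fail when either list is non-empty, which gives the two-way guarantee the message describes: no undocumented codes and no stale rows.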
2026-06-04 13:30:32 +08:00 · 2026-05-18 01:38:42 +08:00
parent 34bb0b5096
commit 7611d59d71
9 changed files with 388 additions and 68 deletions
--- a/cli/AGENTS.md
+++ b/cli/AGENTS.md
@@ -1,6 +1,8 @@
 # AGENTS.md

-WeKnora CLI (`weknora`) is a noun-verb wrapper around the WeKnora server API; module path `github.com/Tencent/WeKnora/cli`. This file is the developer guide for coding agents and human contributors editing the CLI. The user-facing wire contract (output shape, exit codes, error format) lives in [README.md](README.md).
+This is the WeKnora CLI (`weknora`), a command-line client for the WeKnora RAG server. The module path is `github.com/Tencent/WeKnora/cli`.
+
+The wire contract for AI agents *consuming* `weknora` output (JSON shape, exit codes, error format) lives in [README.md](README.md) — read that if you're integrating with the CLI binary, not modifying it.

 ## Build, Test, and Lint

@@ -21,8 +23,8 @@ Entry point: `cmd/main.go` → `cmd.Execute()` → `cmd.NewRootCmd(cmdutil.New()

 Key packages:

- `cmd/<noun>/` — cobra command implementations, one subdir per noun
- `internal/cmdutil/` — `Factory`, `JSONOptions`, typed `Error`, exit-code mapping, destructive-write confirm, KB id-or-name resolve
+- `cmd/<name>/` — cobra command implementations, one subdir per top-level command
+- `internal/cmdutil/` — `Factory`, `FormatOptions`, typed `Error`, exit-code mapping, destructive-write confirm, KB id-or-name resolve
 - `internal/format/` — bare JSON emitter (`WriteJSON` / `WriteJSONFiltered`)
 - `internal/iostreams/` — global IO singleton + TTY detection + `SetForTest` swap
 - `internal/secrets/` — `Store` interface; `KeyringStore` primary, `FileStore` 0600 fallback, `MemStore` for tests
@@ -48,8 +50,8 @@ Every command follows this structure (see `cmd/kb/list.go`):

 1. `Options` struct with flag-bound fields
 2. `Service` interface declaring only the SDK methods this command calls. `*sdk.Client` satisfies it implicitly via duck typing.
-3. `NewCmd<Verb>(f *cmdutil.Factory) *cobra.Command` constructor — flag registration + `cmdutil.AddJSONFlags`
-4. Separate `run<Verb>(ctx, opts, jopts, svc, args...)` with the business logic — the test injection point
+3. `NewCmd<Verb>(f *cmdutil.Factory) *cobra.Command` constructor — flag registration + `cmdutil.AddFormatFlag`
+4. Separate `run<Verb>(ctx, opts, fopts, svc, args...)` with the business logic — the test injection point

 Key rules:

@@ -64,31 +66,35 @@ Use a Go raw string with `weknora` as the example prefix. Keep one-line `Short`

 ```go
 Example: `  weknora kb view <id>
-  weknora kb view kb_abc --json
-  weknora kb view kb_abc --json=id,name`,
+  weknora kb view kb_abc --format json
+  weknora kb view kb_abc --format json --jq '{id, name}'`,
 ```

 ### JSON Output

-Add `--json` / `--jq` via `cmdutil.AddJSONFlags(cmd, fieldNames)`. In `RunE`:
+Add `--format` / `--jq` via `cmdutil.AddFormatFlag(cmd, fieldNames...)`. In `RunE`:

 ```go
-if jopts.Enabled() {
-    return jopts.Emit(iostreams.IO.Out, result)
+fopts, err := cmdutil.CheckFormatFlag(c)
+if err != nil { return err }
+fopts.ResolveDefault(iostreams.IO.IsStdoutTTY())
+// ...
+if fopts.WantsJSON() {
+    return fopts.Emit(iostreams.IO.Out, result)
 }
 ```

-`Emit` is the single source for the bare-JSON contract — it honors `--json=fields,...` projection and `--jq <expr>` filtering. Never call `format.WriteJSON*` directly from a command. See `cmd/kb/list.go`.
+`Emit` is the single source for the bare-JSON contract — it honors `--format json|ndjson` and `--jq <expr>` filtering. Never call `format.WriteJSON*` directly from a command. See `cmd/kb/list.go`.

 ### Destructive Writes

-Commands that delete / empty / overwrite call `cmdutil.ConfirmDestructive(p, opts.Yes, jopts.Enabled(), what, id)` before mutation. In non-TTY OR `--json` mode without `-y`, it returns `CodeInputConfirmationRequired` → exit 10. See `internal/cmdutil/confirm.go`.
+Commands that delete / empty / overwrite call `cmdutil.ConfirmDestructive(p, opts.Yes, fopts.WantsJSON(), what, id)` before mutation. In non-TTY OR JSON-output mode without `-y`, it returns `CodeInputConfirmationRequired` → exit 10. See `internal/cmdutil/confirm.go`.

 ## Testing

 ### Narrow Service Fakes

-Each command's `runX(ctx, opts, jopts, svc, ...)` takes its interface, not `*sdk.Client`. Tests inject plain-struct fakes:
+Each command's `runX(ctx, opts, fopts, svc, ...)` takes its interface, not `*sdk.Client`. Tests inject plain-struct fakes:

 ```go
 type fakeBarSvc struct {
@@ -165,6 +171,76 @@ Errors print to STDERR via `cmdutil.PrintError(w, err)` as `code: msg\nhint: ...

 User-facing exit-code mapping lives in [README.md "Exit codes"](README.md#exit-codes). When adding a new `ErrorCode` constant, also append to `AllCodes()` so the acceptance contract picks it up.

+## Error code reference
+
+> **Audience:** AI agents and scripted callers parsing `weknora` stderr.
+> Code authors writing new error sites — see [`## Error Handling`](#error-handling) above.
+
+When `weknora` exits non-zero, stderr carries a structured triplet:
+
+```
+<code>: <message>
+hint: <actionable next step>
+```
+
+Agents split the first line at its first colon to extract the typed code. The exit code class (see [`README.md` "Exit codes"](README.md#exit-codes)) controls retry / surface decisions; the typed code disambiguates within a class.
+
+<!-- ERROR_REFERENCE_START -->
+<!-- DO NOT EDIT manually below this marker. Add new codes to
+     internal/cmdutil/errors.go + register in AllCodes() + add a row here.
+     The CI scan in errors_doc_test.go enforces parity. -->
+
+| Code | Exit | Retryable | Default hint |
+|---|---|---|---|
+| `auth.unauthenticated` | 3 | no (run `auth login`) | run `weknora auth login` |
+| `auth.token_expired` | 3 | yes (after refresh) | your session expired; run `weknora auth login` to re-authenticate |
+| `auth.bad_credential` | 3 | no (re-login) | run `weknora auth login` |
+| `auth.forbidden` | 3 | no | active context lacks permission for this resource |
+| `auth.cross_tenant_blocked` | 3 | no | verify tenant context with `weknora auth status` |
+| `auth.tenant_mismatch` | 3 | no | verify tenant context with `weknora auth status` |
+| `input.invalid_argument` | 5 | no | see `weknora <command> --help` for valid usage |
+| `input.missing_flag` | 5 | no | see `weknora <command> --help` for valid usage |
+| `input.confirmation_required` | 10 | **NO automatic retry** | high-risk write - re-run with `-y/--yes` after the user explicitly approves |
+| `resource.not_found` | 4 | no | verify the resource ID and try again |
+| `resource.already_exists` | 1 | no | use a different name or fetch the existing resource |
+| `resource.locked` | 1 | maybe (transient lock) | (no canonical hint; check resource state) |
+| `server.error` | 7 | yes (with backoff for 5xx) | (no canonical hint) |
+| `server.timeout` | 7 | yes (with backoff) | request timed out; retry, or run `weknora doctor` to check connectivity |
+| `server.rate_limited` | 6 | yes (back off, then retry) | rate-limited; retry after a few seconds |
+| `server.session_create_failed` | 1 | yes (with backoff) | could not create a chat session; pass `--session` to reuse an existing session |
+| `server.incompatible_version` | 7 | no (upgrade required) | run `weknora doctor` to see version skew details |
+| `network.error` | 7 | yes (with backoff) | check base URL reachability with `weknora doctor` |
+| `operation.timeout` | 124 | yes (raise `--timeout`) | wait timed out; raise `--timeout` or check the underlying job |
+| `operation.failed` | 1 | no (target reached terminal failure) | one or more targets reached a terminal failure (e.g. doc parse_status=failed) |
+| `operation.cancelled` | 1 (main overrides to 130) | no | command interrupted by SIGINT / SIGTERM. The typed code maps to exit 1, but `main` raises the exit to 130 when the root context was signal-cancelled so the user-visible exit follows Unix signal convention. |
+| `local.config_corrupt` | 1 | no (manual fix) | remove `~/.config/weknora/config.yaml` and re-run `weknora auth login` |
+| `local.context_not_found` | 1 | no | (no canonical hint; check `weknora context list`) |
+| `local.file_io` | 1 | no | check file permissions under `$XDG_CONFIG_HOME/weknora/` |
+| `local.kb_id_required` | 1 | no | run `weknora link` to bind this directory to a knowledge base, or pass `--kb` |
+| `local.kb_not_found` | 1 | no | list available with `weknora kb list` |
+| `local.keychain_denied` | 1 | no (system-level) | verify keyring access; falls back to file storage |
+| `local.project_link_corrupt` | 1 | no | remove `.weknora/project.yaml` and run `weknora link` again |
+| `local.sse_stream_aborted` | 1 | yes (rerun chat / agent invoke) | the streaming answer was cut off mid-flight; retry, or pass `--format json` to buffer the full response |
+| `local.unimplemented` | 1 | no | (planned in a future release) |
+| `local.upload_file_not_found` | 1 | no | verify the path is correct and readable |
+| `local.user_aborted` | 1 | no (user said no) | no action taken; pass `-y/--yes` to skip the confirmation prompt |
+| `mcp.readonly_mode` | 1 | no | MCP tool surface is read-only; mutations not exposed in this mode |
+| `mcp.schema_unknown_command` | 1 | no | (no canonical hint) |
+| `mcp.tool_not_allowed` | 1 | no | MCP tool not in the curated allowlist |
+
+<!-- ERROR_REFERENCE_END -->
+
+### Agent decision shortcuts
+
+For common retry patterns, agents can hardcode:
+
+- `network.*` → retry with exponential backoff
+- `auth.token_expired` → run `weknora auth refresh`, then retry once
+- `server.rate_limited` → back off (Retry-After if present) then retry
+- `operation.timeout` → raise `--timeout` and retry, or surface to user
+- `input.confirmation_required` → **NEVER** auto-pass `-y` without explicit user authorization
+- `*.invalid_argument` / `*.missing_flag` → surface to user (don't retry)
+
 ## MCP Tool Surface

 WeKnora's MCP server exposes a curated read-only tool surface. Many MCP servers in the wild ship write / mutation operations on by default and rely on credential-scope or sandbox restrictions for safety. WeKnora opts for curation instead: the server side doesn't yet enforce per-token scope, so an agent holding a user's token has full write access. Until server-side scope ships, the CLI keeps mutation tools out of the MCP surface as a belt-and-braces second line of defense. When server scope arrives this stance can loosen.
@@ -199,7 +275,7 @@ Before specifying any CLI command, do this in order:

 Rationale: earlier drafts produced three categories of schema errors — fields that didn't exist on the underlying SDK, wrong field counts in user-facing docs, and missing pagination flags — that all stemmed from "design from convention, not from SDK." The fix is canonical: the SDK schema is the ground truth; convention decides names and shapes around that ground truth.

-## CRUD command flag canon
+## CRUD command flag conventions

 CRUD commands follow the **hard-required-flags** pattern: every required input is a flag or positional, and a missing one yields an immediate `input.invalid_argument` exit. The contrast is **TTY-prompts-fill**, where missing input opens an interactive prompt; that pattern is reserved for `auth login` (the one command where a human must be at the terminal).

@@ -216,3 +292,39 @@ Reasons hard-required-flags is the v0.5+ default:
 - Agent-friendly: MCP callers do not stall waiting for stdin prompts.
 - Consistent with every existing non-auth WeKnora command.

+- **Agent help blob (v0.6, partial)**: Commands MAY call
+  `cmdutil.SetAgentHelp(cmd, cmdutil.AgentHelp{...})` to expose a stable
+  JSON used_for / required_flags / examples / output shape. Activated by
+  `WEKNORA_AGENT_HELP=1` at `--help` time. Currently applied to `chat`
+  and `kb list` only — extending to another command requires touching
+  only that command's `NewCmd`.
+
+## Status / check verb pair pattern
+
+When a resource has both a cheap "is it alive?" probe and a deeper
+"verify its dependencies / aggregate state" probe, expose them as two
+verbs so the verb itself communicates cost:
+
+- `status <id>` — single HTTP, returns reachable + cheap fields.
+- `check  <id>` — 1 + N HTTP, adds derived state that needs follow-up
+  calls (e.g., aggregating `failed_count` via doc-list page-walk,
+  probing every KB in an agent's scope).
+
+Current pairs: `kb status` / `kb check`, `agent status` / `agent check`.
+The deep verb's `Long` help text must enumerate the extra HTTP calls so
+cost is predictable.
+
+## Long-poll wait commands
+
+`doc wait <doc-id> [<doc-id>...]` is the model for any future
+`wait` command:
+
+- Always wait-all on multi-target (no fail-fast flag); compose in shell
+  (`weknora doc wait id1 && weknora doc wait id2`) when fail-fast is needed.
+- Exponential backoff with jitter (initial `--interval`, cap 15s).
+- Concurrency capped (5 in flight); large fan-out via `xargs -P`.
+- Exit-code priority: failed (1) > timeout (124) > completed (0). The
+  failed bucket is `operation.failed`, not `server.error` — a target's
+  own terminal failure is not a transient transport issue.
+- Validate `--format` / `--jq` before polling so an invalid flag does
+  not cost the caller a multi-minute poll.
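
The backoff and exit-code rules in the "Long-poll wait commands" section above can be sketched as follows. This is an illustration under assumptions, not the `doc wait` implementation: the guide says "exponential backoff with jitter" and a 15s cap, but the doubling factor and the 20% jitter fraction here are guesses, and every identifier (`nextInterval`, `withJitter`, `outcome`, `exitCode`) is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// maxInterval is the 15s cap named in the guide.
const maxInterval = 15 * time.Second

// nextInterval doubles the current delay and clamps it to maxInterval.
// The factor of two is an assumption; the guide only says "exponential".
func nextInterval(cur time.Duration) time.Duration {
	next := cur * 2
	if next > maxInterval {
		next = maxInterval
	}
	return next
}

// withJitter adds a random extra of up to 20% of d (assumed fraction) so
// concurrent pollers do not hit the server in lockstep.
func withJitter(d time.Duration, rng *rand.Rand) time.Duration {
	return d + time.Duration(rng.Int63n(int64(d)/5+1))
}

// outcome is the terminal state of one wait target.
type outcome int

const (
	completed outcome = iota
	timedOut
	failed
)

// exitCode applies the documented priority:
// failed (1) > timeout (124) > completed (0).
func exitCode(outcomes []outcome) int {
	code := 0
	for _, o := range outcomes {
		switch o {
		case failed:
			return 1
		case timedOut:
			code = 124
		}
	}
	return code
}

func main() {
	rng := rand.New(rand.NewSource(1))
	d := 2 * time.Second // documented default for --interval
	for i := 0; i < 5; i++ {
		fmt.Println("sleep about", withJitter(d, rng))
		d = nextInterval(d)
	}
	fmt.Println("exit", exitCode([]outcome{completed, timedOut, failed}))
}
```

Returning early on the first failed target is safe only for computing the exit code; the wait-all rule still requires polling every target to a terminal state before this function runs.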
--- a/cli/CHANGELOG.md
+++ b/cli/CHANGELOG.md
@@ -12,6 +12,140 @@ CLI history before v0.3 is recorded in the project root

 ## [Unreleased]

+### v0.6 — agent runtime hardening: --format, doc wait, --log-level, status, multi-id delete, paginate
+
+#### BREAKING (v0.5 → v0.6)
+- **`--json` flag removed** → use **`--format json`** (with optional
+  `--jq '<expr>'` for projection / filtering). The v0.5 `--json=fields,...`
+  per-field projection drops entirely; rewrite as
+  `--format json --jq '.[] | {id, name}'` (jq is the canonical projection
+  mechanism going forward).
+- **`--no-stream` flag removed** on `chat` / `agent invoke` → use
+  **`--format json`** to buffer the full answer before printing. The bare
+  text-accumulate use case (TTY but no streaming) is dropped.
+- **`WEKNORA_SDK_DEBUG=1` env removed** → use **`WEKNORA_LOG_LEVEL=debug`**.
+- **`kb create --name <name>` flag removed** → use positional
+  **`kb create <name>`** (consistent with `agent create <name>`).
+
+#### Added
+- **`--format text|json|ndjson`** flag selecting the stdout serialization.
+  Registered per-command (only commands that honor `--format` register it;
+  others reject it with `unknown flag` / exit 2). Output mode auto-resolves
+  to `text` on a TTY and `json` when stdout is piped, so
+  `weknora kb list | jq` works without an explicit flag.
+- **`--jq '<expr>'`** flag pairs with `--format json|ndjson` to filter or
+  project the JSON output via a jq expression.
+- **`weknora doc wait <id> [<id>...]`** — block until every document reaches a
+  terminal `parse_status`. Always wait-all — use shell composition
+  (`weknora doc wait id1 && weknora doc wait id2`) for fail-fast.
+  - `--timeout DURATION` (default 10m; exit 124 on hit)
+  - `--interval DURATION` (default 2s; exponential backoff to 15s + jitter)
+  - Multi-id concurrent (max 5 parallel); exit code priority 1 > 124 > 0
+- **`--log-level error|warn|info|debug`** persistent flag + `WEKNORA_LOG_LEVEL`
+  env. Wires into the SDK's debug logger via the additive
+  `client.SetDebugLevel(level string)` function.
+- **`kb create --storage-provider <local|minio|cos|tos|s3|oss|ks3>`** —
+  sets the new KB's `storage_provider_config.provider` at creation time
+  (server only accepts it on create, not update). Required on self-hosted
+  deployments where the server-side default doesn't pre-populate a
+  provider — without it, subsequent `doc upload` returns `kb not found`.
+- **`weknora kb status <id>`** — fast health snapshot (1 HTTP). Returns
+  reachable / counts / is_processing.
+- **`weknora kb check <id>`** — deep verification: status fields + `failed_count`
+  aggregated via doc list page-walk (1 + N HTTP). The verb split between
+  `status` (read state cheaply) and `check` (actively verify) communicates
+  cost to the caller.
+- **`weknora agent status <id>`** — fast health snapshot (1 HTTP):
+  reachable / model_id.
+- **`weknora agent check <id>`** — deep verification: status fields +
+  `kb_scope_all_reachable` from probing each KB in scope (1 + N HTTP). Same
+  status/check verb split as kb status/check.
+- **`weknora doc delete <doc-id> [<doc-id>...]`** — positional multi-id.
+  Default keep-going on failure. Single `-y/--yes` confirms the entire
+  batch; non-TTY without `-y` still exits 10.
+- **`weknora session delete <session-id> [<session-id>...]`** — positional
+  multi-id with the same keep-going semantics as `doc delete`.
+- **`weknora chunk delete <chunk-id> [<chunk-id>...] --doc <doc-id>`** — positional
+  multi-id, all chunks share the same `--doc` parent (server route requires it).
+- **`weknora api <path> --paginate`** — follows weknora's offset-based
+  pagination (`?page=N&page_size=M`) and merges all pages into a single
+  `{data, total}` JSON response.
+- **MCP `chat` and `agent_invoke` tools** output schemas extended with
+  `thinking` / `tool_calls` / `assistant_message_id`. Tool descriptions
+  call out "server-side accumulated, NOT streaming" (MCP tools/call has
+  no standard partial-response).
+- **`SetAgentHelp` pattern** — `cmdutil.SetAgentHelp(cmd, AgentHelp{...})`
+  exposes a stable JSON used_for / required_flags / examples / output
+  shape, activated by `WEKNORA_AGENT_HELP=1` at `--help` time. Applied
+  to `chat` and `kb list` as proof-of-pattern; extending to another
+  command requires touching only that command's `NewCmd`.
+- **`cli/AGENTS.md`** gains an "Error code reference" section (35 typed
+  codes + exit codes + retryable / hint), with `<!-- ERROR_REFERENCE_START -->`
+  markers and CI parity test (`errors_doc_test.go`) — every new typed
+  code in `AllCodes()` must be documented or CI fails.
+- New `operation.*` typed error namespace for CLI-level wait/poll outcomes:
+  - `operation.timeout` → exit 124 (distinct from `server.timeout` → exit 7;
+    matches the convention from GNU `timeout(1)`). Used by `doc wait` and
+    any future CLI-level wait/poll surfaces.
+  - `operation.failed` → exit 1. Emitted when one or more wait targets
+    reach a terminal failure (`doc wait` finds `parse_status=failed`) or
+    when multi-id `delete` rolls up partial failures. Distinct from
+    `server.error` because the failure is the target's own terminal state,
+    not a transient transport issue — `server.error`'s "retry with backoff"
+    hint would be misleading.
+  - `operation.cancelled` → exit 1, raised to **130** by `main.go` when the
+    root context was signal-cancelled. Surfaced by chat / agent invoke /
+    doc wait on Ctrl-C or SIGTERM. Carries a hint pointing at the signal,
+    not at `-y/--yes` (which would have been the misleading
+    `local.user_aborted` hint).
+- **Signal-aware root context** — `main.go` wires `signal.NotifyContext` for
+  SIGINT and SIGTERM so long-running commands observe `ctx.Done()` and run
+  their cancellation cleanup (re-emit auto-created session id, return
+  `operation.cancelled`); the process exits 130 whenever the context was
+  signal-cancelled, matching Unix signal convention.
+- **MCP tool input renames for consistency**: `doc_view` and `doc_download`
+  now accept `doc_id` (was `knowledge_id`) so every MCP tool that
+  references a document uses the same parameter name as `chunk_list` and
+  the CLI's `<doc-id>` positional.
+- `WriteNDJSON` helper in `internal/format/` (per http://ndjson.org:
+  arrays split per-line, single records emit one line).
+
+#### Changed
+- `cli/README.md` "Exit codes" subsection extended with `124`
+  (`operation.timeout`); rows for `1` and `130` now name `operation.failed`
+  and `operation.cancelled` alongside the existing groupings.
+- `cli/README.md` gains a "Status / check verb pair" subtable under "Health
+  check" and a `doc wait` paragraph with full exit-code list (0/1/124/130).
+- `cli/AGENTS.md` gains design SOPs for **Status / check verb pair pattern**
+  and **Long-poll wait commands**, plus a note on the SetAgentHelp pattern
+  and current coverage (chat / kb list).
+- **Multi-id delete partial-failure exit code**: `doc delete` /
+  `session delete` / `chunk delete` (multi-id mode) now exit `1`
+  (`operation.failed`) when some targets fail, rather than exit `7`
+  (`server.error`). The retry-with-backoff hint for server.* would have
+  misled callers when the actual cause is a target's terminal state.
+- **`doc upload` with no path / no `--from-url`** now exits `2`
+  (`FlagError`, matching cobra's `MinimumNArgs` convention for commands
+  that need a positional), rather than `5` (`input.invalid_argument`).
+- **`--log-level` invalid value** exits `2` (`FlagError`) for consistency
+  with `--format` invalid-value behaviour. Env values still fall through
+  silently (env is best-effort).
+- **Multi-id delete stdout contract**: pre-flight failures (e.g. missing
+  `-y` confirmation) no longer emit the empty `{ok, failed}` envelope to
+  stdout — stdout stays empty per the wire contract in README.md, the
+  typed error goes to stderr only.
+- **Positional id help strings now namespaced** for clarity in both human
+  help and agent `--help` parsing: `<id>` → `<kb-id>` / `<doc-id>` /
+  `<session-id>` on kb / doc / session subtrees. `agent` and `chunk`
+  subtrees were already namespaced. Pure help-text change — argument
+  parsing is unchanged.
+- `chat "<text>"` Use string now shows quotes — matches `agent invoke` and
+  `search chunks` quoting hint for queries that contain spaces.
+
+#### SDK additions (strictly additive)
+- `client.SetDebugLevel(level string)` — programmatic control over the SDK's
+  internal slog debug logger.
+
 ### v0.5 — agent CRUD, chunk subtree, MCP chunk_list, audit-driven cleanup

 #### Added
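
The `WriteNDJSON` behaviour named in the v0.6 entry above ("arrays split per-line, single records emit one line") could look roughly like the sketch below. The real helper's signature in `internal/format/` is not shown in this diff, so `writeNDJSON` and `ndjsonString` are hypothetical names, and routing the value through `encoding/json` twice is a simplification chosen for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// writeNDJSON emits one compact JSON document per line. A top-level array
// is split into one line per element; any other value becomes one line.
func writeNDJSON(w io.Writer, v any) error {
	raw, err := json.Marshal(v)
	if err != nil {
		return err
	}
	items := []json.RawMessage{raw}
	if len(raw) > 0 && raw[0] == '[' {
		// Re-decode the array into its raw elements.
		if err := json.Unmarshal(raw, &items); err != nil {
			return err
		}
	}
	for _, item := range items {
		if _, err := fmt.Fprintf(w, "%s\n", item); err != nil {
			return err
		}
	}
	return nil
}

// ndjsonString is a small convenience wrapper for demos and tests.
func ndjsonString(v any) (string, error) {
	var sb strings.Builder
	err := writeNDJSON(&sb, v)
	return sb.String(), err
}

func main() {
	// Made-up records shaped like the kb list examples in README.md.
	kbs := []map[string]string{
		{"id": "kb_x", "name": "Eng"},
		{"id": "kb_y", "name": "Ops"},
	}
	if err := writeNDJSON(os.Stdout, kbs); err != nil {
		panic(err)
	}
}
```

An empty array produces no lines at all under this sketch, which lets line-oriented consumers treat "no output" as "no records".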
--- a/cli/README.md
+++ b/cli/README.md
@@ -6,8 +6,9 @@ ask streaming RAG questions from your terminal or from an AI agent.

 ```bash
 $ weknora --help
-WeKnora CLI lets you authenticate, browse knowledge bases, and run
-hybrid searches against a WeKnora server from your shell or an AI agent.
+Command-line client for the WeKnora RAG server. Manage knowledge bases
+and documents, run hybrid search, chat with grounded answers, or expose
+a curated read-only MCP tool surface for AI agents.

 Available Commands:
  agent       Manage and invoke custom agents
@@ -29,9 +30,8 @@ Available Commands:
  version     Show CLI build metadata
 ```

-The command surface follows a `<noun> <verb>` convention. The wire
-contract for AI agents is documented [below](#wire-contract). For
-contributing to the CLI source, see [AGENTS.md](AGENTS.md).
+The wire contract for AI agents is documented [below](#wire-contract).
+For contributing to the CLI source, see [AGENTS.md](AGENTS.md).

 ---

@@ -70,8 +70,9 @@ weknora kb list
 # 4. Bind this directory to a knowledge base — subsequent commands auto-resolve --kb
 weknora link --kb my-knowledge-base

-# 5. Upload a document
+# 5. Upload a document, then block until parsing finishes
 weknora doc upload notes.md
+weknora doc wait doc_abc                          # exit 0 completed, 1 failed, 124 --timeout, 130 ^C

 # 6. Search
 weknora search chunks "what is reciprocal rank fusion?"
@@ -85,6 +86,12 @@ weknora agent invoke ag_abc "what's our q4 retention plan?"

 # 9. Inspect a document's chunks for RAG retrieval debug
 weknora chunk list --doc doc_xyz
+
+# 10. Health & verification verbs
+weknora kb status kb_abc       # fast snapshot: reachable / counts / processing flag (1 HTTP)
+weknora kb check kb_abc        # deep verify: also aggregates failed_count via doc list (1+N HTTP)
+weknora agent status ag_abc    # fast: reachable / model_id
+weknora agent check ag_abc     # deep: probes every KB in the agent's scope
 ```

 ---
@@ -124,28 +131,28 @@ changes announced in the changelog and the corresponding

 ### Streams

- **stdout** is the data channel: bare JSON with `--json`, or
+- **stdout** is the data channel: bare JSON with `--format json`, or
  human-formatted output. Never carries error text.
 - **stderr** is logs, progress, warnings, and errors. A non-empty
  stderr does **not** mean failure — read the exit code.

 ### JSON output

-Every command supports `--json`, emitting bare JSON for the resource it
-produces — an array for `list` / `search`, a single object for `view`
-and write outcomes:
+Commands that honor `--format` emit bare JSON under `--format json` for
+the resource they produce (an array for `list` / `search`, a single
+object for `view` and write outcomes):

 ```bash
-weknora kb list --json                        # [{ "id": "kb_x", "name": "Eng" }, …]
-weknora kb view kb_x --json                   # { "id": "kb_x", "name": "Eng", … }
-weknora kb list --json=id,name                # project to listed fields
-weknora kb list --json --jq '.[].id'          # jq over the bare data
+weknora kb list --format json                              # [{ "id": "kb_x", "name": "Eng" }, …]
+weknora kb view kb_x --format json                         # { "id": "kb_x", "name": "Eng", … }
+weknora kb list --format json --jq '.[] | {id, name}'      # project to listed fields
+weknora kb list --format json --jq '.[].id'                # jq over the bare data
 ```

-Note the `=` form for projection: pflag's optional-value parser treats
-space-separated arguments after a bare `--json` as positionals, so
-`--json id,name` would be interpreted as bare `--json` + the positional
-`id,name`. Always use `--json=field,...`.
+`--format ndjson` is also accepted for streaming list commands; each
+element is emitted as its own JSON line. When stdout is not a TTY (pipe
+or redirect), `--format json` is the default — running `weknora kb list
+| jq` works without an explicit flag.

 ### Errors

@@ -166,14 +173,15 @@ hint: run `weknora auth login`

 The full code registry is in `cli/internal/cmdutil/errors.go`
 (`AllCodes()`). Code namespaces: `auth.*` / `resource.*` / `input.*` /
-`server.*` / `network.*` / `local.*` / `mcp.*`.
+`server.*` / `network.*` / `local.*` / `mcp.*` / `operation.*` (CLI-level
+wait/poll outcomes: `operation.timeout`, `operation.failed`, `operation.cancelled`).

 ### Exit codes

 | Code | Meaning | Agent action |
 |---|---|---|
 | `0`   | success                                                | continue |
-| `1`   | typed `local.*` or unclassified                        | read stderr, decide retry/abort |
+| `1`   | typed `local.*` / `operation.failed` / unclassified    | read stderr, decide retry/abort |
 | `2`   | flag / argument validation error                       | re-check `weknora <cmd> --help` |
 | `3`   | `auth.*` (token missing / expired / forbidden)         | re-auth, then retry |
 | `4`   | `resource.not_found`                                   | verify the resource id |
@@ -181,7 +189,8 @@ The full code registry is in `cli/internal/cmdutil/errors.go`
 | `6`   | `server.rate_limited`                                  | back off, retry |
 | `7`   | `server.*` / `network.*`                               | transient — retry with backoff |
 | `10`  | **`input.confirmation_required`** (high-risk write)    | ask the human, retry with `-y` only after explicit approval |
-| `130` | cancelled (SIGINT / Ctrl-C)                            | stop, do not retry |
+| `124` | `operation.timeout` (e.g. `doc wait --timeout` reached) | raise `--timeout` or check the underlying job |
+| `130` | `operation.cancelled` (SIGINT / SIGTERM)               | stop, do not retry |

 **Exit 10** is the wire-level signal for "destructive write needs
 explicit confirmation". Pass `-y/--yes` on `kb delete` / `kb empty` /
@@ -192,10 +201,10 @@ is the guard against unintended writes.

 ### Other agent ergonomics

- For chat / agent invoke in agent contexts, prefer `--no-stream --json`
-  — streaming tokens to stdout makes JSON parsing impossible.
- `--json` composes with the global `--context <name>` for single-shot
-  context overrides without disk writes.
+- For chat / agent invoke in agent contexts, pass `--format json` —
+  streaming tokens to stdout makes JSON parsing impossible.
+- `--format json` composes with the global `--context <name>` for
+  single-shot context overrides without disk writes.
 - `weknora mcp serve` exposes a curated read-only tool surface over
  stdio MCP for any MCP-compatible client.

@@ -209,9 +218,11 @@ targets common workflows, not 1:1 API parity. Examples of deep
 operations that intentionally go through `weknora api`:

 - **Tuning a KB's nested config** — chunking strategy, summary model,
-  multimodal extraction defaults, FAQ thresholds, VLM model, storage
-  provider. Use `weknora api PUT /api/v1/knowledge-bases/<id> --input -`
-  with a JSON body matching the server's `UpdateKnowledgeBaseRequest`.
+  multimodal extraction defaults, FAQ thresholds, VLM model. Use
+  `weknora api PUT /api/v1/knowledge-bases/<id> --input -` with a JSON
+  body matching the server's `UpdateKnowledgeBaseRequest`. (Note: the
+  storage provider is set once at create time via
+  `kb create --storage-provider <name>` and is not updatable.)
 - **Per-request `chat` parameters** — multi-KB scope, summary model
  override, image attachments, web search toggle. Use `weknora api POST
  /api/v1/knowledge-chat/<session-id> --input -`.
@@ -229,9 +240,25 @@ operations that intentionally go through `weknora api`:

 Run `weknora doctor` for a 4-status diagnostic (OK / warn / fail /
 skip) covering base URL reachability, authentication, server-CLI
-version skew, and credential storage backend. Add `--json` for
+version skew, and credential storage backend. Add `--format json` for
 machine-readable output, `--offline` to skip network checks.

+For per-resource verification, the `status` / `check` verb pair gives
+a fast vs deep choice:
+
+| Verb | Cost | Use |
+|---|---|---|
+| `weknora kb status <kb-id>`     | 1 HTTP    | live counts / processing flag |
+| `weknora kb check <kb-id>`      | 1+N HTTP  | adds `failed_count` via doc-list page-walk |
+| `weknora agent status <agent-id>` | 1 HTTP  | reachable / model_id |
+| `weknora agent check <agent-id>`  | 1+N HTTP | also probes every KB in the agent's scope |
+
+`weknora doc wait <doc-id> [<doc-id>...]` blocks until each document
+reaches a terminal `parse_status` (completed or failed). Exit codes:
+0 (all completed), 1 (any failed), 124 (`--timeout` reached), 130
+(Ctrl-C / SIGTERM). Multiple targets are polled concurrently (at most
+5 in flight; pipe through `xargs -P` for more).
+
 ---

 ## Development
@@ -261,7 +288,7 @@ macOS / Windows × Go 1.26, path-filtered to changes under `cli/`.
  security findings.
 - **Pull requests**: the developer guide for editing the CLI lives in
  [AGENTS.md](AGENTS.md) (build / test / command-surface design SOP /
-  CRUD flag canon). Run `go test ./... -race -count=1` and `go vet ./...`
+  CRUD flag conventions). Run `go test ./... -race -count=1` and `go vet ./...`
  before submitting.

 ---
--- a/cli/acceptance/e2e/e2e_test.go
+++ b/cli/acceptance/e2e/e2e_test.go
@@ -46,7 +46,7 @@ func TestRAGFullLoop(t *testing.T) {
 		"XDG_CONFIG_HOME="+xdg,
 		"XDG_CACHE_HOME="+filepath.Join(xdg, "cache"),
 		// SDK debug off - explicit so the CI run isn't noisy.
-		"WEKNORA_SDK_DEBUG=",
+		"WEKNORA_LOG_LEVEL=error",
 	)

 	// 1. kb create → bare KnowledgeBase object
@@ -55,7 +55,7 @@ func TestRAGFullLoop(t *testing.T) {
 		ID   string `json:"id"`
 		Name string `json:"name"`
 	}
-	runJSONInto(t, bin, env, &created, "kb", "create", "--name", kbName, "--json")
+	runJSONInto(t, bin, env, &created, "kb", "create", kbName, "--format", "json")
 	if created.ID == "" {
 		t.Fatalf("kb create returned no id")
 	}
@@ -63,7 +63,7 @@ func TestRAGFullLoop(t *testing.T) {

 	t.Cleanup(func() {
 		// Best-effort cleanup; a 404 means the KB was already gone.
-		out, err := run(bin, env, "kb", "delete", created.ID, "-y", "--json")
+		out, err := run(bin, env, "kb", "delete", created.ID, "-y", "--format", "json")
 		if err != nil {
 			t.Logf("cleanup kb delete: %v\n%s", err, out)
 		}
@@ -74,7 +74,7 @@ func TestRAGFullLoop(t *testing.T) {
 	var uploaded struct {
 		ID string `json:"id"`
 	}
-	runJSONInto(t, bin, env, &uploaded, "doc", "upload", docPath, "--kb", created.ID, "--json")
+	runJSONInto(t, bin, env, &uploaded, "doc", "upload", docPath, "--kb", created.ID, "--format", "json")
 	if uploaded.ID == "" {
 		t.Fatalf("doc upload returned no id")
 	}
@@ -85,18 +85,18 @@ func TestRAGFullLoop(t *testing.T) {

 	// 4. search chunks → bare []SearchResult
 	var results []map[string]any
-	runJSONInto(t, bin, env, &results, "search", "chunks", "sample", "--kb", created.ID, "--limit", "5", "--json")
+	runJSONInto(t, bin, env, &results, "search", "chunks", "sample", "--kb", created.ID, "--limit", "5", "--format", "json")
 	if len(results) == 0 {
 		t.Fatalf("search returned no results")
 	}
 	t.Logf("search returned %d results", len(results))

-	// 5. chat --no-stream --json → bare {answer, references, ...} object
+	// 5. chat --format json → bare {answer, references, ...} object
 	var chat struct {
 		Answer     string           `json:"answer"`
 		References []map[string]any `json:"references"`
 	}
-	runJSONInto(t, bin, env, &chat, "chat", "summarize the document briefly", "--kb", created.ID, "--no-stream", "--json")
+	runJSONInto(t, bin, env, &chat, "chat", "summarize the document briefly", "--kb", created.ID, "--format", "json")
 	if strings.TrimSpace(chat.Answer) == "" {
 		t.Fatalf("chat returned empty answer")
 	}
@@ -201,7 +201,7 @@ func waitDocReady(t *testing.T, bin string, env []string, kbID, docID string, ti
 			ID          string `json:"id"`
 			ParseStatus string `json:"parse_status"`
 		}
-		runJSONInto(t, bin, env, &docs, "doc", "list", "--kb", kbID, "--page-size", "100", "--json")
+		runJSONInto(t, bin, env, &docs, "doc", "list", "--kb", kbID, "--page-size", "100", "--format", "json")
 		for _, d := range docs {
 			if d.ID != docID {
 				continue
--- a/cli/cmd/agent/agent.go
+++ b/cli/cmd/agent/agent.go
@@ -1,7 +1,7 @@
 // Package agentcmd holds the `weknora agent` command tree:
 // list / view / invoke / create / edit / delete. The directory is named
-// `agent/` (matches cobra noun-verb convention) but the Go package is
-// `agentcmd` to avoid colliding with cobra's *cobra.Command identifier.
+// `agent/` to match the cobra subcommand; the Go package is `agentcmd`
+// to avoid colliding with cobra's *cobra.Command identifier.
 //
 // "agent" in this subtree refers to WeKnora's user-defined Custom
 // Agents (server resource: GET/POST /agents/...). The CLI's
--- a/cli/cmd/chunk/chunk.go
+++ b/cli/cmd/chunk/chunk.go
@@ -1,7 +1,7 @@
-// Package chunkcmd implements the `chunk` verb subtree for managing
+// Package chunkcmd implements the `chunk` command subtree for managing
 // document chunks in a knowledge base. The directory is named `chunk/`
-// (cobra noun-verb convention) but the Go package is `chunkcmd` to
-// avoid colliding with cobra's *cobra.Command identifier.
+// to match the cobra subcommand; the Go package is `chunkcmd` to avoid
+// colliding with cobra's *cobra.Command identifier.
 //
 // "chunk" in this subtree refers to indexed pieces of a knowledge
 // document (server resource: GET/DELETE /chunks/...). Each document
--- a/cli/cmd/context/context.go
+++ b/cli/cmd/context/context.go
@@ -1,6 +1,5 @@
-// Package contextcmd holds `weknora context` command tree
-// (list / add / remove / use). Uses the `<noun> <verb>` shape
-// consistent with the rest of this CLI.
+// Package contextcmd holds the `weknora context` command tree
+// (list / add / remove / use).
 //
 // Package name `contextcmd` (not `context`) to avoid shadowing stdlib context.
 // The cobra Use: string is "context" - this is what users type.
--- a/cli/cmd/root.go
+++ b/cli/cmd/root.go
@@ -88,14 +88,14 @@ func NewRootCmd(f *cmdutil.Factory) *cobra.Command {
 	v, commit, date := build.Info()
 	cmd := &cobra.Command{
 		Use:   "weknora",
-		Short: "WeKnora CLI - RAG knowledge base from your terminal",
-		Long: `WeKnora CLI lets you authenticate, browse knowledge bases, and run
-hybrid searches against a WeKnora server from your shell or an AI agent.`,
-		Example: `  weknora auth login --host=https://kb.example.com   # one-time setup
-  weknora kb list                                    # list knowledge bases
-  weknora kb view <id>                               # show one
-  weknora search chunks "your question" --kb=<id>    # hybrid retrieval
-  weknora doctor --format json                       # health check (agent-readable)`,
+		Short: "WeKnora CLI",
+		Long: `Command-line client for the WeKnora RAG server. Manage knowledge bases
+and documents, run hybrid search, chat with grounded answers, or expose
+a curated read-only MCP tool surface for AI agents.`,
+		Example: `  weknora auth login --host=https://kb.example.com
+  weknora kb list
+  weknora chat "summarise the design doc"
+  weknora doctor --format json`,
 		SilenceUsage:  true,
 		SilenceErrors: true,
 		// Version makes cobra auto-register a `--version` global flag that
--- a/cli/internal/cmdutil/errors_doc_test.go
+++ b/cli/internal/cmdutil/errors_doc_test.go
@@ -0,0 +1,48 @@
+package cmdutil
+
+import (
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+)
+
+// TestAllCodes_DocumentedInAGENTS verifies that every typed code returned
+// by AllCodes() appears in the "Error code reference" section of
+// cli/AGENTS.md (delimited by the ERROR_REFERENCE_START/END markers).
+//
+// Prevents drift: adding a new ErrorCode without updating the doc fails
+// this test, which forces the doc to stay current.
+func TestAllCodes_DocumentedInAGENTS(t *testing.T) {
+	// From cli/internal/cmdutil/, go up two levels to find cli/AGENTS.md.
+	docPath, err := filepath.Abs("../../AGENTS.md")
+	if err != nil {
+		t.Fatalf("abs: %v", err)
+	}
+	content, err := os.ReadFile(docPath)
+	if err != nil {
+		t.Fatalf("read %s: %v", docPath, err)
+	}
+	doc := string(content)
+
+	const startMarker = "<!-- ERROR_REFERENCE_START -->"
+	const endMarker = "<!-- ERROR_REFERENCE_END -->"
+	startIdx := strings.Index(doc, startMarker)
+	endIdx := strings.Index(doc, endMarker)
+	if startIdx == -1 || endIdx == -1 || endIdx <= startIdx {
+		t.Fatalf("error-reference markers missing or malformed in %s:\n  start=%d end=%d", docPath, startIdx, endIdx)
+	}
+	refSection := doc[startIdx:endIdx]
+
+	var missing []string
+	for _, c := range AllCodes() {
+		needle := "`" + string(c) + "`"
+		if !strings.Contains(refSection, needle) {
+			missing = append(missing, string(c))
+		}
+	}
+	if len(missing) > 0 {
+		t.Errorf("the following error codes are registered in AllCodes() but not listed in cli/AGENTS.md \"Error code reference\" section between the ERROR_REFERENCE markers:\n  - %s\n\nAdd a row for each missing code to keep agent-facing docs in sync.",
+			strings.Join(missing, "\n  - "))
+	}
+}