WeKnora/cli/cmd/doc/upload_recursive_test.go
nullkey 2ee9741fa1 refactor(cli): finish context→profile cascade + post-review hardening (BREAKING)
Post-review polish on the v0.7 wire / surface contract. Bundles five
follow-ups that landed after the main BREAKING feat commit:

1. Complete context→profile cascade (internal API + YAML schema)

The prior commit renamed only the user-visible surface (commands /
flags / env / project link / envelope field). The internal Go API
and on-disk config schema were still half-renamed — an L-25
self-consistency violation flagged by post-merge review. Closed here:

Internal Go API:
- config.Context           → config.Profile
- config.Config.CurrentContext → CurrentProfile
- config.Config.Contexts       → Profiles
- LoginOptions.Context     → LoginOptions.Profile
- clearContextSecrets()    → clearProfileSecrets()
- saveContextRef()         → saveProfileRef()
- secrets.Store: param name `context` → `profile` (interface +
  FileStore + KeyringStore + MemStore)
- cmdutil.LoadSecret(store, context, key) → LoadSecret(store, profile, key)
- cmdutil.RefreshAndPersist's ctxName → profileName
- Local var `ctx := &config.Profile{...}` → `prof := &config.Profile{...}`
  in auth/login.go to eliminate the visual collision with Go stdlib
  context.Context that motivated the whole rename in the first place.

On-disk config.yaml schema:
- current_context: → current_profile:
- contexts:       → profiles:
- Pre-1.0 break with no compatibility alias. Users with dogfooded
  v0.6 configs must delete ~/.config/weknora/config.yaml or rename
  the two keys by hand (CHANGELOG migration note added).
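If you would rather rename the keys than delete the file, the edit is local to individual lines. The sketch below is minimal and assumes both keys sit at column 0 of config.yaml. `migrateConfigKeys` is a hypothetical helper, not part of the CLI, and it rewrites the text line by line rather than parsing YAML.

```go
package main

import (
	"fmt"
	"strings"
)

// keyRenames lists the two top-level v0.6 keys and their v0.7 names.
var keyRenames = [][2]string{
	{"current_context:", "current_profile:"},
	{"contexts:", "profiles:"},
}

// migrateConfigKeys is a hypothetical helper that renames the two keys in
// config.yaml text. It only rewrites lines that start at column 0, so
// indented (nested) keys are untouched. It is a line edit, not a YAML parse.
func migrateConfigKeys(in string) string {
	lines := strings.Split(in, "\n")
	for i, line := range lines {
		for _, kr := range keyRenames {
			if strings.HasPrefix(line, kr[0]) {
				lines[i] = kr[1] + strings.TrimPrefix(line, kr[0])
				break
			}
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	v06 := "current_context: prod\ncontexts:\n  prod: {}\n"
	fmt.Print(migrateConfigKeys(v06))
}
```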

Tests / fixtures / golden files:
- factory_test.go YAML fixture + assertion updated.
- acceptance/e2e/e2e_test.go writeContextYAML → writeProfileYAML,
  fixture YAML keys updated.
- acceptance/testdata/wire/doctor.error_network.json golden updated
  ("active context" → "active profile" in hint string).

User-visible prose sweep:
- cmd/mcp/serve.go --help Long: "active context (or --context)" →
  "active profile (or --profile)" — most-visible miss.
- cmd/{kb/list, search/kb, session/list, api/api} Short/Long help.
- cmd/auth/login.go stdout: `(context=%s)` → `(profile=%s)`.
- cmd/auth/logout.go error: `"no current context"` → `"no current profile"`.
- cmd/doctor/doctor.go hint string (also the wire golden above).
- cmd/auth/refresh.go error: `"refresh token missing for context"` →
  `"refresh token missing for profile"`.
- README.md: `## Multi-context` H2 → `## Multi-profile`; code-block
  comment `# current context` → `# current profile`.

Code-comment / docstring sweep across cli/cmd/auth/ and
cli/internal/cmdutil/. Comments referencing Go stdlib context.Context,
the RAG / LLM "context window" concept, and historical CHANGELOG
entries for v0.4 / v0.5 were left alone.

CHANGELOG v0.7 BREAKING entry gains the on-disk-schema bullet under
the existing "context → profile" item.

2. Profile name validation (shell-injection guard)

`envelope.error.retry_command` is a single shell-string field. An
AI agent that exec()s it via `sh -c <retry_command>` was injectable
through a maliciously-named profile:

  weknora auth logout --name 'x; rm -rf ~'
  # would produce: retry_command = "weknora auth logout --name x; rm -rf ~ -y"

`cmd/profile/add.go` already enforced an alphanumeric + `-_.`
allowlist via `validateName`. The `auth login` and `auth logout`
paths bypassed it.

- Moved validation from `cmd/profile/add.go` to
  `cli/internal/cmdutil/profilename.go` as exported
  `ValidateProfileName` (cmdutil is the import-cycle-safe home;
  internal/config can't depend on cmdutil).
- `auth login` runs the validator before any persist call.
- `auth logout` runs the validator on `opts.Name` before
  constructing `retry_command`.
- Unit tests (`profilename_test.go`) cover the allowlist, empty
  rejection, path-traversal, shell metacharacters (`;`, `&`, `|`,
  `$()`, backticks, quotes, whitespace, glob, redirects), and the
  user-facing hint text. The shell-metachar test exists as a
  regression guard.

Wire shape (`retry_command` string → `retry_command_argv []string`)
remains a v0.8 additive change per ROADMAP — this fix removes the
practical exploit path without touching the wire contract.
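The planned `retry_command_argv` shape closes the hole structurally instead of relying on validation. The sketch below is minimal and assumes a consumer that runs the argv with Go's `os/exec` rather than `sh -c`. `retryArgv` and `shellQuote` are hypothetical helpers, and the argv layout simply mirrors the `auth logout` example above.

```go
package main

import (
	"fmt"
	"strings"
)

// retryArgv builds a hypothetical retry command as an argv slice. A consumer
// can pass it straight to exec.Command(argv[0], argv[1:]...), so no shell
// ever parses the profile name.
func retryArgv(profile string) []string {
	return []string{"weknora", "auth", "logout", "--name", profile, "-y"}
}

// shellQuote renders one argument for display in a POSIX shell by wrapping
// it in single quotes and writing embedded single quotes as '\''.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	argv := retryArgv("x; rm -rf ~")
	quoted := make([]string, len(argv))
	for i, a := range argv {
		quoted[i] = shellQuote(a)
	}
	// The ';' stays inside one quoted argument instead of ending the command.
	fmt.Println(strings.Join(quoted, " "))
}
```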

3. AI-agent terminology disambiguation

"agent" has three referents in this codebase: (a) WeKnora's
server-side Custom Agent resource, (b) the removed `agent invoke`
verb, (c) external LLM/automation consumers. Per project memory
feedback_no_meta_disambiguation_in_docs, the fix is full-term
naming, not "X has N meanings" prose. Surgical changes at section
headers + ambiguous prose:

- AGENTS.md: "Agent decision shortcuts" → "AI agent decision
  shortcuts"; "agent-callable surface" → "AI-agent-callable
  surface".
- README.md: "Designed to be agent-first" → "AI-agent-first";
  "Other agent ergonomics" → "Other AI-agent ergonomics"; "in
  agent contexts" → "in AI-agent contexts"; "for CI / agents" →
  "for CI / AI agents".

Anaphoric "agents" inside paragraphs that already established
"AI agents" was left alone — full substitution everywhere would
have been prose noise without clarity gain.

4. Wire-contract review follow-ups

Real findings from a second-pass review of the v0.7 envelope /
streaming / surface design. Per project memory
feedback_check_in_domain_anchor_first, candidate findings were
first verified against the in-domain peer CLI explicitly cited as
the envelope anchor; two earlier-flagged issues turned out to be
in-pattern and were withdrawn.

Surviving fixes:

- AGENTS.md success-envelope example rewritten. The prior example
  showed `has_more: false` / `_notice: {}` as if they were always
  present, but both fields are `omitempty` and never serialize
  when zero / nil. Replaced with three realistic shapes (list /
  single resource / mutation with no payload) and added a note
  that optional fields are omitted when empty.

- cmd/chat/chat.go Args: MinimumNArgs(1) → ExactArgs(1).
  v0.6 silently joined `weknora chat hello world` into
  `"hello world"`. v0.7 now rejects multi-arg with exit 2,
  matching `weknora session ask`. BREAKING; CHANGELOG entry
  added under v0.7 BREAKING.

- internal/output/envelope.go extracts NewEnvelope(data, meta,
  profile) constructor. The jq-filter path in
  cmdutil.FormatOptions.Emit was manually rebuilding the
  envelope literal alongside the canonical WriteEnvelope path —
  drift risk when fields are added. Single construction point now.

- internal/cmdutil/factory.go adds AddKBFlag(cmd) helper.
  Five files (chat, doc/list, doc/upload, doc/create, doc/fetch)
  had verbatim-identical `cmd.Flags().String("kb", ...)`
  declarations. Centralised so flag name + help text stay
  in sync with Factory.ResolveKB. Docstring reordering + gofmt
  fixup landed in the same edit to keep ResolveKB's own godoc
  attached to its function.
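Both envelope fixes rest on standard `encoding/json` behavior. A field tagged `omitempty` is dropped when it holds its zero value (false, nil, or an empty map). A shared constructor keeps the plain and jq-filter paths emitting the same fields. The sketch below is minimal and assumes a simplified `Envelope` whose field set and tags are hypothetical; the real types live in internal/output/envelope.go. `emitJSON` and `emitForFilter` are illustrative stand-ins for the two emit paths, and no actual jq evaluation happens here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is a hypothetical, trimmed-down success envelope. Field names and
// tags are assumptions; only omitempty and the single constructor matter.
type Envelope struct {
	OK      bool           `json:"ok"`
	Data    any            `json:"data,omitempty"`
	Meta    map[string]any `json:"meta,omitempty"`
	HasMore bool           `json:"has_more,omitempty"`
	Notice  map[string]any `json:"_notice,omitempty"`
	Profile string         `json:"profile,omitempty"`
}

// NewEnvelope is the one place an envelope is built. Every emit path calls
// it, so a newly added field cannot be forgotten by one of them.
func NewEnvelope(data any, meta map[string]any, profile string) Envelope {
	return Envelope{OK: true, Data: data, Meta: meta, Profile: profile}
}

// emitJSON stands in for the plain --format json path.
func emitJSON(env Envelope) (string, error) {
	b, err := json.Marshal(env)
	return string(b), err
}

// emitForFilter stands in for the jq-filter path: it decodes the same
// envelope into generic JSON values that a filter would then query.
func emitForFilter(env Envelope) (map[string]any, error) {
	b, err := json.Marshal(env)
	if err != nil {
		return nil, err
	}
	var m map[string]any
	err = json.Unmarshal(b, &m)
	return m, err
}

func main() {
	shapes := []Envelope{
		// List.
		NewEnvelope([]string{"kb_1", "kb_2"}, map[string]any{"count": 2}, "prod"),
		// Single resource.
		NewEnvelope(map[string]string{"id": "kb_1"}, nil, "prod"),
		// Mutation with no payload.
		NewEnvelope(nil, nil, "prod"),
	}
	for _, env := range shapes {
		s, err := emitJSON(env)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
	m, _ := emitForFilter(shapes[0])
	// has_more is absent from the JSON, so the lookup yields nil.
	fmt.Println(m["profile"], m["has_more"])
}
```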

5. OSS-readiness comment / doc sweep

Pre-publication scrub of code, comments, and shipped Markdown to
remove references that only make sense in the development repo:

- AGENTS.md "Deliberate deviations + mainstream alignments"
  section: removed peer-project name-drops from the comparison
  table; rewrote as five flagged design decisions with rationale
  but no specific competitor named. The four rows that previously
  contrasted against a named peer CLI now state WeKnora's choice
  + rationale directly. Section header renamed to "Design
  decisions worth flagging" since it is no longer a
  deviation/alignment matrix.

- CHANGELOG v0.7 BREAKING rationales: three references to a
  named peer CLI removed; the context→profile rationale now
  cites only mainstream multi-credential CLIs by category (AWS /
  Stripe / OpenAI / Anthropic), and the `api -d/--data` removal
  rationale cites only `gh api` / `curl`. `chat` BREAKING entry
  rationale similarly simplified.

- 35 cross-references to design-spec section numbers (§4.1 /
  §4.5 / §5.3 etc.) removed from Go doc comments and test
  comments across 13 files. The referenced spec lives outside
  the shipped tree; readers of the public repo cannot resolve
  them. Each reference replaced with a self-contained semantic
  description (e.g. "the batch envelope" / "AGENTS.md section
  on the success path").

- Mixed-language strings translated to English:
  - Four Go comments: internal/cmdutil/exit.go:213,215,
    internal/cmdutil/errors.go:156,
    internal/output/batch_test.go:90,
    internal/output/envelope_test.go:27.
  - One CHANGELOG section title:
    `v0.7 — Agent-first wire contract + 命令面集中清理` →
    `... + command-surface cleanup`.
  - CJK test fixtures (internal/text/truncate_test.go CJK
    truncation cases, cmd/session/list_test.go Chinese session
    title, acceptance/e2e/e2e_test.go Chinese RAG corpus)
    retained — they are intentional test inputs, not stray prose.

- Makefile help comment: `golangci-lint added in PR-9` →
  `golangci-lint planned`. Internal PR numbering should not
  surface in shipped Makefile prose.

Build green, 28/28 packages, +5 new ValidateProfileName tests.
go vet / gofmt / go mod verify / go mod tidy all clean.

Rationale for the cascade: pre-1.0 is the cheapest moment to close
L-25 self-consistency (L-26). The half-finished internal rename
would have perpetuated the very `context` vs `context.Context`
ambiguity that motivated v0.7's user-visible rename in the first
place.
2026-05-27 10:56:34 +08:00

package doc

import (
	"context"
	"encoding/json"
	"errors"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/Tencent/WeKnora/cli/internal/cmdutil"
	"github.com/Tencent/WeKnora/cli/internal/iostreams"
	sdk "github.com/Tencent/WeKnora/client"
)

// scriptedUploadSvc records every CreateKnowledgeFromFile call and returns
// scripted results keyed by file base name. Files without an entry get a
// default successful Knowledge.
type scriptedUploadSvc struct {
	results map[string]struct {
		k   *sdk.Knowledge
		err error
	}
	called []string

	// Arguments captured from the most recent call. Each recursive iteration
	// overwrites them; a test that needs every call's values can switch these
	// to slices.
	lastMetadata         map[string]string
	lastEnableMultimodel *bool
	lastChannel          string
}

func (s *scriptedUploadSvc) CreateKnowledgeFromFile(
	_ context.Context,
	_, filePath string,
	metadata map[string]string,
	enableMultimodel *bool,
	_, channel string,
) (*sdk.Knowledge, error) {
	s.called = append(s.called, filepath.Base(filePath))
	s.lastMetadata = metadata
	s.lastEnableMultimodel = enableMultimodel
	s.lastChannel = channel
	r, ok := s.results[filepath.Base(filePath)]
	if !ok {
		return &sdk.Knowledge{ID: "doc_" + filepath.Base(filePath), FileName: filepath.Base(filePath)}, nil
	}
	return r.k, r.err
}

// mkTree creates each named file under base (parent directories included)
// with one byte of placeholder content.
func mkTree(t *testing.T, base string, names ...string) {
	t.Helper()
	for _, n := range names {
		full := filepath.Join(base, n)
		require.NoError(t, os.MkdirAll(filepath.Dir(full), 0o755))
		require.NoError(t, os.WriteFile(full, []byte("x"), 0o644))
	}
}

func TestUploadRecursive_WalksAllFiles(t *testing.T) {
	out, _ := iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "a.pdf", "b.pdf", "sub/c.pdf")
	svc := &scriptedUploadSvc{}
	opts := &UploadOptions{Recursive: true, Glob: "*"}
	require.NoError(t, runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir))
	sort.Strings(svc.called)
	assert.Equal(t, []string{"a.pdf", "b.pdf", "c.pdf"}, svc.called)
	got := out.String()
	for _, w := range []string{"a.pdf", "b.pdf", "c.pdf", "Uploaded 3"} {
		assert.Contains(t, got, w)
	}
}

func TestUploadRecursive_GlobFilter(t *testing.T) {
	_, _ = iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "doc.pdf", "ignore.txt", "sub/keep.pdf", "sub/also-ignore.md")
	svc := &scriptedUploadSvc{}
	opts := &UploadOptions{Recursive: true, Glob: "*.pdf"}
	require.NoError(t, runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir))
	sort.Strings(svc.called)
	assert.Equal(t, []string{"doc.pdf", "keep.pdf"}, svc.called)
}

func TestUploadRecursive_PartialFailure_Exits1(t *testing.T) {
	out, _ := iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "ok.pdf", "bad.pdf")
	svc := &scriptedUploadSvc{results: map[string]struct {
		k   *sdk.Knowledge
		err error
	}{
		"bad.pdf": {err: errors.New("HTTP error 500: internal")},
	}}
	opts := &UploadOptions{Recursive: true, Glob: "*"}
	err := runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir)
	require.Error(t, err)
	var typed *cmdutil.Error
	require.ErrorAs(t, err, &typed)
	// CodeServerError preserves the 500 classification of the underlying
	// SDK error; the recursive wrapper only aggregates and never reclassifies.
	assert.Equal(t, cmdutil.CodeServerError, typed.Code)
	got := out.String()
	assert.Contains(t, got, "OK") // ok.pdf still succeeded
	assert.Contains(t, got, "FAIL")
	assert.Contains(t, got, "Uploaded 1")
	assert.Contains(t, got, "Failed 1")
}

func TestUploadRecursive_NoMatches(t *testing.T) {
	out, _ := iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "only.txt")
	svc := &scriptedUploadSvc{}
	opts := &UploadOptions{Recursive: true, Glob: "*.pdf"}
	require.NoError(t, runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir))
	assert.Len(t, svc.called, 0)
	assert.Contains(t, strings.ToLower(out.String()), "no files matched")
}

func TestUploadRecursive_NotADirectory(t *testing.T) {
	_, _ = iostreams.SetForTest(t)
	path := writeTempFile(t, "single.pdf")
	svc := &scriptedUploadSvc{}
	err := runUploadRecursive(context.Background(), &UploadOptions{Recursive: true, Glob: "*"}, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", path)
	require.Error(t, err)
	var typed *cmdutil.Error
	require.ErrorAs(t, err, &typed)
	assert.Equal(t, cmdutil.CodeInputInvalidArgument, typed.Code)
	assert.Contains(t, typed.Message, "directory")
}

func TestUploadRecursive_RejectsNameFlag(t *testing.T) {
	_, _ = iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "a.pdf")
	svc := &scriptedUploadSvc{}
	opts := &UploadOptions{Recursive: true, Glob: "*", Name: "single-name.pdf"}
	err := runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir)
	require.Error(t, err)
	var typed *cmdutil.Error
	require.ErrorAs(t, err, &typed)
	assert.Equal(t, cmdutil.CodeInputInvalidArgument, typed.Code)
	assert.Contains(t, typed.Message, "--name")
}

func TestUploadRecursive_PropagatesMultimodelAndMetadata(t *testing.T) {
	_, _ = iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "a.pdf")
	svc := &scriptedUploadSvc{}
	mm := true
	opts := &UploadOptions{
		Recursive:        true,
		Glob:             "*",
		EnableMultimodel: &mm,
		Metadata:         []string{"team=alpha"},
		Channel:          "browser_extension",
	}
	require.NoError(t, runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir))
	require.NotNil(t, svc.lastEnableMultimodel)
	assert.True(t, *svc.lastEnableMultimodel)
	assert.Equal(t, map[string]string{"team": "alpha"}, svc.lastMetadata)
	assert.Equal(t, "browser_extension", svc.lastChannel)
}

func TestUploadRecursive_MetadataInvalid_NoCalls(t *testing.T) {
	_, _ = iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "a.pdf")
	svc := &scriptedUploadSvc{}
	opts := &UploadOptions{Recursive: true, Glob: "*", Metadata: []string{"badformat"}}
	err := runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatText}, svc, "kb_xxx", dir)
	require.Error(t, err)
	var typed *cmdutil.Error
	require.ErrorAs(t, err, &typed)
	assert.Equal(t, cmdutil.CodeInputInvalidArgument, typed.Code)
	assert.Empty(t, svc.called, "must fail before any per-file call")
}

// TestUploadRecursive_JSON_BatchEnvelope verifies that --format json emits the
// batch envelope shape: {ok, data:[{id,ok,result?|error?}...], meta:{count,successes,failures}}.
// The per-item id is the file path; result carries {id, name} from the server.
func TestUploadRecursive_JSON_BatchEnvelope(t *testing.T) {
	out, _ := iostreams.SetForTest(t)
	dir := t.TempDir()
	mkTree(t, dir, "ok.pdf", "bad.pdf")
	svc := &scriptedUploadSvc{results: map[string]struct {
		k   *sdk.Knowledge
		err error
	}{
		"bad.pdf": {err: errors.New("HTTP error 500: internal")},
	}}
	opts := &UploadOptions{Recursive: true, Glob: "*"}
	err := runUploadRecursive(context.Background(), opts, &cmdutil.FormatOptions{Mode: cmdutil.FormatJSON}, svc, "kb_xxx", dir)
	require.Error(t, err) // partial failure → typed error

	var env struct {
		OK   bool `json:"ok"`
		Data []struct {
			ID     string `json:"id"`
			OK     bool   `json:"ok"`
			Result *struct {
				ID   string `json:"id"`
				Name string `json:"name"`
			} `json:"result,omitempty"`
			Error *struct {
				Type    string `json:"type"`
				Message string `json:"message"`
			} `json:"error,omitempty"`
		} `json:"data"`
		Meta struct {
			Count     int `json:"count"`
			Successes int `json:"successes"`
			Failures  int `json:"failures"`
		} `json:"meta"`
	}
	require.NoError(t, json.Unmarshal(out.Bytes(), &env), "must be valid JSON: %s", out.String())
	assert.False(t, env.OK, "top-level ok must be false when any item failed")
	assert.Equal(t, 2, env.Meta.Count)
	assert.Equal(t, 1, env.Meta.Successes)
	assert.Equal(t, 1, env.Meta.Failures)
	require.Len(t, env.Data, 2)

	// File paths are used as batch item ids; verify both files appear.
	ids := []string{env.Data[0].ID, env.Data[1].ID}
	assert.True(t, strings.Contains(ids[0], "ok.pdf") || strings.Contains(ids[1], "ok.pdf"), "ok.pdf must appear in batch data")
	assert.True(t, strings.Contains(ids[0], "bad.pdf") || strings.Contains(ids[1], "bad.pdf"), "bad.pdf must appear in batch data")

	// The success item must have a result with server id/name.
	for _, item := range env.Data {
		if strings.Contains(item.ID, "ok.pdf") {
			assert.True(t, item.OK)
			assert.NotNil(t, item.Result)
		} else {
			assert.False(t, item.OK)
			assert.NotNil(t, item.Error)
		}
	}

	// --format json must emit exactly ONE JSON document. Per-file "FAIL"/"OK"
	// progress lines belong on the human path; the typed error is Silent so
	// the root handler doesn't write anything additional to stdout.
	body := out.String()
	assert.NotContains(t, body, "FAIL ", "per-file plain lines must not appear under --format json")
	assert.NotContains(t, body, "OK ", "per-file plain lines must not appear under --format json")
	var typed *cmdutil.Error
	require.ErrorAs(t, err, &typed)
	assert.True(t, typed.Silent, "JSON-path partial failure must be Silent")
	assert.Equal(t, cmdutil.CodeServerError, typed.Code)
}