mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
Post-review polish on the v0.7 wire / surface contract. Bundles five
follow-ups that landed after the main BREAKING feat commit:
1. Complete context→profile cascade (internal API + YAML schema)
The prior commit renamed only the user-visible surface (commands /
flags / env / project link / envelope field). The internal Go API
and on-disk config schema were still half-renamed — an L-25
self-consistency violation flagged by post-merge review. Closed here:
Internal Go API:
- config.Context → config.Profile
- config.Config.CurrentContext → CurrentProfile
- config.Config.Contexts → Profiles
- LoginOptions.Context → LoginOptions.Profile
- clearContextSecrets() → clearProfileSecrets()
- saveContextRef() → saveProfileRef()
- secrets.Store: param name `context` → `profile` (interface +
FileStore + KeyringStore + MemStore)
- cmdutil.LoadSecret(store, context, key) → LoadSecret(store, profile, key)
- cmdutil.RefreshAndPersist's ctxName → profileName
- Local var `ctx := &config.Profile{...}` → `prof := &config.Profile{...}`
in auth/login.go to eliminate the visual collision with Go stdlib
context.Context that motivated the whole rename in the first place.
On-disk config.yaml schema:
- current_context: → current_profile:
- contexts: → profiles:
- Pre-1.0 break, no compat alias. Users with configs dogfooded on v0.6
must either delete ~/.config/weknora/config.yaml or hand-rename the
two keys (CHANGELOG migration note added).
Tests / fixtures / golden files:
- factory_test.go YAML fixture + assertion updated.
- acceptance/e2e/e2e_test.go writeContextYAML → writeProfileYAML,
fixture YAML keys updated.
- acceptance/testdata/wire/doctor.error_network.json golden updated
("active context" → "active profile" in hint string).
User-visible prose sweep:
- cmd/mcp/serve.go --help Long: "active context (or --context)" →
"active profile (or --profile)" — most-visible miss.
- cmd/{kb/list, search/kb, session/list, api/api} Short/Long help.
- cmd/auth/login.go stdout: `(context=%s)` → `(profile=%s)`.
- cmd/auth/logout.go error: `"no current context"` → `"no current profile"`.
- cmd/doctor/doctor.go hint string (also the wire golden above).
- cmd/auth/refresh.go error: `"refresh token missing for context"` →
`"refresh token missing for profile"`.
- README.md: `## Multi-context` H2 → `## Multi-profile`; code-block
comment `# current context` → `# current profile`.
Code-comment / docstring sweep across cli/cmd/auth/ and
cli/internal/cmdutil/. Comments referencing Go stdlib context.Context,
the RAG / LLM "context window" concept, and historical CHANGELOG
entries for v0.4 / v0.5 were left alone.
CHANGELOG v0.7 BREAKING entry gains the on-disk-schema bullet under
the existing "context → profile" item.
2. Profile name validation (shell-injection guard)
`envelope.error.retry_command` is a single shell-string field. An
AI agent that exec()s it via `sh -c <retry_command>` was injectable
through a maliciously-named profile:
weknora auth logout --name 'x; rm -rf ~'
# would produce: retry_command = "weknora auth logout --name x; rm -rf ~ -y"
`cmd/profile/add.go` already enforced an alphanumeric + `-_.`
allowlist via `validateName`. The `auth login` and `auth logout`
paths bypassed it.
- Moved validation from `cmd/profile/add.go` to
`cli/internal/cmdutil/profilename.go` as exported
`ValidateProfileName` (cmdutil is the import-cycle-safe home;
internal/config can't depend on cmdutil).
- `auth login` runs the validator before any persist call.
- `auth logout` runs the validator on `opts.Name` before
constructing `retry_command`.
- Unit tests (`profilename_test.go`) cover the allowlist, empty
rejection, path-traversal, shell metacharacters (`;`, `&`, `|`,
`$()`, backticks, quotes, whitespace, glob, redirects), and the
user-facing hint text. The shell-metachar test exists as a
regression guard.
Wire shape (`retry_command` string → `retry_command_argv []string`)
remains a v0.8 additive change per ROADMAP — this fix removes the
practical exploit path without touching the wire contract.
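A minimal sketch of the kind of allowlist check described in item 2, assuming a regexp-based rule. The function body, the explicit rejection of "." and "..", and the error wording are illustrative assumptions; only the name `ValidateProfileName` and the alphanumeric + `-_.` allowlist come from the notes above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// profileNameRE is an assumed allowlist: ASCII letters, digits, '-', '_', '.'.
var profileNameRE = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

// ValidateProfileName is a hypothetical stand-in for the validator described
// above; the real implementation may differ in limits and error text.
func ValidateProfileName(name string) error {
	if name == "" {
		return errors.New("profile name must not be empty")
	}
	// "." and ".." pass the character allowlist, so they are rejected
	// explicitly here to rule out path traversal (an assumption).
	if name == "." || name == ".." {
		return fmt.Errorf("profile name %q is not allowed", name)
	}
	if !profileNameRE.MatchString(name) {
		return fmt.Errorf("profile name %q may contain only letters, digits, '-', '_' and '.'", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"staging", "prod-eu.1", "", "..", "x; rm -rf ~", "$(id)"} {
		if err := ValidateProfileName(name); err != nil {
			fmt.Printf("%q rejected: %v\n", name, err)
			continue
		}
		fmt.Printf("%q accepted\n", name)
	}
}
```

Because every accepted name is free of whitespace and shell metacharacters, interpolating it into a single shell string such as `retry_command` cannot introduce a second command.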
3. AI-agent terminology disambiguation
"agent" has three referents in this codebase: (a) WeKnora's
server-side Custom Agent resource, (b) the removed `agent invoke`
verb, (c) external LLM/automation consumers. Per project memory
feedback_no_meta_disambiguation_in_docs, the fix is full-term
naming, not "X has N meanings" prose. Surgical changes at section
headers + ambiguous prose:
- AGENTS.md: "Agent decision shortcuts" → "AI agent decision
shortcuts"; "agent-callable surface" → "AI-agent-callable
surface".
- README.md: "Designed to be agent-first" → "AI-agent-first";
"Other agent ergonomics" → "Other AI-agent ergonomics"; "in
agent contexts" → "in AI-agent contexts"; "for CI / agents" →
"for CI / AI agents".
Anaphoric "agents" inside paragraphs that already established
"AI agents" was left alone — full substitution everywhere would
have been prose noise without clarity gain.
4. Wire-contract review follow-ups
Real findings from a second-pass review of the v0.7 envelope /
streaming / surface design. Per project memory
feedback_check_in_domain_anchor_first, candidate findings were
first verified against the in-domain peer CLI explicitly cited as
the envelope anchor; two earlier-flagged issues turned out to be
in-pattern and were withdrawn.
Surviving fixes:
- AGENTS.md success-envelope example rewritten. The prior example
showed `has_more: false` / `_notice: {}` as if they were always
present, but both fields are `omitempty` and never serialize
when zero / nil. Replaced with three realistic shapes (list /
single resource / mutation with no payload) and added a note
that optional fields are omitted when empty.
- cmd/chat/chat.go Args: MinimumNArgs(1) → ExactArgs(1).
v0.6 silently joined `weknora chat hello world` into
`"hello world"`. v0.7 now rejects multi-arg with exit 2,
matching `weknora session ask`. BREAKING; CHANGELOG entry
added under v0.7 BREAKING.
- internal/output/envelope.go extracts NewEnvelope(data, meta,
profile) constructor. The jq-filter path in
cmdutil.FormatOptions.Emit was manually rebuilding the
envelope literal alongside the canonical WriteEnvelope path —
drift risk when fields are added. Single construction point now.
- internal/cmdutil/factory.go adds AddKBFlag(cmd) helper.
Five files (chat, doc/list, doc/upload, doc/create, doc/fetch)
had verbatim-identical `cmd.Flags().String("kb", ...)`
declarations. Centralised so flag name + help text stay
in sync with Factory.ResolveKB. Docstring reordering + gofmt
fixup landed in the same edit to keep ResolveKB's own godoc
attached to its function.
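The `omitempty` behaviour and the single-constructor point from item 4 can be sketched as follows. The struct layouts, JSON field names other than `has_more`, and the `ok` flag are assumptions for illustration; only the `NewEnvelope(data, meta, profile)` parameter order is taken from the notes above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Meta and Envelope are hypothetical shapes, not the real wire types.
type Meta struct {
	Count   int  `json:"count"`
	HasMore bool `json:"has_more,omitempty"` // dropped from JSON when false
}

type Envelope struct {
	OK      bool   `json:"ok"`
	Profile string `json:"profile,omitempty"`
	Data    any    `json:"data,omitempty"` // dropped when the interface is nil
	Meta    *Meta  `json:"meta,omitempty"` // dropped when the pointer is nil
}

// NewEnvelope is the single construction point, so every emit path
// (plain JSON, jq-filtered) serializes the same set of fields.
func NewEnvelope(data any, meta *Meta, profile string) Envelope {
	return Envelope{OK: true, Profile: profile, Data: data, Meta: meta}
}

// mustJSON marshals v and panics on failure; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// List shape: has_more is false, so it does not appear in the output.
	fmt.Println(mustJSON(NewEnvelope([]string{"a"}, &Meta{Count: 1}, "default")))
	// Mutation with no payload: data and meta are both omitted.
	fmt.Println(mustJSON(NewEnvelope(nil, nil, "default")))
}
```

Consumers should therefore test for the presence of optional fields rather than assume a zero value is serialized.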
5. OSS-readiness comment / doc sweep
Pre-publication scrub of code, comments, and shipped Markdown to
remove references that only make sense in the development repo:
- AGENTS.md "Deliberate deviations + mainstream alignments"
section: removed peer-project name-drops from the comparison
table; rewrote as five flagged design decisions with rationale
but no specific competitor named. The four rows that previously
contrasted against a named peer CLI now state WeKnora's choice
+ rationale directly. Section header renamed to "Design
decisions worth flagging" since it is no longer a
deviation/alignment matrix.
- CHANGELOG v0.7 BREAKING rationales: three references to a
named peer CLI removed; the context→profile rationale now
cites only mainstream multi-credential CLIs by category (AWS /
Stripe / OpenAI / Anthropic), and the `api -d/--data` removal
rationale cites only `gh api` / `curl`. `chat` BREAKING entry
rationale similarly simplified.
- 35 cross-references to design-spec section numbers (§4.1 /
§4.5 / §5.3 etc.) removed from Go doc comments and test
comments across 13 files. The referenced spec lives outside
the shipped tree; readers of the public repo cannot resolve
them. Each reference replaced with a self-contained semantic
description (e.g. "the batch envelope" / "AGENTS.md section
on the success path").
- Mixed-language strings translated to English:
- Four Go comments: internal/cmdutil/exit.go:213,215,
internal/cmdutil/errors.go:156,
internal/output/batch_test.go:90,
internal/output/envelope_test.go:27.
- One CHANGELOG section title:
`v0.7 — Agent-first wire contract + 命令面集中清理` →
`... + command-surface cleanup`.
- CJK test fixtures (internal/text/truncate_test.go CJK
truncation cases, cmd/session/list_test.go Chinese session
title, acceptance/e2e/e2e_test.go Chinese RAG corpus)
retained — they are intentional test inputs, not stray prose.
- Makefile help comment: `golangci-lint added in PR-9` →
`golangci-lint planned`. Internal PR numbering should not
surface in shipped Makefile prose.
Build green, 28/28 packages, +5 new ValidateProfileName tests.
go vet / gofmt / go mod verify / go mod tidy all clean.
Rationale for the cascade: pre-1.0 is the cheapest moment to close
L-25 self-consistency (L-26). The half-finished internal rename
would have perpetuated the very `context` vs `context.Context`
ambiguity that motivated v0.7's user-visible rename in the first
place.
254 lines
9.3 KiB
Go
package doc

import (
	"context"
	"fmt"
	"slices"
	"sort"
	"strings"
	"text/tabwriter"
	"time"

	"github.com/spf13/cobra"

	"github.com/Tencent/WeKnora/cli/internal/cmdutil"
	"github.com/Tencent/WeKnora/cli/internal/iostreams"
	"github.com/Tencent/WeKnora/cli/internal/output"
	"github.com/Tencent/WeKnora/cli/internal/text"
	sdk "github.com/Tencent/WeKnora/client"
)

// docListFields enumerates the fields surfaced for `--format json` discovery on
// `doc list`. Filter applies to each Knowledge object in the bare array.
var docListFields = []string{
	"id", "knowledge_base_id", "tag_id", "type", "title", "description",
	"source", "channel", "parse_status", "summary_status", "enable_status",
	"embedding_model_id", "file_name", "file_type", "file_size", "file_hash",
	"file_path", "storage_size",
	"created_at", "updated_at", "processed_at", "error_message",
}

type ListOptions struct {
	// PageSize is the number of items per server batch. With --all-pages it
	// controls per-request load; without, it is the single page size.
	PageSize int
	Status   string // --status: filter by parse_status (server-side query param)
	// Limit caps the returned items client-side (default 30; valid range
	// 1..10000, enforced in runList). On the single-page path it is applied
	// after the sort; with --all-pages it also stops accumulation early.
	Limit int
	// AllPages walks server pages internally, accumulating items until
	// total exhausted or --limit hit.
	AllPages bool
	// Additional server-side filters (each maps 1:1 to a sdk.KnowledgeListFilter
	// field). Empty / zero values are omitted from the request.
	Keyword   string
	FileType  string
	Source    string
	TagID     string
	StartTime string // raw RFC3339; parsed into filter.StartTime
	EndTime   string // raw RFC3339; parsed into filter.EndTime
}

// rfc3339Example is the canonical RFC3339 hint surfaced when --start-time /
// --end-time fail to parse. Picked to match Go's reference time docs.
const rfc3339Example = "2006-01-02T15:04:05Z"

// docListStatusValues mirrors internal/types/knowledge.go ParseStatus*
// constants - these are the values the server accepts on the
// ?parse_status= query. Kept in sync manually since the SDK doesn't
// re-export the enum.
var docListStatusValues = []string{"pending", "processing", "completed", "failed"}

// ListService is the narrow SDK surface this command depends on.
// *sdk.Client satisfies it.
type ListService interface {
	ListKnowledgeWithFilter(ctx context.Context, kbID string, page, pageSize int, filter sdk.KnowledgeListFilter) ([]sdk.Knowledge, int64, error)
}

// NewCmdList builds `weknora doc list`.
func NewCmdList(f *cmdutil.Factory) *cobra.Command {
	opts := &ListOptions{}
	cmd := &cobra.Command{
		Use:   "list",
		Short: "List documents in a knowledge base",
		Long: `Lists documents (uploaded files / web pages / inline text) in the
resolved knowledge base. KB resolution follows the standard 4-level chain:
--kb flag > WEKNORA_KB_ID env > .weknora/project.yaml > error. The --kb
flag accepts either a KB UUID (passed through) or a name (resolved via list).

Default sort is updated_at desc so the most recent uploads surface first;
backend storage order is not guaranteed and varies between deployments.`,
		Example: `  weknora doc list                                            # uses project link / env
  weknora doc list --kb a32a63ff-fb36-4874-bcaa-30f48570a694  # explicit UUID
  weknora doc list --kb my-kb                                 # resolved by name
  weknora doc list --all-pages --format json                  # walk every page`,
		Args: cobra.NoArgs,
		RunE: func(c *cobra.Command, _ []string) error {
			fopts, err := cmdutil.CheckFormatFlag(c)
			if err != nil {
				return err
			}
			fopts.ResolveDefault(iostreams.IO.IsStdoutTTY())
			kbID, err := f.ResolveKB(c)
			if err != nil {
				return err
			}
			cli, err := f.Client()
			if err != nil {
				return err
			}
			return runList(c.Context(), opts, fopts, cli, kbID)
		},
	}
	cmdutil.AddKBFlag(cmd)
	cmd.Flags().IntVar(&opts.PageSize, "page-size", 50, "Items per server batch (1..1000)")
	cmd.Flags().IntVarP(&opts.Limit, "limit", "L", 30, "Maximum results to return (1..10000)")
	cmd.Flags().BoolVar(&opts.AllPages, "all-pages", false, "Walk all server pages until exhausted (or --limit hit)")
	cmd.Flags().StringVar(&opts.Status, "status", "", "Filter by parse status: pending | processing | completed | failed")
	cmd.Flags().StringVar(&opts.Keyword, "keyword", "", "Server-side substring match against title / file_name (case-sensitive)")
	cmd.Flags().StringVar(&opts.FileType, "file-type", "", `Filter by file extension (e.g. "pdf", "md")`)
	cmd.Flags().StringVar(&opts.Source, "source", "", `Filter by ingestion source (e.g. "api", "web")`)
	cmd.Flags().StringVar(&opts.TagID, "tag-id", "", "Filter by tag association")
	cmd.Flags().StringVar(&opts.StartTime, "start-time", "", "Include docs with updated_at >= this RFC3339 timestamp (e.g. 2006-01-02T15:04:05Z)")
	cmd.Flags().StringVar(&opts.EndTime, "end-time", "", "Include docs with updated_at <= this RFC3339 timestamp (e.g. 2006-01-02T15:04:05Z)")
	cmdutil.AddFormatFlag(cmd, docListFields...)
	return cmd
}

func runList(ctx context.Context, opts *ListOptions, fopts *cmdutil.FormatOptions, svc ListService, kbID string) error {
	if opts.PageSize < 1 || opts.PageSize > 1000 {
		return &cmdutil.Error{
			Code:    cmdutil.CodeInputInvalidArgument,
			Message: fmt.Sprintf("--page-size must be in 1..1000, got %d", opts.PageSize),
		}
	}
	if opts.Limit < 1 || opts.Limit > 10000 {
		return &cmdutil.Error{
			Code:    cmdutil.CodeInputInvalidArgument,
			Message: fmt.Sprintf("--limit must be in 1..10000, got %d", opts.Limit),
		}
	}
	if opts.Status != "" && !validDocListStatus(opts.Status) {
		return &cmdutil.Error{
			Code: cmdutil.CodeInputInvalidArgument,
			Message: fmt.Sprintf("--status must be one of: %s - got %q",
				strings.Join(docListStatusValues, " | "), opts.Status),
		}
	}
	filter := sdk.KnowledgeListFilter{
		ParseStatus: opts.Status,
		Keyword:     opts.Keyword,
		FileType:    opts.FileType,
		Source:      opts.Source,
		TagID:       opts.TagID,
	}
	if opts.StartTime != "" {
		t, err := time.Parse(time.RFC3339, opts.StartTime)
		if err != nil {
			return cmdutil.NewError(cmdutil.CodeInputInvalidArgument,
				fmt.Sprintf("--start-time must be RFC3339 (e.g. %s), got %q", rfc3339Example, opts.StartTime))
		}
		filter.StartTime = t
	}
	if opts.EndTime != "" {
		t, err := time.Parse(time.RFC3339, opts.EndTime)
		if err != nil {
			return cmdutil.NewError(cmdutil.CodeInputInvalidArgument,
				fmt.Sprintf("--end-time must be RFC3339 (e.g. %s), got %q", rfc3339Example, opts.EndTime))
		}
		filter.EndTime = t
	}

	// Pagination is always 1-indexed internally. --all-pages walks; the
	// non-walking path returns the first page only.
	var items []sdk.Knowledge
	if opts.AllPages {
		accum := make([]sdk.Knowledge, 0)
		for page := 1; ; page++ {
			chunk, total, err := svc.ListKnowledgeWithFilter(ctx, kbID, page, opts.PageSize, filter)
			if err != nil {
				return cmdutil.WrapHTTP(err, "list documents")
			}
			accum = append(accum, chunk...)
			if opts.Limit > 0 && len(accum) >= opts.Limit {
				accum = accum[:opts.Limit]
				break
			}
			if int64(len(accum)) >= total || len(chunk) == 0 {
				break
			}
		}
		items = accum
	} else {
		chunk, _, err := svc.ListKnowledgeWithFilter(ctx, kbID, 1, opts.PageSize, filter)
		if err != nil {
			return cmdutil.WrapHTTP(err, "list documents")
		}
		items = chunk
	}
	if items == nil {
		items = []sdk.Knowledge{} // ensure JSON [] not null
	}
	// Default sort: updated_at desc. Server return order is not guaranteed,
	// so client-side sort makes output deterministic regardless of backend
	// storage choices. Mirrors `weknora kb list`.
	sort.Slice(items, func(i, j int) bool {
		return items[i].UpdatedAt.After(items[j].UpdatedAt)
	})
	// --limit applies after sort so users get the top-N most-recent items
	// when combined with a single-page fetch where page_size > limit.
	truncated := false
	if opts.Limit > 0 && len(items) > opts.Limit {
		items = items[:opts.Limit]
		truncated = true
	}

	if fopts.WantsJSON() {
		meta := &output.Meta{Count: len(items), HasMore: truncated}
		return fopts.Emit(iostreams.IO.Out, items, meta)
	}

	if len(items) == 0 {
		fmt.Fprintln(iostreams.IO.Out, "(no documents)")
		return nil
	}

	tw := tabwriter.NewWriter(iostreams.IO.Out, 0, 0, 2, ' ', 0)
	fmt.Fprintln(tw, "ID\tNAME\tSTATUS\tSIZE\tUPDATED")
	now := time.Now()
	for _, k := range items {
		name := text.Truncate(40, text.KnowledgeDisplayName(k.FileName, k.Title, k.ID))
		updated := text.FuzzyAgo(now, k.UpdatedAt)
		fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\n", k.ID, name, k.ParseStatus, formatSize(k.FileSize), updated)
	}
	return tw.Flush()
}

// validDocListStatus reports whether s matches one of the server-accepted
// parse_status enum values surfaced via --status.
func validDocListStatus(s string) bool {
	return slices.Contains(docListStatusValues, s)
}

// formatSize renders a byte count as a short human string (KB / MB).
// Kept tiny on purpose - go-humanize would pull a transitive dep just for one
// column. A "-" placeholder hides zero-size entries (URL / text).
func formatSize(bytes int64) string {
	if bytes <= 0 {
		return "-"
	}
	const (
		kb = 1 << 10
		mb = 1 << 20
		gb = 1 << 30
	)
	switch {
	case bytes >= gb:
		return fmt.Sprintf("%.1fGB", float64(bytes)/float64(gb))
	case bytes >= mb:
		return fmt.Sprintf("%.1fMB", float64(bytes)/float64(mb))
	case bytes >= kb:
		return fmt.Sprintf("%.1fKB", float64(bytes)/float64(kb))
	}
	return fmt.Sprintf("%dB", bytes)
}
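The termination rules of the --all-pages walk in runList can be exercised in isolation. The sketch below reproduces the same three stop conditions (limit reached, reported total reached, empty page) against an in-memory fake; `fetchFunc`, `walkPages`, and `fakeServer` are hypothetical names introduced for illustration and are not part of the CLI. A limit of 0 means "no cap" in this sketch only; the command itself rejects --limit values below 1.

```go
package main

import "fmt"

// fetchFunc is a hypothetical stand-in for ListKnowledgeWithFilter: it
// returns one 1-indexed page plus the server-reported total.
type fetchFunc func(page, pageSize int) (items []int, total int64, err error)

// walkPages accumulates pages until the limit is hit, the reported total
// is reached, or the server returns an empty page.
func walkPages(fetch fetchFunc, pageSize, limit int) ([]int, error) {
	accum := make([]int, 0)
	for page := 1; ; page++ {
		chunk, total, err := fetch(page, pageSize)
		if err != nil {
			return nil, err
		}
		accum = append(accum, chunk...)
		if limit > 0 && len(accum) >= limit {
			return accum[:limit], nil
		}
		// The empty-chunk check guards against a total that overstates
		// what the server will actually return.
		if int64(len(accum)) >= total || len(chunk) == 0 {
			return accum, nil
		}
	}
}

// fakeServer serves the integers 0..n-1 in pages of the requested size.
func fakeServer(n int) fetchFunc {
	return func(page, pageSize int) ([]int, int64, error) {
		start := (page - 1) * pageSize
		if start >= n {
			return nil, int64(n), nil
		}
		end := start + pageSize
		if end > n {
			end = n
		}
		items := make([]int, 0, end-start)
		for i := start; i < end; i++ {
			items = append(items, i)
		}
		return items, int64(n), nil
	}
}

func main() {
	all, _ := walkPages(fakeServer(7), 3, 0)
	fmt.Println(len(all)) // 7: three requests, stops when the total is reached
	capped, _ := walkPages(fakeServer(7), 3, 5)
	fmt.Println(len(capped)) // 5: stops on the second page once the limit is hit
}
```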