mirror of
https://github.com/Tencent/WeKnora.git
synced 2026-06-04 13:30:32 +08:00
docs(chat): clarify cached-token semantics for explicit-cache providers

`cached_tokens` is reported by every OpenAI-compatible provider that
supports prompt caching, but how it becomes non-zero differs by mode:

- Implicit caching (OpenAI, Azure OpenAI, DeepSeek, …) populates the
  field automatically whenever a prompt prefix matches a previous
  request within the provider's cache TTL. No client-side opt-in.
- Explicit caching (Qwen on Aliyun, Anthropic Claude, …) only
  populates the field after the caller attaches `cache_control:
  {"type": "ephemeral"}` to the relevant message / content block.
  Until that opt-in is applied upstream of the request, the field
  stays zero even when the prefix is otherwise byte-stable.

Without this distinction documented, the previous commit reads as if
`TokenUsage.CachedTokens` will show non-zero values for Qwen / Claude
once this PR lands, which is not the case. The plumbing here is a
prerequisite (stable prefix via sorted tools) and a meter (visibility
of the field), but the explicit-cache opt-in itself is out of scope
and lives elsewhere.

Document this on `TokenUsage.CachedTokens` and the `cachedTokens`
helper so callers do not mistake observability for activation.
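
To make the explicit-cache opt-in concrete, here is a minimal sketch of what "attaching `cache_control`" could look like on the wire. The types `cacheControl`, `contentBlock`, `message` and the helper `withEphemeralCache` are hypothetical and are not taken from this repository; which block should carry the marker is provider-specific, and marking the last block is only an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheControl, contentBlock and message are hypothetical types for this
// sketch; they do not come from the WeKnora code base.
type cacheControl struct {
	Type string `json:"type"`
}

type contentBlock struct {
	Type         string        `json:"type"`
	Text         string        `json:"text"`
	CacheControl *cacheControl `json:"cache_control,omitempty"`
}

type message struct {
	Role    string         `json:"role"`
	Content []contentBlock `json:"content"`
}

// withEphemeralCache returns a copy of m whose last content block carries
// the explicit-cache marker. Choosing the last block is an assumption; the
// provider's documentation decides where the marker belongs.
func withEphemeralCache(m message) message {
	if len(m.Content) == 0 {
		return m
	}
	blocks := make([]contentBlock, len(m.Content))
	copy(blocks, m.Content)
	blocks[len(blocks)-1].CacheControl = &cacheControl{Type: "ephemeral"}
	m.Content = blocks
	return m
}

// encode marshals a message to its JSON form, returning "" on error.
func encode(m message) string {
	b, err := json.Marshal(m)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	sys := message{
		Role:    "system",
		Content: []contentBlock{{Type: "text", Text: "long, stable system prompt"}},
	}
	fmt.Println(encode(sys))                     // no marker: explicit-cache providers report zero
	fmt.Println(encode(withEphemeralCache(sys))) // marker attached to the last content block
}
```

Without the marker the request body is unchanged, which is why a stable prefix alone leaves `cached_tokens` at zero on explicit-cache providers.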
@@ -1176,6 +1176,16 @@ func (c *RemoteAPIChat) GetAPIKey() string {
 // cachedTokens returns the cached prompt-token count from an OpenAI-compatible
 // usage detail block, or zero when the provider did not report one. Some
 // providers omit PromptTokensDetails entirely, so the nil guard is required.
+//
+// Note on provider semantics:
+//   - Implicit-cache providers (OpenAI, Azure OpenAI, DeepSeek, …) populate
+//     `cached_tokens` automatically whenever the prompt prefix matches a
+//     previous request; no caller opt-in is required.
+//   - Explicit-cache providers (Qwen on Aliyun, Anthropic Claude, …) only
+//     populate `cached_tokens` after the caller attaches `cache_control:
+//     {"type": "ephemeral"}` to the relevant message / content block. This
+//     helper still returns zero for those providers until that opt-in is
+//     applied upstream of the request.
 func cachedTokens(d *openai.PromptTokensDetails) int {
 	if d == nil {
 		return 0
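
The hunk above is cut off after the nil guard. The following is a minimal, self-contained sketch of how such a helper might be completed. `promptTokensDetails` is a hypothetical stand-in for the `openai.PromptTokensDetails` type named in the diff, and the final `return d.CachedTokens` is an assumption about the part the hunk does not show.

```go
package main

import "fmt"

// promptTokensDetails is a hypothetical stand-in for the
// openai.PromptTokensDetails type referenced in the diff; only the one
// field this sketch needs is modelled.
type promptTokensDetails struct {
	CachedTokens int `json:"cached_tokens"`
}

// cachedTokens returns the reported cache-hit count, or zero when the
// provider omitted the details block. The non-nil branch is assumed; the
// diff only shows the nil guard.
func cachedTokens(d *promptTokensDetails) int {
	if d == nil {
		return 0
	}
	return d.CachedTokens
}

func main() {
	fmt.Println(cachedTokens(nil))                                      // details block omitted
	fmt.Println(cachedTokens(&promptTokensDetails{}))                   // reported, but no cache hit
	fmt.Println(cachedTokens(&promptTokensDetails{CachedTokens: 1024})) // cache hit reported
}
```

A zero result is therefore ambiguous: the provider may not report the field, the prefix may not have matched, or an explicit-cache provider may not have been opted in.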
@@ -11,10 +11,24 @@ type TokenUsage struct {
 	CompletionTokens int `json:"completion_tokens"`
 	TotalTokens      int `json:"total_tokens"`
 	// CachedTokens is the subset of PromptTokens that hit a provider-side
-	// prompt cache (OpenAI prompt_tokens_details.cached_tokens, Qwen explicit
-	// caching, etc.). Zero when the provider does not report cache hits or
-	// when no cache was hit. Omitted from JSON when zero to keep payloads
-	// quiet for providers that never populate it.
+	// prompt cache. Populated from `usage.prompt_tokens_details.cached_tokens`
+	// in OpenAI-compatible responses.
+	//
+	// Whether this field is non-zero depends on the provider's caching mode:
+	//
+	//   - Implicit caching (OpenAI, Azure OpenAI, DeepSeek, …): automatic.
+	//     The field populates whenever the prompt prefix matches a previous
+	//     request within the provider's cache TTL. No client-side opt-in.
+	//
+	//   - Explicit caching (Qwen on Aliyun, Anthropic Claude, …): opt-in
+	//     required. The caller must attach `cache_control: {"type":
+	//     "ephemeral"}` to the relevant message or content block to make
+	//     the provider create and read the cache. Until that opt-in is
+	//     applied, CachedTokens stays zero even when the prompt prefix is
+	//     otherwise byte-stable.
+	//
+	// Omitted from JSON when zero so payloads stay quiet for providers
+	// that never populate it.
 	CachedTokens int `json:"cached_tokens,omitempty"`
 }
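
The "omitted from JSON when zero" behaviour can be checked with a small sketch. The struct below mirrors the fields visible in the hunk; the `PromptTokens` field and its `prompt_tokens` tag are assumptions, since the hunk starts below that line, and `encodeUsage` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenUsage mirrors the fields visible in the diff. PromptTokens and its
// JSON tag are assumed, because the hunk does not show that line.
type TokenUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	CachedTokens     int `json:"cached_tokens,omitempty"`
}

// encodeUsage marshals a TokenUsage to JSON, returning "" on error.
func encodeUsage(u TokenUsage) string {
	b, err := json.Marshal(u)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	miss := TokenUsage{PromptTokens: 1200, CompletionTokens: 80, TotalTokens: 1280}
	hit := miss
	hit.CachedTokens = 1024

	fmt.Println(encodeUsage(miss)) // cached_tokens is absent from the payload
	fmt.Println(encodeUsage(hit))  // cached_tokens is present with value 1024
}
```

Consumers of the payload should treat a missing `cached_tokens` key the same as zero, since `omitempty` drops the key in both the "not reported" and the "no hit" case.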