xet-core/api_changes
Hoyt Koepke 6f0cf38065 Stable chunk boundary detection (#815)
This PR adds a function, `next_stable_chunk_boundary`, that takes a list
of chunk boundary positions and a starting cut point and returns the
next chunk boundary after the cut point that is stable: for every
possible alteration of the data up to the cut point, chunking the entire
altered file produces the same chunk boundaries from the stable boundary
onward.
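A minimal Rust sketch of that contract, following the PR summary's description of the approach (scan existing boundary metadata, without reading any bytes, for two consecutive chunks whose sizes fall inside a conservative window). Everything concrete here is an assumption: the name `next_stable_chunk_boundary_sketch`, passing the window as `lo`/`hi`, returning the end of the second chunk, and using `None` for "no stable boundary found". It only shows the shape of the scan and makes no claim about which window actually guarantees stability.

```rust
/// Hypothetical sketch, NOT the xet-core implementation of `next_stable_chunk_boundary`.
/// `boundaries`: chunk boundary offsets sorted ascending, starting at 0 and
/// ending at the file length (assumed layout).
/// `lo..=hi`: an assumed "safe" chunk-size window; the real helper derives
/// its window from the chunking constants instead of taking it as input.
fn next_stable_chunk_boundary_sketch(
    boundaries: &[usize],
    cut: usize,
    lo: usize,
    hi: usize,
) -> Option<usize> {
    let in_window = |size: usize| (lo..=hi).contains(&size);
    // Each window of three boundaries describes two adjacent chunks.
    for w in boundaries.windows(3) {
        let (start, mid, end) = (w[0], w[1], w[2]);
        // Only consider chunks that lie entirely at or after the cut point.
        if start < cut {
            continue;
        }
        // Both chunks must have sizes inside the window for the pair to count.
        if in_window(mid - start) && in_window(end - mid) {
            return Some(end);
        }
    }
    // No qualifying pair: the caller would rechunk through the end of the file.
    None
}

fn main() {
    // Made-up boundaries and a made-up 50..=200 byte window.
    let boundaries: [usize; 7] = [0, 100, 180, 900, 1000, 1090, 1200];
    let stable = next_stable_chunk_boundary_sketch(&boundaries, 150, 50, 200);
    println!("{stable:?}");
}
```

With these inputs, the pairs starting before 150 are skipped, `[180, 900)` is too large, and `[900, 1000)` plus `[1000, 1090)` qualify, so the sketch prints `Some(1090)`.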

As a result, to alter a specific range `[a, b)` of a file, we would do
the following:

1. Locate the last chunk boundary before `a`; call this `c_start`.
2. Call `next_stable_chunk_boundary` with the full set of chunk boundary
   locations and `b` as the cut point. It returns the next stable chunk
   boundary; call this `c_end`.
3. Replace `[a, b)` with the new data, prepend the original
   `data[c_start, a)`, append the original `data[b, c_end)`, and chunk
   this segment.
4. Combine the Merkle hash subtrees for the original `[0, c_start)`, the
   newly chunked `[c_start, c_end)`, and the original `[c_end, end)` to
   compute the new file hash. The result is identical to chunking the
   entire new file from scratch.
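The first three steps above can be sketched in Rust as follows, assuming the whole file is in memory and the boundary list is sorted and includes 0 and the file length. The `next_stable` closure stands in for `next_stable_chunk_boundary`, whose real signature is not reproduced here; treating `None` as "rechunk to the end of the file" and reading "before `a`" as "at or before `a`" are also assumptions. `splice_segment` is a hypothetical name, and step 4 is left out because it depends on xet's Merkle hashing.

```rust
/// Hypothetical sketch of steps 1-3 for replacing `data[a..b]` with `replacement`.
/// Returns (c_start, c_end, segment), where `segment` is the byte range to rechunk.
fn splice_segment<F>(
    data: &[u8],
    boundaries: &[usize], // sorted, includes 0 and data.len() (assumption)
    a: usize,
    b: usize,
    replacement: &[u8],
    next_stable: F,
) -> (usize, usize, Vec<u8>)
where
    F: Fn(&[usize], usize) -> Option<usize>,
{
    assert!(a <= b && b <= data.len());
    // Step 1: the last chunk boundary at or before `a`.
    let c_start = boundaries.iter().rev().copied().find(|&x| x <= a).unwrap_or(0);
    // Step 2: the next stable boundary after `b`; without one, fall back to
    // the end of the file (assumption).
    let c_end = next_stable(boundaries, b).unwrap_or(data.len());
    assert!(b <= c_end && c_end <= data.len());
    // Step 3: original prefix + replacement + original suffix, ready to chunk.
    let mut segment = Vec::with_capacity((a - c_start) + replacement.len() + (c_end - b));
    segment.extend_from_slice(&data[c_start..a]);
    segment.extend_from_slice(replacement);
    segment.extend_from_slice(&data[b..c_end]);
    (c_start, c_end, segment)
}

fn main() {
    let data = b"hello world, this is a test";
    let boundaries: [usize; 5] = [0, 6, 13, 21, 27];
    // Toy stand-in for the stable-boundary search: the first boundary past the cut.
    let (c_start, c_end, segment) = splice_segment(data, &boundaries, 6, 11, b"there", |bs: &[usize], cut: usize| {
        bs.iter().copied().find(|&x| x > cut)
    });
    println!("{c_start} {c_end} {:?}", String::from_utf8_lossy(&segment));
}
```

Replacing `world` (bytes `[6, 11)`) with `there` gives `c_start = 6`, `c_end = 13` under the toy stand-in, and the segment to rechunk is `"there, "`.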

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new public chunk-boundary selection logic used to make
> resumed/partial workflows deterministic; mistakes could cause
> misalignment or incorrect resume behavior in deduplication/chunking
> paths. Large new randomized/stress tests reduce risk, but the algorithm’s
> correctness assumptions are subtle.
> 
> **Overview**
> Introduces a new public helper, `next_stable_chunk_boundary`, that
> computes a restart-safe/stable resume boundary *from existing
> chunk-boundary metadata* (no byte access) by scanning for two
> consecutive chunks that fall within a conservative size window derived
> from chunking constants.
> 
> Updates `find_partitions` documentation to reflect the hash
> warmup/hidden-trigger verification approach and to reference the new
> helper, re-exports the function from `xet_data::deduplication`, and adds
> extensive edge-case and randomized mutation/stress tests to validate
> boundary stability under arbitrary prefix changes.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 98411603e3.</sup>
<!-- /CURSOR_SUMMARY -->
2026-05-18 12:16:49 -07:00

API changes

This folder contains a record of API changes in `main`. It is intended for AI agents to read so that they can correctly apply merges and update dependencies and PRs.

The updates are listed by date, with file names of the form `update_<yymmdd>_<description>.md`.
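To illustrate the naming scheme, here is a small hypothetical Rust helper (`api_change_filename` is not part of the repository). It formats the date as two-digit year, month, and day; turning the description into lowercase words joined by `_` is an assumption, since the README does not specify how descriptions are spelled.

```rust
/// Hypothetical helper that builds a record name of the form
/// `update_<yymmdd>_<description>.md`. The description slug rule
/// (lowercase ASCII words joined by `_`) is an assumption.
fn api_change_filename(year: u32, month: u32, day: u32, description: &str) -> String {
    let slug = description
        .to_lowercase()
        // Split on anything that is not an ASCII letter or digit.
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|word| !word.is_empty())
        .collect::<Vec<_>>()
        .join("_");
    // yymmdd: last two digits of the year, then zero-padded month and day.
    format!("update_{:02}{:02}{:02}_{}.md", year % 100, month, day, slug)
}

fn main() {
    // Illustrative description only.
    println!("{}", api_change_filename(2026, 1, 7, "Rename example API"));
}
```

This prints `update_260107_rename_example_api.md`.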

When applying a merge, rebase, or downstream update, an AI agent should first scan this folder to identify any recorded changes that may need to be applied.
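One way to start that scan, sketched under assumptions: given the file names in this folder and a cutoff date in `yymmdd` form (for example, the date of the branch's merge base), pick the records dated on or after the cutoff, oldest first. `updates_since` is a hypothetical helper, and it relies on `yymmdd` strings sorting chronologically, which only holds within a single century.

```rust
/// Hypothetical helper: select `update_<yymmdd>_<description>.md` records
/// dated on or after `since` (a yymmdd string), in chronological order.
fn updates_since<'a>(names: &[&'a str], since: &str) -> Vec<&'a str> {
    let mut picked: Vec<&'a str> = names
        .iter()
        .copied()
        .filter(|name| {
            // Expect "update_" + six digits + "_" + description + ".md".
            let Some(rest) = name.strip_prefix("update_") else {
                return false;
            };
            let Some(date) = rest.get(..6) else {
                return false;
            };
            date.bytes().all(|b| b.is_ascii_digit())
                && rest[6..].starts_with('_')
                && name.ends_with(".md")
                && date >= since
        })
        .collect();
    // Names share the "update_" prefix, so plain string order is date order.
    picked.sort();
    picked
}

fn main() {
    // Illustrative file names only.
    let names = [
        "README.md",
        "update_260301_example_b.md",
        "update_251102_example_a.md",
        "update_260518_example_c.md",
    ];
    for name in updates_since(&names, "260101") {
        println!("{name}");
    }
}
```

With a cutoff of `260101`, this prints `update_260301_example_b.md` and then `update_260518_example_c.md`; `README.md` and the 2025 record are skipped.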

When creating a PR that involves an API change potentially requiring downstream updates, an AI agent should add such a file. The file should be human-readable but contain enough information to apply the needed changes correctly without reading the code.