mirror of
https://github.com/huggingface/xet-core.git
synced 2026-06-04 13:30:29 +08:00
V2 reconstruction with client-side optional single range splitting (#703)
This PR introduces V2 multi-range URL fetching for xorbs, and can optionally split each multi-range request into multiple single-range requests that are executed in parallel. The reconstruction process can therefore generate full multi-range presigned URLs while the client performs the retrieval stage as a set of parallel single-range queries. The config variable `client.enable_multirange_fetching` controls this behavior; it defaults to false because fetching multi-range URLs is currently observed to be slow.

Co-authored-by: Adrien <adrien@huggingface.co>
94
api_changes/update_260316_v2_reconstruction_multirange.md
Normal file
@@ -0,0 +1,94 @@
# API Update: V2 Reconstruction with Multi-Range Fetch Support (2026-03-16)

## Overview

The CAS reconstruction API now supports a V2 endpoint that returns optimized
multi-range fetch descriptors. The client auto-detects V2 and falls back to V1
transparently. Two new config options control reconstruction behavior.

---

## 1. New CAS Endpoint

`GET /v2/reconstructions/{file_id}` returns `QueryReconstructionResponseV2`:

```json
{
  "terms": [...],
  "offset_into_first_range": 0,
  "xorbs": {
    "<hex_hash>": [
      {
        "url": "https://...",
        "ranges": [
          { "chunks": { "start": 0, "end": 3 }, "bytes": { "start": 0, "end": 1023 } },
          { "chunks": { "start": 5, "end": 8 }, "bytes": { "start": 2048, "end": 3071 } }
        ]
      }
    ]
  }
}
```

Each `XorbMultiRangeFetch` entry groups multiple disjoint chunk ranges under a
single presigned URL, enabling multi-range HTTP requests.

The client tries V2 first. On 404 or 501 it falls back to V1 and caches the
result so subsequent calls skip the V2 attempt. Setting
`HF_XET_CLIENT_RECONSTRUCTION_API_VERSION=1` or `=2` forces a specific version
with no fallback.

The `Client::get_reconstruction` trait method now always returns
`QueryReconstructionResponseV2`. When the server returns V1, the client
converts it internally.

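As an illustration of the fallback described above, the following is a minimal sketch, not the client's actual code. `VersionCache`, `HttpError`, and `fetch_with_fallback` are hypothetical names, and the HTTP call is replaced by a closure so the policy can be read in isolation: try V2, fall back to V1 on 404 or 501, and cache the detected version only after a call has succeeded.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for an HTTP failure; only the status code matters here.
#[derive(Debug, PartialEq)]
pub struct HttpError(pub u16);

/// Remembers which reconstruction API version the server accepted
/// (0 = not probed yet, 1 = V1, 2 = V2).
pub struct VersionCache {
    detected: AtomicU32,
}

impl VersionCache {
    pub fn new() -> Self {
        Self { detected: AtomicU32::new(0) }
    }

    /// Calls `fetch(version)` following the V2-first policy.
    /// `forced` plays the role of HF_XET_CLIENT_RECONSTRUCTION_API_VERSION:
    /// when set, that version is used directly and no fallback happens.
    pub fn fetch_with_fallback<T>(
        &self,
        forced: Option<u32>,
        mut fetch: impl FnMut(u32) -> Result<T, HttpError>,
    ) -> Result<T, HttpError> {
        if let Some(version) = forced {
            return fetch(version);
        }
        let cached = self.detected.load(Ordering::Relaxed);
        let version = if cached != 0 { cached } else { 2 };
        if version == 1 {
            return fetch(1);
        }
        match fetch(2) {
            Ok(value) => {
                self.detected.store(2, Ordering::Relaxed);
                Ok(value)
            }
            // 404 or 501 is read as "V2 not available": retry on V1.
            Err(HttpError(404)) | Err(HttpError(501)) => {
                let value = fetch(1)?;
                // Cache only after V1 succeeded, so a transient failure
                // during the fallback does not pin the client to V1.
                self.detected.store(1, Ordering::Relaxed);
                Ok(value)
            }
            Err(other) => Err(other),
        }
    }
}
```

Once V1 has been cached, later calls on the same `VersionCache` go straight to V1 without probing V2 again.
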

---

## 2. New Config Options

### `HF_XET_CLIENT_RECONSTRUCTION_API_VERSION`

Forces a specific reconstruction API version (1 or 2). When unset, the client
auto-detects by trying V2 first.

### `HF_XET_CLIENT_ENABLE_MULTIRANGE_FETCHING`

Default: `false`. When false, V2 multi-range fetch entries are split into
individual single-range requests executed in parallel. When true, multi-range
requests are sent as-is (using `multipart/byteranges` responses).

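A minimal sketch of how the two modes differ at the HTTP level, assuming inclusive byte ranges as in the `bytes` fields of the V2 response. `ByteRange`, `range_headers`, and `total_bytes` are hypothetical helpers for illustration, not the crate's API; the header syntax follows RFC 7233 §2.1.

```rust
/// Inclusive byte range, as used by HTTP `Range` headers.
/// Hypothetical stand-in for the crate's `HttpRange`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64, // inclusive
}

impl ByteRange {
    /// Number of bytes covered; the end is inclusive, hence the +1.
    pub fn length(&self) -> u64 {
        self.end - self.start + 1
    }
}

/// Builds the `Range` header value(s) to send for one fetch entry.
/// With multi-range enabled, all ranges go into a single header;
/// otherwise each range becomes its own single-range request.
pub fn range_headers(ranges: &[ByteRange], enable_multirange: bool) -> Vec<String> {
    if ranges.is_empty() {
        return Vec::new();
    }
    if enable_multirange {
        let specs: Vec<String> = ranges
            .iter()
            .map(|r| format!("{}-{}", r.start, r.end))
            .collect();
        vec![format!("bytes={}", specs.join(","))]
    } else {
        ranges
            .iter()
            .map(|r| format!("bytes={}-{}", r.start, r.end))
            .collect()
    }
}

/// Total payload size across all ranges, e.g. for progress reporting.
pub fn total_bytes(ranges: &[ByteRange]) -> u64 {
    ranges.iter().map(|r| r.length()).sum()
}
```

For the two ranges in the JSON example above (0-1023 and 2048-3071), the multi-range mode yields one header and the split mode yields two; the total is 2048 bytes either way.
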

---

## 3. Default Concurrency Changes

- `ac_initial_upload_concurrency`: 1 → 2
- `ac_initial_download_concurrency`: 1 → 4

These align the defaults with the documented values.

---

## 4. New Types in `xet_client::cas_types`

- `QueryReconstructionResponseV2` — V2 reconstruction response
- `XorbMultiRangeFetch` — A presigned URL with associated chunk/byte ranges
- `XorbRangeDescriptor` — A single chunk range + byte range pair

---

## 5. Multipart/Byteranges Parsing

`xet_client::cas_client::multipart::parse_multipart_byteranges` parses RFC 7233
`multipart/byteranges` HTTP responses. Used when `enable_multirange_fetching`
is true and the presigned URL server returns multiple byte ranges in a single
response.

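To make the expected body layout concrete, here is a hedged sketch of a helper that builds a `multipart/byteranges` body, for example to produce test fixtures. `build_multipart_byteranges` is a hypothetical name that does not exist in the crate. It emits only the `Content-Range` part header; real servers typically also send a `Content-Type` header per part.

```rust
/// Builds a `multipart/byteranges` body (RFC 7233 §4.1 layout).
/// Each part is (start, inclusive end, payload); `total` is the
/// complete length of the underlying resource.
pub fn build_multipart_byteranges(
    boundary: &str,
    total: u64,
    parts: &[(u64, u64, &[u8])],
) -> Vec<u8> {
    let mut body = Vec::new();
    for (i, (start, end, data)) in parts.iter().enumerate() {
        // Every boundary except the first is preceded by CRLF.
        if i > 0 {
            body.extend_from_slice(b"\r\n");
        }
        body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
        // Part headers end with an empty line before the payload.
        body.extend_from_slice(
            format!("Content-Range: bytes {start}-{end}/{total}\r\n\r\n").as_bytes(),
        );
        body.extend_from_slice(data);
    }
    // The closing delimiter carries a trailing "--".
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}
```

Parts may appear in any order in the body; the parser in this change sorts them by range start after parsing.
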

---

## 6. Downstream Impact

- `Client::get_reconstruction` return type changed to `QueryReconstructionResponseV2`
  (all trait implementations updated).
- `URLProvider::retrieve_url` now returns `(String, Vec<HttpRange>)` instead of
  `(String, HttpRange)` to support multi-range blocks.
- No wire format or serialization changes; V1 responses are converted client-side.

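The client-side V1 to V2 conversion can be pictured with the following sketch. All types here are simplified, hypothetical stand-ins (the field names of the V1 fetch entry in particular are assumptions), and the crate's real conversion may differ. The idea is that each V1 entry, which carries one presigned URL per chunk range, becomes a V2 entry with exactly one range.

```rust
use std::collections::HashMap;

/// Simplified start/end pair used for both chunk and byte ranges.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Range {
    pub start: u64,
    pub end: u64,
}

/// Assumed shape of a V1 fetch entry: one presigned URL per chunk range.
pub struct V1FetchInfo {
    pub chunks: Range,
    pub url: String,
    pub url_range: Range,
}

/// Mirrors `XorbRangeDescriptor`: a chunk range plus its byte range.
#[derive(Debug, PartialEq)]
pub struct RangeDescriptor {
    pub chunks: Range,
    pub bytes: Range,
}

/// Mirrors `XorbMultiRangeFetch`: one URL covering one or more ranges.
#[derive(Debug, PartialEq)]
pub struct MultiRangeFetch {
    pub url: String,
    pub ranges: Vec<RangeDescriptor>,
}

/// Converts a V1 `fetch_info` map into the V2 `xorbs` shape.
/// Every V1 entry maps to a V2 entry holding a single range.
pub fn v1_to_v2(
    fetch_info: HashMap<String, Vec<V1FetchInfo>>,
) -> HashMap<String, Vec<MultiRangeFetch>> {
    fetch_info
        .into_iter()
        .map(|(hash, entries)| {
            let fetches = entries
                .into_iter()
                .map(|e| MultiRangeFetch {
                    url: e.url,
                    ranges: vec![RangeDescriptor { chunks: e.chunks, bytes: e.url_range }],
                })
                .collect();
            (hash, fetches)
        })
        .collect()
}
```

Because converted entries always hold one range, downstream code that handles V2 responses needs no separate V1 path.
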
@@ -14,24 +14,27 @@ security:
 paths:
   /v1/reconstructions/{file_id}:
     get:
-      summary: Get File Reconstruction
+      summary: Get File Reconstruction (V1)
       description: |
         Retrieves reconstruction information for a specific file. Supports byte range via the optional `Range` header.
+        Returns one presigned URL per chunk range per xorb.
+
         Minimum token scope: `read`.
       x-required-scope: read
-      operationId: getReconstruction
+      operationId: getReconstructionV1
       parameters:
         - $ref: '#/components/parameters/FileIdParam'
         - $ref: '#/components/parameters/RangeHeader'
       responses:
         '200':
-          description: Reconstruction object
+          description: V1 reconstruction object
           content:
             application/json:
               schema:
                 $ref: '#/components/schemas/QueryReconstructionResponse'
               examples:
-                example:
+                v1:
+                  summary: V1 response
                   value:
                     offset_into_first_range: 0
                     terms:
@@ -57,6 +60,60 @@ paths:
           description: Not Found — File does not exist
         '416':
           description: Range Not Satisfiable — Requested byte range start exceeds file length
+  /v2/reconstructions/{file_id}:
+    get:
+      summary: Get File Reconstruction (V2)
+      description: |
+        V2 reconstruction endpoint optimized for multi-range fetching.
+        Returns fewer signed URLs by combining multiple byte ranges for the same xorb into a single URL,
+        enabling multi-range HTTP requests (RFC 7233).
+
+        Clients SHOULD try V2 first and fall back to V1 if the server returns 404 or 501.
+
+        Minimum token scope: `read`.
+      x-required-scope: read
+      operationId: getReconstructionV2
+      parameters:
+        - $ref: '#/components/parameters/FileIdParam'
+        - $ref: '#/components/parameters/RangeHeader'
+      responses:
+        '200':
+          description: V2 reconstruction object
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/QueryReconstructionResponseV2'
+              examples:
+                v2:
+                  summary: V2 response (multi-range optimized)
+                  value:
+                    offset_into_first_range: 0
+                    terms:
+                      - hash: a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456
+                        unpacked_length: 263873
+                        range:
+                          start: 0
+                          end: 4
+                    xorbs:
+                      a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456:
+                        - url: "https://transfer.xethub.hf.co/xorbs/default/a1b2c3...?<signed-params>"
+                          ranges:
+                            - chunks:
+                                start: 0
+                                end: 4
+                              bytes:
+                                start: 0
+                                end: 131071
+        '400':
+          description: Bad Request — Malformed file_id
+        '401':
+          description: Unauthorized — Missing/expired token
+        '404':
+          description: Not Found — File does not exist, or V2 not supported (fall back to V1)
+        '416':
+          description: Range Not Satisfiable — Requested byte range start exceeds file length
+        '501':
+          description: Not Implemented — V2 not supported by this server (fall back to V1)
   /v1/chunks/{prefix}/{hash}:
     get:
       summary: Query Chunk Deduplication (Global Deduplication)
@@ -286,6 +343,56 @@ components:
               $ref: '#/components/schemas/CASReconstructionFetchInfo'
       required: [offset_into_first_range, terms, fetch_info]
       additionalProperties: false
+    XorbRangeDescriptor:
+      type: object
+      description: A chunk/byte range within a xorb
+      properties:
+        chunks:
+          $ref: '#/components/schemas/IndexRange'
+        bytes:
+          $ref: '#/components/schemas/ByteRange'
+      required: [chunks, bytes]
+      additionalProperties: false
+    XorbMultiRangeFetch:
+      type: object
+      description: A signed multi-range fetch entry covering a subset of ranges for a xorb
+      properties:
+        url:
+          type: string
+          format: uri
+          description: |
+            Signed URL with all byte ranges encoded.
+            Client must send exactly the signed range value as the Range header.
+        ranges:
+          type: array
+          items:
+            $ref: '#/components/schemas/XorbRangeDescriptor'
+          description: Byte ranges covered by this URL, sorted by chunk start
+      required: [url, ranges]
+      additionalProperties: false
+    QueryReconstructionResponseV2:
+      type: object
+      description: V2 reconstruction response optimized for multi-range fetching
+      properties:
+        offset_into_first_range:
+          type: integer
+          minimum: 0
+        terms:
+          type: array
+          items:
+            $ref: '#/components/schemas/CASReconstructionTerm'
+        xorbs:
+          type: object
+          description: Map from xorb hash to list of multi-range fetch entries
+          propertyNames:
+            $ref: '#/components/schemas/HexString64Lowercase'
+          additionalProperties:
+            type: array
+            items:
+              $ref: '#/components/schemas/XorbMultiRangeFetch'
+            minItems: 1
+      required: [offset_into_first_range, terms, xorbs]
+      additionalProperties: false
     UploadXorbResponse:
       type: object
       properties:

@@ -6,14 +6,16 @@ use xet_core_structures::xorb_object::SerializedXorbObject;
 use super::adaptive_concurrency::ConnectionPermit;
 use super::error::Result;
 use super::progress_tracked_streams::ProgressCallback;
-use crate::cas_types::{BatchQueryReconstructionResponse, FileRange, HttpRange, QueryReconstructionResponse};
+use crate::cas_types::{BatchQueryReconstructionResponse, FileRange, HttpRange, QueryReconstructionResponseV2};
 
 #[async_trait::async_trait]
 pub trait URLProvider: Send + Sync {
-    // Retrieves the URL.
-    async fn retrieve_url(&self) -> Result<(String, HttpRange)>;
+    /// Retrieves the URL and the byte ranges to fetch.
+    /// For single-range (V1) blocks, the Vec has one entry.
+    /// For multi-range (V2) blocks, all ranges are included.
+    async fn retrieve_url(&self) -> Result<(String, Vec<HttpRange>)>;
 
-    // Asks for a refresh of the URL; triggered on 403 errors.
+    /// Asks for a refresh of the URL; triggered on 403 errors.
     async fn refresh_url(&self) -> Result<()>;
 }
 
@@ -30,11 +32,13 @@ pub trait Client: Send + Sync {
         file_hash: &MerkleHash,
     ) -> Result<Option<(MDBFileInfo, Option<MerkleHash>)>>;
 
+    /// Returns reconstruction info always in V2 format.
+    /// Implementations may try V2 first and fall back to V1 + convert.
     async fn get_reconstruction(
         &self,
         file_id: &MerkleHash,
         bytes_range: Option<FileRange>,
-    ) -> Result<Option<QueryReconstructionResponse>>;
+    ) -> Result<Option<QueryReconstructionResponseV2>>;
 
     async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse>;
 

@@ -16,6 +16,7 @@ mod error;
 pub mod exports;
 pub mod http_client;
 mod interface;
+pub mod multipart;
 pub mod progress_tracked_streams;
 pub mod remote_client;
 pub mod retry_wrapper;

186
xet_client/src/cas_client/multipart.rs
Normal file
@@ -0,0 +1,186 @@
use bytes::Bytes;

use crate::cas_client::error::{CasClientError, Result};
use crate::cas_types::HttpRange;

/// A single part from a multipart/byteranges HTTP response.
pub struct MultipartPart {
    pub range: HttpRange,
    pub data: Bytes,
}

/// Parse a `multipart/byteranges` HTTP response body (RFC 7233 §4.1).
///
/// Extracts the boundary from `content_type`, splits the body by boundary markers,
/// parses `Content-Range` headers from each part, and returns parts sorted by byte range start.
pub fn parse_multipart_byteranges(content_type: &str, body: Bytes) -> Result<Vec<MultipartPart>> {
    let boundary = extract_boundary(content_type)?;

    let delimiter = format!("\r\n--{boundary}");
    let body_slice = body.as_ref();

    let mut parts = Vec::new();

    let first_delim = format!("--{boundary}");
    let Some(start) = find_subsequence(body_slice, first_delim.as_bytes()) else {
        return Err(CasClientError::Other("No boundary found in multipart body".to_string()));
    };

    let mut remaining = &body_slice[start + first_delim.len()..];

    loop {
        if remaining.starts_with(b"\r\n") {
            remaining = &remaining[2..];
        } else {
            break;
        }

        let next_boundary = find_subsequence(remaining, delimiter.as_bytes());
        let part_data = match next_boundary {
            Some(pos) => &remaining[..pos],
            None => remaining,
        };

        let Some(header_end) = find_subsequence(part_data, b"\r\n\r\n") else {
            return Err(CasClientError::Other("Malformed multipart part: missing header/data separator".to_string()));
        };

        let headers = &part_data[..header_end];
        let data_start = header_end + 4;
        let data = &part_data[data_start..];

        let range = parse_content_range(headers)?;
        // Compute the absolute byte offset into the original `body` so we can
        // use Bytes::slice for zero-copy extraction of this part's data.
        let offset =
            body.len() - body_slice.len() + (remaining.as_ptr() as usize - body_slice.as_ptr() as usize) + data_start;
        parts.push(MultipartPart {
            range,
            data: body.slice(offset..offset + data.len()),
        });

        match next_boundary {
            Some(pos) => {
                remaining = &remaining[pos + delimiter.len()..];
            },
            None => break,
        }
    }

    parts.sort_by_key(|p| p.range.start);

    Ok(parts)
}

fn extract_boundary(content_type: &str) -> Result<String> {
    for part in content_type.split(';') {
        let part = part.trim();
        if let Some(value) = part.strip_prefix("boundary=") {
            let boundary = value.trim_matches('"');
            return Ok(boundary.to_string());
        }
    }
    Err(CasClientError::Other(format!("No boundary found in Content-Type: {content_type}")))
}

fn parse_content_range(headers: &[u8]) -> Result<HttpRange> {
    let headers_str = std::str::from_utf8(headers)
        .map_err(|e| CasClientError::Other(format!("Invalid UTF-8 in part headers: {e}")))?;

    for line in headers_str.split("\r\n") {
        let line_lower = line.to_ascii_lowercase();
        if let Some(value) = line_lower.strip_prefix("content-range:") {
            // Digits, dashes, and slashes are case-invariant, so we can parse
            // directly from the lowercased value.
            if let Some(range_spec) = value.trim().strip_prefix("bytes ") {
                let original_value = range_spec.trim();
                let slash_pos = original_value
                    .find('/')
                    .ok_or_else(|| CasClientError::Other(format!("Invalid Content-Range: {line}")))?;
                let range_part = &original_value[..slash_pos];
                let dash_pos = range_part
                    .find('-')
                    .ok_or_else(|| CasClientError::Other(format!("Invalid Content-Range: {line}")))?;
                let start: u64 = range_part[..dash_pos]
                    .parse()
                    .map_err(|e| CasClientError::Other(format!("Invalid Content-Range start: {e}")))?;
                let end: u64 = range_part[dash_pos + 1..]
                    .parse()
                    .map_err(|e| CasClientError::Other(format!("Invalid Content-Range end: {e}")))?;
                // RFC 7233 Content-Range uses an inclusive end, which matches HttpRange.
                return Ok(HttpRange::new(start, end));
            }
        }
    }

    Err(CasClientError::Other("No Content-Range header found in multipart part".to_string()))
}

fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|window| window == needle)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_extract_boundary() {
        assert_eq!(extract_boundary("multipart/byteranges; boundary=something").unwrap(), "something");
        assert_eq!(extract_boundary("multipart/byteranges; boundary=\"quoted\"").unwrap(), "quoted");
    }

    #[test]
    fn test_extract_boundary_missing() {
        assert!(extract_boundary("text/plain").is_err());
    }

    #[test]
    fn test_parse_single_part() {
        let boundary = "abc123";
        let body = format!(
            "--{boundary}\r\nContent-Type: application/octet-stream\r\nContent-Range: bytes 0-99/1000\r\n\r\nHello World\r\n--{boundary}--\r\n"
        );
        let content_type = format!("multipart/byteranges; boundary={boundary}");

        let parts = parse_multipart_byteranges(&content_type, Bytes::from(body)).unwrap();
        assert_eq!(parts.len(), 1);
        assert_eq!(parts[0].range.start, 0);
        assert_eq!(parts[0].range.end, 99);
        assert_eq!(&parts[0].data[..], b"Hello World");
    }

    #[test]
    fn test_parse_multiple_parts() {
        let boundary = "sep";
        let body = format!(
            "--{boundary}\r\nContent-Range: bytes 100-199/1000\r\n\r\nPart2Data\r\n--{boundary}\r\nContent-Range: bytes 0-49/1000\r\n\r\nPart1Data\r\n--{boundary}--\r\n"
        );
        let content_type = format!("multipart/byteranges; boundary={boundary}");

        let parts = parse_multipart_byteranges(&content_type, Bytes::from(body)).unwrap();
        assert_eq!(parts.len(), 2);
        assert_eq!(parts[0].range.start, 0);
        assert_eq!(parts[0].range.end, 49);
        assert_eq!(&parts[0].data[..], b"Part1Data");
        assert_eq!(parts[1].range.start, 100);
        assert_eq!(parts[1].range.end, 199);
        assert_eq!(&parts[1].data[..], b"Part2Data");
    }

    #[test]
    fn test_parse_empty_body_no_boundary() {
        let content_type = "multipart/byteranges; boundary=xyz";
        let result = parse_multipart_byteranges(content_type, Bytes::new());
        assert!(result.is_err());
    }

    #[test]
    fn test_parse_part_missing_header_separator() {
        let boundary = "xyz";
        let body = format!("--{boundary}\r\nContent-Range: bytes 0-9/100\r\nMISSING_SEPARATOR\r\n--{boundary}--\r\n");
        let content_type = format!("multipart/byteranges; boundary={boundary}");
        let result = parse_multipart_byteranges(&content_type, Bytes::from(body));
        assert!(result.is_err());
    }
}

@@ -1,5 +1,5 @@
 use std::sync::Arc;
-use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
 
 use bytes::Bytes;
 use futures::TryStreamExt;
@@ -24,8 +24,8 @@ use super::progress_tracked_streams::{
 use super::retry_wrapper::{RetryWrapper, RetryableReqwestError};
 use super::{Client, INFORMATION_LOG_LEVEL};
 use crate::cas_types::{
-    BatchQueryReconstructionResponse, FileRange, HttpRange, Key, QueryReconstructionResponse, UploadShardResponse,
-    UploadShardResponseType, UploadXorbResponse,
+    BatchQueryReconstructionResponse, FileRange, HttpRange, Key, QueryReconstructionResponse,
+    QueryReconstructionResponseV2, UploadShardResponse, UploadShardResponseType, UploadXorbResponse,
 };
 
 pub const CAS_ENDPOINT: &str = "http://localhost:8080";
@@ -48,6 +48,8 @@ pub struct RemoteClient {
     shard_upload_http_client: Arc<ClientWithMiddleware>,
     upload_concurrency_controller: Arc<AdaptiveConcurrencyController>,
     download_concurrency_controller: Arc<AdaptiveConcurrencyController>,
+    /// Caches the discovered reconstruction API version (0 = not yet probed, 1 = V1, 2 = V2).
+    detected_reconstruction_api_version: AtomicU32,
 }
 
 impl RemoteClient {
@@ -85,6 +87,7 @@ impl RemoteClient {
             ),
             upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("upload"),
             download_concurrency_controller: AdaptiveConcurrencyController::new_download("download"),
+            detected_reconstruction_api_version: AtomicU32::new(0),
         })
     }
 
@@ -168,6 +171,126 @@ impl RemoteClient {
     }
 }
 
+impl RemoteClient {
+    async fn get_reconstruction_impl<T>(
+        &self,
+        file_id: &MerkleHash,
+        bytes_range: Option<FileRange>,
+        api_version: &str,
+    ) -> Result<Option<T>>
+    where
+        T: serde::de::DeserializeOwned + 'static,
+    {
+        let call_id = FN_CALL_ID.fetch_add(1, Ordering::Relaxed);
+        let url = Url::parse(&format!("{}/{api_version}/reconstructions/{}", self.endpoint, file_id.hex()))?;
+        let api_tag = match api_version {
+            "v1" => "cas::get_reconstruction_v1",
+            "v2" => "cas::get_reconstruction_v2",
+            _ => {
+                return Err(CasClientError::internal(format!("unsupported reconstruction API version: {api_version}")));
+            },
+        };
+
+        event!(
+            INFORMATION_LOG_LEVEL,
+            call_id,
+            %file_id,
+            ?bytes_range,
+            api_version,
+            "Starting get_reconstruction API call",
+        );
+
+        let client = self.authenticated_http_client.clone();
+
+        let result: Result<T> = RetryWrapper::new(api_tag)
+            .run_and_extract_json(move || {
+                let mut request = client.get(url.clone()).with_extension(Api(api_tag));
+                if let Some(range) = bytes_range {
+                    request = request.header(RANGE, HttpRange::from(range).range_header())
+                }
+                request.send()
+            })
+            .await;
+
+        match result {
+            Ok(response) => {
+                event!(
+                    INFORMATION_LOG_LEVEL,
+                    call_id,
+                    %file_id,
+                    ?bytes_range,
+                    api_version,
+                    "Completed get_reconstruction API call"
+                );
+                Ok(Some(response))
+            },
+            Err(CasClientError::ReqwestError(ref e, _)) if e.status() == Some(StatusCode::RANGE_NOT_SATISFIABLE) => {
+                Ok(None)
+            },
+            Err(e) => Err(e),
+        }
+    }
+
+    /// V1 reconstruction: returns per-range presigned URLs.
+    pub async fn get_reconstruction_v1(
+        &self,
+        file_id: &MerkleHash,
+        bytes_range: Option<FileRange>,
+    ) -> Result<Option<QueryReconstructionResponse>> {
+        self.get_reconstruction_impl(file_id, bytes_range, "v1").await
+    }
+
+    /// V2 reconstruction: returns per-xorb multi-range fetch descriptors.
+    pub async fn get_reconstruction_v2(
+        &self,
+        file_id: &MerkleHash,
+        bytes_range: Option<FileRange>,
+    ) -> Result<Option<QueryReconstructionResponseV2>> {
+        self.get_reconstruction_impl(file_id, bytes_range, "v2").await
+    }
+
+    pub(crate) async fn get_reconstruction_with_version_override(
+        &self,
+        file_id: &MerkleHash,
+        bytes_range: Option<FileRange>,
+        forced_version: Option<u32>,
+    ) -> Result<Option<QueryReconstructionResponseV2>> {
+        // Prefer V2; fall back to V1 on 404/501; persist detected version to
+        // avoid repeated fallback attempts.
+        let version = match forced_version {
+            Some(v) => v,
+            None => {
+                let detected = self.detected_reconstruction_api_version.load(Ordering::Relaxed);
+                if detected != 0 { detected } else { 2 }
+            },
+        };
+
+        match version {
+            2 => match self.get_reconstruction_v2(file_id, bytes_range).await {
+                Ok(result) => {
+                    if forced_version.is_none() {
+                        self.detected_reconstruction_api_version.store(2, Ordering::Relaxed);
+                    }
+                    Ok(result)
+                },
+                Err(e)
+                    if forced_version.is_none()
+                        && matches!(e.status(), Some(StatusCode::NOT_FOUND) | Some(StatusCode::NOT_IMPLEMENTED)) =>
+                {
+                    info!(status = ?e.status(), "V2 reconstruction not available, falling back to V1");
+                    let result = self.get_reconstruction_v1(file_id, bytes_range).await?.map(Into::into);
+                    // Store after success to make sure we don't mess up on e.g. network failure.
+                    self.detected_reconstruction_api_version.store(1, Ordering::Relaxed);
+                    Ok(result)
+                },
+                Err(e) => Err(e),
+            },
+            1 => Ok(self.get_reconstruction_v1(file_id, bytes_range).await?.map(Into::into)),
+            other => Err(CasClientError::internal(format!("unsupported reconstruction API version: {other}"))),
+        }
+    }
+}
+
 #[cfg_attr(not(target_family = "wasm"), async_trait::async_trait)]
 #[cfg_attr(target_family = "wasm", async_trait::async_trait(?Send))]
 impl Client for RemoteClient {
@@ -175,49 +298,10 @@ impl Client for RemoteClient {
         &self,
         file_id: &MerkleHash,
         bytes_range: Option<FileRange>,
-    ) -> Result<Option<QueryReconstructionResponse>> {
-        let call_id = FN_CALL_ID.fetch_add(1, Ordering::Relaxed);
-        let url = Url::parse(&format!("{}/v1/reconstructions/{}", self.endpoint, file_id.hex()))?;
-        event!(
-            INFORMATION_LOG_LEVEL,
-            call_id,
-            %file_id,
-            ?bytes_range,
-            "Starting get_reconstruction API call",
-        );
-
-        let api_tag = "cas::get_reconstruction";
-        let client = self.authenticated_http_client.clone();
-
-        let result: Result<QueryReconstructionResponse> = RetryWrapper::new(api_tag)
-            .run_and_extract_json(move || {
-                let mut request = client.get(url.clone()).with_extension(Api(api_tag));
-                if let Some(range) = bytes_range {
-                    // convert exclusive-end to inclusive-end range
-                    request = request.header(RANGE, HttpRange::from(range).range_header())
-                }
-
-                request.send()
-            })
-            .await;
-
-        match result {
-            Ok(query_reconstruction_response) => {
-                event!(
-                    INFORMATION_LOG_LEVEL,
-                    call_id,
-                    %file_id,
-                    ?bytes_range,
-                    "Completed get_reconstruction API call"
-                );
-                Ok(Some(query_reconstruction_response))
-            },
-            Err(CasClientError::ReqwestError(ref e, _)) if e.status() == Some(StatusCode::RANGE_NOT_SATISFIABLE) => {
-                // bytes_range not satisfiable
-                Ok(None)
-            },
-            Err(e) => Err(e),
-        }
+    ) -> Result<Option<QueryReconstructionResponseV2>> {
+        let forced_version = xet_config().client.reconstruction_api_version;
+        self.get_reconstruction_with_version_override(file_id, bytes_range, forced_version)
+            .await
     }
 
     async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse> {
@@ -270,8 +354,8 @@ impl Client for RemoteClient {
|
|||||||
let http_client = self.http_client.clone();
|
let http_client = self.http_client.clone();
|
||||||
let url_info = Arc::new(url_info);
|
let url_info = Arc::new(url_info);
|
||||||
|
|
||||||
let (_, url_range) = url_info.retrieve_url().await?;
|
let (_, url_ranges) = url_info.retrieve_url().await?;
|
||||||
let total_download_bytes = url_range.length();
|
let total_download_bytes: u64 = url_ranges.iter().map(|r| r.length()).sum();
|
||||||
|
|
||||||
let mut transfer_reporter = StreamProgressReporter::new(total_download_bytes)
|
let mut transfer_reporter = StreamProgressReporter::new(total_download_bytes)
|
||||||
.with_adaptive_concurrency_reporter(download_permit.get_partial_completion_reporting_function());
|
.with_adaptive_concurrency_reporter(download_permit.get_partial_completion_reporting_function());
|
||||||
@@ -288,16 +372,28 @@ impl Client for RemoteClient {
|
|||||||
let url_info = url_info.clone();
|
let url_info = url_info.clone();
|
||||||
|
|
||||||
async move {
|
async move {
|
||||||
let (url_string, url_range) = url_info
|
let (url_string, url_ranges) = url_info
|
||||||
.retrieve_url()
|
.retrieve_url()
|
||||||
.await
|
.await
|
||||||
.map_err(|e| reqwest_middleware::Error::Middleware(e.into()))?;
|
.map_err(|e| reqwest_middleware::Error::Middleware(e.into()))?;
|
||||||
let url =
|
let url =
|
||||||
Url::parse(&url_string).map_err(|e| reqwest_middleware::Error::Middleware(e.into()))?;
|
Url::parse(&url_string).map_err(|e| reqwest_middleware::Error::Middleware(e.into()))?;
|
||||||
|
|
||||||
|
// RFC 7233 §2.1: single-range uses "bytes=S-E", multi-range uses "bytes=S1-E1,S2-E2,..."
|
||||||
|
let range_header_value = if url_ranges.len() == 1 {
|
||||||
|
url_ranges[0].range_header()
|
||||||
|
} else {
|
||||||
|
let joined = url_ranges
|
||||||
|
.iter()
|
||||||
|
.map(|r| format!("{}-{}", r.start, r.end))
|
||||||
|
.collect::<Vec<_>>()
|
||||||
|
.join(",");
|
||||||
|
format!("bytes={joined}")
|
||||||
|
};
|
||||||
|
|
||||||
let response = http_client
|
let response = http_client
|
||||||
.get(url)
|
.get(url)
|
||||||
.header(RANGE, url_range.range_header())
|
.header(RANGE, range_header_value)
|
||||||
.with_extension(Api(api_tag))
|
.with_extension(Api(api_tag))
|
||||||
.send()
|
.send()
|
||||||
.await?;
|
.await?;
|
||||||
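The Range header construction in the hunk above can be sketched in isolation. This is a minimal illustration, not the repository's code: `ByteRange` is a hypothetical stand-in for the client's range type, and `end` is assumed to be inclusive, as HTTP byte ranges are. Unlike the diff, the sketch returns `None` for an empty slice instead of producing an empty `bytes=` value.

```rust
/// Hypothetical stand-in for the client's HTTP range type.
/// Assumption: `end` is inclusive, matching HTTP byte-range semantics.
#[derive(Clone, Copy, Debug)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64,
}

/// Builds a Range header value.
/// One range yields "bytes=S-E"; several yield "bytes=S1-E1,S2-E2,...".
/// An empty slice yields None, because "bytes=" alone is not a valid value.
pub fn range_header_value(ranges: &[ByteRange]) -> Option<String> {
    if ranges.is_empty() {
        return None;
    }
    let joined = ranges
        .iter()
        .map(|r| format!("{}-{}", r.start, r.end))
        .collect::<Vec<_>>()
        .join(",");
    Some(format!("bytes={joined}"))
}
```

Because the single-range and multi-range forms differ only in the number of comma-separated specs, one code path covers both in this sketch. The diff keeps a separate single-range branch that reuses the existing `range_header()` helper.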
@@ -315,34 +411,86 @@ impl Client for RemoteClient {
|
|||||||
move |resp: Response| {
|
move |resp: Response| {
|
||||||
let transfer_reporter = transfer_reporter.clone();
|
let transfer_reporter = transfer_reporter.clone();
|
||||||
async move {
|
async move {
|
||||||
let incoming_stream = DownloadProgressStream::wrap_stream(
|
let content_type = resp
|
||||||
resp.bytes_stream().map_err(std::io::Error::other),
|
.headers()
|
||||||
transfer_reporter,
|
.get("content-type")
|
||||||
);
|
.and_then(|v| v.to_str().ok())
|
||||||
|
.unwrap_or("")
|
||||||
|
.to_string();
|
||||||
|
|
||||||
let capacity = uncompressed_size_if_known.unwrap_or(0);
|
let is_multipart = content_type.contains("multipart/byteranges");
|
||||||
let mut buffer = Vec::with_capacity(capacity);
|
|
||||||
let mut writer = std::io::Cursor::new(&mut buffer);
|
|
||||||
|
|
||||||
let result = xet_core_structures::xorb_object::deserialize_async::deserialize_chunks_to_writer_from_stream(
|
if is_multipart {
|
||||||
incoming_stream,
|
let body = resp
|
||||||
&mut writer,
|
.bytes()
|
||||||
)
|
.await
|
||||||
.await;
|
.map_err(|e| RetryableReqwestError::RetryableError(CasClientError::from(e)))?;
|
||||||
|
|
||||||
match result {
|
let multipart_parts = crate::cas_client::multipart::parse_multipart_byteranges(&content_type, body)
|
||||||
Ok((_compressed_len, chunk_byte_indices)) => {
|
.map_err(RetryableReqwestError::FatalError)?;
|
||||||
if let Some(expected) = uncompressed_size_if_known
|
|
||||||
&& expected != buffer.len()
|
let mut all_decompressed = Vec::with_capacity(uncompressed_size_if_known.unwrap_or(0));
|
||||||
{
|
let mut all_chunk_indices = Vec::<u32>::new();
|
||||||
return Err(RetryableReqwestError::RetryableError(CasClientError::Other(format!(
|
let mut total_compressed_bytes = 0u64;
|
||||||
"get_file_term_data: expected {expected} uncompressed bytes, got {}",
|
|
||||||
buffer.len()
|
for part in multipart_parts {
|
||||||
))));
|
total_compressed_bytes += part.data.len() as u64;
|
||||||
}
|
|
||||||
Ok((Bytes::from(buffer), chunk_byte_indices))
|
let (data, chunk_indices) =
|
||||||
},
|
xet_core_structures::xorb_object::deserialize_chunks(&mut std::io::Cursor::new(part.data.as_ref()))
|
||||||
Err(e) => Err(RetryableReqwestError::RetryableError(CasClientError::FormatError(e))),
|
.map_err(|e| {
|
||||||
|
RetryableReqwestError::RetryableError(CasClientError::FormatError(e))
|
||||||
|
})?;
|
||||||
|
|
||||||
|
xet_core_structures::xorb_object::append_chunk_segment(
|
||||||
|
&mut all_decompressed,
|
||||||
|
&mut all_chunk_indices,
|
||||||
|
&data,
|
||||||
|
&chunk_indices,
|
||||||
|
);
|
||||||
|
|
||||||
|
transfer_reporter.report_progress(total_compressed_bytes as usize);
|
||||||
|
}
|
||||||
|
|
||||||
|
if let Some(expected) = uncompressed_size_if_known
|
||||||
|
&& expected != all_decompressed.len()
|
||||||
|
{
|
||||||
|
return Err(RetryableReqwestError::RetryableError(CasClientError::Other(format!(
|
||||||
|
"get_file_term_data: expected {expected} uncompressed bytes, got {}",
|
||||||
|
all_decompressed.len()
|
||||||
|
))));
|
||||||
|
}
|
||||||
|
Ok((Bytes::from(all_decompressed), all_chunk_indices))
|
||||||
|
} else {
|
||||||
|
let incoming_stream = DownloadProgressStream::wrap_stream(
|
||||||
|
resp.bytes_stream().map_err(std::io::Error::other),
|
||||||
|
transfer_reporter,
|
||||||
|
);
|
||||||
|
|
||||||
|
let capacity = uncompressed_size_if_known.unwrap_or(0);
|
||||||
|
let mut buffer = Vec::with_capacity(capacity);
|
||||||
|
let mut writer = std::io::Cursor::new(&mut buffer);
|
||||||
|
|
||||||
|
let result = xet_core_structures::xorb_object::deserialize_async::deserialize_chunks_to_writer_from_stream(
|
||||||
|
incoming_stream,
|
||||||
|
&mut writer,
|
||||||
|
)
|
||||||
|
.await;
|
||||||
|
|
||||||
|
match result {
|
||||||
|
Ok((_compressed_len, chunk_byte_indices)) => {
|
||||||
|
if let Some(expected) = uncompressed_size_if_known
|
||||||
|
&& expected != buffer.len()
|
||||||
|
{
|
||||||
|
return Err(RetryableReqwestError::RetryableError(CasClientError::Other(format!(
|
||||||
|
"get_file_term_data: expected {expected} uncompressed bytes, got {}",
|
||||||
|
buffer.len()
|
||||||
|
))));
|
||||||
|
}
|
||||||
|
Ok((Bytes::from(buffer), chunk_byte_indices))
|
||||||
|
},
|
||||||
|
Err(e) => Err(RetryableReqwestError::RetryableError(CasClientError::FormatError(e))),
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
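The response handler above branches on whether the Content-Type is `multipart/byteranges`. As a rough sketch of that detection step, the function below also extracts the `boundary` parameter a multipart parser would need. It is a hypothetical helper written for illustration. It is not the `parse_multipart_byteranges` function referenced in the diff, whose implementation is not shown here.

```rust
/// Returns the `boundary` parameter when `content_type` names the
/// multipart/byteranges media type, and None otherwise.
/// Hypothetical helper: stricter than the substring check in the diff,
/// since it compares the media type itself (case-insensitively).
pub fn byteranges_boundary(content_type: &str) -> Option<&str> {
    let mut parts = content_type.split(';');
    let media_type = parts.next()?.trim();
    if !media_type.eq_ignore_ascii_case("multipart/byteranges") {
        return None;
    }
    // Scan the remaining `name=value` parameters for `boundary`.
    parts.find_map(|p| {
        let (name, value) = p.trim().split_once('=')?;
        if name.trim().eq_ignore_ascii_case("boundary") {
            // The value may be quoted; strip surrounding double quotes.
            Some(value.trim().trim_matches('"'))
        } else {
            None
        }
    })
}
```

A caller could treat `Some(boundary)` as the multipart path and `None` as the single-range streaming path, mirroring the `is_multipart` branch in the diff.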
|
|||||||
@@ -157,10 +157,13 @@ impl RetryWrapper {
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
(Err(e), Some(Retryable::Transient)) => {
|
(Err(e), Some(Retryable::Transient)) => {
|
||||||
// Intercept the too many requests condition in the case of no retrying on 429.
|
|
||||||
if e.status() == Some(StatusCode::TOO_MANY_REQUESTS) && self.no_retry_on_429 {
|
if e.status() == Some(StatusCode::TOO_MANY_REQUESTS) && self.no_retry_on_429 {
|
||||||
let cas_err = process_error("Too Many Requests (retry on 429 disabled)", e, false);
|
let cas_err = process_error("Too Many Requests (retry on 429 disabled)", e, false);
|
||||||
Err(RetryableReqwestError::FatalError(cas_err))
|
Err(RetryableReqwestError::FatalError(cas_err))
|
||||||
|
} else if e.status() == Some(StatusCode::NOT_IMPLEMENTED) {
|
||||||
|
// 501 is permanent -- the server won't implement this on retry.
|
||||||
|
let cas_err = process_error("Not Implemented", e, true);
|
||||||
|
Err(RetryableReqwestError::FatalError(cas_err))
|
||||||
} else {
|
} else {
|
||||||
let cas_err = process_error("Retryable Error", e, true);
|
let cas_err = process_error("Retryable Error", e, true);
|
||||||
Err(RetryableReqwestError::RetryableError(cas_err))
|
Err(RetryableReqwestError::RetryableError(cas_err))
|
||||||
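The retry decision in this hunk can be summarized as a small classification function. The sketch below is an illustration only: `Disposition` and `classify_transient` are hypothetical names, and the status code is passed as a plain `u16` rather than the `StatusCode` type used in the diff.

```rust
/// Hypothetical outcome of classifying a transient-looking error.
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    Retry,
    Fatal,
}

/// Mirrors the branch order in the hunk above:
/// 429 is fatal only when retrying on 429 is disabled,
/// 501 is always fatal (the server will not implement the endpoint on retry),
/// and every other transient error remains retryable.
pub fn classify_transient(status: Option<u16>, no_retry_on_429: bool) -> Disposition {
    match status {
        Some(429) if no_retry_on_429 => Disposition::Fatal,
        Some(501) => Disposition::Fatal,
        _ => Disposition::Retry,
    }
}
```

Treating 501 as fatal lets a caller react to an unsupported endpoint immediately instead of exhausting the retry budget first.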
|
|||||||
@@ -36,6 +36,11 @@ where
|
|||||||
test_get_file_data_with_ranges(factory().await).await;
|
test_get_file_data_with_ranges(factory().await).await;
|
||||||
test_get_file_size(factory().await).await;
|
test_get_file_size(factory().await).await;
|
||||||
test_global_dedup(factory().await).await;
|
test_global_dedup(factory().await).await;
|
||||||
|
test_v2_reconstruction_basic(factory().await).await;
|
||||||
|
test_v2_reconstruction_ranges(factory().await).await;
|
||||||
|
test_v2_reconstruction_matches_v1(factory().await).await;
|
||||||
|
test_v2_max_ranges_per_fetch(factory().await).await;
|
||||||
|
test_v2_url_encoding(factory().await).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Tests that adjacent chunk ranges from the same xorb are merged into a single fetch_info.
|
/// Tests that adjacent chunk ranges from the same xorb are merged into a single fetch_info.
|
||||||
@@ -43,7 +48,7 @@ pub async fn test_reconstruction_merges_adjacent_ranges(client: Arc<dyn DirectAc
|
|||||||
let term_spec = &[(1, (0, 2)), (1, (2, 4))];
|
let term_spec = &[(1, (0, 2)), (1, (2, 4))];
|
||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 2);
|
assert_eq!(reconstruction.terms.len(), 2);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -59,7 +64,7 @@ pub async fn test_reconstruction_with_multiple_xorbs(client: Arc<dyn DirectAcces
|
|||||||
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (1, (3, 5))];
|
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (1, (3, 5))];
|
||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 2);
|
assert_eq!(reconstruction.fetch_info.len(), 2);
|
||||||
}
|
}
|
||||||
@@ -73,7 +78,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 3)), (1, (1, 4))];
|
let term_spec = &[(1, (0, 3)), (1, (1, 4))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 2);
|
assert_eq!(reconstruction.terms.len(), 2);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -89,7 +94,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 5)), (1, (1, 3))];
|
let term_spec = &[(1, (0, 5)), (1, (1, 3))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 2);
|
assert_eq!(reconstruction.terms.len(), 2);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -105,7 +110,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 2)), (1, (1, 4)), (1, (3, 6))];
|
let term_spec = &[(1, (0, 2)), (1, (1, 4)), (1, (3, 6))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -121,7 +126,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 2)), (1, (4, 6))];
|
let term_spec = &[(1, (0, 2)), (1, (4, 6))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 2);
|
assert_eq!(reconstruction.terms.len(), 2);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -139,7 +144,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 3)), (1, (3, 5))];
|
let term_spec = &[(1, (0, 3)), (1, (3, 5))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 2);
|
assert_eq!(reconstruction.terms.len(), 2);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -155,7 +160,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (2, 5)), (1, (2, 5)), (1, (2, 5))];
|
let term_spec = &[(1, (2, 5)), (1, (2, 5)), (1, (2, 5))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -171,7 +176,7 @@ pub async fn test_reconstruction_overlapping_range_merging(client: Arc<dyn Direc
|
|||||||
let term_spec = &[(1, (0, 3)), (1, (2, 4)), (1, (6, 8)), (1, (7, 10))];
|
let term_spec = &[(1, (0, 3)), (1, (2, 4)), (1, (6, 8)), (1, (7, 10))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 4);
|
assert_eq!(reconstruction.terms.len(), 4);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
|
|
||||||
@@ -191,12 +196,12 @@ pub async fn test_range_requests(client: Arc<dyn DirectAccessClient>) {
|
|||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
// Calculate total file size from terms
|
// Calculate total file size from terms
|
||||||
let reconstruction_full = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction_full = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
||||||
|
|
||||||
// Partial out-of-range truncates
|
// Partial out-of-range truncates
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(total_file_size / 2, total_file_size + 1000)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(total_file_size / 2, total_file_size + 1000)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -205,19 +210,19 @@ pub async fn test_range_requests(client: Arc<dyn DirectAccessClient>) {
|
|||||||
|
|
||||||
// Entire range out of bounds returns Ok(None) (like RemoteClient's 416 handling)
|
// Entire range out of bounds returns Ok(None) (like RemoteClient's 416 handling)
|
||||||
let result = client
|
let result = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(total_file_size + 100, total_file_size + 1000)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(total_file_size + 100, total_file_size + 1000)))
|
||||||
.await;
|
.await;
|
||||||
assert!(result.unwrap().is_none());
|
assert!(result.unwrap().is_none());
|
||||||
|
|
||||||
// Start equals file size returns Ok(None)
|
// Start equals file size returns Ok(None)
|
||||||
let result = client
|
let result = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(total_file_size, total_file_size + 100)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(total_file_size, total_file_size + 100)))
|
||||||
.await;
|
.await;
|
||||||
assert!(result.unwrap().is_none());
|
assert!(result.unwrap().is_none());
|
||||||
|
|
||||||
// Valid range within bounds succeeds
|
// Valid range within bounds succeeds
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(0, total_file_size / 2)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(0, total_file_size / 2)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -226,7 +231,7 @@ pub async fn test_range_requests(client: Arc<dyn DirectAccessClient>) {
|
|||||||
|
|
||||||
// End exactly at file size succeeds
|
// End exactly at file size succeeds
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(0, total_file_size)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(0, total_file_size)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -239,7 +244,7 @@ pub async fn test_upload_configurations(client: Arc<dyn DirectAccessClient>) {
|
|||||||
// Test 1: Single segment with 3 chunks
|
// Test 1: Single segment with 3 chunks
|
||||||
{
|
{
|
||||||
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 1);
|
assert_eq!(reconstruction.terms.len(), 1);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -248,7 +253,7 @@ pub async fn test_upload_configurations(client: Arc<dyn DirectAccessClient>) {
|
|||||||
let term_spec = &[(1, (0, 2)), (1, (2, 4)), (1, (4, 6))];
|
let term_spec = &[(1, (0, 2)), (1, (2, 4)), (1, (4, 6))];
|
||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
}
|
}
|
||||||
@@ -258,7 +263,7 @@ pub async fn test_upload_configurations(client: Arc<dyn DirectAccessClient>) {
|
|||||||
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (3, (0, 4))];
|
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (3, (0, 4))];
|
||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 3);
|
assert_eq!(reconstruction.fetch_info.len(), 3);
|
||||||
}
|
}
|
||||||
@@ -268,7 +273,7 @@ pub async fn test_upload_configurations(client: Arc<dyn DirectAccessClient>) {
|
|||||||
let term_spec = &[(1, (0, 3)), (1, (1, 4)), (1, (2, 5))];
|
let term_spec = &[(1, (0, 3)), (1, (1, 4)), (1, (2, 5))];
|
||||||
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
assert_eq!(reconstruction.terms.len(), 3);
|
assert_eq!(reconstruction.terms.len(), 3);
|
||||||
assert_eq!(reconstruction.fetch_info.len(), 1);
|
assert_eq!(reconstruction.fetch_info.len(), 1);
|
||||||
}
|
}
|
||||||
@@ -280,7 +285,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let term_spec = &[(1, (0, 5))];
|
let term_spec = &[(1, (0, 5))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction_full = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction_full = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
||||||
assert_eq!(total_file_size, (5 * chunk_size) as u64);
|
assert_eq!(total_file_size, (5 * chunk_size) as u64);
|
||||||
|
|
||||||
@@ -289,7 +294,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let start = chunk_size as u64 + 500;
|
let start = chunk_size as u64 + 500;
|
||||||
let end = total_file_size;
|
let end = total_file_size;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -305,7 +310,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let start = (chunk_size * 2) as u64;
|
let start = (chunk_size * 2) as u64;
|
||||||
let end = total_file_size;
|
let end = total_file_size;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -321,7 +326,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let start = 0u64;
|
let start = 0u64;
|
||||||
let end = (chunk_size * 2) as u64 + 500;
|
let end = (chunk_size * 2) as u64 + 500;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -337,7 +342,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let start = (chunk_size * 2) as u64 + 100;
|
let start = (chunk_size * 2) as u64 + 100;
|
||||||
let end = (chunk_size * 2) as u64 + 500;
|
let end = (chunk_size * 2) as u64 + 500;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -353,7 +358,7 @@ pub async fn test_chunk_boundary_shrinking(client: Arc<dyn DirectAccessClient>)
|
|||||||
let start = chunk_size as u64 - 100;
|
let start = chunk_size as u64 - 100;
|
||||||
let end = chunk_size as u64 + 100;
|
let end = chunk_size as u64 + 100;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -371,7 +376,7 @@ pub async fn test_chunk_boundary_multiple_segments(client: Arc<dyn DirectAccessC
|
|||||||
let term_spec = &[(1, (0, 4)), (2, (0, 4))];
|
let term_spec = &[(1, (0, 4)), (2, (0, 4))];
|
||||||
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
let file = client.upload_random_file(term_spec, chunk_size).await.unwrap();
|
||||||
|
|
||||||
let reconstruction_full = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction_full = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
let total_file_size: u64 = reconstruction_full.terms.iter().map(|t| t.unpacked_length as u64).sum();
|
||||||
assert_eq!(total_file_size, (8 * chunk_size) as u64);
|
assert_eq!(total_file_size, (8 * chunk_size) as u64);
|
||||||
|
|
||||||
@@ -380,7 +385,7 @@ pub async fn test_chunk_boundary_multiple_segments(client: Arc<dyn DirectAccessC
|
|||||||
let start = chunk_size as u64 + 500;
|
let start = chunk_size as u64 + 500;
|
||||||
let end = total_file_size;
|
let end = total_file_size;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -398,7 +403,7 @@ pub async fn test_chunk_boundary_multiple_segments(client: Arc<dyn DirectAccessC
|
|||||||
let start = chunk_size as u64;
|
let start = chunk_size as u64;
|
||||||
let end = (chunk_size * 3) as u64;
|
let end = (chunk_size * 3) as u64;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -415,7 +420,7 @@ pub async fn test_chunk_boundary_multiple_segments(client: Arc<dyn DirectAccessC
|
|||||||
let start = xorb1_size + chunk_size as u64;
|
let start = xorb1_size + chunk_size as u64;
|
||||||
let end = xorb1_size + (chunk_size * 3) as u64;
|
let end = xorb1_size + (chunk_size * 3) as u64;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -432,7 +437,7 @@ pub async fn test_chunk_boundary_multiple_segments(client: Arc<dyn DirectAccessC
|
|||||||
let start = (chunk_size * 2) as u64;
|
let start = (chunk_size * 2) as u64;
|
||||||
let end = xorb1_size + (chunk_size * 2) as u64 + 500;
|
let end = xorb1_size + (chunk_size * 2) as u64 + 500;
|
||||||
let response = client
|
let response = client
|
||||||
.get_reconstruction(&file.file_hash, Some(FileRange::new(start, end)))
|
.get_reconstruction_v1(&file.file_hash, Some(FileRange::new(start, end)))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -712,7 +717,7 @@ async fn test_url_expiration_within_window(client: Arc<dyn DirectAccessClient>)
|
|||||||
|
|
||||||
// Upload a file and get reconstruction info (which creates URLs with current timestamp)
|
// Upload a file and get reconstruction info (which creates URLs with current timestamp)
|
||||||
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
// Get the fetch_info for the first term's xorb
|
// Get the fetch_info for the first term's xorb
|
||||||
let xorb_hash = file.terms[0].xorb_hash;
|
let xorb_hash = file.terms[0].xorb_hash;
|
||||||
@@ -738,7 +743,7 @@ async fn test_url_expiration_after_window(client: Arc<dyn DirectAccessClient>) {
|
|||||||
|
|
||||||
// Upload a file and get reconstruction info (which creates URLs with current timestamp)
|
// Upload a file and get reconstruction info (which creates URLs with current timestamp)
|
||||||
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
// Get the fetch_info for the first term's xorb
|
// Get the fetch_info for the first term's xorb
|
||||||
let xorb_hash = file.terms[0].xorb_hash;
|
let xorb_hash = file.terms[0].xorb_hash;
|
||||||
@@ -764,7 +769,7 @@ async fn test_url_expiration_default_infinite(client: Arc<dyn DirectAccessClient
|
|||||||
|
|
||||||
// Upload a file and get reconstruction info
|
// Upload a file and get reconstruction info
|
||||||
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
// Get the fetch_info for the first term's xorb
|
// Get the fetch_info for the first term's xorb
|
||||||
let xorb_hash = file.terms[0].xorb_hash;
|
let xorb_hash = file.terms[0].xorb_hash;
|
||||||
@@ -790,7 +795,7 @@ async fn test_url_expiration_exact_boundary(client: Arc<dyn DirectAccessClient>)
|
|||||||
|
|
||||||
// Upload a file and get reconstruction info
|
// Upload a file and get reconstruction info
|
||||||
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
let file = client.upload_random_file(&[(1, (0, 3))], 2048).await.unwrap();
|
||||||
let reconstruction = client.get_reconstruction(&file.file_hash, None).await.unwrap().unwrap();
|
let reconstruction = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
// Get the fetch_info for the first term's xorb
|
// Get the fetch_info for the first term's xorb
|
||||||
let xorb_hash = file.terms[0].xorb_hash;
|
let xorb_hash = file.terms[0].xorb_hash;
|
||||||
@@ -916,3 +921,190 @@ async fn test_api_delay_can_be_disabled(client: Arc<dyn DirectAccessClient>) {
|
|||||||
"Delay should not be applied after disabling: elapsed={elapsed:?}"
|
"Delay should not be applied after disabling: elapsed={elapsed:?}"
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ===== V2 Reconstruction Tests =====
|
||||||
|
|
||||||
|
/// Tests basic V2 reconstruction response structure.
|
||||||
|
async fn test_v2_reconstruction_basic(client: Arc<dyn DirectAccessClient>) {
|
||||||
|
let term_spec = &[(1, (0, 5))];
|
||||||
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
|
let response = client.get_reconstruction_v2(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
|
assert!(!response.terms.is_empty());
|
||||||
|
assert!(!response.xorbs.is_empty());
|
||||||
|
assert_eq!(response.offset_into_first_range, 0);
|
||||||
|
|
||||||
|
for term in &response.terms {
|
||||||
|
let xorb_descriptor = response.xorbs.get(&term.hash).expect("xorb descriptor missing for term");
|
||||||
|
assert!(!xorb_descriptor.is_empty());
|
||||||
|
for fetch in xorb_descriptor {
|
||||||
|
assert!(!fetch.url.is_empty());
|
||||||
|
assert!(!fetch.ranges.is_empty());
|
||||||
|
for range in &fetch.ranges {
|
||||||
|
assert!(range.bytes.start < range.bytes.end);
|
||||||
|
assert!(range.chunks.start < range.chunks.end);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tests V2 reconstruction with byte range queries.
|
||||||
|
async fn test_v2_reconstruction_ranges(client: Arc<dyn DirectAccessClient>) {
|
||||||
|
let term_spec = &[(1, (0, 3)), (2, (0, 3)), (1, (3, 6))];
|
||||||
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
|
let file_size = file.data.len() as u64;
|
||||||
|
|
||||||
|
// Partial range
|
||||||
|
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
||||||
|
let response = client
|
||||||
|
.get_reconstruction_v2(&file.file_hash, Some(range))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
assert!(!response.terms.is_empty());
|
||||||
|
assert!(!response.xorbs.is_empty());
|
||||||
|
|
||||||
|
// Out-of-range query returns None
|
||||||
|
let out_of_range = FileRange::new(file_size + 100, file_size + 200);
|
||||||
|
let none_result = client.get_reconstruction_v2(&file.file_hash, Some(out_of_range)).await.unwrap();
|
||||||
|
assert!(none_result.is_none());
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tests that V2 reconstruction terms match V1 terms and offsets.
|
||||||
|
async fn test_v2_reconstruction_matches_v1(client: Arc<dyn DirectAccessClient>) {
|
||||||
|
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (1, (3, 5))];
|
||||||
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
|
let v1 = client.get_reconstruction_v1(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
let v2 = client.get_reconstruction_v2(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
|
assert_eq!(v1.offset_into_first_range, v2.offset_into_first_range);
|
||||||
|
assert_eq!(v1.terms.len(), v2.terms.len());
|
||||||
|
for (t1, t2) in v1.terms.iter().zip(v2.terms.iter()) {
|
||||||
|
assert_eq!(t1.hash, t2.hash);
|
||||||
|
assert_eq!(t1.range, t2.range);
|
||||||
|
assert_eq!(t1.unpacked_length, t2.unpacked_length);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Both should have the same xorb hashes
|
||||||
|
let mut v1_xorb_hashes: Vec<_> = v1.fetch_info.keys().map(|h| h.to_string()).collect();
|
||||||
|
let mut v2_xorb_hashes: Vec<_> = v2.xorbs.keys().map(|h| h.to_string()).collect();
|
||||||
|
v1_xorb_hashes.sort();
|
||||||
|
v2_xorb_hashes.sort();
|
||||||
|
assert_eq!(v1_xorb_hashes, v2_xorb_hashes);
|
||||||
|
|
||||||
|
// Check range with partial file
|
||||||
|
let file_size = file.data.len() as u64;
|
||||||
|
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
||||||
|
let v1r = client
|
||||||
|
.get_reconstruction_v1(&file.file_hash, Some(range))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
let v2r = client
|
||||||
|
.get_reconstruction_v2(&file.file_hash, Some(range))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(v1r.offset_into_first_range, v2r.offset_into_first_range);
|
||||||
|
assert_eq!(v1r.terms.len(), v2r.terms.len());
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tests that max_ranges_per_fetch correctly splits multi-range fetch entries.
|
||||||
|
async fn test_v2_max_ranges_per_fetch(client: Arc<dyn DirectAccessClient>) {
|
||||||
|
// Use a file with many non-contiguous segments from the same xorb,
|
||||||
|
// interleaved with another xorb to prevent merging.
|
||||||
|
let term_spec = &[
|
||||||
|
(1, (0, 2)),
|
||||||
|
(2, (0, 1)),
|
||||||
|
(1, (2, 4)),
|
||||||
|
(2, (1, 2)),
|
||||||
|
(1, (4, 6)),
|
||||||
|
(2, (2, 3)),
|
||||||
|
(1, (6, 8)),
|
||||||
|
];
|
||||||
|
let file = client.upload_random_file(term_spec, 512).await.unwrap();
|
||||||
|
|
||||||
|
// Without limit, xorb 1 should have all its ranges in a single fetch
|
||||||
|
let response_unlimited = client.get_reconstruction_v2(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
|
// Find xorb 1's descriptor
|
||||||
|
let xorb1_hash = &file.terms[0].xorb_hash;
|
||||||
|
let hex_hash: crate::cas_types::HexMerkleHash = (*xorb1_hash).into();
|
||||||
|
let desc_unlimited = response_unlimited.xorbs.get(&hex_hash).unwrap();
|
||||||
|
|
||||||
|
// Now set max_ranges_per_fetch to 2
|
||||||
|
client.set_max_ranges_per_fetch(2);
|
||||||
|
|
||||||
|
let response_limited = client.get_reconstruction_v2(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
|
let desc_limited = response_limited.xorbs.get(&hex_hash).unwrap();
|
||||||
|
|
||||||
|
// With a limit of 2, the number of fetch entries should be >= the unlimited count
|
||||||
|
assert!(
|
||||||
|
desc_limited.len() >= desc_unlimited.len(),
|
||||||
|
"Limited ({}) should have at least as many fetch entries as unlimited ({})",
|
||||||
|
desc_limited.len(),
|
||||||
|
desc_unlimited.len()
|
||||||
|
);
|
||||||
|
|
||||||
|
// Each fetch entry should have at most 2 ranges
|
||||||
|
for fetch in desc_limited {
|
||||||
|
assert!(fetch.ranges.len() <= 2, "Expected at most 2 ranges per fetch, got {}", fetch.ranges.len());
|
||||||
|
}
|
||||||
|
|
||||||
|
// Total ranges across all fetches should equal the unlimited total
|
||||||
|
let total_unlimited: usize = desc_unlimited.iter().map(|f| f.ranges.len()).sum();
|
||||||
|
let total_limited: usize = desc_limited.iter().map(|f| f.ranges.len()).sum();
|
||||||
|
assert_eq!(total_unlimited, total_limited, "Total ranges should be preserved");
|
||||||
|
|
||||||
|
// Reset for other tests
|
||||||
|
client.set_max_ranges_per_fetch(usize::MAX);
|
||||||
|
}
|
||||||
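
The splitting behaviour exercised by `test_v2_max_ranges_per_fetch` can be sketched in isolation. The diff's `get_reconstruction_v2` groups each xorb's merged ranges with `ranges.chunks(max_ranges)`; the minimal sketch below mirrors that grouping with a hypothetical `RangeSketch` type and a hypothetical `group_ranges` helper, neither of which is part of the crate. Because `slice::chunks` panics when the chunk size is 0, the sketch clamps the limit to at least 1; that clamp is an assumption of the sketch and does not appear in the diff.

```rust
/// Hypothetical stand-in for one merged range of a xorb (not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RangeSketch {
    pub chunk_start: u32,
    pub chunk_end: u32,
}

/// Groups `ranges` into fetch entries of at most `max_ranges` ranges each,
/// preserving their order. `usize::MAX` therefore yields a single group.
/// The limit is clamped to 1 because `slice::chunks` panics on a size of 0
/// (the clamp is an assumption of this sketch).
pub fn group_ranges(ranges: &[RangeSketch], max_ranges: usize) -> Vec<Vec<RangeSketch>> {
    ranges
        .chunks(max_ranges.max(1))
        .map(|group| group.to_vec())
        .collect()
}
```

The two properties the test asserts follow directly from this grouping: no group exceeds the limit, and the total number of ranges is unchanged, since splitting only partitions the list.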
|
|
||||||
|
/// Tests V2 URL encoding for both kinds of client.
|
||||||
|
/// Server-backed clients return HTTP(S) URLs pointing at `/fetch_term`; direct clients return base64 payloads of the form `hash:timestamp:ranges`.
|
||||||
|
async fn test_v2_url_encoding(client: Arc<dyn DirectAccessClient>) {
|
||||||
|
use base64::Engine;
|
||||||
|
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
|
||||||
|
|
||||||
|
let term_spec = &[(1, (0, 3))];
|
||||||
|
let file = client.upload_random_file(term_spec, 2048).await.unwrap();
|
||||||
|
|
||||||
|
let response = client.get_reconstruction_v2(&file.file_hash, None).await.unwrap().unwrap();
|
||||||
|
|
||||||
|
for fetch_entries in response.xorbs.values() {
|
||||||
|
for fetch in fetch_entries {
|
||||||
|
assert!(!fetch.url.is_empty(), "URL should not be empty");
|
||||||
|
|
||||||
|
if fetch.url.starts_with("http://") || fetch.url.starts_with("https://") {
|
||||||
|
// Server-transformed URL: should point to fetch_term
|
||||||
|
assert!(fetch.url.contains("/fetch_term"), "HTTP URL should contain /fetch_term: {}", fetch.url);
|
||||||
|
} else {
|
||||||
|
// Direct client URL: should be valid base64
|
||||||
|
let decoded = URL_SAFE_NO_PAD.decode(&fetch.url);
|
||||||
|
assert!(decoded.is_ok(), "URL should be valid base64: {}", fetch.url);
|
||||||
|
|
||||||
|
let payload = String::from_utf8(decoded.unwrap()).unwrap();
|
||||||
|
let parts: Vec<&str> = payload.splitn(3, ':').collect();
|
||||||
|
assert_eq!(parts.len(), 3, "Payload should have 3 colon-separated parts");
|
||||||
|
|
||||||
|
let hash = xet_core_structures::merklehash::MerkleHash::from_hex(parts[0]);
|
||||||
|
assert!(hash.is_ok(), "Hash part should be valid hex");
|
||||||
|
|
||||||
|
let ts: std::result::Result<u64, _> = parts[1].parse();
|
||||||
|
assert!(ts.is_ok(), "Timestamp should be a valid u64");
|
||||||
|
|
||||||
|
for range_str in parts[2].split(',').filter(|s| !s.is_empty()) {
|
||||||
|
let range_parts: Vec<&str> = range_str.split('-').collect();
|
||||||
|
assert_eq!(range_parts.len(), 2, "Each range should be start-end");
|
||||||
|
assert!(range_parts[0].parse::<u64>().is_ok());
|
||||||
|
assert!(range_parts[1].parse::<u64>().is_ok());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
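
The payload layout that `test_v2_url_encoding` checks for direct-client URLs (three colon-separated parts: a hex hash, a `u64` timestamp, and a comma-separated list of `start-end` ranges) can be sketched as a small standard-library parser. This is a minimal sketch under stated assumptions: it operates on the payload after base64 decoding, `V2PayloadSketch` and `parse_v2_payload` are hypothetical names, and the hash check only tests for hex digits where the test itself calls `MerkleHash::from_hex`.

```rust
/// Parsed form of a decoded direct-client V2 URL payload (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct V2PayloadSketch {
    pub hash_hex: String,
    pub timestamp: u64,
    pub ranges: Vec<(u64, u64)>,
}

/// Parses `hash:timestamp:start-end,start-end,...`.
/// Returns `None` if any part is missing or malformed.
pub fn parse_v2_payload(payload: &str) -> Option<V2PayloadSketch> {
    let mut parts = payload.splitn(3, ':');
    let hash_hex = parts.next()?;
    let timestamp = parts.next()?.parse::<u64>().ok()?;
    let range_list = parts.next()?;

    // Simplified check: non-empty and all hex digits. The real test
    // validates the hash with MerkleHash::from_hex instead.
    if hash_hex.is_empty() || !hash_hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }

    let mut ranges: Vec<(u64, u64)> = Vec::new();
    for item in range_list.split(',').filter(|s| !s.is_empty()) {
        let (start, end) = item.split_once('-')?;
        ranges.push((start.parse::<u64>().ok()?, end.parse::<u64>().ok()?));
    }

    Some(V2PayloadSketch {
        hash_hex: hash_hex.to_string(),
        timestamp,
        ranges,
    })
}
```

Like the test, the sketch skips empty range entries, so a payload with an empty range list parses to zero ranges.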
@@ -14,7 +14,9 @@ use xet_core_structures::xorb_object::XorbObject;
|
|||||||
|
|
||||||
use super::super::error::Result;
|
use super::super::error::Result;
|
||||||
use super::super::interface::Client;
|
use super::super::interface::Client;
|
||||||
use crate::cas_types::{FileRange, XorbReconstructionFetchInfo};
|
use crate::cas_types::{
|
||||||
|
FileRange, QueryReconstructionResponse, QueryReconstructionResponseV2, XorbReconstructionFetchInfo,
|
||||||
|
};
|
||||||
|
|
||||||
/// A Client with direct access to XORB and file storage.
|
/// A Client with direct access to XORB and file storage.
|
||||||
///
|
///
|
||||||
@@ -40,6 +42,39 @@ pub trait DirectAccessClient: Client + Send + Sync {
|
|||||||
/// Pass `None` to disable the delay.
|
/// Pass `None` to disable the delay.
|
||||||
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>);
|
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>);
|
||||||
|
|
||||||
|
/// Sets the maximum number of byte ranges per `XorbMultiRangeFetch` entry
|
||||||
|
/// in V2 reconstruction responses.
|
||||||
|
///
|
||||||
|
/// Default is `usize::MAX` (all ranges in one fetch). When set to N,
|
||||||
|
/// ranges for each xorb are grouped into entries of at most N ranges.
|
||||||
|
/// This simulates the CloudFront URL length limit that forces splitting.
|
||||||
|
fn set_max_ranges_per_fetch(&self, max_ranges: usize);
|
||||||
|
|
||||||
|
/// Disables V2 reconstruction responses with the given HTTP status code.
|
||||||
|
/// When disabled, the V2 endpoint returns this status, forcing clients to
|
||||||
|
/// fall back to V1. Pass 0 to re-enable.
|
||||||
|
fn disable_v2_reconstruction(&self, status_code: u16);
|
||||||
|
|
||||||
|
/// Returns the HTTP status code the V2 endpoint should return when disabled,
|
||||||
|
/// or 0 if V2 is enabled.
|
||||||
|
fn v2_disabled_status_code(&self) -> u16 {
|
||||||
|
0
|
||||||
|
}
|
||||||
|
|
||||||
|
/// V1 reconstruction: returns per-range presigned URLs.
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponse>>;
|
||||||
|
|
||||||
|
/// V2 reconstruction: returns per-xorb multi-range fetch descriptors.
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>>;
|
||||||
|
|
||||||
/// Applies the configured API delay if set.
|
/// Applies the configured API delay if set.
|
||||||
///
|
///
|
||||||
/// This method sleeps for a random duration within the configured delay range.
|
/// This method sleeps for a random duration within the configured delay range.
|
||||||
|
|||||||
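
The `disable_v2_reconstruction` hook added to `DirectAccessClient` stores a status code in which 0 means "V2 enabled". A minimal sketch of that sentinel pattern, together with a caller that falls back to V1, is shown below. `V2SwitchSketch` and its methods are hypothetical, and treating every error status as a reason to fall back is an assumption of the sketch; the diff does not show which statuses the real client falls back on.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Hypothetical test double: 0 means "V2 enabled"; any other value is the
/// HTTP status the simulated V2 endpoint answers with.
pub struct V2SwitchSketch {
    disabled_status: AtomicU16,
}

impl V2SwitchSketch {
    pub fn new() -> Self {
        Self { disabled_status: AtomicU16::new(0) }
    }

    /// Mirrors the shape of `disable_v2_reconstruction`: pass 0 to re-enable.
    pub fn disable_v2(&self, status_code: u16) {
        self.disabled_status.store(status_code, Ordering::Relaxed);
    }

    /// Simulated V2 endpoint: returns `Err(status)` while disabled.
    pub fn query_v2(&self) -> Result<&'static str, u16> {
        match self.disabled_status.load(Ordering::Relaxed) {
            0 => Ok("v2"),
            status => Err(status),
        }
    }

    /// Caller-side fallback. Falling back on every error status is an
    /// assumption of this sketch, not the real client's policy.
    pub fn query_with_fallback(&self) -> &'static str {
        self.query_v2().unwrap_or("v1")
    }
}
```

Storing the switch in an atomic lets tests flip it through a shared `&self` reference, which matches how the trait method is declared.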
@@ -5,14 +5,12 @@ use std::mem::size_of;
|
|||||||
use std::ops::Range;
|
use std::ops::Range;
|
||||||
use std::path::{Path, PathBuf};
|
use std::path::{Path, PathBuf};
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use std::sync::atomic::{AtomicU64, Ordering};
|
use std::sync::atomic::{AtomicU16, AtomicU64, AtomicUsize, Ordering};
|
||||||
|
|
||||||
use anyhow::anyhow;
|
use anyhow::anyhow;
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use bytes::Bytes;
|
use bytes::Bytes;
|
||||||
use heed::types::*;
|
use heed::types::*;
|
||||||
use lazy_static::lazy_static;
|
|
||||||
use more_asserts::*;
|
|
||||||
use rand::Rng;
|
use rand::Rng;
|
||||||
use tempfile::TempDir;
|
use tempfile::TempDir;
|
||||||
use tokio::time::{Duration, Instant};
|
use tokio::time::{Duration, Instant};
|
||||||
@@ -30,25 +28,16 @@ use xet_core_structures::xorb_object::{SerializedXorbObject, XorbObject};
|
|||||||
use xet_runtime::file_utils::SafeFileCreator;
|
use xet_runtime::file_utils::SafeFileCreator;
|
||||||
|
|
||||||
use super::direct_access_client::DirectAccessClient;
|
use super::direct_access_client::DirectAccessClient;
|
||||||
|
use super::xorb_utils::{self, REFERENCE_INSTANT};
|
||||||
use crate::cas_client::Client;
|
use crate::cas_client::Client;
|
||||||
use crate::cas_client::adaptive_concurrency::AdaptiveConcurrencyController;
|
use crate::cas_client::adaptive_concurrency::AdaptiveConcurrencyController;
|
||||||
use crate::cas_client::error::{CasClientError, Result};
|
use crate::cas_client::error::{CasClientError, Result};
|
||||||
use crate::cas_client::progress_tracked_streams::ProgressCallback;
|
use crate::cas_client::progress_tracked_streams::ProgressCallback;
|
||||||
use crate::cas_types::{
|
use crate::cas_types::{
|
||||||
BatchQueryReconstructionResponse, ChunkRange, FileRange, HexMerkleHash, HttpRange, QueryReconstructionResponse,
|
BatchQueryReconstructionResponse, FileRange, HexMerkleHash, HttpRange, QueryReconstructionResponse,
|
||||||
XorbReconstructionFetchInfo, XorbReconstructionTerm,
|
QueryReconstructionResponseV2, XorbMultiRangeFetch, XorbRangeDescriptor, XorbReconstructionFetchInfo,
|
||||||
};
|
};
|
||||||
|
|
||||||
lazy_static! {
|
|
||||||
/// Reference instant for URL timestamps. Initialized far in the past to allow
|
|
||||||
/// testing timestamps that are earlier in the current process lifetime.
|
|
||||||
static ref REFERENCE_INSTANT: Instant = {
|
|
||||||
let now = Instant::now();
|
|
||||||
now.checked_sub(Duration::from_secs(365 * 24 * 60 * 60))
|
|
||||||
.unwrap_or(now)
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
pub struct LocalClient {
|
pub struct LocalClient {
|
||||||
// Note: Field order matters for Drop! heed::Env must be dropped before _tmp_dir
|
// Note: Field order matters for Drop! heed::Env must be dropped before _tmp_dir
|
||||||
// because heed holds file handles that need to be closed before the directory is deleted.
|
// because heed holds file handles that need to be closed before the directory is deleted.
|
||||||
@@ -62,6 +51,10 @@ pub struct LocalClient {
|
|||||||
url_expiration_ms: AtomicU64,
|
url_expiration_ms: AtomicU64,
|
||||||
/// API delay range in milliseconds as (min_ms, max_ms). (0, 0) means disabled.
|
/// API delay range in milliseconds as (min_ms, max_ms). (0, 0) means disabled.
|
||||||
random_ms_delay_window: (AtomicU64, AtomicU64),
|
random_ms_delay_window: (AtomicU64, AtomicU64),
|
||||||
|
/// Max ranges per XorbMultiRangeFetch entry. usize::MAX means no splitting.
|
||||||
|
max_ranges_per_fetch: AtomicUsize,
|
||||||
|
/// HTTP status code to return when V2 is disabled (0 = enabled).
|
||||||
|
v2_disabled_status: AtomicU16,
|
||||||
_tmp_dir: Option<TempDir>, // Must be last - dropped after heed env is closed
|
_tmp_dir: Option<TempDir>, // Must be last - dropped after heed env is closed
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -76,7 +69,7 @@ impl LocalClient {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Create a local client hosted in a directory. Effectively, this directory
|
/// Create a local client hosted in a directory. Effectively, this directory
|
||||||
/// is the CAS endpoint and persists across instances of LocalClient.
|
/// is the CAS endpoint and persists across instances of LocalClient.
|
||||||
pub async fn new(path: impl AsRef<Path>) -> Result<Arc<Self>> {
|
pub async fn new(path: impl AsRef<Path>) -> Result<Arc<Self>> {
|
||||||
let path = path.as_ref().to_owned();
|
let path = path.as_ref().to_owned();
|
||||||
Ok(Arc::new(Self::new_internal(path, None).await?))
|
Ok(Arc::new(Self::new_internal(path, None).await?))
|
||||||
@@ -157,6 +150,8 @@ impl LocalClient {
|
|||||||
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("local_uploads"),
|
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("local_uploads"),
|
||||||
url_expiration_ms: AtomicU64::new(u64::MAX),
|
url_expiration_ms: AtomicU64::new(u64::MAX),
|
||||||
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
||||||
|
max_ranges_per_fetch: AtomicUsize::new(usize::MAX),
|
||||||
|
v2_disabled_status: AtomicU16::new(0),
|
||||||
_tmp_dir: tmp_dir, // Must be last - dropped after heed env is closed
|
_tmp_dir: tmp_dir, // Must be last - dropped after heed env is closed
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
@@ -347,6 +342,34 @@ impl DirectAccessClient for LocalClient {
|
|||||||
self.url_expiration_ms.store(expiration.as_millis() as u64, Ordering::Relaxed);
|
self.url_expiration_ms.store(expiration.as_millis() as u64, Ordering::Relaxed);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_max_ranges_per_fetch(&self, max_ranges: usize) {
|
||||||
|
self.max_ranges_per_fetch.store(max_ranges, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn disable_v2_reconstruction(&self, status_code: u16) {
|
||||||
|
self.v2_disabled_status.store(status_code, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn v2_disabled_status_code(&self) -> u16 {
|
||||||
|
self.v2_disabled_status.load(Ordering::Relaxed)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponse>> {
|
||||||
|
LocalClient::get_reconstruction_v1(self, file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
|
LocalClient::get_reconstruction_v2(self, file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
||||||
match delay_range {
|
match delay_range {
|
||||||
Some(range) => {
|
Some(range) => {
|
||||||
@@ -626,7 +649,126 @@ impl DirectAccessClient for LocalClient {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// LocalClient is responsible for writing/reading Xorbs on the local disk.
|
impl LocalClient {
|
||||||
|
async fn compute_reconstruction_ranges(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<xorb_utils::ReconstructionRangesResult> {
|
||||||
|
let Some((file_info, _)) = self.shard_manager.get_file_reconstruction_info(file_id).await? else {
|
||||||
|
return Ok(None);
|
||||||
|
};
|
||||||
|
|
||||||
|
xorb_utils::compute_reconstruction_ranges(&file_info, bytes_range, &mut |hash| self.xorb_footer_sync(hash))
|
||||||
|
}
|
||||||
|
|
||||||
|
fn xorb_footer_sync(&self, hash: &MerkleHash) -> Result<XorbObject> {
|
||||||
|
let file_path = self.get_path_for_entry(hash);
|
||||||
|
let mut file = File::open(&file_path).map_err(|_| {
|
||||||
|
error!("Unable to find file in local CAS {:?}", file_path);
|
||||||
|
CasClientError::XORBNotFound(*hash)
|
||||||
|
})?;
|
||||||
|
XorbObject::deserialize(&mut file).map_err(Into::into)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// V1 reconstruction: returns per-range presigned URLs.
|
||||||
|
pub async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponse>> {
|
||||||
|
self.apply_api_delay().await;
|
||||||
|
|
||||||
|
let result = self.compute_reconstruction_ranges(file_id, bytes_range).await?;
|
||||||
|
let Some((offset_into_first_range, terms, merged_ranges)) = result else {
|
||||||
|
return Ok(None);
|
||||||
|
};
|
||||||
|
|
||||||
|
if terms.is_empty() {
|
||||||
|
return Ok(Some(QueryReconstructionResponse {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
fetch_info: HashMap::new(),
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
let mut fetch_info: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
||||||
|
for (hash, ranges) in merged_ranges {
|
||||||
|
let file_path = self.get_path_for_entry(&hash);
|
||||||
|
let entries = ranges
|
||||||
|
.into_iter()
|
||||||
|
.map(|r| XorbReconstructionFetchInfo {
|
||||||
|
range: r.chunk_range,
|
||||||
|
url: generate_fetch_url(&file_path, &r.byte_range, timestamp),
|
||||||
|
url_range: HttpRange::from(r.byte_range),
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
fetch_info.insert(hash.into(), entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Some(QueryReconstructionResponse {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
fetch_info,
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// V2 reconstruction: returns per-xorb multi-range fetch descriptors.
|
||||||
|
pub async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
|
self.apply_api_delay().await;
|
||||||
|
|
||||||
|
let result = self.compute_reconstruction_ranges(file_id, bytes_range).await?;
|
||||||
|
let Some((offset_into_first_range, terms, merged_ranges)) = result else {
|
||||||
|
return Ok(None);
|
||||||
|
};
|
||||||
|
|
||||||
|
if terms.is_empty() {
|
||||||
|
return Ok(Some(QueryReconstructionResponseV2 {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
xorbs: HashMap::new(),
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
let max_ranges = self.max_ranges_per_fetch.load(Ordering::Relaxed);
|
||||||
|
|
||||||
|
let mut xorbs: HashMap<HexMerkleHash, Vec<XorbMultiRangeFetch>> = HashMap::new();
|
||||||
|
for (hash, ranges) in merged_ranges {
|
||||||
|
let mut fetch_entries = Vec::new();
|
||||||
|
|
||||||
|
for chunk in ranges.chunks(max_ranges) {
|
||||||
|
let range_descriptors: Vec<XorbRangeDescriptor> = chunk
|
||||||
|
.iter()
|
||||||
|
.map(|r| XorbRangeDescriptor {
|
||||||
|
chunks: r.chunk_range,
|
||||||
|
bytes: HttpRange::from(r.byte_range),
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
let url = generate_v2_fetch_url(&hash, &range_descriptors, timestamp);
|
||||||
|
fetch_entries.push(XorbMultiRangeFetch {
|
||||||
|
url,
|
||||||
|
ranges: range_descriptors,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
xorbs.insert(hash.into(), fetch_entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Some(QueryReconstructionResponseV2 {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
xorbs,
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
#[async_trait]
|
#[async_trait]
|
||||||
impl Client for LocalClient {
|
impl Client for LocalClient {
|
||||||
async fn get_file_reconstruction_info(
|
async fn get_file_reconstruction_info(
|
||||||
@@ -784,196 +926,8 @@ impl Client for LocalClient {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &MerkleHash,
|
file_id: &MerkleHash,
|
||||||
bytes_range: Option<FileRange>,
|
bytes_range: Option<FileRange>,
|
||||||
) -> Result<Option<QueryReconstructionResponse>> {
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
self.apply_api_delay().await;
|
self.get_reconstruction_v2(file_id, bytes_range).await
|
||||||
let Some((file_info, _)) = self.shard_manager.get_file_reconstruction_info(file_id).await? else {
|
|
||||||
return Ok(None);
|
|
||||||
};
|
|
||||||
|
|
||||||
// Calculate total file size from segments
|
|
||||||
let total_file_size: u64 = file_info.file_size();
|
|
||||||
// Handle range validation and truncation
|
|
||||||
let file_range = if let Some(range) = bytes_range {
|
|
||||||
// If the entire range is out of bounds, return None (like RemoteClient does for 416)
|
|
||||||
if range.start >= total_file_size {
|
|
||||||
// For empty files (size 0), only the first query (start == 0) should return the empty reconstruction
|
|
||||||
// All subsequent queries should return None to prevent infinite remainder loops
|
|
||||||
if total_file_size == 0 && range.start == 0 {
|
|
||||||
// Empty file - return valid but empty reconstruction
|
|
||||||
return Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: 0,
|
|
||||||
terms: vec![],
|
|
||||||
fetch_info: HashMap::new(),
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
return Ok(None);
|
|
||||||
}
|
|
||||||
// Truncate end if it extends beyond file size
|
|
||||||
FileRange::new(range.start, range.end.min(total_file_size))
|
|
||||||
} else {
|
|
||||||
// No range specified - handle empty files
|
|
||||||
if total_file_size == 0 {
|
|
||||||
return Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: 0,
|
|
||||||
terms: vec![],
|
|
||||||
fetch_info: HashMap::new(),
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
FileRange::full()
|
|
||||||
};
|
|
||||||
|
|
||||||
// First skip file segments until we find the first one that starts before the file range start
|
|
||||||
let mut s_idx = 0;
|
|
||||||
let mut cumulative_bytes = 0u64;
|
|
||||||
let mut first_chunk_byte_start;
|
|
||||||
|
|
||||||
loop {
|
|
||||||
if s_idx >= file_info.segments.len() {
|
|
||||||
// We have here that the requested file range is out of bounds,
|
|
||||||
// so return a range error.
|
|
||||||
return Err(CasClientError::InvalidRange);
|
|
||||||
}
|
|
||||||
|
|
||||||
let n = file_info.segments[s_idx].unpacked_segment_bytes as u64;
|
|
||||||
if cumulative_bytes + n > file_range.start {
|
|
||||||
assert_ge!(file_range.start, cumulative_bytes);
|
|
||||||
first_chunk_byte_start = cumulative_bytes;
|
|
||||||
break;
|
|
||||||
} else {
|
|
||||||
cumulative_bytes += n;
|
|
||||||
s_idx += 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Now, prepare the response by iterating over the segments and
|
|
||||||
// adding the terms and fetch info to the response.
|
|
||||||
let mut terms = Vec::new();
|
|
||||||
|
|
||||||
#[derive(Clone)]
|
|
||||||
struct FetchInfoIntermediate {
|
|
||||||
chunk_range: ChunkRange,
|
|
||||||
byte_range: FileRange,
|
|
||||||
}
|
|
||||||
|
|
||||||
let mut fetch_info_map: HashMap<MerkleHash, Vec<FetchInfoIntermediate>> = HashMap::new();
|
|
||||||
|
|
||||||
while s_idx < file_info.segments.len() && cumulative_bytes < file_range.end {
|
|
||||||
let mut segment = file_info.segments[s_idx].clone();
|
|
||||||
let mut chunk_range = ChunkRange::new(segment.chunk_index_start, segment.chunk_index_end);
|
|
||||||
|
|
||||||
// Now get the URL for this segment, which involves reading the actual byte range there.
|
|
||||||
let xorb_footer = self.xorb_footer(&segment.xorb_hash).await?;
|
|
||||||
|
|
||||||
// Do we need to prune the first segment on chunk boundaries to align with the range given?
|
|
||||||
if cumulative_bytes < file_range.start {
|
|
||||||
while chunk_range.start < chunk_range.end {
|
|
||||||
let next_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.start)? as u64;
|
|
||||||
|
|
||||||
if cumulative_bytes + next_chunk_size <= file_range.start {
|
|
||||||
cumulative_bytes += next_chunk_size;
|
|
||||||
first_chunk_byte_start += next_chunk_size;
|
|
||||||
segment.unpacked_segment_bytes -= next_chunk_size as u32;
|
|
||||||
|
|
||||||
chunk_range.start += 1;
|
|
||||||
|
|
||||||
// Should find it somewhere in here.
|
|
||||||
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Do we need to prune the last segment on chunk boundaries to align with the range given?
|
|
||||||
if cumulative_bytes + segment.unpacked_segment_bytes as u64 > file_range.end {
|
|
||||||
while chunk_range.end > chunk_range.start {
|
|
||||||
let last_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.end - 1)?;
|
|
||||||
|
|
||||||
if cumulative_bytes + (segment.unpacked_segment_bytes - last_chunk_size) as u64 >= file_range.end {
|
|
||||||
// We can cut the last chunk off and still contain the requested range.
|
|
||||||
chunk_range.end -= 1;
|
|
||||||
segment.unpacked_segment_bytes -= last_chunk_size;
|
|
||||||
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
|
||||||
debug_assert_gt!(segment.unpacked_segment_bytes, 0);
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
let (byte_start, byte_end) = xorb_footer.get_byte_offset(chunk_range.start, chunk_range.end)?;
|
|
||||||
let byte_range = FileRange::new(byte_start as u64, byte_end as u64);
|
|
||||||
|
|
||||||
let xorb_reconstruction_term = XorbReconstructionTerm {
|
|
||||||
hash: segment.xorb_hash.into(),
|
|
||||||
unpacked_length: segment.unpacked_segment_bytes,
|
|
||||||
range: chunk_range,
|
|
||||||
};
|
|
||||||
|
|
||||||
terms.push(xorb_reconstruction_term);
|
|
||||||
|
|
||||||
let fetch_info_intemediate = FetchInfoIntermediate {
|
|
||||||
chunk_range,
|
|
||||||
byte_range,
|
|
||||||
};
|
|
||||||
|
|
||||||
fetch_info_map
|
|
||||||
.entry(segment.xorb_hash)
|
|
||||||
.or_default()
|
|
||||||
.push(fetch_info_intemediate);
|
|
||||||
|
|
||||||
cumulative_bytes += segment.unpacked_segment_bytes as u64;
|
|
||||||
s_idx += 1;
|
|
||||||
}
|
|
||||||
|
|
||||||
assert!(!terms.is_empty());
|
|
||||||
|
|
||||||
let timestamp = Instant::now();
|
|
||||||
|
|
||||||
// Sort and merge adjacent/overlapping ranges in each fetch_info Vec
|
|
||||||
let mut merged_fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
|
||||||
for (hash, mut fi_vec) in fetch_info_map {
|
|
||||||
// Sort by chunk_range.start
|
|
||||||
fi_vec.sort_by_key(|fi| fi.chunk_range.start);
|
|
||||||
let file_path = self.get_path_for_entry(&hash);
|
|
||||||
|
|
||||||
// Merge adjacent or overlapping ranges
|
|
||||||
let mut merged: Vec<XorbReconstructionFetchInfo> = Vec::new();
|
|
||||||
let mut idx = 0;
|
|
||||||
|
|
||||||
while idx < fi_vec.len() {
|
|
||||||
// Go through and merge adjacent or overlapping ranges,
|
|
||||||
// then form the full XorbReconstructionFetchInfo structs.
|
|
||||||
let mut new_fi = fi_vec[idx].clone();
|
|
||||||
|
|
||||||
while idx + 1 < fi_vec.len() {
|
|
||||||
let next_fi = &fi_vec[idx + 1];
|
|
||||||
if next_fi.chunk_range.start <= new_fi.chunk_range.end {
|
|
||||||
new_fi.chunk_range.end = next_fi.chunk_range.end.max(new_fi.chunk_range.end);
|
|
||||||
new_fi.byte_range.end = next_fi.byte_range.end.max(new_fi.byte_range.end);
|
|
||||||
idx += 1;
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
merged.push(XorbReconstructionFetchInfo {
|
|
||||||
range: new_fi.chunk_range,
|
|
||||||
url: generate_fetch_url(&file_path, &new_fi.byte_range, timestamp),
|
|
||||||
url_range: HttpRange::from(new_fi.byte_range),
|
|
||||||
});
|
|
||||||
|
|
||||||
idx += 1;
|
|
||||||
}
|
|
||||||
|
|
||||||
merged_fetch_info_map.insert(hash.into(), merged);
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: file_range.start - first_chunk_byte_start,
|
|
||||||
terms,
|
|
||||||
fetch_info: merged_fetch_info_map,
|
|
||||||
}))
|
|
||||||
}
|
}
|
||||||
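Note on the sort-and-merge loop above: the following is a minimal standalone sketch of the same idea, not the code from this commit. `Span` and `merge_spans` are hypothetical names, and the tuple field types are assumptions standing in for `ChunkRange` and `FileRange`. Both ranges are treated as half-open, so a span whose chunk start equals the previous chunk end counts as adjacent and is merged.

```rust
/// Hypothetical stand-in for `FetchInfoIntermediate`: half-open chunk and
/// byte ranges, stored as (start, end) tuples.
#[derive(Clone, Debug, PartialEq)]
struct Span {
    chunks: (u32, u32), // [start, end) in chunk indices
    bytes: (u64, u64),  // [start, end) in serialized xorb bytes
}

/// Sorts spans by chunk start, then merges any span that touches or
/// overlaps the previously emitted one.
fn merge_spans(mut spans: Vec<Span>) -> Vec<Span> {
    spans.sort_by_key(|s| s.chunks.0);

    let mut merged: Vec<Span> = Vec::new();
    for s in spans {
        if let Some(last) = merged.last_mut() {
            if s.chunks.0 <= last.chunks.1 {
                // Adjacent or overlapping: extend the previous span in place.
                last.chunks.1 = last.chunks.1.max(s.chunks.1);
                last.bytes.1 = last.bytes.1.max(s.bytes.1);
                continue;
            }
        }
        // Disjoint from everything emitted so far: start a new span.
        merged.push(s);
    }
    merged
}
```

Taking the `max` of the two ends, rather than overwriting, keeps the result correct when one span is fully contained in an earlier one.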
|
|
||||||
async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse> {
|
async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse> {
|
||||||
@@ -982,7 +936,7 @@ impl Client for LocalClient {
|
|||||||
let mut fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
let mut fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
||||||
|
|
||||||
for file_id in file_ids {
|
for file_id in file_ids {
|
||||||
if let Some(response) = self.get_reconstruction(file_id, None).await? {
|
if let Some(response) = self.get_reconstruction_v1(file_id, None).await? {
|
||||||
let hex_hash: HexMerkleHash = (*file_id).into();
|
let hex_hash: HexMerkleHash = (*file_id).into();
|
||||||
files.insert(hex_hash, response.terms);
|
files.insert(hex_hash, response.terms);
|
||||||
|
|
||||||
@@ -1013,8 +967,14 @@ impl Client for LocalClient {
|
|||||||
// Retry loop: try to fetch, and if URL expired, refresh and retry once.
|
// Retry loop: try to fetch, and if URL expired, refresh and retry once.
|
||||||
for attempt in 0..2 {
|
for attempt in 0..2 {
|
||||||
self.apply_api_delay().await;
|
self.apply_api_delay().await;
|
||||||
let (url, range) = url_info.retrieve_url().await?;
|
let (url, http_ranges) = url_info.retrieve_url().await?;
|
||||||
let (file_path, _url_byte_range, url_timestamp) = parse_fetch_url(&url)?;
|
|
||||||
|
let (file_path, url_timestamp) = if let Ok((path, _, ts)) = parse_fetch_url(&url) {
|
||||||
|
(path, ts)
|
||||||
|
} else {
|
||||||
|
let (hash, ts, _) = xorb_utils::parse_v2_fetch_url(&url)?;
|
||||||
|
(self.get_path_for_entry(&hash), ts)
|
||||||
|
};
|
||||||
|
|
||||||
// Check if URL has expired
|
// Check if URL has expired
|
||||||
let expiration_ms = self.url_expiration_ms.load(Ordering::Relaxed);
|
let expiration_ms = self.url_expiration_ms.load(Ordering::Relaxed);
|
||||||
@@ -1028,34 +988,46 @@ impl Client for LocalClient {
|
|||||||
return Err(CasClientError::PresignedUrlExpirationError);
|
return Err(CasClientError::PresignedUrlExpirationError);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Read the byte range from the file and deserialize
|
// Read each byte range from the serialized file and deserialize the chunks.
|
||||||
let mut file = File::open(&file_path).map_err(|_| CasClientError::XORBNotFound(MerkleHash::default()))?;
|
let mut file = File::open(&file_path).map_err(|_| CasClientError::XORBNotFound(MerkleHash::default()))?;
|
||||||
let start = range.start;
|
|
||||||
let end = range.end + 1; // HttpRange is inclusive end
|
|
||||||
file.seek(SeekFrom::Start(start))?;
|
|
||||||
let len = (end - start) as usize;
|
|
||||||
let mut data = vec![0u8; len];
|
|
||||||
std::io::Read::read_exact(&mut file, &mut data)?;
|
|
||||||
|
|
||||||
// Deserialize the chunks from the raw XORB data
|
let mut all_decompressed = Vec::new();
|
||||||
let (decompressed_data, chunk_byte_indices) =
|
let mut all_chunk_indices = Vec::<u32>::new();
|
||||||
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(&data))?;
|
let mut total_transfer = 0u64;
|
||||||
|
|
||||||
if let Some(expected) = uncompressed_size_if_known {
|
for http_range in &http_ranges {
|
||||||
debug_assert_eq!(
|
let len = http_range.length() as usize;
|
||||||
decompressed_data.len(),
|
total_transfer += http_range.length();
|
||||||
expected,
|
|
||||||
"get_file_term_data: expected {} bytes, got {}",
|
file.seek(SeekFrom::Start(http_range.start))?;
|
||||||
expected,
|
let mut data = vec![0u8; len];
|
||||||
decompressed_data.len()
|
std::io::Read::read_exact(&mut file, &mut data)?;
|
||||||
|
|
||||||
|
let (decompressed, chunk_indices) =
|
||||||
|
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(&data))?;
|
||||||
|
|
||||||
|
xet_core_structures::xorb_object::append_chunk_segment(
|
||||||
|
&mut all_decompressed,
|
||||||
|
&mut all_chunk_indices,
|
||||||
|
&decompressed,
|
||||||
|
&chunk_indices,
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
let transfer_len = len as u64;
|
if let Some(expected) = uncompressed_size_if_known {
|
||||||
if let Some(ref cb) = progress_callback {
|
debug_assert_eq!(
|
||||||
cb(transfer_len, transfer_len, transfer_len);
|
all_decompressed.len(),
|
||||||
|
expected,
|
||||||
|
"get_file_term_data: expected {} bytes, got {}",
|
||||||
|
expected,
|
||||||
|
all_decompressed.len()
|
||||||
|
);
|
||||||
}
|
}
|
||||||
return Ok((Bytes::from(decompressed_data), chunk_byte_indices));
|
|
||||||
|
if let Some(ref cb) = progress_callback {
|
||||||
|
cb(total_transfer, total_transfer, total_transfer);
|
||||||
|
}
|
||||||
|
return Ok((Bytes::from(all_decompressed), all_chunk_indices));
|
||||||
}
|
}
|
||||||
|
|
||||||
// Should not reach here, but return error if we do.
|
// Should not reach here, but return error if we do.
|
||||||
@@ -1093,6 +1065,10 @@ fn parse_fetch_url(url: &str) -> Result<(PathBuf, FileRange, Instant)> {
|
|||||||
|
|
||||||
Ok((file_path, byte_range, timestamp))
|
Ok((file_path, byte_range, timestamp))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn generate_v2_fetch_url(hash: &MerkleHash, ranges: &[XorbRangeDescriptor], timestamp: Instant) -> String {
|
||||||
|
xorb_utils::generate_v2_fetch_url(hash, ranges, timestamp)
|
||||||
|
}
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use xet_core_structures::xorb_object::xorb_format_test_utils::{
|
use xet_core_structures::xorb_object::xorb_format_test_utils::{
|
||||||
@@ -1102,7 +1078,7 @@ mod tests {
|
|||||||
use super::*;
|
use super::*;
|
||||||
use crate::cas_client::simulation::DeletionControlableClient;
|
use crate::cas_client::simulation::DeletionControlableClient;
|
||||||
use crate::cas_client::simulation::client_testing_utils::ClientTestingUtils;
|
use crate::cas_client::simulation::client_testing_utils::ClientTestingUtils;
|
||||||
use crate::cas_types::XorbReconstructionFetchInfo;
|
use crate::cas_types::{ChunkRange, XorbReconstructionFetchInfo};
|
||||||
|
|
||||||
/// Runs the common TestingClient trait test suite for LocalClient.
|
/// Runs the common TestingClient trait test suite for LocalClient.
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
|
|||||||
@@ -32,8 +32,8 @@ use super::super::super::error::CasClientError;
|
|||||||
use super::super::super::{DeletionControlableClient, DirectAccessClient};
|
use super::super::super::{DeletionControlableClient, DirectAccessClient};
|
||||||
use super::latency_simulation::{LatencySimulation, ServerLatencyProfile};
|
use super::latency_simulation::{LatencySimulation, ServerLatencyProfile};
|
||||||
use crate::cas_types::{
|
use crate::cas_types::{
|
||||||
FileRange, HexKey, HexMerkleHash, UploadShardResponse, UploadShardResponseType, UploadXorbResponse,
|
FileRange, HexKey, HexMerkleHash, QueryReconstructionResponseV2, UploadShardResponse, UploadShardResponseType,
|
||||||
XorbReconstructionFetchInfo,
|
UploadXorbResponse, XorbRangeDescriptor, XorbReconstructionFetchInfo,
|
||||||
};
|
};
|
||||||
|
|
||||||
/// Server state passed to all handlers.
|
/// Server state passed to all handlers.
|
||||||
@@ -128,27 +128,55 @@ pub(super) fn error_to_response(e: CasClientError) -> Response {
|
|||||||
(status, e.to_string()).into_response()
|
(status, e.to_string()).into_response()
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Encodes term data (file path) into a URL-safe base64 string.
|
/// Encodes a V1 fetch term for HTTP transport.
|
||||||
///
|
/// Contains only the xorb hash; the byte range comes from the HTTP Range header.
|
||||||
/// The term encodes the local file path that the LocalClient uses.
|
|
||||||
/// This allows the fetch_term endpoint to retrieve the data.
|
|
||||||
/// Encodes a fetch term for HTTP transport.
|
|
||||||
///
|
|
||||||
/// The encoded term contains:
|
|
||||||
/// - xorb_hash: The XORB hash (hex encoded)
|
|
||||||
///
|
|
||||||
/// The byte range to fetch comes from the HTTP Range header, not encoded in the term.
|
|
||||||
fn encode_term(xorb_hash: &MerkleHash) -> String {
|
fn encode_term(xorb_hash: &MerkleHash) -> String {
|
||||||
URL_SAFE_NO_PAD.encode(xorb_hash.hex().as_bytes())
|
URL_SAFE_NO_PAD.encode(xorb_hash.hex().as_bytes())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Decodes a fetch term back into its components.
|
/// Encodes a V2 fetch term with embedded byte ranges.
|
||||||
///
|
/// Format: "{hash_hex}:{start1}-{end1},{start2}-{end2},..."
|
||||||
/// Returns the xorb_hash.
|
/// Byte ranges use exclusive end (FileRange convention).
|
||||||
fn decode_term(term: &str) -> Result<MerkleHash, String> {
|
fn encode_term_with_ranges(xorb_hash: &MerkleHash, ranges: &[XorbRangeDescriptor]) -> String {
|
||||||
|
let ranges_str: Vec<String> = ranges
|
||||||
|
.iter()
|
||||||
|
.map(|r| {
|
||||||
|
let file_range = FileRange::from(r.bytes);
|
||||||
|
format!("{}-{}", file_range.start, file_range.end)
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
let payload = format!("{}:{}", xorb_hash.hex(), ranges_str.join(","));
|
||||||
|
URL_SAFE_NO_PAD.encode(payload.as_bytes())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Decoded fetch term: hash and optional byte ranges (exclusive end).
|
||||||
|
struct DecodedTerm {
|
||||||
|
hash: MerkleHash,
|
||||||
|
byte_ranges: Vec<FileRange>,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Decodes a fetch term. Supports both V1 (hash only) and V2 (hash + ranges).
|
||||||
|
fn decode_term(term: &str) -> Result<DecodedTerm, String> {
|
||||||
let bytes = URL_SAFE_NO_PAD.decode(term).map_err(|e| format!("Invalid base64: {e}"))?;
|
let bytes = URL_SAFE_NO_PAD.decode(term).map_err(|e| format!("Invalid base64: {e}"))?;
|
||||||
let hash_hex = String::from_utf8(bytes).map_err(|e| format!("Invalid UTF-8: {e}"))?;
|
let payload = String::from_utf8(bytes).map_err(|e| format!("Invalid UTF-8: {e}"))?;
|
||||||
MerkleHash::from_hex(&hash_hex).map_err(|e| format!("Invalid hash: {e}"))
|
|
||||||
|
if let Some((hash_hex, ranges_str)) = payload.split_once(':') {
|
||||||
|
let hash = MerkleHash::from_hex(hash_hex).map_err(|e| format!("Invalid hash: {e}"))?;
|
||||||
|
let mut byte_ranges = Vec::new();
|
||||||
|
for r in ranges_str.split(',').filter(|s| !s.is_empty()) {
|
||||||
|
let (start_s, end_s) = r.split_once('-').ok_or("Invalid range syntax")?;
|
||||||
|
let start: u64 = start_s.parse().map_err(|e| format!("Invalid range start: {e}"))?;
|
||||||
|
let end: u64 = end_s.parse().map_err(|e| format!("Invalid range end: {e}"))?;
|
||||||
|
byte_ranges.push(FileRange::new(start, end));
|
||||||
|
}
|
||||||
|
Ok(DecodedTerm { hash, byte_ranges })
|
||||||
|
} else {
|
||||||
|
let hash = MerkleHash::from_hex(&payload).map_err(|e| format!("Invalid hash: {e}"))?;
|
||||||
|
Ok(DecodedTerm {
|
||||||
|
hash,
|
||||||
|
byte_ranges: vec![],
|
||||||
|
})
|
||||||
|
}
|
||||||
}
|
}
|
||||||
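Note on the term format above: the sketch below illustrates the `"{hash_hex}:{start1}-{end1},{start2}-{end2},..."` payload using only the standard library. It is an illustration under assumptions, not the handler code: `encode_payload` and `decode_payload` are hypothetical names, the hash is kept as a plain hex string rather than a `MerkleHash`, and the URL-safe base64 wrapping is left out. Inputs use inclusive ends (the `HttpRange` convention); the payload stores exclusive ends (the `FileRange` convention).

```rust
/// Builds the pre-base64 payload from inclusive-end byte ranges,
/// writing each range with an exclusive end.
fn encode_payload(hash_hex: &str, http_ranges: &[(u64, u64)]) -> String {
    let parts: Vec<String> = http_ranges
        .iter()
        .map(|&(start, end_inclusive)| format!("{}-{}", start, end_inclusive + 1))
        .collect();
    format!("{}:{}", hash_hex, parts.join(","))
}

/// Parses a payload into (hash_hex, exclusive-end ranges).
/// A payload without ':' is treated as a V1 term: hash only, no ranges.
fn decode_payload(payload: &str) -> Result<(String, Vec<(u64, u64)>), String> {
    let (hash_hex, ranges_str) = match payload.split_once(':') {
        Some(pair) => pair,
        None => return Ok((payload.to_string(), Vec::new())),
    };

    let mut ranges = Vec::new();
    for r in ranges_str.split(',').filter(|s| !s.is_empty()) {
        let (start_s, end_s) = r
            .split_once('-')
            .ok_or_else(|| "Invalid range syntax".to_string())?;
        let start: u64 = start_s
            .parse()
            .map_err(|e| format!("Invalid range start: {e}"))?;
        let end: u64 = end_s
            .parse()
            .map_err(|e| format!("Invalid range end: {e}"))?;
        ranges.push((start, end));
    }
    Ok((hash_hex.to_string(), ranges))
}
```

Because a hex hash never contains `:`, the presence of the separator is enough to tell a V2 term from a V1 term, which is how the single `decode_term` in the diff can serve both.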
|
|
||||||
/// Extracts the base URL from request headers (Host header).
|
/// Extracts the base URL from request headers (Host header).
|
||||||
@@ -220,7 +248,7 @@ pub async fn get_reconstruction(
|
|||||||
Err((status, msg)) => return (status, msg).into_response(),
|
Err((status, msg)) => return (status, msg).into_response(),
|
||||||
};
|
};
|
||||||
|
|
||||||
match state.client.get_reconstruction(&file_id, range).await {
|
match state.client.get_reconstruction_v1(&file_id, range).await {
|
||||||
Ok(Some(mut response)) => {
|
Ok(Some(mut response)) => {
|
||||||
transform_fetch_info_urls(&mut response.fetch_info, &base_url);
|
transform_fetch_info_urls(&mut response.fetch_info, &base_url);
|
||||||
Json(response).into_response()
|
Json(response).into_response()
|
||||||
@@ -230,6 +258,78 @@ pub async fn get_reconstruction(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// GET /v2/reconstructions/{file_id}
|
||||||
|
///
|
||||||
|
/// Returns V2 reconstruction information for a file, including:
|
||||||
|
/// - List of terms (chunks) needed to reconstruct the file
|
||||||
|
/// - Per-xorb fetch descriptors with multi-range URLs
|
||||||
|
///
|
||||||
|
/// Supports Range header for partial file reconstruction.
|
||||||
|
/// URLs in the response point to the /v1/fetch_term endpoint.
|
||||||
|
pub async fn get_reconstruction_v2(
|
||||||
|
State(state): State<ServerState>,
|
||||||
|
Path(HexMerkleHash(file_id)): Path<HexMerkleHash>,
|
||||||
|
headers: HeaderMap,
|
||||||
|
) -> Response {
|
||||||
|
let connection_guard = state.latency_simulation.register_connection().await;
|
||||||
|
if let Some(simulated_error) = connection_guard.simulate_error() {
|
||||||
|
return simulated_error;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Allow testing V1 fallback by simulating V2 endpoint unavailability.
|
||||||
|
let disabled_status = state.client.v2_disabled_status_code();
|
||||||
|
if disabled_status != 0 {
|
||||||
|
let code = StatusCode::from_u16(disabled_status).unwrap_or(StatusCode::NOT_FOUND);
|
||||||
|
return (code, "V2 reconstruction endpoint disabled").into_response();
|
||||||
|
}
|
||||||
|
|
||||||
|
let base_url = get_base_url(&headers);
|
||||||
|
|
||||||
|
let range = match parse_range_header(headers.get(RANGE)) {
|
||||||
|
Ok(Some(FileRangeVariant::Normal(range))) => Some(range),
|
||||||
|
Ok(Some(FileRangeVariant::OpenRHS(start))) => {
|
||||||
|
let file_size = match state.client.get_file_size(&file_id).await {
|
||||||
|
Ok(size) => size,
|
||||||
|
Err(e) => return error_to_response(e),
|
||||||
|
};
|
||||||
|
Some(FileRange::new(start, file_size))
|
||||||
|
},
|
||||||
|
Ok(Some(FileRangeVariant::Suffix(suffix))) => {
|
||||||
|
let file_size = match state.client.get_file_size(&file_id).await {
|
||||||
|
Ok(size) => size,
|
||||||
|
Err(e) => return error_to_response(e),
|
||||||
|
};
|
||||||
|
Some(FileRange::new(file_size.saturating_sub(suffix), file_size))
|
||||||
|
},
|
||||||
|
Ok(None) => None,
|
||||||
|
Err((status, msg)) => return (status, msg).into_response(),
|
||||||
|
};
|
||||||
|
|
||||||
|
match state.client.get_reconstruction_v2(&file_id, range).await {
|
||||||
|
Ok(Some(mut response)) => {
|
||||||
|
transform_v2_xorb_urls(&mut response, &base_url);
|
||||||
|
Json(response).into_response()
|
||||||
|
},
|
||||||
|
Ok(None) => (StatusCode::RANGE_NOT_SATISFIABLE, "Range not satisfiable").into_response(),
|
||||||
|
Err(e) => error_to_response(e),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Transforms V2 xorb URLs from client-internal format to HTTP URLs.
|
||||||
|
///
|
||||||
|
/// Each `XorbMultiRangeFetch` URL is replaced with an HTTP URL pointing
|
||||||
|
/// to the /v1/fetch_term endpoint. The byte ranges from the V2 response
|
||||||
|
/// are encoded into the term so the endpoint can serve all ranges in one request.
|
||||||
|
fn transform_v2_xorb_urls(response: &mut QueryReconstructionResponseV2, base_url: &str) {
|
||||||
|
for (xorb_hash, fetch_entries) in response.xorbs.iter_mut() {
|
||||||
|
let xorb_hash: MerkleHash = (*xorb_hash).into();
|
||||||
|
for fetch in fetch_entries.iter_mut() {
|
||||||
|
let encoded_term = encode_term_with_ranges(&xorb_hash, &fetch.ranges);
|
||||||
|
fetch.url = format!("{base_url}/v1/fetch_term?term={encoded_term}");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// GET /reconstructions?file_id=...&file_id=...
|
/// GET /reconstructions?file_id=...&file_id=...
|
||||||
///
|
///
|
||||||
/// Batch query for reconstruction information for multiple files using query parameters.
|
/// Batch query for reconstruction information for multiple files using query parameters.
|
||||||
@@ -285,10 +385,12 @@ pub async fn batch_get_reconstruction(
|
|||||||
/// GET /v1/fetch_term?term=<base64_encoded_term>
|
/// GET /v1/fetch_term?term=<base64_encoded_term>
|
||||||
///
|
///
|
||||||
/// Fetches raw XORB data based on an encoded term.
|
/// Fetches raw XORB data based on an encoded term.
|
||||||
/// The term contains the xorb hash. The byte range is specified via HTTP Range header.
|
|
||||||
///
|
///
|
||||||
/// This endpoint is called by RemoteClient when fetching reconstruction terms.
|
/// For V1 terms (hash only), the byte range comes from the HTTP Range header.
|
||||||
/// It returns raw (compressed) bytes that the client will decompress.
|
/// For V2 terms (hash + ranges), all encoded byte ranges are fetched and
|
||||||
|
/// concatenated in order, allowing a single request to serve multi-range blocks.
|
||||||
|
///
|
||||||
|
/// Returns raw (compressed) bytes that the client will decompress.
|
||||||
pub async fn fetch_term(State(state): State<ServerState>, uri: axum::http::Uri, headers: HeaderMap) -> Response {
|
pub async fn fetch_term(State(state): State<ServerState>, uri: axum::http::Uri, headers: HeaderMap) -> Response {
|
||||||
let connection_guard = state.latency_simulation.register_connection().await;
|
let connection_guard = state.latency_simulation.register_connection().await;
|
||||||
if let Some(simulated_error) = connection_guard.simulate_error() {
|
if let Some(simulated_error) = connection_guard.simulate_error() {
|
||||||
@@ -304,13 +406,69 @@ pub async fn fetch_term(State(state): State<ServerState>, uri: axum::http::Uri,
|
|||||||
return (StatusCode::BAD_REQUEST, "Missing 'term' query parameter").into_response();
|
return (StatusCode::BAD_REQUEST, "Missing 'term' query parameter").into_response();
|
||||||
};
|
};
|
||||||
|
|
||||||
let xorb_hash = match decode_term(&term) {
|
let decoded = match decode_term(&term) {
|
||||||
Ok(h) => h,
|
Ok(d) => d,
|
||||||
Err(e) => return (StatusCode::BAD_REQUEST, format!("Invalid term: {e}")).into_response(),
|
Err(e) => return (StatusCode::BAD_REQUEST, format!("Invalid term: {e}")).into_response(),
|
||||||
};
|
};
|
||||||
|
|
||||||
// Get total length of the raw XORB data for Range header handling
|
if !decoded.byte_ranges.is_empty() {
|
||||||
let total_length = match state.client.xorb_raw_length(&xorb_hash).await {
|
// If the client sends a single-range HTTP Range header, serve just that range.
|
||||||
|
// This simulates S3/CDN behavior where the Range header controls the response
|
||||||
|
// regardless of what ranges are encoded in the presigned URL. This is the
|
||||||
|
// common path when ranges are split into single-range requests based on
|
||||||
|
// the multirange thresholds (V2 URLs with individual requests).
|
||||||
|
if let Ok(Some(FileRangeVariant::Normal(range))) = parse_range_header(headers.get(RANGE)) {
|
||||||
|
return match state.client.get_xorb_raw_bytes(&decoded.hash, Some(range)).await {
|
||||||
|
Ok(data) => (StatusCode::PARTIAL_CONTENT, data).into_response(),
|
||||||
|
Err(e) => error_to_response(e),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
if decoded.byte_ranges.len() == 1 {
|
||||||
|
let range = &decoded.byte_ranges[0];
|
||||||
|
return match state.client.get_xorb_raw_bytes(&decoded.hash, Some(*range)).await {
|
||||||
|
Ok(data) => (StatusCode::PARTIAL_CONTENT, data).into_response(),
|
||||||
|
Err(e) => error_to_response(e),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// Multiple ranges with no Range header override: return a multipart/byteranges
|
||||||
|
// response (RFC 7233 Section 4.1), matching S3/CloudFront multi-range format.
|
||||||
|
let total_length = match state.client.xorb_raw_length(&decoded.hash).await {
|
||||||
|
Ok(len) => len,
|
||||||
|
Err(e) => return error_to_response(e),
|
||||||
|
};
|
||||||
|
|
||||||
|
let boundary = "xet_multipart_boundary";
|
||||||
|
let mut response_body = Vec::new();
|
||||||
|
|
||||||
|
for range in &decoded.byte_ranges {
|
||||||
|
let data = match state.client.get_xorb_raw_bytes(&decoded.hash, Some(*range)).await {
|
||||||
|
Ok(d) => d,
|
||||||
|
Err(e) => return error_to_response(e),
|
||||||
|
};
|
||||||
|
// FileRange uses exclusive end; Content-Range header uses inclusive end.
|
||||||
|
let inclusive_end = range.end.saturating_sub(1);
|
||||||
|
let part_header = format!(
|
||||||
|
"--{boundary}\r\nContent-Type: application/octet-stream\r\nContent-Range: bytes {}-{}/{total_length}\r\n\r\n",
|
||||||
|
range.start, inclusive_end
|
||||||
|
);
|
||||||
|
response_body.extend_from_slice(part_header.as_bytes());
|
||||||
|
response_body.extend_from_slice(&data);
|
||||||
|
response_body.extend_from_slice(b"\r\n");
|
||||||
|
}
|
||||||
|
response_body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
|
||||||
|
|
||||||
|
let content_type = format!("multipart/byteranges; boundary={boundary}");
|
||||||
|
let mut headers = HeaderMap::new();
|
||||||
|
headers.insert(http::header::CONTENT_TYPE, HeaderValue::from_str(&content_type).unwrap());
|
||||||
|
|
||||||
|
return (StatusCode::PARTIAL_CONTENT, headers, Bytes::from(response_body)).into_response();
|
||||||
|
}
|
||||||
|
|
||||||
|
// V1 term: byte range comes from the HTTP Range header.
|
||||||
|
// Get total length of the raw XORB data for Range header handling.
|
||||||
|
let total_length = match state.client.xorb_raw_length(&decoded.hash).await {
|
||||||
Ok(len) => len,
|
Ok(len) => len,
|
||||||
Err(e) => return error_to_response(e),
|
Err(e) => return error_to_response(e),
|
||||||
};
|
};
|
||||||
@@ -327,7 +485,7 @@ pub async fn fetch_term(State(state): State<ServerState>, uri: axum::http::Uri,
|
|||||||
};
|
};
|
||||||
|
|
||||||
// Fetch raw (serialized/compressed) bytes from the XORB
|
// Fetch raw (serialized/compressed) bytes from the XORB
|
||||||
match state.client.get_xorb_raw_bytes(&xorb_hash, byte_range).await {
|
match state.client.get_xorb_raw_bytes(&decoded.hash, byte_range).await {
|
||||||
Ok(data) => (StatusCode::OK, data).into_response(),
|
Ok(data) => (StatusCode::OK, data).into_response(),
|
||||||
Err(e) => error_to_response(e),
|
Err(e) => error_to_response(e),
|
||||||
}
|
}
|
||||||
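Note on the multi-range branch of `fetch_term` above: the body it assembles can be sketched as a pure function, which makes the framing easier to check in isolation. This is a hedged illustration, not the handler itself: `build_multipart_body` is a hypothetical name, and the part data is passed in directly instead of being read through `get_xorb_raw_bytes`. Ranges are given with exclusive ends and converted to inclusive ends for the `Content-Range` header, as in the diff.

```rust
/// Assembles a multipart/byteranges body from (exclusive-end range, data) parts.
/// `total_length` is the full length of the underlying object.
fn build_multipart_body(
    boundary: &str,
    total_length: u64,
    parts: &[((u64, u64), &[u8])],
) -> Vec<u8> {
    let mut body = Vec::new();
    for &((start, end_exclusive), data) in parts {
        // Content-Range uses an inclusive end.
        let inclusive_end = end_exclusive.saturating_sub(1);
        let header = format!(
            "--{boundary}\r\nContent-Type: application/octet-stream\r\n\
             Content-Range: bytes {start}-{inclusive_end}/{total_length}\r\n\r\n"
        );
        body.extend_from_slice(header.as_bytes());
        body.extend_from_slice(data);
        body.extend_from_slice(b"\r\n");
    }
    // The closing delimiter carries a trailing "--".
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}
```

The response that carries this body would also need a `Content-Type: multipart/byteranges; boundary=...` header naming the same boundary, as the handler sets.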
@@ -713,9 +871,33 @@ mod tests {
|
|||||||
let xorb_hash = MerkleHash::from_hex(&format!("{:0>64}", "abc123")).unwrap();
|
let xorb_hash = MerkleHash::from_hex(&format!("{:0>64}", "abc123")).unwrap();
|
||||||
|
|
||||||
let encoded = encode_term(&xorb_hash);
|
let encoded = encode_term(&xorb_hash);
|
||||||
let decoded_hash = decode_term(&encoded).unwrap();
|
let decoded = decode_term(&encoded).unwrap();
|
||||||
|
assert_eq!(decoded.hash, xorb_hash);
|
||||||
|
assert!(decoded.byte_ranges.is_empty());
|
||||||
|
}
|
||||||
|
|
||||||
assert_eq!(decoded_hash, xorb_hash);
|
#[test]
|
||||||
|
fn test_encode_decode_term_with_ranges() {
|
||||||
|
use crate::cas_types::{ChunkRange, HttpRange, XorbRangeDescriptor};
|
||||||
|
|
||||||
|
let xorb_hash = MerkleHash::from_hex(&format!("{:0>64}", "abc123")).unwrap();
|
||||||
|
let ranges = vec![
|
||||||
|
XorbRangeDescriptor {
|
||||||
|
chunks: ChunkRange::new(0, 3),
|
||||||
|
bytes: HttpRange::new(0, 1023),
|
||||||
|
},
|
||||||
|
XorbRangeDescriptor {
|
||||||
|
chunks: ChunkRange::new(5, 8),
|
||||||
|
bytes: HttpRange::new(2048, 4095),
|
||||||
|
},
|
||||||
|
];
|
||||||
|
|
||||||
|
let encoded = encode_term_with_ranges(&xorb_hash, &ranges);
|
||||||
|
let decoded = decode_term(&encoded).unwrap();
|
||||||
|
assert_eq!(decoded.hash, xorb_hash);
|
||||||
|
assert_eq!(decoded.byte_ranges.len(), 2);
|
||||||
|
assert_eq!(decoded.byte_ranges[0], FileRange::new(0, 1024));
|
||||||
|
assert_eq!(decoded.byte_ranges[1], FileRange::new(2048, 4096));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
|
|||||||
@@ -177,6 +177,7 @@ impl LocalServer {
|
|||||||
.route("/get_xorb/{prefix}/{hash}/", get(handlers::get_file_term_data))
|
.route("/get_xorb/{prefix}/{hash}/", get(handlers::get_file_term_data))
|
||||||
.route("/fetch_term", get(handlers::fetch_term)),
|
.route("/fetch_term", get(handlers::fetch_term)),
|
||||||
)
|
)
|
||||||
|
.nest("/v2", Router::new().route("/reconstructions/{file_id}", get(handlers::get_reconstruction_v2)))
|
||||||
.nest(
|
.nest(
|
||||||
"/simulation",
|
"/simulation",
|
||||||
super::simulation_handlers::simulation_routes()
|
super::simulation_handlers::simulation_routes()
|
||||||
@@ -425,7 +426,7 @@ impl Client for LocalTestServer {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &xet_core_structures::merklehash::MerkleHash,
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
bytes_range: Option<crate::cas_types::FileRange>,
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponseV2>> {
|
||||||
self.remote_client.get_reconstruction(file_id, bytes_range).await
|
self.remote_client.get_reconstruction(file_id, bytes_range).await
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -492,6 +493,34 @@ impl DirectAccessClient for LocalTestServer {
|
|||||||
self.client.set_fetch_term_url_expiration(expiration);
|
self.client.set_fetch_term_url_expiration(expiration);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_max_ranges_per_fetch(&self, max_ranges: usize) {
|
||||||
|
self.client.set_max_ranges_per_fetch(max_ranges);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn disable_v2_reconstruction(&self, status_code: u16) {
|
||||||
|
self.client.disable_v2_reconstruction(status_code);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn v2_disabled_status_code(&self) -> u16 {
|
||||||
|
self.client.v2_disabled_status_code()
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
||||||
|
self.remote_client.get_reconstruction_v1(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponseV2>> {
|
||||||
|
self.remote_client.get_reconstruction_v2(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
fn set_api_delay_range(&self, delay_range: Option<std::ops::Range<std::time::Duration>>) {
|
fn set_api_delay_range(&self, delay_range: Option<std::ops::Range<std::time::Duration>>) {
|
||||||
self.client.set_api_delay_range(delay_range);
|
self.client.set_api_delay_range(delay_range);
|
||||||
}
|
}
|
||||||
@@ -588,7 +617,7 @@ mod tests {
|
|||||||
use crate::cas_client::simulation::client_testing_utils::ClientTestingUtils;
|
use crate::cas_client::simulation::client_testing_utils::ClientTestingUtils;
|
||||||
use crate::cas_client::simulation::local_server::SimulationControlClient;
|
use crate::cas_client::simulation::local_server::SimulationControlClient;
|
||||||
use crate::cas_client::simulation::{DeletionControlableClient, DirectAccessClient};
|
use crate::cas_client::simulation::{DeletionControlableClient, DirectAccessClient};
|
||||||
use crate::cas_types::FileRange;
|
use crate::cas_types::{FileRange, QueryReconstructionResponseV2};
|
||||||
|
|
||||||
const CHUNK_SIZE: usize = 123;
|
const CHUNK_SIZE: usize = 123;
|
||||||
|
|
||||||
@@ -604,16 +633,16 @@ mod tests {
|
|||||||
let local_data = server.client().get_file_data(&file.file_hash, None).await.unwrap();
|
let local_data = server.client().get_file_data(&file.file_hash, None).await.unwrap();
|
||||||
assert_eq!(file.data, local_data);
|
assert_eq!(file.data, local_data);
|
||||||
|
|
||||||
// Full file reconstruction - compare remote and local
|
// Full file reconstruction - compare remote and local (V1)
|
||||||
let remote_recon = server
|
let remote_recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&file.file_hash, None)
|
.get_reconstruction_v1(&file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
let local_recon = server
|
let local_recon = server
|
||||||
.client()
|
.client()
|
||||||
.get_reconstruction(&file.file_hash, None)
|
.get_reconstruction_v1(&file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -629,7 +658,7 @@ mod tests {
|
|||||||
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
||||||
let range_recon = server
|
let range_recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&file.file_hash, Some(range))
|
.get_reconstruction_v1(&file.file_hash, Some(range))
|
||||||
.await
|
.await
|
||||||
.unwrap();
|
.unwrap();
|
||||||
assert!(range_recon.is_some());
|
assert!(range_recon.is_some());
|
||||||
@@ -639,7 +668,7 @@ mod tests {
|
|||||||
let multi_file = server.client().upload_random_file(term_spec, CHUNK_SIZE).await.unwrap();
|
let multi_file = server.client().upload_random_file(term_spec, CHUNK_SIZE).await.unwrap();
|
||||||
let multi_recon = server
|
let multi_recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&multi_file.file_hash, None)
|
.get_reconstruction_v1(&multi_file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -750,7 +779,7 @@ mod tests {
|
|||||||
// Verify single XORB URLs are HTTP
|
// Verify single XORB URLs are HTTP
|
||||||
let recon1 = server
|
let recon1 = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&file1.file_hash, None)
|
.get_reconstruction_v1(&file1.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -770,7 +799,7 @@ mod tests {
|
|||||||
// Verify multi-XORB file has HTTP URLs for all XORBs
|
// Verify multi-XORB file has HTTP URLs for all XORBs
|
||||||
let multi_recon = server
|
let multi_recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&multi_file.file_hash, None)
|
.get_reconstruction_v1(&multi_file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -786,7 +815,7 @@ mod tests {
|
|||||||
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
||||||
let range_recon = server
|
let range_recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&multi_file.file_hash, Some(range))
|
.get_reconstruction_v1(&multi_file.file_hash, Some(range))
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -817,7 +846,7 @@ mod tests {
|
|||||||
// Get reconstruction via remote client
|
// Get reconstruction via remote client
|
||||||
let recon = server
|
let recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&file.file_hash, None)
|
.get_reconstruction_v1(&file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -841,7 +870,7 @@ mod tests {
|
|||||||
// Get reconstruction
|
// Get reconstruction
|
||||||
let recon = server
|
let recon = server
|
||||||
.remote_client()
|
.remote_client()
|
||||||
.get_reconstruction(&file.file_hash, None)
|
.get_reconstruction_v1(&file.file_hash, None)
|
||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
@@ -906,6 +935,241 @@ mod tests {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Tests V2 reconstruction endpoint returns valid responses through the server.
|
||||||
|
async fn check_v2_reconstruction(server: &LocalTestServer) {
|
||||||
|
let file = server.client().upload_random_file(&[(1, (0, 5))], CHUNK_SIZE).await.unwrap();
|
||||||
|
|
||||||
|
// Query V2 endpoint via remote client
|
||||||
|
let v2 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_v2(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
assert!(!v2.terms.is_empty());
|
||||||
|
assert!(!v2.xorbs.is_empty());
|
||||||
|
assert_eq!(v2.offset_into_first_range, 0);
|
||||||
|
|
||||||
|
// V2 URLs should be HTTP URLs pointing to /v1/fetch_term
|
||||||
|
for fetch_entries in v2.xorbs.values() {
|
||||||
|
for fetch in fetch_entries {
|
||||||
|
assert!(fetch.url.starts_with("http://"), "V2 URL should be HTTP, got: {}", fetch.url);
|
||||||
|
assert!(
|
||||||
|
fetch.url.contains("/v1/fetch_term?term="),
|
||||||
|
"V2 URL should point to fetch_term endpoint, got: {}",
|
||||||
|
fetch.url
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// V2 terms should match V1 terms
|
||||||
|
let v1 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_v1(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
assert_eq!(v1.terms.len(), v2.terms.len());
|
||||||
|
assert_eq!(v1.offset_into_first_range, v2.offset_into_first_range);
|
||||||
|
for (t1, t2) in v1.terms.iter().zip(v2.terms.iter()) {
|
||||||
|
assert_eq!(t1.hash, t2.hash);
|
||||||
|
assert_eq!(t1.range, t2.range);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tests V2 fetch URLs are fetchable via the /v1/fetch_term endpoint.
|
||||||
|
async fn check_v2_url_transformation(server: &LocalTestServer) {
|
||||||
|
let http_client = reqwest::Client::new();
|
||||||
|
|
||||||
|
let file = server
|
||||||
|
.client()
|
||||||
|
.upload_random_file(&[(1, (0, 3)), (2, (0, 2))], CHUNK_SIZE)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let v2 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_v2(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
for fetch_entries in v2.xorbs.values() {
|
||||||
|
for fetch in fetch_entries {
|
||||||
|
let response = http_client.get(&fetch.url).send().await.unwrap();
|
||||||
|
assert!(
|
||||||
|
response.status().is_success(),
|
||||||
|
"V2 fetch URL should be fetchable: {} (status: {})",
|
||||||
|
fetch.url,
|
||||||
|
response.status()
|
||||||
|
);
|
||||||
|
let data = response.bytes().await.unwrap();
|
||||||
|
assert!(!data.is_empty(), "Fetched data should not be empty");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Tests V2 with range requests through the server.
|
||||||
|
async fn check_v2_range_reconstruction(server: &LocalTestServer) {
|
||||||
|
let term_spec = &[(1, (0, 3)), (2, (0, 2)), (1, (3, 5))];
|
||||||
|
let file = server.client().upload_random_file(term_spec, CHUNK_SIZE).await.unwrap();
|
||||||
|
let file_size = file.data.len() as u64;
|
||||||
|
|
||||||
|
let range = FileRange::new(file_size / 4, file_size * 3 / 4);
|
||||||
|
let v2 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_v2(&file.file_hash, Some(range))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
assert!(!v2.terms.is_empty());
|
||||||
|
for fetch_entries in v2.xorbs.values() {
|
||||||
|
for fetch in fetch_entries {
|
||||||
|
assert!(fetch.url.starts_with("http://"));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Validate open-ended and suffix range variants through the V2 HTTP endpoint.
|
||||||
|
let v2_url = format!("{}/v2/reconstructions/{}", server.endpoint(), file.file_hash.hex());
|
||||||
|
let http_client = reqwest::Client::new();
|
||||||
|
|
||||||
|
let open_rhs: QueryReconstructionResponseV2 = http_client
|
||||||
|
.get(&v2_url)
|
||||||
|
.header(reqwest::header::RANGE, "bytes=100-")
|
||||||
|
.send()
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.error_for_status()
|
||||||
|
.unwrap()
|
||||||
|
.json()
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
assert!(!open_rhs.terms.is_empty());
|
||||||
|
|
||||||
|
let suffix: QueryReconstructionResponseV2 = http_client
|
||||||
|
.get(&v2_url)
|
||||||
|
.header(reqwest::header::RANGE, "bytes=-128")
|
||||||
|
.send()
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.error_for_status()
|
||||||
|
.unwrap()
|
||||||
|
.json()
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
assert!(!suffix.terms.is_empty());
|
||||||
|
}
|
||||||
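Note on the `Range` variants exercised above: the test sends `bytes=100-` (open-ended) and `bytes=-128` (suffix) to the V2 endpoint. The following is a minimal sketch of how such header values resolve to a half-open byte range under standard HTTP range semantics. `resolve_range` is a hypothetical helper written for illustration, not a function from this repository, and the server's actual parsing may differ.

```rust
/// Resolve a single-range `Range` header value against a file size.
/// Returns a half-open `(start, end)` byte range, or `None` if the header is
/// malformed or unsatisfiable. Hypothetical helper, for illustration only.
fn resolve_range(header: &str, file_size: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    let (first, last) = spec.split_once('-')?;
    match (first.is_empty(), last.is_empty()) {
        // "bytes=-N": the final N bytes of the file.
        (true, false) => {
            let n: u64 = last.parse().ok()?;
            if n == 0 || file_size == 0 {
                return None;
            }
            Some((file_size.saturating_sub(n), file_size))
        }
        // "bytes=S-": from offset S to the end of the file.
        (false, true) => {
            let s: u64 = first.parse().ok()?;
            (s < file_size).then_some((s, file_size))
        }
        // "bytes=S-E": both bounds inclusive in HTTP, so add 1 for a half-open end.
        (false, false) => {
            let s: u64 = first.parse().ok()?;
            let e: u64 = last.parse().ok()?;
            if s > e || s >= file_size {
                return None;
            }
            Some((s, e.saturating_add(1).min(file_size)))
        }
        // "bytes=-" carries no bounds at all.
        (true, true) => None,
    }
}
```

For a 1000-byte file, `bytes=100-` resolves to `(100, 1000)` and `bytes=-128` to `(872, 1000)`.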
|
|
||||||
|
/// Tests V2 max_ranges_per_fetch through the server.
|
||||||
|
async fn check_v2_max_ranges(server: &LocalTestServer) {
|
||||||
|
let term_spec = &[(1, (0, 2)), (2, (0, 1)), (1, (2, 4)), (2, (1, 2)), (1, (4, 6))];
|
||||||
|
let file = server.client().upload_random_file(term_spec, 512).await.unwrap();
|
||||||
|
|
||||||
|
// Set max_ranges_per_fetch to 1
|
||||||
|
server.set_max_ranges_per_fetch(1);
|
||||||
|
|
||||||
|
let v2 = server
|
||||||
|
.client()
|
||||||
|
.get_reconstruction_v2(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let xorb1_hash: crate::cas_types::HexMerkleHash = file.terms[0].xorb_hash.into();
|
||||||
|
if let Some(desc) = v2.xorbs.get(&xorb1_hash) {
|
||||||
|
for fetch in desc {
|
||||||
|
assert!(fetch.ranges.len() <= 1, "Each fetch should have at most 1 range, got {}", fetch.ranges.len());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Reset
|
||||||
|
server.set_max_ranges_per_fetch(usize::MAX);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Verifies that disabling V2 with various status codes causes the V2 endpoint
|
||||||
|
/// to return that code, and that get_reconstruction falls back to V1.
|
||||||
|
async fn check_v2_disabled_fallback(server: &LocalTestServer) {
|
||||||
|
let file = server
|
||||||
|
.remote_client()
|
||||||
|
.upload_random_file(&[(1, (0, 3)), (2, (0, 2))], CHUNK_SIZE)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// V2 should work before disabling.
|
||||||
|
let v2_result = server.remote_client().get_reconstruction_v2(&file.file_hash, None).await;
|
||||||
|
assert!(v2_result.is_ok());
|
||||||
|
|
||||||
|
// Test 501 (Not Implemented) fallback first, before the RemoteClient
|
||||||
|
// caches a V1 preference from a 404 fallback.
|
||||||
|
server.disable_v2_reconstruction(501);
|
||||||
|
|
||||||
|
let v2_result = server.remote_client().get_reconstruction_v2(&file.file_hash, None).await;
|
||||||
|
assert!(v2_result.is_err(), "V2 should return error when disabled with 501");
|
||||||
|
|
||||||
|
// Forced V2 should surface the endpoint error directly with no fallback.
|
||||||
|
let forced_v2 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_with_version_override(&file.file_hash, None, Some(2))
|
||||||
|
.await;
|
||||||
|
assert!(forced_v2.is_err());
|
||||||
|
assert_eq!(forced_v2.unwrap_err().status(), Some(reqwest::StatusCode::NOT_IMPLEMENTED));
|
||||||
|
|
||||||
|
// Forced V1 should continue to succeed when V2 is disabled.
|
||||||
|
let forced_v1 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_with_version_override(&file.file_hash, None, Some(1))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(forced_v1.terms.len(), 2);
|
||||||
|
|
||||||
|
let result = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(result.terms.len(), 2);
|
||||||
|
|
||||||
|
// Re-enable V2, then test 404 fallback.
|
||||||
|
server.disable_v2_reconstruction(0);
|
||||||
|
|
||||||
|
// Reset the RemoteClient's cached version by making a successful V2 call.
|
||||||
|
let v2_result = server.remote_client().get_reconstruction_v2(&file.file_hash, None).await;
|
||||||
|
assert!(v2_result.is_ok(), "V2 should work again after re-enabling");
|
||||||
|
|
||||||
|
server.disable_v2_reconstruction(404);
|
||||||
|
|
||||||
|
let v2_result = server.remote_client().get_reconstruction_v2(&file.file_hash, None).await;
|
||||||
|
assert!(v2_result.is_err(), "V2 should return error when disabled with 404");
|
||||||
|
|
||||||
|
let forced_v2 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_with_version_override(&file.file_hash, None, Some(2))
|
||||||
|
.await;
|
||||||
|
assert!(forced_v2.is_err());
|
||||||
|
assert_eq!(forced_v2.unwrap_err().status(), Some(reqwest::StatusCode::NOT_FOUND));
|
||||||
|
|
||||||
|
let forced_v1 = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction_with_version_override(&file.file_hash, None, Some(1))
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(forced_v1.terms.len(), 2);
|
||||||
|
|
||||||
|
let result = server
|
||||||
|
.remote_client()
|
||||||
|
.get_reconstruction(&file.file_hash, None)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
assert_eq!(result.terms.len(), 2);
|
||||||
|
}
|
||||||
|
|
||||||
/// Runs all server checks for a given test server instance.
|
/// Runs all server checks for a given test server instance.
|
||||||
async fn run_all_server_checks(server: &LocalTestServer) {
|
async fn run_all_server_checks(server: &LocalTestServer) {
|
||||||
check_basic_correctness(server).await;
|
check_basic_correctness(server).await;
|
||||||
@@ -915,6 +1179,11 @@ mod tests {
|
|||||||
check_downloaded_terms_match_expected_data(server).await;
|
check_downloaded_terms_match_expected_data(server).await;
|
||||||
check_complete_file_reconstruction(server).await;
|
check_complete_file_reconstruction(server).await;
|
||||||
check_chunk_hashes_correctness(server).await;
|
check_chunk_hashes_correctness(server).await;
|
||||||
|
check_v2_reconstruction(server).await;
|
||||||
|
check_v2_url_transformation(server).await;
|
||||||
|
check_v2_range_reconstruction(server).await;
|
||||||
|
check_v2_max_ranges(server).await;
|
||||||
|
check_v2_disabled_fallback(server).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn all_file_hashes(client: &LocalClient) -> HashSet<MerkleHash> {
|
async fn all_file_hashes(client: &LocalClient) -> HashSet<MerkleHash> {
|
||||||
|
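Note on the fallback behavior checked by `check_v2_disabled_fallback`: the test expects `get_reconstruction` to fall back to V1 when the V2 endpoint answers 404 or 501, while a forced version surfaces the endpoint error directly. A minimal sketch of that decision logic follows. All names here (`EndpointResult`, `Served`, `reconstruct`) are hypothetical and do not mirror the `RemoteClient` implementation. The handling of other status codes is an assumption, and the version caching mentioned in the test comments is left out.

```rust
/// Hypothetical outcome of calling one reconstruction endpoint.
#[derive(Debug, Clone, PartialEq)]
enum EndpointResult {
    Ok(String),
    HttpError(u16),
}

/// Which endpoint version ultimately served the request.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Served {
    V1,
    V2,
}

fn into_result(served: Served, result: EndpointResult) -> Result<(Served, String), u16> {
    match result {
        EndpointResult::Ok(body) => Ok((served, body)),
        EndpointResult::HttpError(code) => Err(code),
    }
}

/// With `force = Some(1)` or `Some(2)`, call exactly that version and return
/// its result with no fallback. Otherwise try V2 first and fall back to V1
/// only on 404 or 501. Assumption: any other V2 error is returned as-is.
fn reconstruct(
    force: Option<u8>,
    v2: impl Fn() -> EndpointResult,
    v1: impl Fn() -> EndpointResult,
) -> Result<(Served, String), u16> {
    match force {
        Some(1) => into_result(Served::V1, v1()),
        Some(2) => into_result(Served::V2, v2()),
        // `None` (and, in this sketch, any unknown version) means auto-detect.
        _ => match v2() {
            EndpointResult::Ok(body) => Ok((Served::V2, body)),
            EndpointResult::HttpError(404) | EndpointResult::HttpError(501) => {
                into_result(Served::V1, v1())
            }
            EndpointResult::HttpError(code) => Err(code),
        },
    }
}
```

Passing the endpoints as closures keeps the sketch free of any HTTP dependency, so the same three cases the test walks through (auto-detect, forced V2, forced V1) can be exercised directly.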
|||||||
@@ -17,7 +17,7 @@ use crate::cas_client::RemoteClient;
|
|||||||
use crate::cas_client::error::{CasClientError, Result};
|
use crate::cas_client::error::{CasClientError, Result};
|
||||||
use crate::cas_client::interface::Client;
|
use crate::cas_client::interface::Client;
|
||||||
use crate::cas_client::simulation::{DeletionControlableClient, DirectAccessClient};
|
use crate::cas_client::simulation::{DeletionControlableClient, DirectAccessClient};
|
||||||
use crate::cas_types::{FileRange, HexMerkleHash, XorbReconstructionFetchInfo};
|
use crate::cas_types::{FileRange, HexMerkleHash, QueryReconstructionResponseV2, XorbReconstructionFetchInfo};
|
||||||
|
|
||||||
/// A client that connects to a `LocalTestServer` via HTTP and provides access
|
/// A client that connects to a `LocalTestServer` via HTTP and provides access
|
||||||
/// to both `DirectAccessClient` and `DeletionControlableClient` operations
|
/// to both `DirectAccessClient` and `DeletionControlableClient` operations
|
||||||
@@ -91,7 +91,7 @@ impl Client for SimulationControlClient {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &MerkleHash,
|
file_id: &MerkleHash,
|
||||||
bytes_range: Option<FileRange>,
|
bytes_range: Option<FileRange>,
|
||||||
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
self.remote_client.get_reconstruction(file_id, bytes_range).await
|
self.remote_client.get_reconstruction(file_id, bytes_range).await
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -172,6 +172,30 @@ impl DirectAccessClient for SimulationControlClient {
|
|||||||
// No-op: delays are applied server-side via set_api_delay_range
|
// No-op: delays are applied server-side via set_api_delay_range
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_max_ranges_per_fetch(&self, _max_ranges: usize) {
|
||||||
|
// No-op: SimulationControlClient configures server via HTTP; endpoint not yet implemented.
|
||||||
|
}
|
||||||
|
|
||||||
|
fn disable_v2_reconstruction(&self, _status_code: u16) {
|
||||||
|
// No-op: SimulationControlClient configures server via HTTP; endpoint not yet implemented.
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
||||||
|
self.remote_client.get_reconstruction_v1(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
|
self.remote_client.get_reconstruction_v2(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
/// Sets the API delay range via the `/simulation/config/api_delay` endpoint.
|
/// Sets the API delay range via the `/simulation/config/api_delay` endpoint.
|
||||||
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
||||||
let url = self.sim_url("/config/api_delay");
|
let url = self.sim_url("/config/api_delay");
|
||||||
|
|||||||
@@ -2,11 +2,10 @@ use std::collections::HashMap;
|
|||||||
use std::io::{BufReader, Cursor};
|
use std::io::{BufReader, Cursor};
|
||||||
use std::ops::Range;
|
use std::ops::Range;
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
use std::sync::atomic::{AtomicU64, Ordering};
|
use std::sync::atomic::{AtomicU16, AtomicU64, AtomicUsize, Ordering};
|
||||||
|
|
||||||
use async_trait::async_trait;
|
use async_trait::async_trait;
|
||||||
use bytes::Bytes;
|
use bytes::Bytes;
|
||||||
use more_asserts::{assert_ge, assert_gt, debug_assert_lt};
|
|
||||||
use rand::Rng;
|
use rand::Rng;
|
||||||
use tokio::sync::RwLock;
|
use tokio::sync::RwLock;
|
||||||
use tokio::time::{Duration, Instant};
|
use tokio::time::{Duration, Instant};
|
||||||
@@ -26,21 +25,12 @@ use super::super::progress_tracked_streams::ProgressCallback;
|
|||||||
use super::client_testing_utils::{FileTermReference, RandomFileContents};
|
use super::client_testing_utils::{FileTermReference, RandomFileContents};
|
||||||
use super::direct_access_client::DirectAccessClient;
|
use super::direct_access_client::DirectAccessClient;
|
||||||
use super::random_xorb::RandomXorb;
|
use super::random_xorb::RandomXorb;
|
||||||
|
use super::xorb_utils::{self, REFERENCE_INSTANT};
|
||||||
use crate::cas_types::{
|
use crate::cas_types::{
|
||||||
BatchQueryReconstructionResponse, ChunkRange, FileRange, HexMerkleHash, HttpRange, QueryReconstructionResponse,
|
BatchQueryReconstructionResponse, FileRange, HexMerkleHash, HttpRange, QueryReconstructionResponse,
|
||||||
XorbReconstructionFetchInfo, XorbReconstructionTerm,
|
QueryReconstructionResponseV2, XorbMultiRangeFetch, XorbRangeDescriptor, XorbReconstructionFetchInfo,
|
||||||
};
|
};
|
||||||
|
|
||||||
lazy_static::lazy_static! {
|
|
||||||
/// Reference instant for URL timestamps. Initialized far in the past to allow
|
|
||||||
/// testing timestamps that are earlier in the current process lifetime.
|
|
||||||
static ref REFERENCE_INSTANT: Instant = {
|
|
||||||
let now = Instant::now();
|
|
||||||
now.checked_sub(Duration::from_secs(365 * 24 * 60 * 60))
|
|
||||||
.unwrap_or(now)
|
|
||||||
};
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Stored XORB data: the serialized data and the deserialized XorbObject (header/footer).
|
/// Stored XORB data: the serialized data and the deserialized XorbObject (header/footer).
|
||||||
struct MaterializedXorb {
|
struct MaterializedXorb {
|
||||||
serialized_data: Bytes,
|
serialized_data: Bytes,
|
||||||
@@ -69,6 +59,10 @@ pub struct MemoryClient {
|
|||||||
url_expiration_ms: AtomicU64,
|
url_expiration_ms: AtomicU64,
|
||||||
/// API delay range in milliseconds as (min_ms, max_ms). (0, 0) means disabled.
|
/// API delay range in milliseconds as (min_ms, max_ms). (0, 0) means disabled.
|
||||||
random_ms_delay_window: (AtomicU64, AtomicU64),
|
random_ms_delay_window: (AtomicU64, AtomicU64),
|
||||||
|
/// Max ranges per XorbMultiRangeFetch entry. usize::MAX means no splitting.
|
||||||
|
max_ranges_per_fetch: AtomicUsize,
|
||||||
|
/// HTTP status code to return when V2 is disabled (0 = enabled).
|
||||||
|
v2_disabled_status: AtomicU16,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl MemoryClient {
|
impl MemoryClient {
|
||||||
@@ -81,6 +75,8 @@ impl MemoryClient {
|
|||||||
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("memory_uploads"),
|
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("memory_uploads"),
|
||||||
url_expiration_ms: AtomicU64::new(u64::MAX),
|
url_expiration_ms: AtomicU64::new(u64::MAX),
|
||||||
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
||||||
|
max_ranges_per_fetch: AtomicUsize::new(usize::MAX),
|
||||||
|
v2_disabled_status: AtomicU16::new(0),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -225,6 +221,8 @@ impl Default for MemoryClient {
|
|||||||
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("memory_uploads"),
|
upload_concurrency_controller: AdaptiveConcurrencyController::new_upload("memory_uploads"),
|
||||||
url_expiration_ms: AtomicU64::new(u64::MAX),
|
url_expiration_ms: AtomicU64::new(u64::MAX),
|
||||||
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
random_ms_delay_window: (AtomicU64::new(0), AtomicU64::new(0)),
|
||||||
|
max_ranges_per_fetch: AtomicUsize::new(usize::MAX),
|
||||||
|
v2_disabled_status: AtomicU16::new(0),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -236,6 +234,34 @@ impl DirectAccessClient for MemoryClient {
|
|||||||
self.url_expiration_ms.store(expiration.as_millis() as u64, Ordering::Relaxed);
|
self.url_expiration_ms.store(expiration.as_millis() as u64, Ordering::Relaxed);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_max_ranges_per_fetch(&self, max_ranges: usize) {
|
||||||
|
self.max_ranges_per_fetch.store(max_ranges, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn disable_v2_reconstruction(&self, status_code: u16) {
|
||||||
|
self.v2_disabled_status.store(status_code, Ordering::Relaxed);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn v2_disabled_status_code(&self) -> u16 {
|
||||||
|
self.v2_disabled_status.load(Ordering::Relaxed)
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponse>> {
|
||||||
|
MemoryClient::get_reconstruction_v1(self, file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
|
MemoryClient::get_reconstruction_v2(self, file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
fn set_api_delay_range(&self, delay_range: Option<Range<Duration>>) {
|
||||||
match delay_range {
|
match delay_range {
|
||||||
Some(range) => {
|
Some(range) => {
|
||||||
@@ -514,6 +540,130 @@ impl DirectAccessClient for MemoryClient {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
impl MemoryClient {
|
||||||
|
async fn compute_reconstruction_ranges(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<xorb_utils::ReconstructionRangesResult> {
|
||||||
|
let file_info = {
|
||||||
|
let shard = self.shard.read().await;
|
||||||
|
match shard.get_file_reconstruction_info(file_id) {
|
||||||
|
Some(fi) => fi,
|
||||||
|
None => return Ok(None),
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
let xorbs = self.xorbs.read().await;
|
||||||
|
xorb_utils::compute_reconstruction_ranges(&file_info, bytes_range, &mut |hash| {
|
||||||
|
let storage = xorbs.get(hash).ok_or_else(|| {
|
||||||
|
error!("Unable to find xorb in memory CAS {:?}", hash);
|
||||||
|
CasClientError::XORBNotFound(*hash)
|
||||||
|
})?;
|
||||||
|
Ok(match storage {
|
||||||
|
XorbStorage::Materialized(entry) => entry.xorb_object.clone(),
|
||||||
|
XorbStorage::Random(xorb) => xorb.get_xorb_object(),
|
||||||
|
})
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
/// V1 reconstruction: returns per-range presigned URLs.
|
||||||
|
pub async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponse>> {
|
||||||
|
self.apply_api_delay().await;
|
||||||
|
|
||||||
|
let result = self.compute_reconstruction_ranges(file_id, bytes_range).await?;
|
||||||
|
let Some((offset_into_first_range, terms, merged_ranges)) = result else {
|
||||||
|
return Ok(None);
|
||||||
|
};
|
||||||
|
|
||||||
|
if terms.is_empty() {
|
||||||
|
return Ok(Some(QueryReconstructionResponse {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
fetch_info: HashMap::new(),
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
let mut fetch_info: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
||||||
|
for (hash, ranges) in merged_ranges {
|
||||||
|
let entries = ranges
|
||||||
|
.into_iter()
|
||||||
|
.map(|r| XorbReconstructionFetchInfo {
|
||||||
|
range: r.chunk_range,
|
||||||
|
url: generate_fetch_url(&hash, &r.byte_range, timestamp),
|
||||||
|
url_range: HttpRange::from(r.byte_range),
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
fetch_info.insert(hash.into(), entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Some(QueryReconstructionResponse {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
fetch_info,
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
/// V2 reconstruction: returns per-xorb multi-range fetch descriptors.
|
||||||
|
pub async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &MerkleHash,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
|
self.apply_api_delay().await;
|
||||||
|
|
||||||
|
let result = self.compute_reconstruction_ranges(file_id, bytes_range).await?;
|
||||||
|
let Some((offset_into_first_range, terms, merged_ranges)) = result else {
|
||||||
|
return Ok(None);
|
||||||
|
};
|
||||||
|
|
||||||
|
if terms.is_empty() {
|
||||||
|
return Ok(Some(QueryReconstructionResponseV2 {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
xorbs: HashMap::new(),
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
let max_ranges = self.max_ranges_per_fetch.load(Ordering::Relaxed).max(1); // slice::chunks panics on a chunk size of 0
|
||||||
|
|
||||||
|
let mut xorbs: HashMap<HexMerkleHash, Vec<XorbMultiRangeFetch>> = HashMap::new();
|
||||||
|
for (hash, ranges) in merged_ranges {
|
||||||
|
let mut fetch_entries = Vec::new();
|
||||||
|
|
||||||
|
for chunk in ranges.chunks(max_ranges) {
|
||||||
|
let range_descriptors: Vec<XorbRangeDescriptor> = chunk
|
||||||
|
.iter()
|
||||||
|
.map(|r| XorbRangeDescriptor {
|
||||||
|
chunks: r.chunk_range,
|
||||||
|
bytes: HttpRange::from(r.byte_range),
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
let url = generate_v2_fetch_url(&hash, &range_descriptors, timestamp);
|
||||||
|
fetch_entries.push(XorbMultiRangeFetch {
|
||||||
|
url,
|
||||||
|
ranges: range_descriptors,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
xorbs.insert(hash.into(), fetch_entries);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Some(QueryReconstructionResponseV2 {
|
||||||
|
offset_into_first_range,
|
||||||
|
terms,
|
||||||
|
xorbs,
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
}
|
||||||
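Note on the range grouping in `get_reconstruction_v2`: each xorb's merged ranges are split with `slice::chunks`, so every `XorbMultiRangeFetch` entry carries at most `max_ranges_per_fetch` ranges and gets its own URL. A minimal sketch of that grouping is shown below. `RangeSketch` and `group_ranges` are hypothetical stand-ins, not types from the crate. Because `slice::chunks` panics when the chunk size is 0, the sketch clamps the limit to at least 1.

```rust
/// Hypothetical stand-in for one merged range inside a xorb (chunk indices only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct RangeSketch {
    chunk_start: u32,
    chunk_end: u32,
}

/// Split one xorb's ranges into groups of at most `max_ranges` entries.
/// Each group would correspond to one multi-range fetch entry with its own URL.
fn group_ranges(ranges: &[RangeSketch], max_ranges: usize) -> Vec<Vec<RangeSketch>> {
    // `slice::chunks` panics on a chunk size of 0, so clamp to at least 1.
    ranges
        .chunks(max_ranges.max(1))
        .map(|group| group.to_vec())
        .collect()
}
```

With three ranges, a limit of 1 yields three single-range groups (the case `check_v2_max_ranges` asserts on), while `usize::MAX` keeps all ranges in one group.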
|
|
||||||
#[cfg_attr(not(target_family = "wasm"), async_trait)]
|
#[cfg_attr(not(target_family = "wasm"), async_trait)]
|
||||||
#[cfg_attr(target_family = "wasm", async_trait(?Send))]
|
#[cfg_attr(target_family = "wasm", async_trait(?Send))]
|
||||||
impl Client for MemoryClient {
|
impl Client for MemoryClient {
|
||||||
@@ -651,194 +801,8 @@ impl Client for MemoryClient {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &MerkleHash,
|
file_id: &MerkleHash,
|
||||||
bytes_range: Option<FileRange>,
|
bytes_range: Option<FileRange>,
|
||||||
) -> Result<Option<QueryReconstructionResponse>> {
|
) -> Result<Option<QueryReconstructionResponseV2>> {
|
||||||
self.apply_api_delay().await;
|
self.get_reconstruction_v2(file_id, bytes_range).await
|
||||||
let file_info = {
|
|
||||||
let shard = self.shard.read().await;
|
|
||||||
match shard.get_file_reconstruction_info(file_id) {
|
|
||||||
Some(fi) => fi,
|
|
||||||
None => return Ok(None),
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
let total_file_size: u64 = file_info.file_size();
|
|
||||||
|
|
||||||
// Handle range validation and truncation
|
|
||||||
let file_range = if let Some(range) = bytes_range {
|
|
||||||
// If the entire range is out of bounds, return None (like RemoteClient does for 416)
|
|
||||||
if range.start >= total_file_size {
|
|
||||||
// For empty files (size 0), only the first query (start == 0) should return the empty reconstruction
|
|
||||||
// All subsequent queries should return None to prevent infinite remainder loops
|
|
||||||
if total_file_size == 0 && range.start == 0 {
|
|
||||||
// Empty file - return valid but empty reconstruction
|
|
||||||
return Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: 0,
|
|
||||||
terms: vec![],
|
|
||||||
fetch_info: HashMap::new(),
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
return Ok(None);
|
|
||||||
}
|
|
||||||
FileRange::new(range.start, range.end.min(total_file_size))
|
|
||||||
} else {
|
|
||||||
// No range specified - handle empty files
|
|
||||||
if total_file_size == 0 {
|
|
||||||
return Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: 0,
|
|
||||||
terms: vec![],
|
|
||||||
fetch_info: HashMap::new(),
|
|
||||||
}));
|
|
||||||
}
|
|
||||||
FileRange::full()
|
|
||||||
};
|
|
||||||
|
|
||||||
// Find the first segment that contains bytes in our range
|
|
||||||
let mut s_idx = 0;
|
|
||||||
let mut cumulative_bytes = 0u64;
|
|
||||||
let mut first_chunk_byte_start;
|
|
||||||
|
|
||||||
loop {
|
|
||||||
if s_idx >= file_info.segments.len() {
|
|
||||||
return Err(CasClientError::InvalidRange);
|
|
||||||
}
|
|
||||||
|
|
||||||
let n = file_info.segments[s_idx].unpacked_segment_bytes as u64;
|
|
||||||
if cumulative_bytes + n > file_range.start {
|
|
||||||
assert_ge!(file_range.start, cumulative_bytes);
|
|
||||||
first_chunk_byte_start = cumulative_bytes;
|
|
||||||
break;
|
|
||||||
} else {
|
|
||||||
cumulative_bytes += n;
|
|
||||||
s_idx += 1;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
let mut terms = Vec::new();
|
|
||||||
|
|
||||||
#[derive(Clone)]
|
|
||||||
struct FetchInfoIntermediate {
|
|
||||||
chunk_range: ChunkRange,
|
|
||||||
byte_range: FileRange,
|
|
||||||
}
|
|
||||||
|
|
||||||
let mut fetch_info_map: MerkleHashMap<Vec<FetchInfoIntermediate>> = MerkleHashMap::new();
|
|
||||||
|
|
||||||
let xorbs = self.xorbs.read().await;
|
|
||||||
|
|
||||||
while s_idx < file_info.segments.len() && cumulative_bytes < file_range.end {
|
|
||||||
let mut segment = file_info.segments[s_idx].clone();
|
|
||||||
let mut chunk_range = ChunkRange::new(segment.chunk_index_start, segment.chunk_index_end);
|
|
||||||
|
|
||||||
let storage = xorbs.get(&segment.xorb_hash).ok_or_else(|| {
|
|
||||||
error!("Unable to find xorb in memory CAS {:?}", segment.xorb_hash);
|
|
||||||
CasClientError::XORBNotFound(segment.xorb_hash)
|
|
||||||
})?;
|
|
||||||
let xorb_footer = match storage {
|
|
||||||
XorbStorage::Materialized(entry) => entry.xorb_object.clone(),
|
|
||||||
XorbStorage::Random(xorb) => xorb.get_xorb_object(),
|
|
||||||
};
|
|
||||||
|
|
||||||
// Prune first segment on chunk boundaries
|
|
||||||
if cumulative_bytes < file_range.start {
|
|
||||||
while chunk_range.start < chunk_range.end {
|
|
||||||
let next_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.start)? as u64;
|
|
||||||
|
|
||||||
if cumulative_bytes + next_chunk_size <= file_range.start {
|
|
||||||
cumulative_bytes += next_chunk_size;
|
|
||||||
first_chunk_byte_start += next_chunk_size;
|
|
||||||
segment.unpacked_segment_bytes -= next_chunk_size as u32;
|
|
||||||
chunk_range.start += 1;
|
|
||||||
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Prune last segment on chunk boundaries
|
|
||||||
if cumulative_bytes + segment.unpacked_segment_bytes as u64 > file_range.end {
|
|
||||||
while chunk_range.end > chunk_range.start {
|
|
||||||
let last_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.end - 1)?;
|
|
||||||
|
|
||||||
if cumulative_bytes + (segment.unpacked_segment_bytes - last_chunk_size) as u64 >= file_range.end {
|
|
||||||
chunk_range.end -= 1;
|
|
||||||
segment.unpacked_segment_bytes -= last_chunk_size;
|
|
||||||
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
|
||||||
assert_gt!(segment.unpacked_segment_bytes, 0);
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
let (byte_start, byte_end) = xorb_footer.get_byte_offset(chunk_range.start, chunk_range.end)?;
|
|
||||||
let byte_range = FileRange::new(byte_start as u64, byte_end as u64);
|
|
||||||
|
|
||||||
let xorb_reconstruction_term = XorbReconstructionTerm {
|
|
||||||
hash: segment.xorb_hash.into(),
|
|
||||||
unpacked_length: segment.unpacked_segment_bytes,
|
|
||||||
range: chunk_range,
|
|
||||||
};
|
|
||||||
|
|
||||||
terms.push(xorb_reconstruction_term);
|
|
||||||
|
|
||||||
let fetch_info_intermediate = FetchInfoIntermediate {
|
|
||||||
chunk_range,
|
|
||||||
byte_range,
|
|
||||||
};
|
|
||||||
|
|
||||||
fetch_info_map
|
|
||||||
.entry(segment.xorb_hash)
|
|
||||||
.or_default()
|
|
||||||
.push(fetch_info_intermediate);
|
|
||||||
|
|
||||||
cumulative_bytes += segment.unpacked_segment_bytes as u64;
|
|
||||||
s_idx += 1;
|
|
||||||
}
|
|
||||||
|
|
||||||
assert!(!terms.is_empty());
|
|
||||||
|
|
||||||
let timestamp = Instant::now();
|
|
||||||
|
|
||||||
// Sort and merge adjacent/overlapping ranges in each fetch_info Vec
|
|
||||||
let mut merged_fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
|
||||||
for (hash, mut fi_vec) in fetch_info_map {
|
|
||||||
fi_vec.sort_by_key(|fi| fi.chunk_range.start);
|
|
||||||
|
|
||||||
let mut merged: Vec<XorbReconstructionFetchInfo> = Vec::new();
|
|
||||||
let mut idx = 0;
|
|
||||||
|
|
||||||
while idx < fi_vec.len() {
|
|
||||||
let mut new_fi = fi_vec[idx].clone();
|
|
||||||
|
|
||||||
while idx + 1 < fi_vec.len() {
|
|
||||||
let next_fi = &fi_vec[idx + 1];
|
|
||||||
if next_fi.chunk_range.start <= new_fi.chunk_range.end {
|
|
||||||
new_fi.chunk_range.end = next_fi.chunk_range.end.max(new_fi.chunk_range.end);
|
|
||||||
new_fi.byte_range.end = next_fi.byte_range.end.max(new_fi.byte_range.end);
|
|
||||||
idx += 1;
|
|
||||||
} else {
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
merged.push(XorbReconstructionFetchInfo {
|
|
||||||
range: new_fi.chunk_range,
|
|
||||||
url: generate_fetch_url(&hash, &new_fi.byte_range, timestamp),
|
|
||||||
url_range: HttpRange::from(new_fi.byte_range),
|
|
||||||
});
|
|
||||||
|
|
||||||
idx += 1;
|
|
||||||
}
|
|
||||||
|
|
||||||
merged_fetch_info_map.insert(hash.into(), merged);
|
|
||||||
}
|
|
||||||
|
|
||||||
Ok(Some(QueryReconstructionResponse {
|
|
||||||
offset_into_first_range: file_range.start - first_chunk_byte_start,
|
|
||||||
terms,
|
|
||||||
fetch_info: merged_fetch_info_map,
|
|
||||||
}))
|
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse> {
|
async fn batch_get_reconstruction(&self, file_ids: &[MerkleHash]) -> Result<BatchQueryReconstructionResponse> {
|
||||||
@@ -847,7 +811,7 @@ impl Client for MemoryClient {
|
|||||||
let mut fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
let mut fetch_info_map: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>> = HashMap::new();
|
||||||
|
|
||||||
for file_id in file_ids {
|
for file_id in file_ids {
|
||||||
if let Some(response) = self.get_reconstruction(file_id, None).await? {
|
if let Some(response) = self.get_reconstruction_v1(file_id, None).await? {
|
||||||
let hex_hash: HexMerkleHash = (*file_id).into();
|
let hex_hash: HexMerkleHash = (*file_id).into();
|
||||||
files.insert(hex_hash, response.terms);
|
files.insert(hex_hash, response.terms);
|
||||||
|
|
||||||
@@ -876,8 +840,8 @@ impl Client for MemoryClient {
|
|||||||
uncompressed_size_if_known: Option<usize>,
|
uncompressed_size_if_known: Option<usize>,
|
||||||
) -> Result<(Bytes, Vec<u32>)> {
|
) -> Result<(Bytes, Vec<u32>)> {
|
||||||
self.apply_api_delay().await;
|
self.apply_api_delay().await;
|
||||||
let (url, range) = url_info.retrieve_url().await?;
|
let (url, http_ranges) = url_info.retrieve_url().await?;
|
||||||
let (xorb_hash, _url_byte_range, url_timestamp) = parse_fetch_url(&url)?;
|
let (xorb_hash, url_timestamp) = parse_any_fetch_url(&url)?;
|
||||||
|
|
||||||
// Check if URL has expired
|
// Check if URL has expired
|
||||||
let expiration_ms = self.url_expiration_ms.load(Ordering::Relaxed);
|
let expiration_ms = self.url_expiration_ms.load(Ordering::Relaxed);
|
||||||
@@ -889,36 +853,49 @@ impl Client for MemoryClient {
|
|||||||
let xorbs = self.xorbs.read().await;
|
let xorbs = self.xorbs.read().await;
|
||||||
let storage = xorbs.get(&xorb_hash).ok_or(CasClientError::XORBNotFound(xorb_hash))?;
|
let storage = xorbs.get(&xorb_hash).ok_or(CasClientError::XORBNotFound(xorb_hash))?;
|
||||||
|
|
||||||
// Extract the byte range from the serialized data and deserialize
|
// Extract each byte range from the serialized data and deserialize
|
||||||
let start = range.start as usize;
|
let mut all_decompressed = Vec::new();
|
||||||
let end = range.end as usize + 1; // HttpRange is inclusive end
|
let mut all_chunk_indices = Vec::<u32>::new();
|
||||||
let transfer_len = (end - start) as u64;
|
let mut total_transfer = 0u64;
|
||||||
|
|
||||||
let (decompressed_data, chunk_byte_indices) = match storage {
|
for http_range in &http_ranges {
|
||||||
XorbStorage::Materialized(entry) => {
|
let start = http_range.start as usize;
|
||||||
let range_data = &entry.serialized_data[start..end];
|
let end = http_range.end as usize + 1;
|
||||||
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(range_data))?
|
total_transfer += http_range.length();
|
||||||
},
|
|
||||||
XorbStorage::Random(xorb) => {
|
let (data, chunk_indices) = match storage {
|
||||||
let range_data = xorb.get_serialized_range(start as u64, end as u64);
|
XorbStorage::Materialized(entry) => {
|
||||||
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(range_data.as_ref()))?
|
let range_data = &entry.serialized_data[start..end];
|
||||||
},
|
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(range_data))?
|
||||||
};
|
},
|
||||||
|
XorbStorage::Random(xorb) => {
|
||||||
|
let range_data = xorb.get_serialized_range(start as u64, end as u64);
|
||||||
|
xet_core_structures::xorb_object::deserialize_chunks(&mut Cursor::new(range_data.as_ref()))?
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
xet_core_structures::xorb_object::append_chunk_segment(
|
||||||
|
&mut all_decompressed,
|
||||||
|
&mut all_chunk_indices,
|
||||||
|
&data,
|
||||||
|
&chunk_indices,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
if let Some(expected) = uncompressed_size_if_known {
|
if let Some(expected) = uncompressed_size_if_known {
|
||||||
debug_assert_eq!(
|
debug_assert_eq!(
|
||||||
decompressed_data.len(),
|
all_decompressed.len(),
|
||||||
expected,
|
expected,
|
||||||
"get_file_term_data: expected {} bytes, got {}",
|
"get_file_term_data: expected {} bytes, got {}",
|
||||||
expected,
|
expected,
|
||||||
decompressed_data.len()
|
all_decompressed.len()
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
if let Some(ref cb) = progress_callback {
|
if let Some(ref cb) = progress_callback {
|
||||||
cb(transfer_len, transfer_len, transfer_len);
|
cb(total_transfer, total_transfer, total_transfer);
|
||||||
}
|
}
|
||||||
Ok((Bytes::from(decompressed_data), chunk_byte_indices))
|
Ok((Bytes::from(all_decompressed), all_chunk_indices))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
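The multi-range read in the hunk above treats each `HttpRange` as inclusive of its end byte and adds up the per-range lengths for the progress callback. The following is a minimal sketch of that bookkeeping only, assuming plain `(start, end)` tuples in place of `HttpRange` and skipping chunk deserialization entirely; `read_inclusive_ranges` is a hypothetical name, not a function in the crate.

```rust
/// Hypothetical helper: copies each inclusive `(start, end)` byte range out of
/// `data` and returns the concatenated bytes plus the total bytes transferred.
/// Returns `None` if any range is inverted or runs past the end of `data`.
fn read_inclusive_ranges(data: &[u8], ranges: &[(u64, u64)]) -> Option<(Vec<u8>, u64)> {
    let mut out = Vec::new();
    let mut total_transfer = 0u64;
    for &(start, end) in ranges {
        if start > end {
            return None;
        }
        // HTTP ranges include the end byte, hence the inclusive slice.
        let slice = data.get(start as usize..=end as usize)?;
        out.extend_from_slice(slice);
        total_transfer += end - start + 1;
    }
    Some((out, total_transfer))
}
```

With a 10-byte buffer holding the values 0 through 9, the ranges `(0, 2)` and `(5, 6)` yield the bytes `[0, 1, 2, 5, 6]` and a transfer total of 5.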
@@ -946,6 +923,19 @@ fn parse_fetch_url(url: &str) -> Result<(MerkleHash, FileRange, Instant)> {
|
|||||||
Ok((hash, byte_range, timestamp))
|
Ok((hash, byte_range, timestamp))
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn generate_v2_fetch_url(hash: &MerkleHash, ranges: &[XorbRangeDescriptor], timestamp: Instant) -> String {
|
||||||
|
xorb_utils::generate_v2_fetch_url(hash, ranges, timestamp)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Parse either a V1 or V2 fetch URL, returning (hash, timestamp).
|
||||||
|
fn parse_any_fetch_url(url: &str) -> Result<(MerkleHash, Instant)> {
|
||||||
|
if let Ok((hash, _, ts)) = parse_fetch_url(url) {
|
||||||
|
return Ok((hash, ts));
|
||||||
|
}
|
||||||
|
let (hash, ts, _) = xorb_utils::parse_v2_fetch_url(url)?;
|
||||||
|
Ok((hash, ts))
|
||||||
|
}
|
||||||
|
|
||||||
#[cfg(all(test, not(target_family = "wasm")))]
|
#[cfg(all(test, not(target_family = "wasm")))]
|
||||||
mod tests {
|
mod tests {
|
||||||
use super::super::client_testing_utils::ClientTestingUtils;
|
use super::super::client_testing_utils::ClientTestingUtils;
|
||||||
@@ -1062,7 +1052,7 @@ mod tests {
|
|||||||
assert_eq!(range_data.as_ref(), &file2.data[start as usize..end as usize]);
|
assert_eq!(range_data.as_ref(), &file2.data[start as usize..end as usize]);
|
||||||
|
|
||||||
// Reconstruction workflow
|
// Reconstruction workflow
|
||||||
let recon = client.get_reconstruction(&file2.file_hash, None).await.unwrap().unwrap();
|
let recon = client.get_reconstruction_v1(&file2.file_hash, None).await.unwrap().unwrap();
|
||||||
for term in &recon.terms {
|
for term in &recon.terms {
|
||||||
let xorb_hash: MerkleHash = term.hash.into();
|
let xorb_hash: MerkleHash = term.hash.into();
|
||||||
for fetch_info in recon.fetch_info.get(&term.hash).unwrap() {
|
for fetch_info in recon.fetch_info.get(&term.hash).unwrap() {
|
||||||
|
|||||||
@@ -34,6 +34,7 @@ mod simulation_server;
|
|||||||
#[cfg(unix)]
|
#[cfg(unix)]
|
||||||
#[cfg(not(target_family = "wasm"))]
|
#[cfg(not(target_family = "wasm"))]
|
||||||
pub mod socket_proxy;
|
pub mod socket_proxy;
|
||||||
|
pub(crate) mod xorb_utils;
|
||||||
|
|
||||||
pub use client_testing_utils::{ClientTestingUtils, RandomFileContents};
|
pub use client_testing_utils::{ClientTestingUtils, RandomFileContents};
|
||||||
#[cfg(not(target_family = "wasm"))]
|
#[cfg(not(target_family = "wasm"))]
|
||||||
|
|||||||
@@ -132,7 +132,7 @@ impl Client for RemoteSimulationClient {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &xet_core_structures::merklehash::MerkleHash,
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
bytes_range: Option<crate::cas_types::FileRange>,
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponseV2>> {
|
||||||
self.inner.get_reconstruction(file_id, bytes_range).await
|
self.inner.get_reconstruction(file_id, bytes_range).await
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -440,7 +440,7 @@ impl Client for LocalTestServer {
|
|||||||
&self,
|
&self,
|
||||||
file_id: &xet_core_structures::merklehash::MerkleHash,
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
bytes_range: Option<crate::cas_types::FileRange>,
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponseV2>> {
|
||||||
self.remote_simulation_client.get_reconstruction(file_id, bytes_range).await
|
self.remote_simulation_client.get_reconstruction(file_id, bytes_range).await
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -512,6 +512,30 @@ impl DirectAccessClient for LocalTestServer {
|
|||||||
self.client.set_api_delay_range(delay_range);
|
self.client.set_api_delay_range(delay_range);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_max_ranges_per_fetch(&self, max_ranges: usize) {
|
||||||
|
self.client.set_max_ranges_per_fetch(max_ranges);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn disable_v2_reconstruction(&self, status_code: u16) {
|
||||||
|
self.client.disable_v2_reconstruction(status_code);
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v1(
|
||||||
|
&self,
|
||||||
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponse>> {
|
||||||
|
self.client.get_reconstruction_v1(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_reconstruction_v2(
|
||||||
|
&self,
|
||||||
|
file_id: &xet_core_structures::merklehash::MerkleHash,
|
||||||
|
bytes_range: Option<crate::cas_types::FileRange>,
|
||||||
|
) -> Result<Option<crate::cas_types::QueryReconstructionResponseV2>> {
|
||||||
|
self.client.get_reconstruction_v2(file_id, bytes_range).await
|
||||||
|
}
|
||||||
|
|
||||||
async fn apply_api_delay(&self) {
|
async fn apply_api_delay(&self) {
|
||||||
self.client.apply_api_delay().await;
|
self.client.apply_api_delay().await;
|
||||||
}
|
}
|
||||||
@@ -690,31 +714,22 @@ mod tests {
|
|||||||
|
|
||||||
// Fetch term endpoint - verify URLs are HTTP and data can be fetched
|
// Fetch term endpoint - verify URLs are HTTP and data can be fetched
|
||||||
let http_client = reqwest::Client::new();
|
let http_client = reqwest::Client::new();
|
||||||
for fetch_infos in remote_recon.fetch_info.values() {
|
for multi_range_fetches in remote_recon.xorbs.values() {
|
||||||
for fi in fetch_infos {
|
for mrf in multi_range_fetches {
|
||||||
assert!(fi.url.starts_with("http://"));
|
assert!(mrf.url.starts_with("http://"));
|
||||||
assert!(fi.url.contains("/fetch_term?term="));
|
assert!(mrf.url.contains("/fetch_term?term="));
|
||||||
let response = http_client.get(&fi.url).send().await.unwrap();
|
let response = http_client.get(&mrf.url).send().await.unwrap();
|
||||||
assert!(response.status().is_success());
|
assert!(response.status().is_success());
|
||||||
assert!(!response.bytes().await.unwrap().is_empty());
|
assert!(!response.bytes().await.unwrap().is_empty());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Fetch term with range request
|
// Verify V2 fetch URLs return consistent data across multiple requests.
|
||||||
let first_fi = &remote_recon.fetch_info.values().next().unwrap()[0];
|
let first_mrf = &remote_recon.xorbs.values().next().unwrap()[0];
|
||||||
let full_data = http_client.get(&first_fi.url).send().await.unwrap().bytes().await.unwrap();
|
let data_1 = http_client.get(&first_mrf.url).send().await.unwrap().bytes().await.unwrap();
|
||||||
if full_data.len() > 100 {
|
let data_2 = http_client.get(&first_mrf.url).send().await.unwrap().bytes().await.unwrap();
|
||||||
let range_resp = http_client
|
assert_eq!(data_1, data_2);
|
||||||
.get(&first_fi.url)
|
assert!(!data_1.is_empty());
|
||||||
.header(reqwest::header::RANGE, "bytes=0-99")
|
|
||||||
.send()
|
|
||||||
.await
|
|
||||||
.unwrap();
|
|
||||||
assert!(range_resp.status().is_success());
|
|
||||||
let range_data = range_resp.bytes().await.unwrap();
|
|
||||||
assert_eq!(range_data.len(), 100);
|
|
||||||
assert_eq!(&range_data[..], &full_data[..100]);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Tests that invalid requests return appropriate error responses.
|
/// Tests that invalid requests return appropriate error responses.
|
||||||
@@ -762,16 +777,16 @@ mod tests {
|
|||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
for (hash, fetch_infos) in &recon1.fetch_info {
|
for (hash, multi_range_fetches) in &recon1.xorbs {
|
||||||
for fi in fetch_infos {
|
for mrf in multi_range_fetches {
|
||||||
assert!(
|
assert!(
|
||||||
fi.url.starts_with("http://") || fi.url.starts_with("https://"),
|
mrf.url.starts_with("http://") || mrf.url.starts_with("https://"),
|
||||||
"URL for hash {} should be HTTP, got: {}",
|
"URL for hash {} should be HTTP, got: {}",
|
||||||
hash,
|
hash,
|
||||||
fi.url
|
mrf.url
|
||||||
);
|
);
|
||||||
assert!(fi.url.contains("/fetch_term?term="));
|
assert!(mrf.url.contains("/fetch_term?term="));
|
||||||
assert!(!fi.url.contains("\":"));
|
assert!(!mrf.url.contains("\":"));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -782,10 +797,10 @@ mod tests {
|
|||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
assert!(multi_recon.fetch_info.len() >= 2);
|
assert!(multi_recon.xorbs.len() >= 2);
|
||||||
for fetch_infos in multi_recon.fetch_info.values() {
|
for multi_range_fetches in multi_recon.xorbs.values() {
|
||||||
for fi in fetch_infos {
|
for mrf in multi_range_fetches {
|
||||||
assert!(fi.url.starts_with("http://"));
|
assert!(mrf.url.starts_with("http://"));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -798,18 +813,18 @@ mod tests {
|
|||||||
.await
|
.await
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.unwrap();
|
.unwrap();
|
||||||
for fetch_infos in range_recon.fetch_info.values() {
|
for multi_range_fetches in range_recon.xorbs.values() {
|
||||||
for fi in fetch_infos {
|
for mrf in multi_range_fetches {
|
||||||
assert!(fi.url.starts_with("http://"));
|
assert!(mrf.url.starts_with("http://"));
|
||||||
assert!(fi.url.contains("/fetch_term?term="));
|
assert!(mrf.url.contains("/fetch_term?term="));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Verify all term URLs are fetchable
|
// Verify all term URLs are fetchable
|
||||||
for term in &recon1.terms {
|
for term in &recon1.terms {
|
||||||
let fetch_infos = recon1.fetch_info.get(&term.hash).unwrap();
|
let multi_range_fetches = recon1.xorbs.get(&term.hash).unwrap();
|
||||||
for fi in fetch_infos {
|
for mrf in multi_range_fetches {
|
||||||
let response = http_client.get(&fi.url).send().await.unwrap();
|
let response = http_client.get(&mrf.url).send().await.unwrap();
|
||||||
assert!(response.status().is_success());
|
assert!(response.status().is_success());
|
||||||
assert!(!response.bytes().await.unwrap().is_empty());
|
assert!(!response.bytes().await.unwrap().is_empty());
|
||||||
}
|
}
|
||||||
@@ -860,9 +875,9 @@ mod tests {
|
|||||||
let expected_term = &file.terms[term_idx];
|
let expected_term = &file.terms[term_idx];
|
||||||
assert_eq!(recon_term.hash.0, expected_term.xorb_hash);
|
assert_eq!(recon_term.hash.0, expected_term.xorb_hash);
|
||||||
|
|
||||||
// Verify fetch_info exists for each XORB
|
// Verify xorbs has entry for each term
|
||||||
let fetch_infos = recon.fetch_info.get(&recon_term.hash).unwrap();
|
let multi_range_fetches = recon.xorbs.get(&recon_term.hash).unwrap();
|
||||||
assert!(!fetch_infos.is_empty());
|
assert!(!multi_range_fetches.is_empty());
|
||||||
}
|
}
|
||||||
|
|
||||||
// Verify the complete file can be retrieved correctly via LocalClient
|
// Verify the complete file can be retrieved correctly via LocalClient
|
||||||
|
|||||||
499
xet_client/src/cas_client/simulation/xorb_utils.rs
Normal file
@@ -0,0 +1,499 @@
|
|||||||
//! Shared utilities for reconstruction range computation and V2 URL encoding.
//!
//! This module consolidates logic used by both `MemoryClient` and `LocalClient`
//! for computing reconstruction ranges from file segment info, merging adjacent
//! ranges, and encoding/decoding V2 fetch URLs.
|
|
||||||
|
use base64::Engine;
|
||||||
|
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
|
||||||
|
use more_asserts::{assert_ge, assert_gt, debug_assert_lt};
|
||||||
|
use tokio::time::{Duration, Instant};
|
||||||
|
use xet_core_structures::MerkleHashMap;
|
||||||
|
use xet_core_structures::merklehash::MerkleHash;
|
||||||
|
use xet_core_structures::metadata_shard::file_structs::MDBFileInfo;
|
||||||
|
use xet_core_structures::xorb_object::XorbObject;
|
||||||
|
|
||||||
|
use crate::cas_client::error::{CasClientError, Result};
|
||||||
|
use crate::cas_types::{ChunkRange, FileRange, HttpRange, XorbRangeDescriptor, XorbReconstructionTerm};
|
||||||
|
|
||||||
|
lazy_static::lazy_static! {
|
||||||
|
/// Reference instant for URL timestamps. Initialized far in the past to allow
|
||||||
|
/// testing timestamps that are earlier in the current process lifetime.
|
||||||
|
pub(crate) static ref REFERENCE_INSTANT: Instant = {
|
||||||
|
let now = Instant::now();
|
||||||
|
now.checked_sub(Duration::from_secs(365 * 24 * 60 * 60))
|
||||||
|
.unwrap_or(now)
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
/// A merged byte/chunk range for a single xorb.
|
||||||
|
#[derive(Clone, Debug)]
|
||||||
|
pub(crate) struct MergedRange {
|
||||||
|
pub chunk_range: ChunkRange,
|
||||||
|
pub byte_range: FileRange,
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Result of `compute_reconstruction_ranges`: the offset into the first range,
|
||||||
|
/// the list of reconstruction terms, and the merged ranges per xorb hash.
|
||||||
|
pub(crate) type ReconstructionRangesResult =
|
||||||
|
Option<(u64, Vec<XorbReconstructionTerm>, MerkleHashMap<Vec<MergedRange>>)>;
|
||||||
|
|
||||||
|
/// Computes reconstruction ranges from file segment info.
|
||||||
|
///
|
||||||
|
/// Iterates the segments in `file_info`, prunes chunk boundaries to the
|
||||||
|
/// requested `bytes_range`, and merges adjacent/overlapping ranges per xorb.
|
||||||
|
///
|
||||||
|
/// `get_xorb_footer` is called for each unique xorb hash encountered to obtain
|
||||||
|
/// the `XorbObject` metadata needed for chunk-level byte offset calculations.
|
||||||
|
///
|
||||||
|
/// Returns `Ok(None)` when the range is out of bounds, or
|
||||||
|
/// `Ok(Some((offset_into_first_range, terms, merged_ranges_per_xorb)))`.
|
||||||
|
pub(crate) fn compute_reconstruction_ranges(
|
||||||
|
file_info: &MDBFileInfo,
|
||||||
|
bytes_range: Option<FileRange>,
|
||||||
|
get_xorb_footer: &mut dyn FnMut(&MerkleHash) -> Result<XorbObject>,
|
||||||
|
) -> Result<ReconstructionRangesResult> {
|
||||||
|
let total_file_size: u64 = file_info.file_size();
|
||||||
|
|
||||||
|
let file_range = if let Some(range) = bytes_range {
|
||||||
|
if range.start >= total_file_size {
|
||||||
|
if total_file_size == 0 && range.start == 0 {
|
||||||
|
return Ok(Some((0, vec![], MerkleHashMap::new())));
|
||||||
|
}
|
||||||
|
return Ok(None);
|
||||||
|
}
|
||||||
|
FileRange::new(range.start, range.end.min(total_file_size))
|
||||||
|
} else {
|
||||||
|
if total_file_size == 0 {
|
||||||
|
return Ok(Some((0, vec![], MerkleHashMap::new())));
|
||||||
|
}
|
||||||
|
FileRange::full()
|
||||||
|
};
|
||||||
|
|
||||||
|
let mut s_idx = 0;
|
||||||
|
let mut cumulative_bytes = 0u64;
|
||||||
|
let mut first_chunk_byte_start;
|
||||||
|
|
||||||
|
loop {
|
||||||
|
if s_idx >= file_info.segments.len() {
|
||||||
|
return Err(CasClientError::InvalidRange);
|
||||||
|
}
|
||||||
|
|
||||||
|
let n = file_info.segments[s_idx].unpacked_segment_bytes as u64;
|
||||||
|
if cumulative_bytes + n > file_range.start {
|
||||||
|
assert_ge!(file_range.start, cumulative_bytes);
|
||||||
|
first_chunk_byte_start = cumulative_bytes;
|
||||||
|
break;
|
||||||
|
} else {
|
||||||
|
cumulative_bytes += n;
|
||||||
|
s_idx += 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut terms = Vec::new();
|
||||||
|
|
||||||
|
#[derive(Clone)]
|
||||||
|
struct FetchInfoIntermediate {
|
||||||
|
chunk_range: ChunkRange,
|
||||||
|
byte_range: FileRange,
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut fetch_info_map: MerkleHashMap<Vec<FetchInfoIntermediate>> = MerkleHashMap::new();
|
||||||
|
|
||||||
|
while s_idx < file_info.segments.len() && cumulative_bytes < file_range.end {
|
||||||
|
let mut segment = file_info.segments[s_idx].clone();
|
||||||
|
let mut chunk_range = ChunkRange::new(segment.chunk_index_start, segment.chunk_index_end);
|
||||||
|
|
||||||
|
let xorb_footer = get_xorb_footer(&segment.xorb_hash)?;
|
||||||
|
|
||||||
|
if cumulative_bytes < file_range.start {
|
||||||
|
while chunk_range.start < chunk_range.end {
|
||||||
|
let next_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.start)? as u64;
|
||||||
|
if cumulative_bytes + next_chunk_size <= file_range.start {
|
||||||
|
cumulative_bytes += next_chunk_size;
|
||||||
|
first_chunk_byte_start += next_chunk_size;
|
||||||
|
segment.unpacked_segment_bytes -= next_chunk_size as u32;
|
||||||
|
chunk_range.start += 1;
|
||||||
|
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
||||||
|
} else {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if cumulative_bytes + segment.unpacked_segment_bytes as u64 > file_range.end {
|
||||||
|
while chunk_range.end > chunk_range.start {
|
||||||
|
let last_chunk_size = xorb_footer.uncompressed_chunk_length(chunk_range.end - 1)?;
|
||||||
|
if cumulative_bytes + (segment.unpacked_segment_bytes - last_chunk_size) as u64 >= file_range.end {
|
||||||
|
chunk_range.end -= 1;
|
||||||
|
segment.unpacked_segment_bytes -= last_chunk_size;
|
||||||
|
debug_assert_lt!(chunk_range.start, chunk_range.end);
|
||||||
|
assert_gt!(segment.unpacked_segment_bytes, 0);
|
||||||
|
} else {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let (byte_start, byte_end) = xorb_footer.get_byte_offset(chunk_range.start, chunk_range.end)?;
|
||||||
|
let byte_range = FileRange::new(byte_start as u64, byte_end as u64);
|
||||||
|
|
||||||
|
terms.push(XorbReconstructionTerm {
|
||||||
|
hash: segment.xorb_hash.into(),
|
||||||
|
unpacked_length: segment.unpacked_segment_bytes,
|
||||||
|
range: chunk_range,
|
||||||
|
});
|
||||||
|
|
||||||
|
fetch_info_map
|
||||||
|
.entry(segment.xorb_hash)
|
||||||
|
.or_default()
|
||||||
|
.push(FetchInfoIntermediate {
|
||||||
|
chunk_range,
|
||||||
|
byte_range,
|
||||||
|
});
|
||||||
|
|
||||||
|
cumulative_bytes += segment.unpacked_segment_bytes as u64;
|
||||||
|
s_idx += 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
debug_assert!(!terms.is_empty());
|
||||||
|
|
||||||
|
let mut merged: MerkleHashMap<Vec<MergedRange>> = MerkleHashMap::new();
|
||||||
|
for (hash, mut fi_vec) in fetch_info_map {
|
||||||
|
fi_vec.sort_by_key(|fi| fi.chunk_range.start);
|
||||||
|
|
||||||
|
let mut result: Vec<MergedRange> = Vec::new();
|
||||||
|
let mut idx = 0;
|
||||||
|
|
||||||
|
while idx < fi_vec.len() {
|
||||||
|
let mut cur = fi_vec[idx].clone();
|
||||||
|
|
||||||
|
while idx + 1 < fi_vec.len() {
|
||||||
|
let next = &fi_vec[idx + 1];
|
||||||
|
if next.chunk_range.start <= cur.chunk_range.end {
|
||||||
|
cur.chunk_range.end = cur.chunk_range.end.max(next.chunk_range.end);
|
||||||
|
cur.byte_range.end = cur.byte_range.end.max(next.byte_range.end);
|
||||||
|
idx += 1;
|
||||||
|
} else {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
result.push(MergedRange {
|
||||||
|
chunk_range: cur.chunk_range,
|
||||||
|
byte_range: cur.byte_range,
|
||||||
|
});
|
||||||
|
idx += 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
merged.insert(hash, result);
|
||||||
|
}
|
||||||
|
|
||||||
|
Ok(Some((file_range.start - first_chunk_byte_start, terms, merged)))
|
||||||
|
}
|
||||||
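The last stage of `compute_reconstruction_ranges` sorts the per-xorb ranges by chunk start and folds every range that touches or overlaps its predecessor into it. A minimal standalone sketch of that sort-and-merge step follows, assuming a simplified `Span` type with tuple fields as a stand-in for `MergedRange` (both `Span` and `merge_spans` are hypothetical names) and treating ranges as half-open.

```rust
/// Hypothetical stand-in for the crate's `MergedRange`; ends are exclusive here.
#[derive(Clone, Debug, PartialEq)]
struct Span {
    chunks: (u32, u32),
    bytes: (u64, u64),
}

/// Sorts spans by chunk start, then folds a span into the previous one
/// whenever it starts at or before the previous span's chunk end.
fn merge_spans(mut spans: Vec<Span>) -> Vec<Span> {
    spans.sort_by_key(|s| s.chunks.0);
    let mut merged: Vec<Span> = Vec::new();
    for span in spans {
        if let Some(cur) = merged.last_mut() {
            if span.chunks.0 <= cur.chunks.1 {
                // Adjacent or overlapping: extend the current span in place.
                cur.chunks.1 = cur.chunks.1.max(span.chunks.1);
                cur.bytes.1 = cur.bytes.1.max(span.bytes.1);
                continue;
            }
        }
        merged.push(span);
    }
    merged
}
```

For example, chunk ranges `(5, 8)`, `(0, 3)` and `(3, 4)` collapse to `(0, 4)` and `(5, 8)`: the first two sorted ranges are adjacent, while the gap at chunk 4 keeps the third separate.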
|
|
||||||
/// Generates a V2 fetch URL: base64("{hash_hex}:{timestamp_ms}:{r1_start}-{r1_end},...")
pub(crate) fn generate_v2_fetch_url(hash: &MerkleHash, ranges: &[XorbRangeDescriptor], timestamp: Instant) -> String {
    let timestamp_ms = timestamp.saturating_duration_since(*REFERENCE_INSTANT).as_millis() as u64;
    let ranges_str: Vec<String> = ranges.iter().map(|r| format!("{}-{}", r.bytes.start, r.bytes.end)).collect();
    let payload = format!("{}:{}:{}", hash.hex(), timestamp_ms, ranges_str.join(","));
    URL_SAFE_NO_PAD.encode(payload.as_bytes())
}
|
|
||||||
/// Parses a V2 fetch URL back into (hash, timestamp, byte ranges).
pub(crate) fn parse_v2_fetch_url(url: &str) -> Result<(MerkleHash, Instant, Vec<HttpRange>)> {
    let bytes = URL_SAFE_NO_PAD.decode(url).map_err(|_| CasClientError::InvalidArguments)?;
    let payload = String::from_utf8(bytes).map_err(|_| CasClientError::InvalidArguments)?;

    let mut parts = payload.splitn(3, ':');
    let hash_hex = parts.next().ok_or(CasClientError::InvalidArguments)?;
    let ts_str = parts.next().ok_or(CasClientError::InvalidArguments)?;
    let ranges_str = parts.next().ok_or(CasClientError::InvalidArguments)?;

    let hash = MerkleHash::from_hex(hash_hex).map_err(|_| CasClientError::InvalidArguments)?;
    let timestamp_ms: u64 = ts_str.parse().map_err(|_| CasClientError::InvalidArguments)?;
    let timestamp = *REFERENCE_INSTANT + Duration::from_millis(timestamp_ms);

    let mut ranges = Vec::new();
    for r in ranges_str.split(',').filter(|s| !s.is_empty()) {
        let mut parts = r.splitn(2, '-');
        let start: u64 = parts
            .next()
            .ok_or(CasClientError::InvalidArguments)?
            .parse()
            .map_err(|_| CasClientError::InvalidArguments)?;
        let end: u64 = parts
            .next()
            .ok_or(CasClientError::InvalidArguments)?
            .parse()
            .map_err(|_| CasClientError::InvalidArguments)?;
        ranges.push(HttpRange::new(start, end));
    }

    Ok((hash, timestamp, ranges))
}
|
|
||||||
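The V2 URL payload handled by the two functions above has the layout `{hash_hex}:{timestamp_ms}:{start}-{end},...` before base64 encoding. The sketch below round-trips that layout using only the standard library. It assumes a plain string for the hash and `(start, end)` tuples for the ranges, and it leaves out the base64 step; `encode_payload` and `decode_payload` are hypothetical names, not part of the crate.

```rust
/// Hypothetical helper: builds "{hash_hex}:{timestamp_ms}:{start}-{end},..."
/// (the real function additionally base64-encodes this string).
fn encode_payload(hash_hex: &str, timestamp_ms: u64, ranges: &[(u64, u64)]) -> String {
    let ranges_str: Vec<String> = ranges.iter().map(|(s, e)| format!("{s}-{e}")).collect();
    format!("{hash_hex}:{timestamp_ms}:{}", ranges_str.join(","))
}

/// Hypothetical helper: inverse of `encode_payload`; `None` on malformed input.
fn decode_payload(payload: &str) -> Option<(String, u64, Vec<(u64, u64)>)> {
    // At most three fields, so the ranges field is taken whole.
    let mut parts = payload.splitn(3, ':');
    let hash_hex = parts.next()?.to_string();
    let timestamp_ms: u64 = parts.next()?.parse().ok()?;
    let ranges_str = parts.next()?;

    let mut ranges = Vec::new();
    for r in ranges_str.split(',').filter(|s| !s.is_empty()) {
        let (start, end) = r.split_once('-')?;
        let start: u64 = start.parse().ok()?;
        let end: u64 = end.parse().ok()?;
        ranges.push((start, end));
    }
    Some((hash_hex, timestamp_ms, ranges))
}
```

Encoding `("abc123", 42, [(0, 1023), (2048, 3071)])` gives `abc123:42:0-1023,2048-3071`, and decoding that string returns the original triple. A payload with fewer than three colon-separated fields, such as `bad`, is rejected, which mirrors `test_v2_url_invalid_payload`.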
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use xet_core_structures::metadata_shard::file_structs::{
|
||||||
|
FileDataSequenceEntry, FileDataSequenceHeader, MDBFileInfo,
|
||||||
|
};
|
||||||
|
|
||||||
|
use super::super::random_xorb::RandomXorb;
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
fn make_range_descriptor(chunk_start: u32, chunk_end: u32, byte_start: u64, byte_end: u64) -> XorbRangeDescriptor {
|
||||||
|
XorbRangeDescriptor {
|
||||||
|
chunks: ChunkRange::new(chunk_start, chunk_end),
|
||||||
|
bytes: HttpRange::new(byte_start, byte_end),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn build_xorb(chunk_sizes: &[usize]) -> (MerkleHash, XorbObject) {
|
||||||
|
let seed_and_sizes: Vec<(u64, u32)> =
|
||||||
|
chunk_sizes.iter().enumerate().map(|(i, &s)| (i as u64, s as u32)).collect();
|
||||||
|
let xorb = RandomXorb::new(&seed_and_sizes);
|
||||||
|
let xorb_object = xorb.get_xorb_object();
|
||||||
|
let hash = xorb.xorb_hash();
|
||||||
|
(hash, xorb_object)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn make_segment(
|
||||||
|
xorb_hash: MerkleHash,
|
||||||
|
chunk_start: u32,
|
||||||
|
chunk_end: u32,
|
||||||
|
unpacked_bytes: u32,
|
||||||
|
) -> FileDataSequenceEntry {
|
||||||
|
FileDataSequenceEntry {
|
||||||
|
xorb_hash,
|
||||||
|
xorb_flags: 0,
|
||||||
|
chunk_index_start: chunk_start,
|
||||||
|
chunk_index_end: chunk_end,
|
||||||
|
unpacked_segment_bytes: unpacked_bytes,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn make_file_info(segments: Vec<FileDataSequenceEntry>) -> MDBFileInfo {
|
||||||
|
MDBFileInfo {
|
||||||
|
metadata: FileDataSequenceHeader {
|
||||||
|
file_hash: MerkleHash::default(),
|
||||||
|
..Default::default()
|
||||||
|
},
|
||||||
|
segments,
|
||||||
|
verification: vec![],
|
||||||
|
metadata_ext: None,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_v2_url_roundtrip() {
|
||||||
|
let hash = MerkleHash::from_hex("a32d3a2a2e83e4d41b04899f13a8e891f4dd3f2ed940f96f91da7bf55b7ee299").unwrap();
|
||||||
|
let ranges = vec![
|
||||||
|
make_range_descriptor(0, 3, 0, 1024),
|
||||||
|
make_range_descriptor(5, 8, 2048, 4096),
|
||||||
|
];
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
|
||||||
|
let url = generate_v2_fetch_url(&hash, &ranges, timestamp);
|
||||||
|
let (parsed_hash, parsed_ts, parsed_ranges) = parse_v2_fetch_url(&url).unwrap();
|
||||||
|
|
||||||
|
assert_eq!(hash, parsed_hash);
|
||||||
|
assert_eq!(parsed_ranges.len(), 2);
|
||||||
|
assert_eq!(parsed_ranges[0].start, 0);
|
||||||
|
assert_eq!(parsed_ranges[0].end, 1024);
|
||||||
|
assert_eq!(parsed_ranges[1].start, 2048);
|
||||||
|
assert_eq!(parsed_ranges[1].end, 4096);
|
||||||
|
|
||||||
|
let diff = if parsed_ts > timestamp {
|
||||||
|
parsed_ts - timestamp
|
||||||
|
} else {
|
||||||
|
timestamp - parsed_ts
|
||||||
|
};
|
||||||
|
assert!(diff < Duration::from_millis(2));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_v2_url_single_range() {
|
||||||
|
let hash = MerkleHash::default();
|
||||||
|
let ranges = vec![make_range_descriptor(0, 1, 100, 200)];
|
||||||
|
let timestamp = Instant::now();
|
||||||
|
|
||||||
|
let url = generate_v2_fetch_url(&hash, &ranges, timestamp);
|
||||||
|
let (_, _, parsed_ranges) = parse_v2_fetch_url(&url).unwrap();
|
||||||
|
|
||||||
|
assert_eq!(parsed_ranges.len(), 1);
|
||||||
|
assert_eq!(parsed_ranges[0].start, 100);
|
||||||
|
assert_eq!(parsed_ranges[0].end, 200);
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_v2_url_invalid_base64() {
|
||||||
|
assert!(parse_v2_fetch_url("not-valid!!!").is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_v2_url_invalid_payload() {
|
||||||
|
let url = URL_SAFE_NO_PAD.encode(b"bad");
|
||||||
|
assert!(parse_v2_fetch_url(&url).is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
+    #[test]
+    fn test_compute_ranges_single_segment() {
+        let (xorb_hash, xorb_object) = build_xorb(&[100, 200, 300]);
+        let file_info = make_file_info(vec![make_segment(xorb_hash, 0, 3, 600)]);
+
+        let result = compute_reconstruction_ranges(&file_info, None, &mut |_| Ok(xorb_object.clone())).unwrap();
+        let (offset, terms, merged) = result.unwrap();
+
+        assert_eq!(offset, 0);
+        assert_eq!(terms.len(), 1);
+        assert_eq!(terms[0].unpacked_length, 600);
+        assert_eq!(terms[0].range.start, 0);
+        assert_eq!(terms[0].range.end, 3);
+
+        let xorb_ranges = merged.get(&xorb_hash).unwrap();
+        assert_eq!(xorb_ranges.len(), 1);
+        assert_eq!(xorb_ranges[0].chunk_range.start, 0);
+        assert_eq!(xorb_ranges[0].chunk_range.end, 3);
+    }
+
+    #[test]
+    fn test_compute_ranges_partial_range() {
+        let (xorb_hash, xorb_object) = build_xorb(&[100, 200, 300]);
+        let file_info = make_file_info(vec![make_segment(xorb_hash, 0, 3, 600)]);
+
+        let range = FileRange::new(100, 300);
+        let result = compute_reconstruction_ranges(&file_info, Some(range), &mut |_| Ok(xorb_object.clone())).unwrap();
+        let (offset, terms, merged) = result.unwrap();
+
+        assert_eq!(offset, 0, "range starts exactly at chunk boundary");
+        assert_eq!(terms.len(), 1);
+        assert_eq!(terms[0].range.start, 1);
+        assert_eq!(terms[0].range.end, 2);
+        assert_eq!(terms[0].unpacked_length, 200);
+
+        let xorb_ranges = merged.get(&xorb_hash).unwrap();
+        assert_eq!(xorb_ranges.len(), 1);
+        assert_eq!(xorb_ranges[0].chunk_range.start, 1);
+        assert_eq!(xorb_ranges[0].chunk_range.end, 2);
+    }
+
+    #[test]
+    fn test_compute_ranges_out_of_bounds() {
+        let file_info = make_file_info(vec![make_segment(MerkleHash::default(), 0, 1, 100)]);
+
+        let range = FileRange::new(200, 300);
+        let result = compute_reconstruction_ranges(&file_info, Some(range), &mut |_| {
+            panic!("should not be called for out-of-range")
+        })
+        .unwrap();
+        assert!(result.is_none());
+    }
+
+    #[test]
+    fn test_compute_ranges_empty_file() {
+        let file_info = make_file_info(vec![]);
+
+        let result =
+            compute_reconstruction_ranges(&file_info, None, &mut |_| panic!("should not be called for empty file"))
+                .unwrap();
+        let (offset, terms, merged) = result.unwrap();
+        assert_eq!(offset, 0);
+        assert!(terms.is_empty());
+        assert!(merged.is_empty());
+
+        let result = compute_reconstruction_ranges(&file_info, Some(FileRange::new(0, 100)), &mut |_| {
+            panic!("should not be called for empty file")
+        })
+        .unwrap();
+        let (offset, terms, _) = result.unwrap();
+        assert_eq!(offset, 0);
+        assert!(terms.is_empty());
+
+        let result = compute_reconstruction_ranges(&file_info, Some(FileRange::new(1, 100)), &mut |_| {
+            panic!("should not be called for empty file")
+        })
+        .unwrap();
+        assert!(result.is_none());
+    }
+
+    #[test]
+    fn test_compute_ranges_merges_adjacent() {
+        let (xorb_hash, xorb_object) = build_xorb(&[100, 100, 100, 100]);
+        let file_info = make_file_info(vec![make_segment(xorb_hash, 0, 2, 200), make_segment(xorb_hash, 2, 4, 200)]);
+
+        let result = compute_reconstruction_ranges(&file_info, None, &mut |_| Ok(xorb_object.clone())).unwrap();
+        let (offset, terms, merged) = result.unwrap();
+
+        assert_eq!(offset, 0);
+        assert_eq!(terms.len(), 2);
+
+        let xorb_ranges = merged.get(&xorb_hash).unwrap();
+        assert_eq!(xorb_ranges.len(), 1);
+        assert_eq!(xorb_ranges[0].chunk_range.start, 0);
+        assert_eq!(xorb_ranges[0].chunk_range.end, 4);
+    }
+
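The merging that `test_compute_ranges_merges_adjacent` checks (two terms `[0, 2)` and `[2, 4)` on the same xorb collapsing into one fetch range `[0, 4)`) can be illustrated with a small standalone sketch. This is not the `compute_reconstruction_ranges` implementation: `merge_chunk_ranges` and the `(start, end)` tuple representation are hypothetical names chosen for illustration, and the sketch assumes half-open chunk ranges as in the tests.

```rust
/// Merge half-open chunk ranges `[start, end)` that touch or overlap.
/// Hypothetical helper for illustration only; not part of xet-core.
fn merge_chunk_ranges(mut ranges: Vec<(u32, u32)>) -> Vec<(u32, u32)> {
    // Sort by start so that mergeable ranges become neighbours.
    ranges.sort_unstable();
    let mut merged: Vec<(u32, u32)> = Vec::with_capacity(ranges.len());
    for (start, end) in ranges {
        match merged.last_mut() {
            // Adjacent (start == previous end) or overlapping: extend the previous range.
            Some(last) if start <= last.1 => last.1 = last.1.max(end),
            // A gap separates this range from the previous one: keep it as its own entry.
            _ => merged.push((start, end)),
        }
    }
    merged
}
```

Sorting first means the order in which terms reference a xorb does not matter, which is consistent with the non-contiguous multi-xorb test expecting a single `[0, 4)` range even though another xorb's term sits between the two segments.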
+    #[test]
+    fn test_compute_ranges_multi_xorb_non_contiguous() {
+        let (hash_a, obj_a) = build_xorb(&[100, 100, 100, 100]);
+        let (hash_b, obj_b) = build_xorb(&[150, 150]);
+
+        let file_info = make_file_info(vec![
+            make_segment(hash_a, 0, 2, 200),
+            make_segment(hash_b, 0, 2, 300),
+            make_segment(hash_a, 2, 4, 200),
+        ]);
+
+        let result = compute_reconstruction_ranges(&file_info, None, &mut |hash| {
+            if *hash == hash_a {
+                Ok(obj_a.clone())
+            } else if *hash == hash_b {
+                Ok(obj_b.clone())
+            } else {
+                Err(CasClientError::XORBNotFound(*hash))
+            }
+        })
+        .unwrap();
+
+        let (offset, terms, merged) = result.unwrap();
+        assert_eq!(offset, 0);
+        assert_eq!(terms.len(), 3);
+
+        let a_ranges = merged.get(&hash_a).unwrap();
+        assert_eq!(a_ranges.len(), 1);
+        assert_eq!(a_ranges[0].chunk_range.start, 0);
+        assert_eq!(a_ranges[0].chunk_range.end, 4);
+
+        let b_ranges = merged.get(&hash_b).unwrap();
+        assert_eq!(b_ranges.len(), 1);
+        assert_eq!(b_ranges[0].chunk_range.start, 0);
+        assert_eq!(b_ranges[0].chunk_range.end, 2);
+    }
+
+    #[test]
+    fn test_compute_ranges_truncates_to_file_size() {
+        let (xorb_hash, xorb_object) = build_xorb(&[500]);
+        let file_info = make_file_info(vec![make_segment(xorb_hash, 0, 1, 500)]);
+
+        let range = FileRange::new(0, 10000);
+        let result = compute_reconstruction_ranges(&file_info, Some(range), &mut |_| Ok(xorb_object.clone())).unwrap();
+        let (offset, terms, _) = result.unwrap();
+        assert_eq!(offset, 0);
+        assert_eq!(terms.len(), 1);
+        assert_eq!(terms[0].unpacked_length, 500);
+    }
+
+    #[test]
+    fn test_compute_ranges_offset_into_first_range() {
+        let (xorb_hash, xorb_object) = build_xorb(&[100, 200, 300]);
+        let file_info = make_file_info(vec![make_segment(xorb_hash, 0, 3, 600)]);
+
+        let range = FileRange::new(150, 600);
+        let result = compute_reconstruction_ranges(&file_info, Some(range), &mut |_| Ok(xorb_object.clone())).unwrap();
+        let (offset, terms, _) = result.unwrap();
+
+        assert_eq!(offset, 50);
+        assert_eq!(terms[0].range.start, 1);
+    }
+}
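The partial-range tests above all rest on the same arithmetic: a requested byte range is widened to whole chunks, and `offset_into_first_range` records how many bytes of the first returned chunk must be skipped. A minimal sketch of that mapping follows, assuming half-open byte ranges and a flat list of chunk sizes. `locate_range` is a hypothetical helper, not the function under test, and it does not model the empty-file special case that `test_compute_ranges_empty_file` covers.

```rust
/// Map a half-open file byte range `[start, end)` onto chunk indices.
///
/// Returns `(first_chunk, end_chunk_exclusive, offset_into_first_chunk)`,
/// or `None` when the range is empty or starts at or past the end of the data.
/// Hypothetical helper for illustration only.
fn locate_range(chunk_sizes: &[u64], start: u64, end: u64) -> Option<(usize, usize, u64)> {
    let total: u64 = chunk_sizes.iter().sum();
    if start >= total || start >= end {
        return None;
    }
    // Requests that run past the end of the file are truncated to the file size.
    let end = end.min(total);

    let mut chunk_start = 0u64;
    let mut first: Option<(usize, u64)> = None;
    for (i, &size) in chunk_sizes.iter().enumerate() {
        let chunk_end = chunk_start + size;
        // The first chunk whose end lies beyond `start` contains the first requested byte.
        if first.is_none() && start < chunk_end {
            first = Some((i, start - chunk_start));
        }
        // The first chunk whose end reaches `end` is the last chunk needed.
        if end <= chunk_end {
            let (first_idx, offset) = first?;
            return Some((first_idx, i + 1, offset));
        }
        chunk_start = chunk_end;
    }
    None
}
```

With chunk sizes `[100, 200, 300]` and the request `150..600`, the first needed chunk is index 1 (bytes 100 to 299) and 50 bytes of it precede the request, which matches the `offset == 50` and `terms[0].range.start == 1` assertions in the last test.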
@@ -217,6 +217,66 @@ pub struct QueryReconstructionResponse {
     pub fetch_info: HashMap<HexMerkleHash, Vec<XorbReconstructionFetchInfo>>,
 }

+/// V2 reconstruction response - optimized for multi-range fetching.
+/// May provide fewer signed URLs per xorb by combining multiple byte ranges
+/// into a single URL where possible.
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct QueryReconstructionResponseV2 {
+    pub offset_into_first_range: u64,
+    pub terms: Vec<XorbReconstructionTerm>,
+    /// Map from xorb hash -> list of multi-range fetch entries.
+    /// Typically 1 entry per xorb. Multiple entries when the URL length limit
+    /// (~8 KiB, roughly ~500 ranges) forces a split.
+    pub xorbs: HashMap<HexMerkleHash, Vec<XorbMultiRangeFetch>>,
+}
+
+/// A signed multi-range fetch: one URL covering a subset of ranges for a xorb.
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct XorbMultiRangeFetch {
+    /// Signed URL with all byte ranges encoded. Client must send exactly the
+    /// signed range value as the Range header.
+    pub url: String,
+    /// Byte ranges covered by this URL, sorted by chunk start.
+    pub ranges: Vec<XorbRangeDescriptor>,
+}
+
+/// A single byte range within a xorb, mapping chunk indices to physical bytes.
+#[derive(Debug, Serialize, Deserialize, Clone)]
+pub struct XorbRangeDescriptor {
+    /// Chunk index range [start, end) within the xorb.
+    pub chunks: ChunkRange,
+    /// Physical byte range [start, end] (inclusive end) for the HTTP Range header.
+    pub bytes: HttpRange,
+}
+
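Because `XorbRangeDescriptor::bytes` uses an inclusive end while `chunks` is half-open, it is easy to be off by one when turning descriptors into a request. The sketch below shows how inclusive byte ranges map onto standard HTTP multi-range syntax. `InclusiveRange`, `range_header_value`, and `range_len` are hypothetical stand-ins, and whether the signed value in a real response uses exactly this textual form is an assumption: the doc comment only states that the client must send the signed value unchanged.

```rust
/// Inclusive byte range `[start, end]`, mirroring the convention documented for `bytes`.
/// Hypothetical stand-in for `HttpRange`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InclusiveRange {
    start: u64,
    end: u64,
}

/// Build an HTTP `Range` header value such as `bytes=0-1023,2048-3071`.
/// HTTP byte ranges are inclusive on both ends, so no adjustment is needed.
fn range_header_value(ranges: &[InclusiveRange]) -> String {
    let parts: Vec<String> = ranges
        .iter()
        .map(|r| format!("{}-{}", r.start, r.end))
        .collect();
    format!("bytes={}", parts.join(","))
}

/// Number of bytes covered by an inclusive range (note the `+ 1`).
fn range_len(r: InclusiveRange) -> u64 {
    r.end - r.start + 1
}
```

For the two ranges in the JSON example from the API notes (`0..=1023` and `2048..=3071`), this yields `bytes=0-1023,2048-3071`, and each range covers 1024 bytes.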
+impl From<QueryReconstructionResponse> for QueryReconstructionResponseV2 {
+    fn from(v1: QueryReconstructionResponse) -> Self {
+        let xorbs = v1
+            .fetch_info
+            .into_iter()
+            .map(|(hash, fetch_infos)| {
+                let fetch = fetch_infos
+                    .into_iter()
+                    .map(|info| XorbMultiRangeFetch {
+                        url: info.url,
+                        ranges: vec![XorbRangeDescriptor {
+                            chunks: info.range,
+                            bytes: info.url_range,
+                        }],
+                    })
+                    .collect();
+                (hash, fetch)
+            })
+            .collect();
+
+        QueryReconstructionResponseV2 {
+            offset_into_first_range: v1.offset_into_first_range,
+            terms: v1.terms,
+            xorbs,
+        }
+    }
+}
+
 // Request json body type representation for the POST /reconstructions endpoint
 // to get the reconstruction for multiple files at a time.
 // listing of non-duplicate (enforced by HashSet) keys (file ids) to get reconstructions for
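The `From` conversion in this hunk wraps every V1 fetch entry in a V2 entry that carries exactly one range, so a V1 response with N URLs for a xorb becomes N single-range `XorbMultiRangeFetch` values. The sketch below reproduces that shape with simplified stand-in types so it compiles on its own. The `V1FetchInfo`, `V2RangeDescriptor`, and `V2MultiRangeFetch` structs, the tuple-based ranges, and the `String` hash keys are all assumptions made for illustration and do not match the real `cas_types` definitions.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a V1 fetch entry: one URL, one chunk range, one byte range.
#[derive(Debug, Clone, PartialEq)]
struct V1FetchInfo {
    url: String,
    chunks: (u32, u32),
    bytes: (u64, u64),
}

/// Simplified stand-in for a V2 range descriptor.
#[derive(Debug, Clone, PartialEq)]
struct V2RangeDescriptor {
    chunks: (u32, u32),
    bytes: (u64, u64),
}

/// Simplified stand-in for a V2 multi-range fetch entry.
#[derive(Debug, Clone, PartialEq)]
struct V2MultiRangeFetch {
    url: String,
    ranges: Vec<V2RangeDescriptor>,
}

/// Convert a V1-style map into a V2-style map: each V1 entry becomes a
/// V2 entry with a single range, and the per-xorb ordering is preserved.
fn v1_to_v2(v1: HashMap<String, Vec<V1FetchInfo>>) -> HashMap<String, Vec<V2MultiRangeFetch>> {
    v1.into_iter()
        .map(|(hash, infos)| {
            let fetches = infos
                .into_iter()
                .map(|info| V2MultiRangeFetch {
                    url: info.url,
                    ranges: vec![V2RangeDescriptor { chunks: info.chunks, bytes: info.bytes }],
                })
                .collect();
            (hash, fetches)
        })
        .collect()
}
```

The conversion never merges URLs, so downstream code written against the V2 shape keeps working after a fallback to V1, just without the reduction in URL count.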
40
xet_client/tests/test_shard_upload_timeout.rs
Normal file
@@ -0,0 +1,40 @@
+//! Integration tests for the shard upload no-read-timeout client (XET-885).
+//!
+//! Verifies that shard uploads succeed even when the server takes a long time to process,
+//! since the shard upload client has no read_timeout.
+
+use std::time::Duration;
+
+use xet_client::cas_client::simulation::ClientTestingUtils;
+use xet_client::cas_client::{DirectAccessClient, LocalTestServerBuilder};
+use xet_runtime::test_set_config;
+
+test_set_config! {
+    client {
+        retry_max_attempts = 1usize;
+        retry_base_delay = Duration::from_millis(10);
+    }
+}
+
+const CHUNK_SIZE: usize = 123;
+
+#[tokio::test]
+async fn test_shard_upload_succeeds_with_no_server_delay() {
+    let server = LocalTestServerBuilder::new().start().await;
+
+    let result = server.remote_client().upload_random_file(&[(1, (0, 5))], CHUNK_SIZE).await;
+
+    assert!(result.is_ok(), "Shard upload should succeed with no server delay: {result:?}");
+}
+
+#[tokio::test]
+async fn test_shard_upload_succeeds_with_slow_server() {
+    let server = LocalTestServerBuilder::new().start().await;
+
+    // Server takes 3s to respond — shard upload client has no read_timeout so this should succeed
+    server.set_api_delay_range(Some(Duration::from_secs(3)..Duration::from_secs(3)));
+
+    let result = server.remote_client().upload_random_file(&[(1, (0, 5))], CHUNK_SIZE).await;
+
+    assert!(result.is_ok(), "Shard upload should succeed even with slow server (no read_timeout): {result:?}");
+}
@@ -192,6 +192,27 @@ pub fn deserialize_chunks<R: Read>(reader: &mut R) -> Result<(Vec<u8>, Vec<u32>)
     Ok((buf, chunk_byte_indices))
 }

+/// Appends a deserialized chunk segment to existing accumulated buffers.
+///
+/// `deserialize_chunks` returns `chunk_byte_indices` starting with a leading `0`.
+/// When concatenating multiple segments, this function deduplicates that leading
+/// zero for subsequent segments and rebases all indices to account for data already
+/// accumulated.
+pub fn append_chunk_segment(
+    all_data: &mut Vec<u8>,
+    all_chunk_indices: &mut Vec<u32>,
+    segment_data: &[u8],
+    segment_indices: &[u32],
+) {
+    let base_offset = all_data.len() as u32;
+    if all_chunk_indices.is_empty() {
+        all_chunk_indices.extend_from_slice(segment_indices);
+    } else {
+        all_chunk_indices.extend(segment_indices.iter().skip(1).map(|&o| o + base_offset));
+    }
+    all_data.extend_from_slice(segment_data);
+}
+
 /// Reads the next chunk header, returning `None` on clean EOF.
 ///
 /// Uses a single `read()` call to detect EOF (returns 0), then completes
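The point of rebasing the indices is that, after several segments are appended, `all_chunk_indices` remains a single boundary list: chunk `i` occupies bytes `indices[i]..indices[i + 1]` of the accumulated buffer. The sketch below copies the body of `append_chunk_segment` from this hunk so that it is self-contained and adds a `chunk_slice` helper to show that lookup. `chunk_slice` is a hypothetical name introduced only for this illustration.

```rust
/// Same logic as `append_chunk_segment` in the hunk above, copied so the sketch compiles alone.
fn append_chunk_segment(
    all_data: &mut Vec<u8>,
    all_chunk_indices: &mut Vec<u32>,
    segment_data: &[u8],
    segment_indices: &[u32],
) {
    let base_offset = all_data.len() as u32;
    if all_chunk_indices.is_empty() {
        // First segment: keep its boundaries as-is, including the leading 0.
        all_chunk_indices.extend_from_slice(segment_indices);
    } else {
        // Later segments: drop the duplicate leading 0 and shift by the bytes already stored.
        all_chunk_indices.extend(segment_indices.iter().skip(1).map(|&o| o + base_offset));
    }
    all_data.extend_from_slice(segment_data);
}

/// Hypothetical helper: the bytes of chunk `i`, bounded by `indices[i]..indices[i + 1]`.
fn chunk_slice<'a>(data: &'a [u8], indices: &[u32], i: usize) -> &'a [u8] {
    &data[indices[i] as usize..indices[i + 1] as usize]
}
```

Appending a 25-byte segment with boundaries `[0, 10, 25]` and then a 20-byte segment with boundaries `[0, 8, 20]` gives boundaries `[0, 10, 25, 33, 45]`, so chunk 2 is the first 8 bytes of the second segment.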
@@ -338,6 +359,37 @@ mod tests {
         }
     }

+    #[test]
+    fn test_append_chunk_segment() {
+        let mut all_data = Vec::new();
+        let mut all_indices = Vec::<u32>::new();
+
+        // First segment: simulates deserialize_chunks output [0, 10, 25]
+        append_chunk_segment(&mut all_data, &mut all_indices, &[0u8; 25], &[0, 10, 25]);
+        assert_eq!(all_data.len(), 25);
+        assert_eq!(all_indices, vec![0, 10, 25]);
+
+        // Second segment: [0, 8, 20] — leading 0 should be skipped, offsets rebased by 25
+        append_chunk_segment(&mut all_data, &mut all_indices, &[1u8; 20], &[0, 8, 20]);
+        assert_eq!(all_data.len(), 45);
+        assert_eq!(all_indices, vec![0, 10, 25, 33, 45]);
+
+        // Third segment: single chunk [0, 5] — leading 0 skipped, rebased by 45
+        append_chunk_segment(&mut all_data, &mut all_indices, &[2u8; 5], &[0, 5]);
+        assert_eq!(all_data.len(), 50);
+        assert_eq!(all_indices, vec![0, 10, 25, 33, 45, 50]);
+    }
+
+    #[test]
+    fn test_append_chunk_segment_single() {
+        let mut all_data = Vec::new();
+        let mut all_indices = Vec::<u32>::new();
+
+        append_chunk_segment(&mut all_data, &mut all_indices, &[0u8; 10], &[0, 10]);
+        assert_eq!(all_data.len(), 10);
+        assert_eq!(all_indices, vec![0, 10]);
+    }
+
     #[test]
     fn test_truncated_stream_returns_error() {
         let (_, xorb_data, _, _) = build_xorb_object(3, ChunkSize::Fixed(1024), CompressionScheme::None);
@@ -375,6 +375,7 @@ mod tests {
     use ulid::Ulid;
     use xet_client::cas_client::{ClientTestingUtils, DirectAccessClient, LocalClient, RandomFileContents};
     use xet_client::cas_types::FileRange;
+    use xet_runtime::core::XetRuntime;

     use super::*;
     use crate::progress_tracking::NoOpProgressUpdater;
@@ -405,6 +406,7 @@ mod tests {
         file_hash: MerkleHash,
         byte_range: Option<FileRange>,
         config: &ReconstructionConfig,
+        semaphore: Option<Arc<AdjustableSemaphore>>,
     ) -> Result<Vec<u8>> {
         let buffer = Arc::new(std::sync::Mutex::new(Cursor::new(Vec::new())));
         let writer = StaticCursorWriter(buffer.clone());
@@ -415,6 +417,9 @@ mod tests {
         if let Some(range) = byte_range {
             reconstructor = reconstructor.with_byte_range(range);
         }
+        if let Some(sem) = semaphore {
+            reconstructor = reconstructor.with_buffer_semaphore(sem);
+        }

         reconstructor.reconstruct_to_writer(writer).await?;

@@ -528,7 +533,7 @@ mod tests {
             config.use_vectored_write = use_vectored;

             // Test 1: reconstruct_to_writer
-            let vec_result = reconstruct_to_vec(client, h, None, &config).await.unwrap();
+            let vec_result = reconstruct_to_vec(client, h, None, &config, None).await.unwrap();
             assert_eq!(vec_result, *expected, "vec failed (vectored={use_vectored})");

             // Test 2: reconstruct_to_file
@@ -560,7 +565,7 @@ mod tests {
             config.use_vectored_write = use_vectored;

             // Test 1: reconstruct_to_writer
-            let vec_result = reconstruct_to_vec(client, file_contents.file_hash, Some(range), &config)
+            let vec_result = reconstruct_to_vec(client, file_contents.file_hash, Some(range), &config, None)
                 .await
                 .expect("reconstruct_to_vec should succeed");
             assert_eq!(vec_result, expected, "vec failed (vectored={use_vectored})");
@@ -911,7 +916,11 @@ mod tests {
     #[tokio::test]
     async fn test_non_contiguous_chunks() {
         let (client, file_contents) = setup_test_file(&[(1, (0, 2)), (1, (4, 6))]).await;
-        reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
+        let config = test_config();
+        let result = reconstruct_to_vec(&client, file_contents.file_hash, None, &config, None)
+            .await
+            .unwrap();
+        assert_eq!(result, file_contents.data);
     }

     // ==================== Default Config Tests ====================
@@ -1157,7 +1166,7 @@ mod tests {
         let mut config = test_config();
         config.download_buffer_perfile_size = xet_runtime::utils::ByteSize::from("8kb");

-        let reconstructed = reconstruct_to_vec(&client, file_contents.file_hash, None, &config)
+        let reconstructed = reconstruct_to_vec(&client, file_contents.file_hash, None, &config, None)
             .await
             .unwrap();
         assert_eq!(reconstructed, file_contents.data);
@@ -1287,6 +1296,348 @@ mod tests {
         assert_eq!(&result[start as usize..end as usize], &file_contents.data[start as usize..end as usize]);
     }

+    // ==================== V1 Fallback Tests ====================
+    //
+    // These tests use LocalTestServer with V2 disabled to verify that
+    // reconstruction works correctly when the client falls back from V2 to V1.
+
+    /// Helper to reconstruct through a LocalTestServer (RemoteClient HTTP path).
+    async fn reconstruct_via_server(
+        server: &xet_client::cas_client::LocalTestServer,
+        file_hash: MerkleHash,
+        byte_range: Option<FileRange>,
+        config: &ReconstructionConfig,
+    ) -> Result<Vec<u8>> {
+        let buffer = Arc::new(std::sync::Mutex::new(Cursor::new(Vec::new())));
+        let writer = StaticCursorWriter(buffer.clone());
+
+        let client: Arc<dyn Client> = server.remote_client().clone();
+        let mut reconstructor = FileReconstructor::new(&client, file_hash).with_config(config);
+
+        if let Some(range) = byte_range {
+            reconstructor = reconstructor.with_byte_range(range);
+        }
+
+        reconstructor.reconstruct_to_writer(writer).await?;
+
+        let data = buffer.lock().unwrap().get_ref().clone();
+        Ok(data)
+    }
+
+    #[tokio::test]
+    async fn test_v1_fallback_full_reconstruction() {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(&[(1, (0, 3)), (2, (0, 2))], TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        // Disable V2 so the remote client falls back to V1 + conversion.
+        server.disable_v2_reconstruction(404);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, None, &config)
+            .await
+            .unwrap();
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    #[tokio::test]
+    async fn test_v1_fallback_partial_range() {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(&[(1, (0, 5)), (2, (0, 3))], TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.disable_v2_reconstruction(404);
+
+        let file_len = file_contents.data.len() as u64;
+        let range = FileRange::new(file_len / 4, file_len * 3 / 4);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, Some(range), &config)
+            .await
+            .unwrap();
+        assert_eq!(result, &file_contents.data[range.start as usize..range.end as usize]);
+    }
+
+    #[tokio::test]
+    async fn test_v1_fallback_non_contiguous_chunks() {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(&[(1, (0, 2)), (1, (4, 6))], TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.disable_v2_reconstruction(404);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, None, &config)
+            .await
+            .unwrap();
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    #[tokio::test]
+    async fn test_v1_fallback_multiple_xorbs() {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(&[(1, (0, 2)), (2, (0, 3)), (3, (0, 2)), (1, (2, 4))], TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.disable_v2_reconstruction(404);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, None, &config)
+            .await
+            .unwrap();
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    /// V1 fallback with three disjoint ranges from the same xorb.
+    #[tokio::test]
+    async fn test_v1_fallback_triple_disjoint_ranges() {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))], TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.disable_v2_reconstruction(404);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, None, &config)
+            .await
+            .unwrap();
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
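The fallback these tests exercise has a simple control flow: try the V2 endpoint, and if the server reports that it is unavailable, call V1 and convert the result into the V2 shape. The sketch below models that flow with plain closures instead of HTTP calls. Everything in it is hypothetical (`EndpointResult`, `fetch_reconstruction`, the `String` bodies), and treating only status 404 as "V2 unavailable" is an assumption taken from the status the tests inject with `disable_v2_reconstruction(404)`. The real client may react to other statuses as well.

```rust
/// Outcome of a hypothetical endpoint call: a response body or an HTTP status code.
type EndpointResult = Result<String, u16>;

/// Try V2 first. On 404, fall back to V1 and convert its body to the V2 shape.
/// Any other V2 error is returned unchanged without contacting V1.
fn fetch_reconstruction(
    call_v2: impl FnOnce() -> EndpointResult,
    call_v1: impl FnOnce() -> EndpointResult,
    convert_v1: impl FnOnce(String) -> String,
) -> EndpointResult {
    match call_v2() {
        Ok(body) => Ok(body),
        // Assumption: 404 means the server has no V2 endpoint.
        Err(404) => call_v1().map(convert_v1),
        Err(status) => Err(status),
    }
}
```

Keeping the conversion inside the fallback branch means callers only ever see the V2 shape, which is why the tests above can reuse `reconstruct_via_server` unchanged for both paths.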
+    // ==================== Max Ranges Tests ====================
+    //
+    // These tests use LocalTestServer with max_ranges_per_fetch=2 to verify that
+    // multi-range fetch splitting works correctly through the full HTTP path.
+
+    /// Helper to set up a server with max_ranges_per_fetch and reconstruct.
+    async fn reconstruct_via_server_with_max_ranges(
+        term_spec: &[(u64, (u64, u64))],
+        max_ranges: usize,
+        byte_range: Option<FileRange>,
+    ) -> (Vec<u8>, RandomFileContents) {
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(term_spec, TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.set_max_ranges_per_fetch(max_ranges);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, byte_range, &config)
+            .await
+            .unwrap();
+        (result, file_contents)
+    }
+
+    #[tokio::test]
+    async fn test_max_ranges_simple() {
+        let (result, file_contents) =
+            reconstruct_via_server_with_max_ranges(&[(1, (0, 3)), (2, (0, 2))], 2, None).await;
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    /// A single xorb with two disjoint ranges, split at max_ranges=1.
+    /// Each range becomes its own fetch entry.
+    #[tokio::test]
+    async fn test_max_ranges_1_disjoint() {
+        let (result, file_contents) =
+            reconstruct_via_server_with_max_ranges(&[(1, (0, 2)), (1, (4, 6))], 1, None).await;
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    /// Three disjoint ranges from the same xorb with max_ranges=2.
+    /// First two ranges are grouped, third gets its own fetch entry.
+    #[tokio::test]
+    async fn test_max_ranges_2_triple_disjoint() {
+        let (result, file_contents) =
+            reconstruct_via_server_with_max_ranges(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))], 2, None).await;
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    /// Multiple xorbs, each with disjoint ranges, with max_ranges=2.
+    /// Tests that splitting is applied per-xorb correctly.
+    #[tokio::test]
+    async fn test_max_ranges_2_multi_xorb_disjoint() {
+        let term_spec = &[
+            (1, (0, 2)),
+            (2, (0, 2)),
+            (1, (4, 6)),
+            (2, (4, 6)),
+            (1, (8, 10)),
+            (2, (8, 10)),
+        ];
+        let (result, file_contents) = reconstruct_via_server_with_max_ranges(term_spec, 2, None).await;
+        assert_eq!(result, file_contents.data.as_ref());
+    }
+
+    /// Complex interleaved pattern with max_ranges=2 and a partial byte range.
+    #[tokio::test]
+    async fn test_max_ranges_2_partial_range() {
+        let term_spec = &[
+            (1, (0, 3)),
+            (2, (0, 2)),
+            (1, (3, 5)),
+            (3, (1, 4)),
+            (2, (4, 6)),
+            (1, (0, 2)),
+        ];
+        let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
+        let file_contents = server
+            .remote_client()
+            .upload_random_file(term_spec, TEST_CHUNK_SIZE)
+            .await
+            .unwrap();
+
+        server.set_max_ranges_per_fetch(2);
+
+        let file_len = file_contents.data.len() as u64;
+        let range = FileRange::new(file_len / 4, file_len * 3 / 4);
+
+        let config = test_config();
+        let result = reconstruct_via_server(&server, file_contents.file_hash, Some(range), &config)
+            .await
+            .unwrap();
+        assert_eq!(result, &file_contents.data[range.start as usize..range.end as usize]);
+    }
+
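The grouping described in the doc comments above ("first two ranges are grouped, third gets its own fetch entry") amounts to cutting a xorb's sorted range list into runs of at most `max_ranges_per_fetch` entries. A minimal sketch of that partitioning follows, assuming the ranges for one xorb are already merged and sorted. `split_into_fetches` is a hypothetical name, and how the test server actually implements `set_max_ranges_per_fetch` is not shown in this diff.

```rust
/// Split one xorb's range list into groups of at most `max_ranges` entries,
/// preserving order. Each group would correspond to one fetch entry (one URL).
/// Hypothetical helper for illustration only.
fn split_into_fetches<T: Clone>(ranges: &[T], max_ranges: usize) -> Vec<Vec<T>> {
    assert!(max_ranges > 0, "max_ranges must be at least 1");
    ranges
        .chunks(max_ranges)
        .map(|group| group.to_vec())
        .collect()
}
```

Because the split is applied to each xorb's list independently, interleaving terms from several xorbs in the file does not change how any one xorb's ranges are grouped.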
// ==================== Multi-Disjoint Range Tests (LocalClient) ====================
|
||||||
|
//
|
||||||
|
// These tests exercise complex disjoint range patterns through the LocalClient path
|
||||||
|
// (no HTTP server), ensuring the reconstruction logic handles V2 multi-range
|
||||||
|
// XorbBlocks correctly.
|
||||||
|
|
||||||
|
/// Single xorb with three disjoint chunk ranges.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_triple_disjoint_ranges_full() {
|
||||||
|
let (client, file_contents) = setup_test_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))]).await;
|
||||||
|
reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Single xorb with three disjoint chunk ranges, partial byte range.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_triple_disjoint_ranges_partial() {
|
||||||
|
let (client, file_contents) = setup_test_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))]).await;
|
||||||
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
let range = FileRange::new(file_len / 4, file_len * 3 / 4);
|
||||||
|
reconstruct_and_verify_range(&client, &file_contents, range, test_config()).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Multiple xorbs, each with multiple disjoint ranges, interleaved.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_multi_xorb_interleaved_disjoint() {
|
||||||
|
let term_spec = &[
|
||||||
|
(1, (0, 2)),
|
||||||
|
(2, (0, 2)),
|
||||||
|
(1, (4, 6)),
|
||||||
|
(2, (4, 6)),
|
||||||
|
(1, (8, 10)),
|
||||||
|
(2, (8, 10)),
|
||||||
|
];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Multiple xorbs with interleaved disjoint ranges, partial byte range.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_multi_xorb_interleaved_disjoint_partial() {
|
||||||
|
let term_spec = &[
|
||||||
|
(1, (0, 2)),
|
||||||
|
(2, (0, 2)),
|
||||||
|
(1, (4, 6)),
|
||||||
|
(2, (4, 6)),
|
||||||
|
(1, (8, 10)),
|
||||||
|
(2, (8, 10)),
|
||||||
|
];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
let range = FileRange::new(file_len / 3, file_len * 2 / 3);
|
||||||
|
reconstruct_and_verify_range(&client, &file_contents, range, test_config()).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Single xorb with four disjoint ranges (many gaps).
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_four_disjoint_ranges() {
|
||||||
|
let term_spec = &[(1, (0, 2)), (1, (4, 6)), (1, (8, 10)), (1, (12, 14))];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
    /// Mix of contiguous and disjoint ranges from the same xorb.
    #[tokio::test]
    async fn test_mixed_contiguous_and_disjoint() {
        let term_spec = &[
            (1, (0, 3)), // contiguous block
            (1, (3, 5)), // continues contiguously
            (1, (8, 10)), // gap, then disjoint
        ];
        let (client, file_contents) = setup_test_file(term_spec).await;
        reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
    }

    /// Disjoint ranges across three xorbs with a complex access pattern.
    #[tokio::test]
    async fn test_complex_three_xorb_disjoint() {
        let term_spec = &[
            (1, (0, 2)),
            (2, (0, 3)),
            (3, (2, 5)),
            (1, (5, 8)),
            (2, (6, 8)),
            (3, (0, 2)),
        ];
        let (client, file_contents) = setup_test_file(term_spec).await;
        reconstruct_and_verify_full(&client, &file_contents, test_config()).await;
    }

    /// LocalClient with max_ranges_per_fetch=2 (tests V2 response splitting without HTTP).
    #[tokio::test]
    async fn test_local_client_max_ranges_2_disjoint() {
        let client = LocalClient::temporary().await.unwrap();
        client.set_max_ranges_per_fetch(2);

        let term_spec = &[(1, (0, 2)), (1, (4, 6)), (1, (8, 10)), (1, (12, 14))];
        let file_contents = client.upload_random_file(term_spec, TEST_CHUNK_SIZE).await.unwrap();

        let config = test_config();
        let result = reconstruct_to_vec(&client, file_contents.file_hash, None, &config, None)
            .await
            .unwrap();
        assert_eq!(result, file_contents.data.as_ref());
    }

    /// LocalClient with max_ranges_per_fetch=1 (every range gets its own fetch entry).
    #[tokio::test]
    async fn test_local_client_max_ranges_1_multi_xorb() {
        let client = LocalClient::temporary().await.unwrap();
        client.set_max_ranges_per_fetch(1);

        let term_spec = &[(1, (0, 2)), (2, (0, 2)), (1, (4, 6)), (2, (4, 6))];
        let file_contents = client.upload_random_file(term_spec, TEST_CHUNK_SIZE).await.unwrap();

        let config = test_config();
        let result = reconstruct_to_vec(&client, file_contents.file_hash, None, &config, None)
            .await
            .unwrap();
        assert_eq!(result, file_contents.data.as_ref());
    }

||||||
// ==================== Cancellation Flag Tests ====================
|
// ==================== Cancellation Flag Tests ====================
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
@@ -1385,4 +1736,132 @@ mod tests {
|
|||||||
assert_eq!(bytes_written, file_contents.data.len() as u64);
|
assert_eq!(bytes_written, file_contents.data.len() as u64);
|
||||||
assert_eq!(buffer.lock().unwrap().get_ref().clone(), file_contents.data);
|
assert_eq!(buffer.lock().unwrap().get_ref().clone(), file_contents.data);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
    // ==================== Multirange Fetching Tests ====================
    //
    // These tests verify that reconstruction works correctly with both values
    // of `enable_multirange_fetching`. When true, V2 multi-range fetch entries
    // are used as-is (multirange HTTP requests). When false (default), each
    // range is split into its own XorbBlock and fetched via a separate
    // single-range request in parallel.
    //
    // Uses XetRuntime::new_with_config() to override the config per-test,
    // following the pattern from test_dynamic_buffer_scaling_noop_increment_preserves_total_permits.

    fn with_multirange_config(enable: bool) -> Arc<XetRuntime> {
        let mut config = xet_runtime::config::XetConfig::new();
        config.client.enable_multirange_fetching = enable;
        XetRuntime::new_with_config(config).unwrap()
    }

    /// Exercises multiple disjoint-range scenarios through LocalClient with both
    /// enable_multirange_fetching=true and =false.
    #[test]
    fn test_multirange_local_client() {
        for enable in [false, true] {
            let rt = with_multirange_config(enable);
            rt.external_run_async_task(async move {
                let scenarios: Vec<Vec<(u64, (u64, u64))>> = vec![
                    vec![(1, (0, 2)), (1, (4, 6)), (1, (8, 10))],
                    vec![
                        (1, (0, 2)),
                        (2, (0, 2)),
                        (1, (4, 6)),
                        (2, (4, 6)),
                        (1, (8, 10)),
                        (2, (8, 10)),
                    ],
                    vec![
                        (1, (0, 2)),
                        (2, (0, 3)),
                        (3, (2, 5)),
                        (1, (5, 8)),
                        (2, (6, 8)),
                        (3, (0, 2)),
                    ],
                ];
                let config = test_config();
                for term_spec in &scenarios {
                    let (client, fc) = setup_test_file(term_spec).await;
                    reconstruct_and_verify_full(&client, &fc, config.clone()).await;

                    let file_len = fc.data.len() as u64;
                    let range = FileRange::new(file_len / 4, file_len * 3 / 4);
                    reconstruct_and_verify_range(&client, &fc, range, config.clone()).await;
                }
            })
            .unwrap();
        }
    }

    /// LocalClient with a max_ranges_per_fetch constraint, for both enable_multirange_fetching settings.
    #[test]
    fn test_multirange_max_ranges() {
        for enable in [false, true] {
            let rt = with_multirange_config(enable);
            rt.external_run_async_task(async {
                let client = LocalClient::temporary().await.unwrap();
                client.set_max_ranges_per_fetch(2);

                let term_spec = &[(1, (0, 2)), (1, (4, 6)), (1, (8, 10)), (1, (12, 14))];
                let fc = client.upload_random_file(term_spec, TEST_CHUNK_SIZE).await.unwrap();

                let config = test_config();
                let result = reconstruct_to_vec(&client, fc.file_hash, None, &config, None).await.unwrap();
                assert_eq!(result, fc.data.as_ref());
            })
            .unwrap();
        }
    }

    /// Exercises the HTTP server path with full, max-ranges-split, and partial-range
    /// reconstruction, for both enable_multirange_fetching values.
    #[test]
    fn test_multirange_via_server() {
        for enable in [false, true] {
            let rt = with_multirange_config(enable);
            rt.external_run_async_task(async {
                let config = test_config();

                // Full reconstruction with disjoint ranges
                let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
                let fc = server
                    .remote_client()
                    .upload_random_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))], TEST_CHUNK_SIZE)
                    .await
                    .unwrap();
                let result = reconstruct_via_server(&server, fc.file_hash, None, &config).await.unwrap();
                assert_eq!(result, fc.data.as_ref());

                // Multi-xorb with max_ranges_per_fetch=2
                let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
                let fc = server
                    .remote_client()
                    .upload_random_file(
                        &[(1, (0, 2)), (2, (0, 2)), (1, (4, 6)), (2, (4, 6)), (1, (8, 10))],
                        TEST_CHUNK_SIZE,
                    )
                    .await
                    .unwrap();
                server.set_max_ranges_per_fetch(2);
                let result = reconstruct_via_server(&server, fc.file_hash, None, &config).await.unwrap();
                assert_eq!(result, fc.data.as_ref());

                // Partial byte range
                let server = xet_client::cas_client::LocalTestServerBuilder::new().start().await;
                let fc = server
                    .remote_client()
                    .upload_random_file(&[(1, (0, 3)), (2, (0, 2)), (1, (3, 5)), (2, (4, 6))], TEST_CHUNK_SIZE)
                    .await
                    .unwrap();
                let file_len = fc.data.len() as u64;
                let range = FileRange::new(file_len / 4, file_len * 3 / 4);
                let result = reconstruct_via_server(&server, fc.file_hash, Some(range), &config)
                    .await
                    .unwrap();
                assert_eq!(result, &fc.data[range.start as usize..range.end as usize]);
            })
            .unwrap();
        }
    }
||||||
}
|
}
|
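The two modes exercised by the multirange tests above can be illustrated with a small standalone sketch. Every type here (`ChunkRange`, `HttpRange`, `RangeDescriptor`, `FetchEntry`, `Block`) and the function `plan_blocks` are hypothetical, simplified stand-ins rather than the actual xet-core definitions. The sketch also plans all blocks eagerly, whereas the code in this change appears to create blocks lazily as terms reference them.

```rust
// Hypothetical stand-ins for the real types; field names follow the V2 JSON shape.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ChunkRange { start: u32, end: u32 }

#[derive(Clone, Copy, Debug, PartialEq)]
struct HttpRange { start: u64, end: u64 }

#[derive(Clone, Debug)]
struct RangeDescriptor { chunks: ChunkRange, bytes: HttpRange }

#[derive(Clone, Debug)]
struct FetchEntry { url: String, ranges: Vec<RangeDescriptor> }

/// One download unit: a URL plus the ranges requested from it.
#[derive(Debug, PartialEq)]
struct Block { url: String, chunk_ranges: Vec<ChunkRange>, http_ranges: Vec<HttpRange> }

/// With multirange enabled, each fetch entry becomes one block holding all of
/// its ranges. With it disabled, every range becomes its own single-range
/// block that reuses the same URL, so the ranges can be fetched in parallel.
fn plan_blocks(entries: &[FetchEntry], enable_multirange: bool) -> Vec<Block> {
    let mut blocks = Vec::new();
    for entry in entries {
        if enable_multirange {
            blocks.push(Block {
                url: entry.url.clone(),
                chunk_ranges: entry.ranges.iter().map(|r| r.chunks).collect(),
                http_ranges: entry.ranges.iter().map(|r| r.bytes).collect(),
            });
        } else {
            for r in &entry.ranges {
                blocks.push(Block {
                    url: entry.url.clone(),
                    chunk_ranges: vec![r.chunks],
                    http_ranges: vec![r.bytes],
                });
            }
        }
    }
    blocks
}

fn main() {
    let entries = vec![FetchEntry {
        url: "https://example.invalid/xorb".to_string(), // placeholder URL
        ranges: vec![
            RangeDescriptor { chunks: ChunkRange { start: 0, end: 2 }, bytes: HttpRange { start: 0, end: 1023 } },
            RangeDescriptor { chunks: ChunkRange { start: 4, end: 6 }, bytes: HttpRange { start: 2048, end: 3071 } },
        ],
    }];
    println!("multirange blocks: {}", plan_blocks(&entries, true).len());
    println!("single-range blocks: {}", plan_blocks(&entries, false).len());
}
```

In both modes the same byte ranges are requested in total; only the grouping into requests differs, which is why the tests can assert identical reconstructed output for either setting.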
||||||
|
|||||||
@@ -7,6 +7,7 @@ use tokio::sync::OnceCell;
|
|||||||
use xet_client::cas_client::Client;
|
use xet_client::cas_client::Client;
|
||||||
use xet_client::cas_types::{ChunkRange, FileRange, HttpRange};
|
use xet_client::cas_types::{ChunkRange, FileRange, HttpRange};
|
||||||
use xet_core_structures::merklehash::MerkleHash;
|
use xet_core_structures::merklehash::MerkleHash;
|
||||||
|
use xet_runtime::core::xet_config;
|
||||||
use xet_runtime::utils::UniqueId;
|
use xet_runtime::utils::UniqueId;
|
||||||
|
|
||||||
use super::super::FileReconstructionError;
|
use super::super::FileReconstructionError;
|
||||||
@@ -19,17 +20,28 @@ use crate::progress_tracking::download_tracking::DownloadTaskUpdater;
|
|||||||
/// in the output file that maps to a chunk range within a xorb block.
|
/// in the output file that maps to a chunk range within a xorb block.
|
||||||
#[derive(Clone)]
|
#[derive(Clone)]
|
||||||
pub struct FileTerm {
|
pub struct FileTerm {
|
||||||
|
// The byte range in the file of this term.
|
||||||
pub byte_range: FileRange,
|
pub byte_range: FileRange,
|
||||||
|
|
||||||
|
// Absolute chunk range within the full xorb. Doesn't account for only a partial xorb being downloaded.
|
||||||
pub xorb_chunk_range: ChunkRange,
|
pub xorb_chunk_range: ChunkRange,
|
||||||
|
|
||||||
|
// The index of the (chunk index, byte offset) pair in the xorb block that starts this file term.
|
||||||
|
pub xorb_block_start_index: usize,
|
||||||
|
|
||||||
|
// The byte offset into the first range of the xorb block, used when this term does not start on a chunk boundary.
|
||||||
pub offset_into_first_range: u64,
|
pub offset_into_first_range: u64,
|
||||||
|
|
||||||
|
// The xorb block that sourced this file term.
|
||||||
pub xorb_block: Arc<XorbBlock>,
|
pub xorb_block: Arc<XorbBlock>,
|
||||||
|
|
||||||
|
// The retrieval URL information for this file term.
|
||||||
pub url_info: Arc<TermBlockRetrievalURLs>,
|
pub url_info: Arc<TermBlockRetrievalURLs>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl FileTerm {
|
impl FileTerm {
|
||||||
pub fn extract_bytes(&self, xorb_block_data: &XorbBlockData) -> Bytes {
|
pub fn extract_bytes(&self, xorb_block_data: &XorbBlockData) -> Bytes {
|
||||||
let local_start_chunk = (self.xorb_chunk_range.start - self.xorb_block.chunk_range.start) as usize;
|
let (_, start_byte_offset) = xorb_block_data.chunk_offsets[self.xorb_block_start_index];
|
||||||
let start_byte_offset = xorb_block_data.chunk_offsets[local_start_chunk];
|
|
||||||
let start_byte_offset = start_byte_offset + self.offset_into_first_range as usize;
|
let start_byte_offset = start_byte_offset + self.offset_into_first_range as usize;
|
||||||
let expected_size = (self.byte_range.end - self.byte_range.start) as usize;
|
let expected_size = (self.byte_range.end - self.byte_range.start) as usize;
|
||||||
let end_byte_offset = start_byte_offset + expected_size;
|
let end_byte_offset = start_byte_offset + expected_size;
|
||||||
@@ -67,6 +79,25 @@ impl FileTerm {
|
|||||||
}
|
}
|
||||||
}
|
}
|
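As a rough illustration of the byte-offset arithmetic in the new `extract_bytes`, the sketch below looks up the term's start through a flattened index and then adds the mid-chunk offset. `BlockData` and `extract_term_bytes` are hypothetical names, and the layout of `chunk_offsets` as `(chunk index, byte offset)` pairs is an assumption based on the field comment in this diff; the real method returns `Bytes` and its remaining body is elided from this hunk.

```rust
/// Hypothetical stand-in for decompressed block data: the raw bytes plus one
/// (chunk index, byte offset) pair per downloaded chunk, in download order.
struct BlockData {
    chunk_offsets: Vec<(u32, usize)>,
    data: Vec<u8>,
}

/// Returns the bytes for one term, or None if any index is out of bounds.
/// `start_index` is the flattened position of the term's first chunk within
/// `chunk_offsets`; `offset_into_first_range` skips bytes inside that chunk.
fn extract_term_bytes(
    block: &BlockData,
    start_index: usize,
    offset_into_first_range: usize,
    term_len: usize,
) -> Option<&[u8]> {
    let (_, chunk_byte_offset) = *block.chunk_offsets.get(start_index)?;
    let start = chunk_byte_offset.checked_add(offset_into_first_range)?;
    let end = start.checked_add(term_len)?;
    block.data.get(start..end)
}

fn main() {
    // Three 4-byte chunks: chunks 0 and 1 from one range, chunk 4 from another.
    let block = BlockData {
        chunk_offsets: vec![(0, 0), (1, 4), (4, 8)],
        data: (0u8..12).collect(),
    };
    // A term starting at chunk 4 (flattened index 2), one byte into the chunk.
    let bytes = extract_term_bytes(&block, 2, 1, 3);
    assert_eq!(bytes, Some(&[9u8, 10, 11][..]));
}
```

Indexing by flattened position rather than by `chunk - block_start` matters once a block holds disjoint chunk ranges, since the chunk numbers are then no longer contiguous.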
||||||
|
|
||||||
|
/// Intermediate data for a single file term, collected during the first pass of
/// `retrieve_file_term_block` before the final `FileTerm` structs are built.
///
/// We need this because `FileTerm` requires `Arc<XorbBlock>` and `Arc<TermBlockRetrievalURLs>`,
/// which can't be constructed until all terms have been processed.
struct FileTermEntry {
    /// The byte range in the output file that this term covers.
    byte_range: FileRange,
    /// The chunk range within the xorb that sources this term's data.
    xorb_chunk_range: ChunkRange,
    /// Byte offset into the first chunk's data, non-zero only for the first term
    /// when the query range starts mid-chunk.
    offset_into_first_range: u64,
    /// Index into the `xorb_blocks` / `xorb_block_retrieval_urls` vectors.
    xorb_block_index: usize,
    /// Flattened index into the xorb block's `chunk_offsets` for this term's start chunk.
    xorb_block_start_index: usize,
}

||||||
/// Retrieve file terms from the client for a given file hash and byte range.
|
/// Retrieve file terms from the client for a given file hash and byte range.
|
||||||
/// Returns None if the requested byte range is past the end of the file.
|
/// Returns None if the requested byte range is past the end of the file.
|
||||||
/// Returns the actual retrieved range and the number of bytes required for the
|
/// Returns the actual retrieved range and the number of bytes required for the
|
||||||
@@ -77,170 +108,233 @@ pub async fn retrieve_file_term_block(
|
|||||||
file_hash: MerkleHash,
|
file_hash: MerkleHash,
|
||||||
query_file_byte_range: FileRange,
|
query_file_byte_range: FileRange,
|
||||||
) -> Result<Option<(FileRange, u64, Vec<FileTerm>)>> {
|
) -> Result<Option<(FileRange, u64, Vec<FileTerm>)>> {
|
||||||
// First, get the raw reconstruction.
|
// get_reconstruction always returns V2 format (the client converts V1 internally).
|
||||||
let Some(raw_reconstruction) = client.get_reconstruction(&file_hash, Some(query_file_byte_range)).await? else {
|
let Some(raw_reconstruction) = client.get_reconstruction(&file_hash, Some(query_file_byte_range)).await? else {
|
||||||
// None means we've requested a byte range beyond the end of the file.
|
// None means we've requested a byte range beyond the end of the file.
|
||||||
return Ok(None);
|
return Ok(None);
|
||||||
};
|
};
|
||||||
|
|
||||||
// Set a new url acquisition id to ensure that we don't double up the url acquisitions.
|
// Each acquisition gets a unique ID used for single-flight URL refresh dedup.
|
||||||
let acquisition_id = UniqueId::new();
|
let acquisition_id = UniqueId::new();
|
||||||
|
|
||||||
// Intermediate storage for file term data before we create the actual FileTerm structs.
|
// First pass: iterate through the reconstruction terms and build up intermediate
|
||||||
// (byte_range, xorb_chunk_range, offset_into_first_range, index into xorb_blocks)
|
// FileTermEntry data, XorbBlock objects, and retrieval URL info. We can't construct
|
||||||
let mut file_term_data = Vec::<(FileRange, ChunkRange, u64, usize)>::with_capacity(raw_reconstruction.terms.len());
|
// the final FileTerm structs yet because they need Arc<XorbBlock> and Arc<TermBlockRetrievalURLs>,
|
||||||
|
// which require all terms to be processed first.
|
||||||
|
let mut file_term_data = Vec::<FileTermEntry>::with_capacity(raw_reconstruction.terms.len());
|
||||||
|
|
||||||
let n_xorb_terms = raw_reconstruction.fetch_info.values().map(|v| v.len()).sum();
|
// Parallel vectors indexed by xorb_block_index:
|
||||||
|
// - xorb_blocks: the block metadata (hash, chunk ranges, references)
|
||||||
|
// - xorb_block_retrieval_urls: the download URL and byte ranges for each block
|
||||||
|
let mut xorb_blocks: Vec<XorbBlock> = Vec::new();
|
||||||
|
let mut xorb_block_retrieval_urls = Vec::<(String, Vec<HttpRange>)>::new();
|
||||||
|
|
||||||
// Keep track of the xorb blocks we've created, keyed by (xorb_hash, first chunk index).
|
// Dedup map: (xorb_hash, first_range_chunk_start) -> xorb_block_index.
|
||||||
let mut xorb_blocks: Vec<XorbBlock> = Vec::with_capacity(n_xorb_terms);
|
// Multiple terms may reference the same xorb block; this ensures we create
|
||||||
|
// each block only once and share it across terms.
|
||||||
|
let mut xorb_index_lookup = HashMap::<(MerkleHash, u32), usize>::new();
|
||||||
|
|
||||||
// Keep track of the URLs for each.
|
// Track the current byte offset in the output file as we process terms sequentially.
|
||||||
let mut xorb_block_retrieval_urls = Vec::<(String, HttpRange)>::with_capacity(n_xorb_terms);
|
|
||||||
|
|
||||||
// Get a hash map so we can reindex the xorb terms; map of (xorb_hash, first chunk index) -> xorb block index.
|
|
||||||
let mut xorb_index_lookup = HashMap::<(MerkleHash, u64), usize>::with_capacity(n_xorb_terms);
|
|
||||||
|
|
||||||
// Keep track of where we are so as to map the file terms to the byte range within the file.
|
|
||||||
let mut cur_file_byte_offset = query_file_byte_range.start;
|
let mut cur_file_byte_offset = query_file_byte_range.start;
|
||||||
|
|
||||||
// We'll create the URL info after processing all terms, once we know the actual range.
|
let enable_multirange = xet_config().client.enable_multirange_fetching;
|
||||||
|
|
||||||
// Iterate over the terms and build the file terms and xorb terms.
|
|
||||||
for (local_term_index, term) in raw_reconstruction.terms.iter().enumerate() {
|
for (local_term_index, term) in raw_reconstruction.terms.iter().enumerate() {
|
||||||
let xorb_hash: MerkleHash = term.hash.into();
|
let xorb_hash: MerkleHash = term.hash.into();
|
||||||
|
|
||||||
// Get the xorb info here.
|
let Some(xorb_descriptor) = raw_reconstruction.xorbs.get(&term.hash) else {
|
||||||
let Some(xorb_info) = raw_reconstruction.fetch_info.get(&term.hash) else {
|
|
||||||
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
||||||
"Xorb info not found for xorb hash {xorb_hash:?}"
|
"Xorb info not found for xorb hash {xorb_hash:?}"
|
||||||
)));
|
)));
|
||||||
};
|
};
|
||||||
|
|
||||||
// Get the xorb block index that this term belongs to.
|
// Find the XorbBlock for this term's chunk range. The behavior depends on the
|
||||||
|
// enable_multirange_fetching config:
|
||||||
|
//
|
||||||
|
// - When true: one XorbBlock per XorbMultiRangeFetch entry, preserving all ranges in a single block
|
||||||
|
// (multi-range HTTP request).
|
||||||
|
// - When false (default): one XorbBlock per individual XorbRangeDescriptor, so each range is fetched as a
|
||||||
|
// separate single-range HTTP request in parallel.
|
||||||
let xorb_block_index = 'find_xorb_block: {
|
let xorb_block_index = 'find_xorb_block: {
|
||||||
for raw_xorb_block_info in xorb_info.iter() {
|
for fetch_entry in xorb_descriptor.iter() {
|
||||||
let chunk_range = raw_xorb_block_info.range;
|
if enable_multirange {
|
||||||
|
let term_contained = fetch_entry
|
||||||
|
.ranges
|
||||||
|
.iter()
|
||||||
|
.any(|r| r.chunks.start <= term.range.start && term.range.end <= r.chunks.end);
|
||||||
|
|
||||||
if chunk_range.start <= term.range.start && term.range.start <= chunk_range.end {
|
if !term_contained {
|
||||||
// Verify that the term range is contained within the xorb block.
|
continue;
|
||||||
if term.range.end > chunk_range.end {
|
|
||||||
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
|
||||||
"Term range extends beyond xorb block range for xorb hash {xorb_hash:?}"
|
|
||||||
)));
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Reuse the previous one if it exists, otherwise insert a new one.
|
let first_chunk_start = fetch_entry.ranges[0].chunks.start;
|
||||||
let index = match xorb_index_lookup.entry((xorb_hash, chunk_range.start as u64)) {
|
|
||||||
|
let index = match xorb_index_lookup.entry((xorb_hash, first_chunk_start)) {
|
||||||
Entry::Occupied(entry) => *entry.get(),
|
Entry::Occupied(entry) => *entry.get(),
|
||||||
Entry::Vacant(entry) => {
|
Entry::Vacant(entry) => {
|
||||||
let new_index = xorb_blocks.len();
|
let new_index = xorb_blocks.len();
|
||||||
|
|
||||||
|
let chunk_ranges: Vec<ChunkRange> = fetch_entry.ranges.iter().map(|r| r.chunks).collect();
|
||||||
|
let http_ranges: Vec<HttpRange> = fetch_entry.ranges.iter().map(|r| r.bytes).collect();
|
||||||
|
|
||||||
xorb_blocks.push(XorbBlock {
|
xorb_blocks.push(XorbBlock {
|
||||||
xorb_hash,
|
xorb_hash,
|
||||||
chunk_range,
|
chunk_ranges,
|
||||||
xorb_block_index: new_index,
|
xorb_block_index: new_index,
|
||||||
references: vec![],
|
references: vec![],
|
||||||
uncompressed_size_if_known: None,
|
uncompressed_size_if_known: None,
|
||||||
data: OnceCell::new(),
|
data: OnceCell::new(),
|
||||||
});
|
});
|
||||||
|
|
||||||
// Store the retrieval URL and range for this xorb block.
|
xorb_block_retrieval_urls.push((fetch_entry.url.clone(), http_ranges));
|
||||||
xorb_block_retrieval_urls
|
|
||||||
.push((raw_xorb_block_info.url.clone(), raw_xorb_block_info.url_range));
|
|
||||||
|
|
||||||
// Store the index.
|
|
||||||
entry.insert(new_index);
|
entry.insert(new_index);
|
||||||
new_index
|
new_index
|
||||||
},
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
break 'find_xorb_block index;
|
break 'find_xorb_block index;
|
||||||
|
} else {
|
||||||
|
for range in &fetch_entry.ranges {
|
||||||
|
if range.chunks.start <= term.range.start && term.range.end <= range.chunks.end {
|
||||||
|
let index = match xorb_index_lookup.entry((xorb_hash, range.chunks.start)) {
|
||||||
|
Entry::Occupied(entry) => *entry.get(),
|
||||||
|
Entry::Vacant(entry) => {
|
||||||
|
let new_index = xorb_blocks.len();
|
||||||
|
|
||||||
|
xorb_blocks.push(XorbBlock {
|
||||||
|
xorb_hash,
|
||||||
|
chunk_ranges: vec![range.chunks],
|
||||||
|
xorb_block_index: new_index,
|
||||||
|
references: vec![],
|
||||||
|
uncompressed_size_if_known: None,
|
||||||
|
data: OnceCell::new(),
|
||||||
|
});
|
||||||
|
|
||||||
|
xorb_block_retrieval_urls.push((fetch_entry.url.clone(), vec![range.bytes]));
|
||||||
|
|
||||||
|
entry.insert(new_index);
|
||||||
|
new_index
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
break 'find_xorb_block index;
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
||||||
"No xorb chunk range found for file term {local_term_index:?} in xorb info for xorb hash {xorb_hash:?}"
|
"No xorb fetch entry found for file term {local_term_index:?} in xorb info for xorb hash {xorb_hash:?}"
|
||||||
)));
|
)));
|
||||||
};
|
};
|
||||||
|
|
||||||
// Do we need to adjust for an offset into the first range?
|
// Only the first term can have a non-zero offset into its first chunk,
|
||||||
let offset_into_first_range = {
|
// which happens when the query byte range starts mid-chunk.
|
||||||
if local_term_index == 0 {
|
let offset_into_first_range = if local_term_index == 0 {
|
||||||
raw_reconstruction.offset_into_first_range
|
raw_reconstruction.offset_into_first_range
|
||||||
} else {
|
} else {
|
||||||
0
|
0
|
||||||
}
|
|
||||||
};
|
};
|
||||||
|
|
||||||
// The effective size of this term in the file.
|
// The term's contribution to the output file is its full uncompressed size
|
||||||
|
// minus any offset into the first chunk.
|
||||||
let term_byte_size = term.unpacked_length as u64 - offset_into_first_range;
|
let term_byte_size = term.unpacked_length as u64 - offset_into_first_range;
|
||||||
|
|
||||||
// Update the references term on the XorbBlock to track where the xorb gets used.
|
// Record this term as a reference on its xorb block (used later to determine
|
||||||
|
// whether the block's total uncompressed size can be inferred).
|
||||||
xorb_blocks[xorb_block_index].references.push(XorbReference {
|
xorb_blocks[xorb_block_index].references.push(XorbReference {
|
||||||
term_chunks: term.range,
|
term_chunks: term.range,
|
||||||
uncompressed_size: term.unpacked_length as usize,
|
uncompressed_size: term.unpacked_length as usize,
|
||||||
});
|
});
|
||||||
|
|
||||||
// Store the file term data (byte_range, xorb_chunk_range, offset_into_first_range, xorb_block_index).
|
// Compute the flattened index into the block's chunk_offsets for this term's
|
||||||
// We'll create the FileTerm structs after we know the actual range.
|
// starting chunk. This accounts for disjoint chunk ranges in multi-range blocks.
|
||||||
file_term_data.push((
|
//
|
||||||
FileRange::new(cur_file_byte_offset, cur_file_byte_offset + term_byte_size),
|
// The term_contained check above guarantees term.range.start falls within one of
|
||||||
term.range,
|
// the block's chunk_ranges, so this loop should always find a match; the `found` check below guards against a corrupted reconstruction.
|
||||||
|
let xorb_block_start_index = {
|
||||||
|
let chunk_start = term.range.start;
|
||||||
|
let chunk_ranges = &xorb_blocks[xorb_block_index].chunk_ranges;
|
||||||
|
let mut idx = 0;
|
||||||
|
let mut found = false;
|
||||||
|
for range in chunk_ranges {
|
||||||
|
if chunk_start >= range.start && chunk_start < range.end {
|
||||||
|
idx += (chunk_start - range.start) as usize;
|
||||||
|
found = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
idx += (range.end - range.start) as usize;
|
||||||
|
}
|
||||||
|
if !found {
|
||||||
|
return Err(FileReconstructionError::CorruptedReconstruction(format!(
|
||||||
|
"chunk_start {chunk_start} not found in chunk_ranges {chunk_ranges:?} for file term {local_term_index}"
|
||||||
|
)));
|
||||||
|
}
|
||||||
|
idx
|
||||||
|
};
|
||||||
|
|
||||||
|
file_term_data.push(FileTermEntry {
|
||||||
|
byte_range: FileRange::new(cur_file_byte_offset, cur_file_byte_offset + term_byte_size),
|
||||||
|
xorb_chunk_range: term.range,
|
||||||
offset_into_first_range,
|
offset_into_first_range,
|
||||||
xorb_block_index,
|
xorb_block_index,
|
||||||
));
|
xorb_block_start_index,
|
||||||
|
});
|
||||||
|
|
||||||
cur_file_byte_offset += term_byte_size;
|
cur_file_byte_offset += term_byte_size;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Sort the block references so that we can easily scan the terms to figure out how many references
|
// Sort each block's references by chunk start so that determine_size_if_possible
|
||||||
// a particular chunk may have.
|
// can use its forward-chaining DP to check coverage.
|
||||||
for block in &mut xorb_blocks {
|
for block in &mut xorb_blocks {
|
||||||
block.references.sort_by_key(|r| r.term_chunks.start);
|
block.references.sort_by_key(|r| r.term_chunks.start);
|
||||||
block.uncompressed_size_if_known = XorbBlock::determine_size_if_possible(block.chunk_range, &block.references);
|
block.uncompressed_size_if_known =
|
||||||
|
XorbBlock::determine_size_if_possible(&block.chunk_ranges, &block.references);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Now, it's possible that we have to shrink the byte range of the last term, as we may have retrieved more
|
// The last term in the reconstruction may extend beyond the requested range
|
||||||
// due to chunk offsets.
|
// (e.g. when the query ends mid-chunk). Trim it to the query boundary.
|
||||||
if cur_file_byte_offset > query_file_byte_range.end {
|
if cur_file_byte_offset > query_file_byte_range.end {
|
||||||
let last_term_shrinkage = cur_file_byte_offset - query_file_byte_range.end;
|
let last_term_shrinkage = cur_file_byte_offset - query_file_byte_range.end;
|
||||||
|
|
||||||
debug_assert!(!file_term_data.is_empty());
|
debug_assert!(!file_term_data.is_empty());
|
||||||
|
|
||||||
if let Some(fi) = file_term_data.last_mut() {
|
if let Some(entry) = file_term_data.last_mut() {
|
||||||
fi.0.end -= last_term_shrinkage;
|
entry.byte_range.end -= last_term_shrinkage;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Calculate the actual retrieved range from the file terms.
|
// The actual range covered, which may be smaller than requested if the file
|
||||||
|
// ends before the requested range.
|
||||||
let actual_range = FileRange::new(
|
let actual_range = FileRange::new(
|
||||||
file_term_data.first().map(|(br, _, _, _)| br.start).unwrap_or(0),
|
file_term_data.first().map(|e| e.byte_range.start).unwrap_or(0),
|
||||||
file_term_data.last().map(|(br, _, _, _)| br.end).unwrap_or(0),
|
file_term_data.last().map(|e| e.byte_range.end).unwrap_or(0),
|
||||||
);
|
);
|
||||||
|
|
||||||
// Now, calculate the total number of bytes that needs to be downloaded given dedup and compression savings.
|
// Total compressed bytes that will be transferred across all xorb block downloads.
|
||||||
let total_transfer_bytes = xorb_block_retrieval_urls
|
let total_transfer_bytes: u64 = xorb_block_retrieval_urls
|
||||||
.iter()
|
.iter()
|
||||||
.map(|(_, http_range)| {
|
.flat_map(|(_, ranges)| ranges)
|
||||||
let file_range = FileRange::from(*http_range);
|
.map(|r| r.length())
|
||||||
file_range.end.saturating_sub(file_range.start)
|
|
||||||
})
|
|
||||||
.sum();
|
.sum();
|
||||||
|
|
||||||
// Now create the URL info with the actual range and retrieval URLs.
|
// Wrap the retrieval URLs in a shared struct so all file terms can share them
|
||||||
|
// and coordinate URL refreshes through a single lock.
|
||||||
let url_info =
|
let url_info =
|
||||||
Arc::new(TermBlockRetrievalURLs::new(file_hash, actual_range, acquisition_id, xorb_block_retrieval_urls));
|
Arc::new(TermBlockRetrievalURLs::new(file_hash, actual_range, acquisition_id, xorb_block_retrieval_urls));
|
||||||
|
|
||||||
// Convert xorb_blocks to Arc<XorbBlock> for use in FileTerms.
|
// Second pass: convert the intermediate FileTermEntry data into final FileTerm
|
||||||
|
// structs, now that we can wrap xorb blocks in Arc and share the url_info.
|
||||||
let xorb_blocks_arc: Vec<Arc<XorbBlock>> = xorb_blocks.into_iter().map(Arc::new).collect();
|
let xorb_blocks_arc: Vec<Arc<XorbBlock>> = xorb_blocks.into_iter().map(Arc::new).collect();
|
||||||
|
|
||||||
// Convert the intermediate data to FileTerm structs with the shared url_info.
|
|
||||||
let file_terms: Vec<FileTerm> = file_term_data
|
let file_terms: Vec<FileTerm> = file_term_data
|
||||||
.into_iter()
|
.into_iter()
|
||||||
.map(|(byte_range, xorb_chunk_range, offset_into_first_range, xorb_block_index)| FileTerm {
|
.map(|entry| FileTerm {
|
||||||
byte_range,
|
byte_range: entry.byte_range,
|
||||||
xorb_chunk_range,
|
xorb_chunk_range: entry.xorb_chunk_range,
|
||||||
offset_into_first_range,
|
xorb_block_start_index: entry.xorb_block_start_index,
|
||||||
xorb_block: xorb_blocks_arc[xorb_block_index].clone(),
|
offset_into_first_range: entry.offset_into_first_range,
|
||||||
|
xorb_block: xorb_blocks_arc[entry.xorb_block_index].clone(),
|
||||||
url_info: url_info.clone(),
|
url_info: url_info.clone(),
|
||||||
})
|
})
|
||||||
.collect();
|
.collect();
|
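The flattened-index computation in `retrieve_file_term_block` can be tried in isolation. The sketch below reproduces the loop from this diff as a standalone function; `ChunkRange` is a simplified stand-in and `flattened_start_index` is a hypothetical name. Ranges are treated as half-open `[start, end)`, as implied by the `chunk_start < range.end` comparison in the diff.

```rust
/// Simplified stand-in for the real chunk range type; half-open [start, end).
#[derive(Clone, Copy, Debug)]
struct ChunkRange { start: u32, end: u32 }

/// Maps an absolute chunk index within a xorb to its position in a block that
/// stores several disjoint chunk ranges back to back. Returns None when the
/// chunk is not covered by any range (the diff reports this as a
/// CorruptedReconstruction error).
fn flattened_start_index(chunk_ranges: &[ChunkRange], chunk_start: u32) -> Option<usize> {
    let mut idx = 0usize;
    for range in chunk_ranges {
        if chunk_start >= range.start && chunk_start < range.end {
            return Some(idx + (chunk_start - range.start) as usize);
        }
        // Skip over every chunk held by this range.
        idx += (range.end - range.start) as usize;
    }
    None
}

fn main() {
    let ranges = [
        ChunkRange { start: 0, end: 2 },
        ChunkRange { start: 4, end: 6 },
        ChunkRange { start: 8, end: 10 },
    ];
    assert_eq!(flattened_start_index(&ranges, 0), Some(0));
    assert_eq!(flattened_start_index(&ranges, 5), Some(3)); // 2 chunks skipped + 1
    assert_eq!(flattened_start_index(&ranges, 8), Some(4)); // 2 + 2 chunks skipped
    assert_eq!(flattened_start_index(&ranges, 3), None);    // falls in a gap
}
```

With a single-range block (the default when `enable_multirange_fetching` is false) the result reduces to `chunk_start - range.start`, which matches the arithmetic the old code used.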
||||||
@@ -252,7 +346,7 @@ pub async fn retrieve_file_term_block(
|
|||||||
mod tests {
|
mod tests {
|
||||||
use std::sync::Arc;
|
use std::sync::Arc;
|
||||||
|
|
||||||
use more_asserts::{assert_ge, assert_le};
|
use more_asserts::assert_le;
|
||||||
use xet_client::cas_client::{ClientTestingUtils, LocalClient, RandomFileContents};
|
use xet_client::cas_client::{ClientTestingUtils, LocalClient, RandomFileContents};
|
||||||
use xet_client::cas_types::{ChunkRange, FileRange};
|
use xet_client::cas_types::{ChunkRange, FileRange};
|
||||||
use xet_runtime::utils::UniqueId;
|
use xet_runtime::utils::UniqueId;
|
||||||
@@ -351,10 +445,18 @@ mod tests {
|
|||||||
// Track xorb block index
|
// Track xorb block index
|
||||||
seen_xorb_indices.insert(file_term.xorb_block.xorb_block_index);
|
seen_xorb_indices.insert(file_term.xorb_block.xorb_block_index);
|
||||||
|
|
||||||
// Verify chunk range is within xorb block boundaries.
|
// Verify chunk range is within xorb block boundaries: the term's chunk range
|
||||||
|
// must be contained within at least one of the block's chunk ranges.
|
||||||
let xorb_block = &file_term.xorb_block;
|
let xorb_block = &file_term.xorb_block;
|
||||||
assert_ge!(file_term.xorb_chunk_range.start, xorb_block.chunk_range.start);
|
let term_in_some_range = xorb_block
|
||||||
assert_le!(file_term.xorb_chunk_range.end, xorb_block.chunk_range.end);
|
.chunk_ranges
|
||||||
|
.iter()
|
||||||
|
.any(|cr| file_term.xorb_chunk_range.start >= cr.start && file_term.xorb_chunk_range.end <= cr.end);
|
||||||
|
assert!(
|
||||||
|
term_in_some_range,
|
||||||
|
"term chunk range {:?} not within any block chunk range {:?}",
|
||||||
|
file_term.xorb_chunk_range, xorb_block.chunk_ranges
|
||||||
|
);
|
||||||
|
|
||||||
// Cross-reference with known file contents.
|
// Cross-reference with known file contents.
|
||||||
if expected_term_idx < file_contents.terms.len() {
|
if expected_term_idx < file_contents.terms.len() {
|
||||||
@@ -365,7 +467,7 @@ mod tests {
|
|||||||
|
|
||||||
// Verify chunk range matches (accounting for partial first term).
|
// Verify chunk range matches (accounting for partial first term).
|
||||||
if file_term_data_offset == 0 {
|
if file_term_data_offset == 0 {
|
||||||
assert_eq!(file_term.xorb_chunk_range.start as u32, expected_term.chunk_start);
|
assert_eq!(file_term.xorb_chunk_range.start, expected_term.chunk_start);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -549,10 +651,11 @@ mod tests {
|
|||||||
// Get the first file term's xorb block to test URL retrieval
|
// Get the first file term's xorb block to test URL retrieval
|
||||||
let file_term = &file_terms[0];
|
let file_term = &file_terms[0];
|
||||||
let xorb_block_index = file_term.xorb_block.xorb_block_index;
|
let xorb_block_index = file_term.xorb_block.xorb_block_index;
|
||||||
let (unique_id, url, http_range) = file_term.url_info.get_retrieval_url(xorb_block_index).await;
|
let (unique_id, url, http_ranges) = file_term.url_info.get_retrieval_url(xorb_block_index).await;
|
||||||
|
|
||||||
assert!(!url.is_empty());
|
assert!(!url.is_empty());
|
||||||
assert!(http_range.start < http_range.end);
|
assert!(!http_ranges.is_empty());
|
||||||
|
assert!(http_ranges[0].start <= http_ranges[0].end);
|
||||||
assert!(unique_id != UniqueId::null());
|
assert!(unique_id != UniqueId::null());
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -591,87 +694,136 @@ mod tests {
|
|||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_range_few_bytes_before_end() {
|
async fn test_range_few_bytes_before_end() {
|
||||||
// Test requesting a range that ends just a few bytes before the file end,
|
|
||||||
// within the same chunk as the file end.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// Request range ending 3 bytes before the end
|
|
||||||
let range = FileRange::new(0, file_len - 3);
|
let range = FileRange::new(0, file_len - 3);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
|
|
||||||
// Request range ending 1 byte before the end
|
|
||||||
let range = FileRange::new(0, file_len - 1);
|
let range = FileRange::new(0, file_len - 1);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_range_few_bytes_after_start() {
|
async fn test_range_few_bytes_after_start() {
|
||||||
// Test requesting a range that starts just a few bytes after the file start,
|
|
||||||
// within the same chunk as the file start.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// Request range starting 3 bytes after the start
|
|
||||||
let range = FileRange::new(3, file_len);
|
let range = FileRange::new(3, file_len);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
|
|
||||||
// Request range starting 1 byte after the start
|
|
||||||
let range = FileRange::new(1, file_len);
|
let range = FileRange::new(1, file_len);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_range_few_bytes_offset_both_ends() {
|
async fn test_range_few_bytes_offset_both_ends() {
|
||||||
// Test requesting a range with small offsets at both ends within the same chunk.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// Request range with 2 bytes trimmed from start and 2 bytes from end
|
|
||||||
let range = FileRange::new(2, file_len - 2);
|
let range = FileRange::new(2, file_len - 2);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
|
|
||||||
// Request just the middle byte of a small range
|
|
||||||
let range = FileRange::new(file_len / 2 - 1, file_len / 2 + 1);
|
let range = FileRange::new(file_len / 2 - 1, file_len / 2 + 1);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_range_single_byte_at_various_positions() {
|
async fn test_range_single_byte_at_various_positions() {
|
||||||
// Test requesting single bytes at various positions in the file.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 5))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// First byte
|
|
||||||
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(0, 1))).await;
|
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(0, 1))).await;
|
||||||
|
|
||||||
// Last byte
|
|
||||||
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(file_len - 1, file_len))).await;
|
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(file_len - 1, file_len))).await;
|
||||||
|
|
||||||
// Middle byte
|
|
||||||
let mid = file_len / 2;
|
let mid = file_len / 2;
|
||||||
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(mid, mid + 1))).await;
|
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(mid, mid + 1))).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multi_term_range_ends_mid_chunk() {
|
async fn test_multi_term_range_ends_mid_chunk() {
|
||||||
// Test with multiple terms where the requested range ends in the middle of the last term's chunk.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 3)), (2, (0, 3)), (3, (0, 3))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 3)), (2, (0, 3)), (3, (0, 3))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// End a few bytes before the file end
|
|
||||||
let range = FileRange::new(0, file_len - 5);
|
let range = FileRange::new(0, file_len - 5);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
#[tokio::test]
|
#[tokio::test]
|
||||||
async fn test_multi_term_range_starts_mid_chunk() {
|
async fn test_multi_term_range_starts_mid_chunk() {
|
||||||
// Test with multiple terms where the requested range starts in the middle of the first term's chunk.
|
|
||||||
let (client, file_contents) = setup_test_file(&[(1, (0, 3)), (2, (0, 3)), (3, (0, 3))]).await;
|
let (client, file_contents) = setup_test_file(&[(1, (0, 3)), (2, (0, 3)), (3, (0, 3))]).await;
|
||||||
let file_len = file_contents.data.len() as u64;
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
|
||||||
// Start a few bytes after the file start
|
|
||||||
let range = FileRange::new(5, file_len);
|
let range = FileRange::new(5, file_len);
|
||||||
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ==================== Multi-Disjoint Range Edge Cases ====================
|
||||||
|
|
||||||
|
/// Single xorb with three disjoint chunk ranges.
|
||||||
|
/// This creates one XorbBlock with chunk_ranges = [(0,2), (4,6), (8,10)].
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_triple_disjoint_same_xorb() {
|
||||||
|
let (client, file_contents) = setup_test_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))]).await;
|
||||||
|
retrieve_and_verify(&client, &file_contents, None).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Triple disjoint ranges with a partial byte range spanning the gap.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_triple_disjoint_partial_range_across_gap() {
|
||||||
|
let (client, file_contents) = setup_test_file(&[(1, (0, 2)), (1, (4, 6)), (1, (8, 10))]).await;
|
||||||
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
let range = FileRange::new(file_len / 4, file_len * 3 / 4);
|
||||||
|
retrieve_and_verify(&client, &file_contents, Some(range)).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Two xorbs, each with two disjoint ranges, interleaved in file order.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_two_xorbs_interleaved_disjoint() {
|
||||||
|
let term_spec = &[(1, (0, 2)), (2, (0, 2)), (1, (4, 6)), (2, (4, 6))];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
retrieve_and_verify(&client, &file_contents, None).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Two xorbs interleaved with disjoint ranges, partial byte range.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_two_xorbs_interleaved_disjoint_partial() {
|
||||||
|
let term_spec = &[(1, (0, 2)), (2, (0, 2)), (1, (4, 6)), (2, (4, 6))];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
let file_len = file_contents.data.len() as u64;
|
||||||
|
retrieve_and_verify(&client, &file_contents, Some(FileRange::new(file_len / 3, file_len * 2 / 3))).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Single xorb with four disjoint ranges, each a single chunk wide.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_four_single_chunk_disjoint() {
|
||||||
|
let term_spec = &[(1, (0, 1)), (1, (3, 4)), (1, (6, 7)), (1, (9, 10))];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
retrieve_and_verify(&client, &file_contents, None).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Mix of contiguous and disjoint ranges from the same xorb.
|
||||||
|
/// Chunks 0-4 are contiguous, then a gap, then chunks 8-10.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_contiguous_then_disjoint() {
|
||||||
|
let term_spec = &[(1, (0, 2)), (1, (2, 4)), (1, (8, 10))];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
retrieve_and_verify(&client, &file_contents, None).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Three xorbs with complex disjoint access patterns.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_three_xorbs_complex_disjoint() {
|
||||||
|
let term_spec = &[
|
||||||
|
(1, (0, 2)),
|
||||||
|
(2, (0, 3)),
|
||||||
|
(3, (2, 5)),
|
||||||
|
(1, (5, 8)),
|
||||||
|
(2, (6, 8)),
|
||||||
|
(3, (0, 2)),
|
||||||
|
];
|
||||||
|
let (client, file_contents) = setup_test_file(term_spec).await;
|
||||||
|
retrieve_and_verify(&client, &file_contents, None).await;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -21,9 +21,11 @@ pub struct TermBlockRetrievalURLs {
|
|||||||
// which may be smaller than the originally requested range if the file ends early.
|
// which may be smaller than the originally requested range if the file ends early.
|
||||||
pub byte_range: FileRange,
|
pub byte_range: FileRange,
|
||||||
|
|
||||||
// The xorb retreival URLs. These could be refreshed if need be.
|
// The xorb retrieval URLs. These could be refreshed if need be.
|
||||||
// Indexed by xorb_block_index stored in each XorbBlock.
|
// Indexed by xorb_block_index stored in each XorbBlock.
|
||||||
pub(crate) xorb_block_retrieval_urls: RwLock<(UniqueId, Vec<(String, HttpRange)>)>,
|
// Each entry is (url, http_ranges) to support multi-range V2 blocks.
|
||||||
|
#[allow(clippy::type_complexity)]
|
||||||
|
pub(crate) xorb_block_retrieval_urls: RwLock<(UniqueId, Vec<(String, Vec<HttpRange>)>)>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl TermBlockRetrievalURLs {
|
impl TermBlockRetrievalURLs {
|
||||||
@@ -32,7 +34,7 @@ impl TermBlockRetrievalURLs {
|
|||||||
file_hash: MerkleHash,
|
file_hash: MerkleHash,
|
||||||
byte_range: FileRange,
|
byte_range: FileRange,
|
||||||
acquisition_id: UniqueId,
|
acquisition_id: UniqueId,
|
||||||
retrieval_urls: Vec<(String, HttpRange)>,
|
retrieval_urls: Vec<(String, Vec<HttpRange>)>,
|
||||||
) -> Self {
|
) -> Self {
|
||||||
Self {
|
Self {
|
||||||
file_hash,
|
file_hash,
|
||||||
@@ -41,15 +43,13 @@ impl TermBlockRetrievalURLs {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Gets the retrieval URL for a given xorb block. All URL requests go through
|
/// Gets the retrieval URL and all byte ranges for a given xorb block.
|
||||||
/// this method in order to manage url refreshes; this function returns the
|
/// All URL requests go through this method in order to manage URL refreshes;
|
||||||
/// most recent retrieval URL in the case of a refresh.
|
/// this function returns the most recent retrieval URL in the case of a refresh.
|
||||||
pub async fn get_retrieval_url(&self, xorb_block_index: usize) -> (UniqueId, String, HttpRange) {
|
pub async fn get_retrieval_url(&self, xorb_block_index: usize) -> (UniqueId, String, Vec<HttpRange>) {
|
||||||
let xbru = self.xorb_block_retrieval_urls.read().await;
|
let xbru = self.xorb_block_retrieval_urls.read().await;
|
||||||
|
let (url, url_ranges) = &xbru.1[xorb_block_index];
|
||||||
let (url, url_range) = xbru.1[xorb_block_index].clone();
|
(xbru.0, url.clone(), url_ranges.clone())
|
||||||
|
|
||||||
(xbru.0, url, url_range)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Refresh the retrieval URLs for all xorb blocks in this block.
|
/// Refresh the retrieval URLs for all xorb blocks in this block.
|
||||||
@@ -61,8 +61,7 @@ impl TermBlockRetrievalURLs {
|
|||||||
/// the new request will get a new URL.
|
/// the new request will get a new URL.
|
||||||
pub async fn refresh_retrieval_urls(&self, client: Arc<dyn Client>, acquisition_id: UniqueId) -> Result<()> {
|
pub async fn refresh_retrieval_urls(&self, client: Arc<dyn Client>, acquisition_id: UniqueId) -> Result<()> {
|
||||||
if self.xorb_block_retrieval_urls.read().await.0 != acquisition_id {
|
if self.xorb_block_retrieval_urls.read().await.0 != acquisition_id {
|
||||||
// This means another process has got in here while we're waiting for the lock and
|
// Another task already refreshed while we were waiting for the read lock.
|
||||||
// refreshed them.
|
|
||||||
debug!(
|
debug!(
|
||||||
file_hash = %self.file_hash,
|
file_hash = %self.file_hash,
|
||||||
byte_range = ?(self.byte_range.start, self.byte_range.end),
|
byte_range = ?(self.byte_range.start, self.byte_range.end),
|
||||||
@@ -74,7 +73,7 @@ impl TermBlockRetrievalURLs {
|
|||||||
let mut retrieval_urls = self.xorb_block_retrieval_urls.write().await;
|
let mut retrieval_urls = self.xorb_block_retrieval_urls.write().await;
|
||||||
|
|
||||||
if retrieval_urls.0 != acquisition_id {
|
if retrieval_urls.0 != acquisition_id {
|
||||||
// It's already been refreshed by another process.
|
// Already refreshed by another task while waiting for the write lock.
|
||||||
debug!(
|
debug!(
|
||||||
file_hash = %self.file_hash,
|
file_hash = %self.file_hash,
|
||||||
byte_range = ?(self.byte_range.start, self.byte_range.end),
|
byte_range = ?(self.byte_range.start, self.byte_range.end),
|
||||||
@@ -90,8 +89,7 @@ impl TermBlockRetrievalURLs {
|
|||||||
"Refreshing expired retrieval URLs"
|
"Refreshing expired retrieval URLs"
|
||||||
);
|
);
|
||||||
|
|
||||||
// Since this hopefully doesn't happen too often, go through and retrieve an
|
// Re-fetch the entire block to get fresh URLs, then verify the structure matches.
|
||||||
// entire new block, then make sure everything matches up and take in the new stuff.
|
|
||||||
let Some((returned_range, _transfer_bytes, file_terms)) =
|
let Some((returned_range, _transfer_bytes, file_terms)) =
|
||||||
retrieve_file_term_block(client, self.file_hash, self.byte_range).await?
|
retrieve_file_term_block(client, self.file_hash, self.byte_range).await?
|
||||||
else {
|
else {
|
||||||
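The refresh path above checks the acquisition ID twice: once under the read lock and again after taking the write lock, so that only the first caller holding the current ID re-fetches. A minimal sketch of that double-checked pattern follows. It is not the crate's code: it uses the synchronous `std::sync::RwLock` instead of the async lock, a plain `u64` in place of `UniqueId`, and the names `UrlCache` and `refresh_if_current` are hypothetical.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for `TermBlockRetrievalURLs`: a generation ID plus URLs.
pub struct UrlCache {
    inner: RwLock<(u64, Vec<String>)>,
}

impl UrlCache {
    pub fn new(id: u64, urls: Vec<String>) -> Self {
        Self { inner: RwLock::new((id, urls)) }
    }

    /// Returns the current generation ID together with the URL at `index`.
    pub fn get(&self, index: usize) -> (u64, String) {
        let guard = self.inner.read().unwrap();
        (guard.0, guard.1[index].clone())
    }

    /// Refreshes only if `seen_id` is still the current generation.
    /// Returns true when this call performed the refresh.
    pub fn refresh_if_current(&self, seen_id: u64, fetch: impl FnOnce() -> Vec<String>) -> bool {
        // First check under the read lock; the guard is dropped at the end of this statement.
        let current = self.inner.read().unwrap().0;
        if current != seen_id {
            return false; // someone else already refreshed
        }

        // Second check under the write lock: another caller may have won the race
        // between releasing the read lock and acquiring the write lock.
        let mut guard = self.inner.write().unwrap();
        if guard.0 != seen_id {
            return false;
        }

        guard.1 = fetch();
        guard.0 += 1; // new generation; the real code assigns a fresh unique ID instead
        true
    }
}
```

A caller that observed generation `1` and hits an expired URL would call `refresh_if_current(1, ...)`; any other caller still holding `1` afterwards gets `false` and simply re-reads the URLs.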
@@ -141,11 +139,13 @@ pub struct XorbURLProvider {
|
|||||||
|
|
||||||
#[async_trait::async_trait]
|
#[async_trait::async_trait]
|
||||||
impl URLProvider for XorbURLProvider {
|
impl URLProvider for XorbURLProvider {
|
||||||
async fn retrieve_url(&self) -> std::result::Result<(String, HttpRange), xet_client::cas_client::CasClientError> {
|
async fn retrieve_url(
|
||||||
let (unique_id, url, http_range) = self.url_info.get_retrieval_url(self.xorb_block_index).await;
|
&self,
|
||||||
|
) -> std::result::Result<(String, Vec<HttpRange>), xet_client::cas_client::CasClientError> {
|
||||||
|
let (unique_id, url, http_ranges) = self.url_info.get_retrieval_url(self.xorb_block_index).await;
|
||||||
*self.last_acquisition_id.lock().await = unique_id;
|
*self.last_acquisition_id.lock().await = unique_id;
|
||||||
|
|
||||||
Ok((url, http_range))
|
Ok((url, http_ranges))
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn refresh_url(&self) -> std::result::Result<(), xet_client::cas_client::CasClientError> {
|
async fn refresh_url(&self) -> std::result::Result<(), xet_client::cas_client::CasClientError> {
|
||||||
@@ -155,3 +155,110 @@ impl URLProvider for XorbURLProvider {
|
|||||||
.map_err(|e| xet_client::cas_client::CasClientError::Other(e.to_string()))
|
.map_err(|e| xet_client::cas_client::CasClientError::Other(e.to_string()))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
use tokio::sync::Mutex;
|
||||||
|
use xet_client::cas_client::{ClientTestingUtils, LocalClient, URLProvider};
|
||||||
|
use xet_client::cas_types::{FileRange, HttpRange};
|
||||||
|
use xet_core_structures::merklehash::MerkleHash;
|
||||||
|
use xet_runtime::utils::UniqueId;
|
||||||
|
|
||||||
|
use super::{TermBlockRetrievalURLs, XorbURLProvider};
|
||||||
|
|
||||||
|
fn sample_urls(n: usize) -> Vec<(String, Vec<HttpRange>)> {
|
||||||
|
(0..n)
|
||||||
|
.map(|i| (format!("https://example.com/xorb_{i}"), vec![HttpRange::new(0, 100)]))
|
||||||
|
.collect()
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_new_and_get_retrieval_url() {
|
||||||
|
let id = UniqueId::new();
|
||||||
|
let urls = sample_urls(3);
|
||||||
|
let block = TermBlockRetrievalURLs::new(MerkleHash::default(), FileRange::new(0, 100), id, urls.clone());
|
||||||
|
|
||||||
|
for (i, expected) in urls.iter().enumerate() {
|
||||||
|
let (ret_id, url, ranges) = block.get_retrieval_url(i).await;
|
||||||
|
assert!(ret_id == id, "acquisition ID mismatch for block {i}");
|
||||||
|
assert_eq!(url, expected.0);
|
||||||
|
assert_eq!(ranges, expected.1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_refresh_skipped_when_already_refreshed() {
|
||||||
|
let (client, file_contents) = {
|
||||||
|
let c = LocalClient::temporary().await.unwrap();
|
||||||
|
let fc = c.upload_random_file(&[(1, (0, 3))], 64).await.unwrap();
|
||||||
|
(c, fc)
|
||||||
|
};
|
||||||
|
|
||||||
|
let file_range = FileRange::new(0, file_contents.data.len() as u64);
|
||||||
|
let dyn_client: Arc<dyn xet_client::cas_client::Client> = client.clone();
|
||||||
|
|
||||||
|
let (_, _, file_terms) =
|
||||||
|
super::retrieve_file_term_block(dyn_client.clone(), file_contents.file_hash, file_range)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let url_info = file_terms[0].url_info.clone();
|
||||||
|
|
||||||
|
// Get original acquisition ID
|
||||||
|
let (original_id, _, _) = url_info.get_retrieval_url(0).await;
|
||||||
|
|
||||||
|
// Refresh with a stale (different) ID should be a no-op.
|
||||||
|
let stale_id = UniqueId::new();
|
||||||
|
url_info.refresh_retrieval_urls(dyn_client.clone(), stale_id).await.unwrap();
|
||||||
|
let (id_after, _, _) = url_info.get_retrieval_url(0).await;
|
||||||
|
assert!(id_after == original_id, "refresh with stale ID should not change acquisition ID");
|
||||||
|
|
||||||
|
// Refresh with the correct ID should update URLs.
|
||||||
|
url_info.refresh_retrieval_urls(dyn_client.clone(), original_id).await.unwrap();
|
||||||
|
let (refreshed_id, _, _) = url_info.get_retrieval_url(0).await;
|
||||||
|
assert!(refreshed_id != original_id, "refresh with correct ID should change acquisition ID");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test]
|
||||||
|
async fn test_xorb_url_provider_retrieve_and_refresh() {
|
||||||
|
let (client, file_contents) = {
|
||||||
|
let c = LocalClient::temporary().await.unwrap();
|
||||||
|
let fc = c.upload_random_file(&[(1, (0, 3))], 64).await.unwrap();
|
||||||
|
(c, fc)
|
||||||
|
};
|
||||||
|
|
||||||
|
let file_range = FileRange::new(0, file_contents.data.len() as u64);
|
||||||
|
let dyn_client: Arc<dyn xet_client::cas_client::Client> = client.clone();
|
||||||
|
|
||||||
|
let (_, _, file_terms) =
|
||||||
|
super::retrieve_file_term_block(dyn_client.clone(), file_contents.file_hash, file_range)
|
||||||
|
.await
|
||||||
|
.unwrap()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let url_info = file_terms[0].url_info.clone();
|
||||||
|
|
||||||
|
let provider = XorbURLProvider {
|
||||||
|
client: dyn_client.clone(),
|
||||||
|
url_info,
|
||||||
|
xorb_block_index: 0,
|
||||||
|
last_acquisition_id: Mutex::new(UniqueId::null()),
|
||||||
|
};
|
||||||
|
|
||||||
|
// retrieve_url should succeed and return a valid URL.
|
||||||
|
let (url, ranges) = provider.retrieve_url().await.unwrap();
|
||||||
|
assert!(!url.is_empty());
|
||||||
|
assert!(!ranges.is_empty());
|
||||||
|
|
||||||
|
// refresh_url should succeed (refreshes with the current acquisition ID).
|
||||||
|
provider.refresh_url().await.unwrap();
|
||||||
|
|
||||||
|
// After refresh, retrieve_url should still work with updated URLs.
|
||||||
|
let (url2, ranges2) = provider.retrieve_url().await.unwrap();
|
||||||
|
assert!(!url2.is_empty());
|
||||||
|
assert!(!ranges2.is_empty());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@@ -13,27 +13,49 @@ use super::retrieval_urls::{TermBlockRetrievalURLs, XorbURLProvider};
|
|||||||
use crate::progress_tracking::download_tracking::DownloadTaskUpdater;
|
use crate::progress_tracking::download_tracking::DownloadTaskUpdater;
|
||||||
|
|
||||||
/// Downloaded and decompressed data for a xorb block, including chunk boundary offsets.
|
/// Downloaded and decompressed data for a xorb block, including chunk boundary offsets.
|
||||||
|
///
|
||||||
|
/// A single `XorbBlockData` may hold data from multiple disjoint chunk ranges
|
||||||
|
/// (V2 multi-range fetch). The chunks are concatenated in range order, and
|
||||||
|
/// `chunk_offsets` maps each chunk index to its byte position within `data`.
|
||||||
pub struct XorbBlockData {
|
pub struct XorbBlockData {
|
||||||
pub chunk_offsets: Vec<usize>,
|
/// Pairs of (chunk_index, byte_offset) mapping each chunk to its start position
|
||||||
pub uncompressed_size: u64,
|
/// within `data`. Because the block can span multiple disjoint chunk ranges,
|
||||||
|
/// storing the chunk index alongside the offset avoids ambiguity.
|
||||||
|
pub chunk_offsets: Vec<(usize, usize)>,
|
||||||
|
|
||||||
|
/// The concatenated decompressed chunk data for all ranges in this block.
|
||||||
pub data: Bytes,
|
pub data: Bytes,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// A reference from a file term back to the xorb block it belongs to.
|
||||||
|
/// Used by `determine_size_if_possible` to check whether the block's total
|
||||||
|
/// uncompressed size can be inferred from the terms that reference it.
|
||||||
#[derive(Debug)]
|
#[derive(Debug)]
|
||||||
pub struct XorbReference {
|
pub struct XorbReference {
|
||||||
|
/// The chunk range within the xorb that this file term covers.
|
||||||
pub term_chunks: ChunkRange,
|
pub term_chunks: ChunkRange,
|
||||||
|
/// The uncompressed byte size of this term's data.
|
||||||
pub uncompressed_size: usize,
|
pub uncompressed_size: usize,
|
||||||
}
|
}
|
||||||
|
|
||||||
/// A downloadable xorb block identified by hash and chunk range, with cached data.
|
/// A downloadable xorb block identified by hash and chunk ranges, with cached data.
|
||||||
/// Multiple file terms may reference the same xorb block.
|
///
|
||||||
|
/// A block may contain multiple disjoint chunk ranges from the same xorb (V2 multi-range).
|
||||||
|
/// Multiple file terms may reference the same block. Downloaded data is cached in `data`
|
||||||
|
/// so that the first term to request it triggers the download, and subsequent terms
|
||||||
|
/// reuse the cached result.
|
||||||
pub struct XorbBlock {
|
pub struct XorbBlock {
|
||||||
pub xorb_hash: MerkleHash,
|
pub xorb_hash: MerkleHash,
|
||||||
pub chunk_range: ChunkRange,
|
/// The chunk ranges fetched for this block. For V1 this is a single range;
|
||||||
|
/// for V2 multi-range fetches this may contain multiple disjoint ranges.
|
||||||
|
pub chunk_ranges: Vec<ChunkRange>,
|
||||||
|
/// Index into the parent `TermBlockRetrievalURLs` for URL lookup.
|
||||||
pub xorb_block_index: usize,
|
pub xorb_block_index: usize,
|
||||||
/// All file-term chunk ranges covered by this xorb block, sorted by range start.
|
/// All file-term references covered by this block, sorted by chunk range start.
|
||||||
|
/// Populated during `retrieve_file_term_block` and used to compute `uncompressed_size_if_known`.
|
||||||
pub references: Vec<XorbReference>,
|
pub references: Vec<XorbReference>,
|
||||||
/// Expected decompressed size of the block when known. Used for debug_assert in clients.
|
/// Expected total decompressed size across all chunk ranges, if it can be determined
|
||||||
|
/// from the references. Passed to clients as a debug assertion hint.
|
||||||
pub uncompressed_size_if_known: Option<usize>,
|
pub uncompressed_size_if_known: Option<usize>,
|
||||||
pub data: OnceCell<Arc<XorbBlockData>>,
|
pub data: OnceCell<Arc<XorbBlockData>>,
|
||||||
}
|
}
|
||||||
@@ -41,7 +63,7 @@ pub struct XorbBlock {
|
|||||||
impl PartialEq for XorbBlock {
|
impl PartialEq for XorbBlock {
|
||||||
fn eq(&self, other: &Self) -> bool {
|
fn eq(&self, other: &Self) -> bool {
|
||||||
self.xorb_hash == other.xorb_hash
|
self.xorb_hash == other.xorb_hash
|
||||||
&& self.chunk_range == other.chunk_range
|
&& self.chunk_ranges == other.chunk_ranges
|
||||||
&& self.xorb_block_index == other.xorb_block_index
|
&& self.xorb_block_index == other.xorb_block_index
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -63,6 +85,7 @@ impl XorbBlock {
|
|||||||
) -> Result<Arc<XorbBlockData>> {
|
) -> Result<Arc<XorbBlockData>> {
|
||||||
let xorb_block_index = self.xorb_block_index;
|
let xorb_block_index = self.xorb_block_index;
|
||||||
let uncompressed_size_if_known = self.uncompressed_size_if_known;
|
let uncompressed_size_if_known = self.uncompressed_size_if_known;
|
||||||
|
let chunk_ranges = self.chunk_ranges.clone();
|
||||||
|
|
||||||
self.data
|
self.data
|
||||||
.get_or_try_init(|| async {
|
.get_or_try_init(|| async {
|
||||||
@@ -89,14 +112,18 @@ impl XorbBlock {
|
|||||||
.get_file_term_data(Box::new(url_provider), permit, progress_callback, uncompressed_size_if_known)
|
.get_file_term_data(Box::new(url_provider), permit, progress_callback, uncompressed_size_if_known)
|
||||||
.await?;
|
.await?;
|
||||||
|
|
||||||
let chunk_offsets: Vec<usize> = chunk_byte_offsets.iter().map(|&x| x as usize).collect();
|
// Build chunk_offsets by zipping each chunk index (from all chunk_ranges)
|
||||||
let uncompressed_size = data.len() as u64;
|
// with the corresponding byte offset from the returned data.
|
||||||
|
let mut chunk_offsets = Vec::new();
|
||||||
|
let mut offset_idx = 0;
|
||||||
|
for range in &chunk_ranges {
|
||||||
|
for chunk_idx in range.start..range.end {
|
||||||
|
chunk_offsets.push((chunk_idx as usize, chunk_byte_offsets[offset_idx] as usize));
|
||||||
|
offset_idx += 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
Ok(Arc::new(XorbBlockData {
|
Ok(Arc::new(XorbBlockData { chunk_offsets, data }))
|
||||||
chunk_offsets,
|
|
||||||
uncompressed_size,
|
|
||||||
data,
|
|
||||||
}))
|
|
||||||
})
|
})
|
||||||
.await
|
.await
|
||||||
.cloned()
|
.cloned()
|
||||||
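The new `chunk_offsets` construction above pairs every chunk index from the block's ranges with the byte offset returned for it. The sketch below illustrates the same idea in isolation. It assumes half-open `(start, end)` ranges sorted by start and at least one byte offset per chunk; the function names `index_chunk_offsets` and `offset_of` are hypothetical, and unlike the indexing in the diff it returns `None` rather than panicking when too few offsets are supplied.

```rust
/// Pairs each chunk index in `ranges` (half-open, sorted) with its byte offset.
/// Returns `None` if `byte_offsets` has fewer entries than there are chunks.
pub fn index_chunk_offsets(ranges: &[(u32, u32)], byte_offsets: &[u32]) -> Option<Vec<(usize, usize)>> {
    let mut out = Vec::new();
    let mut offsets = byte_offsets.iter();
    for &(start, end) in ranges {
        for chunk_idx in start..end {
            // One offset is consumed per chunk, in range order.
            let offset = *offsets.next()?;
            out.push((chunk_idx as usize, offset as usize));
        }
    }
    Some(out)
}

/// Looks up the byte offset of `chunk_idx`; `None` if the chunk lies in a gap.
/// Relies on `chunk_offsets` being sorted by chunk index.
pub fn offset_of(chunk_offsets: &[(usize, usize)], chunk_idx: usize) -> Option<usize> {
    chunk_offsets
        .binary_search_by_key(&chunk_idx, |&(idx, _)| idx)
        .ok()
        .map(|pos| chunk_offsets[pos].1)
}
```

Storing the chunk index next to the offset is what makes the lookup unambiguous: with ranges `(0, 2)` and `(5, 7)`, position 2 in the vector belongs to chunk 5, not chunk 2.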
@@ -105,33 +132,67 @@ impl XorbBlock {
|
|||||||
/// Determines the total uncompressed size of the xorb block from the reference terms,
|
/// Determines the total uncompressed size of the xorb block from the reference terms,
|
||||||
/// if possible.
|
/// if possible.
|
||||||
///
|
///
|
||||||
/// The size can be determined when:
|
/// Uses a forward-chaining DP: starting from the first chunk range's start,
|
||||||
/// 1. A single term's chunk range exactly matches the full xorb range, or
|
/// we track which chunk positions are "reachable" (i.e., fully covered by a
|
||||||
/// 2. A chain of term chunk ranges exactly covers the full xorb range with no gaps (e.g. [0..3, 3..5] covers 0..5).
|
/// contiguous chain of terms) along with the accumulated uncompressed size.
|
||||||
///
|
///
|
||||||
/// The `terms` slice must be sorted by chunk range start index.
|
/// For multi-range blocks with disjoint chunk ranges (e.g. `[0,3)` and `[5,8)`),
|
||||||
pub fn determine_size_if_possible(xorb_range: ChunkRange, terms: &[XorbReference]) -> Option<usize> {
|
/// the gaps between ranges are inserted as zero-cost bridges. This lets the DP
|
||||||
|
/// traverse the full set of ranges in a single pass — a gap `[3,5)` contributes
|
||||||
|
/// no data but connects the end of one range to the start of the next.
|
||||||
|
///
|
||||||
|
/// Returns `Some(total_size)` if every range is fully covered, `None` otherwise.
|
||||||
|
///
|
||||||
|
/// The `terms` slice must be sorted by `term_chunks.start`.
|
||||||
|
pub fn determine_size_if_possible(xorb_ranges: &[ChunkRange], terms: &[XorbReference]) -> Option<usize> {
|
||||||
debug_assert!(
|
debug_assert!(
|
||||||
terms.windows(2).all(|w| w[0].term_chunks.start <= w[1].term_chunks.start),
|
terms.windows(2).all(|w| w[0].term_chunks.start <= w[1].term_chunks.start),
|
||||||
"terms must be sorted by chunk range start"
|
"terms must be sorted by chunk range start"
|
||||||
);
|
);
|
||||||
|
|
||||||
// DP approach: track which chunk endpoints are reachable from xorb_range.start
|
debug_assert!(
|
||||||
// via contiguous chains, along with accumulated uncompressed sizes.
|
terms.iter().all(|term| xorb_ranges
|
||||||
// This correctly handles multiple terms with the same start index by
|
.iter()
|
||||||
// considering all possible chain continuations.
|
.any(|r| term.term_chunks.start >= r.start && term.term_chunks.end <= r.end)),
|
||||||
let mut reachable: BTreeMap<u32, usize> = BTreeMap::new();
|
"all terms must fall within one of the xorb ranges"
|
||||||
reachable.insert(xorb_range.start, 0);
|
);
|
||||||
|
|
||||||
|
if xorb_ranges.is_empty() {
|
||||||
|
return Some(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Build a lookup from range-end -> next-range-start for gap bridging.
|
||||||
|
// E.g. for ranges [0,3) and [5,8), maps 3 -> 5, meaning once chunk 3
|
||||||
|
// is reachable we can bridge to chunk 5 at zero cost.
|
||||||
|
let gap_bridges: BTreeMap<u32, u32> = xorb_ranges
|
||||||
|
.windows(2)
|
||||||
|
.filter(|pair| pair[0].end < pair[1].start)
|
||||||
|
.map(|pair| (pair[0].end, pair[1].start))
|
||||||
|
.collect();
|
||||||
|
|
||||||
|
// DP map: chunk position -> accumulated uncompressed size to reach that position.
|
||||||
|
// Seed with the start of the first range.
|
||||||
|
let mut reachable: BTreeMap<u32, usize> = BTreeMap::new();
|
||||||
|
reachable.insert(xorb_ranges[0].start, 0);
|
||||||
|
|
||||||
|
// Process terms in sorted order, extending reachable positions.
|
||||||
for term in terms {
|
for term in terms {
|
||||||
if let Some(&accumulated) = reachable.get(&term.term_chunks.start) {
|
if let Some(&accumulated) = reachable.get(&term.term_chunks.start) {
|
||||||
reachable
|
let new_end = term.term_chunks.end;
|
||||||
.entry(term.term_chunks.end)
|
let new_size = accumulated + term.uncompressed_size;
|
||||||
.or_insert(accumulated + term.uncompressed_size);
|
|
||||||
|
reachable.entry(new_end).or_insert(new_size);
|
||||||
|
|
||||||
|
// If this term reaches the end of a range that has a gap bridge,
|
||||||
|
// make the start of the next range reachable at the same accumulated size.
|
||||||
|
if let Some(&bridge_target) = gap_bridges.get(&new_end) {
|
||||||
|
reachable.entry(bridge_target).or_insert(new_size);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
reachable.get(&xorb_range.end).copied()
|
// The block is fully covered if we can reach the end of the last range.
|
||||||
|
reachable.get(&xorb_ranges.last().unwrap().end).copied()
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -153,197 +214,210 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_single_term_exact_match() {
|
fn test_single_term_exact_match() {
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 5), 1000)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 5), 1000)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_two_terms_chained() {
|
fn test_two_terms_chained() {
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 3), 600), (ChunkRange::new(3, 5), 400)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 3), 600), (ChunkRange::new(3, 5), 400)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_three_terms_chained() {
|
fn test_three_terms_chained() {
|
||||||
let xorb_range = ChunkRange::new(0, 6);
|
let ranges = &[ChunkRange::new(0, 6)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 200),
|
(ChunkRange::new(0, 2), 200),
|
||||||
(ChunkRange::new(2, 4), 300),
|
(ChunkRange::new(2, 4), 300),
|
||||||
(ChunkRange::new(4, 6), 500),
|
(ChunkRange::new(4, 6), 500),
|
||||||
]);
|
]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_gap_in_chain() {
|
fn test_gap_in_chain() {
|
||||||
let xorb_range = ChunkRange::new(0, 6);
|
let ranges = &[ChunkRange::new(0, 6)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 2), 200), (ChunkRange::new(4, 6), 500)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 2), 200), (ChunkRange::new(4, 6), 500)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_does_not_start_at_xorb_start() {
|
fn test_does_not_start_at_xorb_start() {
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(1, 5), 800)]);
|
let terms = build_refs(&[(ChunkRange::new(1, 5), 800)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_does_not_end_at_xorb_end() {
|
fn test_does_not_end_at_xorb_end() {
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 3), 600)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 3), 600)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_empty_terms() {
|
fn test_empty_terms() {
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms: Vec<XorbReference> = vec![];
|
let terms: Vec<XorbReference> = vec![];
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_overlapping_terms_with_exact_cover() {
|
fn test_overlapping_terms_with_exact_cover() {
|
||||||
// Terms [0..3, 1..4, 3..5] - the chain 0..3, 3..5 covers 0..5.
|
// Terms [0..3, 1..4, 3..5] - the chain 0..3, 3..5 covers 0..5.
|
||||||
// The overlapping term 1..4 should be skipped.
|
// The overlapping term 1..4 should be skipped.
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 3), 600),
|
(ChunkRange::new(0, 3), 600),
|
||||||
(ChunkRange::new(1, 4), 700),
|
(ChunkRange::new(1, 4), 700),
|
||||||
(ChunkRange::new(3, 5), 400),
|
(ChunkRange::new(3, 5), 400),
|
||||||
]);
|
]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_duplicate_terms_first_covers() {
|
fn test_duplicate_terms_first_covers() {
|
||||||
// Two identical terms covering the full range.
|
// Two identical terms covering the full range.
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 5), 1000), (ChunkRange::new(0, 5), 1000)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 5), 1000), (ChunkRange::new(0, 5), 1000)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_nonzero_xorb_start() {
|
fn test_nonzero_xorb_start() {
|
||||||
let xorb_range = ChunkRange::new(3, 8);
|
let ranges = &[ChunkRange::new(3, 8)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(3, 5), 400), (ChunkRange::new(5, 8), 600)]);
|
let terms = build_refs(&[(ChunkRange::new(3, 5), 400), (ChunkRange::new(5, 8), 600)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(1000));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(1000));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_nonzero_xorb_start_no_match() {
|
fn test_nonzero_xorb_start_no_match() {
|
||||||
let xorb_range = ChunkRange::new(3, 8);
|
let ranges = &[ChunkRange::new(3, 8)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(3, 5), 400)]);
|
let terms = build_refs(&[(ChunkRange::new(3, 5), 400)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_single_chunk_range() {
|
fn test_single_chunk_range() {
|
||||||
let xorb_range = ChunkRange::new(0, 1);
|
let ranges = &[ChunkRange::new(0, 1)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 1), 42)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 1), 42)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(42));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(42));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_chain_with_extra_terms_before_and_after() {
|
fn test_chain_with_overlapping_inner_terms() {
|
||||||
// Extra terms that don't participate in the chain but are within the sorted list.
|
let ranges = &[ChunkRange::new(2, 8)];
|
||||||
let xorb_range = ChunkRange::new(2, 8);
|
// The overlapping term [3,6) is within the range but doesn't form
|
||||||
|
// a better chain than [2,5) + [5,8), so it's harmlessly ignored.
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 100), // before xorb range
|
(ChunkRange::new(2, 5), 500),
|
||||||
(ChunkRange::new(2, 5), 500), // chain start
|
(ChunkRange::new(3, 6), 999),
|
||||||
(ChunkRange::new(3, 6), 999), // overlapping, skipped
|
(ChunkRange::new(5, 8), 300),
|
||||||
(ChunkRange::new(5, 8), 300), // chain end
|
|
||||||
(ChunkRange::new(8, 10), 200), // after xorb range
|
|
||||||
]);
|
]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(800));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(800));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_partial_overlap_no_cover() {
|
fn test_partial_overlap_no_cover() {
|
||||||
// Terms partially overlap but don't form a contiguous chain covering the full range.
|
// Terms partially overlap but don't form a contiguous chain covering the full range.
|
||||||
let xorb_range = ChunkRange::new(0, 10);
|
let ranges = &[ChunkRange::new(0, 10)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 4), 400),
|
(ChunkRange::new(0, 4), 400),
|
||||||
(ChunkRange::new(3, 7), 400),
|
(ChunkRange::new(3, 7), 400),
|
||||||
(ChunkRange::new(6, 10), 400),
|
(ChunkRange::new(6, 10), 400),
|
||||||
]);
|
]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_short_then_long_covering_full() {
|
fn test_same_start_short_then_long_covering_full() {
|
||||||
// Short range first, then a long range that covers the full xorb.
|
// Short range first, then a long range that covers the full xorb.
|
||||||
let xorb_range = ChunkRange::new(0, 5);
|
let ranges = &[ChunkRange::new(0, 5)];
|
||||||
let terms = build_refs(&[(ChunkRange::new(0, 3), 300), (ChunkRange::new(0, 5), 500)]);
|
let terms = build_refs(&[(ChunkRange::new(0, 3), 300), (ChunkRange::new(0, 5), 500)]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(500));
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(500));
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_short_then_long_with_chain() {
|
fn test_same_start_short_then_long_with_chain() {
|
||||||
// Short range first, then a longer range, where the short range can also chain.
|
// Short range first, then a longer range, where the short range can also chain.
|
||||||
let xorb_range = ChunkRange::new(0, 6);
|
// Chain via 0..3 + 3..6 = 600
|
||||||
|
let ranges = &[ChunkRange::new(0, 6)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 200),
|
(ChunkRange::new(0, 2), 200),
|
||||||
(ChunkRange::new(0, 3), 300),
|
(ChunkRange::new(0, 3), 300),
|
||||||
(ChunkRange::new(3, 6), 300),
|
(ChunkRange::new(3, 6), 300),
|
||||||
]);
|
]);
|
||||||
// Chain via 0..3 + 3..6 = 600
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(600));
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(600));
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_multiple_duplicates_chain_through_second() {
|
fn test_same_start_multiple_duplicates_chain_through_second() {
|
||||||
// Multiple terms at start 0 with different lengths; only the middle one chains.
|
// Multiple terms at start 0 with different lengths; only the middle one chains.
|
||||||
let xorb_range = ChunkRange::new(0, 6);
|
// Chain via 0..4 + 4..6 = 600
|
||||||
|
let ranges = &[ChunkRange::new(0, 6)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 200),
|
(ChunkRange::new(0, 2), 200),
|
||||||
(ChunkRange::new(0, 4), 400),
|
(ChunkRange::new(0, 4), 400),
|
||||||
(ChunkRange::new(0, 5), 500),
|
(ChunkRange::new(0, 5), 500),
|
||||||
(ChunkRange::new(4, 6), 200),
|
(ChunkRange::new(4, 6), 200),
|
||||||
]);
|
]);
|
||||||
// Chain via 0..4 + 4..6 = 600
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(600));
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(600));
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_at_midpoint() {
|
fn test_same_start_at_midpoint() {
|
||||||
// Duplicate starts at a midpoint in the chain, not just at the beginning.
|
// Duplicate starts at a midpoint in the chain, not just at the beginning.
|
||||||
let xorb_range = ChunkRange::new(0, 8);
|
// Chain via 0..3 + 3..6 + 6..8 = 800
|
||||||
|
let ranges = &[ChunkRange::new(0, 8)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 3), 300),
|
(ChunkRange::new(0, 3), 300),
|
||||||
(ChunkRange::new(3, 5), 200),
|
(ChunkRange::new(3, 5), 200),
|
||||||
(ChunkRange::new(3, 6), 300),
|
(ChunkRange::new(3, 6), 300),
|
||||||
(ChunkRange::new(6, 8), 200),
|
(ChunkRange::new(6, 8), 200),
|
||||||
]);
|
]);
|
||||||
// Chain via 0..3 + 3..6 + 6..8 = 800
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(800));
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(800));
|
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_none_covers() {
|
fn test_same_start_none_covers() {
|
||||||
// Multiple terms at start 0, but none chain to cover the full range.
|
// Multiple terms at start 0, but none chain to cover the full range.
|
||||||
let xorb_range = ChunkRange::new(0, 10);
|
let ranges = &[ChunkRange::new(0, 10)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 200),
|
(ChunkRange::new(0, 2), 200),
|
||||||
(ChunkRange::new(0, 4), 400),
|
(ChunkRange::new(0, 4), 400),
|
||||||
(ChunkRange::new(0, 6), 600),
|
(ChunkRange::new(0, 6), 600),
|
||||||
]);
|
]);
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), None);
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn test_same_start_two_groups_chained() {
|
fn test_same_start_two_groups_chained() {
|
||||||
// Two groups of duplicate-start terms that chain together.
|
// Two groups of duplicate-start terms that chain together.
|
||||||
let xorb_range = ChunkRange::new(0, 6);
|
// Chain via 0..3 + 3..6 = 600
|
||||||
|
let ranges = &[ChunkRange::new(0, 6)];
|
||||||
let terms = build_refs(&[
|
let terms = build_refs(&[
|
||||||
(ChunkRange::new(0, 2), 200),
|
(ChunkRange::new(0, 2), 200),
|
||||||
(ChunkRange::new(0, 3), 300),
|
(ChunkRange::new(0, 3), 300),
|
||||||
(ChunkRange::new(3, 5), 200),
|
(ChunkRange::new(3, 5), 200),
|
||||||
(ChunkRange::new(3, 6), 300),
|
(ChunkRange::new(3, 6), 300),
|
||||||
]);
|
]);
|
||||||
// Chain via 0..3 + 3..6 = 600
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(600));
|
||||||
assert_eq!(XorbBlock::determine_size_if_possible(xorb_range, &terms), Some(600));
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_multiple_disjoint_ranges_both_covered() {
|
||||||
|
let ranges = &[ChunkRange::new(0, 3), ChunkRange::new(5, 8)];
|
||||||
|
let terms = build_refs(&[(ChunkRange::new(0, 3), 300), (ChunkRange::new(5, 8), 400)]);
|
||||||
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), Some(700));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn test_multiple_disjoint_ranges_one_uncovered() {
|
||||||
|
let ranges = &[ChunkRange::new(0, 3), ChunkRange::new(5, 8)];
|
||||||
|
let terms = build_refs(&[(ChunkRange::new(0, 3), 300)]);
|
||||||
|
assert_eq!(XorbBlock::determine_size_if_possible(ranges, &terms), None);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
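The gap-bridging DP in `determine_size_if_possible` can be read in isolation as the following minimal sketch. `Range`, `Term`, and `total_size_if_covered` are hypothetical stand-ins for `ChunkRange`, `XorbReference`, and the real method; only the control flow is taken from the diff above, and the sketch assumes `terms` is already sorted by start index and `ranges` is sorted and disjoint.

```rust
use std::collections::BTreeMap;

/// Half-open chunk range `[start, end)`; hypothetical stand-in for `ChunkRange`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: u32,
    pub end: u32,
}

/// Chunks covered by a term plus its uncompressed size; hypothetical stand-in
/// for `XorbReference`.
#[derive(Debug, Clone, Copy)]
pub struct Term {
    pub chunks: Range,
    pub size: usize,
}

/// Returns `Some(total)` if contiguous chains of terms cover every range,
/// `None` otherwise. Assumes `terms` is sorted by `chunks.start`.
pub fn total_size_if_covered(ranges: &[Range], terms: &[Term]) -> Option<usize> {
    if ranges.is_empty() {
        return Some(0);
    }

    // range end -> next range start, for every real gap between ranges.
    let bridges: BTreeMap<u32, u32> = ranges
        .windows(2)
        .filter(|pair| pair[0].end < pair[1].start)
        .map(|pair| (pair[0].end, pair[1].start))
        .collect();

    // chunk position -> accumulated size needed to reach it.
    let mut reachable: BTreeMap<u32, usize> = BTreeMap::new();
    reachable.insert(ranges[0].start, 0);

    for term in terms {
        if let Some(&accumulated) = reachable.get(&term.chunks.start) {
            let size = accumulated + term.size;
            reachable.entry(term.chunks.end).or_insert(size);
            // Crossing a gap costs nothing: the next range start becomes
            // reachable at the same accumulated size.
            if let Some(&next_start) = bridges.get(&term.chunks.end) {
                reachable.entry(next_start).or_insert(size);
            }
        }
    }

    reachable.get(&ranges[ranges.len() - 1].end).copied()
}
```

With ranges `[0,3)` and `[5,8)`, a term ending at chunk 3 makes chunk 5 reachable through the bridge, so a later term starting at 5 can extend the chain to the end of the last range.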
|
|||||||
@@ -7,8 +7,8 @@ use anyhow::Result;
|
|||||||
use clap::{Args, Parser, Subcommand};
|
use clap::{Args, Parser, Subcommand};
|
||||||
use http::header::{self, HeaderMap, HeaderValue};
|
use http::header::{self, HeaderMap, HeaderValue};
|
||||||
use walkdir::WalkDir;
|
use walkdir::WalkDir;
|
||||||
|
use xet_client::cas_client::RemoteClient;
|
||||||
use xet_client::cas_client::auth::TokenRefresher;
|
use xet_client::cas_client::auth::TokenRefresher;
|
||||||
use xet_client::cas_client::{Client, RemoteClient};
|
|
||||||
use xet_client::cas_types::{FileRange, QueryReconstructionResponse};
|
use xet_client::cas_types::{FileRange, QueryReconstructionResponse};
|
||||||
use xet_client::hub_client::{BearerCredentialHelper, HubClient, Operation, RepoInfo};
|
use xet_client::hub_client::{BearerCredentialHelper, HubClient, Operation, RepoInfo};
|
||||||
use xet_core_structures::merklehash::MerkleHash;
|
use xet_core_structures::merklehash::MerkleHash;
|
||||||
@@ -230,8 +230,9 @@ async fn query_reconstruction(
|
|||||||
cas_storage_config.custom_headers.clone(),
|
cas_storage_config.custom_headers.clone(),
|
||||||
);
|
);
|
||||||
|
|
||||||
|
// Use V1 directly so the query tool returns the raw QueryReconstructionResponse for inspection.
|
||||||
remote_client
|
remote_client
|
||||||
.get_reconstruction(&file_hash, bytes_range)
|
.get_reconstruction_v1(&file_hash, bytes_range)
|
||||||
.await
|
.await
|
||||||
.map_err(anyhow::Error::from)
|
.map_err(anyhow::Error::from)
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -15,6 +15,48 @@ use super::file_cleaner::Sha256Policy;
|
|||||||
use super::{FileDownloadSession, FileUploadSession, XetFileInfo};
|
use super::{FileDownloadSession, FileUploadSession, XetFileInfo};
|
||||||
use crate::progress_tracking::TrackingProgressUpdater;
|
use crate::progress_tracking::TrackingProgressUpdater;
|
||||||
|
|
||||||
|
/// Describes how hydration (download/smudge) should be performed during a test.
|
||||||
|
///
|
||||||
|
/// Each variant exercises a different reconstruction path:
|
||||||
|
/// - `DirectClient`: Uses `LocalClient` directly (no HTTP server).
|
||||||
|
/// - `ServerV2`: Uses `LocalTestServer` with default V2 reconstruction.
|
||||||
|
/// - `ServerV1Fallback`: Uses `LocalTestServer` with V2 disabled, forcing V1 fallback.
|
||||||
|
/// - `ServerMaxRanges2`: Uses `LocalTestServer` with `max_ranges_per_fetch=2`, forcing multi-range fetch splitting in
|
||||||
|
/// V2 responses.
|
||||||
|
#[derive(Debug, Clone, Copy)]
|
||||||
|
pub enum HydrationMode {
|
||||||
|
DirectClient,
|
||||||
|
ServerV2,
|
||||||
|
ServerV1Fallback,
|
||||||
|
ServerMaxRanges2,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl HydrationMode {
|
||||||
|
pub fn all() -> &'static [HydrationMode] {
|
||||||
|
&[
|
||||||
|
HydrationMode::DirectClient,
|
||||||
|
HydrationMode::ServerV2,
|
||||||
|
HydrationMode::ServerV1Fallback,
|
||||||
|
HydrationMode::ServerMaxRanges2,
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn uses_server(&self) -> bool {
|
||||||
|
!matches!(self, HydrationMode::DirectClient)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl std::fmt::Display for HydrationMode {
|
||||||
|
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||||
|
match self {
|
||||||
|
HydrationMode::DirectClient => write!(f, "direct_client"),
|
||||||
|
HydrationMode::ServerV2 => write!(f, "server_v2"),
|
||||||
|
HydrationMode::ServerV1Fallback => write!(f, "server_v1_fallback"),
|
||||||
|
HydrationMode::ServerMaxRanges2 => write!(f, "server_max_ranges_2"),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Creates or overwrites a single file in `dir` with `size` bytes of random data.
|
/// Creates or overwrites a single file in `dir` with `size` bytes of random data.
|
||||||
/// Panics on any I/O error. Returns the total number of bytes written (=`size`).
|
/// Panics on any I/O error. Returns the total number of bytes written (=`size`).
|
||||||
pub fn create_random_file(path: impl AsRef<Path>, size: usize, seed: u64) -> usize {
|
pub fn create_random_file(path: impl AsRef<Path>, size: usize, seed: u64) -> usize {
|
||||||
@@ -174,6 +216,44 @@ impl HydrateDehydrateTest {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Creates a new test harness configured for a specific hydration mode.
|
||||||
|
pub fn for_mode(mode: HydrationMode) -> Self {
|
||||||
|
Self::new(mode.uses_server())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Applies hydration mode configuration to the test server.
|
||||||
|
/// Must be called after `dehydrate()` and before `hydrate()`.
|
||||||
|
pub async fn apply_hydration_mode(&mut self, mode: HydrationMode) {
|
||||||
|
match mode {
|
||||||
|
HydrationMode::DirectClient => {},
|
||||||
|
HydrationMode::ServerV2 => {
|
||||||
|
self.ensure_server_created().await;
|
||||||
|
},
|
||||||
|
HydrationMode::ServerV1Fallback => {
|
||||||
|
self.ensure_server_created().await;
|
||||||
|
self.test_server.as_ref().unwrap().client().disable_v2_reconstruction(404);
|
||||||
|
},
|
||||||
|
HydrationMode::ServerMaxRanges2 => {
|
||||||
|
self.ensure_server_created().await;
|
||||||
|
self.test_server.as_ref().unwrap().client().set_max_ranges_per_fetch(2);
|
||||||
|
},
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Ensures the test server is running, creating it if necessary.
|
||||||
|
/// Call this before configuring the server (e.g., disabling V2 or setting max ranges).
|
||||||
|
pub async fn ensure_server_created(&mut self) {
|
||||||
|
if self.use_test_server && self.test_server.is_none() {
|
||||||
|
let local_client = LocalClient::new(self.cas_dir.join("xet/xorbs")).await.unwrap();
|
||||||
|
self.test_server = Some(LocalTestServerBuilder::new().with_client(local_client).start().await);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns a reference to the test server, if one has been created.
|
||||||
|
pub fn test_server(&self) -> Option<&LocalTestServer> {
|
||||||
|
self.test_server.as_ref()
|
||||||
|
}
|
||||||
|
|
||||||
/// Lazily initializes the test server (if needed) and returns a CAS client.
|
/// Lazily initializes the test server (if needed) and returns a CAS client.
|
||||||
async fn get_or_create_client(&mut self) -> Arc<dyn Client> {
|
async fn get_or_create_client(&mut self) -> Arc<dyn Client> {
|
||||||
if self.use_test_server {
|
if self.use_test_server {
|
||||||
|
|||||||
@@ -10,18 +10,25 @@ test_set_constants! {
|
|||||||
MAX_XORB_CHUNKS = 8;
|
MAX_XORB_CHUNKS = 8;
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Runs clean/smudge test with all combinations of (use_test_server, sequential).
|
/// Runs clean/smudge test with all combinations of (hydration_mode, sequential).
|
||||||
/// Each combination runs sequentially with its own HydrateDehydrateTest instance to avoid
|
/// Each combination runs sequentially with its own HydrateDehydrateTest instance to avoid
|
||||||
/// too many open files.
|
/// too many open files.
|
||||||
|
///
|
||||||
|
/// This exercises every hydration path for every test case:
|
||||||
|
/// - DirectClient: LocalClient without a server
|
||||||
|
/// - ServerV2: LocalTestServer with default V2 reconstruction
|
||||||
|
/// - ServerV1Fallback: LocalTestServer with V2 disabled (tests V1-to-V2 conversion)
|
||||||
|
/// - ServerMaxRanges2: LocalTestServer with max_ranges_per_fetch=2 (tests fetch splitting)
|
||||||
pub async fn check_clean_smudge_files(file_list: &[(impl AsRef<str> + Clone, usize)]) {
|
pub async fn check_clean_smudge_files(file_list: &[(impl AsRef<str> + Clone, usize)]) {
|
||||||
for use_server in [false, true] {
|
for &mode in HydrationMode::all() {
|
||||||
for sequential in [true, false] {
|
for sequential in [true, false] {
|
||||||
eprintln!("Testing use_test_server={use_server}, sequential={sequential}");
|
eprintln!("Testing mode={mode}, sequential={sequential}");
|
||||||
|
|
||||||
let mut ts = HydrateDehydrateTest::new(use_server);
|
let mut ts = HydrateDehydrateTest::for_mode(mode);
|
||||||
create_random_files(&ts.src_dir, file_list, 0);
|
create_random_files(&ts.src_dir, file_list, 0);
|
||||||
|
|
||||||
ts.dehydrate(sequential).await;
|
ts.dehydrate(sequential).await;
|
||||||
|
ts.apply_hydration_mode(mode).await;
|
||||||
ts.hydrate().await;
|
ts.hydrate().await;
|
||||||
ts.verify_src_dest_match();
|
ts.verify_src_dest_match();
|
||||||
ts.hydrate_partitioned_writers(4).await;
|
ts.hydrate_partitioned_writers(4).await;
|
||||||
@@ -35,18 +42,21 @@ pub async fn check_clean_smudge_files(file_list: &[(impl AsRef<str> + Clone, usi
|
|||||||
/// Helper for multipart tests:
|
/// Helper for multipart tests:
|
||||||
/// - takes a slice of `(String, Vec<(u64, u64)>)` which fully specifies each file.
|
/// - takes a slice of `(String, Vec<(u64, u64)>)` which fully specifies each file.
|
||||||
/// - for each file, calls `create_random_multipart_file` with the given segments.
|
/// - for each file, calls `create_random_multipart_file` with the given segments.
|
||||||
|
///
|
||||||
|
/// Exercises all hydration modes just like `check_clean_smudge_files`.
|
||||||
async fn check_clean_smudge_files_multipart(file_specs: &[(String, Vec<(usize, u64)>)]) {
|
async fn check_clean_smudge_files_multipart(file_specs: &[(String, Vec<(usize, u64)>)]) {
|
||||||
for use_server in [false, true] {
|
for &mode in HydrationMode::all() {
|
||||||
for sequential in [true, false] {
|
for sequential in [true, false] {
|
||||||
eprintln!("Testing use_test_server={use_server}, sequential={sequential}");
|
eprintln!("Testing mode={mode}, sequential={sequential}");
|
||||||
|
|
||||||
let mut ts = HydrateDehydrateTest::new(use_server);
|
let mut ts = HydrateDehydrateTest::for_mode(mode);
|
||||||
|
|
||||||
for (file_name, segments) in file_specs {
|
for (file_name, segments) in file_specs {
|
||||||
create_random_multipart_file(ts.src_dir.join(file_name), segments);
|
create_random_multipart_file(ts.src_dir.join(file_name), segments);
|
||||||
}
|
}
|
||||||
|
|
||||||
ts.dehydrate(sequential).await;
|
ts.dehydrate(sequential).await;
|
||||||
|
ts.apply_hydration_mode(mode).await;
|
||||||
ts.hydrate().await;
|
ts.hydrate().await;
|
||||||
ts.verify_src_dest_match();
|
ts.verify_src_dest_match();
|
||||||
ts.hydrate_partitioned_writers(4).await;
|
ts.hydrate_partitioned_writers(4).await;
|
||||||
|
|||||||
71
xet_data/tests/test_clean_smudge_multirange.rs
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
//! Clean/smudge integration tests with `enable_multirange_fetching = true`.
|
||||||
|
//!
|
||||||
|
//! This test binary is a separate copy of a subset of the clean/smudge tests
|
||||||
|
//! that runs with `enable_multirange_fetching` enabled, exercising the
|
||||||
|
//! multirange HTTP request path rather than the default single-range splitting.
|
||||||
|
|
||||||
|
use xet_data::deduplication::constants::{MAX_XORB_BYTES, MAX_XORB_CHUNKS, TARGET_CHUNK_SIZE};
|
||||||
|
use xet_data::processing::test_utils::*;
|
||||||
|
use xet_runtime::{test_set_config, test_set_constants};
|
||||||
|
|
||||||
|
test_set_constants! {
|
||||||
|
TARGET_CHUNK_SIZE = 1024;
|
||||||
|
MAX_XORB_BYTES = 5 * (*TARGET_CHUNK_SIZE);
|
||||||
|
MAX_XORB_CHUNKS = 8;
|
||||||
|
}
|
||||||
|
|
||||||
|
test_set_config! {
|
||||||
|
client {
|
||||||
|
enable_multirange_fetching = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod testing_clean_smudge_multirange {
|
||||||
|
use super::*;
|
||||||
|
|
||||||
|
pub async fn check_clean_smudge_files(file_list: &[(impl AsRef<str> + Clone, usize)]) {
|
||||||
|
for &mode in HydrationMode::all() {
|
||||||
|
for sequential in [true, false] {
|
||||||
|
eprintln!("Testing mode={mode}, sequential={sequential} (forced multirange)");
|
||||||
|
|
||||||
|
let mut ts = HydrateDehydrateTest::for_mode(mode);
|
||||||
|
create_random_files(&ts.src_dir, file_list, 0);
|
||||||
|
|
||||||
|
ts.dehydrate(sequential).await;
|
||||||
|
ts.apply_hydration_mode(mode).await;
|
||||||
|
ts.hydrate().await;
|
||||||
|
ts.verify_src_dest_match();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||||
|
async fn test_simple_directory() {
|
||||||
|
check_clean_smudge_files(&[("a", 16)]).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||||
|
async fn test_multiple() {
|
||||||
|
check_clean_smudge_files(&[("a", 16), ("b", 8)]).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||||
|
async fn test_single_large() {
|
||||||
|
check_clean_smudge_files(&[("a", *MAX_XORB_BYTES + 1)]).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||||
|
async fn test_multiple_large() {
|
||||||
|
check_clean_smudge_files(&[("a", *MAX_XORB_BYTES + 1), ("b", *MAX_XORB_BYTES + 2)]).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||||
|
async fn test_many_small_multiple_xorbs() {
|
||||||
|
let n = 16;
|
||||||
|
let size = *MAX_XORB_BYTES / 8 + 1;
|
||||||
|
|
||||||
|
let files: Vec<_> = (0..n).map(|idx| (format!("f_{idx}"), size)).collect();
|
||||||
|
check_clean_smudge_files(&files).await;
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -217,7 +217,7 @@ crate::config_group!({
|
|||||||
/// The default value is 2.
|
/// The default value is 2.
|
||||||
///
|
///
|
||||||
/// Use the environment variable `HF_XET_CLIENT_AC_INITIAL_UPLOAD_CONCURRENCY` to set this value.
|
/// Use the environment variable `HF_XET_CLIENT_AC_INITIAL_UPLOAD_CONCURRENCY` to set this value.
|
||||||
ref ac_initial_upload_concurrency: usize = 1;
|
ref ac_initial_upload_concurrency: usize = 2;
|
||||||
|
|
||||||
/// The maximum number of simultaneous download streams permitted by
|
/// The maximum number of simultaneous download streams permitted by
|
||||||
/// the adaptive concurrency control.
|
/// the adaptive concurrency control.
|
||||||
@@ -238,10 +238,10 @@ crate::config_group!({
|
|||||||
/// The starting number of concurrent download streams, which will increase up to max_concurrent_downloads
|
/// The starting number of concurrent download streams, which will increase up to max_concurrent_downloads
|
||||||
/// on successful completions.
|
/// on successful completions.
|
||||||
///
|
///
|
||||||
/// The default value is 1.
|
/// The default value is 4.
|
||||||
///
|
///
|
||||||
/// Use the environment variable `HF_XET_CLIENT_AC_INITIAL_DOWNLOAD_CONCURRENCY` to set this value.
|
/// Use the environment variable `HF_XET_CLIENT_AC_INITIAL_DOWNLOAD_CONCURRENCY` to set this value.
|
||||||
ref ac_initial_download_concurrency: usize = 1;
|
ref ac_initial_download_concurrency: usize = 4;
|
||||||
|
|
||||||
/// Path to Unix domain socket for CAS HTTP connections.
|
/// Path to Unix domain socket for CAS HTTP connections.
|
||||||
/// When set, all CAS HTTP traffic uses this socket instead of TCP.
|
/// When set, all CAS HTTP traffic uses this socket instead of TCP.
|
||||||
@@ -252,4 +252,24 @@ crate::config_group!({
|
|||||||
/// Use the environment variable `HF_XET_CLIENT_UNIX_SOCKET_PATH` to set this value.
|
/// Use the environment variable `HF_XET_CLIENT_UNIX_SOCKET_PATH` to set this value.
|
||||||
ref unix_socket_path: Option<String> = None;
|
ref unix_socket_path: Option<String> = None;
|
||||||
|
|
||||||
|
/// The reconstruction API version to request from the CAS server.
|
||||||
|
/// When set to 1 or 2, forces that version with no fallback.
|
||||||
|
/// When unset, auto-detects by trying V2 first, falling back to V1 on 404 or 501.
|
||||||
|
///
|
||||||
|
/// The default value is None (auto-detect).
|
||||||
|
///
|
||||||
|
/// Use the environment variable `HF_XET_CLIENT_RECONSTRUCTION_API_VERSION` to set this value.
|
||||||
|
ref reconstruction_api_version: Option<u32> = None;
|
||||||
|
|
||||||
|
/// Whether to use multi-range HTTP requests when fetching xorb data.
|
||||||
|
/// When false (default), V2 multi-range fetch entries are split into
|
||||||
|
/// individual single-range requests executed in parallel, which avoids
|
||||||
|
/// slow server-side multirange processing.
|
||||||
|
/// When true, multi-range requests are sent as-is.
|
||||||
|
///
|
||||||
|
/// The default value is false.
|
||||||
|
///
|
||||||
|
/// Use the environment variable `HF_XET_CLIENT_ENABLE_MULTIRANGE_FETCHING` to set this value.
|
||||||
|
ref enable_multirange_fetching: bool = false;
|
||||||
|
|
||||||
});
|
});
|
||||||
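As a rough illustration of the two new config options documented above, the sketch below shows one way their described behavior could be expressed. It is not the client's actual code: `ByteRange`, `range_headers`, and `falls_back_to_v1` are hypothetical names, and treating byte ranges as inclusive is an assumption based on the example response in the API notes.

```rust
/// Inclusive byte range (assumed to match the `bytes` entries of a V2 fetch entry).
#[derive(Debug, Clone, Copy)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64,
}

/// Builds the HTTP `Range` header value(s) for one fetch entry.
/// With multirange fetching enabled, all ranges go into a single header;
/// otherwise each range becomes its own single-range request.
pub fn range_headers(ranges: &[ByteRange], enable_multirange_fetching: bool) -> Vec<String> {
    if ranges.is_empty() {
        return Vec::new();
    }
    if enable_multirange_fetching {
        let spec = ranges
            .iter()
            .map(|r| format!("{}-{}", r.start, r.end))
            .collect::<Vec<_>>()
            .join(",");
        vec![format!("bytes={}", spec)]
    } else {
        ranges
            .iter()
            .map(|r| format!("bytes={}-{}", r.start, r.end))
            .collect()
    }
}

/// Mirrors the documented rule for `reconstruction_api_version`:
/// fall back to V1 only when no version is forced and V2 returned 404 or 501.
pub fn falls_back_to_v1(configured_version: Option<u32>, v2_status: u16) -> bool {
    configured_version.is_none() && matches!(v2_status, 404 | 501)
}
```

Keeping the split on the client side means the server can always hand out full multirange presigned URLs, while the client decides per request whether to send one multirange request or several parallel single-range ones.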
|
|||||||