mirror of
https://github.com/huggingface/xet-core.git
synced 2026-06-04 13:30:29 +08:00
Rust's `HashMap` defaults to a randomly seeded hasher, which protects against hash-collision attacks. The client code does not need that protection, and a `MerkleHash` is already a cryptographic hash, so hashing it a second time is wasted work. This PR adds a `MerkleHashMap` type that passes the hash straight through to the `HashMap`, which gives a substantial speedup:

```
=================================================================
PERFORMANCE SUMMARY (times in ms, lower is better)
=================================================================
Test                 HashMap    PassThrough
-----------------------------------------------------------------
--- 100K ---
Insert                   2.1            0.7
Lookup                   2.1            1.3
Insert+Lookup            4.4            1.6
Serialize                1.6            0.9
Deserialize              4.3            1.2
--- 10M ---
Insert                 433.2          204.1
Lookup                 615.3          255.5
Insert+Lookup          951.6          460.4
Serialize              117.2           93.4
Deserialize            599.5           89.3
=================================================================
```

It also replaces `HashMap<MerkleHash, ...>` with `MerkleHashMap` throughout the code, so the improvement applies across the board.
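
For readers unfamiliar with the technique, here is a minimal sketch of a pass-through hasher. It is not the code from this PR: the `MerkleHash` layout (a 32-byte digest), the `PassThroughHasher` name, the choice of the first 8 bytes as the bucket hash, and the `passthrough_value` helper are all assumptions made for illustration. Only the standard `Hasher` and `BuildHasherDefault` APIs are used.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical stand-in for the real MerkleHash: a 32-byte cryptographic digest.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct MerkleHash(pub [u8; 32]);

impl Hash for MerkleHash {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Assumption: the digest is uniformly distributed, so its first
        // 8 bytes are already a good 64-bit hash value.
        let mut prefix = [0u8; 8];
        prefix.copy_from_slice(&self.0[..8]);
        state.write_u64(u64::from_le_bytes(prefix));
    }
}

/// Hasher that returns the u64 it was given instead of mixing it again.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Fallback for key types that do not call write_u64:
        // fold the bytes in so the hasher still behaves sensibly.
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }

    fn write_u64(&mut self, value: u64) {
        // The pass-through step: store the value unchanged.
        self.0 = value;
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Map keyed by MerkleHash that skips the default randomized hasher.
pub type MerkleHashMap<V> =
    HashMap<MerkleHash, V, BuildHasherDefault<PassThroughHasher>>;

/// Illustrative helper: the 64-bit value the map would use for `key`.
pub fn passthrough_value(key: &MerkleHash) -> u64 {
    let mut hasher = PassThroughHasher::default();
    key.hash(&mut hasher);
    hasher.finish()
}
```

A map built this way is created with `MerkleHashMap::default()` and otherwise used like any `HashMap`. The trade-off is that a pass-through hasher offers no defense against deliberately chosen colliding keys, so it is only appropriate where keys are already uniformly distributed digests and that attack is not a concern, as the description above argues is the case in the client.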