rdkit

mirror of https://github.com/rdkit/rdkit.git synced 2026-06-03 21:44:30 +08:00


Dan Nealschneider 76a32ef1ee synthon perf: replace sort+unique dedup with boost::unordered_flat_set (#9305)

sortAndUniquifyToTry previously built a parallel vector of (index, string)
pairs, sorted by string, erased duplicates, then rebuilt the original vector
— O(N log N) with one heap allocation per candidate product.
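For comparison, here is a minimal sketch of the sort-then-unique approach described above. It is illustrative only: `Candidate`, `makeKey`, and `sortAndUniquifyOld` are hypothetical stand-ins, not RDKit's real types or the actual `sortAndUniquifyToTry`. The key string format and the final step that restores the original order are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a candidate product (the real type differs).
struct Candidate {
  std::vector<std::size_t> synthonIds;  // one synthon per reaction slot
  std::size_t reactionId;
};

// Builds a string key per candidate: one heap allocation each.
std::string makeKey(const Candidate &c) {
  std::string key = std::to_string(c.reactionId);
  for (auto id : c.synthonIds) {
    key += '_';
    key += std::to_string(id);
  }
  return key;
}

// Sort-and-unique dedup: O(N log N) string comparisons plus one string per
// candidate. Keeps the first occurrence of each key.
void sortAndUniquifyOld(std::vector<Candidate> &toTry) {
  std::vector<std::pair<std::size_t, std::string>> keyed;
  keyed.reserve(toTry.size());
  for (std::size_t i = 0; i < toTry.size(); ++i) {
    keyed.emplace_back(i, makeKey(toTry[i]));
  }
  // stable_sort keeps equal keys in index order, so unique keeps the earliest.
  std::stable_sort(keyed.begin(), keyed.end(), [](const auto &a, const auto &b) {
    return a.second < b.second;
  });
  keyed.erase(std::unique(keyed.begin(), keyed.end(),
                          [](const auto &a, const auto &b) {
                            return a.second == b.second;
                          }),
              keyed.end());
  // Assumed: restore the original candidate order before rebuilding.
  std::sort(keyed.begin(), keyed.end(),
            [](const auto &a, const auto &b) { return a.first < b.first; });
  std::vector<Candidate> rebuilt;
  rebuilt.reserve(keyed.size());
  for (const auto &k : keyed) rebuilt.push_back(toTry[k.first]);
  toTry = std::move(rebuilt);
}
```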

Replace with an erase-remove over a boost::unordered_flat_set<size_t> keyed
on buildProductHash (boost::hash_combine over synthon IDs + reaction ID).
Dedup is now O(N) average with no string allocations on the hot path.
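The hash-based replacement can be sketched in the same style. Everything here is an assumption for illustration. `Candidate`, `hashCombine`, `productHash`, and `dedupByHash` are hypothetical names. `hashCombine` reproduces the long-standing `boost::hash_combine` mixing step so that the sketch builds without Boost; recent Boost versions mix differently. `std::unordered_set` stands in for `boost::unordered_flat_set`, which offers the same `insert().second` interface. Folding synthon IDs first and the reaction ID last is also a guess.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a candidate product (the real type differs).
struct Candidate {
  std::vector<std::size_t> synthonIds;
  std::size_t reactionId;
};

// Classic boost::hash_combine-style mixing, reimplemented to avoid Boost.
void hashCombine(std::size_t &seed, std::size_t value) {
  seed ^= std::hash<std::size_t>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical analogue of buildProductHash: fold synthon IDs, then reaction ID.
std::size_t productHash(const Candidate &c) {
  std::size_t seed = 0;
  for (auto id : c.synthonIds) hashCombine(seed, id);
  hashCombine(seed, c.reactionId);
  return seed;
}

// Erase-remove dedup: drop a candidate if its hash was already seen.
// O(N) average, keeps first occurrences in their original order, no strings.
void dedupByHash(std::vector<Candidate> &toTry) {
  std::unordered_set<std::size_t> seen;  // commit uses boost::unordered_flat_set
  seen.reserve(toTry.size());
  toTry.erase(std::remove_if(toTry.begin(), toTry.end(),
                             [&seen](const Candidate &c) {
                               // insert().second is false for a repeated hash.
                               return !seen.insert(productHash(c)).second;
                             }),
              toTry.end());
}
```

The set stores only hashes, so two distinct products whose hashes collide would be treated as duplicates. Keying on the full (synthon IDs, reaction ID) tuple avoids that risk but stores larger keys.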

Also switch SearchResults::d_molNames from std::unordered_set<std::string>
to boost::unordered_flat_set<std::string> for the same open-addressing cache
locality benefit during mergeResults.

Perf (42-rxn / 140B-product Freedom space, maxHits=3000, hitStart=1000,
9 queries; vanilla.log → 2unordered_flat_set.log):
  Benzene:       6.92s → 5.64s  (−19%)
  Tolueneish:    6.19s → 5.07s  (−18%)
  Acetaminophen: 4.50s → 3.63s  (−19%)
  Allopurinol:   4.41s → 3.94s  (−11%)
  Theophylline:  4.39s → 3.90s  (−11%)
  Nicotine:      4.87s → 3.97s  (−18%)
  Ciprofloxacin: 6.82s → 6.09s  (−11%)
  Aspirin:       4.51s → 3.42s  (−24%)
  Metoprolol:    5.11s → 4.07s  (−20%)
  Total:        48.40s → 40.33s (−17%)

Hit counts and MaxNumResults unchanged across all queries.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-28 17:13:03 +02:00

Bench

add more benchmarking (#8878)

2025-11-27 14:25:57 +01:00

Catalogs

Refactor iostreams includes (#8846)

2025-10-08 16:08:01 +02:00

ChemicalFeatures

Switch a bunch of C++ tests to use catch2 (#8625)

2025-07-18 11:50:38 +02:00

cmake

Stop External/rapidjson-1.1.0 and Code/RDGeneral going to ${CMAKE_SOURCE_DIR} (#8810)