Commit Graph

1871 Commits

Author SHA1 Message Date
rdk
6d47285116 Add kdtree_implementation param and fix quickselect duplicate-key hang
Add runtime parameter to switch between KdTree3D (default) and v1
AtomKdTree. Fix O(N²) quickselect degeneration on duplicate coordinates
by adding post-partition equal-range scan.
2026-03-02 22:20:59 +01:00
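
Note on 6d47285116: the duplicate-key fix can be pictured with the sketch below. It is a minimal illustration, not the KdTree3D code. The class `QuickselectSketch`, the plain `double[]` input, and the middle-element pivot are all assumptions made for the example; the real build works on SoA storage and a split axis. After a standard partition, one extra scan gathers every key equal to the pivot next to it, so a range made entirely of duplicates is resolved in a single pass instead of shrinking by one element per iteration.

```java
final class QuickselectSketch {

    /** Rearranges a[lo..hi] so that a[k] holds the value it would have if the range were sorted. */
    static void select(double[] a, int lo, int hi, int k) {
        while (lo < hi) {
            // Hypothetical pivot choice: the middle element, moved to the end of the range.
            int mid = lo + (hi - lo) / 2;
            double pivot = a[mid];
            swap(a, mid, hi);

            // Lomuto partition: afterwards a[lo..store-1] < pivot and a[store] == pivot.
            int store = lo;
            for (int i = lo; i < hi; i++) {
                if (a[i] < pivot) {
                    swap(a, i, store++);
                }
            }
            swap(a, store, hi);

            // Post-partition equal-range scan: move keys equal to the pivot
            // directly behind it, so a[store..eqEnd] == pivot and a[eqEnd+1..hi] > pivot.
            int eqEnd = store;
            for (int i = store + 1; i <= hi; i++) {
                if (a[i] == pivot) {
                    swap(a, i, ++eqEnd);
                }
            }

            if (k < store) {
                hi = store - 1;      // target lies among the smaller keys
            } else if (k > eqEnd) {
                lo = eqEnd + 1;      // target lies among the larger keys
            } else {
                return;              // target lies inside the run of duplicates
            }
        }
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Calling `QuickselectSketch.select(values, 0, values.length - 1, values.length / 2)` places a median at the middle index, which is the step a balanced kd-tree build repeats per node.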
rdk
24b9f5f709 Optimize KdTree3D build: bottom-up bounds, eliminate redundant traversals
- Bounding boxes computed bottom-up from leaf scans instead of scanning
  full data range at every tree level (O(N) vs O(N log N))
- Approximate parent bounds passed down for split-axis selection (O(1)
  per node instead of O(range) scan)
- Remove findNodeCount() and dead code; buildNode returns max index
- Resolve split-axis array once in quickselect inner loop
2026-03-02 21:15:41 +01:00
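
Note on 24b9f5f709: the bottom-up bounds idea can be sketched as follows. Everything here is an assumption for illustration: the class `BottomUpBoundsSketch`, the flattened xyz input, the fixed leaf size, and the power-of-two leaf count are not taken from KdTree3D. The point is only that each coordinate is read once while scanning the leaves, and every internal node is then derived from its two children in constant time.

```java
final class BottomUpBoundsSketch {

    /**
     * points: flattened xyz triples, ordered so that leaf j owns points
     *         [j * leafSize, (j + 1) * leafSize).
     * leafCount: number of leaves, assumed to be a power of two.
     * Returns {min, max}; each array stores 3 values per node in implicit-heap
     * order (children of node i are 2i+1 and 2i+2, leaves come last).
     */
    static double[][] computeBounds(double[] points, int leafCount, int leafSize) {
        int nodeCount = 2 * leafCount - 1;
        int firstLeaf = leafCount - 1;
        double[] min = new double[nodeCount * 3];
        double[] max = new double[nodeCount * 3];

        // Leaf pass: every point is scanned exactly once.
        for (int leaf = 0; leaf < leafCount; leaf++) {
            int node = firstLeaf + leaf;
            for (int axis = 0; axis < 3; axis++) {
                min[node * 3 + axis] = Double.POSITIVE_INFINITY;
                max[node * 3 + axis] = Double.NEGATIVE_INFINITY;
            }
            for (int p = leaf * leafSize; p < (leaf + 1) * leafSize; p++) {
                for (int axis = 0; axis < 3; axis++) {
                    double v = points[p * 3 + axis];
                    min[node * 3 + axis] = Math.min(min[node * 3 + axis], v);
                    max[node * 3 + axis] = Math.max(max[node * 3 + axis], v);
                }
            }
        }

        // Internal pass: each parent box is the union of its two child boxes.
        for (int node = firstLeaf - 1; node >= 0; node--) {
            int left = 2 * node + 1;
            int right = 2 * node + 2;
            for (int axis = 0; axis < 3; axis++) {
                min[node * 3 + axis] = Math.min(min[left * 3 + axis], min[right * 3 + axis]);
                max[node * 3 + axis] = Math.max(max[left * 3 + axis], max[right * 3 + axis]);
            }
        }
        return new double[][] {min, max};
    }
}
```

Scanning the full data range at every level would instead touch each point once per level, which is where the O(N log N) figure in the commit message comes from.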
rdk
76026b9297 Refactor Dataset item cache clearing and fix processItem typo
Rename processssItem to processItem. Add per-item conditional cache
clearing after processing to reduce peak memory. Refactor cleanCaches
into clearCache/clearPrimaryCache/clearSecondaryCache with null-safety.
2026-03-02 20:52:07 +01:00
rdk
7f4d37b5c4 Add comparative benchmark test for v1 vs v2 KdTree
Parametrized test generates random points, builds both trees, verifies
identical results for all query types, and measures relative performance.
Skipped during normal test runs; invoked via kdtree-benchmark.sh script.
2026-03-02 20:52:05 +01:00
rdk
6cce0eb016 Rewrite KdTree as immutable, hardcoded 3D implementation in v2 package
New KdTree3D.java uses SoA storage, linearized implicit-heap layout,
balanced quickselect build, and stack-based traversal. Immutable design
eliminates mutable node state, enabling thread-safe concurrent queries.

AtomKdTree.groovy provides drop-in API wrapper. Atoms.java switched to
v2 with invalidate-on-add pattern and periodic-rebuild consolidate().
2026-03-02 20:13:47 +01:00
rdk
5d9ec9eb58 Fix bugs and add error reporting to analyze subcommands
- Fix integer division in BinCounter.getPosRatio() (long/long → double)
- Fix broken NaN check in ConservationCloudFeature (== → Double.isNaN)
- Fix wrong variable in apo_protein error message (proteinFile → apoProteinFile)
- Fix outerLater typo → outerLayer in Atoms.SphereLayers and usages
- Fix xenegy_cloud2_layered typo → xenergy_cloud2_layered in Params and usages
- Add error reporting (writeErrorCsvs) to all analyze subcommands
- Add ignoreLigandsSwitch to doCmdFasta (doesn't need ligands)
2026-03-02 09:59:36 +01:00
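
Note on 5d9ec9eb58: the first two fixes address well-known Java pitfalls, illustrated below. The class and method names are hypothetical and do not reproduce `BinCounter` or `ConservationCloudFeature`. Dividing two `long` values truncates before the result is widened to `double`, and `==` never matches `NaN` because `NaN` compares unequal to every value, including itself.

```java
final class NumericPitfallsSketch {

    /** Buggy: long / long truncates toward zero before widening to double. */
    static double posRatioBuggy(long positives, long total) {
        return positives / total;
    }

    /** Fixed: casting one operand forces floating-point division. */
    static double posRatio(long positives, long total) {
        return (double) positives / total;
    }

    /** Buggy: always false, since NaN == NaN evaluates to false. */
    static boolean isMissingBuggy(double value) {
        return value == Double.NaN;
    }

    /** Fixed: Double.isNaN is the reliable check. */
    static boolean isMissing(double value) {
        return Double.isNaN(value);
    }
}
```

With 1 positive out of 4, the buggy ratio is `0.0` while the fixed one is `0.25`.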
rdk
5a38f8f1de Avoid unnecessary allocations in hot paths
Cache aggregated errors in Dataset.Result to avoid recomputing.
Use direct x/y/z field access instead of getCoords() in
Atoms.copyPoints and PointExportData to avoid double[3] allocations.
2026-03-02 04:47:14 +01:00
rdk
1bbdcbc196 Split dataset by protein chain presence in analyze proteins command
Add Dataset.Item.getRow() to reconstruct dataset row strings.
In cmdProteins(), collect items into with/without protein chains
using ConcurrentLinkedQueue and write split .ds files when any
structures lack protein chains.
2026-03-02 02:23:30 +01:00
rdk
22a7dec4bc Bump version to 2.6.0-dev.3 and update xz dependency to 1.12
2026-03-01 21:04:37 +01:00
rdk
3a8e985eb4 Skip ligand loading in parse-proteins command
2026-03-01 20:48:00 +01:00
rdk
582d5ebf1f Optimize distance calculations to avoid getCoords() array allocations
Add Atom-based sqrDist/dist overloads in PerfUtils that use
getX/getY/getZ directly instead of allocating double[] via getCoords().
Refactor Point to store x/y/z as individual fields instead of a
double[] array. Fix Point.setCoords() which was previously a no-op.
Pre-build KD tree in Ligands.makeLigands() before the ligand loop.
Simplify KD tree usage in Atoms.dist/sqrDist by removing redundant
size threshold check.
2026-03-01 20:47:56 +01:00
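
Note on 582d5ebf1f: the allocation being avoided can be sketched like this. `DistSketch` and `Point3` are hypothetical stand-ins, not the project's `PerfUtils`, `Atom`, or `Point` classes. The first method allocates two temporary `double[3]` arrays per call through `getCoords()`; the second reads the fields directly and allocates nothing.

```java
final class DistSketch {

    /** Hypothetical point type storing x/y/z as individual fields. */
    static final class Point3 {
        final double x, y, z;

        Point3(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }

        /** Convenience accessor that allocates a new array on every call. */
        double[] getCoords() {
            return new double[] {x, y, z};
        }
    }

    /** Allocating variant: two temporary arrays per distance computation. */
    static double sqrDistViaCoords(Point3 a, Point3 b) {
        double[] p = a.getCoords();
        double[] q = b.getCoords();
        double dx = p[0] - q[0];
        double dy = p[1] - q[1];
        double dz = p[2] - q[2];
        return dx * dx + dy * dy + dz * dz;
    }

    /** Allocation-free variant: direct field access. */
    static double sqrDist(Point3 a, Point3 b) {
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        double dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }

    static double dist(Point3 a, Point3 b) {
        return Math.sqrt(sqrDist(a, b));
    }
}
```

Both variants return the same value; the difference only matters in hot loops where millions of short-lived arrays would otherwise be created.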
rdk
4240d9e5c8 Add aggregated error reporting and writeErrorCsvs convenience method
Add writeAggregatedItemErrorsToCsv to Dataset.Result that groups errors
by message and outputs count/error sorted by frequency. Update
getErrorSummary to display an aggregated error table instead of just
the count. Add writeErrorCsvs(outdir) that writes all three error files
(per-item, aggregated, full stack traces) and use it in AnalyzeRoutine.
2026-03-01 19:28:12 +01:00
rdk
a8ab7e97a2 Add analyze proteins and parse-proteins commands with DataTable utility
Add 'analyze proteins' command that outputs per-protein stats CSV
(chain counts, residues, atoms, ligands, peptides) and a summary table
with min/max/avg/median. Add 'analyze parse-proteins' for parsing
all dataset items and reporting errors only.

Introduce DataTable — a lock-free, pre-registered-column table for
structured data collection across threads, with CSV and summary output.
2026-03-01 18:17:07 +01:00
rdk
6434a097f8 Clean up unused imports and sort import order across codebase
2026-02-26 03:46:34 +01:00
rdk
bab04a2a5e Avoid duplicate console output: skip stdout write when log_to_console is enabled
2026-02-26 01:22:27 +01:00
rdk
e923d199e6 Add external conservation provider with cache, health check, and documentation
2026-02-26 00:07:55 +01:00
rdk
34a742cd1b Add tests for ResidueSite and site-based evaluation
2026-02-26 00:07:55 +01:00
rdk
347d4e38d6 Implement site-metrics criteria and evaluation
- Add ResidueSite and SiteLoader
- Update pocket criteria (DCA, DCC, DPA, DSA, DSO, DSWO) for site evaluation
- Extend Evaluation with site-metrics support
- Bump version to 2.6.0-dev.1
2026-02-26 00:07:55 +01:00
rdk
bfdc87f55b replace ArrayList<PPred> with PredictedScores parallel-array structure
Memory optimization: per-prediction cost reduced from ~40 bytes (PPred object)
to ~9 bytes (parallel double[] + boolean[] arrays). For large datasets this
reduces prediction storage by ~77%.

PredictedScores provides: ArrayList-style growth, bulk addAll via arraycopy,
cached observedPositiveCount, stable descending merge sort (required for
reproducible metrics with tied RF scores), and direct backing array access
for hot loops in Metrics/Curves.
2026-02-26 00:07:55 +01:00
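
Note on bfdc87f55b: a parallel-array container of the kind described might look like the sketch below. `PredictedScoresSketch` is a hypothetical, simplified stand-in and not the project's `PredictedScores`. Each prediction occupies one `double` (8 bytes) plus one `boolean` (1 byte in typical JVM array layouts), which matches the ~9 bytes quoted above. For brevity the stable descending sort delegates to `Arrays.sort` on boxed indices, which is documented as stable, instead of a hand-written merge sort.

```java
import java.util.Arrays;

final class PredictedScoresSketch {
    private double[] scores = new double[16];
    private boolean[] observed = new boolean[16];
    private int size;
    private int positives; // cached count of observed positives

    void add(double score, boolean isPositive) {
        ensureCapacity(size + 1);
        scores[size] = score;
        observed[size] = isPositive;
        size++;
        if (isPositive) {
            positives++;
        }
    }

    /** Bulk append using arraycopy on both backing arrays. */
    void addAll(PredictedScoresSketch other) {
        ensureCapacity(size + other.size);
        System.arraycopy(other.scores, 0, scores, size, other.size);
        System.arraycopy(other.observed, 0, observed, size, other.size);
        size += other.size;
        positives += other.positives;
    }

    /** Stable sort by descending score; ties keep their insertion order. */
    void sortDescending() {
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        double[] sortedScores = new double[scores.length];
        boolean[] sortedObserved = new boolean[observed.length];
        for (int i = 0; i < size; i++) {
            sortedScores[i] = scores[order[i]];
            sortedObserved[i] = observed[order[i]];
        }
        scores = sortedScores;
        observed = sortedObserved;
    }

    int size() { return size; }
    int observedPositiveCount() { return positives; }
    double score(int i) { return scores[i]; }
    boolean observed(int i) { return observed[i]; }

    private void ensureCapacity(int needed) {
        if (needed > scores.length) {
            int newCapacity = Math.max(needed, scores.length * 2);
            scores = Arrays.copyOf(scores, newCapacity);
            observed = Arrays.copyOf(observed, newCapacity);
        }
    }
}
```

Stability matters when many predictions share the same score: an unstable sort could order tied entries differently between runs and shift threshold-dependent metrics.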
rdk
65fc8f3676 update FasterForest to 2.10.2, add Weka RandomForest conversion support
2026-02-26 00:07:55 +01:00
rdk
5ec88309ef update FasterForest to 2.10.1
2026-02-26 00:07:55 +01:00
rdk
1f19bdd2a4 fix ModelConverterTest failing on macOS CI: skip NativePanamaFloat forest types when native library unavailable
2026-02-26 00:07:52 +01:00
rdk
2bf6bfa270 update FasterForest to 2.10.0, bump version to 2.5.2-dev.11
2026-02-23 02:18:41 +01:00
rdk
9fcce6156f add UseCompactObjectHeaders note to local-env.sh template
2026-02-23 02:11:29 +01:00
rdk
57fb214881 update local-env.sh template with throughput-oriented JVM options
2026-02-23 01:17:32 +01:00
rdk
40c7638bc2 implement ModelConverterTest with comprehensive forest conversion tests
2026-02-23 00:54:11 +01:00
rdk
b1a05d3097 bump version to 2.5.2-dev.10
2026-02-23 00:39:54 +01:00
rdk
f3fc9329bc update FasterForest to 2.9.1, bump JUnit Jupiter to 6.0.3, and add NativePanama flattened eval tests
2026-02-23 00:35:18 +01:00
rdk
4aaf212b9b update FasterForest to 2.8.1 with NativePanama support and improve eval time tracking
- Add NativePanamaForest/NativePanamaForestAvx2 availability checks in ModelConverter
- Refactor flattening logic to separate trainable forest preparation from conversion
- Track all eval times and compute average excluding first run (caching warmup)
- Rename TIME_M to TIME_TRAINEVAL_AVG_M, add TIME_EVAL_AVG_M stat
2026-02-22 21:11:16 +01:00
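
Note on 4aaf212b9b: averaging evaluation times while skipping the first run could be done as in the sketch below. The class `EvalTimeSketch`, the millisecond unit, and the handling of zero or one recorded runs are assumptions, since the commit message does not say how those edge cases are treated.

```java
final class EvalTimeSketch {

    /**
     * Average of all recorded times except the first one, which is treated
     * as a warmup run (caches cold). Assumed edge cases: no runs gives NaN,
     * a single run is returned as-is because there is nothing else to average.
     */
    static double averageExcludingFirst(long[] timesMillis) {
        int n = timesMillis.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return timesMillis[0];
        }
        long sum = 0;
        for (int i = 1; i < n; i++) {
            sum += timesMillis[i];
        }
        return (double) sum / (n - 1);
    }
}
```

For recorded times of 1000, 200, and 400 ms, the warmup run is dropped and the average is 300 ms.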
rdk
b5a8edc377 track last evaluation time in EvalResults for seed loop benchmarks
2026-02-22 17:44:56 +01:00
rdk
3ad261645c update FasterForest to 2.8.0 and support flattening of FlatBinaryForest models
2026-02-22 17:33:27 +01:00
rdk
8f7d71ffb3 update FasterForest to 2.7.0
2026-02-20 12:58:14 +01:00
rdk
b8f802b145 refactor model flattening to use FasterForestConverter API with configurable target types
Generalize Model classifier from Classifier to Object to support both
trainable classifiers and flat BinaryForest models. Add rf_flatten_target
parameter for selecting forest type (FlatBinaryForest, LegacyFlatBinaryForest,
InterleavedBfsForest, etc). Deprecate rf_flatten_as_legacy in favor of the
new target type selection.
2026-02-16 01:00:55 +01:00
rdk
de75ac6be1 upgrade FasterForest to 2.6.0 and fix GString compilation errors
Replace flat jar with local Maven repo dependency at correct path
(groupId/artifactId/version/). Fix GString-to-String type errors in
AnalyzeRoutine that broke compilation with @CompileStatic.
2026-02-14 07:35:29 +01:00
rdk
27caa5fe46 sort CSV output rows as strings in analyze commands (chains, chains-residues, labeled-residues)
2026-02-14 06:04:13 +01:00
rdk
93fd8e953a add experimental rescoring model section to rescoring docs
2026-02-11 18:44:04 +01:00
rdk
ad946de45e rephrase Requirements section in README
2026-02-11 18:31:13 +01:00
rdk
9711cc7192 fix aa-mapping docs: broken csv link, replace special characters, cleanup
2026-02-11 18:14:03 +01:00
rdk
4a42f664e2 update aa-mapping documentation: add links to pdbfixer source
2026-02-11 18:05:11 +01:00
rdk
652442a8d2 add JVM compatibility flags to run scripts, document all flags
2026-02-11 15:27:39 +01:00
rdk
e9f530ce37 make --sun-misc-unsafe-memory-access conditional on Java 23+
2026-02-11 15:24:09 +01:00
rdk
6e35db0390 update build instructions in README
2026-02-11 15:03:40 +01:00
rdk
752a645937 exclude unavailable openchart transitive dep from biojava-alignment
2026-02-11 14:36:38 +01:00
rdk
126a0653f0 move tutorials to documentation/, update rescoring tutorial and README
Move misc/tutorials/ to documentation/ and add index readme.
Update rescoring.md: add quick-start examples, paper links for all
methods, add Pocketeer to supported methods list.
Fix stale links in README.md (tutorials path, local-env.sh typo).
2026-02-11 10:52:20 +01:00
rdk
7634c57749 add pocketeer prediction loader and rescoring tutorial
Add PocketeerLoader that parses pockets.json output from Pocketeer,
including alpha spheres, residues, centroids, and surface atom mapping.
Register "pocketeer" as a prediction method in Dataset. Add unit tests
covering all 7 available datasets (CIF and PDB). Add rescoring tutorial
documenting all supported methods with examples.
2026-02-11 10:22:46 +01:00
rdk
8614bed9c5 add pocketeer output examples and schema
2026-02-11 08:38:02 +01:00
rdk
65aee4cc84 add aa-mapping tutorial documenting non-canonical residue mapping feature
2026-02-11 02:01:49 +01:00
rdk
ed048ecf83 add non-canonical residue mapping (default, pdbfixer, custom CSV modes) #79
2026-02-11 01:41:55 +01:00
rdk
9c35bd542c update export-points tutorial to document new export-points command
2026-02-10 22:17:11 +01:00
rdk
26af252659 bump version to 2.5.2-dev.6
2026-02-08 23:43:22 +01:00