p2rank

mirror of https://github.com/rdk/p2rank.git synced 2026-06-04 12:44:24 +08:00

Author	SHA1	Message	Date
rdk	c9ad8f71ff	Add vis_site_centers param for rendering site/pocket centroids in PyMOL - New vis_site_centers param (default false) renders centroids as hotpink pseudoatom spheres in both old (PymolRenderer) and new (NewPymolRenderer) - Pass site centroids via RenderingModel.siteCentroids for analyze command - Old renderer shows predicted pocket centroids and ligand centroids - Fix empty visualizations/ dir in eval-predict: create vis dir under predDir instead of top-level outdir	2026-03-04 02:47:43 +01:00
rdk	d5715d9797	Fix PyMol renderer: bulk selections, CIF-to-PDB conversion, site-based labeling - Use bulk atom ID selections instead of per-residue named selections to avoid exceeding PyMOL's object limit on large proteins - Convert CIF inputs to PDB format with correct .pdb extension (PyMOL can't reliably parse BioJava CIF and uses extension to pick parser) - Rename PyMOL object from "protein" to "prot" to avoid reserved keyword - Fix null interpolation in PML when no ligands or no labeling - Build BinaryLabeling from explicit site residues for visualization (item.binaryLabeling doesn't support site-based datasets)	2026-03-04 01:13:48 +01:00
rdk	026be7eae5	Improve analyze binding-sites: visualizations, site radius, eager loading - Add PyMol visualizations using dataset.binaryResidueLabeler - Add site_radius column (max distance from centroid to any site atom) - Add excludeFromSummary param to DataTable.formatSummaryTable to skip center coordinates from numeric summary stats - Load ExplicitSitesIndex eagerly during dataset loading (fail-fast) - Skip CSV rows with empty residue/coordinate fields in AhojUbsSiteParser - Write items without binding sites to separate file in outdir	2026-03-03 21:58:55 +01:00
rdk	9e9a500836	Bump version to 2.6.0-dev.4	2026-03-03 15:00:46 +01:00
rdk	c9fef83950	Use AtomKdTree interface in Atoms and minor cleanups Switch Atoms.kdTree field and buildKdTree() to use the AtomKdTree interface instead of AtomKdTreeV1 directly. Add @NonNull to iterator(), improve initial capacity estimates, and fix whitespace.	2026-03-03 15:00:36 +01:00
rdk	997727e878	Add explicit sites loading and analyze binding-sites command Implement ExplicitSitesIndex for loading binding site definitions from external CSV files (pluggable format system, first format: ahoj_ubs). Sites are resolved during item loading via DatasetItemLoader. Add 'analyze binding-sites' sub-command producing unified CSV and summary stats for both ligand-based and explicit site datasets, with unresolved residue/site tracking for explicit datasets. Remove unused SiteLoader stub.	2026-03-03 15:00:31 +01:00
rdk	8f5da9fdcd	Add fused addWeighted and O(N²) single-linkage clusterer Add GenericVector.addWeighted() for fused multiply-add, eliminating per-neighbor array allocation in feature vector aggregation. Add SLinkClustererV2 using union-find with path compression, reducing single-linkage clustering from O(N³) to O(N²). Wire V2 via factory methods on AtomClusterer and AtomGroupClusterer.	2026-03-03 05:13:05 +01:00
rdk	261dae09c9	Rename consolidate() to sparsify() and add surface_sparsify param Rename Atoms.consolidate() to Atoms.sparsify() for clarity. Use mutable V1 KdTree for O(N log N) incremental insertion instead of periodic rebuilds. Add surface_sparsify runtime param (default true) to allow disabling surface point sparsification. Hardcode AtomKdTreeV1 in Atoms.buildKdTree() and delegate Dataset cache clearing to item methods.	2026-03-03 04:01:54 +01:00
rdk	a66a973e1c	Refactor KdTree into AtomKdTree interface with V1/V2 implementations Rewrite AtomKdTreeV1 from Groovy to Java to eliminate Groovy IndyInterface monitor contention that serialized 16 parallel threads down to ~2. Move V1 KdTree into v1/ subpackage, extract AtomKdTree as a Java interface with factory method dispatching by kdtree_implementation param, and rename the old v2 wrapper to AtomKdTreeV2 implementing the same interface.	2026-03-03 00:17:38 +01:00
rdk	6d47285116	Add kdtree_implementation param and fix quickselect duplicate-key hang Add runtime parameter to switch between KdTree3D (default) and v1 AtomKdTree. Fix O(N²) quickselect degeneration on duplicate coordinates by adding post-partition equal-range scan.	2026-03-02 22:20:59 +01:00
rdk	24b9f5f709	Optimize KdTree3D build: bottom-up bounds, eliminate redundant traversals - Bounding boxes computed bottom-up from leaf scans instead of scanning full data range at every tree level (O(N) vs O(N log N)) - Approximate parent bounds passed down for split-axis selection (O(1) per node instead of O(range) scan) - Remove findNodeCount() and dead code; buildNode returns max index - Resolve split-axis array once in quickselect inner loop	2026-03-02 21:15:41 +01:00
rdk	76026b9297	Refactor Dataset item cache clearing and fix processItem typo Rename processssItem to processItem. Add per-item conditional cache clearing after processing to reduce peak memory. Refactor cleanCaches into clearCache/clearPrimaryCache/clearSecondaryCache with null-safety.	2026-03-02 20:52:07 +01:00
rdk	7f4d37b5c4	Add comparative benchmark test for v1 vs v2 KdTree Parametrized test generates random points, builds both trees, verifies identical results for all query types, and measures relative performance. Skipped during normal test runs; invoked via kdtree-benchmark.sh script.	2026-03-02 20:52:05 +01:00
rdk	6cce0eb016	Rewrite KdTree as immutable, hardcoded 3D implementation in v2 package New KdTree3D.java uses SoA storage, linearized implicit-heap layout, balanced quickselect build, and stack-based traversal. Immutable design eliminates mutable node state, enabling thread-safe concurrent queries. AtomKdTree.groovy provides drop-in API wrapper. Atoms.java switched to v2 with invalidate-on-add pattern and periodic-rebuild consolidate().	2026-03-02 20:13:47 +01:00
rdk	5d9ec9eb58	Fix bugs and add error reporting to analyze subcommands - Fix integer division in BinCounter.getPosRatio() (long/long → double) - Fix broken NaN check in ConservationCloudFeature (== → Double.isNaN) - Fix wrong variable in apo_protein error message (proteinFile → apoProteinFile) - Fix outerLater typo → outerLayer in Atoms.SphereLayers and usages - Fix xenegy_cloud2_layered typo → xenergy_cloud2_layered in Params and usages - Add error reporting (writeErrorCsvs) to all analyze subcommands - Add ignoreLigandsSwitch to doCmdFasta (doesn't need ligands)	2026-03-02 09:59:36 +01:00
rdk	5a38f8f1de	Avoid unnecessary allocations in hot paths Cache aggregated errors in Dataset.Result to avoid recomputing. Use direct x/y/z field access instead of getCoords() in Atoms.copyPoints and PointExportData to avoid double[3] allocations.	2026-03-02 04:47:14 +01:00
rdk	1bbdcbc196	Split dataset by protein chain presence in analyze proteins command Add Dataset.Item.getRow() to reconstruct dataset row strings. In cmdProteins(), collect items into with/without protein chains using ConcurrentLinkedQueue and write split .ds files when any structures lack protein chains.	2026-03-02 02:23:30 +01:00
rdk	22a7dec4bc	Bump version to 2.6.0-dev.3 and update xz dependency to 1.12	2026-03-01 21:04:37 +01:00
rdk	3a8e985eb4	Skip ligand loading in parse-proteins command	2026-03-01 20:48:00 +01:00
rdk	582d5ebf1f	Optimize distance calculations to avoid getCoords() array allocations Add Atom-based sqrDist/dist overloads in PerfUtils that use getX/getY/getZ directly instead of allocating double[] via getCoords(). Refactor Point to store x/y/z as individual fields instead of a double[] array. Fix Point.setCoords() which was previously a no-op. Pre-build KD tree in Ligands.makeLigands() before the ligand loop. Simplify KD tree usage in Atoms.dist/sqrDist by removing redundant size threshold check.	2026-03-01 20:47:56 +01:00
rdk	4240d9e5c8	Add aggregated error reporting and writeErrorCsvs convenience method Add writeAggregatedItemErrorsToCsv to Dataset.Result that groups errors by message and outputs count/error sorted by frequency. Update getErrorSummary to display an aggregated error table instead of just the count. Add writeErrorCsvs(outdir) that writes all three error files (per-item, aggregated, full stack traces) and use it in AnalyzeRoutine.	2026-03-01 19:28:12 +01:00
rdk	a8ab7e97a2	Add analyze proteins and parse-proteins commands with DataTable utility Add 'analyze proteins' command that outputs per-protein stats CSV (chain counts, residues, atoms, ligands, peptides) and a summary table with min/max/avg/median. Add 'analyze parse-proteins' for parsing all dataset items and reporting errors only. Introduce DataTable — a lock-free, pre-registered-column table for structured data collection across threads, with CSV and summary output.	2026-03-01 18:17:07 +01:00
rdk	6434a097f8	Clean up unused imports and sort import order across codebase	2026-02-26 03:46:34 +01:00
rdk	bab04a2a5e	Avoid duplicate console output: skip stdout write when log_to_console is enabled	2026-02-26 01:22:27 +01:00
rdk	e923d199e6	Add external conservation provider with cache, health check, and documentation	2026-02-26 00:07:55 +01:00
rdk	34a742cd1b	Add tests for ResidueSite and site-based evaluation	2026-02-26 00:07:55 +01:00
rdk	347d4e38d6	Implement site-metrics criteria and evaluation - Add ResidueSite and SiteLoader - Update pocket criteria (DCA, DCC, DPA, DSA, DSO, DSWO) for site evaluation - Extend Evaluation with site-metrics support - Bump version to 2.6.0-dev.1	2026-02-26 00:07:55 +01:00
rdk	bfdc87f55b	replace ArrayList<PPred> with PredictedScores parallel-array structure Memory optimization: per-prediction cost reduced from ~40 bytes (PPred object) to ~9 bytes (parallel double[] + boolean[] arrays). For large datasets this reduces prediction storage by ~77%. PredictedScores provides: ArrayList-style growth, bulk addAll via arraycopy, cached observedPositiveCount, stable descending merge sort (required for reproducible metrics with tied RF scores), and direct backing array access for hot loops in Metrics/Curves.	2026-02-26 00:07:55 +01:00
rdk	65fc8f3676	update FasterForest to 2.10.2, add Weka RandomForest conversion support	2026-02-26 00:07:55 +01:00
rdk	5ec88309ef	update FasterForest to 2.10.1	2026-02-26 00:07:55 +01:00
rdk	1f19bdd2a4	fix ModelConverterTest failing on macOS CI: skip NativePanamaFloat forest types when native library unavailable	2026-02-26 00:07:52 +01:00
rdk	2bf6bfa270	update FasterForest to 2.10.0, bump version to 2.5.2-dev.11	2026-02-23 02:18:41 +01:00
rdk	9fcce6156f	add UseCompactObjectHeaders note to local-env.sh template	2026-02-23 02:11:29 +01:00
rdk	57fb214881	update local-env.sh template with throughput-oriented JVM options	2026-02-23 01:17:32 +01:00
rdk	40c7638bc2	implement ModelConverterTest with comprehensive forest conversion tests	2026-02-23 00:54:11 +01:00
rdk	b1a05d3097	bump version to 2.5.2-dev.10	2026-02-23 00:39:54 +01:00
rdk	f3fc9329bc	update FasterForest to 2.9.1, bump JUnit Jupiter to 6.0.3, and add NativePanama flattened eval tests	2026-02-23 00:35:18 +01:00
rdk	4aaf212b9b	update FasterForest to 2.8.1 with NativePanama support and improve eval time tracking - Add NativePanamaForest/NativePanamaForestAvx2 availability checks in ModelConverter - Refactor flattening logic to separate trainable forest preparation from conversion - Track all eval times and compute average excluding first run (caching warmup) - Rename TIME_M to TIME_TRAINEVAL_AVG_M, add TIME_EVAL_AVG_M stat	2026-02-22 21:11:16 +01:00
rdk	b5a8edc377	track last evaluation time in EvalResults for seed loop benchmarks	2026-02-22 17:44:56 +01:00
rdk	3ad261645c	update FasterForest to 2.8.0 and support flattening of FlatBinaryForest models	2026-02-22 17:33:27 +01:00
rdk	8f7d71ffb3	update FasterForest to 2.7.0	2026-02-20 12:58:14 +01:00
rdk	b8f802b145	refactor model flattening to use FasterForestConverter API with configurable target types Generalize Model classifier from Classifier to Object to support both trainable classifiers and flat BinaryForest models. Add rf_flatten_target parameter for selecting forest type (FlatBinaryForest, LegacyFlatBinaryForest, InterleavedBfsForest, etc). Deprecate rf_flatten_as_legacy in favor of the new target type selection.	2026-02-16 01:00:55 +01:00
rdk	de75ac6be1	upgrade FasterForest to 2.6.0 and fix GString compilation errors Replace flat jar with local Maven repo dependency at correct path (groupId/artifactId/version/). Fix GString-to-String type errors in AnalyzeRoutine that broke compilation with @CompileStatic.	2026-02-14 07:35:29 +01:00
rdk	27caa5fe46	sort CSV output rows as strings in analyze commands (chains, chains-residues, labeled-residues)	2026-02-14 06:04:13 +01:00
rdk	93fd8e953a	add experimental rescoring model section to rescoring docs	2026-02-11 18:44:04 +01:00
rdk	ad946de45e	rephrase Requirements section in README	2026-02-11 18:31:13 +01:00
rdk	9711cc7192	fix aa-mapping docs: broken csv link, replace special characters, cleanup	2026-02-11 18:14:03 +01:00
rdk	4a42f664e2	update aa-mapping documentation: add links to pdbfixer source	2026-02-11 18:05:11 +01:00
rdk	652442a8d2	add JVM compatibility flags to run scripts, document all flags	2026-02-11 15:27:39 +01:00
rdk	e9f530ce37	make --sun-misc-unsafe-memory-access conditional on Java 23+	2026-02-11 15:24:09 +01:00
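
Commit 5d9ec9eb58 above lists two numeric fixes: integer division in a ratio computation (long/long → double) and a NaN check written with == instead of Double.isNaN. The sketch below illustrates both pitfalls in isolation. It is not the p2rank code: the class NumericPitfalls and the methods posRatio and isMissing are hypothetical names chosen for this example.

```java
// Hypothetical illustration only; names are not taken from the p2rank sources.
final class NumericPitfalls {

    /** Ratio of positives to total. Casting one operand forces floating-point division. */
    public static double posRatio(long positives, long total) {
        if (total == 0) {
            return Double.NaN; // assumption: an empty bin has no defined ratio
        }
        return (double) positives / total;
    }

    /** NaN compares unequal to everything, itself included, so == cannot detect it. */
    public static boolean isMissing(double value) {
        return Double.isNaN(value);
    }

    public static void main(String[] args) {
        long pos = 1, total = 4;
        System.out.println(pos / total);           // long/long truncates: prints 0
        System.out.println(posRatio(pos, total));  // prints 0.25

        double v = Double.NaN;
        System.out.println(v == Double.NaN);       // always false: prints false
        System.out.println(isMissing(v));          // prints true
    }
}
```

Both bugs compile without errors and fail silently at runtime, which is why they tend to surface only when a reported statistic looks wrong.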

1880 Commits
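
Commit 8f5da9fdcd in the list above describes a single-linkage clusterer built on union-find with path compression. As a rough sketch of that general technique (not the actual SLinkClustererV2, whose internals are not shown here), the following groups 3D points that are connected through chains of neighbors within a distance cutoff. The class SingleLinkageSketch, its method names, and the double[][] point representation are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the p2rank implementation.
final class SingleLinkageSketch {

    /** Returns the set representative of i, shortening the path as it walks up. */
    public static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving, a form of path compression
            i = parent[i];
        }
        return i;
    }

    /** Merges the sets containing a and b. */
    public static void union(int[] parent, int a, int b) {
        int ra = find(parent, a);
        int rb = find(parent, b);
        if (ra != rb) {
            parent[rb] = ra;
        }
    }

    /** Clusters points given as rows {x, y, z}; returns lists of point indices. */
    public static List<List<Integer>> cluster(double[][] pts, double cutoff) {
        int n = pts.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every point starts in its own set
        }
        double sqrCutoff = cutoff * cutoff; // compare squared distances, no sqrt needed
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                double dz = pts[i][2] - pts[j][2];
                if (dx * dx + dy * dy + dz * dz <= sqrCutoff) {
                    union(parent, i, j);
                }
            }
        }
        // Collect members per representative, preserving first-seen order.
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {10, 0, 0}};
        System.out.println(cluster(pts, 1.5)); // prints [[0, 1, 2], [3]]
    }
}
```

Each pair of points is examined once, so the running time is dominated by the N(N-1)/2 distance checks. The union-find operations add only a small amortized cost per pair.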