1965 Commits

Author SHA1 Message Date
rdk
a66bea74be Add eval_output_prediction_files param to output per-protein prediction CSVs in eval commands 2026-03-17 18:59:13 +01:00
rdk
faddcfb70f Lazy-init EnergyCalculator and LJEnergyCalculator in energy features 2.6.0-dev.7 2026-03-16 07:55:16 +01:00
rdk
48cb681aaa Refactor DSO/DSWO: replace Tuple2 with OverlapCounts, cache counts instead of Atoms, simplify CdkUtils 2026-03-16 03:20:48 +01:00
rdk
5b4613c3a4 Extract FpocketAdHocHelper, add run_fpocket_ad_hoc param for eval-rescore and rescore commands 2026-03-16 03:20:41 +01:00
rdk
ba53b97e90 Add per-method CSVs and grouped summary to binding-site-centers, add DataTable filter/distinctValues/formatGroupedSummaryTable 2026-03-16 01:06:44 +01:00
rdk
91987129fe Bump version to 2.6.0-dev.7 2026-03-15 21:37:05 +01:00
rdk
8852739016 Add DCC_4 protein-centric success rate metrics 2026-03-15 21:35:53 +01:00
rdk
a814157e2b Minor cleanups: fix typos, normalize loop syntax and imports in Evaluation 2026-03-15 21:32:23 +01:00
rdk
f3616da217 Unify Protein.sites to contain all binding sites, add predictedPocket to BindingSite interface
Protein.sites now holds ground-truth binding sites for both ligand-defined
and explicit (residue-based) evaluation modes. Sites are populated from
ligands via populateSitesFromLigands() when no explicit sites are defined.

- Add predictedPocket and setSasPoints to BindingSite interface
- Add predictedPocket field to ResidueSite
- Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite)
- Update calcCoveragesProt to use BindingSite.predictedPocket
- Determine isLigandMode via instanceof instead of sites.isEmpty()
- Unify PymolRenderer sites/ligands branch into single BindingSite loop
- Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly
2026-03-15 21:25:49 +01:00
rdk
829cf9b8be Return typed result objects from calcConservationStats and calcOverlapStatsForPockets 2026-03-15 20:28:51 +01:00
rdk
8a516228e1 Fix @CompileStatic errors in Evaluation: destructuring assignment, int-to-Double casts 2026-03-15 19:59:15 +01:00
rdk
5ac9aab18a Refactor Evaluation: simplify avg/div methods, use Function instead of Closure, extract writeScoresToFileIfRequested 2026-03-15 19:27:15 +01:00
rdk
20236ef092 Refactor conservation/chains analysis, add @CompileStatic to Evaluation, rename criterium to criterion 2026-03-15 17:59:53 +01:00
rdk
d9de1fba7e Add contact_atoms_centroid site evaluation center method for ligand-defined sites 2026-03-15 17:09:04 +01:00
rdk
49a8430a7d Add binding-site-centers command, refactor center methods, consolidate error reporting
- Rename SiteCentroidMethod to SiteCenterMethod
- Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface
  for thread-safe, param-independent center calculation
- Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod()
- Add analyze binding-site-centers command comparing all center methods per site
- Add Dataset.Result.writeErrorsAndGetSummary() and use it across all
  AnalyzeRoutine commands for consistent error reporting to both console and CSV
2026-03-14 18:22:47 +01:00
rdk
0e0cb47907 Add ca_atoms_centroid site evaluation center method with tests 2026-03-14 15:57:41 +01:00
rdk
1ecb29f876 Add load_ligands_from_separate_files param for loading ligands from individual ligand_* files 2026-03-13 18:21:26 +01:00
rdk
0b5b61304d Add legacy conservation file name format fallback (e.g. 2ed4_A.) 2026-03-13 17:22:27 +01:00
rdk
e7fc457f6a Fix ligand detection for BioJava GroupType misclassifications
BioJava assigns GroupType based on its Chemical Component Dictionary,
not structural role. Ligands in non-polymer chains can get any GroupType:
- GDP, GTP, ATP -> GroupType.NUCLEOTIDE
- SHR and similar -> GroupType.AMINOACID
- Most others -> GroupType.HETATM

Previously only HETATM groups were detected as ligands, causing errors
like "Ligand definition 'GDP' matches no ligands" for nucleotide and
amino acid derivative ligands.

Fix: any non-water group in a NONPOLYMER chain is now a ligand
candidate, regardless of GroupType. Polymer chain groups (protein AA,
DNA/RNA) are only included if they have GroupType.HETATM.

Add test PDB files (1a2kC.pdb with GDP, 1e5qA.pdb with SHR) and
comprehensive tests for all three GroupType cases.
2.6.0-dev.6
2026-03-10 14:34:28 +01:00
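The candidate rule described in e7fc457f6a can be sketched as a small predicate. This is only an illustration: ChainKind, GroupKind and isLigandCandidate are hypothetical stand-ins, not BioJava or P2Rank types, and excluding water in polymer chains as well is an assumption the commit message does not state.

```java
// Hypothetical sketch of the ligand-candidate rule; the enums below are
// stand-ins and do not mirror the real BioJava or P2Rank classes.
class LigandCandidateRule {

    enum ChainKind { POLYMER, NONPOLYMER }

    enum GroupKind { AMINOACID, NUCLEOTIDE, HETATM }

    static boolean isLigandCandidate(ChainKind chain, GroupKind group, boolean isWater) {
        // Assumption: water is never a candidate, whatever chain it sits in.
        if (isWater) {
            return false;
        }
        // Non-polymer chain: any group type qualifies (GDP, SHR, ...).
        if (chain == ChainKind.NONPOLYMER) {
            return true;
        }
        // Polymer chain: only groups typed as HETATM qualify.
        return group == GroupKind.HETATM;
    }

    public static void main(String[] args) {
        // GDP-like case: nucleotide-typed group in a non-polymer chain.
        System.out.println(isLigandCandidate(ChainKind.NONPOLYMER, GroupKind.NUCLEOTIDE, false));
        // Ordinary residue inside a protein chain.
        System.out.println(isLigandCandidate(ChainKind.POLYMER, GroupKind.AMINOACID, false));
    }
}
```

Under these assumptions the first call accepts the nucleotide-typed group and the second rejects the ordinary residue.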
rdk
d78f80ee73 Extract writeCases() method, rename sites.csv to observed_sites.csv
Consolidate case CSV writing into Evaluation.writeCases(). Remove
duplicate DSO_0.1 criterion and stale TODO comments.
2026-03-10 03:24:44 +01:00
rdk
838b0a697f Fix integer division bug in DSO criterion and clean up
The Jaccard ratio was computed as int/int, always producing 0 or 1,
making fractional thresholds ineffective. Cast to double for correct
floating-point division. Also fix typo (cahe->cache), remove debug
comments, and update javadoc.
2026-03-10 02:27:11 +01:00
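The integer-division issue fixed in 838b0a697f can be reproduced in a few lines. This is a minimal sketch, not the project's DSO code: JaccardRatioDemo and both method names are hypothetical, and the inputs are assumed to be plain int counts.

```java
// Hypothetical illustration of the int/int pitfall; not the actual DSO criterion.
class JaccardRatioDemo {

    // Buggy variant: int / int truncates before the result is widened to double,
    // so for intersection <= union the value is only ever 0.0 or 1.0.
    static double ratioTruncated(int intersection, int union) {
        return intersection / union;
    }

    // Fixed variant: casting one operand forces floating-point division.
    static double ratio(int intersection, int union) {
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        System.out.println(ratioTruncated(3, 10)); // prints 0.0
        System.out.println(ratio(3, 10));          // prints 0.3
    }
}
```

With the truncated variant, a fractional threshold such as 0.2 is met only when the ratio is exactly 1, which is why the threshold had no effect.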
rdk
2de315e9e0 Rename API: PocketCriterium->PocketCriterion, getLigandAtoms->getAtoms, centroid->center
- Rename PocketCriterium to PocketCriterion (fix Latin spelling)
- Revert getLigandAtoms() back to getAtoms() in BindingSite interface
- Rename getCentroidForEval() to getCenterForEval()
- Rename explicitCentroid to explicitCenter in ResidueSite
- Rename SiteCentroidMethod values: explicit_centroid->explicit,
  sas_points_center_of_mass->sas_points_centroid
- Rename site_centroid_method param to site_eval_center_method
- Ligand.getCentroid() now delegates to getCenterForEval()
2026-03-10 02:02:47 +01:00
rdk
412c590dcb Fix CSV spacing consistency: remove padding and trailing spaces
Remove leading-space padding from fmt calls in getMiscStatsCSV and
FeatureImportances, fix header/data spacing mismatch in toPocketsCSV,
and remove trailing space in toLigandsCSV header.
2026-03-09 13:32:51 +01:00
rdk
fdebd71daf Add example Jupyter notebook for analyzing P2Rank output
Add notebook loading _predictions.csv and _residues.csv with example
data from predict_1fbl. Clean up CSV formatting: remove padding from
values, add fmtCsv() without leading spaces for CSV output.
2026-03-09 12:05:00 +01:00
rdk
61b8863c27 Simplify CSV output formatting and add null guard in CsvRow
Remove fixed-width column padding from PredictionSummary, fix spacing
in ResidueLabelings CSV output, and add null safety in CsvRow.add().
2026-03-09 11:17:59 +01:00
rdk
42ad4dfe9f Move centerOfMass and calculateCentroid to PerfUtils to avoid array allocation
Reimplements BioJava's centerOfMass and Atoms.calculateCentroid in
PerfUtils accepting Collection directly, avoiding temporary array
allocation. Adds delegate methods in Struct.
2026-03-09 02:22:48 +01:00
rdk
d9b34ffbde Bump version to 2.6.0-dev.5 and update dependencies
Update parquet-floor 1.60→1.62, CDK 2.11→2.12. Add dev config.
2026-03-07 23:15:17 +01:00
rdk
af2f68e7b9 Add sites.csv to eval output and rename getAtoms() to getLigandAtoms() in BindingSite
Add unified sites.csv (alongside ligands.csv) containing site type, centroid
coordinates, radius, and residue counts for both ligand-defined and explicit
sites. Rename BindingSite.getAtoms() to getLigandAtoms() for clarity and
update all callers.
2026-03-07 22:53:03 +01:00
rdk
228cd1ab18 Fix review issues: stale comments, null centroid in closestPocket, docs
- DCC: remove stale comment, avoid double-call of centroidForEval
- ResidueSite/Ligand: fix stale javadocs, reference SiteCentroidMethod
- Ligand: add missing getLigandAtoms() for renamed BindingSite interface
- Evaluation.closestPocket(): skip pockets with null centroid
- Params: document site_centroid_method default semantics
- PocketRescorer: document point labeling vs DCA site representation gap
2026-03-07 20:35:01 +01:00
rdk
60225e3f1f Add null guards for centroids in DCC and DCA criteria
Prevents NPE when site centroid is null (e.g. buried residues with no
SAS points when using sas_points_center_of_mass) or pocket centroid is
null (e.g. PUResNetPocket).
2026-03-05 05:06:22 +01:00
rdk
7adb080022 Write error files to outdir in finalizeDatasetResult
Write errors.csv, errors_aggregated.csv, and errors_full.txt.gz with
full stack traces when processing errors occur. Also rename pockets.csv
to predicted_pockets.csv in eval results output.
2026-03-05 04:16:44 +01:00
rdk
ed8e9cabe9 Add configurable site centroid method and SAS-as-atoms option for evaluation
Add SiteCentroidMethod enum with support flags for ligand/explicit sites.
Rename ResidueSite.centroid to explicitCentroid, add getCentroidForEval()
to BindingSite interface used by DCC. Add site_eval_sas_pts_as_atoms param
to allow DCA to use SAS points instead of atoms for site representation.
2026-03-05 02:16:55 +01:00
rdk
ea0968816b Render predicted pocket and explicit site centroids in PyMOL renderer
Render predicted pocket centroids with individual pocket colors and
explicit site centroids (or ligand centroids as fallback) as hotpink
spheres, controlled by vis_site_centers param.
2026-03-04 21:37:56 +01:00
rdk
22ac1e51ee Fix DCC criterion to use predefined site centroid for ResidueSites
DCC was computing site.atoms.centroid (geometric center of resolved
residue atoms) instead of site.getCentroid(). For ResidueSites,
getCentroid() returns the predefined centroid from the input file, which
is the authoritative binding site location. For Ligands, the center
changes from geometric to mass-weighted (negligible difference).
2026-03-04 04:00:18 +01:00
rdk
53500dd129 Fix SAS point classification stats for explicit-site datasets and improve cluster logging
- PocketRescorer: fall back to explicit site residue atoms for point
  labeling when no ligand atoms are available, fixing 0-positives in
  binary classification stats for site-based eval-predict
- SLinkClustererV2: log cluster count and sizes instead of full contents
2026-03-04 03:55:26 +01:00
rdk
c9ad8f71ff Add vis_site_centers param for rendering site/pocket centroids in PyMOL
- New vis_site_centers param (default false) renders centroids as hotpink
  pseudoatom spheres in both old (PymolRenderer) and new (NewPymolRenderer)
- Pass site centroids via RenderingModel.siteCentroids for analyze command
- Old renderer shows predicted pocket centroids and ligand centroids
- Fix empty visualizations/ dir in eval-predict: create vis dir under
  predDir instead of top-level outdir
2026-03-04 02:47:43 +01:00
rdk
d5715d9797 Fix PyMOL renderer: bulk selections, CIF-to-PDB conversion, site-based labeling
- Use bulk atom ID selections instead of per-residue named selections to
  avoid exceeding PyMOL's object limit on large proteins
- Convert CIF inputs to PDB format with correct .pdb extension (PyMOL
  can't reliably parse BioJava CIF and uses extension to pick parser)
- Rename PyMOL object from "protein" to "prot" to avoid reserved keyword
- Fix null interpolation in PML when no ligands or no labeling
- Build BinaryLabeling from explicit site residues for visualization
  (item.binaryLabeling doesn't support site-based datasets)
2026-03-04 01:13:48 +01:00
rdk
026be7eae5 Improve analyze binding-sites: visualizations, site radius, eager loading
- Add PyMOL visualizations using dataset.binaryResidueLabeler
- Add site_radius column (max distance from centroid to any site atom)
- Add excludeFromSummary param to DataTable.formatSummaryTable to skip
  center coordinates from numeric summary stats
- Load ExplicitSitesIndex eagerly during dataset loading (fail-fast)
- Skip CSV rows with empty residue/coordinate fields in AhojUbsSiteParser
- Write items without binding sites to separate file in outdir
2026-03-03 21:58:55 +01:00
rdk
9e9a500836 Bump version to 2.6.0-dev.4 2026-03-03 15:00:46 +01:00
rdk
c9fef83950 Use AtomKdTree interface in Atoms and minor cleanups
Switch Atoms.kdTree field and buildKdTree() to use the AtomKdTree
interface instead of AtomKdTreeV1 directly. Add @NonNull to iterator(),
improve initial capacity estimates, and fix whitespace.
2026-03-03 15:00:36 +01:00
rdk
997727e878 Add explicit sites loading and analyze binding-sites command
Implement ExplicitSitesIndex for loading binding site definitions from
external CSV files (pluggable format system, first format: ahoj_ubs).
Sites are resolved during item loading via DatasetItemLoader.

Add 'analyze binding-sites' sub-command producing unified CSV and summary
stats for both ligand-based and explicit site datasets, with unresolved
residue/site tracking for explicit datasets.

Remove unused SiteLoader stub.
2026-03-03 15:00:31 +01:00
rdk
8f5da9fdcd Add fused addWeighted and O(N²) single-linkage clusterer
Add GenericVector.addWeighted() for fused multiply-add, eliminating
per-neighbor array allocation in feature vector aggregation. Add SLinkClustererV2
using union-find with path compression, reducing single-linkage clustering
from O(N³) to O(N²). Wire V2 via factory methods on AtomClusterer and
AtomGroupClusterer.
2026-03-03 05:13:05 +01:00
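The union-find idea behind 8f5da9fdcd can be sketched as follows. This is not SLinkClustererV2 itself: the class name, the cluster() signature, the distance-cutoff linkage rule and the union-by-size step are all assumptions made for the illustration. Every pair of points is visited once, so the pair scan is quadratic and each union-find operation is close to constant time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: single-linkage clustering of 3D points with a distance
// cutoff, using union-find (path compression plus union by size).
class SingleLinkageSketch {

    private final int[] parent;
    private final int[] size;

    SingleLinkageSketch(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Find the set representative, then point every visited node at it.
    int find(int x) {
        int root = x;
        while (parent[root] != root) {
            root = parent[root];
        }
        while (parent[x] != root) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    // Merge the two sets, attaching the smaller tree under the larger one.
    void union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) {
            return;
        }
        if (size[ra] < size[rb]) {
            int t = ra;
            ra = rb;
            rb = t;
        }
        parent[rb] = ra;
        size[ra] += size[rb];
    }

    // Link every pair closer than the cutoff, then group indices by representative.
    static Collection<List<Integer>> cluster(double[][] pts, double cutoff) {
        int n = pts.length;
        SingleLinkageSketch uf = new SingleLinkageSketch(n);
        double cutoffSq = cutoff * cutoff;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                double dz = pts[i][2] - pts[j][2];
                if (dx * dx + dy * dy + dz * dz <= cutoffSq) {
                    uf.union(i, j);
                }
            }
        }
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(uf.find(i), k -> new ArrayList<>()).add(i);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {10, 0, 0} };
        System.out.println(cluster(pts, 1.5)); // prints [[0, 1, 2], [3]]
    }
}
```

Points 0 and 2 are farther apart than the cutoff but end up in one cluster because both link to point 1, which is the chaining behaviour that defines single linkage.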
rdk
261dae09c9 Rename consolidate() to sparsify() and add surface_sparsify param
Rename Atoms.consolidate() to Atoms.sparsify() for clarity. Use mutable
V1 KdTree for O(N log N) incremental insertion instead of periodic
rebuilds. Add surface_sparsify runtime param (default true) to allow
disabling surface point sparsification. Hardcode AtomKdTreeV1 in
Atoms.buildKdTree() and delegate Dataset cache clearing to item methods.
2026-03-03 04:01:54 +01:00
rdk
a66a973e1c Refactor KdTree into AtomKdTree interface with V1/V2 implementations
Rewrite AtomKdTreeV1 from Groovy to Java to eliminate Groovy IndyInterface
monitor contention that serialized 16 parallel threads down to ~2.
Move V1 KdTree into v1/ subpackage, extract AtomKdTree as a Java interface
with factory method dispatching by kdtree_implementation param, and rename
the old v2 wrapper to AtomKdTreeV2 implementing the same interface.
2026-03-03 00:17:38 +01:00
rdk
6d47285116 Add kdtree_implementation param and fix quickselect duplicate-key hang
Add runtime parameter to switch between KdTree3D (default) and v1
AtomKdTree. Fix O(N²) quickselect degeneration on duplicate coordinates
by adding post-partition equal-range scan.
2026-03-02 22:20:59 +01:00
rdk
24b9f5f709 Optimize KdTree3D build: bottom-up bounds, eliminate redundant traversals
- Bounding boxes computed bottom-up from leaf scans instead of scanning
  full data range at every tree level (O(N) vs O(N log N))
- Approximate parent bounds passed down for split-axis selection (O(1)
  per node instead of O(range) scan)
- Remove findNodeCount() and dead code; buildNode returns max index
- Resolve split-axis array once in quickselect inner loop
2026-03-02 21:15:41 +01:00
rdk
76026b9297 Refactor Dataset item cache clearing and fix processItem typo
Rename processssItem to processItem. Add per-item conditional cache
clearing after processing to reduce peak memory. Refactor cleanCaches
into clearCache/clearPrimaryCache/clearSecondaryCache with null-safety.
2026-03-02 20:52:07 +01:00
rdk
7f4d37b5c4 Add comparative benchmark test for v1 vs v2 KdTree
Parametrized test generates random points, builds both trees, verifies
identical results for all query types, and measures relative performance.
Skipped during normal test runs; invoked via kdtree-benchmark.sh script.
2026-03-02 20:52:05 +01:00
rdk
6cce0eb016 Rewrite KdTree as immutable, hardcoded 3D implementation in v2 package
New KdTree3D.java uses SoA storage, linearized implicit-heap layout,
balanced quickselect build, and stack-based traversal. Immutable design
eliminates mutable node state, enabling thread-safe concurrent queries.

AtomKdTree.groovy provides drop-in API wrapper. Atoms.java switched to
v2 with invalidate-on-add pattern and periodic-rebuild consolidate().
2026-03-02 20:13:47 +01:00
rdk
5d9ec9eb58 Fix bugs and add error reporting to analyze subcommands
- Fix integer division in BinCounter.getPosRatio() (long/long → double)
- Fix broken NaN check in ConservationCloudFeature (== → Double.isNaN)
- Fix wrong variable in apo_protein error message (proteinFile → apoProteinFile)
- Fix outerLater typo → outerLayer in Atoms.SphereLayers and usages
- Fix xenegy_cloud2_layered typo → xenergy_cloud2_layered in Params and usages
- Add error reporting (writeErrorCsvs) to all analyze subcommands
- Add ignoreLigandsSwitch to doCmdFasta (doesn't need ligands)
2026-03-02 09:59:36 +01:00
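The NaN fix listed in 5d9ec9eb58 rests on a general floating-point rule: NaN compares unequal to everything, including itself. The sketch below shows this in plain Java with primitive doubles. NaNCheckDemo is a hypothetical class, and how the original Groovy code in ConservationCloudFeature was written is not shown in the log.

```java
// Hypothetical illustration of the == vs Double.isNaN pitfall with primitive doubles.
class NaNCheckDemo {

    // Broken check: under IEEE 754 rules, v == NaN is false for every v.
    static boolean isNaNBroken(double v) {
        return v == Double.NaN;
    }

    // Correct check.
    static boolean isNaNFixed(double v) {
        return Double.isNaN(v);
    }

    public static void main(String[] args) {
        double v = 0.0 / 0.0; // produces NaN
        System.out.println(isNaNBroken(v)); // prints false
        System.out.println(isNaNFixed(v));  // prints true
    }
}
```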