1129 Commits

Author SHA1 Message Date
rdk
af1d6eeb18 Drop frozen pocket-grid PLAN/SPEC; refine audit punch-list
PLAN.md and SPEC.md were pre-implementation design docs for the pocket-grid
feature. The feature has shipped, so they're frozen artifacts in the active
todo/ namespace. Delete them and strip the three "see SPEC.md" comments
from Main.groovy and the predict/rescore routines.

Also reassess the PyMOL rank-gap entry in the audit: P2Rank ranks pockets
contiguously throughout the predict path and all in-tree loaders (except
SiteHoundLoader), so the previously-listed "renderer ignores rank gaps" is
cosmetic-only (empty objects in the Models panel for small pockets whose
filled BitSet ended up empty). Downgrade to a parity nit under
Inconsistencies; promote the PUResNet surfaceAtoms re-linking to the Top-5.
2026-05-20 19:42:47 +02:00
rdk
556ea9faa8 Cleanup batch: BitSet reuse + idiom touches (no user-visible perf change)
Pure technical cleanup, not a perf win — savings are microseconds per
protein. The useful artifact is item 2 below: bytecode verification that
the existing Groovy/BitSet workaround is still needed.

- MorphologicalCloser: pre-allocate the two per-iteration BitSets and
  reuse via swap+clear. Zero BitSet allocations inside the loop (vs
  two per iter previously).
- PocketGridRows: tried replacing the manual BitSet-OR loop with a
  direct .or(bs) call. Bytecode inspection showed Groovy dispatches it
  under @CompileStatic to DefaultGroovyMethods.or(BitSet, BitSet) which
  RETURNS a new BitSet rather than mutating in place — test failed.
  Reverted; updated the comment with the verification and the escape
  hatch (move the block into a Java helper if we ever want BitSet#or).
- PocketGridChimeraXRenderer: palette color loop iterates the present
  rank set (perPocketBasenames.keySet()) instead of dense 1..maxRank,
  matching the layer loops below and avoiding unreferenced color
  definitions for missing ranks.
- PocketDescriptorsRows: replaced `pockets.any { ... }` Groovy closure
  with manual loop under @CompileStatic — consistent with the rest of
  the constructor and one fewer per-protein closure allocation.
- DescriptorListValidator: HashSet → LinkedHashSet for the dedup
  tracker. Tiny UX improvement (deterministic order in any future
  multi-duplicate debug output).
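
The swap+clear reuse described for MorphologicalCloser could look roughly
like the sketch below. It is an illustration only: the class, the method and
the 1D "grow by one neighbor" step are hypothetical stand-ins, not the P2Rank
implementation.

```java
import java.util.BitSet;

/** Hypothetical illustration of the swap+clear pattern; not P2Rank code. */
final class BitSetReuseSketch {

    /** Grows the set by one position to each side, at most maxIters times. */
    static BitSet dilate(BitSet seed, int size, int maxIters) {
        // Both buffers are allocated once, outside the loop.
        BitSet current = (BitSet) seed.clone();
        BitSet next = new BitSet(size);
        for (int iter = 0; iter < maxIters; iter++) {
            next.clear(); // reuse the spare buffer instead of allocating
            for (int i = current.nextSetBit(0); i >= 0; i = current.nextSetBit(i + 1)) {
                next.set(i);
                if (i > 0) next.set(i - 1);
                if (i + 1 < size) next.set(i + 1);
            }
            if (next.equals(current)) break; // converged, nothing changed
            BitSet tmp = current;            // swap the roles of the two buffers
            current = next;
            next = tmp;
        }
        return current;
    }

    public static void main(String[] args) {
        BitSet seed = new BitSet(7);
        seed.set(3);
        System.out.println(dilate(seed, 7, 2));
    }
}
```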

Output byte-identical end-to-end; full test suite green.
2026-05-19 19:59:29 +02:00
rdk
6c1e394ea5 Extract three framework helpers (centroid, schema, registry)
Tier 3 reuse refactor: collapse ~120 lines of duplication across the
descriptor framework. Composition over inheritance throughout — no
public API change, no behavior change (smoke run output byte-identical).

NamedRegistryHelper<T> (new, generic):
- Composition helper for name-keyed registries. Both descriptor
  registries (per-pocket and per-grid-point) now delegate register/
  unregister/get/knownNames to one shared helper, keeping their public
  static API. Per-registry invariants (the size/dup-cols check) stay
  in each registry's private validate() and plug in via a Consumer<T>
  hook. PocketDescriptorRegistry shrinks ~80→55 lines;
  PocketGridPointDescriptorRegistry shrinks ~75→55.
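
A composition helper of this kind might be sketched as follows. Everything
here is an assumption made for illustration (class name, constructor shape,
exception type, error wording); only the register/unregister/get/knownNames
operations and the Consumer<T> validation hook come from the description above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical name-keyed registry helper; not the actual NamedRegistryHelper. */
final class NamedRegistrySketch<T> {
    private final Map<String, T> byName = new LinkedHashMap<>();
    private final Function<T, String> nameOf;
    private final Consumer<T> validator; // per-registry invariants plug in here

    NamedRegistrySketch(Function<T, String> nameOf, Consumer<T> validator) {
        this.nameOf = nameOf;
        this.validator = validator;
    }

    void register(T item) {
        validator.accept(item); // expected to throw if the item is invalid
        String name = nameOf.apply(item);
        if (byName.putIfAbsent(name, item) != null) {
            throw new IllegalArgumentException("Duplicate name: " + name);
        }
    }

    T unregister(String name) {
        return byName.remove(name);
    }

    T get(String name) {
        T item = byName.get(name);
        if (item == null) {
            throw new IllegalArgumentException(
                "Unknown name '" + name + "', known: " + byName.keySet());
        }
        return item;
    }

    Set<String> knownNames() {
        return Collections.unmodifiableSet(byName.keySet());
    }

    public static void main(String[] args) {
        NamedRegistrySketch<String> reg = new NamedRegistrySketch<>(s -> s, s -> { });
        reg.register("volume");
        System.out.println(reg.knownNames());
    }
}
```

A registry that owns such a helper can keep its public static methods and
forward each call, which is one way to avoid changing the public API.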

DescriptorSchemaHelper.appendColumns (new):
- Single point where the "{name}.{col}" multi-column header rule lives.
  Both PocketDescriptorsRows and PocketGridRows route schema build
  through it. Interface-agnostic (takes name + colNames + colTypes
  directly), so it works for both descriptor types without coupling.
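
The header rule can be illustrated with a minimal sketch. The signature is
hypothetical and omits colTypes, and it assumes that "scalar" means exactly
one column, in which case the bare descriptor name is emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "{name}.{col}" rule; not the shipped helper. */
final class HeaderRuleSketch {

    /** Appends header cells for one descriptor to the running header list. */
    static void appendColumns(List<String> header, String name, List<String> colNames) {
        if (colNames.size() == 1) {
            header.add(name); // assumed scalar case: bare name, no prefix
            return;
        }
        for (String col : colNames) {
            header.add(name + "." + col); // multi-column case: "{name}.{col}"
        }
    }

    public static void main(String[] args) {
        List<String> header = new ArrayList<>();
        appendColumns(header, "volume", Arrays.asList("volume"));
        appendColumns(header, "principal_moments",
                Arrays.asList("lambda1", "lambda2", "lambda3"));
        System.out.println(header);
    }
}
```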

GridPointStats.centroid (new):
- Static helper for the centroid loop duplicated across
  SphericityDescriptor, RadiusOfGyrationDescriptor, and
  PrincipalMomentsDescriptor. Three descriptors each had the same
  BitSet → allPoints centroid pass; now one method call.

Skipped from the same plan (per Tier-3+4 reconsideration):
- vis_renderers validator merge (item 13): semantic mismatch
  (null handling, error wording) makes the abstraction lossy.
- AbstractVolsiteGridPointDescriptor base (item 16): two impls is
  below the threshold where a shared base earns its keep.
- Pre-classify protein atoms, per-point cache, Params hoist
  (items 18-20): real wins on the volsite hot path but speculative
  without a benchmarked workload. Defer until someone reports
  volsite descriptor compute as a bottleneck.
2026-05-19 16:23:09 +02:00
rdk
6fad858bc6 Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening
Bug fix:
- PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The
  v<0 check was false for NaN, so a NaN eigenvalue (possible if a future
  code path bypasses GridGenerator.isFiniteBox) would have propagated
  to the CSV output.
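
Why the old check let NaN through can be shown in a few lines. This is a
sketch with hypothetical method bodies, assuming the fix maps NaN to 0.0 the
same way it maps negatives.

```java
/** Hypothetical before/after shapes of the clamp; not the P2Rank source. */
final class ClampSketch {

    /** Pre-fix shape: (NaN < 0) is false, so NaN is returned unchanged. */
    static double clampNegativeOnly(double v) {
        return v < 0 ? 0.0 : v;
    }

    /** Post-fix shape: NaN is handled explicitly before the comparison. */
    static double clampNonNegative(double v) {
        return (Double.isNaN(v) || v < 0) ? 0.0 : v;
    }

    public static void main(String[] args) {
        System.out.println(clampNegativeOnly(Double.NaN)); // NaN leaks through
        System.out.println(clampNonNegative(Double.NaN));  // clamped
    }
}
```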

Doc refresh:
- breaking-changes.md: 2.6 entry for the multi-column descriptor
  migration + the -vis_pocket_grid / pocket_grid_vis_* renames.
- export-pocket-descriptors.md: step 4 rewrites a self-contradicting
  rationale — adding to the default list IS a breaking change for
  index-based parsers; recommends parse-by-name + breaking-changes.md
  note for future additions.
- export-pocket-grid.md: added "Adding a new per-grid-point descriptor"
  recipe (parallel to the per-pocket one); unified √3/2 precision to
  0.866 across docs and Params.groovy.
- README.md: added an "Opt-in tabular exports" subsection mentioning
  -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid.
- testsets.sh "Full descriptor menu" now lists all seven shipped
  descriptors (was six).

Exception taxonomy:
- PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw
  PrankException (was IllegalArgumentException) for user-facing config
  errors, matching the rest of the codebase.

Registry hardening:
- Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry
  now assert columnNames.size() == columnTypes.size() in register().
  A future descriptor with mismatched lists fails fast at class-load.

Quality fixes:
- PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for
  the pocket column. Removed dead 2-arg PocketGridRows constructor
  (only 3 test sites used it; now inlined).
- PocketGridPointContext gets a compact-constructor validator that
  rejects negative pointIndex/pocketRank, limiting blast radius of an
  int-arg swap.

Test hardening:
- VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest
  now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so
  the Params singleton is clean for subsequent test classes.
- New tests: HIS ND1 double-flag (single atom setting donor+acceptor),
  PrincipalMoments at cardinality=2, PrincipalMoments two coincident
  points, GridGenerator NaN-box throw, PocketDescriptorRegistry
  register/unregister round-trip, MorphologicalCloser maxIters=1.
- Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered
  the maxIters=0 case despite the general name); added maxIters=1
  companion that verifies one iteration of fill actually runs.
- Extracted RendererTestFixtures.tinyGrid (was byte-identical in both
  renderer test files); unified the volsite atomAt signatures so the
  parameter order can't get swapped between the two volsite tests.
2026-05-19 15:36:12 +02:00
rdk
cb6f7f75eb Doc / comment refresh after the multi-column descriptor migration
- Params.groovy: pocket_descriptors javadoc now lists all 7 shipped
  descriptors (was: 6); softens the "essentially free" rationale to
  acknowledge principal_moments' small eigendecomposition cost.
- PocketDescriptorsTest.groovy: class javadoc "six shipped descriptors"
  → "seven", names principal_moments alongside the rest.
- export-pocket-descriptors.md: "6 base shipped descriptors use this
  adapter" → "6 of 7 use the adapter; principal_moments (multi-column)
  implements PocketDescriptor directly". Removes a misleading count.
- export-pocket-{grid,descriptors}.md: default-list rationale no longer
  claims adding descriptors is "essentially free" — clarifies that
  grid-derived scalars are cheap once the grid is built but
  principal_moments adds a small per-pocket compute on top, still
  negligible vs the grid build.

Caught by deep audit of 60220d7a..73e7c9df focused on doc/comment drift
after the recent multi-column interface migration.
2026-05-19 14:41:03 +02:00
rdk
73e7c9df9a Per-pocket descriptors: multi-column interface + PrincipalMomentsDescriptor
Unifies the per-pocket descriptor framework with the per-grid-point
framework: same shape (name + columnNames + columnTypes + double[]
compute), same multi-column "{name}.{col}" header convention, same
public register / unregister / dup-column-check registry. Shipped as
a breaking change behind the same -pocket_descriptors knob.

Interface change:
  String name();
  List<String> columnNames();
  List<ColumnType> columnTypes();
  double[] compute(PocketGridContext);
  boolean needsGrid();  // unchanged

Scalar descriptors stay one-liners via the new
AbstractScalarPocketDescriptor adapter (name + scalarType +
computeScalar). The 6 existing descriptors migrated; behavior and
output byte-identical to before.

New descriptor: PrincipalMomentsDescriptor (3 × DOUBLE) — the three
eigenvalues of the pocket grid points' gyration tensor, sorted
descending. Implementation uses Apache Commons Math 3
EigenDecomposition. Shape signature complement to sphericity /
radius_of_gyration; sum equals radius_of_gyration² (verified in test).
Added to the default -pocket_descriptors list.
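
The sum = Rg² property follows because the eigenvalues of a symmetric matrix
sum to its trace. A small sketch of that identity, without the
eigendecomposition, is given below; it assumes the tensor is normalized by
the number of points N, and all names are hypothetical.

```java
/** Hypothetical check that trace(gyration tensor) equals Rg^2; assumes 1/N normalization. */
final class GyrationTraceSketch {

    /** Arithmetic mean of the points; expects at least one point. */
    static double[] centroid(double[][] pts) {
        double[] c = new double[3];
        for (double[] p : pts) {
            for (int k = 0; k < 3; k++) c[k] += p[k];
        }
        for (int k = 0; k < 3; k++) c[k] /= pts.length;
        return c;
    }

    /** Diagonal (Sxx, Syy, Szz) of S = (1/N) * sum (r_i - c)(r_i - c)^T. */
    static double[] tensorDiagonal(double[][] pts) {
        double[] c = centroid(pts);
        double[] d = new double[3];
        for (double[] p : pts) {
            for (int k = 0; k < 3; k++) d[k] += (p[k] - c[k]) * (p[k] - c[k]);
        }
        for (int k = 0; k < 3; k++) d[k] /= pts.length;
        return d;
    }

    /** Mean squared distance of the points to their centroid. */
    static double radiusOfGyrationSquared(double[][] pts) {
        double[] c = centroid(pts);
        double sum = 0.0;
        for (double[] p : pts) {
            double dx = p[0] - c[0], dy = p[1] - c[1], dz = p[2] - c[2];
            sum += dx * dx + dy * dy + dz * dz;
        }
        return sum / pts.length;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0, 0}, {2, 0, 0}, {1, 3, 0} };
        double[] d = tensorDiagonal(pts);
        System.out.println((d[0] + d[1] + d[2]) + " vs " + radiusOfGyrationSquared(pts));
    }
}
```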

Default list reordered to put num_* (cheap, integer-valued) first,
then geometric scalars, then principal_moments:
  num_residues, num_surface_atoms, num_grid_points,
  volume, sphericity, radius_of_gyration,
  principal_moments

Tests:
  - 5 new PrincipalMomentsDescriptor tests (cube isotropy, rod-shape
    eigenvalues, sort order, degenerate empty/single, sum=Rg²)
  - PocketDescriptorsRowsTest +2 (multi-column prefix rule, mixed
    scalar + multi ordering)
  - existing 13 callsites updated for the double[] return signature
  - columnType() registry test → columnTypes()

User-visible change: the default -pocket_descriptors output now has
three new columns (principal_moments.lambda1/2/3) and the existing
columns appear in a different order. Scripts parsing by column name
are unaffected; scripts parsing by column index need updating.
2026-05-19 14:34:33 +02:00
rdk
0e044f6bb3 Audit follow-ups: fill warning, NaN guard, test hardening + docs
Bug fixes:
- MorphologicalCloser: gate the "didn't converge" warning on maxIters>0.
  maxIters=0 is a valid "disable fill" config and would otherwise log
  spuriously on every protein.
- GridGenerator: hoist the isFiniteBox NaN guard into the (Box, edge)
  ctor so both sampleGridPointsBetween and sampleGridPointsAroundAtoms
  are covered (the second sampler was previously unguarded — used by
  the training/feature path).
- PocketGridPdbSidecar.writePerPocket: serial-wrap warning added for
  parity with the combined write() path.

Test hardening:
- PocketGridPointDescriptorRegistry: add unregister() so tests can
  clean up fixture registrations; PocketGridRowsTest now @AfterAll
  unregisters its scalar fixture so it doesn't leak into the JVM-wide
  registry.
- VolsiteSmoothGridPointDescriptorTest: pin sigma via @BeforeEach so
  other tests mutating the Params singleton can't shift expectations;
  new weightAtExactCutoffEqualsExpMinusEight test pins the 4σ-inclusive
  cutoff semantic (cutoutSphere is inclusive; exp(-8) ≈ 3.354e-4).
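
The exp(-8) value is consistent with a kernel of the form
exp(-d² / (2σ²)) evaluated at d = 4σ. The sketch below assumes that kernel
form and an inclusive cutoff; the class and method are hypothetical.

```java
/** Hypothetical Gaussian weight with an inclusive 4-sigma cutoff; kernel form is assumed. */
final class GaussianCutoffSketch {

    static double weight(double d, double sigma) {
        if (d > 4.0 * sigma) return 0.0; // inclusive: d == 4*sigma still contributes
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }

    public static void main(String[] args) {
        // With sigma = 2.0 the cutoff sits at d = 8.0.
        System.out.println(weight(8.0, 2.0));
        System.out.println(weight(8.0001, 2.0));
    }
}
```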

Docs / clarifications:
- Params.pocket_grid_point_descriptors javadoc: the silent-ignore when
  -export_pocket_grid=false is intentional (symmetric with
  -pocket_descriptors / -export_pocket_descriptors).
- PocketDescriptor javadoc: intentionally scalar-only; recommend
  unifying with PocketGridPointDescriptor if multi-col is ever needed
  rather than ad-hoc extending this one.
- PocketGridPointDescriptor javadoc: needsGrid() is intentionally
  absent — every grid-point descriptor needs the grid by definition.
- documentation/export-pocket-grid.md: explain the default-empty
  rationale (cost: per-row × per-atom, not backward-compat).
- VdwRadiusTable.resolveSymbol: comment that the name-prefix isotope
  branch is a safety net, not a semantic mapping (e.g. "DA" in DNA
  isn't deuterium).
2026-05-19 13:29:10 +02:00
rdk
6888716aa0 Tests for pocket-grid-point descriptors + extract DescriptorListValidator
Adds focused regression tests for the new framework: 11 tests in three
new files plus 4 added to PocketGridRowsTest.

  PocketGridRowsTest +4
    - descriptor schema uses "{name}.{col}" prefix for multi-col
    - getRow appends descriptor values after the base 4 columns
    - unknown descriptor name throws at construction
    - scalar descriptor emits bare name() with no prefix (uses an
      inline ScalarTestDescriptor registered via the now-public
      registry hook — none of the shipped descriptors are scalar so
      the branch was untested)
  VolsiteGridPointDescriptorTest (new, 4 tests)
    - covers indicator aggregation + radius cutoff
  VolsiteSmoothGridPointDescriptorTest (new, 4 tests)
    - covers Gaussian kernel arithmetic + 4σ cutoff
  PocketGridPointDescriptorRegistryTest (new, 2 tests)
    - shipped names resolve, unknown name throws helpful error
  DescriptorListValidatorTest (new, 8 tests)
    - null/empty/valid/unknown/duplicate/null-entry/blank/dash-prefix

Refactors Main.validateDescriptorList out to a self-contained Java
utility (DescriptorListValidator) under predict/output/. The two call
sites in Main.validatePocketGridParams now invoke the static helper;
the private helper in Main is removed (-37 lines).

PocketGridPointDescriptorRegistry.register is promoted from private to
public so tests (and future external descriptor plugins) can add
descriptors without touching the registry's static initializer. The
shipped registrations still happen at class-load.
2026-05-19 10:29:02 +02:00
rdk
1931ef1f93 Pocket-grid-point descriptors: framework + two VolSite descriptors
Adds an opt-in extension to the pocket-grid export — extra columns per
(point, pocket) row driven by a registry of per-grid-point descriptors.
Mirrors the existing per-pocket descriptor framework (interface, context
record, static registry, name-driven CLI selection).

CLI:
  -pocket_grid_point_descriptors   list, default []
  -pocket_grid_volsite_radius      4.0 Å    (volsite indicator cutoff)
  -pocket_grid_volsite_sigma       2.0 Å    (volsite_smooth Gaussian σ)

Shipped descriptors (both 6-column, prefixed `{name}.`):
  volsite         INT  0/1 per pharmacophore type within radius
  volsite_smooth  DOUBLE Gaussian-weighted sum, kernel truncated at 4σ

Atom-level pharmacophore classification reuses VolSitePharmacophore — a
1 in volsite.vsCation here matches a 1 in vsCation from VolsiteFeature.

The 6 VolSite column names now live as VolSitePharmacophore.COLUMN_NAMES
(single source of truth, also used by VolsiteFeature). VolSitePharmacophore
gains a getAtomProperties(Atom) overload that does the PdbUtils hop.

Validation: -pocket_grid_point_descriptors goes through a new shared
validateDescriptorList(names, known, paramName) helper in Main, which
also replaces the open-coded equivalent for -pocket_descriptors. The
two new numeric params are bounds-checked.
2026-05-19 09:59:37 +02:00
rdk
a3efd0840c Pocket-grid defensive guards + ChimeraX rank-gap fix
- ChimeraX renderer: surfaces-layer rename now iterates the actual rank
  set (perPocketBasenames.keySet) instead of 1..maxRank. The previous
  code assumed every rank produces a ChimeraX submodel; a rank-skip
  would mis-target the rename. Latent today (P2Rank reorders pockets
  contiguously) but the assumption is now explicit in the code.
- PdbSidecar: warn when total grid atoms exceed the PDB 5-digit serial
  column (wrap still happens; the warning surfaces the limit so users
  with very fine grids know why bond-inference tools might misbehave).
- MorphologicalCloser: warn when loop exits at maxIters without
  converging, naming the param to raise. Previously silent.
- GridGenerator: throw early on non-finite SAS-point bounding box.
  IEEEremainder(NaN, edge) = NaN would otherwise produce a NaN-everywhere
  lattice from a broken PDB.
- VdwRadiusTable: map D/T isotopes to H before CDK lookup. Previously
  fell through to carbon (1.7 Å instead of hydrogen's 1.2 Å); marginal
  effect because of the atom_buffer cushion but no reason to be wrong.
- PocketDescriptorsRows: throw at construction if grid==null and any
  selected descriptor declares needsGrid()=true, instead of NPEing
  inside compute(). The upstream gate in PocketGridOutputs already
  honors this; the guard catches programming errors elsewhere.
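
The GridGenerator guard can be pictured with the sketch below. The helper
names are hypothetical and IllegalArgumentException is only a stand-in for
whatever exception the real code throws; the one thing relied on is that
Math.IEEEremainder returns NaN when either argument is NaN.

```java
/** Hypothetical finite-box guard; not the GridGenerator source. */
final class FiniteBoxSketch {

    /** True only if all six box coordinates are finite (no NaN, no infinity). */
    static boolean isFiniteBox(double[] min, double[] max) {
        for (int i = 0; i < 3; i++) {
            if (!Double.isFinite(min[i]) || !Double.isFinite(max[i])) return false;
        }
        return true;
    }

    /** Fails early instead of letting NaN spread into every lattice coordinate. */
    static void requireFiniteBox(double[] min, double[] max) {
        if (!isFiniteBox(min, max)) {
            throw new IllegalArgumentException("Non-finite bounding box");
        }
    }

    public static void main(String[] args) {
        // NaN in, NaN out: this is what the guard prevents downstream.
        System.out.println(Double.isNaN(Math.IEEEremainder(Double.NaN, 2.0))); // true
        requireFiniteBox(new double[] {0, 0, 0}, new double[] {10, 10, 10});
    }
}
```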
2026-05-19 07:47:43 +02:00
rdk
f06628dd63 Audit follow-ups: rename leftovers, doc fixes, numeric validation
- testsets.sh: 4 sites still invoking -export_pocket_grid_pml after the
  rename; they were hard-failing at startup.
- PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3
  refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a).
- documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius
  default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers
  doc now shows the #99 (spheres) + #100 (surfaces) split.
- Main.validatePocketGridParams: numeric range checks for spacing,
  max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in
  the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius
  (-1 sentinel or strictly positive), and gaussian_iso. Catches values
  that would otherwise produce a NaN lattice, empty grid, or garbage
  passed to PyMOL/ChimeraX.
2026-05-19 07:24:16 +02:00
rdk
60220d7a57 Add pocket-grid + descriptors export with PyMOL / ChimeraX viz
Per-protein 3D grid of points around predicted pockets with per-pocket
assignment, plus per-pocket geometric descriptors (volume, sphericity,
radius_of_gyration, num_residues, num_surface_atoms, num_grid_points).

User-facing knobs (all under -export_pocket_*, -pocket_grid_*, -vis_pocket_*):

  -export_pocket_grid          CSV/Arrow/Parquet grid file
  -export_pocket_descriptors   CSV/Arrow/Parquet descriptors file
  -vis_pocket_grid             PyMOL/ChimeraX overlay scripts
  -pocket_grid_format          csv | csv.gz | csv.zst | arrow{,.gz,.zst} | parquet
  -pocket_grid_spacing         lattice edge (Å)
  -pocket_grid_max_dist        outer bound vs nearest pocket SAS point
  -pocket_grid_atom_buffer     inner bound vs vdw(nearest atom)
  -pocket_grid_assign_cutoff   per-pocket membership cutoff
  -pocket_grid_assigner        kdtree | voxel_hash
  -pocket_grid_fill            morph_closing | none
  -pocket_descriptors          subset of registered descriptors
  -vis_pocket_grid_volume_radius / _gaussian_iso  viz tuning

Renderers (PocketGridPymolRenderer, PocketGridChimeraXRenderer) overlay
on top of the standard pocket viz with per-pocket togglable layers:
discrete spheres, vdW-radius surface union, gaussian-iso (PyMOL only),
convex-hull wireframe (PyMOL only, requires scipy). Both honor
-vis_renderers membership.

Startup validation for all new params (Main.validatePocketGridParams,
Main.validateVisParams) — typos in renderer/format/fill/assigner names
fail fast instead of silently emitting nothing.

Performance: LongIntHashMap-backed lattice index, BitSet pocket
assignments, pluggable range-query (kdtree vs voxel-hash), morph-closing
frontier expansion. Most hot paths converted from Groovy to Java.

Docs: documentation/export-pocket-grid.md, export-pocket-descriptors.md.

Squashed from 70 commits (9b7d7a64..fec803ff). Pre-squash granular
history preserved on branch develop-backup-2026-05-19.
2026-05-19 03:03:33 +02:00
rdk
0ef60da818 Guard pocket loaders against degenerate input
Both fpocket and Seq2Pocket loaders could previously produce a pocket
with a null centroid that NPEs downstream feature extraction:

- FPocketLoader: skip the pocket if its voronoi-centers het group is
  empty (Atoms.centerOfMass returns null on empty list). Guard runs
  before rank assignment so surviving ranks stay sequential.
- Seq2PocketLoader: skip the pocket if the input named atom serials
  but none resolved against queryProtein.allAtoms (otherwise the
  pocket would carry empty surfaceAtoms and null centroid). Real
  inputs rarely trigger this; synthetic test covers it.

Neither path is expected with well-formed input; both fixes are
defensive.
2026-05-17 01:44:29 +02:00
rdk
ddd5d8a11c Add Seq2PocketLoader for Seq2Pocket pocket predictions
Parses per-protein <ID>_predictions.txt (semicolon CSV) and resolves
atom_ids against queryProtein.allAtoms by PDB serial. Empty/header-only
files produce 0 pockets gracefully. Prediction is bound to the
caller-supplied queryProtein, avoiding the ConcavityLoader bug class.

- Dataset.groovy: new case "seq2pocket"
- README.md: list SwinSite and Seq2Pocket in rescoring methods;
  cite pocketeer.ds + swinsite.ds in test_data/ examples
- CLAUDE.md: note that distro/README.md is a transient build artifact
- Test fixtures: 5 real predictions under distro/test_data/, plus
  unsorted/header-only/path-independence variants under src/test/resources/
- Seq2PocketLoaderTest: 10 tests, all passing
2026-05-16 12:40:36 +02:00
rdk
e9641680c1 Silence javac deprecation/unchecked notes
- GenericVector.toList(): replace deprecated DefaultGroovyMethods.toList
  (Groovy 5) with a plain Java loop; drop unused addTo() (no callers)
- Atoms(List<? extends Atom>): @SuppressWarnings("unchecked") for the
  intentional wrap-without-copy
- KdNode.splitLeafNode: @SuppressWarnings("unchecked") for casts from
  the Object[] backing store
2026-05-15 16:15:26 +02:00
rdk
c6ee163ece Audit cleanup: remove dead param, dead commented code, stale docs
- Drop dead mask_unknown_residues=true from default(_rescore).groovy
  (param removed from Params.groovy in 1b7809a6, 2019; configs missed)
- Rewrite distro/models/readme.md to match models on disk (add rescore_2024,
  rescore_conservation; remove nonexistent conservation.model)
- Remove broken documentation/rescoring.md link from distro/README.md
- distro/config/readme.md: drop nonexistent working.groovy reference,
  fix github link master->develop
- Delete dead commented-out method bodies in PdbUtils, RPlotter,
  PredictionVisualizer
- Fix typo in Main.groovy javadoc
2026-05-15 09:34:28 +02:00
rdk
c78519c98e Cofactor smoke harness, CDK VdW workaround, analyze-cofactors fixes
Bumps faster-molecular-surface 1.0 -> 1.1, vendored in
lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for
elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir,
plus radioactive/synthetic). Without the fix, cobalamin-bearing
structures crashed surface computation under -cofactors.

PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used
when -use_optimized_surface 0) with the same fallback, via a Krypton
proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests
mirror the FMS-side regressions.

AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with
Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups
that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely
reported as "name not in structure" in cofactor_matches.csv or omitted
from het_groups.csv. Mirrors the M1 fix applied earlier to
CofactorHandler.extractCofactorAtoms.

testsets.sh: new cofactors_full() function exercising the cofactor
demo + full datasets in p2rank-datasets2/other/cofactors/ (predict,
analyze cofactors, -aa_mapping composition, visualizations,
export-points). Uses -fail_fast 1 so per-structure errors surface as
test failures rather than silent skips.
2026-05-15 00:35:08 +02:00
rdk
79cda78473 Add cofactor-as-protein-surface feature (Issue #79 part 2)
The -cofactors flag and dataset cofactors column accept LigandDefinition
specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]").
Matched HET groups merge into the protein surface (proteinAtoms) and are
excluded from ligand listings; per-item resolution lets a dataset column
override the global Params.cofactors.

New: analyze cofactors subcommand (HETATM survey + specifier dry-run),
PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor
and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19),
drop-in safety benchmark with byte-equality on a never-present specifier.

Documentation in documentation/cofactors.md (user-facing) and
documentation/dev/cofactors.md (engineering record with R1-R24 design choices
and post-merge audit fixes). Tests in CofactorHandlerTest,
CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest,
DataTableCsvTest plus a Log4jCapture test helper.
2026-05-14 07:58:14 +02:00
rdk
0e8bb0cb33 Add SwinSiteLoader for SwinSite pocket predictions
Registers `swinsite` as a third-party predictor in Dataset.groovy. The
loader reads grid<N>_score_<float>.mol2 (raw voxel points) per pocket,
parses score from the filename, computes pocket centroid from the grid,
and derives surfaceAtoms via cutoutShell against queryProtein.exposedAtoms
(4.5 -> 10 A expanding shell), mirroring ConcavityLoader.

Reads grid mol2 instead of pocket mol2: pocket mol2 atoms are standalone
copies with chain reset to 'A' and synthetic residue names, so they break
P2Rank's residue/conservation/ASA feature lookups. Grid + cutoutShell
keeps surfaceAtoms bound to real queryProtein atoms.

Mol2 parsing is a small inline @<TRIPOS>ATOM scan rather than CDK's
Mol2Reader: CDK has a lazy-init race in AtomTypeFactory that NPEs under
parallel dataset processing.

Ships swinsite.ds plus 6 protein PDBs (1tjw_A from SwinSite's
test_protein_only example, plus 1a26A/1a2kC/1afkA/1atlA/1bqoB from
coach420) covering 1/2/3/4/6-pocket cases. 1atlA's on-disk N-order is
non-monotonic in score (0.7288, 0.0664, 0.3433), exercising the rerank.
SwinSiteLoaderTest covers all six fixtures plus the
predictionIsBoundToQueryProtein contract and empty-dir tolerance.
2026-05-08 01:05:15 +02:00
rdk
15349bb48f Add pocket rank column to points export, fix overlap labeling
The points export (predict/rescore -export_points 1) now includes an
integer 'pocket' column matching newRank in *_predictions.csv, so users
can directly aggregate per-pocket descriptors without a spatial join.
Standalone 'export-points' (no prediction) omits the column.

Pocket-extension shells can overlap, so a single SAS point can sit in
multiple pocket.labeledPoints lists. Previously the assignment loop was
last-write-wins and gave the worst rank to shared points, which was
counter-intuitive for both visualization (PredictionVisualizer PDB
output) and descriptor aggregation. PocketRescorer.setNewRanks now
iterates pockets best-first with a guard, so the lowest newRank wins;
the redundant lp.pocket write in PocketPredictor is removed.
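
The best-first iteration with a guard can be sketched as follows. The data
model (lists of integer point ids, a map from point id to rank) is a
simplification invented for the example, not the P2Rank types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "lowest rank wins" for points shared by several pockets. */
final class BestRankWinsSketch {

    /**
     * @param pocketsBestFirst point ids per pocket, index 0 being rank 1 (best)
     * @return map from point id to the rank of the best pocket containing it
     */
    static Map<Integer, Integer> assign(List<List<Integer>> pocketsBestFirst) {
        Map<Integer, Integer> rankOfPoint = new HashMap<>();
        for (int i = 0; i < pocketsBestFirst.size(); i++) {
            int rank = i + 1;
            for (int pointId : pocketsBestFirst.get(i)) {
                // Guard: a point already claimed by a better pocket keeps that rank.
                rankOfPoint.putIfAbsent(pointId, rank);
            }
        }
        return rankOfPoint;
    }

    public static void main(String[] args) {
        // Point 2 lies in both pockets and keeps rank 1.
        System.out.println(assign(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(2, 3))));
    }
}
```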

TableData gains a per-column ColumnType (DOUBLE default, INT) so
TableExporter emits true integers in CSV (no decimals), Arrow (Int32),
and Parquet (INT32) for the pocket column.

Bump version to 2.6.0-dev.8.
2026-05-06 14:08:29 +02:00
rdk
c143e0fa9c Fix ConcavityLoader to bind prediction to queryProtein
ConcavityLoader.loadPrediction was ignoring its queryProtein parameter
and binding the returned Prediction to a Protein loaded from
*_residue.pdb (a pocket-touching residue subset, not the full protein).
Downstream features keyed on prediction.protein.fileName then resolved
against the wrong basename — most visibly conservation lookup, which
searched for "<ID>_<submethod>_residue_<chain>.hom" instead of
"<ID>_<chain>.hom" and silently produced zero conservation features.
Other feature extractors were similarly reading the truncated atom set.

The residue subset is still loaded and used to define the per-pocket
surface-atom shell (no behaviour change there), but the Prediction is
now bound to queryProtein, matching FPocketLoader and PUResNetLoader.

Add ConcavityLoaderTest plus a matching test in FPocketLoaderTest that
assert the loader-contract invariant prediction.protein === queryProtein.
2026-04-29 00:41:01 +02:00
rdk
42dfe7fd6f Fix PUResNet pocket loader to handle shifted insertion codes
PUResNet pocket PDBs occasionally left-shift the residue insertion code
into column 26 instead of column 27, breaking BioJava's strict resSeq
parser with NumberFormatException and silently dropping affected
predictions (216 of 9955 entries on holo4k+pdbbind2020).

Add PUResNetPdbRepair which detects the malformed pattern and rewrites
it in memory before parsing. Wire PUResNetLoader through it. PdbUtils
and the rest of the load path are unchanged.
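
One way such a repair could work is sketched below. It is a guess at the
technique, not the PUResNetPdbRepair code: it only handles the case where
column 26 holds a letter, column 27 is blank and columns 23-25 hold a
non-negative residue number, and the method name is hypothetical.

```java
/** Hypothetical repair of a left-shifted insertion code; not the shipped class. */
final class InsertionCodeRepairSketch {

    /** Moves a letter found in column 26 to column 27 and right-justifies resSeq. */
    static String repairLine(String line) {
        boolean atomRecord = line.startsWith("ATOM") || line.startsWith("HETATM");
        if (!atomRecord || line.length() < 27) return line;
        char col26 = line.charAt(25); // last resSeq column in the PDB format
        char col27 = line.charAt(26); // insertion code column in the PDB format
        if (!Character.isLetter(col26) || col27 != ' ') return line;
        String digits = line.substring(22, 25).trim();
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) return line;
        String resSeq = String.format("%4s", digits); // right-justify into columns 23-26
        return line.substring(0, 22) + resSeq + col26 + line.substring(27);
    }

    public static void main(String[] args) {
        // Columns 1-22 built piecewise: record, serial, atom name, altLoc, resName, chain.
        String prefix = "ATOM  " + "    1" + " " + " N  " + " " + "ALA" + " " + "A";
        String bad = prefix + "100A" + " " + "   ";
        System.out.println("[" + repairLine(bad) + "]");
    }
}
```

Lines that already follow the format pass through unchanged, so the repair
can be applied to every line before parsing.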
2026-04-28 22:25:44 +02:00
rdk
43b1f7dcf1 Fix pocket centroid calculation in ConcavityLoader and PUResNetLoader
Use centroid instead of centerOfMass in ConcavityLoader, set centroid
explicitly in PUResNetLoader, fix POCKET_GRID_TO_SURFACE_DIST type to int.
2026-04-03 19:30:27 +02:00
rdk
17a4304d29 Add rg, n_unp_pockets, n_unp_pockets_multichain fields to AhojSiteInfo 2026-04-01 12:44:10 +02:00
rdk
858ba45fe7 Refactor AhojUbsSiteParser to use CSV library and add AhojSiteInfo data class
- Replace manual line.split(",") with Apache Commons CSV (column-name access)
- Support both reduced (9-col) and full (59-col) ahoj_ubs CSV formats
- Add AhojSiteInfo: typed data class for 14 pocket metadata fields
- Add secondaryData map to ResidueSite for extensible metadata
- Export AhojSiteInfo columns in observed_sites.csv when available
- Add comprehensive parser tests for both CSV formats
- Add test data files and format documentation
2026-04-01 10:22:43 +02:00
rdk
6cf293478a Add atom hybridization feature (one-hot sp2/sp3)
CSV-based lookup for standard amino acid atoms with tiered fallback
for non-standard residues (backbone name match, then element-based default).
2026-03-21 21:55:00 +01:00
rdk
a66bea74be Add eval_output_prediction_files param to output per-protein prediction CSVs in eval commands 2026-03-17 18:59:13 +01:00
rdk
faddcfb70f Lazy-init EnergyCalculator and LJEnergyCalculator in energy features 2026-03-16 07:55:16 +01:00
rdk
48cb681aaa Refactor DSO/DSWO: replace Tuple2 with OverlapCounts, cache counts instead of Atoms, simplify CdkUtils 2026-03-16 03:20:48 +01:00
rdk
5b4613c3a4 Extract FpocketAdHocHelper, add run_fpocket_ad_hoc param for eval-rescore and rescore commands 2026-03-16 03:20:41 +01:00
rdk
ba53b97e90 Add per-method CSVs and grouped summary to binding-site-centers, add DataTable filter/distinctValues/formatGroupedSummaryTable 2026-03-16 01:06:44 +01:00
rdk
8852739016 Add DCC_4 protein-centric success rate metrics 2026-03-15 21:35:53 +01:00
rdk
a814157e2b Minor cleanups: fix typos, normalize loop syntax and imports in Evaluation 2026-03-15 21:32:23 +01:00
rdk
f3616da217 Unify Protein.sites to contain all binding sites, add predictedPocket to BindingSite interface
Protein.sites now holds ground-truth binding sites for both ligand-defined
and explicit (residue-based) evaluation modes. Sites are populated from
ligands via populateSitesFromLigands() when no explicit sites are defined.

- Add predictedPocket and setSasPoints to BindingSite interface
- Add predictedPocket field to ResidueSite
- Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite)
- Update calcCoveragesProt to use BindingSite.predictedPocket
- Determine isLigandMode via instanceof instead of sites.isEmpty()
- Unify PymolRenderer sites/ligands branch into single BindingSite loop
- Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly
2026-03-15 21:25:49 +01:00
rdk
829cf9b8be Return typed result objects from calcConservationStats and calcOverlapStatsForPockets 2026-03-15 20:28:51 +01:00
rdk
8a516228e1 Fix @CompileStatic errors in Evaluation: destructuring assignment, int-to-Double casts 2026-03-15 19:59:15 +01:00
rdk
5ac9aab18a Refactor Evaluation: simplify avg/div methods, use Function instead of Closure, extract writeScoresToFileIfRequested 2026-03-15 19:27:15 +01:00
rdk
20236ef092 Refactor conservation/chains analysis, add @CompileStatic to Evaluation, rename criterium to criterion 2026-03-15 17:59:53 +01:00
rdk
d9de1fba7e Add contact_atoms_centroid site evaluation center method for ligand-defined sites 2026-03-15 17:09:04 +01:00
rdk
49a8430a7d Add binding-site-centers command, refactor center methods, consolidate error reporting
- Rename SiteCentroidMethod to SiteCenterMethod
- Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface
  for thread-safe, param-independent center calculation
- Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod()
- Add analyze binding-site-centers command comparing all center methods per site
- Add Dataset.Result.writeErrorsAndGetSummary() and use it across all
  AnalyzeRoutine commands for consistent error reporting to both console and CSV
2026-03-14 18:22:47 +01:00
rdk
0e0cb47907 Add ca_atoms_centroid site evaluation center method with tests 2026-03-14 15:57:41 +01:00
rdk
1ecb29f876 Add load_ligands_from_separate_files param for loading ligands from individual ligand_* files 2026-03-13 18:21:26 +01:00
rdk
0b5b61304d Add legacy conservation file name format fallback (e.g. 2ed4_A.) 2026-03-13 17:22:27 +01:00
rdk
e7fc457f6a Fix ligand detection for BioJava GroupType misclassifications
BioJava assigns GroupType based on its Chemical Component Dictionary,
not structural role. Ligands in non-polymer chains can get any GroupType:
- GDP, GTP, ATP -> GroupType.NUCLEOTIDE
- SHR and similar -> GroupType.AMINOACID
- Most others -> GroupType.HETATM

Previously only HETATM groups were detected as ligands, causing errors
like "Ligand definition 'GDP' matches no ligands" for nucleotide and
amino acid derivative ligands.

Fix: any non-water group in a NONPOLYMER chain is now a ligand
candidate, regardless of GroupType. Polymer chain groups (protein AA,
DNA/RNA) are only included if they have GroupType.HETATM.

Add test PDB files (1a2kC.pdb with GDP, 1e5qA.pdb with SHR) and
comprehensive tests for all three GroupType cases.
2026-03-10 14:34:28 +01:00
rdk
d78f80ee73 Extract writeCases() method, rename sites.csv to observed_sites.csv
Consolidate case CSV writing into Evaluation.writeCases(). Remove
duplicate DSO_0.1 criterion and stale TODO comments.
2026-03-10 03:24:44 +01:00
rdk
838b0a697f Fix integer division bug in DSO criterion and clean up
The Jaccard ratio was computed as int/int, always producing 0 or 1,
making fractional thresholds ineffective. Cast to double for correct
floating-point division. Also fix typo (cahe->cache), remove debug
comments, and update javadoc.
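
The int/int pitfall is easy to reproduce. The sketch below uses hypothetical
method names and plain counts rather than the actual DSO criterion code.

```java
/** Hypothetical reproduction of the integer-division pitfall; not the DSO source. */
final class JaccardSketch {

    /** Buggy shape: the division happens in int arithmetic, then widens to double. */
    static double ratioBuggy(int intersection, int union) {
        return intersection / union;
    }

    /** Fixed shape: casting one operand forces floating-point division. */
    static double ratioFixed(int intersection, int union) {
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        System.out.println(ratioBuggy(1, 4)); // 0.0
        System.out.println(ratioFixed(1, 4)); // 0.25
    }
}
```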
2026-03-10 02:27:11 +01:00
rdk
2de315e9e0 Rename API: PocketCriterium->PocketCriterion, getLigandAtoms->getAtoms, centroid->center
- Rename PocketCriterium to PocketCriterion (fix Latin spelling)
- Revert getLigandAtoms() back to getAtoms() in BindingSite interface
- Rename getCentroidForEval() to getCenterForEval()
- Rename explicitCentroid to explicitCenter in ResidueSite
- Rename SiteCentroidMethod values: explicit_centroid->explicit,
  sas_points_center_of_mass->sas_points_centroid
- Rename site_centroid_method param to site_eval_center_method
- Ligand.getCentroid() now delegates to getCenterForEval()
2026-03-10 02:02:47 +01:00
rdk
412c590dcb Fix CSV spacing consistency: remove padding and trailing spaces
Remove leading-space padding from fmt calls in getMiscStatsCSV and
FeatureImportances, fix header/data spacing mismatch in toPocketsCSV,
and remove trailing space in toLigandsCSV header.
2026-03-09 13:32:51 +01:00
rdk
fdebd71daf Add example Jupyter notebook for analyzing P2Rank output
Add notebook loading _predictions.csv and _residues.csv with example
data from predict_1fbl. Clean up CSV formatting: remove padding from
values, add fmtCsv() without leading spaces for CSV output.
2026-03-09 12:05:00 +01:00
rdk
61b8863c27 Simplify CSV output formatting and add null guard in CsvRow
Remove fixed-width column padding from PredictionSummary, fix spacing
in ResidueLabelings CSV output, and add null safety in CsvRow.add().
2026-03-09 11:17:59 +01:00