23 Commits

Author SHA1 Message Date
rdk
e3b551c902 Add second ChimeraX pocket-grid visualization screenshot 2026-05-20 12:15:27 +02:00
rdk
46194b8bdc Add ChimeraX pocket-grid visualization screenshots 2026-05-20 04:00:16 +02:00
rdk
0afdbe7d65 Doc polish: em-dash sweep + pocket-{grid,descriptors} audit cleanup
- export-pocket-{grid,descriptors}.md: audit-driven cleanup. Em-dashes
  replaced, type notation unified to f64/i32, vis_pocket_grid_volume_radius
  explanation consolidated into the parameters table, PyMOL/ChimeraX
  section split into Files produced / Layers and toggles / Renderer notes,
  preview warning rephrased.
- export-points.md, dev/evaluation-metric-fixes-2.6.md: em-dashes
  replaced (colons, commas, parentheses).
- conservation.md: Background paragraph tightened to match release-notes
  "previously only usable through PrankWeb" phrasing.
2026-05-20 02:57:25 +02:00
rdk
e3af9106c6 Mark pocket-grid/descriptors docs as preview + capture future ideas
documentation/export-pocket-{grid,descriptors}.md: add a GitHub-Flavored
Markdown alert at the top of each, flagging the feature as a preview in
the 2.6 release. Spells out exactly what may change before graduation
(parameter names, default values, default list contents, output column
ordering) and gives the forward-compatible usage pattern (parse by
column name, pass descriptor list explicitly). Invites feedback via
GitHub issues.

misc/todo/pocket_grid/IDEAS.md: new file capturing the four descriptor
ideas sketched during the audit cycle (local_atom_density,
electrostatic_proximity, conservation_proximity, residue_chemistry_summary)
with their per-protein resource requirements, plus a summary of the
ComputeCache framework prerequisite from the same discussion. Mirrors
the style of the existing NOTES.md / PLAN.md / SPEC.md in the same dir.
2026-05-20 00:12:12 +02:00
rdk
6fad858bc6 Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening
Bug fix:
- PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The
  v<0 check was false for NaN, so a NaN eigenvalue (possible if a future
  code path bypasses GridGenerator.isFiniteBox) would have propagated
  to the CSV output.
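
A minimal sketch of why the old check missed NaN, assuming the clamp maps bad values to 0 (the commit does not state the replacement value). The class and method names below are illustrative and are not the actual PrincipalMomentsDescriptor code.

```java
// Illustrative only: shows why "v < 0" alone lets NaN through.
final class ClampNaNDemo {

    // Naive clamp: every comparison with NaN is false, so NaN is returned unchanged.
    static double clampNaive(double v) {
        return v < 0 ? 0.0 : v;
    }

    // Assumed fix: treat NaN like a negative value and clamp it to 0.
    static double clampNonNegative(double v) {
        return (Double.isNaN(v) || v < 0) ? 0.0 : v;
    }

    public static void main(String[] args) {
        System.out.println(clampNaive(Double.NaN));       // NaN
        System.out.println(clampNonNegative(Double.NaN)); // 0.0
        System.out.println(clampNonNegative(-1.5));       // 0.0
        System.out.println(clampNonNegative(2.25));       // 2.25
    }
}
```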

Doc refresh:
- breaking-changes.md: 2.6 entry for the multi-column descriptor
  migration + the -vis_pocket_grid / pocket_grid_vis_* renames.
- export-pocket-descriptors.md: step 4 rewrites a self-contradicting
  rationale — adding to the default list IS a breaking change for
  index-based parsers; recommends parse-by-name + breaking-changes.md
  note for future additions.
- export-pocket-grid.md: added "Adding a new per-grid-point descriptor"
  recipe (parallel to the per-pocket one); unified √3/2 precision to
  0.866 across docs and Params.groovy.
- README.md: added an "Opt-in tabular exports" subsection mentioning
  -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid.
- testsets.sh "Full descriptor menu" now lists all seven shipped
  descriptors (was six).

Exception taxonomy:
- PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw
  PrankException (was IllegalArgumentException) for user-facing config
  errors, matching the rest of the codebase.

Registry hardening:
- Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry
  now assert columnNames.size() == columnTypes.size() in register().
  A future descriptor with mismatched lists fails fast at class-load.

Quality fixes:
- PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for
  the pocket column. Removed dead 2-arg PocketGridRows constructor
  (only 3 test sites used it; now inlined).
- PocketGridPointContext gets a compact-constructor validator that
  rejects negative pointIndex/pocketRank, limiting blast radius of an
  int-arg swap.
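
A sketch of the compact-constructor pattern, assuming a two-field record (Java 16+). PointContext is a hypothetical stand-in; the real PocketGridPointContext may carry more fields and throw a different exception type.

```java
// Illustrative only: a record whose compact constructor rejects negative ints.
final class ContextValidationDemo {

    // Hypothetical stand-in for PocketGridPointContext.
    record PointContext(int pointIndex, int pocketRank) {
        PointContext {
            if (pointIndex < 0) {
                throw new IllegalArgumentException("pointIndex must be >= 0: " + pointIndex);
            }
            if (pocketRank < 0) {
                throw new IllegalArgumentException("pocketRank must be >= 0: " + pocketRank);
            }
        }
    }

    // Returns true when construction is rejected by the validator.
    static boolean rejects(int pointIndex, int pocketRank) {
        try {
            new PointContext(pointIndex, pocketRank);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(5, 1));   // false
        System.out.println(rejects(-1, 1));  // true
        System.out.println(rejects(5, -1));  // true
    }
}
```

A validator like this only catches a swap when one of the swapped values is negative (for example a -1 sentinel); two non-negative ints in the wrong order still pass.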

Test hardening:
- VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest
  now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so
  the Params singleton is clean for subsequent test classes.
- New tests: HIS ND1 double-flag (single atom setting donor+acceptor),
  PrincipalMoments at cardinality=2, PrincipalMoments two coincident
  points, GridGenerator NaN-box throw, PocketDescriptorRegistry
  register/unregister round-trip, MorphologicalCloser maxIters=1.
- Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered
  the maxIters=0 case despite the general name); added maxIters=1
  companion that verifies one iteration of fill actually runs.
- Extracted RendererTestFixtures.tinyGrid (was byte-identical in both
  renderer test files); unified the volsite atomAt signatures so the
  parameter order can't get swapped between the two volsite tests.
2026-05-19 15:36:12 +02:00
rdk
cb6f7f75eb Doc / comment refresh after the multi-column descriptor migration
- Params.groovy: pocket_descriptors javadoc now lists all 7 shipped
  descriptors (was: 6); softens the "essentially free" rationale to
  acknowledge principal_moments' small eigendecomposition cost.
- PocketDescriptorsTest.groovy: class javadoc "six shipped descriptors"
  → "seven", names principal_moments alongside the rest.
- export-pocket-descriptors.md: "6 base shipped descriptors use this
  adapter" → "6 of 7 use the adapter; principal_moments (multi-column)
  implements PocketDescriptor directly". Removes a misleading count.
- export-pocket-{grid,descriptors}.md: default-list rationale no longer
  claims adding descriptors is "essentially free" — clarifies that
  grid-derived scalars are cheap once the grid is built but
  principal_moments adds a small per-pocket compute on top, still
  negligible vs the grid build.

Caught by deep audit of 60220d7a..73e7c9df focused on doc/comment drift
after the recent multi-column interface migration.
2026-05-19 14:41:03 +02:00
rdk
73e7c9df9a Per-pocket descriptors: multi-column interface + PrincipalMomentsDescriptor
Unifies the per-pocket descriptor framework with the per-grid-point
framework: same shape (name + columnNames + columnTypes + double[]
compute), same multi-column "{name}.{col}" header convention, same
public register / unregister / dup-column-check registry. Shipped as a
breaking change behind the same -pocket_descriptors knob.

Interface change:
  String name();
  List<String> columnNames();
  List<ColumnType> columnTypes();
  double[] compute(PocketGridContext);
  boolean needsGrid();  // unchanged

Scalar descriptors stay one-liners via the new
AbstractScalarPocketDescriptor adapter (name + scalarType +
computeScalar). The 6 existing descriptors migrated; behavior and
output byte-identical to before.

New descriptor: PrincipalMomentsDescriptor (3 × DOUBLE) — the three
eigenvalues of the pocket grid points' gyration tensor, sorted
descending. Implementation uses Apache Commons Math 3
EigenDecomposition. Shape signature complement to sphericity /
radius_of_gyration; sum equals radius_of_gyration² (verified in test).
Added to the default -pocket_descriptors list.
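
The sum identity can be sketched as follows, assuming an unweighted gyration tensor normalized by the number of grid points N (the commit does not spell out the normalization):

```latex
\[
S = \frac{1}{N}\sum_{i=1}^{N}
    (\mathbf{r}_i - \bar{\mathbf{r}})(\mathbf{r}_i - \bar{\mathbf{r}})^{\mathsf T},
\qquad
\bar{\mathbf{r}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i
\]
\[
\lambda_1 + \lambda_2 + \lambda_3
  = \operatorname{tr} S
  = \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{r}_i - \bar{\mathbf{r}} \rVert^2
  = R_g^2
\]
```

The first equality holds because the trace of a symmetric matrix equals the sum of its eigenvalues; the second is the diagonal of S written out.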

Default list reordered to put num_* (cheap, integer-valued) first,
then geometric scalars, then principal_moments:
  num_residues, num_surface_atoms, num_grid_points,
  volume, sphericity, radius_of_gyration,
  principal_moments

Tests:
  - 5 new PrincipalMomentsDescriptor tests (cube isotropy, rod-shape
    eigenvalues, sort order, degenerate empty/single, sum=Rg²)
  - PocketDescriptorsRowsTest +2 (multi-column prefix rule, mixed
    scalar + multi ordering)
  - existing 13 callsites updated for the double[] return signature
  - columnType() registry test → columnTypes()

User-visible change: the default -pocket_descriptors output now has
three new columns (principal_moments.lambda1/2/3) and the existing
columns appear in a different order. Scripts parsing by column name
are unaffected; scripts parsing by column index need updating.
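
A minimal sketch of name-based parsing, assuming a plain comma-separated file without quoted fields. The header and row values below are made up for illustration; only the column names come from this commit.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: look up CSV values by header name instead of position.
final class ParseByNameDemo {

    // Maps each header name to its column position.
    static Map<String, Integer> headerIndex(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] cols = headerLine.split(",");
        for (int i = 0; i < cols.length; i++) {
            index.put(cols[i].trim(), i);
        }
        return index;
    }

    // Returns the value of the named column in one data row.
    static String value(String headerLine, String rowLine, String column) {
        Integer i = headerIndex(headerLine).get(column);
        if (i == null) {
            throw new IllegalArgumentException("missing column: " + column);
        }
        return rowLine.split(",", -1)[i].trim();
    }

    public static void main(String[] args) {
        // Same lookup works before and after a column reorder.
        System.out.println(value("volume,num_residues", "120.5,14", "volume"));
        System.out.println(value("num_residues,volume,principal_moments.lambda1",
                                 "14,120.5,9.75", "volume"));
    }
}
```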
2026-05-19 14:34:33 +02:00
rdk
0e044f6bb3 Audit follow-ups: fill warning, NaN guard, test hardening + docs
Bug fixes:
- MorphologicalCloser: gate the "didn't converge" warning on maxIters>0.
  maxIters=0 is a valid "disable fill" config and would otherwise log
  spuriously on every protein.
- GridGenerator: hoist the isFiniteBox NaN guard into the (Box, edge)
  ctor so both sampleGridPointsBetween and sampleGridPointsAroundAtoms
  are covered (the second sampler was previously unguarded — used by
  the training/feature path).
- PocketGridPdbSidecar.writePerPocket: serial-wrap warning added for
  parity with the combined write() path.

Test hardening:
- PocketGridPointDescriptorRegistry: add unregister() so tests can
  clean up fixture registrations; PocketGridRowsTest now @AfterAll
  unregisters its scalar fixture so it doesn't leak into the JVM-wide
  registry.
- VolsiteSmoothGridPointDescriptorTest: pin sigma via @BeforeEach so
  other tests mutating the Params singleton can't shift expectations;
  new weightAtExactCutoffEqualsExpMinusEight test pins the 4σ-inclusive
  cutoff semantic (cutoutSphere is inclusive; exp(-8) ≈ 3.354e-4).
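
A sketch of the cutoff arithmetic, assuming the kernel has the form exp(-d²/(2σ²)); that form is an assumption, chosen because it gives exp(-8) at d = 4σ. The class below is illustrative and does not reproduce the descriptor's code.

```java
// Illustrative only: Gaussian weight with an inclusive 4-sigma cutoff.
final class GaussianCutoffDemo {

    // Weight of an atom at distance d; zero strictly beyond 4 * sigma.
    static double weight(double d, double sigma) {
        double cutoff = 4.0 * sigma;
        if (d > cutoff) {
            return 0.0; // outside the truncated kernel
        }
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }

    public static void main(String[] args) {
        // With sigma = 2.0 the cutoff is 8.0, and (8*8)/(2*2*2) = 8.
        System.out.println(weight(8.0, 2.0));    // approx 3.354e-4, i.e. exp(-8)
        System.out.println(weight(8.0001, 2.0)); // 0.0
    }
}
```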

Docs / clarifications:
- Params.pocket_grid_point_descriptors javadoc: the silent-ignore when
  -export_pocket_grid=false is intentional (symmetric with
  -pocket_descriptors / -export_pocket_descriptors).
- PocketDescriptor javadoc: intentionally scalar-only; recommend
  unifying with PocketGridPointDescriptor if multi-col is ever needed
  rather than ad-hoc extending this one.
- PocketGridPointDescriptor javadoc: needsGrid() is intentionally
  absent — every grid-point descriptor needs the grid by definition.
- documentation/export-pocket-grid.md: explain the default-empty
  rationale (cost: per-row × per-atom, not backward-compat).
- VdwRadiusTable.resolveSymbol: comment that the name-prefix isotope
  branch is a safety net, not a semantic mapping (e.g. "DA" in DNA
  isn't deuterium).
2026-05-19 13:29:10 +02:00
rdk
1931ef1f93 Pocket-grid-point descriptors: framework + two VolSite descriptors
Adds an opt-in extension to the pocket-grid export — extra columns per
(point, pocket) row driven by a registry of per-grid-point descriptors.
Mirrors the existing per-pocket descriptor framework (interface, context
record, static registry, name-driven CLI selection).

CLI:
  -pocket_grid_point_descriptors   list, default []
  -pocket_grid_volsite_radius      4.0 Å    (volsite indicator cutoff)
  -pocket_grid_volsite_sigma       2.0 Å    (volsite_smooth Gaussian σ)

Shipped descriptors (both 6-column, prefixed `{name}.`):
  volsite         INT  0/1 per pharmacophore type within radius
  volsite_smooth  DOUBLE Gaussian-weighted sum, kernel truncated at 4σ

Atom-level pharmacophore classification reuses VolSitePharmacophore — a
1 in volsite.vsCation here matches a 1 in vsCation from VolsiteFeature.

The 6 VolSite column names now live as VolSitePharmacophore.COLUMN_NAMES
(single source of truth, also used by VolsiteFeature). VolSitePharmacophore
gains a getAtomProperties(Atom) overload that does the PdbUtils hop.

Validation: -pocket_grid_point_descriptors goes through a new shared
validateDescriptorList(names, known, paramName) helper in Main, which
also replaces the open-coded equivalent for -pocket_descriptors. The
two new numeric params are bounds-checked.
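
A rough sketch of what a shared list validator with this parameter order could look like. The body, the error message, and the exception type are assumptions; the actual helper in Main is not shown in this log.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: fail fast on unknown descriptor names.
final class DescriptorListValidationDemo {

    // Throws if any requested name is not in the known set.
    static void validateDescriptorList(List<String> names, Set<String> known, String paramName) {
        for (String name : names) {
            if (!known.contains(name)) {
                throw new IllegalArgumentException(
                    "Unknown value '" + name + "' for -" + paramName
                    + "; known values: " + new TreeSet<>(known));
            }
        }
    }

    // Convenience wrapper for demonstration: true when the list is accepted.
    static boolean accepts(List<String> names, Set<String> known) {
        try {
            validateDescriptorList(names, known, "pocket_grid_point_descriptors");
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("volsite", "volsite_smooth");
        System.out.println(accepts(List.of("volsite"), known));  // true
        System.out.println(accepts(List.of("volsite_"), known)); // false
        System.out.println(accepts(List.of(), known));           // true
    }
}
```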
2026-05-19 09:59:37 +02:00
rdk
f06628dd63 Audit follow-ups: rename leftovers, doc fixes, numeric validation
- testsets.sh: fix 4 sites still invoking -export_pocket_grid_pml after
  the rename; they were hard-failing at startup.
- PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3
  refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a).
- documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius
  default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers
  doc now shows the #99 (spheres) + #100 (surfaces) split.
- Main.validatePocketGridParams: numeric range checks for spacing,
  max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in
  the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius
  (-1 sentinel or strictly positive), and gaussian_iso. Catches values
  that would otherwise produce a NaN lattice, empty grid, or garbage
  passed to PyMOL/ChimeraX.
2026-05-19 07:24:16 +02:00
rdk
60220d7a57 Add pocket-grid + descriptors export with PyMOL / ChimeraX viz
Per-protein 3D grid of points around predicted pockets with per-pocket
assignment, plus per-pocket geometric descriptors (volume, sphericity,
radius_of_gyration, num_residues, num_surface_atoms, num_grid_points).

User-facing knobs (all under -export_pocket_*, -pocket_grid_*, -vis_pocket_*):

  -export_pocket_grid          CSV/Arrow/Parquet grid file
  -export_pocket_descriptors   CSV/Arrow/Parquet descriptors file
  -vis_pocket_grid             PyMOL/ChimeraX overlay scripts
  -pocket_grid_format          csv | csv.gz | csv.zst | arrow{,.gz,.zst} | parquet
  -pocket_grid_spacing         lattice edge (Å)
  -pocket_grid_max_dist        outer bound vs nearest pocket SAS point
  -pocket_grid_atom_buffer     inner bound vs vdw(nearest atom)
  -pocket_grid_assign_cutoff   per-pocket membership cutoff
  -pocket_grid_assigner        kdtree | voxel_hash
  -pocket_grid_fill            morph_closing | none
  -pocket_descriptors          subset of registered descriptors
  -vis_pocket_grid_volume_radius / _gaussian_iso  viz tuning

Renderers (PocketGridPymolRenderer, PocketGridChimeraXRenderer) overlay
on top of the standard pocket viz with per-pocket togglable layers:
discrete spheres, vdW-radius surface union, gaussian-iso (PyMOL only),
convex-hull wireframe (PyMOL only, requires scipy). Both honor
-vis_renderers membership.

Startup validation for all new params (Main.validatePocketGridParams,
Main.validateVisParams) — typos in renderer/format/fill/assigner names
fail fast instead of silently emitting nothing.

Performance: LongIntHashMap-backed lattice index, BitSet pocket
assignments, pluggable range-query (kdtree vs voxel-hash), morph-closing
frontier expansion. Most hot paths converted from Groovy to Java.

Docs: documentation/export-pocket-grid.md, export-pocket-descriptors.md.

Squashed from 70 commits (9b7d7a64..fec803ff). Pre-squash granular
history preserved on branch develop-backup-2026-05-19.
2026-05-19 03:03:33 +02:00
rdk
913e0b551e Expand rescoring docs for SwinSite, Seq2Pocket, and rescore_conservation
- Add SwinSite and Seq2Pocket rows to the supported methods table, with
  GitHub + paper links and a note that they point at per-protein
  directories rather than single files
- Add a "Rescoring directory-based predictions" example covering the
  per-directory dataset pattern
- Add a "Conservation-aware rescoring" section documenting
  -c rescore_conservation and the .hom file requirement
- Quick Start: add a swinsite example line
2026-05-17 02:44:00 +02:00
rdk
79cda78473 Add cofactor-as-protein-surface feature (Issue #79 part 2)
The -cofactors flag and dataset cofactors column accept LigandDefinition
specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]").
Matched HET groups merge into the protein surface (proteinAtoms) and are
excluded from ligand listings; per-item resolution lets a dataset column
override the global Params.cofactors.

New: analyze cofactors subcommand (HETATM survey + specifier dry-run),
PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor
and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19),
drop-in safety benchmark with byte-equality on a never-present specifier.

Documentation in documentation/cofactors.md (user-facing) and
documentation/dev/cofactors.md (engineering record with R1-R24 design choices
and post-merge audit fixes). Tests in CofactorHandlerTest,
CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest,
DataTableCsvTest plus a Log4jCapture test helper.
2026-05-14 07:58:14 +02:00
rdk
59bc84c265 Mention pocket column alongside score in export-points docs
The score and pocket columns share the same predict/rescore-only
origin, so describe them together in the prose, the export-points
"not contained" caveat, the predict/rescore output description, and
the "Which command to use?" table.
2026-05-07 03:21:38 +02:00
rdk
f5ad22f604 Document 2.6 evaluation-metric fixes and note ligand-detection breaking change
Add documentation/dev/evaluation-metric-fixes-2.6.md covering DSO/DSWO integer-
division fixes, the ResidueSite DCC centroid fix, and the BioJava GroupType
ligand-detection fix. Mention the ligand-detection change in breaking-changes.md
since it shifts DCA/DCC on datasets containing GDP/GTP/ATP/SHR-like ligands.
2026-05-06 14:46:26 +02:00
rdk
15349bb48f Add pocket rank column to points export, fix overlap labeling
The points export (predict/rescore -export_points 1) now includes an
integer 'pocket' column matching newRank in *_predictions.csv, so users
can directly aggregate per-pocket descriptors without a spatial join.
Standalone 'export-points' (no prediction) omits the column.

Pocket-extension shells can overlap, so a single SAS point can sit in
multiple pocket.labeledPoints lists. Previously the assignment loop
last-write-wins gave the worst rank to shared points, which was
counter-intuitive for both visualization (PredictionVisualizer PDB
output) and descriptor aggregation. PocketRescorer.setNewRanks now
iterates pockets best-first with a guard, so the lowest newRank wins;
the redundant lp.pocket write in PocketPredictor is removed.
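
A minimal sketch of the best-first assignment with a guard, assuming points are identified by integer ids and pockets arrive sorted by rank. The data structures are hypothetical; PocketRescorer.setNewRanks operates on the project's own types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the first (best) rank to claim a shared point keeps it.
final class BestRankWinsDemo {

    // pocketsBestFirst.get(0) holds the point ids of the rank-1 pocket, and so on.
    static Map<Integer, Integer> assign(List<List<Integer>> pocketsBestFirst) {
        Map<Integer, Integer> rankOfPoint = new HashMap<>();
        for (int rank = 1; rank <= pocketsBestFirst.size(); rank++) {
            for (int pointId : pocketsBestFirst.get(rank - 1)) {
                // Guard: later (worse) ranks never overwrite an earlier one.
                rankOfPoint.putIfAbsent(pointId, rank);
            }
        }
        return rankOfPoint;
    }

    public static void main(String[] args) {
        // Point 2 is shared by the rank-1 and rank-2 pockets.
        Map<Integer, Integer> ranks = assign(List.of(List.of(1, 2), List.of(2, 3)));
        System.out.println(ranks.get(2)); // 1
        System.out.println(ranks.get(3)); // 2
    }
}
```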

TableData gains a per-column ColumnType (DOUBLE default, INT) so
TableExporter emits true integers in CSV (no decimals), Arrow (Int32),
and Parquet (INT32) for the pocket column.

Bump version to 2.6.0-dev.8.
2026-05-06 14:08:29 +02:00
rdk
2de315e9e0 Rename API: PocketCriterium->PocketCriterion, getLigandAtoms->getAtoms, centroid->center
- Rename PocketCriterium to PocketCriterion (fix Latin spelling)
- Revert getLigandAtoms() back to getAtoms() in BindingSite interface
- Rename getCentroidForEval() to getCenterForEval()
- Rename explicitCentroid to explicitCenter in ResidueSite
- Rename SiteCentroidMethod values: explicit_centroid->explicit,
  sas_points_center_of_mass->sas_points_centroid
- Rename site_centroid_method param to site_eval_center_method
- Ligand.getCentroid() now delegates to getCenterForEval()
2026-03-10 02:02:47 +01:00
rdk
fdebd71daf Add example Jupyter notebook for analyzing P2Rank output
Add notebook loading _predictions.csv and _residues.csv with example
data from predict_1fbl. Clean up CSV formatting: remove padding from
values, add fmtCsv() without leading spaces for CSV output.
2026-03-09 12:05:00 +01:00
rdk
e923d199e6 Add external conservation provider with cache, health check, and documentation 2026-02-26 00:07:55 +01:00
rdk
93fd8e953a add experimental rescoring model section to rescoring docs 2026-02-11 18:44:04 +01:00
rdk
9711cc7192 fix aa-mapping docs: broken csv link, replace special characters, cleanup 2026-02-11 18:14:03 +01:00
rdk
4a42f664e2 update aa-mapping documentation: add links to pdbfixer source 2026-02-11 18:05:11 +01:00
rdk
126a0653f0 move tutorials to documentation/, update rescoring tutorial and README
Move misc/tutorials/ to documentation/ and add index readme.
Update rescoring.md: add quick-start examples, paper links for all
methods, add Pocketeer to supported methods list.
Fix stale links in README.md (tutorials path, local-env.sh typo).
2026-02-11 10:52:20 +01:00