215 Commits

Author SHA1 Message Date
rdk
af1d6eeb18 Drop frozen pocket-grid PLAN/SPEC; refine audit punch-list
PLAN.md and SPEC.md were pre-implementation design docs for the pocket-grid
feature. The feature has shipped, so they're frozen artifacts in the active
todo/ namespace. Delete them and strip the three "see SPEC.md" comments from
Main.groovy and the predict/rescore routines.

Also reassess the PyMOL rank-gap entry in the audit: P2Rank ranks pockets
contiguously throughout the predict path and all in-tree loaders (except
SiteHoundLoader), so the previously-listed "renderer ignores rank gaps" is
cosmetic-only (empty objects in the Models panel for small pockets whose
filled BitSet ended up empty). Downgrade to a parity nit under
Inconsistencies; promote the PUResNet surfaceAtoms re-linking to the Top-5.
2026-05-20 19:42:47 +02:00
rdk
41cdbe8773 Add post-2.5.1 audit punch-list to misc/dev
Captures findings from a 10-agent audit of all changes between tag 2.5.1
and develop. Grouped by severity: high-priority bugs, inconsistencies,
doc/config drift, stale comments, test-isolation gaps, perf nits. No
items are fixed here -- this is a triage backlog to work through as the
surrounding code is touched.
2026-05-20 12:24:25 +02:00
rdk
550ab9e1a0 testsets.sh: add opt-in pocket_grid_long sweep
Runs the full per-pocket + per-grid-point descriptor menu with
visualizations on the 4 main datasets. Kept out of all() — opt-in.
2026-05-20 11:35:58 +02:00
rdk
cbf4c3ffac testsets.sh: cover per-grid-point descriptors + aa_mapping smoke tests
- pocket_grid(): add per-grid-point descriptor invocations (volsite,
  volsite_smooth, both together, sigma/radius knobs, full combo with
  per-pocket descriptors + viz), plus a fail-fast for an unknown
  pocket_grid_point_descriptors name. Per-pocket descriptors were
  already covered by the full-menu line.
- new aa_mapping() function with -aa_mapping pdbfixer over predict /
  rescore / eval-predict paths; wired into the tests() aggregate.
2026-05-20 03:06:50 +02:00
rdk
e3af9106c6 Mark pocket-grid/descriptors docs as preview + capture future ideas
documentation/export-pocket-{grid,descriptors}.md: add a GitHub-Flavored
Markdown alert at the top of each, flagging the feature as a preview in
the 2.6 release. Spells out exactly what may change before graduation
(parameter names, default values, default list contents, output column
ordering) and gives the forward-compatible usage pattern (parse by
column name, pass descriptor list explicitly). Invites feedback via
GitHub issues.
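
As an illustration of the parse-by-column-name pattern mentioned above, here is a minimal sketch of a consumer that looks cells up by header name rather than by position. The class name, the `rank` column, and the use of descriptor names as column names are assumptions made for the example, and it assumes a plain comma-separated file with no quoted fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side helper; not part of P2Rank.
final class ColumnByNameSketch {
    private ColumnByNameSketch() {}

    /** Maps each header name to its position in the header row. */
    static Map<String, Integer> indexHeader(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i);
        }
        return index;
    }

    /** Returns the cell of the named column, regardless of column order. */
    static String cell(Map<String, Integer> index, String row, String column) {
        Integer pos = index.get(column);
        if (pos == null) {
            throw new IllegalArgumentException("missing column: " + column);
        }
        String[] cells = row.split(",", -1);
        if (pos >= cells.length) {
            throw new IllegalArgumentException("row is shorter than header: " + row);
        }
        return cells[pos].trim();
    }
}
```

A reader written this way keeps working if a later release reorders columns or appends new ones, which is the situation the preview notice warns about.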

misc/todo/pocket_grid/IDEAS.md: new file capturing the four descriptor
ideas sketched during the audit cycle (local_atom_density,
electrostatic_proximity, conservation_proximity, residue_chemistry_summary)
with their per-protein resource requirements, plus a summary of the
ComputeCache framework prerequisite from the same discussion. Mirrors
the style of the existing NOTES.md / PLAN.md / SPEC.md in the same dir.
2026-05-20 00:12:12 +02:00
rdk
6fad858bc6 Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening
Bug fix:
- PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The
  v<0 check was false for NaN, so a NaN eigenvalue (possible if a future
  code path bypasses GridGenerator.isFiniteBox) would have propagated
  to the CSV output.
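
The NaN behaviour described above can be reproduced in isolation. The sketch below is not the actual `PrincipalMomentsDescriptor` code; the class name is hypothetical, and clamping NaN to `0.0` is an assumption about the chosen replacement value.

```java
// Hypothetical stand-alone illustration of the NaN pitfall.
final class ClampSketch {
    private ClampSketch() {}

    /** Naive clamp: NaN slips through because (NaN < 0) evaluates to false. */
    static double clampNaive(double v) {
        return v < 0 ? 0.0 : v;
    }

    /** NaN-aware clamp: negatives and NaN both map to 0.0 (assumed target value). */
    static double clampNonNegative(double v) {
        return (Double.isNaN(v) || v < 0) ? 0.0 : v;
    }
}
```

Every ordered comparison involving NaN is false in Java, so a plain `v < 0` guard cannot catch it; an explicit `Double.isNaN` check is one way to close the gap.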

Doc refresh:
- breaking-changes.md: 2.6 entry for the multi-column descriptor
  migration + the -vis_pocket_grid / pocket_grid_vis_* renames.
- export-pocket-descriptors.md: step 4 rewrites a self-contradicting
  rationale — adding to the default list IS a breaking change for
  index-based parsers; recommends parse-by-name + breaking-changes.md
  note for future additions.
- export-pocket-grid.md: added "Adding a new per-grid-point descriptor"
  recipe (parallel to the per-pocket one); unified √3/2 precision to
  0.866 across docs and Params.groovy.
- README.md: added an "Opt-in tabular exports" subsection mentioning
  -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid.
- testsets.sh "Full descriptor menu" now lists all seven shipped
  descriptors (was six).

Exception taxonomy:
- PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw
  PrankException (was IllegalArgumentException) for user-facing config
  errors, matching the rest of the codebase.

Registry hardening:
- Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry
  now assert columnNames.size() == columnTypes.size() in register().
  A future descriptor with mismatched lists fails fast at class-load.
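
A minimal sketch of the fail-fast check described above, assuming a simplified registry shape. The `Descriptor` record, the class name, and the use of `IllegalStateException` are illustrative choices, not the real `PocketDescriptorRegistry` API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry; only the size check mirrors the commit text.
final class RegistrySketch {
    /** Simplified descriptor: a name plus parallel lists of column names and types. */
    record Descriptor(String name, List<String> columnNames, List<Class<?>> columnTypes) {}

    private final Map<String, Descriptor> byName = new LinkedHashMap<>();

    /** Rejects descriptors whose column name and type lists differ in length. */
    void register(Descriptor d) {
        if (d.columnNames().size() != d.columnTypes().size()) {
            throw new IllegalStateException(
                "descriptor '" + d.name() + "': " + d.columnNames().size()
                + " column names but " + d.columnTypes().size() + " column types");
        }
        byName.put(d.name(), d);
    }

    boolean contains(String name) {
        return byName.containsKey(name);
    }
}
```

When registration runs from a static initializer, a mismatch of this kind surfaces as soon as the class is loaded rather than when the first output row is written.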

Quality fixes:
- PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for
  the pocket column. Removed dead 2-arg PocketGridRows constructor
  (only 3 test sites used it; now inlined).
- PocketGridPointContext gets a compact-constructor validator that
  rejects negative pointIndex/pocketRank, limiting blast radius of an
  int-arg swap.
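
For readers unfamiliar with compact constructors, the validator described above might look roughly like the following. The record name and its two-field shape are assumptions; the real `PocketGridPointContext` presumably carries more state.

```java
// Hypothetical two-field stand-in for the real context record.
record GridPointContextSketch(int pointIndex, int pocketRank) {
    // Compact constructor: runs before the fields are assigned.
    GridPointContextSketch {
        if (pointIndex < 0) {
            throw new IllegalArgumentException("pointIndex must be >= 0, got " + pointIndex);
        }
        if (pocketRank < 0) {
            throw new IllegalArgumentException("pocketRank must be >= 0, got " + pocketRank);
        }
    }
}
```

A check like this cannot detect every swap of two `int` arguments, but it does catch the cases where one of them is a negative sentinel, which is the narrower guarantee the commit message claims.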

Test hardening:
- VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest
  now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so
  the Params singleton is clean for subsequent test classes.
- New tests: HIS ND1 double-flag (single atom setting donor+acceptor),
  PrincipalMoments at cardinality=2, PrincipalMoments two coincident
  points, GridGenerator NaN-box throw, PocketDescriptorRegistry
  register/unregister round-trip, MorphologicalCloser maxIters=1.
- Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered
  the maxIters=0 case despite the general name); added maxIters=1
  companion that verifies one iteration of fill actually runs.
- Extracted RendererTestFixtures.tinyGrid (was byte-identical in both
  renderer test files); unified the volsite atomAt signatures so the
  parameter order can't get swapped between the two volsite tests.
2026-05-19 15:36:12 +02:00
rdk
f06628dd63 Audit follow-ups: rename leftovers, doc fixes, numeric validation
- testsets.sh: 4 sites still invoking -export_pocket_grid_pml after the
  rename; they were hard-failing at startup.
- PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3
  refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a).
- documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius
  default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers
  doc now shows the #99 (spheres) + #100 (surfaces) split.
- Main.validatePocketGridParams: numeric range checks for spacing,
  max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in
  the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius
  (-1 sentinel or strictly positive), and gaussian_iso. Catches values
  that would otherwise produce a NaN lattice, empty grid, or garbage
  passed to PyMOL/ChimeraX.
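
The range checks listed above could be sketched as simple predicates. This is not the code in `Main.validatePocketGridParams`: the class and method names are hypothetical, the exact bounds for spacing and `fill_min_neighbors` (taken here as 0 to 26 inclusive) are assumptions, and the real validation reports errors at startup rather than returning booleans.

```java
// Hypothetical predicates; bounds other than the -1 sentinel rule are assumed.
final class GridParamChecksSketch {
    private GridParamChecksSketch() {}

    /** Sentinel meaning "auto-scale the radius", as described in the commit. */
    static final double AUTO_RADIUS = -1.0;

    /** Lattice spacing must be a finite, strictly positive number (assumed bound). */
    static boolean isValidSpacing(double spacing) {
        return Double.isFinite(spacing) && spacing > 0;
    }

    /** Volume radius is either the -1 sentinel or finite and strictly positive. */
    static boolean isValidVolumeRadius(double radius) {
        return radius == AUTO_RADIUS || (Double.isFinite(radius) && radius > 0);
    }

    /** A lattice point has 3*3*3 - 1 = 26 neighbours, so the count cannot exceed 26. */
    static boolean isValidFillMinNeighbors(int n) {
        return n >= 0 && n <= 26;
    }
}
```

Using `Double.isFinite` rejects NaN and infinities in the same test, covering the "NaN lattice" case the commit mentions.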
2026-05-19 07:24:16 +02:00
rdk
60220d7a57 Add pocket-grid + descriptors export with PyMOL / ChimeraX viz
Per-protein 3D grid of points around predicted pockets with per-pocket
assignment, plus per-pocket geometric descriptors (volume, sphericity,
radius_of_gyration, num_residues, num_surface_atoms, num_grid_points).

User-facing knobs (all under -export_pocket_*, -pocket_grid_*, -vis_pocket_*):

  -export_pocket_grid          CSV/Arrow/Parquet grid file
  -export_pocket_descriptors   CSV/Arrow/Parquet descriptors file
  -vis_pocket_grid             PyMOL/ChimeraX overlay scripts
  -pocket_grid_format          csv | csv.gz | csv.zst | arrow{,.gz,.zst} | parquet
  -pocket_grid_spacing         lattice edge (Å)
  -pocket_grid_max_dist        outer bound vs nearest pocket SAS point
  -pocket_grid_atom_buffer     inner bound vs vdw(nearest atom)
  -pocket_grid_assign_cutoff   per-pocket membership cutoff
  -pocket_grid_assigner        kdtree | voxel_hash
  -pocket_grid_fill            morph_closing | none
  -pocket_descriptors          subset of registered descriptors
  -vis_pocket_grid_volume_radius / _gaussian_iso  viz tuning

Renderers (PocketGridPymolRenderer, PocketGridChimeraXRenderer) overlay
on top of the standard pocket viz with per-pocket togglable layers:
discrete spheres, vdW-radius surface union, gaussian-iso (PyMOL only),
convex-hull wireframe (PyMOL only, requires scipy). Both honor
-vis_renderers membership.

Startup validation for all new params (Main.validatePocketGridParams,
Main.validateVisParams) — typos in renderer/format/fill/assigner names
fail fast instead of silently emitting nothing.

Performance: LongIntHashMap-backed lattice index, BitSet pocket
assignments, pluggable range-query (kdtree vs voxel-hash), morph-closing
frontier expansion. Most hot paths converted from Groovy to Java.

Docs: documentation/export-pocket-grid.md, export-pocket-descriptors.md.

Squashed from 70 commits (9b7d7a64..fec803ff). Pre-squash granular
history preserved on branch develop-backup-2026-05-19.
2026-05-19 03:03:33 +02:00
rdk
c78519c98e Cofactor smoke harness, CDK VdW workaround, analyze-cofactors fixes
Bumps faster-molecular-surface 1.0 -> 1.1, vendored in
lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for
elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir,
plus radioactive/synthetic). Without the fix, cobalamin-bearing
structures crashed surface computation under -cofactors.

PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used
when -use_optimized_surface 0) with the same fallback, via a Krypton
proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests
mirror the FMS-side regressions.

AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with
Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups
that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely
reported as "name not in structure" in cofactor_matches.csv or omitted
from het_groups.csv. Mirrors the M1 fix applied earlier to
CofactorHandler.extractCofactorAtoms.

testsets.sh: new cofactors_full() function exercising the cofactor
demo + full datasets in p2rank-datasets2/other/cofactors/ (predict,
analyze cofactors, -aa_mapping composition, visualizations,
export-points). Uses -fail_fast 1 so per-structure errors surface as
test failures rather than silent skips.
2026-05-15 00:35:08 +02:00
rdk
79cda78473 Add cofactor-as-protein-surface feature (Issue #79 part 2)
The -cofactors flag and dataset cofactors column accept LigandDefinition
specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]").
Matched HET groups merge into the protein surface (proteinAtoms) and are
excluded from ligand listings; per-item resolution lets a dataset column
override the global Params.cofactors.

New: analyze cofactors subcommand (HETATM survey + specifier dry-run),
PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor
and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19),
drop-in safety benchmark with byte-equality on a never-present specifier.

Documentation in documentation/cofactors.md (user-facing) and
documentation/dev/cofactors.md (engineering record with R1-R24 design choices
and post-merge audit fixes). Tests in CofactorHandlerTest,
CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest,
DataTableCsvTest plus a Log4jCapture test helper.
2026-05-14 07:58:14 +02:00
rdk
7f4d37b5c4 Add comparative benchmark test for v1 vs v2 KdTree
Parametrized test generates random points, builds both trees, verifies
identical results for all query types, and measures relative performance.
Skipped during normal test runs; invoked via kdtree-benchmark.sh script.
2026-03-02 20:52:05 +01:00
rdk
9fcce6156f add UseCompactObjectHeaders note to local-env.sh template 2026-02-23 02:11:29 +01:00
rdk
57fb214881 update local-env.sh template with throughput-oriented JVM options 2026-02-23 01:17:32 +01:00
rdk
f3fc9329bc update FasterForest to 2.9.1, bump JUnit Jupiter to 6.0.3, and add NativePanama flattened eval tests 2026-02-23 00:35:18 +01:00
rdk
b8f802b145 refactor model flattening to use FasterForestConverter API with configurable target types
Generalize Model classifier from Classifier to Object to support both
trainable classifiers and flat BinaryForest models. Add rf_flatten_target
parameter for selecting forest type (FlatBinaryForest, LegacyFlatBinaryForest,
InterleavedBfsForest, etc). Deprecate rf_flatten_as_legacy in favor of the
new target type selection.
2026-02-16 01:00:55 +01:00
rdk
126a0653f0 move tutorials to documentation/, update rescoring tutorial and README
Move misc/tutorials/ to documentation/ and add index readme.
Update rescoring.md: add quick-start examples, paper links for all
methods, add Pocketeer to supported methods list.
Fix stale links in README.md (tutorials path, local-env.sh typo).
2026-02-11 10:52:20 +01:00
rdk
7634c57749 add pocketeer prediction loader and rescoring tutorial
Add PocketeerLoader that parses pockets.json output from Pocketeer,
including alpha spheres, residues, centroids, and surface atom mapping.
Register "pocketeer" as a prediction method in Dataset. Add unit tests
covering all 7 available datasets (CIF and PDB). Add rescoring tutorial
documenting all supported methods with examples.
2026-02-11 10:22:46 +01:00
rdk
8614bed9c5 add pocketeer output examples and schema 2026-02-11 08:38:02 +01:00
rdk
65aee4cc84 add aa-mapping tutorial documenting non-canonical residue mapping feature 2026-02-11 02:01:49 +01:00
rdk
9c35bd542c update export-points tutorial to document new export-points command 2026-02-10 22:17:11 +01:00
rdk
d44eb3ee20 export-points command that works with custom feature setup 2026-02-08 07:53:46 +01:00
rdk
48ab4450a2 Point export improvements: Parquet, compression, refactoring 2026-02-05 10:53:37 +01:00
rdk
1aca0bbd12 fix arrow export and update tutorial 2026-02-05 02:06:24 +01:00
rdk
088a0aed92 Improve point export: extract reusable TableExporter, add Arrow streaming format 2026-02-05 01:28:18 +01:00
rdk
ede62fb79f Implement SAS point export (-export_points, -export_points_format) 2026-02-04 23:53:43 +01:00
rdk
faef5f0344 update publications 2025-06-09 10:08:12 +02:00
rdk
c05438520a add citations.md 2024-11-12 22:30:38 +01:00
rdk
33dc3c9f24 add citations.md 2024-11-12 22:23:01 +01:00
rdk
e723a461c0 update chimerax visualisation image 2024-11-08 17:19:37 +01:00
rdk
1b33157395 add small chimerax visualisation image 2024-11-08 17:13:20 +01:00
rdk
0008e9c402 add chimerax visualisation image 2024-11-08 17:05:45 +01:00
rdk
1ad89edca5 chmod dev scripts 2024-11-04 06:57:17 +01:00
rdk
5cdc98978a cleanup main dir 2024-11-04 06:33:18 +01:00
rdk
77310a989b refactor score transformer loading, add trained transformers for rescore models 2024-11-03 16:52:23 +01:00
rdk
6a5ae80124 add ChimeraX visualization 2024-11-02 16:46:50 +01:00
rdk
a6ad36e9e3 bump dependencies 2024-10-25 19:50:39 +02:00
rdk
aae633f557 minor documentation updates 2024-10-19 16:19:57 +02:00
rdk
1bee81b8c7 implement fpocket-rescore command 2024-10-03 21:15:37 +02:00
rdk
588779768e typos 2024-10-01 01:30:50 +02:00
rdk
882a737e9a improve output of 'prank -version' by adding system info 2024-09-29 23:33:59 +02:00
rdk
7962bd319c set default value of rf_flatten back to false in some configs and tests 2024-09-27 22:33:00 +02:00
rdk
3b57926b5a cleanup and fixing typos 2024-09-27 15:56:51 +02:00
rdk
4a385ccb36 use faster-molecular-surface library to calculate SAS points 2024-08-03 13:55:25 +02:00
rdk
ea989cb8ac optimized FasterForest lib with batch prediction 2024-06-26 12:41:59 +02:00
rdk
48c3da9f11 update testsets script 2024-06-25 14:49:18 +02:00
rdk
7730cd0d76 update conservation tests 2024-06-20 18:34:43 +02:00
rdk
29f8150bbf bump dependencies, bump gradle to 8.7 2024-04-10 03:09:19 +02:00
rdk
958c19b7cc update testsets 2023-08-13 20:45:10 +02:00
rdk
981f67d938 update testsets 2023-08-13 19:00:43 +02:00
rdk
e6b06fa5ca update testsets 2023-08-13 18:58:32 +02:00