89 Commits

Author SHA1 Message Date
rdk
550ab9e1a0 testsets.sh: add opt-in pocket_grid_long sweep
Runs the full per-pocket + per-grid-point descriptor menu with
visualizations on the 4 main datasets. Kept out of all() — opt-in.
2026-05-20 11:35:58 +02:00
rdk
cbf4c3ffac testsets.sh: cover per-grid-point descriptors + aa_mapping smoke tests
- pocket_grid(): add per-grid-point descriptor invocations (volsite,
  volsite_smooth, both together, sigma/radius knobs, full combo with
  per-pocket descriptors + viz), plus a fail-fast for an unknown
  pocket_grid_point_descriptors name. Per-pocket descriptors were
  already covered by the full-menu line.
- new aa_mapping() function with -aa_mapping pdbfixer over predict /
  rescore / eval-predict paths; wired into the tests() aggregate.
2026-05-20 03:06:50 +02:00
rdk
6fad858bc6 Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening
Bug fix:
- PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The
  v<0 check was false for NaN, so a NaN eigenvalue (possible if a future
  code path bypasses GridGenerator.isFiniteBox) would have propagated
  to the CSV output.
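
The failure mode above can be reproduced in isolation. The sketch below is illustrative only: `NanClampSketch` and `clampNegativeOnly` are hypothetical names, and clamping NaN to 0.0 is an assumption about the fix, not the actual PrincipalMomentsDescriptor source.

```java
// Hypothetical sketch; not the actual PrincipalMomentsDescriptor code.
final class NanClampSketch {

    private NanClampSketch() {}

    // Pre-fix shape: every ordered comparison with NaN is false,
    // so (v < 0) is false for NaN and the value passes through.
    static double clampNegativeOnly(double v) {
        return v < 0 ? 0.0 : v;
    }

    // Post-fix shape (assumed): NaN is clamped to 0.0 together with negatives.
    static double clampNonNegative(double v) {
        return (Double.isNaN(v) || v < 0) ? 0.0 : v;
    }

    public static void main(String[] args) {
        System.out.println(clampNegativeOnly(Double.NaN)); // NaN leaks through
        System.out.println(clampNonNegative(Double.NaN));  // 0.0
        System.out.println(clampNonNegative(2.5));         // 2.5
    }
}
```

An explicit `Double.isNaN` test is needed because no ordered comparison (`<`, `<=`, `>`, `>=`) ever returns true for NaN.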

Doc refresh:
- breaking-changes.md: 2.6 entry for the multi-column descriptor
  migration + the -vis_pocket_grid / pocket_grid_vis_* renames.
- export-pocket-descriptors.md: step 4 rewrites a self-contradicting
  rationale — adding to the default list IS a breaking change for
  index-based parsers; recommends parse-by-name + breaking-changes.md
  note for future additions.
- export-pocket-grid.md: added "Adding a new per-grid-point descriptor"
  recipe (parallel to the per-pocket one); unified √3/2 precision to
  0.866 across docs and Params.groovy.
- README.md: added an "Opt-in tabular exports" subsection mentioning
  -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid.
- testsets.sh "Full descriptor menu" now lists all seven shipped
  descriptors (was six).

Exception taxonomy:
- PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw
  PrankException (was IllegalArgumentException) for user-facing config
  errors, matching the rest of the codebase.

Registry hardening:
- Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry
  now assert columnNames.size() == columnTypes.size() in register().
  A future descriptor with mismatched lists fails fast at class-load.
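
A minimal sketch of such a registration-time check, under stated assumptions: the `Descriptor` record, the map-backed registry, and the use of IllegalStateException are all hypothetical stand-ins, not the project's PocketDescriptorRegistry API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a registry that rejects mismatched column lists.
final class DescriptorRegistrySketch {

    // Assumed descriptor shape: a name plus two parallel lists.
    record Descriptor(String name, List<String> columnNames, List<String> columnTypes) {}

    private static final Map<String, Descriptor> REGISTRY = new LinkedHashMap<>();

    private DescriptorRegistrySketch() {}

    static void register(Descriptor d) {
        // Fail at registration time rather than when the first row is written.
        if (d.columnNames().size() != d.columnTypes().size()) {
            throw new IllegalStateException("descriptor '" + d.name() + "': "
                    + d.columnNames().size() + " column names vs "
                    + d.columnTypes().size() + " column types");
        }
        REGISTRY.put(d.name(), d);
    }

    static boolean isRegistered(String name) {
        return REGISTRY.containsKey(name);
    }
}
```

If descriptors are registered from a static initializer, a check like this fires as soon as the registry class is loaded, which is what makes the failure early.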

Quality fixes:
- PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for
  the pocket column. Removed dead 2-arg PocketGridRows constructor
  (only 3 test sites used it; now inlined).
- PocketGridPointContext gets a compact-constructor validator that
  rejects negative pointIndex/pocketRank, limiting blast radius of an
  int-arg swap.
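
For illustration, a compact-constructor validator on a Java record could look like the sketch below. `GridPointContextSketch` is a hypothetical two-field stand-in; the real PocketGridPointContext may carry more components, and the exception type is an assumption.

```java
// Hypothetical stand-in for a record with two int components.
record GridPointContextSketch(int pointIndex, int pocketRank) {

    // Compact canonical constructor: runs before the fields are assigned,
    // so an invalid instance can never be observed.
    GridPointContextSketch {
        if (pointIndex < 0) {
            throw new IllegalArgumentException("pointIndex must be >= 0, got " + pointIndex);
        }
        if (pocketRank < 0) {
            throw new IllegalArgumentException("pocketRank must be >= 0, got " + pocketRank);
        }
    }
}
```

A check of this kind only catches a swap that puts a negative value into one of the validated slots; two swapped non-negative ints still pass, so it narrows the damage rather than ruling the mistake out.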

Test hardening:
- VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest
  now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so
  the Params singleton is clean for subsequent test classes.
- New tests: HIS ND1 double-flag (single atom setting donor+acceptor),
  PrincipalMoments at cardinality=2, PrincipalMoments two coincident
  points, GridGenerator NaN-box throw, PocketDescriptorRegistry
  register/unregister round-trip, MorphologicalCloser maxIters=1.
- Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered
  the maxIters=0 case despite the general name); added maxIters=1
  companion that verifies one iteration of fill actually runs.
- Extracted RendererTestFixtures.tinyGrid (was byte-identical in both
  renderer test files); unified the volsite atomAt signatures so the
  parameter order can't get swapped between the two volsite tests.
2026-05-19 15:36:12 +02:00
rdk
f06628dd63 Audit follow-ups: rename leftovers, doc fixes, numeric validation
- testsets.sh: 4 sites still invoking -export_pocket_grid_pml after the
  rename; they were hard-failing at startup.
- PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3
  refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a).
- documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius
  default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers
  doc now shows the #99 (spheres) + #100 (surfaces) split.
- Main.validatePocketGridParams: numeric range checks for spacing,
  max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in
  the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius
  (-1 sentinel or strictly positive), and gaussian_iso. Catches values
  that would otherwise produce a NaN lattice, empty grid, or garbage
  passed to PyMOL/ChimeraX.
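
A rough sketch of what range checks of this kind might look like. Everything here is an assumption made for illustration: the class and method names are hypothetical, the exception type is a placeholder, and the accepted range 1..26 for fill_min_neighbors is a guess at what "in the 26-neighborhood" means (a lattice point has 3^3 - 1 = 26 neighbors).

```java
// Hypothetical sketch; not the actual Main.validatePocketGridParams code.
final class GridParamChecksSketch {

    private GridParamChecksSketch() {}

    // A zero, negative, NaN or infinite spacing cannot produce a usable lattice.
    // !(x > 0) is used instead of (x <= 0) so that NaN is rejected too.
    static void checkSpacing(double spacing) {
        if (!(spacing > 0) || Double.isInfinite(spacing)) {
            throw new IllegalArgumentException("spacing must be finite and > 0, got " + spacing);
        }
    }

    // Assumed range: 1..26.
    static void checkFillMinNeighbors(int n) {
        if (n < 1 || n > 26) {
            throw new IllegalArgumentException("fill_min_neighbors must be in 1..26, got " + n);
        }
    }

    // Either the -1 sentinel or a strictly positive finite radius.
    static void checkVolumeRadius(double radius) {
        boolean sentinel = radius == -1.0;
        boolean positive = radius > 0 && !Double.isInfinite(radius);
        if (!(sentinel || positive)) {
            throw new IllegalArgumentException("volume radius must be -1 or > 0, got " + radius);
        }
    }

    // Small helper: true if the check throws IllegalArgumentException.
    static boolean rejects(Runnable check) {
        try {
            check.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Writing the positive-value test as `!(x > 0)` matters for the NaN case: `x <= 0` is false for NaN and would let it through.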
2026-05-19 07:24:16 +02:00
rdk
60220d7a57 Add pocket-grid + descriptors export with PyMOL / ChimeraX viz
Per-protein 3D grid of points around predicted pockets with per-pocket
assignment, plus per-pocket geometric descriptors (volume, sphericity,
radius_of_gyration, num_residues, num_surface_atoms, num_grid_points).

User-facing knobs (all under -export_pocket_*, -pocket_grid_*, -vis_pocket_*):

  -export_pocket_grid          CSV/Arrow/Parquet grid file
  -export_pocket_descriptors   CSV/Arrow/Parquet descriptors file
  -vis_pocket_grid             PyMOL/ChimeraX overlay scripts
  -pocket_grid_format          csv | csv.gz | csv.zst | arrow{,.gz,.zst} | parquet
  -pocket_grid_spacing         lattice edge (Å)
  -pocket_grid_max_dist        outer bound vs nearest pocket SAS point
  -pocket_grid_atom_buffer     inner bound vs vdw(nearest atom)
  -pocket_grid_assign_cutoff   per-pocket membership cutoff
  -pocket_grid_assigner        kdtree | voxel_hash
  -pocket_grid_fill            morph_closing | none
  -pocket_descriptors          subset of registered descriptors
  -vis_pocket_grid_volume_radius / _gaussian_iso  viz tuning

Renderers (PocketGridPymolRenderer, PocketGridChimeraXRenderer) overlay
on top of the standard pocket viz with per-pocket togglable layers:
discrete spheres, vdW-radius surface union, gaussian-iso (PyMOL only),
convex-hull wireframe (PyMOL only, requires scipy). Both honor
-vis_renderers membership.

Startup validation for all new params (Main.validatePocketGridParams,
Main.validateVisParams) — typos in renderer/format/fill/assigner names
fail fast instead of silently emitting nothing.

Performance: LongIntHashMap-backed lattice index, BitSet pocket
assignments, pluggable range-query (kdtree vs voxel-hash), morph-closing
frontier expansion. Most hot paths converted from Groovy to Java.

Docs: documentation/export-pocket-grid.md, export-pocket-descriptors.md.

Squashed from 70 commits (9b7d7a64..fec803ff). Pre-squash granular
history preserved on branch develop-backup-2026-05-19.
2026-05-19 03:03:33 +02:00
rdk
c78519c98e Cofactor smoke harness, CDK VdW workaround, analyze-cofactors fixes
Bumps faster-molecular-surface 1.0 -> 1.1, vendored in
lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for
elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir,
plus radioactive/synthetic). Without the fix, cobalamin-bearing
structures crashed surface computation under -cofactors.

PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used
when -use_optimized_surface 0) with the same fallback, via a Krypton
proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests
mirror the FMS-side regressions.

AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with
Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups
that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely
reported as "name not in structure" in cofactor_matches.csv or omitted
from het_groups.csv. Mirrors the M1 fix applied earlier to
CofactorHandler.extractCofactorAtoms.

testsets.sh: new cofactors_full() function exercising the cofactor
demo + full datasets in p2rank-datasets2/other/cofactors/ (predict,
analyze cofactors, -aa_mapping composition, visualizations,
export-points). Uses -fail_fast 1 so per-structure errors surface as
test failures rather than silent skips.
2026-05-15 00:35:08 +02:00
rdk
79cda78473 Add cofactor-as-protein-surface feature (Issue #79 part 2)
The -cofactors flag and dataset cofactors column accept LigandDefinition
specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]").
Matched HET groups merge into the protein surface (proteinAtoms) and are
excluded from ligand listings; per-item resolution lets a dataset column
override the global Params.cofactors.

New: analyze cofactors subcommand (HETATM survey + specifier dry-run),
PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor
and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19),
drop-in safety benchmark with byte-equality on a never-present specifier.

Documentation in documentation/cofactors.md (user-facing) and
documentation/dev/cofactors.md (engineering record with R1-R24 design choices
and post-merge audit fixes). Tests in CofactorHandlerTest,
CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest,
DataTableCsvTest plus a Log4jCapture test helper.
2026-05-14 07:58:14 +02:00
rdk
7f4d37b5c4 Add comparative benchmark test for v1 vs v2 KdTree
Parametrized test generates random points, builds both trees, verifies
identical results for all query types, and measures relative performance.
Skipped during normal test runs; invoked via kdtree-benchmark.sh script.
2026-03-02 20:52:05 +01:00
rdk
f3fc9329bc update FasterForest to 2.9.1, bump JUnit Jupiter to 6.0.3, and add NativePanama flattened eval tests 2026-02-23 00:35:18 +01:00
rdk
b8f802b145 refactor model flattening to use FasterForestConverter API with configurable target types
Generalize Model classifier from Classifier to Object to support both
trainable classifiers and flat BinaryForest models. Add rf_flatten_target
parameter for selecting forest type (FlatBinaryForest, LegacyFlatBinaryForest,
InterleavedBfsForest, etc). Deprecate rf_flatten_as_legacy in favor of the
new target type selection.
2026-02-16 01:00:55 +01:00
rdk
d44eb3ee20 export-points command that works with custom feature setup 2026-02-08 07:53:46 +01:00
rdk
48ab4450a2 Point export improvements: Parquet, compression, refactoring 2026-02-05 10:53:37 +01:00
rdk
1aca0bbd12 fix arrow export and update tutorial 2026-02-05 02:06:24 +01:00
rdk
088a0aed92 Improve point export: extract reusable TableExporter, add Arrow streaming format 2026-02-05 01:28:18 +01:00
rdk
ede62fb79f Implement SAS point export (-export_points, -export_points_format) 2026-02-04 23:53:43 +01:00
rdk
1bee81b8c7 implement fpocket-rescore command 2024-10-03 21:15:37 +02:00
rdk
882a737e9a improve output of 'prank -version' by adding system info 2024-09-29 23:33:59 +02:00
rdk
7962bd319c set default value of rf_flatten back to false in some configs and tests 2024-09-27 22:33:00 +02:00
rdk
4a385ccb36 use faster-molecular-surface library to calculate SAS points 2024-08-03 13:55:25 +02:00
rdk
ea989cb8ac optimized FasterForest lib with batch prediction 2024-06-26 12:41:59 +02:00
rdk
48c3da9f11 update testsets script 2024-06-25 14:49:18 +02:00
rdk
7730cd0d76 update conservation tests 2024-06-20 18:34:43 +02:00
rdk
958c19b7cc update testsets 2023-08-13 20:45:10 +02:00
rdk
981f67d938 update testsets 2023-08-13 19:00:43 +02:00
rdk
e6b06fa5ca update testsets 2023-08-13 18:58:32 +02:00
rdk
f9b5050f79 FastRandomForest legacy flattening that gives exactly the same results 2023-08-13 01:30:37 +02:00
rdk
469a0dfdaa FastRandomForest flattening in prediction 2023-08-08 22:24:19 +02:00
rdk
0253d811f3 fix typo in testsets.sh 2023-08-08 14:43:58 +02:00
rdk
90e865dcde update test script speed routine 2023-08-05 13:41:09 +02:00
rdk
0baae79094 update testsets script and unit tests for zst input files 2023-08-01 21:08:38 +02:00
rdk
1d5d4d1a1d fix testsets script 2023-03-14 16:46:09 +01:00
rdk
da468b5e7b Merge branch 'biojava6' into develop 2021-12-14 13:06:02 +01:00
rdk
b3e45aaab5 add alphafold models 2021-12-14 01:08:46 +01:00
rdk
92bba316f5 update test script 2021-12-13 18:51:13 +01:00
rdk
637cfe7998 add implementation of transform/reduce-to-chains command 2021-12-12 11:07:11 +01:00
rdk
e6b7b38a7b update test script 2021-11-29 22:25:18 +01:00
rdk
79b3c9a2f5 update test script 2021-11-29 22:09:49 +01:00
rdk
88ce5c9829 update testsets to ignore missing ligands as before 2021-11-22 01:21:19 +01:00
rdk
a17756d62b update testsets to ignore missing ligands as before 2021-11-21 20:45:19 +01:00
rdk
24d911d49f update dev configs and test script 2021-07-21 21:17:56 +02:00
rdk
d8c059a076 add configs for training new default models 2021-07-19 13:51:29 +02:00
rdk
743f06eba3 update test script 2021-07-15 21:42:19 +02:00
rdk
76c7149afd update testsets with conservation 2021-07-14 12:37:20 +02:00
rdk
1506e503bb add more testsets with conservation 2021-07-14 11:45:14 +02:00
rdk
94a2f69083 update test script 2021-07-11 05:59:04 +02:00
rdk
3a1f8e2bf9 update test script 2021-07-11 05:54:56 +02:00
rdk
afcb57661a fix testsets script 2021-07-09 19:32:47 +02:00
rdk
a92151891c add feature importances calculation also for RandomForest, add to tutorial 2021-07-09 19:23:33 +02:00
rdk
4c2aefab5a fix testsets 2021-07-01 13:16:55 +02:00
rdk
4606500b1f add fasta export commands 2021-07-01 13:12:21 +02:00