- Params.groovy: pocket_descriptors javadoc now lists all 7 shipped
descriptors (was: 6); softens the "essentially free" rationale to
acknowledge principal_moments' small eigendecomposition cost.
- PocketDescriptorsTest.groovy: class javadoc "six shipped descriptors"
→ "seven", names principal_moments alongside the rest.
- export-pocket-descriptors.md: "6 base shipped descriptors use this
adapter" → "6 of 7 use the adapter; principal_moments (multi-column)
implements PocketDescriptor directly". Removes a misleading count.
- export-pocket-{grid,descriptors}.md: default-list rationale no longer
claims adding descriptors is "essentially free" — clarifies that
grid-derived scalars are cheap once the grid is built but
principal_moments adds a small per-pocket compute on top, still
negligible vs the grid build.
Caught by deep audit of 60220d7a..73e7c9df focused on doc/comment drift
after the recent multi-column interface migration.
Unifies the per-pocket descriptor framework with the per-grid-point
framework: same shape (name + columnNames + columnTypes + double[]
compute), same multi-column "{name}.{col}" header convention, same
public register / unregister / dup-column-check registry. Shipped as a
breaking change behind the same -pocket_descriptors knob.
Interface change:
    String name();
    List<String> columnNames();
    List<ColumnType> columnTypes();
    double[] compute(PocketGridContext);
    boolean needsGrid();   // unchanged
Scalar descriptors stay one-liners via the new
AbstractScalarPocketDescriptor adapter (name + scalarType +
computeScalar). The 6 existing descriptors migrated; behavior and
output byte-identical to before.
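A minimal Java sketch of how the interface and the scalar adapter could fit together. Only the method names and AbstractScalarPocketDescriptor's three hooks come from the notes above; the PocketGridContext stub, the single-column naming inside the adapter, and the NumGridPoints example class are assumptions made for illustration.

```java
import java.util.List;

final class DescriptorInterfaceSketch {

    enum ColumnType { DOUBLE, INT }

    /** Hypothetical stand-in for the real context; only the grid size is modelled. */
    static final class PocketGridContext {
        final int numGridPoints;
        PocketGridContext(int numGridPoints) { this.numGridPoints = numGridPoints; }
    }

    /** Unified shape: name + columnNames + columnTypes + double[] compute. */
    interface PocketDescriptor {
        String name();
        List<String> columnNames();
        List<ColumnType> columnTypes();
        double[] compute(PocketGridContext ctx);
        boolean needsGrid();
    }

    /** Adapter: subclasses supply only name + scalarType + computeScalar. */
    abstract static class AbstractScalarPocketDescriptor implements PocketDescriptor {
        abstract ColumnType scalarType();
        abstract double computeScalar(PocketGridContext ctx);

        // Assumption: a scalar exposes a single column named after the descriptor.
        @Override public List<String> columnNames() { return List.of(name()); }
        @Override public List<ColumnType> columnTypes() { return List.of(scalarType()); }
        @Override public double[] compute(PocketGridContext ctx) {
            return new double[] { computeScalar(ctx) };
        }
    }

    /** Example scalar in the style of num_grid_points. */
    static final class NumGridPoints extends AbstractScalarPocketDescriptor {
        @Override public String name() { return "num_grid_points"; }
        @Override ColumnType scalarType() { return ColumnType.INT; }
        @Override double computeScalar(PocketGridContext ctx) { return ctx.numGridPoints; }
        @Override public boolean needsGrid() { return true; }
    }
}
```

With this shape a multi-column descriptor implements PocketDescriptor directly and returns one array element per declared column, while scalars keep their one-line bodies.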
New descriptor: PrincipalMomentsDescriptor (3 × DOUBLE) — the three
eigenvalues of the pocket grid points' gyration tensor, sorted
descending. Implementation uses Apache Commons Math 3
EigenDecomposition. Complements sphericity / radius_of_gyration as a
shape signature; the three values sum to radius_of_gyration² (verified in test).
Added to the default -pocket_descriptors list.
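The computation can be sketched in plain Java as follows. To stay dependency-free this sketch swaps in the closed-form trigonometric solution for symmetric 3x3 matrices instead of the Commons Math EigenDecomposition that the shipped descriptor uses; the class and method names are hypothetical, and returning zeros for empty input is an assumption rather than the documented behaviour.

```java
final class PrincipalMomentsSketch {

    /** Eigenvalues of the gyration tensor of pts (each {x, y, z}), sorted descending. */
    static double[] principalMoments(double[][] pts) {
        int n = pts.length;
        if (n == 0) return new double[] {0, 0, 0}; // assumption: empty input maps to zeros
        double[] c = new double[3];
        for (double[] p : pts)
            for (int k = 0; k < 3; k++) c[k] += p[k] / n;
        double[][] s = new double[3][3];           // S_ij = mean((r_i - c_i)(r_j - c_j))
        for (double[] p : pts)
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    s[i][j] += (p[i] - c[i]) * (p[j] - c[j]) / n;
        return symmetricEigenvalues(s);
    }

    /** Closed-form eigenvalues of a symmetric 3x3 matrix, largest first. */
    static double[] symmetricEigenvalues(double[][] a) {
        double q = (a[0][0] + a[1][1] + a[2][2]) / 3.0;
        double p1 = sq(a[0][1]) + sq(a[0][2]) + sq(a[1][2]);
        double p2 = sq(a[0][0] - q) + sq(a[1][1] - q) + sq(a[2][2] - q) + 2.0 * p1;
        if (p2 == 0.0) return new double[] {q, q, q}; // multiple of the identity
        double p = Math.sqrt(p2 / 6.0);
        double[][] b = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                b[i][j] = (a[i][j] - (i == j ? q : 0.0)) / p;
        double det = b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
                   - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
                   + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]);
        double r = Math.max(-1.0, Math.min(1.0, det / 2.0)); // clamp rounding noise
        double phi = Math.acos(r) / 3.0;
        double e1 = q + 2.0 * p * Math.cos(phi);
        double e3 = q + 2.0 * p * Math.cos(phi + 2.0 * Math.PI / 3.0);
        double e2 = 3.0 * q - e1 - e3;             // trace is preserved
        return new double[] {e1, e2, e3};
    }

    private static double sq(double x) { return x * x; }
}
```

Because the trace of the gyration tensor is the mean squared distance to the centroid, the three returned values add up to the squared radius of gyration, which is the invariant the test suite checks.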
Default list reordered to put num_* (cheap, integer-valued) first,
then geometric scalars, then principal_moments:
    num_residues, num_surface_atoms, num_grid_points,
    volume, sphericity, radius_of_gyration,
    principal_moments
Tests:
- 5 new PrincipalMomentsDescriptor tests (cube isotropy, rod-shape
eigenvalues, sort order, degenerate empty/single, sum=Rg²)
- PocketDescriptorsRowsTest +2 (multi-column prefix rule, mixed
scalar + multi ordering)
- existing 13 callsites updated for the double[] return signature
- columnType() registry test → columnTypes()
User-visible change: the default -pocket_descriptors output now has
three new columns (principal_moments.lambda1/2/3) and the existing
columns appear in a different order. Scripts parsing by column name
are unaffected; scripts parsing by column index need updating.
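For downstream scripts, a name-based lookup that survives the reorder might look like the sketch below. It assumes a plain comma-separated file with a single header row and no quoted fields; the class and method names are hypothetical, and the header used in the usage note is built only from column names mentioned in this message.

```java
import java.util.HashMap;
import java.util.Map;

final class ColumnByNameSketch {

    /** Maps each header cell to its position so rows can be read by name. */
    static Map<String, Integer> headerIndex(String headerLine) {
        Map<String, Integer> idx = new HashMap<>();
        String[] cols = headerLine.split(",");
        for (int i = 0; i < cols.length; i++) idx.put(cols[i].trim(), i);
        return idx;
    }

    /** Reads one numeric cell by column name; fails loudly on an unknown name. */
    static double value(String row, Map<String, Integer> idx, String column) {
        Integer i = idx.get(column);
        if (i == null) throw new IllegalArgumentException("no such column: " + column);
        return Double.parseDouble(row.split(",")[i].trim());
    }
}
```

Calling value(row, headerIndex(header), "principal_moments.lambda1") keeps working whether volume sits in the first column or the fourth.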
Bug fixes:
- MorphologicalCloser: gate the "didn't converge" warning on maxIters>0.
maxIters=0 is a valid "disable fill" config and would otherwise log
spuriously on every protein.
- GridGenerator: hoist the isFiniteBox NaN guard into the (Box, edge)
ctor so both sampleGridPointsBetween and sampleGridPointsAroundAtoms
are covered (the second sampler was previously unguarded — used by
the training/feature path).
- PocketGridPdbSidecar.writePerPocket: serial-wrap warning added for
parity with the combined write() path.
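The GridGenerator guard above amounts to a finiteness check on the box corners before any lattice arithmetic runs. A rough sketch, assuming the box is represented as two 3-element corner arrays and that the failure surfaces as an IllegalArgumentException (both are assumptions; only the isFiniteBox name comes from the commit):

```java
final class FiniteBoxGuardSketch {

    /** True when every coordinate of both corners is finite (no NaN, no infinity). */
    static boolean isFiniteBox(double[] min, double[] max) {
        for (int k = 0; k < 3; k++) {
            if (!Double.isFinite(min[k]) || !Double.isFinite(max[k])) return false;
        }
        return true;
    }

    /** Fails early instead of letting NaN propagate into every lattice coordinate. */
    static void requireFiniteBox(double[] min, double[] max) {
        if (!isFiniteBox(min, max)) {
            throw new IllegalArgumentException("non-finite bounding box; check input structure");
        }
    }
}
```

Placing the check in the shared constructor means every sampler that goes through it is covered, which is the point of the hoist.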
Test hardening:
- PocketGridPointDescriptorRegistry: add unregister() so tests can
clean up fixture registrations; PocketGridRowsTest now @AfterAll
unregisters its scalar fixture so it doesn't leak into the JVM-wide
registry.
- VolsiteSmoothGridPointDescriptorTest: pin sigma via @BeforeEach so
other tests mutating the Params singleton can't shift expectations;
new weightAtExactCutoffEqualsExpMinusEight test pins the 4σ-inclusive
cutoff semantic (cutoutSphere is inclusive; exp(-8) ≈ 3.354e-4).
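The pinned value follows from a standard Gaussian kernel: at d = 4σ the exponent is -(4σ)²/(2σ²) = -8. A small sketch under that assumption (the kernel form exp(-d²/(2σ²)) is inferred from the exp(-8) figure, and the class and method names are hypothetical):

```java
final class GaussianCutoffSketch {

    /** Gaussian weight with an inclusive 4-sigma cutoff: d == 4*sigma still contributes. */
    static double weight(double d, double sigma) {
        if (d > 4.0 * sigma) return 0.0;              // strictly beyond the cutoff
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }
}
```

With sigma = 2.0 Å the cutoff sits at 8.0 Å, and weight(8.0, 2.0) evaluates the kernel rather than returning zero.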
Docs / clarifications:
- Params.pocket_grid_point_descriptors javadoc: the silent-ignore when
-export_pocket_grid=false is intentional (symmetric with
-pocket_descriptors / -export_pocket_descriptors).
- PocketDescriptor javadoc: intentionally scalar-only; recommend
unifying with PocketGridPointDescriptor if multi-col is ever needed
rather than ad-hoc extending this one.
- PocketGridPointDescriptor javadoc: needsGrid() is intentionally
absent — every grid-point descriptor needs the grid by definition.
- documentation/export-pocket-grid.md: explain the default-empty
rationale (cost: per-row × per-atom, not backward-compat).
- VdwRadiusTable.resolveSymbol: comment that the name-prefix isotope
branch is a safety net, not a semantic mapping (e.g. "DA" in DNA
isn't deuterium).
Adds focused regression tests for the new framework: 11 tests in three
new files plus 4 added to PocketGridRowsTest.
PocketGridRowsTest +4
- descriptor schema uses "{name}.{col}" prefix for multi-col
- getRow appends descriptor values after the base 4 columns
- unknown descriptor name throws at construction
- scalar descriptor emits bare name() with no prefix (uses an
inline ScalarTestDescriptor registered via the now-public
registry hook — none of the shipped descriptors are scalar so
the branch was untested)
VolsiteGridPointDescriptorTest (new, 4 tests)
- covers indicator aggregation + radius cutoff
VolsiteSmoothGridPointDescriptorTest (new, 4 tests)
- covers Gaussian kernel arithmetic + 4σ cutoff
PocketGridPointDescriptorRegistryTest (new, 2 tests)
- shipped names resolve, unknown name throws helpful error
DescriptorListValidatorTest (new, 8 tests)
- null/empty/valid/unknown/duplicate/null-entry/blank/dash-prefix
Refactors Main.validateDescriptorList out to a self-contained Java
utility (DescriptorListValidator) under predict/output/. The two call
sites in Main.validatePocketGridParams now invoke the static helper;
the private helper in Main is removed (-37 lines).
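A sketch of what such a static helper could look like, following the case list in the new test (null/empty/valid/unknown/duplicate/null-entry/blank/dash-prefix). Which cases are accepted and which are rejected is an assumption here: the sketch treats null and empty lists as "nothing selected" and rejects the rest. The exception type and messages are also illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DescriptorListValidatorSketch {

    /** Validates a user-supplied descriptor list against the set of known names. */
    static void validate(List<String> names, Set<String> known, String paramName) {
        if (names == null || names.isEmpty()) return;   // assumption: nothing selected is fine
        Set<String> seen = new HashSet<>();
        for (String n : names) {
            if (n == null || n.trim().isEmpty())
                throw new IllegalArgumentException("-" + paramName + ": null or blank entry");
            if (n.startsWith("-"))                       // likely a swallowed CLI flag
                throw new IllegalArgumentException("-" + paramName + ": '" + n + "' looks like a flag");
            if (!known.contains(n))
                throw new IllegalArgumentException("-" + paramName + ": unknown '" + n + "', known: " + known);
            if (!seen.add(n))
                throw new IllegalArgumentException("-" + paramName + ": duplicate '" + n + "'");
        }
    }
}
```

Keeping the helper free of Main's state is what makes it testable in isolation and reusable for both descriptor parameters.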
PocketGridPointDescriptorRegistry.register is promoted from private to
public so tests (and future external descriptor plugins) can add
descriptors without touching the registry's static initializer. The
shipped registrations still happen at class-load.
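The registry contract (public register, an unregister hook for test cleanup, a duplicate-column check, and a helpful error on unknown names) could be sketched like this. The GridPointDescriptor stand-in, the rule that a one-column descriptor counts as scalar, and the error wording are assumptions; the "{name}.{col}" prefix and bare-name scalar header follow the conventions described in these notes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DescriptorRegistrySketch {

    /** Hypothetical minimal descriptor shape used only by this sketch. */
    interface GridPointDescriptor {
        String name();
        List<String> columnNames();
    }

    private static final Map<String, GridPointDescriptor> BY_NAME = new LinkedHashMap<>();

    /** Header cells: bare name for one column, "{name}.{col}" otherwise. */
    static List<String> headers(GridPointDescriptor d) {
        List<String> out = new ArrayList<>();
        if (d.columnNames().size() == 1) { out.add(d.name()); return out; }
        for (String col : d.columnNames()) out.add(d.name() + "." + col);
        return out;
    }

    static synchronized void register(GridPointDescriptor d) {
        if (BY_NAME.containsKey(d.name()))
            throw new IllegalArgumentException("descriptor already registered: " + d.name());
        Set<String> taken = new HashSet<>();
        for (GridPointDescriptor other : BY_NAME.values()) taken.addAll(headers(other));
        for (String h : headers(d))
            if (!taken.add(h)) throw new IllegalArgumentException("duplicate column: " + h);
        BY_NAME.put(d.name(), d);
    }

    /** Lets tests remove fixtures so they do not leak into the JVM-wide registry. */
    static synchronized boolean unregister(String name) {
        return BY_NAME.remove(name) != null;
    }

    static synchronized GridPointDescriptor get(String name) {
        GridPointDescriptor d = BY_NAME.get(name);
        if (d == null)
            throw new IllegalArgumentException(
                "unknown descriptor '" + name + "', known: " + BY_NAME.keySet());
        return d;
    }
}
```

A test would register its fixture in setup and call unregister in teardown, leaving the shipped class-load registrations untouched.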
Adds an opt-in extension to the pocket-grid export — extra columns per
(point, pocket) row driven by a registry of per-grid-point descriptors.
Mirrors the existing per-pocket descriptor framework (interface, context
record, static registry, name-driven CLI selection).
CLI:
    -pocket_grid_point_descriptors   list, default []
    -pocket_grid_volsite_radius      4.0 Å (volsite indicator cutoff)
    -pocket_grid_volsite_sigma       2.0 Å (volsite_smooth Gaussian σ)
Shipped descriptors (both 6-column, prefixed `{name}.`):
    volsite          INT      0/1 per pharmacophore type within radius
    volsite_smooth   DOUBLE   Gaussian-weighted sum, kernel truncated at 4σ
Atom-level pharmacophore classification reuses VolSitePharmacophore — a
1 in volsite.vsCation here matches a 1 in vsCation from VolsiteFeature.
The 6 VolSite column names now live as VolSitePharmacophore.COLUMN_NAMES
(single source of truth, also used by VolsiteFeature). VolSitePharmacophore
gains a getAtomProperties(Atom) overload that does the PdbUtils hop.
Validation: -pocket_grid_point_descriptors goes through a new shared
validateDescriptorList(names, known, paramName) helper in Main, which
also replaces the open-coded equivalent for -pocket_descriptors. The
two new numeric params are bounds-checked.
- ChimeraX renderer: surfaces-layer rename now iterates the actual rank
set (perPocketBasenames.keySet) instead of 1..maxRank. The previous
code assumed every rank produces a ChimeraX submodel; a rank-skip
would mis-target the rename. Latent today (P2Rank reorders pockets
contiguously) but the assumption is now explicit in the code.
- PdbSidecar: warn when total grid atoms exceed the PDB 5-digit serial
column (wrap still happens; the warning surfaces the limit so users
with very fine grids know why bond-inference tools might misbehave).
- MorphologicalCloser: warn when the loop exits at maxIters without
converging, naming the param to raise. Previously silent.
- GridGenerator: throw early on non-finite SAS-point bounding box.
IEEEremainder(NaN, edge) = NaN would otherwise produce a NaN-everywhere
lattice from a broken PDB.
- VdwRadiusTable: map D/T isotopes to H before CDK lookup. Previously
fell through to carbon (1.7 Å instead of hydrogen's 1.2 Å); marginal
effect because of the atom_buffer cushion but no reason to be wrong.
- PocketDescriptorsRows: throw at construction if grid==null and any
selected descriptor declares needsGrid()=true, instead of NPEing
inside compute(). The upstream gate in PocketGridOutputs already
honors this; the guard catches programming errors elsewhere.
- testsets.sh: update 4 sites that still invoked -export_pocket_grid_pml
after the rename and were hard-failing at startup.
- PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3
refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a).
- documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius
default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers
doc now shows the #99 (spheres) + #100 (surfaces) split.
- Main.validatePocketGridParams: numeric range checks for spacing,
max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in
the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius
(-1 sentinel or strictly positive), and gaussian_iso. Catches values
that would otherwise produce a NaN lattice, empty grid, or garbage
passed to PyMOL/ChimeraX.
- Add SwinSite and Seq2Pocket rows to the supported methods table, with
GitHub + paper links and a note that they point at per-protein
directories rather than single files
- Add a "Rescoring directory-based predictions" example covering the
per-directory dataset pattern
- Add a "Conservation-aware rescoring" section documenting
-c rescore_conservation and the .hom file requirement
- Quick Start: add a swinsite example line
Both fpocket and Seq2Pocket loaders could previously produce a pocket
with a null centroid that NPEs downstream feature extraction:
- FPocketLoader: skip the pocket if its voronoi-centers het group is
empty (Atoms.centerOfMass returns null on empty list). Guard runs
before rank assignment so surviving ranks stay sequential.
- Seq2PocketLoader: skip the pocket if the input named atom serials
but none resolved against queryProtein.allAtoms (otherwise the
pocket would carry empty surfaceAtoms and null centroid). Real
inputs rarely trigger this; synthetic test covers it.
Neither path is expected with well-formed input; both fixes are
defensive.
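The ordering matters: the emptiness guard has to run before a rank is handed out, otherwise a skipped pocket leaves a gap. A minimal sketch with hypothetical types (the Pocket class, the unweighted centroid helper, and the list-of-coordinate-arrays input are all stand-ins for the real loader structures):

```java
import java.util.ArrayList;
import java.util.List;

final class SkipEmptyPocketSketch {

    static final class Pocket {
        final int rank;
        final double[] centroid;
        Pocket(int rank, double[] centroid) { this.rank = rank; this.centroid = centroid; }
    }

    /** Unweighted centroid; returns null on an empty list, like the case guarded above. */
    static double[] centroid(List<double[]> atoms) {
        if (atoms.isEmpty()) return null;
        double[] c = new double[3];
        for (double[] a : atoms)
            for (int k = 0; k < 3; k++) c[k] += a[k] / atoms.size();
        return c;
    }

    static List<Pocket> load(List<List<double[]>> rawPockets) {
        List<Pocket> out = new ArrayList<>();
        for (List<double[]> atoms : rawPockets) {
            double[] c = centroid(atoms);
            if (c == null) continue;                 // guard runs before a rank is assigned
            out.add(new Pocket(out.size() + 1, c));  // so surviving ranks stay 1..n
        }
        return out;
    }
}
```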
Parses per-protein <ID>_predictions.txt (semicolon CSV) and resolves
atom_ids against queryProtein.allAtoms by PDB serial. Empty/header-only
files produce 0 pockets gracefully. Prediction is bound to the
caller-supplied queryProtein, avoiding the ConcavityLoader bug class.
- Dataset.groovy: new case "seq2pocket"
- README.md: list SwinSite and Seq2Pocket in rescoring methods;
cite pocketeer.ds + swinsite.ds in test_data/ examples
- CLAUDE.md: note that distro/README.md is a transient build artifact
- Test fixtures: 5 real predictions under distro/test_data/, plus
unsorted/header-only/path-independence variants under src/test/resources/
- Seq2PocketLoaderTest: 10 tests, all passing
- GenericVector.toList(): replace deprecated DefaultGroovyMethods.toList
(Groovy 5) with a plain Java loop; drop unused addTo() (no callers)
- Atoms(List<? extends Atom>): @SuppressWarnings("unchecked") for the
intentional wrap-without-copy
- KdNode.splitLeafNode: @SuppressWarnings("unchecked") for casts from
the Object[] backing store
- Drop dead mask_unknown_residues=true from default(_rescore).groovy
(param removed from Params.groovy in 1b7809a6, 2019; configs missed)
- Rewrite distro/models/readme.md to match models on disk (add rescore_2024,
rescore_conservation; remove nonexistent conservation.model)
- Remove broken documentation/rescoring.md link from distro/README.md
- distro/config/readme.md: drop nonexistent working.groovy reference,
fix github link master->develop
- Delete dead commented-out method bodies in PdbUtils, RPlotter,
PredictionVisualizer
- Fix typo in Main.groovy javadoc
Bumps faster-molecular-surface 1.0 -> 1.1, vendored in
lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for
elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir,
plus radioactive/synthetic). Without the fix, cobalamin-bearing
structures crashed surface computation under -cofactors.
PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used
when -use_optimized_surface 0) with the same fallback, via a Krypton
proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests
mirror the FMS-side regressions.
AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with
Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups
that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely
reported as "name not in structure" in cofactor_matches.csv or omitted
from het_groups.csv. Mirrors the M1 fix applied earlier to
CofactorHandler.extractCofactorAtoms.
testsets.sh: new cofactors_full() function exercising the cofactor
demo + full datasets in p2rank-datasets2/other/cofactors/ (predict,
analyze cofactors, -aa_mapping composition, visualizations,
export-points). Uses -fail_fast 1 so per-structure errors surface as
test failures rather than silent skips.
The -cofactors flag and dataset cofactors column accept LigandDefinition
specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]").
Matched HET groups merge into the protein surface (proteinAtoms) and are
excluded from ligand listings; per-item resolution lets a dataset column
override the global Params.cofactors.
New: analyze cofactors subcommand (HETATM survey + specifier dry-run),
PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor
and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19),
drop-in safety benchmark with byte-equality on a never-present specifier.
Documentation in documentation/cofactors.md (user-facing) and
documentation/dev/cofactors.md (engineering record with R1-R24 design choices
and post-merge audit fixes). Tests in CofactorHandlerTest,
CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest,
DataTableCsvTest plus a Log4jCapture test helper.
Registers `swinsite` as a third-party predictor in Dataset.groovy. The
loader reads grid<N>_score_<float>.mol2 (raw voxel points) per pocket,
parses score from the filename, computes pocket centroid from the grid,
and derives surfaceAtoms via cutoutShell against queryProtein.exposedAtoms
(4.5 -> 10 Å expanding shell), mirroring ConcavityLoader.
Reads grid mol2 instead of pocket mol2: pocket mol2 atoms are standalone
copies with chain reset to 'A' and synthetic residue names, so they break
P2Rank's residue/conservation/ASA feature lookups. Grid + cutoutShell
keeps surfaceAtoms bound to real queryProtein atoms.
Mol2 parsing is a small inline @<TRIPOS>ATOM scan rather than CDK's
Mol2Reader: CDK has a lazy-init race in AtomTypeFactory that NPEs under
parallel dataset processing.
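The filename parse and the inline section scan are small enough to sketch. The grid<N>_score_<float>.mol2 pattern is taken from the description above; the exact float grammar in the regex, the class and method names, and the decision to ignore malformed lines are assumptions. The field order (id, name, x, y, z) follows the Tripos mol2 ATOM record layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwinSiteParseSketch {

    // Assumption: plain decimal scores such as 0.7288 (no exponent notation).
    private static final Pattern NAME =
        Pattern.compile("grid(\\d+)_score_([0-9]*\\.?[0-9]+)\\.mol2");

    static int indexFromFileName(String fileName) {
        return Integer.parseInt(match(fileName).group(1));
    }

    static double scoreFromFileName(String fileName) {
        return Double.parseDouble(match(fileName).group(2));
    }

    private static Matcher match(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) throw new IllegalArgumentException("unexpected file name: " + fileName);
        return m;
    }

    /** Collects x/y/z from lines inside the @<TRIPOS>ATOM section only. */
    static List<double[]> readAtomCoords(List<String> lines) {
        List<double[]> pts = new ArrayList<>();
        boolean inAtoms = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("@<TRIPOS>")) { inAtoms = t.equals("@<TRIPOS>ATOM"); continue; }
            if (!inAtoms || t.isEmpty()) continue;
            String[] f = t.split("\\s+");       // atom_id atom_name x y z atom_type ...
            if (f.length < 5) continue;         // skip short lines rather than fail
            pts.add(new double[] {
                Double.parseDouble(f[2]), Double.parseDouble(f[3]), Double.parseDouble(f[4]) });
        }
        return pts;
    }
}
```

Since the scan holds no shared state, it can run safely from parallel dataset workers, which is the property the CDK reader lacked.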
Ships swinsite.ds plus 6 protein PDBs (1tjw_A from SwinSite's
test_protein_only example, plus 1a26A/1a2kC/1afkA/1atlA/1bqoB from
coach420) covering 1/2/3/4/6-pocket cases. 1atlA's on-disk N-order is
non-monotonic in score (0.7288, 0.0664, 0.3433), exercising the rerank.
SwinSiteLoaderTest covers all six fixtures plus the
predictionIsBoundToQueryProtein contract and empty-dir tolerance.
The score and pocket columns share the same predict/rescore-only
origin, so describe them together in the prose, the export-points
"not contained" caveat, the predict/rescore output description, and
the "Which command to use?" table.
Add documentation/dev/evaluation-metric-fixes-2.6.md covering DSO/DSWO integer-
division fixes, the ResidueSite DCC centroid fix, and the BioJava GroupType
ligand-detection fix. Mention the ligand-detection change in breaking-changes.md
since it shifts DCA/DCC on datasets containing GDP/GTP/ATP/SHR-like ligands.
The points export (predict/rescore -export_points 1) now includes an
integer 'pocket' column matching newRank in *_predictions.csv, so users
can directly aggregate per-pocket descriptors without a spatial join.
Standalone 'export-points' (no prediction) omits the column.
Pocket-extension shells can overlap, so a single SAS point can sit in
multiple pocket.labeledPoints lists. Previously the assignment loop was
last-write-wins, so shared points ended up with the worst rank, which was
counter-intuitive for both visualization (PredictionVisualizer PDB
output) and descriptor aggregation. PocketRescorer.setNewRanks now
iterates pockets best-first with a guard, so the lowest newRank wins;
the redundant lp.pocket write in PocketPredictor is removed.
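The best-first pass with a guard can be illustrated as below. The Point and Pocket classes are simplified stand-ins, and using 0 to mean "not yet assigned" is an assumption of the sketch, not necessarily the sentinel the real code uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BestRankWinsSketch {

    static final class Point {
        int pocket = 0;                        // assumption: 0 means unassigned
    }

    static final class Pocket {
        final int newRank;
        final List<Point> labeledPoints;
        Pocket(int newRank, List<Point> labeledPoints) {
            this.newRank = newRank;
            this.labeledPoints = labeledPoints;
        }
    }

    /** Visits pockets best rank first; a point keeps the first rank written to it. */
    static void assign(List<Pocket> pockets) {
        List<Pocket> sorted = new ArrayList<>(pockets);
        sorted.sort(Comparator.comparingInt(p -> p.newRank));
        for (Pocket p : sorted) {
            for (Point lp : p.labeledPoints) {
                if (lp.pocket == 0) lp.pocket = p.newRank;  // guard: never overwrite
            }
        }
    }
}
```

The result no longer depends on the order in which pockets arrive: a point shared by ranks 1 and 3 always reports 1.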
TableData gains a per-column ColumnType (DOUBLE default, INT) so
TableExporter emits true integers in CSV (no decimals), Arrow (Int32),
and Parquet (INT32) for the pocket column.
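For the CSV path, the effect of the per-column type can be shown with a tiny formatter. This is only an illustration: the ColumnType values match the ones named above, but the method name and the use of Double.toString for DOUBLE cells are assumptions about formatting, not the exporter's actual code.

```java
final class TypedCellSketch {

    enum ColumnType { DOUBLE, INT }

    /** Values are stored as doubles; INT columns are written without a decimal part. */
    static String formatCell(double value, ColumnType type) {
        if (type == ColumnType.INT) return Long.toString(Math.round(value));
        return Double.toString(value);
    }
}
```

So a pocket rank stored as 3.0 is emitted as 3 in an INT column, while DOUBLE columns keep their fractional form.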
Bump version to 2.6.0-dev.8.
ConcavityLoader.loadPrediction was ignoring its queryProtein parameter
and binding the returned Prediction to a Protein loaded from
*_residue.pdb (a pocket-touching residue subset, not the full protein).
Downstream features keyed on prediction.protein.fileName then resolved
against the wrong basename — most visibly conservation lookup, which
searched for "<ID>_<submethod>_residue_<chain>.hom" instead of
"<ID>_<chain>.hom" and silently produced zero conservation features.
Other feature extractors were similarly reading the truncated atom set.
The residue subset is still loaded and used to define the per-pocket
surface-atom shell (no behaviour change there), but the Prediction is
now bound to queryProtein, matching FPocketLoader and PUResNetLoader.
Add ConcavityLoaderTest plus a matching test in FPocketLoaderTest that
assert the loader-contract invariant prediction.protein === queryProtein.
PUResNet pocket PDBs occasionally left-shift the residue insertion code
into column 26 instead of column 27, breaking BioJava's strict resSeq
parser with NumberFormatException and silently dropping affected
predictions (216 of 9955 entries on holo4k+pdbbind2020).
Add PUResNetPdbRepair which detects the malformed pattern and rewrites
it in memory before parsing. Wire PUResNetLoader through it. PdbUtils
and the rest of the load path are unchanged.
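One way such an in-memory repair could work is sketched below. In the PDB fixed-column layout resSeq occupies columns 23-26 and iCode column 27; the detection rule here (a letter in column 26, a blank in column 27, an integer in columns 23-25) is an assumption about what "left-shifted" looks like and is not necessarily the exact pattern PUResNetPdbRepair matches.

```java
final class InsertionCodeRepairSketch {

    /** Shifts a misplaced insertion code from column 26 to column 27 on ATOM/HETATM lines. */
    static String repairLine(String line) {
        boolean atomRecord = line.startsWith("ATOM") || line.startsWith("HETATM");
        if (!atomRecord || line.length() < 27) return line;
        String lead = line.substring(22, 25);   // columns 23-25 (0-based 22..24)
        char col26 = line.charAt(25);
        char col27 = line.charAt(26);
        boolean shifted = Character.isLetter(col26)
                && col27 == ' '
                && lead.trim().matches("-?\\d+");
        if (!shifted) return line;
        // Right-justify the number into columns 23-26 and move the letter to column 27.
        return line.substring(0, 22) + " " + lead + col26 + line.substring(27);
    }
}
```

Well-formed lines pass through unchanged because column 26 then holds the last digit of resSeq rather than a letter.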
- Replace manual line.split(",") with Apache Commons CSV (column-name access)
- Support both reduced (9-col) and full (59-col) ahoj_ubs CSV formats
- Add AhojSiteInfo: typed data class for 14 pocket metadata fields
- Add secondaryData map to ResidueSite for extensible metadata
- Export AhojSiteInfo columns in observed_sites.csv when available
- Add comprehensive parser tests for both CSV formats
- Add test data files and format documentation
Protein.sites now holds ground-truth binding sites for both ligand-defined
and explicit (residue-based) evaluation modes. Sites are populated from
ligands via populateSitesFromLigands() when no explicit sites are defined.
- Add predictedPocket and setSasPoints to BindingSite interface
- Add predictedPocket field to ResidueSite
- Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite)
- Update calcCoveragesProt to use BindingSite.predictedPocket
- Determine isLigandMode via instanceof instead of sites.isEmpty()
- Unify PymolRenderer sites/ligands branch into single BindingSite loop
- Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly
- Rename SiteCentroidMethod to SiteCenterMethod
- Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface
for thread-safe, param-independent center calculation
- Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod()
- Add analyze binding-site-centers command comparing all center methods per site
- Add Dataset.Result.writeErrorsAndGetSummary() and use it across all
AnalyzeRoutine commands for consistent error reporting to both console and CSV