Exporting Pocket Descriptors
Per-pocket geometric/chemical descriptors (volume, sphericity, residue
counts, etc.) are written to a tabular file alongside any `predict` or
`rescore` run when `-export_pocket_descriptors` is on.
Cost note. Most descriptors (`volume`, `sphericity`, `radius_of_gyration`, `num_grid_points`, `principal_moments`) are derived from the pocket grid, so selecting any of them triggers the full grid build (lattice generation + per-pocket assignment + shape fill) even with `-export_pocket_grid 0`. That setting only suppresses the per-protein grid file, not the computation. `num_residues` and `num_surface_atoms` do not need the grid: selecting only those two with `-export_pocket_grid 0` skips the grid build entirely, making the descriptors export nearly free.
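The grid-or-not decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not P2Rank's orchestrator: `GridAwareDescriptor`, `NamedDescriptor`, and `GridBuildPlanner.shouldBuildGrid` are hypothetical names, and the only concept borrowed from P2Rank is the `needsGrid()` flag that descriptors override (see "Adding a new descriptor").

```java
import java.util.List;

// Hypothetical, stripped-down view of a descriptor: only the needsGrid() flag matters here.
interface GridAwareDescriptor {
    String name();

    // Mirrors the documented default: a descriptor reads the grid unless it says otherwise.
    default boolean needsGrid() { return true; }
}

// Hypothetical descriptor stub with a configurable needsGrid() answer.
record NamedDescriptor(String name, boolean gridRequired) implements GridAwareDescriptor {
    @Override
    public boolean needsGrid() { return gridRequired; }
}

final class GridBuildPlanner {
    private GridBuildPlanner() {}

    /**
     * Returns true when the pocket grid has to be built: either the per-protein
     * grid file was requested (-export_pocket_grid 1) or at least one selected
     * descriptor reads the grid.
     */
    static boolean shouldBuildGrid(boolean exportPocketGrid,
                                   List<? extends GridAwareDescriptor> selected) {
        if (exportPocketGrid) {
            return true;
        }
        for (GridAwareDescriptor d : selected) {
            if (d.needsGrid()) {
                return true; // one grid-derived descriptor is enough to pay for the full build
            }
        }
        return false; // e.g. only num_residues / num_surface_atoms selected
    }
}
```

With only `num_residues` and `num_surface_atoms` selected and the grid file off, `shouldBuildGrid` returns `false`. Adding `volume` (or turning the grid file on) flips it to `true`.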
Quick start
```bash
# Default: every shipped descriptor (num_residues, num_surface_atoms,
# num_grid_points, volume, sphericity, radius_of_gyration, principal_moments)
prank predict -f protein.pdb -export_pocket_descriptors 1

# Narrow set + tighter grid for more accurate volume/sphericity
prank predict dataset.ds -export_pocket_descriptors 1 \
    -pocket_descriptors "volume,sphericity" \
    -pocket_grid_spacing 0.75

# Rescoring path also supports it
prank rescore fpocket.ds -export_pocket_descriptors 1
```
Output format
One row per predicted pocket.
| Column | Type | Notes |
|---|---|---|
| `name` | string | `pocket.name` (e.g. `pocket.1`) |
| `rank` | i32 | 1-based pocket rank |
| `score` | f64 | Raw P2Rank pocket score |
| `probability` | f64 | Calibrated probability from the score transformer. The column is omitted entirely when no transformer ran. |
| `center_x`, `center_y`, `center_z` | f64 | Pocket centroid coordinates |
| (one or more columns per requested descriptor) | f64 / i32 | See descriptor catalog below |
Descriptor columns appear in the order given on the command line via
-pocket_descriptors. Most descriptors emit a single column whose header is
the descriptor name; multi-column descriptors emit N columns prefixed with
"{name}." (e.g. principal_moments.lambda1, principal_moments.lambda2,
principal_moments.lambda3).
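Because column positions depend on the `-pocket_descriptors` order (and `probability` may be absent), a consumer is safer looking columns up by header name. A minimal Java sketch, assuming a comma-separated file with a single header row and no quoted fields; `DescriptorTable` and its methods are hypothetical helpers, not part of P2Rank.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: reads a descriptors table and resolves columns by header name.
final class DescriptorTable {
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private final List<String[]> rows = new ArrayList<>();

    /** Parses comma-separated text; the first line is the header. Quoted fields are not supported. */
    static DescriptorTable parse(BufferedReader reader) throws IOException {
        DescriptorTable table = new DescriptorTable();
        String header = reader.readLine();
        if (header == null) {
            return table; // empty input: no columns, no rows
        }
        String[] names = header.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            table.columnIndex.put(names[i].trim(), i);
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isBlank()) {
                table.rows.add(line.split(",", -1));
            }
        }
        return table;
    }

    /** Opens a .csv or .csv.gz file; gzip is chosen by file extension. */
    static DescriptorTable read(Path path) throws IOException {
        try (InputStream raw = Files.newInputStream(path);
             InputStream in = path.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return parse(reader);
        }
    }

    int rowCount() { return rows.size(); }

    boolean hasColumn(String column) { return columnIndex.containsKey(column); }

    /** Looks a cell up by column name, independent of the column's position. */
    String get(int row, String column) {
        Integer idx = columnIndex.get(column);
        if (idx == null) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        return rows.get(row)[idx];
    }
}
```

For example, `table.get(0, "principal_moments.lambda1")` keeps working whether the user listed `principal_moments` first or last, and `hasColumn("probability")` lets the caller handle runs without a score transformer.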
Descriptor catalog
| Name | Columns | Definition |
|---|---|---|
| `volume` | 1 × f64 | Pocket volume in Å³: \|assigned grid points\| × `pocket_grid_spacing`³. Accuracy scales with the lattice spacing (smaller `pocket_grid_spacing` → finer estimate). |
| `sphericity` | 1 × f64 ∈ [0, 1] | V_pocket / V_bounding_sphere. The bounding sphere is centered at the centroid of the pocket's grid points (not `pocket.centroid`, which is atom-derived); its radius is the max distance from that centroid. Quantization-free. 1 = perfect sphere; ≪ 1 = elongated / irregular. |
| `radius_of_gyration` | 1 × f64 | Radius of gyration in Å: sqrt(mean(\|r_i - r_cm\|²)) over the pocket's grid points (equal weights). Measures absolute spatial extent, so it pairs well with `sphericity`, which only captures compactness. 0 for empty / single-point pockets. |
| `num_residues` | 1 × i32 | Number of distinct residues touching the pocket (reuses `Pocket.getResidues()`). |
| `num_surface_atoms` | 1 × i32 | Size of `pocket.surfaceAtoms`. |
| `num_grid_points` | 1 × i32 | Total grid points assigned to the pocket (cardinality of the BitSet after shape fill). Raw-count complement to `volume`. |
| `principal_moments` | 3 × f64 | Three eigenvalues of the gyration tensor of the pocket's grid points (equal-weight PCA), sorted descending: `principal_moments.lambda1` ≥ `lambda2` ≥ `lambda3`. Unit Å². Shape signature: λ₁≈λ₂≈λ₃ → sphere; λ₁≫λ₂,λ₃ → rod; λ₁≈λ₂≫λ₃ → disk. The sum equals `radius_of_gyration`². 0s for pockets with <2 grid points. |
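To make the catalog formulas concrete, here is a minimal Java sketch of the grid-derived scalars, assuming the pocket's grid points arrive as `{x, y, z}` coordinates in Å. `GridShapeDescriptors` is a hypothetical helper, not P2Rank's implementation. Two behaviors are this sketch's own choices: `sphericity` returns 0 for a single point (zero-radius bounding sphere), and it clamps the ratio to 1, because a lattice-filled region can exceed the volume of the sphere drawn through its outermost point centers.

```java
// Hypothetical helper; points are {x, y, z} grid-point coordinates in Å.
final class GridShapeDescriptors {
    private GridShapeDescriptors() {}

    /** volume = |assigned grid points| × spacing³, in Å³. */
    static double volume(int numGridPoints, double spacing) {
        return numGridPoints * spacing * spacing * spacing;
    }

    /** Equal-weight centroid of the grid points (not the atom-derived pocket.centroid). */
    static double[] centroid(double[][] pts) {
        double[] c = new double[3];
        for (double[] p : pts) {
            for (int k = 0; k < 3; k++) c[k] += p[k];
        }
        for (int k = 0; k < 3; k++) c[k] /= pts.length;
        return c;
    }

    /** sqrt(mean |r_i - r_cm|²); 0 for empty or single-point pockets. */
    static double radiusOfGyration(double[][] pts) {
        if (pts.length < 2) return 0.0;
        double[] c = centroid(pts);
        double sum = 0.0;
        for (double[] p : pts) sum += squaredDistance(p, c);
        return Math.sqrt(sum / pts.length);
    }

    /**
     * V_pocket / V_bounding_sphere, with the sphere centered at the grid-point
     * centroid and reaching the farthest grid point. Returning 0 for a zero
     * radius and clamping to 1 are assumptions of this sketch.
     */
    static double sphericity(double[][] pts, double spacing) {
        if (pts.length == 0) return 0.0;
        double[] c = centroid(pts);
        double maxSq = 0.0;
        for (double[] p : pts) maxSq = Math.max(maxSq, squaredDistance(p, c));
        double r = Math.sqrt(maxSq);
        if (r == 0.0) return 0.0;
        double sphereVolume = 4.0 / 3.0 * Math.PI * r * r * r;
        return Math.min(1.0, volume(pts.length, spacing) / sphereVolume);
    }

    private static double squaredDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }
}
```

For two points 2 Å apart on a 1 Å lattice, the radius of gyration is 1 Å and the sphericity is 2 / (4π/3) ≈ 0.48, reflecting an elongated shape.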
-pocket_descriptors defaults to all of the above. The grid-derived
scalar descriptors share the same pocket-grid input, so adding or removing
them costs essentially nothing once the grid is built. principal_moments
adds a small 3×3 eigendecomposition per pocket — also negligible relative
to the grid build itself. To narrow the set, list the wanted names
comma-separated. Unknown names cause a fail-fast error at startup with
the list of registered names.
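The per-pocket 3×3 eigendecomposition mentioned above is small enough to write out by hand. The sketch below builds the equal-weight gyration tensor S = (1/N) Σ (r_i - r_cm)(r_i - r_cm)ᵀ and diagonalizes it with cyclic Jacobi rotations, a standard method for small symmetric matrices; P2Rank may use a different solver. `PrincipalMoments` is a hypothetical class name, and clamping negative or NaN eigenvalues to 0 is an assumed convention of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper: principal moments of a pocket's grid points.
final class PrincipalMoments {
    private PrincipalMoments() {}

    /** Returns {lambda1, lambda2, lambda3} in Å², sorted descending; zeros for < 2 points. */
    static double[] of(double[][] pts) {
        if (pts.length < 2) return new double[3];
        return eigenvaluesDescending(gyrationTensor(pts));
    }

    /** S = (1/N) Σ (r_i - r_cm)(r_i - r_cm)ᵀ with equal weights. */
    static double[][] gyrationTensor(double[][] pts) {
        int n = pts.length;
        double[] c = new double[3];
        for (double[] p : pts) for (int k = 0; k < 3; k++) c[k] += p[k] / n;
        double[][] s = new double[3][3];
        for (double[] p : pts)
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    s[i][j] += (p[i] - c[i]) * (p[j] - c[j]) / n;
        return s;
    }

    /** Cyclic Jacobi eigenvalue iteration for a symmetric 3×3 matrix. */
    static double[] eigenvaluesDescending(double[][] m) {
        double[][] a = { m[0].clone(), m[1].clone(), m[2].clone() };
        for (int sweep = 0; sweep < 50; sweep++) {
            double off = a[0][1] * a[0][1] + a[0][2] * a[0][2] + a[1][2] * a[1][2];
            double diag = a[0][0] * a[0][0] + a[1][1] * a[1][1] + a[2][2] * a[2][2];
            if (off == 0.0 || off <= 1e-30 * diag) break; // off-diagonal part is negligible
            for (int p = 0; p < 2; p++) {
                for (int q = p + 1; q < 3; q++) {
                    if (a[p][q] == 0.0) continue;
                    // Rotation angle chosen so that the (p, q) entry becomes zero.
                    double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                    double t = theta == 0.0 ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double cos = 1.0 / Math.sqrt(t * t + 1.0), sin = t * cos;
                    for (int k = 0; k < 3; k++) { // A <- A·J (columns p and q)
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = cos * akp - sin * akq;
                        a[k][q] = sin * akp + cos * akq;
                    }
                    for (int k = 0; k < 3; k++) { // A <- Jᵀ·A (rows p and q)
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = cos * apk - sin * aqk;
                        a[q][k] = sin * apk + cos * aqk;
                    }
                }
            }
        }
        double[] ev = { a[0][0], a[1][1], a[2][2] };
        for (int i = 0; i < 3; i++) {
            // A gyration tensor is positive semidefinite; clamp round-off negatives and NaN to 0.
            if (!(ev[i] > 0.0)) ev[i] = 0.0;
        }
        Arrays.sort(ev); // ascending
        return new double[] { ev[2], ev[1], ev[0] };
    }
}
```

Because Jacobi rotations preserve the trace, `lambda1 + lambda2 + lambda3` equals the mean squared distance from the centroid, i.e. `radius_of_gyration`², matching the catalog entry.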
Parameters
The descriptors file shares all of the pocket-grid params (the grid is built once and reused). The descriptor-specific knobs are:
| Parameter | Default | Notes |
|---|---|---|
| `export_pocket_descriptors` | `false` | Master gate |
| `pocket_descriptors` | all shipped descriptors | List of descriptor names to compute. See catalog above. |
| `pocket_grid_format` | `csv.gz` | Same allowed values as the grid file |
The grid generator's params (`pocket_grid_spacing`, `_max_dist`,
`_atom_buffer`, `_assign_cutoff`, `_fill`, `_fill_*`) directly affect every
grid-derived descriptor (`volume`, `sphericity`, `radius_of_gyration`,
`num_grid_points`, `principal_moments`); see `export-pocket-grid.md`.
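As a worked example of the spacing dependence: each assigned grid point contributes s³ of volume, where s is `pocket_grid_spacing`. Moving from a 1 Å lattice to the 0.75 Å lattice used in the Quick start shrinks the volume element by a factor of about 2.37 and, for the same pocket region, multiplies the number of grid points (and, roughly, the grid-building work) by the same factor:

$$
V = N \cdot s^3, \qquad
s = 1\ \text{Å} \Rightarrow s^3 = 1\ \text{Å}^3, \qquad
s = 0.75\ \text{Å} \Rightarrow s^3 = 0.421875\ \text{Å}^3, \qquad
\frac{1}{0.421875} \approx 2.37
$$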
Adding a new descriptor
Implementations live under
`src/main/groovy/cz/siret/prank/program/routines/predict/output/descriptors/`.
1. Implement the `PocketDescriptor` interface:

   ```java
   String name();                            // CLI token and multi-column header prefix
   List<String> columnNames();               // sub-names; scalar entry IGNORED at output
   List<ColumnType> columnTypes();           // parallel to columnNames()
   double[] compute(PocketGridContext ctx);  // same length as columnNames()
   boolean needsGrid();                      // default true; override to false if compute()
                                             // doesn't read ctx.grid() or ctx.gridPointIndices()
   ```

   `PocketGridContext` exposes `pocket`, `protein`, `grid`, and the per-pocket
   `gridPointIndices` set. If your `compute()` only reads `ctx.pocket()` (i.e.,
   domain fields like `surfaceAtoms` or `residues`), override `needsGrid()` to
   return `false`; that lets the orchestrator skip the full grid build when only
   grid-free descriptors are selected.

   For scalar descriptors (one column), extend `AbstractScalarPocketDescriptor`
   instead of implementing the interface directly; it reduces the boilerplate to
   `name()`, `scalarType()`, and `computeScalar(ctx)`. Of the seven shipped
   descriptors, six use this adapter; `principal_moments` (multi-column)
   implements `PocketDescriptor` directly.

   For multi-column descriptors (e.g. `principal_moments`, which yields three
   eigenvalues from a single decomposition), implement `PocketDescriptor`
   directly; output column headers are `"{name()}.{columnNames()[i]}"`.

2. Register the implementation in `PocketDescriptorRegistry`'s static
   initializer (Java; no auto-discovery). At registration time the registry
   rejects descriptors that declare duplicate `columnNames` or whose
   `columnNames` and `columnTypes` lists differ in length.

3. Users can opt into it by name via
   `-pocket_descriptors "volume,my_new_descriptor"`.

4. To include it in the default output, also add the name to the
   `pocket_descriptors` default list in `Params.groovy`. The default is declared
   explicitly (rather than derived from `Registry.knownNames()`) so that each
   addition to the default schema is a conscious choice. However, adding to the
   default IS a user-visible breaking change for anyone parsing the output by
   column index. Two recommendations:
   - Parse the descriptors file by column name, not by column index.
   - When you add a descriptor to the default list, note it in
     `breaking-changes.md`.

Skip step 4 if the new descriptor is opt-in only.
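The implement-and-register steps can be sketched end to end in self-contained Java. Everything here is a hypothetical, simplified stand-in: `SimplePocketContext` replaces `PocketGridContext` (it carries only a surface-atom count and the grid-point index set), `ScalarDescriptorSketch` imitates what `AbstractScalarPocketDescriptor` is described as doing, and `DescriptorRegistrySketch` performs the registration checks and fail-fast name lookup described in this document using standard Java exceptions rather than P2Rank's own. `my_new_descriptor` (grid points per surface atom) is an invented example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum ColumnType { INT, DOUBLE }

// Hypothetical, simplified context: the real PocketGridContext exposes pocket,
// protein, grid and gridPointIndices; this one carries just enough for the example.
record SimplePocketContext(int numSurfaceAtoms, Set<Integer> gridPointIndices) {}

// Simplified re-declaration of the PocketDescriptor methods, against the simplified context.
interface PocketDescriptor {
    String name();
    List<String> columnNames();
    List<ColumnType> columnTypes();
    double[] compute(SimplePocketContext ctx);
    default boolean needsGrid() { return true; }
}

// Hypothetical stand-in for AbstractScalarPocketDescriptor: one column, header = name().
abstract class ScalarDescriptorSketch implements PocketDescriptor {
    abstract ColumnType scalarType();
    abstract double computeScalar(SimplePocketContext ctx);

    @Override public List<String> columnNames() { return List.of(name()); }
    @Override public List<ColumnType> columnTypes() { return List.of(scalarType()); }
    @Override public double[] compute(SimplePocketContext ctx) {
        return new double[] { computeScalar(ctx) };
    }
}

// Invented example: grid points per surface atom. It reads the grid, so needsGrid() keeps its default.
final class MyNewDescriptor extends ScalarDescriptorSketch {
    @Override public String name() { return "my_new_descriptor"; }
    @Override ColumnType scalarType() { return ColumnType.DOUBLE; }
    @Override double computeScalar(SimplePocketContext ctx) {
        int atoms = ctx.numSurfaceAtoms();
        return atoms == 0 ? 0.0 : (double) ctx.gridPointIndices().size() / atoms;
    }
}

final class DescriptorRegistrySketch {
    private final Map<String, PocketDescriptor> byName = new LinkedHashMap<>();

    /** Fails fast on malformed or duplicate descriptors. */
    void register(PocketDescriptor d) {
        List<String> cols = d.columnNames();
        if (cols.size() != d.columnTypes().size()) {
            throw new IllegalStateException(d.name() + ": columnNames and columnTypes differ in length");
        }
        if (Set.copyOf(cols).size() != cols.size()) {
            throw new IllegalStateException(d.name() + ": duplicate columnNames " + cols);
        }
        if (byName.putIfAbsent(d.name(), d) != null) {
            throw new IllegalStateException("already registered: " + d.name());
        }
    }

    /** Resolves -pocket_descriptors tokens in order; an unknown name reports what is registered. */
    List<PocketDescriptor> resolve(List<String> requested) {
        List<PocketDescriptor> out = new ArrayList<>();
        for (String name : requested) {
            PocketDescriptor d = byName.get(name);
            if (d == null) {
                throw new IllegalArgumentException(
                        "Unknown descriptor '" + name + "'. Known: " + byName.keySet());
            }
            out.add(d);
        }
        return out;
    }

    /** One header per column: the bare name for one column, "{name}.{col}" otherwise. */
    static List<String> headers(List<PocketDescriptor> descriptors) {
        List<String> headers = new ArrayList<>();
        for (PocketDescriptor d : descriptors) {
            if (d.columnNames().size() == 1) {
                headers.add(d.name());
            } else {
                for (String col : d.columnNames()) headers.add(d.name() + "." + col);
            }
        }
        return headers;
    }
}
```

This sketch treats any one-column descriptor as scalar when building headers, so `my_new_descriptor` yields a single `my_new_descriptor` header, while a three-column `principal_moments` yields `principal_moments.lambda1` through `principal_moments.lambda3`.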
For `INT` columns, `compute()` returns the value as a `double`, which the
writer downcasts at output time (matching the existing `TableData`
convention). Implementations must guarantee the value fits in i32.
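A checked downcast for `INT` columns might look like the following. The document only says the writer downcasts; the exactness and range checks here are this sketch's own defensive additions, and `IntColumnCheck.toI32` is a hypothetical name. Every `int` is exactly representable as a `double`, so any value that passes both checks converts without loss.

```java
// Hypothetical checked downcast for INT descriptor columns.
final class IntColumnCheck {
    private IntColumnCheck() {}

    /**
     * Converts a descriptor value computed as double into an i32 cell, rejecting
     * NaN, infinities, fractional values and anything outside the int range.
     */
    static int toI32(String column, double value) {
        boolean inRange = value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE; // false for NaN
        if (!inRange || value != Math.rint(value)) {
            throw new ArithmeticException(column + ": " + value + " does not fit in i32");
        }
        return (int) value;
    }
}
```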
See also
- `export-pocket-grid.md`: the underlying grid that volume/sphericity are computed against
- `export-points.md`: SAS-points export