Unifies the per-pocket descriptor framework with the per-grid-point
framework: same shape (name + columnNames + columnTypes + double[]
compute), same multi-column "{name}.{col}" header convention, same
public register / unregister / dup-column-check registry. Shipped as a
breaking change behind the same -pocket_descriptors knob.
Interface change:
String name();
List<String> columnNames();
List<ColumnType> columnTypes();
double[] compute(PocketGridContext);
boolean needsGrid(); // unchanged
Scalar descriptors stay one-liners via the new
AbstractScalarPocketDescriptor adapter (name + scalarType +
computeScalar). The 6 existing descriptors migrated; behavior and
output byte-identical to before.
New descriptor: PrincipalMomentsDescriptor (3 × DOUBLE) — the three
eigenvalues of the pocket grid points' gyration tensor, sorted
descending. Implementation uses Apache Commons Math 3
EigenDecomposition. Shape signature complement to sphericity /
radius_of_gyration; sum equals radius_of_gyration² (verified in test).
Added to the default -pocket_descriptors list.
Default list reordered to put num_* (cheap, integer-valued) first,
then geometric scalars, then principal_moments:
num_residues, num_surface_atoms, num_grid_points,
volume, sphericity, radius_of_gyration,
principal_moments
Tests:
- 5 new PrincipalMomentsDescriptor tests (cube isotropy, rod-shape
eigenvalues, sort order, degenerate empty/single, sum=Rg²)
- PocketDescriptorsRowsTest +2 (multi-column prefix rule, mixed
scalar + multi ordering)
- existing 13 callsites updated for the double[] return signature
- columnType() registry test → columnTypes()
User-visible change: the default -pocket_descriptors output now has
three new columns (principal_moments.lambda1/2/3) and the existing
columns appear in a different order. Scripts parsing by column name
are unaffected; scripts parsing by column index need updating.
Exporting Pocket Descriptors
Per-pocket geometric/chemical descriptors (volume, sphericity, residue
counts, etc.) written to a tabular file alongside any predict or
rescore run when -export_pocket_descriptors is on.
Cost note. Most descriptors (volume, sphericity, radius_of_gyration, num_grid_points, principal_moments) are derived from the pocket grid, so selecting any of them triggers the full grid build (lattice generation + per-pocket assignment + shape fill) even with -export_pocket_grid 0. That flag only suppresses the per-protein grid file, not the computation. num_residues and num_surface_atoms do not need the grid; selecting only those two with -export_pocket_grid 0 skips the grid build entirely (a near zero-cost descriptors export).
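The skip rule in the cost note reduces to an any-match over the selected descriptors. The following is a minimal sketch of that decision, not the orchestrator's actual code: the class name, the lookup map, and the assumption that exporting the grid file always forces a build are all hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real registry: which shipped descriptors
// read the pocket grid, as described in the cost note.
final class GridNeedSketch {
    static final Map<String, Boolean> NEEDS_GRID = Map.of(
        "volume", true,
        "sphericity", true,
        "radius_of_gyration", true,
        "num_grid_points", true,
        "principal_moments", true,
        "num_residues", false,
        "num_surface_atoms", false);

    // Assumption: the grid is built when the grid file is exported, or when
    // at least one selected descriptor reads the grid. Unknown names are
    // treated conservatively as grid-dependent.
    static boolean mustBuildGrid(List<String> selected, boolean exportGridFile) {
        return exportGridFile
            || selected.stream().anyMatch(name -> NEEDS_GRID.getOrDefault(name, true));
    }
}
```

Under these assumptions, selecting only num_residues and num_surface_atoms with the grid file disabled is the one combination that avoids the build.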
Quick start
# Default: every shipped descriptor (num_residues, num_surface_atoms,
# num_grid_points, volume, sphericity, radius_of_gyration, principal_moments)
prank predict -f protein.pdb -export_pocket_descriptors 1
# Narrow set + tighter grid for more accurate volume/sphericity
prank predict dataset.ds -export_pocket_descriptors 1 \
-pocket_descriptors "volume,sphericity" \
-pocket_grid_spacing 0.75
# Rescoring path also supports it
prank rescore fpocket.ds -export_pocket_descriptors 1
Output format
One row per predicted pocket.
| Column | Type | Notes |
|---|---|---|
| name | string | pocket.name (e.g. pocket.1) |
| rank | i32 | 1-based pocket rank |
| score | f64 | Raw P2Rank pocket score |
| probability | f64 | Calibrated probability from the score transformer. Column is omitted entirely when no transformer ran |
| center_x, center_y, center_z | f64 | Pocket centroid coordinates |
| (one or more columns per requested descriptor) | f64 / i32 | See descriptor catalog below |
Descriptor columns appear in the order given on the command line via
-pocket_descriptors. Most descriptors emit a single column whose header is
the descriptor name; multi-column descriptors emit N columns prefixed with
"{name}." (e.g. principal_moments.lambda1, principal_moments.lambda2,
principal_moments.lambda3).
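The header rule can be sketched in a few lines of Java. This is an illustration only: the Desc class is a hypothetical reduction of a descriptor to its name and sub-names, and detecting a scalar by "exactly one column" is an assumption about how the writer tells the two cases apart.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderSketch {
    // Hypothetical minimal view of a descriptor: its name plus column sub-names.
    static final class Desc {
        final String name;
        final List<String> columnNames;
        Desc(String name, List<String> columnNames) {
            this.name = name;
            this.columnNames = columnNames;
        }
    }

    // Single-column descriptors use the bare name (the sub-name is ignored);
    // multi-column descriptors emit "{name}.{col}" for each sub-name.
    static List<String> headers(List<Desc> requested) {
        List<String> out = new ArrayList<>();
        for (Desc d : requested) {
            if (d.columnNames.size() == 1) {
                out.add(d.name);
            } else {
                for (String col : d.columnNames) {
                    out.add(d.name + "." + col);
                }
            }
        }
        return out;
    }
}
```

Because the loop follows the requested order, the descriptor columns come out in command-line order, with each multi-column group kept contiguous.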
Descriptor catalog
| Name | Columns | Definition |
|---|---|---|
| volume | 1 × f64 | Pocket volume in Å³: \|assigned grid points\| × pocket_grid_spacing³. Accuracy scales with the lattice spacing (smaller pocket_grid_spacing → finer estimate). |
| sphericity | 1 × f64 ∈ [0, 1] | V_pocket / V_bounding_sphere. Bounding sphere is centered at the centroid of the pocket's grid points (not pocket.centroid, which is atom-derived); radius is the max distance from that centroid. Quantization-free. 1 = perfect sphere; ≪ 1 = elongated / irregular. |
| radius_of_gyration | 1 × f64 | Radius of gyration in Å: sqrt(mean(\|r_i - r_cm\|²)) over the pocket's grid points (equal weights). Measures absolute spatial extent; pairs well with sphericity, which only captures compactness. 0 for empty / single-point pockets. |
| num_residues | 1 × i32 | Number of distinct residues touching the pocket (reuses Pocket.getResidues()). |
| num_surface_atoms | 1 × i32 | Size of pocket.surfaceAtoms. |
| num_grid_points | 1 × i32 | Total grid points assigned to the pocket (cardinality of the BitSet after shape fill). Raw count complement to volume. |
| principal_moments | 3 × f64 | Three eigenvalues of the pocket grid points' gyration tensor (equal-weight PCA), sorted descending: principal_moments.lambda1 ≥ lambda2 ≥ lambda3. Unit Å². Shape signature: λ₁≈λ₂≈λ₃ → sphere; λ₁≫λ₂,λ₃ → rod; λ₁≈λ₂≫λ₃ → disk. Sum equals radius_of_gyration². 0s for pockets with <2 grid points. |
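The volume, radius_of_gyration, and principal_moments definitions can be checked on a toy point set. The sketch below is dependency-free and is not the shipped implementation: the shipped descriptor uses Apache Commons Math 3 EigenDecomposition, whereas this sketch swaps in the closed-form trigonometric solution for symmetric 3×3 matrices. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ShapeSketch {
    // volume = |assigned grid points| × spacing³
    static double volume(int nPoints, double spacing) {
        return nPoints * spacing * spacing * spacing;
    }

    // Equal-weight gyration tensor about the centroid, as {xx, yy, zz, xy, xz, yz}.
    static double[] gyrationTensor(double[][] pts) {
        double[] c = new double[3];
        for (double[] p : pts) for (int k = 0; k < 3; k++) c[k] += p[k] / pts.length;
        double[] s = new double[6];
        for (double[] p : pts) {
            double dx = p[0] - c[0], dy = p[1] - c[1], dz = p[2] - c[2];
            s[0] += dx * dx; s[1] += dy * dy; s[2] += dz * dz;
            s[3] += dx * dy; s[4] += dx * dz; s[5] += dy * dz;
        }
        for (int k = 0; k < 6; k++) s[k] /= pts.length;
        return s;
    }

    // sqrt(mean(|r_i - r_cm|²)), which is the square root of the tensor trace.
    static double radiusOfGyration(double[][] pts) {
        if (pts.length < 2) return 0.0;
        double[] s = gyrationTensor(pts);
        return Math.sqrt(s[0] + s[1] + s[2]);
    }

    // Eigenvalues of the gyration tensor, sorted descending; zeros for <2 points.
    static double[] principalMoments(double[][] pts) {
        if (pts.length < 2) return new double[] {0, 0, 0};
        double[] s = gyrationTensor(pts);
        double xx = s[0], yy = s[1], zz = s[2], xy = s[3], xz = s[4], yz = s[5];
        double p1 = xy * xy + xz * xz + yz * yz;
        if (p1 == 0.0) {                       // already diagonal
            double[] e = {xx, yy, zz};
            Arrays.sort(e);
            return new double[] {e[2], e[1], e[0]};
        }
        double q = (xx + yy + zz) / 3.0;
        double p2 = (xx - q) * (xx - q) + (yy - q) * (yy - q) + (zz - q) * (zz - q) + 2.0 * p1;
        double p = Math.sqrt(p2 / 6.0);
        double bxx = (xx - q) / p, byy = (yy - q) / p, bzz = (zz - q) / p;
        double bxy = xy / p, bxz = xz / p, byz = yz / p;
        double det = bxx * (byy * bzz - byz * byz)
                   - bxy * (bxy * bzz - byz * bxz)
                   + bxz * (bxy * byz - byy * bxz);
        double r = Math.max(-1.0, Math.min(1.0, det / 2.0)); // clamp rounding drift
        double phi = Math.acos(r) / 3.0;
        double e1 = q + 2.0 * p * Math.cos(phi);
        double e3 = q + 2.0 * p * Math.cos(phi + 2.0 * Math.PI / 3.0);
        double e2 = 3.0 * q - e1 - e3;         // trace is preserved
        return new double[] {e1, e2, e3};
    }
}
```

Since the eigenvalues sum to the tensor trace, the identity lambda1 + lambda2 + lambda3 = radius_of_gyration² holds by construction. For four collinear points at unit spacing along x, all the spread lies on one axis, which gives the rod signature from the table (one large moment, two near zero).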
-pocket_descriptors defaults to all of the above. The grid-based descriptors
share the pocket-grid input, so adding more of them is essentially free once
the grid is built. To narrow the set, list the wanted names comma-separated.
Unknown names cause a fail-fast error at startup with the list of
registered names.
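A fail-fast check of this kind might look like the sketch below. It is an illustration under assumptions, not the project's parser: the class name, the whitespace trimming, the skipping of empty tokens, and the error wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DescriptorNameCheck {
    // Splits a comma-separated list and rejects the first unknown name,
    // reporting the registered names (sorted) in the error message.
    static List<String> parseAndValidate(String csv, Set<String> registered) {
        List<String> names = new ArrayList<>();
        for (String token : csv.split(",")) {
            String name = token.trim();        // assumption: surrounding spaces are tolerated
            if (name.isEmpty()) continue;      // assumption: empty tokens are skipped
            if (!registered.contains(name)) {
                throw new IllegalArgumentException(
                    "Unknown pocket descriptor '" + name + "'. Registered: "
                        + new TreeSet<>(registered));
            }
            names.add(name);                   // preserves command-line order
        }
        return names;
    }
}
```

Validating the whole list before any structure is loaded means a typo is reported at startup instead of after a long run.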
Parameters
The descriptors file shares all of the pocket-grid params (the grid is built once and reused). The descriptor-specific knobs are:
| Parameter | Default | Notes |
|---|---|---|
| export_pocket_descriptors | false | Master gate |
| pocket_descriptors | all shipped descriptors | List of descriptor names to compute. See catalog above. |
| pocket_grid_format | csv.gz | Same allowed values as the grid file |
The grid generator's params (pocket_grid_spacing, _max_dist,
_atom_buffer, _assign_cutoff, _fill, _fill_*) directly affect
the volume and sphericity descriptors — see
export-pocket-grid.md.
Adding a new descriptor
Implementations live under
src/main/groovy/cz/siret/prank/program/routines/predict/output/descriptors/.
1. Implement the PocketDescriptor interface:

       String name();                        // CLI token and multi-column header prefix
       List<String> columnNames();           // sub-names; scalar entry IGNORED at output
       List<ColumnType> columnTypes();       // parallel to columnNames()
       double[] compute(PocketGridContext);  // same length as columnNames()
       boolean needsGrid();                  // default true; override to false if compute()
                                             // doesn't read ctx.grid() or ctx.gridPointIndices()

   PocketGridContext exposes pocket, protein, grid, and the per-pocket gridPointIndices set. If your compute() only reads ctx.pocket() (i.e., domain fields like surfaceAtoms or residues), override needsGrid() to return false; that lets the orchestrator skip the full grid build when only grid-free descriptors are selected.

   For scalar descriptors (one column), extend AbstractScalarPocketDescriptor instead of implementing the interface directly; it boils the boilerplate down to name(), scalarType(), and computeScalar(ctx). The 6 base shipped descriptors use this adapter.

   For multi-column descriptors (e.g. principal_moments with three eigenvalues from a single decomposition), implement PocketDescriptor directly; output column headers are "{name()}.{columnNames()[i]}".

2. Register the implementation in PocketDescriptorRegistry's static initializer (Java; no auto-discovery). The registry rejects descriptors that declare duplicate columnNames at registration time.

3. Users can opt into it by name via -pocket_descriptors "volume,my_new_descriptor".

4. To include it in the default output, also add the name to the pocket_descriptors default list in Params.groovy. The default is declared explicitly rather than derived from Registry.knownNames() so that adding a descriptor doesn't silently change every existing user's output schema. That is intentional; skip step 4 if the new descriptor is opt-in only.
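The two implementation routes from step 1 can be illustrated with self-contained stand-ins. Everything in this sketch is hypothetical: the nested Descriptor interface, ScalarDescriptor adapter, and Ctx class only mirror the signatures listed above and are not the project's PocketDescriptor, AbstractScalarPocketDescriptor, or PocketGridContext. The "value" sub-name and both example descriptors are invented for illustration, and registration is not shown.

```java
import java.util.List;

final class NewDescriptorSketch {
    enum ColumnType { INT, DOUBLE }

    // Hypothetical stand-in for PocketGridContext, reduced to what the examples read.
    static final class Ctx {
        final int surfaceAtomCount;    // stands in for pocket-level data
        final double[][] gridPoints;   // xyz of the pocket's grid points
        Ctx(int surfaceAtomCount, double[][] gridPoints) {
            this.surfaceAtomCount = surfaceAtomCount;
            this.gridPoints = gridPoints;
        }
    }

    // Mirrors the five interface methods listed in step 1.
    interface Descriptor {
        String name();
        List<String> columnNames();
        List<ColumnType> columnTypes();
        double[] compute(Ctx ctx);
        default boolean needsGrid() { return true; }
    }

    // Stand-in for the scalar adapter: subclasses supply name, type, and one value.
    abstract static class ScalarDescriptor implements Descriptor {
        abstract ColumnType scalarType();
        abstract double computeScalar(Ctx ctx);
        public List<String> columnNames() { return List.of("value"); } // assumed; ignored at output
        public List<ColumnType> columnTypes() { return List.of(scalarType()); }
        public double[] compute(Ctx ctx) { return new double[] { computeScalar(ctx) }; }
    }

    // Grid-free scalar descriptor: reads only pocket-level data, so needsGrid() is false.
    static final class SurfaceAtomCount extends ScalarDescriptor {
        public String name() { return "example_surface_atom_count"; }
        ColumnType scalarType() { return ColumnType.INT; }
        double computeScalar(Ctx ctx) { return ctx.surfaceAtomCount; }
        @Override public boolean needsGrid() { return false; }
    }

    // Multi-column descriptor: axis-aligned bounding-box extents of the grid points.
    static final class BoxExtent implements Descriptor {
        public String name() { return "example_box_extent"; }
        public List<String> columnNames() { return List.of("dx", "dy", "dz"); }
        public List<ColumnType> columnTypes() {
            return List.of(ColumnType.DOUBLE, ColumnType.DOUBLE, ColumnType.DOUBLE);
        }
        public double[] compute(Ctx ctx) {
            double[] out = new double[3];
            if (ctx.gridPoints.length == 0) return out; // zeros for an empty pocket
            for (int k = 0; k < 3; k++) {
                double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
                for (double[] p : ctx.gridPoints) {
                    min = Math.min(min, p[k]);
                    max = Math.max(max, p[k]);
                }
                out[k] = max - min;
            }
            return out;
        }
    }
}
```

The multi-column example returns an array whose length matches columnNames(), which is the contract the interface comments call for. The scalar example leaves the array handling to the adapter.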
INT columns return their value as a double that the writer downcasts at
output time, matching the existing TableData convention. Implementations
must guarantee the value fits in i32.
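The i32 requirement matters because a plain Java narrowing cast from double to int does not fail on bad input: out-of-range values saturate at Integer.MIN_VALUE or Integer.MAX_VALUE and NaN becomes 0. The sketch below shows one defensive conversion. It is an assumption about how a guard could look, not the writer's actual behavior, and the class and method names are hypothetical.

```java
final class IntColumnSketch {
    // Converts a descriptor value to the i32 an INT column would emit,
    // rejecting NaN, non-integral, and out-of-range inputs up front
    // so they cannot be silently altered by the cast.
    static int toI32(double v) {
        if (Double.isNaN(v)
                || v != Math.rint(v)
                || v < Integer.MIN_VALUE
                || v > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("not representable as i32: " + v);
        }
        return (int) v;
    }
}
```

A count such as num_residues passes through unchanged, while a fractional or overflowing value is surfaced as an error at the descriptor boundary.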
See also
- export-pocket-grid.md: the underlying grid that volume/sphericity are computed against
- export-points.md: SAS-points export