p2rank/documentation/export-pocket-descriptors.md
rdk 6fad858bc6 Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening
Bug fix:
- PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The
  v<0 check was false for NaN, so a NaN eigenvalue (possible if a future
  code path bypasses GridGenerator.isFiniteBox) would have propagated
  to the CSV output.

Doc refresh:
- breaking-changes.md: 2.6 entry for the multi-column descriptor
  migration + the -vis_pocket_grid / pocket_grid_vis_* renames.
- export-pocket-descriptors.md: step 4 rewrites a self-contradicting
  rationale — adding to the default list IS a breaking change for
  index-based parsers; recommends parse-by-name + breaking-changes.md
  note for future additions.
- export-pocket-grid.md: added "Adding a new per-grid-point descriptor"
  recipe (parallel to the per-pocket one); unified √3/2 precision to
  0.866 across docs and Params.groovy.
- README.md: added an "Opt-in tabular exports" subsection mentioning
  -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid.
- testsets.sh "Full descriptor menu" now lists all seven shipped
  descriptors (was six).

Exception taxonomy:
- PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw
  PrankException (was IllegalArgumentException) for user-facing config
  errors, matching the rest of the codebase.

Registry hardening:
- Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry
  now assert columnNames.size() == columnTypes.size() in register().
  A future descriptor with mismatched lists fails fast at class-load.

Quality fixes:
- PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for
  the pocket column. Removed dead 2-arg PocketGridRows constructor
  (only 3 test sites used it; now inlined).
- PocketGridPointContext gets a compact-constructor validator that
  rejects negative pointIndex/pocketRank, limiting blast radius of an
  int-arg swap.

Test hardening:
- VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest
  now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so
  the Params singleton is clean for subsequent test classes.
- New tests: HIS ND1 double-flag (single atom setting donor+acceptor),
  PrincipalMoments at cardinality=2, PrincipalMoments two coincident
  points, GridGenerator NaN-box throw, PocketDescriptorRegistry
  register/unregister round-trip, MorphologicalCloser maxIters=1.
- Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered
  the maxIters=0 case despite the general name); added maxIters=1
  companion that verifies one iteration of fill actually runs.
- Extracted RendererTestFixtures.tinyGrid (was byte-identical in both
  renderer test files); unified the volsite atomAt signatures so the
  parameter order can't get swapped between the two volsite tests.
2026-05-19 15:36:12 +02:00


Exporting Pocket Descriptors

Per-pocket geometric/chemical descriptors (volume, sphericity, residue counts, etc.) are written to a tabular file alongside any predict or rescore run when -export_pocket_descriptors is on.

Cost note. Most descriptors (volume, sphericity, radius_of_gyration, num_grid_points, principal_moments) are derived from the pocket grid, so selecting any of them triggers the full grid build (lattice generation + per-pocket assignment + shape fill) even with -export_pocket_grid 0; that setting only suppresses the per-protein grid file, not the computation. num_residues and num_surface_atoms do not need the grid; selecting only those two with -export_pocket_grid 0 skips the grid build entirely (a near zero-cost descriptors export).
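
The rule in the cost note can be sketched as a small predicate. This is an illustrative sketch only: GridCostSketch, GRID_FREE, and gridBuildRequired are hypothetical names that do not come from the P2Rank sources, and the real orchestrator may decide this differently (for example via each descriptor's needsGrid()).

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of P2Rank: decides whether the pocket grid must be built. */
final class GridCostSketch {

    // The two descriptors that the cost note lists as not needing the grid.
    static final Set<String> GRID_FREE = Set.of("num_residues", "num_surface_atoms");

    /**
     * The grid is needed when the grid file itself is exported, or when at
     * least one selected descriptor is grid-derived.
     */
    static boolean gridBuildRequired(List<String> selected, boolean exportPocketGrid) {
        if (exportPocketGrid) {
            return true;
        }
        for (String name : selected) {
            if (!GRID_FREE.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, selecting "volume" alone still requires the grid even with the grid export off, while selecting only num_residues and num_surface_atoms does not.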

Quick start

# Default: every shipped descriptor (num_residues, num_surface_atoms,
# num_grid_points, volume, sphericity, radius_of_gyration, principal_moments)
prank predict -f protein.pdb -export_pocket_descriptors 1

# Narrow set + tighter grid for more accurate volume/sphericity
prank predict dataset.ds -export_pocket_descriptors 1 \
    -pocket_descriptors "volume,sphericity" \
    -pocket_grid_spacing 0.75

# Rescoring path also supports it
prank rescore fpocket.ds -export_pocket_descriptors 1

Output format

One row per predicted pocket.

| Column | Type | Notes |
|---|---|---|
| name | string | pocket.name (e.g. pocket.1) |
| rank | i32 | 1-based pocket rank |
| score | f64 | Raw P2Rank pocket score |
| probability | f64 | Calibrated probability from the score transformer. Column is omitted entirely when no transformer ran. |
| center_x, center_y, center_z | f64 | Pocket centroid coordinates |
| (one or more columns per requested descriptor) | f64 / i32 | See descriptor catalog below |

Descriptor columns appear in the order given on the command line via -pocket_descriptors. Most descriptors emit a single column whose header is the descriptor name; multi-column descriptors emit N columns prefixed with "{name}." (e.g. principal_moments.lambda1, principal_moments.lambda2, principal_moments.lambda3).
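
The header naming rule can be illustrated with a short sketch. HeaderSketch and headersFor are hypothetical names, and keying the rule on the column count is an assumption drawn from the description above, not a copy of the actual writer code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the header naming rule; not the P2Rank writer. */
final class HeaderSketch {

    /** Bare descriptor name for a single column, "{name}.{sub}" for several columns. */
    static List<String> headersFor(String name, List<String> columnNames) {
        List<String> headers = new ArrayList<>();
        if (columnNames.size() == 1) {
            // The sub-name of a scalar descriptor is ignored at output.
            headers.add(name);
        } else {
            for (String sub : columnNames) {
                headers.add(name + "." + sub);
            }
        }
        return headers;
    }
}
```

For example, a scalar descriptor named volume yields the single header volume regardless of its sub-name, while principal_moments with sub-names lambda1, lambda2, lambda3 yields three prefixed headers.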

Descriptor catalog

| Name | Columns | Definition |
|---|---|---|
| volume | 1 × f64 | Pocket volume in Å³: (number of assigned grid points) × pocket_grid_spacing³. Accuracy scales with the lattice spacing (smaller pocket_grid_spacing → finer estimate). |
| sphericity | 1 × f64 ∈ [0, 1] | V_pocket / V_bounding_sphere. Bounding sphere is centered at the centroid of the pocket's grid points (not pocket.centroid, which is atom-derived); radius is the max distance from that centroid. Quantization-free. 1 = perfect sphere; ≪ 1 = elongated / irregular. |
| radius_of_gyration | 1 × f64 | Radius of gyration in Å: sqrt(mean(‖r_i − r_cm‖²)) over the pocket's grid points (equal weights). Absolute spatial extent; pairs well with sphericity, which only captures compactness. 0 for empty / single-point pockets. |
| num_residues | 1 × i32 | Number of distinct residues touching the pocket (reuses Pocket.getResidues()). |
| num_surface_atoms | 1 × i32 | Size of pocket.surfaceAtoms. |
| num_grid_points | 1 × i32 | Total grid points assigned to the pocket (cardinality of the BitSet after shape fill). Raw count complement to volume. |
| principal_moments | 3 × f64 | Three eigenvalues of the pocket grid points' gyration tensor (equal-weight PCA), sorted descending: principal_moments.lambda1, principal_moments.lambda2, principal_moments.lambda3. Unit Å². Shape signature: λ₁≈λ₂≈λ₃ → sphere; λ₁≫λ₂,λ₃ → rod; λ₁≈λ₂≫λ₃ → disk. Sum equals radius_of_gyration². 0s for pockets with <2 grid points. |

-pocket_descriptors defaults to all of the above. The grid-derived scalar descriptors share the same pocket-grid input, so adding or removing them costs essentially nothing once the grid is built. principal_moments adds a small 3×3 eigendecomposition per pocket — also negligible relative to the grid build itself. To narrow the set, list the wanted names comma-separated. Unknown names cause a fail-fast error at startup with the list of registered names.
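
The geometric definitions in the catalog can be written out directly. The following is a minimal sketch, not the shipped implementation: ShapeSketch and its method names are hypothetical, points are plain double[3] coordinates rather than grid indices, and the handling of degenerate pockets in sphericity (returning 0, clamping at 1) is an assumption. It also shows why the principal moments sum to radius_of_gyration²: the trace of the gyration tensor equals the sum of its eigenvalues.

```java
/** Hypothetical sketch of the catalog formulas; not the P2Rank implementation. */
final class ShapeSketch {

    /** Equal-weight centroid of the grid points. */
    static double[] centroid(double[][] pts) {
        double[] c = new double[3];
        for (double[] p : pts) {
            for (int k = 0; k < 3; k++) {
                c[k] += p[k] / pts.length;
            }
        }
        return c;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < 3; k++) {
            s += (a[k] - b[k]) * (a[k] - b[k]);
        }
        return s;
    }

    /** volume = point count x spacing^3. */
    static double volume(int numPoints, double spacing) {
        return numPoints * spacing * spacing * spacing;
    }

    /** sqrt(mean squared distance to the centroid); 0 for fewer than 2 points. */
    static double radiusOfGyration(double[][] pts) {
        if (pts.length < 2) return 0.0;
        double[] c = centroid(pts);
        double sum = 0.0;
        for (double[] p : pts) sum += sqDist(p, c);
        return Math.sqrt(sum / pts.length);
    }

    /** V_pocket / V_bounding_sphere, sphere centered on the grid-point centroid. */
    static double sphericity(double[][] pts, double spacing) {
        if (pts.length == 0) return 0.0;
        double[] c = centroid(pts);
        double maxSq = 0.0;
        for (double[] p : pts) maxSq = Math.max(maxSq, sqDist(p, c));
        if (maxSq == 0.0) return 0.0; // assumption: a degenerate pocket reports 0
        double r = Math.sqrt(maxSq);
        double vSphere = 4.0 / 3.0 * Math.PI * r * r * r;
        return Math.min(1.0, volume(pts.length, spacing) / vSphere); // assumption: clamp to [0, 1]
    }

    /** Trace of the gyration tensor, which equals the sum of its three eigenvalues. */
    static double gyrationTrace(double[][] pts) {
        if (pts.length < 2) return 0.0;
        double[] c = centroid(pts);
        double trace = 0.0;
        for (int k = 0; k < 3; k++) {
            double variance = 0.0;
            for (double[] p : pts) variance += (p[k] - c[k]) * (p[k] - c[k]);
            trace += variance / pts.length;
        }
        return trace;
    }
}
```

Computing the trace avoids the eigendecomposition, so it is a cheap consistency check for a principal_moments row against the radius_of_gyration column.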

Parameters

The descriptors file shares all of the pocket-grid params (the grid is built once and reused). The descriptor-specific knobs are:

| Parameter | Default | Notes |
|---|---|---|
| export_pocket_descriptors | false | Master gate |
| pocket_descriptors | all shipped descriptors | List of descriptor names to compute. See catalog above. |
| pocket_grid_format | csv.gz | Same allowed values as the grid file |
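
How a comma-separated pocket_descriptors value might be validated against the registered names can be sketched as follows. DescriptorListSketch, KNOWN, and parse are hypothetical; the set of names is taken from the catalog above, and IllegalArgumentException is only a stand-in for whatever exception the real code raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of fail-fast validation for -pocket_descriptors. */
final class DescriptorListSketch {

    // Names from the descriptor catalog; sorted so the error message is stable.
    static final Set<String> KNOWN = new TreeSet<>(Set.of(
            "volume", "sphericity", "radius_of_gyration", "num_residues",
            "num_surface_atoms", "num_grid_points", "principal_moments"));

    /** Splits on commas, trims, keeps the given order, and rejects unknown names. */
    static List<String> parse(String raw) {
        List<String> selected = new ArrayList<>();
        for (String token : raw.split(",")) {
            String name = token.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas
            }
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException(
                        "Unknown pocket descriptor '" + name + "'. Registered: " + KNOWN);
            }
            selected.add(name);
        }
        return selected;
    }
}
```

Preserving the input order matters because descriptor columns appear in the order given on the command line.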

The grid generator's params (pocket_grid_spacing, _max_dist, _atom_buffer, _assign_cutoff, _fill, _fill_*) directly affect the volume and sphericity descriptors — see export-pocket-grid.md.

Adding a new descriptor

Implementations live under src/main/groovy/cz/siret/prank/program/routines/predict/output/descriptors/.

  1. Implement the PocketDescriptor interface:

    String name();                          // CLI token and multi-column header prefix
    List<String> columnNames();             // sub-names; scalar entry IGNORED at output
    List<ColumnType> columnTypes();         // parallel to columnNames()
    double[] compute(PocketGridContext ctx); // same length as columnNames()
    boolean needsGrid();                    // default true; override to false if compute()
                                            // doesn't read ctx.grid() or ctx.gridPointIndices()
    

    PocketGridContext exposes pocket, protein, grid, and the per-pocket gridPointIndices set. If your compute() only reads ctx.pocket() (i.e., domain fields like surfaceAtoms or residues), override needsGrid() to return false — that lets the orchestrator skip the full grid build when only grid-free descriptors are selected.

    For scalar descriptors (one column), extend AbstractScalarPocketDescriptor instead of implementing the interface directly — it boils the boilerplate down to name(), scalarType(), and computeScalar(ctx). Of the seven shipped descriptors, six use this adapter; principal_moments (multi-column) implements PocketDescriptor directly.

    For multi-column descriptors (e.g. principal_moments with three eigenvalues from a single decomposition), implement PocketDescriptor directly; output column headers are "{name()}.{columnNames()[i]}".

  2. Register the implementation in PocketDescriptorRegistry's static initializer (Java; no auto-discovery). The registry rejects descriptors that declare duplicate columnNames at registration time.

  3. Users can opt into it by name via -pocket_descriptors "volume,my_new_descriptor".

  4. To include it in the default output, also add the name to the pocket_descriptors default list in Params.groovy. The default is declared explicitly (rather than derived from Registry.knownNames()) so each addition to the default schema is a conscious choice — but adding to the default IS a user-visible breaking change for anyone parsing the output by column index. Two recommendations:

    • Parse the descriptors file by column name, not by column index.
    • When you add a descriptor to the default list, note it in breaking-changes.md.

    Skip step 4 if the new descriptor is opt-in only.
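
The first recommendation in step 4 (parse by column name) can be sketched on the consumer side like this. ByNameSketch is hypothetical, the sample header and row are made up for illustration, and the sketch assumes an already decompressed, plain comma-separated file without quoted fields.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer-side sketch: look fields up by header name, not by position. */
final class ByNameSketch {

    /** Maps each header name to its column position. */
    static Map<String, Integer> indexHeader(String headerLine) {
        Map<String, Integer> index = new HashMap<>();
        String[] cols = headerLine.split(",");
        for (int i = 0; i < cols.length; i++) {
            index.put(cols[i].trim(), i);
        }
        return index;
    }

    /** Returns the named field of a data row, or fails if the column is absent. */
    static String field(String rowLine, Map<String, Integer> index, String column) {
        Integer i = index.get(column);
        if (i == null) {
            throw new IllegalArgumentException("No column named " + column);
        }
        return rowLine.split(",", -1)[i].trim();
    }
}
```

A reader written this way keeps working when the probability column is omitted or when a new descriptor is added to the default list, both of which shift column positions.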

INT columns return their value as a double that the writer downcasts at output time, matching the existing TableData convention. Implementations must guarantee the value fits in i32.
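
A descriptor author can guard the i32 requirement before returning a value. The helper below is a hypothetical sketch (IntColumnSketch and toI32 are not P2Rank names) and says nothing about how the actual writer performs its downcast; it only shows a check that makes an out-of-range value fail loudly.

```java
/** Hypothetical guard for INT columns carried as double. */
final class IntColumnSketch {

    /** Converts to int, rejecting NaN, non-integral, and out-of-range values. */
    static int toI32(double value) {
        if (Double.isNaN(value) || value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Value does not fit in i32: " + value);
        }
        if (value != Math.rint(value)) {
            throw new IllegalArgumentException("Value is not integral: " + value);
        }
        return (int) value;
    }
}
```

The check is worth having because a plain Java cast from double to int does not throw: it saturates at Integer.MAX_VALUE or Integer.MIN_VALUE and maps NaN to 0.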

See also