Mirror of https://github.com/rdk/p2rank.git, synced 2026-06-04 12:44:24 +08:00
Drop frozen pocket-grid PLAN/SPEC; refine audit punch-list
PLAN.md and SPEC.md were pre-implementation design docs for the pocket-grid feature. The feature has shipped, so they are frozen artifacts in the active todo/ namespace. Delete them and strip the three "see SPEC.md" comments from Main.groovy and the predict/rescore routines.

Also reassess the PyMOL rank-gap entry in the audit: P2Rank ranks pockets contiguously throughout the predict path and in all in-tree loaders except SiteHoundLoader, so the previously listed "renderer ignores rank gaps" issue is cosmetic only (empty objects in the Models panel for small pockets whose filled BitSet ended up empty). Downgrade it to a parity nit under Inconsistencies, and promote the PUResNet surfaceAtoms re-linking to the Top-5.
@@ -33,14 +33,6 @@ focused cleanups.
|
||||
validate. Fix: `volatile` + DCL, or do one-shot init under a synchronized guard
|
||||
in `preProcessProtein`; update the concurrency test to construct under contention.
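
The `volatile` + DCL option named above can be sketched in plain Java. This is a minimal illustration, not the project's code: `LazyHolder` and its `Supplier` factory are hypothetical stand-ins for whatever object the feature builds lazily.

```java
import java.util.function.Supplier;

/** Hypothetical holder illustrating volatile + double-checked locking. */
final class LazyHolder<T> {
    private final Supplier<T> factory;

    // volatile: a thread that observes a non-null reference also observes
    // the fully constructed object behind it.
    private volatile T instance;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;               // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = instance;         // second check, under the lock
                if (local == null) {
                    local = factory.get();
                    instance = local;     // publish only after construction
                }
            }
        }
        return local;
    }
}
```

A contention test in the spirit of the audit note would call `get()` from many threads at once and assert that the factory ran exactly once.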
|
||||
|
||||
- **PyMOL pocket-grid renderer ignores rank gaps.**
|
||||
`src/main/groovy/cz/siret/prank/program/visualization/renderers/PocketGridPymolRenderer.groovy:167,201,242`
|
||||
loop `for (rank = 1; rank <= maxRank; rank++)`. If a pocket rank is missing
|
||||
(`{1,3}`), the script emits an empty `pocket_grid_2` and shifts palette indices.
|
||||
ChimeraX was fixed in commit `a3efd084`; PyMOL still has the latent bug.
|
||||
Comment at lines 131-132 even acknowledges it. Fix: iterate
|
||||
`grid.pocketToPointIndices.keySet()` like ChimeraX does.
|
||||
|
||||
- **Coulomb plumbing is dead code.**
|
||||
`EnergyCalculator.getAtomCharge` always returns 0
|
||||
(`src/main/groovy/cz/siret/prank/features/implementation/energy2/calc/EnergyCalculator.groovy:351-357`).
|
||||
@@ -161,6 +153,17 @@ focused cleanups.
|
||||
everything into a `TreeMap` (`EvalResults.groovy:189`) — insertion order is
|
||||
lost. Either drop the misleading comment or use `LinkedHashMap` downstream.
|
||||
|
||||
- **PyMOL pocket-grid renderer iterates `1..maxRank`; ChimeraX iterates
|
||||
`perPocketBasenames.keySet()`.**
|
||||
`PocketGridPymolRenderer.groovy:167,201,242`.
|
||||
Cosmetic-only: P2Rank ranks pockets contiguously (every `predict`-path and
|
||||
in-tree loader except `SiteHoundLoader` assigns `i++`/`rank++`), and the
|
||||
sidecar PDB strips ranks whose `filled` BitSet is empty. PyMOL therefore
|
||||
emits empty `pocket_grid_N`/`pocket_vol_N`/`pocket_gauss_N`/`pocket_hull_N`
|
||||
objects when the assigner produced no points for a small pocket — they
|
||||
render as invisible but clutter the Models panel. Mirror the ChimeraX
|
||||
iteration pattern (`a3efd084`) for parity; not a correctness fix.
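
The difference between the two iteration patterns can be sketched as follows. This is an illustrative Java snippet, not the renderer's code: the class and method names are hypothetical, and only the `pocket_grid_N` naming and the rank-to-indices map come from the entry above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class RankIterationSketch {

    /** Key-set pattern: one object per rank that actually has points. */
    static List<String> objectNames(Map<Integer, Set<Integer>> pocketToPointIndices) {
        // Copy into a TreeMap so ranks come out in ascending order.
        TreeMap<Integer, Set<Integer>> sorted = new TreeMap<Integer, Set<Integer>>(pocketToPointIndices);
        List<String> names = new ArrayList<String>();
        for (Map.Entry<Integer, Set<Integer>> e : sorted.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue; // nothing to render for this rank
            }
            names.add("pocket_grid_" + e.getKey());
        }
        return names;
    }

    /** 1..maxRank pattern, for comparison: emits a name for every rank. */
    static List<String> objectNamesByMaxRank(int maxRank) {
        List<String> names = new ArrayList<String>();
        for (int rank = 1; rank <= maxRank; rank++) {
            names.add("pocket_grid_" + rank);
        }
        return names;
    }
}
```

With ranks 1 and 3 populated and rank 2 empty, the first method skips rank 2 while the second still emits an object for it.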
|
||||
|
||||
- **PyMOL grid `solvent_radius=0` vs ChimeraX non-zero probe.**
|
||||
`PocketGridPymolRenderer.groovy:189-190` vs `PocketGridChimeraXRenderer.groovy:264`.
|
||||
`vis_pocket_grid_volume_radius` means different things to the two renderers.
|
||||
@@ -219,9 +222,6 @@ focused cleanups.
|
||||
- **`documentation/readme.md`** index misses `cofactors.md`, `conservation.md`,
|
||||
`export-pocket-grid.md`, `export-pocket-descriptors.md`.
|
||||
|
||||
- **`misc/todo/pocket_grid/{SPEC,PLAN}.md`** still mention
|
||||
`export_pocket_grid_pml` (renamed to `vis_pocket_grid`).
|
||||
|
||||
- **CI matrix is `17,21,25,26` only** (`.github/workflows/develop.yml:23`).
|
||||
README claims "Java 17 or later (tested up to Java 25)"; 18–20/22/23/24 not
|
||||
exercised; "tested up to Java 25" lags the now-present 26.
|
||||
@@ -369,14 +369,15 @@ focused cleanups.
|
||||
1. **Fix `VoxelHashAssigner` cell-prune lower bound** (or drop it and rely on
|
||||
the post-fetch distance check). Restores the assigner-strategy equivalence
|
||||
the docs promise.
|
||||
2. **Apply the rank-gap fix to `PocketGridPymolRenderer`** — mirror what
|
||||
commit `a3efd084` did for ChimeraX.
|
||||
3. **Make energy-feature lazy-init actually thread-safe**
|
||||
2. **Make energy-feature lazy-init actually thread-safe**
|
||||
(`MethylEnergyFeature`, `AbstractProbeEnergyFeature`); fix `ConcurrencyTest`
|
||||
to construct calculators under contention.
|
||||
4. **Guard `AhojSiteInfo.fromCsvRecord` with `record.isMapped(...)`** for the
|
||||
3. **Guard `AhojSiteInfo.fromCsvRecord` with `record.isMapped(...)`** for the
|
||||
new `rg`/`n_unp_pockets[_multichain]` columns, so the parser doesn't crash
|
||||
on older "full" CSVs.
|
||||
4. **Re-link `PUResNetLoader.surfaceAtoms` to `queryProtein`** by PDB serial
|
||||
(mirror `FPocketLoader.groovy:137`); same identity-mismatch class as the
|
||||
Concavity fix.
|
||||
5. **README/help.txt/`distro/prank.bat` trio**: bump the version badge, fix
|
||||
the `./make-disro.sh` typo, regenerate `help.txt` to list current commands,
|
||||
and bring Windows launcher JVM flags up to parity with the Bash launchers.
|
||||
|
||||
@@ -1,446 +0,0 @@
|
||||
# Plan — Pocket grid points export + per-pocket descriptors
|
||||
|
||||
Companion to `SPEC.md`. Ordered, atomic phases. Each phase is a single
|
||||
reviewable commit (or two if splitting tests helps). Compile + test must
|
||||
be green at the end of every phase.
|
||||
|
||||
## Phase order rationale
|
||||
|
||||
Layered, foundation-first. Each phase only depends on phases above it.
|
||||
|
||||
```
1. TableData STRING refactor (foundation, no behavior change)
2. VdW radius helper + grid generator (foundation)
3. PocketGrid data class + fill strategies
4. PocketGridBuilder (orchestration)
5. Descriptors infrastructure + menu
6. Export-data classes + exporters
7. PyMOL renderer + PDB sidecar
8. Params + Main-startup validation
9. Wire into PredictPockets + RescorePockets routines
10. Documentation (2 new MD files + cross-ref)
11. Smoke test on real data
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — `TableData` STRING column-type refactor
|
||||
|
||||
**Goal:** Extend the export infrastructure to support string columns. No
|
||||
behavioral change to existing SAS-points export.
|
||||
|
||||
**Changes:**
|
||||
- `TableData.groovy` — add `ColumnType.STRING`; new method
|
||||
`default String getString(int rowIndex, int colIndex) { throw ... }`
|
||||
for STRING columns; default `getColumn` only meaningful for numeric.
|
||||
- `TableExporter.groovy`:
|
||||
- `writeCsv` — string branch with RFC 4180 quoting (escape `,`, `"`, newline).
|
||||
- `writeArrow` — `VarCharVector` for STRING columns; `buildSchema` updated.
|
||||
- `writeParquet` — `BINARY` with `LogicalTypeAnnotation.stringType()`;
|
||||
`RowDehydrator` updated.
|
||||
- `PointExportData.groovy` — no functional change; verify
|
||||
`getColumnType` doesn't accidentally return STRING (it currently can't —
|
||||
all columns are DOUBLE/INT).
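
The RFC 4180 quoting rule for the `writeCsv` string branch can be sketched like this. It is a standalone Java illustration with hypothetical names (`CsvQuoting`, `quote`, `row`), not the `TableExporter` code; writing `null` as an empty field is an assumption made here.

```java
final class CsvQuoting {

    /** Quotes a field only when it contains a comma, a double quote, or a line break. */
    static String quote(String value) {
        if (value == null) {
            return ""; // assumption: null is written as an empty field
        }
        boolean needsQuotes = value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // RFC 4180: wrap in double quotes and double any embedded quote.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-stringified fields into one CSV record (no trailing newline). */
    static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }
}
```

Fields without special characters pass through unchanged, which keeps purely numeric output byte-identical.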
|
||||
|
||||
**Tests:**
|
||||
- `TableExporterTest` — new round-trip tests for a synthetic table with
|
||||
one STRING column, one INT, one DOUBLE; csv, csv.gz, arrow, parquet.
|
||||
- CSV quoting edge cases: value contains `,`, `"`, `\n`.
|
||||
- Regression: existing `PointsExporterTest` still passes (no schema
|
||||
changes to SAS export).
|
||||
|
||||
**Commit:** `Extend TableData with STRING column type`
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — VdW radius helper + `GridGenerator` extension
|
||||
|
||||
**Goal:** Make per-atom VdW radii available; extend the existing grid
|
||||
sampler.
|
||||
|
||||
**Changes:**
|
||||
- New `src/main/groovy/cz/siret/prank/program/routines/predict/output/grid/VdwRadiusTable.groovy`:
|
||||
- `static double get(Atom atom)` — looks up via CDK `Elements` by
|
||||
element symbol; if `null`, falls back to Krypton's 2.02 Å (matches
|
||||
the existing pattern in `PatchedCdkNumericalSurface.groovy:54-56`).
|
||||
- Caches `String elementSymbol → double radius` in a
|
||||
`ConcurrentHashMap` (predict runs multi-threaded via
|
||||
`Dataset.process(...)`, so the cache is shared across threads;
|
||||
`computeIfAbsent` is safe and avoids races).
|
||||
- `GridGenerator.java` — extend
|
||||
`sampleGridPointsAroundAtoms(Atoms, edge, radius)` into a new variant
|
||||
`sampleGridPointsBetween(Atoms, edge, maxDist, double atomBuffer)`:
|
||||
- Keep existing method unchanged.
|
||||
- New method uses `Atoms.withKdTreeConditional()`, walks the lattice,
|
||||
for each cell computes `nearest = atoms.nearestSqrDist(p)`,
|
||||
`vdw = VdwRadiusTable.get(nearestAtom)`, drops if
|
||||
`sqrt(nearest) < vdw + atomBuffer` or `sqrt(nearest) > maxDist`.
|
||||
- Note: `nearestSqrDist` returns squared distance only; for the per-atom
|
||||
VdW check we need the actual nearest **atom**, not just distance.
|
||||
Use `Atoms.findNearest(point)` (`Atoms.java:244`) which returns the
|
||||
Atom; then compute `dist` once.
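
The caching and fallback behavior described for `VdwRadiusTable` can be sketched as below. This is a hedged Java illustration: the small `SOURCE` table holds Bondi radii as a stand-in for the CDK lookup, and the class name and the `String`-keyed `get` signature are hypothetical simplifications of `get(Atom)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VdwRadiusSketch {
    /** Krypton proxy used when no radius is known for an element. */
    static final double KRYPTON_FALLBACK = 2.02;

    // Stand-in for the CDK lookup: a few Bondi radii in angstroms.
    private static final Map<String, Double> SOURCE = new HashMap<String, Double>();
    static {
        SOURCE.put("H", 1.20);
        SOURCE.put("C", 1.70);
        SOURCE.put("N", 1.55);
        SOURCE.put("O", 1.52);
        SOURCE.put("P", 1.80);
        SOURCE.put("S", 1.80);
    }

    // Shared across worker threads; computeIfAbsent is atomic per key.
    private static final ConcurrentHashMap<String, Double> CACHE =
            new ConcurrentHashMap<String, Double>();

    static double get(String elementSymbol) {
        return CACHE.computeIfAbsent(elementSymbol, symbol -> {
            Double known = SOURCE.get(symbol);
            // Resolve the fallback inside the mapping function: a null result
            // would leave the key unmapped and the lookup would repeat.
            return known != null ? known : Double.valueOf(KRYPTON_FALLBACK);
        });
    }
}
```

Storing the fallback in the cache means unknown symbols are resolved once, like known ones.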
|
||||
|
||||
**Tests:**
|
||||
- `VdwRadiusTableTest` — known elements (C, N, O, S, P, Fe, Cu, Co)
|
||||
return non-null; Co/Ni/Cu use the Krypton fallback (2.02 Å); unknown
|
||||
symbol → fallback.
|
||||
- `GridGeneratorTest` (new file or existing if present) — synthetic
|
||||
small `Atoms` set, verify min/max filtering on cubic lattice
|
||||
produces expected count. Edge case: single-atom input.
|
||||
|
||||
**Commit:** `Add VdwRadiusTable and GridGenerator min/max sampler`
|
||||
|
||||
---
|
||||
|
||||
## Phase 3 — `PocketGrid` data class + fill strategies
|
||||
|
||||
**Goal:** Pure data + algorithms, no orchestration.
|
||||
|
||||
**Changes:**
|
||||
- `PocketGrid.groovy`:
|
||||
- Fields:
|
||||
- `Atoms allPoints` — kept grid points after filtering, wrapped as
|
||||
`Atoms` (since `Point implements Atom`). Reusing `Atoms` gives us
|
||||
`cutoutShell`, `withKdTree`, `getByID` for free.
|
||||
- `Map<Integer, Set<Integer>> pocketToPointIndices` (rank → indices
|
||||
into `allPoints`).
|
||||
- `Set<Integer> assignedIndices` (union of all per-pocket sets).
|
||||
- `double spacing`.
|
||||
- `Map<LatticeCoord, Integer> latticeIndex` — integer-lattice
|
||||
coordinate `(i, j, k)` → point index. Computed from
|
||||
`originX/Y/Z` + `spacing` during grid generation; **required by
|
||||
`MorphologicalCloser`** for `O(1)` neighbor lookups (without it
|
||||
morph closing degrades to all-pairs distance comparisons).
|
||||
- `LatticeCoord` is a small immutable value class with proper
|
||||
`equals`/`hashCode`.
|
||||
- Provides: `Atoms pointsForPocket(int rank)`,
|
||||
`Set<Integer> pocketsForPoint(int pointIndex)`,
|
||||
`Set<Integer> neighborsOf(int pointIndex, int connectivity)` (where
|
||||
`connectivity ∈ {6, 18, 26}` consults `latticeIndex`).
|
||||
- `fill/PocketShapeFiller.groovy` — interface:
|
||||
```groovy
Set<Integer> fill(Set<Integer> rawShellIndices,
                  List<Point> allPoints,
                  double spacing,
                  Params params)
```
|
||||
- `fill/NoOpFiller.groovy` — returns input unchanged.
|
||||
- `fill/MorphologicalCloser.groovy`:
|
||||
- Operates on a `Map<(int,int,int) → Integer>` lattice index built from
|
||||
allPoints. For each iteration, scans candidate cells (immediate
|
||||
neighbors of assigned cells) and promotes those whose neighbor count
|
||||
≥ `pocket_grid_fill_min_neighbors`. Stops at fixed-point or
|
||||
`pocket_grid_fill_max_iters`.
|
||||
- Neighborhood: 26-connectivity (configurable later if needed).
|
||||
- `fill/ConvexHullFiller.groovy` — initial **stub** that throws
|
||||
`UnsupportedOperationException("convex_hull fill not yet implemented")`
|
||||
so users get a clear error if they select it. Real impl in a followup.
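
The lattice-based fill described for `MorphologicalCloser` can be sketched as follows. This is a simplified Java illustration under stated assumptions: it works on sets of lattice cells rather than point indices, and `MorphCloseSketch`, `Cell`, `offsets` and `fill` are hypothetical names, not the planned API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class MorphCloseSketch {

    /** Immutable integer-lattice coordinate with value equality. */
    static final class Cell {
        final int i, j, k;
        Cell(int i, int j, int k) { this.i = i; this.j = j; this.k = k; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Cell)) return false;
            Cell c = (Cell) o;
            return i == c.i && j == c.j && k == c.k;
        }
        @Override public int hashCode() { return Objects.hash(i, j, k); }
    }

    /** Offsets for 6- (faces), 18- (faces + edges) or 26-connectivity. */
    static List<int[]> offsets(int connectivity) {
        if (connectivity != 6 && connectivity != 18 && connectivity != 26) {
            throw new IllegalArgumentException("connectivity must be 6, 18 or 26");
        }
        int maxManhattan = connectivity == 6 ? 1 : (connectivity == 18 ? 2 : 3);
        List<int[]> out = new ArrayList<int[]>();
        for (int di = -1; di <= 1; di++)
            for (int dj = -1; dj <= 1; dj++)
                for (int dk = -1; dk <= 1; dk++) {
                    int m = Math.abs(di) + Math.abs(dj) + Math.abs(dk);
                    if (m >= 1 && m <= maxManhattan) out.add(new int[] {di, dj, dk});
                }
        return out;
    }

    static int countAssigned(Cell c, Set<Cell> assigned, List<int[]> nbrs) {
        int n = 0;
        for (int[] d : nbrs) {
            if (assigned.contains(new Cell(c.i + d[0], c.j + d[1], c.k + d[2]))) n++;
        }
        return n;
    }

    /** Promotes lattice cells next to the shell until stable or maxIters is reached. */
    static Set<Cell> fill(Set<Cell> rawShell, Set<Cell> lattice, int minNeighbors, int maxIters) {
        List<int[]> nbrs = offsets(26);
        Set<Cell> assigned = new HashSet<Cell>(rawShell);
        for (int iter = 0; iter < maxIters; iter++) {
            Set<Cell> promoted = new HashSet<Cell>();
            for (Cell c : assigned) {
                for (int[] d : nbrs) {
                    Cell cand = new Cell(c.i + d[0], c.j + d[1], c.k + d[2]);
                    if (!lattice.contains(cand) || assigned.contains(cand)) continue;
                    if (countAssigned(cand, assigned, nbrs) >= minNeighbors) promoted.add(cand);
                }
            }
            if (promoted.isEmpty()) break; // fixed point
            assigned.addAll(promoted);
        }
        return assigned;
    }
}
```

Candidates are evaluated against the state at the start of each iteration, so the result does not depend on set iteration order.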
|
||||
|
||||
**Tests:**
|
||||
- `MorphologicalCloserTest` — synthetic shapes:
|
||||
- Pure sphere shell (3-cell-thick) → fills to solid sphere within
|
||||
`max_iters`.
|
||||
- U-shape with concavity → concavity filled in.
|
||||
- Disconnected components → not merged when far apart.
|
||||
- `NoOpFillerTest` — identity.
|
||||
|
||||
**Commit:** `Add PocketGrid data class and morph-closing fill strategy`
|
||||
|
||||
---
|
||||
|
||||
## Phase 4 — `PocketGridBuilder` (orchestration)
|
||||
|
||||
**Goal:** End-to-end grid generation + per-pocket assignment + fill.
|
||||
|
||||
**Changes:**
|
||||
- `PocketGridBuilder.groovy`:
|
||||
- `static PocketGrid build(Protein protein, List<? extends Pocket> pockets, Params params)`
|
||||
- Steps:
|
||||
1. Call the new sampler from Phase 2 →
|
||||
`Atoms allPoints` of kept lattice points + their lattice
|
||||
coordinates. Store both in the resulting `PocketGrid`.
|
||||
2. Build a KdTree on `allPoints` (`allPoints.withKdTree()`) — cheap
|
||||
once, reused by callers downstream.
|
||||
3. For each pocket `p`:
|
||||
- `p.surfaceAtoms.withKdTreeConditional()` (small set, KdTree
|
||||
built on demand).
|
||||
- Iterate `allPoints` once; for each point at index `i`, keep
|
||||
`i` in the **raw shell** set if
|
||||
`p.surfaceAtoms.nearestDist(allPoints.list[i]) <= params.pocket_grid_assign_cutoff`.
|
||||
O(|allPoints| × log|surfaceAtoms|) per pocket.
|
||||
- Pass the raw shell set + `latticeIndex` to
|
||||
`filler.fill(...)` → final per-pocket index set.
|
||||
4. Aggregate into `PocketGrid.pocketToPointIndices`; derive
|
||||
`assignedIndices` as the union.
|
||||
- Filler selection: dispatch on `params.pocket_grid_fill` enum value.
|
||||
- All `@CompileStatic` + `@Slf4j`.
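
The multi-valued assignment in step 3 can be sketched as below. This Java illustration replaces the KdTree lookup with a brute-force nearest-distance scan and omits the fill step; the class, the `double[]` point representation, and the method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class PocketAssignmentSketch {

    /** Brute-force distance from p to the nearest atom (the plan uses a KdTree here). */
    static double nearestDist(double[] p, List<double[]> atoms) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] a : atoms) {
            double dx = p[0] - a[0], dy = p[1] - a[1], dz = p[2] - a[2];
            best = Math.min(best, Math.sqrt(dx * dx + dy * dy + dz * dz));
        }
        return best;
    }

    /** rank -> indices of grid points within cutoff of that pocket's surface atoms. */
    static Map<Integer, Set<Integer>> assign(List<double[]> allPoints,
                                             Map<Integer, List<double[]>> surfaceAtomsByRank,
                                             double cutoff) {
        Map<Integer, Set<Integer>> pocketToPointIndices = new TreeMap<Integer, Set<Integer>>();
        for (Map.Entry<Integer, List<double[]>> e : surfaceAtomsByRank.entrySet()) {
            Set<Integer> rawShell = new TreeSet<Integer>();
            for (int i = 0; i < allPoints.size(); i++) {
                if (nearestDist(allPoints.get(i), e.getValue()) <= cutoff) {
                    rawShell.add(i);
                }
            }
            pocketToPointIndices.put(e.getKey(), rawShell);
        }
        return pocketToPointIndices;
    }

    /** Union of all per-pocket index sets. */
    static Set<Integer> assignedIndices(Map<Integer, Set<Integer>> pocketToPointIndices) {
        Set<Integer> union = new TreeSet<Integer>();
        for (Set<Integer> s : pocketToPointIndices.values()) {
            union.addAll(s);
        }
        return union;
    }
}
```

Because each pocket is scanned independently, a point close to two pockets lands in both index sets, which is the overlap the builder test checks for.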
|
||||
|
||||
**Tests:**
|
||||
- `PocketGridBuilderTest`:
|
||||
- 1fbl.pdb fixture (small, fast). Predict pockets via existing
|
||||
`PrankFacade`; build grid; assert:
|
||||
- `allPoints` count is reasonable for the bounding box (sanity check).
|
||||
- Each pocket has a non-empty point set after fill.
|
||||
- Multi-pocket overlap can occur (count of `(point, pocket)` pairs
|
||||
> count of distinct points).
|
||||
- Edge case: protein with 0 predicted pockets → `PocketGrid` with
|
||||
`allPoints` non-empty but `pocketToPointIndices` empty.
|
||||
|
||||
**Commit:** `Add PocketGridBuilder orchestrating grid + assignment + fill`
|
||||
|
||||
---
|
||||
|
||||
## Phase 5 — Descriptors infrastructure + initial 4
|
||||
|
||||
**Goal:** Pluggable descriptors with default `["volume"]`.
|
||||
|
||||
**Changes:**
|
||||
- `descriptors/PocketGridContext.groovy` — data class: `pocket`, `protein`,
|
||||
`gridPointsForPocket`, `pocketGrid`, `params`.
|
||||
- `descriptors/PocketDescriptor.groovy` — interface:
|
||||
```groovy
String name()
ColumnType columnType()   // INT or DOUBLE
double compute(PocketGridContext ctx)
```
|
||||
(Return type `double` — INT descriptors cast at write time, mirroring
|
||||
TableData's int-as-double convention.)
|
||||
- `descriptors/PocketDescriptorRegistry.groovy`:
|
||||
- `static Map<String, PocketDescriptor> REGISTRY` — populated at
|
||||
classload with the 4 shipped descriptors.
|
||||
- `static PocketDescriptor get(String name)` — throws `PrankException`
|
||||
on unknown.
|
||||
- `static Set<String> knownNames()`.
|
||||
- `VolumeDescriptor.groovy` — `count(gridPoints) × spacing³`.
|
||||
- `SphericityDescriptor.groovy` — bounding-sphere variant. **Centroid is
|
||||
the centroid of the pocket's assigned grid points**, not
|
||||
`pocket.centroid` (which is derived from surfaceAtoms and would give
|
||||
misleading numbers for asymmetric pockets):
|
||||
- `gridCentroid = mean(p for p in ctx.gridPointsForPocket)`
|
||||
- `r = max(dist(p, gridCentroid))`
|
||||
- `V_sphere = (4/3) · π · r³`
|
||||
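- `result = min(1, V_pocket / V_sphere)`; the clamp is required, not just defensive: `r` is measured to point centers while each cell contributes a full `spacing³`, so `V_pocket` can exceed `V_sphere` for small pockets (a 2×2×2 block at 1 Å gives 8 ų vs ≈ 2.7 ų)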
|
||||
- `NumResiduesDescriptor.groovy` — `pocket.residues.size()`.
|
||||
- `NumSurfaceAtomsDescriptor.groovy` — `pocket.surfaceAtoms.count`.
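
The volume and sphericity formulas above can be sketched as follows. This is an illustrative Java snippet with hypothetical names, not the descriptor classes themselves. It also shows why the clamp matters: the bounding radius is measured to point centers while each point contributes a whole cell, so the raw ratio can exceed 1 for very small pockets. Returning 1.0 when `r` is zero is a convention chosen here, not something the plan specifies.

```java
import java.util.List;

final class GridDescriptorsSketch {

    /** count(gridPoints) x spacing^3. */
    static double volume(int pointCount, double spacing) {
        return pointCount * spacing * spacing * spacing;
    }

    /** Bounding-sphere sphericity around the centroid of the grid points. */
    static double sphericity(List<double[]> points, double spacing) {
        if (points.isEmpty()) {
            return 0.0;
        }
        double cx = 0, cy = 0, cz = 0;
        for (double[] p : points) {
            cx += p[0]; cy += p[1]; cz += p[2];
        }
        int n = points.size();
        cx /= n; cy /= n; cz /= n;

        double maxSq = 0.0;
        for (double[] p : points) {
            double dx = p[0] - cx, dy = p[1] - cy, dz = p[2] - cz;
            maxSq = Math.max(maxSq, dx * dx + dy * dy + dz * dz);
        }
        double r = Math.sqrt(maxSq);
        if (r == 0.0) {
            return 1.0; // single point: convention chosen for this sketch
        }
        double vSphere = 4.0 / 3.0 * Math.PI * r * r * r;
        double ratio = volume(n, spacing) / vSphere;
        return Math.max(0.0, Math.min(1.0, ratio)); // clamp to [0, 1]
    }
}
```

For a 2×2×2 block at 1 Å spacing the raw ratio is about 2.9, which the clamp reduces to 1.0; a straight line of ten points scores well below 0.1.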
|
||||
|
||||
**Tests:**
|
||||
- Per-descriptor unit tests using a synthetic small `PocketGridContext`:
|
||||
- Volume: 8 grid cells @ 1Å spacing → V = 8 ų.
|
||||
- Sphericity: solid sphere of N cells → sphericity ≈ 1.0 (within
|
||||
tolerance for lattice quantization); flat disc → sphericity << 1.
|
||||
- num_residues / num_surface_atoms: stub pockets.
|
||||
- `PocketDescriptorRegistryTest` — known names resolve; unknown throws.
|
||||
|
||||
**Commit:** `Add pocket descriptor framework with 4 initial descriptors`
|
||||
|
||||
---
|
||||
|
||||
## Phase 6 — Export-data classes + exporters
|
||||
|
||||
**Goal:** Bridge `PocketGrid` and descriptor computations to `TableExporter`.
|
||||
|
||||
**Changes:**
|
||||
- `PocketGridExportData.groovy` (implements `TableData`):
|
||||
- Constructor takes `PocketGrid` and `boolean includeUnassigned`.
|
||||
- Materializes long-format rows during construction (point-pocket pairs);
|
||||
sort by `(pocket, x, y, z)`.
|
||||
- Columns: `x`, `y`, `z` (DOUBLE), `pocket` (INT).
|
||||
- `PocketDescriptorsExportData.groovy` (implements `TableData`):
|
||||
- Constructor takes pockets, descriptor results, `boolean includeProbability`.
|
||||
- Columns: `name` (STRING — uses Phase 1 refactor), `rank` (INT),
|
||||
`score` (DOUBLE), `probability` (DOUBLE, conditional),
|
||||
`center_x/y/z` (DOUBLE), then one column per descriptor (INT or
|
||||
DOUBLE per the descriptor's `columnType()`).
|
||||
- `PocketGridExporter.groovy`:
|
||||
- `static void tryExport(PocketGrid grid, String outdir, String label, Params params)`
|
||||
- Gated by `params.export_pocket_grid`; uses `params.pocket_grid_format`.
|
||||
- Writes `{outdir}/{label}_pocket_grid.{format}`.
|
||||
- `PocketDescriptorsExporter.groovy`:
|
||||
- `static void tryExport(List<? extends Pocket> pockets, PocketGrid grid, Protein protein, Params params, String outdir, String label)`
|
||||
- Derives `includeProbability` from the data itself:
|
||||
`pockets.any { !Double.isNaN(it.probaTP) }`. No extra parameter
|
||||
threaded through the wiring.
|
||||
- Iterates `params.pocket_descriptors`, computes each, builds
|
||||
`PocketDescriptorsExportData`, writes to file.
|
||||
|
||||
**Tests:**
|
||||
- `PocketGridExportDataTest` — assert row count, sort order, column types
|
||||
on a synthetic `PocketGrid`.
|
||||
- `PocketDescriptorsExportDataTest` — STRING column round-trips through
|
||||
CSV correctly (depends on Phase 1).
|
||||
- Integration smoke: small fixture, export to all 7 formats, re-read with
|
||||
the same reader paths used by `PointExportDataTest`.
|
||||
|
||||
**Commit:** `Add pocket grid and descriptors exporters`
|
||||
|
||||
---
|
||||
|
||||
## Phase 7 — PyMOL renderer + PDB sidecar
|
||||
|
||||
**Goal:** Visualization of the grid in PyMOL.
|
||||
|
||||
**Changes:**
|
||||
- New util in `PocketGridPymolRenderer.groovy`:
|
||||
- `static void render(PocketGrid grid, String outdir, String label, Params params)`
|
||||
- Writes:
|
||||
1. `{outdir}/visualizations/data/{label}_pocket_grid.pdb.gz` — one
|
||||
HETATM per `(point, pocket)` pair; pocket rank in residue-sequence
|
||||
column (cols 23-26); element column = `H` (or `D` for dummy).
|
||||
Mirrors `PredictionVisualizer.writeLabeledPointsPdb:44-56`.
|
||||
2. `{outdir}/visualizations/{label}_pocket_grid.pml`:
|
||||
- `load data/{label}_pocket_grid.pdb.gz, pocket_grid`
|
||||
- Per pocket rank N:
|
||||
- `create pocket_grid_<N>, pocket_grid and resi <N>`
|
||||
- `color <hex>, pocket_grid_<N>` (color via
|
||||
`PredictionVisualizer.generatePocketColors(numPockets)`)
|
||||
- `show spheres, pocket_grid_*`; `set sphere_scale, 0.3`
|
||||
- `delete pocket_grid` (drop the bulk object).
|
||||
- All paths via `Futils` for cross-platform safety.
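
The fixed-column layout of a sidecar record can be sketched like this. It is a hedged Java illustration, not the renderer: the residue name `GRD`, chain `A`, the occupancy and B-factor values, and the serial wrap-around are hypothetical choices; only the pocket rank in columns 23-26 and the `H` element come from the plan.

```java
import java.util.Locale;

final class GridPdbLineSketch {
    /** The 4-character residue-sequence field caps the rank. */
    static final int MAX_RANK = 9999;

    /** Formats one 78-character HETATM record with the pocket rank in columns 23-26. */
    static String hetatm(int serial, int rank, double x, double y, double z) {
        if (rank < 0 || rank > MAX_RANK) {
            throw new IllegalArgumentException("rank does not fit columns 23-26: " + rank);
        }
        // Blank runs are written as empty strings padded by a width specifier,
        // so the column arithmetic stays visible in the format string.
        return String.format(Locale.ROOT,
                "HETATM%5d %-4s %3s %s%4d%4s%8.3f%8.3f%8.3f%6.2f%6.2f%10s%2s",
                serial % 100000,   // serial field is 5 columns wide (wrap is an assumption)
                "H",               // atom name, columns 13-16
                "GRD",             // hypothetical residue name, columns 18-20
                "A",               // hypothetical chain id, column 22
                rank,              // columns 23-26
                "",                // iCode + blanks, columns 27-30
                x, y, z,           // columns 31-54
                1.0, 0.0,          // occupancy, B-factor
                "",                // columns 67-76
                "H");              // element, columns 77-78
    }
}
```

Coordinates outside the `%8.3f` range would widen the line, so a real writer would need to guard against that as well.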
|
||||
|
||||
**Tests:**
|
||||
- `PocketGridPymolRendererTest` — synthetic small `PocketGrid` (3 pockets,
|
||||
~20 points each); assert output files exist; spot-check PML contains
|
||||
`load`, `create pocket_grid_1`, `color`, `show spheres`.
|
||||
- Sanity: PDB output gzip-decompresses to valid HETATM records.
|
||||
|
||||
**Commit:** `Add PocketGridPymolRenderer with PDB sidecar`
|
||||
|
||||
---
|
||||
|
||||
## Phase 8 — Params + Main-startup validation
|
||||
|
||||
**Goal:** All 12 new params wired and validated.
|
||||
|
||||
**Changes:**
|
||||
- `Params.groovy` — add 12 `@RuntimeParam` fields with javadoc, defaults
|
||||
per spec table. Place near `export_points` / `export_points_format`.
|
||||
- `Main.groovy` — extend the existing param-validation block (around
|
||||
`:142-153`, same pattern used by cofactors):
|
||||
- `pocket_grid_format` ∈ allowed enumeration.
|
||||
- `pocket_grid_fill` ∈ {`morph_closing`, `convex_hull`, `none`}.
|
||||
- Every name in `pocket_descriptors` ∈ `PocketDescriptorRegistry.knownNames()`.
|
||||
- If `export_pocket_grid_pml` and `!export_pocket_grid` → throw
|
||||
`PrankException("export_pocket_grid_pml requires export_pocket_grid=true")`.
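
The four startup checks can be sketched as a single fail-fast method. This Java illustration is hypothetical in its shape: it takes plain arguments instead of `Params`, and it throws `IllegalArgumentException` as a stand-in for `PrankException`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GridParamValidationSketch {
    static final Set<String> FORMATS = new HashSet<String>(Arrays.asList(
            "csv", "csv.gz", "csv.zst", "arrow", "arrow.gz", "arrow.zst", "parquet"));
    static final Set<String> FILLS = new HashSet<String>(Arrays.asList(
            "morph_closing", "convex_hull", "none"));

    /** Throws on the first invalid setting; returns normally when all checks pass. */
    static void validate(String format, String fill, List<String> descriptors,
                         Set<String> knownDescriptors,
                         boolean exportGrid, boolean exportGridPml) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("unknown pocket_grid_format: " + format);
        }
        if (!FILLS.contains(fill)) {
            throw new IllegalArgumentException("unknown pocket_grid_fill: " + fill);
        }
        for (String name : descriptors) {
            if (!knownDescriptors.contains(name)) {
                throw new IllegalArgumentException("unknown pocket descriptor: " + name);
            }
        }
        if (exportGridPml && !exportGrid) {
            throw new IllegalArgumentException(
                    "export_pocket_grid_pml requires export_pocket_grid=true");
        }
    }
}
```

Running the checks before any dataset is loaded means a typo in a parameter value fails in seconds rather than after the first protein has been processed.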
|
||||
|
||||
**Tests:**
|
||||
- `ParamsTest` — defaults match spec.
|
||||
- `MainTest` (or wherever cofactor validation is tested) — each of the 4
|
||||
validation failures triggers a fail-fast with a clear message.
|
||||
|
||||
**Commit:** `Add pocket grid params and startup validation`
|
||||
|
||||
---
|
||||
|
||||
## Phase 9 — Wire into routines
|
||||
|
||||
**Goal:** Call the new pipeline from prediction routines.
|
||||
|
||||
**Changes:**
|
||||
- `PredictPocketsRoutine.groovy`:
|
||||
- After score transformation and the existing
|
||||
`PointsExporter.tryExportPoints(...)` call, insert:
|
||||
```groovy
PocketGrid grid = null
if (params.export_pocket_grid || params.export_pocket_descriptors || params.export_pocket_grid_pml) {
    grid = PocketGridBuilder.build(item.protein, prediction.pockets, params)
}
PocketGridExporter.tryExport(grid, outdir, item.label, params)
PocketDescriptorsExporter.tryExport(prediction.pockets, grid, item.protein, params, outdir, item.label)
if (params.visualizations && params.export_pocket_grid_pml) {
    PocketGridPymolRenderer.render(grid, outdir, item.label, params)
}
```
|
||||
- `RescorePocketsRoutine.groovy` — identical hook at the analogous point.
|
||||
- Order is critical: build → grid file → descriptors (needs grid for
|
||||
volume) → PML (needs grid).
|
||||
|
||||
**Tests:**
|
||||
- `PredictPocketsRoutineTest` (extend existing) — run a small prediction
|
||||
with `-export_pocket_grid 1 -export_pocket_descriptors 1
|
||||
-export_pocket_grid_pml 1` on 1fbl.pdb; verify all four output files
|
||||
appear at the right paths.
|
||||
|
||||
**Commit:** `Wire pocket grid/descriptors/PML into prediction routines`
|
||||
|
||||
---
|
||||
|
||||
## Phase 10 — Documentation
|
||||
|
||||
**Goal:** User-facing docs.
|
||||
|
||||
**Changes:**
|
||||
- New `documentation/export-pocket-grid.md`:
|
||||
- Sections: Overview, Output file format (long format, sort order,
|
||||
formats), Algorithm summary (grid generation, assignment, fill),
|
||||
Params table, CLI examples, PyMOL visualization, Notes.
|
||||
- Mirrors the structure of `documentation/export-points.md`.
|
||||
- New `documentation/export-pocket-descriptors.md`:
|
||||
- Sections: Overview, Output file format, Descriptor catalog
|
||||
(volume, sphericity, num_residues, num_surface_atoms — with
|
||||
formulas), Extensibility (how to add a new descriptor), Params
|
||||
relevant to descriptors.
|
||||
- `documentation/export-points.md` — append a brief "See also" block at
|
||||
the end pointing to the two new docs.
|
||||
- `README.md` — single bullet in "What's new" for 2.7 (or whenever this
|
||||
ships) referencing the two new docs.
|
||||
|
||||
**Tests:** none (docs only).
|
||||
|
||||
**Commit:** `Document pocket grid and descriptors export`
|
||||
|
||||
---
|
||||
|
||||
## Phase 11 — Smoke test on real data
|
||||
|
||||
**Goal:** End-to-end on real proteins; eyeball outputs.
|
||||
|
||||
**Changes:** none.
|
||||
|
||||
**Verification (manual):**
|
||||
- Run on `distro/test_data/1fbl.pdb` with `-export_pocket_grid 1
|
||||
-export_pocket_descriptors 1 -export_pocket_grid_pml 1`.
|
||||
- Verify:
|
||||
- Grid CSV row counts and centroid statistics look right (small protein
|
||||
→ maybe 5k-15k assigned point-rows).
|
||||
- Descriptors CSV — volume in 50-2000 ų range per pocket; sphericity
|
||||
in [0, 1]; residue/atom counts non-zero.
|
||||
- PyMOL: open the PML; visually confirm grid points cluster near
|
||||
predicted pockets, colored consistently with the main pocket PML.
|
||||
- Run on one of the SwinSite test proteins (1tjw_A) for cross-method
|
||||
sanity.
|
||||
- No regressions in existing SAS-points export.
|
||||
|
||||
**Commit:** none (or "Smoke test results: …" in a project log under `local/`).
|
||||
|
||||
---
|
||||
|
||||
## Risks / clarifications
|
||||
|
||||
Notes from the plan review that don't require code changes but are worth
|
||||
flagging:
|
||||
|
||||
- **Sphericity clamp is not redundant.** `r` is measured to grid-point centers while `V_pocket` counts whole cells, so `V_pocket` can exceed `V_bounding_sphere` for small pockets (and `r = 0` for a single point). Keep the `[0, 1]` clamp.
|
||||
- **Heavy Phase 4 integration test** — `PocketGridBuilderTest` uses
|
||||
`PrankFacade` to predict pockets, which is slow. Keep the integration
|
||||
test but also add a fast unit test that constructs `Pocket` instances
|
||||
manually with a synthetic `surfaceAtoms` set.
|
||||
- **Empty `pocket_descriptors`** — `-pocket_descriptors ""` (empty list)
|
||||
is supported: descriptors file emits only the base columns
|
||||
(`name, rank, score, [probability,] center_x/y/z`). Add a regression
|
||||
test in Phase 6.
|
||||
- **PDB residue-sequence column** is 4 chars (cols 23-26) → pockets are
|
||||
capped at rank 9999 in the PML output. Real pockets stay well under
|
||||
100; document the limit in the PML renderer's javadoc.
|
||||
- **CSV string quoting** added in Phase 1 fires only for STRING columns.
|
||||
Existing DOUBLE/INT writes stay unquoted — no CSV-format drift for
|
||||
SAS-points export. Mention this in the Phase 1 commit message.
|
||||
|
||||
## Out-of-scope (followups noted in spec)
|
||||
|
||||
- Per-residue descriptors.
|
||||
- `convex_hull` filler real implementation.
|
||||
- Pocket overlap matrix output file.
|
||||
- Long-format SAS-points export.
|
||||
@@ -1,353 +0,0 @@
|
||||
# Spec — Pocket grid points export + per-pocket descriptors
|
||||
|
||||
Status: spec, not plan. Author decisions captured in two rounds:
|
||||
|
||||
- **Initial 6 Qs:** (1) long-format grid CSV, (2) morph-closing proxy with
|
||||
strategy switch, (3) defaults OK, (4) separate descriptors file,
|
||||
(5) no standalone command, (6) initial descriptor menu accepted.
|
||||
- **20-audit cross-check vs. code:** see below; all 20 decisions are applied
|
||||
in this revision.
|
||||
|
||||
## Goals
|
||||
|
||||
Two new opt-in outputs, both produced by any `predict` or `rescore` run,
|
||||
plus an optional PyMOL visualization:
|
||||
|
||||
1. **`{outdir}/{name}_pocket_grid.{format}`** — regular 3D grid of points
|
||||
covering the empty space around the protein, in **long format**: one row
|
||||
per `(point, pocket)` pair. By default only **assigned** points are
|
||||
written (one or more rows per point, one per pocket they belong to).
|
||||
Unassigned points (`pocket = 0`) can be opted in with
|
||||
`pocket_grid_include_unassigned`.
|
||||
2. **`{outdir}/{name}_pocket_descriptors.{format}`** — one row per predicted
|
||||
pocket with score, rank, centroid, and an extensible list of
|
||||
geometric/chemical descriptors (volume from grid-point count, plus
|
||||
others).
|
||||
3. **`{outdir}/visualizations/{name}_pocket_grid.pml`** — optional PyMOL
|
||||
visualization, produced by a new renderer.
|
||||
|
||||
Both data files reuse the existing `TableExporter` (csv / csv.gz / csv.zst /
|
||||
arrow / arrow.gz / arrow.zst / parquet), matching the SAS-points export
|
||||
pattern documented at `documentation/export-points.md`. Decoupled from the
|
||||
prediction algorithm: P2Rank still scores SAS points exactly as today; the
|
||||
grid is a post-prediction geometric overlay used only for descriptor
|
||||
computation.
|
||||
|
||||
## Prerequisite refactor
|
||||
|
||||
**`TableData` and the three writers (`writeCsv`/`writeArrow`/`writeParquet`)
|
||||
must be extended to support a `STRING` column type** (audit #1). Currently
|
||||
`TableData` only accepts `DOUBLE` and `INT`
|
||||
(`src/main/groovy/cz/siret/prank/program/routines/predict/output/TableData.groovy:13-15`).
|
||||
Without this, the descriptors file's `name` column cannot be written.
|
||||
|
||||
Scope of the refactor:
|
||||
- Add `ColumnType.STRING` and a `String[] getStringColumn(int)` (or boxed
|
||||
`Object` access path) to `TableData`.
|
||||
- Extend `writeCsv` to emit strings with proper CSV quoting (escape `,`,
|
||||
`"`, newlines per RFC 4180).
|
||||
- Extend `writeArrow` to use `VarCharVector` for string columns.
|
||||
- Extend `writeParquet` to use `BINARY` (UTF8) primitive type for string
|
||||
columns.
|
||||
- Update `PointExportData` to declare its columns via the new type system
|
||||
(no functional change for SAS-point export — no strings used today).
|
||||
|
||||
## Algorithms
|
||||
|
||||
### Grid generation (once per protein)
|
||||
|
||||
1. Build a KdTree over `protein.proteinAtoms`
|
||||
(`protein.proteinAtoms.withKdTreeConditional()`). Note: when
|
||||
`CofactorHandler` is enabled, cofactor atoms are already merged into
|
||||
`proteinAtoms` (`Protein.groovy:571-583`) — no separate union step
|
||||
needed (audit #4).
|
||||
2. Bounding box around `protein.proteinAtoms`, expanded by
|
||||
`pocket_grid_max_dist` in every direction (reuses
|
||||
`Box.aroundAtoms(...).withMargin(...)`).
|
||||
3. Walk a regular cubic lattice with edge `pocket_grid_spacing` inside the
|
||||
box (reuses `GridGenerator.forBox(box, edge)`).
|
||||
4. Per-atom VdW radius via CDK `Elements` (audit #2). Reuse the same
|
||||
accessor pattern as `PatchedCdkNumericalSurface` — when CDK returns
|
||||
`null` for an element, fall back to the Krypton proxy (2.02 Å), matching
|
||||
the existing null-VdW workaround. Implemented as a small helper
|
||||
`VdwRadiusTable.get(Atom) → double`.
|
||||
5. For each lattice point:
|
||||
- **drop** if `min_dist_to_proteinAtoms < vdw_radius(nearest_atom) + pocket_grid_atom_buffer`
|
||||
— overlaps the protein;
|
||||
- **drop** if `min_dist_to_proteinAtoms > pocket_grid_max_dist` — too
|
||||
far from the surface;
|
||||
- **keep** otherwise.
|
||||
|
||||
**Implementation note** (audit #3): extend
|
||||
`GridGenerator.sampleGridPointsAroundAtoms` (`GridGenerator.java:157-172`)
|
||||
to accept both `minDist` (semantically per-atom: VdW + buffer) and
|
||||
`maxDist`. The current method already does the `maxDist` side; the
|
||||
extension is the per-atom-VdW exclusion check.
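
The keep/drop rule from the grid-generation steps can be sketched as below. This Java illustration scans atoms by brute force instead of using a KdTree, and `GridFilterSketch`, its nested `Atom` class and `sample` are hypothetical names; the parameters correspond to `pocket_grid_spacing`, `pocket_grid_max_dist` and `pocket_grid_atom_buffer`.

```java
import java.util.ArrayList;
import java.util.List;

final class GridFilterSketch {

    /** Minimal atom: center coordinates plus a VdW radius, all in angstroms. */
    static final class Atom {
        final double x, y, z, vdw;
        Atom(double x, double y, double z, double vdw) {
            this.x = x; this.y = y; this.z = z; this.vdw = vdw;
        }
    }

    static List<double[]> sample(List<Atom> atoms, double spacing,
                                 double maxDist, double atomBuffer) {
        List<double[]> kept = new ArrayList<double[]>();
        if (atoms.isEmpty()) return kept;

        // Bounding box around the atoms, expanded by maxDist in every direction.
        double minX = Double.POSITIVE_INFINITY, minY = minX, minZ = minX;
        double maxX = Double.NEGATIVE_INFINITY, maxY = maxX, maxZ = maxX;
        for (Atom a : atoms) {
            minX = Math.min(minX, a.x); maxX = Math.max(maxX, a.x);
            minY = Math.min(minY, a.y); maxY = Math.max(maxY, a.y);
            minZ = Math.min(minZ, a.z); maxZ = Math.max(maxZ, a.z);
        }
        minX -= maxDist; minY -= maxDist; minZ -= maxDist;
        maxX += maxDist; maxY += maxDist; maxZ += maxDist;

        int nx = (int) Math.floor((maxX - minX) / spacing + 1e-9);
        int ny = (int) Math.floor((maxY - minY) / spacing + 1e-9);
        int nz = (int) Math.floor((maxZ - minZ) / spacing + 1e-9);

        for (int i = 0; i <= nx; i++)
            for (int j = 0; j <= ny; j++)
                for (int k = 0; k <= nz; k++) {
                    double x = minX + i * spacing, y = minY + j * spacing, z = minZ + k * spacing;
                    Atom nearest = null;
                    double best = Double.POSITIVE_INFINITY;
                    for (Atom a : atoms) {
                        double dx = x - a.x, dy = y - a.y, dz = z - a.z;
                        double d = Math.sqrt(dx * dx + dy * dy + dz * dz);
                        if (d < best) { best = d; nearest = a; }
                    }
                    if (best < nearest.vdw + atomBuffer) continue; // overlaps the protein
                    if (best > maxDist) continue;                  // too far from the surface
                    kept.add(new double[] {x, y, z});
                }
        return kept;
    }
}
```

As in the spec text, the VdW radius used for the exclusion check is that of the nearest atom by center distance.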
|
||||
|
||||
### Per-pocket assignment (multi-valued)
|
||||
|
||||
1. For each pocket `p`, take all kept grid points within
|
||||
`pocket_grid_assign_cutoff` of any atom in `p.surfaceAtoms`. That's the
|
||||
*raw shell* — analogous to `SwinSiteLoader`'s `cutoutShell` at
|
||||
`SwinSiteLoader.groovy:92-100`.
|
||||
2. **Shape fill** (pluggable via `pocket_grid_fill`, runs **per-pocket** —
|
||||
each pocket's raw shell is dilated independently, audit #6):
|
||||
- `morph_closing` (default): morphological closing on the lattice. Mark
|
||||
any unassigned lattice cell whose 6-/18-/26-neighborhood contains
|
||||
≥ `pocket_grid_fill_min_neighbors` already-assigned cells; iterate
|
||||
until stable or `pocket_grid_fill_max_iters` reached. Integer-grid
|
||||
native, no extra deps.
|
||||
- `convex_hull`: build the 3D convex hull of the raw shell (Quickhull or
|
||||
equivalent — TBD at plan time); include every lattice point inside.
|
||||
Exact; pulls a hull dependency.
|
||||
- `none`: keep the raw shell exactly.
|
||||
|
||||
The `PocketShapeFiller` strategy interface (see Extensibility) makes
|
||||
adding alternatives a single-file change.
|
||||
3. A grid point may belong to multiple pockets. In the output file each
|
||||
`(point, pocket)` membership is a separate row.
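
The `morph_closing` fill described in step 2 can be sketched as an iterated neighbor-count pass. The version below is a hedged illustration only: `NeighborFillSketch` is a hypothetical name, cells are encoded as integer index triples, and only the 6-neighborhood is used (the spec leaves the 6-/18-/26-neighborhood choice open).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the neighbor-count fill; not the shipped MorphologicalCloser. */
final class NeighborFillSketch {

    // Assumption: face-adjacent (6-neighborhood) offsets only.
    private static final int[][] FACE_NEIGHBORS = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };

    /** Lattice cell as an (i, j, k) index triple; List gives value equality for free. */
    static List<Integer> cell(int i, int j, int k) {
        return Arrays.asList(i, j, k);
    }

    /**
     * Grows rawShell over keptCells: an unassigned kept cell joins the pocket when
     * at least minNeighbors of its face neighbors are already assigned. Repeats
     * until a pass adds nothing or maxIters passes have run.
     */
    static Set<List<Integer>> fill(Set<List<Integer>> rawShell, Set<List<Integer>> keptCells,
                                   int minNeighbors, int maxIters) {
        Set<List<Integer>> assigned = new HashSet<>(rawShell);
        for (int iter = 0; iter < maxIters; iter++) {
            Set<List<Integer>> added = new HashSet<>();
            for (List<Integer> c : keptCells) {
                if (assigned.contains(c)) continue;
                int count = 0;
                for (int[] d : FACE_NEIGHBORS) {
                    if (assigned.contains(cell(c.get(0) + d[0], c.get(1) + d[1], c.get(2) + d[2]))) {
                        count++;
                    }
                }
                if (count >= minNeighbors) added.add(c);
            }
            if (added.isEmpty()) break; // stable: nothing changed in this pass
            assigned.addAll(added);     // apply additions only after the full pass
        }
        return assigned;
    }
}
```

Collecting additions per pass and applying them afterwards keeps the result independent of iteration order over `keptCells`, which matters for the determinism the output schema asks for.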
|
||||
|
||||
### Descriptor computation
|
||||
|
||||
After assignment, for each pocket and each name in `pocket_descriptors`,
|
||||
look up the registered `PocketDescriptor` and compute. See "Initial
|
||||
descriptor menu" below.
|
||||
|
||||
## New params (additions to `Params.groovy`)
|
||||
|
||||
All carry `@RuntimeParam` (audit #7) — runtime / output concerns, not
|
||||
training.
|
||||
|
||||
Allowed values for `pocket_grid_format` (audit #8, enumerated explicitly to
|
||||
avoid drift): `csv`, `csv.gz`, `csv.zst`, `arrow`, `arrow.gz`, `arrow.zst`,
|
||||
`parquet`.
|
||||
|
||||
| Param | Default | Notes |
|---|---|---|
| `export_pocket_grid` | `false` | gate for the grid-points file |
| `export_pocket_descriptors` | `false` | gate for the descriptors file |
| `export_pocket_grid_pml` | `false` | gate for the PyMOL visualization; requires `export_pocket_grid=true` (fail-fast otherwise, audit #16) |
| `pocket_grid_format` | `"csv"` | one of the enumerated values above |
| `pocket_grid_include_unassigned` | `false` | include `pocket = 0` rows in the grid file |
| `pocket_grid_spacing` | `1.0` (Å) | lattice edge; volume scales with this³ |
| `pocket_grid_max_dist` | `6.0` (Å) | upper bound: nearest-atom distance to keep a grid point |
| `pocket_grid_atom_buffer` | `0.5` (Å) | additive buffer on per-atom VdW exclusion: keep if `dist > vdw_radius(atom) + buffer` (audit #9) |
| `pocket_grid_assign_cutoff` | `4.5` (Å) | membership cutoff vs. `pocket.surfaceAtoms`; matches `SwinSiteLoader.SURFACE_ATOMS_CUTOFF` |
| `pocket_grid_fill` | `"morph_closing"` | one of `morph_closing`, `convex_hull`, `none` |
| `pocket_grid_fill_min_neighbors` | `3` | morph_closing only: neighbor count threshold |
| `pocket_grid_fill_max_iters` | `5` | morph_closing only: guard against runaway dilation |
| `pocket_descriptors` | `["volume"]` | list-param; each name selects a registered descriptor |
|
||||
|
||||
**Validation** (audit #10): unknown values in `pocket_descriptors`,
|
||||
`pocket_grid_fill`, and `pocket_grid_format`, plus the
|
||||
`export_pocket_grid_pml ⇒ export_pocket_grid` invariant, are checked at
|
||||
Main startup. Same pattern as the cofactor validation at
|
||||
`Main.groovy:142-153`.
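
A minimal sketch of these startup checks follows. It is written in plain Java rather than Groovy, and the class name `PocketGridParamCheckSketch`, the method signature, and the error messages are assumptions for illustration; only the allowed values are taken from this spec.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical fail-fast validation sketch; not the actual Main.groovy code. */
final class PocketGridParamCheckSketch {

    static final Set<String> FORMATS = new HashSet<>(Arrays.asList(
        "csv", "csv.gz", "csv.zst", "arrow", "arrow.gz", "arrow.zst", "parquet"));
    static final Set<String> FILLS = new HashSet<>(Arrays.asList(
        "morph_closing", "convex_hull", "none"));
    static final Set<String> DESCRIPTORS = new HashSet<>(Arrays.asList(
        "volume", "sphericity", "num_residues", "num_surface_atoms"));

    /** Throws IllegalArgumentException on the first violated constraint. */
    static void validate(String format, String fill, List<String> descriptors,
                         boolean exportGrid, boolean exportGridPml) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unknown pocket_grid_format: " + format);
        }
        if (!FILLS.contains(fill)) {
            throw new IllegalArgumentException("Unknown pocket_grid_fill: " + fill);
        }
        for (String d : descriptors) {
            if (!DESCRIPTORS.contains(d)) {
                throw new IllegalArgumentException("Unknown pocket descriptor: " + d);
            }
        }
        // Invariant: the PyMOL grid script needs the grid export it visualizes.
        if (exportGridPml && !exportGrid) {
            throw new IllegalArgumentException(
                "export_pocket_grid_pml requires export_pocket_grid");
        }
    }
}
```

Running all checks before any protein is loaded means a typo in a param value fails in milliseconds instead of after a long prediction run.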
|
||||
|
||||
## Output schemas
|
||||
|
||||
### `{name}_pocket_grid.{format}` (long format)
|
||||
|
||||
| Column | Type | Description |
|---|---|---|
| `x`, `y`, `z` | f64 | grid point coordinate |
| `pocket` | i32 | pocket rank this row belongs to; `0` only present if `pocket_grid_include_unassigned` is on |
|
||||
|
||||
**Sort order** (audit #5): rows sorted by `pocket` asc, then `x` asc,
|
||||
`y` asc, `z` asc. `pocket=0` (if enabled) goes last so readers that only
|
||||
care about assigned points can stop early. Deterministic and reproducible
|
||||
across runs.
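
The ordering above, including the "`pocket=0` goes last" exception, can be expressed as a single comparator. This is a sketch under assumptions: `GridRowOrderSketch` and its `Row` type are hypothetical and stand in for whatever row representation the exporter uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the grid-file row ordering. */
final class GridRowOrderSketch {

    /** One (point, pocket) membership row; pocket == 0 means unassigned. */
    static final class Row {
        final double x, y, z;
        final int pocket;
        Row(double x, double y, double z, int pocket) {
            this.x = x; this.y = y; this.z = z; this.pocket = pocket;
        }
    }

    /** pocket asc with 0 mapped past every real rank, then x, y, z asc. */
    static final Comparator<Row> ORDER =
        Comparator.<Row>comparingInt(r -> r.pocket == 0 ? Integer.MAX_VALUE : r.pocket)
                  .thenComparingDouble(r -> r.x)
                  .thenComparingDouble(r -> r.y)
                  .thenComparingDouble(r -> r.z);

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Row> sorted(List<Row> rows) {
        List<Row> out = new ArrayList<>(rows);
        out.sort(ORDER);
        return out;
    }
}
```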
|
||||
|
||||
### `{name}_pocket_descriptors.{format}`
|
||||
|
||||
Base columns (always present), then one column per name in
|
||||
`pocket_descriptors`:
|
||||
|
||||
| Column | Type | Source |
|---|---|---|
| `name` | string | `pocket.name` (requires `TableData` STRING support, prerequisite refactor) |
| `rank` | i32 | `pocket.rank` |
| `score` | f64 | `pocket.score` |
| `probability` | f64 | from score transformer; **column omitted entirely** when no transformer ran |
| `center_x`, `center_y`, `center_z` | f64 | `pocket.centroid` |
| `<descriptor>` | f64 / i32 | one per requested descriptor |
|
||||
|
||||
**`probability` column inclusion** (audit #19): controlled by a constructor
|
||||
flag on the export-data class, mirroring `PointExportData.includeScore`
|
||||
(`PointExportData.groovy:47-48`). Schema is fixed at construction; no
|
||||
runtime branching on row write.
|
||||
|
||||
## Initial descriptor menu
|
||||
|
||||
Shipped registry:
|
||||
|
||||
| Name | Output | Definition |
|---|---|---|
| `volume` | f64 (Å³) | `\|assigned grid points\| × pocket_grid_spacing³` |
| `sphericity` | f64 in [0, 1] | `V_pocket / V_bounding_sphere`, where `V_bounding_sphere = (4/3)π · r³` with `r = max(\|p − centroid\|)` over the pocket's grid points. Quantization-free; 1 = perfect sphere. (audit #18: replaces the boundary-area formula) |
| `num_residues` | i32 | `pocket.residues.size()` (reuses existing accessor, audit #17) |
| `num_surface_atoms` | i32 | `pocket.surfaceAtoms.count` |
|
||||
|
||||
`volume` is the default value of `pocket_descriptors`. Others must be opted
|
||||
in by name.
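
The two geometric descriptors can be sketched directly from the table's formulas. Several details below are assumptions not fixed by this spec: the class name `PocketShapeDescriptorsSketch`, taking the centroid as the mean of the pocket's grid points, returning 0 for degenerate pockets (no points, or all points coincident), and clamping to 1 because very small point sets can have a lattice volume larger than their bounding sphere.

```java
import java.util.List;

/** Hypothetical sketch of the volume and sphericity descriptors. */
final class PocketShapeDescriptorsSketch {

    /** volume = |assigned grid points| x spacing^3, in cubic Å. */
    static double volume(int numPoints, double spacing) {
        return numPoints * spacing * spacing * spacing;
    }

    /** sphericity = V_pocket / V_bounding_sphere, clamped to [0, 1] (assumption). */
    static double sphericity(List<double[]> points, double spacing) {
        if (points.isEmpty()) return 0.0;
        double cx = 0, cy = 0, cz = 0;
        for (double[] p : points) { cx += p[0]; cy += p[1]; cz += p[2]; }
        int n = points.size();
        cx /= n; cy /= n; cz /= n; // assumption: centroid of the grid points

        double maxSq = 0;
        for (double[] p : points) {
            double dx = p[0] - cx, dy = p[1] - cy, dz = p[2] - cz;
            maxSq = Math.max(maxSq, dx * dx + dy * dy + dz * dz);
        }
        double r = Math.sqrt(maxSq);
        if (r == 0.0) return 0.0; // single point: bounding sphere is degenerate

        double sphereVolume = 4.0 / 3.0 * Math.PI * r * r * r;
        return Math.min(1.0, volume(n, spacing) / sphereVolume);
    }
}
```

For a 5×5×5 block of points at 1 Å spacing the bounding radius is √12 Å, so the ratio is 125 / ((4/3)π · 12^1.5), which shows that a cube scores noticeably below 1.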
|
||||
|
||||
## Extensibility
|
||||
|
||||
All new Groovy classes carry `@CompileStatic` and `@Slf4j` per repo
|
||||
convention (audit #20).
|
||||
|
||||
```
src/main/groovy/cz/siret/prank/program/routines/predict/output/descriptors/
├── PocketDescriptor.groovy  # interface: String name(); Object compute(PocketGridContext ctx)
├── PocketDescriptorRegistry.groovy  # name → factory; selection from Params.pocket_descriptors
├── VolumeDescriptor.groovy
├── SphericityDescriptor.groovy
├── NumResiduesDescriptor.groovy
└── NumSurfaceAtomsDescriptor.groovy

src/main/groovy/cz/siret/prank/program/routines/predict/output/grid/
├── PocketGrid.groovy  # data: kept points + per-pocket assignment map
├── PocketGridBuilder.groovy  # generation + assignment + fill orchestration
├── VdwRadiusTable.groovy  # Atom → double, via CDK Elements + Krypton fallback
└── fill/
    ├── PocketShapeFiller.groovy  # interface: Set<Point> fill(rawShell, allPoints, params)
    ├── MorphologicalCloser.groovy
    ├── ConvexHullFiller.groovy  # may be stub initially
    └── NoOpFiller.groovy
```
|
||||
|
||||
`PocketGridContext` exposes: the per-pocket grid-point set, the global
|
||||
grid, the pocket, the protein, and `Params`. Adding a descriptor = drop one
|
||||
file in `descriptors/` + register the name. Adding a fill strategy = drop
|
||||
one file in `fill/` + extend the enum.
|
||||
|
||||
## Pocket grid visualization
|
||||
|
||||
Output:
|
||||
- `{outdir}/visualizations/data/{name}_pocket_grid.pdb.gz` — one HETATM per
|
||||
grid point; pocket rank stored in the residue-sequence column (mirrors
|
||||
`writeLabeledPointsPdb` at `PredictionVisualizer.groovy:44-56`); generated
|
||||
in long format (one HETATM per `(point, pocket)` pair so PyMOL can split
|
||||
by residue).
|
||||
- `{outdir}/visualizations/{name}_pocket_grid.pml` — small PyMOL script
|
||||
that `load`s the PDB and colors by residue.
|
||||
|
||||
This **PDB-sidecar approach** (audit #11) replaces the earlier inline
|
||||
`pseudoatom`-per-point design — at ~20k–100k grid points the inline
|
||||
approach would take seconds-to-minutes for PyMOL to load.
|
||||
|
||||
**Renderer:**
|
||||
`src/main/groovy/cz/siret/prank/program/visualization/renderers/PocketGridPymolRenderer.groovy`,
|
||||
parallel to `PymolRenderer` / `ChimeraXRenderer`. Takes the in-memory
|
||||
`PocketGrid` (not the CSV file — the grid is already in memory and the PDB
|
||||
sidecar is derived from it, audit #15 makes the format constraint moot).
|
||||
|
||||
**Colors:** reuse `PredictionVisualizer.generatePocketColors(numPockets)`
|
||||
(`PredictionVisualizer.groovy:38`) so the grid PML matches the main pocket
|
||||
PML palette (audit #13).
|
||||
|
||||
**Layout in the PML:**
|
||||
- `load .../data/{name}_pocket_grid.pdb.gz, pocket_grid`
|
||||
- Per pocket: `create pocket_grid_<rank>, pocket_grid and resi <rank>` and
|
||||
`color <hex>, pocket_grid_<rank>`.
|
||||
- `show spheres, pocket_grid_*` with small `sphere_scale` (e.g. 0.3).
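
The layout above amounts to a short list of PyMOL commands. The sketch below only assembles those command strings; `PocketGridPmlSketch`, its method signature, and passing colors as ready-made PyMOL color strings are assumptions, not the renderer's actual interface. It iterates over the ranks that are present rather than counting `1..maxRank`, so a gap in the ranks produces no empty object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch that builds the grid PML as a list of command lines. */
final class PocketGridPmlSketch {

    /**
     * @param pdbPath     path of the PDB sidecar, relative to the PML file
     * @param colorByRank pocket rank mapped to a PyMOL color string (e.g. "0xff0000")
     * @param sphereScale value for PyMOL's sphere_scale setting
     */
    static List<String> script(String pdbPath, Map<Integer, String> colorByRank,
                               double sphereScale) {
        List<String> lines = new ArrayList<>();
        lines.add("load " + pdbPath + ", pocket_grid");
        // TreeMap gives ascending rank order regardless of the input map type.
        for (Map.Entry<Integer, String> e : new TreeMap<>(colorByRank).entrySet()) {
            String obj = "pocket_grid_" + e.getKey();
            lines.add("create " + obj + ", pocket_grid and resi " + e.getKey());
            lines.add("color " + e.getValue() + ", " + obj);
        }
        lines.add("set sphere_scale, " + sphereScale);
        lines.add("show spheres, pocket_grid_*");
        return lines;
    }
}
```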
|
||||
|
||||
**Path layout** (audit #12): data files (`_pocket_grid.{fmt}`,
|
||||
`_pocket_descriptors.{fmt}`) at the root of `outdir`, matching the SAS
|
||||
points export. Visualization artifacts (`_pocket_grid.pdb.gz`,
|
||||
`_pocket_grid.pml`) under `visualizations/` / `visualizations/data/`,
|
||||
matching the existing main-PML layout.
|
||||
|
||||
**Master visualization switch** (audit #14): respects `visualizations=false`
|
||||
— if visualizations are globally off, the grid PML + PDB sidecar are
|
||||
skipped even when `export_pocket_grid_pml=true`. Single off-switch for ALL
|
||||
viz.
|
||||
|
||||
**Independence from `vis_renderers`:** the new renderer has its own gate
|
||||
(`export_pocket_grid_pml`) and does *not* tie into the
|
||||
`["pymol", "chimerax"]` renderer list. The grid PML is a power-user output
|
||||
that shouldn't be implicit. Easy to revisit if usage patterns argue
|
||||
otherwise.
|
||||
|
||||
## CLI examples
|
||||
|
||||
```bash
# grid + default descriptors (just volume), parquet
prank predict -f protein.pdb -export_pocket_grid 1 -export_pocket_descriptors 1 \
      -pocket_grid_format parquet

# custom descriptor list + tighter grid
prank predict dataset.ds -export_pocket_descriptors 1 \
      -pocket_descriptors "volume,sphericity,num_residues,num_surface_atoms" \
      -pocket_grid_spacing 0.75 -pocket_grid_max_dist 5

# rescore with grid export, arrow.zst
prank rescore fpocket.ds -export_pocket_grid 1 -pocket_grid_format arrow.zst

# switch fill strategy (e.g. for ablation studies)
prank predict -f protein.pdb -export_pocket_grid 1 -pocket_grid_fill none

# grid CSV + PyMOL visualization
prank predict -f protein.pdb -export_pocket_grid 1 -export_pocket_grid_pml 1

# also keep the unassigned envelope (e.g. for debugging the grid generator)
prank predict -f protein.pdb -export_pocket_grid 1 -pocket_grid_include_unassigned 1
```
|
||||
|
||||
## Files touched (preview, plan will refine)
|
||||
|
||||
New:
|
||||
- `descriptors/` and `grid/` packages as above
|
||||
- `PocketGridExporter.groovy` + `PocketDescriptorsExporter.groovy` next to
|
||||
`PointsExporter.groovy`
|
||||
- `PocketGridExportData` / `PocketDescriptorsExportData` data classes next
|
||||
to `PointExportData.groovy`
|
||||
- `PocketGridPymolRenderer.groovy` under `program/visualization/renderers/`
|
||||
- Tests next to each new class
|
||||
- **`documentation/export-pocket-grid.md`** — user-facing how-to for the
|
||||
grid file: algorithm summary, sort order, params, format options, CLI
|
||||
examples, PyMOL visualization details
|
||||
- **`documentation/export-pocket-descriptors.md`** — descriptors file
|
||||
format, descriptor catalog with formulas, extensibility for adding new
|
||||
descriptors
|
||||
|
||||
Modified:
|
||||
- `Params.groovy`: 13 new `@RuntimeParam` fields (one per row of the params table above)
|
||||
- `Main.groovy` — startup validation hooks for `pocket_descriptors`,
|
||||
`pocket_grid_fill`, `pocket_grid_format`, and the
|
||||
`export_pocket_grid_pml ⇒ export_pocket_grid` invariant
|
||||
- `PredictPocketsRoutine.groovy` + `RescorePocketsRoutine.groovy` — wire
|
||||
the new exporters and renderer at the same hook point as
|
||||
`PointsExporter.tryExportPoints`
|
||||
- `TableData.groovy` + `TableExporter.groovy` + `PointExportData.groovy` —
|
||||
STRING column-type support (prerequisite refactor)
|
||||
- `GridGenerator.java` — extend `sampleGridPointsAroundAtoms` to accept a
|
||||
per-atom minDist (VdW + buffer) alongside the existing maxDist
|
||||
- `documentation/export-points.md` — cross-reference the two new docs from
|
||||
the "See also" section
|
||||
|
||||
Not touched:
|
||||
- `PredictionSummary.toCSV()` / `predictions.csv` schema — descriptors live
|
||||
in their own file.
|
||||
- `PocketStats.realVolumeApprox` — keep as-is; SwinSite still uses it. The
|
||||
new grid-volume is independent.
|
||||
|
||||
## Scope notes
|
||||
|
||||
- Cofactor atoms participate in the bounding box and the VdW exclusion via
|
||||
their inclusion in `protein.proteinAtoms`. They do **not** affect
|
||||
`pocket.surfaceAtoms` membership for assignment — the existing pocket
|
||||
surface-atom set defines membership.
|
||||
- Outputs are computed *after* score transformation so `probability` is
|
||||
available when applicable.
|
||||
- `breaking-changes.md` (2.7 or whenever this ships) gets a bullet for the
|
||||
new param family and the new output files.
|
||||
|
||||
## Followups / not in this spec
|
||||
|
||||
- Per-residue descriptors (different file, different aggregation).
|
||||
- Pocket overlap matrix (cheap byproduct of the long-format grid file —
|
||||
group-by `pocket` and intersect, or compute eagerly and dump as
|
||||
`{name}_pocket_overlap.csv`).
|
||||
- Long-format SAS-points export (parallel change, separate spec).
|
||||
- Real-3D-hull `convex_hull` filler (initial ship may stub it).
|
||||
@@ -204,10 +204,7 @@ class Main implements Parametrized, Writable {
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Fail-fast validation for the pocket-grid export feature
|
||||
* (see misc/todo/pocket_grid/SPEC.md).
|
||||
*/
|
||||
/** Fail-fast validation for the pocket-grid export feature. */
|
||||
private void validatePocketGridParams() {
|
||||
// pocket_grid_format must be one of the values supported by TableExporter.
|
||||
Set<String> allowedFormats = ['csv', 'csv.gz', 'csv.zst',
|
||||
|
||||
@@ -156,8 +156,7 @@ class PredictPocketsRoutine extends Routine {
|
||||
new GetcleftOutputCalculator().generateGetcleftSasPdbFiles(pair.prediction, outdir)
|
||||
}
|
||||
|
||||
// Pocket grid + descriptors export + optional PyMOL viz
|
||||
// (see misc/todo/pocket_grid/SPEC.md).
|
||||
// Pocket grid + descriptors export + optional PyMOL viz.
|
||||
PocketGridOutputs.exportIfEnabled(pair.prediction, item.protein, outdir, item.label)
|
||||
}
|
||||
|
||||
|
||||
@@ -129,8 +129,7 @@ class RescorePocketsRoutine extends Routine {
|
||||
// Export SAS points with feature vectors and scores (pocket points only in rescore mode)
|
||||
PointsExporter.tryExportPoints(rescorer.exportData, outdir, item.label)
|
||||
|
||||
// Pocket grid + descriptors export + optional PyMOL viz
|
||||
// (see misc/todo/pocket_grid/SPEC.md).
|
||||
// Pocket grid + descriptors export + optional PyMOL viz.
|
||||
PocketGridOutputs.exportIfEnabled(pair.prediction, item.protein, outdir, item.label)
|
||||
}
|
||||
|
||||
|
||||