p2rank

mirror of https://github.com/rdk/p2rank.git synced 2026-06-04 12:44:24 +08:00

Author	SHA1	Message	Date
rdk	af1d6eeb18	Drop frozen pocket-grid PLAN/SPEC; refine audit punch-list PLAN.md and SPEC.md were pre-implementation design docs for the pocket-grid feature. The feature has shipped, so they're frozen artifacts in the active todo/ namespace. Delete them and strip the three "see SPEC.md" comments that pointed at SPEC.md from Main.groovy and the predict/rescore routines. Also reassess the PyMOL rank-gap entry in the audit: P2Rank ranks pockets contiguously throughout the predict path and all in-tree loaders (except SiteHoundLoader), so the previously-listed "renderer ignores rank gaps" is cosmetic-only (empty objects in the Models panel for small pockets whose filled BitSet ended up empty). Downgrade to a parity nit under Inconsistencies; promote the PUResNet surfaceAtoms re-linking to the Top-5.	2026-05-20 19:42:47 +02:00
rdk	556ea9faa8	Cleanup batch: BitSet reuse + idiom touches (no user-visible perf change) Pure technical cleanup, not a perf win — savings are microseconds per protein. The useful artifact is item 2 below: bytecode verification that the existing Groovy/BitSet workaround is still needed. - MorphologicalCloser: pre-allocate the two per-iteration BitSets and reuse via swap+clear. Zero BitSet allocations inside the loop (vs two per iter previously). - PocketGridRows: tried replacing the manual BitSet-OR loop with a direct .or(bs) call. Bytecode inspection showed Groovy dispatches it under @CompileStatic to DefaultGroovyMethods.or(BitSet, BitSet) which RETURNS a new BitSet rather than mutating in place — test failed. Reverted; updated the comment with the verification and the escape hatch (move the block into a Java helper if we ever want BitSet#or). - PocketGridChimeraXRenderer: palette color loop iterates the present rank set (perPocketBasenames.keySet()) instead of dense 1..maxRank, matching the layer loops below and avoiding unreferenced color definitions for missing ranks. - PocketDescriptorsRows: replaced `pockets.any { ... }` Groovy closure with manual loop under @CompileStatic — consistent with the rest of the constructor and one fewer per-protein closure allocation. - DescriptorListValidator: HashSet → LinkedHashSet for the dedup tracker. Tiny UX improvement (deterministic order in any future multi-duplicate debug output). Output byte-identical end-to-end; full test suite green.	2026-05-19 19:59:29 +02:00
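The commit reports that under @CompileStatic, `.or(bs)` resolved to `DefaultGroovyMethods.or(BitSet, BitSet)`, which returns a new BitSet. In plain Java, `BitSet#or` is a void method that mutates the receiver, and that is what the swap+clear reuse in MorphologicalCloser relies on. Below is a minimal sketch of that reuse pattern. It uses a hypothetical 1D dilation as a stand-in for the real 3D closing step, and the class and method names are illustrative, not P2Rank's:

```java
import java.util.BitSet;

// Toy 1D dilation: each iteration grows every set bit by one position left and right.
// Two BitSets are allocated up front and reused via swap + clear, so the loop body allocates nothing.
final class BitSetReuse {
    static BitSet dilate(BitSet seed, int length, int iters) {
        BitSet cur = (BitSet) seed.clone();   // never mutate the caller's seed
        BitSet next = new BitSet(length);
        for (int it = 0; it < iters; it++) {
            next.clear();                     // recycle the spare buffer instead of allocating
            next.or(cur);                     // java.util.BitSet#or mutates 'next' in place (void)
            for (int i = cur.nextSetBit(0); i >= 0; i = cur.nextSetBit(i + 1)) {
                if (i > 0) next.set(i - 1);
                if (i + 1 < length) next.set(i + 1);
            }
            BitSet tmp = cur;                 // swap roles: 'next' becomes the current state
            cur = next;
            next = tmp;
        }
        return cur;
    }
}
```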
rdk	6c1e394ea5	Extract three framework helpers (centroid, schema, registry) Tier 3 reuse refactor: collapse ~120 lines of duplication across the descriptor framework. Composition over inheritance throughout — no public API change, no behavior change (smoke run output byte-identical). NamedRegistryHelper<T> (new, generic): - Composition helper for name-keyed registries. Both descriptor registries (per-pocket and per-grid-point) now delegate register/ unregister/get/knownNames to one shared helper, keeping their public static API. Per-registry invariants (the size/dup-cols check) stay in each registry's private validate() and plug in via a Consumer<T> hook. PocketDescriptorRegistry shrinks ~80→55 lines; PocketGridPointDescriptorRegistry shrinks ~75→55. DescriptorSchemaHelper.appendColumns (new): - Single point where the "{name}.{col}" multi-column header rule lives. Both PocketDescriptorsRows and PocketGridRows route schema build through it. Interface-agnostic (takes name + colNames + colTypes directly), so it works for both descriptor types without coupling. GridPointStats.centroid (new): - Static helper for the centroid loop duplicated across SphericityDescriptor, RadiusOfGyramentDescriptor, and PrincipalMomentsDescriptor. Three descriptors each had the same BitSet → allPoints centroid pass; now one method call. Skipped from the same plan (per Tier-3+4 reconsideration): - vis_renderers validator merge (item 13): semantic mismatch (null handling, error wording) makes the abstraction lossy. - AbstractVolsiteGridPointDescriptor base (item 16): two impls is below the threshold where a shared base earns its keep. - Pre-classify protein atoms, per-point cache, Params hoist (items 18-20): real wins on the volsite hot path but speculative without a benchmarked workload. Defer until someone reports volsite descriptor compute as a bottleneck.	2026-05-19 16:23:09 +02:00
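Here is a sketch of the "{name}.{col}" header rule. It assumes a descriptor counts as scalar when it has exactly one column. Two simplifications apply: the real DescriptorSchemaHelper.appendColumns also takes column types, and the duplicate-column check actually lives in the registries, so both are folded together here. `SchemaSketch` is a hypothetical name:

```java
import java.util.List;
import java.util.Set;

// Header rule: a one-column descriptor emits its bare name; a multi-column descriptor emits
// "{name}.{col}" for each column. A repeated header fails fast instead of silently shadowing.
final class SchemaSketch {
    static void appendColumns(List<String> header, Set<String> seen, String name, List<String> colNames) {
        if (colNames.size() == 1) {
            addUnique(header, seen, name);                    // scalar: no prefix
        } else {
            for (String col : colNames) {
                addUnique(header, seen, name + "." + col);    // multi-column: prefixed
            }
        }
    }

    private static void addUnique(List<String> header, Set<String> seen, String col) {
        if (!seen.add(col)) {
            throw new IllegalStateException("duplicate column: " + col);
        }
        header.add(col);
    }
}
```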
rdk	6fad858bc6	Audit follow-ups: bug fix, doc refresh, exception taxonomy, test hardening Bug fix: - PrincipalMomentsDescriptor.clampNonNegative now also clamps NaN. The v<0 check was false for NaN, so a NaN eigenvalue (possible if a future code path bypasses GridGenerator.isFiniteBox) would have propagated to the CSV output. Doc refresh: - breaking-changes.md: 2.6 entry for the multi-column descriptor migration + the -vis_pocket_grid / pocket_grid_vis_* renames. - export-pocket-descriptors.md: step 4 rewrites a self-contradicting rationale: adding to the default list IS a breaking change for index-based parsers; recommends parse-by-name + breaking-changes.md note for future additions. - export-pocket-grid.md: added "Adding a new per-grid-point descriptor" recipe (parallel to the per-pocket one); unified √3/2 precision to 0.866 across docs and Params.groovy. - README.md: added an "Opt-in tabular exports" subsection mentioning -export_pocket_descriptors, -export_pocket_grid, -vis_pocket_grid. - testsets.sh "Full descriptor menu" now lists all seven shipped descriptors (was six). Exception taxonomy: - PocketDescriptorsRows.groovy and PocketGridBuilder.java now throw PrankException (was IllegalArgumentException) for user-facing config errors, matching the rest of the codebase. Registry hardening: - Both PocketDescriptorRegistry and PocketGridPointDescriptorRegistry now assert columnNames.size() == columnTypes.size() in register(). A future descriptor with mismatched lists fails fast at class-load. Quality fixes: - PocketGridRows.getColumn uses BASE_COLS-1 instead of literal 3 for the pocket column. Removed dead 2-arg PocketGridRows constructor (only 3 test sites used it; now inlined). - PocketGridPointContext gets a compact-constructor validator that rejects negative pointIndex/pocketRank, limiting the blast radius of an int-arg swap. Test hardening: - VolsiteSmoothGridPointDescriptorTest + VolsiteGridPointDescriptorTest now pin sigma/radius in @BeforeEach AND restore in @AfterEach, so the Params singleton is clean for subsequent test classes. - New tests: HIS ND1 double-flag (single atom setting donor+acceptor), PrincipalMoments at cardinality=2, PrincipalMoments two coincident points, GridGenerator NaN-box throw, PocketDescriptorRegistry register/unregister round-trip, MorphologicalCloser maxIters=1. - Renamed respectsMaxIters → maxItersZeroIsNoOp (the test only covered the maxIters=0 case despite the general name); added maxIters=1 companion that verifies one iteration of fill actually runs. - Extracted RendererTestFixtures.tinyGrid (was byte-identical in both renderer test files); unified the volsite atomAt signatures so the parameter order can't get swapped between the two volsite tests.	2026-05-19 15:36:12 +02:00
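Every ordered comparison involving NaN is false, so a `v < 0` check lets NaN straight through. Below is a minimal sketch of the fix. It assumes NaN is clamped to 0.0 like a negative eigenvalue; the commit does not state the replacement value, and the class name is hypothetical:

```java
// NaN fails every ordered comparison, so "v < 0" is false for NaN.
final class Clamp {
    static double clampNaive(double v) {
        return v < 0 ? 0.0 : v;       // NaN < 0 is false, so NaN is returned unchanged
    }

    static double clampNonNegative(double v) {
        return (v >= 0) ? v : 0.0;    // false for negatives AND for NaN, so both become 0.0
    }
}
```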
rdk	cb6f7f75eb	Doc / comment refresh after the multi-column descriptor migration - Params.groovy: pocket_descriptors javadoc now lists all 7 shipped descriptors (was: 6); softens the "essentially free" rationale to acknowledge principal_moments' small eigendecomposition cost. - PocketDescriptorsTest.groovy: class javadoc "six shipped descriptors" → "seven", names principal_moments alongside the rest. - export-pocket-descriptors.md: "6 base shipped descriptors use this adapter" → "6 of 7 use the adapter; principal_moments (multi-column) implements PocketDescriptor directly". Removes a misleading count. - export-pocket-{grid,descriptors}.md: default-list rationale no longer claims adding descriptors is "essentially free" — clarifies that grid-derived scalars are cheap once the grid is built but principal_moments adds a small per-pocket compute on top, still negligible vs the grid build. Caught by deep audit of 60220d7a..73e7c9df focused on doc/comment drift after the recent multi-column interface migration.	2026-05-19 14:41:03 +02:00
rdk	73e7c9df9a	Per-pocket descriptors: multi-column interface + PrincipalMomentsDescriptor Unifies the per-pocket descriptor framework with the per-grid-point framework: same shape (name + columnNames + columnTypes + double[] compute), same multi-column "{name}.{col}" header convention, same public register / unregister / dup-column-check registry. Shipped as a breaking change behind the same -pocket_descriptors knob. Interface change: String name(); List<String> columnNames(); List<ColumnType> columnTypes(); double[] compute(PocketGridContext); boolean needsGrid(); // unchanged Scalar descriptors stay one-liners via the new AbstractScalarPocketDescriptor adapter (name + scalarType + computeScalar). The 6 existing descriptors migrated; behavior and output byte-identical to before. New descriptor: PrincipalMomentsDescriptor (3 × DOUBLE): the three eigenvalues of the pocket grid points' gyration tensor, sorted descending. Implementation uses Apache Commons Math 3 EigenDecomposition. A shape-signature complement to sphericity / radius_of_gyration; sum equals radius_of_gyration² (verified in test). Added to the default -pocket_descriptors list. Default list reordered to put num_* (cheap, integer-valued) first, then geometric scalars, then principal_moments: num_residues, num_surface_atoms, num_grid_points, volume, sphericity, radius_of_gyration, principal_moments. Tests: - 5 new PrincipalMomentsDescriptor tests (cube isotropy, rod-shape eigenvalues, sort order, degenerate empty/single, sum=Rg²) - PocketDescriptorsRowsTest +2 (multi-column prefix rule, mixed scalar + multi ordering) - existing 13 callsites updated for the double[] return signature - columnType() registry test → columnTypes() User-visible change: the default -pocket_descriptors output now has three new columns (principal_moments.lambda1/2/3) and the existing columns appear in a different order. Scripts parsing by column name are unaffected; scripts parsing by column index need updating.	2026-05-19 14:34:33 +02:00
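The "sum equals radius_of_gyration²" check holds for a structural reason. The eigenvalues of the gyration tensor sum to its trace, and that trace is the mean squared distance of the points to their centroid. The sketch below verifies the identity without an eigensolver; the commit itself uses Commons Math `EigenDecomposition` for the eigenvalues. It assumes unweighted grid points and a non-empty point set, and `Gyration` is a hypothetical name:

```java
// Gyration tensor S_ab = (1/N) * sum_p (p_a - c_a)(p_b - c_b) over points p with centroid c.
// Its eigenvalues (the principal moments) sum to trace(S) = (1/N) * sum_p |p - c|^2 = Rg^2.
final class Gyration {
    static double[] centroid(double[][] pts) {
        double[] c = new double[3];
        for (double[] p : pts) for (int a = 0; a < 3; a++) c[a] += p[a];
        for (int a = 0; a < 3; a++) c[a] /= pts.length;
        return c;
    }

    static double[][] tensor(double[][] pts) {
        double[] c = centroid(pts);
        double[][] s = new double[3][3];
        for (double[] p : pts)
            for (int a = 0; a < 3; a++)
                for (int b = 0; b < 3; b++)
                    s[a][b] += (p[a] - c[a]) * (p[b] - c[b]);
        for (int a = 0; a < 3; a++)
            for (int b = 0; b < 3; b++)
                s[a][b] /= pts.length;
        return s;
    }

    static double rg2(double[][] pts) {
        double[] c = centroid(pts);
        double sum = 0.0;
        for (double[] p : pts)
            for (int a = 0; a < 3; a++) {
                double d = p[a] - c[a];
                sum += d * d;
            }
        return sum / pts.length;
    }

    static double trace(double[][] s) {
        return s[0][0] + s[1][1] + s[2][2];
    }
}
```

A rod along x gives a tensor with a single non-zero diagonal entry, which matches the "rod-shape eigenvalues" test case named in the commit.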
rdk	0e044f6bb3	Audit follow-ups: fill warning, NaN guard, test hardening + docs Bug fixes: - MorphologicalCloser: gate the "didn't converge" warning on maxIters>0. maxIters=0 is a valid "disable fill" config and would otherwise log spuriously on every protein. - GridGenerator: hoist the isFiniteBox NaN guard into the (Box, edge) ctor so both sampleGridPointsBetween and sampleGridPointsAroundAtoms are covered (the second sampler was previously unguarded — used by the training/feature path). - PocketGridPdbSidecar.writePerPocket: serial-wrap warning added for parity with the combined write() path. Test hardening: - PocketGridPointDescriptorRegistry: add unregister() so tests can clean up fixture registrations; PocketGridRowsTest now @AfterAll unregisters its scalar fixture so it doesn't leak into the JVM-wide registry. - VolsiteSmoothGridPointDescriptorTest: pin sigma via @BeforeEach so other tests mutating the Params singleton can't shift expectations; new weightAtExactCutoffEqualsExpMinusEight test pins the 4σ-inclusive cutoff semantic (cutoutSphere is inclusive; exp(-8) ≈ 3.354e-4). Docs / clarifications: - Params.pocket_grid_point_descriptors javadoc: the silent-ignore when -export_pocket_grid=false is intentional (symmetric with -pocket_descriptors / -export_pocket_descriptors). - PocketDescriptor javadoc: intentionally scalar-only; recommend unifying with PocketGridPointDescriptor if multi-col is ever needed rather than ad-hoc extending this one. - PocketGridPointDescriptor javadoc: needsGrid() is intentionally absent — every grid-point descriptor needs the grid by definition. - documentation/export-pocket-grid.md: explain the default-empty rationale (cost: per-row × per-atom, not backward-compat). - VdwRadiusTable.resolveSymbol: comment that the name-prefix isotope branch is a safety net, not a semantic mapping (e.g. "DA" in DNA isn't deuterium).	2026-05-19 13:29:10 +02:00
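The exp(-8) constant follows from the kernel form. At d = 4σ, d²/(2σ²) = 16σ²/(2σ²) = 8. Below is a minimal sketch of an inclusive, 4σ-truncated Gaussian sum. It assumes the kernel is exp(-d²/(2σ²)), as that constant implies, and the class is hypothetical:

```java
// Gaussian weight exp(-d^2 / (2 sigma^2)), truncated at 4 sigma with the cutoff itself included.
final class GaussKernel {
    static double weight(double d, double sigma) {
        if (d > 4.0 * sigma) return 0.0;   // inclusive cutoff: d == 4 sigma still contributes exp(-8)
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }

    // Sum of weights over the distances from one grid point to atoms of one pharmacophore type.
    static double smoothSum(double[] distances, double sigma) {
        double sum = 0.0;
        for (double d : distances) sum += weight(d, sigma);
        return sum;
    }
}
```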
rdk	6888716aa0	Tests for pocket-grid-point descriptors + extract DescriptorListValidator Adds focused regression tests for the new framework: 11 tests in three new files plus 4 added to PocketGridRowsTest. PocketGridRowsTest +4 - descriptor schema uses "{name}.{col}" prefix for multi-col - getRow appends descriptor values after the base 4 columns - unknown descriptor name throws at construction - scalar descriptor emits bare name() with no prefix (uses an inline ScalarTestDescriptor registered via the now-public registry hook — none of the shipped descriptors are scalar so the branch was untested) VolsiteGridPointDescriptorTest (new, 4 tests) - covers indicator aggregation + radius cutoff VolsiteSmoothGridPointDescriptorTest (new, 4 tests) - covers Gaussian kernel arithmetic + 4σ cutoff PocketGridPointDescriptorRegistryTest (new, 2 tests) - shipped names resolve, unknown name throws helpful error DescriptorListValidatorTest (new, 8 tests) - null/empty/valid/unknown/duplicate/null-entry/blank/dash-prefix Refactors Main.validateDescriptorList out to a self-contained Java utility (DescriptorListValidator) under predict/output/. The two call sites in Main.validatePocketGridParams now invoke the static helper; the private helper in Main is removed (-37 lines). PocketGridPointDescriptorRegistry.register is promoted from private to public so tests (and future external descriptor plugins) can add descriptors without touching the registry's static initializer. The shipped registrations still happen at class-load.	2026-05-19 10:29:02 +02:00
rdk	1931ef1f93	Pocket-grid-point descriptors: framework + two VolSite descriptors Adds an opt-in extension to the pocket-grid export — extra columns per (point, pocket) row driven by a registry of per-grid-point descriptors. Mirrors the existing per-pocket descriptor framework (interface, context record, static registry, name-driven CLI selection). CLI: -pocket_grid_point_descriptors list, default [] -pocket_grid_volsite_radius 4.0 Å (volsite indicator cutoff) -pocket_grid_volsite_sigma 2.0 Å (volsite_smooth Gaussian σ) Shipped descriptors (both 6-column, prefixed `{name}.`): volsite INT 0/1 per pharmacophore type within radius volsite_smooth DOUBLE Gaussian-weighted sum, kernel truncated at 4σ Atom-level pharmacophore classification reuses VolSitePharmacophore — a 1 in volsite.vsCation here matches a 1 in vsCation from VolsiteFeature. The 6 VolSite column names now live as VolSitePharmacophore.COLUMN_NAMES (single source of truth, also used by VolsiteFeature). VolSitePharmacophore gains a getAtomProperties(Atom) overload that does the PdbUtils hop. Validation: -pocket_grid_point_descriptors goes through a new shared validateDescriptorList(names, known, paramName) helper in Main, which also replaces the open-coded equivalent for -pocket_descriptors. The two new numeric params are bounds-checked.	2026-05-19 09:59:37 +02:00
rdk	a3efd0840c	Pocket-grid defensive guards + ChimeraX rank-gap fix - ChimeraX renderer: surfaces-layer rename now iterates the actual rank set (perPocketBasenames.keySet) instead of 1..maxRank. The previous code assumed every rank produces a ChimeraX submodel; a rank-skip would mis-target the rename. Latent today (P2Rank reorders pockets contiguously) but the assumption is now explicit in the code. - PdbSidecar: warn when total grid atoms exceed the PDB 5-digit serial column (wrap still happens; the warning surfaces the limit so users with very fine grids know why bond-inference tools might misbehave). - MorphologicalCloser: warn when loop exits at maxIters without converging, naming the param to raise. Previously silent. - GridGenerator: throw early on non-finite SAS-point bounding box. IEEEremainder(NaN, edge) = NaN would otherwise produce a NaN-everywhere lattice from a broken PDB. - VdwRadiusTable: map D/T isotopes to H before CDK lookup. Previously fell through to carbon (1.7 Å instead of hydrogen's 1.2 Å); marginal effect because of the atom_buffer cushion but no reason to be wrong. - PocketDescriptorsRows: throw at construction if grid==null and any selected descriptor declares needsGrid()=true, instead of NPEing inside compute(). The upstream gate in PocketGridOutputs already honors this; the guard catches programming errors elsewhere.	2026-05-19 07:47:43 +02:00
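The box guard matters because `Math.IEEEremainder` returns NaN when its first argument is NaN. Any lattice coordinate derived from a NaN bound therefore becomes NaN too. The sketch below illustrates this. `snap` is a hypothetical snapping rule, not GridGenerator's actual code, and `IllegalArgumentException` stands in for whatever the real guard throws:

```java
// Why a non-finite bounding box must be rejected early: NaN propagates through lattice snapping.
final class BoxGuard {
    static boolean isFiniteBox(double[] min, double[] max) {
        for (int a = 0; a < 3; a++) {
            if (!Double.isFinite(min[a]) || !Double.isFinite(max[a])) return false;
        }
        return true;
    }

    // Hypothetical: snap a coordinate to the nearest multiple of 'edge'. NaN in gives NaN out.
    static double snap(double x, double edge) {
        return x - Math.IEEEremainder(x, edge);
    }

    static void requireFiniteBox(double[] min, double[] max) {
        if (!isFiniteBox(min, max)) {
            throw new IllegalArgumentException("non-finite SAS-point bounding box");
        }
    }
}
```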
rdk	f06628dd63	Audit follow-ups: rename leftovers, doc fixes, numeric validation - testsets.sh: 4 sites still invoking -export_pocket_grid_pml after the rename; they were hard-failing at startup. - PocketGridPymolRenderer javadoc: pocket_dens_N -> pocket_gauss_N (3 refs), pocket_vol_N default ON not OFF (changed long ago in 82daf58a). - documentation/export-pocket-grid.md: vis_pocket_grid_volume_radius default is the -1 sentinel, not the auto-scaled 1.02 Å; ChimeraX layers doc now shows the #99 (spheres) + #100 (surfaces) split. - Main.validatePocketGridParams: numeric range checks for spacing, max_dist, atom_buffer, assign_cutoff, fill_min_neighbors (must lie in the 26-neighborhood), fill_max_iters, vis_pocket_grid_volume_radius (-1 sentinel or strictly positive), and gaussian_iso. Catches values that would otherwise produce a NaN lattice, empty grid, or garbage passed to PyMOL/ChimeraX.	2026-05-19 07:24:16 +02:00
rdk	60220d7a57	Add pocket-grid + descriptors export with PyMOL / ChimeraX viz Per-protein 3D grid of points around predicted pockets with per-pocket assignment, plus per-pocket geometric descriptors (volume, sphericity, radius_of_gyration, num_residues, num_surface_atoms, num_grid_points). User-facing knobs (all under -export_pocket_, -pocket_grid_, -vis_pocket_*): -export_pocket_grid CSV/Arrow/Parquet grid file -export_pocket_descriptors CSV/Arrow/Parquet descriptors file -vis_pocket_grid PyMOL/ChimeraX overlay scripts -pocket_grid_format csv | csv.gz | csv.zst | arrow{,.gz,.zst} | parquet -pocket_grid_spacing lattice edge (Å) -pocket_grid_max_dist outer bound vs nearest pocket SAS point -pocket_grid_atom_buffer inner bound vs vdw(nearest atom) -pocket_grid_assign_cutoff per-pocket membership cutoff -pocket_grid_assigner kdtree | voxel_hash -pocket_grid_fill morph_closing | none -pocket_descriptors subset of registered descriptors -vis_pocket_grid_volume_radius / _gaussian_iso viz tuning Renderers (PocketGridPymolRenderer, PocketGridChimeraXRenderer) overlay on top of the standard pocket viz with per-pocket togglable layers: discrete spheres, vdW-radius surface union, gaussian-iso (PyMOL only), convex-hull wireframe (PyMOL only, requires scipy). Both honor -vis_renderers membership. Startup validation for all new params (Main.validatePocketGridParams, Main.validateVisParams): typos in renderer/format/fill/assigner names fail fast instead of silently emitting nothing. Performance: LongIntHashMap-backed lattice index, BitSet pocket assignments, pluggable range-query (kdtree vs voxel-hash), morph-closing frontier expansion. Most hot paths converted from Groovy to Java. Docs: documentation/export-pocket-grid.md, export-pocket-descriptors.md. Squashed from 70 commits (9b7d7a64..fec803ff). Pre-squash granular history preserved on branch develop-backup-2026-05-19.	2026-05-19 03:03:33 +02:00
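Below is a sketch of a packed-long lattice index. It assumes 21 bits per axis, which covers indices in [-2^20, 2^20 - 1]. `java.util.HashMap<Long, Integer>` stands in for the primitive LongIntHashMap the commit uses. The 26-neighbor count is included because a later audit commit bounds fill_min_neighbors to that neighborhood; how MorphologicalCloser actually consumes the count is not shown. All names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Packs integer lattice indices (i, j, k) into one long key (21 bits per axis) so a
// long -> int map can index grid points without allocating per-point key objects.
final class LatticeIndex {
    static final int BITS = 21;
    static final long MASK = (1L << BITS) - 1;
    static final int OFFSET = 1 << (BITS - 1);   // shifts [-2^20, 2^20 - 1] into [0, 2^21 - 1]

    static long pack(int i, int j, int k) {
        return (((long) (i + OFFSET) & MASK) << (2 * BITS))
             | (((long) (j + OFFSET) & MASK) << BITS)
             | ((long) (k + OFFSET) & MASK);
    }

    static int[] unpack(long key) {
        return new int[] {
            (int) ((key >>> (2 * BITS)) & MASK) - OFFSET,
            (int) ((key >>> BITS) & MASK) - OFFSET,
            (int) (key & MASK) - OFFSET
        };
    }

    final Map<Long, Integer> index = new HashMap<>();   // lattice key -> point number

    void add(int i, int j, int k) {
        index.put(pack(i, j, k), index.size());
    }

    // Occupied cells among the 26 neighbors: all offsets in {-1, 0, 1}^3 except (0, 0, 0).
    int countNeighbors(int i, int j, int k) {
        int n = 0;
        for (int di = -1; di <= 1; di++)
            for (int dj = -1; dj <= 1; dj++)
                for (int dk = -1; dk <= 1; dk++)
                    if ((di | dj | dk) != 0 && index.containsKey(pack(i + di, j + dj, k + dk))) n++;
        return n;
    }
}
```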
rdk	0ef60da818	Guard pocket loaders against degenerate input Both fpocket and Seq2Pocket loaders could previously produce a pocket with a null centroid that NPEs downstream feature extraction: - FPocketLoader: skip the pocket if its voronoi-centers het group is empty (Atoms.centerOfMass returns null on empty list). Guard runs before rank assignment so surviving ranks stay sequential. - Seq2PocketLoader: skip the pocket if the input named atom serials but none resolved against queryProtein.allAtoms (otherwise the pocket would carry empty surfaceAtoms and null centroid). Real inputs rarely trigger this; synthetic test covers it. Neither path is expected with well-formed input; both fixes are defensive.	2026-05-17 01:44:29 +02:00
rdk	ddd5d8a11c	Add Seq2PocketLoader for Seq2Pocket pocket predictions Parses per-protein <ID>_predictions.txt (semicolon CSV) and resolves atom_ids against queryProtein.allAtoms by PDB serial. Empty/header-only files produce 0 pockets gracefully. Prediction is bound to the caller-supplied queryProtein, avoiding the ConcavityLoader bug class. - Dataset.groovy: new case "seq2pocket" - README.md: list SwinSite and Seq2Pocket in rescoring methods; cite pocketeer.ds + swinsite.ds in test_data/ examples - CLAUDE.md: note that distro/README.md is a transient build artifact - Test fixtures: 5 real predictions under distro/test_data/, plus unsorted/header-only/path-independence variants under src/test/resources/ - Seq2PocketLoaderTest: 10 tests, all passing	2026-05-16 12:40:36 +02:00
rdk	e9641680c1	Silence javac deprecation/unchecked notes - GenericVector.toList(): replace deprecated DefaultGroovyMethods.toList (Groovy 5) with a plain Java loop; drop unused addTo() (no callers) - Atoms(List<? extends Atom>): @SuppressWarnings("unchecked") for the intentional wrap-without-copy - KdNode.splitLeafNode: @SuppressWarnings("unchecked") for casts from the Object[] backing store	2026-05-15 16:15:26 +02:00
rdk	c6ee163ece	Audit cleanup: remove dead param, dead commented code, stale docs - Drop dead mask_unknown_residues=true from default(_rescore).groovy (param removed from Params.groovy in `1b7809a6`, 2019; configs missed) - Rewrite distro/models/readme.md to match models on disk (add rescore_2024, rescore_conservation; remove nonexistent conservation.model) - Remove broken documentation/rescoring.md link from distro/README.md - distro/config/readme.md: drop nonexistent working.groovy reference, fix github link master->develop - Delete dead commented-out method bodies in PdbUtils, RPlotter, PredictionVisualizer - Fix typo in Main.groovy javadoc	2026-05-15 09:34:28 +02:00
rdk	c78519c98e	Cofactor smoke harness, CDK VdW workaround, analyze-cofactors fixes Bumps faster-molecular-surface 1.0 -> 1.1, vendored in lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir, plus radioactive/synthetic). Without the fix, cobalamin-bearing structures crashed surface computation under -cofactors. PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used when -use_optimized_surface 0) with the same fallback, via a Krypton proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests mirror the FMS-side regressions. AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely reported as "name not in structure" in cofactor_matches.csv or omitted from het_groups.csv. Mirrors the M1 fix applied earlier to CofactorHandler.extractCofactorAtoms. testsets.sh: new cofactors_full() function exercising the cofactor demo + full datasets in p2rank-datasets2/other/cofactors/ (predict, analyze cofactors, -aa_mapping composition, visualizations, export-points). Uses -fail_fast 1 so per-structure errors surface as test failures rather than silent skips.	2026-05-15 00:35:08 +02:00
rdk	79cda78473	Add cofactor-as-protein-surface feature (Issue #79 part 2) The -cofactors flag and dataset cofactors column accept LigandDefinition specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]"). Matched HET groups merge into the protein surface (proteinAtoms) and are excluded from ligand listings; per-item resolution lets a dataset column override the global Params.cofactors. New: analyze cofactors subcommand (HETATM survey + specifier dry-run), PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19), drop-in safety benchmark with byte-equality on a never-present specifier. Documentation in documentation/cofactors.md (user-facing) and documentation/dev/cofactors.md (engineering record with R1-R24 design choices and post-merge audit fixes). Tests in CofactorHandlerTest, CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest, DataTableCsvTest plus a Log4jCapture test helper.	2026-05-14 07:58:14 +02:00
rdk	0e8bb0cb33	Add SwinSiteLoader for SwinSite pocket predictions Registers `swinsite` as a third-party predictor in Dataset.groovy. The loader reads grid<N>_score_<float>.mol2 (raw voxel points) per pocket, parses score from the filename, computes pocket centroid from the grid, and derives surfaceAtoms via cutoutShell against queryProtein.exposedAtoms (4.5 -> 10 A expanding shell), mirroring ConcavityLoader. Reads grid mol2 instead of pocket mol2: pocket mol2 atoms are standalone copies with chain reset to 'A' and synthetic residue names, so they break P2Rank's residue/conservation/ASA feature lookups. Grid + cutoutShell keeps surfaceAtoms bound to real queryProtein atoms. Mol2 parsing is a small inline @<TRIPOS>ATOM scan rather than CDK's Mol2Reader: CDK has a lazy-init race in AtomTypeFactory that NPEs under parallel dataset processing. Ships swinsite.ds plus 6 protein PDBs (1tjw_A from SwinSite's test_protein_only example, plus 1a26A/1a2kC/1afkA/1atlA/1bqoB from coach420) covering 1/2/3/4/6-pocket cases. 1atlA's on-disk N-order is non-monotonic in score (0.7288, 0.0664, 0.3433), exercising the rerank. SwinSiteLoaderTest covers all six fixtures plus the predictionIsBoundToQueryProtein contract and empty-dir tolerance.	2026-05-08 01:05:15 +02:00
rdk	15349bb48f	Add pocket rank column to points export, fix overlap labeling The points export (predict/rescore -export_points 1) now includes an integer 'pocket' column matching newRank in *_predictions.csv, so users can directly aggregate per-pocket descriptors without a spatial join. Standalone 'export-points' (no prediction) omits the column. Pocket-extension shells can overlap, so a single SAS point can sit in multiple pocket.labeledPoints lists. Previously the assignment loop last-write-wins gave the worst rank to shared points, which was counter-intuitive for both visualization (PredictionVisualizer PDB output) and descriptor aggregation. PocketRescorer.setNewRanks now iterates pockets best-first with a guard, so the lowest newRank wins; the redundant lp.pocket write in PocketPredictor is removed. TableData gains a per-column ColumnType (DOUBLE default, INT) so TableExporter emits true integers in CSV (no decimals), Arrow (Int32), and Parquet (INT32) for the pocket column. Bump version to 2.6.0-dev.8.	2026-05-06 14:08:29 +02:00
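Below is a minimal sketch of the best-first guard. It assumes each pocket's SAS points are given as integer ids and that 0 marks a point outside every pocket; both conventions belong to the sketch, not to PocketRescorer:

```java
import java.util.List;

// Best-first assignment: visit pockets in rank order (rank 1 first) and label only points that
// are still unassigned, so a point shared by overlapping pockets keeps the best (lowest) rank.
final class RankAssign {
    static int[] assign(List<int[]> pocketsBestFirst, int numPoints) {
        int[] rank = new int[numPoints];               // 0 = not in any pocket
        for (int r = 0; r < pocketsBestFirst.size(); r++) {
            for (int p : pocketsBestFirst.get(r)) {
                if (rank[p] == 0) rank[p] = r + 1;     // guard: never overwrite a better rank
            }
        }
        return rank;
    }
}
```

With a last-write-wins loop, the shared point 2 in the test case would have ended up with rank 2.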
rdk	c143e0fa9c	Fix ConcavityLoader to bind prediction to queryProtein ConcavityLoader.loadPrediction was ignoring its queryProtein parameter and binding the returned Prediction to a Protein loaded from *_residue.pdb (a pocket-touching residue subset, not the full protein). Downstream features keyed on prediction.protein.fileName then resolved against the wrong basename — most visibly conservation lookup, which searched for "<ID>_<submethod>_residue_<chain>.hom" instead of "<ID>_<chain>.hom" and silently produced zero conservation features. Other feature extractors were similarly reading the truncated atom set. The residue subset is still loaded and used to define the per-pocket surface-atom shell (no behaviour change there), but the Prediction is now bound to queryProtein, matching FPocketLoader and PUResNetLoader. Add ConcavityLoaderTest plus a matching test in FPocketLoaderTest that assert the loader-contract invariant prediction.protein === queryProtein.	2026-04-29 00:41:01 +02:00
rdk	42dfe7fd6f	Fix PUResNet pocket loader to handle shifted insertion codes PUResNet pocket PDBs occasionally left-shift the residue insertion code into column 26 instead of column 27, breaking BioJava's strict resSeq parser with NumberFormatException and silently dropping affected predictions (216 of 9955 entries on holo4k+pdbbind2020). Add PUResNetPdbRepair which detects the malformed pattern and rewrites it in memory before parsing. Wire PUResNetLoader through it. PdbUtils and the rest of the load path are unchanged.	2026-04-28 22:25:44 +02:00
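Below is a hypothetical sketch of the kind of in-memory repair PUResNetPdbRepair performs. In the PDB fixed-column format, resSeq occupies columns 23-26 and the insertion code occupies column 27. The detection rule is an assumption, not the shipped heuristic: a letter in column 26, a blank column 27, and at most three resSeq digits in columns 23-25:

```java
// Hypothetical repair for ATOM/HETATM lines whose insertion code sits in column 26
// (the last resSeq column) instead of column 27 (iCode). Other lines are returned unchanged.
final class InsertionCodeRepair {
    static String repair(String line) {
        boolean record = line.startsWith("ATOM  ") || line.startsWith("HETATM");
        if (!record || line.length() < 27) return line;
        char col26 = line.charAt(25);   // 0-based index 25 == PDB column 26
        char col27 = line.charAt(26);   // 0-based index 26 == PDB column 27
        if (!Character.isLetter(col26) || col27 != ' ') return line;
        String digits = line.substring(22, 25);   // PDB columns 23-25
        // Right-justify resSeq in columns 23-26 and move the letter into column 27.
        return line.substring(0, 22) + " " + digits + col26 + line.substring(27);
    }
}
```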
rdk	43b1f7dcf1	Fix pocket centroid calculation in ConcavityLoader and PUResNetLoader Use centroid instead of centerOfMass in ConcavityLoader, set centroid explicitly in PUResNetLoader, fix POCKET_GRID_TO_SURFACE_DIST type to int.	2026-04-03 19:30:27 +02:00
rdk	17a4304d29	Add rg, n_unp_pockets, n_unp_pockets_multichain fields to AhojSiteInfo	2026-04-01 12:44:10 +02:00
rdk	858ba45fe7	Refactor AhojUbsSiteParser to use CSV library and add AhojSiteInfo data class - Replace manual line.split(",") with Apache Commons CSV (column-name access) - Support both reduced (9-col) and full (59-col) ahoj_ubs CSV formats - Add AhojSiteInfo: typed data class for 14 pocket metadata fields - Add secondaryData map to ResidueSite for extensible metadata - Export AhojSiteInfo columns in observed_sites.csv when available - Add comprehensive parser tests for both CSV formats - Add test data files and format documentation	2026-04-01 10:22:43 +02:00
rdk	6cf293478a	Add atom hybridization feature (one-hot sp2/sp3) CSV-based lookup for standard amino acid atoms with tiered fallback for non-standard residues (backbone name match, then element-based default).	2026-03-21 21:55:00 +01:00
rdk	a66bea74be	Add eval_output_prediction_files param to output per-protein prediction CSVs in eval commands	2026-03-17 18:59:13 +01:00
rdk	faddcfb70f	Lazy-init EnergyCalculator and LJEnergyCalculator in energy features	2026-03-16 07:55:16 +01:00
rdk	48cb681aaa	Refactor DSO/DSWO: replace Tuple2 with OverlapCounts, cache counts instead of Atoms, simplify CdkUtils	2026-03-16 03:20:48 +01:00
rdk	5b4613c3a4	Extract FpocketAdHocHelper, add run_fpocket_ad_hoc param for eval-rescore and rescore commands	2026-03-16 03:20:41 +01:00
rdk	ba53b97e90	Add per-method CSVs and grouped summary to binding-site-centers, add DataTable filter/distinctValues/formatGroupedSummaryTable	2026-03-16 01:06:44 +01:00
rdk	8852739016	Add DCC_4 protein-centric success rate metrics	2026-03-15 21:35:53 +01:00
rdk	a814157e2b	Minor cleanups: fix typos, normalize loop syntax and imports in Evaluation	2026-03-15 21:32:23 +01:00
rdk	f3616da217	Unify Protein.sites to contain all binding sites, add predictedPocket to BindingSite interface Protein.sites now holds ground-truth binding sites for both ligand-defined and explicit (residue-based) evaluation modes. Sites are populated from ligands via populateSitesFromLigands() when no explicit sites are defined. - Add predictedPocket and setSasPoints to BindingSite interface - Add predictedPocket field to ResidueSite - Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite) - Update calcCoveragesProt to use BindingSite.predictedPocket - Determine isLigandMode via instanceof instead of sites.isEmpty() - Unify PymolRenderer sites/ligands branch into single BindingSite loop - Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly	2026-03-15 21:25:49 +01:00
rdk	829cf9b8be	Return typed result objects from calcConservationStats and calcOverlapStatsForPockets	2026-03-15 20:28:51 +01:00
rdk	8a516228e1	Fix @CompileStatic errors in Evaluation: destructuring assignment, int-to-Double casts	2026-03-15 19:59:15 +01:00
rdk	5ac9aab18a	Refactor Evaluation: simplify avg/div methods, use Function instead of Closure, extract writeScoresToFileIfRequested	2026-03-15 19:27:15 +01:00
rdk	20236ef092	Refactor conservation/chains analysis, add @CompileStatic to Evaluation, rename criterium to criterion	2026-03-15 17:59:53 +01:00
rdk	d9de1fba7e	Add contact_atoms_centroid site evaluation center method for ligand-defined sites	2026-03-15 17:09:04 +01:00
rdk	49a8430a7d	Add binding-site-centers command, refactor center methods, consolidate error reporting - Rename SiteCentroidMethod to SiteCenterMethod - Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface for thread-safe, param-independent center calculation - Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod() - Add analyze binding-site-centers command comparing all center methods per site - Add Dataset.Result.writeErrorsAndGetSummary() and use it across all AnalyzeRoutine commands for consistent error reporting to both console and CSV	2026-03-14 18:22:47 +01:00
rdk	0e0cb47907	Add ca_atoms_centroid site evaluation center method with tests	2026-03-14 15:57:41 +01:00
rdk	1ecb29f876	Add load_ligands_from_separate_files param for loading ligands from individual ligand_* files	2026-03-13 18:21:26 +01:00
rdk	0b5b61304d	Add legacy conservation file name format fallback (e.g. 2ed4_A.)	2026-03-13 17:22:27 +01:00
rdk	e7fc457f6a	Fix ligand detection for BioJava GroupType misclassifications BioJava assigns GroupType based on its Chemical Component Dictionary, not structural role. Ligands in non-polymer chains can get any GroupType: - GDP, GTP, ATP -> GroupType.NUCLEOTIDE - SHR and similar -> GroupType.AMINOACID - Most others -> GroupType.HETATM Previously only HETATM groups were detected as ligands, causing errors like "Ligand definition 'GDP' matches no ligands" for nucleotide and amino acid derivative ligands. Fix: any non-water group in a NONPOLYMER chain is now a ligand candidate, regardless of GroupType. Polymer chain groups (protein AA, DNA/RNA) are only included if they have GroupType.HETATM. Add test PDB files (1a2kC.pdb with GDP, 1e5qA.pdb with SHR) and comprehensive tests for all three GroupType cases.	2026-03-10 14:34:28 +01:00
rdk	d78f80ee73	Extract writeCases() method, rename sites.csv to observed_sites.csv Consolidate case CSV writing into Evaluation.writeCases(). Remove duplicate DSO_0.1 criterion and stale TODO comments.	2026-03-10 03:24:44 +01:00
rdk	838b0a697f	Fix integer division bug in DSO criterion and clean up The Jaccard ratio was computed as int/int, always producing 0 or 1, making fractional thresholds ineffective. Cast to double for correct floating-point division. Also fix typo (cahe->cache), remove debug comments, and update javadoc.	2026-03-10 02:27:11 +01:00
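In Java, dividing two ints truncates toward zero, so any ratio of counts below 1 becomes 0. Below is a minimal sketch of the bug and the fix. The names are hypothetical, and the `union == 0` guard is the sketch's own addition:

```java
// Jaccard ratio = intersection / union, computed from integer counts.
final class JaccardRatio {
    static double buggy(int intersection, int union) {
        return intersection / union;              // int / int truncates: only 0 or 1 survive
    }

    static double fixed(int intersection, int union) {
        if (union == 0) return 0.0;               // sketch's own convention for two empty sets
        return (double) intersection / union;     // promote to double before dividing
    }
}
```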
rdk	2de315e9e0	Rename API: PocketCriterium->PocketCriterion, getLigandAtoms->getAtoms, centroid->center - Rename PocketCriterium to PocketCriterion (fix Latin spelling) - Revert getLigandAtoms() back to getAtoms() in BindingSite interface - Rename getCentroidForEval() to getCenterForEval() - Rename explicitCentroid to explicitCenter in ResidueSite - Rename SiteCentroidMethod values: explicit_centroid->explicit, sas_points_center_of_mass->sas_points_centroid - Rename site_centroid_method param to site_eval_center_method - Ligand.getCentroid() now delegates to getCenterForEval()	2026-03-10 02:02:47 +01:00
rdk	412c590dcb	Fix CSV spacing consistency: remove padding and trailing spaces Remove leading-space padding from fmt calls in getMiscStatsCSV and FeatureImportances, fix header/data spacing mismatch in toPocketsCSV, and remove trailing space in toLigandsCSV header.	2026-03-09 13:32:51 +01:00
rdk	fdebd71daf	Add example Jupyter notebook for analyzing P2Rank output Add notebook loading _predictions.csv and _residues.csv with example data from predict_1fbl. Clean up CSV formatting: remove padding from values, add fmtCsv() without leading spaces for CSV output.	2026-03-09 12:05:00 +01:00
rdk	61b8863c27	Simplify CSV output formatting and add null guard in CsvRow Remove fixed-width column padding from PredictionSummary, fix spacing in ResidueLabelings CSV output, and add null safety in CsvRow.add().	2026-03-09 11:17:59 +01:00


1129 Commits