p2rank

mirror of https://github.com/rdk/p2rank.git synced 2026-06-04 12:44:24 +08:00

Author	SHA1	Message	Date
rdk	ddd5d8a11c	Add Seq2PocketLoader for Seq2Pocket pocket predictions. Parses per-protein <ID>_predictions.txt (semicolon CSV) and resolves atom_ids against queryProtein.allAtoms by PDB serial. Empty/header-only files produce 0 pockets gracefully. Prediction is bound to the caller-supplied queryProtein, avoiding the ConcavityLoader bug class. - Dataset.groovy: new case "seq2pocket" - README.md: list SwinSite and Seq2Pocket in rescoring methods; cite pocketeer.ds + swinsite.ds in test_data/ examples - CLAUDE.md: note that distro/README.md is a transient build artifact - Test fixtures: 5 real predictions under distro/test_data/, plus unsorted/header-only/path-independence variants under src/test/resources/ - Seq2PocketLoaderTest: 10 tests, all passing	2026-05-16 12:40:36 +02:00
rdk	e9641680c1	Silence javac deprecation/unchecked notes - GenericVector.toList(): replace deprecated DefaultGroovyMethods.toList (Groovy 5) with a plain Java loop; drop unused addTo() (no callers) - Atoms(List<? extends Atom>): @SuppressWarnings("unchecked") for the intentional wrap-without-copy - KdNode.splitLeafNode: @SuppressWarnings("unchecked") for casts from the Object[] backing store	2026-05-15 16:15:26 +02:00
rdk	c6ee163ece	Audit cleanup: remove dead param, dead commented code, stale docs - Drop dead mask_unknown_residues=true from default(_rescore).groovy (param removed from Params.groovy in `1b7809a6`, 2019; configs missed) - Rewrite distro/models/readme.md to match models on disk (add rescore_2024, rescore_conservation; remove nonexistent conservation.model) - Remove broken documentation/rescoring.md link from distro/README.md - distro/config/readme.md: drop nonexistent working.groovy reference, fix github link master->develop - Delete dead commented-out method bodies in PdbUtils, RPlotter, PredictionVisualizer - Fix typo in Main.groovy javadoc	2026-05-15 09:34:28 +02:00
rdk	9fd7ffe0db	Bump gradle wrapper 9.5.0->9.5.1, slf4j 2.0.17->2.0.18, parquet-floor 1.65->1.69	2026-05-15 02:57:07 +02:00
rdk	c78519c98e	Cofactor smoke harness, CDK VdW workaround, analyze-cofactors fixes. Bumps faster-molecular-surface 1.0 -> 1.1, vendored in lib/local-mvn-repo/. The 1.1 release adds a VdW radius fallback for elements whose CDK Elements enum entry is null (Co, Ni, Cu, Rh, Os, Ir, plus radioactive/synthetic). Without the fix, cobalamin-bearing structures crashed surface computation under -cofactors. PatchedCdkNumericalSurface wraps the default CDK NumericalSurface (used when -use_optimized_surface 0) with the same fallback, via a Krypton proxy for null-VdW atoms. Surface.groovy switched over to it. Unit tests mirror the FMS-side regressions. AnalyzeRoutine.cmdCofactors: replace Struct.getHetGroups with Struct.getLigandGroups (2 call sites) so GDP/GTP/ATP and other groups that BioJava classifies as NUCLEOTIDE/AMINOACID don't get falsely reported as "name not in structure" in cofactor_matches.csv or omitted from het_groups.csv. Mirrors the M1 fix applied earlier to CofactorHandler.extractCofactorAtoms. testsets.sh: new cofactors_full() function exercising the cofactor demo + full datasets in p2rank-datasets2/other/cofactors/ (predict, analyze cofactors, -aa_mapping composition, visualizations, export-points). Uses -fail_fast 1 so per-structure errors surface as test failures rather than silent skips.	2026-05-15 00:35:08 +02:00
rdk	79cda78473	Add cofactor-as-protein-surface feature (Issue #79 part 2). The -cofactors flag and dataset cofactors column accept LigandDefinition specifiers ("FAD", "FAD[atom_id:N]", "FAD[contact_res_ids:A_T259,A_D246]"). Matched HET groups merge into the protein surface (proteinAtoms) and are excluded from ligand listings; per-item resolution lets a dataset column override the global Params.cofactors. New: analyze cofactors subcommand (HETATM survey + specifier dry-run), PyMOL teal-stick visualization (vis_highlight_cofactors), distant-cofactor and chain-excluded WARN diagnostics, aa_mapping collision WARN (R19), drop-in safety benchmark with byte-equality on a never-present specifier. Documentation in documentation/cofactors.md (user-facing) and documentation/dev/cofactors.md (engineering record with R1-R24 design choices and post-merge audit fixes). Tests in CofactorHandlerTest, CofactorIntegrationTest, CofactorPipelineTest, CofactorAnalyzeTest, DataTableCsvTest plus a Log4jCapture test helper.	2026-05-14 07:58:14 +02:00
rdk	b2a23179f1	Bump groovy 5.0.5->5.0.6, log4j 2.25.4->2.26.0, zstd-jni 1.5.7-7->1.5.7-8	2026-05-12 01:56:07 +02:00
rdk	0e8bb0cb33	Add SwinSiteLoader for SwinSite pocket predictions. Registers `swinsite` as a third-party predictor in Dataset.groovy. The loader reads grid<N>_score_<float>.mol2 (raw voxel points) per pocket, parses score from the filename, computes pocket centroid from the grid, and derives surfaceAtoms via cutoutShell against queryProtein.exposedAtoms (4.5 -> 10 A expanding shell), mirroring ConcavityLoader. Reads grid mol2 instead of pocket mol2: pocket mol2 atoms are standalone copies with chain reset to 'A' and synthetic residue names, so they break P2Rank's residue/conservation/ASA feature lookups. Grid + cutoutShell keeps surfaceAtoms bound to real queryProtein atoms. Mol2 parsing is a small inline @<TRIPOS>ATOM scan rather than CDK's Mol2Reader: CDK has a lazy-init race in AtomTypeFactory that NPEs under parallel dataset processing. Ships swinsite.ds plus 6 protein PDBs (1tjw_A from SwinSite's test_protein_only example, plus 1a26A/1a2kC/1afkA/1atlA/1bqoB from coach420) covering 1/2/3/4/6-pocket cases. 1atlA's on-disk N-order is non-monotonic in score (0.7288, 0.0664, 0.3433), exercising the rerank. SwinSiteLoaderTest covers all six fixtures plus the predictionIsBoundToQueryProtein contract and empty-dir tolerance.	2026-05-08 01:05:15 +02:00
rdk	59bc84c265	Mention pocket column alongside score in export-points docs. The score and pocket columns share the same predict/rescore-only origin, so describe them together in the prose, the export-points "not contained" caveat, the predict/rescore output description, and the "Which command to use?" table.	2026-05-07 03:21:38 +02:00
rdk	f5ad22f604	Document 2.6 evaluation-metric fixes and note ligand-detection breaking change. Add documentation/dev/evaluation-metric-fixes-2.6.md covering DSO/DSWO integer-division fixes, the ResidueSite DCC centroid fix, and the BioJava GroupType ligand-detection fix. Mention the ligand-detection change in breaking-changes.md since it shifts DCA/DCC on datasets containing GDP/GTP/ATP/SHR-like ligands.	2026-05-06 14:46:26 +02:00
rdk	15349bb48f	Add pocket rank column to points export, fix overlap labeling. The points export (predict/rescore -export_points 1) now includes an integer 'pocket' column matching newRank in *_predictions.csv, so users can directly aggregate per-pocket descriptors without a spatial join. Standalone 'export-points' (no prediction) omits the column. Pocket-extension shells can overlap, so a single SAS point can sit in multiple pocket.labeledPoints lists. Previously the assignment loop last-write-wins gave the worst rank to shared points, which was counter-intuitive for both visualization (PredictionVisualizer PDB output) and descriptor aggregation. PocketRescorer.setNewRanks now iterates pockets best-first with a guard, so the lowest newRank wins; the redundant lp.pocket write in PocketPredictor is removed. TableData gains a per-column ColumnType (DOUBLE default, INT) so TableExporter emits true integers in CSV (no decimals), Arrow (Int32), and Parquet (INT32) for the pocket column. Bump version to 2.6.0-dev.8.	2026-05-06 14:08:29 +02:00
rdk	ee8ff7b471	Bump Gradle wrapper 9.4.1->9.5.0	2026-04-30 12:07:55 +02:00
rdk	9fe0e28bc0	Bump gradle-versions-plugin 0.53.0->0.54.0, commons-io 2.21.0->2.22.0, guava 33.5.0->33.6.0, gson 2.13.2->2.14.0, parquet-floor 1.64->1.65	2026-04-29 22:33:45 +02:00
rdk	c143e0fa9c	Fix ConcavityLoader to bind prediction to queryProtein. ConcavityLoader.loadPrediction was ignoring its queryProtein parameter and binding the returned Prediction to a Protein loaded from *_residue.pdb (a pocket-touching residue subset, not the full protein). Downstream features keyed on prediction.protein.fileName then resolved against the wrong basename, most visibly conservation lookup, which searched for "<ID>_<submethod>_residue_<chain>.hom" instead of "<ID>_<chain>.hom" and silently produced zero conservation features. Other feature extractors were similarly reading the truncated atom set. The residue subset is still loaded and used to define the per-pocket surface-atom shell (no behaviour change there), but the Prediction is now bound to queryProtein, matching FPocketLoader and PUResNetLoader. Add ConcavityLoaderTest plus a matching test in FPocketLoaderTest that assert the loader-contract invariant prediction.protein === queryProtein.	2026-04-29 00:41:01 +02:00
rdk	42dfe7fd6f	Fix PUResNet pocket loader to handle shifted insertion codes. PUResNet pocket PDBs occasionally left-shift the residue insertion code into column 26 instead of column 27, breaking BioJava's strict resSeq parser with NumberFormatException and silently dropping affected predictions (216 of 9955 entries on holo4k+pdbbind2020). Add PUResNetPdbRepair which detects the malformed pattern and rewrites it in memory before parsing. Wire PUResNetLoader through it. PdbUtils and the rest of the load path are unchanged.	2026-04-28 22:25:44 +02:00
rdk	43b1f7dcf1	Fix pocket centroid calculation in ConcavityLoader and PUResNetLoader. Use centroid instead of centerOfMass in ConcavityLoader, set centroid explicitly in PUResNetLoader, fix POCKET_GRID_TO_SURFACE_DIST type to int.	2026-04-03 19:30:27 +02:00
rdk	994ad45238	Bump groovy 5.0.4->5.0.5, log4j 2.25.3->2.25.4	2026-04-01 22:25:51 +02:00
rdk	17a4304d29	Add rg, n_unp_pockets, n_unp_pockets_multichain fields to AhojSiteInfo	2026-04-01 12:44:10 +02:00
rdk	858ba45fe7	Refactor AhojUbsSiteParser to use CSV library and add AhojSiteInfo data class - Replace manual line.split(",") with Apache Commons CSV (column-name access) - Support both reduced (9-col) and full (59-col) ahoj_ubs CSV formats - Add AhojSiteInfo: typed data class for 14 pocket metadata fields - Add secondaryData map to ResidueSite for extensible metadata - Export AhojSiteInfo columns in observed_sites.csv when available - Add comprehensive parser tests for both CSV formats - Add test data files and format documentation	2026-04-01 10:22:43 +02:00
rdk	6cf293478a	Add atom hybridization feature (one-hot sp2/sp3). CSV-based lookup for standard amino acid atoms with tiered fallback for non-standard residues (backbone name match, then element-based default).	2026-03-21 21:55:00 +01:00
rdk	1997ab948e	switch CI Java distribution from temurin to oracle	2026-03-21 18:42:22 +01:00
rdk	1c636757d6	update CI Java version matrix: drop 23/24, add 26	2026-03-21 17:54:56 +01:00
rdk	b58726c27e	bump arrow and parquet-floor dependencies	2026-03-21 17:52:37 +01:00
rdk	0a51f504d0	bump gradle	2026-03-21 16:04:31 +01:00
rdk	a66bea74be	Add eval_output_prediction_files param to output per-protein prediction CSVs in eval commands	2026-03-17 18:59:13 +01:00
rdk	faddcfb70f	Lazy-init EnergyCalculator and LJEnergyCalculator in energy features 2.6.0-dev.7	2026-03-16 07:55:16 +01:00
rdk	48cb681aaa	Refactor DSO/DSWO: replace Tuple2 with OverlapCounts, cache counts instead of Atoms, simplify CdkUtils	2026-03-16 03:20:48 +01:00
rdk	5b4613c3a4	Extract FpocketAdHocHelper, add run_fpocket_ad_hoc param for eval-rescore and rescore commands	2026-03-16 03:20:41 +01:00
rdk	ba53b97e90	Add per-method CSVs and grouped summary to binding-site-centers, add DataTable filter/distinctValues/formatGroupedSummaryTable	2026-03-16 01:06:44 +01:00
rdk	91987129fe	Bump version to 2.6.0-dev.7	2026-03-15 21:37:05 +01:00
rdk	8852739016	Add DCC_4 protein-centric success rate metrics	2026-03-15 21:35:53 +01:00
rdk	a814157e2b	Minor cleanups: fix typos, normalize loop syntax and imports in Evaluation	2026-03-15 21:32:23 +01:00
rdk	f3616da217	Unify Protein.sites to contain all binding sites, add predictedPocket to BindingSite interface. Protein.sites now holds ground-truth binding sites for both ligand-defined and explicit (residue-based) evaluation modes. Sites are populated from ligands via populateSitesFromLigands() when no explicit sites are defined. - Add predictedPocket and setSasPoints to BindingSite interface - Add predictedPocket field to ResidueSite - Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite) - Update calcCoveragesProt to use BindingSite.predictedPocket - Determine isLigandMode via instanceof instead of sites.isEmpty() - Unify PymolRenderer sites/ligands branch into single BindingSite loop - Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly	2026-03-15 21:25:49 +01:00
rdk	829cf9b8be	Return typed result objects from calcConservationStats and calcOverlapStatsForPockets	2026-03-15 20:28:51 +01:00
rdk	8a516228e1	Fix @CompileStatic errors in Evaluation: destructuring assignment, int-to-Double casts	2026-03-15 19:59:15 +01:00
rdk	5ac9aab18a	Refactor Evaluation: simplify avg/div methods, use Function instead of Closure, extract writeScoresToFileIfRequested	2026-03-15 19:27:15 +01:00
rdk	20236ef092	Refactor conservation/chains analysis, add @CompileStatic to Evaluation, rename criterium to criterion	2026-03-15 17:59:53 +01:00
rdk	d9de1fba7e	Add contact_atoms_centroid site evaluation center method for ligand-defined sites	2026-03-15 17:09:04 +01:00
rdk	49a8430a7d	Add binding-site-centers command, refactor center methods, consolidate error reporting - Rename SiteCentroidMethod to SiteCenterMethod - Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface for thread-safe, param-independent center calculation - Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod() - Add analyze binding-site-centers command comparing all center methods per site - Add Dataset.Result.writeErrorsAndGetSummary() and use it across all AnalyzeRoutine commands for consistent error reporting to both console and CSV	2026-03-14 18:22:47 +01:00
rdk	0e0cb47907	Add ca_atoms_centroid site evaluation center method with tests	2026-03-14 15:57:41 +01:00
rdk	1ecb29f876	Add load_ligands_from_separate_files param for loading ligands from individual ligand_* files	2026-03-13 18:21:26 +01:00
rdk	0b5b61304d	Add legacy conservation file name format fallback (e.g. 2ed4_A.)	2026-03-13 17:22:27 +01:00
rdk	e7fc457f6a	Fix ligand detection for BioJava GroupType misclassifications. BioJava assigns GroupType based on its Chemical Component Dictionary, not structural role. Ligands in non-polymer chains can get any GroupType: - GDP, GTP, ATP -> GroupType.NUCLEOTIDE - SHR and similar -> GroupType.AMINOACID - Most others -> GroupType.HETATM. Previously only HETATM groups were detected as ligands, causing errors like "Ligand definition 'GDP' matches no ligands" for nucleotide and amino acid derivative ligands. Fix: any non-water group in a NONPOLYMER chain is now a ligand candidate, regardless of GroupType. Polymer chain groups (protein AA, DNA/RNA) are only included if they have GroupType.HETATM. Add test PDB files (1a2kC.pdb with GDP, 1e5qA.pdb with SHR) and comprehensive tests for all three GroupType cases. 2.6.0-dev.6	2026-03-10 14:34:28 +01:00
rdk	d78f80ee73	Extract writeCases() method, rename sites.csv to observed_sites.csv. Consolidate case CSV writing into Evaluation.writeCases(). Remove duplicate DSO_0.1 criterion and stale TODO comments.	2026-03-10 03:24:44 +01:00
rdk	838b0a697f	Fix integer division bug in DSO criterion and clean up. The Jaccard ratio was computed as int/int, always producing 0 or 1, making fractional thresholds ineffective. Cast to double for correct floating-point division. Also fix typo (cahe->cache), remove debug comments, and update javadoc.	2026-03-10 02:27:11 +01:00
rdk	2de315e9e0	Rename API: PocketCriterium->PocketCriterion, getLigandAtoms->getAtoms, centroid->center - Rename PocketCriterium to PocketCriterion (fix Latin spelling) - Revert getLigandAtoms() back to getAtoms() in BindingSite interface - Rename getCentroidForEval() to getCenterForEval() - Rename explicitCentroid to explicitCenter in ResidueSite - Rename SiteCentroidMethod values: explicit_centroid->explicit, sas_points_center_of_mass->sas_points_centroid - Rename site_centroid_method param to site_eval_center_method - Ligand.getCentroid() now delegates to getCenterForEval()	2026-03-10 02:02:47 +01:00
rdk	412c590dcb	Fix CSV spacing consistency: remove padding and trailing spaces. Remove leading-space padding from fmt calls in getMiscStatsCSV and FeatureImportances, fix header/data spacing mismatch in toPocketsCSV, and remove trailing space in toLigandsCSV header.	2026-03-09 13:32:51 +01:00
rdk	fdebd71daf	Add example Jupyter notebook for analyzing P2Rank output. Add notebook loading _predictions.csv and _residues.csv with example data from predict_1fbl. Clean up CSV formatting: remove padding from values, add fmtCsv() without leading spaces for CSV output.	2026-03-09 12:05:00 +01:00
rdk	61b8863c27	Simplify CSV output formatting and add null guard in CsvRow. Remove fixed-width column padding from PredictionSummary, fix spacing in ResidueLabelings CSV output, and add null safety in CsvRow.add().	2026-03-09 11:17:59 +01:00
rdk	42ad4dfe9f	Move centerOfMass and calculateCentroid to PerfUtils to avoid array allocation. Reimplements BioJava's centerOfMass and Atoms.calculateCentroid in PerfUtils accepting Collection directly, avoiding temporary array allocation. Adds delegate methods in Struct.	2026-03-09 02:22:48 +01:00
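
The overlap-labeling fix described in commit 15349bb48f (iterate pockets best-first with a guard, so a SAS point shared by overlapping pocket shells keeps the lowest rank) can be illustrated with a minimal Python sketch. This is not P2Rank's code: the `Pocket` class, its `new_rank` and `point_ids` fields, and `assign_point_ranks` are hypothetical names chosen for illustration, and the real logic lives in the Groovy method PocketRescorer.setNewRanks named in the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pocket:
    """Hypothetical stand-in for a predicted pocket (not the P2Rank class)."""
    new_rank: int                                        # 1 = best-ranked pocket
    point_ids: List[int] = field(default_factory=list)   # SAS points in its shell


def assign_point_ranks(pockets: List[Pocket]) -> Dict[int, int]:
    """Map each SAS point id to the lowest (best) rank among pockets containing it."""
    point_rank: Dict[int, int] = {}
    # Visit pockets best-first so the first assignment is also the best one.
    for pocket in sorted(pockets, key=lambda p: p.new_rank):
        for pid in pocket.point_ids:
            # Guard: a point already claimed by a better pocket is left alone.
            if pid not in point_rank:
                point_rank[pid] = pocket.new_rank
    return point_rank


# Point 3 lies in both shells; it should end up labeled with rank 1.
pockets = [Pocket(2, [3, 4, 5]), Pocket(1, [1, 2, 3])]
ranks = assign_point_ranks(pockets)
```

Without the sort and the guard, a plain loop over the pockets in rank order would overwrite shared points with whichever pocket is written last, which is the last-write-wins behaviour the commit message describes as giving shared points the worst rank.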


1939 Commits