Protein.sites now holds ground-truth binding sites for both ligand-defined
and explicit (residue-based) evaluation modes. Sites are populated from
ligands via populateSitesFromLigands() when no explicit sites are defined.
- Add predictedPocket and setSasPoints to BindingSite interface
- Add predictedPocket field to ResidueSite
- Rename assignPocketsToLigands to assignPocketsToSites (works on BindingSite)
- Update calcCoveragesProt to use BindingSite.predictedPocket
- Determine isLigandMode via instanceof instead of sites.isEmpty()
- Unify PymolRenderer sites/ligands branch into single BindingSite loop
- Simplify AnalyzeRoutine.cmdBindingSiteCenters to use p.sites directly
- Rename SiteCentroidMethod to SiteCenterMethod
- Extract getCenterForMethod(SiteCenterMethod) into BindingSite interface
for thread-safe, param-independent center calculation
- Refactor Ligand/ResidueSite getCenterForEval() to delegate to getCenterForMethod()
- Add analyze binding-site-centers command comparing all center methods per site
- Add Dataset.Result.writeErrorsAndGetSummary() and use it across all
AnalyzeRoutine commands for consistent error reporting to both console and CSV
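Moving center selection into getCenterForMethod(SiteCenterMethod) makes the result a pure function of its argument instead of a global runtime parameter, so concurrent callers can safely request different methods. A minimal sketch, assuming a simplified model: Point3, the three enum constants, and the record-based Ligand/ResidueSite are hypothetical stand-ins, not the project's actual classes.

```java
import java.util.List;

/** Hypothetical 3D point used only for this sketch. */
record Point3(double x, double y, double z) {
    /** Unweighted centroid of a non-empty list of points. */
    static Point3 centroid(List<Point3> pts) {
        double sx = 0, sy = 0, sz = 0;
        for (Point3 p : pts) { sx += p.x(); sy += p.y(); sz += p.z(); }
        int n = pts.size();
        return new Point3(sx / n, sy / n, sz / n);
    }
}

/** Hypothetical subset of center methods; the real enum may differ. */
enum SiteCenterMethod { EXPLICIT, ATOMS_CENTROID, SAS_POINTS_CENTROID }

interface BindingSite {
    List<Point3> atoms();
    List<Point3> sasPoints();

    /** Depends only on the argument (no global params); null means unsupported or unavailable. */
    default Point3 getCenterForMethod(SiteCenterMethod method) {
        switch (method) {
            case ATOMS_CENTROID:
                return atoms().isEmpty() ? null : Point3.centroid(atoms());
            case SAS_POINTS_CENTROID:
                return sasPoints().isEmpty() ? null : Point3.centroid(sasPoints());
            default:
                return null; // EXPLICIT only applies to sites that carry a predefined center
        }
    }
}

record Ligand(List<Point3> atoms, List<Point3> sasPoints) implements BindingSite {}

record ResidueSite(List<Point3> atoms, List<Point3> sasPoints, Point3 explicitCenter)
        implements BindingSite {
    @Override
    public Point3 getCenterForMethod(SiteCenterMethod method) {
        if (method == SiteCenterMethod.EXPLICIT) return explicitCenter;
        return BindingSite.super.getCenterForMethod(method);
    }
}
```

Returning null for unavailable centers (e.g. a buried site with no SAS points) leaves the fallback policy to the caller.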
BioJava assigns GroupType based on its Chemical Component Dictionary,
not structural role. Ligands in non-polymer chains can get any GroupType:
- GDP, GTP, ATP -> GroupType.NUCLEOTIDE
- SHR and similar -> GroupType.AMINOACID
- Most others -> GroupType.HETATM
Previously only HETATM groups were detected as ligands, causing errors
like "Ligand definition 'GDP' matches no ligands" for nucleotide and
amino acid derivative ligands.
Fix: any non-water group in a NONPOLYMER chain is now a ligand
candidate, regardless of GroupType. Polymer chain groups (protein AA,
DNA/RNA) are only included if they have GroupType.HETATM.
Add test PDB files (1a2kC.pdb with GDP, 1e5qA.pdb with SHR) and
comprehensive tests for all three GroupType cases.
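The new candidate rule can be sketched as a single predicate. This assumes local enums that only mirror the BioJava concepts named above (they are not the BioJava types), and a hypothetical set of water residue names; the project's actual water check may differ.

```java
import java.util.Set;

/** Local stand-ins mirroring the concepts in the text, not the BioJava enums themselves. */
enum GroupType { HETATM, AMINOACID, NUCLEOTIDE }
enum ChainType { POLYMER, NONPOLYMER, WATER }

final class LigandCandidates {
    // Hypothetical water residue names used only for this sketch.
    private static final Set<String> WATER_NAMES = Set.of("HOH", "WAT", "DOD");

    static boolean isLigandCandidate(String resName, GroupType groupType, ChainType chainType) {
        if (chainType == ChainType.WATER || WATER_NAMES.contains(resName)) return false;
        if (chainType == ChainType.NONPOLYMER) return true; // any GroupType qualifies
        return groupType == GroupType.HETATM;               // polymer chains: HETATM groups only
    }
}
```

With this rule, GDP (NUCLEOTIDE) and SHR (AMINOACID) in non-polymer chains are accepted, while standard residues of a protein chain are not.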
The Jaccard ratio was computed as int/int, which truncates to 0 for any
partial overlap (and yields 1 only for identical sets), making fractional
thresholds ineffective. Cast to double for correct floating-point
division. Also fix typo (cahe->cache), remove debug comments, and update
javadoc.
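The bug and the fix side by side, as a minimal sketch with hypothetical method names. Returning 0.0 for an empty union is an assumption of this sketch.

```java
final class JaccardDemo {
    /** Buggy: int / int truncates toward zero before the result is widened to double. */
    static double jaccardIntDivision(int intersection, int union) {
        return intersection / union;
    }

    /** Fixed: promote one operand to double so the division is floating-point. */
    static double jaccard(int intersection, int union) {
        return union == 0 ? 0.0 : (double) intersection / union; // empty-union policy is assumed
    }
}
```

For an overlap of 3 out of 4, the buggy version returns 0.0 and the fixed one returns 0.75, so a threshold such as 0.5 can now actually be met.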
- Rename PocketCriterium to PocketCriterion (fix Latin spelling)
- Revert getLigandAtoms() back to getAtoms() in BindingSite interface
- Rename getCentroidForEval() to getCenterForEval()
- Rename explicitCentroid to explicitCenter in ResidueSite
- Rename SiteCentroidMethod values: explicit_centroid->explicit,
sas_points_center_of_mass->sas_points_centroid
- Rename site_centroid_method param to site_eval_center_method
- Ligand.getCentroid() now delegates to getCenterForEval()
Remove leading-space padding from fmt calls in getMiscStatsCSV and
FeatureImportances, fix header/data spacing mismatch in toPocketsCSV,
and remove trailing space in toLigandsCSV header.
Add a notebook that loads _predictions.csv and _residues.csv, with example
data from predict_1fbl. Clean up CSV formatting: remove padding from
values and add fmtCsv(), which omits leading spaces, for CSV output.
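The difference between a padded console formatter and an unpadded CSV formatter can be sketched as follows. The bodies of fmt and fmtCsv here are hypothetical (width 8, three decimals); using Locale.ROOT to keep "." as the decimal separator regardless of the default locale is a choice of this sketch.

```java
import java.util.Locale;

final class CsvFormat {
    /** Hypothetical console formatter: fixed width, right-aligned, so columns line up. */
    static String fmt(double v) {
        return String.format(Locale.ROOT, "%8.3f", v);
    }

    /** Hypothetical CSV formatter: same precision, no padding, so fields parse cleanly. */
    static String fmtCsv(double v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }
}
```

fmt(1.5) yields "   1.500", whereas fmtCsv(1.5) yields "1.500", which CSV readers that do not trim whitespace will not treat as a string.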
Add unified sites.csv (alongside ligands.csv) containing site type, centroid
coordinates, radius, and residue counts for both ligand-defined and explicit
sites. Rename BindingSite.getAtoms() to getLigandAtoms() for clarity and
update all callers.
Prevent NPE when the site centroid is null (e.g. buried residues with no
SAS points when using sas_points_center_of_mass) or the pocket centroid
is null (e.g. PUResNetPocket).
Write errors.csv, errors_aggregated.csv, and errors_full.txt.gz with
full stack traces when processing errors occur. Also rename pockets.csv
to predicted_pockets.csv in eval results output.
Add SiteCentroidMethod enum with support flags for ligand/explicit sites.
Rename ResidueSite.centroid to explicitCentroid, add getCentroidForEval()
to BindingSite interface used by DCC. Add site_eval_sas_pts_as_atoms param
to allow DCA to use SAS points instead of atoms for site representation.
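DCC is the distance between a predicted pocket center and the site center; DCA is the distance from the pocket center to the closest point representing the site. A minimal sketch, assuming a hypothetical Vec3 type and a boolean that stands in for site_eval_sas_pts_as_atoms. Returning NaN for missing centers (so a null centroid is never dereferenced) is this sketch's choice, not necessarily the project's.

```java
import java.util.List;

record Vec3(double x, double y, double z) {
    double dist(Vec3 o) {
        double dx = x - o.x, dy = y - o.y, dz = z - o.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}

final class PocketMetrics {
    /** DCC: pocket center to site center; NaN if either center is missing. */
    static double dcc(Vec3 pocketCenter, Vec3 siteCenter) {
        if (pocketCenter == null || siteCenter == null) return Double.NaN;
        return pocketCenter.dist(siteCenter);
    }

    /** DCA: pocket center to the nearest site atom, or nearest SAS point if useSasPoints. */
    static double dca(Vec3 pocketCenter, List<Vec3> siteAtoms, List<Vec3> siteSasPoints,
                      boolean useSasPoints) {
        List<Vec3> pts = useSasPoints ? siteSasPoints : siteAtoms;
        if (pocketCenter == null || pts.isEmpty()) return Double.NaN;
        double best = Double.POSITIVE_INFINITY;
        for (Vec3 p : pts) best = Math.min(best, pocketCenter.dist(p));
        return best;
    }

    /** A metric within the cutoff counts as a hit; NaN <= cutoff is false, so NaN never hits. */
    static boolean isHit(double metric, double cutoff) {
        return metric <= cutoff;
    }
}
```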
Render predicted pocket centroids with individual pocket colors and
explicit site centroids (or ligand centroids as fallback) as hotpink
spheres, controlled by vis_site_centers param.
DCC was computing site.atoms.centroid (geometric center of resolved
residue atoms) instead of site.getCentroid(). For ResidueSites this
returns the predefined centroid from the input file, which is the
authoritative binding site location. For Ligands this changes from
geometric to mass-weighted center (negligible difference).
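The two kinds of center mentioned above differ only in weighting: the geometric centroid averages coordinates, and the center of mass weights each coordinate by atomic mass. A minimal sketch with a hypothetical WeightedAtom record; the masses in the usage note are standard atomic weights.

```java
import java.util.List;

record WeightedAtom(double x, double y, double z, double mass) {}

final class Centers {
    /** Geometric centroid: plain average of coordinates. */
    static double[] centroid(List<WeightedAtom> atoms) {
        double[] c = new double[3];
        for (WeightedAtom a : atoms) { c[0] += a.x(); c[1] += a.y(); c[2] += a.z(); }
        for (int i = 0; i < 3; i++) c[i] /= atoms.size();
        return c;
    }

    /** Center of mass: coordinates weighted by atomic mass. */
    static double[] centerOfMass(List<WeightedAtom> atoms) {
        double[] c = new double[3];
        double total = 0;
        for (WeightedAtom a : atoms) {
            c[0] += a.mass() * a.x();
            c[1] += a.mass() * a.y();
            c[2] += a.mass() * a.z();
            total += a.mass();
        }
        for (int i = 0; i < 3; i++) c[i] /= total;
        return c;
    }
}
```

For a C atom at x = 0 and an O atom at x = 1.2, the centroid is at x = 0.6 and the center of mass is pulled toward the heavier oxygen (about x = 0.685).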
- PocketRescorer: fall back to explicit site residue atoms for point
labeling when no ligand atoms are available, fixing zero positives in
binary classification stats for site-based eval-predict
- SLinkClustererV2: log cluster count and sizes instead of full contents
- New vis_site_centers param (default false) renders centroids as hotpink
pseudoatom spheres in both the old (PymolRenderer) and new
(NewPymolRenderer) renderers
- Pass site centroids via RenderingModel.siteCentroids for analyze command
- Old renderer shows predicted pocket centroids and ligand centroids
- Fix empty visualizations/ dir in eval-predict: create vis dir under
predDir instead of top-level outdir
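Rendering centers as spheres amounts to emitting PyMOL pseudoatom commands. A minimal sketch of the PML generation, assuming hypothetical object names (site_center_N) and a plain list of coordinates; it is not the renderers' actual output. Skipping null centers avoids interpolating the literal "null" into the script.

```java
import java.util.List;
import java.util.Locale;

final class PmlCenters {
    /** One pseudoatom per center, shown as a hotpink sphere; object names are hypothetical. */
    static String siteCentersPml(List<double[]> centers) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < centers.size(); i++) {
            double[] c = centers.get(i);
            if (c == null) continue; // no center available: emit nothing for this site
            String name = "site_center_" + (i + 1);
            sb.append(String.format(Locale.ROOT, "pseudoatom %s, pos=[%.3f,%.3f,%.3f]\n",
                    name, c[0], c[1], c[2]));
            sb.append("show spheres, ").append(name).append('\n');
            sb.append("color hotpink, ").append(name).append('\n');
        }
        return sb.toString();
    }
}
```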
- Use bulk atom ID selections instead of per-residue named selections to
avoid exceeding PyMOL's object limit on large proteins
- Convert CIF inputs to PDB format with the correct .pdb extension (PyMOL
can't reliably parse BioJava CIF and uses the file extension to pick the
parser)
- Rename PyMOL object from "protein" to "prot" to avoid reserved keyword
- Fix null interpolation in PML when no ligands or no labeling
- Build BinaryLabeling from explicit site residues for visualization
(item.binaryLabeling doesn't support site-based datasets)
- Add PyMOL visualizations using dataset.binaryResidueLabeler
- Add site_radius column (max distance from centroid to any site atom)
- Add excludeFromSummary param to DataTable.formatSummaryTable to skip
center coordinates from numeric summary stats
- Load ExplicitSitesIndex eagerly during dataset loading (fail-fast)
- Skip CSV rows with empty residue/coordinate fields in AhojUbsSiteParser
- Write items without binding sites to separate file in outdir
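The site_radius column is the maximum distance from the site centroid to any site atom. A minimal sketch; comparing squared distances avoids a square root per atom. Returning NaN for a missing center or empty site is an assumption of this sketch.

```java
final class SiteRadius {
    /** Max Euclidean distance from center {x, y, z} to any point in coords (rows of x, y, z). */
    static double siteRadius(double[] center, double[][] coords) {
        if (center == null || coords.length == 0) return Double.NaN; // assumed policy for missing data
        double maxSq = 0;
        for (double[] p : coords) {
            double dx = p[0] - center[0], dy = p[1] - center[1], dz = p[2] - center[2];
            maxSq = Math.max(maxSq, dx * dx + dy * dy + dz * dz);
        }
        return Math.sqrt(maxSq);
    }
}
```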
Switch Atoms.kdTree field and buildKdTree() to use the AtomKdTree
interface instead of AtomKdTreeV1 directly. Add @NonNull to iterator(),
improve initial capacity estimates, and fix whitespace.
Implement ExplicitSitesIndex for loading binding site definitions from
external CSV files (pluggable format system, first format: ahoj_ubs).
Sites are resolved during item loading via DatasetItemLoader.
Add 'analyze binding-sites' sub-command producing unified CSV and summary
stats for both ligand-based and explicit site datasets, with unresolved
residue/site tracking for explicit datasets.
Remove unused SiteLoader stub.
Add GenericVector.addWeighted() for fused multiply-add, eliminating per-
neighbor array allocation in feature vector aggregation. Add SLinkClustererV2
using union-find with path compression, reducing single-linkage clustering
from O(N³) to O(N²). Wire V2 via factory methods on AtomClusterer and
AtomGroupClusterer.
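Single-linkage clustering at a distance cutoff is equivalent to finding connected components of the "within cutoff" graph, which union-find does in one pass over all pairs. A minimal sketch over raw coordinates (not the project's AtomClusterer API). It uses path halving, a form of path compression, plus union by size; with both, each operation is near-constant amortized, so the O(N²) pair loop dominates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SingleLinkage {
    private final int[] parent;
    private final int[] size;

    private SingleLinkage(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    /** Find with path halving: every visited node is re-pointed to its grandparent. */
    private int find(int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Union by size: attach the smaller tree under the larger root. */
    private void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
    }

    /** Points connected by a chain of steps each <= dist end up in the same cluster. */
    static List<List<Integer>> cluster(double[][] pts, double dist) {
        int n = pts.length;
        SingleLinkage uf = new SingleLinkage(n);
        double d2 = dist * dist;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1], dz = pts[i][2] - pts[j][2];
                if (dx * dx + dy * dy + dz * dz <= d2) uf.union(i, j);
            }
        }
        Map<Integer, List<Integer>> byRoot = new HashMap<>();
        for (int i = 0; i < n; i++) byRoot.computeIfAbsent(uf.find(i), k -> new ArrayList<>()).add(i);
        return new ArrayList<>(byRoot.values());
    }
}
```

Points 0 and 2 below are 2.0 apart, above the 1.5 cutoff, yet they share a cluster through point 1; that chaining is what distinguishes single linkage.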
Rename Atoms.consolidate() to Atoms.sparsify() for clarity. Use mutable
V1 KdTree for O(N log N) incremental insertion instead of periodic
rebuilds. Add surface_sparsify runtime param (default true) to allow
disabling surface point sparsification. Hardcode AtomKdTreeV1 in
Atoms.buildKdTree() and delegate Dataset cache clearing to item methods.
Rewrite AtomKdTreeV1 from Groovy to Java to eliminate Groovy IndyInterface
monitor contention that serialized 16 parallel threads down to ~2.
Move V1 KdTree into v1/ subpackage, extract AtomKdTree as a Java interface
with factory method dispatching by kdtree_implementation param, and rename
the old v2 wrapper to AtomKdTreeV2 implementing the same interface.
Add runtime parameter to switch between KdTree3D (default) and v1
AtomKdTree. Fix O(N²) quickselect degeneration on duplicate coordinates
by adding post-partition equal-range scan.
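Why the equal-range scan matters: with plain Lomuto partitioning, all values equal to the pivot land on one side, so an array of identical coordinates shrinks the range by only one element per round (O(N²)). A minimal quickselect sketch over a double[] (not the AtomKdTree code) that gathers the pivot's duplicates after partitioning and stops as soon as k falls inside that run.

```java
final class QuickSelect {
    private static void swap(double[] a, int i, int j) {
        double t = a[i]; a[i] = a[j]; a[j] = t;
    }

    /** Returns the k-th smallest value (0-based), partially reordering a. */
    static double select(double[] a, int k) {
        if (k < 0 || k >= a.length) throw new IndexOutOfBoundsException("k=" + k);
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // Middle element as pivot, moved to the end for Lomuto partitioning.
            swap(a, lo + (hi - lo) / 2, hi);
            double pivot = a[hi];
            int store = lo;
            for (int i = lo; i < hi; i++) {
                if (a[i] < pivot) swap(a, i, store++);
            }
            swap(a, store, hi);
            // Now a[lo..store-1] < pivot, a[store] == pivot, a[store+1..hi] >= pivot.

            // Post-partition equal-range scan: pull every value equal to the pivot next to it.
            int eqEnd = store;
            for (int i = store + 1; i <= hi; i++) {
                if (a[i] == pivot) swap(a, i, ++eqEnd);
            }
            // Now a[store..eqEnd] == pivot and a[eqEnd+1..hi] > pivot.

            if (k < store) hi = store - 1;
            else if (k > eqEnd) lo = eqEnd + 1;
            else return pivot; // k lies inside the run of duplicates: done in one pass
        }
        return a[k];
    }
}
```

On an all-duplicates array, the first pass collects the whole range into the equal run and returns immediately.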
- Bounding boxes computed bottom-up from leaf scans instead of scanning
full data range at every tree level (O(N) vs O(N log N))
- Approximate parent bounds passed down for split-axis selection (O(1)
per node instead of O(range) scan)
- Remove findNodeCount() and dead code; buildNode returns max index
- Resolve split-axis array once in quickselect inner loop
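Computing bounding boxes bottom-up means only leaves scan point coordinates, and each internal node merges its two children's boxes in O(1), for O(N) total box work. A minimal sketch with two simplifications labeled in the comments: a full sort of the range stands in for quickselect, and the split axis is round-robin rather than chosen from approximate parent bounds. It returns the root box instead of storing boxes in nodes.

```java
import java.util.Arrays;
import java.util.Comparator;

final class BBoxTree {
    private final double[][] pts;
    private final int leafSize;

    BBoxTree(double[][] pts, int leafSize) {
        this.pts = pts;
        this.leafSize = Math.max(1, leafSize);
    }

    /** Box of pts[lo, hi) as {minX, minY, minZ, maxX, maxY, maxZ}; a real tree would store it per node. */
    double[] buildBox(int lo, int hi, int depth) {
        if (hi - lo <= leafSize) return scanLeaf(lo, hi); // only leaves read coordinates
        int axis = depth % 3;                              // simplification: round-robin split axis
        // Simplification: sort the range instead of a quickselect around the median.
        Arrays.sort(pts, lo, hi, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = (lo + hi) >>> 1;
        return union(buildBox(lo, mid, depth + 1), buildBox(mid, hi, depth + 1)); // O(1) merge
    }

    private double[] scanLeaf(int lo, int hi) {
        double[] b = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
                      Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (int i = lo; i < hi; i++) {
            for (int d = 0; d < 3; d++) {
                b[d] = Math.min(b[d], pts[i][d]);
                b[d + 3] = Math.max(b[d + 3], pts[i][d]);
            }
        }
        return b;
    }

    private static double[] union(double[] a, double[] b) {
        double[] u = new double[6];
        for (int d = 0; d < 3; d++) {
            u[d] = Math.min(a[d], b[d]);
            u[d + 3] = Math.max(a[d + 3], b[d + 3]);
        }
        return u;
    }
}
```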
Rename processssItem to processItem. Add per-item conditional cache
clearing after processing to reduce peak memory. Refactor cleanCaches
into clearCache/clearPrimaryCache/clearSecondaryCache with null-safety.