- New vis_site_centers param (default false) renders centroids as hotpink
pseudoatom spheres in both old (PymolRenderer) and new (NewPymolRenderer)
- Pass site centroids via RenderingModel.siteCentroids for analyze command
- Old renderer shows predicted pocket centroids and ligand centroids
- Fix empty visualizations/ dir in eval-predict: create vis dir under
predDir instead of top-level outdir
- Use bulk atom ID selections instead of per-residue named selections to
avoid exceeding PyMOL's object limit on large proteins
- Convert CIF inputs to PDB format with correct .pdb extension (PyMOL
can't reliably parse BioJava CIF and uses extension to pick parser)
- Rename PyMOL object from "protein" to "prot" to avoid reserved keyword
- Fix null interpolation in PML when no ligands or no labeling
- Build BinaryLabeling from explicit site residues for visualization
(item.binaryLabeling doesn't support site-based datasets)
- Add PyMOL visualizations using dataset.binaryResidueLabeler
- Add site_radius column (max distance from centroid to any site atom)
- Add excludeFromSummary param to DataTable.formatSummaryTable to exclude
center coordinates from numeric summary stats
- Load ExplicitSitesIndex eagerly during dataset loading (fail-fast)
- Skip CSV rows with empty residue/coordinate fields in AhojUbsSiteParser
- Write items without binding sites to separate file in outdir
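
The site_radius column listed above is defined as the maximum distance from
the site centroid to any site atom. A minimal sketch of that computation,
assuming atoms are given as plain double[3] coordinates; the class and method
names (SiteRadiusSketch, centroid, radius) are hypothetical and not taken
from the codebase:

```java
final class SiteRadiusSketch {

    /** Arithmetic mean of the atom coordinates. */
    static double[] centroid(double[][] atoms) {
        if (atoms.length == 0) {
            throw new IllegalArgumentException("site has no atoms");
        }
        double sx = 0, sy = 0, sz = 0;
        for (double[] a : atoms) {
            sx += a[0];
            sy += a[1];
            sz += a[2];
        }
        int n = atoms.length;
        return new double[] {sx / n, sy / n, sz / n};
    }

    /** Maximum distance from the centroid to any atom of the site. */
    static double radius(double[][] atoms) {
        double[] c = centroid(atoms);
        double maxSqr = 0;
        for (double[] a : atoms) {
            double dx = a[0] - c[0], dy = a[1] - c[1], dz = a[2] - c[2];
            // Compare squared distances; take a single sqrt at the end.
            maxSqr = Math.max(maxSqr, dx * dx + dy * dy + dz * dz);
        }
        return Math.sqrt(maxSqr);
    }

    public static void main(String[] args) {
        double[][] site = {{0, 0, 0}, {2, 0, 0}};
        System.out.println(radius(site));
    }
}
```
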
Switch Atoms.kdTree field and buildKdTree() to use the AtomKdTree
interface instead of AtomKdTreeV1 directly. Add @NonNull to iterator(),
improve initial capacity estimates, and fix whitespace.
Implement ExplicitSitesIndex for loading binding site definitions from
external CSV files (pluggable format system, first format: ahoj_ubs).
Sites are resolved during item loading via DatasetItemLoader.
Add 'analyze binding-sites' sub-command producing unified CSV and summary
stats for both ligand-based and explicit site datasets, with unresolved
residue/site tracking for explicit datasets.
Remove unused SiteLoader stub.
Add GenericVector.addWeighted() for fused multiply-add, eliminating per-
neighbor array allocation in feature vector aggregation. Add SLinkClustererV2
using union-find with path compression, reducing single-linkage clustering
from O(N³) to O(N²). Wire V2 via factory methods on AtomClusterer and
AtomGroupClusterer.
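
For illustration, single-linkage clustering with a distance cutoff can be
done with union-find as sketched below: every pair within the cutoff is
merged, which costs O(N²) pair checks with near-constant work per merge.
This is a sketch under assumptions, not the SLinkClustererV2 source; the
class name, the double[3] point representation, and the cutoff parameter are
hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SingleLinkageSketch {
    private final int[] parent;

    SingleLinkageSketch(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each point starts as its own set
    }

    int find(int i) {
        int root = i;
        while (parent[root] != root) root = parent[root];
        // Path compression: point every node on the walked path at the root.
        while (parent[i] != root) {
            int next = parent[i];
            parent[i] = root;
            i = next;
        }
        return root;
    }

    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra != rb) parent[rb] = ra;
    }

    /** Groups point indices whose chains of neighbors lie within the cutoff. */
    static List<List<Integer>> cluster(double[][] pts, double cutoff) {
        int n = pts.length;
        SingleLinkageSketch uf = new SingleLinkageSketch(n);
        double cutoffSqr = cutoff * cutoff;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                double dz = pts[i][2] - pts[j][2];
                if (dx * dx + dy * dy + dz * dz <= cutoffSqr) uf.union(i, j);
            }
        }
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(uf.find(i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {10, 0, 0}};
        System.out.println(cluster(pts, 1.5));
    }
}
```
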
Rename Atoms.consolidate() to Atoms.sparsify() for clarity. Use mutable
V1 KdTree for O(N log N) incremental insertion instead of periodic
rebuilds. Add surface_sparsify runtime param (default true) to allow
disabling surface point sparsification. Hardcode AtomKdTreeV1 in
Atoms.buildKdTree() and delegate Dataset cache clearing to item methods.
Rewrite AtomKdTreeV1 from Groovy to Java to eliminate Groovy IndyInterface
monitor contention that serialized 16 parallel threads down to ~2.
Move V1 KdTree into v1/ subpackage, extract AtomKdTree as a Java interface
with factory method dispatching by kdtree_implementation param, and rename
the old v2 wrapper to AtomKdTreeV2 implementing the same interface.
Add runtime parameter to switch between KdTree3D (default) and v1
AtomKdTree. Fix O(N²) quickselect degeneration on duplicate coordinates
by adding a post-partition equal-range scan.
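
To show why duplicates matter: a two-way partition shrinks the range by only
one element per pass when all keys are equal, which gives O(N²). The sketch
below swaps in a three-way partition, one common way to collect the run of
keys equal to the pivot and stop as soon as k falls inside it. It is not the
actual fix, which is described only as an equal-range scan after
partitioning; the class and method names are hypothetical.

```java
final class QuickselectSketch {

    /** Rearranges a in place and returns the k-th smallest value (0-based). */
    static double select(double[] a, int k) {
        if (k < 0 || k >= a.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            double pivot = a[lo + (hi - lo) / 2];
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) {
                    swap(a, lt++, i++);
                } else if (a[i] > pivot) {
                    swap(a, i, gt--);
                } else {
                    i++;
                }
            }
            // Now a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
            if (k < lt) {
                hi = lt - 1;
            } else if (k > gt) {
                lo = gt + 1;
            } else {
                return a[k]; // k lies inside the equal range: done
            }
        }
        return a[k];
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println(select(new double[] {5, 1, 5, 5, 3, 5, 5}, 3));
    }
}
```

With all-equal input the first pass puts every element in the equal range,
so the loop exits after a single O(N) scan.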
- Bounding boxes computed bottom-up from leaf scans instead of scanning
full data range at every tree level (O(N) vs O(N log N))
- Approximate parent bounds passed down for split-axis selection (O(1)
per node instead of O(range) scan)
- Remove findNodeCount() and dead code; buildNode returns max index
- Resolve split-axis array once in quickselect inner loop
Rename processssItem to processItem. Add per-item conditional cache
clearing after processing to reduce peak memory. Refactor cleanCaches
into clearCache/clearPrimaryCache/clearSecondaryCache with null-safety.
Parametrized test generates random points, builds both trees, verifies
identical results for all query types, and measures relative performance.
Skipped during normal test runs; invoked via kdtree-benchmark.sh script.
Cache aggregated errors in Dataset.Result to avoid recomputing.
Use direct x/y/z field access instead of getCoords() in
Atoms.copyPoints and PointExportData to avoid double[3] allocations.
Add Dataset.Item.getRow() to reconstruct dataset row strings.
In cmdProteins(), collect items into two groups (with and without
protein chains) using ConcurrentLinkedQueue and write split .ds files
when any structures lack protein chains.
Add Atom-based sqrDist/dist overloads in PerfUtils that use
getX/getY/getZ directly instead of allocating double[] via getCoords().
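
A minimal sketch of what such allocation-free overloads could look like. The
Xyz interface and SimplePoint class are hypothetical stand-ins for the
project's Atom type; only the getX/getY/getZ accessors are taken from the
description above.

```java
/** Hypothetical stand-in for a type exposing coordinate getters. */
interface Xyz {
    double getX();
    double getY();
    double getZ();
}

/** Hypothetical minimal implementation, used only for the demo. */
final class SimplePoint implements Xyz {
    private final double x, y, z;

    SimplePoint(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public double getX() { return x; }
    public double getY() { return y; }
    public double getZ() { return z; }
}

final class DistSketch {

    /** Squared Euclidean distance; reads coordinates directly, no double[] is created. */
    static double sqrDist(Xyz a, Xyz b) {
        double dx = a.getX() - b.getX();
        double dy = a.getY() - b.getY();
        double dz = a.getZ() - b.getZ();
        return dx * dx + dy * dy + dz * dz;
    }

    static double dist(Xyz a, Xyz b) {
        return Math.sqrt(sqrDist(a, b));
    }

    public static void main(String[] args) {
        System.out.println(dist(new SimplePoint(0, 0, 0), new SimplePoint(3, 4, 0)));
    }
}
```
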
Refactor Point to store x/y/z as individual fields instead of a
double[] array. Fix Point.setCoords() which was previously a no-op.
Pre-build KD tree in Ligands.makeLigands() before the ligand loop.
Simplify KD tree usage in Atoms.dist/sqrDist by removing redundant
size threshold check.
Add writeAggregatedItemErrorsToCsv to Dataset.Result that groups errors
by message and outputs count/error sorted by frequency. Update
getErrorSummary to display an aggregated error table instead of just
the count. Add writeErrorCsvs(outdir) that writes all three error files
(per-item, aggregated, full stack traces) and use it in AnalyzeRoutine.
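
The grouping described above (count per error message, most frequent first)
can be sketched as follows. This is an illustration only: the class and
method names are hypothetical, the input is a plain list of messages rather
than Dataset.Result, and the CSV quoting rule is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ErrorAggregationSketch {

    /** Counts identical messages and orders them by descending count. */
    static List<Map.Entry<String, Long>> aggregate(List<String> messages) {
        Map<String, Long> counts = messages.stream()
                .collect(Collectors.groupingBy(m -> m, LinkedHashMap::new, Collectors.counting()));
        List<Map.Entry<String, Long>> rows = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so messages with equal counts keep first-seen order.
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    /** Renders the aggregated rows as CSV with a count,error header. */
    static String toCsv(List<Map.Entry<String, Long>> rows) {
        StringBuilder sb = new StringBuilder("count,error\n");
        for (Map.Entry<String, Long> row : rows) {
            // Quote the message and double any embedded quotes.
            String quoted = "\"" + row.getKey().replace("\"", "\"\"") + "\"";
            sb.append(row.getValue()).append(',').append(quoted).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> messages = List.of("a", "b", "a", "c", "b", "a");
        System.out.print(toCsv(aggregate(messages)));
    }
}
```
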
Add 'analyze proteins' command that outputs per-protein stats CSV
(chain counts, residues, atoms, ligands, peptides) and a summary table
with min/max/avg/median. Add 'analyze parse-proteins' for parsing
all dataset items and reporting errors only.
Introduce DataTable — a lock-free, pre-registered-column table for
structured data collection across threads, with CSV and summary output.
- Add ResidueSite and SiteLoader
- Update pocket criteria (DCA, DCC, DPA, DSA, DSO, DSWO) for site evaluation
- Extend Evaluation with site-metrics support
- Bump version to 2.6.0-dev.1
Memory optimization: per-prediction cost reduced from ~40 bytes (PPred object)
to ~9 bytes (parallel double[] + boolean[] arrays). For large datasets this
reduces prediction storage by ~77%.
PredictedScores provides: ArrayList-style growth, bulk addAll via arraycopy,
cached observedPositiveCount, stable descending merge sort (required for
reproducible metrics with tied RF scores), and direct backing array access
for hot loops in Metrics/Curves.
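
The ~9 bytes figure is consistent with one 8-byte double plus one boolean
array element (typically 1 byte on HotSpot) per prediction. A simplified
sketch of the parallel-array layout follows; it is not the PredictedScores
source. The class and method names are hypothetical, and for brevity the
stable sort goes through a boxed index array with Arrays.sort (documented as
stable for objects) rather than a hand-written merge sort on the primitives.

```java
import java.util.Arrays;

final class ScoresSketch {
    private double[] scores = new double[16];
    private boolean[] positives = new boolean[16];
    private int size;
    private int positiveCount; // cached so metrics need not rescan the labels

    void add(double score, boolean positive) {
        if (size == scores.length) {
            // ArrayList-style growth: double both arrays together.
            int cap = scores.length * 2;
            scores = Arrays.copyOf(scores, cap);
            positives = Arrays.copyOf(positives, cap);
        }
        scores[size] = score;
        positives[size] = positive;
        size++;
        if (positive) positiveCount++;
    }

    int size() { return size; }
    int positiveCount() { return positiveCount; }
    double score(int i) { return scores[i]; }
    boolean positive(int i) { return positives[i]; }

    /** Sorts by descending score; entries with tied scores keep insertion order. */
    void sortDescending() {
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        double[] s = new double[scores.length];
        boolean[] p = new boolean[positives.length];
        for (int i = 0; i < size; i++) {
            s[i] = scores[order[i]];
            p[i] = positives[order[i]];
        }
        scores = s;
        positives = p;
    }

    public static void main(String[] args) {
        ScoresSketch ps = new ScoresSketch();
        ps.add(0.5, true);
        ps.add(0.9, false);
        ps.sortDescending();
        System.out.println(ps.score(0) + " " + ps.positive(0));
    }
}
```

Stability matters here because an unstable sort could order tied scores
differently between runs of different shape, changing any metric that walks
the ranked list.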
- Add NativePanamaForest/NativePanamaForestAvx2 availability checks in ModelConverter
- Refactor flattening logic to separate trainable forest preparation from conversion
- Track all eval times and compute average excluding first run (caching warmup)
- Rename TIME_M to TIME_TRAINEVAL_AVG_M, add TIME_EVAL_AVG_M stat
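
The warmup-excluding average mentioned above can be sketched as below. The
names are hypothetical, and the fallback for a single recorded run is an
assumption; the commit does not say how that case is handled.

```java
final class EvalTimeSketch {

    /** Mean of all recorded times except the first (warmup) run. */
    static double averageExcludingFirst(long[] timesMs) {
        if (timesMs.length == 0) {
            throw new IllegalArgumentException("no eval times recorded");
        }
        if (timesMs.length == 1) {
            return timesMs[0]; // assumption: with only one run, report it as-is
        }
        long sum = 0;
        for (int i = 1; i < timesMs.length; i++) {
            sum += timesMs[i];
        }
        return (double) sum / (timesMs.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(averageExcludingFirst(new long[] {1000, 200, 400}));
    }
}
```
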
Generalize Model classifier from Classifier to Object to support both
trainable classifiers and flat BinaryForest models. Add rf_flatten_target
parameter for selecting the forest type (FlatBinaryForest, LegacyFlatBinaryForest,
InterleavedBfsForest, etc). Deprecate rf_flatten_as_legacy in favor of the
new target type selection.
Replace flat jar with local Maven repo dependency at correct path
(groupId/artifactId/version/). Fix GString-to-String type errors in
AnalyzeRoutine that broke compilation with @CompileStatic.