# Exporting SAS Points with Feature Vectors
Export SAS points with feature vectors and (optionally) predicted ligandability scores and pocket assignments.
## Commands
There are two ways to export SAS points:
### `export-points`: standalone export (no model needed)

```bash
prank export-points -f protein.pdb
prank export-points -f protein.pdb -export_points_format parquet
prank export-points dataset.ds -export_points_format arrow.zst
```
The `export-points` command calculates SAS points with feature vectors and exports them directly; no model is loaded and no prediction is made.
Consequently, the output does not contain `score` or `pocket` columns, but you are free to use any custom feature setup via the `-features` and `-extra_features` parameters.
### `predict` / `rescore`: export alongside prediction

```bash
prank predict -f protein.pdb -export_points 1
prank predict -f protein.pdb -export_points 1 -export_points_format csv.gz
prank predict -f protein.pdb -export_points 1 -export_points_format arrow
prank predict -f protein.pdb -export_points 1 -export_points_format parquet
prank predict dataset.ds -export_points 1 -export_points_format arrow.zst
```
The `rescore` command also supports export (pocket points only):

```bash
prank rescore joined-fpocket.ds -export_points 1 -export_points_format arrow.zst
```
With `predict`/`rescore`, the output includes a `score` column with the predicted ligandability and a `pocket` column with the predicted pocket rank (0 if the point is not assigned to any pocket).
However, because prediction relies on a pre-trained model that expects a particular set and order of features, you cannot customize the feature setup (changing `-features` or `-extra_features` would break the model).
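Since the two kinds of output differ only in whether `score` and `pocket` are present, downstream code can group columns by name rather than by position. The sketch below assumes the layout described under Output (`x`, `y`, `z`, then the optional prediction columns, then features); `split_columns` is a hypothetical helper, not part of `prank`.

```python
# Hypothetical helper (not part of prank): classify the columns of an exported
# points table, assuming the layout x, y, z, [score, pocket], features...
COORD_COLUMNS = ("x", "y", "z")
PREDICTION_COLUMNS = ("score", "pocket")  # present only in predict/rescore output


def split_columns(header):
    """Return (coords, predictions, features) column-name lists for a header."""
    coords = [c for c in header if c in COORD_COLUMNS]
    predictions = [c for c in header if c in PREDICTION_COLUMNS]
    features = [c for c in header
                if c not in COORD_COLUMNS and c not in PREDICTION_COLUMNS]
    return coords, predictions, features


# Headers from the CSV examples in this document (trailing "..." omitted).
predict_header = ["x", "y", "z", "score", "pocket",
                  "chem.hydrophobic", "chem.aromatic", "protrusion"]
export_header = ["x", "y", "z", "chem.hydrophobic", "chem.aromatic", "protrusion"]

print(split_columns(predict_header)[1])  # ['score', 'pocket']
print(split_columns(export_header)[1])   # []
```

Matching by name keeps the same loader working for both `export-points` and `predict -export_points 1` files.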
### Which command to use?

| | `export-points` | `predict -export_points 1` |
|---|---|---|
| Custom feature setup | Yes | No (must match the model) |
| Predicted score and pocket columns | No | Yes |
| Requires a model | No | Yes |
## Output

For each protein file, a `{protein_file}_points.{format}` file is generated:

| Column | Description |
|---|---|
| `x`, `y`, `z` | SAS point coordinates |
| `score` | Predicted ligandability [0-1] (predict/rescore only) |
| `pocket` | Predicted pocket rank (1, 2, …); 0 = point not assigned to any pocket. Integer column, present in predict/rescore output, absent in standalone export-points |
| `feature1`, ... | Feature values based on the effective feature setup (`-features`, `-extra_features`) |
Example for `predict -export_points 1` (CSV):

```
x,y,z,score,pocket,chem.hydrophobic,chem.aromatic,protrusion,...
12.3456,23.4567,34.5678,0.8234,1,0.5123,-0.2345,15.0000,...
```

Example for `export-points` (CSV):

```
x,y,z,chem.hydrophobic,chem.aromatic,protrusion,...
12.3456,23.4567,34.5678,0.5123,-0.2345,15.0000,...
```
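If pandas is not available, the CSV output can still be loaded with the standard library. A minimal sketch, assuming every column except `pocket` holds a floating-point value (as in the examples above); `read_points_csv` is a hypothetical name.

```python
import csv
import io


def read_points_csv(stream):
    """Parse an exported points CSV into a list of dicts with typed values.

    Assumption: `pocket` is written as a plain integer and all other
    columns are floats.
    """
    rows = []
    for record in csv.DictReader(stream):
        rows.append({name: int(value) if name == "pocket" else float(value)
                     for name, value in record.items()})
    return rows


# The predict example above, truncated to the columns that are shown.
example = io.StringIO(
    "x,y,z,score,pocket,chem.hydrophobic,chem.aromatic,protrusion\n"
    "12.3456,23.4567,34.5678,0.8234,1,0.5123,-0.2345,15.0000\n"
)
points = read_points_csv(example)
print(points[0]["pocket"], points[0]["score"])  # 1 0.8234
```

For a `csv.gz` file, `gzip.open(path, 'rt', newline='')` can be passed in place of the in-memory buffer.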
## Parameters

| Parameter | Default | Values |
|---|---|---|
| `export_points` | `false` | `true` / `false` |
| `export_points_format` | `csv` | `csv`, `csv.gz`, `csv.zst`, `arrow`, `arrow.gz`, `arrow.zst`, `parquet` |
Arrow format preserves full double precision (CSV is limited to 7 decimal places) and offers faster loading and lower memory usage than CSV.

Parquet is a columnar storage format widely supported by data analysis tools (pandas, Polars, DuckDB, Spark). It uses SNAPPY compression internally.
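Scripts that process many exports often need to choose a reader from the file name. A small sketch, assuming output files end in one of the `export_points_format` values from the table above; `parse_points_format` is a hypothetical helper.

```python
# Hypothetical helper: split a format value, or a file name ending in one,
# into (container, compression). Compression is None for uncompressed formats.
SUPPORTED_FORMATS = ("csv", "csv.gz", "csv.zst",
                     "arrow", "arrow.gz", "arrow.zst", "parquet")


def parse_points_format(name):
    for fmt in SUPPORTED_FORMATS:
        # Accept either the bare value ("csv.gz") or a file name ("x.csv.gz").
        if name == fmt or name.endswith("." + fmt):
            container, _, compression = fmt.partition(".")
            return container, compression or None
    raise ValueError(f"unsupported points format: {name!r}")


print(parse_points_format("protein_points.arrow.zst"))  # ('arrow', 'zst')
```

The returned pair could then select between the pandas, Arrow, and Parquet readers shown in Example Analysis.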
### Format Recommendations
| Use Case | Recommended Format |
|---|---|
| Smallest file size | csv.zst or arrow.zst |
| Python/R analysis | parquet or csv.gz |
| Streaming/pipes | arrow (uncompressed) |
| Maximum compatibility | csv |
## Notes

- `export-points` and `predict` export all SAS points; `rescore` exports only pocket points.
- `export-points` does not require `-export_points 1`; exporting is always on.
- CSV format uses up to 7 decimal places for floating-point values; the `pocket` column is written as a plain integer.
- Arrow uses the IPC streaming format with 64-bit floats; the `pocket` column uses Int32.
- Parquet uses SNAPPY compression (not configurable); the `pocket` column uses INT32.
- Zstd compression uses level 16 for a good compression ratio.
- Export is disabled when using `-output_only_stats 1`.
- `pocket` matches the `rank` column of `*_predictions.csv`. Boundary points that fall within the extended shells of two pockets (controlled by `extended_pocket_cutoff`) are labeled with the best (lowest) rank they belong to.
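The boundary rule in the last note amounts to taking a minimum over the ranks of all pockets whose extended shell contains a point. The following illustrates that rule only and is not the actual implementation; `pocket_label` is a hypothetical name, and determining which shells contain a point is out of scope here.

```python
def pocket_label(containing_ranks):
    """Label a point from the ranks (1, 2, ...) of pockets whose extended
    shell contains it: the best (lowest) rank wins, 0 means no pocket."""
    return min(containing_ranks, default=0)


print(pocket_label([3, 1]))  # 1: boundary point shared by pockets 1 and 3
print(pocket_label([]))      # 0: not assigned to any pocket
```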
## Example Analysis
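Python (CSV):

```python
import pandas as pd

df = pd.read_csv('protein_points.csv.gz')  # compression is inferred from the extension
# The score and pocket columns exist only in predict / rescore output:
high_score = df[df['score'] > 0.5]
print(df.describe())
# Per-pocket aggregated descriptors (pocket == 0 means "not in any pocket"):
pocket_descriptors = df[df['pocket'] > 0].groupby('pocket').mean()
```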
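Because `pocket` matches the `rank` column of `*_predictions.csv`, per-pocket descriptors can be joined back to the pocket predictions. A sketch under stated assumptions: `join_pocket_descriptors` is hypothetical, the predictions file is assumed to be loaded into a DataFrame with a `rank` column, and column names are stripped of surrounding whitespace defensively in case the CSV header is padded.

```python
import pandas as pd


def join_pocket_descriptors(points, predictions):
    """Attach mean point descriptors to each pocket row of a predictions table.

    `points` must be predict/rescore output (it needs the `pocket` column).
    """
    predictions = predictions.rename(columns=str.strip)  # tolerate padded headers
    descriptors = (
        points[points["pocket"] > 0]   # drop points outside any pocket
        .groupby("pocket")
        .mean()
        .add_prefix("mean_")           # avoid clashes with pocket-level columns
        .reset_index()
    )
    merged = predictions.merge(descriptors, how="left",
                               left_on="rank", right_on="pocket")
    return merged.drop(columns="pocket")


# Usage (file names are hypothetical):
# points = pd.read_csv('protein_points.csv.gz')
# predictions = pd.read_csv('protein_predictions.csv')
# table = join_pocket_descriptors(points, predictions)
```

The `mean_` prefix keeps point-level averages such as `mean_score` distinct from any pocket-level column of the same name in the predictions file.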
Python (Arrow):

```python
import gzip

import pyarrow as pa
import zstandard as zstd  # third-party package: pip install zstandard

# Uncompressed
df = pa.ipc.open_stream('protein_points.arrow').read_pandas()

# Gzip-compressed: the streaming format can be read directly from the decompressed file object
with gzip.open('protein_points.arrow.gz', 'rb') as f:
    df = pa.ipc.open_stream(f).read_pandas()

# Zstd-compressed
with open('protein_points.arrow.zst', 'rb') as f:
    df = pa.ipc.open_stream(zstd.ZstdDecompressor().stream_reader(f)).read_pandas()
```
Python (Parquet):

```python
import pandas as pd
import pyarrow.parquet as pq

# With pandas (requires pyarrow or fastparquet to be installed)
df = pd.read_parquet('protein_points.parquet')

# Or with PyArrow directly
table = pq.read_table('protein_points.parquet')
df = table.to_pandas()
```
Python (Polars):

```python
import polars as pl

# Parquet (fastest)
df = pl.read_parquet('protein_points.parquet')

# CSV with compression
df = pl.read_csv('protein_points.csv.gz')

# Filter high-scoring points (predict / rescore output only)
high_score = df.filter(pl.col('score') > 0.5)
```