improve readme files in distro

rdk
2020-11-05 17:52:16 +01:00
parent b9a2090599
commit 7ce58b285f
14 changed files with 85 additions and 50 deletions


@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors
Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj
See more usage examples below...
### Compilation
### Build
This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`.
This project uses the [Gradle](https://gradle.org/) build system via the included Gradle wrapper.
Build with `./make.sh` or `./gradlew assemble`.
### Algorithm
@@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers:
* [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool
Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug.
* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz)
* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz)
Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345-W349
* [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) introducing the P2Rank prediction algorithm
Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* In International Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
@@ -68,13 +70,13 @@ Following commands can be executed in the installation directory.
### Print help
~~~sh
~~~bash
prank help
~~~
### Predict ligand binding sites (P2Rank algorithm)
~~~sh
~~~bash
prank predict test.ds # run on whole dataset (containing list of pdb files)
prank predict -f test_data/1fbl.pdb # run on single pdb file
@@ -89,16 +91,16 @@ prank predict -c predict2.groovy test.ds # specify configuration file (
### Evaluate prediction model
...on a file or a dataset with known ligands.
~~~sh
~~~bash
prank eval-predict -f test_data/1fbl.pdb
prank eval-predict test.ds
~~~
### Prediction output
For each file in the dataset P2Rank produces produces several output files:
For each file in the dataset P2Rank produces several output files:
* `<pdb_file_name>_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates
of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms
of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms
* `<pdb_file_name>_residues.csv`: contains a list of all residues from the input protein with their scores,
mapping to predicted pockets and calibrated probability of being a ligand-binding residue
* PyMol visualization (`.pml` script with data files)
@@ -111,7 +113,7 @@ prank eval-predict test.ds
You can override the default params with a custom config file:
~~~sh
~~~bash
prank predict -c config/example.groovy test.ds
prank predict -c example.groovy test.ds
~~~
@@ -119,7 +121,7 @@ prank predict -c example.groovy test.ds
It is also possible to override the default params on the command line using their full names. To see the complete list of params, look into `config/default.groovy`.
~~~sh
~~~bash
prank predict -seed 151 -threads 8 test.ds
prank predict -c example.groovy -seed 151 -threads 8 test.ds
~~~
@@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites,
P2Rank is also able to rescore pockets predicted by other methods
(Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).
~~~sh
~~~bash
prank rescore test_data/fpocket.ds
prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir'
@@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds
## Comparison with Fpocket
[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program.
[Fpocket](https://github.com/Discngine/fpocket) is a widely used open-source ligand binding site prediction program.
It is fast, easy to use and well documented. As such, it was a great inspiration for this project.
Fpocket is written in C, and it is based on a different geometric algorithm.
Some practical differences:
* Fpocket
* **Fpocket**
- has a much smaller memory footprint
- runs faster when executed on a single protein
- produces a high number of less relevant pockets (and since the default scoring function isn't very effective, the most relevant pockets often don't get to the top)
- contains MDpocket algorithm for pocket predictions from molecular trajectories
- still better documented
* P2Rank
- achieves significantly better identification success rates when considering top-ranked pockets
* **P2Rank**
- achieves significantly higher identification success rates when considering top-ranked pockets
- produces a smaller number of more relevant pockets
- speed:
+ slower when running on a single protein (due to JVM startup cost)


@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors
Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj
See more usage examples below...
### Compilation
### Build
This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`.
This project uses the [Gradle](https://gradle.org/) build system via the included Gradle wrapper.
Build with `./make.sh` or `./gradlew assemble`.
### Algorithm
@@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers:
* [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool
Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug.
* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz)
* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz)
Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345-W349
* [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) introducing the P2Rank prediction algorithm
Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* In International Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
@@ -68,13 +70,13 @@ Following commands can be executed in the installation directory.
### Print help
~~~sh
~~~bash
prank help
~~~
### Predict ligand binding sites (P2Rank algorithm)
~~~sh
~~~bash
prank predict test.ds # run on whole dataset (containing list of pdb files)
prank predict -f test_data/1fbl.pdb # run on single pdb file
@@ -89,16 +91,16 @@ prank predict -c predict2.groovy test.ds # specify configuration file (
### Evaluate prediction model
...on a file or a dataset with known ligands.
~~~sh
~~~bash
prank eval-predict -f test_data/1fbl.pdb
prank eval-predict test.ds
~~~
### Prediction output
For each file in the dataset P2Rank produces produces several output files:
For each file in the dataset P2Rank produces several output files:
* `<pdb_file_name>_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates
of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms
of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms
* `<pdb_file_name>_residues.csv`: contains a list of all residues from the input protein with their scores,
mapping to predicted pockets and calibrated probability of being a ligand-binding residue
* PyMol visualization (`.pml` script with data files)
@@ -111,7 +113,7 @@ prank eval-predict test.ds
You can override the default params with a custom config file:
~~~sh
~~~bash
prank predict -c config/example.groovy test.ds
prank predict -c example.groovy test.ds
~~~
@@ -119,7 +121,7 @@ prank predict -c example.groovy test.ds
It is also possible to override the default params on the command line using their full names. To see the complete list of params, look into `config/default.groovy`.
~~~sh
~~~bash
prank predict -seed 151 -threads 8 test.ds
prank predict -c example.groovy -seed 151 -threads 8 test.ds
~~~
@@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites,
P2Rank is also able to rescore pockets predicted by other methods
(Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).
~~~sh
~~~bash
prank rescore test_data/fpocket.ds
prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir'
@@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds
## Comparison with Fpocket
[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program.
[Fpocket](https://github.com/Discngine/fpocket) is a widely used open-source ligand binding site prediction program.
It is fast, easy to use and well documented. As such, it was a great inspiration for this project.
Fpocket is written in C, and it is based on a different geometric algorithm.
Some practical differences:
* Fpocket
* **Fpocket**
- has a much smaller memory footprint
- runs faster when executed on a single protein
- produces a high number of less relevant pockets (and since the default scoring function isn't very effective, the most relevant pockets often don't get to the top)
- contains MDpocket algorithm for pocket predictions from molecular trajectories
- still better documented
* P2Rank
- achieves significantly better identification success rates when considering top-ranked pockets
* **P2Rank**
- achieves significantly higher identification success rates when considering top-ranked pockets
- produces a smaller number of more relevant pockets
- speed:
+ slower when running on a single protein (due to JVM startup cost)


@@ -136,7 +136,7 @@ import cz.siret.prank.program.params.Params
point_sampler = "SurfacePointSampler"
/**
* multiplier for random point sampling
* multiplier for random point sub/super-sampling
*/
sampling_multiplier = 3
@@ -146,12 +146,13 @@ import cz.siret.prank.program.params.Params
solvent_radius = 1.6
/**
* Connolly point tessellation (~density) used in pradiction step
* SAS tessellation (~density) used in prediction step.
* Higher tessellation = higher density (+1 ~~ x4 points)
*/
tessellation = 2
/**
* Connolly point tessellation (~density) used in training step
* SAS tessellation (~density) used in training step
*/
train_tessellation = 2
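The `tessellation` comment above gives a rule of thumb: each +1 yields roughly four times as many SAS points. Below is a minimal Python sketch of that arithmetic. It assumes the x4 factor holds exactly, although the comment marks it as approximate, and it uses the value `2` shown in the config as the baseline. `relative_point_count` is a hypothetical helper for illustration, not part of P2Rank.

```python
def relative_point_count(tessellation: int, baseline: int = 2) -> float:
    """Approximate SAS point count relative to `baseline`.

    Assumes the rule of thumb from the config comment: +1 tessellation ~ x4 points.
    This is an estimate, not an exact point count.
    """
    return 4.0 ** (tessellation - baseline)


for t in (1, 2, 3, 4):
    # e.g. t=3 gives 4.0: roughly four times as many points as at tessellation 2
    print(f"tessellation={t}: ~{relative_point_count(t):g}x points")
```

Under this reading, raising `tessellation` from 2 to 3 means roughly four times as many points are generated and scored. Lowering it to 1 leaves about a quarter of them.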
@@ -224,7 +225,7 @@ import cz.siret.prank.program.params.Params
vis_copy_proteins = true
/**
* use sctrictly inner pocket points or more wider pocket neighbourhood
* use strictly inner pocket points or a wider pocket neighbourhood
*/
strict_inner_points = false
@@ -244,7 +245,7 @@ import cz.siret.prank.program.params.Params
predictions = false
/**
* minimum ligandability score for Connolly point to be considered ligandable
* minimum ligandability score for SAS point to be considered ligandable
*/
pred_point_threshold = 0.4
@@ -269,12 +270,12 @@ import cz.siret.prank.program.params.Params
out_prefix_date = false
/**
*
* Place all output files in this sub-directory of the output directory
*/
out_subdir = null
/**
* balance Connolly point score weight by density
* Balance SAS point score weight by density (points in denser areas will have lower weight)
*/
balance_density = false
@@ -305,12 +306,12 @@ import cz.siret.prank.program.params.Params
plb_rescorer_atomic = false
/**
* stop processing the datsaset on the first unrecoverable error with a dataset item
* stop processing the dataset on the first unrecoverable error with a dataset item
*/
fail_fast = false
/**
* don't procuce prediction files for individual proteins (useful for long repetitive experments)
* don't produce prediction files for individual proteins (useful for long repetitive experiments)
*/
output_only_stats = false
}


@@ -149,7 +149,8 @@ import cz.siret.prank.program.params.Params
solvent_radius = 1.6
/**
* SAS Points tessellation (~= density) used in prediction step
* SAS tessellation (~density) used in prediction step.
* Higher tessellation = higher density (+1 ~~ x4 points)
*/
tessellation = 2
@@ -277,12 +278,12 @@ import cz.siret.prank.program.params.Params
out_prefix_date = false
/**
*
* Place all output files in this sub-directory of the output directory
*/
out_subdir = null
/**
* balance Connolly points score weight by density
* Balance SAS point score weight by density (points in denser areas will have lower weight)
*/
balance_density = false
@@ -335,7 +336,7 @@ import cz.siret.prank.program.params.Params
plb_rescorer_atomic = false
/**
* stop processing the datsaset on the first unrecoverable error with a dataset item
* stop processing the dataset on the first unrecoverable error with a dataset item
*/
fail_fast = false


@@ -24,7 +24,7 @@ import cz.siret.prank.program.params.Params
/**
* stop processing a datsaset on the first unrecoverable error with a dataset item
* stop processing a dataset on the first unrecoverable error with a dataset item
*/
fail_fast = true

distro/config/readme.md

@@ -0,0 +1,25 @@
Config Directory
================
This is a directory with P2Rank config files.
Initially, P2Rank loads configuration from `default.groovy` (and from `default-rescore.groovy` in case you run `prank rescore ...`).
Parameters can then be overridden in a custom config file (`-c <config.file>`) or directly on the command line.
## Details
Parameters can be set in 2 ways:
1. on the command line `-<param_name> <value>`
2. in a Groovy config file specified with `-c <config.file>` (see `working.groovy` for an example: `prank -c working.groovy`).
Parameters on the command line override those in the config file, which override defaults.
Parameter application priority (last wins):
1. default values in the source code (`Params.groovy`)
2. defaults in `default.groovy`
3. (optionally) defaults in `default-rescore.groovy` only if you run `prank rescore ...`
4. parameters in custom config file `-c <config.file>`
5. parameters on the command line
For a comprehensive list of all possible params, see `Params.groovy` in the source code:
https://github.com/rdk/p2rank/blob/master/src/main/groovy/cz/siret/prank/program/params/Params.groovy
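The "last wins" priority list above amounts to applying the layers in order and letting each later layer overwrite the earlier values. Below is a minimal Python sketch under that reading. `resolve_params` is hypothetical, and the layers are plain dicts rather than parsed Groovy files. The example values are illustrative, except `seed=151` and `threads=8`, which mirror the README's `-seed 151 -threads 8` command line.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_params(
    source_defaults: Mapping[str, Any],   # 1. defaults in the source code
    default_groovy: Mapping[str, Any],    # 2. default.groovy
    custom_config: Mapping[str, Any],     # 4. -c <config.file>
    command_line: Mapping[str, Any],      # 5. -<param_name> <value>
    rescore_defaults: Optional[Mapping[str, Any]] = None,  # 3. only for `prank rescore`
) -> Dict[str, Any]:
    """Apply parameter layers in increasing priority; later layers win on conflicts."""
    layers = [source_defaults, default_groovy]
    if rescore_defaults is not None:
        layers.append(rescore_defaults)
    layers += [custom_config, command_line]

    resolved: Dict[str, Any] = {}
    for layer in layers:
        resolved.update(layer)  # overwrite any value set by an earlier layer
    return resolved


# Hypothetical values, apart from seed/threads taken from the README example.
params = resolve_params(
    source_defaults={"seed": 42, "threads": 1, "tessellation": 2},
    default_groovy={"threads": 4},
    custom_config={"tessellation": 3},
    command_line={"seed": 151, "threads": 8},
)
print(params)
```

The optional `rescore_defaults` argument models the fact that `default-rescore.groovy` only participates when running `prank rescore ...`.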


@@ -2,8 +2,10 @@
Directory with pre-trained classifiers.
P2Rank looks here for the model specified by the `-model`/`-m` parameter.
Model should be used in combination with the parameters or config file that was used to train it.
The feature extraction has to be executed with the same parameters.
A model should always be used only in combination with the parameters or config file that were used to train it.
I.e., the feature extraction has to be executed with the same parameters.
## List of models
conservation.model ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config conservation.groovy
p2rank_a.model ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config default.groovy


@@ -0,0 +1,3 @@
This directory contains example input data.
See `doc/dataset-file-format.md` and comments in example `*.ds` files.


@@ -1 +0,0 @@
This directory contains example input data.


@@ -16,7 +16,7 @@ This file should provide introduction for people who want to train and evaluate
## Parameters
P2Rank uses global static parameters object. In code it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`.
P2Rank uses a global static parameters object. In the code, it can be accessed with `Params.getInst()` or through the `Parametrized` trait. For the full list of parameters, see `Params.groovy`.
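The paragraph above describes one process-wide parameters object, reachable either directly or through a trait. Below is a rough Python analogue. It assumes a lazily created singleton and uses a mixin in place of the Groovy trait. `get_inst`, `PointScorer` and the two fields are illustrative stand-ins, not P2Rank's actual Groovy code.

```python
from typing import Optional


class Params:
    """Process-wide parameter object (Python analogue of a global static singleton)."""

    _instance: Optional["Params"] = None

    def __init__(self) -> None:
        # Hypothetical fields; the real parameter list lives in Params.groovy.
        self.threads = 1
        self.tessellation = 2

    @classmethod
    def get_inst(cls) -> "Params":
        # Lazily create the single shared instance on first access.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class Parametrized:
    """Mixin standing in for the trait: gives any class a `params` accessor."""

    @property
    def params(self) -> Params:
        return Params.get_inst()


class PointScorer(Parametrized):  # hypothetical consumer class
    def describe(self) -> str:
        return f"tessellation={self.params.tessellation}, threads={self.params.threads}"


# Set once (e.g. after config and command line are resolved); visible everywhere.
Params.get_inst().threads = 8
print(PointScorer().describe())
```

Because every component reads the same instance, parameters resolved once at startup do not have to be passed around. The trade-off is shared mutable global state.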
Parameters can be set in 2 ways:
1. on the command line `-<param_name> <value>`