Mirror of https://github.com/rdk/p2rank.git (synced 2026-06-04 12:44:24 +08:00)
improve readme files in distro
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors
+Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
README.md (32 changed lines)
@@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj

 See more usage examples below...

-### Compilation
+### Build

-This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`.
+This project uses [Gradle](https://gradle.org/) build system via included Gradle wrapper.
+
+Build with `./make.sh` or `./gradlew assemble`.

 ### Algorithm

@@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers:

 * [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool
 Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug.
-* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz)
+* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz)
 Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345–W349
 * [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) inroducing P2Rank prediction algorithm
 Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* InInternational Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
@@ -68,13 +70,13 @@ Following commands can be executed in the installation directory.

 ### Print help

-~~~sh
+~~~bash
 prank help
 ~~~

 ### Predict ligand binding sites (P2Rank algorithm)

-~~~sh
+~~~bash
 prank predict test.ds # run on whole dataset (containing list of pdb files)

 prank predict -f test_data/1fbl.pdb # run on single pdb file
@@ -89,16 +91,16 @@ prank predict -c predict2.groovy test.ds # specify configuration file (
 ### Evaluate prediction model
 ...on a file or a dataset with known ligands.

-~~~sh
+~~~bash
 prank eval-predict -f test_data/1fbl.pdb
 prank eval-predict test.ds
 ~~~

 ### Prediction output

-For each file in the dataset P2Rank produces produces several output files:
+For each file in the dataset P2Rank produces several output files:
 * `<pdb_file_name>_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates
-of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms
+of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms
 * `<pdb_file_name>_residues.csv`: contains list of all residues from the input protein with their scores,
 mapping to predicted pockets and calibrated probability of being a ligand-binding residue
 * PyMol visualization (`.pml` script with data files)
@@ -111,7 +113,7 @@ prank eval-predict test.ds

 You can override the default params with a custom config file:

-~~~sh
+~~~bash
 prank predict -c config/example.groovy test.ds
 prank predict -c example.groovy test.ds
 ~~~
@@ -119,7 +121,7 @@ prank predict -c example.groovy test.ds

 It is also possible to override the default params on the command line using their full name. To see complete list of params look into `config/default.groovy`.

-~~~sh
+~~~bash
 prank predict -seed 151 -threads 8 test.ds
 prank predict -c example.groovy -seed 151 -threads 8 test.ds
 ~~~
@@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites,
 P2Rank is also able to rescore pockets predicted by other methods
 (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).

-~~~sh
+~~~bash
 prank rescore test_data/fpocket.ds
 prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir'
 prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir'
@@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds

 ## Comparison with Fpocket

-[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program.
+[Fpocket](https://github.com/Discngine/fpocket) is widely used open source ligand binding site prediction program.
 It is fast, easy to use and well documented. As such, it was a great inspiration for this project.
 Fpocket is written in C, and it is based on a different geometric algorithm.

 Some practical differences:

-* Fpocket
+* **Fpocket**
     - has much smaller memory footprint
     - runs faster when executed on a single protein
     - produces a high number of less relevant pockets (and since the default scoring function isn't very effective the most relevant pockets often doesn't get to the top)
     - contains MDpocket algorithm for pocket predictions from molecular trajectories
     - still better documented
-* P2Rank
-    - achieves significantly better identification success rates when considering top-ranked pockets
+* **P2Rank**
+    - achieves significantly higher identification success rates when considering top-ranked pockets
     - produces smaller number of more relevant pockets
     - speed:
         + slower when running on a single protein (due to JVM startup cost)
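The README hunk above describes `<pdb_file_name>_predictions.csv` as an ordered list of predicted pockets with their scores and center coordinates. The following is a minimal sketch of loading such a file in Python. The column names (`name`, `score`, `center_x`, ...) and the sample rows are assumptions made for illustration and are not taken from the README, so check the header of a real output file before relying on them.

```python
import csv
import io


def load_pockets(text):
    """Parse predictions CSV text into a list of dicts.

    Padding whitespace around header names and values is stripped.
    """
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    pockets = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        pockets.append(dict(zip(header, (v.strip() for v in row))))
    return pockets


# Hypothetical sample; a real file may use different columns.
SAMPLE = """name, score, center_x, center_y, center_z
pocket1, 12.5, 10.0, -3.2, 7.1
pocket2,  4.1,  1.5,  8.0, 2.2
"""

pockets = load_pockets(SAMPLE)
best = pockets[0]  # the list is described as ordered, so the first row ranks highest
print(best["name"], float(best["score"]))
```

To read an actual output file, pass its contents to the same function, for example `load_pockets(open(path).read())`.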
@@ -136,7 +136,7 @@ import cz.siret.prank.program.params.Params
 point_sampler = "SurfacePointSampler"

 /**
- * multiplier for random point sampling
+ * multiplier for random point sub/super-sampling
  */
 sampling_multiplier = 3

@@ -146,12 +146,13 @@ import cz.siret.prank.program.params.Params
 solvent_radius = 1.6

 /**
- * Connolly point tessellation (~density) used in pradiction step
+ * SAS tessellation (~density) used in prediction step.
+ * Higher tessellation = higher density (+1 ~~ x4 points)
  */
 tessellation = 2

 /**
- * Connolly point tessellation (~density) used in training step
+ * SAS tessellation (~density) used in training step
  */
 train_tessellation = 2

@@ -224,7 +225,7 @@ import cz.siret.prank.program.params.Params
 vis_copy_proteins = true

 /**
- * use sctrictly inner pocket points or more wider pocket neighbourhood
+ * use strictly inner pocket points or more wider pocket neighbourhood
  */
 strict_inner_points = false

@@ -244,7 +245,7 @@ import cz.siret.prank.program.params.Params
 predictions = false

 /**
- * minimum ligandability score for Connolly point to be considered ligandable
+ * minimum ligandability score for SAS point to be considered ligandable
  */
 pred_point_threshold = 0.4

@@ -269,12 +270,12 @@ import cz.siret.prank.program.params.Params
 out_prefix_date = false

 /**
- *
+ * Place all output files in this sub-directory of the output directory
  */
 out_subdir = null

 /**
- * balance Connolly point score weight by density
+ * Balance SAS point score weight by density (points in denser areas will have lower weight)
  */
 balance_density = false

@@ -305,12 +306,12 @@ import cz.siret.prank.program.params.Params
 plb_rescorer_atomic = false

 /**
- * stop processing the datsaset on the first unrecoverable error with a dataset item
+ * stop processing the dataset on the first unrecoverable error with a dataset item
  */
 fail_fast = false

 /**
- * don't procuce prediction files for individual proteins (useful for long repetitive experments)
+ * don't produce prediction files for individual proteins (useful for long repetitive experiments)
  */
 output_only_stats = false
 }
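The `tessellation` comment in the hunk above says that each +1 step gives roughly four times as many SAS points. Assuming that factor held exactly (the comment only states it approximately), relative point counts could be estimated as sketched below. `relative_point_count` is a hypothetical helper written for this note and is not part of P2Rank.

```python
def relative_point_count(tessellation, baseline=2):
    """Approximate SAS point count relative to a baseline tessellation level.

    Assumes each +1 step multiplies the number of points by 4.
    """
    return 4.0 ** (tessellation - baseline)


for level in (1, 2, 3, 4):
    print(level, relative_point_count(level))
```

Under this assumption, raising `tessellation` from the default 2 to 3 would mean about four times as many points to process, which is worth keeping in mind when trading density against run time.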
@@ -149,7 +149,8 @@ import cz.siret.prank.program.params.Params
 solvent_radius = 1.6

 /**
- * SAS Points tessellation (~= density) used in prediction step
+ * SAS tessellation (~density) used in prediction step.
+ * Higher tessellation = higher density (+1 ~~ x4 points)
  */
 tessellation = 2

@@ -277,12 +278,12 @@ import cz.siret.prank.program.params.Params
 out_prefix_date = false

 /**
- *
+ * Place all output files in this sub-directory of the output directory
  */
 out_subdir = null

 /**
- * balance Connolly points score weight by density
+ * Balance SAS point score weight by density (points in denser areas will have lower weight)
  */
 balance_density = false

@@ -335,7 +336,7 @@ import cz.siret.prank.program.params.Params
 plb_rescorer_atomic = false

 /**
- * stop processing the datsaset on the first unrecoverable error with a dataset item
+ * stop processing the dataset on the first unrecoverable error with a dataset item
  */
 fail_fast = false

@@ -24,7 +24,7 @@ import cz.siret.prank.program.params.Params


 /**
- * stop processing a datsaset on the first unrecoverable error with a dataset item
+ * stop processing a dataset on the first unrecoverable error with a dataset item
  */
 fail_fast = true

distro/config/readme.md (new file, 25 lines added)
@@ -0,0 +1,25 @@
+Config Directory
+================
+
+This is a directory with P2Rank config files.
+
+Initially, P2Rank loads configuration from `default.groovy` (and from `default-rescore.groovy` in case you run `prank rescore ...`).
+Parameters can be then overriden in a custom config file (`-c <config.file>`) or directly on the command line.
+
+## Details
+
+Parameters can be set in 2 ways:
+1. on the command line `-<param_name> <value>`
+2. in config groovy file specified with `-c <config.file>` (see working.groovy for an example... `prank -c working.groovy`).
+
+Parameters on the command line override those in the config file, which override defaults.
+
+Parameter application priority (last wins):
+1. default values in the source code (`Params.groovy`)
+2. defaults in `default.groovy`
+3. (optionally) defaults in `default-rescore.groovy` only if you run `prank rescore ...`
+4. parameters in custom config file `-c <config.file>`
+5. parameters on the command line
+
+To see comprehensive list of all possible params see Params.groovy in the source code:
+https://github.com/rdk/p2rank/blob/master/src/main/groovy/cz/siret/prank/program/params/Params.groovy
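The "last wins" priority list in the new readme above can be modelled as a sequence of dictionary updates. This is only a sketch of the ordering, written in Python rather than Groovy, and it is not how P2Rank implements parameter loading. The parameter values are illustrative, and `resolve_params` is a hypothetical helper.

```python
def resolve_params(*layers):
    """Merge parameter layers in order; later layers override earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


# Illustrative values only; the real defaults live in Params.groovy and default.groovy.
source_defaults = {"threads": 1, "seed": 42, "fail_fast": False}  # 1. Params.groovy
default_groovy = {"fail_fast": False, "tessellation": 2}          # 2. default.groovy
default_rescore = {}                                  # 3. only for `prank rescore ...`
custom_config = {"threads": 4, "fail_fast": True}     # 4. -c <config.file>
command_line = {"threads": 8, "seed": 151}            # 5. e.g. -threads 8 -seed 151

params = resolve_params(source_defaults, default_groovy, default_rescore,
                        custom_config, command_line)
print(params)
```

In this sketch `threads` ends up as 8 because the command line is applied last, while `fail_fast` keeps the value from the custom config file since no later layer sets it.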
@@ -2,8 +2,10 @@
 Directory with pre-trained classiers.
 Prank looks here for model specified by (-model/-m) parameter.

-Model should be used in combination with the parameters or config file that was used to train it.
-The feature extraction has to be executed with the same parameters.
+Model should be always used only in combination with the parameters or config file that was used to train it.
+I.e., the feature extraction has to be executed with the same parameters.

+## List of models
+
 conservation.model ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config conservation.groovy
 p2rank_a.model ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config default.groovy
distro/test_data/readme.md (new file, 3 lines added)
@@ -0,0 +1,3 @@
+This directory contains example input data.
+
+See `doc/dataset-file-format.md` and comments in example `*.ds` files.
@@ -1 +0,0 @@
-This directory contains example input data.
@@ -16,7 +16,7 @@ This file should provide introduction for people who want to train and evaluate

 ## Parameters

-P2Rank uses global static parameters object. In code it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`.
+P2Rank uses global static parameters object. In the code, it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`.

 Parameters can be set in 2 ways:
 1. on the command line `-<param_name> <value>`