diff --git a/LICENSE.txt b/LICENSE.txt index 049c43ee..e4070d41 100644 --- a/LICENSE.txt +++ b/LICENSE.txt @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors +Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md index 5e1034cc..029bf325 100644 --- a/README.md +++ b/README.md @@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj See more usage examples below... -### Compilation +### Build -This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`. +This project uses [Gradle](https://gradle.org/) build system via included Gradle wrapper. + +Build with `./make.sh` or `./gradlew assemble`. ### Algorithm @@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers: * [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug. -* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz) +* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz) Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345–W349 * [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) inroducing P2Rank prediction algorithm Krivak R, Hoksza D. 
*P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* InInternational Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer @@ -68,13 +70,13 @@ Following commands can be executed in the installation directory. ### Print help -~~~sh +~~~bash prank help ~~~ ### Predict ligand binding sites (P2Rank algorithm) -~~~sh +~~~bash prank predict test.ds # run on whole dataset (containing list of pdb files) prank predict -f test_data/1fbl.pdb # run on single pdb file @@ -89,16 +91,16 @@ prank predict -c predict2.groovy test.ds # specify configuration file ( ### Evaluate prediction model ...on a file or a dataset with known ligands. -~~~sh +~~~bash prank eval-predict -f test_data/1fbl.pdb prank eval-predict test.ds ~~~ ### Prediction output - For each file in the dataset P2Rank produces produces several output files: + For each file in the dataset P2Rank produces several output files: * `_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates - of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms + of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms * `_residues.csv`: contains list of all residues from the input protein with their scores, mapping to predicted pockets and calibrated probability of being a ligand-binding residue * PyMol visualization (`.pml` script with data files) @@ -111,7 +113,7 @@ prank eval-predict test.ds You can override the default params with a custom config file: -~~~sh +~~~bash prank predict -c config/example.groovy test.ds prank predict -c example.groovy test.ds ~~~ @@ -119,7 +121,7 @@ prank predict -c example.groovy test.ds It is also possible to override the default params on the command line using their full name. To see complete list of params look into `config/default.groovy`. 
-~~~sh +~~~bash prank predict -seed 151 -threads 8 test.ds prank predict -c example.groovy -seed 151 -threads 8 test.ds ~~~ @@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites, P2Rank is also able to rescore pockets predicted by other methods (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment). -~~~sh +~~~bash prank rescore test_data/fpocket.ds prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir' prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir' @@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds ## Comparison with Fpocket -[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program. +[Fpocket](https://github.com/Discngine/fpocket) is widely used open source ligand binding site prediction program. It is fast, easy to use and well documented. As such, it was a great inspiration for this project. Fpocket is written in C, and it is based on a different geometric algorithm. 
Some practical differences: -* Fpocket +* **Fpocket** - has much smaller memory footprint - runs faster when executed on a single protein - produces a high number of less relevant pockets (and since the default scoring function isn't very effective the most relevant pockets often doesn't get to the top) - contains MDpocket algorithm for pocket predictions from molecular trajectories - still better documented -* P2Rank - - achieves significantly better identification success rates when considering top-ranked pockets +* **P2Rank** + - achieves significantly higher identification success rates when considering top-ranked pockets - produces smaller number of more relevant pockets - speed: + slower when running on a single protein (due to JVM startup cost) diff --git a/distro/LICENSE.txt b/distro/LICENSE.txt index 049c43ee..e4070d41 100644 --- a/distro/LICENSE.txt +++ b/distro/LICENSE.txt @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors +Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/distro/README.md b/distro/README.md index 5e1034cc..029bf325 100644 --- a/distro/README.md +++ b/distro/README.md @@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj See more usage examples below... -### Compilation +### Build -This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`. +This project uses [Gradle](https://gradle.org/) build system via included Gradle wrapper. + +Build with `./make.sh` or `./gradlew assemble`. 
### Algorithm @@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers: * [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug. -* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz) +* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz) Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345–W349 * [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) inroducing P2Rank prediction algorithm Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* InInternational Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer @@ -68,13 +70,13 @@ Following commands can be executed in the installation directory. ### Print help -~~~sh +~~~bash prank help ~~~ ### Predict ligand binding sites (P2Rank algorithm) -~~~sh +~~~bash prank predict test.ds # run on whole dataset (containing list of pdb files) prank predict -f test_data/1fbl.pdb # run on single pdb file @@ -89,16 +91,16 @@ prank predict -c predict2.groovy test.ds # specify configuration file ( ### Evaluate prediction model ...on a file or a dataset with known ligands. 
-~~~sh +~~~bash prank eval-predict -f test_data/1fbl.pdb prank eval-predict test.ds ~~~ ### Prediction output - For each file in the dataset P2Rank produces produces several output files: + For each file in the dataset P2Rank produces several output files: * `_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates - of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms + of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms * `_residues.csv`: contains list of all residues from the input protein with their scores, mapping to predicted pockets and calibrated probability of being a ligand-binding residue * PyMol visualization (`.pml` script with data files) @@ -111,7 +113,7 @@ prank eval-predict test.ds You can override the default params with a custom config file: -~~~sh +~~~bash prank predict -c config/example.groovy test.ds prank predict -c example.groovy test.ds ~~~ @@ -119,7 +121,7 @@ prank predict -c example.groovy test.ds It is also possible to override the default params on the command line using their full name. To see complete list of params look into `config/default.groovy`. -~~~sh +~~~bash prank predict -seed 151 -threads 8 test.ds prank predict -c example.groovy -seed 151 -threads 8 test.ds ~~~ @@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites, P2Rank is also able to rescore pockets predicted by other methods (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment). -~~~sh +~~~bash prank rescore test_data/fpocket.ds prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir' prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir' @@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds ## Comparison with Fpocket -[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program. 
+[Fpocket](https://github.com/Discngine/fpocket) is widely used open source ligand binding site prediction program. It is fast, easy to use and well documented. As such, it was a great inspiration for this project. Fpocket is written in C, and it is based on a different geometric algorithm. Some practical differences: -* Fpocket +* **Fpocket** - has much smaller memory footprint - runs faster when executed on a single protein - produces a high number of less relevant pockets (and since the default scoring function isn't very effective the most relevant pockets often doesn't get to the top) - contains MDpocket algorithm for pocket predictions from molecular trajectories - still better documented -* P2Rank - - achieves significantly better identification success rates when considering top-ranked pockets +* **P2Rank** + - achieves significantly higher identification success rates when considering top-ranked pockets - produces smaller number of more relevant pockets - speed: + slower when running on a single protein (due to JVM startup cost) diff --git a/distro/config/default-rescore.groovy b/distro/config/default-rescore.groovy index 6c033da3..1e03a901 100644 --- a/distro/config/default-rescore.groovy +++ b/distro/config/default-rescore.groovy @@ -136,7 +136,7 @@ import cz.siret.prank.program.params.Params point_sampler = "SurfacePointSampler" /** - * multiplier for random posampling + * multiplier for random point sub/super-sampling */ sampling_multiplier = 3 @@ -146,12 +146,13 @@ import cz.siret.prank.program.params.Params solvent_radius = 1.6 /** - * Connolly potessellation (~density) used in pradiction step + * SAS tessellation (~density) used in prediction step. 
+ * Higher tessellation = higher density (+1 ~~ x4 points) */ tessellation = 2 /** - * Connolly potessellation (~density) used in training step + * SAS tessellation (~density) used in training step */ train_tessellation = 2 @@ -224,7 +225,7 @@ import cz.siret.prank.program.params.Params vis_copy_proteins = true /** - * use sctrictly inner pocket points or more wider pocket neighbourhood + * use strictly inner pocket points or a wider pocket neighbourhood */ strict_inner_points = false @@ -244,7 +245,7 @@ import cz.siret.prank.program.params.Params predictions = false /** - * minimum ligandability score for Connolly poto be considered ligandable + * minimum ligandability score for a SAS point to be considered ligandable */ pred_point_threshold = 0.4 @@ -269,12 +270,12 @@ import cz.siret.prank.program.params.Params out_prefix_date = false /** - * + * Place all output files in this sub-directory of the output directory */ out_subdir = null /** - * balance Connolly poscore weight by density + * Balance SAS point score weight by density (points in denser areas will have lower weight) */ balance_density = false @@ -305,12 +306,12 @@ import cz.siret.prank.program.params.Params plb_rescorer_atomic = false /** - * stop processing the datsaset on the first unrecoverable error with a dataset item + * stop processing the dataset on the first unrecoverable error with a dataset item */ fail_fast = false /** - * don't procuce prediction files for individual proteins (useful for long repetitive experments) + * don't produce prediction files for individual proteins (useful for long repetitive experiments) */ output_only_stats = false } diff --git a/distro/config/default.groovy b/distro/config/default.groovy index 5360de6e..3f5d492e 100644 --- a/distro/config/default.groovy +++ b/distro/config/default.groovy @@ -149,7 +149,8 @@ import cz.siret.prank.program.params.Params solvent_radius = 1.6 /** - * SAS Points tessellation (~= density) used in prediction step + * SAS tessellation 
(~density) used in prediction step. + * Higher tessellation = higher density (+1 ~~ x4 points) */ tessellation = 2 @@ -277,12 +278,12 @@ import cz.siret.prank.program.params.Params out_prefix_date = false /** - * + * Place all output files in this sub-directory of the output directory */ out_subdir = null /** - * balance Connolly points score weight by density + * Balance SAS point score weight by density (points in denser areas will have lower weight) */ balance_density = false @@ -335,7 +336,7 @@ import cz.siret.prank.program.params.Params plb_rescorer_atomic = false /** - * stop processing the datsaset on the first unrecoverable error with a dataset item + * stop processing the dataset on the first unrecoverable error with a dataset item */ fail_fast = false diff --git a/distro/config/dev.groovy b/distro/config/dev.groovy index 088e8e94..f3791cfa 100644 --- a/distro/config/dev.groovy +++ b/distro/config/dev.groovy @@ -24,7 +24,7 @@ import cz.siret.prank.program.params.Params /** - * stop processing a datsaset on the first unrecoverable error with a dataset item + * stop processing a dataset on the first unrecoverable error with a dataset item */ fail_fast = true diff --git a/distro/config/readme.md b/distro/config/readme.md new file mode 100644 index 00000000..5fe32c0d --- /dev/null +++ b/distro/config/readme.md @@ -0,0 +1,25 @@ +Config Directory +================ + +This is a directory with P2Rank config files. + +Initially, P2Rank loads configuration from `default.groovy` (and from `default-rescore.groovy` in case you run `prank rescore ...`). +Parameters can then be overridden in a custom config file (`-c `) or directly on the command line. + +## Details + +Parameters can be set in 2 ways: +1. on the command line `- ` +2. in config groovy file specified with `-c ` (see working.groovy for an example... `prank -c working.groovy`). + +Parameters on the command line override those in the config file, which override defaults. 
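The override order described in the config readme can be modelled with a small shell sketch. This is a toy illustration under stated assumptions, not P2Rank code: the `resolve` helper and the starting values `threads=4` and `seed=42` are hypothetical, while `fail_fast`, `seed 151` and `threads 8` echo settings that appear elsewhere in this change.

```bash
# Toy model of "last wins" parameter layering. Not P2Rank code:
# resolve() and the default values below are hypothetical.

# Read key=value lines on stdin; the last value seen for a key wins.
# Keys are printed in the order they first appeared.
resolve() {
    awk -F= '
        !($1 in value) { order[++n] = $1 }
        { value[$1] = $2 }
        END { for (i = 1; i <= n; i++) print order[i] "=" value[order[i]] }
    '
}

# Layers in application order: defaults, custom config (-c), command line.
printf '%s\n' \
    threads=4 seed=42 fail_fast=false \
    fail_fast=true \
    seed=151 threads=8 |
    resolve
```

Run as written, the sketch prints `threads=8`, `seed=151` and `fail_fast=true`: the command-line layer overrides the config-file layer, which overrides the defaults.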
+ +Parameter application priority (last wins): +1. default values in the source code (`Params.groovy`) +2. defaults in `default.groovy` +3. (optionally) defaults in `default-rescore.groovy` only if you run `prank rescore ...` +4. parameters in custom config file `-c ` +5. parameters on the command line + +For a comprehensive list of all possible params, see `Params.groovy` in the source code: +https://github.com/rdk/p2rank/blob/master/src/main/groovy/cz/siret/prank/program/params/Params.groovy diff --git a/distro/config/readme.txt b/distro/config/readme.txt deleted file mode 100644 index e69de29b..00000000 diff --git a/distro/models/readme.txt b/distro/models/readme.md similarity index 69% rename from distro/models/readme.txt rename to distro/models/readme.md index 575134de..e0c7e3a9 100644 --- a/distro/models/readme.txt +++ b/distro/models/readme.md @@ -2,8 +2,10 @@ Directory with pre-trained classiers. Prank looks here for model specified by (-model/-m) parameter. -Model should be used in combination with the parameters or config file that was used to train it. -The feature extraction has to be executed with the same parameters. +A model should always be used only in combination with the parameters or config file that was used to train it. +I.e., the feature extraction has to be executed with the same parameters. + +## List of models conservation.model ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config conservation.groovy p2rank_a.model ... 
for P2Rank (predictions), trained on bench-fpocket.ds dataset with config default.groovy diff --git a/distro/models/score/residue/readme.txt b/distro/models/score/residue/readme.md similarity index 100% rename from distro/models/score/residue/readme.txt rename to distro/models/score/residue/readme.md diff --git a/distro/test_data/readme.md b/distro/test_data/readme.md new file mode 100644 index 00000000..a703af72 --- /dev/null +++ b/distro/test_data/readme.md @@ -0,0 +1,3 @@ +This directory contains example input data. + +See `doc/dataset-file-format.md` and comments in example `*.ds` files. diff --git a/distro/test_data/readme.txt b/distro/test_data/readme.txt deleted file mode 100644 index 64dfb3a8..00000000 --- a/distro/test_data/readme.txt +++ /dev/null @@ -1 +0,0 @@ -This directory contains example input data. diff --git a/misc/tutorials/training-tutorial.md b/misc/tutorials/training-tutorial.md index 262205be..c3aa18c7 100644 --- a/misc/tutorials/training-tutorial.md +++ b/misc/tutorials/training-tutorial.md @@ -16,7 +16,7 @@ This file should provide introduction for people who want to train and evaluate ## Parameters -P2Rank uses global static parameters object. In code it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`. +P2Rank uses a global static parameters object. In the code, it can be accessed with `Params.getInst()` or through the `Parametrized` trait. For the full list of parameters, see `Params.groovy`. Parameters can be set in 2 ways: 1. on the command line `- `