improve readme files in distro

2026-06-04 12:44:24 +08:00 · 2020-11-05 17:52:16 +01:00
parent b9a2090599
commit 7ce58b285f
14 changed files with 85 additions and 50 deletions
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors
+Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/README.md
+++ b/README.md
@@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj

 See more usage examples below...

-### Compilation
+### Build

-This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`.
+This project uses [Gradle](https://gradle.org/) build system via included Gradle wrapper. 
+
+Build with `./make.sh` or `./gradlew assemble`.

 ### Algorithm

@@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers:

 * [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool  
 Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug.
-* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz)  
+* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz)  
 Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345–W349 
 * [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) introducing P2Rank prediction algorithm  
 Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* In International Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
@@ -68,13 +70,13 @@ Following commands can be executed in the installation directory.

 ### Print help

-~~~sh
+~~~bash
 prank help
 ~~~

 ### Predict ligand binding sites (P2Rank algorithm)

-~~~sh
+~~~bash
 prank predict test.ds                             # run on whole dataset (containing list of pdb files)

 prank predict -f test_data/1fbl.pdb               # run on single pdb file
@@ -89,16 +91,16 @@ prank predict -c predict2.groovy  test.ds         # specify configuration file (
 ### Evaluate prediction model
 ...on a file or a dataset with known ligands.

-~~~sh
+~~~bash
 prank eval-predict -f test_data/1fbl.pdb
 prank eval-predict test.ds
 ~~~

 ### Prediction output 

-   For each file in the dataset P2Rank produces produces several output files:
+   For each file in the dataset P2Rank produces several output files:
   * `<pdb_file_name>_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates 
-   of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms
+   of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms
   * `<pdb_file_name>_residues.csv`: contains list of all residues from the input protein with their scores, 
   mapping to predicted pockets and calibrated probability of being a ligand-binding residue
   * PyMol visualization (`.pml` script with data files) 
@@ -111,7 +113,7 @@ prank eval-predict test.ds

 You can override the default params with a custom config file:

-~~~sh
+~~~bash
 prank predict -c config/example.groovy  test.ds
 prank predict -c example.groovy         test.ds
 ~~~
@@ -119,7 +121,7 @@ prank predict -c example.groovy         test.ds

 It is also possible to override the default params on the command line using their full name. To see complete list of params look into `config/default.groovy`.

-~~~sh
+~~~bash
 prank predict                   -seed 151 -threads 8  test.ds
 prank predict -c example.groovy -seed 151 -threads 8  test.ds
 ~~~
@@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites,
 P2Rank is also able to rescore pockets predicted by other methods 
 (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).

-~~~sh
+~~~bash
 prank rescore test_data/fpocket.ds
 prank rescore fpocket.ds                 # test_data/ is default 'dataset_base_dir'
 prank rescore fpocket.ds -o output_dir   # test_output/ is default 'output_base_dir'
@@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds

 ## Comparison with Fpocket

-[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program.
+[Fpocket](https://github.com/Discngine/fpocket) is widely used open source ligand binding site prediction program.
 It is fast, easy to use and well documented. As such, it was a great inspiration for this project.
 Fpocket is written in C, and it is based on a different geometric algorithm.

 Some practical differences:

-* Fpocket
+* **Fpocket**
    - has much smaller memory footprint 
    - runs faster when executed on a single protein
    - produces a high number of less relevant pockets (and since the default scoring function isn't very effective the most relevant pockets often doesn't get to the top)
    - contains MDpocket algorithm for pocket predictions from molecular trajectories 
    - still better documented
-* P2Rank 
-    - achieves significantly better identification success rates when considering top-ranked pockets
+* **P2Rank** 
+    - achieves significantly higher identification success rates when considering top-ranked pockets
    - produces smaller number of more relevant pockets
    - speed:
        + slower when running on a single protein (due to JVM startup cost)
--- a/distro/LICENSE.txt
+++ b/distro/LICENSE.txt
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2017 Radoslav Krivak, David Hoksza, Lukas Jendele and other contributors
+Copyright (c) 2017-2020 Radoslav Krivak, David Hoksza, Lukas Jendele, Petr Skoda and other contributors

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/distro/README.md
+++ b/distro/README.md
@@ -36,9 +36,11 @@ P2Rank requires no installation. Binary packages can be downloaded from the proj

 See more usage examples below...

-### Compilation
+### Build

-This project uses [Gradle](https://gradle.org/) build system. Build with `./make.sh` or `./gradlew assemble`.
+This project uses [Gradle](https://gradle.org/) build system via included Gradle wrapper. 
+
+Build with `./make.sh` or `./gradlew assemble`.

 ### Algorithm

@@ -52,7 +54,7 @@ If you use P2Rank, please cite relevant papers:

 * [Software article](https://doi.org/10.1186/s13321-018-0285-8) in JChem about P2Rank pocket prediction tool  
 Krivak R, Hoksza D. *P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure.* Journal of Cheminformatics. 2018 Aug.
-* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [p2rank.cz](https://p2rank.cz)  
+* [Web-server article](https://doi.org/10.1093/nar/gkz424) in NAR about the web interface accessible at [prankweb.cz](https://prankweb.cz)  
 Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. *PrankWeb: a web server for ligand binding site prediction and visualization.* Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345–W349 
 * [Conference paper](https://doi.org/10.1007/978-3-319-21233-3_4) introducing P2Rank prediction algorithm  
 Krivak R, Hoksza D. *P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features.* In International Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
@@ -68,13 +70,13 @@ Following commands can be executed in the installation directory.

 ### Print help

-~~~sh
+~~~bash
 prank help
 ~~~

 ### Predict ligand binding sites (P2Rank algorithm)

-~~~sh
+~~~bash
 prank predict test.ds                             # run on whole dataset (containing list of pdb files)

 prank predict -f test_data/1fbl.pdb               # run on single pdb file
@@ -89,16 +91,16 @@ prank predict -c predict2.groovy  test.ds         # specify configuration file (
 ### Evaluate prediction model
 ...on a file or a dataset with known ligands.

-~~~sh
+~~~bash
 prank eval-predict -f test_data/1fbl.pdb
 prank eval-predict test.ds
 ~~~

 ### Prediction output 

-   For each file in the dataset P2Rank produces produces several output files:
+   For each file in the dataset P2Rank produces several output files:
   * `<pdb_file_name>_predictions.csv`: contains an ordered list of predicted pockets, their scores, coordinates 
-   of their centers together with a list of adjacent residues and a list of adjacent protein surface atoms
+   of their centers together with a list of adjacent residues, and a list of adjacent protein surface atoms
   * `<pdb_file_name>_residues.csv`: contains list of all residues from the input protein with their scores, 
   mapping to predicted pockets and calibrated probability of being a ligand-binding residue
   * PyMol visualization (`.pml` script with data files) 
@@ -111,7 +113,7 @@ prank eval-predict test.ds

 You can override the default params with a custom config file:

-~~~sh
+~~~bash
 prank predict -c config/example.groovy  test.ds
 prank predict -c example.groovy         test.ds
 ~~~
@@ -119,7 +121,7 @@ prank predict -c example.groovy         test.ds

 It is also possible to override the default params on the command line using their full name. To see complete list of params look into `config/default.groovy`.

-~~~sh
+~~~bash
 prank predict                   -seed 151 -threads 8  test.ds
 prank predict -c example.groovy -seed 151 -threads 8  test.ds
 ~~~
@@ -130,7 +132,7 @@ In addition to predicting new ligand binding sites,
 P2Rank is also able to rescore pockets predicted by other methods 
 (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).

-~~~sh
+~~~bash
 prank rescore test_data/fpocket.ds
 prank rescore fpocket.ds                 # test_data/ is default 'dataset_base_dir'
 prank rescore fpocket.ds -o output_dir   # test_output/ is default 'output_base_dir'
@@ -144,20 +146,20 @@ prank eval-rescore fpocket.ds

 ## Comparison with Fpocket

-[Fpocket](http://fpocket.sourceforge.net/) is widely used open source ligand binding site prediction program.
+[Fpocket](https://github.com/Discngine/fpocket) is widely used open source ligand binding site prediction program.
 It is fast, easy to use and well documented. As such, it was a great inspiration for this project.
 Fpocket is written in C, and it is based on a different geometric algorithm.

 Some practical differences:

-* Fpocket
+* **Fpocket**
    - has much smaller memory footprint 
    - runs faster when executed on a single protein
    - produces a high number of less relevant pockets (and since the default scoring function isn't very effective the most relevant pockets often doesn't get to the top)
    - contains MDpocket algorithm for pocket predictions from molecular trajectories 
    - still better documented
-* P2Rank 
-    - achieves significantly better identification success rates when considering top-ranked pockets
+* **P2Rank** 
+    - achieves significantly higher identification success rates when considering top-ranked pockets
    - produces smaller number of more relevant pockets
    - speed:
        + slower when running on a single protein (due to JVM startup cost)
--- a/distro/config/default-rescore.groovy
+++ b/distro/config/default-rescore.groovy
@@ -136,7 +136,7 @@ import cz.siret.prank.program.params.Params
    point_sampler = "SurfacePointSampler"

    /**
-     * multiplier for random posampling
+     * multiplier for random point sub/super-sampling
     */
    sampling_multiplier = 3

@@ -146,12 +146,13 @@ import cz.siret.prank.program.params.Params
    solvent_radius = 1.6

    /**
-     * Connolly potessellation (~density) used in pradiction step
+     * SAS tessellation (~density) used in prediction step.
+     * Higher tessellation = higher density (+1 ~~ x4 points)
     */
    tessellation = 2

    /**
-     * Connolly potessellation (~density) used in training step
+     * SAS tessellation (~density) used in training step
     */
    train_tessellation = 2

@@ -224,7 +225,7 @@ import cz.siret.prank.program.params.Params
    vis_copy_proteins = true

    /**
-     * use sctrictly inner pocket points or more wider pocket neighbourhood
+     * use strictly inner pocket points or more wider pocket neighbourhood
     */
    strict_inner_points = false

@@ -244,7 +245,7 @@ import cz.siret.prank.program.params.Params
    predictions = false

    /**
-     * minimum ligandability score for Connolly poto be considered ligandable
+     * minimum ligandability score for SAS point to be considered ligandable
     */
    pred_point_threshold = 0.4

@@ -269,12 +270,12 @@ import cz.siret.prank.program.params.Params
    out_prefix_date = false

    /**
-     *
+     * Place all output files in this sub-directory of the output directory
     */
    out_subdir = null

    /**
-     * balance Connolly poscore weight by density
+     * Balance SAS point score weight by density (points in denser areas will have lower weight)
     */
    balance_density = false

@@ -305,12 +306,12 @@ import cz.siret.prank.program.params.Params
    plb_rescorer_atomic = false

    /**
-     * stop processing the datsaset on the first unrecoverable error with a dataset item
+     * stop processing the dataset on the first unrecoverable error with a dataset item
     */
    fail_fast = false

    /**
-     * don't procuce prediction files for individual proteins (useful for long repetitive experments)
+     * don't produce prediction files for individual proteins (useful for long repetitive experiments)
     */
    output_only_stats = false
 }
--- a/distro/config/default.groovy
+++ b/distro/config/default.groovy
@@ -149,7 +149,8 @@ import cz.siret.prank.program.params.Params
    solvent_radius = 1.6

    /**
-     * SAS Points tessellation (~= density) used in prediction step
+     * SAS tessellation (~density) used in prediction step.
+     * Higher tessellation = higher density (+1 ~~ x4 points)
     */
    tessellation = 2

@@ -277,12 +278,12 @@ import cz.siret.prank.program.params.Params
    out_prefix_date = false

    /**
-     *
+     * Place all output files in this sub-directory of the output directory
     */
    out_subdir = null

    /**
-     * balance Connolly points score weight by density
+     * Balance SAS point score weight by density (points in denser areas will have lower weight)
     */
    balance_density = false

@@ -335,7 +336,7 @@ import cz.siret.prank.program.params.Params
    plb_rescorer_atomic = false

    /**
-     * stop processing the datsaset on the first unrecoverable error with a dataset item
+     * stop processing the dataset on the first unrecoverable error with a dataset item
     */
    fail_fast = false

--- a/distro/config/dev.groovy
+++ b/distro/config/dev.groovy
@@ -24,7 +24,7 @@ import cz.siret.prank.program.params.Params


    /**
-     * stop processing a datsaset on the first unrecoverable error with a dataset item
+     * stop processing a dataset on the first unrecoverable error with a dataset item
     */
    fail_fast = true

--- a/distro/config/readme.md
+++ b/distro/config/readme.md
@@ -0,0 +1,25 @@
+Config Directory
+================
+
+This is a directory with P2Rank config files.
+
+Initially, P2Rank loads configuration from `default.groovy` (and from `default-rescore.groovy` in case you run `prank rescore ...`).
+Parameters can then be overridden in a custom config file (`-c <config.file>`) or directly on the command line.
+
+## Details
+
+Parameters can be set in 2 ways:
+1. on the command line `-<param_name> <value>`
+2. in config groovy file specified with `-c <config.file>` (see working.groovy for an example... `prank -c working.groovy`). 
+
+Parameters on the command line override those in the config file, which override defaults.
+
+Parameter application priority (last wins):
+1. default values in the source code (`Params.groovy`)
+2. defaults in `default.groovy`
+3. (optionally) defaults in `default-rescore.groovy` only if you run `prank rescore ...`
+4. parameters in custom config file `-c <config.file>`
+5. parameters on the command line
+
+For a comprehensive list of all possible params, see `Params.groovy` in the source code:
+https://github.com/rdk/p2rank/blob/master/src/main/groovy/cz/siret/prank/program/params/Params.groovy
--- a/distro/config/readme.txt
+++ b/distro/config/readme.txt
--- a/distro/models/readme.txt
+++ b/distro/models/readme.txt
@@ -2,8 +2,10 @@
 Directory with pre-trained classifiers.
 Prank looks here for model specified by (-model/-m) parameter.

-Model should be used in combination with the parameters or config file that was used to train it.
-The feature extraction has to be executed with the same parameters.
+A model should always be used only in combination with the parameters or config file that was used to train it.
+I.e., the feature extraction has to be executed with the same parameters.
+
+## List of models

 conservation.model  ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config conservation.groovy
 p2rank_a.model      ... for P2Rank (predictions), trained on bench-fpocket.ds dataset with config default.groovy
--- a/distro/models/score/residue/readme.txt
+++ b/distro/models/score/residue/readme.txt
--- a/distro/test_data/readme.md
+++ b/distro/test_data/readme.md
@@ -0,0 +1,3 @@
+This directory contains example input data.
+
+See `doc/dataset-file-format.md` and comments in example `*.ds` files.
--- a/distro/test_data/readme.txt
+++ b/distro/test_data/readme.txt
@@ -1 +0,0 @@
-This directory contains example input data.
--- a/misc/tutorials/training-tutorial.md
+++ b/misc/tutorials/training-tutorial.md
@@ -16,7 +16,7 @@ This file should provide introduction for people who want to train and evaluate

 ## Parameters

-P2Rank uses global static parameters object. In code it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`.
+P2Rank uses global static parameters object. In the code, it can be accessed with `Params.getInst()` or through `Parametrized` trait. For full list of parameters see `Params.groovy`.

 Parameters can be set in 2 ways:
 1. on the command line `-<param_name> <value>`
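
The "last wins" parameter priority described in the new `distro/config/readme.md` can be illustrated with a small sketch. This is not P2Rank code: the function `resolve_params` and all values in `source_defaults` and `custom_config` are hypothetical, and Python is used only for illustration. The parameter names `seed`, `threads`, `tessellation`, and `fail_fast` are taken from the config files and command-line examples touched by this commit.

```python
def resolve_params(*layers):
    """Merge parameter layers in order; a later layer overrides an earlier one."""
    resolved = {}
    for layer in layers:
        if layer:  # skip layers that are absent (None or empty)
            resolved.update(layer)
    return resolved


# All values below are illustrative assumptions, not P2Rank's real defaults.
source_defaults = {"seed": 42, "threads": 1, "tessellation": 2, "fail_fast": False}
default_config = {"tessellation": 2, "fail_fast": False}  # stands in for default.groovy
rescore_config = None                                     # applies only to `prank rescore ...`
custom_config = {"fail_fast": True, "threads": 4}         # stands in for -c <config.file>
command_line = {"seed": 151, "threads": 8}                # stands in for -seed 151 -threads 8

params = resolve_params(
    source_defaults, default_config, rescore_config, custom_config, command_line
)
print(params)
```

In this sketch `threads` resolves to 8 because the command-line layer is applied last, while `fail_fast` keeps the value from the custom config file because no later layer sets it.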