Compare commits

57 Commits
3.1 ... 3.1.4.1

Author SHA1 Message Date
Peter Schmidtke
39f2df9ba9 Merge pull request #42 from Discngine/documentation
Documentation
2020-06-06 16:07:31 +02:00
pschmidtke
16ed338bc1 moving old documentation to deprecated 2020-06-06 16:04:20 +02:00
pschmidtke
fdb4de240a documentation test 2020-06-06 16:00:27 +02:00
pschmidtke
e30a9f3641 fpocket advanced doc 2020-06-06 14:24:58 +02:00
pschmidtke
9f10e283c5 mdpocket basic features 2020-06-05 23:43:38 +02:00
pschmidtke
6e7e62d0f6 documentation update 2020-06-05 22:46:51 +02:00
pschmidtke
8f72305ee4 adding dpocket sample files & images 2020-06-05 21:47:06 +02:00
pschmidtke
707de5ad96 update main readme 2020-06-05 20:49:13 +02:00
pschmidtke
f91bd7a1fc documentation revamp 1 2020-06-05 20:34:34 +02:00
pschmidtke
65c809d04b explaining explicit pocket detection params 2020-06-02 17:36:07 +02:00
Peter Schmidtke
20635fa1a8 Merge pull request #29 from drewnutt/master
fix mdpocket to allow for reading in list of pdb files
2020-05-14 23:21:59 +02:00
Peter Schmidtke
a32fc24dc0 Merge branch 'master' into master 2020-05-14 23:19:46 +02:00
pschmidtke
b05d452d19 bugfix issue #24 2020-05-14 23:16:27 +02:00
pschmidtke
a03516e737 sample mdpocket output files 2020-05-14 23:15:51 +02:00
pschmidtke
019ad934bd sample mdpocket input files files 2020-05-14 23:15:35 +02:00
Peter Schmidtke
eb8287f325 adding install instructions for mac 2020-05-14 19:37:04 +02:00
Andrew(Drew) McNutt
40a446ea2b fix division to produce int as done in python2 2020-04-17 17:46:19 -04:00
Andrew(Drew) McNutt
6726e3cb37 python2 to python3 using 2to3 2020-04-02 14:48:38 -04:00
Andrew(Drew) McNutt
662e535f0a fix mdpocket to allow for reading in list of pdb files 2020-04-01 15:14:11 -04:00
Peter Schmidtke
82c0796ecb Update INSTALL.txt 2020-03-10 11:56:35 +01:00
Peter Schmidtke
0e012b4e28 Update README.md 2020-03-10 11:55:38 +01:00
Peter Schmidtke
fc4ad14f55 Merge pull request #28 from Discngine/add-license-1
Create LICENSE
2020-03-10 03:54:58 -07:00
Peter Schmidtke
0d998c42ba Create LICENSE 2020-03-10 11:54:42 +01:00
Peter Schmidtke
c5309000ec supporting setting architecture in command line 2019-05-26 23:12:13 +02:00
Peter Schmidtke
0a9a1df05a dropping unnecessary files 2019-05-26 23:05:12 +02:00
Peter Schmidtke
6159eab71f updating shared mofile for osx 64bits 2019-05-26 23:04:52 +02:00
Peter Schmidtke
c9ed2e4768 adapted for gcc & clang compilations + OSX 64 bit 2019-05-25 14:22:17 +02:00
Peter Schmidtke
76139ac4f4 added frequent error in documentation 2019-05-25 13:21:21 +02:00
Peter Schmidtke
33867da7c3 updating gitignore 2019-05-25 02:20:50 +02:00
Peter Schmidtke
1a381a3815 new molfile plugin compilation 2019-05-25 02:20:23 +02:00
Peter Schmidtke
78d585cd4f allow fpocket only or mdpocket compilation 2019-05-25 02:16:52 +02:00
Peter Schmidtke
6246f86252 updating the documentation on common issues 2019-05-24 23:58:00 +02:00
pschmidtke
d9012c150b dropping fprintf in db output 2018-07-24 06:16:40 +00:00
Peter Schmidtke
d36eafc12e dropping -j12 from qhull make 2018-07-24 06:03:03 +00:00
pschmidtke
0eece05649 fixing bug on makedir in db output of fpocket 2018-07-24 06:01:07 +00:00
pschmidtke
7ec07dc24d dev-13 wrapping up final build with new qhull 2018-07-23 21:53:46 +00:00
Peter Schmidtke
ffe88f56df dev-13 build only what is needed in qhull 2018-07-23 23:35:41 +02:00
Peter Schmidtke
9d4e3dc010 dev-13 adding back makefile? 2018-07-23 23:30:38 +02:00
Peter Schmidtke
c16c54072d dropping gitignore? 2018-07-23 23:20:39 +02:00
Peter Schmidtke
c21bee5482 dev-13 now?2 2018-07-23 23:14:29 +02:00
Peter Schmidtke
9b3443c9dc dev-13 now? 2018-07-23 23:13:07 +02:00
Peter Schmidtke
229728bcd6 dev-13 dropping calls to prompts in qconex 2018-07-23 23:10:51 +02:00
Peter Schmidtke
a96e6e8f8b dev-13 dropping prompts from qconvex 2018-07-23 23:09:32 +02:00
Peter Schmidtke
232f546bfe dev-13 adding missing userprintf_rbox.o dependency 2018-07-23 23:04:24 +02:00
Peter Schmidtke
17d171401e devel-13 back to qobjs in makefile 2018-07-23 22:56:05 +02:00
Peter Schmidtke
9f737c4070 dev-13 make clean qhull in fpocket makefile 2018-07-23 22:52:29 +02:00
Peter Schmidtke
81a864715c dev-13 adding back run_qvoronoi and run_qconvex 2018-07-23 22:51:22 +02:00
Peter Schmidtke
633efaa841 dev-13 including full qhull with tests 2018-07-23 22:43:19 +02:00
Peter Schmidtke
bce0e0640a dev-13 dropped tests etc from qhull makefile 2018-07-23 22:41:16 +02:00
Peter Schmidtke
6d99b179bf dev-13 make qhull 2018-07-23 22:38:43 +02:00
Peter Schmidtke
e68ab59034 dev-13 make qhull introduced 2018-07-23 22:37:55 +02:00
Peter Schmidtke
402b611b71 dev-13 integrating new qhull source code 2018-07-23 22:34:52 +02:00
Peter
93f081ce13 dev-13 drop other remove 2018-07-23 14:26:42 +02:00
Peter
a58493977f dev-13 drop removal of voronoi tmp files for debugging purposes 2018-07-23 14:24:07 +02:00
Peter
c6f4a2620b dev-13 enable debugging mode 2018-07-23 14:10:42 +02:00
Peter Schmidtke
9b5d5b341d Merge pull request #12 from mshyu24/master
Update git clone in README
2018-07-23 14:03:03 +02:00
mshyu24
4a5d14afa6 Update git clone in README 2018-07-19 19:48:04 -05:00
1098 changed files with 551473 additions and 6311 deletions

2
.gitignore vendored

@@ -1,3 +1,5 @@
nbproject
*.o
*_out
.vscode
src/qhull/bin/

@@ -1,22 +1,4 @@
// Fpocket brought to you by Vincent Le Guilloux & Peter Schmidtke
//
// GNU GPL
//
// This file is part of the fpocket package.
//
// fpocket is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// fpocket is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with fpocket. If not, see <http://www.gnu.org/licenses/>.
//
===========================
DEPENDENCIES :
===========================

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2020 Peter Schmidtke & Vincent Le Guilloux
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -1,11 +1,17 @@
# fpocket project
![fpocket logo](doc/images/fpocket_logo.png)
The fpocket suite of programs is a very fast open source protein pocket detection algorithm based on Voronoi tessellation. The platform is suited for the scientific community willing to develop new scoring functions and extract pocket descriptors on a large scale level.
Detailed documentation is available here: [User Manual](doc/MANUAL.md).
The documentation below here is just a quick & rough overview.
## Content
fpocket: the original pocket prediction on a single protein structure
mdpocket: extension of fpocket to analyse conformational ensembles of proteins (MD trajectories for instance)
dpocket: extract pocket descriptors
tpocket: test your pocket scoring function
* fpocket: the original pocket prediction on a single protein structure
* mdpocket: extension of fpocket to analyse conformational ensembles of proteins (MD trajectories for instance)
* dpocket: extract pocket descriptors
* tpocket: test your pocket scoring function
## What's new compared to fpocket 2.0 (old sourceforge repo)
fpocket:
@@ -14,6 +20,7 @@ fpocket:
- pocket flexibility using temperature factors is better considered (less very flexible pockets on very solvent exposed areas)
- druggability score has been reoptimized vs original paper. Yields now slightly better results than the original implementation.
- compiler bug on newer compilers fixed
mdpocket:
- can now read Gromacs XTC, netcdf and dcd trajectories
- can also read prmtop topologies
@@ -22,8 +29,6 @@ mdpocket:
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
### Prerequisites
The most recent versions (starting with fpocket 3.0) make use of the molfile plugin from VMD. This plugin is shipped with fpocket. However, you now need to install the netcdf library on your system. This is typically called netcdf-devel or similar, depending on your Linux distribution.
@@ -41,36 +46,51 @@ sudo yum install netcdf-devel.x86_64
Download the sources from github via the website or using git clone and then build and deploy fpocket using the following commands.
#### Compiling on Linux
```
git clone https://github.com/Discngine/fpocket.git .
git clone https://github.com/Discngine/fpocket.git
cd fpocket
make
sudo make install
```
End with an example of getting some data out of the system or using it for a little demo
## Running the tests
The source code of fpocket is shipped with samples. They can be found in the data/sample folder. Try to run fpocket against the 1uyd sample to check if it's running OK.
#### Compiling on Mac
Install MacPorts https://www.macports.org/ for instance (needed for netcdf install)
```bash
sudo port install netcdf
export LIBRARY_PATH=/opt/local/lib
git clone https://github.com/Discngine/fpocket.git
cd fpocket
make ARCH=MACOSXX86_64
sudo make install
```
cd data/sample
fpocket -f 1UYD.pdb
### Running fpocket
You can run fpocket using the following command line as an example:
```bash
fpocket -f 1uyd.pdb
```
fpocket should state when it's beginning to search pocket and also when it's ending the search. Upon completion the folder should now contain a folder called 1UYD_out. Check whether the folder exists and the pdb files contain data and the pocket info file contains results.
This will detect all pockets on the input pdb file, named 1uyd.pdb
If you want to get all command line args for fpocket, simply type `fpocket`
## User Manual
For now the user manual (still the one from fpocket 2.0) can be found in the doc folder. When I have some time to kill (or if somebody else has) we could add that here somewhere.
### Running mdpocket
To detect all pockets and create a pocket frequency grid on a sample input trajectory in an xtc format for instance you can run:
```bash
mdpocket --trajectory_file input.xtc --trajectory_format xtc -f topology.pdb
```
## Detailed User Manual
You can access the detailed user manual here * [User Manual](doc/MANUAL.md)
## Contributing
Please read [CONTRIBUTING.md](https://gist.github.com/PurpleBooth/b24679402957c63ec426) for details on our code of conduct, and the process for submitting pull requests to us.
## Authors
* **Peter Schmidtke** - *Initial work* - [pschmidtke](https://github.com/pschmidtke)
@@ -79,8 +99,5 @@ Please read [CONTRIBUTING.md](https://gist.github.com/PurpleBooth/b24679402957c6
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details
## Acknowledgments
* to be filled

3697
data/sample/1ATP.pdb Normal file

File diff suppressed because it is too large Load Diff

3015
data/sample/3LKF.pdb Normal file

File diff suppressed because it is too large Load Diff

3444
data/sample/5WA6.pdb Normal file

File diff suppressed because it is too large Load Diff

4756
data/sample/7TAA.pdb Normal file

File diff suppressed because it is too large Load Diff


@@ -0,0 +1,10 @@
2yex.pdb
3ot3.pdb
3pa3.pdb
3pa4.pdb
3tki.pdb
4hyi.pdb
4rvk.pdb
5opb.pdb
5opu.pdb
5oq5.pdb


@@ -0,0 +1,184 @@
ATOM 1 C PTH 1 -8.641 4.782 5.052 0.00 0.00
ATOM 2 C PTH 1 -8.641 4.782 6.052 0.00 0.00
ATOM 3 C PTH 1 -8.641 5.782 6.052 0.00 0.00
ATOM 4 C PTH 1 -7.641 4.782 5.052 0.00 0.00
ATOM 5 C PTH 1 -7.641 4.782 6.052 0.00 0.00
ATOM 6 C PTH 1 -7.641 5.782 3.052 0.00 0.00
ATOM 7 C PTH 1 -7.641 5.782 4.052 0.00 0.00
ATOM 8 C PTH 1 -7.641 5.782 6.052 0.00 0.00
ATOM 9 C PTH 1 -7.641 6.782 7.052 0.00 0.00
ATOM 10 C PTH 1 -6.641 4.782 3.052 0.00 0.00
ATOM 11 C PTH 1 -6.641 4.782 4.052 0.00 0.00
ATOM 12 C PTH 1 -6.641 5.782 3.052 0.00 0.00
ATOM 13 C PTH 1 -6.641 5.782 4.052 0.00 0.00
ATOM 14 C PTH 1 -6.641 5.782 5.052 0.00 0.00
ATOM 15 C PTH 1 -6.641 5.782 6.052 0.00 0.00
ATOM 16 C PTH 1 -6.641 5.782 7.052 0.00 0.00
ATOM 17 C PTH 1 -6.641 6.782 3.052 0.00 0.00
ATOM 18 C PTH 1 -6.641 6.782 4.052 0.00 0.00
ATOM 19 C PTH 1 -6.641 6.782 7.052 0.00 0.00
ATOM 20 C PTH 1 -6.641 20.782 12.052 0.00 0.00
ATOM 21 C PTH 1 -5.641 4.782 4.052 0.00 0.00
ATOM 22 C PTH 1 -5.641 5.782 3.052 0.00 0.00
ATOM 23 C PTH 1 -5.641 5.782 4.052 0.00 0.00
ATOM 24 C PTH 1 -5.641 5.782 5.052 0.00 0.00
ATOM 25 C PTH 1 -5.641 6.782 4.052 0.00 0.00
ATOM 26 C PTH 1 -5.641 6.782 5.052 0.00 0.00
ATOM 27 C PTH 1 -4.641 5.782 4.052 0.00 0.00
ATOM 28 C PTH 1 -4.641 5.782 5.052 0.00 0.00
ATOM 29 C PTH 1 -4.641 6.782 4.052 0.00 0.00
ATOM 30 C PTH 1 -2.641 12.782 -14.948 0.00 0.00
ATOM 31 C PTH 1 -2.641 13.782 -14.948 0.00 0.00
ATOM 32 C PTH 1 -2.641 13.782 -13.948 0.00 0.00
ATOM 33 C PTH 1 -1.641 12.782 -14.948 0.00 0.00
ATOM 34 C PTH 1 -1.641 13.782 -15.948 0.00 0.00
ATOM 35 C PTH 1 -1.641 13.782 -14.948 0.00 0.00
ATOM 36 C PTH 1 -0.641 11.782 -15.948 0.00 0.00
ATOM 37 C PTH 1 -0.641 12.782 -15.948 0.00 0.00
ATOM 38 C PTH 1 0.359 11.782 -15.948 0.00 0.00
ATOM 39 C PTH 1 1.359 7.782 18.052 0.00 0.00
ATOM 40 C PTH 1 1.359 7.782 19.052 0.00 0.00
ATOM 41 C PTH 1 1.359 8.782 19.052 0.00 0.00
ATOM 42 C PTH 1 1.359 8.782 20.052 0.00 0.00
ATOM 43 C PTH 1 1.359 26.782 -9.948 0.00 0.00
ATOM 44 C PTH 1 2.359 6.782 18.052 0.00 0.00
ATOM 45 C PTH 1 2.359 7.782 18.052 0.00 0.00
ATOM 46 C PTH 1 2.359 7.782 19.052 0.00 0.00
ATOM 47 C PTH 1 2.359 8.782 19.052 0.00 0.00
ATOM 48 C PTH 1 2.359 8.782 20.052 0.00 0.00
ATOM 49 C PTH 1 3.359 4.782 17.052 0.00 0.00
ATOM 50 C PTH 1 3.359 5.782 17.052 0.00 0.00
ATOM 51 C PTH 1 3.359 5.782 18.052 0.00 0.00
ATOM 52 C PTH 1 3.359 6.782 17.052 0.00 0.00
ATOM 53 C PTH 1 3.359 6.782 18.052 0.00 0.00
ATOM 54 C PTH 1 3.359 8.782 19.052 0.00 0.00
ATOM 55 C PTH 1 3.359 8.782 20.052 0.00 0.00
ATOM 56 C PTH 1 3.359 9.782 20.052 0.00 0.00
ATOM 57 C PTH 1 3.359 19.782 -13.948 0.00 0.00
ATOM 58 C PTH 1 3.359 19.782 -12.948 0.00 0.00
ATOM 59 C PTH 1 3.359 20.782 -12.948 0.00 0.00
ATOM 60 C PTH 1 3.359 21.782 -12.948 0.00 0.00
ATOM 61 C PTH 1 3.359 21.782 -11.948 0.00 0.00
ATOM 62 C PTH 1 4.359 4.782 16.052 0.00 0.00
ATOM 63 C PTH 1 4.359 4.782 17.052 0.00 0.00
ATOM 64 C PTH 1 4.359 5.782 15.052 0.00 0.00
ATOM 65 C PTH 1 4.359 5.782 16.052 0.00 0.00
ATOM 66 C PTH 1 4.359 5.782 17.052 0.00 0.00
ATOM 67 C PTH 1 4.359 5.782 18.052 0.00 0.00
ATOM 68 C PTH 1 4.359 6.782 16.052 0.00 0.00
ATOM 69 C PTH 1 4.359 6.782 17.052 0.00 0.00
ATOM 70 C PTH 1 4.359 8.782 19.052 0.00 0.00
ATOM 71 C PTH 1 4.359 8.782 20.052 0.00 0.00
ATOM 72 C PTH 1 4.359 9.782 19.052 0.00 0.00
ATOM 73 C PTH 1 4.359 9.782 20.052 0.00 0.00
ATOM 74 C PTH 1 4.359 20.782 -12.948 0.00 0.00
ATOM 75 C PTH 1 4.359 21.782 -12.948 0.00 0.00
ATOM 76 C PTH 1 5.359 4.782 15.052 0.00 0.00
ATOM 77 C PTH 1 5.359 5.782 15.052 0.00 0.00
ATOM 78 C PTH 1 5.359 5.782 16.052 0.00 0.00
ATOM 79 C PTH 1 6.359 5.782 -0.948 0.00 0.00
ATOM 80 C PTH 1 6.359 5.782 0.052 0.00 0.00
ATOM 81 C PTH 1 6.359 5.782 1.052 0.00 0.00
ATOM 82 C PTH 1 6.359 6.782 0.052 0.00 0.00
ATOM 83 C PTH 1 6.359 6.782 1.052 0.00 0.00
ATOM 84 C PTH 1 6.359 7.782 0.052 0.00 0.00
ATOM 85 C PTH 1 6.359 7.782 1.052 0.00 0.00
ATOM 86 C PTH 1 7.359 -0.218 -1.948 0.00 0.00
ATOM 87 C PTH 1 7.359 0.782 -1.948 0.00 0.00
ATOM 88 C PTH 1 7.359 5.782 -0.948 0.00 0.00
ATOM 89 C PTH 1 7.359 5.782 0.052 0.00 0.00
ATOM 90 C PTH 1 7.359 5.782 1.052 0.00 0.00
ATOM 91 C PTH 1 7.359 6.782 -0.948 0.00 0.00
ATOM 92 C PTH 1 7.359 6.782 0.052 0.00 0.00
ATOM 93 C PTH 1 7.359 6.782 1.052 0.00 0.00
ATOM 94 C PTH 1 7.359 6.782 16.052 0.00 0.00
ATOM 95 C PTH 1 7.359 7.782 -0.948 0.00 0.00
ATOM 96 C PTH 1 7.359 7.782 0.052 0.00 0.00
ATOM 97 C PTH 1 7.359 7.782 16.052 0.00 0.00
ATOM 98 C PTH 1 7.359 7.782 17.052 0.00 0.00
ATOM 99 C PTH 1 7.359 8.782 17.052 0.00 0.00
ATOM 100 C PTH 1 7.359 8.782 18.052 0.00 0.00
ATOM 101 C PTH 1 7.359 9.782 18.052 0.00 0.00
ATOM 102 C PTH 1 7.359 9.782 19.052 0.00 0.00
ATOM 103 C PTH 1 8.359 -1.218 -2.948 0.00 0.00
ATOM 104 C PTH 1 8.359 -1.218 -1.948 0.00 0.00
ATOM 105 C PTH 1 8.359 -1.218 -0.948 0.00 0.00
ATOM 106 C PTH 1 8.359 -0.218 -2.948 0.00 0.00
ATOM 107 C PTH 1 8.359 -0.218 -1.948 0.00 0.00
ATOM 108 C PTH 1 8.359 -0.218 -0.948 0.00 0.00
ATOM 109 C PTH 1 8.359 0.782 -2.948 0.00 0.00
ATOM 110 C PTH 1 8.359 0.782 -1.948 0.00 0.00
ATOM 111 C PTH 1 8.359 4.782 15.052 0.00 0.00
ATOM 112 C PTH 1 8.359 5.782 16.052 0.00 0.00
ATOM 113 C PTH 1 8.359 6.782 -0.948 0.00 0.00
ATOM 114 C PTH 1 8.359 6.782 16.052 0.00 0.00
ATOM 115 C PTH 1 8.359 6.782 17.052 0.00 0.00
ATOM 116 C PTH 1 8.359 7.782 -0.948 0.00 0.00
ATOM 117 C PTH 1 8.359 7.782 16.052 0.00 0.00
ATOM 118 C PTH 1 8.359 7.782 17.052 0.00 0.00
ATOM 119 C PTH 1 8.359 8.782 -0.948 0.00 0.00
ATOM 120 C PTH 1 8.359 8.782 18.052 0.00 0.00
ATOM 121 C PTH 1 8.359 9.782 17.052 0.00 0.00
ATOM 122 C PTH 1 8.359 9.782 18.052 0.00 0.00
ATOM 123 C PTH 1 9.359 -2.218 -1.948 0.00 0.00
ATOM 124 C PTH 1 9.359 -1.218 -2.948 0.00 0.00
ATOM 125 C PTH 1 9.359 -1.218 -1.948 0.00 0.00
ATOM 126 C PTH 1 9.359 -1.218 -0.948 0.00 0.00
ATOM 127 C PTH 1 9.359 -0.218 -2.948 0.00 0.00
ATOM 128 C PTH 1 9.359 -0.218 -1.948 0.00 0.00
ATOM 129 C PTH 1 9.359 0.782 -2.948 0.00 0.00
ATOM 130 C PTH 1 9.359 5.782 15.052 0.00 0.00
ATOM 131 C PTH 1 9.359 6.782 16.052 0.00 0.00
ATOM 132 C PTH 1 9.359 7.782 -14.948 0.00 0.00
ATOM 133 C PTH 1 9.359 17.782 -6.948 0.00 0.00
ATOM 134 C PTH 1 9.359 18.782 -6.948 0.00 0.00
ATOM 135 C PTH 1 9.359 24.782 -0.948 0.00 0.00
ATOM 136 C PTH 1 10.359 6.782 16.052 0.00 0.00
ATOM 137 C PTH 1 10.359 7.782 16.052 0.00 0.00
ATOM 138 C PTH 1 10.359 10.782 15.052 0.00 0.00
ATOM 139 C PTH 1 10.359 17.782 -6.948 0.00 0.00
ATOM 140 C PTH 1 10.359 17.782 -5.948 0.00 0.00
ATOM 141 C PTH 1 10.359 18.782 -6.948 0.00 0.00
ATOM 142 C PTH 1 10.359 18.782 -5.948 0.00 0.00
ATOM 143 C PTH 1 10.359 18.782 -4.948 0.00 0.00
ATOM 144 C PTH 1 10.359 18.782 -3.948 0.00 0.00
ATOM 145 C PTH 1 10.359 19.782 -3.948 0.00 0.00
ATOM 146 C PTH 1 10.359 24.782 -0.948 0.00 0.00
ATOM 147 C PTH 1 10.359 25.782 -0.948 0.00 0.00
ATOM 148 C PTH 1 11.359 7.782 15.052 0.00 0.00
ATOM 149 C PTH 1 11.359 8.782 15.052 0.00 0.00
ATOM 150 C PTH 1 11.359 9.782 15.052 0.00 0.00
ATOM 151 C PTH 1 11.359 10.782 14.052 0.00 0.00
ATOM 152 C PTH 1 11.359 17.782 -6.948 0.00 0.00
ATOM 153 C PTH 1 11.359 17.782 -5.948 0.00 0.00
ATOM 154 C PTH 1 11.359 18.782 -6.948 0.00 0.00
ATOM 155 C PTH 1 11.359 18.782 -5.948 0.00 0.00
ATOM 156 C PTH 1 11.359 18.782 -4.948 0.00 0.00
ATOM 157 C PTH 1 11.359 18.782 -3.948 0.00 0.00
ATOM 158 C PTH 1 11.359 19.782 -5.948 0.00 0.00
ATOM 159 C PTH 1 11.359 19.782 -4.948 0.00 0.00
ATOM 160 C PTH 1 11.359 19.782 -3.948 0.00 0.00
ATOM 161 C PTH 1 11.359 19.782 -2.948 0.00 0.00
ATOM 162 C PTH 1 11.359 20.782 -3.948 0.00 0.00
ATOM 163 C PTH 1 11.359 20.782 -2.948 0.00 0.00
ATOM 164 C PTH 1 11.359 21.782 -0.948 0.00 0.00
ATOM 165 C PTH 1 11.359 22.782 -0.948 0.00 0.00
ATOM 166 C PTH 1 12.359 8.782 14.052 0.00 0.00
ATOM 167 C PTH 1 12.359 8.782 15.052 0.00 0.00
ATOM 168 C PTH 1 12.359 9.782 15.052 0.00 0.00
ATOM 169 C PTH 1 12.359 19.782 -4.948 0.00 0.00
ATOM 170 C PTH 1 12.359 20.782 -2.948 0.00 0.00
ATOM 171 C PTH 1 12.359 21.782 -1.948 0.00 0.00
ATOM 172 C PTH 1 12.359 21.782 -0.948 0.00 0.00
ATOM 173 C PTH 1 12.359 22.782 -1.948 0.00 0.00
ATOM 174 C PTH 1 12.359 22.782 -0.948 0.00 0.00
ATOM 175 C PTH 1 12.359 22.782 0.052 0.00 0.00
ATOM 176 C PTH 1 12.359 23.782 -0.948 0.00 0.00
ATOM 177 C PTH 1 13.359 13.782 27.052 0.00 0.00
ATOM 178 C PTH 1 13.359 13.782 28.052 0.00 0.00
ATOM 179 C PTH 1 13.359 22.782 -0.948 0.00 0.00
ATOM 180 C PTH 1 14.359 17.782 -7.948 0.00 0.00
ATOM 181 C PTH 1 15.359 16.782 -8.948 0.00 0.00
ATOM 182 C PTH 1 15.359 16.782 -7.948 0.00 0.00
ATOM 183 C PTH 1 15.359 17.782 -8.948 0.00 0.00
ATOM 184 C PTH 1 15.359 17.782 -7.948 0.00 0.00


@@ -0,0 +1,10 @@
0.215480
0.271399
0.217237
0.212487
0.221198
0.217482
0.234323
0.224301
0.218361
0.211592

@@ -0,0 +1,3 @@
data/sample/3LKF.pdb pc
data/sample/1ATP.pdb atp
data/sample/7TAA.pdb abc

972
doc/GETTINGSTARTED.md Normal file

@@ -0,0 +1,972 @@
# Getting started & Advanced guides
##### fpocket
* [fpocket basics](#fpocket-simple-pocket-detection)
* [fpocket advanced](#fpocket-advanced)
##### mdpocket
* [mdpocket basics](#mdpocket-pocket-detection-on-md-trajectories)
* [mdpocket advanced](#mdpocket-advanced)
##### dpocket
* [dpocket basics](#dpocket-descriptor-extraction)
* [dpocket advanced](#dpocket-advanced)
##### tpocket
* [tpocket basics](#tpocket-scoring-ranging-and-evaluation)
* [tpocket advanced](#tpocket-advanced)
##### other
* [pocket descriptors](#pocket-descriptors)
* [cofactor definitions](#Cofactor-definition)
* [customizing fpocket/mdpocket](#Customizing-fpocket)
## fpocket - simple pocket detection
To run the following examples, we use several sample input files (see the `data/sample/` directory).
### Example
Here you have a very simple and straightforward example of how to run fpocket on a single PDB file downloaded from the RCSB PDB. The following command line will execute fpocket on the 1UYD.pdb file situated in the sample directory.
`fpocket -f sample/1UYD.pdb`
A PDB input file must be given with the -f flag on the command line. If nothing is given, fpocket prints its usage/help to the screen. fpocket will use standard parameters for the detection of pockets. For more information about these parameters see the [advanced fpocket features](#fpocket-advanced).
If fpocket works properly the output on the screen should look like this :
```bash
=========== Pocket hunting begins ==========
=========== Pocket hunting ends ============
```
If you now have a look in the sample directory, you will notice that fpocket created a folder named 1UYD_out/. This folder contains all the output from fpocket, which is what you are actually interested in. If you just want a quick look at the results, go to the 1UYD_out directory and launch the 1UYD_VMD.sh script. This script will launch the VMD molecular visualizer and load the protein with the binding site information coming from fpocket.
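If you prefer to script this first run, the following is a minimal Python sketch. It assumes that `fpocket` is on your `PATH` and, as described above, that the output folder is created next to the input file and named after it with an `_out` suffix. The helper names (`fpocket_command`, `expected_output_dir`, `run_fpocket`) are hypothetical and not part of fpocket.

```python
import subprocess
from pathlib import Path


def fpocket_command(pdb_path):
    """Build the argument list for a single-structure fpocket run."""
    return ["fpocket", "-f", str(pdb_path)]


def expected_output_dir(pdb_path):
    """Folder expected after the run: <input name>_out next to the input (assumption)."""
    pdb_path = Path(pdb_path)
    return pdb_path.with_name(pdb_path.stem + "_out")


def run_fpocket(pdb_path):
    """Run fpocket and return the output folder; raises if the run fails."""
    subprocess.run(fpocket_command(pdb_path), check=True)
    out_dir = expected_output_dir(pdb_path)
    if not out_dir.is_dir():
        raise FileNotFoundError(f"expected fpocket output in {out_dir}")
    return out_dir
```

Calling `run_fpocket("sample/1UYD.pdb")` should then return the `sample/1UYD_out` folder if the run behaved as described.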
![VMD with fpocket output](images/vmd1.png)
The illustration above is roughly what you will see if you launch the VMD script. VMD is well suited for representing the volume of alpha spheres and their respective centers. Usually the visual volume information is not of primary importance, as the larger alpha spheres tend to reach far out of the protein and smaller alpha spheres are not visible because they are covered by larger ones. As can be seen in the VMD Main window, the visualization script loads 3 structures; all of them are explained in more detail in the output section of this chapter.
If you had a closer look at the methodological aspects of this algorithm beforehand (we invite you to read the paper), a natural question would be how to represent apolar and polar alpha spheres. Currently the color code represents only the residue ID (rank of the cavity). If you want to see the characteristics of alpha spheres, we invite you to change their representation. This can be done by clicking Graphics -> Representations. Another window will show up. There you select the first molecule (1UYD_out.pdb), as shown in the figure below.
![VMD representations](images/vmd2.png)
A script for fast visualization using PyMOL is also provided. PyMOL provides nice features for browsing and selecting different pockets, using the predefined selection patterns on the right side of the main window. However, PyMOL does not interpret the pqr file format well, so alpha sphere volumes are not accurate and only alpha sphere centers can be shown.
![PyMOL representations](images/pymol1.png)
### Basic input
#### Mandatory (1 OR 2):
1: flag -f : one standard PDB file name.
2: flag -F : one text file containing a simple list of PDB file paths
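The list file for the `-F` flag can be generated rather than typed by hand. The snippet below is a minimal sketch, assuming the list holds one PDB path per line; `write_pdb_list` and the file name `pdb_list.txt` are hypothetical names chosen for illustration.

```python
from pathlib import Path


def write_pdb_list(pdb_dir, list_path):
    """Write one PDB path per line (sorted) and return the paths written."""
    pdb_paths = sorted(str(p) for p in Path(pdb_dir).glob("*.pdb"))
    # One path per line; the exact layout fpocket expects is an assumption.
    Path(list_path).write_text("".join(p + "\n" for p in pdb_paths))
    return pdb_paths
```

After `write_pdb_list("data/sample", "pdb_list.txt")`, the resulting file can be passed to fpocket with `fpocket -F pdb_list.txt`.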
#### Optional:
For more details on optional fpocket arguments see [advanced fpocket features](#fpocket-advanced).
### Output
The fpocket output consists of many files. For a detailed overview of those files, see [advanced fpocket features](#fpocket-advanced).
Is there something else? No, you are done. Congratulations, you have successfully performed your first pocket prediction with fpocket...without any accidents, we hope. As you might have seen, fpocket is rather simple to use, although it is command line based software. Furthermore, you should have seen that fpocket is very fast, well, let's say as long as you do not run it on a P1 100 MHz.
As mentioned before, fpocket offers many more possibilities, especially for filtering out unwanted pockets and clustering alpha spheres. For these more advanced features, refer to [advanced fpocket features](#fpocket-advanced).
## mdpocket pocket detection on MD trajectories
The fpocket developer team is proud to present a very new feature as part of the fpocket software package. As programmers are very creative people, as you might know, we called this program mdpocket, as an acronym for Molecular Dynamics pocket (very original, isn't it?). In the next paragraphs we will refer to Molecular Dynamics as MD.
Well, mdpocket is freely available software that allows you to do the following very nice things quite quickly:
* pocket detection on MD trajectories (I already said this one)
* visualization of transient pockets (oh, will we have all the Pharma people on our back?)
* extraction of pocket descriptors during the MD trajectory (like pocket volume for example)
* get a static image of pocket occurrences during the MD trajectory (the usefulness of this may not be obvious yet, but it will become clearer later)
* perform on the fly energy calculations within a detected pocket
If you are already used to running and analyzing MD trajectories, you know that a range of different software is available for this. Mdpocket is able to read plain PDB files describing the conformations of a protein, but it can now also read Amber crd files, Gromacs xtc, netcdf, and CHARMM and NAMD dcd files (that was a nightmare to integrate and compile, so please make use of it).
### Example
It is VERY IMPORTANT to first align (superimpose) all snapshots onto each other. Why? Well, you have to do this due to the methodology used behind mdpocket. For more information on how mdpocket works feel free to read the mdpocket paper:
(http://bioinformatics.oxfordjournals.org/content/27/23/3276.long)
Below is an example for Amber, but you can do the same with Gromacs tools, mdtraj in Python, or VMD when analyzing NAMD trajectories, for instance:
With Amber you can do the structural alignment and transformation using the freely available ptraj or cpptraj program and the following steps:
- 1: create a ptraj input file with the following content :
trajin ../md_1.x.gz 1 250 10
trajin ../md_2.x.gz 1 250 10
trajin ../md_3.x.gz 1 250 10
reference ../reference.pdb
strip !:1-208
rms reference :25-88,120-196@CA,C,N,O
trajout trajectory_superimposed.dcd charmm
go
- 2: Run ptraj using the following command:
`ptraj your_topology.top < ptraj_input_file.ptr`
A few words about what we are doing here. First, the ptraj input reads the trajectory files. In this example, the trajectory is split into 3 files. Each file has 250 snapshots. Here we only read every tenth snapshot of the 250. We set a reference PDB structure for the alignment.
The strip command allows you to drop residues, here everything other than the protein (solvent, counter ions etc...).
Next, we align each snapshot on the reference structure, using only the backbone atoms (CA, C, N, O) of residues 25-88 and 120-196.
The output is written to trajectory_superimposed.dcd. Here we write a dcd file just for demonstration purposes; you can write mdcrd or netcdf files as well with ptraj.
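To make the effect of the `1 250 10` arguments concrete, here is a small worked example in Python. It is only a sketch and assumes that `trajin` counts frames from 1 and treats the stop value as inclusive; `selected_frames` is a hypothetical helper, not a ptraj function.

```python
def selected_frames(start, stop, offset):
    """Frame numbers read for `trajin <file> start stop offset` (1-based, stop inclusive)."""
    return list(range(start, stop + 1, offset))


frames_per_file = selected_frames(1, 250, 10)  # frames 1, 11, 21, ..., 241
total_snapshots = 3 * len(frames_per_file)     # three trajectory files in the example
print(len(frames_per_file), total_snapshots)
```

Under these assumptions each file contributes 25 snapshots, so the superimposed trajectory holds 75 snapshots in total.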
Now, here we are, we can run mdpocket (finally...):
`mdpocket --trajectory_file trajectory_superimposed.dcd --trajectory_format dcd -f reference.pdb`
NB: you still have to provide a pdb file containing the actual topology of the structure as most of the supported MD formats only store coordinates, but no information on the actual atom & residue types of the structure.
The following part will take a while, depending on the number of atoms in your system and the number of snapshots you analyze. On average, on a sample MD of 4000 snapshots (3258 atoms), 0.4 seconds of calculation time were necessary for the analysis of 1 snapshot on one core of a 2.66 GHz Intel Quad with 4 GB of RAM.
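The per-snapshot figure above can be turned into a rough wall-clock estimate. The sketch below simply multiplies it out; the 0.4 s default is the value quoted for that particular machine and system size, so treat the result as an order-of-magnitude guide only. `estimated_runtime_seconds` is a hypothetical helper name.

```python
def estimated_runtime_seconds(n_snapshots, seconds_per_snapshot=0.4):
    """Rough single-core runtime estimate: snapshots times time per snapshot."""
    return n_snapshots * seconds_per_snapshot


seconds = estimated_runtime_seconds(4000)
print(f"about {seconds / 60:.0f} minutes for 4000 snapshots")
```

For the 4000-snapshot example this gives about 1600 seconds, that is, a little under half an hour on one core.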
Mdpocket will print out some information and the current progress of the calculation. Once finished, you will find the following output files in your current folder:
* `mdpout_freq_grid.dx`: This is an output grid file. The grid contains a measure of how often the pocket was open during the MD trajectory. Averaged over the number of snapshots, this gives a range of possible iso-values between 0 and 1. Currently we provide both types of grid files (frequency & density), as both have proven useful in in-house studies. However, the frequency grid file is usually much easier to interpret.
This representation already gives you a lot of information, especially about existing paths during an MD. For mechanistic studies this can often be enough. However, if you want to measure, for example, the volume of a certain pocket, you have to select this region first. As VMD and the grid file are not really suitable for selection, mdpocket provides two further output files, `mdpout_dens_iso_8.pdb` and `mdpout_freq_iso_0_5.pdb`.
* `mdpout_dens_grid.dx`: This is one of the two grid output files coming from mdpocket. Briefly, a grid is superimposed on all alpha spheres of all snapshots and the number of alpha spheres around each grid point is counted. This output is very useful as a working file for a first crude visualization using PyMOL or VMD. In the following example we will show VMD, as the visualization of grids is easier and less heavy with it. Open VMD and load the DX file. You should have something like this (colors are different) :
![VMD with mdpocket output](images/vmd3.png)
This is nice, but you can hardly see anything interpretable in there. To see more clearly, we recommend changing the representation by going to Graphics -> Representations, as shown in the following illustration:
![VMD with mdpocket output](images/vmd4.png)
Now you can play with the Isovalue slider to show more or less conserved cavities during the MD trajectory. The unit of this isovalue can be expressed as the number of Voronoi vertices (alpha sphere centers) in an 8 Å³ cube around each grid point per snapshot. The more conserved (or dense) a cavity is, the higher this value. At high isovalues you will therefore usually see internal pockets and protein-internal channels. If you are interested in very superficial or transient binding sites, decrease the isovalue until you see them.
* `mdpout_dens_iso_8.pdb`: This file contains all grid points having 3 or more Voronoi vertices in the 8 Å³ volume around the grid point for each snapshot. Using PyMOL you can now select and save only the grid points of the pocket you are interested in. Save these points to another pdb file; let us call this file `my_pocket.pdb`. The choice of the correct grid points for your pocket definition is entirely up to you. As a rule of thumb, we recommend a high isovalue (like 5) if you want to show open channels in a protein or protein-internal binding pockets. Lower this isovalue (maybe to 2 or 3) if you are interested in transient phenomena (opening and closing of paths, transient pockets, etc.). Refer to the advanced features to learn how to extract these pdb files with other isovalues.
* `mdpout_freq_iso_0_5.pdb`: This is similar to the previous pdb file, just being produced on the frequency grid with a cut-off of 0.5.
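The difference between the density grid and the frequency grid can be illustrated with a small Python sketch. It is not mdpocket's implementation: the function name `accumulate_grids`, the cube-shaped neighbourhood test and the rule "a grid point is open when at least one alpha sphere center is nearby" are assumptions made for illustration only.

```python
def accumulate_grids(snapshots, grid_points, half_side=1.0):
    """Return (density, frequency) lists, one value per grid point.

    snapshots   : list of snapshots, each a list of (x, y, z) alpha sphere centers
    grid_points : list of (x, y, z) grid point coordinates
    half_side   : half the edge of the counting cube (1 A gives an 8 A^3 cube)
    """
    n_snapshots = len(snapshots)
    if n_snapshots == 0:
        raise ValueError("at least one snapshot is required")
    totals = [0] * len(grid_points)       # summed vertex counts (density)
    open_counts = [0] * len(grid_points)  # snapshots with >= 1 vertex (frequency)
    for centers in snapshots:
        for i, (gx, gy, gz) in enumerate(grid_points):
            nearby = sum(
                1
                for (x, y, z) in centers
                if abs(x - gx) <= half_side
                and abs(y - gy) <= half_side
                and abs(z - gz) <= half_side
            )
            totals[i] += nearby
            if nearby > 0:
                open_counts[i] += 1
    # Both grids are averaged over the number of snapshots.
    density = [t / n_snapshots for t in totals]
    frequency = [c / n_snapshots for c in open_counts]
    return density, frequency


# Made-up toy data: four snapshots, two grid points.
snapshots = [
    [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
    [(0.2, 0.0, 0.0)],
    [],
    [(5.0, 5.0, 5.0)],
]
grid_points = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
density, frequency = accumulate_grids(snapshots, grid_points)
print(density, frequency)
```

The frequency value is bounded by 1 because each snapshot contributes at most once per grid point, whereas the density value grows with the number of vertices found near the grid point.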
To measure the properties of your previously defined pocket during the MD trajectory, you have to rerun mdpocket in a slightly different way:
`mdpocket --trajectory_file trajectory_superimposed.dcd --trajectory_format dcd -f reference.pdb --selected_pocket my_pocket.pdb`
As you can see, you now have to pass your pocket definition using the `--selected_pocket` flag of mdpocket. To see how to define your pocket, see the section [Pocket Selection](#pocket-selection). The `-v` flag is optional; it only serves to provide reasonably good volume calculations in a reasonable execution time. As during the first mdpocket run, you should first see some output and then the progress of mdpocket through all your snapshots. Once finished, you will find some additional output files in your folder:
* `mdpout_mdpocket.pdb`: This is a pdb file that contains all Voronoi vertices in the selected pocket zone for each snapshot. Each snapshot is handled as a separate model (like an NMR structure) and can thus be viewed as an MD trajectory using PyMOL. Show the surface of the vertices and you can visualize the movement of your pocket. Be careful: VMD does not read this file, as the number and type of Voronoi vertices in the model can differ from one snapshot to the next.
* `mdpout_mdpocket_atoms.pdb`: This is a pdb file similar to the previous output, but this time containing all receptor atoms defining the binding pocket.
* `mdpout_descriptors.txt`: Last but not least, this is perhaps the most important file, as it contains the pocket descriptors. For each snapshot you will find the pocket volume, the number of alpha spheres and all other default fpocket descriptors:
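| snapshot | pock_volume | nb_AS | mean_as_ray | ... |
| --- | --- | --- | --- | --- |
| 1 | 793.47 | 183 | 3.76 | |
| 2 | 726.95 | 158 | 3.86 | |
| 3 | 711.87 | 213 | 3.59 | |
| 4 | 700.82 | 172 | 3.61 | |
| 5 | 762.24 | 196 | 3.85 | |
| 6 | 618.31 | 193 | 3.77 | |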
This output file can be easily analyzed using R, gnuplot or other suitable software. An example R output for the pocket volume would be:
![Pocket volume plot](images/volume.png)
If you want to reproduce this, simply launch R and type:
```R
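# Read the per-snapshot pocket descriptors written by mdpocket
r <- read.table("mdpout_descriptors.txt", header = TRUE)
ylim <- c(400, 1200)
# Raw pocket volume per snapshot
plot(r[, "pock_volume"], type = "l", ylim = ylim, main = "", xlab = "", ylab = "")
# Overlay a smoothed curve on the same axes
par(new = TRUE)
plot(smooth.spline(r[, "pock_volume"], df = 40), col = "red", lwd = 3, ylim = ylim, type = "l", xlab = "snapshot", ylab = "volume")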
```
In this figure you can see a clear increase of the pocket volume at the beginning of the trajectory. You can now investigate which phenomenon causes this increase by analyzing the `mdpout_mdpocket.pdb` output in PyMOL. Although not shown in this example, mdpocket now also provides measurements of the polar and apolar surface area (van der Waals + 1.4 Å probe) of the pocket.
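If you prefer Python over R, the descriptor table can be summarized with the standard library alone. The sketch below is only an illustration: it embeds the six example rows shown earlier instead of reading a real file, and the helper names `read_descriptors` and `moving_average` are made up for this example. For a real run you would pass `open("mdpout_descriptors.txt")` to the reader, assuming the file is whitespace- or tab-separated with a header line.

```python
import io
from statistics import mean

# Example rows copied from the descriptor extract (remaining columns omitted).
SAMPLE = """snapshot pock_volume nb_AS mean_as_ray
1 793.47 183 3.76
2 726.95 158 3.86
3 711.87 213 3.59
4 700.82 172 3.61
5 762.24 196 3.85
6 618.31 193 3.77
"""


def read_descriptors(handle):
    """Parse a whitespace-separated descriptor table into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip empty lines
            rows.append({name: float(value) for name, value in zip(header, fields)})
    return rows


def moving_average(values, window):
    """Trailing moving average; returns len(values) - window + 1 points."""
    return [
        mean(values[i - window + 1:i + 1])
        for i in range(window - 1, len(values))
    ]


rows = read_descriptors(io.StringIO(SAMPLE))
volumes = [row["pock_volume"] for row in rows]
print(f"mean pocket volume: {mean(volumes):.2f}")
print("smoothed volumes:", [round(v, 2) for v in moving_average(volumes, 3)])
```

A moving average is used here as a simple stand-in for the smoothing spline of the R example; it is not equivalent to `smooth.spline`.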
### Pocket Selection
In order to track some nifty properties of your cavities, like the solvent accessible surface area, the volume or other fpocket descriptors, you have to select the zone you are interested in. This process is crucial and can deeply influence subsequent results.
But first of all, what is a selected pocket here? Here, this means a PDB file containing dummy atoms at the positions of grid points that overlap with grid points in the pocket grid you calculated in the first run (frequency or density grid). How can you obtain these dummy atoms? This can be done in two different ways.
__The fast way:__ The first, easy and not very accurate way is to use the default pdb files produced by the first run of mdpocket to detect the pocket grids. If you read this manual very attentively and did not fall asleep in between, you will remember that mdpocket provides two files called `mdpout_freq_iso_0_5.pdb` and `mdpout_dens_iso_8.pdb`. These files contain dummy atoms at the positions of grid points having a given value or higher (iso-values of 0.5 and 8, respectively). You can now use one of these files (depending on which grid you are more comfortable with) and open it in a molecular viewer that is able to edit structures. PyMOL is an excellent choice for this task. Simply select all dummy atoms in the zone of interest (the pocket you want to track) and then create an object from this selection. In the end, the result should look something like this:
![pocket selection](images/pymol2.png)
Here the red cloud corresponds to the grid points I have selected by hand. You can now save the grid points that you selected as a PDB file and use this as an input for tracking the properties of the cavity.
__The better way:__ To get a good estimate of the volume and extent of the pocket, you will notice that the default output pdb files for the two grids are not always sufficient because of their predefined iso-values. This is why you should extract the grid points as a PDB file using your own choice of iso-value. As a general rule, take the iso-value as low as possible. You should still be able to distinguish the different pockets in the density grid, but their volume should not be too small.
You can extract these grid points using a python script that is available in the scripts directory of the fpocket distribution, called `extractIso.py`. Simply execute it with `python extractIso.py` to see how to use it.
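Conceptually, extracting grid points at a chosen iso-value means keeping every grid point whose value reaches the threshold and converting its index back to a coordinate. The sketch below shows only that idea and is not the bundled script: the function name `grid_points_above` is hypothetical, reading the DX file is left out, and the values are assumed to be stored with the z index varying fastest, as is usual for OpenDX regular grids.

```python
def grid_points_above(origin, spacing, counts, values, iso):
    """Return coordinates of all grid points whose value is >= iso.

    origin  : (x, y, z) of the first grid point
    spacing : grid spacing along each axis (dx, dy, dz)
    counts  : number of grid points along each axis (nx, ny, nz)
    values  : flat list of grid values, z index varying fastest
    iso     : iso-value threshold
    """
    nx, ny, nz = counts
    if len(values) != nx * ny * nz:
        raise ValueError("number of values does not match the grid dimensions")
    selected = []
    for index, value in enumerate(values):
        if value >= iso:
            # Convert the flat index back to (ix, iy, iz).
            ix, remainder = divmod(index, ny * nz)
            iy, iz = divmod(remainder, nz)
            selected.append((
                origin[0] + ix * spacing[0],
                origin[1] + iy * spacing[1],
                origin[2] + iz * spacing[2],
            ))
    return selected


# Made-up 2 x 2 x 2 frequency grid with 1 A spacing.
values = [0.1, 0.6, 0.0, 0.2, 0.9, 0.3, 0.5, 0.0]
points = grid_points_above((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2, 2, 2), values, iso=0.5)
print(points)
```

Lowering `iso` returns more grid points, which corresponds to a larger and less conserved pocket definition.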
### Basic Input
#### Mandatory (running mode 1 - detecting pockets):
##### either:
* `--trajectory_file`: input trajectory file in one of the supported formats
* `--trajectory_format`: format of the trajectory file (dcd, xtc, netcdf, crd, crdbox, dtr, trr)
* `-f`: topology of the structure as input PDB file
##### or :
* `-L`: an mdpocket input file; this file has to contain the paths to the PDB files of all snapshots (one path per line)
#### Mandatory (running mode 2 - calculating descriptors):
##### either:
* `--trajectory_file`: input trajectory file in one of the supported formats
* `--trajectory_format`: format of the trajectory file (dcd, xtc, netcdf, crd, crdbox, dtr, trr)
* `-f`: topology of the structure as input PDB file
* `--selected_pocket`: a PDB file containing the site points in the pocket to be selected
##### or :
* `-L`: an mdpocket input file; this file has to contain the paths to the PDB files of all snapshots (one path per line)
* `--selected_pocket`: a PDB file containing the site points in the pocket to be selected
#### Optional:
* `-o`: the prefix you want to give to mdpocket output files
Note that mdpocket determines its running mode from the input given by the user. Thus, if you do not provide a selected pocket using the `--selected_pocket` flag, mdpocket will automatically perform cavity detection only. mdpocket offers many more optional parameters to guide the pocket detection. All fpocket parameters for pocket clustering and filtering are also available in mdpocket. For this see [advanced mdpocket features](#mdpocket-advanced).
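When mdpocket is driven from a script, the two running modes differ only in whether `--selected_pocket` is passed. The following Python sketch assembles the argument list for the trajectory-based input; the helper name `mdpocket_command` is made up for this example, and only the flags listed above are used.

```python
def mdpocket_command(trajectory, trajectory_format, topology,
                     selected_pocket=None, prefix=None):
    """Build an mdpocket argument list.

    Without selected_pocket the command runs pocket detection (mode 1);
    with it, the command runs descriptor calculation (mode 2).
    """
    command = [
        "mdpocket",
        "--trajectory_file", trajectory,
        "--trajectory_format", trajectory_format,
        "-f", topology,
    ]
    if selected_pocket is not None:
        command += ["--selected_pocket", selected_pocket]
    if prefix is not None:
        command += ["-o", prefix]
    return command


detect = mdpocket_command("trajectory_superimposed.dcd", "dcd", "reference.pdb")
measure = mdpocket_command("trajectory_superimposed.dcd", "dcd", "reference.pdb",
                           selected_pocket="my_pocket.pdb")
print(" ".join(detect))
print(" ".join(measure))
# To actually launch a run, pass the list to subprocess.run(command, check=True).
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.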
### Output (running mode 1 - pocket detection)
* `mdpout_dens_grid.dx`: A dx-formatted grid output. This grid contains the number of Voronoi vertices seen per snapshot near each grid point. It can be easily visualized using VMD.
* `mdpout_freq_grid.dx`: Similar to the previous file, this grid file contains the frequency of opening of a pocket at each grid point. It can be visualized using VMD.
* `mdpout_dens_iso_8.pdb`: A pdb file of all grid point positions corresponding to grid points having 8 or more Voronoi vertices nearby per snapshot. This file is provided in order to be able to edit the grid points using PyMOL and select only the points defining the pocket of interest. This pocket of interest should be used as input of mdpocket in the 2nd running mode. If you want to extract gridpoints with other isovalues, use the provided `extractISO.py` file in the scripts directory.
* `mdpout_freq_iso_0_5.pdb`: A pdb file of all grid point positions corresponding to grid points that are 50% of the trajectory overlapping with a pocket. This file is provided in order to be able to edit the grid points using PyMOL and select only the points defining the pocket of interest. This pocket of interest should be used as input of mdpocket in the 2nd running mode. If you want to extract gridpoints with other isovalues, use the provided `extractISO.py` file in the scripts directory.
### Output (running mode 2 - pocket characterization)
* `mdpout_mdpocket.pdb`: A pdb file containing all Voronoi vertices within the selected pocket region for all snapshots. This file is an NMR-like file, containing each snapshot as a separate model. This file is best viewed using PyMOL and can be used to create pocket motion movies.
* `mdpout_mdpocket_atoms.pdb`: A pdb file containing all receptor atoms surrounding the selected pocket region. Like the previous output file, this is an NMR-like file, containing each snapshot as a separate model. This file can be viewed with VMD and PyMOL.
* `mdpout_descriptors.txt`: A text file containing the fpocket pocket descriptors of the selected pocket region for each snapshot. This file can be easily analyzed using standard statistical software like R.
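Because the two PDB outputs store one model per snapshot, a quick sanity check is to count the records in each model. The sketch below assumes that the models are delimited by standard PDB `MODEL` and `ENDMDL` records, which is not confirmed by this document; the function name `atoms_per_model` and the sample lines are made up for illustration.

```python
def atoms_per_model(lines):
    """Count ATOM/HETATM records inside each MODEL ... ENDMDL block."""
    counts = []
    current = None  # None while we are outside of a model
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = 0
        elif record == "ENDMDL":
            if current is not None:
                counts.append(current)
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            current += 1
    return counts


# Made-up two-snapshot example; coordinates are placeholders.
sample = [
    "MODEL        1",
    "HETATM    1 APOL STP     1       1.000   2.000   3.000",
    "HETATM    2  POL STP     1       1.500   2.500   3.500",
    "ENDMDL",
    "MODEL        2",
    "HETATM    1 APOL STP     1       1.100   2.100   3.100",
    "ENDMDL",
]
print(atoms_per_model(sample))
```

For a real file, pass an open file handle, for example `atoms_per_model(open("mdpout_mdpocket.pdb"))`. A changing count from model to model is expected for the vertex file, since the number of Voronoi vertices differs between snapshots.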
## dpocket descriptor extraction
Until now you have seen what the majority of cavity detection algorithms can do. So apart from speed and, hopefully, prediction results, nothing distinguishes fpocket from other algorithms like ligsite, sitemap, sitefinder, pocketpicker, pass ...
This is only partially true, because the fpocket package contains dpocket. The D stands for describing. One purpose of a cavity detection algorithm is the extraction of descriptors of the physico-chemical environment of the cavity. dpocket allows you to do this in a very simple and straightforward way. As extracting binding pocket descriptors from only one protein would be rather meaningless for studying pocket characteristics, dpocket enables the analysis of multiple structures. Scripting and automation are thus no longer necessary for this kind of task. But let's have a closer look, again using a very simple example you can try on your workstation.
### Example
Here we go. dpocket requires one single input file. This input file must be a text file containing the following information:
- 1: the PDB file of the protein you want to analyze and
- 2: the ID of the ligand you would like to have as reference in order to define an explicitly defined binding pocket.

The file used in this example (data/sample/test_dpocket.txt) looks like this:
```
data/sample/3LKF.pdb pc1
data/sample/1ATP.pdb atp
data/sample/7TAA.pdb abc
```
Here we analyze three pdb files. Note that the ligand name must be separated from the pdb file name by a tab. You can launch dpocket on this sample file using the following command:
`dpocket -f sample/test_dpocket.txt`
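For larger data sets the input file is easier to generate than to write by hand. The Python sketch below produces the tab-separated layout described above from a list of (PDB path, ligand ID) pairs; the function name `dpocket_input` is hypothetical and the output file name in the comment is only an example.

```python
def dpocket_input(entries):
    """Return dpocket input text: one 'pdb_path<TAB>ligand' line per entry."""
    lines = []
    for pdb_path, ligand in entries:
        if "\t" in pdb_path or "\t" in ligand:
            raise ValueError("fields must not contain tab characters")
        lines.append(f"{pdb_path}\t{ligand}\n")
    return "".join(lines)


# The three structures of the sample file.
entries = [
    ("data/sample/3LKF.pdb", "pc1"),
    ("data/sample/1ATP.pdb", "atp"),
    ("data/sample/7TAA.pdb", "abc"),
]
text = dpocket_input(entries)
print(text, end="")
# To save it: open("my_dpocket_input.txt", "w").write(text)
```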
dpocket will write 3 result files to the current directory. By default these files are:
- dpout_explicitp.txt
- dpout_fpocketnp.txt
- dpout_fpocketp.txt
If you want to change the naming of these files, use the `-o` flag on the command line to define a new prefix for the dpocket output files; for example, `my_test` as prefix would yield `my_test_explicitp.txt`. The three output files contain the pocket descriptors implemented in fpocket for each binding pocket found by fpocket:
- __fpocketp.txt__: describes all binding pockets found by fpocket that match one of the detection criteria. In other words, fpocket found several pockets in the protein, and this file contains the descriptors of the pockets that are considered to be the binding pocket according to some detection criteria.
- __fpocketnp.txt__: describes, on the contrary, all pockets found by fpocket that are not considered to be the actual binding pocket according to the detection criteria.
- __explicitp.txt__: describes the explicitly defined pockets. By explicitly defined we mean that the pocket is defined as all vertices/atoms situated within a given distance of the ligand (4 Å by default), regardless of what fpocket found.
The output files are tab-separated ASCII text files that are easy to parse using statistical software such as R. Statistical analysis of pocket descriptors thus becomes a straightforward process. Basically, the first two files might be used to establish a new scoring function, as they describe what fpocket finds, while the last file could be used for a more detailed and accurate analysis of the exact part of the protein that interacts with the ligand.
For more details of the output refer to the output section below, or to [advanced dpocket features](#dpocket-advanced).
### Basic input
#### Mandatory:
flag -f : a dpocket input file; this file has to contain the path to the PDB file as well as the residue name of the reference ligand, separated by a tab.
#### Optional:
flag -o : the prefix you want to give to dpocket output files
dpocket offers many more optional parameters to guide the pocket detection. For these, see [advanced dpocket features](#dpocket-advanced).
### Output
Refer to [advanced dpocket features](#dpocket-advanced) for a detailed description of the dpocket output files.
To conclude this first, very easy dpocket run: you have a fast and reliable tool to extract descriptors of binding pockets and “non-binding pockets” on a large scale. These descriptor files are an excellent basis for further statistical analysis and model building, which may immediately make you want to write a new scoring function for ranking pockets using the different descriptors. fpocket, dpocket and tpocket are very useful tools to do exactly this, so go ahead. Let's suppose you have processed several thousand PDB files and statistically analyzed the significance of all descriptors. You have set up a new scoring function. Now you have an external test set of PDB files that you have not tested yet. How can you evaluate your scoring function? This is also a very easy task, using tpocket.
## tpocket scoring ranging and evaluation
As already mentioned in the previous paragraph, tpocket can be used to rapidly evaluate cavity scoring functions. If, for example, you work in the pharmaceutical industry and want to set up the ultimate druggability prediction score, you might be able to do this with fpocket and dpocket. Afterwards you can test your method using tpocket. The T stands for testing.
Something fancy we did not tell you about before is that you can also test your scoring function on apo structures using tpocket. The only requirement is that the holo and apo structures are aligned, so that the apo and holo pockets are superposed. But let's explain this with an example. Of course, testing a holo dataset is even easier: you just need to provide the residue name of the ligand and tpocket will do the rest.
### Example tpocket on apo structures
If you had a look at the fpocket paper, you might have seen that the algorithm was validated on a dataset of 48 proteins previously used to evaluate several pocket detection algorithms. As fpocket programmers are, by definition, very nice people, they have included this data set (holo and aligned apo structures) in the distribution of fpocket, released as `fpocket-1.0-data` with the original fpocket 1 release. [The tar.gz is available on sourceforge](https://sourceforge.net/projects/fpocket/files/fpocket-1.0/fpocket-src-1.0/fpocket-data-1.0.tgz/download)
So let us use this set as an example here. When you extract the dataset in your folder, you should have a data folder containing, among others, two files: `pp_apo-t.txt` and `pp_cplx-t.txt`. The first file is a tpocket input file used to assess the capacity of the scoring function to correctly rank known binding sites on apo structures. The second file is also a tpocket input file, but this time for known binding sites on holo structures. Here is a part of `pp_apo-t.txt`:
data/pp_data/unbound/1QIF-1ACJ.pdb data/pp_data/complex/1ACJ.pdb tha
data/pp_data/unbound/3APP-1APU.pdb data/pp_data/complex/1APU.pdb iva
data/pp_data/unbound/1HSI-1IDA.pdb data/pp_data/complex/1IDA.pdb qnd
data/pp_data/unbound/1PSN-1PSO.pdb data/pp_data/complex/1PSO.pdb iva
data/pp_data/unbound/1L3F-2TMN.pdb data/pp_data/complex/2TMN.pdb po3
data/pp_data/unbound/3TMS-1BID.pdb data/pp_data/complex/1BID.pdb UMP
data/pp_data/unbound/8ADH-1CDO.pdb data/pp_data/complex/1CDO.pdb NAD
data/pp_data/unbound/1HXF-1DWD.pdb data/pp_data/complex/1DWD.pdb MID
Here the first column contains the path to the apo structure, aligned to the holo structure, which is given in the second column. Using a holo dataset, the first and the second column would be the same. The third column indicates the PDB HETATM code of the ligand in the holo structure that is situated in the binding site.
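The three-column layout can be generated with a few lines of Python. In this sketch the helper name `tpocket_line` is made up, and the columns are joined with tabs as described in the Basic Input section; for a holo data set the complex path is simply repeated in the first column.

```python
def tpocket_line(holo_path, ligand, apo_path=None):
    """Return one tpocket input line: first structure, holo structure, ligand.

    If no apo structure is given, the holo path is used in both columns.
    """
    first_column = holo_path if apo_path is None else apo_path
    return "\t".join((first_column, holo_path, ligand))


apo_line = tpocket_line(
    "data/pp_data/complex/1ACJ.pdb", "tha",
    apo_path="data/pp_data/unbound/1QIF-1ACJ.pdb",
)
holo_line = tpocket_line("data/pp_data/complex/1ACJ.pdb", "tha")
print(apo_line)
print(holo_line)
```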
You can use this file to run tpocket using the following command line :
`tpocket -L data/pp_apo-t.txt`
Let us continue with this more interesting case, which involves a lot of structures. After some calculation time, tpocket provides two standard output files. The moment has come: you will finally know whether you have discovered the ultimate method for druggability prediction, sugar binding site prediction, or whatever. The first file is called `stats_g.txt` by default. It contains global statistics about the prediction using all evaluation criteria available in tpocket, for example how many binding sites were found among the 3 top-ranked cavities. For illustration, only the first of the six tables available in this file is shown below:
Ratio of good predictions (dist = 4A)
-------------------------------------
Rank <= 1 : 0.69
Rank <= 2 : 0.83
Rank <= 3 : 0.94
Rank <= 4 : 0.94
Rank <= 5 : 0.94
Rank <= 6 : 0.94
Rank <= 7 : 0.94
Rank <= 8 : 0.94
Rank <= 9 : 0.94
Rank <= 10 : 0.94
-------------------------------------
Mean distance : 2.924573
Mean relative overlap : 39.373226
This table summarizes the capacity of your scoring function to identify the binding sites of the 48 apo structures using the criterion published in the original PocketPicker paper. Although not shown here, tpocket provides two other, maybe more accurate, measures for a correctly identified binding site. These measures are explained in more detail in the [advanced tpocket features section](#tpocket-advanced), as they can be a bit more tricky.
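The "Rank <= n" ratios are cumulative success rates: the fraction of structures whose actual binding site is ranked at position n or better. The Python sketch below computes such ratios from a list of ranks; it illustrates the idea only and is not tpocket's code. The function name and the example ranks are invented, and `None` is used here to mark structures whose binding site was not found at all.

```python
def ratio_of_good_predictions(ranks, max_rank=10):
    """Return {n: fraction of structures with binding site rank <= n}.

    ranks : rank of the actual binding site per structure, None if not found
    """
    if not ranks:
        raise ValueError("at least one structure is required")
    total = len(ranks)
    return {
        n: sum(1 for rank in ranks if rank is not None and rank <= n) / total
        for n in range(1, max_rank + 1)
    }


# Invented ranks for eight structures; one binding site was never found.
ranks = [1, 1, 2, 3, None, 1, 4, 2]
ratios = ratio_of_good_predictions(ranks, max_rank=5)
for n, ratio in ratios.items():
    print(f"Rank <= {n} : {ratio:.2f}")
```

Because structures without a detected binding site never count as a success, the ratio can plateau below 1, as in the table above.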
The second output file provides more accurate statistics about each structure analyzed. This file, called `stats_p.txt` enables the user to analyze more closely why scoring might not work well on a specific structure. Here is an extract of the first columns and lines of this file:
| LIG | COMPLEXE | APO | NB_PCK | OVLP1 | OVLP2 | DIST_CM | POS1 | POS2 | POS3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| THA | 1ACJ.pdb | 1QIF-1ACJ.pdb | 22 | 79.31 | 78.33 | 0.00 | 1 | 1 | 0 |
| IVA | 1APU.pdb | 3APP-1APU.pdb | 4 | 0.00 | 0.00 | 3.43 | 0 | 0 | 1 |
| QND | 1IDA.pdb | 1HSI-1IDA.pdb | 4 | 82.69 | 81.65 | 3.19 | 1 | 1 | 1 |
| IVA | 1PSO.pdb | 1PSN-1PSO.pdb | 9 | 80.00 | 51.38 | 3.49 | 1 | 1 | 1 |
| PO3 | 2TMN.pdb | 1L3F-2TMN.pdb | 10 | 58.33 | 72.00 | 2.69 | 1 | 1 | 1 |
| UMP | 1BID.pdb | 3TMS-1BID.pdb | 15 | 63.64 | 60.78 | 3.52 | 1 | 1 | 1 |
| NAD | 1CDO.pdb | 8ADH-1CDO.pdb | 18 | 0.00 | 0.00 | 3.41 | 0 | 0 | 1 |
| MID | 1DWD.pdb | 1HXF-1DWD.pdb | 10 | 93.48 | 81.37 | 3.86 | 1 | 1 | 1 |
Using this output you have a detailed view of what worked and what did not work for all criteria. For instance, in this example, fpocket detects all apo binding sites well apart from the first one when using the PocketPicker criterion for binding site identification (DIST_CM). POS3 corresponds to the rank of the cavity using the scoring function of fpocket. You also have information about the number of pockets per protein and the exact overlap with the actual pocket.
If you want to assess your scoring function on holo structures, you can also use tpocket. This time you only have to provide `pp_cplx-t.txt`, which is also included in the sample tar.gz file. As you can see, this file is very similar to `pp_apo-t.txt`. The only difference is that the first column repeats the path to the complex structure, like this:
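To find the structures on which a criterion failed, the per-structure statistics can be filtered programmatically. The Python sketch below works on the extract shown above, embedded as text. It assumes that a value of 0 in a POS column means the binding site was not found under that criterion, which is an interpretation of the extract rather than a documented rule; the helper names are hypothetical and the real file layout may differ.

```python
HEADER = "LIG | COMPLEXE | APO | NB_PCK | OVLP1 | OVLP2 | DIST_CM | POS1 | POS2 | POS3"
ROWS = """THA 1ACJ.pdb 1QIF-1ACJ.pdb 22 79.31 78.33 0.00 1 1 0
IVA 1APU.pdb 3APP-1APU.pdb 4 0.00 0.00 3.43 0 0 1
QND 1IDA.pdb 1HSI-1IDA.pdb 4 82.69 81.65 3.19 1 1 1
IVA 1PSO.pdb 1PSN-1PSO.pdb 9 80.00 51.38 3.49 1 1 1
PO3 2TMN.pdb 1L3F-2TMN.pdb 10 58.33 72.00 2.69 1 1 1
UMP 1BID.pdb 3TMS-1BID.pdb 15 63.64 60.78 3.52 1 1 1
NAD 1CDO.pdb 8ADH-1CDO.pdb 18 0.00 0.00 3.41 0 0 1
MID 1DWD.pdb 1HXF-1DWD.pdb 10 93.48 81.37 3.86 1 1 1
"""


def parse_stats(header, body):
    """Turn the header and whitespace-separated rows into a list of dicts."""
    names = [name.strip() for name in header.split("|")]
    return [
        dict(zip(names, line.split()))
        for line in body.splitlines()
        if line.strip()
    ]


def missed(records, column):
    """Return the complex file names whose value in the given column is 0."""
    return [record["COMPLEXE"] for record in records if record[column] == "0"]


records = parse_stats(HEADER, ROWS)
print("missed by POS1:", missed(records, "POS1"))
print("missed by POS3:", missed(records, "POS3"))
```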
data/pp_data/complex/1acj.pdb data/pp_data/complex/1acj.pdb tha
data/pp_data/complex/1apu.pdb data/pp_data/complex/1apu.pdb iva
data/pp_data/complex/1ida.pdb data/pp_data/complex/1ida.pdb qnd
data/pp_data/complex/1pso.pdb data/pp_data/complex/1pso.pdb iva
data/pp_data/complex/2tmn.pdb data/pp_data/complex/2tmn.pdb po3
data/pp_data/complex/1bid.pdb data/pp_data/complex/1bid.pdb ump
data/pp_data/complex/1cdo.pdb data/pp_data/complex/1cdo.pdb nad
### Basic Input
#### Mandatory:
flag -L : a tpocket input file; this file has to contain the paths to the PDB files (apo, holo, or holo, holo if you want to test fpocket only on holo structures) as well as the residue name of the reference ligand, separated by tabs.
#### Optional:
flag -o : the prefix you want to give to tpocket detailed statistics
flag -e : the prefix you want to give to tpocket general statistics
tpocket offers many more optional parameters to guide the pocket detection. For these, see the [advanced tpocket features section](#tpocket-advanced).
### Output
Using standard parameters on the example tpocket list given in the example paragraph above, tpocket returns two output files:
* `stats_p.txt`: This file contains the detailed statistics of tpocket. The name and the ligand of the analyzed PDB structure are repeated, as well as the exact overlap of the binding pocket identified by fpocket with the actual binding pocket (identified with the help of the ligand, called OVLP here). You will see two different overlaps in the output. For further information refer to the [advanced tpocket features section](#tpocket-advanced). Furthermore, the distance criterion used in the Chemistry Central Journal paper on PocketPicker is reported (DIST_CM). Finally, you also get exact information about the rank of the cavity using the fpocket scoring function.
* `stats_g.txt`: Second, tpocket provides more general statistics about pocket identification on the dataset provided. For both overlap criteria, the ranking performance (the capacity of the fpocket scoring to correctly rank a binding site having a certain minimum overlap with the actual binding site) is printed into this file. The statistics in this file thus give you a rapid overview of the global performance of your method.
To summarize, tpocket is a very fast way to test fpocket's performance on your own dataset and to test your own scoring functions for ranking identified binding sites.
You have finished the Getting started section. We hope that you have noticed the usefulness (hopefully ;)) of this package of programs for research on new features, descriptors and scoring functions in the binding site identification field. This was only a very quick overview of the most basic features of fpocket, dpocket and tpocket. If you want to dive into the development of your own pocket descriptors and scoring functions, or if you want to change the pocket detection parameters for your purposes, continue with the Advanced features section next.
# Advanced Features
You want to know more about fpocket? This is the section for you. Here we have tried to compile, in a (we hope) comprehensive manner, the most important details of fpocket, dpocket and tpocket that you can access from the command line. It is essential to know that fpocket's performance was assessed and its scoring function established using the standard parameters. The performance of pocket detection and scoring is highly dependent on these parameters, so keep in mind that you might have to adapt the scoring to your specific problem.
Note that this section does not provide too much information about the theoretical background of the way fpocket works. In order to learn more about this read the Materials & Methods of the [freely available paper on the BMC Bioinformatics](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-168) website. Nevertheless, we tried to keep it as clear as possible, using some application examples.
## fpocket advanced
### Input command line arguments
#### Mandatory:
The simplest way to run fpocket is either by providing a single pdb file, or by providing a list of pdb files stored in a simple text file. You will need one of these two inputs to run fpocket:
-f string : one standard PDB filename that you want to analyze with fpocket
##### or:
-F string : filename of a simple list of pdb files.
#### Optional:
-m float: (default 3.4Å) This flag enables the user to modify the minimum radius an alpha sphere might have in a binding pocket. An alpha sphere is a contact sphere that touches 4 atoms in 3D space without containing any internal atoms. This minimum allows filtering out alpha spheres that are too small (protein-internal). If you want to analyze internal interstices, lower this parameter. On the contrary, if you want to analyze more solvent-exposed cavities, you can raise this parameter in order to filter out cavities that are too buried.
-M float: (default 6.2Å) Here you can modify the maximum radius of alpha spheres in a pocket. An alpha sphere is a contact sphere that touches 4 atoms in 3D space without containing any internal atoms. This maximum allows filtering out contact spheres that are too large and lie on the protein surface. If you want to analyze very flat and solvent-exposed surface depressions, raise this parameter. For analysis of buried parts of the protein you can lower this parameter. Higher radii might be more interesting for the identification of protein-protein binding sites or polysaccharide binding sites. Smaller radii enable detection of buried cavities for small organic molecules (drugs, for instance).
-l int: (None) If you have an input PDB file of an NMR structure or one with multiple models you can specify which model (conformation) you'd like to analyse
-C char: (default s) The clustering method to be used here. By default a pairwise single linkage clustering is used here.
's': pairwise single linkage clustering,
'm': pairwise maximum- (or complete-) linkage clustering,
'a': pairwise average-linkage clustering,
'c': pairwise centroid-linkage clustering
-e char: (default e) The distance measure used for the clustering algorithm.
'e': Euclidean distance
'b': City-block distance
'c': correlation
'a': absolute value of the correlation
'u': uncentered correlation
'x': absolute uncentered correlation
's': Spearman's rank correlation
'k': Kendall's tau
-i int: (default 15) This flag indicates the minimum number of alpha spheres a pocket must contain in order to appear in the results provided by fpocket. This parameter enables filtering of cavities that are too small. Thus, if you also want to analyze smaller cavities, lower this parameter; if you are only interested in huge cavities, like NADP binding sites, you can raise it in order to retain only very few pockets in the end. To give you an idea, a rather big cavity, like a NADP binding site, can have hundreds of alpha spheres. The default value therefore also keeps smaller binding sites.
-A int: (default 3) Fpocket distinguishes between two types of alpha spheres. Polar alpha spheres and apolar alpha spheres. This flag ranges from 0 to 4 and modifies the definition of the alpha sphere type. By default, an alpha sphere contacting at least 3 apolar atoms (having an electronegativity below 2.8) is considered as apolar. If this is not the case it is considered as polar.
-D float: (default 2.4Å) This parameter changed compared to previous versions of fpocket, as we completely replaced the clustering algorithms. This measure is now used to analyze a hierarchical distance and cut sub-trees at the desired distance. The bigger the distance, the larger the clusters you will get.
-p float: (default 0.0) This is another parameter for filtering unwanted pockets. It defines the maximum ratio of apolar alpha spheres and the number of alpha spheres in a pocket in order to keep the pocket in the results list. That is to say, by default every pocket is kept (0.0). Now, if you would like to filter rather hydrophobic pockets, raise this parameter and very polar cavities will be filtered out. This parameter is a ratio, not a percentage, thus it ranges from 0 to 1.
-v int: (default 2500) By default, pocket volumes are calculated using a Monte Carlo algorithm. Basically, the algorithm picks a random point in space, checks whether it is included in any alpha sphere, and stores this status. This is repeated N times, and the volume of the pocket is estimated from the ratio between the number of hits and the number of iterations, scaled by the size of the box. This parameter defines the number of iterations to perform. Of course, the higher the value, the greater the accuracy, but the slower the calculation.
-b (none): (NOT USED BY DEFAULT) This option allows the user to choose a discrete algorithm to calculate the volume of each pocket instead of the Monte Carlo method. This algorithm puts each pocket into a grid of dimension (1/N*X ; 1/N*Y ; 1/N*Z), N being the value given using this option, and X, Y and Z being the box dimensions, determined using the coordinates of the vertices. Then, a triple iteration over the three dimensions is used to estimate the volume, checking whether each point visited by the iteration lies in one of the pocket's vertices. This parameter defines the grid discretization. If this parameter is used, this algorithm will be used instead of the Monte Carlo algorithm.
Warning: Although this algorithm could be more accurate, a high value might dramatically slow down the program, as this algorithm has a maximum complexity of N*N*N*nb_vertices, and a minimum of N*N*N.
-d (none): Option allowing you to output pockets and properties in a condensed format. This will put to the stdout pocket properties in a tab separated string and write pocket files in a subfolder
-r string: (None) This parameter allows you to run fpocket in a restricted mode. Let's suppose you have a very shallow or large pocket with a ligand inside, and the automatic pocket prediction always splits up your pocket or finds only a part of it. Specifying your ligand residue with -r allows you to detect and characterize your ligand binding site explicitly. For instance, for `1UYD.pdb` you can specify `-r 1224:PU8:A` (residue number of the ligand: residue name of the ligand: chain of the ligand)
-y string: (filename) EXPERIMENTAL: here you can specify a topology filename in the Amber prmtop format. This can then be used by fpocket & mdpocket to calculate energy grids for your pockets. NB: you have to specify the -x flag to run energy calculations
-x None: (None) EXPERIMENTAL: specify this flag if you want to run energy calculations on calculated pockets. This is not fully functional: only one or two probes are currently generated, and output density grids are written. Use with caution.
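The Monte Carlo volume estimate described for the `-v` flag can be sketched in a few lines of Python. This is an illustration of the general technique and not fpocket's implementation: the function name `monte_carlo_volume`, the choice of an axis-aligned bounding box around all spheres, and the optional `seed` argument are assumptions made for this example.

```python
import random


def monte_carlo_volume(spheres, iterations=2500, seed=None):
    """Estimate the volume of a union of spheres by random sampling.

    spheres    : list of (x, y, z, radius) tuples (the alpha spheres of a pocket)
    iterations : number of random points to draw
    seed       : optional seed for reproducible results
    """
    if not spheres:
        raise ValueError("at least one sphere is required")
    rng = random.Random(seed)
    # Axis-aligned box enclosing all spheres.
    x_min = min(x - r for x, y, z, r in spheres)
    x_max = max(x + r for x, y, z, r in spheres)
    y_min = min(y - r for x, y, z, r in spheres)
    y_max = max(y + r for x, y, z, r in spheres)
    z_min = min(z - r for x, y, z, r in spheres)
    z_max = max(z + r for x, y, z, r in spheres)
    box_volume = (x_max - x_min) * (y_max - y_min) * (z_max - z_min)
    hits = 0
    for _ in range(iterations):
        px = rng.uniform(x_min, x_max)
        py = rng.uniform(y_min, y_max)
        pz = rng.uniform(z_min, z_max)
        # A hit is a point that lies inside at least one sphere.
        if any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
               for x, y, z, r in spheres):
            hits += 1
    # Hit ratio scaled by the size of the box.
    return box_volume * hits / iterations


# A single unit sphere has an exact volume of 4/3 * pi (about 4.19).
estimate = monte_carlo_volume([(0.0, 0.0, 0.0, 1.0)], iterations=20000, seed=1)
print(f"estimated volume: {estimate:.2f}")
```

Increasing `iterations` reduces the statistical error roughly with the square root of the number of samples, at the cost of a proportionally longer run time, which is the trade-off the `-v` flag exposes.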
### Output files description
fpocket writes its output in the directory of the input file, creating a directory named after the PDB file followed by the `_out` suffix. Here, the command `ll sample/3LKF_out` for the current sample run would look something like this:
total 332
-rw-r--r-- 1 peter users 769 Nov 29 00:14 3LKF.pml
-rw-r--r-- 1 peter users 698 Nov 29 00:14 3LKF.tcl
-rwxr-xr-x 1 peter users 30 Nov 29 00:14 3LKF_PYMOL.sh
-rwxr-xr-x 1 peter users 41 Nov 29 00:14 3LKF_VMD.sh
-rw-r--r-- 1 peter users 245835 Nov 29 00:14 3LKF_out.pdb
-rw-r--r-- 1 peter users 6725 Nov 29 00:14 3LKF_pockets.info
-rw-r--r-- 1 peter users 49355 Nov 29 00:14 3LKF_pockets.pqr
-rw-r--r-- 1 peter users 4073 Nov 29 00:14 3LKF_info.txt
drwxr-xr-x 2 peter users 4096 Nov 29 00:14 pockets
As you can see, fpocket provides a lot of files and another subdirectory. However, the majority of these files are necessary for easy visualization of binding pockets. Let's explain the content and utility of each file:
* `3LKF_info.txt`: this file contains human readable information (descriptors) about the pockets found on the protein. Notably, this file contains a pocket score (how likely this is a small molecule binding site) and a druggability score (how druggable the binding site is). Here is an extract:
Pocket 1 :
Score : 0.490
Druggability Score : 0.019
Number of Alpha Spheres : 21
Total SASA : 19.687
Polar SASA : 7.611
Apolar SASA : 12.076
Volume : 270.934
Mean local hydrophobic density : 3.000
Mean alpha sphere radius : 3.816
Mean alp. sph. solvent access : 0.519
Apolar alpha sphere proportion : 0.190
Hydrophobicity score: 23.889
...
* `3LKF.pml`: this is a PyMOL script for visualization of binding pockets using PyMOL
* `3LKF.tcl`: this is a Tcl script for visualization of binding pockets using VMD
* `3LKF_PYMOL.sh`: this is the executable script to launch fast visualization using PYMOL
* `3LKF_VMD.sh`: this is the executable script to launch fast visualization using VMD
* `3LKF_out.pdb`: this is the most important file. It contains the initial PDB structure given as input, with non-cofactor HETATM records stripped compared to the original PDB input file. The file also contains the centers of alpha spheres, written as dummy atoms using HETATM records. These alpha sphere centers are appended at the end of the PDB file, using the STP residue name (for site point). Apolar alpha spheres carry the atom name APOL, polar alpha spheres the atom name POL. Pockets are sets of alpha spheres and can be distinguished by residue number. Thus residue STP 1 is the first binding pocket according to fpocket. To show this more clearly, here is an extract of `3LKF_out.pdb`:
ATOM 2349 CD LYS A 299 9.679 16.827 105.636 0.00 0.00 C 0
ATOM 2350 CE LYS A 299 10.371 16.314 104.370 0.00 0.00 C 0
ATOM 2351 NZ LYS A 299 11.749 15.794 104.597 0.00 0.00 N 0
ATOM 2352 OXT LYS A 299 5.240 20.009 107.670 0.58 9.64 O 0
HETATM 1 APOL STP C 1 27.849 33.435 123.906 0.00 0.00 Ve
HETATM 2 APOL STP C 1 29.108 33.195 122.206 0.00 0.00 Ve
HETATM 3 APOL STP C 1 28.611 33.141 119.797 0.00 0.00 Ve
HETATM 4 APOL STP C 1 26.830 32.143 118.779 0.00 0.00 Ve
* `3LKF_pockets.pqr`: This file contains all alpha sphere centers, like the `3LKF_out.pdb` file, but no information about the protein structure. Furthermore, the pqr format allows the van der Waals radius of atoms to be written explicitly. Here this possibility is used to output the radii of the alpha spheres of a pocket. By loading this pqr file, one can analyze the volume recognized by fpocket more precisely. Note that currently only VMD supports reading this format correctly. PyMOL is able to read pqr files, but does not interpret van der Waals radii.
* `pockets/`: Well, again a subdirectory. But I promise, it's the last one. For development purposes or easy analysis, fpocket provides this directory, which for the current example contains:
pocket0_atm.pdb pocket2_vert.pqr pocket5_atm.pdb pocket7_vert.pqr
pocket0_vert.pqr pocket3_atm.pdb pocket5_vert.pqr pocket8_atm.pdb
pocket1_atm.pdb pocket3_vert.pqr pocket6_atm.pdb pocket8_vert.pqr
pocket1_vert.pqr pocket4_atm.pdb pocket6_vert.pqr pocket9_atm.pdb
pocket2_atm.pdb pocket4_vert.pqr pocket7_atm.pdb pocket9_vert.pqr
* `*_atm.pdb`: These files contain only the atoms contacted by alpha spheres in the given pocket. Complementing this information, the `*_vert.pqr` files contain only the centers and radii of alpha spheres within the respective pocket. As the extensions indicate, atoms are output in the PDB file format and alpha sphere centers in the PQR file format.
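Since `3LKF_info.txt` is a plain `name : value` listing per pocket, it is easy to load into a script. The following is a minimal sketch that assumes the layout shown in the extract above (a `Pocket N :` header followed by one descriptor per line); the function name `parse_pocket_info` is hypothetical and not part of the fpocket distribution.

```python
import re


def parse_pocket_info(text):
    """Parse the text of an fpocket *_info.txt file into {pocket_id: {name: value}}."""
    pockets = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"^Pocket\s+(\d+)\s*:$", line)
        if header:
            # Start collecting descriptors for a new pocket.
            current = pockets.setdefault(int(header.group(1)), {})
        elif current is not None and ":" in line:
            # Split on the last colon so the descriptor name is kept whole.
            name, _, value = line.rpartition(":")
            try:
                current[name.strip()] = float(value)
            except ValueError:
                pass  # Ignore lines whose value is not numeric.
    return pockets
```

For example, `parse_pocket_info(open("3LKF_out/3LKF_info.txt").read())[1]["Druggability Score"]` would give the druggability score of the first pocket.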
### A word on druggability
With the [Understanding Druggability paper](https://pubs.acs.org/doi/abs/10.1021/jm100574m) we introduced an alternative scoring function in fpocket that lets you assess how likely a binding site is to bind small drug-like molecules. Since the publication, the score has been retrained and its performance improved (no paper is out for that work). Roughly, if you get a druggability score of 0 or close to 0, the pocket is predicted to be non-druggable with drug-like molecules. If the score is above 0.5, there might be a chance to find drug-like molecules.
The druggability assessment is done using some of the pocket descriptors extracted by fpocket. A low score DOES BY NO MEANS indicate that no molecule binds to a pocket. For example, a peptide binding site will bind peptides, but peptides won't necessarily be considered drug-like molecules.
## mdpocket advanced
A lot of the functionality of mdpocket has already been covered in the Getting started section. However, there is at least one little functionality that you can access via mdpocket that you don't know about yet.
### Detect transient druggable binding pockets
The current version of fpocket contains two methods to score pockets. The first one is the original fpocket score, published with the first release and the scientific paper. Later, a second pocket score was added. This score, called the druggability score, intends to assess how likely the identified pocket is to bind drug-like molecules. The drug score is a value between 0 and 1: 0 means that the pocket is unlikely to bind a drug-like molecule, and 1 that it is very likely to bind one. In combination with mdpocket, the drug score is useful when you want to assess whether “druggable” pockets appear somewhere during an MD trajectory. You can do this during the first exploratory mdpocket run (without studying a particular pocket) by specifying the `-S` flag on the command line when calling mdpocket. With this flag, mdpocket does the following: for each snapshot, fpocket is run normally and a druggability score is associated with each pocket. Voronoi vertices near grid points are used to map the drug score to each grid point (instead of counting the vertices, we increment by the drug score of the pocket). We thus recommend analyzing the frequency grid when running mdpocket with `-S`. You will immediately notice that far fewer pockets are found in the grid at higher iso-values. This can also help you focus on your drug binding site early on (if you are coming from big pharma), which is very handy for the tedious pocket selection by hand.
If you want to draw conclusions about the “mean druggability” of some pockets using the frequency grid, beware that the mean drug score you see there (the iso-value) is strongly underestimated compared to values you obtain on crystal structures.
Last, but very important: if you plan to run an mdpocket calculation using `-S`, you should use the default fpocket pocket detection parameters. Using different parameters, like those for channels, makes no sense, as the druggability score was trained using the default fpocket parameters.
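The difference between counting vertices and incrementing by the drug score can be sketched as follows. This is only an illustration under stated assumptions, not mdpocket's code: the data layout (a list of snapshots, each a list of `(drug_score, vertex_centers)` pockets), the snapping of each vertex to its nearest grid point, and the averaging over snapshots are all hypothetical choices.

```python
from collections import defaultdict


def accumulate_grid(snapshots, spacing=1.0, use_drug_score=False):
    """Map alpha sphere centers of all snapshots onto a regular grid.

    snapshots: list of snapshots; each snapshot is a list of pockets,
               each pocket a (drug_score, [(x, y, z), ...]) tuple.
    Returns {grid_index: value averaged over the number of snapshots}.
    """
    grid = defaultdict(float)
    for pockets in snapshots:
        for drug_score, vertices in pockets:
            for x, y, z in vertices:
                # Assumption: each vertex contributes to its nearest grid point.
                cell = (round(x / spacing), round(y / spacing), round(z / spacing))
                # Plain run: count the vertex. With -S: add the pocket's drug score.
                grid[cell] += drug_score if use_drug_score else 1.0
    n = len(snapshots)
    return {cell: total / n for cell, total in grid.items()}
```

In this sketch a grid point that is occupied in every snapshot has a frequency of 1, while its drug-score value is the average of scores that are each at most 1, which is one way to see why iso-values drop when `-S` is used.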
### Detect different types of pockets
Fpocket was initially created to detect small molecule binding sites on proteins. That is what most people are interested in (a big assumption, we know). But as we want to please as many of you, distinguished fpocket users, as possible, we try to keep fpocket flexible via these various (probably a bit opaque) command line arguments. These arguments become very interesting when one is interested in a different type of pocket detection. For instance, detecting channels and gas pathways in a protein is a completely different topic compared to finding drug binding sites.
If one wants to identify transient internal pockets and channels one could modify the pocket detection parameters for fpocket / mdpocket. Here we give examples of typical parameters and what type of pockets you are likely to get back from fpocket / mdpocket :
* __Detect small molecule binding sites__ : Use the default parameters (don't specify anything)
* __Detect putative channels and small cavities__ : `-m 2.8 -M 5.5 -i 3`
* __Detect pockets where sterically water binding is possible__ : `-m 3.5 -M 5.5 -i 3`
* __Detect rather big, external pockets__ : `-m 3.5 -M 10.0 -i 3`
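If you launch fpocket from a script, the parameter sets above can be kept in one place. The sketch below only builds an argument list. The preset names and the `build_command` helper are hypothetical, and the use of `-f` to pass the PDB file to fpocket is an assumption, as that flag is not described in this section.

```python
# Parameter sets copied from the list above; the keys are hypothetical labels.
PRESETS = {
    "small_molecule": [],  # default parameters
    "channels": ["-m", "2.8", "-M", "5.5", "-i", "3"],
    "water": ["-m", "3.5", "-M", "5.5", "-i", "3"],
    "big_external": ["-m", "3.5", "-M", "10.0", "-i", "3"],
}


def build_command(pdb_path, preset, program="fpocket"):
    """Return the argument list for one run (assumes `-f` selects the input file)."""
    return [program, "-f", pdb_path] + PRESETS[preset]
```

The resulting list can be passed to `subprocess.run(...)`, for example `subprocess.run(build_command("sample/3LKF.pdb", "channels"), check=True)`.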
### Additional scripts
To facilitate some simple conversion, extraction and input file creation tasks, the fpocket distribution contains some additional python scripts. They can be of use for specific tasks but are not directly related to the pocket detection itself, which is why they are not included as standalone programs here.
* `createMDPocketInputFile.py`: This is a standard python script (it should work out of the box on all machines that have python installed) that takes the path of all the snapshot PDB files of an MD trajectory as input and creates a valid mdpocket input file (an alphanumerically sorted list of paths). We recommend this script if you need a valid mdpocket input file without worrying about how to order your file names alphanumerically.
* `extractISO.py`: This is a python script that makes use of the numpy library. If you do not have numpy installed, it will not work. However, installing numpy is a rather good idea, as it is a very nice library ;). The script takes as input an mdpocket dx grid file, a filename (the one you want for the output) and the desired isovalue. The script writes the coordinates of all grid points in the dx file whose grid value is greater than or equal to the desired isovalue to the output file.
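The selection step performed by `extractISO.py` can be pictured with the following pure-Python sketch. It is not the script itself: it assumes the grid values have already been read from the dx file into a flat list ordered with z varying fastest and x slowest, on a regular cubic grid, and the function name `points_above_iso` is hypothetical.

```python
def points_above_iso(values, shape, origin, spacing, iso):
    """Return coordinates of grid points whose value is >= iso.

    values:  flat list of grid values (assumed order: x slowest, z fastest).
    shape:   (nx, ny, nz) number of grid points per axis.
    origin:  (x0, y0, z0) coordinates of the first grid point.
    spacing: grid spacing, assumed identical along all three axes.
    """
    nx, ny, nz = shape
    points = []
    for index, value in enumerate(values):
        if value >= iso:
            # Convert the flat index back to (i, j, k) grid indices.
            i, rest = divmod(index, ny * nz)
            j, k = divmod(rest, nz)
            points.append((origin[0] + i * spacing,
                           origin[1] + j * spacing,
                           origin[2] + k * spacing))
    return points
```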
## dpocket advanced
### Input command line arguments
#### Mandatory:
-f : a dpocket input file. This file has to contain the path to the PDB file as well as the residue name (PDB HET residue tag, like “hem” for heme) of the reference ligand, separated by a tab.
See the [Getting started section of dpocket](#dpocket-descriptor-extraction) for an example of such a file.
#### Optional:
-o : (default dpout) the prefix you want to give to dpocket output files. The default produces three output files named dpout_fpocketnp.txt, dpout_fpocketp.txt and dpout_explicitp.txt.
-e : Use the first explicit interface definition (default): the explicit pocket is defined as all atoms contacted by alpha spheres situated within a distance of d Å from any ligand atom.
-E : Use the second explicit interface definition: the explicit pocket is defined as all atoms situated within a distance of d Å from any ligand atom.
-d : The distance criterion used for the explicit pocket definition.
Last, all optional parameters used by fpocket are also accessible on command line through dpocket. Refer to the preceding paragraph to see details about fpocket parameters.
### Output files description
As shown in the example, dpocket creates 3 output files. Let's describe them in a bit more detail here:
* `dpout_explicitp.txt`: This file contains all pocket descriptors implemented in fpocket of the explicitly defined binding pocket. What does this mean, explicitly? In the input you have associated a ligand identification to each PDB file. This ligand is used by fpocket in order to identify the actual binding pocket.
| pdb | ligand | overlap | lig_vol | pocket_vol | nb_alpha_spheres | mean_asph_ray |
|---|---|---|---|---|---|---|
| data/3LKF.pdb | PC | 100.00 | 132.90 | 1678.64 | 29 | 3.94 |
| data/1ATP.pdb | ATP | 100.00 | 322.62 | 2127.53 | 65 | 3.59 |
| data/7TAA.pdb | ABC | 100.00 | 608.66 | 4977.48 | 97 | 4.20 |
Note that this is only an extract of the file; it contains many more columns (descriptors) than shown here. The first line names the columns. Each following line gives the PDB structure analyzed (e.g. `data/3LKF.pdb`) and the ligand used as reference (PC). Next comes the overlap between the actual and the found binding pocket, here 100% as this is an explicitly defined binding pocket. The remaining entries can be used as descriptors, such as the ligand volume, the pocket volume, the number of alpha spheres in the binding pocket, the mean alpha sphere radius, and so on. For a complete list of all implemented descriptors in fpocket, refer to the Advanced features [Pocket descriptors section](#pocket-descriptors).
The volumes calculated here are not accurate at all. If you want to calculate accurate volumes you have to change parameters for volume calculation. As volume calculations are generally over-estimated using alpha sphere approaches, especially for open binding pockets, this calculation is made available, but uses the minimum sampling for the calculation. For more accurate calculation significantly more calculation time would be necessary. You can provide a higher sampling via the `-v` flag in the command line.
* `dpout_fpocketnp.txt`: This file contains the same kind of descriptors as the preceding one, but this time for pockets identified by fpocket that are “non binding pockets”. Here, non binding pockets means that the pockets do not correspond to the pocket where the reference ligand binds. Be careful, this does not necessarily mean that other pockets do not bind anything.
* `dpout_fpocketp.txt`: The last file is formatted the same way as the two preceding ones. This file contains the binding pocket, this time identified by fpocket and not explicitly by the ligand.
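The three dpocket files share the same column layout, so one small reader covers all of them. The sketch below assumes whitespace-separated columns with a header line, and that the first two columns (`pdb` and the ligand name) are text while every other column is numeric, as in the extract above. The function name `read_dpocket_table` is hypothetical.

```python
def read_dpocket_table(text):
    """Parse dpocket output text into a list of {column_name: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError("unexpected number of columns: " + line)
        row = dict(zip(header, fields))
        # Assumption: only the first two columns (pdb, ligand) are non-numeric.
        for name in header[2:]:
            row[name] = float(row[name])
        rows.append(row)
    return rows
```

With the rows loaded, a filter such as `[r for r in rows if r["overlap"] >= 50.0]` becomes a one-liner.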
## tpocket advanced
This program of the fpocket package is very useful for rapidly testing new scoring methods on a large dataset of protein-ligand complexes. However, the results, purpose, advantages and drawbacks of this methodology can be difficult to understand. To help with this, we provide some more fundamental information before treating more practical questions about tpocket.
### Input command line arguments
#### Mandatory:
-L : a tpocket input file. This file has to contain the paths to the PDB files (apo, holo or holo,holo if you want to test fpocket only on holo structures), as well as the residue name (PDB HET residue tag, like “hem” for heme) of the reference ligand, separated by tabs.
#### Optional:
-o : (default ./stats_p.txt) The filename you want to give to tpocket detailed statistics.
-e : (default ./stats_g.txt) The filename you want to give to tpocket global statistics.
-d : Distance criterion used for one of the 3 definitions of a pocket: all atoms situated at a distance lower than or equal to d will be considered as part of the actual pocket.
-k : Keep fpocket output for each pdb test.
Last, all optional parameters used by fpocket are also accessible on command line through tpocket. Refer to [fpocket advanced](#fpocket-advanced) for fpocket parameters.
### Actual pocket definition for evaluation of fpocket
Delimiting, and more generally defining, the exact binding pocket of a protein in an automated way is not that easy. Finding a criterion that correctly evaluates the ability of fpocket to detect the actual binding site of a protein is consequently even more difficult.
Tpocket makes use of 6 different ways to determine if a pocket found by fpocket could be considered as the actual binding pocket, with respect to a given ligand:
* 1 The actual binding site is reduced to a single point, the barycenter of the pocket (calculated using alpha spheres). The binding pocket is defined as the pocket whose barycenter lies within 4 Å of any ligand atom. This corresponds to the PPc (PocketPicker criterion) discussed in the paper.
* 2 The actual binding pocket is defined by the set of atoms that are in contact with alpha spheres near (< 3 Å) the actual ligand. This set of atoms is then compared to all atoms contacted by the Voronoi vertices included in each pocket found by fpocket. WARNING: this is currently not safely usable for a holo/apo dataset.
* 3 The actual binding pocket is defined by the set of atoms that are near (4 Å) the actual ligand. The same comparison as in the previous definition is then applied to decide whether a pocket can be considered as the actual pocket or not. WARNING: this is currently not safely usable for a holo/apo dataset.
* 4 The actual binding pocket is defined by the set of alpha spheres near (< 3 Å) the actual ligand. Then, for a given pocket, we calculate the correspondence between the alpha spheres in the pocket and the alpha spheres in the actual binding pocket. If this ratio exceeds a certain value (25%), we consider this pocket as being the actual pocket.
* 5 For a given pocket, we calculate the proportion of ligand atoms that are near (< 3 Å) at least one alpha sphere of the pocket. If this proportion exceeds a certain value (50%), we consider this pocket as being the actual pocket.
* 6 A combination of the 4th and 5th criteria described above: it is satisfied if both the 4th and the 5th criteria are satisfied. This corresponds to the MOc (Mutual Overlap criterion) discussed in the paper.
The reason why we chose 3 Å for criteria 2, 4 and 5 is quite simple: in the current algorithm the minimum radius of an alpha sphere is 3 Å, so a ligand atom situated at a distance lower than or equal to this value can be considered as included in this alpha sphere, and therefore detected. Of course, this also applies to alpha spheres with a larger radius.
All of these criteria have their strengths and weaknesses, which is why we chose to implement all of them.
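As an illustration of criteria 1, 4, 5 and 6, here is a minimal Python sketch working on plain coordinate tuples. It is not tpocket's code: the function names are hypothetical, distances are measured to alpha sphere centers, the ratio in criterion 4 is interpreted as the fraction of the pocket's alpha spheres lying near the ligand, and thresholds are applied with `>=`. All of these are assumptions.

```python
import math


def barycenter(points):
    """Unweighted mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def criterion_1(sphere_centers, ligand_atoms, cutoff=4.0):
    """Pocket barycenter lies within `cutoff` of at least one ligand atom."""
    center = barycenter(sphere_centers)
    return any(math.dist(center, atom) < cutoff for atom in ligand_atoms)


def criterion_4(sphere_centers, ligand_atoms, cutoff=3.0, threshold=0.25):
    """Enough of the pocket's alpha spheres lie near the ligand."""
    near = sum(1 for s in sphere_centers
               if any(math.dist(s, a) < cutoff for a in ligand_atoms))
    return near / len(sphere_centers) >= threshold


def criterion_5(sphere_centers, ligand_atoms, cutoff=3.0, threshold=0.5):
    """Enough ligand atoms lie near at least one alpha sphere of the pocket."""
    covered = sum(1 for a in ligand_atoms
                  if any(math.dist(s, a) < cutoff for s in sphere_centers))
    return covered / len(ligand_atoms) >= threshold


def criterion_6(sphere_centers, ligand_atoms):
    """Mutual overlap: criteria 4 and 5 must both hold."""
    return (criterion_4(sphere_centers, ligand_atoms)
            and criterion_5(sphere_centers, ligand_atoms))
```

Note how an elongated pocket can satisfy the mutual overlap criterion while failing criterion 1, because its barycenter may lie far from the ligand.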
## Pocket descriptors
To discriminate an interesting pocket from many uninteresting ones, fpocket uses descriptors for each pocket. A scoring function using these descriptors was trained to identify what we generally call a “binding site”. All descriptors implemented in fpocket are listed here. The ones currently used for scoring are marked with a *, and those tagged normalized also have a normalized equivalent, i.e. one scaled to the [0, 1] range, with the highest (resp. lowest) value of the descriptor set to 1 (resp. 0).
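The scaling described above, with the lowest value mapped to 0 and the highest to 1, is a min-max scaling. A minimal sketch follows; the function name and the handling of the degenerate case where all pockets share the same value are assumptions.

```python
def normalize(values):
    """Scale descriptor values of all pockets of one protein to [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: with identical values there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]
```

For example, three pockets with 21, 50 and 79 alpha spheres would get normalized values 0.0, 0.5 and 1.0.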
### Number of alpha spheres (normalized)
As the title says, this is surely the most simple descriptor. The number of alpha spheres reflects generally more or less proportionally the size of the cavity.
### Density of the cavity (normalized)
This descriptor measures the density and “buriedness” of a pocket. It is nothing other than the mean of all pairwise distances between alpha spheres in the binding pocket. Thus, a small value indicates a rather compact and therefore rather buried pocket. Larger values indicate more extended and exposed cavities.
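A minimal sketch of this descriptor, assuming the distances are taken between alpha sphere centers given as `(x, y, z)` tuples (the function name is hypothetical):

```python
import itertools
import math


def mean_pairwise_distance(centers):
    """Mean distance over all unordered pairs of alpha sphere centers."""
    pairs = list(itertools.combinations(centers, 2))
    if not pairs:
        return 0.0  # A pocket with fewer than two spheres has no pair.
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```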
### Polarity Score (normalized)
In contrast to hydrophobicity, this descriptor tries to measure the hydrophilic character of a binding pocket. A polarity score is assigned to each residue of the binding pocket (as published on http://www.info.univ-angers.fr/~gh/Idas/proprietes.htm). The final polarity score is the mean of the polarity scores of all residues in the binding pocket. This is extremely approximate, so it should not be over-interpreted. Each residue is evaluated only once.
### Mean local hydrophobic density (normalized)
This descriptor tries to identify if the binding pocket contains local parts that are rather hydrophobic. For each apolar alpha sphere the number of apolar alpha sphere neighbors is detected by seeking for overlapping apolar alpha spheres. The sum of all apolar alpha sphere neighbors is divided by the total number of apolar alpha spheres in the pocket. Last this score is normalized compared to other binding pockets.
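The neighbor counting can be sketched as below, before the final normalization against other pockets. Two spheres are treated as overlapping when the distance between their centers is smaller than the sum of their radii. That overlap test, the `(center, radius)` layout and the function name are assumptions.

```python
import math


def mean_local_hydrophobic_density(apolar_spheres):
    """apolar_spheres: list of ((x, y, z), radius) for apolar alpha spheres."""
    n = len(apolar_spheres)
    if n == 0:
        return 0.0
    neighbors = 0
    for i, (center_i, radius_i) in enumerate(apolar_spheres):
        for j, (center_j, radius_j) in enumerate(apolar_spheres):
            # Count every other apolar sphere that overlaps sphere i.
            if i != j and math.dist(center_i, center_j) < radius_i + radius_j:
                neighbors += 1
    return neighbors / n
```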
### Proportion of apolar alpha spheres (normalized)
This descriptor, returned as percentage, reflects the proportion of apolar alpha spheres among all alpha spheres of one pocket identified by fpocket. This can reflect somehow the hydrophobic/-philic character of a binding pocket.
### Druggability Score
The druggability score is a numerical value between 0 and 1 associated with each pocket using a logistic function. This score intends to assess how likely the pocket is to bind a small drug-like molecule. A low score indicates that drug-like molecules are unlikely to bind to this pocket. A druggability score of 0.5 (the threshold) indicates that binding of prodrugs or drug-like molecules can be possible. 1 indicates that binding of drug-like molecules is very likely. The theoretical basis of the score is currently in the lengthy process of scientific publication.
### Maximum distance between two alpha sphere (normalized)
This descriptor stores the maximum distance found between two alpha spheres in a given pocket.
### Hydrophobicity Score
This descriptor is based on a residue-based hydrophobicity scale published by Monera et al. in the Journal of Peptide Science 1, 319-329 (1995). The mean hydrophobicity score of all residues involved in the binding site is calculated and used as a descriptor for the whole pocket. Each residue is evaluated only once.
### Charge Score
According to (http://www.info.univ-angers.fr/~gh/Idas/proprietes.htm) the charge of each amino acid in the binding site is tracked. The mean charge for all amino acids in contact with at least one alpha sphere of the pocket is calculated to form this charge score. Each residue is evaluated only once.
### Volume Score
Similarly to other descriptors, this one is based on data published on (http://www.info.univ-angers.fr/~gh/Idas/proprietes.htm). This data summarizes the relative volume of the different amino acids. To calculate this descriptor, the mean volume score of all amino acids in contact with at least one alpha sphere of the pocket is calculated. Each residue is evaluated only once.
### Composition of amino acids
As the name indicates, fpocket tracks the amino acid composition of binding pockets. If at least one atom of a residue is in contact with at least one alpha sphere of a binding pocket, the residue is counted as part of the binding site. This descriptor is returned as a cumulative list; for instance, you can find 2 valines, 3 glutamates etc. in the binding site.
Occurrences of amino acids in the different descriptor outputs are given in the following order: Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Ser, Thr, Val, Trp, Tyr.
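A short sketch of how such a composition vector can be built in the order given above. The input is assumed to be one three-letter residue name per distinct residue in contact with the pocket; the function name is hypothetical.

```python
from collections import Counter

# Output order of the amino acid counts, as listed above.
AA_ORDER = ["ALA", "CYS", "ASP", "GLU", "PHE", "GLY", "HIS", "ILE", "LYS", "LEU",
            "MET", "ASN", "PRO", "GLN", "ARG", "SER", "THR", "VAL", "TRP", "TYR"]


def composition(residue_names):
    """Count residues per amino acid type; other residue names are ignored."""
    counts = Counter(name.upper() for name in residue_names)
    return [counts.get(aa, 0) for aa in AA_ORDER]
```

A site with 2 valines and 3 glutamates yields a vector with 3 at the Glu position, 2 at the Val position and 0 elsewhere.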
### Pocket volume
As indicated by the name, this descriptor evaluates the volume of a binding pocket using a Monte-Carlo algorithm that calculates the full volume occupied by all alpha spheres in a given pocket. The number of iterations of this algorithm can be controlled using fpocket input parameters.
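The general idea of a Monte-Carlo volume estimate can be sketched as follows: draw random points in the bounding box of the alpha spheres and multiply the box volume by the fraction of points that fall inside at least one sphere. This is a generic illustration rather than fpocket's implementation; the function name, the `(x, y, z, radius)` layout and the `seed` argument are assumptions.

```python
import random


def monte_carlo_volume(spheres, n_samples, seed=0):
    """Estimate the volume of a union of spheres by random sampling.

    spheres: list of (x, y, z, radius) tuples.
    n_samples: number of random points (more points, better accuracy).
    """
    rng = random.Random(seed)
    # Bounding box enclosing every sphere.
    lo = [min(s[k] - s[3] for s in spheres) for k in range(3)]
    hi = [max(s[k] + s[3] for s in spheres) for k in range(3)]
    box_volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])

    hits = 0
    for _ in range(n_samples):
        px, py, pz = (rng.uniform(lo[k], hi[k]) for k in range(3))
        if any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
               for x, y, z, r in spheres):
            hits += 1
    return box_volume * hits / n_samples
```

Overlapping spheres are only counted once, because a point is a hit as soon as it lies in any sphere.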
### Polar Surface Area
This descriptor provides an estimation of the polar surface area of the pocket based on information of the receptor atoms. The method used to calculate the area only provides an approximation, but should be good enough to get some rather relevant estimates.
### Apolar Surface Area
See polar surface area in the previous point, only for apolar atoms.
### Total Surface Area
The sum of the polar and apolar surface area of the pocket, that is to say the receptor side surface area of the pocket.
### B-factor score (normalized)
Please handle this score with great care on native crystal structures. This score is based on the mean B-factor of all atoms of the binding pocket (atoms that are contacted by at least one alpha sphere). As the B-factor does not necessarily reflect flexibility in crystal structures, using it this way is somewhat of a stretch. However, one could imagine performing molecular dynamics or another method to determine the relative flexibility of atoms and storing this information in the B-factor column of the PDB file format.
This descriptor is normalized with other pockets of the same protein.
### List of abbreviations used in dpocket & mdpocket output
- pdb : pdb file name
- lig : ligand HET ID
- overlap : overlap of atoms in the actual pocket versus atoms in the pocket identified with fpocket
- PP-crit : binary PocketPicker criterion (1 if the ligand is < 4A from the center of mass of the alpha spheres, 0 else)
- PP-dst : the minimum distance between the center of mass of the pocket and the ligand
- crit4 : proportion of ligand atoms that have at least one vertex that lies within 3 A
- crit5 : proportion of alpha spheres that lie within 3A from any ligand atom
- crit6 : binary criterion that is 1 if crit4 >=0.5 and crit5>=0.2, 0 else
- crit6_continue : a continuous measure of crit6, but this is experimental and we currently don't use it...
- lig_vol : volume of the ligand
- pock_vol : volume of the pocket
- nb_AS : number of alpha spheres
- nb_AS_norm : number of alpha spheres normalized by all pockets on the protein
- mean_as_ray : mean alpha sphere radius
- mean_as_solv_acc : mean alpha sphere solvent accessibility
- apol_as_prop : proportion of apolar alpha spheres in the pocket
- apol_as_prop_norm : normalized proportion of apolar alpha spheres
- mean_loc_hyd_dens : mean local hydrophobic density
- mean_loc_hyd_dens_norm : normalized mean local hydrophobic density
- polarity_score_norm : normalized polarity score
- flex : measure of the flexibility of the pocket (B-factor based)
- prop_polar_atm : proportion of polar atoms
- as_density : alpha sphere density
- as_density_norm : normalized alpha sphere density
- as_max_dst : maximum distance between the center of mass and all alpha spheres
- as_max_dst_norm : normalized as_max_dst
- drug_score : druggability score
- pock_asa : solvent accessible surface area of the pocket
- pock_pol_asa : polar solvent accessible surface area of the pocket
- pock_apol_asa : apolar solvent accessible surface area of the pocket
- pock_asa22 : accessible surface area using a probe of 2.2 A instead of 1.4
- pock_pol_asa22 : see pock_pol_asa and pock_asa22
- pock_apol_asa22 : see pock_apol_asa and pock_asa22
## Cofactor definition
fpocket, dpocket and tpocket contain a fixed set of cofactors in the current release. So far so good, but what for? Cofactors are often structurally necessary or must be present in the protein structure for ligand binding. The PDB nomenclature, however, treats them as ordinary hetero atoms, using the HETATM tag. This is the tag that fpocket uses to identify and eliminate crystallographic waters and possible ligands of holo protein structures. To force fpocket to keep the cofactor you are interested in, that is to say, to consider it as an integral part of the protein structure for binding pocket detection, a list of HETATM names is defined at the beginning of the `rpdb.c` file (https://github.com/Discngine/fpocket/blob/master/src/rpdb.c#L39) under the name `static const char *ST_keep_hetatm[]`. The next line of code defines the number of cofactors in this list: `static const int ST_nb_keep_hetatm = 111;`
If you would like to add a new cofactor, you have to modify this code. First, add the desired HETATM tag at the end of the `ST_keep_hetatm` list. For example, `"MSE"` becomes `"MSE","PTE"` if your cofactor has the HETATM tag PTE. Do not forget to increment the `ST_nb_keep_hetatm` variable to `112`, otherwise this cofactor will not be taken into account.
Next you have to recompile the program, before being able to use this new definition.
In future releases this cofactor definition will be done dynamically with an external list.
The following list summarizes the cofactors fpocket considers as recurrent in the PDB and useful to keep systematically in protein structures.
## Customizing fpocket
This section introduces several ways of customizing fpocket by modifying the source code. We first give the instructions needed to recompile and rebuild the full package so that a source code modification is taken into account. Then we describe how to write a new scoring function, and how to write your own descriptors and include them in the dpocket output. To stay as concise as possible, we do not show the full content of the functions to modify.
### How to rebuild the package
After any modification to the fpocket source code, you need to rebuild the package so that the modification is taken into account. Here is the current procedure to do so:
Go to your fpocket codebase:
```bash
make uninstall
make clean
```
Then, you will have to perform the installation process again to rebuild the package.
### Writing your own scoring function
Writing your own scoring function using currently implemented descriptors is a simple task, provided that you are not afraid to write one line of C code. Currently, the fpocket algorithm sorts pockets by their score. Each score is calculated by a single function. The source file `src/pscoring.c` contains the definition of this function, which has the following prototype:
```C
float score_pocket(s_desc *pdesc) ;
```
The function takes as argument a pointer to a structure that contains all descriptors currently available in fpocket, and is called for each pocket to be scored. All descriptors available have been described previously, and you can check the exact name given to each of them in the source file headers/descriptors.h that defines the s_desc structure shown here.
Let's say that you just want to score pockets according to the number of alpha spheres in each pocket. To do so, you just have to change the content of the `score_pocket` function and return the right value:
```C
float score_pocket(s_desc *pdesc)
{
    float score = (float) pdesc->nb_asph ;
    return score ;
}
```
Although this example is really simple, you may now understand that you can write any kind of scoring function, like a linear or non-linear combination of descriptors derived from a regression model or any other method. The only limitation is the use of available descriptors implemented in fpocket.
Of course, although the current scoring function has very satisfying performances using only 4 of the available descriptors, you may want to implement your own set. The next section will give you the basics to do so.
### Writing your own descriptor
So what if you want to write your own descriptors? Well this will be a little more difficult than writing your own scoring function, but nothing is impossible!
Suppose that we want to add a new (and very simple) descriptor: the maximum alpha sphere radius in a given pocket.
First of all, you have to add the variable that will store your descriptor to the structure containing all descriptors. This has to be done in the `headers/descriptors.h` source file, in the definition of the structure `s_desc`. We will add the following line:
```C
typedef struct s_desc
{ ...
    float as_max_r ;
} s_desc ;
```
After adding our variable, we need to give it a default value for when no calculation has been performed, let's say -1. This is done in the function `reset_desc` located in the same file:
```C
void reset_desc(s_desc *desc)
{ ...
    desc->as_max_r = -1.0 ;
}
```
Let's now implement our descriptor. Go to the `src/descriptor.c` source file. In this file, you will find the main function that calculates descriptors based on a list of atoms and a list of alpha spheres. Here is the prototype of this function:
```C
void set_descriptors( s_atm **tatoms, int natoms,
                      s_vvertice **tvert, int nvert,
                      s_desc *desc) ;
```
As you can see, the function takes as arguments a list of atoms, a list of vertices, and an input/output descriptor structure that will actually store all calculated descriptors. When descriptors have to be calculated for a given pocket, we first get all atoms and vertices of the pocket, and we call this function with those atoms and vertices as arguments. The calculation then uses the information on atoms and vertices to compute the descriptors.
Based on those parameters, you will have to write your own code in this function and update the `desc` variable given as argument accordingly, so that the descriptor value is stored. Let's do this. You will probably notice that the current code is not fully modular. This is because of computational optimization: fully modular code sometimes requires additional loops and processing compared to optimized code. Anyway, the task is still very simple. Let's go to the part of the code that will do the job.
```C
void set_descriptors( s_atm **tatoms, int natoms,
s_vvertice **tvert, int nvert,
s_desc *desc)
{ ...
float as_max_r = -1.0 ; /* Declare and initialize the descriptor */
...
for(i = 0 ; i < nvert ; i++) {
/* Loop through all vertices and update descriptors */
vcur = tvert[i] ;
if(vcur->ray > as_max_r) as_max_r = vcur->ray ;
...
}
...
desc->as_max_r = as_max_r ; /* Store the descriptor */
}
```
That's it, your descriptor is implemented: the descriptors of each pocket are automatically calculated by this function at the end of the fpocket algorithm. Thus, it can now be used in the scoring function described previously, after rebuilding the package of course.
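To try the logic outside of fpocket, here is a self-contained sketch of the same maximum-radius computation. The types `demo_vertice` and `demo_radius_desc` and the function `demo_set_as_max_r` are simplified stand-ins invented for this example; they only reproduce the `ray` and `as_max_r` fields used above and are not the real fpocket definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for s_vvertice (assumption: only the radius is kept). */
typedef struct {
    float ray ;          /* Alpha sphere radius */
} demo_vertice ;

/* Simplified stand-in for s_desc (assumption: only our new field is kept). */
typedef struct {
    float as_max_r ;     /* Maximum alpha sphere radius, -1.0 if not computed */
} demo_radius_desc ;

/* Store the largest alpha sphere radius found in tvert into desc. */
void demo_set_as_max_r(demo_vertice **tvert, int nvert, demo_radius_desc *desc)
{
    float as_max_r = -1.0f ;   /* Same default value as in reset_desc */
    int i ;

    assert(desc != NULL) ;
    for(i = 0 ; i < nvert ; i++) {
        if(tvert[i]->ray > as_max_r) as_max_r = tvert[i]->ray ;
    }
    desc->as_max_r = as_max_r ;
}
```

With an empty vertex list the descriptor keeps its default value of -1, which makes a missing calculation easy to spot in the output.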
### Normalizing your descriptors
An advantage of normalization is that two descriptors generated from pockets of two different proteins can be compared to each other to a certain degree, depending on the normalization process. For example, if we normalize the number of alpha spheres between 0 and 1 (well, here it's more a scaling than a normalization), the largest pocket of any protein will always have 1 as the value of the normalized descriptor.
To do so, we can't use the exact same process as for adding a given descriptor, because all descriptors of all pockets need to be calculated before the normalization step. Consequently, the calculation of all normalized descriptors is currently performed in the `src/pocket.c` source file. In this file, the function `set_normalized_descriptors` does the job, and has the following prototype:
```C
void set_normalized_descriptors(c_lst_pockets *pockets)
```
As you can see, it simply takes as argument a list of pockets (in fact a simple linked list), i.e. all pockets found in a given protein. Of course, each pocket contained in this structure has a descriptor structure associated with it.
Let's now dig deeper into the code and implement a normalized version of the new descriptor so that it ranges between 0 and 1. The first step is similar to the first step needed to implement a new descriptor: you need to add a variable that will store this normalized descriptor in the `s_desc` structure:
```C
typedef struct s_desc
{ ...
float as_max_r ;
float as_max_r_norm ;
} s_desc ;
```
You can now add the default initialization of this descriptor:
```C
void reset_desc(s_desc *desc)
{ ...
desc->as_max_r = -1.0 ;
desc->as_max_r_norm = -1.0 ;
}
```
Let's implement the descriptor now. Go to the `src/pocket.c` source file, `set_normalized_descriptors` function. To calculate the normalized descriptor, we need the min and max values of the non-normalized descriptor. So we first loop over the pocket list to update the min and max where necessary, and then perform the normalization in a second loop. So easy:
```C
void set_normalized_descriptors(c_lst_pockets *pockets)
{ ...
/* Declare min and max */
float as_max_r_m = 1000, /* Initialize to a large value*/
as_max_r_M = -1.0 ; /* Initialize to a small value */
...
cur = pockets->first ;
/* Perform a first processing step, e.g. to set min and max */
while(cur) {
dcur = cur->pocket->pdesc ;
if(cur == pockets->first) {
...
/* If it is the first pocket, min = max = pocket */
as_max_r_m = as_max_r_M = dcur->as_max_r ;
}
else {
...
/* If it is the Nth != 1 pocket, check and update
min and max if necessary*/
if(dcur->as_max_r > as_max_r_M)
as_max_r_M = dcur-> as_max_r ;
else if(dcur->as_max_r < as_max_r_m)
as_max_r_m = dcur->as_max_r ;
}
cur = cur->next ;
}
/* Perform a second loop to do the actual normalisation */
cur = pockets->first ;
while(cur) {
dcur = cur->pocket->pdesc ;
...
dcur->as_max_r_norm = (dcur->as_max_r - as_max_r_m)
/ (as_max_r_M - as_max_r_m) ;
cur = cur->next ; /* Move on to the next pocket */
}
}
```
And that's it. Normalizing the descriptor takes a little more effort, but we believe it's not that much to do.
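The two-pass min/max scaling can also be tried in isolation. The sketch below uses a hypothetical `demo_node` linked list that stores the descriptor directly in each node, instead of fpocket's real pocket list and descriptor structures. It also guards against the case where all pockets share the same value, which would otherwise lead to a division by zero; returning 0 in that case is a choice made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linked-list node holding one pocket's descriptor values. */
typedef struct demo_node {
    float as_max_r ;        /* Raw descriptor */
    float as_max_r_norm ;   /* Descriptor scaled between 0 and 1 */
    struct demo_node *next ;
} demo_node ;

/* Scale as_max_r between 0 and 1 over the whole list. */
void demo_normalize_as_max_r(demo_node *first)
{
    demo_node *cur ;
    float vmin, vmax ;

    if(first == NULL) return ;

    /* First pass: find min and max */
    vmin = vmax = first->as_max_r ;
    for(cur = first->next ; cur != NULL ; cur = cur->next) {
        if(cur->as_max_r > vmax) vmax = cur->as_max_r ;
        if(cur->as_max_r < vmin) vmin = cur->as_max_r ;
    }
    assert(vmax >= vmin) ;

    /* Second pass: do the actual scaling */
    for(cur = first ; cur != NULL ; cur = cur->next) {
        if(vmax > vmin)
            cur->as_max_r_norm = (cur->as_max_r - vmin) / (vmax - vmin) ;
        else
            cur->as_max_r_norm = 0.0f ;   /* All values equal: avoid 0/0 */
    }
}
```

For three pockets with raw values 4.5, 3.0 and 6.0, the scaled values are 0.5, 0.0 and 1.0 respectively.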
Unfortunately, we haven't taken the time to automatically add any new descriptor to the dpocket output. So at this point your descriptor is implemented and can be used by a scoring function, but it is not written to the dpocket output. The next paragraph will show you how to do so; it's very easy.
### Including your descriptor in dpocket
Although it would be possible, we haven't taken the time to construct a system that would detect and add automatically any new descriptor to the dpocket output.
So let's do this manually. The dpocket output format is defined by 3 macros in the dpocket.h header file:
```C
#define M_DP_OUTP_HEADER "pdb lig ..."
#define M_DP_OUTP_FORMAT "%s %s ..."
#define M_DP_OUTP_VAR(fc, l, ovlp, status, dst, lv, d) fc, l, ...
```
The first macro defines the header of the output file. The second macro corresponds to the format of each value to output given to the fprintf function. Finally, the last macro is the list of variables, with d being the pointer to the descriptor structure defined previously. Basically, writing the dpocket output for each pocket requires two main processes: write the header, and loop to write each pocket descriptor.
To include our descriptor in the dpocket output, we just need to add the header label of the descriptor, the output format of the descriptor, and the descriptor itself. Those three steps modify the first, the second, and the third macro defined previously, respectively. The only difficulty is to keep the correspondence between all 3 positions (header, format and variable) in the line: the column position of a descriptor's header must match that of its format and of its variable. For example, if we want to add our new descriptor at the first position of the dpocket output, it would give:
```C
#define M_DP_OUTP_HEADER "as_max_r pdb lig ..."
#define M_DP_OUTP_FORMAT "%3.5f %s %s ..."
#define M_DP_OUTP_VAR(fc, l, ovlp, status, dst, lv, d) d->as_max_r, fc, l, ovlp, ...
```
That's all. Remember to be careful on this step: adding a new descriptor to dpocket is really easy in theory, but losing the correspondence between header, format and variable columns is easy too, in which case interpretation, visualization and analysis of the dpocket output can become difficult or even meaningless.
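The three-macro pattern, together with a cheap sanity check on the column count, can be sketched as follows. The `DEMO_OUTP_*` macros, the `demo_out_desc` structure with its `nb_as` field, and both helper functions are hypothetical and much shorter than the real dpocket macros; they only illustrate how header, format and variable list have to stay aligned.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced descriptor structure. */
typedef struct {
    float as_max_r ;
    int nb_as ;
} demo_out_desc ;

/* Header, format and variable list: same column order in all three. */
#define DEMO_OUTP_HEADER "as_max_r pdb nb_as"
#define DEMO_OUTP_FORMAT "%3.5f %s %d"
#define DEMO_OUTP_VAR(name, d) (d)->as_max_r, name, (d)->nb_as

/* Count the space-separated columns of a header or format string. */
int demo_count_columns(const char *s)
{
    int n = 0, in_tok = 0 ;

    for( ; *s != '\0' ; s++) {
        if(*s == ' ') in_tok = 0 ;
        else if(!in_tok) { in_tok = 1 ; n++ ; }
    }
    return n ;
}

/* Write one output line into buf, return the number of characters. */
int demo_format_line(char *buf, size_t size, const char *name,
                     const demo_out_desc *d)
{
    assert(buf != NULL && d != NULL) ;
    return snprintf(buf, size, DEMO_OUTP_FORMAT, DEMO_OUTP_VAR(name, d)) ;
}
```

Comparing `demo_count_columns(DEMO_OUTP_HEADER)` with `demo_count_columns(DEMO_OUTP_FORMAT)` catches a forgotten header label or format specifier, although it cannot detect two columns that were swapped.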
### Including your descriptor in mdpocket
Adding a descriptor to mdpocket works pretty much the same way as in dpocket. So write your own descriptor as described previously for dpocket. The only difference is the last step: instead of modifying the dpocket.h macros, you should modify the macros of mdpocket.h. They are constructed exactly the same way and are even somewhat simpler because they are shorter:
```C
#define M_MDP_OUTP_HEADER "snapshot pock_volume nb_AS..."
#define M_MDP_OUTP_FORMAT "%d %4.2f %d %4.2f %4.2f %4.2f..."
#define M_MDP_OUTP_VAR(i, d) i, d->volume ...
```
Simply add the header of your descriptor to the output header macro, the output format to the format macro and the variable to the variable macro, exactly as in the previously described dpocket.h file.

153
doc/INSTALLATION.md Normal file

@@ -0,0 +1,153 @@
# Installation
## Prerequisites
Currently fpocket offers two different ways to visualize binding pockets. Both are based on commonly used molecular visualization tools: VMD and PyMol. In order to use visualization you need to install at least one of these tools, or any other valid tool able to read standard PDB files (Chimera, MOE, Maestro etc).
Currently, visualization using VMD offers better rendering and performance, while visualization using PyMol offers better handling of binding pockets. You can download VMD for free from http://www.ks.uiuc.edu/Research/vmd/. PyMol can be freely downloaded from https://pymol.org/2/.
## Dependencies
fpocket relies on Qhull. In the officially released version fpocket ships Qhull with it and Qhull compilation is automatically done when compiling and installing fpocket. Since the 3.0 release of fpocket, the following libraries are required to compile it:
- libnetcdf
- libstdc++
## System Requirements
fpocket is available for Linux/Unix type OS's, and also MacOSX (so basically all OS's that don't completely suck).
In order to run fpocket, you should have at minimum a Pentium III 500 MHz (does that still exist?) with 128 MB of RAM (lol). This program was co-developed and tested under the following Linux distributions: openSuse 10.3 (and newer), CentOS 5.2, Fedora Core 7, Ubuntu 8.10, as well as Mac OS X (10.5, 10.6, 10.14.6). You need a valid C compiler like gcc or clang (for Mac).
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
### Prerequisites
The most recent versions (starting with fpocket 3.0) make use of the molfile plugin from VMD. This plugin is shipped with fpocket. However, you now need to install the netcdf library on your system. This is typically called netcdf-devel or so, depending on your Linux distribution.
fpocket needs to be compiled to run on your machine. For this you'll need the GNU C compiler (another compiler may work, but we only tested with GCC).
To install the netcdf development library on Ubuntu, type:
```
sudo apt-get install libnetcdf-dev
```
On a RHEL based distribution something like this should do:
```
sudo yum install netcdf-devel.x86_64
```
### Installing
Download the sources from github via the website or using git clone and then build and deploy fpocket using the following commands.
#### Compiling on Linux
```bash
git clone https://github.com/Discngine/fpocket.git
cd fpocket
make
sudo make install
```
#### Compiling on OSX
Install MacPorts https://www.macports.org/ for instance (needed for netcdf install)
```bash
sudo port install netcdf
export LIBRARY_PATH=/opt/local/lib
git clone https://github.com/Discngine/fpocket.git
cd fpocket
make ARCH=MACOSXX86_64
sudo make install
```
## Running the tests
The source code of fpocket is shipped with samples. They can be found in the data/sample folder. Try to run fpocket against the 1uyd sample to check if it's running OK.
```
cd data/sample
fpocket -f 1UYD.pdb
```
fpocket should state when it begins searching for pockets and also when the search ends. Upon completion the folder should now contain a folder called 1UYD_out. Check whether the folder exists, the pdb files contain data and the pocket info file contains results.
## Frequent issues encountered
### netcdf issues
```
cannot find -lnetcdf
```
mdpocket supports reading and writing NETCDF formatted files. In order to use this you need to install the netcdf development libraries on your system.
#### Centos:
This can be achieved like this :
```
yum install -y epel-release #if the epel repo is not yet activated on your system
yum install -y netcdf-devel
```
#### Ubuntu:
```
sudo apt-get install libnetcdf-dev
```
#### OSX:
Install MacPorts https://www.macports.org/ for instance (needed for netcdf install)
```
sudo port install netcdf
export LIBRARY_PATH=/opt/local/lib
```
Run make again after installing this library. Mdpocket / fpocket should build just fine now.
### stdc++ issues
```
cannot find -lstdc++
```
You need to install the stdc++ static libraries to build fpocket & mdpocket.
#### Centos:
On CentOS 7.4 this can be done like this:
```
yum install -y libstdc++-static
```
#### Ubuntu:
```
sudo apt-get install libstdc++6
```
### Linking to molfile plugin issues
If you observe an error similar to this one
```
ld: warning: ignoring file plugins/MACOSXX86/molfile/libmolfile_plugin.a, file was built for archive which is not the architecture being linked (x86_64): plugins/MACOSXX86/molfile/libmolfile_plugin.a
Undefined symbols for architecture x86_64:
"_molfile_parm7plugin_init", referenced from:
_read_topology in topology.o
"_molfile_parm7plugin_register", referenced from:
_read_topology in topology.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [bin/fpocket] Error 1
make: *** [all] Error 2
```
then the statically built libmolfile_plugin is not compatible with your machine. First check that the ARCH variable set in the first line of the fpocket Makefile actually reflects the architecture you want. For now I'm trying to support Linux 64 bit systems (LINUXAMD64) and OSX systems built with clang in 32 and 64 bit (MACOSXX86, MACOSXX86_64). So all should work out of the box. If it does not, you might need to build the molfile plugin for your architecture. All available system architectures for the molfile plugin can be found in the plugins folder tree: [plugins directory](https://github.com/Discngine/fpocket/tree/master/plugins).
Here you can find more information on how to build the molfile plugin on CentOS 7.4:
[compile molfile plugin on centos 7.4 - Discngine blog post](https://www.discngine.com/blog/2019/5/25/building-the-vmd-molfile-plugin-from-source-code)
Once built, copy the architecture folder into the fpocket/plugins directory and make sure to declare this architecture in the ARCH variable in the Makefile. Finally run make again.
If you manage to build for other architectures and it works, I'd be happy to accept PR's with the relevant plugin architectures as I cannot build all of them on my own ;).
## Read next
* [Getting Started](GETTINGSTARTED.md)
* [Advanced Features](ADVANCED.md)

45
doc/INTRODUCTION.md Normal file

@@ -0,0 +1,45 @@
# Introduction
Thanks for taking the time to read this official user guide of fpocket. This guide presents the general functionality of the fpocket program and its derivatives, dpocket, tpocket and mdpocket. Yes, fpocket is indeed a package of four distinct programs. fpocket is an acronym for “finding” pockets; dpocket is an acronym for “describing” pockets, as it extracts physico-chemical descriptors of pockets; tpocket is an acronym for “testing” pockets, as it is used to test, on a large scale, scoring functions developed with fpocket for ranking protein cavities. mdpocket was named after pocket detection on molecular dynamics (MD) trajectories.
This is not a usual guide. You can find here elements you can find in usual user guides, but we included several examples in the getting started section, which should enhance fast understanding of how to work with fpocket. The getting started guide can be understood like a mini tutorial of basic functionality of this software.
Furthermore, we don't take ourselves too seriously, so the way this manual is written might not correspond to the industry standard ;)
## License & Copyright
This program is published under the MIT Licence. Basically do whatever you want with it.
Vincent Le Guilloux and Peter Schmidtke are the authors of fpocket, dpocket and tpocket (which perform protein cavity detection, cavity descriptor extraction and large scale cavity prediction evaluations). Peter Schmidtke is the author of mdpocket (which performs pocket detection and descriptor extraction on MD trajectories).
### Contributions
The initial fpocket software was developed, validated, documented and distributed by Vincent Le Guilloux & Peter Schmidtke. Both contributed equally to this project. The initial work on fpocket was initiated and supervised by Pierre Tufféry.
Latest extensions were developed, validated, documented and distributed by Peter Schmidtke (mdpocket, druggability score, energy calculations) supervised by Xavier Barril.
## Publication & Citation
The methods paper about this software was published in BMC Bioinformatics. In order to cite fpocket in the future, please cite this paper :
- Vincent Le Guilloux, Peter Schmidtke and Pierre Tuffery, “Fpocket: An open source platform for ligand pocket detection”, BMC Bioinformatics 2009, 10:168
If you use the druggability score of fpocket, please cite :
- Peter Schmidtke & Xavier Barril “Understanding and predicting druggability. A high-throughput method for detection of drug binding sites.”, J Med Chem, 2010, 53(15):5858-67
Last, the mdpocket paper has been published too and can be cited using:
- Peter Schmidtke, Axel Bidon-Chanal, Javier Luque, Xavier Barril, “MDpocket: open-source cavity detection and characterization on molecular dynamics trajectories.”, Bioinformatics. 2011 Dec 1;27(23):3276-85
## Contact
If you want to contact the fpocket developers please create a github issue here: https://github.com/Discngine/fpocket/issues
We welcome any constructive feedback, positive or negative.
## Read next
* [Installation](INSTALLATION.md)
* [Getting Started](GETTINGSTARTED.md)
* [Advanced Features](ADVANCED.md)

15
doc/MANUAL.md Normal file

@@ -0,0 +1,15 @@
# fpocket User Manual
fpocket is a protein pocket prediction algorithm. Given a PDB protein structure it enables the user to identify potential binding sites. Based on Voronoi tessellation, this algorithm is very fast and particularly well suited for large scale protein binding pocket screenings and development of scoring functions for binding pocket characterization. fpocket now also allows pocket detection on MD trajectories and assessment of the volume & the druggability of a binding site. Finally, interaction energy calculations are now also possible using fpocket & mdpocket.
## Notes
1. This program uses output coming from Qhull. Qhull is integrated within fpocket. More information about Qhull can be found in the paper : Barber, C.B., Dobkin, D.P., and Huhdanpaa, H.T., "The Quickhull algorithm for convex hulls," ACM Trans. on Mathematical Software, 22(4):469-483, Dec 1996, http://www.qhull.org
2. Part of this software includes code based on external code developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. The PDB parser of the Molfile Plugin of VMD was modified for the purposes of fpocket's PDB parsing. Furthermore, the molfile plugin now allows mdpocket to analyse various MD trajectory formats.
3. Within the whole documentation code and output from computer programs are represented and formatted in the following way : `ls -1 > out.txt`
4. This documentation, as well as the software itself, is under steady change. The fpocket developer team tries to provide useful and easy to understand documentation, something that is completely lacking in most scientific open source software nowadays. In our opinion, open source software is useless without documentation of the source code on one side and documentation of the software on the other. Thus, we welcome every suggestion to improve this documentation in terms of accuracy, clarity and completeness.
## Contents
* [Introduction](INTRODUCTION.md)
* [Installation](INSTALLATION.md)
* [Getting Started & Advanced Features](GETTINGSTARTED.md)
