.. Source: rdkit/Docs/Book/Overview.rst (commit 1ac48b6242, "docs update", Greg Landrum, 2013-07-05 04:55:38 +02:00)

An overview of the RDKit
%%%%%%%%%%%%%%%%%%%%%%%%

What is it?
===========

- Open source toolkit for cheminformatics
- BSD licensed
- Core data structures and algorithms in C++
- Python (2.x) wrapper generated using Boost.Python
- Java and C# wrappers generated with SWIG
- 2D and 3D molecular operations
- Descriptor generation for machine learning
- Molecular database cartridge for PostgreSQL
- Cheminformatics nodes for KNIME (distributed from the KNIME community site: http://tech.knime.org/community/rdkit)
- Operational:

  - http://www.rdkit.org
  - Supports Mac/Windows/Linux
  - Quarterly releases
  - Web presence:

    - Homepage (http://www.rdkit.org): documentation, links
    - GitHub (https://github.com/rdkit): bug tracker, git repository
    - SourceForge (http://sourceforge.net/projects/rdkit): mailing lists, downloads, SVN repository (not always up to date)
    - Google Code (http://code.google.com/p/rdkit/): downloads, wiki

  - Mailing lists at https://sourceforge.net/p/rdkit/mailman/, with searchable archives available for
    `rdkit-discuss <http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/>`_ and
    `rdkit-devel <http://www.mail-archive.com/rdkit-devel@lists.sourceforge.net/>`_

- History:

  - 2000-2006: Developed and used at Rational Discovery for building predictive models for ADME, Tox, and biological activity
  - June 2006: Open-source (BSD license) release of the software; Rational Discovery shuts down
  - To present: Open-source development continues, with use within Novartis and contributions from Novartis back to the open-source version

Functionality overview
======================

- Input/Output: SMILES/SMARTS, SDF, TDT, SLN [1]_, Corina mol2 [1]_
- “Cheminformatics”:

  - Substructure searching
  - Canonical SMILES
  - Chirality support (i.e. R/S or E/Z labeling)
  - Chemical transformations (e.g. remove matching substructures)
  - Chemical reactions
  - Molecular serialization (e.g. mol <-> text)

- 2D depiction, including constrained depiction
- 2D->3D conversion/conformational analysis via distance geometry
- UFF implementation for cleaning up structures
- Fingerprinting: Daylight-like, atom pairs, topological torsions, Morgan algorithm, “MACCS keys”, etc.
- Similarity/diversity picking
- 2D pharmacophores [1]_
- Gasteiger-Marsili charges
- Hierarchical subgraph/fragment analysis
- Bemis and Murcko scaffold determination
- RECAP and BRICS implementations
- Multi-molecule maximum common substructure [2]_
- Feature maps
- Shape-based similarity
- Molecule-molecule alignment
- Shape-based alignment (subshape alignment [3]_) [1]_
- Integration with PyMOL for 3D visualization
- Functional group filtering
- Salt stripping
- Molecular descriptor library:

  - Topological (κ3, Balaban J, etc.)
  - Compositional (Number of Rings, Number of Aromatic Heterocycles, etc.)
  - Electrotopological state (Estate)
  - clogP, MR (Wildman and Crippen approach)
  - “MOE like” VSA descriptors
  - Feature-map vectors [4]_

- Machine Learning:

  - Clustering (hierarchical)
  - Information theory (Shannon entropy, information gain, etc.)

- Tight integration with the `IPython <http://ipython.org>`_ notebook and qtconsole.

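The information-theory measures named under Machine Learning above (Shannon entropy, information gain) can be illustrated without RDKit itself. The sketch below is plain Python 3 and is not RDKit's implementation; the function names ``shannon_entropy`` and ``information_gain`` and the toy data are hypothetical, chosen only to show what the two quantities measure.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(labels, feature_values):
    """Return the entropy reduction from splitting `labels` by `feature_values`."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / total * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - remainder


# Hypothetical toy data: one activity label and two fingerprint bits per compound.
activity = ["active", "active", "inactive", "inactive"]
bit_informative = [1, 1, 0, 0]  # separates the two classes perfectly
bit_noise = [1, 0, 1, 0]        # unrelated to the class labels

gain_informative = information_gain(activity, bit_informative)
gain_noise = information_gain(activity, bit_noise)
```

A bit with high information gain is a useful candidate when selecting descriptors for a predictive model; a bit with zero gain says nothing about the labels in this data set.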
.. [1] These implementations are functional but are not necessarily the best, fastest, or most complete.
.. [2] Contribution from Andrew Dalke
.. [3] Putta, S., Eksterowicz, J., Lemmen, C. & Stanton, R. "A Novel Subshape Molecular Descriptor" *Journal of Chemical Information and Computer Sciences* **43:1623-35** (2003).
.. [4] Landrum, G., Penzotti, J. & Putta, S. "Feature-map vectors: a new class of informative descriptors for computational drug discovery" *Journal of Computer-Aided Molecular Design* **20:751-62** (2006).
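A few of the features listed above (SMILES input, canonical SMILES, substructure searching, molecular serialization, Daylight-like fingerprints) can be sketched with the Python wrapper. This is a minimal illustration, assuming an RDKit build importable as ``rdkit``; the example molecules (aspirin and salicylic acid) and the SMARTS query are choices made here for illustration.

```python
from rdkit import Chem
from rdkit import DataStructs

# Parse two different SMILES spellings of the same molecule (aspirin).
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
same_mol = Chem.MolFromSmiles("OC(=O)c1ccccc1OC(C)=O")

# Canonical SMILES: both spellings should yield the same output string.
canonical_a = Chem.MolToSmiles(mol)
canonical_b = Chem.MolToSmiles(same_mol)

# Substructure searching with a SMARTS query for a carboxylic acid group.
acid_query = Chem.MolFromSmarts("C(=O)[OH]")
has_acid = mol.HasSubstructMatch(acid_query)
acid_matches = mol.GetSubstructMatches(acid_query)  # tuples of atom indices

# Serialization: molecule -> MDL mol block text -> molecule.
mol_block = Chem.MolToMolBlock(mol)
round_trip = Chem.MolFromMolBlock(mol_block)

# Daylight-like fingerprints and Tanimoto similarity to salicylic acid.
salicylic_acid = Chem.MolFromSmiles("OC(=O)c1ccccc1O")
fp_aspirin = Chem.RDKFingerprint(mol)
fp_salicylic = Chem.RDKFingerprint(salicylic_acid)
similarity = DataStructs.TanimotoSimilarity(fp_aspirin, fp_salicylic)
```

``Chem.MolFromSmiles`` returns ``None`` when parsing fails, so code that reads untrusted input would normally check the result before using it.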
The Contrib Directory
=====================

The Contrib directory, part of the standard RDKit distribution, includes code that has been contributed by members of the community.

- **LEF**: Local Environment Fingerprints

  Contains Python source code from the publications:

  - A. Vulpetti, U. Hommel, G. Landrum, R. Lewis and C. Dalvit, "Design and NMR-based screening of LEF, a library of chemical fragments with different Local Environment of Fluorine" *J. Am. Chem. Soc.* **131** (2009) 12949-12959. http://dx.doi.org/10.1021/ja905207t
  - A. Vulpetti, G. Landrum, S. Ruedisser, P. Erbel and C. Dalvit, "19F NMR Chemical Shift Prediction with Fluorine Fingerprint Descriptor" *J. of Fluorine Chemistry* **131** (2010) 570-577. http://dx.doi.org/10.1016/j.jfluchem.2009.12.024

  Contribution from Anna Vulpetti.

- **M_Kossner**:

  Contains a set of pharmacophoric feature definitions as well as code for finding molecular frameworks.

  Contribution from Markus Kossner.

- **PBF**: Plane of best fit

  Contains C++ source code and sample data from the publication:

  N. C. Firth, N. Brown, and J. Blagg, "Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules" *Journal of Chemical Information and Modeling* **52** 2516-2525 (2012). http://pubs.acs.org/doi/abs/10.1021/ci300293f

  Contribution from Nicholas Firth.

- **mmpa**: Matched molecular pairs

  Python source and sample data for an implementation of the matched-molecular pair algorithm described in the publication:

  Hussain, J., & Rea, C. "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets." *Journal of Chemical Information and Modeling* **50** 339-348 (2010). http://dx.doi.org/10.1021/ci900450m

  Includes a fragment indexing algorithm from the publication:

  Wagener, M., & Lommerse, J. P. "The quest for bioisosteric replacements." *Journal of Chemical Information and Modeling* **46** 677-685 (2006).

  Contribution from Jameed Hussain.

License
=======

This document is copyright (C) 2013 by Greg Landrum.

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

The intent of this license is similar to that of the RDKit itself.
In simple words: “Do whatever you want with it, but please give us some credit.”