documentation update

Greg Landrum
2013-01-13 19:47:58 +00:00
parent 2cbcfd8529
commit 6ebc57b753
10 changed files with 5612 additions and 6 deletions


@@ -17,6 +17,44 @@ send them to the mailing list: rdkit-discuss@lists.sourceforge.net
(you will need to subscribe first)
Miscellaneous Topics
********************
Using a different aromaticity model
-----------------------------------
By default, the RDKit applies its own model of aromaticity (explained
in the RDKit Theory Book) when it reads in molecules. It is, however,
fairly easy to override this and use your own aromaticity model.
The easiest way to do this is to provide the molecules as SMILES with
the aromaticity set as you would prefer to have it. For example,
consider indole:
.. image:: images/indole1.png
By default the RDKit considers both rings to be aromatic:
>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('N1C=Cc2ccccc12')
>>> m.GetSubstructMatches(Chem.MolFromSmarts('c'))
((1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,))
If you'd prefer to treat the five-membered ring as aliphatic, which is
how the input SMILES is written, you just need to do a partial
sanitization that skips the kekulization and aromaticity perception
steps:
>>> m2 = Chem.MolFromSmiles('N1C=Cc2ccccc12',sanitize=False)
>>> Chem.SanitizeMol(m2,sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL^Chem.SanitizeFlags.SANITIZE_KEKULIZE^Chem.SanitizeFlags.SANITIZE_SETAROMATICITY)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> m2.GetSubstructMatches(Chem.MolFromSmarts('c'))
((3,), (4,), (5,), (6,), (7,), (8,))
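The ``sanitizeOps`` argument is a bitmask, and the ``^`` (XOR) expression above switches individual steps off. Here is a small pure-Python sketch of that arithmetic. The flag values are made up for illustration (the RDKit defines its own values in ``Chem.SanitizeFlags``), and ``without_steps`` is a hypothetical helper; only the bit logic carries over. It also shows why ``mask & ~step`` is the more defensive way to clear a step:

```python
# Hypothetical bit values, for illustration only: the RDKit defines its own
# values in Chem.SanitizeFlags, but the bit arithmetic is the same.
SANITIZE_CLEANUP = 0b0001
SANITIZE_PROPERTIES = 0b0010
SANITIZE_KEKULIZE = 0b0100
SANITIZE_SETAROMATICITY = 0b1000
SANITIZE_ALL = 0b1111


def without_steps(mask, *steps):
    """Return mask with each step's bit cleared (a no-op if already clear)."""
    for step in steps:
        mask &= ~step
    return mask


# XOR toggles bits, so it switches a step off only if it was on to begin with.
partial = SANITIZE_ALL ^ SANITIZE_KEKULIZE ^ SANITIZE_SETAROMATICITY
safe = without_steps(SANITIZE_ALL, SANITIZE_KEKULIZE, SANITIZE_SETAROMATICITY)
print(bin(partial), bin(safe))  # 0b11 0b11

# Removing kekulization a second time shows the difference:
print(bin(partial ^ SANITIZE_KEKULIZE))             # 0b111 (switched back on)
print(bin(without_steps(safe, SANITIZE_KEKULIZE)))  # 0b11 (stays off)
```

Since both steps being removed are part of ``SANITIZE_ALL``, the two forms give the same mask in the call above; they only diverge if a step is removed twice.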
It is, of course, also possible to write your own aromaticity
perception function, but that is beyond the scope of this document.
Manipulating Molecules
**********************


@@ -110,6 +110,15 @@ An alternate type of Supplier, the :api:`rdkit.Chem.rdmolfiles.ForwardSDMolSuppl
24
26
This means that they can be used to read from compressed files:
>>> import gzip
>>> inf = gzip.open('data/actives_5ht3.sdf.gz')
>>> gzsuppl = Chem.ForwardSDMolSupplier(inf)
>>> ms = [x for x in gzsuppl if x is not None]
>>> len(ms)
180
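A ``ForwardSDMolSupplier`` only reads its input front to back, which is, roughly, why a streamed file-like object such as the one returned by ``gzip.open()`` is enough. The pure-Python sketch below illustrates that forward-only pattern on an in-memory gzip stream. ``iter_sd_records`` is a hypothetical helper that just splits on the ``$$$$`` record terminator; it does no chemistry and is not how the RDKit parses SD files:

```python
import gzip
import io


def iter_sd_records(stream):
    """Yield one SD record (as text) at a time from a binary stream.

    Hypothetical helper: it only splits on the '$$$$' terminator line,
    reading the stream once, from front to back.
    """
    lines = []
    for raw in stream:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "$$$$":
            yield "\n".join(lines)
            lines = []
        else:
            lines.append(line)


# Build a tiny gzip-compressed "file" in memory with two placeholder records.
payload = "mol-1\n  body\n$$$$\nmol-2\n  body\n$$$$\n".encode("utf-8")
buf = io.BytesIO(gzip.compress(payload))

with gzip.open(buf) as inf:
    names = [rec.splitlines()[0] for rec in iter_sd_records(inf)]
print(names)  # ['mol-1', 'mol-2']
```

Like a ``ForwardSDMolSupplier``, the generator can only be consumed once; getting back to an earlier record means reopening the stream.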
Note that ForwardSDMolSuppliers cannot be used as random-access objects:
>>> fsuppl[0]
@@ -380,7 +389,6 @@ True
>>> ri.IsBondInRingOfSize(1,3)
True
Modifying molecules
===================
@@ -424,6 +432,7 @@ rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> m.GetBondWithIdx(0).GetBondType()
rdkit.Chem.rdchem.BondType.AROMATIC
The value returned by `SanitizeMol()` indicates that no problems were encountered.
Working with 2D molecules: Generating Depictions
================================================
@@ -548,6 +557,57 @@ The small overhead associated with python's pickling machinery normally doesn't
In a test I just ran on my laptop, loading a set of 699 drug-like molecules from an SD file took 10.8 seconds; loading the same molecules from a pickle file took 0.7 seconds.
The pickle file is also smaller, about 1/3 the size of the SD file, but this difference is not always so dramatic (this is a particularly fat SD file).
Drawing Molecules
=================
The RDKit has some built-in functionality for creating images from
molecules found in the :api:`rdkit.Chem.Draw` package:
>>> suppl = Chem.SDMolSupplier('data/cdk2.sdf')
>>> ms = [x for x in suppl if x is not None]
>>> for m in ms: tmp=AllChem.Compute2DCoords(m)
>>> from rdkit.Chem import Draw
>>> Draw.MolToFile(ms[0],'images/cdk2_mol1.png')
>>> Draw.MolToFile(ms[1],'images/cdk2_mol2.png')
Producing these images:
+----------------------------------+----------------------------------+
| .. image:: images/cdk2_mol1.png | .. image:: images/cdk2_mol2.png |
+----------------------------------+----------------------------------+
It's also possible to produce an image grid out of a set of molecules:
>>> img=Draw.MolsToGridImage(ms[:8],molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("_Name") for x in ms[:8]])
This returns a PIL image, which can then be saved to a file:
>>> img.save('images/cdk2_molgrid.png')
The result looks like this:
.. image:: images/cdk2_molgrid.png
These would of course look better if the common core were
aligned. This is easy enough to do:
>>> p = Chem.MolFromSmiles('[nH]1cnc2cncnc21')
>>> subms = [x for x in ms if x.HasSubstructMatch(p)]
>>> len(subms)
14
>>> AllChem.Compute2DCoords(p)
0
>>> for m in subms: AllChem.GenerateDepictionMatching2DStructure(m,p)
>>> img=Draw.MolsToGridImage(subms,molsPerRow=4,subImgSize=(200,200),legends=[x.GetProp("_Name") for x in subms])
>>> img.save('images/cdk2_molgrid.aligned.png')
The result looks like this:
.. image:: images/cdk2_molgrid.aligned.png
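The size of the PIL image returned by ``MolsToGridImage`` can be worked out in advance. This sketch does not call the RDKit; it assumes the grid is ``molsPerRow`` cells wide and just tall enough to hold every molecule, with each cell ``subImgSize`` pixels. ``grid_image_size`` is a hypothetical helper:

```python
def grid_image_size(n_mols, mols_per_row, sub_img_size):
    """Estimate the (width, height) in pixels of a molecule grid image.

    Assumes molsPerRow columns and as many rows as needed to fit n_mols.
    """
    # Integer ceiling division, so a partly filled last row still counts.
    n_rows = (n_mols + mols_per_row - 1) // mols_per_row
    width, height = sub_img_size
    return (mols_per_row * width, n_rows * height)


print(grid_image_size(8, 4, (200, 200)))   # (800, 400): the first grid
print(grid_image_size(14, 4, (200, 200)))  # (800, 800): the aligned grid
```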
Substructure Searching
**********************
@@ -601,7 +661,19 @@ False
>>> m.HasSubstructMatch(Chem.MolFromSmarts('COc')) #<- need an aromatic C
True
There's also functionality for using the substructure machinery for doing quick molecular transformations.
Chemical Transformations
************************
The RDKit contains a number of functions for modifying molecules. Note
that these transformation functions are intended to provide an easy
way to make simple modifications to molecules.
For more complex transformations, use the `Chemical Reactions`_ functionality.
Substructure-based transformations
==================================
There's a variety of functionality for using the RDKit's
substructure-matching machinery for doing quick molecular transformations.
These transformations include deleting substructures:
>>> m = Chem.MolFromSmiles('CC(=O)O')
@@ -659,8 +731,25 @@ This can be split into separate molecules using :api:`rdkit.Chem.rdmolops.GetMol
>>> Chem.MolToSmiles(rs[1],True)
'[5*]C(=O)O'
Murcko Decomposition
====================
The RDKit provides standard Murcko-type decomposition [#bemis1]_ of molecules
into scaffolds:
>>> from rdkit.Chem.Scaffolds import MurckoScaffold
>>> cdk2mols = Chem.SDMolSupplier('data/cdk2.sdf')
>>> m1 = cdk2mols[0]
>>> core = MurckoScaffold.GetScaffoldForMol(m1)
>>> Chem.MolToSmiles(core)
'c1nc2cncnc2[nH]1'
or into a generic framework:
>>> fw = MurckoScaffold.MakeScaffoldGeneric(core)
>>> Chem.MolToSmiles(fw)
'C1CC2CCCCC2C1'
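Comparing the two SMILES above, the generic framework keeps the ring connectivity of the scaffold while every atom becomes a carbon and every bond becomes a single bond. A toy sketch of that idea, written against a hypothetical ``(atoms, bonds)`` representation rather than an RDKit molecule, and not the ``MurckoScaffold`` implementation itself:

```python
def make_generic(atoms, bonds):
    """Toy generic framework: every atom becomes C and every bond single.

    atoms: list of element symbols; bonds: list of (i, j, order) tuples,
    with aromatic bonds written as order 1.5. This representation is
    hypothetical and only stands in for an RDKit Mol.
    """
    generic_atoms = ["C" for _ in atoms]
    generic_bonds = [(i, j, 1) for i, j, _order in bonds]
    return generic_atoms, generic_bonds


# Imidazole ring as a toy scaffold: N0-C1-N2-C3-C4, closed by C4-N0.
atoms = ["N", "C", "N", "C", "C"]
bonds = [(0, 1, 1.5), (1, 2, 1.5), (2, 3, 1.5), (3, 4, 1.5), (4, 0, 1.5)]
generic_atoms, generic_bonds = make_generic(atoms, bonds)
print(generic_atoms)  # ['C', 'C', 'C', 'C', 'C']
print(generic_bonds)  # [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 0, 1)]
```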
Maximum Common Substructure
@@ -1176,6 +1265,53 @@ rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(p0)
'c1ccccc1'
Advanced Reaction Functionality
===============================
Protecting Atoms
----------------
Sometimes, particularly when working with rxn files, it is difficult
to express a reaction precisely enough to avoid extraneous
products. The RDKit provides a method of "protecting" atoms to
prevent them from taking part in reactions.
This can be demonstrated by re-using the amide-bond formation reaction used
above. The query for amines isn't specific enough, so it matches any
nitrogen that has at least one H attached. So if we apply the reaction
to a molecule that already has an amide bond, the amide N is also
treated as a reaction site:
>>> rxn = AllChem.ReactionFromRxnFile('data/AmideBond.rxn')
>>> acid = Chem.MolFromSmiles('CC(=O)O')
>>> base = Chem.MolFromSmiles('CC(=O)NCCN')
>>> ps = rxn.RunReactants((acid,base))
>>> len(ps)
2
>>> Chem.MolToSmiles(ps[0][0])
'CC(=O)N(CCN)C(C)=O'
>>> Chem.MolToSmiles(ps[1][0])
'CC(=O)NCCNC(C)=O'
The first product corresponds to the reaction at the amide N.
We can prevent this from happening by protecting all amide Ns. Here we
do it with a substructure query that matches amides and thioamides and
then set the "_protected" property on matching atoms:
>>> amidep = Chem.MolFromSmarts('[N;$(NC=[O,S])]')
>>> for match in base.GetSubstructMatches(amidep):
... base.GetAtomWithIdx(match[0]).SetProp('_protected','1')
Now the reaction only generates a single product:
>>> ps = rxn.RunReactants((acid,base))
>>> len(ps)
1
>>> Chem.MolToSmiles(ps[0][0])
'CC(=O)NCCNC(C)=O'
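One way to picture the effect of the ``_protected`` property is as a filter applied to the candidate reactant matches before any products are built. The pure-Python sketch below assumes exactly that behavior (a match is discarded if any atom in it is protected); ``filter_protected`` is a hypothetical helper, not part of the RDKit API, and the atom indices are those of ``base``, ``CC(=O)NCCN``:

```python
def filter_protected(matches, protected):
    """Drop any match that includes a protected atom index.

    matches: tuples of atom indices, as a substructure search returns them.
    protected: set of atom indices carrying the '_protected' property.
    """
    return [m for m in matches if not protected.intersection(m)]


# Atom indices for CC(=O)NCCN: 0 C, 1 C, 2 O, 3 N (amide), 4 C, 5 C, 6 N (amine)
amine_matches = [(3,), (6,)]  # both N atoms carry at least one H
protected = {3}               # the amide N, flagged by the amide SMARTS query
print(filter_protected(amine_matches, protected))  # [(6,)]
```

One surviving match corresponds to the single product seen above.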
Recap Implementation
====================
@@ -1225,6 +1361,62 @@ The nodes themselves have associated molecules:
'[*]C(=O)CC'
BRICS Implementation
====================
The RDKit also provides an implementation of the BRICS
algorithm. [#degen]_ BRICS provides another
method for fragmenting molecules along synthetically accessible bonds:
>>> from rdkit.Chem import BRICS
>>> cdk2mols = Chem.SDMolSupplier('data/cdk2.sdf')
>>> m1 = cdk2mols[0]
>>> list(BRICS.BRICSDecompose(m1))
['[4*]CC(=O)C(C)C', '[14*]c1nc(N)nc2[nH]cnc21', '[3*]O[3*]']
>>> m2 = cdk2mols[20]
>>> list(BRICS.BRICSDecompose(m2))
['[3*]OC', '[1*]C(=O)NN(C)C', '[14*]c1[nH]nc2c1C(=O)c1c-2cccc1[16*]', '[5*]N[5*]', '[16*]c1ccc([16*])cc1']
Notice that the RDKit BRICS implementation returns the unique fragments
generated from a molecule and that the dummy atoms are tagged to
indicate which type of reaction applies.
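The tags show up in the fragment SMILES as isotope labels on the dummy atoms (``[14*]``, ``[3*]``, ...), so they can be pulled out with plain string processing. A small sketch using a regular expression; ``brics_labels`` is a hypothetical helper, and the fragments are copied from the output above:

```python
import re

# Matches isotope-labelled dummy atoms such as [14*] in a SMILES string;
# bracket atoms like [nH] are not matched because they lack the "digits*" form.
DUMMY_LABEL = re.compile(r"\[(\d+)\*\]")


def brics_labels(fragment_smiles):
    """Return the sorted dummy-atom labels found in a fragment SMILES."""
    return sorted(int(x) for x in DUMMY_LABEL.findall(fragment_smiles))


frags = ['[4*]CC(=O)C(C)C', '[14*]c1nc(N)nc2[nH]cnc21', '[3*]O[3*]']
for f in frags:
    print(f, brics_labels(f))
# [4*]CC(=O)C(C)C [4]
# [14*]c1nc(N)nc2[nH]cnc21 [14]
# [3*]O[3*] [3, 3]
```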
It's quite easy to generate the list of all fragments for a
group of molecules:
>>> allfrags=set()
>>> for m in cdk2mols:
... pieces = BRICS.BRICSDecompose(m)
... allfrags.update(pieces)
>>> len(allfrags)
90
>>> list(allfrags)[:5]
['[4*]CC[NH3+]', '[14*]c1cnc[nH]1', '[16*]c1cc([16*])c2c3c(ccc2F)NC(=O)c31', '[16*]c1ccc([16*])c(Cl)c1', '[15*]C1CCCC1']
The BRICS module also provides an option to apply the BRICS rules to a
set of fragments to create new molecules:
>>> import random
>>> random.seed(127)
>>> fragms = [Chem.MolFromSmiles(x) for x in allfrags]
>>> ms = BRICS.BRICSBuild(fragms)
The result is a generator object:
>>> ms
<generator object BRICSBuild at 0x...>
That returns molecules on request:
>>> prods = [next(ms) for x in range(10)]
>>> Chem.MolToSmiles(prods[0],True)
'O=[N+]([O-])c1ccc(C2CCCO2)cc1'
>>> Chem.MolToSmiles(prods[1],True)
'c1ccc(C2CCCO2)cc1'
>>> Chem.MolToSmiles(prods[2],True)
'NS(=O)(=O)c1ccc(C2CCCO2)cc1'
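Calling ``next()`` ten times in a list comprehension raises ``StopIteration`` if the generator runs dry before ten products are made. A slightly more defensive way to take the first few items from a generator, ``BRICSBuild`` included, is ``itertools.islice``, which simply returns fewer items in that case. Here it is on a hypothetical stand-in generator so that the output is predictable:

```python
import itertools


def numbered():
    """Hypothetical stand-in for BRICSBuild: endless placeholder products."""
    n = 0
    while True:
        yield "product-%d" % n
        n += 1


gen = numbered()
first_ten = list(itertools.islice(gen, 10))  # take at most 10 items
print(len(first_ten), first_ten[0], first_ten[-1])  # 10 product-0 product-9
next_one = next(gen)  # the generator resumes right where islice stopped
print(next_one)  # product-10
```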
Chemical Features and Pharmacophores
************************************
@@ -1823,7 +2015,9 @@ These are adapted from the definitions in Gobbi, A. & Poppinger, D. “Genetic o
.. [#nilakantan] Nilakantan, R.; Bauman, N.; Dixon, J.S.; Venkataraghavan, R. “Topological Torsion: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors.” *J. Chem. Inf. Comp. Sci.* **27**:82-5 (1987).
.. [#rogers] Rogers, D.; Hahn, M. “Extended-Connectivity Fingerprints.” *J. Chem. Inf. and Model.* **50**:742-54 (2010).
.. [#ashton] Ashton, M. et al. “Identification of Diverse Database Subsets using Property-Based and Fragment-Based Molecular Descriptions.” *Quantitative Structure-Activity Relationships* **21**:598-604 (2002).
.. [#bemis1] Bemis, G. W.; Murcko, M. A. “The Properties of Known Drugs. 1. Molecular Frameworks.” *J. Med. Chem.* **39**:2887-93 (1996).
.. [#lewell] Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. “RECAP-Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry.” *J. Chem. Inf. Comp. Sci.* **38**:511-22 (1998).
.. [#degen] Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A.; Rarey, M. “On the Art of Compiling and Using Drug-Like Chemical Fragment Spaces.” *ChemMedChem* **3**:1503-7 (2008).
.. [#gobbi] Gobbi, A.; Poppinger, D. “Genetic optimization of combinatorial libraries.” *Biotechnology and Bioengineering* **61**:47-54 (1998).
.. [#rxnsmarts] A more detailed description of reaction SMARTS, as defined by the RDKit, is in the :doc:`RDKit_Book`.


@@ -48,9 +48,9 @@ copyright = u'2012, Greg Landrum'
# built documents.
#
# The short X.Y version.
-version = '2012.09'
+version = '2012.12'
# The full version, including alpha/beta/rc tags.
-release = '2012.09.1'
+release = '2012.12.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.

Binary file not shown.

5374
Docs/Book/data/cdk2.sdf Executable file

File diff suppressed because it is too large

Binary file not shown (image, 8.2 KiB).

Binary file not shown (image, 11 KiB).

Binary file not shown (image, 80 KiB).

Binary file not shown (image, 43 KiB).

Binary file not shown (image, 6.4 KiB).