mirror of
https://github.com/rdkit/rdkit.git
synced 2026-06-05 22:04:27 +08:00
1096 lines
42 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"RDKit Enumeration Toolkit \n",
|
|
"=========================\n",
|
|
"\n",
|
|
"RDKit Reaction Enumeration Toolkit tutorial.\n",
|
|
"\n",
|
|
"Here you will learn how to enumerate reactions with various building blocks."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from __future__ import print_function\n",
|
|
"from rdkit.Chem import AllChem\n",
|
|
"from rdkit.Chem import rdChemReactions\n",
|
|
"from rdkit.Chem.AllChem import ReactionFromRxnBlock, ReactionToRxnBlock\n",
|
|
"from rdkit.Chem.Draw import IPythonConsole\n",
|
|
"IPythonConsole.ipython_useSVG=True"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "raw",
|
|
"metadata": {},
|
|
"source": [
"The first thing we need is a reaction. Let's do a simple enumeration here: a halogen and a primary amine."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {
|
|
"collapsed": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
"rxn_data = \"\"\"$RXN\n",
"\n",
"      ISIS     090220091539\n",
"\n",
"  2  1\n",
"$MOL\n",
"\n",
"  -ISIS-  09020915392D\n",
"\n",
"  2  1  1  0  0  0  0  0  0  0999 V2000\n",
"   -2.0744    0.1939    0.0000 L   0  0  0  0  0  0  0  0  0  0  0  0\n",
"   -2.5440   -0.1592    0.0000 R#  0  0  0  0  0  0  0  0  0  1  0  0\n",
"  1  2  1  0  0  0  0\n",
"  1 F    2  17  35\n",
"V    1 halogen\n",
"M  RGP  1   2   1\n",
"M  ALS   1  2 F Cl  Br  \n",
"M  END\n",
"$MOL\n",
"\n",
"  -ISIS-  09020915392D\n",
"\n",
"  2  1  0  0  0  0  0  0  0  0999 V2000\n",
"    2.8375   -0.2500    0.0000 R#  0  0  0  0  0  0  0  0  0  2  0  0\n",
"    3.3463    0.0438    0.0000 N   0  0  0  0  0  0  0  0  0  3  0  0\n",
"  1  2  1  0  0  0  0\n",
"V    2 amine.primary\n",
"M  RGP  1   1   2\n",
"M  END\n",
"$MOL\n",
"\n",
"  -ISIS-  09020915392D\n",
"\n",
"  3  2  0  0  0  0  0  0  0  0999 V2000\n",
"   13.5792    0.0292    0.0000 N   0  0  0  0  0  0  0  0  0  3  0  0\n",
"   14.0880    0.3229    0.0000 R#  0  0  0  0  0  0  0  0  0  1  0  0\n",
"   13.0704    0.3229    0.0000 R#  0  0  0  0  0  0  0  0  0  2  0  0\n",
"  1  2  1  0  0  0  0\n",
"  1  3  1  0  0  0  0\n",
"M  RGP  2   2   1   3   2\n",
"M  END\"\"\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAACWCAYAAAACG/YxAAAFd0lEQVR4nO3dbXLaOgCGUelOd9QF\nWllT15E1uT9y3RjjL+AlfPicmcwUI1KmBPtBVtza931fAACI+e/RTwAA4N0ILACAMIEFABAmsAAA\nwgQWAECYwAIACBNYAABhAgsAIExgAQCECSwAgDCBBQAQJrAAAMIEFgBAmMACAAgTWAAAYQILACBM\nYAEAhAksAIAwgQUAECawAADCBBYAQJjAAgAIE1gAAGECCwAgTGABAIQJLACAMIEFABAmsAAAwgQW\nAECYwAIACBNYAABhAgsAIExgAQCECSwAgDCBBQAQJrAAAMIEFgBAmMACAAgTWAAAYQILACBMYAEA\nhAksAIAwgQUAECawAADCBBYAQJjAAgAIE1gAAGECCwAgTGABAIQJLACAMIEFABAmsAAAwgQWAECY\nwAIACBNYAABhAgsAIExgAQCECSwAgDCBBQAQJrAAAMIEFgBAmMACAAgTWAAAYQILACBMYAHwUmqt\nj34KsElgAfBS+r4XWTw9gQXAyxFZPDuBdYX6WU++pvfBPbTWHv0U4KmIrNd0lJes9n3fP/pJvJL6\nWUv/uz/b1v3pSmtt9n64RK2ljN+VQ1h9fLSy9G4d77C8ozmaWmtxKFs39wFtum18e+0D3dq4tb9n\n2E/1/de2PR8a9457Rr8e/QReyVI8DXEFKbWW0nWtlDIOrOWx42PL9Da8u2EmS2Stm4uhYds0ZJbC\nZs+46e3zsLr8ub4igXWD8Q8mpK3NWI05poDIutXe49glx7shrLqunR0vt77POxxfrcG60lDtr/zi\n83zGn/T2jJtuM3vFkVmTlTE3S7Vn3PS+vr9sf/Rux1MzWFcQVqSNw2r657kd1No2kcWRmclatne9\n1LXjpuMvnYV6t+OqwLpQ/aylb9645KwF0VpkAfNE1rytWalbFp4vfe93i6ZLOEW409rlF1yagVvc\negxwRuRcrdXXwb+GnwPmrS1WX/tnW3rc3kXxR2IGa4fpbw9Og8plGRhb2sks3V7b+bTWSteVUuvy\nGM6ZuaBWM1jXmi5VKGV+Jr21tuvyMV13n+f57A4fWFsHwyGuxhW+FlTdn66U3+EnycuZC6rxeoSl\nT47Ljzn9VDne2Y13hsNtODJxdZ3xfqjrvm5P9zvD9u9xrZTyfXs8dngJhv3X1kvybrNdh7/Q6NK5\n5LnFeVszDXvG8f62fqb2jL9lHByZuLqP+v+1+aYzVnvC6ZJx78QarA0OftzT3GzWnnHAOXF1X19L\nFpb3RXNrt+qBLx9z+FOEt3DQY8namqvxtmvHAafE1c/4Pm3Yhi3/7nP5mFMCq+w7GO55HAy2ZqXM\njEKOuPpZX+s+h1OFrcytwcIpwlKKq7JzX3sXtW89Djgnrh7r46PNbneFDIF1xkEN4DWIq8eZ/vYy\n55wihAdYOi29tehd/MMXcfV408hy+ZhTAusGZruYs7Tmau3+S8fBkYmrn7f233ktjTn6S3T462AB\nAKRZgwUAECawAADCBBYAQJjAAgAIE1gAAGECCwAgTGABAIQJLACAMIEFABAmsAAAwgQWAECYwAIA\nCBNYAABhAgsAIExgAQCECSwAgDCBBQAQJrAAAMIEFgBAmMACAAgTWAAAYQILACBMYAEAhAksAIAw\ngQUAECawAADCBBYAQJjAAgAIE1gAAGECCwAgTGABAIQJLACAMIEFABAmsAAAwgQWAECYwAIACBNY\nAABhAgsAIExgAQCECSwAgDCBBQAQJrAAAMIEFgBAmMACAAgTWAAAYQILACBMYAEAhAksAIAwgQUA\nECawAADCBBYAQJjAAgAIE1gAAGECCwAgTGABAIQJLACAMIEFABAmsAAAwgQWAECYwAIACBNYAABh\nAgsAIExgAQCE/QWlzQdz1MzSTQAAAABJRU5E
rkJggg==\n",
|
|
"text/plain": [
|
|
"<rdkit.Chem.rdChemReactions.ChemicalReaction at 0x10b867590>"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"rxn = ReactionFromRxnBlock(rxn_data)\n",
|
|
"rxn"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Sanitizing Reaction Blocks\n",
|
|
"==========================\n",
|
|
"\n",
"Reaction blocks come from many different sketchers, and some don't follow the MDL conventions very well. It is always a good idea to sanitize your reaction blocks first. This is also true for SMILES reactions if they are in Kekulé form."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"AllChem.SanitizeRxn(rxn)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Preprocessing Reaction Blocks\n",
|
|
"=============================\n",
|
|
"\n",
|
|
"You will note that there are some special annotations in the reaction block:\n",
|
|
" \n",
|
|
" V 1 halogen\n",
|
|
" V 2 amine.primary\n",
|
|
" \n",
"These allow us to specify functional groups with very specific SMARTS patterns. \n",
"These SMARTS patterns are preloaded into the RDKit, but require the use of PreprocessReaction\n",
"to embed the patterns."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Number of warnings: 0\n",
|
|
"Number of preprocessing errors: 0\n",
|
|
"Number of reactants in reaction: 2\n",
|
|
"Number of products in reaction: 1\n",
|
|
"Preprocess labels added: (((0, 'halogen'),), ((1, 'amine.primary'),))\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"rxn.Initialize()\n",
|
|
"nWarn, nError, nReactants, nProducts, labels = AllChem.PreprocessReaction(rxn)\n",
|
|
"print (\"Number of warnings:\", nWarn)\n",
|
|
"print (\"Number of preprocessing errors:\", nError)\n",
|
|
"print (\"Number of reactants in reaction:\", nReactants)\n",
|
|
"print (\"Number of products in reaction:\", nProducts)\n",
|
|
"print (\"Preprocess labels added:\", labels)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"So now, this scaffold will only match the specified halogens and a primary amine. Let's get some!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"--2016-11-05 09:31:09-- http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Aldrich/General_Information/1/sdf-benzylic-primary-amines.sdf\n",
|
|
"Resolving usca-proxy01.na.novartis.net... 160.62.237.221\n",
|
|
"Connecting to usca-proxy01.na.novartis.net|160.62.237.221|:2011... connected.\n",
|
|
"Proxy request sent, awaiting response... 200 OK\n",
|
|
"Length: 165130 (161K) [chemical/x-mdl-sdfile]\n",
|
|
"Saving to: 'amines.sdf'\n",
|
|
"\n",
|
|
"amines.sdf 100%[=====================>] 161.26K 759KB/s in 0.2s \n",
|
|
"\n",
|
|
"2016-11-05 09:31:09 (759 KB/s) - 'amines.sdf' saved [165130/165130]\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!wget http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Aldrich/General_Information/1/sdf-benzylic-primary-amines.sdf -O amines.sdf"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"--2016-11-05 09:31:09-- http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Aldrich/General_Information/1/sdf-alkyl-halides.sdf\n",
|
|
"Resolving usca-proxy01.na.novartis.net... 160.62.237.221\n",
|
|
"Connecting to usca-proxy01.na.novartis.net|160.62.237.221|:2011... connected.\n",
|
|
"Proxy request sent, awaiting response... 200 OK\n",
|
|
"Length: 149722 (146K) [chemical/x-mdl-sdfile]\n",
|
|
"Saving to: 'halides.sdf'\n",
|
|
"\n",
|
|
"halides.sdf 100%[=====================>] 146.21K 634KB/s in 0.2s \n",
|
|
"\n",
|
|
"2016-11-05 09:31:10 (634 KB/s) - 'halides.sdf' saved [149722/149722]\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!wget http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Aldrich/General_Information/1/sdf-alkyl-halides.sdf -O halides.sdf"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"reagents = [\n",
|
|
" [x for x in AllChem.SDMolSupplier(\"halides.sdf\")],\n",
|
|
" [x for x in AllChem.SDMolSupplier(\"amines.sdf\")]\n",
|
|
" ]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"number of reagents per template: [149, 131]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print (\"number of reagents per template:\", [len(x) for x in reagents])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Basic Usage\n",
|
|
"===========\n",
|
|
"\n",
|
|
"Creating a library for enumeration\n",
|
|
"----------------------------------\n",
|
|
"\n",
"Using the enumerator is simple: supply the desired reaction and reagents. The library filters away non-matching reagents by default. The RDKit will log any removed reagents to the info log."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"RDKit INFO: [09:31:10] Removed 37 non matching reagents at template 0\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"library = rdChemReactions.EnumerateLibrary(rxn, reagents)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"If you only want each reactant to match once (and hence only produce one product per reactant set) you can adjust the parameters:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"RDKit INFO: [09:31:10] Removed 38 non matching reagents at template 0\n",
|
|
"RDKit INFO: [09:31:10] Removed 11 non matching reagents at template 1\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"params = rdChemReactions.EnumerationParams()\n",
|
|
"params.reagentMaxMatchCount = 1\n",
|
|
"library = rdChemReactions.EnumerateLibrary(rxn, reagents, params=params)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Enumerating the library\n",
|
|
"-----------------------\n",
|
|
"\n",
|
|
"A library has an enumerator that determines what reagents are selected for purposes of enumeration.\n",
"The default enumerator is a CartesianProduct enumerator, which is a fancy way of saying enumerate everything. You can get hold of this enumerator by using the **GetEnumerator** method."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"<rdkit.Chem.rdChemReactions.CartesianProductStrategy object at 0x10d9f7de0>\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"enumerator = library.GetEnumerator()\n",
|
|
"print (enumerator)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Possible number of permutations: 13320\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print (\"Possible number of permutations:\", enumerator.GetNumPermutations())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Understanding results of enumerations\n",
|
|
"-------------------------------------\n",
|
|
"\n",
"Each enumeration result may contain multiple resulting molecules. Consider a reaction set up as follows:\n",
"\n",
"    A + B >> C + D\n",
"    \n",
"There may be multiple result molecules for a number of reasons:\n",
"\n",
" 1. The reactant templates (A and B) match a reagent multiple times.\n",
"    Each match has to be analyzed to form a new product. Hence,\n",
"    the result has to be a vector of products.\n",
" 2. There may be multiple product templates, i.e. C+D as shown above\n",
"    where C and D are two different result templates. These are\n",
"    output in a result as follows:\n",
"    \n",
"        result = next(library)\n",
"        \n",
"        result == [ [results_from_product_template1], \n",
"                    [results_from_product_template2], ... ]\n",
"        \n",
"        result[0] == [results_from_product_template1]\n",
"        result[1] == [results_from_product_template2]\n",
"        \n",
"\n",
"    \n",
"Because there may be multiple product templates specified with\n",
"potentially multiple matches, iterating through the results to\n",
"get to the final molecules is a bit complicated and requires three loops. Here we use:\n",
"\n",
" * **result** for the result of reacting one set of reagents\n",
" * **productSet** for the products for a given product template\n",
" * **mol** for the actual product\n",
" \n",
"In many reactions, this will result in a single molecule, but the\n",
"data structures have to handle the full set of results:\n",
" \n",
"```\n",
"    for result in library:\n",
"        for productSet in result:\n",
"            for mol in productSet:\n",
"                ...  # use mol here\n",
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Number of result sets 13320\n",
|
|
"Number of result molecules 13320\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
"count = 0\n",
"totalMols = 0\n",
"for results in library:\n",
"    for productSet in results:\n",
"        for mol in productSet:\n",
"            totalMols += 1\n",
"    count += 1\n",
"print(\"Number of result sets\", count)\n",
"print(\"Number of result molecules\", totalMols)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"*Note: the productSet above may be empty if one of the current reagents did\n",
|
|
"not match the reaction!*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"*Note: the number of permutations is not the same as the number of molecules. There may be more or fewer, depending on how many times each reagent matched the template, or whether it matched\n",
"at all.*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"How does the enumerator work?\n",
|
|
"=============================\n",
|
|
"\n",
"As mentioned, you can get hold of the current enumeration scheme using the **GetEnumerator** method. Let's make a copy of this enumerator using copy.copy(..) so that we don't change the state of the library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"<rdkit.Chem.rdChemReactions.CartesianProductStrategy object at 0x10d9f7e50>\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import copy\n",
|
|
"enumerator = copy.copy(library.GetEnumerator())\n",
|
|
"print(enumerator)\n",
|
|
"test_enumerator = copy.copy(enumerator)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Let's play with this enumerator.\n",
|
|
"\n",
|
|
"First: let's understand what the position means (this is the same as **library.GetPosition**)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[110L, 119L]"
|
|
]
|
|
},
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"list(test_enumerator.GetPosition())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
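"What this means is: make the product from the reagents at positions 110 and 119 of the first and second reagent lists. These positions index the reagents the library kept after filtering, so they need not match indices into the original reagents lists; the cells below look at reagents[0][111] and reagents[1][130]."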
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAADPUlEQVR4nO3dQW4iMRRF0dDqHYX9\nr6BZEz2IFEVJQMBz2d/2OUOkSDUgV68ohE/X6/UNgFf9GX0BAHOTUYCIjAJEZBQgIqMAERkFiMgo\nQERGASIyChCRUYCIjAJEZBQgIqMAERkFiMgoQERGASJ/R1/Avs6Xy7dX/r2/D7kSICGjI+kmLEBG\ni/q5VT9dzuc7f+hUGOhMRou6N1TvhvJ0Oikp9CSjI32dnG7wYVIyOtIR6bxerwYp9OQLTwARGV3Q\nxyAdfRWwCxldk5JCNz5EW5ZPSKEPa3RZBin0IaMAERldmUEKHcjo4pQUjiajABEZXZ9BCoeSUYCI\njG7BIIXjyChAREZ3YZDCQWR0I0oKR5BRgIiM7sUgheZkFCDit9R21HyQehexMxmlAb9tys7c1ANE\nZJQGPLliZzIKEJFR2jBI2ZaMAkRklGYMUvYkowARGaUlg5QNyShAREZpzCBlNzJKe0rKVmQUICKj\nHMIgZR8yChCRUY5ikLIJGQWIyCgHMkjZgYwCRGSUYxmkLE9GASJOIqMHZ5GyMBllSs4ipQ439QAR\nGWVKnlxRh4wCRGSUWRmkFCGjABEZZWIGKRXIKHNTUoaTUYCIjDI9g5SxZBQgIqOswCBlIBkFiMgo\nizBIGUVGASIyyjoMUoaQUZaipPQnowARGWU1BimdyShAREZZkEFKTzIKEHG8IstqO0j9p3CLjMJD\nHOnMLW7qASIyCg/x2IpbZBQgIqPwKIOUX8koPEFJ+UlGASIyCs8xSPlGRgEiMgpPM0j5SkbhFUrK\nJxkFiMgovMgg5YOMAkRkFF5nkPImowAhGYWIQYqMQkpJNyejABEZhQYM0p3JKEDEKV3QjLNI9ySj\nUJSzSGfhph4gIqNQlMdWs5BRgIiMQl0G6RRkFEpT0vpkFCAio1CdQVqcjAJEZBQmYJBWJqMwByUt\nS0YBIjIK0zBIa5JRgIiMwkwM0oJkFCAiozAZg7QaGYX5KGkpMgoQkVGYkkFah4wCRGQUZmWQFiGj\nMDFHh1YgowARGQWIyChAREYBIjIKEJFRgIiMAkRkFCAiowARGQWIyChAREYBIjIKEJFRgIiMAkRk\nFCAiowARGQWI/AeU+fmfzKiTyQAAAABJRU5ErkJggg==\n",
|
|
"image/svg+xml": [
|
|
"<?xml version='1.0' encoding='iso-8859-1'?>\n",
|
|
"<svg version='1.1' baseProfile='full'\n",
|
|
" xmlns:svg='http://www.w3.org/2000/svg'\n",
|
|
" xmlns:rdkit='http://www.rdkit.org/xml'\n",
|
|
" xmlns:xlink='http://www.w3.org/1999/xlink'\n",
|
|
" xml:space='preserve'\n",
|
|
"width='450px' height='150px' >\n",
|
|
"<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='450' height='150' x='0' y='0'> </rect>\n",
|
|
"<path d='M 118.452,6.81818 131.311,6.81818' style='fill:none;fill-rule:evenodd;stroke:#33CCCC;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 131.311,6.81818 144.17,6.81818' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 144.17,6.81818 159.916,34.2169' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 159.916,34.2169 191.409,34.2169' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 191.409,34.2169 207.155,61.3007' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 207.155,61.3007 238.648,61.3007' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 238.648,61.3007 254.394,88.6993' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 254.394,88.6993 285.887,88.6993' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 285.887,88.6993 301.634,115.783' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 301.634,115.783 333.126,115.783' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 333.126,115.783 348.873,143.182' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<text x='106.902' y='14.6914' style='font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#33CCCC' ><tspan>F</tspan></text>\n",
|
|
"</svg>\n"
|
|
],
|
|
"text/plain": [
|
|
"<rdkit.Chem.rdchem.Mol at 0x10d9f32f0>"
|
|
]
|
|
},
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"reagents[0][111]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAD1klEQVR4nO3dwU7bUBRFUVzx4fnz\ndJC2orSgluN778vzWkIMEBIeRJvjxHGO+/3+AsBXfZs+AIDnJqMAERkFiMgoQERGASIyChCRUYCI\njAJEZBQgIqMAERkFiMgoQERGASIyChCRUYCIjAJEZJST3W63X9/hCg53v+d0t9tNRrkOa5ST3X6a\nPhBoYo0CRKxRgIiMAkRkFCAiowARGQWIyChAREYBIjIKEJFROnhTExuTUTp4eygbk1GAiIzSxCBl\nVzIKEJFR+hikbMmN8mh1HB5y7MYapdX9fj+OY/oo4EwyChCRUboZpGxGRhmgpOxERgEiXjZlzLmD\n1COZKa/TB8BFnX7lk0upmOKkHiAiowyoWI5etmKKjAJEZJRudU9iGqSMkFFW8V8FlEvWIaO0+miK\n/u9E/Wh4GqT0k1H69FyTpKQ0k1HmfS2vcskiZJQmnZfHKyydZJRhSV7lkhXIKB3636mpsLSRUSbl\neZVLxsko5aZuGqKw9JBRxpyVV7lkloxS66zr7T/nanwGySiFVrgHqJJSTUYZ4EZ57ERGqbLCFH1Q\nWErJKN3cKI/NyCgl1pmiDwpLHRmlVXVe5ZJ+Msr5VpuiDwpLERnlfB8Fa/Z+o2vGnQ3IKH06b5TX\n84fgRUYpstoZtClKHRkFiMgoVdYZpKYopWSUQiuUVEOpJqMAERml1uwgNUVpIKMAERml3NQgNUXp\nIaN0WOG1Jigio+zJFKWNjNKkc5BqKJ1epw+Aq8vbqpjM8k+bVg070RSlmZN6gIiM0qr6GVJTlH4y\nSre6kmooI2QUICKjDKgYpKYoU2QUIOIfOGPOHaQeyUyRUTbhpJ4pTuoBIjLKJtxEiikyChCRUfZh\nkDJCRtmKktJPRgEiMspuDFKayShAREbZkEFKJxllT0pKGxkFiMgo2zJI6SGjABEZZWcGKQ1kFCAi\no2zOIKWajLI/JaWUjAJEZJRLMEipI6MAERnlKgxSisgoV+GjQykiowARGeUSTFHqyChAREbZnylK\nKRnlCjSUQjLK5o7jxRKllIwCRGSUnZmiNJBRtqWh9JBRgIiMsidTlDYyyp40lDYyChCRUYCIjAJE\nZBQgIqMAERkFiMgoC/nzo5Le/uQ4fnz94+9DDxnlOTwup398fV5SaCajPIePLqd/V1XoJ6M8Ge/y\nZDWv0wcAv/l8Wv61oY9Bqq1MkVHW8q6G754G/fzUXkkZ4aSe56CSLEtG2YTXmpjipJ6n8baSf12m\nSsoIn98NEHFSDxCRUYCIjAJEZBQgIqMAERkFiMgoQERGASIyChCRUYCIjAJEZBQg8h0jx3spAcHh\ngQAAAABJRU5ErkJggg==\n",
|
|
"image/svg+xml": [
|
|
"<?xml version='1.0' encoding='iso-8859-1'?>\n",
|
|
"<svg version='1.1' baseProfile='full'\n",
|
|
" xmlns:svg='http://www.w3.org/2000/svg'\n",
|
|
" xmlns:rdkit='http://www.rdkit.org/xml'\n",
|
|
" xmlns:xlink='http://www.w3.org/1999/xlink'\n",
|
|
" xml:space='preserve'\n",
|
|
"width='450px' height='150px' >\n",
|
|
"<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='450' height='150' x='0' y='0'> </rect>\n",
|
|
"<path d='M 198.915,13.3462 204.792,23.7815' style='fill:none;fill-rule:evenodd;stroke:#7F7F7F;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 204.792,23.7815 210.67,34.2169' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 210.67,34.2169 194.923,61.6156' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 213.223,41.1513 202.2,60.3304' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 210.67,34.2169 242.163,34.2169' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 194.923,61.6156 210.67,88.6993' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 210.67,88.6993 242.163,88.6993' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 215.394,83.0307 237.439,83.0307' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 242.163,88.6993 257.909,61.6156' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 242.163,88.6993 257.909,116.098' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 257.909,61.6156 242.163,34.2169' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 250.632,60.3304 239.61,41.1513' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 257.909,116.098 252.553,125.31' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 252.553,125.31 247.198,134.521' style='fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<text x='191.562' y='14.6914' style='font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#7F7F7F' ><tspan>*</tspan></text>\n",
|
|
"<text x='225.887' y='151.842' style='font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#0000FF' ><tspan>NH</tspan><tspan style='baseline-shift:sub;font-size:11.25px;'>2</tspan><tspan></tspan></text>\n",
|
|
"</svg>\n"
|
|
],
|
|
"text/plain": [
|
|
"<rdkit.Chem.rdchem.Mol at 0x10d9f7d70>"
|
|
]
|
|
},
|
|
"execution_count": 18,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"reagents[1][130]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"This also appears to be the last product. So let's start over."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"RDKit INFO: [09:31:11] Removed 38 non matching reagents at template 0\n",
|
|
"RDKit INFO: [09:31:11] Removed 11 non matching reagents at template 1\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[0L, 0L]"
|
|
]
|
|
},
|
|
"execution_count": 19,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"library = rdChemReactions.EnumerateLibrary(rxn, reagents, params=params)\n",
|
|
"test_enumerator = copy.copy(library.GetEnumerator())\n",
|
|
"list(test_enumerator.GetPosition())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"We can skip to the 100th result:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[99L, 0L]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"test_enumerator.Skip(100)\n",
|
|
"pos = list(test_enumerator.GetPosition())\n",
|
|
"print(pos)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 21,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAACjklEQVR4nO3cMWrDQBBA0WzIjZz7\n3yBn2hQBkziosL7Bu/i9SrgQKuzPSCM85pxvAJz1/uwLANibjAIkMgqQyChAIqMAiYwCJDIKkMgo\nQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJ\njAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIK\nkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDIKEAi\nowDJx7MvgF2Nr3E9npd5/fB6DC9CRjnjJpdH9fyd2j8+D888pwqzGRnlbv+jeTSBHk6mx6kcYygp\ne/FsFCCRUdYy5xzj4FEALElGARIZZTkGUvYio9xtXubNCv5wIw8vwKaeM25K+vB3RX8GUit7tuCb\nyrqUlC24qQdIZJR12TWxBRkFSGSUpRlIWZ+MAiQyyuoMpCxORgESGWUDBlJWJqPsQUlZlowCJDLK\nNgykrElGARIZZScGUhYkowCJPyJjP48dSP0EiGQUIHFTD5DIKEAiowCJjAIkMgqQyChAIqMAiYwC\nJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDI\nKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMA\niYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQy\nCpDIKEDyDQfuWIMgl0pDAAAAAElFTkSuQmCC\n",
|
|
"image/svg+xml": [
|
|
"<?xml version='1.0' encoding='iso-8859-1'?>\n",
|
|
"<svg version='1.1' baseProfile='full'\n",
|
|
" xmlns:svg='http://www.w3.org/2000/svg'\n",
|
|
" xmlns:rdkit='http://www.rdkit.org/xml'\n",
|
|
" xmlns:xlink='http://www.w3.org/1999/xlink'\n",
|
|
" xml:space='preserve'\n",
|
|
"width='450px' height='150px' >\n",
|
|
"<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='450' height='150' x='0' y='0'> </rect>\n",
|
|
"<path d='M 104.542,16.9706 166.993,16.9706' style='fill:none;fill-rule:evenodd;stroke:#00CC00;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 166.993,16.9706 229.444,16.9706' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 229.444,16.9706 296.145,133.029' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 296.145,133.029 429.545,133.029' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<text x='87.5437' y='24.4706' style='font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#00CC00' ><tspan>Cl</tspan></text>\n",
|
|
"</svg>\n"
|
|
],
|
|
"text/plain": [
|
|
"<rdkit.Chem.rdchem.Mol at 0x10d9f2d70>"
|
|
]
|
|
},
|
|
"execution_count": 21,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"reagents[0][pos[0]]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAADEElEQVR4nO3dQU5UURRFUcs4CubE\nnHBMOiYdxrdBYkAtArUr+ffGtXp0Kq8Bu877VOByHMcnAG71+ewDAOwmowCJjAIkMgqQyChAIqMA\niYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQy\nCpDIKEAiowCJjAIkMgqQyChAIqMAiYwCJDIKkMgoQCKjAImMAiQyCpDIKEAiowCJjAIkMgqQyCiD\nXC6Xs48AHyajDHIch5KyjowCJDLKLAYp68goQCKjjGOQsouMMpGSsoiMAiQyylAGKVvIKEAio8xl\nkLKCjAIkMspoBinzySjTKSnDyShAIqMsYJAymYwCJDLKDgYpY8koaygpM8koQCKjbGKQMpCMAiSX\n4zjOPgN8zH0HqR8Boi9nHwA+5nLx3s8sLvUAiYyyiSnKQDIKkMgoa5iizCSj7PBGQ32SlHPJKLuZ\nqJxORllAK5lMRllMXplARplOKxlORtlKXhlCRhlNK5lPRpnr7Q85yStDyChAIqMMZYqyhYyyjIYy\njYwykVayiIwyjus8u8goQCKjzGKKso7/xcSHPT0+vPry24+zTgITyCi3eJnOp8eHayX9I7i/ff3+\n89orm6KsI6PczXNPX1b1al6vv4hcso5no9zTG8u00FYms0a5xcvb+qsLfm7ocRx/R1NDmUxGucU7\nn43C/8ClnnGeB+nZp4D3klGAxKWeW1x7Nnov/3xCCjP5TmUuJWUFl3qAREaZy++aWEFGARIZZTSD\nlPlkFCCRUaYzSBlORgESGWUBg5TJZJQdlJSxZBQgkVHWMEiZSUYBEhllE4OUgWQUIJFRljFImUZG\n2ccfIWUUGQVIZBQgkVGAREYBEhkFSGQUIJFRgERGARIZBUhkFCCRUYBERgESGQVIZBQgkVGAREYB\nEhkFSGQUIJFRgERGARIZBUhkFCCRUYBERgESGQVIZBQgkVGAREYBEhkFSGQUIJFRgERGARIZBUhk\nFCCRUYBERgESGQVIZBQgkVGAREYBkl/kes0Ni0lSPQAAAABJRU5ErkJggg==\n",
|
|
"image/svg+xml": [
|
|
"<?xml version='1.0' encoding='iso-8859-1'?>\n",
|
|
"<svg version='1.1' baseProfile='full'\n",
|
|
" xmlns:svg='http://www.w3.org/2000/svg'\n",
|
|
" xmlns:rdkit='http://www.rdkit.org/xml'\n",
|
|
" xmlns:xlink='http://www.w3.org/1999/xlink'\n",
|
|
" xml:space='preserve'\n",
|
|
"width='450px' height='150px' >\n",
|
|
"<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='450' height='150' x='0' y='0'> </rect>\n",
|
|
"<path d='M 184.185,97.5524 205.907,97.5524' style='fill:none;fill-rule:evenodd;stroke:#7F4C19;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 205.907,97.5524 227.629,97.5524' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 227.629,97.5524 253.852,51.9231' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 223.377,86.004 241.734,54.0634' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 227.629,97.5524 253.852,143.182' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 253.852,51.9231 227.629,6.81818' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<path d='M 253.852,51.9231 306.3,51.9231' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />\n",
|
|
"<text x='166.178' y='105.052' style='font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#7F4C19' ><tspan>Br</tspan></text>\n",
|
|
"</svg>\n"
|
|
],
|
|
"text/plain": [
|
|
"<rdkit.Chem.rdchem.Mol at 0x10d9e51a0>"
|
|
]
|
|
},
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
"reagents[1][pos[1]]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Let's advance by one and see what happens. As expected for the CartesianProductStrategy, the first index increases by one."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[100L, 0L]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"pos = test_enumerator.next()\n",
|
|
"print(list(pos))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Enumeration States\n",
|
|
"==================\n",
|
|
"\n",
"Enumerations also have state, so you can save your place and come back later using **GetState** and **SetState**.\n",
|
|
"\n",
|
|
"**GetState** returns a text string so you can save this pretty much anywhere you like.\n",
|
|
"\n",
|
|
"Let's skip to the 100th sample and save both the state and the product at this step."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"State is:\n",
|
|
" '22 serialization::archive 12 0 1 1 31 RDKit::CartesianProductStrategy 1 1\\n0 0 1 2 0 99 0 2 0 111 120 13320 100\\n'\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"RDKit INFO: [09:31:11] Removed 38 non matching reagents at template 0\n",
|
|
"RDKit INFO: [09:31:11] Removed 11 non matching reagents at template 1\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"library = rdChemReactions.EnumerateLibrary(rxn, reagents, params=params)\n",
|
|
"# skip the first 100 molecules\n",
|
|
"library.GetEnumerator().Skip(100)\n",
|
|
"# get the state\n",
|
|
"\n",
|
|
"state = library.GetState()\n",
|
|
"print(\"State is:\\n\", repr(state))\n",
|
|
"\n",
|
|
"result = library.next()\n",
"for productSet in result:\n",
"    for mol in productSet:\n",
"        smiles = AllChem.MolToSmiles(mol)\n",
"        break"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Now when we go back to this state, the next molecule should be the one we just saved."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"CCNCc1cccc(c1)B1OC(C)(C)C(C)(C)O1 == CCNCc1cccc(c1)B1OC(C)(C)C(C)(C)O1 !\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"library.SetState(state)\n",
|
|
"result = library.next()\n",
|
|
"for productSet in result:\n",
"    for mol in productSet:\n",
"        assert AllChem.MolToSmiles(mol) == smiles\n",
"        print(AllChem.MolToSmiles(mol), \"==\", smiles, \"!\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Resetting the enumeration back to the beginning\n",
|
|
"===============================================\n",
|
|
"\n",
"To go back to the beginning, use **ResetState**. For a CartesianProductStrategy this reverts the position to [0, 0], the first reagent in each list.\n",
|
|
"\n",
|
|
"This is useful because the state of the library is saved when the library\n",
|
|
"is serialized. See **Pickling Libraries** below."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"[0L, 0L]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"library.ResetState()\n",
|
|
"print(list(library.GetPosition()))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Pickling Libraries\n",
|
|
"==================\n",
|
|
"\n",
"The whole library, including all reagents and the current enumeration state, is saved when the library is serialized."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"s = library.Serialize() # XXX bug need default arg"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 28,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"library2 = rdChemReactions.EnumerateLibrary()\n",
|
|
"library2.InitFromString(s)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"The original library and the deserialized copy now advance in lock step."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Result library1 CC(C)=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CC(C)=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 C=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 C=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CC=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CC=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CC=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CC=C(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CCC(C)(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CCC(C)(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CC(C)(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CC(C)(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CCCCCCCCCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CCCCCCCCCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CCCCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CCCCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CCCC(C)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library1 CCC(CC)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n",
|
|
"Result library2 CCC(CC)NCc1cccc(c1)B1OC(C)(C)C(C)(C)O1\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"for i in range(10):\n",
"    result = library.next()\n",
"    for productSet in result:\n",
"        for mol in productSet:\n",
"            print(\"Result library1\", AllChem.MolToSmiles(mol))\n",
"    result = library2.next()\n",
"    for productSet in result:\n",
"        for mol in productSet:\n",
"            print(\"Result library2\", AllChem.MolToSmiles(mol))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"*Note: The enumeration state can also be saved independently and applied to a deserialized library. Be careful to ensure that the enumeration state actually came from the library you are applying it to!*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Additional Enumeration Strategies\n",
|
|
"=================================="
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"*rdChemReactions.RandomSampleStrategy* - randomly sample from the building blocks\n",
|
|
"\n",
|
|
"*rdChemReactions.RandomSampleAllBBsStrategy* - randomly sample, but force using all reagents\n",
|
|
"\n",
"*rdChemReactions.EvenSamplePairsStrategy* - evenly sample pairs of building blocks (use only for generation of small libraries)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": true
|
|
},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"anaconda-cloud": {},
|
|
"kernelspec": {
|
|
"display_name": "Python [Root]",
|
|
"language": "python",
|
|
"name": "Python [Root]"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 2
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython2",
|
|
"version": "2.7.11"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 0
|
|
}
|