Mirror of https://github.com/rdkit/rdkit.git (synced 2026-06-05 22:04:27 +08:00)
* remove trailing spaces

* 3256: Envelope aromaticity not detected in complex fused system

Removes the stopping point in aromaticity detection when all atoms are "done". This also markedly improves the performance of aromaticity detection for very large molecules; for example, aromatization of 3EOH from the PDB was dominated by done-atom checking before this commit.

Some aromatic bonds were missed before this commit in complex fused systems. This happened if all atoms in the fused system were also in some smaller aromatic ring and there was at least one fused edge that was single in the kekule form.

Some example molecules for which envelope aromaticity failed before this commit:

  c1cc2n(c1)c1cccn1c1cccn21 -> became c1cc2n(c1)-c1cccn1-c1cccn1-2 before this commit
  c1cc2c3cc[nH]c3n3cccc3n2c1 -> became c1cc2n(c1)-c1cccn1-c1[nH]ccc1-2 before this commit
  c1cc2c3cc[nH]c3c3cc[nH]c3n2c1 -> became c1cc2n(c1)-c1[nH]ccc1-c1[nH]ccc1-2 before this commit

Here's a similar example that didn't fail even before this commit. The central ring only shares double bonds with the exterior rings.

  * c1cc2c([nH]1)c1cc[nH]c1c1cc[nH]c21

Requires updates to some MQN descriptor tests because some bonds become aromatic (MQN includes counts of single and double bonds of the kekule form). FWIW, for the molecule that had a change in counts, the counts were incorrect both before and after this commit, because MQN uses an approximation (dividing aromatic bonds evenly between single and double bonds) to avoid kekulization. This approximation is invalid when there are oodles of nitrogen lone pairs participating in the aromatic bonds.

(The failing line was 2558 in aromat_regress.txt: Cc1cc2n(n1)c1cc(C)nn1c1c(C=O)c(C)nn21)

* Detect envelope aromaticity in fused systems

In #3253, we proposed removing doneAtoms for performance, and it was noted that it also fixed detection of envelope aromaticity in some fused systems.
However, when I completely removed doneAtoms, I saw hangs in sanitization of things like nanotubes. Using doneBonds allows envelope aromaticity, while preserving a reasonable brake on runaway work for crazy molecules. The performance issue was addressed by caching the ring bond count.

Here are some sanitize timings on proteins from the RCSB PDB:

Before this commit:
* 3eoh 1.21s
* 2j3n 0.77s
* 1nks 0.053s

Afterwards:
* 3eoh 0.42s
* 2j3n 0.15s
* 1nks 0.046s

* Use boost::dynamic_bitset instead of unordered_set

To count ring bonds.
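
The idea behind the last change, counting ring bonds with a bitset rather than a hash set, can be sketched roughly as follows. This is an illustration only, not the RDKit code: `countRingBonds` is a hypothetical helper, rings are assumed to be plain vectors of bond indices, and `std::vector<bool>` stands in for `boost::dynamic_bitset` so that the snippet compiles without Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an RDKit function): counts the distinct bonds
// that occur in a collection of rings. Each ring is a list of bond
// indices in the range [0, numBonds).
std::size_t countRingBonds(const std::vector<std::vector<int>> &rings,
                           std::size_t numBonds) {
  // One flag per bond; std::vector<bool> is a stand-in for
  // boost::dynamic_bitset in this sketch.
  std::vector<bool> seen(numBonds, false);
  std::size_t count = 0;
  for (const auto &ring : rings) {
    for (int bondIdx : ring) {
      assert(bondIdx >= 0 && static_cast<std::size_t>(bondIdx) < numBonds);
      if (!seen[bondIdx]) {
        // First time this bond is encountered in any ring.
        seen[bondIdx] = true;
        ++count;
      }
    }
  }
  return count;
}
```

For two fused six-membered rings that share one bond, given as `{0, 1, 2, 3, 4, 5}` and `{5, 6, 7, 8, 9, 10}` with `numBonds = 11`, the helper returns 11. Because bond indices are small and dense, a flat bit array avoids the hashing and per-element allocation that a hash set would incur.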