rdkit/Code/GraphMol/CIPLabeler/Digraph.h
Ric d54e77e375 Add new CIP labelling algorithm (#3234)
* add port of centres

* Several changes:
    - Added a test based on RDKit issue 2984
        (default RDKit fails it, this gets it right)
    - Use bond directions for bond stereo (label is no longer required)
    - Fix bugs in rules 4b and 5new
    - Fix some mem errors
    - clang-formatted
    - some other minor cleanups

* Several changes and some improvements:
    - Added LGPL license, as well as a mention in the doc.
    - Fix/update/add some comments
    - Fix typo/bug in Mancude calculation
    - Fix bug in rules 4b, 5New
    - Fix Sp2 Bond dir reference
    - Re clang-format
    - other minor changes suggested by Dan

* Another bunch of changes:
  - require integer-order bonds; kekulize when required
  - fix fraction comparison
  - rename sq Cis/Trans e/z
  - replace queues with vectors
  - update copyright notices
  - revert LGPL changes
  - fix Asymmetric typo

* move to separate lib/mod, add python validation test

* Moving away from the original implementation:
    - Rename to CIPLabeler
    - Remove the abstraction layer
    - Remove some stats stuff
    - Push some CIPMol functions down to Node
    - Use RDKit's isotope info

* Another bundle of changes. The most relevant ones:
    - fix parity translation
    - use cis trans as bond reference -- breaks #2984 test
    - kill a lot of unused code
    - use lists for queues
    - store nodes and edges in digraph
    - add prefixes to class data member names
    - update changeRoot() test
    - use fastFindRings() for mancude rings
    - update docs
    - add references to the scientific paper
    - Document the Mancude functions
    - Fix Mancude atom types and their comments
    - remove mol data member from SequenceRule
    - replace Fraction with boost::rational
    - update comments, docstrings and the doc

* fix building the test

* Changes here include:
    - adding bitset overload for the labeling function
    - python wrap of the overload
    - handling trigonal pyramids with implicit H
    - setting bond labels sets stereo atoms, cis/trans
    - nix LEFT/RIGHT/TOGETHER/OPPOSITE constants
    - don't use GLOB in cmake
    - a decent amount of refactoring

* Minor edits to new_CIP_labeling (#6)

* Some changes for clarity

Added some documentation and changed some variable names to match
my understanding. Also ran clang-tidy to ensure that all blocks
were brace-enclosed.

* Return a reference instead of a copy for performance

This is called many times and showed up after some light
profiling. This change bumped throughput by about 20%

* move out of GraphMol

* move .hpp headers to .h

* update documentation; add label set of atoms test

* Address comments:
    - Added references to centres to CIPLabeler.h and Python Wrap.
    - Update validation test to skip sanitization.
    - Document mancude fractional atomic number calculation.
    - Use unittest assertions in python test.
    - Update mancude docstrings to 'resonance' instead of 'tautomers'.
    - Rename prioritise() to prioritize().
    - Add postcondition to check carriers size in Tetrahedral.cpp.
    - Use getNeighbors() in Tetrahedral.cpp.
    - Move findStereoAtoms to Chirality namespace.
    - Move code back into GraphMol.
    - Fix typos and reformat doc.

* More comments:
    - Mention why we use boost's unordered map rather than the std one.
    - Fix include in Python wrapper.

* Addressed second batch of comments:
    - fix the bug in rule 4b
    - fix docstring for rule 2
    - move atomic mass calculation from rule 2 to node
    - addressed some build warnings
    - simplify sp2bond::label(comp)
    - add start/end atoms to Sp2Bond constructor
    - update system/local includes

Co-authored-by: Dan N <dan.nealschneider@schrodinger.com>
2020-07-07 20:34:33 +02:00

118 lines
2.7 KiB
C++
//
// Digraph is the core data structure for determining
// Cahn-Ingold-Prelog (CIP) chirality of a molecule.
//
// It's a "directed graph" - meaning that each bond
// has a start and an end. For CIP determination,
// the start points back towards the atom that is
// being labelled.
//
// Copyright (C) 2020 Schrödinger, LLC
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#pragma once
#include <list>
#include <vector>
#include <boost/rational.hpp>
#include "TooManyNodesException.h"
namespace RDKit {
class Atom;
class Bond;
namespace CIPLabeler {
class Node;
class Edge;
class CIPMol;
/**
* A class to hold directed acyclic graphs representing the molecule.
*
* The root of the DAG is one of the foci of the configuration for
* which the label is being calculated. The current root may later be
* moved to other nodes that become relevant in the calculation.
*
*/
class Digraph {
public:
Digraph() = delete;
Digraph(const Digraph &) = delete;
Digraph &operator=(const Digraph &) = delete;
Digraph(const CIPMol &mol, Atom *atom);
const CIPMol &getMol() const;
Node *getOriginalRoot() const;
Node *getCurrentRoot() const;
int getNumNodes() const;
/**
* Get all nodes which refer to `atom` in order of
* distance from the root.
*/
std::vector<Node *> getNodes(Atom *atom) const;
/**
* Access the reference atom for Rule 6 (if one is set).
*/
Atom *getRule6Ref() const;
/**
* Used exclusively for Rule 6, we set one atom as the reference.
* @param ref reference atom
*/
void setRule6Ref(Atom *ref);
/**
* Sets the root node of this digraph by flipping the directions
* of edges as required.
*
* This is more efficient than building a new Digraph, but is
* only valid for neighboring Nodes.
*
* @param newroot the new root
*/
void changeRoot(Node *newroot);
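/**
* Expands the digraph from `beg`, adding nodes and edges for the
* neighbors of its atom.
*/
void expand(Node *beg);
/**
* Creates a new Node owned by this Digraph and returns a reference to it.
*/
Node &addNode(std::vector<char> &&visit, Atom *atom,
boost::rational<int> &&frac, int dist, int flags);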
private:
const CIPMol &d_mol;
// The node from which the Digraph is first initialized.
// It matches the atom that is being labeled.
Node *dp_origin = nullptr;
// The current root of the Digraph
Node *dp_root = nullptr;
Atom *dp_rule6Ref = nullptr;
// We can't store these in a vector, as adding new items will
// cause it to reallocate and invalidate the references
std::list<Node> d_nodes;
std::list<Edge> d_edges;
void addEdge(Node *beg, Bond *bond, Node *end);
};
} // namespace CIPLabeler
} // namespace RDKit
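
The comment on `d_nodes` and `d_edges` says they are kept in `std::list` because growing a `std::vector` can reallocate its storage and invalidate references to existing elements. The following is a minimal standalone sketch of that difference, not RDKit code: `ToyNode`, `listKeepsAddresses` and `countVectorReallocations` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for CIPLabeler::Node; not the real class.
struct ToyNode {
  int atomIdx;
  int dist;
};

// Appends `extra` elements to a std::list and reports whether a pointer
// taken to the first element still refers to that same element.
// std::list never relocates existing elements on insertion.
bool listKeepsAddresses(int extra) {
  assert(extra >= 0);
  std::list<ToyNode> nodes;
  nodes.push_back({0, 0});
  const ToyNode *first = &nodes.front();
  for (int i = 1; i <= extra; ++i) {
    nodes.push_back({i, i});
  }
  return first == &nodes.front() && first->atomIdx == 0;
}

// Appends `extra` elements to a std::vector and counts how often its
// capacity changes. Each change means the storage was reallocated, so
// any pointer or reference taken before that point is invalidated.
int countVectorReallocations(int extra) {
  assert(extra >= 0);
  std::vector<ToyNode> nodes;
  nodes.push_back({0, 0});
  std::size_t cap = nodes.capacity();
  int reallocations = 0;
  for (int i = 1; i <= extra; ++i) {
    nodes.push_back({i, i});
    if (nodes.capacity() != cap) {
      ++reallocations;
      cap = nodes.capacity();
    }
  }
  return reallocations;
}
```

With a typical standard library, `listKeepsAddresses(1000)` returns true while `countVectorReallocations(1000)` is greater than zero. That is the reason raw `Node *` values handed out by a container that grows on demand need stable element addresses.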