rdkit/Code/GraphMol/CIPLabeler/Digraph.h
Dan Nealschneider 1663989053 CIP labeller performance: Don't calculate auxiliary descriptors unnecessarily (#9171)
* CIP labeller: Don't calculate auxiliary descriptors unnecessarily

The first 3 rules (the constitutional rules) are pretty easy
to understand. After rule 3, we need to calculate auxiliary
stereo descriptors to break ties.

However, we _were actually_ calculating auxiliary stereo descriptors
for all centers! We should only need to calculate auxiliary
descriptors for the centers that are needed to break ties.

This cost time. It also caused errors if the auxiliary descriptors
needed a graph expansion, because bonds in the digraph might be
pointed in the wrong direction.
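
A minimal sketch of the idea, under stated assumptions: `Center`,
`auxiliaryCentersToLabel` and the `lazy` flag are hypothetical names
for illustration and are not part of the RDKit API. It only models
which centers get an auxiliary descriptor, not how one is calculated.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a stereocenter; not an RDKit type.
struct Center {
  // True if the constitutional rules (rules 1-3) already rank it.
  bool resolvedByConstitutionalRules;
  // Indices of the centers whose descriptors would break its ties.
  std::vector<std::size_t> tieBreakingCenters;
};

// Returns the indices of the centers that need an auxiliary descriptor
// when labelling centers[focus].
//   lazy == false: every other center (the old behaviour described above)
//   lazy == true:  only the centers needed to break ties
std::set<std::size_t> auxiliaryCentersToLabel(
    const std::vector<Center> &centers, std::size_t focus, bool lazy) {
  assert(focus < centers.size());
  std::set<std::size_t> toLabel;
  if (centers[focus].resolvedByConstitutionalRules) {
    return toLabel;  // no ties left, so no auxiliary descriptors needed
  }
  if (lazy) {
    toLabel.insert(centers[focus].tieBreakingCenters.begin(),
                   centers[focus].tieBreakingCenters.end());
  } else {
    for (std::size_t i = 0; i < centers.size(); ++i) {
      if (i != focus) {
        toLabel.insert(i);
      }
    }
  }
  return toLabel;
}
```

With four centers where only center 2 is needed to break the ties of
center 0, the lazy variant returns one index and the eager variant
returns three.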

Example case: PDB ID 4AXM.
Before this commit, labelling errored with "Could not calculate parity! Carrier mismatch"
after 14 s. After this commit, it completes successfully in 0.036 s.
Labelled centers all match (for the centers that had labels in
the failure case).

Includes a test that I can imagine breaking with this optimization.
The reference labels are from before this change.

* Ensure all "arms" of stereo bonds and atropisomer bonds are expanded

For tetrahedral centers, ranking using the constitutional rules
always expands as far as is needed (but no further). For SP2 bonds
and atropisomers, if the first side is not resolvable, the
second side is never visited.

If the constitutional rules don't resolve a side, we need to
label the auxiliary centers. It's important to label all
auxiliary centers that _will_ be visited, so we need to know
what centers will be visited.

This commit updates the label() call in SP2 and Atropisomer
bonds to always attempt to label both sides if using the
constitutional rule set.

The constitutional rules are cheap, and if they fail, we
always go on to the full rule set. It is not a savings to skip
the search on the second side if we're going to keep going
anyway!
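
The difference can be sketched as short-circuit versus non-short-circuit
evaluation. This is an illustration only: `Side`, `rankSide`,
`labelShortCircuit` and `labelBothSides` are hypothetical names, not the
functions changed in this commit.

```cpp
#include <cassert>

// Hypothetical stand-in for one "arm" of a stereo or atropisomer bond.
struct Side {
  bool resolvable;  // can the constitutional rules rank this side?
  bool visited;     // has the digraph been expanded on this side?
};

// Ranking a side expands (visits) it and reports whether it was resolved.
bool rankSide(Side &side) {
  side.visited = true;
  return side.resolvable;
}

// Old shape: && short-circuits, so the second side is never visited
// when the first side is not resolvable.
bool labelShortCircuit(Side &first, Side &second) {
  return rankSide(first) && rankSide(second);
}

// New shape: both sides are always ranked, so both are always visited.
bool labelBothSides(Side &first, Side &second) {
  const bool firstOk = rankSide(first);
  const bool secondOk = rankSide(second);
  return firstOk && secondOk;
}

// Usage: both variants fail to label, but only the second one has
// expanded the second side, so its centers are known to be visited.
void usageExample() {
  Side a{false, false}, b{true, false};
  assert(!labelShortCircuit(a, b));
  assert(!b.visited);

  Side c{false, false}, d{true, false};
  assert(!labelBothSides(c, d));
  assert(d.visited);
}
```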

Includes a test that reproduces Ricardo's example.

This has no measurable effect on performance relative to the
original solution.

* If any parts of the center have been seen, label it.

I couldn't make an example hit this, but Ric is totally
theoretically right

* Greg's ranges suggestion #2

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>

* any_of for container search

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
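
For reference, a container search with `std::any_of` might look like the
sketch below. The `Atom` and `Node` structs here are simplified
stand-ins, and the free function `seenAtom` is an assumption about the
general shape only; it is not copied from Digraph.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Simplified stand-ins for the real RDKit classes.
struct Atom {
  int idx;
};
struct Node {
  const Atom *atom;
};

// True if any node in the container refers to `atom`.
bool seenAtom(const std::list<Node> &nodes, const Atom *atom) {
  return std::any_of(nodes.begin(), nodes.end(),
                     [atom](const Node &node) { return node.atom == atom; });
}

// Usage: only atoms that already have a node are reported as seen.
void seenAtomExample() {
  Atom carbon{0};
  Atom oxygen{1};
  std::list<Node> nodes;
  nodes.push_back(Node{&carbon});
  assert(seenAtom(nodes, &carbon));
  assert(!seenAtom(nodes, &oxygen));
}
```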

---------

Co-authored-by: Greg Landrum <greg.landrum@gmail.com>
2026-05-06 06:12:50 +02:00

126 lines
2.9 KiB
C++
//
// Digraph is the core data structure for determining
// Cahn-Ingold-Prelog (CIP) chirality of a molecule.
//
// It's a "directed graph" - meaning that each bond
// has a start and an end. For CIP determination,
// the start points back towards the atom that is
// being labelled.
//
// Copyright (C) 2020 Schrödinger, LLC
//
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
#pragma once
#include <list>
#include <vector>
#include <RDGeneral/BoostStartInclude.h>
#include <boost/rational.hpp>
#include <RDGeneral/BoostEndInclude.h>
#include "TooManyNodesException.h"
namespace RDKit {
class Atom;
class Bond;
namespace CIPLabeler {
class Node;
class Edge;
class CIPMol;
/**
 * A class to hold directed acyclic graphs representing the molecule.
 *
 * The root of the DAG is one of the foci of the configuration for
 * which the label is being calculated. The current root may be
 * changed to other nodes that become relevant in the calculation
 * (see changeRoot()).
 */
class Digraph {
 public:
  Digraph() = delete;
  Digraph(const Digraph &) = delete;
  Digraph &operator=(const Digraph &) = delete;
  Digraph(const CIPMol &mol, Atom *atom, bool atropisomerMode = false);
  const CIPMol &getMol() const;
  Node *getOriginalRoot() const;
  Node *getCurrentRoot() const;
  int getNumNodes() const;
  /**
   * Get all nodes which refer to `atom` in order of
   * distance from the root.
   */
  std::vector<Node *> getNodes(Atom *atom) const;
  /**
   * Access the reference atom for Rule 6 (if one is set).
   */
  Atom *getRule6Ref() const;
  /**
   * Used exclusively for Rule 6, we set one atom as the reference.
   * @param ref reference atom
   */
  void setRule6Ref(Atom *ref);
  /**
   * Sets the root node of this digraph by flipping the directions
   * of edges as required.
   *
   * This is more efficient than building a new Digraph, but is
   * only valid for neighboring Nodes.
   *
   * @param newroot the new root
   */
  void changeRoot(Node *newroot);
  void expand(Node *beg);
  Node &addNode(std::vector<char> &&visit, Atom *atom,
                boost::rational<int> &&frac, int dist, int flags);
  // Has `atom` been seen yet?
  bool seenAtom(Atom *atom) const;
 private:
  const CIPMol &d_mol;
  // The node from which the Digraph is first initialized.
  // It matches the atom that is being labeled.
  Node *dp_origin = nullptr;
  // in atropisomer mode, we expand the two atoms of the atrop bond
  bool d_atropisomerMode = false;
  // The current root of the Digraph
  Node *dp_root = nullptr;
  Atom *dp_rule6Ref = nullptr;
  // We can't store these in a vector, as adding new items will
  // cause it to reallocate and invalidate the references
  std::list<Node> d_nodes;
  std::list<Edge> d_edges;
  void addEdge(Node *beg, Bond *bond, Node *end);
};
} // namespace CIPLabeler
} // namespace RDKit
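
The comment on `d_nodes` and `d_edges` relies on a standard container
guarantee: inserting into a `std::list` never invalidates pointers or
references to existing elements, whereas a `std::vector` that grows past
its capacity reallocates and invalidates all of them. A small sketch of
that guarantee follows; `DemoNode` and both helper functions are
hypothetical and unrelated to the real `Node` class.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical element type; not the CIPLabeler Node.
struct DemoNode {
  int dist;
};

// Takes a pointer to each element as it is appended to a std::list and
// checks, after all insertions, that every pointer still refers to the
// same element.
bool listPointersStayValid(int count) {
  std::list<DemoNode> nodes;
  std::vector<DemoNode *> handles;
  for (int i = 0; i < count; ++i) {
    nodes.push_back(DemoNode{i});
    handles.push_back(&nodes.back());
  }
  auto it = nodes.begin();
  for (int i = 0; i < count; ++i, ++it) {
    if (handles[i] != &*it || handles[i]->dist != i) {
      return false;
    }
  }
  return true;
}

// Pushes past the initial capacity of a std::vector. The capacity must
// grow, which means the storage was reallocated and any pointer taken
// before the growth would now be dangling.
bool vectorGrowthReallocates() {
  std::vector<DemoNode> nodes;
  nodes.push_back(DemoNode{0});
  const auto before = nodes.capacity();
  while (nodes.size() <= before) {
    nodes.push_back(DemoNode{1});
  }
  return nodes.capacity() > before;
}
```

This is why a `Node *` handed out by the digraph can stay usable while
the graph keeps expanding, at the cost of losing contiguous storage.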