
Description of data files in this folder

Solubility dataset

  • solubility.test.sdf (257 records)
  • solubility.train.sdf (1025 records)

The two SDF files (hereafter the "solubility dataset") are derived from the Huuskonen dataset, which contains a training set of 884 compounds and a randomly chosen test set of 413 compounds.
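The record counts listed above can be checked without any cheminformatics toolkit, because every record in an SD file ends with a line containing only `$$$$`. The following is a minimal sketch, not part of the original folder: the helper name `count_sdf_records` is hypothetical, and it assumes the script is run from the directory that holds the two files.

```python
from pathlib import Path


def count_sdf_records(path):
    """Count records in an SD file by counting '$$$$' terminator lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Each SD record is closed by a line holding only '$$$$'.
            if line.rstrip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    # File names come from the list above; the working directory is an assumption.
    for name in ("solubility.train.sdf", "solubility.test.sdf"):
        if Path(name).exists():
            print(name, count_sdf_records(name))
        else:
            print(name, "not found")
```

If the files are intact, the printed counts should agree with the record counts given in the list. Counting terminators only checks file structure; it does not confirm that each record parses as a valid molecule.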

  • Reference: Huuskonen, J. (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. Journal of Chemical Information and Computer Sciences, 40(3), 773-777. https://doi.org/10.1021/ci9901338

The solubility dataset was originally downloaded from cheminformatics.org. Although that site no longer exists, the supplementary file at https://doi.org/10.1021/ci9901338 lists all the structures and the corresponding data in PDF format.