mirror of https://github.com/rdkit/rdkit.git synced 2026-06-04 21:54:27 +08:00

FastCluster

This is a simple workflow for clustering molecules.

Install bayon first; a tutorial is available at the following URL: https://github.com/fujimizu/bayon/wiki/Tutorial_English
RDKit must also be installed, because it is used to parse the SMILES.
That's all!

The input file is tab-delimited text with one molecule per line: "ID" \t "SMILES" \n .....
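
As a minimal sketch of the input layout described above, the following reads ID/SMILES pairs from a tab-delimited stream. The function name `read_smiles_table` and the sample records are hypothetical and only illustrate the format; this is not taken from fastcluster.py.

```python
import io


def read_smiles_table(handle):
    """Yield (mol_id, smiles) pairs from a tab-delimited 'ID<TAB>SMILES' stream."""
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected 'ID<TAB>SMILES', got {line!r}")
        yield fields[0], fields[1]


# In-memory example; the IDs and SMILES are made up for illustration.
example = io.StringIO("mol1\tCCO\nmol2\tc1ccccc1\n")
records = list(read_smiles_table(example))
print(records)
```

With a real file, the same function can be applied to an open file handle instead of the `io.StringIO` object.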
$ python fastcluster.py {input; input file} {N; number of clusters} {--output; output filename} {--centroid; centroid information filename}
Example usage
$ python fastcluster.py cdk2.smi 5 # cluster 47 compounds into 5 clusters.

Fastcluster iwatobipen$ python fastcluster.py cdk2.smi 5

real	0m0.015s
user	0m0.006s
sys	0m0.002s

clustered.tsv is in bayon's default output format: a list of clusters with the similarity points of their members.

cluster_1 \t molid1 \t point \t molid2 \t point ... \n
cluster_2 \t molid4 \t point \t molid5 \t point ... \n
....
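
A line in the cluster-per-row layout above could be parsed roughly as follows. This is a sketch under the assumption that every line holds a cluster ID followed by alternating molecule IDs and points, exactly as shown; `parse_cluster_line` and the sample values are hypothetical.

```python
def parse_cluster_line(line):
    """Split 'clusterID<TAB>molid<TAB>point<TAB>...' into (cluster_id, members)."""
    fields = line.rstrip("\r\n").split("\t")
    cluster_id, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired molid/point fields in cluster {cluster_id!r}")
    # Pair each molecule ID with the point that follows it.
    members = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return cluster_id, members


# Made-up sample line for illustration.
cluster_id, members = parse_cluster_line("cluster_1\tmolA\t0.91\tmolB\t0.85\n")
print(cluster_id, members)
```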

molid1 \t point \t clusterID1 \n
molid2 \t point \t clusterID2 \n
molid3 \t point \t clusterID3 \n
molid4 \t point \t clusterID4 \n
....
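
One way to get from the cluster-per-row layout to the molecule-per-row layout shown above is to flatten each cluster into individual rows. The sketch below assumes the clusters have already been read into a dictionary; `flatten_clusters` and the sample data are hypothetical, and whether fastcluster.py does it this way is not stated here.

```python
def flatten_clusters(clusters):
    """Turn {cluster_id: [(mol_id, point), ...]} into 'molid<TAB>point<TAB>clusterID' rows."""
    rows = []
    for cluster_id, members in clusters.items():
        for mol_id, point in members:
            rows.append("\t".join([mol_id, point, cluster_id]))
    return rows


# Made-up sample data; points are kept as strings so they are written back unchanged.
clusters = {
    "cluster_1": [("molA", "0.91"), ("molB", "0.85")],
    "cluster_2": [("molC", "0.77")],
}
rows = flatten_clusters(clusters)
print("\n".join(rows))
```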

This script needs more CPU time than running bayon directly, because it first converts the SMILES into a fingerprint dataset and only then performs the clustering.
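
The conversion step mentioned above might look roughly like the sketch below. Several things here are assumptions rather than facts from this README: the choice of a Morgan bit-vector fingerprint, the radius and bit count, the `FP_` feature prefix, the weight of 1.0, and the row layout (ID followed by tab-separated feature/value pairs, as described in the bayon tutorial). The RDKit import sits inside the function so that the formatting helper can be used without RDKit installed.

```python
def format_feature_row(mol_id, on_bits, prefix="FP_"):
    """Build one 'ID<TAB>feature<TAB>value...' row; prefix and 1.0 weight are assumptions."""
    parts = [mol_id]
    for bit in on_bits:
        parts.append(f"{prefix}{bit}")
        parts.append("1.0")
    return "\t".join(parts)


def smiles_to_on_bits(smiles, radius=2, n_bits=2048):
    """Return the set bits of a Morgan fingerprint (assumed fingerprint type)."""
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return list(fp.GetOnBits())


# Formatting example with made-up bit positions (no RDKit needed for this part).
row = format_feature_row("mol1", [3, 17, 42])
print(row)
```

With RDKit available, `format_feature_row(mol_id, smiles_to_on_bits(smiles))` would produce one row per input molecule.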