Circular Fingerprints: From Molecular Similarity to ADME

 

 

Robert C. Glen, Andreas Bender

Unilever Centre for Molecular Informatics,

Department of Chemistry,

University of Cambridge

 

Circular fingerprints, which represent molecular structures by their atom neighborhoods, are increasingly popular for a wide range of applications, such as similarity searching and the prediction of ADME/Tox properties.

Usually, a fingerprint is generated for each heavy atom of a molecule: the atoms surrounding the central atom are grouped into layers according to their distance, in bonds, from it. In the example considered here, force field atom types are used, which encode the hybridization state of each atom in addition to its element; in principle, any kind of additional atom information can be employed. The layers are concatenated into a string which describes the environment around the atom centre. These circular fingerprints appear to outperform other fingerprint definitions in comparative studies.
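The generation of atom environments can be illustrated with a minimal sketch. It is not the MOLPRINT 2D implementation: the function names (atom_environment, circular_fingerprint), the string format, the default depth of two bonds, and the Sybyl-style atom type labels assigned to acetic acid are all assumptions made for illustration. The molecule is supplied as a hand-written table of heavy-atom types and bonds, so no cheminformatics toolkit is required.

```python
from collections import Counter, deque


def atom_environment(atom_types, bonds, centre, max_depth=2):
    """Describe the neighbourhood of one heavy atom as a layered string.

    atom_types maps atom index -> atom type label (illustrative labels),
    bonds maps atom index -> list of bonded heavy-atom indices.
    """
    # Breadth-first search: topological distance (in bonds) from the centre.
    distance = {centre: 0}
    queue = deque([centre])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue  # do not expand beyond the outermost layer
        for neighbour in bonds[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)

    # Count the atom types found in each layer, then concatenate the layers.
    layers = []
    for depth in range(max_depth + 1):
        counts = Counter(atom_types[a] for a, d in distance.items() if d == depth)
        layer = ",".join(f"{n}x{t}" for t, n in sorted(counts.items()))
        layers.append(f"{depth}-[{layer}]")
    return ";".join(layers)


def circular_fingerprint(atom_types, bonds, max_depth=2):
    """The fingerprint is the set of environments of all heavy atoms."""
    return {atom_environment(atom_types, bonds, a, max_depth) for a in atom_types}


# Acetic acid, heavy atoms only (assumed type labels):
# 0 = methyl C, 1 = carboxyl C, 2 = carbonyl O, 3 = hydroxyl O.
atom_types = {0: "C.3", 1: "C.2", 2: "O.2", 3: "O.3"}
bonds = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

for environment in sorted(circular_fingerprint(atom_types, bonds)):
    print(environment)
```

Representing the fingerprint as a set of strings means that two molecules can be compared simply by counting the environments they share, which is one plausible basis for similarity searching.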

 


 

 

 

 

 

Figure: Relative performance of several fingerprint definitions on a standardized dataset. Circular fingerprint definitions are shown in black, all other definitions in grey. The MOLPRINT 2D method is freely available from http://www.cheminformatics.org.

 

 

We have used these fingerprints for, among other applications, calculating pKa values, predicting metabolites, virtual screening and toxicity prediction. Combining them with entropy-based feature selection and Bayesian analysis can be powerful: the combination is useful both as a classifier and as a means of gaining insight into physical phenomena such as pharmacophoric binding patterns.
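How entropy-based selection and a Bayesian classifier might fit together can be shown with a toy sketch. It makes several assumptions that are not taken from the abstract: information gain is used as the entropy-based criterion, probabilities are smoothed by an add-one (Laplace) correction, only features present in the query molecule contribute to the score, and the feature labels "A" to "D" stand in for real atom environment strings. All function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a two-class split with active fraction p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(feature, actives, inactives):
    """Entropy reduction obtained by splitting the data on one feature."""
    n_act, total = len(actives), len(actives) + len(inactives)
    act_with = sum(feature in fp for fp in actives)
    n_with = act_with + sum(feature in fp for fp in inactives)
    n_without = total - n_with
    gain = entropy(n_act / total)
    if n_with:
        gain -= n_with / total * entropy(act_with / n_with)
    if n_without:
        gain -= n_without / total * entropy((n_act - act_with) / n_without)
    return gain


def select_features(actives, inactives, k):
    """Keep the k features with the highest information gain."""
    features = set().union(*actives, *inactives)
    # Ties are broken alphabetically so that the result is reproducible.
    ranked = sorted(features, key=lambda f: (-information_gain(f, actives, inactives), f))
    return ranked[:k]


def bayes_score(fingerprint, selected, actives, inactives):
    """Log-odds of activity from a naive Bayes model (assumed add-one smoothing)."""
    score = math.log(len(actives) / len(inactives))
    for feature in selected:
        if feature in fingerprint:
            p_act = (sum(feature in fp for fp in actives) + 1) / (len(actives) + 2)
            p_inact = (sum(feature in fp for fp in inactives) + 1) / (len(inactives) + 2)
            score += math.log(p_act / p_inact)
    return score


# Toy training data: each fingerprint is a set of feature labels.
actives = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}]
inactives = [{"B", "D"}, {"C", "D"}, {"D"}]

selected = select_features(actives, inactives, k=2)
print(selected)
print(bayes_score({"A", "B"}, selected, actives, inactives))
print(bayes_score({"B", "D"}, selected, actives, inactives))
```

In this toy example the selected features are those that separate the two classes perfectly, and inspecting which features are retained is the kind of step that could point towards substructures associated with activity.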

 

Xing L, Glen RC. Novel methods for the prediction of logP, pK(a), and logD. J. Chem. Inf. Comput. Sci. 2002, 42:796-805.

Xing L, Glen RC, Clark RD. Predicting pK(a) by molecular tree structured fingerprints and PLS. J. Chem. Inf. Comput. Sci. 2003, 43:870-879.

Bender A, Glen RC. Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem. 2004, 2:3204-3218.

Bender A, Mussa HY, Glen RC, Reiling S. Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier. J. Chem. Inf. Comput. Sci. 2004, 44:170-178.

Bender A, Mussa HY, Glen RC, Reiling S. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J. Chem. Inf. Comput. Sci. 2004, 44:1708-1718.

Bender A, Mussa HY, Gill GS, Glen RC. Molecular Surface Point Environments for Virtual Screening and the Elucidation of Binding Patterns (MOLPRINT 3D). J. Med. Chem. 2004, 47(26):6569-6583.

Bender A, Mussa HY, Glen RC. Screening for Dihydrofolate Reductase Inhibitors Using MOLPRINT 2D, a Fast Fragment-Based Method Employing the Naïve Bayesian Classifier: Limitations of the Descriptor and the Importance of Balanced Chemistry in Training and Test Sets. J. Biomol. Screen. 2005, 10:658-666.

Bender A, Glen RC. A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication. J. Chem. Inf. Model. 2005, 45(5):1369-1375.

Hasselgren-Arnby C, Carlsson L, Smith J, Glen RC, Boyer S. SPORCalc: A Method for Fingerprint-Based Probabilistic Scoring of Metabolically Labile Sites. J. Med. Chem. 2005 (submitted).