MOLECULAR PROPERTIES OPTIMIZATION PROJECT




 

MOLDIVS

MOLDIVS (MOLecular DIVersity and Similarity) is a new program for molecular similarity and diversity calculations that runs on Microsoft Windows 2000/NT. It has a graphical user interface and supports the full range of similarity and diversity calculations on large sets of compounds.

Two types of structural fragments can be used: plain structural fragments and combined structural-physicochemical fragments. Both are defined as atom-centered concentric environments: a fragment consists of a central atom and the neighboring atoms connected to it within a predefined sphere size (the number of bonds between the central atom and the edge atoms). The complete connection table is stored for each fragment. For each atom in a fragment, the atom and bond type, charge, valency, cycle type and cycle size are coded into fixed-length variables, which are then used to compute a pseudo-random hash value for the fragment. The complete set of fragments with the selected sphere size is generated automatically and forms a fragment library, and the frequency of occurrence is calculated for each fragment in the library. An unlimited number of fragments and spheres of any size can be used.

In structural-physicochemical fragments, each atom is characterized by three parameters (partial atomic charge, polarizability and an H-bond donor/acceptor factor) instead of the atomic element type used in plain structural fragments. Adjustable ranges of these properties serve as atom types.

The program estimates the similarity of each molecule in the database to all other molecules and sorts them by their similarity to the initial molecule. Several molecular similarity coefficients can be used: Tanimoto, Euclidean and Cosine. Different measures of the diversity of the whole database are also available.
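
The fragment and similarity steps described above can be illustrated with a minimal Python sketch. It is not MOLDIVS code. The molecule representation (a list of element symbols plus a bond list), the function names, and the MD5-based hash are all assumptions made for illustration. The environment key is simplified: it records only element and bond order along a breadth-first traversal, so charge, valency, ring information and ring-closure bonds are ignored. The Tanimoto and cosine coefficients are written in their usual forms for count vectors, and Euclidean is given as a distance, since the exact forms used by the program are not stated here.

```python
from collections import Counter, deque
from math import sqrt
import hashlib

def atom_environment(atoms, bonds, center, sphere):
    """Simplified key for the atoms within `sphere` bonds of `center`."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))
    depth = {center: 0}
    queue = deque([center])
    shells = []
    while queue:
        cur = queue.popleft()
        if depth[cur] == sphere:      # edge atom reached, do not expand further
            continue
        for nxt, order in neighbors[cur]:
            if nxt not in depth:
                depth[nxt] = depth[cur] + 1
                shells.append((depth[nxt], atoms[cur], order, atoms[nxt]))
                queue.append(nxt)
    return (atoms[center], tuple(sorted(shells)))

def fragment_counts(atoms, bonds, sphere=1):
    """Hash every atom-centered environment and count occurrences."""
    counts = Counter()
    for center in range(len(atoms)):
        key = repr(atom_environment(atoms, bonds, center, sphere))
        # MD5 is an arbitrary choice of deterministic hash for this sketch.
        counts[hashlib.md5(key.encode()).hexdigest()[:8]] += 1
    return counts

def dot(a, b):
    return sum(a.get(k, 0) * b.get(k, 0) for k in a.keys() & b.keys())

def tanimoto(a, b):
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)

def cosine(a, b):
    return dot(a, b) / sqrt(dot(a, a) * dot(b, b))

def euclidean_distance(a, b):
    keys = a.keys() | b.keys()
    return sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

# Toy heavy-atom graphs: ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = fragment_counts(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propanol = fragment_counts(["C", "C", "C", "O"],
                           [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
print(tanimoto(ethanol, propanol))   # 0.75: three shared fragments out of four
print(cosine(ethanol, propanol))
print(euclidean_distance(ethanol, propanol))
```

Ranking a database against a query molecule then amounts to computing one of these coefficients between the query's fragment counts and those of every other molecule, and sorting the results.
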
The program allows rapid estimation of the diversity of the whole database according to equation (1), using the cosine similarity coefficient and the centroid algorithm. Several compound selection algorithms for forming diverse subsets are provided: stepwise elimination, cluster sampling and a number of maximum dissimilarity selection algorithms.
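
The idea behind a centroid-based diversity estimate can be sketched as follows. Equation (1) is not reproduced in this text, so the sketch assumes that diversity is defined as one minus the mean pairwise cosine similarity; the actual definition used by the program may differ. The function names and the list-of-lists vector format are likewise assumptions. The point of the centroid approach is that, once every vector is scaled to unit length, the squared length of the summed vector equals the sum of all pairwise cosine similarities, so the result is obtained in time linear in the number of molecules rather than quadratic.

```python
from math import sqrt

def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid_diversity(vectors):
    """1 - mean pairwise cosine similarity, computed in O(N) via the centroid.

    Assumes at least two non-zero vectors of equal length.
    """
    n = len(vectors)
    centroid = [0.0] * len(vectors[0])
    for v in vectors:
        for i, x in enumerate(unit(v)):
            centroid[i] += x
    # centroid . centroid = sum of cosines over all ordered pairs,
    # including the n self-pairs, each of which contributes 1.
    total = sum(c * c for c in centroid)
    mean_similarity = (total - n) / (n * (n - 1))
    return 1.0 - mean_similarity

def pairwise_diversity(vectors):
    """Same quantity by brute force in O(N^2), used only as a check."""
    units = [unit(v) for v in vectors]
    n = len(units)
    total = sum(
        sum(a * b for a, b in zip(units[i], units[j]))
        for i in range(n) for j in range(n) if i != j
    )
    return 1.0 - total / (n * (n - 1))

# Hypothetical fragment-count vectors for four molecules.
data = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 2]]
print(centroid_diversity(data), pairwise_diversity(data))
```

The brute-force function is included only to show that both routes give the same number; on a large database only the centroid version would be practical.
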