Discrete-Continuous QSAR Methodology on the Basis of Physico-Chemical Descriptors.

QSAR modelling usually proceeds in two principal steps. First, a set of descriptors that adequately characterizes the properties of a set of compounds with known activity (the so-called "training set") is calculated. Second, a correlation between the selected descriptors and the property under consideration is developed using statistical methods. LSER (Least Squares Error Reduction), a Multiple Regression Analysis (MRA) technique, is the common basis of QSAR. This approach describes the dependent vector (in our case, a property or biological activity)

$Y = \{y_i\}$, $i = 1, \dots, M$, as a function,

$$y_i = a_0 + \sum_{j=1}^{N} a_j x_{ij} + e_i \qquad (1)$$

of a number of independent variables $X = \{x_{ij}\}$, $i = 1, \dots, M$; $j = 1, \dots, N$, of the training set (here, $M$ is the number of chemical compounds, each described by $N$ structural descriptors, and $e_i$ are the residuals).
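As an illustration of equation (1), the following is a minimal sketch of an ordinary least-squares fit in plain Python. The function names `fit_mra` and `residuals` and the small training set are hypothetical, chosen only for this example; the fit solves the normal equations directly, which is adequate for a handful of well-conditioned descriptors but is not necessarily how any particular QSAR package does it.

```python
def fit_mra(X, y):
    """Ordinary least-squares fit of y_i = a0 + sum_j a_j * x_ij + e_i.

    X: list of M rows, each holding N descriptor values.
    y: list of M observed activities.
    Returns the coefficients [a0, a1, ..., aN].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column
    n = len(rows[0])
    # Normal equations (A^T A) a = A^T y, stored as an augmented matrix.
    aug = [[sum(r[p] * r[q] for r in rows) for q in range(n)]
           + [sum(r[p] * yi for r, yi in zip(rows, y))]
           for p in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                f = aug[k][col] / aug[col][col]
                aug[k] = [u - f * v for u, v in zip(aug[k], aug[col])]
    return [aug[p][n] / aug[p][p] for p in range(n)]


def residuals(X, y, coef):
    """e_i = y_i - (a0 + sum_j a_j * x_ij) for every compound i."""
    return [yi - (coef[0] + sum(a * x for a, x in zip(coef[1:], r)))
            for r, yi in zip(X, y)]


# Hypothetical training set: M = 5 compounds, N = 2 descriptors.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 2.0]]
# Activities generated without noise from a0 = 0.5, a1 = 2.0, a2 = -1.0.
y = [0.5 + 2.0 * x1 - 1.0 * x2 for x1, x2 in X]

coef = fit_mra(X, y)
print([round(c, 6) for c in coef])
print(max(abs(e) for e in residuals(X, y, coef)))
```

Because the example data are noise-free, the fit should recover the generating coefficients up to rounding error. The collinearity check reflects the requirement that the descriptors be independent of each other.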

In such models, it is tacitly assumed that all compounds in the training set have the same mechanism of action and the same biological target. In practice, the diversity and complexity of chemical structures may not allow the compounds to be characterized completely by physico-chemical descriptors.

A few approaches have been used to overcome this problem. For example, the use of indicator variables permits QSAR models to operate on sets of non-homogeneous compounds. However, the MRA technique is limited by the a priori requirement that all structural descriptors be independent of each other, error-free, and relevant to the problem, and that all compounds belong to the same group (or cluster). The latter requirement is important because, while MRA can give fairly good explanations of intraclass structure, it is unable to recognize the existence of clusters of compounds.
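The clustering limitation can be illustrated with a small numerical sketch. The data below are hypothetical and deliberately extreme: two clusters of compounds whose activity depends on a single descriptor with opposite trends. A single pooled regression explains almost none of the variance, while a separate fit within each cluster explains nearly all of it. The helper `simple_fit` is an assumption of this example, not part of any published method.

```python
def simple_fit(x, y):
    """Least-squares line y = a0 + a1 * x; returns (a0, a1, r_squared)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    a1 = sxy / sxx
    a0 = my - a1 * mx
    ss_res = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return a0, a1, 1.0 - ss_res / syy


# Hypothetical data: same descriptor range, opposite trends in activity.
x_a, y_a = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]  # cluster A: rising
x_b, y_b = [1.0, 2.0, 3.0, 4.0], [4.0, 3.1, 1.9, 1.0]  # cluster B: falling

_, _, r2_pooled = simple_fit(x_a + x_b, y_a + y_b)  # one model for all
_, _, r2_a = simple_fit(x_a, y_a)                   # model for cluster A
_, _, r2_b = simple_fit(x_b, y_b)                   # model for cluster B

print(f"pooled r^2 = {r2_pooled:.3f}")
print(f"cluster A r^2 = {r2_a:.3f}, cluster B r^2 = {r2_b:.3f}")
```

Note that the per-cluster fits only work here because cluster membership is given in advance; the regression itself provides no means of discovering it.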

Another approach is the SIMCA/PLS method [1,2], which is based on the philosophy of applying disjoint principal component (PC) models to each set of homogeneous compounds.

In 1984, Raevsky et al. proposed the QSAR Discriminant-Regression Model (DIREM) [3], which resembles the SIMCA/PLS method but differs from it in some important aspects:

During the 1990s, DIREM was used by the authors to create stable, predictive QSAR models of various properties and activities.

An original combination of similarity and QSAR for creating stable, predictive models of properties (activities) was recently proposed by Raevsky [4]. In that work, four approaches were considered for calculating logP of drugs containing a few chemical functional groups:

$$\log P_i = \frac{1}{N} \sum_{j=1}^{N} \left[ \log P_j + 0.267\,(\alpha_i - \alpha_j) - 1.00 \left( \sum C_{a,i} - \sum C_{a,j} \right) \right] \qquad (5)$$

where index $i$ indicates the compound of interest, index $j$ indicates a near neighbor, and $N$ is the number of closely related structures used.
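The arithmetic of equation (5) can be sketched as follows. This is only an illustration under stated assumptions: the function name `logp_from_neighbors`, the argument names, and all numerical inputs are hypothetical, and the two descriptors are passed in as plain numbers without any claim about how they are computed. The coefficients 0.267 and 1.00 are taken from equation (5).

```python
def logp_from_neighbors(a_i, sum_ca_i, neighbors):
    """Estimate logP of compound i from its near neighbors, per equation (5).

    a_i:       descriptor of compound i that is multiplied by 0.267
    sum_ca_i:  summed descriptor of compound i that is multiplied by 1.00
    neighbors: list of (logP_j, a_j, sum_ca_j) tuples, one per neighbor j
    """
    if not neighbors:
        raise ValueError("at least one near neighbor is required")
    total = 0.0
    for logp_j, a_j, sum_ca_j in neighbors:
        # Known logP of neighbor j, corrected for the descriptor differences
        # between the compound of interest and that neighbor.
        total += logp_j + 0.267 * (a_i - a_j) - 1.00 * (sum_ca_i - sum_ca_j)
    return total / len(neighbors)  # average over the N neighbors


# Hypothetical inputs: one compound of interest and N = 2 near neighbors.
neighbors = [
    (2.0, 18.0, 2.5),  # (logP_j, a_j, sum_ca_j) for neighbor 1
    (1.5, 22.0, 3.5),  # (logP_j, a_j, sum_ca_j) for neighbor 2
]
estimate = logp_from_neighbors(20.0, 3.0, neighbors)
print(round(estimate, 3))
```

When a neighbor has the same descriptor values as the compound of interest, both correction terms vanish and that neighbor contributes its experimental logP unchanged, which is the sense in which the estimate is anchored to similar structures.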

Later, this approach was applied to construct stable, predictive models of lipophilicity, aqueous solubility, and human intestinal absorption [5-7].



