QSAR and Drug Design: New Developments and Applications
In the latter, a favorable balance between potency, selectivity, and pharmacokinetic and toxicological parameters, which is required to develop a new, safe, and effective drug, can be achieved through several optimization cycles. As no compound needs to be synthesized or tested before computational evaluation, QSAR represents a labor-, time-, and cost-effective method to obtain compounds with desired biological properties.
Consequently, QSAR is widely practiced in industry, universities, and research centers around the world (Cherkasov et al.). Initially, the data sets collected from external sources are curated and integrated to remove or correct inconsistent data. Then, QSAR models are used to identify, from large chemical libraries, compounds predicted to be active against selected endpoints (Cherkasov et al.).
In principle, VS is often compared to a funnel, where a large chemical library i. However, it is important to mention that modern VS workflows incorporate additional filtering steps, including: (i) sets of empirical rules [e. Although the experimental validation of computational hits is not part of the QSAR methodology, it should be performed as the final important step.
After experimental validation, a multi-parameter optimization (MPO) with QSAR predictions of potency, selectivity, and pharmacokinetic parameters can be conducted. This information is crucial during hit-to-lead and lead optimization of the compound series: it helps to find the balance of properties (potency, selectivity, and PK) associated with different decoration patterns and to establish a new series of target compounds for in vivo evaluation.
High-throughput screening (HTS) can rapidly identify large subsets of molecules with desired activity from large screening collections (10^5 to 10^6 compounds) using automated plate-based experimental assays (Mueller et al.). However, the hit rate of HTS ranges between 0. Consequently, the drug discovery cost increases according to the number of tested compounds (Butkiewicz et al.). Thus, VS campaigns are found to yield a higher proportion of biologically active compounds, at a lower cost, than HTS.
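The hit rate used in these comparisons is simply the percentage of experimentally tested compounds that are confirmed active. A minimal Python sketch follows, using counts reported for two of the QSAR-based campaigns described in this text (9 actives among 60 tested neuraminidase candidates, and 9 actives among 15 tested 5-HT1A candidates). The function name `hit_rate` and the dictionary layout are illustrative choices, not part of any cited workflow.

```python
def hit_rate(n_active: int, n_tested: int) -> float:
    """Return the percentage of tested compounds confirmed as active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_active <= n_tested:
        raise ValueError("n_active must lie between 0 and n_tested")
    return 100.0 * n_active / n_tested


# (actives, tested) pairs taken from the campaigns summarized in this text.
campaigns = {
    "neuraminidase (Lian et al.)": (9, 60),
    "5-HT1A (Luo et al.)": (9, 15),
}
for name, (actives, tested) in campaigns.items():
    print(f"{name}: {hit_rate(actives, tested):.1f}% hit rate")
```

Expressing every campaign in the same unit makes VS and HTS results directly comparable, although differences in assay conditions and activity thresholds still apply.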
For example, Mueller et al. First, the HTS of approximately , compounds resulted in a total of 1, hits, with a hit rate of 0. Then, this dataset was used to build continuous QSAR models combining physicochemical descriptors and neural networks , which were subsequently applied to screen a database of approximately , compounds. Finally, compounds were acquired for biological testing and were confirmed as active hit rate of In another study, Rodriguez et al. Further, these data were used to develop QSAR models and, then, applied to screen near , compounds from ChemDiv database.
Among them, 88 of acquired compounds were active, corresponding to a hit rate of 3. Unfortunately, QSAR is still seen as a complementary analysis to studies of synthesis and biological evaluation, often introduced in the study without any justification or additional perspective. Despite the small number of VS applications available in the literature, most of them led to the discovery of promising hits and lead candidates. Below, we discuss some successful applications of QSAR-based VS for the discovery of new hits and hit-to-lead optimization.

Malaria is an infectious disease caused by five different species of Plasmodium parasites and transmitted to humans through the bite of infected female mosquitoes of the genus Anopheles. The most lethal species is P. falciparum. Malaria is a widespread disease; 91 countries and areas have ongoing transmission. Furthermore, resistance to antimalarial drugs is a common and growing issue and constitutes a substantial threat for populations in endemic regions (Gorobets et al.). In a study reported by Zhang et al. During QSAR modeling and validation, the data set was randomly divided into a modeling set and an external evaluation set. Additionally, the modeling set was divided multiple times into training and test sets using the Sphere Exclusion algorithm.
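The splitting step can be illustrated with a simplified sphere-exclusion scheme. The sketch below is one common variant, not necessarily the exact procedure used in the cited study: a compound is picked as a sphere center and assigned to the training set, every remaining compound within a given radius of it is assigned to the test set, and the process repeats until no compounds remain. The function name, the Euclidean distance, the random choice of centers, and the `radius` value are all assumptions made for illustration.

```python
import math
import random


def sphere_exclusion_split(descriptors, radius, seed=0):
    """Split compound indices into (train, test) with a simple sphere-exclusion scheme.

    descriptors: list of equal-length numeric vectors, one per compound.
    radius: sphere radius in descriptor space (hypothetical tuning parameter).
    """
    rng = random.Random(seed)
    remaining = list(range(len(descriptors)))
    train, test = [], []
    while remaining:
        # Pick a sphere center at random; the center goes to the training set.
        center = remaining.pop(rng.randrange(len(remaining)))
        train.append(center)
        still_outside = []
        for idx in remaining:
            # Compounds inside the sphere resemble the center: send them to the test set.
            if math.dist(descriptors[center], descriptors[idx]) <= radius:
                test.append(idx)
            else:
                still_outside.append(idx)
        remaining = still_outside
    return sorted(train), sorted(test)


# Toy 2D descriptor space with two tight clusters (invented values).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
train_idx, test_idx = sphere_exclusion_split(X, radius=0.5, seed=42)
print("train:", train_idx, "test:", test_idx)
```

Calling the function with different `seed` values gives different training/test divisions of the same modeling set, which corresponds to dividing it "multiple times". Every test compound lies within `radius` of some training compound, so the test set stays inside the region covered by the training set.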
After VS, potential antimalarial compounds were identified and submitted to experimental validation along with 42 putative inactive compounds, used as negative controls. Twenty-five compounds presented antimalarial activity in P. All 42 compounds predicted as inactive by the models were confirmed experimentally to be inactive (Zhang et al.). The confirmed experimental hits presented new chemical scaffolds against P.

Schistosomiasis is a disease caused by flatworms of the genus Schistosoma that affects millions of people worldwide (WHO). The current reliance on only one drug, praziquantel, for treatment and control of this disease calls for the urgent discovery of novel anti-schistosomal drugs (Colley et al.). Aiming at discovering new drugs, our group developed binary QSAR models for Schistosoma mansoni thioredoxin glutathione reductase (SmTGR), a validated target for schistosomiasis (Kuntz et al.).
To achieve this goal, we designed a study with the following steps: (i) curation of the largest possible data set of SmTGR inhibitors, (ii) development of rigorously validated and mechanistically interpretable models, and (iii) application of the generated models for VS of the ChemBridge library. Using the QSAR models, we prioritized 29 compounds for further experimental evaluation.
As a result, we found that the QSAR models were efficient for discovery of six novel hit compounds active against schistosomula and three hits active against adult worms hit rate of Among them, 2-[2-(3-methylnitroisoxazolyl)vinyl]pyridine and 2-(benzylsulfonyl)-1,3-benzothiazole, two compounds representing new chemical scaffolds, have activity against schistosomula and adult worms at low micromolar concentrations and therefore represent promising antischistosomal hits for further hit-to-lead optimization (Neves et al.). Then, the model was used for VS of the ChemBridge database and the 10 top-ranked compounds were further evaluated in vitro against schistosomula and adult worms.
Additionally, we applied five highly predictive in-house QSAR models for prediction of important pharmacokinetic and toxicity properties of the new hits.

Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), kills about 1. The current treatment of this disease takes approximately 9 months, which normally leads to noncompliance and, hence, the emergence of multidrug-resistant bacteria (AlMatar et al.). Aiming at the design of new anti-TB agents, our group used QSAR models to design a new series of chalcone (1,3-diarylpropenone) derivatives.
Initially, we retrieved from the literature all chalcone compounds with in vitro inhibition data against M. After rigorous data curation, these chalcones were subjected to structure-activity relationship (SAR) analysis. Based on SAR rules, bioisosteric replacements were employed to design new chalcone derivatives with optimized anti-TB activity. In parallel, binary QSAR models were generated using several machine learning methods and molecular fingerprints. The fivefold external cross-validation procedure confirmed the high predictive power of the developed models.
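To illustrate what a fivefold external cross-validation of a binary, fingerprint-based model involves, the sketch below holds out each fold exactly once and predicts it with a model built only from the other four folds. The study used several machine learning methods that are not specified here, so a 1-nearest-neighbour classifier with Tanimoto similarity is swapped in purely as a stand-in. The toy fingerprints and all function names are hypothetical.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def predict_1nn(train, query):
    """Label of the training compound most similar to the query (1-nearest neighbour)."""
    return max(train, key=lambda item: tanimoto(item[0], query))[1]


def k_fold_accuracy(data, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        # The model only ever sees compounds outside the held-out fold.
        train = [data[i] for i in idx if i not in held]
        correct = sum(predict_1nn(train, data[i][0]) == data[i][1] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Hypothetical toy data: actives share bits 1-3, inactives share bits 10-12.
actives = [(frozenset({1, 2, 3, 20 + i}), 1) for i in range(10)]
inactives = [(frozenset({10, 11, 12, 40 + i}), 0) for i in range(10)]
print(k_fold_accuracy(actives + inactives, k=5))
```

The key property is that no held-out compound contributes to the model that predicts it, so the averaged score estimates performance on unseen compounds. The sketch assumes at least `k` compounds, so that no fold is empty.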
Using these models, we prioritized a series of chalcone derivatives for synthesis and biological evaluation (Gomes et al.). As a result, five 5-nitro-substituted heteroaryl chalcones were found to exhibit MICs at nanomolar concentrations against replicating mycobacteria, as well as low micromolar activity against nonreplicating bacteria. In addition, four of these compounds were more potent than the standard drug isoniazid. The series also showed low cytotoxicity against commensal bacteria and mammalian cells. These results suggest that the designed heteroaryl chalcones, identified with the help of QSAR models, are promising anti-TB lead candidates (Gomes et al.).

Yearly, influenza epidemics can seriously affect all populations in the world. These annual epidemics are estimated to result in about 5 million cases and , deaths (WHO). The influenza virus mutates constantly, resulting in novel resistant strains; hence, the development of new anti-influenza drugs active against these new strains is important to prevent pandemics (Laborda et al.). Aiming at the discovery of new anti-influenza A drugs, Lian et al. Then, four different combinations of machine learning methods and molecular descriptors were applied to screen 15, compounds from an in-house database, among which 60 compounds were selected for experimental evaluation of neuraminidase activity. Nine inhibitors were identified, five of which were oseltamivir derivatives exhibiting potent neuraminidase inhibition at nanomolar concentrations. The other four active compounds belonged to novel scaffolds, with potent inhibition at low micromolar concentrations (Lian et al.).

The treatment of HIV infections requires lifelong antiretroviral therapy, targeting different stages of the HIV replication cycle. Consequently, because of the emergence of resistance and the lack of tolerability, the development of novel anti-HIV drugs is in high demand (Cihlar and Fordyce; Garbelli et al.). With the purpose of discovering new anti-HIV-1 drugs, Kurczyk et al. The first step was based on binary QSAR models, and the second on privileged fragments. Then, 1. Among them, two novel chemotypes with moderate anti-HIV-1 potencies were identified, which therefore represent new starting points for prospective structural optimization studies.

The 5-hydroxytryptamine 1A (5-HT1A) serotonin receptor has been an attractive target for treating mood and anxiety disorders such as schizophrenia (Nichols and Nichols; Lacivita et al.). However, the currently marketed drugs targeting the 5-HT1A receptor have severe side effects. To address this, Luo et al. First, binary QSAR models were generated using Dragon descriptors and several machine learning methods. Then, the developed QSAR models were rigorously validated and applied in consensus for VS of four commercial chemical databases. Fifteen compounds were selected for experimental testing, and nine of them proved to be active at low nanomolar concentrations.

To summarize, we would like to emphasize that QSAR modeling represents a time-, labor-, and cost-effective tool to discover hit compounds and lead candidates in the early stages of the drug discovery process. Analyzing the examples of QSAR-based VS available in the literature, one can see that many of them led to the identification of promising lead candidates.

The philosophy of this process resides in reducing the complexity of any system without loss of any intrinsic characteristics or information about the chemical nature. This process is carried out by a back-propagation neural network with architecture AxR-c-y-c-AxR, where AxR represents the CODES matrix, c is the number of neurons in the codification layer, and y is the number of hidden neurons.
The neural network is considered trained when the convergence plot shows constant behaviour.

Wang, T. Quantitative structure-activity relationship: promising advances in drug discovery platforms. Expert Opinion on Drug Discovery 11, 1-18.
Kumar, R. An in silico platform for predicting, screening and designing of antihypertensive peptides. Scientific Reports 5.
Briard, J. Scientific Reports 6.
Gasteiger, J. Chemoinformatics: Achievements and Challenges, a Personal View. Molecules 21(2), Special Issue Chemoinformatics.
Patel, J. Science of the science, drug discovery and artificial neural networks. Current Drug Discovery Technologies 10(1), 2-7.
Basant, N. Predicting human intestinal absorption of diverse chemicals using ensemble learning based QSAR modeling approaches. Computational Biology and Chemistry 61.
Dobchev, D. Have artificial neural networks met expectations in drug discovery as implemented in QSAR framework? Expert Opinion on Drug Discovery 11(7).
Dragon, Version 5.
Yap, C. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry 32(7).
Danishuddin Khan, A. Descriptors and their selection methods in QSAR analysis: paradigm for drug design. Drug Discovery Today 21(8).
Soto, A.
Dorronsoro, I.
Palomba, D. QSAR models for predicting log Pliver on volatile organic compounds combining statistical methods and domain knowledge. Molecules 17(12).
Prediction of Elongation at Break for Linear Polymers. Chemometrics and Intelligent Laboratory Systems.
Cravero, F. Advances in Intelligent Systems and Computing, 3-11.
Guerra, A. Design, synthesis, and evaluation of potential inhibitors of nitric oxide synthase. European Journal of Medicinal Chemistry 45(3).
Hall, M.
Eklund, M. Choosing feature selection and learning algorithms in QSAR. Journal of Chemical Information and Modeling 54(3).
Small-Molecule Drug Discovery Suite.
Lipinski, C. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 46(1-3), 3-26.
Martinez, M. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods. Journal of Cheminformatics 7, 39.
Deconinck, E. Classification of drugs in absorption classes using the classification and regression trees (CART) methodology. Journal of Pharmaceutical and Biomedical Analysis 39(1-2), 91-.
Palm, K. Journal of Medicinal Chemistry 41(27).
Brown, J. Is Enantioselectivity Predictable in Asymmetric Catalysis? Angewandte Chemie International Edition 48.
Trost, B. Asymmetric Allylic Alkylation, an Enabling Methodology. Journal of Organic Chemistry 69(18).
Chemical Reviews 8.
Martin, E. Thioether containing ligands for asymmetric allylic substitution reactions. Comptes Rendus Chimie 10(3).
Lu, Z.
Angewandte Chemie International Edition 47.
Accounts of Chemical Research 43.
Catalytic asymmetric allylic alkylation employing heteroatom nucleophiles: a powerful method for C-X bond formation. Chemical Science 1.
Duan, J. Journal of Molecular Graphics and Modelling 29.
Sastry, M. Journal of Chemical Information and Modeling 50.
Canvas, version 2.
Maestro, version 9.
LigPrep, version 3.
Epik, version 3.
Jorgensen, W. Journal of. Chemical Society 45.
Statistics for Windows, Version
Breiman, L. Random Forests. Machine Learning 45(1), 5-32.

R performed the drug-like properties calculation, similarity assessments and the discussion of the importance of the parameters. All authors reviewed and approved the manuscript. Correspondence to Ignacio Ponzoni or Nuria E.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Subjects: Computational chemistry, Medicinal chemistry, Virtual drug screening.
Abstract

Quantitative structure-activity relationship modeling using machine learning techniques constitutes a complex computational problem, where the identification of the most informative molecular descriptors for predicting a specific target property plays a critical role.

Results

In this section, several QSAR models inferred by feature selection and feature learning for different physicochemical properties are described.
Figure 3. Redundancy analysis among the molecular descriptors that make up model M13 BBB.
Figure 6. Correlation analysis among the molecular descriptors that make up model M9 HIA.
Table 5. Discretization criteria for target properties.
Competing interests: The authors declare that they have no competing interests.

We envisage that computational methods can help to predict the enantiomeric excess for a defined set of catalysts and reactants, which opens a new approach to estimating the best chiral ligand for a desired reaction without the need to perform the reaction. The palladium(0)-catalyzed asymmetric allylic substitution (Tsuji-Trost reaction) is one of the most powerful procedures for the enantiocontrolled formation of carbon-carbon or carbon-heteroatom bonds [28, 29, 30, 31, 32]. Since the first examples in the early seventies, a vast number of very efficient chiral ligands have been developed for this transformation.
In particular, the asymmetric allylic substitution of allylic acetates or benzoates with dialkyl malonate has been extensively used as a successful test bench for the design and development of new chiral ligands. Given the large amount of literature data for this reaction, in which a vast array of chiral ligands of different backbone, coordination atoms and coordination modes has been tested, we have selected this asymmetric transformation as a model reaction to test the viability of our initial hypothesis. Considering this data, ligands, substrates, and enantiomeric excess values were correlated.
To measure the structural diversity of this dataset, we carried out studies analogous to those performed for the previous databases. Only ligands were considered to develop the QSAR studies, as substrates show a high similarity among them in terms of descriptors. As can be observed, the compounds in the dataset span a wide range of diversity, from total dissimilarity to similarity with respect to a reference compound. This suggests that the applicability domain of the models built in this study will be wide, as in the previous models.
Due to the structural complexity of this dataset, which contains ligands with metal complexes on their backbone e. However, the Canvas module [34, 35, 36] allows us to calculate molecular properties for this dataset. The most representative descriptors given by the software were analyzed in order to show the diversity of the dataset, as depicted in Fig.
The data show a wide distribution of the physicochemical properties: regarding MW values, compounds are in a range between and , HBA are in the interval from 0 to 19, while rotatable bonds are in the range between 1 and From this analysis it is possible to extract two central conclusions. First, this dataset presents a wide range of diversity, an important feature for QSAR models. Second, this dataset has a low level of drug-like properties in comparison with both previous datasets.

Figure: Physicochemical representation of the EE dataset. (A) Dispersion of the compounds regarding hydrogen bond donors and rotatable bonds (RB). Color is defined by molecular weight (MW). (B) Dispersion of the dataset taking into account logP values and molecular weight.

In particular, the best classification model obtained from the CT EE subset achieves a high level of accuracy From the confusion matrix, we can observe that this QSAR model has a high precision for low-enantiopurity samples The weak performance obtained for the second class can be related to the strong class imbalance in the testing set, where only the This fact can also explain the low value of the average ROC area 0.
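The effect of class imbalance on these metrics can be made concrete with a small worked example. The sketch below computes accuracy and per-class precision and recall from a 2x2 confusion matrix. The counts are invented for illustration (90 majority-class and 10 minority-class test samples) and are not the values reported for the EE models.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy plus per-class precision and recall from a 2x2 confusion matrix.

    The 'positive' class is the minority class in this example.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision_pos": tp / (tp + fp) if tp + fp else 0.0,
        "precision_neg": tn / (tn + fn) if tn + fn else 0.0,
        "recall_pos": tp / (tp + fn) if tp + fn else 0.0,
        "recall_neg": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical test set: 10 minority-class samples (tp + fn) and 90 majority-class samples (fp + tn).
m = binary_metrics(tp=2, fp=5, fn=8, tn=85)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the overall accuracy is 0.87 even though only 2 of the 10 minority-class samples are recovered. This is why accuracy alone can look good while class-averaged measures, such as the average ROC area, remain low.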
In general terms, all the edges are light pink and pink, demonstrating low mutual information and consequently good complementarity. Additionally, in Fig. Furthermore, another functionality offered by VIDEAN is the visualization of scatter plots and their associated histograms. The goal is to see the behavior of descriptor values versus the target property, in order to understand how the information zone is covered by the different descriptors.
Ten scatter plots with their associated histograms can be seen, one for each model descriptor. The analysis can be made in two groups, first for the substrate descriptors and then for the ligand ones. Sa, Sb, Sc, and Sd mostly show values in the middle and right zones (see histograms). On the other hand, La, Lb, Lc, Ld, Le, and Lf show values in the left zone (see histograms), where the substrate descriptors do not present any values. Consequently, we can infer that the combination of the two groups completes the information zone for the model, and this coverage is desirable for QSAR modeling.
The molecular descriptors chosen by the feature learning method have low redundancy levels. Nevertheless, these differences are not statistically significant. As mentioned before, a relevant goal of this work is to assess the potential benefits of hybridizing feature selection and feature learning approaches in QSAR modeling. Analyzing all experiments executed for each dataset under different experimental conditions (combinations of different molecular descriptor subsets, machine learning methods, and sampling sizes), see Fig.
From this chart, it is clear that regression models inferred from the individual subsets have, in general, better accuracy than the combined ones. Only for the classification models inferred from the BBB dataset do the combined subsets outperform the individual subsets in most of the experimental scenarios. These results allow us to conclude that the hybridization of both strategies (feature selection and feature learning) can be useful, but the performance depends on the dataset characteristics.
Number of experimental scenarios where QSAR models obtained by combined subsets improve the performance of the QSAR models inferred by individual subsets.
Another factor of relevance for the practitioner is how to choose the methodology used for the inference of the QSAR models. In this paper, we explore the use of different training methods provided by the WEKA tool for regression and classification problems. In both cases (regression and classification models), the Random Forest and Random Committee methods achieved better accuracy than the other methods (Neural Networks, Decision Trees and Linear Regression), with statistical significance. Nevertheless, the differences between Random Forest and Random Committee are negligible in both scenarios (regression and classification).
For this reason, our advice for practitioners is to use training methods based on ensembles, such as Random Forest and Random Committee, because their accuracies exceed those of the more traditional machine learning methods.

During the last decades, several feature selection and feature learning methods have been applied to the inference of molecular descriptor subsets for QSAR modeling.
These models play a central role in the virtual screening of drugs, allowing the study of relevant physicochemical properties even before the synthesis of newly designed compounds. The experiments were carried out with compound datasets for QSAR modeling of three different endpoints: blood-brain barrier, human intestinal absorption, and enantiomeric excess. Each dataset used during the machine learning experiments was characterized in detail by drug-like properties calculation and similarity assessment of its molecular descriptors. In all cases, QSAR model performances were contrasted for several experimental conditions, varying the sampling parameters and the techniques used for inferring the classification and regression models.
From the results, we observed that neither method outperforms the other in all scenarios, since the prediction accuracy depends on database features and experimental conditions. Nevertheless, regarding the training methods used for QSAR model inference, the techniques based on ensembles, Random Forest and Random Committee, outperform the more traditional algorithms with statistical significance in both kinds of QSAR models (regression and classification). For this reason, we recommend that practitioners apply ensemble-based methods for the model training step.
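The idea behind a Random Committee can be sketched in a few lines: several randomized base models are built on the same data, each with a different random seed, and their predictions are averaged. The sketch below is a pure-Python illustration under simplifying assumptions, not the Weka implementation. In particular, the base learner is a randomized one-split regressor (a stump with a random feature and threshold) chosen for brevity, and the toy data are invented.

```python
import random
from statistics import mean


def fit_random_stump(X, y, seed):
    """Fit a one-split regressor whose feature and threshold are chosen at random."""
    rng = random.Random(seed)
    feature = rng.randrange(len(X[0]))
    values = [row[feature] for row in X]
    threshold = rng.uniform(min(values), max(values))
    left = [t for row, t in zip(X, y) if row[feature] <= threshold]
    right = [t for row, t in zip(X, y) if row[feature] > threshold]
    overall = mean(y)
    # Fall back to the global mean if one side of the split is empty.
    left_mean = mean(left) if left else overall
    right_mean = mean(right) if right else overall
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def fit_committee(X, y, n_members=25):
    """Build n_members randomized base models on the SAME data, one seed each."""
    members = [fit_random_stump(X, y, seed) for seed in range(n_members)]
    # The committee prediction is the straight average of the member predictions.
    return lambda row: mean(member(row) for member in members)


# Hypothetical toy data: the target grows with the first descriptor only.
X = [(float(i), float((i * 7) % 5)) for i in range(20)]
y = [2.0 * row[0] for row in X]
committee = fit_committee(X, y)
print(committee(X[0]), committee(X[-1]))
```

Each individual stump is a weak, noisy model, but averaging many of them smooths out the randomness of any single split. This is the general reason ensemble methods tend to be more robust than a single base learner.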
Another piece of advice for QSAR modelers is associated with the intrinsic characteristics of each methodology. CODES considers that the property under study depends on the chemical structure of the molecule as a whole, and not on the contribution of different independent variables. In fact, CODES generates a small set of descriptors from the chemical structure of the molecule, based on the atom nature, the number of bonds per atom, and the connectivity with the rest of the molecule. Therefore, the interpretation of QSAR models in terms of the individual contribution of the molecular descriptors is possible, helping to obtain more understandable models.
For this reason, each modeler can choose a methodology depending on which aspect is the priority: computational effort or model interpretability. Beyond the use of these feature identification approaches separately, as alternative competing methodologies, in this study we also decided to assess the impact of hybridizing both techniques.
This decision was based on recent results, published in the area of QSAR modeling for material design, where the combination of both methods improved the prediction quality. The hybridization experiments for our datasets reveal that QSAR model accuracy can be enhanced by joining the molecular descriptor subsets obtained by both methodologies if these subsets contain complementary information for the models, as occurred with the best HIA regression model. For this reason, as a general conclusion, we recommend that virtual screening practitioners consider this hybridizing philosophy as an additional strategy for their experiments.
Nevertheless, in all cases, different degrees of imbalance among the number of samples available for each class in the testing sets affected the average ROC area values. Therefore, even though this paper is focused on the comparison between two feature selection and feature extraction methods, together with their potential hybridization, we hope to enhance the classifiers by applying techniques for artificial balancing of sample classes in forthcoming experiments. Finally, hybridizing other alternative methods for feature selection and feature learning remains as future work.
LigPrep [38] is a 2D-to-3D conversion tool that includes the addition of hydrogen atoms and options for generating multiple possible tautomers, stereoisomers, ionization states at a selected pH range, and ring conformations using molecular mechanics force fields. To carry out our studies, possible ionizations were generated at pH 7. The ionization states were assigned with the Epik module. Also, all the compounds were desalted and no tautomers were generated.
In this process, we restricted the search to obtain just one possible stereoisomer among all those that can be found by the program, as well as one low-energy ring conformation. Different conformers and ionization states of the same compound were reduced in order to keep one 3D structure per initial compound. The selection was made considering the most probable ionization state at physiological pH conditions.
This preparation is a crucial step for the following studies and was performed with the aim of obtaining the most suitable 3D structures to further calculate the physicochemical properties of the existing compounds. For the EE dataset, because its chemical structures contain coordination complexes, Canvas [41] software was used. This tool is a cheminformatics package that provides a range of applications for structural and data analysis, including fingerprints, similarity searching, substructure searching, selection by diversity, clustering, and building regression and classification models.
In this case, it allowed us to calculate physicochemical properties in a way analogous to QikProp. Similarity calculations for the three datasets were performed using the SPSS software. Distances were computed between cases, measuring similarities by Pearson correlation. The values were transformed into a standardized range of 0 to 1 by variable, and the measures were transformed and rescaled to a 0-1 range.
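The similarity procedure described above can be approximated in a few lines of Python. The sketch below rescales each descriptor to the 0-1 range, computes the Pearson correlation between each compound's descriptor profile and that of a reference compound, and then rescales the resulting measures to 0-1. It is an illustration only: the exact SPSS settings are not given in the text, so the min-max rescaling of the measures, the function names, and the toy descriptor table are assumptions.

```python
from statistics import mean


def min_max_by_variable(rows):
    """Rescale every descriptor (column) to the 0-1 range."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return [[(v - l) / s if s else 0.0 for v, l, s in zip(row, lo, span)] for row in rows]


def pearson(a, b):
    """Pearson correlation between two equal-length descriptor profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a flat profile; treated as 0 here
    return cov / (var_a * var_b) ** 0.5


def similarity_to_reference(rows, ref_index=0):
    """Pearson similarity of each compound to a reference, rescaled to 0-1."""
    scaled = min_max_by_variable(rows)
    raw = [pearson(scaled[ref_index], row) for row in scaled]
    lo, hi = min(raw), max(raw)
    return [(r - lo) / (hi - lo) if hi > lo else 1.0 for r in raw]


# Hypothetical descriptor table: rows are compounds, columns are (MW, logP, HBD, RB).
table = [
    [300.0, 2.0, 1.0, 4.0],
    [310.0, 2.2, 1.0, 5.0],
    [650.0, 6.5, 0.0, 12.0],
    [120.0, -0.5, 3.0, 1.0],
]
print(similarity_to_reference(table))
```

After rescaling, the reference compound always has similarity 1 and the least similar compound has similarity 0, which matches the "total dissimilarity to similarity" range used to describe dataset diversity.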
With these parameters, the similarity was computed for all three databases, thus obtaining different correlations between the compounds. For every dataset, one compound was chosen as the reference, and similarity is described for the rest of the dataset.

The first step before applying a feature selection method consists in calculating the molecular descriptors. It provides almost 5, molecular descriptors (0D, 1D, 2D, and 3D), which can be used to evaluate molecular structure-activity or structure-property relationships of molecule databases. To calculate these molecular descriptors, molecular structure files are required.

CODES is software based on artificial neural computing. It generates descriptors correlated with the atom nature, the atom bonds, and the connectivity with the rest of the molecule. In fact, each point (atom) of the topological space corresponds to a unit (neuron) of the neural space, and each binary relation (bond) corresponds to a connection of the neural space.
This results in a neural network designed as an interactive activation and competition network, which is processed until an equilibrium state is reached. The next step consists in reducing the dimension of the matrices of each compound. TSAR is the software responsible for the dimension reduction process. The philosophy of dimension reduction resides in reducing the complexity of any system without loss of information. This is achieved by training a supervised multilayer neural network, namely ReNDeR (Reversible Non-linear Dimension Reduction).
The TSAR program applies a Monte Carlo algorithm, and the same number of descriptors was obtained for all molecules in the databases. Weka is a collection of machine learning algorithms for data mining tasks. The methods used in this study are described next. Linear Regression: Class for using linear regression for prediction. Uses the Akaike criterion for model selection, and is able to deal with weighted instances. Decision Trees: Class for building and using a decision stump. Usually used in conjunction with a boosting algorithm.
Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery
Does regression (based on mean-squared error) or classification (based on entropy). Missing is treated as a separate value. Neural Networks (Multilayer Perceptron): A classifier that uses back-propagation to classify instances. This network can be built by hand, created by an algorithm, or both. The network can also be monitored and modified during training time.
The nodes in this network are all sigmoid, except when the class is numeric, in which case the output nodes become unthresholded linear units. Random Forest: Class for constructing a forest of random trees. Random Trees: Class for constructing a tree that considers K randomly chosen attributes at each node.
Performs no pruning. It also has an option to allow estimation of class probabilities (or the target mean in the regression case) based on a hold-out set (backfitting). Random Committee: Class for building an ensemble of randomized base classifiers. Each base classifier is built using a different random number seed but based on the same data. The final prediction is a straight average of the predictions generated by the individual base classifiers. For HIA models the threshold under 0. CODES consists of two levels, a topological one and a neural one, and its philosophy lies in a Gestalt isomorphism between both levels.
While the topological space is the chemical structure itself, the neural one consists of an interactive and competitive network. Each point (atom) of the topological space corresponds to a unit (neuron) of the neural space, and each type of atom takes a different initial value. If two atoms are not bonded in the topological space, the connection between them at the neural level is inhibitory; otherwise, the connection is excitatory and its value depends on the bond type.
Stereochemistry is also taken into account during the codification process: R or S chirality is expressed by a corrective non-linear function (Fig.). The neural network employs a sigmoidal function in the codification process and is characterised by unsupervised learning. During learning, CODES records the activities reached in every iteration of the network, which is processed until an equilibrium state is reached. The result is a set of temporal values, cast into a matrix of AxR dimensions, where A is the number of atoms included in the SMILES code and R is the number of iterations that CODES needs to reach this equilibrium state.
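The codification scheme described above can be illustrated with a toy network in plain Python. This is only a sketch under stated assumptions and does not reproduce CODES: the initial values per atom type, the excitatory and inhibitory weights, the update rule, and the convergence tolerance are all hypothetical, and stereochemistry is ignored.

```python
import math

# Hypothetical initial activations per atom type (not the values CODES uses).
INITIAL_VALUE = {"C": 0.2, "N": 0.4, "O": 0.6}

def build_weights(n_atoms, bonds, excite=0.3, inhibit=0.05):
    """Excitatory weight for bonded pairs (scaled by bond order), inhibitory otherwise."""
    weights = [[0.0 if i == j else -inhibit for j in range(n_atoms)] for i in range(n_atoms)]
    for (i, j), order in bonds.items():
        weights[i][j] = weights[j][i] = excite * order
    return weights

def encode_molecule(atoms, bonds, tol=1e-6, max_iter=500):
    """Iterate the network to equilibrium and return the A x R activity matrix."""
    bias = [INITIAL_VALUE[a] for a in atoms]
    weights = build_weights(len(atoms), bonds)
    state = bias[:]
    trajectory = []  # one activity vector per iteration
    for _ in range(max_iter):
        new_state = [
            1.0 / (1.0 + math.exp(-(b + sum(w * s for w, s in zip(row, state)))))
            for b, row in zip(bias, weights)
        ]
        trajectory.append(new_state)
        converged = max(abs(n - s) for n, s in zip(new_state, state)) < tol
        state = new_state
        if converged:
            break
    # Transpose so that rows are atoms (A) and columns are iterations (R).
    return [list(row) for row in zip(*trajectory)]

# Heavy atoms of ethanol (SMILES "CCO") with two single bonds.
atoms = ["C", "C", "O"]
bonds = {(0, 1): 1, (1, 2): 1}
matrix = encode_molecule(atoms, bonds)
print(len(matrix), "atoms x", len(matrix[0]), "iterations")
```

Each row of `matrix` traces one atom's activity over the iterations, and the last column alone corresponds to keeping only the final codification step.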
In fact, this is a dynamic matrix of descriptors because it takes into account the whole codification progress. It is also possible to keep only the last step of the codification, which yields a static set of descriptors for each molecule; however, in order to compress the information without losing any of the calculated descriptors, we selected the matrix with the whole codification progress. Reduction of Dimensions (RD). The philosophy of this process lies in reducing the complexity of a system without losing any of its intrinsic characteristics or information about its chemical nature.
This process is carried out by a back-propagation neural network with architecture AxR-c-y-c-AxR, where AxR represents the CODES matrix, c is the number of neurons in the codification layer, and y is the number of hidden neurons. The neural network is considered trained when the convergence plot shows constant behaviour.
Wang, T. Quantitative structure-activity relationship: promising advances in drug discovery platforms. Expert Opinion on Drug Discovery 11, 1-18, doi:
Kumar, R. An in silico platform for predicting, screening and designing of antihypertensive peptides. Scientific Reports 5, doi:
Briard, J. Scientific Reports 6, doi:
Gasteiger, J.
Foreword By Alexandru T. Balaban
Chemoinformatics: Achievements and Challenges, a Personal View. Molecules 21(2), Special Issue Chemoinformatics, doi:
Patel, J. Science of the science, drug discovery and artificial neural networks. Current Drug Discovery Technologies 10(1), 2-7, doi:
Basant, N. Predicting human intestinal absorption of diverse chemicals using ensemble learning based QSAR modeling approaches. Computational Biology and Chemistry 61, doi: