Exploring the Relationship Between Drug Side-Effects and Therapeutic Indications

Therapeutic indications and drug side-effects are both measurable human behavioral or physiological changes in response to treatment. In modern drug development, both inferring potential therapeutic indications and identifying clinically important drug side-effects are challenging tasks. Previous studies have utilized either chemical structures or protein targets to predict indications and side-effects. In this study, we compared indication prediction using side-effect information and side-effect prediction using indication information against models using only chemical structures and protein targets. Experimental results based on 10-fold cross-validation show that drug side-effects and therapeutic indications are the most predictive features for each other. In addition, we extracted 6,706 statistically highly correlated disease-side-effect pairs from all known drug-disease and drug-side-effect relationships. Many relationship pairs provide explicit repositioning hypotheses (e.g., drugs causing postural hypotension are potential candidates for hypertension) and clear adverse-reaction watch lists (e.g., drugs for heart failure possibly cause impotence). All data sets and highly correlated disease-side-effect relationships are available at http://astro.temple.edu/~tua87106/druganalysis.html.

Introduction

Drug discovery is a time-consuming and laborious process. By conservative estimates, it now takes at least 10 to 15 years and $500 million to $2 billion to bring a single drug to market 1 . Furthermore, there is a widening productivity gap: research and development spending continues to increase, yet the number of new therapeutic chemical and biological entities approved by the US FDA has been declining since the late 1990s. Lack of efficacy and adverse side-effects are the two most important reasons a drug fails clinical trials, each accounting for around 30% of failures 2 . Thus the development of tools that can predict therapeutic indications and side-effects holds great promise for reducing the attrition rate and improving the drug discovery process.

Inferring potential therapeutic indications (i.e., drug repositioning), for either novel or approved drugs, has become a key approach in drug development. By starting from known compounds with well-characterized pharmacology and safety profiles, it could drastically reduce the risk of attrition in clinical phases. There have been several successful examples of drug repositioning (for example, thalidomide to treat leprosy or finasteride for the prevention of baldness); however, they were all results of serendipitous discovery rather than deliberate strategies. Recently, a number of computational methods have been developed to predict drug indications. There are five typical computational strategies in drug repositioning: (1) inferring novel drug uses based on shared treatment profiles using a network-based, guilt-by-association method 3 ; (2) predicting drug indications on the basis of the chemical structure of the drug 4 ; (3) inferring drug indications from protein-target interaction networks 5 , 6 ; (4) identifying relationships between drugs based on the similarity of their phenotypic profiles (e.g., side-effects 7 , 8 and gene expression 9 , 10 ); (5) integrating multiple properties (e.g., chemical, biological, or phenotypic information) of drugs and diseases to predict drug indications 11 – 13 . With the exception of Yang et al 8 , which used side-effects, these strategies focus primarily on using preclinical information. However, clinical therapeutic effects are not always consistent with preclinical outcomes.

At the same time, drug side-effects, or adverse drug reactions, have become a major healthcare concern. As an illustration of the extent of this problem, serious drug side-effects are estimated to be the fourth leading cause of death in the US, resulting in 100,000 deaths per year 14 . The identification of potential severe adverse side-effects is a challenging issue at many stages of the drug development process. A useful experimental approach for predicting side-effects is preclinical in vitro safety profiling, which tests compounds with biomedical and cellular assays, but experimental detection of drug side-effects remains very challenging in terms of cost and efficiency. Therefore, several computational methods for analyzing or predicting drug side-effects have been proposed. The methods can be categorized into three types: (1) linking drug side-effects to their chemical structures 15 – 17 , following the spirit of QSAR (quantitative structure-activity relationship); (2) relating drug side-effects to their protein targets 18 , 19 , because drugs with similar in vitro protein-binding profiles tend to exhibit similar side-effects; (3) predicting drug side-effects by integrating multiple data sources (e.g., chemical, biological, or phenotypic properties) 20 – 22 .

Therapeutic indications (i.e., a drug’s indicated diseases) and side-effects are both measurable behavioral or physiological changes in response to treatment. Intuitively, if drugs treating a disease share the same side-effects, this may be a manifestation of some underlying mechanism of action (MOA) linking the indicated disease and the side-effect. In other words, the phenotypic expression of a side-effect can be correlated with that of a disease. Furthermore, both therapeutic indications and side-effects are observations on humans in the clinical stage, so there is less of a translational issue. This provides the basis to relate diseases to side-effects (and vice versa), even in cases where the precise pharmacological mechanism is unknown.

In this study, we conducted a comprehensive investigation of multiple sources of information (and their combinations) for both the therapeutic-indication prediction and drug side-effect prediction tasks. Our evaluation shows that drug side-effects are indeed important information for therapeutic-indication prediction, and vice versa: therapeutic indications are important information for side-effect prediction. Building on this confirmation of a strong correlation between drug indications and side-effects, we further compiled all known drug-disease and drug-side-effect relationships to build disease-side-effect profiles and identify statistically significant relationships between drug side-effects and therapeutic indications. These strong relationships can be used to provide repositioning hypotheses (e.g., drugs causing postural hypotension are potential candidates for hypertension), as well as adverse-effect watch lists (e.g., drugs for heart failure possibly cause impotence).

Our study differs from prior related studies in the following aspects: (1) We evaluate the use of therapeutic indications to predict side-effects, and the use of side-effects to predict therapeutic indications, and by doing so demonstrate that each is the most effective predictive factor of the other. To our knowledge, ours is the first study to do so. While Yang et al 8 also used side-effects to predict drug indications, they did not evaluate this in a general machine learning framework. Furthermore, our study was conducted on a much larger dataset (719 diseases and 1385 side-effects vs. their 145 diseases and 584 side-effects). (2) We build disease-side-effect profiles to elucidate clinically meaningful relationships between drug side-effects and therapeutic indications, which provides a systematic way to generate drug indication hypotheses and adverse-effect watch lists.

Data Sets

In the experiment, we analyzed the approved drugs from DrugBank 23 , which is a widely used public database of drug information. From DrugBank, we collected 1447 FDA-approved small-molecule drugs. Furthermore, we mapped these drugs to several other key drug resources, including PubChem 24 and UMLS 25 , in order to extract other drug-related information. In the end, we extracted chemical structures of 1103 drugs from PubChem. To encode the drug chemical structure, we used a fingerprint corresponding to the 881 chemical substructures defined in PubChem. Each drug was represented by an 881-dimensional binary profile whose elements encode the presence or absence of each PubChem substructure by 1 or 0, respectively. A description of the 881 chemical substructures can be found on the PubChem website. There are 132,092 associations between drugs and chemical substructures in the dataset, and each drug has 119.8 substructures on average.

From DrugBank, we also obtained target information for each drug. To facilitate collecting target protein information, we mapped target proteins to the UniProt Knowledgebase 26 , a central knowledgebase of comprehensive protein information. In the end, we extracted 3,152 relationships between 1007 drugs and 775 proteins, and each drug has 3.1 protein targets on average. Each drug was represented by a 775-dimensional binary profile whose elements encode the presence or absence of each target protein by 1 or 0, respectively.
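The binary profile encoding described above can be sketched in a few lines of Python. This is only an illustration: the drug and protein identifiers are hypothetical placeholders rather than DrugBank or UniProt entries, and the helper name `build_binary_profiles` is an assumption introduced here.

```python
def build_binary_profiles(pairs):
    """Map (drug, feature) pairs to per-drug 0/1 vectors over a sorted vocabulary."""
    vocabulary = sorted({feature for _, feature in pairs})
    present = {}
    for drug, feature in pairs:
        present.setdefault(drug, set()).add(feature)
    profiles = {
        drug: [1 if feature in feats else 0 for feature in vocabulary]
        for drug, feats in present.items()
    }
    return vocabulary, profiles


# Hypothetical drug-target relationships (placeholders, not real data).
pairs = [
    ("drug_1", "protein_A"), ("drug_1", "protein_C"),
    ("drug_2", "protein_B"),
    ("drug_3", "protein_A"), ("drug_3", "protein_B"), ("drug_3", "protein_C"),
]
vocabulary, profiles = build_binary_profiles(pairs)

# Average number of targets per drug = number of relationships / number of drugs.
mean_targets = sum(sum(p) for p in profiles.values()) / len(profiles)
```

Applied to the counts reported above, the same average is 3,152 / 1007 ≈ 3.1 targets per drug.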

Side-effect keywords were obtained from the SIDER database 27 , which contains information about marketed medicines and their recorded adverse drug reactions. This led to a dataset containing 888 small-molecule drugs and 1385 side-effect keywords. Each drug was represented by a 1385-dimensional binary profile whose elements encode the presence or absence of each of the side-effect keywords by 1 or 0, respectively. We plotted the statistics of the side-effect data in Figure 1 . 69% of drugs have between 10 and 100 different side-effects; 22% of drugs have more than 100 side-effects; only 9% of drugs have fewer than 10 side-effects ( Figure 1(a) ). Also, 56% of all side-effects occur for 100 drugs ( Figure 1(b) ). Altogether, there are 61,102 associations between drugs and side-effect terms in the dataset, and each drug has 68.8 side-effects on average.

Figure 1.

Statistics of the side-effect dataset. (a) The number of side-effects per drug. (b) The number of drugs per side-effect.

Drugs’ known uses were obtained by extracting treatment relationships between drugs and diseases from the National Drug File - Reference Terminology (NDF-RT), which is part of the UMLS 25 . The same drug-disease treatment relationship list was used by Li et al 12 as the gold standard set for a drug repositioning task. After normalizing the various drug names in NDF-RT to their active ingredients, we were able to extract therapeutic indications for 799 of the 1103 drugs, yielding 3250 treatment relationships between 799 drugs and 719 diseases. Thus each drug was represented by a 719-dimensional binary profile whose elements encode the presence or absence of each of the therapeutic indications by 1 or 0, respectively. We plotted the statistics of the therapeutic-indication data in Figure 2 . Most drugs (75%) treat 10 diseases ( Figure 2(a) ). Although the disease Hypertension has 78 related drugs, 80% of diseases have only 10 drugs ( Figure 2(b) ).

Figure 2.

Statistics of the therapeutic-indication dataset. (a) The number of therapeutic indications per drug. (b) The number of drugs per therapeutic indication.

Methodology

In our study, we modeled both the drug indication prediction task and the drug side-effect prediction task as binary classification problems. For indication prediction, we constructed a classifier for predicting whether a given drug x treats a particular disease or not, and repeated this process for all 719 diseases. For side-effect prediction, we constructed a classifier for predicting whether a given drug x has a particular side-effect or not, and repeated this process for all 1385 side-effects. We tested four classifiers: Support Vector Machine, Random Forest, Naïve Bayes, and Logistic Regression. In the following we report only the results of logistic regression, because it achieved the best empirical results. Our implementation uses Python 2.7, and all four classifiers are available in the Scikit-Learn package 28 (http://scikit-learn.org/stable/). The model parameters were tuned with 10-fold cross validation.
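To make the one-classifier-per-label setup concrete, the following is a minimal sketch that trains an independent logistic regression for each disease on binary drug features. It is not the study's implementation: the study used the Scikit-Learn classifiers, whereas this stand-in uses plain batch gradient descent from the standard library so that it is self-contained. The toy features, labels, learning rate, and epoch count are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this drug: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data (assumed): four drugs with two binary features each, and one
# 0/1 label column per disease. Real profiles have hundreds of features.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = {"disease_A": [1, 1, 0, 0], "disease_B": [0, 1, 1, 0]}

# One independent binary classifier per disease, as described in the text.
models = {label: fit_logistic(X, y) for label, y in Y.items()}
```

The same loop over labels applies to side-effect prediction, with one classifier per side-effect keyword instead of per disease.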

Experiment Settings

We tested all data sources and their possible combinations to predict drug side-effect profiles and drug therapeutic-indication profiles. Figure 3 provides a graphical illustration of which information was used in our prediction tasks.

Figure 3.

Illustration of the proposed method.

For the therapeutic-indication prediction task, we used the following sources: (1) chemical (881 substructure features); (2) biological (775 protein target features); (3) side-effect (1385 side-effect keywords); (4) chemical+biological (881+775 features); (5) chemical+side-effect (881+1385 features); (6) biological+side-effect (775+1385 features); (7) chemical+biological+side-effect (881+775+1385 features).

For the side-effect prediction task, we used the following sources: (1) chemical (881 substructure features); (2) biological (775 protein target features); (3) indication (719 disease-indication features); (4) chemical+biological (881+775 features); (5) chemical+indication (881+719 features); (6) biological+indication (775+719 features); (7) chemical+biological+indication (881+775+719 features).
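The seven source combinations listed above amount to concatenating the per-source binary profiles of each drug. A minimal sketch for the indication-prediction task follows; the feature counts come from the text, while the function names and the all-zero placeholder drug are illustrative assumptions.

```python
from itertools import combinations

# Feature counts reported in the text for the indication-prediction task.
SOURCE_DIMS = {"chemical": 881, "biological": 775, "side-effect": 1385}


def source_combinations(names):
    """All non-empty subsets of the sources, smallest subsets first."""
    names = list(names)
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]


def concat_profile(drug_profiles, combo):
    """Concatenate one drug's per-source binary profiles in the given order."""
    vector = []
    for name in combo:
        vector.extend(drug_profiles[name])
    return vector


combos = source_combinations(SOURCE_DIMS)
combo_dims = {"+".join(c): sum(SOURCE_DIMS[s] for s in c) for c in combos}

# An all-zero placeholder drug, used only to show the resulting vector length.
placeholder = {name: [0] * dim for name, dim in SOURCE_DIMS.items()}
full_vector = concat_profile(placeholder, combos[-1])
```

For the side-effect prediction task, the same sketch applies with the side-effect source replaced by the 719 indication features.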

To evaluate how difficult the problem considered in this study is, we also applied a random assignment procedure: we used the ratio of 0 to 1 labels in the training data to randomly generate a binary label for each test drug. For example, if 90% of the labels in the training data are 0, we assign 0 to 90% of the test examples and 1 to the rest. This method is used as a baseline for both the indication prediction and side-effect prediction tasks.
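One way to read the random assignment baseline is sketched below. It is an interpretation, not the study's code: each test label is drawn independently with the positive rate observed in the training labels, so the share of zeros matches the training ratio only in expectation. The function name and the seed are assumptions.

```python
import random


def random_baseline(train_labels, n_test, seed=0):
    """Draw n_test labels at random, using the training positive rate."""
    rng = random.Random(seed)
    positive_rate = sum(train_labels) / len(train_labels)
    return [1 if rng.random() < positive_rate else 0 for _ in range(n_test)]


# 90% zeros in the training labels, as in the example in the text.
train_labels = [0] * 90 + [1] * 10
predictions = random_baseline(train_labels, n_test=1000)
```

Because such labels are independent of the true labels, the expected sensitivity equals the positive rate and the expected specificity equals one minus the positive rate. This is consistent with the Random rows of Table 2 and Table 3, where sensitivity and specificity sum to approximately 1.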

We used a 10-fold cross validation scheme to evaluate the accuracies of all methods. To avoid easy prediction cases, we held out all the associations involving 10% of the drugs in each fold, rather than holding out 10% of the associations. For both the indication prediction and side-effect prediction tasks, the sample sizes of the output classes are highly imbalanced. Consequently, the accuracies of the prediction results could be overestimated. To avoid this problem, we also incorporated a sample balancing strategy in the 10-fold cross validation scheme, where all drugs were split into 10 equal-sized subsets, and each subset was used in turn as the testing set. To construct the training set at each round of cross validation, we used all the positive drug-indication or drug-side-effect pairs from the remaining nine subsets, and randomly selected twice as many negative pairs as positive pairs from the same nine subsets. This sample balancing strategy was also used in Gottlieb et al 11 . To obtain robust results, we performed 10 independent cross-validation runs, each using a different random partition of the data set into 10 parts; we then computed the mean and the standard deviation of the evaluation scores over the 10 repetitions. To conduct a fair and accurate comparison across different data sources, we only considered the drugs for which all sources are available for each task. The same experimental conditions were maintained by using the same training drugs and test drugs for each fold.
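The drug-level hold-out and the 2:1 negative sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the drug and disease names are placeholders, the helper names are hypothetical, and the fold sizes are only exactly equal when the number of drugs is a multiple of the number of folds.

```python
import random


def drug_level_folds(drugs, n_folds=10, seed=0):
    """Split drugs (not associations) into n_folds near-equal subsets."""
    rng = random.Random(seed)
    shuffled = list(drugs)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def balanced_training_pairs(train_drugs, labels, positives, neg_ratio=2, seed=0):
    """Keep every positive pair and sample neg_ratio times as many negatives."""
    rng = random.Random(seed)
    pos = [(d, l) for d in train_drugs for l in labels if (d, l) in positives]
    neg = [(d, l) for d in train_drugs for l in labels if (d, l) not in positives]
    return pos, rng.sample(neg, min(len(neg), neg_ratio * len(pos)))


# Placeholder data: 20 drugs, 3 diseases, 10 known drug-disease pairs.
drugs = [f"drug_{i}" for i in range(20)]
labels = ["disease_A", "disease_B", "disease_C"]
positives = {(f"drug_{i}", "disease_A") for i in range(0, 20, 2)}

folds = drug_level_folds(drugs)
test_drugs = set(folds[0])  # every association of these drugs is held out
train_drugs = [d for d in drugs if d not in test_drugs]
pos_pairs, neg_pairs = balanced_training_pairs(train_drugs, labels, positives)
```

Repeating the split with different seeds gives the independent cross-validation runs over which the mean and standard deviation are computed.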

Performance Measure

We measure the final classification performance using three criteria: sensitivity, specificity, and area under the ROC curve (AUC). To define these criteria, we construct the classification confusion table for binary classification problems as in Table 1 , where the two classes are indicated as positive or negative.

Table 1.

Classification confusion table for binary classification problems

	Actual Positive	Actual Negative
Predicted Positive	True Positive (TP)	False Positive (FP)
Predicted Negative	False Negative (FN)	True Negative (TN)

Sensitivity is the true positive rate, computed as TP/(TP+FN). Specificity is calculated as TN/(TN+FP), which is equal to 1 − false positive rate. The AUC score is the area under the Receiver Operating Characteristic (ROC) curve, which is a graphical plot of the true positive rate vs. the false positive rate. The whole ROC curve can be plotted by varying the threshold on the prediction score, above which the output is predicted as positive and otherwise as negative. The AUC score has been widely used as a classification performance measure in biostatistics and medical informatics.
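The two rate definitions above can be checked on a small example. The labels and predictions below are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Illustrative labels: 3 actual positives and 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # TP/(TP+FN) = 2/3
specificity = tn / (tn + fp)  # TN/(TN+FP) = 4/5
```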

To summarize the global performance across 719 diseases (for drug indication prediction) and 1385 side-effects (for drug side-effect prediction), we merged the prediction scores of all drugs over all diseases (for drug indication prediction) and of all drugs over all side-effects (for side-effect prediction) and drew global ROC curves. In the literature, this strategy is widely used for both drug indication prediction tasks 11 , 12 and side-effect prediction tasks 16 , 22 . The reported sensitivity and specificity were obtained from the operating point of the global ROC curve that gives the best tradeoff between false positives and false negatives.
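The pooling step can be sketched as follows: scores from all per-label classifiers are merged into one list before the AUC is computed. Two choices here are assumptions rather than details given in the text: the AUC is computed with the pairwise (rank) formulation, and the "best tradeoff" operating point is taken to be the threshold maximizing sensitivity + specificity (Youden's index). The scores are made up, and a score at or above the threshold is treated as positive.

```python
def pool(per_label):
    """Merge (scores, truths) from every label into two flat lists."""
    scores, truths = [], []
    for label_scores, label_truths in per_label.values():
        scores.extend(label_scores)
        truths.extend(label_truths)
    return scores, truths


def auc(scores, truths):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, truths) if t == 1]
    neg = [s for s, t in zip(scores, truths) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_operating_point(scores, truths):
    """Threshold maximizing sensitivity + specificity (an assumed criterion)."""
    n_pos = sum(truths)
    n_neg = len(truths) - n_pos
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, t in zip(scores, truths) if t == 1 and s >= threshold)
        fp = sum(1 for s, t in zip(scores, truths) if t == 0 and s >= threshold)
        sens, spec = tp / n_pos, 1 - fp / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (threshold, sens, spec)
    return best


# Made-up prediction scores and true labels for three drugs per disease.
per_label = {
    "disease_A": ([0.9, 0.2, 0.6], [1, 0, 1]),
    "disease_B": ([0.65, 0.5, 0.1], [0, 1, 0]),
}
scores, truths = pool(per_label)
global_auc = auc(scores, truths)
threshold, sens, spec = best_operating_point(scores, truths)
```

Pooling weights every drug-label pair equally, so labels with many associated drugs influence the global curve more than rare ones.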

Results and Analysis

For the therapeutic-indication prediction task, Figure 4 shows the ROC curves for the different information sources, averaged over the 10 cross-validation runs, and Table 2 summarizes the corresponding evaluation results. When the information sources were compared independently, side-effect was the most informative (AUC of 0.8408), chemical structure ranked second (AUC of 0.8148), followed by target protein information (AUC of 0.8011). Combining any two data sources improved the AUC over either source alone, and combining all three data sources gave the highest AUC score (AUC of 0.8640).

Figure 4.

The averaged ROC comparison of therapeutic-indication predictions for various information source combinations using 10-fold cross validation. Information sources are sorted in the figure legend according to their AUC score.

Table 2.

Performance comparison of drug therapeutic-indication prediction with different information sources

Information Source	AUC	Sensitivity	Specificity
Random	0.5000+/−0.0010	0.0072+/−0.0021	0.9929+/−0.0002
Chemical	0.8148+/−0.0019	0.5321+/−0.0046	0.9647+/−0.0004
Protein	0.8011+/−0.0021	0.5387+/−0.0038	0.9841+/−0.0002
Side-effect	0.8408+/−0.0036	0.5575+/−0.0046	0.9737+/−0.0004
Chemical+Protein	0.8295+/−0.0021	0.4014+/−0.0041	0.9921+/−0.0001
Chemical+Side-effect	0.8563+/−0.0022	0.6228+/−0.0071	0.9516+/−0.0006
Protein+Side-effect	0.8515+/−0.0053	0.5625+/−0.0070	0.9793+/−0.0003
Chemical+Protein+Side-effect	0.8640+/−0.0035	0.6195+/−0.0067	0.9650+/−0.0004

For the drug side-effect prediction task, Figure 5 shows the ROC curves for the different information sources, averaged over the 10 cross-validation runs, and Table 3 summarizes the evaluation results. When the information sources were compared independently, therapeutic indication was the most informative (AUC of 0.7058), target protein information was also highly informative (AUC of 0.6993), but chemical structure performed much worse (AUC of 0.6379). This could be partially explained as follows. Both therapeutic indications and side-effects are complex phenomenological observations that are attributed to the interaction of chemical structures (i.e., drugs) with primary or additional targets (off-targets hereafter). Expected activities derived from on-targets result in therapeutic effects. Unexpected (usually unwanted and harmful) activities derived from off-targets lead to side-effects. Minor differences in the chemical structure of a drug may not affect the primary targets; therefore, chemical structure could be very useful for predicting drug indications. However, even minor differences in the chemical structure of a drug may have a dramatic impact on how it interacts with off-targets, and thus could result in significant differences in the side-effect profile of the drug. Therefore, drugs with similar chemical structures may not have similar side-effects, i.e., performance can be poor if we use chemical structure to predict side-effects. While combining therapeutic indication and target protein information results in the highest AUC score (AUC of 0.7103), adding chemical structure to any other information source lowers the AUC relative to that source alone (e.g., 0.6644 for chemical+protein vs. 0.6993 for protein alone).

Figure 5.

The averaged ROC comparison of drug side-effect predictions for various information source combinations using 10-fold cross validation. Information sources are sorted in the figure legend according to their AUC score.

Table 3.

Performance comparison of drug side-effect prediction with different information sources

Information Source	AUC	Sensitivity	Specificity
Random	0.5001+/−0.0004	0.0599+/−0.0007	0.9403+/−0.0004
Chemical	0.6379+/−0.0008	0.2436+/−0.0012	0.9401+/−0.0003
Protein	0.6993+/−0.0014	0.4746+/−0.0010	0.9128+/−0.0006
Indication	0.7058+/−0.0014	0.5207+/−0.0017	0.8995+/−0.0005
Chemical+Protein	0.6644+/−0.0009	0.2843+/−0.0016	0.9468+/−0.0003
Chemical+Indication	0.6690+/−0.0012	0.2881+/−0.0016	0.9494+/−0.0004
Protein+Indication	0.7103+/−0.0011	0.4689+/−0.0018	0.9319+/−0.0002
Chemical+Protein+Indication	0.6837+/−0.0010	0.3035+/−0.0015	0.9542+/−0.0003

From these results we can observe that drug side-effects and therapeutic indications are the most predictive features for each other. This suggests some hidden correlations between them. To explore those correlations, we used Fisher’s exact test 29 , which is a typical approach for measuring the significance of the association between two nominal variables (e.g., each side-effect vs. each disease). We thus built disease-side-effect profiles (the side-effects most likely to be caused by the drugs that treat a specific disease) based on known drug-disease and drug-side-effect relationships. Among all 995,815 (719 diseases by 1385 side-effects) disease-side-effect pairs, 17,386 (1.75%) pairs have p-value

At a p-value cutoff of 0.01, we found 6,706 highly correlated disease-side-effect pairs between 458 diseases and 1077 side-effects. On average, the drugs for each disease are very likely to cause 14.6 side-effects, and each side-effect is highly associated with 6.2 diseases. We plotted the statistics of the highly correlated disease-side-effect pairs (p-value <0.01) in Figure 6 , from which we can observe that 63% of the diseases highly correlate with <10 side-effects; 36% of the diseases highly correlate with 10 to 100 side-effects; only 4 diseases highly correlate with >100 side-effects ( Figure 6(a) ). For example, the disease Obsessive-Compulsive Disorder is highly correlated with 260 side-effect keywords in our analysis, but only 7 drugs treat this disease in our drug-disease dataset. 60% of side-effects are highly associated with 10 diseases ( Figure 6(b) ).
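The significance test for a single disease-side-effect pair can be sketched with the standard library alone. Each pair yields a 2×2 table counting drugs by whether they treat the disease and whether they have the side-effect. The sketch computes a one-sided (over-representation) p-value; whether the study used a one-sided or two-sided test is not stated, so that choice is an assumption, as are the function names and the placeholder drug sets.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: treats disease and has side-effect   b: treats disease, no side-effect
    c: other drug with side-effect          d: other drug, no side-effect
    """
    n_total = a + b + c + d
    row1 = a + b  # drugs treating the disease
    col1 = a + c  # drugs with the side-effect
    # Sum the hypergeometric probabilities of tables at least as extreme as a.
    favourable = sum(
        comb(col1, k) * comb(n_total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return favourable / comb(n_total, row1)


def pair_p_value(all_drugs, treats_disease, has_side_effect):
    a = len(treats_disease & has_side_effect)
    b = len(treats_disease - has_side_effect)
    c = len(has_side_effect - treats_disease)
    d = len(all_drugs) - a - b - c
    return fisher_exact_greater(a, b, c, d)


# Placeholder drug sets (not real data): 8 drugs, 4 treat the disease,
# 4 have the side-effect, and 3 do both.
all_drugs = {f"drug_{i}" for i in range(8)}
treats_disease = {"drug_0", "drug_1", "drug_2", "drug_3"}
has_side_effect = {"drug_0", "drug_1", "drug_2", "drug_4"}

p_value = pair_p_value(all_drugs, treats_disease, has_side_effect)  # 17/70
```

Running this test over every disease and side-effect combination, then keeping pairs below the chosen cutoff, gives a disease-side-effect profile of the kind described above.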

Figure 6.

Table 4(a) shows the 10 side-effects most closely correlated with the disease Hypertension. Some of these side-effects are physiologically linked to hypertension, and the mechanism of action (MOA) can be explained. For example, some hypertension drugs may cause a sudden drop in blood pressure when a person stands up, producing the side-effect postural hypotension. Some hypertension drugs (e.g., β-blockers) hit α-adrenergic receptor protein targets in penile tissue, which causes the side-effect impotence. The decreased blood pressure caused by some hypertension drugs (e.g., β-blockers) also causes the side-effects syncope, dizziness, vertigo, and weakness. The side-effect pemphigus is related to ACE inhibitors, which are also one kind of hypertension drug. Some hypertension treatments (e.g., diuretics) cause the body to lose salt and water, potentially causing the side-effects gout and hyperuricemia. Similarly, Table 4(b) shows the 10 side-effects most closely correlated with the disease Pain. Nonsteroidal anti-inflammatory pain medicines (e.g., Advil and Motrin) increase the risk of heart attack and stroke, and thus cause tachycardia, heart block, and arrhythmia as side-effects. Low doses of tricyclic or tetracyclic antidepressant drugs increase the level of certain brain chemicals, which affects how the brain perceives pain, but they cause the side-effects urinary retention, blurred vision, and confusion. Other types of antidepressants (e.g., SSRIs and SNRIs) also cause somnolence.

Table 4.

The 10 most correlated side-effects for the diseases Hypertension and Pain

(a)
Disease	Side-effects	P-value
Hypertension	postural hypotension	3.66E-15
	impotence	7.21E-12
	claudication	2.19E-09
	syncope	1.11E-07
	hyperuricemia	2.92E-07
	vertigo	3.33E-07
	dizziness	8.54E-07
	gout	1.23E-06
	pemphigus	1.29E-06
	weakness	8.79E-06

(b)
Disease	Side-effects	P-value
Pain	tachycardia	1.65E-07
	heart block	2.14E-07
	apnea	3.10E-07
	urinary retention	1.18E-05
	hallucinations	2.32E-05
	tinnitus	2.53E-05
	somnolence	5.81E-05
	blurred vision	1.31E-04
	arrhythmia	2.42E-04
	confusion	2.55E-04

Table 5(a) shows the 10 therapeutic indications (diseases) with the strongest correlation to the side-effect weight loss. Many diseases in the list are mood disorders (e.g., bipolar disorder, depressive disorder, panic disorder). The most widely prescribed mood control drugs come from a class of medications known as selective serotonin reuptake inhibitors (SSRIs, such as Prozac and Zoloft). SSRIs act on serotonin, a chemical in the brain that helps regulate mood. However, serotonin also plays a role in digestion, sleep, and other bodily functions. Thus mood control drugs result in dizziness, nausea, and loss of appetite, and eventually cause weight loss. Similarly, the drugs for Alzheimer disease (e.g., Aricept, Cognex, Exelon) cause vomiting, nausea, and loss of appetite, and thus result in weight loss. Some anti-diabetic medications (e.g., α-glucosidase inhibitors) lower the amount of sugar metabolized, which causes weight loss. Table 5(b) shows the 10 therapeutic indications (diseases) with the strongest correlation to impotence as a side-effect. Drugs for cardiovascular diseases (e.g., hypertension, heart failure) are used to lower the pressure inside blood vessels, so the heart does not have to work as hard as usual to pump blood throughout the body. But the decreased blood flow can reduce desire and interfere with erections and ejaculation, and thus cause impotence. Some cardiovascular drugs limit the availability of cholesterol and likely interfere with the production of testosterone, estrogen, and other sex hormones, which also causes impotence. Drugs for mood disorders (e.g., depressive disorder, bipolar disorder) block the action of brain chemicals that relay signals between nerve cells, and thus decrease sex drive, causing impotence as a side-effect.

Table 5.

The 10 most correlated indicated diseases for the side-effects weight loss and impotence

(a)
Side-effect	Disease	P-value
weight loss	Bipolar Disorder	1.10E-06
	Depressive Disorder	1.63E-04
	Alzheimer Disease	3.81E-04
	Epilepsies, Partial	3.81E-04
	Panic Disorder	3.68E-03
	Diabetes Mellitus, Type 2	5.74E-03
	Asthma	6.99E-03
	Pancreatic Neoplasms	7.80E-03
	Autistic Disorder	9.06E-03
	Lymphoma	9.06E-03

(b)
Side-effect	Disease	P-value
impotence	Hypertension	7.21E-12
	Depressive Disorder	9.20E-12
	Bipolar Disorder	1.30E-05
	Schizophrenia	2.06E-05
	Heart Failure	4.11E-05
	Myocardial Infarction	5.69E-05
	Urinary Tract Infections	5.86E-05
	Diabetic Nephropathies	4.56E-04
	Asthma	1.12E-03
	Angina Pectoris	1.96E-03

Both therapeutic indications and clinical side-effects are human phenotypic data, which obviates translation issues. Therefore, the strongly correlated disease-side-effect pairs are beneficial for drug discovery: (1) We can use side-effect information to repurpose existing treatments. For example, based on the information in Table 4(a) , we may consider drugs with the side-effect postural hypotension as candidates for hypertension. Also, based on the information in Table 5(a) , we may consider and evaluate some mood-disorder drugs for use in weight loss (i.e., as weight-loss pills). (2) If a new treatment is designed for a specific disease, all health care stakeholders (e.g., regulators, providers, patients, and pharmaceutical companies) should pay more attention to adverse reactions in the over-represented side-effect list of the disease (e.g., Table 4(a) for hypertension and Table 4(b) for pain), and control the formulation and dosing of drugs in clinical trials to prevent serious safety issues.

Conclusion

In this study, we performed a systematic exploration of multiple sources of information (and their combinations) for the therapeutic-indication prediction and drug side-effect prediction tasks. Using a cross-validation scheme, we found that the side-effect-based approach performs better for the prediction of therapeutic indications, and the indication-based approach performs better for the prediction of drug side-effects, compared to chemical-structure and protein-target based methods. Furthermore, we built disease-side-effect profiles to discover statistically highly correlated relationships between drug side-effects and therapeutic indications. These relationships can be leveraged for real-world drug discovery.

References

1. Adams CP, Brantner VV. Estimating the cost of new drug development: is it really 802 million dollars? Health Aff (Millwood) 2006; 25 (2):420–428. [PubMed] [Google Scholar]

2. Hopkins AL. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol. 2008; 4 :682–690. [PubMed] [Google Scholar]

3. Chiang AP, Butte AJ. Systematic evaluation of drug-disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009; 86 (5):507–510. [PMC free article] [PubMed] [Google Scholar]

4. Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KL, Edwards DD, Shoichet BK, Roth BL. Predicting new molecular targets for known drugs. Nature. 2009; 462 (7270):175–181. [PMC free article] [PubMed] [Google Scholar]

5. Li J, Zhu X, Chen JY. Building Disease-Specific Drug-Protein Connectivity Maps from Molecular Interaction Networks and PubMed Abstracts. PLoS Comput Biol. 2009; 5 (7):e1000450. [PMC free article] [PubMed] [Google Scholar]

6. Kotelnikova E, Yuryev A, Mazo I, Daraselia N. Computational approaches for drug repositioning and combination therapy design. J Bioinform Comput Biol. 2010; 8 (3):593–606. [PubMed] [Google Scholar]

7. Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science. 2008; 321 (5886):263–266. [PubMed] [Google Scholar]

8. Yang L, Agarwal P. Systematic Drug Repositioning Based on Clinical Side-Effects. PLoS ONE. 2011; 6 (12):e28025. [PMC free article] [PubMed] [Google Scholar]

9. Hu G, Agarwal P. Human Disease-Drug Network Based on Genomic Expression Profiles. PLoS ONE. 2009; 4 (8):e6536. [PMC free article] [PubMed] [Google Scholar]

10. Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, Sage J, Butte AJ. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med. 2011; 3 (96):96ra77. [PMC free article] [PubMed] [Google Scholar]

11. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011; 7 :496. [PMC free article] [PubMed] [Google Scholar]

12. Li J, Lu Z. A New Method for Computational Drug Repositioning Using Drug Pairwise Similarity. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine; 2012.

13. Zhang P, Agarwal P, Obradovic Z. Computational Drug Repositioning by Ranking and Integrating Multiple Data Sources. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases; 2013.

14. Giacomini KM, Krauss RM, Roden DM, Eichelbaum M, Hayden MR, Nakamura Y. When good drugs go bad. Nature. 2007;446(7139):975–977.

15. Atias N, Sharan R. An algorithmic framework for predicting side effects of drugs. J Comput Biol. 2011;18(3):207–218.

16. Pauwels E, Stoven V, Yamanishi Y. Predicting drug side-effect profiles: a chemical fragment-based approach. BMC Bioinformatics. 2011;18(12):169.

17. Scheiber J, Jenkins J, Sukuru S, Bender A, Mikhailov D, Milik M, Azzaoui K, Whitebread S, Hamon J, Urban L, Glick M, Davies J. Mapping adverse drug reaction in chemical space. J Med Chem. 2009;52(9):3103–3107.

18. Scheiber J, Chen B, Milik M, Sukuru SC, Bender A, Mikhailov D, Whitebread S, Hamon J, Azzaoui K, Urban L, Glick M, Davies JW, Jenkins JL. Gaining insight into off-target mediated effects of drug candidates with a comprehensive systems chemical biology analysis. J Chem Inf Model. 2009;49(2):308–317.

19. Xie L, Li J, Xie L, Bourne PE. Drug discovery using chemical systems biology: identification of the protein-ligand binding network to explain the side effects of CETP inhibitors. PLoS Comput Biol. 2009;5(5):e1000387.

20. Yamanishi Y, Pauwels E, Kotera M. Drug side-effect prediction based on the integration of chemical and biological spaces. J Chem Inf Model. 2012;52(12):3284–3292.

21. Huang LC, Wu X, Chen JY. Predicting adverse drug reaction profiles by integrating protein interaction networks with drug structures. Proteomics. 2013;13(2):313–324.

22. Liu M, Wu Y, Chen Y, Sun J, Zhao Z, Chen XW, Matheny ME, Xu H. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. J Am Med Inform Assoc. 2012;19(1e):e28–e35.

23. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(Database issue):D901–D906.

24. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009;37(Web Server issue):W623–W633.

25. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue):D267–D270.

26. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh LS. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004;32(Database issue):D115–D119.

27. Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6:343.

28. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: machine learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.

29. Upton G. Fisher’s exact test. Journal of the Royal Statistical Society. Series A (Statistics in Society). 1992;155(3):395–402.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association