About

Comprehensive Resource of Biomedical Relations with Deep Learning and Network Representations (CROssBAR) is a joint project between the Middle East Technical University (METU) and the European Bioinformatics Institute (EMBL-EBI). The project is funded by the Scientific and Technological Research Council of Turkey (TUBITAK) and the British Council.

Within the scope of the CROssBAR project, we designed, developed and implemented:

Data sources:

CROssBAR noSQL database

Deep learning-based drug-target interaction prediction systems:

DEEPScreen and
MDeePred

Network based representation of the integrated biomedical information:

CROssBAR knowledge graphs
CROssBAR Web-service

Data visualization tools:

SmartBioGraph (Graph database and visualization in neo4j) and
iBioProVis

Wet-lab validation experiments:

In vitro cell-based assays on cancer cell-lines

DEEPScreen

DEEPScreen is a high-performance drug–target interaction (DTI) predictor that uses convolutional neural networks and 2-D structural representations of compounds to predict their activity against intended target proteins. The DEEPScreen system is composed of 704 target protein-specific prediction models, each independently trained using experimental bioactivity measurements against many drug candidate small molecules, and optimized according to the binding properties of its target protein.

DEEPScreen can be used in drug discovery and repurposing for in silico screening of the chemogenomic space, to provide novel DTIs that can be pursued experimentally. The source code, trained "ready-to-use" prediction models, all datasets and the results of this study are available at https://github.com/cansyl/DEEPscreen. The DEEPScreen article is available at https://doi.org/10.1039/C9SC03414E.

CROssBAR knowledge graphs

In CROssBAR knowledge graphs, different biological components, such as

drugs/compounds,
genes/proteins,
pathways,
phenotypes and
diseases

are represented as nodes, and the known and predicted pairwise relationships are annotated and displayed as labeled edges. The knowledge graphs are constructed on the fly, each time the CROssBAR database is queried by the user. To convert the full output of user queries, which are initially extremely large biological networks, into biologically meaningful and interpretable representations without losing primary relationships, we applied intensive node enrichment operations. The knowledge graphs are displayed to the user as heterogeneous biological networks and their purpose is to aid biomedical research, especially in the fields of drug discovery and repositioning, by providing a concise piece of relevant biological information to the user in real time.
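The node-and-labeled-edge structure described above can be illustrated with a small sketch. This is not the CROssBAR implementation: the class names, the edge labels and the placeholder identifiers (`drug_A`, `protein_X`, `disease_D`) are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A typed node, e.g. node_type "drug", "protein", "pathway", "phenotype" or "disease"."""
    node_type: str
    identifier: str


@dataclass
class KnowledgeGraph:
    """Undirected heterogeneous graph with labeled edges (hypothetical structure)."""
    # Maps each node to a list of (edge label, neighbour, is_predicted) tuples.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, source, label, target, predicted=False):
        # Store the edge in both directions so either endpoint can be queried.
        self.edges[source].append((label, target, predicted))
        self.edges[target].append((label, source, predicted))

    def neighbours(self, node, node_type=None):
        # Optionally keep only neighbours of one node type.
        return [
            (label, other, predicted)
            for label, other, predicted in self.edges[node]
            if node_type is None or other.node_type == node_type
        ]


# Placeholder data: identifiers and edge labels are illustrative, not real records.
graph = KnowledgeGraph()
drug = Node("drug", "drug_A")
protein = Node("protein", "protein_X")
disease = Node("disease", "disease_D")
graph.add_edge(drug, "interacts_with", protein, predicted=True)   # a predicted relation
graph.add_edge(protein, "associated_with", disease)               # a known relation
```

Keeping a `predicted` flag on each edge is one simple way to separate known from predicted relationships when the graph is displayed.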

SmartBioGraph

In the SmartBioGraph project, the CROssBAR noSQL database has been fully reconstructed using the Neo4J graph database management system (DBMS), to take advantage of modern graph database features such as faster queries on large-scale data. Through the web application, users can fetch, visualize and filter data without using a query language. The search entries are converted into the Cypher query language and sent to the Neo4J DBMS in real time, and the query result is shown interactively as Cytoscape networks. The SmartBioGraph application is available at http://smartbiograph.ceng.metu.edu.tr/.
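As a rough illustration of how a search entry might be turned into a Cypher query, consider the sketch below. The node labels, the `name` property and the function `build_cypher` are hypothetical; the actual SmartBioGraph schema and conversion logic are not described here. The sketch passes user-supplied values as query parameters and checks labels against a fixed list, because Cypher does not allow labels to be parameterized.

```python
# Hypothetical node labels; the real SmartBioGraph schema may differ.
ALLOWED_LABELS = {"Drug", "Protein", "Pathway", "Phenotype", "Disease"}


def build_cypher(node_label, search_term, neighbour_label=None, limit=25):
    """Return a (query, parameters) pair for a simple name search.

    Labels are validated against a whitelist before being placed in the
    query text; the search term and limit travel as parameters.
    """
    for label in (node_label, neighbour_label):
        if label is not None and label not in ALLOWED_LABELS:
            raise ValueError(f"unknown node label: {label!r}")

    if neighbour_label is None:
        query = f"MATCH (n:{node_label}) WHERE n.name = $term RETURN n LIMIT $limit"
    else:
        query = (
            f"MATCH (n:{node_label})-[r]-(m:{neighbour_label}) "
            "WHERE n.name = $term RETURN n, r, m LIMIT $limit"
        )
    return query, {"term": search_term, "limit": limit}


query, params = build_cypher("Drug", "cladribine", neighbour_label="Protein", limit=10)
print(query)
print(params)
```

The returned pair could then be handed to a Neo4J client, which would substitute the parameters on the server side.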

Wet-lab Experiments

Within the CROssBAR project, wet-lab molecular biology experiments centered on the mechanisms and potential treatments of hepatocellular carcinoma (HCC) were designed and carried out for selected sets of computational predictions, with the aim of validating the information included in the CROssBAR resource. The DEEPScreen large-scale prediction run indicated that JAK proteins may be novel targets of Cladribine. Cytotoxicity, cell death and cell cycle analyses, flow cytometry analysis of STAT-3 phosphorylation, and protein immunoblotting were used to test whether Cladribine affects JAK/STAT signaling in HCC cell lines (Huh7, Mahlavu and HepG2). For more information, please refer to our article: https://doi.org/10.1039/C9SC03414E.

Drug-target Interaction Prediction Challenge

In order to measure the performance of the DTI prediction approaches developed in this project and to compare them with other approaches, we participated in the IDG-DREAM Drug-Kinase Binding Prediction Challenge. As the CROssBAR project team, we developed two chemogenomic-based receptor-ligand binding affinity prediction methods, one using deep learning (pairwise input deep neural networks) and one using conventional machine learning (random forest). Our models ranked among the top performers (4th best team, with RMSE = 1.066). These were earlier versions of our main DTI prediction tools. The high performance of our models in this challenge demonstrates the usefulness of the chemogenomic approach for the computational prediction of DTIs. More information about our models is available at https://www.synapse.org/#!Synapse:syn18636383/wiki/590959.
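For readers unfamiliar with the reported metric, the root mean squared error (RMSE) between measured and predicted binding affinities can be computed as in the minimal sketch below. The function name and the toy values are assumptions for illustration; the challenge's own scoring pipeline and data are not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy affinity values (illustrative only, not challenge data).
measured = [5.0, 6.5, 7.2, 8.1]
predicted = [5.4, 6.0, 7.9, 7.5]
print(round(rmse(measured, predicted), 3))
```

Lower values are better; an RMSE of 0 would mean every predicted affinity matches its measured value exactly.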

Project Publications

Doğan, T., Atas, H., Joshi, V., Atakan, A., Rifaioglu, A.S., Nalbat, E., Nightingale, A., Saidi, R., Volynkin, V., Zellner, H., Cetin-Atalay, R., Martin, M.J., & Atalay, V. (2020). CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations. bioRxiv, 2020.09.14.296889 (https://doi.org/10.1101/2020.09.14.296889).

Rifaioglu, A.S., Nalbat, E., Atalay, M.V., Martin, M.J., Cetin-Atalay, R. & Doğan, T. (2020). DEEPScreen: High Performance Drug-Target Interaction Prediction with Convolutional Neural Networks Using 2-D Structural Compound Representations. Chemical Science, 11(9), 2531-2557 (https://doi.org/10.1039/C9SC03414E).

Rifaioglu, A.S., Atas, H., Martin, M.J., Cetin-Atalay, R., Atalay, M.V. & Doğan, T. (2019). Recent Applications of Machine Intelligence including Deep Learning on Virtual Screening: Methods, Tools and Databases. Briefings in Bioinformatics, 20(5), 1878-1912 (https://doi.org/10.1093/bioinformatics/btaa496).

Rifaioglu, A.S., Cetin-Atalay, R., Kahraman, D.C., Doğan, T., Martin, M.J., & Atalay, M.V. MDeePred: Multi-Channel Deep Chemogenomic Prediction of Binding Affinity in Drug Discovery [in review at Bioinformatics journal].

(Talk) Donmez, A., Rifaioglu, A.S., Acar, A., Doğan, T., Cetin-Atalay, R. & Atalay, M.V. iBioProVis: Interactive Visualization and Analysis of Compound Bioactivity Space. Biological Data Visualization (BioVis) COSI, ISMB/ECCB 2019: 27th Annual International Conference on Intelligent Systems for Molecular Biology and 18th European Conference on Computational Biology, July 21-25, 2019, Basel, Switzerland.

(Talk) Joshi, V., Rifaioglu, A.S., Doğan, T., Nightingale, A., Atalay, M.V., Cetin-Atalay, R., Atas, H., Sinoplu, E., Volynkin, V., Zellner, H., Saidi, R. & Martin, M.J. CROssBAR: Comprehensive Resource of Biomedical Relations with Network Representations and Deep Learning. Bio-Ontologies COSI, ISMB/ECCB 2019: 27th Annual International Conference on Intelligent Systems for Molecular Biology and 18th European Conference on Computational Biology, July 21-25, 2019, Basel, Switzerland.

(Poster) Atas, H., Rifaioglu, A.S., Doğan, T., Martin, M.J., Cetin-Atalay, R. & Atalay, M.V. Deep and Shallow Chemogenomic Modelling for Compound-Target Binding Affinity Prediction Using Pairwise Input Neural Networks & Random Forests. ISMB/ECCB 2019: 27th Annual International Conference on Intelligent Systems for Molecular Biology and 18th European Conference on Computational Biology, July 21-25, 2019, Basel, Switzerland.

(Poster) Atas, H., Rifaioglu, A.S., Cetin-Atalay, R., Atalay, M.V., Martin, M.J., & Doğan, T. Large-Scale Benchmarking of Protein Descriptors for Protein Ligand Prediction in Target-Based Modelling and Proteochemometrics. ISMB/ECCB 2019: 27th Annual International Conference on Intelligent Systems for Molecular Biology and 18th European Conference on Computational Biology, July 21-25, 2019, Basel, Switzerland.

(Talk) Rifaioglu, A.S., Atalay, M.V., Martin, M.J., Cetin-Atalay, R. & Doğan, T. DEEPScreen: Drug-Target Interaction Prediction with Deep Convolutional Neural Networks Using Compound Images. EMBL-EBI Industry Workshop, Machine Learning in Drug Discovery and Precision Medicine, 18-19 September, 2018, Cambridge, UK.

(Talk) Rifaioglu, A.S., Atalay, M.V., Martin, M.J., Cetin-Atalay, R. & Doğan, T. DEEPScreen: Drug-Target Interaction Prediction with Deep Convolutional Neural Networks Using Compound Images. Summer School on Machine Learning in Drug Design, August 20-22, 2018, Leuven, Belgium.

(Talk) Rifaioglu, A.S., Atalay, M.V., Martin, M.J., Cetin-Atalay, R. & Doğan, T. DEEPScreen: Drug-Target Interaction Prediction with Deep Convolutional Neural Networks Using Compound Images. ISMB 2018, International Society for Computational Biology, Machine Learning for System Biology COSI, Oral Presentation, July 6-10, 2018, Chicago, United States.

Other Work

UniGOPred - http://cansyl.metu.edu.tr/unigopred.html

Abstract: Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present UniGOPred, an automated prediction system for GO annotations, and a database of GO term predictions for the proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum of the Gene Ontology system and can predict both generic and specific GO terms. UniGOPred was run on the CAFA2 challenge target protein sequences and ranked within the top 10 best performing methods for the molecular function category. In addition, the performance of UniGOPred is higher than that of the baseline BLAST classifier in all categories of GO. UniGOPred predictions are compared with UniProtKB/TrEMBL database annotations as well. Furthermore, the proposed tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess, is discussed. UniGOPred annotations were also validated by case studies, experimentally on PTEN protein variants and with the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.

Publication: Rifaioglu, A.S., Doğan, T., Saraç, Ö.S., Ersahin, T., Saidi, R., Atalay, M.V., Martin, M.J., & Cetin-Atalay, R. Large-scale automated multi-functional annotation of protein sequences and an experimental case study validation on PTEN transcript variants. Proteins. 2017;00:1–17. https://doi.org/10.1002/prot.25416

ECPred - http://cansyl.metu.edu.tr/ecpred.html

Abstract: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits their applicability to the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. In ECPred, each EC number constitutes an individual class and therefore has an independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclass classes, 163 sub-subclass classes and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study.
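The idea of a top-down, hierarchical prediction over the EC tree, with one independent model per EC number, can be sketched as follows. This is not ECPred's actual algorithm: the function `predict_ec`, the 0.5 threshold, the "pick the best-scoring child" rule and the stub scorers are assumptions for illustration, standing in for the ensemble classifiers mentioned in the abstract.

```python
def parent(ec_number):
    """Parent of an EC number in the tree: "3.4.21" -> "3.4", "3" -> "" (the root)."""
    return ec_number.rpartition(".")[0]


def predict_ec(features, is_enzyme, class_scorers, threshold=0.5):
    """Walk the EC tree top-down and return the most specific supported EC number.

    is_enzyme:     callable returning an enzyme-vs-non-enzyme score in [0, 1]
    class_scorers: dict mapping EC numbers ("3", "3.4", ...) to scoring callables,
                   one independent model per EC number (hypothetical stubs here)
    """
    if is_enzyme(features) < threshold:
        return "non-enzyme"

    current = ""  # start at the root of the EC tree
    for _ in range(4):  # EC numbers have four levels
        children = [ec for ec in class_scorers if parent(ec) == current]
        scored = [(class_scorers[ec](features), ec) for ec in children]
        passing = [pair for pair in scored if pair[0] >= threshold]
        if not passing:
            break  # no child is confident enough; stop at the current level
        current = max(passing)[1]  # descend into the best-scoring child
    return current or "enzyme (no class assigned)"


# Stub scorers returning fixed scores, for demonstration only.
scorers = {
    "1": lambda f: 0.2,
    "3": lambda f: 0.9,
    "3.4": lambda f: 0.8,
    "3.4.21": lambda f: 0.3,
}
print(predict_ec({}, lambda f: 0.9, scorers))
```

With these stub scores the walk stops at "3.4", because the only sub-subclass model falls below the threshold. The class counts in the abstract are consistent with such a tree: 6 + 55 + 163 + 634 = 858.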

Publication: Dalkiran, A., Rifaioglu, A.S., Martin, M.J., Cetin-Atalay, R., Atalay, V. & Doğan, T. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinformatics, 2018, 19:334. https://doi.org/10.1186/s12859-018-2368-y

DEEPred - https://github.com/cansyl/DEEPred

Abstract: Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using CAFA2 and CAFA3 challenge datasets, in comparison with the state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the ‘biofilm formation process’ in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.

Publication: Rifaioglu, A.S., Doğan, T., Martin, M.J., Cetin-Atalay, R., & Atalay, V. (2019). DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Scientific Reports, 9(1), 1-16. https://doi.org/10.1186/s12859-018-2368-y

People

METU Team

Mehmet Volkan Atalay

Rengul Cetin-Atalay

Tunca Dogan

Ahmet Sureyya Rifaioglu

Heval Atas

Esra Nalbat

Alperen Dalkıran

Ahmet Atakan

Gökhan Özsarı

Ataberk Dönmez

Mehmet Dinç

Aker Yılmaz

Dara Vefa

Ekin Tire

Yunus Emre Uzun

EMBL-EBI Team

Maria Jesus Martin

Andrew Nightingale

Vishal Joshi

Rabie Saidi

Hermann Zellner

Vladimir Volynkin

External Advisors

Nurcan Tuncbag

Erden Banoglu

Tugba Suzek

Aybar Acar