Deep Biology Program

About Deep Biology Program

Cere Labs is pleased to launch the Deep Biology program under the umbrella of the CoE with Patkar-Varde College, Goregaon. The program brings together multiple departments of the college to collaborate with Cere Labs. The objective is to apply Deep Learning and Machine Learning to drug discovery and personalized oncology.

The Deep Biology program is organized in four phases:

Phase I - April ’17 to May ’17 - Decide Areas

In the first phase, the following two areas were chosen: drug discovery and personalized oncology.

Drug discovery
Drug design is an expensive process. Bringing a new drug to market takes 10 to 15 years and, by commonly cited estimates, costs more than $2.5 billion. Applying Machine Learning to drug discovery can reduce both the time and the cost of discovering a new drug.

Personalized oncology
Personalized oncology is the practice of tailoring treatment for a cancer patient to that person’s genetic makeup. Machine Learning techniques accelerate the process of finding an accurate treatment.

Phase II - May ’17 to June ’17 - Training & Assignments
Students from Bioinformatics and Computer Science attended a seven-day workshop on Bioinformatics and Machine Learning. The workshop helped them begin their research in drug discovery and personalized oncology.

Phase III - June ’17 to September ’17 - Literature survey and project topic selection

The following two projects were finalized:

Project 1: Design a chemical entity suitable for inhibiting HIV-1 protease by combining machine learning techniques with structure-based drug design.

Description: Understand the pathway of HIV, identify an important drug target (HIV-1 protease), and validate the active site in the protein. Parameters of approved drugs are retrieved from DrugBank or PubChem. Analogs (structurally similar compounds) are then created, and their activity is checked using in silico tools. The data for the approved drugs and their analogs are combined into a dataset suitable for supervised machine learning, which generates a model or equation. A parent molecule is then selected from the collected data, and lead optimization is performed to derive a new molecule. The new molecule can be tested against the equation generated by machine learning to check whether it is active or inactive against HIV-1 protease.

Expected Outcome: Determine the parameters best suited to a chemical entity for the selected protein target, and model the structure of that chemical entity for further analysis.
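The supervised learning step in Project 1 can be illustrated with a minimal sketch. It is not the project's actual pipeline: the descriptor columns (molecular weight, logP, hydrogen-bond donors), every numeric value, and the choice of logistic regression are assumptions made purely for illustration. The fitted weights play the role of the "equation" that scores a new molecule as active or inactive.

```python
import math

# Made-up toy rows, NOT real DrugBank/PubChem data:
# (molecular weight, logP, H-bond donors) -> 1 = active, 0 = inactive.
TRAIN = [
    ((620.0, 4.5, 4.0), 1), ((580.0, 4.0, 3.0), 1),
    ((650.0, 5.0, 5.0), 1), ((600.0, 4.2, 4.0), 1),
    ((250.0, 1.0, 1.0), 0), ((300.0, 1.5, 2.0), 0),
    ((200.0, 0.5, 0.0), 0), ((280.0, 1.2, 1.0), 0),
]


def column_stats(rows):
    """Return per-column means and standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Standardize one descriptor row so all features share a scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, labels, lr=0.1, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_active(row, means, stds, w, b):
    """Probability that a molecule with these descriptors is active."""
    x = scale(row, means, stds)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


means, stds = column_stats([x for x, _ in TRAIN])
weights, bias = fit([scale(x, means, stds) for x, _ in TRAIN],
                    [y for _, y in TRAIN])

# Hypothetical optimized molecule to be checked against the fitted equation.
candidate = (610.0, 4.3, 4.0)
print("P(active) =", round(predict_active(candidate, means, stds, weights, bias), 3))
```

In practice the descriptor set would come from the retrieved drug parameters, and the model would be validated on held-out molecules before being used to screen new ones.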

Project 2: Identify a drug candidate for multidrug-resistant tuberculosis using drug repositioning and machine learning.

Description: Machine learning is used to find patterns in gene expression data retrieved from the GEO database, which helps identify genes that are differentially expressed between healthy and diseased samples. Drugs are linked to these gene expression profiles to compute an enrichment score for each drug. A score above 30% indicates a promising drug suitable for further optimization and testing.

Expected Outcome: Identify a drug candidate from previously approved drugs and optimize it to shorten the treatment timeline.
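The scoring idea in Project 2 can be sketched as follows. The post does not define how the enrichment score is computed, so this sketch assumes a simple stand-in: the fraction of differentially expressed genes that a drug is linked to. The gene names, drug names, expression values, and the log2 fold-change cutoff are all hypothetical; only the 30% threshold comes from the description above.

```python
import math
from statistics import mean

# Made-up expression values per gene (three samples each), NOT GEO data.
HEALTHY = {
    "GENE_A": [10.0, 12.0, 11.0], "GENE_B": [20.0, 22.0, 18.0],
    "GENE_C": [30.0, 31.0, 29.0], "GENE_D": [8.0, 8.0, 8.0],
    "GENE_E": [15.0, 14.0, 16.0],
}
DISEASED = {
    "GENE_A": [40.0, 44.0, 48.0], "GENE_B": [5.0, 4.0, 6.0],
    "GENE_C": [31.0, 30.0, 32.0], "GENE_D": [24.0, 26.0, 22.0],
    "GENE_E": [16.0, 15.0, 17.0],
}

# Hypothetical drug-to-gene links.
DRUG_GENES = {
    "drug_X": {"GENE_A", "GENE_B"},
    "drug_Y": {"GENE_D", "GENE_E"},
    "drug_Z": {"GENE_C"},
}


def differential_genes(healthy, diseased, min_abs_log2fc=1.0):
    """Genes whose mean expression changes at least 2-fold (assumed cutoff)."""
    selected = set()
    for gene, values in healthy.items():
        log2fc = math.log2(mean(diseased[gene]) / mean(values))
        if abs(log2fc) >= min_abs_log2fc:
            selected.add(gene)
    return selected


def enrichment_scores(drug_genes, de_genes):
    """Assumed score: share of differentially expressed genes a drug touches."""
    if not de_genes:
        return {drug: 0.0 for drug in drug_genes}
    return {drug: len(genes & de_genes) / len(de_genes)
            for drug, genes in drug_genes.items()}


de_genes = differential_genes(HEALTHY, DISEASED)
scores = enrichment_scores(DRUG_GENES, de_genes)

# Keep drugs scoring above the 30% threshold mentioned in the description.
candidates = sorted(d for d, s in scores.items() if s > 0.30)
print(candidates)
```

A real analysis would replace the fold-change cutoff with a statistical test across many samples and would use curated drug-gene links, but the flow (find differential genes, score each drug, apply the threshold) stays the same.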

Phase IV - September ’17 onwards - Actual work on the projects
Students have started work on the projects. The current task is collecting data and training Machine Learning models on it.

