Navigating the Drug Discovery Labyrinth: Large Quantitative Models as Our Map in Target Identification

Business
December 3, 2024

In the vast and intricate world of drug discovery and development, target identification stands as one of the most challenging and pivotal steps. Much like explorers in a maze, scientists must traverse intricate pathways, avoid dead ends, and uncover hidden passages to find their way to promising drug targets. In this journey, our Large Quantitative Models (LQMs) emerge as our map, guiding us with precision and insight.

The Complexity of Labyrinths

Human biology is a matrix of intricate systems and processes. The sheer complexity of cellular mechanisms and the interplay of various biological factors make it extremely difficult to pinpoint precise targets for drug discovery. 

The stakes in this journey are high. If the wrong target is identified, the entire drug discovery program can fail, resulting in significant losses of time and resources. One route for target identification is phenotypic screening, which exposes candidate compounds to cells or other biologically relevant systems and directly identifies the compounds eliciting the desired phenotype. Despite the tremendous power of this approach, phenotypic assays require complex follow-up studies to determine the precise protein target or targets responsible for the observed phenotype.

Traditional methods such as high-throughput experimental screening, though valuable, often fall short in the face of such complexity. Numerical methods, meanwhile, face multiple technical challenges, including access to different sources of data, unknown binding sites on targets, and unknown poses in which a drug molecule can interact with those binding sites. High-quality simulation tools remain scarce, underscoring the urgency of applying AI in this phase of drug discovery and development.

Large Quantitative Models: The Modern Map

Artificial intelligence, with its ability to process and analyze vast amounts of data, acts as a modern map for researchers. AI technologies have the potential to sift through the complexities, identifying patterns and networks that would be difficult for humans to discern. Our core technology, Large Quantitative Models, is trained on multiple orthogonal models, including models based on computational chemistry methods, predictive modeling methods, and proprietary knowledge graphs. This combination provides unique views of target interactions and allows for efficient navigation of the chemical space.

Integration of Diverse Data

The labyrinth of drug discovery involves many types of data, from genomic sequences to clinical trial results. Our data extraction capability excels at integrating diverse, multimodal inputs into a comprehensive view of targets and their relations. The result is a rich collection of data: findings from the literature, experimental affinity data, protein sequences, binding site identifications, protein-protein interactions, clinical data, and broad omics results. This holistic approach ensures that all relevant information is considered, leading to more robust and reliable target identification. The data this capability provides enables analyses that yield valuable insights, such as the likelihood of interaction between a target and a drug molecule.
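To make the idea of integration concrete, here is a minimal sketch of folding heterogeneous sources into one unified target view. The record type and field names are purely illustrative, not SandboxAQ's actual (proprietary) schema:

```python
from dataclasses import dataclass, field

# Hypothetical unified record for a protein target.
@dataclass
class TargetProfile:
    protein_id: str
    sequence: str = ""
    binding_sites: list = field(default_factory=list)
    affinities: dict = field(default_factory=dict)    # ligand -> affinity (nM)
    interactions: list = field(default_factory=list)  # protein-protein partners
    literature_refs: list = field(default_factory=list)

def merge_sources(profile: TargetProfile, *sources: dict) -> TargetProfile:
    """Fold heterogeneous source dictionaries into one comprehensive view."""
    for src in sources:
        profile.binding_sites += src.get("binding_sites", [])
        profile.affinities.update(src.get("affinities", {}))
        profile.interactions += src.get("interactions", [])
        profile.literature_refs += src.get("literature_refs", [])
    return profile

# Example: combine an affinity assay export with a curated interaction database.
p = merge_sources(
    TargetProfile("P00533", sequence="MRPSG..."),
    {"affinities": {"erlotinib": 0.7}},
    {"interactions": ["GRB2", "SHC1"], "binding_sites": ["ATP pocket"]},
)
```

In practice each source would arrive through its own extraction pipeline; the point is that downstream analysis sees one holistic profile rather than scattered fragments.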

Mapping the Maze

Our Knowledge Graphs (KGs) help us organize and analyze a detailed map of biological pathways and interactions at an unprecedented scale. Coupled with our proprietary query and recommendation algorithms, these KGs allow us to visualize and explore the labyrinthine structure of cellular processes, highlighting potential targets that conventional methods might overlook. Because they contain data structures encompassing proteomics, transcriptomics, and other relevant omics alongside algorithmic elements, the KGs can also rank potential protein targets observed in vivo or in vitro. Using our internal methods, our KG technology is able to form quantitative predictions beyond experimental and literature data.
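As a toy illustration of graph-based target ranking (not SandboxAQ's proprietary query and recommendation algorithms), one can score nodes in a small interaction graph with a PageRank-style walk; well-connected hubs accumulate rank. The genes and edges below are a hypothetical example:

```python
# Hypothetical directed interaction graph: node -> list of neighbors it supports.
edges = {
    "TP53":  ["MDM2", "ATM"],
    "MDM2":  ["TP53"],
    "ATM":   ["TP53", "CHEK2"],
    "CHEK2": ["TP53"],
}

def pagerank(graph, damping=0.85, iters=50):
    """Simple power-iteration PageRank over an adjacency-list graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, nbrs in graph.items():
            share = damping * rank[n] / max(len(nbrs), 1)
            for m in nbrs:  # distribute this node's rank to its neighbors
                new[m] += share
        rank = new
    return rank

scores = pagerank(edges)
top_target = max(scores, key=scores.get)  # the most-supported hub in this graph
```

A production system would weight edges by evidence type (omics, literature, experiment) and use far richer ranking signals, but the principle of letting graph structure prioritize candidates is the same.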

Predictive Power

Our proteochemometric machine learning (ML) models are designed to navigate the complex maze of experimental data sources. They are supported by an automated data curation system that plays a crucial role in our research by ensuring the validity of our data sets, which include experimental activities and assay detection limits. With such reliable data, we can train and evaluate ML models for specific targets, giving researchers the predictive power needed to prioritize the most promising targets.
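One recurring curation task is handling assay detection limits: a reported value such as ">10000" is censored, not a precise affinity, and must be labeled accordingly before training. The sketch below uses a hypothetical record format to show the idea; it is not our actual curation system:

```python
def curate(records, detection_limit_nm=10000.0):
    """Curate assay records of the form (target, ligand, raw_value), where
    raw_value may be a float or a censored string such as '>10000'."""
    clean = []
    for target, ligand, raw in records:
        if isinstance(raw, str) and raw.startswith(">"):
            # Censored at the assay detection limit: keep as an "inactive"
            # label rather than treating it as an exact measurement.
            clean.append({"target": target, "ligand": ligand,
                          "value": float(raw[1:]), "censored": True})
        else:
            value = float(raw)
            if 0 < value <= detection_limit_nm:  # drop implausible entries
                clean.append({"target": target, "ligand": ligand,
                              "value": value, "censored": False})
    return clean

rows = curate([("KRAS", "lig1", 12.5),
               ("KRAS", "lig2", ">10000"),
               ("KRAS", "lig3", -1.0)])   # negative affinity is discarded
```

Downstream, a model can then treat censored rows with censoring-aware losses instead of silently training on detection-limit artifacts.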

Enhancing Precision

In order to improve our precision in evaluating the binding of molecules to a target, we employ physics-based computational chemistry models such as AQFEP. For each protein code, we utilize various conformations and pose ligands with cofolding or diffusion-based ML methods. This is followed by our custom physics optimization and AQFEP calculations to rank targets, providing information relevant not only to target identification but also crucial insights into the nature of the protein-ligand interactions and the likely mode of action. With the identified binding sites and ligand poses, we can then apply methods such as our internal GenAI, ML-guided AQFEP, and advanced sampling to kickstart drug discovery campaigns for additional chemical matter identification and/or ligand optimization.
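The workflow above can be sketched as a pipeline skeleton. Every function body here is a stub: conformation generation, cofolding/diffusion pose prediction, physics optimization, and AQFEP scoring are proprietary or external tools, so only the control flow is meaningful:

```python
def generate_conformations(protein_code):
    return [f"{protein_code}_conf{i}" for i in range(2)]           # stub

def pose_ligand(conformation, ligand):
    # Stand-in for cofolding or diffusion-based ML pose generation.
    return [f"{conformation}:{ligand}:pose{i}" for i in range(2)]  # stub

def physics_optimize(pose):
    return pose                                                    # stub

def aqfep_score(pose):
    # Stand-in for an AQFEP binding free energy (lower is better).
    return -0.1 * len(pose)                                        # stub

def rank_targets(protein_codes, ligand):
    """Rank candidate targets by their best (lowest) pose score."""
    scores = {}
    for code in protein_codes:
        poses = [physics_optimize(p)
                 for conf in generate_conformations(code)
                 for p in pose_ligand(conf, ligand)]
        scores[code] = min(aqfep_score(p) for p in poses)
    return sorted(scores, key=scores.get)

ranking = rank_targets(["EGFR", "BRAF"], "ligandA")
```

The key design point is that pose generation fans out (conformations x poses) while scoring reduces each target to its best pose, so the ranking reflects the most favorable interaction found per target.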

Real World Applications

Pharmaceutical companies and research institutions have routinely leveraged AI for virtual screening of molecules against known targets. Here, we introduce how our LQMs solve the more complex problem of target identification. Our models have been instrumental in identifying novel targets for difficult-to-treat diseases. These models can identify targets with high certainty, surface targets that traditional experimental screening methods miss, flag targets likely to have toxic effects, and filter out false positives such as promiscuous targets. These advancements not only help speed up the discovery process but also increase the likelihood of finding effective treatments.

The Road Ahead

Our journey through the drug discovery labyrinth is far from over. We are continuously advancing our LQMs by training on expanded data sets so that we can make target identification more precise and efficient. 

As we navigate the labyrinth of drug discovery, our LQMs are becoming an indispensable map that solves many customer problems. By guiding us through the intricate pathways of human biology, these models increase our chances of finding effective drug targets and transform the landscape of drug discovery and development. The once-daunting labyrinth is becoming a pathway to new possibilities and breakthroughs in medicine. 

Learn more about how we’re transforming drug discovery with LQMs here.

Authors:

Atashi Basu is the Head of Products at SandboxAQ. She has a background in Computational Chemistry, with a PhD in Chemistry from the University of Cologne and postdoctoral research experience in Chemical Engineering at Stanford University. She has more than 15 years of experience in developing products in data science applications, image processing, computational materials, and process modeling.

Romelia Salomon Ferrer, PhD, is a Senior Project Lead at SandboxAQ, interfacing between clients and R&D to bring innovative solutions to the most challenging problems in Drug Discovery. Dr. Salomon has extensive experience in drug discovery and theoretical chemistry through her work at top institutions such as Pfizer, Caltech, and Berkeley, to name a few.

Mary Pitman, PhD, is a Staff Research Scientist at SandboxAQ specializing in combining AI with physics-based approaches. Dr. Pitman leads the Drug Discovery methods development team to develop scientific software for improved therapeutic outcomes and biological insights. Her research focuses on biophysics, graph theory, and free energy perturbation.