What is the International Conference on Learning Representations (ICLR)?

The International Conference on Learning Representations (ICLR) (https://iclr.cc), the International Conference on Machine Learning (ICML) (https://icml.cc), and the Conference on Neural Information Processing Systems (NeurIPS) (https://nips.cc) are the three most prominent annual conferences in the field of artificial intelligence. They take place in that order each year: ICLR in April-May, ICML in June-July, and NeurIPS in November-December. The 12th ICLR, also known as ICLR 2024, will be held in Vienna, Austria, from May 7 to 11, 2024.

If you're wondering what happened at NeurIPS 2023, which took place not too long ago, see our recap:

https://hyperlab.hits.ai/en/blog/neurIPS2023

Unlike conferences in the natural sciences, where presenters usually submit abstracts for review, machine learning conferences such as ICLR, ICML, and NeurIPS require full papers, which are then peer-reviewed. After a round of questions and answers with the reviewers and revisions by the authors, a paper that is finally approved can be presented orally or as a poster at the conference and is published in the proceedings.

For this reason, submissions open about half a year before the conference so that the papers can be reviewed and decided on. For ICLR 2024, final decisions have been under way since January, and the list of accepted papers is almost complete. A total of 7,304 papers were submitted to ICLR 2024, and as of the end of February, 2,250 of them (31%) had been accepted. The full list of submitted papers and review statistics can be found on the Paper Copilot site.

Research in drug discovery to be presented at ICLR 2024

Conferences such as ICLR originally covered the general theory and methods of machine learning, especially deep learning, while research applied to natural science problems in physics, chemistry, and the life sciences tended to appear in workshops rather than in the main conference. However, as deep learning spreads to more fields and more applied research is submitted, the number of main-conference papers that use deep learning as a method but center on natural science problems is increasing.

To find the accepted ICLR 2024 papers that can be applied to the drug development process, we filtered the submissions using keywords such as chemistry, bio, molecule, drug, and protein. A total of 242 related papers were submitted, and 81 of them were finally accepted. Such a large number of submissions to the main conference rather than to a workshop shows that the field of AI is increasingly focusing on problems in chemistry and the life sciences. Let's take a look at the trends in the 81 accepted papers on drug discovery.
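As a quick sanity check on the figures quoted above, the acceptance rates can be recomputed in a few lines of Python. The keyword filter is only a sketch: the article does not say whether matching was done on titles, abstracts, or author keywords, so `is_drug_discovery_related` and its title-based substring matching are assumptions made for illustration.

```python
# Figures quoted in the text above.
total_submitted, total_accepted = 7304, 2250
drug_submitted, drug_accepted = 242, 81

def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the acceptance rate as a percentage."""
    return 100.0 * accepted / submitted

overall_rate = acceptance_rate(total_accepted, total_submitted)
drug_rate = acceptance_rate(drug_accepted, drug_submitted)
print(f"Overall: {overall_rate:.1f}%")         # Overall: 30.8%
print(f"Drug discovery: {drug_rate:.1f}%")     # Drug discovery: 33.5%

# Keywords named in the article; matching on lowercase substrings is an assumption.
KEYWORDS = ("chemistry", "bio", "molecule", "drug", "protein")

def is_drug_discovery_related(title: str) -> bool:
    """Hypothetical filter: True if any keyword appears in the title."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in KEYWORDS)
```

Title-only matching would miss papers such as One For All, whose title contains none of the keywords, so a real selection would likely need abstracts or author keywords as well.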

Trends in accepted papers related to drug discovery at ICLR 2024

First, we categorized the topics covered by the papers and examined the proportion of each topic.

Percentage of 81 chemical and life science-related papers accepted for publication in ICLR 2024 by topic.
Note that a paper can have multiple topics

Molecule design and protein design, which aim to design small molecules or proteins with desired properties, were together the most common topic (19% + 10% = 29%), followed by property prediction, which predicts the properties of molecules or proteins (25%). In addition, with the rise of LLMs such as GPT and Gemini, and of foundation models more generally, applications of these models are also increasing, and a notable share of the accepted drug discovery papers (9%) dealt with them.

Let's take a closer look at some of the main topics.

Topics in Drug Discovery at ICLR 2024 | 1. Molecular Structure Design (Molecule design & protein design, 29%)

Molecular structure design is, in practice, the ultimate goal of drug discovery and materials development. It has attracted researchers' attention for a long time because it answers the question of which substance to make. Recently, as generative AI has shown great results in language, images, video, and speech, attempts to apply it to other fields have become more active.

The paper "Training-free Multi-objective Diffusion Model for 3D Molecule Generation", accepted for ICLR 2024, proposes a method to design molecules by optimizing multiple properties simultaneously. Previous molecule generation models must be retrained whenever the type or number of properties to be controlled changes. In this study, a generative model that produces three-dimensional molecular structures without any property conditions is trained once, and separate models that predict the desired properties from a molecular structure are then used to steer generation toward structures with those properties. To change or add properties, only the property prediction models need to be retrained, and the generative model can be reused. Because a property predictor costs less to train than a generative model, this is much more flexible than preparing a new generative model from scratch.

The paper only shows examples of adjusting the electronic properties of a molecule, but the approach could also be applied to drug molecules.
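The idea described above, a property-agnostic generator steered by separately trained property predictors, can be illustrated with a toy sketch. This is not the paper's algorithm: the "generator" here is just the score of a standard normal prior, the "molecule" is a two-dimensional vector, and the predictors are hypothetical linear functions. All names (`unconditional_score`, `make_linear_predictor`, `guided_sample`) and parameter values are assumptions chosen for illustration.

```python
import random

def unconditional_score(x):
    # Stand-in for the pretrained, property-agnostic generator:
    # the score (gradient of the log-density) of a standard normal prior.
    return [-xi for xi in x]

def make_linear_predictor(weights):
    # Hypothetical differentiable property predictor f(x) = w . x.
    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x))
    def gradient(x):
        return list(weights)  # the gradient of a linear function is constant
    return predict, gradient

def guidance(x, predictors, targets):
    # Gradient of -sum_k (f_k(x) - y_k)^2: pulls x toward all targets at once.
    g = [0.0] * len(x)
    for (predict, gradient), target in zip(predictors, targets):
        error = predict(x) - target
        for i, gi in enumerate(gradient(x)):
            g[i] -= 2.0 * error * gi
    return g

def guided_sample(predictors, targets, dim=2, steps=2000,
                  step_size=0.01, scale=10.0, seed=0):
    # Langevin-style updates: prior score plus scaled property guidance.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for k in range(steps):
        temperature = 1.0 - k / steps  # noise is annealed toward zero
        noise_std = (2.0 * step_size * temperature) ** 0.5
        s = unconditional_score(x)
        g = guidance(x, predictors, targets)
        x = [xi + step_size * (si + scale * gi) + noise_std * rng.gauss(0.0, 1.0)
             for xi, si, gi in zip(x, s, g)]
    return x

# Two "properties" controlled at once; a third would only need another predictor.
predictors = [make_linear_predictor([1.0, 0.0]), make_linear_predictor([0.0, 1.0])]
targets = [1.0, -0.5]
sample = guided_sample(predictors, targets)
predicted = [predict(sample) for predict, _ in predictors]
```

In this sketch the generator function is never modified when the property list changes, which is the flexibility the paragraph above describes. The samples settle slightly short of the targets because the prior keeps pulling toward zero, and the `scale` parameter sets that trade-off.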

(Left) The result of generating molecules by simultaneously controlling the polarizability (α) and dipole moment (μ), and
(right) the result of generating molecules by simultaneously controlling the HOMO and LUMO energies [Han, X. et al. ICLR 2024].

Topics in Drug Discovery at ICLR 2024 | 2. Molecular Property Prediction (25%)

Predicting the properties of molecules is the most basic and important problem when selecting compounds or proteins for actual experiments. For that reason, applied AI research addressed it long before the problem of designing molecular structures.

At ICLR 2024, 24 excellent papers were accepted, including "One For All: Towards Training One Graph Model For All Classification Tasks," which introduces a method for predicting arbitrary molecular properties with a single model that combines a large language model and a graph neural network. The idea is that you input the structure of the molecule you want to investigate, a description of the structure (e.g., the atoms and bonds that make up the structure), and a description of the property itself, and the model predicts that property of the input molecule based on the descriptions. Unlike conventional methods, which require data collection and model training for each desired property, the authors propose a literal "one for all" model that draws on the text interpretation capabilities and knowledge of a large language model. For this achievement, the paper was accepted as a spotlight paper at ICLR 2024.

Pipeline in the One For All (OFA) framework. For the molecular property prediction problem, given a molecular structure (graph), a description of the molecular structure (text), and a description of the property to be predicted (text), it predicts the corresponding property of the structure [Liu, H. et al. ICLR 2024].

Topics in Drug Discovery at ICLR 2024 | 3. Preparation of Dataset (Benchmark, 9%)

The most important thing in machine learning and deep learning research is data collection and processing. Preparing a good dataset takes a lot of time and expertise and tends to receive much less recognition than publishing a high-performing model, but there are researchers who put in the hard work for the community.

The paper "Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets", accepted for ICLR 2024, introduces three datasets of different sizes, as shown in the table below. Together, the datasets cover about 100 million molecules and more than 3,000 prediction tasks, including not only quantum chemical properties but also toxicity and omics data. Since the authors are making the organized data publicly available, I think it will be a great contribution to researchers working on related problems.

Three datasets published in [Beaini, D. et al. ICLR 2024]. G.: Graph-level; N.: node-level; C.: classification; R.: regression; RC.: Ranked classification.

Topics in Drug Discovery at ICLR 2024 | 4. Applied Research on Large Language Models (LLM, 9%)

With multifunctional language models such as ChatGPT and Gemini in vogue, researchers in the pharma-bio field all seem to be asking the same question: "Can such technology also be used for drug development?" Academics are also very interested in finding and extending such possibilities, and related research results are gradually appearing.

"Conversational Drug Editing Using Retrieval and Domain Feedback", accepted at ICLR 2024, introduces ChatDrug, an AI that can improve drug molecular structures in a conversational manner, like ChatGPT. ChatDrug consists of three modules. The PDDS (prompt design for domain-specific) module is responsible for the prompt engineering needed to frame the specialized problem of drug design. The ReDF (retrieval and domain feedback) module extracts the necessary information from a large scientific knowledge base and feeds it back. The Conversation module interacts with the user to improve the generated results. Starting from a given molecular structure, the system suggests a structure whose properties are optimized to meet the user's needs.

ChatDrug's pipeline. The PDDS, ReDF, and Conversation modules each serve a different function and together form a system that optimizes the structure of drug molecules to improve target properties [Liu, S. et al. ICLR 2024].

In the paper, ChatDrug was used in experiments to improve solubility, drug-likeness, permeability, hydrogen bonding tendency, and other properties, and it showed the best success rate and amount of property change among the compared techniques.
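The conversational loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not ChatDrug's implementation: the function name `conversational_edit`, the prompt wording, and the three callables (`propose` standing in for the language model, `satisfies` for the domain check, `retrieve_similar` for the knowledge-base lookup) are all hypothetical, and the molecule strings are placeholders with no chemical claim attached.

```python
from typing import Callable, Optional

def conversational_edit(
    input_molecule: str,
    task: str,
    propose: Callable[[str], str],                      # stands in for the LLM call
    satisfies: Callable[[str, str], bool],              # domain feedback on an edit
    retrieve_similar: Callable[[str], Optional[str]],   # retrieval from a knowledge base
    max_rounds: int = 3,
) -> Optional[str]:
    # PDDS-like step: wrap the request in a domain-specific prompt.
    prompt = f"Edit the molecule {input_molecule} so that it is {task}. Reply with one molecule."
    for _ in range(max_rounds):
        candidate = propose(prompt)
        if satisfies(input_molecule, candidate):
            return candidate
        # ReDF-like step: retrieve an example that meets the goal and feed it back.
        hint = retrieve_similar(candidate)
        prompt = f"{candidate} does not meet the goal."
        if hint is not None:
            prompt += f" A similar molecule that does is {hint}."
        prompt += " Please propose another edit."
    return None  # no acceptable edit within the round limit

# Usage with scripted stand-ins instead of a real model or database.
scripted_replies = iter(["CANDIDATE_A", "CANDIDATE_B"])
prompts_seen = []

def scripted_propose(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(scripted_replies)

result = conversational_edit(
    "INPUT_MOLECULE",
    "more soluble in water",
    propose=scripted_propose,
    satisfies=lambda original, edited: edited == "CANDIDATE_B",
    retrieve_similar=lambda candidate: "CANDIDATE_B",
)
```

Passing the model, the domain check, and the retrieval step in as plain callables keeps the loop itself independent of any particular language model or database.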

Example of editing the structures of six drug molecules. The parts of each structure that change are highlighted:
blue for the input molecule, red for intermediate steps, and green for the final modification step [Liu, S. et al. ICLR 2024].

Topics in Drug Discovery at ICLR 2024 | 5. Molecular Structure Prediction (Docking & conformer generation, 11%)

The problem of predicting molecular structure can be divided into two major categories: predicting the three-dimensional structure of small molecules, either in the free state or in a specific environment, and predicting the three-dimensional structure of macromolecules such as proteins. Both problems are integral to the use of computational techniques in structure-based drug discovery: if the predicted protein structure or the predicted binding structure of a drug is incorrect, everything that follows can go wrong.

The ICLR 2024 paper "STR2STR: A Score-based Framework for Zero-shot Protein Conformation Sampling" addresses protein flexibility, which is extremely important for understanding the mechanism of a drug or the role of the protein itself but is tricky to handle because of the heavy computation involved and the limited accuracy. STR2STR uses diffusion models, which have made a big splash in deep learning since 2020, to generate different stable three-dimensional structures of the same protein. It shows better accuracy than comparable techniques and has two further advantages. First, it can make predictions much faster than simulations such as molecular dynamics. Second, it needs only experimental crystal structures, rather than simulated structures, to train the model.

In structure-based drug design, being able to quickly obtain multiple structures of a target protein reduces the likelihood of an incorrect design compared with starting from a single structure determined by chance, which in turn increases the accuracy of the design. While models such as AlphaFold 2 predict a single protein structure, models such as STR2STR extend AI-based prediction to multiple conformations of the same protein, which makes such predictions even more useful.

Wrapping up the ICLR 2024 preview

Conferences like ICLR, ICML, and NeurIPS are often said to be more valuable for the interactions that take place between attendees than for the papers that are presented. However, because the accepted papers are available in advance, they are also a great opportunity to learn about new technologies and understand trends well before the event.

One of the ICLR 2024 workshops, Generative and Experimental Perspectives for Biomolecular Design (https://www.gembio.ai), will focus on the convergence of drug discovery and AI. The workshop is being organized by distinguished academics and outstanding students and will present research results in the field of in silico biology. Not all of the research that will be presented will be published on the web, so you won't be able to get to all of it unless you attend in person, but I'm looking forward to seeing what will be revealed that will contribute to drug discovery.

We look forward to the upcoming ICML 2024 and NeurIPS 2024 and will continue to bring you news from the specialized AI/deep learning community that may be hard to reach in the pharma-bio industry.
