MolQuery

This repository contains the code and a walkthrough for the methods in the paper "MolQuery: Prediction of lipid synthesizability using active learning".

MolQuery is a pipeline that integrates Active Learning (AL) to predict the chemical synthesizability of lipid molecules designed for mRNA delivery via lipid nanoparticles (LNPs). With AL, the model is trained on the most informative candidates first, so MolQuery can train machine learning models efficiently from limited datasets.

This repository includes an example that simulates four rounds of AL to predict the transfection efficacy of lipid nanoparticles (LNPs), using a public dataset from our previous work, "Representations of lipid nanoparticles using large language models for transfection efficiency prediction".
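The simulated setup can be pictured with the minimal sketch below. It is not the repository's implementation: the data are synthetic, the model is a toy nearest-neighbour estimator standing in for the actual model, and every name (`simulate_active_learning`, `knn_proba`, `hidden_labels`, the batch size of 5) is hypothetical. It only illustrates the round structure: labels exist for the whole dataset but stay hidden, and in each round a simulated annotator reveals the labels of the candidates the model is least certain about.

```python
import math
import random


def knn_proba(x, labeled, k=3):
    """Toy stand-in model: P(label = 1) is the fraction of positives
    among the k labeled points nearest to x."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def binary_entropy(p):
    """Shannon entropy (in nats) of a Bernoulli distribution with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def simulate_active_learning(features, hidden_labels, n_rounds=4, batch_size=5, seed=0):
    """Run n_rounds of simulated AL; return labeled indices and a per-round log."""
    rng = random.Random(seed)
    pool = list(range(len(features)))
    rng.shuffle(pool)
    # Start from a small, randomly chosen labeled set.
    labeled_idx, pool = pool[:batch_size], pool[batch_size:]
    history = []
    for round_no in range(1, n_rounds + 1):
        labeled = [(features[i], hidden_labels[i]) for i in labeled_idx]
        # Score every unlabeled candidate by predictive entropy.
        scores = {i: binary_entropy(knn_proba(features[i], labeled)) for i in pool}
        batch = sorted(pool, key=lambda i: scores[i], reverse=True)[:batch_size]
        # Simulated annotator: reveal the hidden labels of the selected batch.
        labeled_idx.extend(batch)
        chosen = set(batch)
        pool = [i for i in pool if i not in chosen]
        history.append((round_no, len(labeled_idx)))
    return labeled_idx, history


# Synthetic, illustrative data only: one feature, label is 1 when it exceeds 0.6.
data_rng = random.Random(42)
features = [data_rng.random() for _ in range(60)]
hidden_labels = [int(x > 0.6) for x in features]

labeled_idx, history = simulate_active_learning(features, hidden_labels)
print(history)
```

Each entry in `history` pairs the round number with the size of the labeled set after that round, which makes it easy to check that the labeled set grows by one batch per round.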

Environment Setup

pip install -r requirements.txt

Package Structure

data/: contains sample data that can be used to demonstrate the methods.
alien/: contains the source code for the selection framework with a CatBoost model. It includes wrappers for data and models, and classes to run candidate selection.
alien_selection.py: main script that runs entropy-based candidate selection with a CatBoost model.
scripts/: helper scripts that illustrate the MolQuery pipeline.
protocol.sh: bash script that simulates an annotator and runs through four rounds of Active Learning.
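As a rough illustration of what entropy-based candidate selection means, the sketch below ranks unlabeled candidates by the Shannon entropy of their predicted class probabilities and keeps the most uncertain ones. It is an assumption-laden sketch, not the code in alien_selection.py: the function names and the example probabilities are hypothetical, and in practice the probabilities would come from a trained classifier such as a CatBoost model.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(class_probs, batch_size):
    """Return the indices of the batch_size candidates with the highest entropy."""
    scores = [predictive_entropy(p) for p in class_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled candidates.
pool_probs = [
    [0.95, 0.05],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

selected = select_most_uncertain(pool_probs, batch_size=2)
print(selected)  # [3, 1]
```

Candidates whose predicted probabilities are closest to uniform receive the highest entropy, so they are the ones sent to the annotator first.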

Sanofi-Public/MolQuery
