---
library_name: transformers
tags: []
---

# ***Mol-MoE***: Training Preference-Guided Routers for Molecule Generation
*Diego Calanzone (1, 2), Pierluca D'Oro (2), Pierre-Luc Bacon (1, 2)* <br>
*(1) Universite de Montreal, (2) Mila Quebec AI Institute* <br>
**arXiv**: https://arxiv.org/abs/2502.05633

**Abstract**: Recent advances in language models have enabled framing molecule generation as sequence modeling. However, existing approaches often rely on single-objective reinforcement learning, limiting their applicability to real-world drug design, where multiple competing properties must be optimized. Traditional multi-objective reinforcement learning (MORL) methods require costly retraining for each new objective combination, making rapid exploration of trade-offs impractical. To overcome these limitations, we introduce Mol-MoE, a mixture-of-experts (MoE) architecture that enables efficient test-time steering of molecule generation without retraining. Central to our approach is a preference-based router training objective that incentivizes the router to combine experts in a way that aligns with user-specified trade-offs. This provides improved flexibility in exploring the chemical property space at test time, facilitating rapid trade-off exploration. Benchmarking against state-of-the-art methods, we show that Mol-MoE achieves superior sample quality and steerability.

## How to use this model
This LM is fine-tuned to generate molecules in the SMILES format, conditioned on desired property values. <br>
For unconditioned SMILES generation, use the BOS token `<s>`. <br>
For conditioned generation, you can target the following properties: `JNK3, DRD2, GSK3B, CYP2D6, CYP2C19`, for example:
```
prompt: <JNK3=0.3><DRD2=0.7><GSK3B=0.2><CYP2D6=0.8><CYP2C19=0.8><s>
```
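
The conditioning prefix is simply a concatenation of `<PROPERTY=value>` tags followed by the BOS token `<s>`. If you prefer to build it programmatically, here is a minimal sketch (the `build_prompt` helper is illustrative only, not part of the released code):

```
# Hypothetical helper: concatenates <PROPERTY=value> tags and appends the BOS token.
def build_prompt(targets: dict) -> str:
    tags = "".join(f"<{name}={value}>" for name, value in targets.items())
    return tags + "<s>"

prompt = build_prompt({"JNK3": 0.3, "DRD2": 0.7, "GSK3B": 0.2, "CYP2D6": 0.8, "CYP2C19": 0.8})
# -> "<JNK3=0.3><DRD2=0.7><GSK3B=0.2><CYP2D6=0.8><CYP2C19=0.8><s>"
```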

An example of the generation pipeline:
```
from transformers import AutoTokenizer, AutoModelForCausalLM
import re

# Setup
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("ddidacus/RiC-mol-llama-1b")
model = AutoModelForCausalLM.from_pretrained("ddidacus/RiC-mol-llama-1b").to(device)
generation_kwargs = {
    "max_new_tokens": 128,
    "min_length": -1,
    "top_k": 0,            # 0 disables top-k filtering
    "top_p": 0.9,          # nucleus sampling
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "temperature": 1.0
}

# Inference
query = "<JNK3=0.3><DRD2=0.7><GSK3B=0.2><CYP2D6=0.8><CYP2C19=0.8><s>"
toks = tokenizer([query], return_tensors="pt")["input_ids"].to(device)
output = model.generate(toks, **generation_kwargs)
output = tokenizer.batch_decode(output)

# Parsing: keep only the SMILES string between the <s> and </s> tokens
pattern = r'<s>(.*?)</s>'
molecule = re.findall(pattern, output[0], re.DOTALL)
```
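
The regex extracts the generated SMILES between the `<s>` and `</s>` tokens. Sampled sequences are not guaranteed to be chemically valid, so you may want to filter them; below is a minimal sketch using RDKit (an assumption: RDKit is not a dependency of this model and is shown only as one possible validity check):

```
# Optional post-processing sketch (assumes the `rdkit` package is installed).
from rdkit import Chem

valid = []
for smiles in molecule:
    mol = Chem.MolFromSmiles(smiles.strip())   # returns None if the SMILES cannot be parsed
    if mol is not None:
        valid.append(Chem.MolToSmiles(mol))    # keep the canonicalized SMILES
```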

### Model Description
This model is a fine-tuned version of LLaMA 3.2 1B, obtained in two stages:
1. Fine-tuning on ~3.5M molecules extracted from ZINC 250K, MOSES, and ChEMBL.
2. RLHF-style tuning via instruction fine-tuning on 5 distinct reward signals (JNK3, DRD2, GSK3B, CYP2D6, CYP2C19).

The detailed pipeline we followed is reported in the original paper: <br>
*"Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment", Yang et al., 2024* [1]
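
For intuition, in this setup the reward scores of a molecule are rendered as in-context tags in front of the sequence itself; a minimal sketch of what a single reward-conditioned training sample might look like (the values and the placeholder molecule are illustrative; the exact preprocessing is described in [1]):

```
# Illustrative only: one reward-conditioned training sample in the prompt format used above.
scores = {"JNK3": 0.3, "DRD2": 0.7, "GSK3B": 0.2, "CYP2D6": 0.8, "CYP2C19": 0.8}
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # placeholder molecule (aspirin)
sample = "".join(f"<{k}={v}>" for k, v in scores.items()) + f"<s>{smiles}</s>"
```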

- **Developed by:** Diego Calanzone ([email protected])
- **Model type:** Decoder-only Transformer
- **Finetuned from model:** LLaMA 3.2 1B

Read the paper for further details.

### Sources
[1] https://arxiv.org/abs/2402.10207