instruction-pretrain committed on
Commit eb2e79b
1 parent: f5021bc

Update README.md

Files changed (1):
  1. README.md +38 -24
README.md CHANGED
@@ -22,8 +22,9 @@ We explore supervised multitask pre-training by proposing ***Instruction Pre-Tra
  </p>
 
  **************************** **Updates** ****************************
+ * 2024/8/29: Updated [guidelines](https://huggingface.co/instruction-pretrain/medicine-Llama3-8B) on evaluating any 🤗Huggingface models on the domain-specific tasks
  * 2024/7/31: Updated pre-training suggestions in the `Advanced Usage` section of [instruction-synthesizer](https://huggingface.co/instruction-pretrain/instruction-synthesizer)
- * 2024/7/15: We scaled up the pre-trained tokens from 100B to 250B, with the number of synthesized instruction-response pairs reaching 500M! Below, we show the performance trend on downstream tasks throughout the pre-training process:
+ * 2024/7/15: We scaled up the pre-trained tokens from 100B to 250B, with the number of synthesized instruction-response pairs reaching 500M. The performance trend on downstream tasks throughout the pre-training process is shown below:
  <p align='left'>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66711d2ee12fa6cc5f5dfc89/0okCfRkC6uALTfuNxt0Fa.png" width="500">
  </p>
@@ -73,30 +74,43 @@ pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)
  print(pred)
  ```
 
- ### 2. To evaluate our models on the domain-specific tasks
- 1. Setup dependencies
- ```bash
- git clone https://github.com/microsoft/LMOps
- cd LMOps/adaptllm
- pip install -r requirements.txt
- ```
-
- 2. Evaluate
- ```bash
- DOMAIN='biomedicine'
-
- # if the model can fit on a single GPU: set MODEL_PARALLEL=False
- # elif the model is too large to fit on a single GPU: set MODEL_PARALLEL=True
- MODEL_PARALLEL=False
-
- # number of GPUs, chosen from [1,2,4,8]
- N_GPU=1
-
- # Set as True
- add_bos_token=True
-
- bash scripts/inference.sh ${DOMAIN} 'instruction-pretrain/medicine-Llama3-8B' ${add_bos_token} ${MODEL_PARALLEL} ${N_GPU}
- ```
+ ### 2. Evaluate any Huggingface LM on domain-specific tasks (💡New!)
+ You can use the following script to reproduce our results and to evaluate any other Huggingface model on domain-specific tasks. Note that the script is NOT applicable to models that require specific prompt templates (e.g., Llama2-chat, Llama3-Instruct).
+
+ 1) Set Up Dependencies
+ ```bash
+ git clone https://github.com/microsoft/LMOps
+ cd LMOps/adaptllm
+ pip install -r requirements.txt
+ ```
+
+ 2) Evaluate the Model
+ ```bash
+ # Select the domain from ['biomedicine', 'finance', 'law']
+ DOMAIN='biomedicine'
+
+ # Specify any Huggingface LM name (not applicable to models requiring specific prompt templates)
+ MODEL='instruction-pretrain/medicine-Llama3-8B'
+
+ # Model parallelization:
+ # - Set MODEL_PARALLEL=False if the model fits on a single GPU.
+ #   We observe that LMs smaller than 10B always meet this requirement.
+ # - Set MODEL_PARALLEL=True if the model is too large and encounters OOM on a single GPU.
+ MODEL_PARALLEL=False
+
+ # Choose the number of GPUs from [1, 2, 4, 8]
+ N_GPU=1
+
+ # Whether to add a BOS token at the beginning of the prompt input:
+ # - Set to False for AdaptLLM.
+ # - Set to True for instruction-pretrain models.
+ # If unsure, we recommend setting it to False, as this is suitable for most LMs.
+ add_bos_token=True
+
+ # Run the evaluation script
+ bash scripts/inference.sh ${DOMAIN} ${MODEL} ${add_bos_token} ${MODEL_PARALLEL} ${N_GPU}
+ ```
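Editor's note: the evaluation command above takes the domain as its first argument, so covering all three domains is just a loop over that variable. A minimal sketch, assuming the same `scripts/inference.sh` interface and a working directory of `LMOps/adaptllm`; the commands are only echoed here so the loop can be inspected before launching real (long-running) evaluations:

```shell
#!/usr/bin/env sh
# Sketch: sweep the evaluation over all three supported domains.
# Remove the `echo` to actually run scripts/inference.sh.
MODEL='instruction-pretrain/medicine-Llama3-8B'
MODEL_PARALLEL=False
N_GPU=1
add_bos_token=True

for DOMAIN in biomedicine finance law; do
  echo "bash scripts/inference.sh ${DOMAIN} ${MODEL} ${add_bos_token} ${MODEL_PARALLEL} ${N_GPU}"
done
```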
  ## Citation