Update README.md #6
opened by arham19

README.md CHANGED
````diff
@@ -78,16 +78,16 @@ SteerLM Llama-2 is a 13 billion parameter generative language model based on the
 
 Key capabilities enabled by SteerLM:
 
-- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity
-- Simplified training compared to RLHF techniques like fine-tuning and bootstrapping
+- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity.
+- Simplified training compared to RLHF techniques like fine-tuning and bootstrapping.
 
 ## Model Architecture and Training
 The SteerLM method involves the following key steps:
 
-1. Train an attribute prediction model on human annotated data to evaluate response quality
-2. Use this model to annotate diverse datasets and enrich training data
-3. Perform conditioned fine-tuning to align responses with specified combinations of attributes
-4. (Optionally) Bootstrap training through model sampling and further fine-tuning
+1. Train an attribute prediction model on human annotated data to evaluate response quality.
+2. Use this model to annotate diverse datasets and enrich training data.
+3. Perform conditioned fine-tuning to align responses with specified combinations of attributes.
+4. (Optionally) Bootstrap training through model sampling and further fine-tuning.
 
 SteerLM Llama-2 applies this technique on top of the Llama-2 architecture. It was pretrained on internet-scale data and then customized using [OASST](https://huggingface.co/datasets/OpenAssistant/oasst1) and [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) data.
````
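The conditioned fine-tuning in the steps above amounts to serializing desired attribute values into the prompt so the model learns to condition its response on them. A minimal sketch of that idea — the function name `format_steered_prompt`, the attribute names, and the template are hypothetical stand-ins, not the actual format SteerLM Llama-2 was trained on:

```python
def format_steered_prompt(user_turn: str, attributes: dict) -> str:
    """Prefix a user turn with a label encoding the desired attribute values.

    Hypothetical template for illustration only; a real SteerLM checkpoint
    expects whatever attribute format it was fine-tuned with.
    """
    # Render attributes deterministically, e.g. "quality:9,toxicity:0".
    label = ",".join(f"{name}:{value}" for name, value in sorted(attributes.items()))
    return f"[attributes: {label}]\nUser: {user_turn}\nAssistant:"

prompt = format_steered_prompt(
    "Explain what SteerLM does.",
    {"quality": 9, "helpfulness": 9, "toxicity": 0},
)
```

At inference time, varying the attribute values in the label (rather than retraining) is what gives the dynamic steering described above.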
````diff
@@ -109,7 +109,7 @@ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp
 pip install nemo_toolkit['nlp']==1.17.0
 ```
 
-Alternatively, you can use NeMo
+Alternatively, you can use NeMo Framework.
 
 2. Launch eval server
 
````
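Once the eval server from step 2 is running, it can be queried over HTTP. The sketch below is a minimal client under stated assumptions: it targets a NeMo-style `PUT /generate` endpoint with JSON fields `sentences` and `tokens_to_generate`; the port (`5555`) and the response shape are assumptions to verify against the server you actually launched.

```python
import json
import urllib.request

def build_generate_request(prompt: str, tokens_to_generate: int = 128) -> dict:
    """Assemble the JSON body for a NeMo-style text-generation server.

    Field names ("sentences", "tokens_to_generate") follow the NeMo
    megatron_gpt_eval server convention; check them against your version.
    """
    return {"sentences": [prompt], "tokens_to_generate": tokens_to_generate}

def query_server(prompt: str, url: str = "http://localhost:5555/generate") -> str:
    """Send a PUT /generate request and return the first completion."""
    data = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="PUT"
    )
    with urllib.request.urlopen(req) as resp:  # requires a running eval server
        return json.loads(resp.read())["sentences"][0]

# Example payload (no server needed to build it):
payload = build_generate_request("What does SteerLM do?", tokens_to_generate=64)
```

Calling `query_server(...)` only works against a live server; the payload builder can be inspected standalone.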
````diff
@@ -199,7 +199,7 @@ The model was trained on the data originally crawled from the Internet. This dat
 We did not perform any bias/toxicity removal or model alignment on this checkpoint.
 
 
-##
+## License
 
 - Llama 2 is licensed under the [LLAMA 2 Community License](https://ai.meta.com/llama/license/), Copyright © Meta Platforms, Inc. All Rights Reserved.
 - Your use of the Llama Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the [Acceptable Use Policy](https://ai.meta.com/llama/use-policy) for the Llama Materials.
````
|