---
license: other
license_name: nv-ai-foundation-models-license
license_link: https://developer.nvidia.com/downloads/nv-ai-foundation-models-license
library_name: nemo
extra_gated_heading: Access Nemotron 3 8B on Hugging Face
extra_gated_description: >-
  To download this model, you must agree to the terms of the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).
extra_gated_fields:
  I agree to share my name, email address and username with NVIDIA: checkbox
  geo: ip_location
language:
- "en"
- "ar"
- "az"
- "bg"
- "bn"
- "ca"
- "cs"
- "da"
- "de"
- "el"
- "es"
- "et"
- "fa"
- "fi"
- "fr"
- "gl"
- "he"
- "hi"
- "hr"
- "hu"
- "hy"
- "id"
- "is"
- "it"
- "ka"
- "kk"
- "kn"
- "ko"
- "lt"
- "lv"
- "mk"
- "ml"
- "mr"
- "ne"
- "nl"
- "no"
- "pl"
- "pt"
- "ro"
- "ru"
- "sk"
- "sl"
- "sq"
- "sr"
- "sv"
- "ta"
- "te"
- "tr"
- "uk"
- "ur"
- "vi"
- "ja"
- "zh"
pipeline_tag: text-generation
inference: false
fine-tuning: true
tags:
- nvidia
- nemotron-3
- 8B
---

# Nemotron-3-8B-QA-4k

## Model Overview

### License

The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).

### Description

Nemotron-3-8B-QA-4k is an 8-billion-parameter generative language model customized from an 8B base model. It accepts inputs with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following with supervised fine-tuning (SFT), using an NVIDIA method designed specifically for question answering, so that it gives concise and reliable answers.

Nemotron-3-8B-QA is part of Nemotron-3, a family of enterprise-ready generative text models compatible with [NVIDIA NeMo Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/). For other models in this collection, see the [collections page](https://huggingface.co/collections/nvidia/nemotron-3-8b-6553adeb226f6ab4ffc356f9).

NVIDIA NeMo is an end-to-end, cloud-native platform to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI. To get access to NeMo Framework, please sign up at [this link](https://developer.nvidia.com/nemo-framework/join).

### References

[Announcement Blog](https://developer.nvidia.com/blog/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms/)

### Model Architecture

**Architecture Type:** Transformer

**Network Architecture:** Generative Pre-Trained Transformer (GPT-3)

### Prompt Format

#### Single Turn

```text
System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

{Context for Question-1}

User: Please give a full and complete answer for the question. {Question-1}

A:
```

#### Multi-Turn

```text
System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

{Context for Question-2}

User: Please give a full and complete answer for the question. {Question-1}

A: {Answer-1}

User: {Question-2}

A:
```

#### Example prompt formation code

```python
# Single-turn prompt template; the placeholders are filled in with str.format below.
PROMPT_TEMPLATE = """System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.

{context_1}

User: Please give a full and complete answer for the question. {question_1}

A:"""

# Example context passage and a question about it
context_1 = "Climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil and gas."
question_1 = "What happened in the 1800s?"

prompt = PROMPT_TEMPLATE.format(context_1=context_1, question_1=question_1)
print(prompt)
```
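
The multi-turn format can be assembled programmatically in the same way. The sketch below is illustrative only: `build_multi_turn_prompt` and its arguments are hypothetical names, not part of NeMo. It assumes, following the multi-turn template above, that the "Please give a full and complete answer for the question." prefix is attached only to the first user turn and that a single context passage is supplied for the newest question.

```python
SYSTEM_PROMPT = (
    "System: This is a chat between a user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions "
    "based on the context. The assistant should also indicate when the answer cannot "
    "be found in the context."
)
FIRST_TURN_PREFIX = "Please give a full and complete answer for the question. "


def build_multi_turn_prompt(context, history, question):
    """Assemble a prompt in the multi-turn format (hypothetical helper).

    context:  passage that the newest question should be answered from
    history:  list of (question, answer) pairs from earlier turns, oldest first
    question: the new question the model should answer
    """
    parts = [SYSTEM_PROMPT, context]
    # The newest question has no answer yet, so it is marked with None.
    turns = list(history) + [(question, None)]
    for i, (q, a) in enumerate(turns):
        prefix = FIRST_TURN_PREFIX if i == 0 else ""
        parts.append(f"User: {prefix}{q}")
        # Leave the final "Assistant:" open for the model to complete.
        parts.append("Assistant:" if a is None else f"Assistant: {a}")
    # Blocks are separated by one blank line, as in the templates.
    return "\n\n".join(parts)


example = build_multi_turn_prompt(
    context="Climate change refers to long-term shifts in temperatures and weather patterns.",
    history=[("What is climate change?", "Long-term shifts in temperatures and weather patterns.")],
    question="Can such shifts be natural?",
)
print(example)
```

With an empty `history`, the helper produces the single-turn format, so one function can serve both cases.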

### Software Integration

**Runtime Engine(s):**
NVIDIA AI Enterprise

**Toolkit:**
NeMo Framework

To get access to NeMo Framework, please sign up at [this link](https://developer.nvidia.com/nemo-framework/join). See the [NeMo inference container](https://registry.ngc.nvidia.com/orgs/ea-bignlp/teams/ga-participants/containers/nemofw-inference) documentation for details on how to set up and deploy an inference server with NeMo.

**Sample Inference Code:**

```python
from nemo.deploy import NemoQuery

# In this case, we run inference on the same machine
nq = NemoQuery(url="localhost:8000", model_name="Nemotron-3-8B-QA-4K")

# `prompt` is the string built in the prompt formation example above
output = nq.query_llm(prompts=[prompt], max_output_token=200, top_k=1, top_p=0.0, temperature=0.1)
print(output)
```
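
The description gives a context length of 4,096 tokens, and the sample call above requests up to 200 output tokens, so a long context passage may need trimming before it is sent. The following is a minimal sketch, not part of NeMo: `trim_context` and `approx_token_count` are hypothetical helpers, and reserving the output tokens inside the 4,096-token budget is a conservative assumption rather than documented behavior. A real deployment should count tokens with the model's own tokenizer instead of the whitespace stand-in used here.

```python
def approx_token_count(text):
    """Stand-in counter (assumption): whitespace words, NOT the model's real tokens."""
    return len(text.split())


def trim_context(context, fixed_text, count_tokens=approx_token_count,
                 max_tokens=4096, reserve_for_output=200):
    """Drop trailing sentences from `context` until the whole prompt fits.

    fixed_text is everything else in the prompt (system line, question, role tags).
    """
    budget = max_tokens - reserve_for_output - count_tokens(fixed_text)
    if budget <= 0:
        raise ValueError("No room left for context after the fixed prompt text")

    sentences = [s for s in context.split(". ") if s]
    trimmed = False
    while sentences and count_tokens(". ".join(sentences)) > budget:
        sentences.pop()  # discard the last remaining sentence
        trimmed = True

    result = ". ".join(sentences)
    # Splitting on ". " strips the period from every sentence but the last,
    # so restore it when the original final sentence was dropped.
    if trimmed and result and not result.endswith("."):
        result += "."
    return result


# Usage with a deliberately tiny budget so the trimming is visible
short = trim_context("one two three. four five six. seven eight nine.",
                     fixed_text="User: question", max_tokens=10, reserve_for_output=2)
print(short)
```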

**Supported Hardware:**

- H100
- A100 80GB, A100 40GB

### Model Version(s)

`Nemotron-3-8B-QA-4k-SFT-BF16-1`

## Dataset

NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.5 trillion tokens of text. The dataset covers 53 human languages and 37 programming languages. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.

## Evaluation

| **Dataset and Metric**             | **Nemotron-3-8B-QA, Zero-shot**    |
|------------------------------------|------------------------------------|
| Natural Questions, F1-Score        | 41.99%                             |
| Doc2Dial, BLEU-4                   | 30.20                              |
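
For readers unfamiliar with the F1-score reported for Natural Questions, the sketch below shows one common definition: token-overlap F1 between a predicted answer and a reference answer. This card does not state how the reported score was computed, so treat this as an assumption for illustration; `token_f1` is a hypothetical helper, and real evaluation scripts usually also normalize punctuation and articles.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two answer strings (simplified sketch)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match perfectly; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the burning of fossil fuels", "burning fossil fuels")
print(f"{score:.2f}")
```

Here the prediction contains all three reference tokens plus two extra words, so recall is 1.0, precision is 0.6, and the F1 is 0.75.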

## Intended use

The Nemotron-3-8B-QA model is best suited for question-answering use cases and can be further customized on knowledge base data to achieve even better performance.

### Ethical use

Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).

## Limitations

- The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts.
- The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive.