---
license: llama3.3
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- sft
- reasoning
- r1
---

# **Magellanic-Llama-70B-r999**

Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from DeepSeek-R1-Distill-Llama-70B, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It has demonstrated strong reasoning performance. The RL training covered nearly 1 million data entries, improving safety while preserving factual accuracy.

Additionally, the fine-tuning addresses issues such as endless repetition, poor readability, and language mixing. This approach lets the model explore chain-of-thought (CoT) reasoning for complex problems, improving its reasoning patterns and alignment with human preferences. Two SFT stages then seed the model's reasoning and non-reasoning capabilities.

# **Use with Transformers**

Starting with `transformers >= 4.45.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your Transformers installation via:

```sh
pip install --upgrade transformers
```

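You can quickly confirm that your installation meets the `>= 4.45.0` requirement (a sanity check, not part of the official setup):

```python
import transformers

# The conversational pipeline and tool-use APIs below require transformers >= 4.45.0.
print(transformers.__version__)
```
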
#### Example Usage:

```python
import transformers
import torch

model_id = "prithivMLmods/Magellanic-Llama-70B-r999"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

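If you prefer the Auto classes mentioned above, here is a minimal sketch of the same conversation using `AutoModelForCausalLM` and `generate()`; the dtype and generation settings are illustrative choices, not requirements:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Magellanic-Llama-70B-r999"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the model's chat template, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
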
### Tool Use with Transformers

Llama 3.3 supports multiple tool-use formats. A full guide to prompt formatting is available [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/).

Tool use is also supported through [chat templates](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling) in Transformers.

#### Example Tool Integration:

```python
from transformers import AutoTokenizer

# The snippets below assume a tokenizer loaded from this repository.
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Magellanic-Llama-70B-r999")

# Define a tool
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # A real function should retrieve actual temperature data!

# Create a chat and apply the chat template
messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```

If the model generates a tool call, append it to the chat like so:

```python
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
```

Then call the tool and append the result with the `tool` role:

```python
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```

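With the tool result appended, you can re-apply the chat template and generate the model's final answer. A minimal sketch, assuming the `tokenizer` from the tool example and a `model` loaded as in the Auto-classes sketch above:

```python
# Render the full conversation (including the tool call and its result)
# and let the model produce the user-facing reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
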
# **Intended Use**

1. **Advanced Reasoning and Problem-Solving**: Designed for complex logical reasoning tasks, multi-step problem-solving, and structured responses.
2. **Educational Assistance**: Useful for providing explanations, summaries, and structured responses to enhance learning experiences.
3. **Conversational AI**: Ideal for chatbots and virtual assistants requiring deep contextual understanding.
4. **Code Generation and Debugging**: Capable of assisting in writing, explaining, and improving code across multiple programming languages.
5. **Research and Knowledge Discovery**: Supports academic and general knowledge research by generating informative responses.
6. **Tool-Assisted Responses**: Equipped for function calling, data retrieval, and automation support.

# **Limitations**

1. **Hardware Requirements**: Due to its large size, the model requires high-memory GPUs or TPUs for efficient deployment; quantized loading can help (see the sketch after this list).
2. **Potential Bias**: May reflect biases present in its training data, necessitating human oversight.
3. **Lack of Real-Time Awareness**: Has no access to real-world events beyond its training data cutoff.
4. **Creative Task Variability**: Performance on highly subjective tasks such as storytelling may be inconsistent.
5. **Error Propagation**: Minor inconsistencies in early outputs can affect coherence in longer responses.
6. **Prompt Sensitivity**: Response quality depends on how well-structured the input prompts are.

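Regarding the hardware point above, one common way to fit a 70B model on smaller GPUs is quantized loading. Below is a minimal sketch using 4-bit quantization via `bitsandbytes`; this is an illustrative setup, not an official recommendation for this model, and some quality and latency trade-offs apply:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "prithivMLmods/Magellanic-Llama-70B-r999"

# 4-bit NF4 quantization with bf16 compute; requires the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```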