---
language:
- en
license: llama3
base_model: meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- groq
- tool-use
- function-calling
model-index:
- name: Llama-3-Groq-8B-Tool-Use
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 60.98
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 27.25
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 5.82
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.35
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.39
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 26.66
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Groq%2FLlama-3-Groq-8B-Tool-Use
      name: Open LLM Leaderboard
---
# Llama-3-Groq-8B-Tool-Use
This is the 8B parameter version of the Llama 3 Groq Tool Use model, specifically designed for advanced tool use and function calling tasks.
## Model Details
- **Model Type:** Causal language model fine-tuned for tool use
- **Language(s):** English
- **License:** Meta Llama 3 Community License
- **Model Architecture:** Optimized transformer
- **Training Approach:** Full fine-tuning and Direct Preference Optimization (DPO) on the Llama 3 8B base model
- **Input:** Text
- **Output:** Text, with enhanced capabilities for tool use and function calling
## Performance
- **Berkeley Function Calling Leaderboard (BFCL) Score:** 89.06% overall accuracy
- This score represents the best performance among all open-source 8B LLMs on the BFCL
## Usage and Limitations
This model is designed for research and development in tool use and function calling scenarios. It excels at tasks involving API interactions, structured data manipulation, and complex tool use. However, users should note:
- For general knowledge or open-ended tasks, a general-purpose language model may be more suitable
- The model may still produce inaccurate or biased content in some cases
- Users are responsible for implementing appropriate safety measures for their specific use case
Note that the model is quite sensitive to the `temperature` and `top_p` sampling settings. Start at `temperature=0.5` and `top_p=0.65`, then adjust up or down as needed.
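As a minimal sketch of that starting point, the snippet below collects the suggested values into keyword arguments. The helper name `sampling_kwargs` and the `max_new_tokens` default are illustrative assumptions, not part of this model card; `do_sample`, `temperature`, `top_p`, and `max_new_tokens` are standard generation arguments in Hugging Face `transformers`.

```python
def sampling_kwargs(temperature=0.5, top_p=0.65, max_new_tokens=256):
    """Return generation keyword arguments using the suggested starting point.

    `max_new_tokens=256` is an arbitrary illustrative default (assumption).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    return {
        "do_sample": True,  # sampling must be enabled for temperature/top_p to apply
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }


kwargs = sampling_kwargs()
print(kwargs)
```

With `transformers`, a dictionary like this would typically be unpacked into `model.generate(**inputs, **kwargs)`. When tuning, changing one value at a time makes it easier to see which setting affects the tool-call output.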
We'd like to give a special shoutout to [@NousResearch](https://x.com/NousResearch) for pushing open source tool use forward with their public & open exploration of tool use in LLMs.
Text prompt example:
```
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"name": <function-name>,"arguments": <args-dict>}
</tool_call>
Here are the available tools:
<tools> {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"properties": {
"location": {
"description": "The city and state, e.g. San Francisco, CA",
"type": "string"
},
"unit": {
"enum": [
"celsius",
"fahrenheit"
],
"type": "string"
}
},
"required": [
"location"
],
"type": "object"
}
} </tools><|eot_id|><|start_header_id|>user<|end_header_id|>
What is the weather like in San Francisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
<tool_call>
{"id":"call_deok","name":"get_current_weather","arguments":{"location":"San Francisco","unit":"celsius"}}
</tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|>
<tool_response>
{"id":"call_deok","result":{"temperature":"72","unit":"celsius"}}
</tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
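The round trip in the prompt above (the model emits a `<tool_call>` block, the application runs the tool, and the result goes back inside `<tool_response>` tags) could be handled along the following lines. This is a sketch under stated assumptions, not Groq's reference implementation: the helper names `parse_tool_calls` and `format_tool_response`, the `TOOLS` registry, and the stubbed `get_current_weather` function are all hypothetical, and the stub returns the fixed values from the example rather than real weather data.

```python
import json
import re

# Matches the JSON payload between <tool_call> and </tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Extract every tool call in the model output as a Python dict."""
    return [json.loads(match.group(1)) for match in TOOL_CALL_RE.finditer(text)]


def format_tool_response(call_id, result):
    """Wrap a tool result in <tool_response> tags, echoing the call id."""
    payload = json.dumps({"id": call_id, "result": result}, separators=(",", ":"))
    return f"<tool_response>\n{payload}\n</tool_response>"


def get_current_weather(location, unit="fahrenheit"):
    # Hypothetical stub standing in for a real weather lookup.
    return {"temperature": "72", "unit": unit}


# Hypothetical registry mapping tool names to callables.
TOOLS = {"get_current_weather": get_current_weather}

# Assistant turn copied from the prompt example above.
model_output = (
    "<tool_call>\n"
    '{"id":"call_deok","name":"get_current_weather",'
    '"arguments":{"location":"San Francisco","unit":"celsius"}}\n'
    "</tool_call>"
)

for call in parse_tool_calls(model_output):
    result = TOOLS[call["name"]](**call["arguments"])
    print(format_tool_response(call.get("id"), result))
```

In a real application, the lookup in `TOOLS` and the `json.loads` call would need error handling, since the model may emit an unknown tool name or malformed JSON.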
## Ethical Considerations
While fine-tuned for tool use, this model inherits the ethical considerations of the base Llama 3 model. Use responsibly and implement additional safeguards as needed for your application.
## Availability
The model is available through:
- [Groq API console](https://console.groq.com)
- [Hugging Face](https://huggingface.co/Groq/Llama-3-Groq-8B-Tool-Use)
For full details on responsible use, ethical considerations, and the latest benchmarks, please refer to the [official Llama 3 documentation](https://llama.meta.com/) and the Groq model card.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Groq__Llama-3-Groq-8B-Tool-Use-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Groq%2FLlama-3-Groq-8B-Tool-Use&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
| Metric |Value (%)|
|-------------------|--------:|
|**Average** | 21.41|
|IFEval (0-Shot) | 60.98|
|BBH (3-Shot) | 27.25|
|MATH Lvl 5 (4-Shot)| 5.82|
|GPQA (0-shot) | 2.35|
|MuSR (0-shot) | 5.39|
|MMLU-PRO (5-shot) | 26.66|
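
The **Average** row is consistent with an unweighted mean of the six benchmark scores. That reading is an assumption based on the numbers in the table, which the short check below reproduces:

```python
# Scores copied from the table above (values in percent).
scores = {
    "IFEval (0-Shot)": 60.98,
    "BBH (3-Shot)": 27.25,
    "MATH Lvl 5 (4-Shot)": 5.82,
    "GPQA (0-shot)": 2.35,
    "MuSR (0-shot)": 5.39,
    "MMLU-PRO (5-shot)": 26.66,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))
```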