---
license: llama3
library_name: transformers
tags: []
---

# Dracarys2-Llama-3.1-70B-Instruct

### Built with Meta Llama 3

# Introduction

We introduce Dracarys, the latest addition to the Smaug series: a family of finetunes that target improved coding performance
across a variety of base models.

This variant is a finetune of [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct).

Compared to meta-llama/Meta-Llama-3.1-70B-Instruct, Dracarys2 scores higher on LiveCodeBench code generation and test output prediction, and is roughly level on code execution (see evaluation results below).

### Model Description

- **Developed by:** [Abacus.AI](https://abacus.ai)
- **License:** https://llama.meta.com/llama3/license/
- **Finetuned from model:** [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)

## How to use

The prompt format is unchanged from Llama 3 70B Instruct (see the evaluations for the LiveCodeBench prompt details).
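
As a rough illustration of that prompt format, the sketch below renders a list of chat messages into the publicly documented Llama 3 Instruct layout using plain string formatting. The helper name `render_llama3_prompt` is hypothetical and not part of any library, and the chat template shipped with the model's tokenizer remains the authoritative source (it may, for instance, insert additional default system text).

```python
from typing import Dict, List


def render_llama3_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render chat messages in the Llama 3 Instruct layout (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: a role header, a blank line, the trimmed content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = render_llama3_prompt(
    [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Reverse a string in Python."},
    ]
)
print(example)
```

In practice, `tokenizer.apply_chat_template` should be preferred, since it always reflects the template stored with the model.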

### Use with transformers

See the snippet below for usage with Transformers:

```python
import transformers
import torch

model_id = "abacusai/Dracarys2-Llama-3.1-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat history: a system prompt that sets the assistant's role, then the user request.
messages = [
    {"role": "system", "content": "You are a data science coding assistant that generates Python code using Pandas and Numpy."},
    {"role": "user", "content": "Write code to select rows from the dataframe `df` having the maximum `temp` for each `city`"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop generation at the tokenizer's EOS token or at either Llama 3 end marker.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
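
Responses to coding prompts frequently wrap code in Markdown fences. The sketch below shows one way to pull those fenced blocks out of the generated text; it assumes the model fences its code, which is not guaranteed, and `extract_code_blocks` is a hypothetical helper rather than part of Transformers. The sample response is hand-written for illustration and is not actual model output.

```python
import re
from typing import List

# Build the fence marker programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3
# An opening fence with an optional language tag, the body, then a closing fence.
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> List[str]:
    """Return the contents of every fenced code block found in `text`."""
    return [body.strip() for body in CODE_BLOCK.findall(text)]


# Hand-written stand-in for a model response.
sample = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "idx = df.groupby('city')['temp'].idxmax()\n"
    + "result = df.loc[idx]\n"
    + FENCE + "\n"
    + "This keeps one row per city."
)

for block in extract_code_blocks(sample):
    print(block)
```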

# Evaluation Results


## LiveCodeBench

| Model                               | Code Generation | Code Execution | Test Output Prediction |
|-------------------------------------|-----------------|----------------|------------------------|
| **Dracarys2-Llama-3.1-70B-Instruct**| **33.44**       | 48.26          | **52.10**             |
| Meta-Llama-3.1-70B-Instruct         | 32.23           | 48.768         | 41.40                 |

## Breakdown of LiveCodeBench CodeGeneration

| Model                               | Easy            | Medium         | Hard                  |
|-------------------------------------|-----------------|----------------|-----------------------|
| **Dracarys2-Llama-3.1-70B-Instruct**| **71.29**       | **18.48**      | 3.57                  |
| Meta-Llama-3.1-70B-Instruct         | 68.4            | 17.99          | 3.57                  |

## Breakdown of LiveCodeBench CodeExecution

| Model                               | COT             | Non-COT        |
|-------------------------------------|-----------------|----------------|
| **Dracarys2-Llama-3.1-70B-Instruct**| **75.55**       | 48.26          |
| Meta-Llama-3.1-70B-Instruct         | 70.14           | 48.768         |

## Breakdown of LiveCodeBench TestOutputPrediction

| Model                               | Easy            | Medium         | Hard                  |
|-------------------------------------|-----------------|----------------|-----------------------|
| **Dracarys2-Llama-3.1-70B-Instruct**| **63.53**       | **47.30**      | **43.61**             |
| Meta-Llama-3.1-70B-Instruct         | 51.22           | 35.91          | 34.30                 |

## LiveBench (Aug update)

| Model                               | Global Average | Coding Average | Reasoning Average| Mathematics Average | Data Analysis Average | Language Average | IF Average  |
|-------------------------------------|----------------|----------------|------------------|---------------------|-----------------------|------------------|-------------|
| **Dracarys2-Llama-3.1-70B-Instruct**| **47.8**       | **36.3**       | **47.3**         | **38.9**            | 46.1                  | 41.5             | 76.6        |
| Meta-Llama-3.1-70B-Instruct         | 45.1           | 30.7           | 35.3             | 37.0                | 48.4                  | 42.1             | 77.2        |
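
To make the LiveBench comparison easier to read, the following sketch computes the per-category difference between the two models. The scores are copied from the table above; the dictionary names and shortened category labels are illustrative choices only.

```python
# Scores copied from the LiveBench (Aug update) table above.
dracarys = {
    "Global": 47.8, "Coding": 36.3, "Reasoning": 47.3, "Mathematics": 38.9,
    "Data Analysis": 46.1, "Language": 41.5, "IF": 76.6,
}
base = {
    "Global": 45.1, "Coding": 30.7, "Reasoning": 35.3, "Mathematics": 37.0,
    "Data Analysis": 48.4, "Language": 42.1, "IF": 77.2,
}

# Positive values mean the finetune scores higher than the base model.
deltas = {name: round(dracarys[name] - base[name], 1) for name in dracarys}

for name, delta in deltas.items():
    print(f"{name:<14} {delta:+.1f}")
```

On these numbers the finetune gains in the Global, Coding, Reasoning, and Mathematics averages, and is slightly lower in Data Analysis, Language, and IF.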