---
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- gammacorpus
- zurich
- chat
- conversational
license: apache-2.0
language:
- en
datasets:
- rubenroy/GammaCorpus-v2-500k
pipeline_tag: text-generation
library_name: transformers
---

![Zurich Banner](https://cdn.ruben-roy.com/AI/Zurich/img/banner-7B-500k.png)

# Zurich 7B GammaCorpus v2-500k 
*A Qwen 2.5 model fine-tuned on the GammaCorpus dataset*

## Overview
Zurich 7B GammaCorpus v2-500k is a fine-tune of Alibaba's **Qwen 2.5 7B Instruct** model. Zurich is designed to outperform other models of a similar size while also showcasing [GammaCorpus v2-500k](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-500k).

## Model Details
- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Type:** Causal Language Model
- **Architecture:** Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 7.61B
- **Number of Parameters (Non-Embedding):** 6.53B
- **Number of Layers:** 28
- **Number of Attention Heads (GQA):** 28 for Q and 4 for KV
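
These figures can be verified from the published model configuration. A quick check (the attribute names below are the standard ones for Qwen2-family configs in `transformers`):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rubenroy/Zurich-7B-GCv2-500k")
print(config.num_hidden_layers)    # 28 layers
print(config.num_attention_heads)  # 28 query heads
print(config.num_key_value_heads)  # 4 key/value heads (GQA)
```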

## Training Details

Zurich-7B-GCv2-500k was fine-tuned on a single NVIDIA T4 GPU for roughly 290 minutes using the [Unsloth](https://unsloth.ai/) framework, and was trained for **60 epochs**.
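
For context, here is a minimal sketch of what an Unsloth fine-tuning run like this can look like. It is illustrative only: the LoRA settings, sequence length, batch size, and learning rate below are assumptions, not the exact recipe used for Zurich.

```python
# Illustrative Unsloth QLoRA fine-tuning sketch (not the exact Zurich recipe).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit so it fits on a single T4 (assumed setup).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=2048,   # assumed training context length
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are common defaults, not confirmed.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("rubenroy/GammaCorpus-v2-500k", split="train")

# Chat-template formatting of the multi-turn conversations is omitted for brevity.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        output_dir="outputs",
        num_train_epochs=60,              # epoch count stated above
        per_device_train_batch_size=2,    # illustrative
        gradient_accumulation_steps=4,    # illustrative
        learning_rate=2e-4,               # illustrative
        fp16=True,                        # T4 GPUs lack bf16 support
    ),
)
trainer.train()
```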

## Usage

### Requirements

We **strongly** recommend using the latest version of the `transformers` package. You can install it with `pip`:

```bash
pip install transformers
```

### Quickstart

Here is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Zurich-7B-GCv2-500k"

# Load the model and tokenizer; device_map="auto" places the weights on the
# available GPU(s) automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How tall is the Eiffel Tower?"
messages = [
    {"role": "system", "content": "You are Zurich, an AI assistant built on the Qwen 2.5 7B model developed by Alibaba Cloud, and fine-tuned by Ruben Roy. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the
# generation prompt so the model responds as the assistant.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated answer remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
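
If you would rather stream tokens to stdout as they are generated instead of waiting for the full completion, `transformers` ships a `TextStreamer` that can be passed to the same `generate` call:

```python
from transformers import TextStreamer

# Prints tokens as they are generated; the prompt itself is skipped.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```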

## About GammaCorpus

This model, and all other Zurich models, are trained on GammaCorpus: a Hugging Face dataset of structured, filtered multi-turn conversations.
GammaCorpus comes in four versions, each available in several sizes:

### GammaCorpus v1
- 10k UNFILTERED
- 50k UNFILTERED
- 70k UNFILTERED

Here is a link to the GCv1 dataset collection:<br>
https://huggingface.co/collections/rubenroy/gammacorpus-v1-67935e4e52a04215f15a7a60

### GammaCorpus v2
- 10k
- 50k
- 100k
- **500k <-- This is the version of GammaCorpus v2 that this Zurich model was trained on.**
- 1m
- 5m

Here is a link to the GCv2 dataset collection:<br>
https://huggingface.co/collections/rubenroy/gammacorpus-v2-67935e895e1259c404a579df
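
Because the dataset is hosted on Hugging Face, it can be loaded directly with the `datasets` library. A minimal example (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the corpus this model was fine-tuned on and inspect one conversation.
dataset = load_dataset("rubenroy/GammaCorpus-v2-500k", split="train")
print(dataset[0])
```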

### GammaCorpus CoT
- Math 170k

Here is a link to the GC-CoT dataset collection:<br>
https://huggingface.co/collections/rubenroy/gammacorpus-cot-6795bbc950b62b1ced41d14f

### GammaCorpus QA
- Fact 450k

Here is a link to the GC-QA dataset collection:<br>
https://huggingface.co/collections/rubenroy/gammacorpus-qa-679857017bb3855234c1d8c7

The full GammaCorpus dataset collection can be found [here](https://huggingface.co/collections/rubenroy/gammacorpus-67765abf607615a0eb6d61ac).

## Known Limitations

- **Bias:** We have tried our best to mitigate bias as much as we can, but please be aware that the model may still generate biased answers.

## Additional Information

### Licensing Information

The model is released under the **[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)**. Please refer to the license for usage rights and restrictions.