---
language:
- ja
- en
license: other
license_link: LICENSE
---

# Sarashina2.1-1B

This repository provides large language models trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
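import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the weights in bfloat16; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.1-1b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.1-1b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)  # fix the random seed so sampling is reproducible

# Sample three continuations of the prompt ("Good morning, today's weather is").
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,  # total length in tokens, prompt included
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

# Each result is a dict with a "generated_text" key.
for t in text:
    print(t)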
```
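
The pipeline call above returns a list of dicts, each holding the full string (prompt plus continuation) under `generated_text`. As a minimal sketch, the helper below keeps only the continuation. `strip_prompt` is a hypothetical name introduced here for illustration, not part of `transformers`, and the sample data is a placeholder rather than real model output.

```python
def strip_prompt(outputs, prompt):
    """Return each generated string with the leading prompt removed, if present."""
    texts = []
    for item in outputs:
        full = item["generated_text"]
        # Only cut the prefix when the output really starts with the prompt.
        texts.append(full[len(prompt):] if full.startswith(prompt) else full)
    return texts


# Placeholder data in the shape the pipeline returns; NOT real model output.
prompt = "おはようございます、今日の天気は"
mock_outputs = [
    {"generated_text": prompt + "<continuation 1>"},
    {"generated_text": prompt + "<continuation 2>"},
]

for continuation in strip_prompt(mock_outputs, prompt):
    print(continuation)
```

With the real pipeline, the same helper could be called as `strip_prompt(text, "おはようございます、今日の天気は")`. Passing `return_full_text=False` to the generator call is an alternative that should make the pipeline omit the prompt itself.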

## Model Description

Sarashina2.1-1B is a 1-billion-parameter model that we trained in two phases.
In the first phase, we trained the model on 10 trillion tokens, including Japanese and English data extracted from web corpora.
In the second phase, we continued training on 1 trillion tokens of predominantly Japanese data to improve the model's performance in Japanese.
The following tables show the model's performance on Japanese and English tasks.
The performance of other publicly available LLMs is included for reference.

### Evaluation on Japanese tasks

| Model | Avg. | AIO | abc | JEMHopQA | NIILC | JComQA | JSQuAD |
| ----- | ---- | --- | --- | --------- | ---- | ------ | ------ |
| [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)    | 25.40 | 0.80  | 27.38 | 28.21 | 0.79 | 45.13 | 50.07 |
| [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)    | 39.61 | 7.00  | 38.14 | 27.35 | 11.81 | **79.18** | 74.18 |
| [llm-jp-3-1.8B](https://huggingface.co/llm-jp/llm-jp-3-1.8b)| 43.46 | 44.50 | 46.45 | 32.48 | 30.71 | 44.06 | 62.58 |
| [llm-jp-3-3.7B](https://huggingface.co/llm-jp/llm-jp-3-3.7b)| 54.24 | 54.10 | 49.63 | 36.75 | **49.61** | 58.36 | 77.01 |
| Sarashina2.1-1B (this model) | **58.31** | **54.70** | **58.44** | **41.88** | 48.82 | 64.70 | **81.34** |


### Evaluation on English tasks

| Model | Avg. | PIQA | OpenBookQA | HellaSwag | Winogrande | ARC-easy | ARC-challenge |
| ----- | ---- | ---- | ---------- | --------- | ---------- | -------- | ------------- |
| [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)    | 50.71 | 69.59 | 35.40 | 52.17 | 56.43 | 58.42 | 32.25 |
| [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)    | 60.84 | 76.17 | 40.40 | 67.83 | 63.85 | 72.01 | 44.80 |
| [llm-jp-3-1.8B](https://huggingface.co/llm-jp/llm-jp-3-1.8b)| 53.01 | 72.85 | 32.60 | 61.78 | 62.27 | 57.24 | 31.31 |
| [llm-jp-3-3.7B](https://huggingface.co/llm-jp/llm-jp-3-3.7b)| 56.70 | 74.92 | 36.60 | 67.75 | 62.90 | 61.91 | 36.09 |
| Sarashina2.1-1B (this model) | 56.01 | 74.10 | 37.20 | 63.16 | 61.01 | 63.64 | 36.95 |
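
The Avg. column in both tables appears to be the unweighted mean of the six task scores. That is an assumption inferred from the numbers, not something stated above. A small sketch that checks it for the Sarashina2.1-1B rows:

```python
from statistics import mean

# Per-task scores for Sarashina2.1-1B, copied from the two tables above.
japanese = {"AIO": 54.70, "abc": 58.44, "JEMHopQA": 41.88,
            "NIILC": 48.82, "JComQA": 64.70, "JSQuAD": 81.34}
english = {"PIQA": 74.10, "OpenBookQA": 37.20, "HellaSwag": 63.16,
           "Winogrande": 61.01, "ARC-easy": 63.64, "ARC-challenge": 36.95}

# Assumption: Avg. is the plain mean over tasks, rounded to two decimals.
ja_avg = round(mean(japanese.values()), 2)
en_avg = round(mean(english.values()), 2)

print(f"Japanese Avg.: {ja_avg:.2f}")  # compare with 58.31 in the table
print(f"English Avg.:  {en_avg:.2f}")  # compare with 56.01 in the table
```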



## Ethical Considerations and Limitations
Sarashina2.1 has not yet been tuned to follow instructions.
As a result, Sarashina2.1 may generate meaningless sequences, inaccurate statements, or biased or objectionable outputs.
Before using Sarashina2.1, we ask developers to tune the model based on human preferences and safety considerations.

## License

[Sarashina Model NonCommercial License Agreement](https://huggingface.co/sbintuitions/sarashina2.1-1B/blob/main/LICENSE)