---
language:
- ja
- en
license: other
license_link: LICENSE
---

# Sarashina2.1-1B

This repository provides large language models trained by [SB Intuitions](https://www.sbintuitions.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, set_seed

# Load the weights in bfloat16; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained("sbintuitions/sarashina2.1-1b", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2.1-1b")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(123)  # fix the random seed so that sampling is reproducible

# Sample three continuations of the Japanese prompt
# ("Good morning, today's weather is"); max_length counts the prompt tokens too.
text = generator(
    "おはようございます、今日の天気は",
    max_length=30,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=3,
)

for t in text:
    print(t)
```
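Each item printed by the loop above is a dictionary, not a bare string: the `text-generation` pipeline returns a list of dictionaries with a `generated_text` key. The following is a minimal sketch of pulling out only the strings. The helper name `extract_texts` and the `mock_outputs` data are hypothetical; the placeholder strings stand in for real pipeline output, which is not reproduced here.

```python
from typing import Dict, List


def extract_texts(outputs: List[Dict[str, str]]) -> List[str]:
    """Return only the generated strings from text-generation pipeline output."""
    return [item["generated_text"] for item in outputs]


# Placeholder data shaped like the pipeline's return value for a single prompt.
# These strings are illustrative only, not actual model output.
mock_outputs = [
    {"generated_text": "placeholder continuation 1"},
    {"generated_text": "placeholder continuation 2"},
    {"generated_text": "placeholder continuation 3"},
]

for line in extract_texts(mock_outputs):
    print(line)
```

With the real pipeline, `extract_texts(text)` could replace the `print(t)` loop when only the generated strings are needed.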

## Model Description

We constructed this Sarashina2.1-1B model, which consists of 1 billion parameters, using a two-phase training process.
First, we trained the model on 10 trillion tokens, including Japanese and English data extracted from web corpora.
Then, we trained the model using 1 trillion tokens, predominantly consisting of Japanese data, to enhance its performance in Japanese.
The following tables show the model's performance on Japanese and English tasks.
We also show the performance of other public LLMs for reference.

### Evaluation on Japanese tasks

| Model | Avg. | AIO | abc | JEMHopQA | NIILC | JComQA | JSQuAD |
| ----- | ---- | --- | --- | -------- | ----- | ------ | ------ |
| [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) | 25.40 | 0.80 | 27.38 | 28.21 | 0.79 | 45.13 | 50.07 |
| [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) | 39.61 | 7.00 | 38.14 | 27.35 | 11.81 | **79.18** | 74.18 |
| [llm-jp-3-1.8B](https://huggingface.co/llm-jp/llm-jp-3-1.8b) | 43.46 | 44.50 | 46.45 | 32.48 | 30.71 | 44.06 | 62.58 |
| [llm-jp-3-3.7B](https://huggingface.co/llm-jp/llm-jp-3-3.7b) | 54.24 | 54.10 | 49.63 | 36.75 | **49.61** | 58.36 | 77.01 |
| Sarashina2.1-1B (this model) | **58.31** | **54.70** | **58.44** | **41.88** | 48.82 | 64.70 | **81.34** |

### Evaluation on English tasks

| Model | Avg. | PIQA | OpenBookQA | HellaSwag | Winogrande | ARC-easy | ARC-challenge |
| ----- | ---- | ---- | ---------- | --------- | ---------- | -------- | ------------- |
| [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) | 50.71 | 69.59 | 35.40 | 52.17 | 56.43 | 58.42 | 32.25 |
| [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) | 60.84 | 76.17 | 40.40 | 67.83 | 63.85 | 72.01 | 44.80 |
| [llm-jp-3-1.8B](https://huggingface.co/llm-jp/llm-jp-3-1.8b) | 53.01 | 72.85 | 32.60 | 61.78 | 62.27 | 57.24 | 31.31 |
| [llm-jp-3-3.7B](https://huggingface.co/llm-jp/llm-jp-3-3.7b) | 56.70 | 74.92 | 36.60 | 67.75 | 62.90 | 61.91 | 36.09 |
| Sarashina2.1-1B (this model) | 56.01 | 74.10 | 37.20 | 63.16 | 61.01 | 63.64 | 36.95 |
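
The `Avg.` columns above are consistent with an unweighted mean of the six per-task scores in each row. This is inferred from the numbers and is not stated explicitly, so treat it as an assumption. A small sketch that recomputes the Sarashina2.1-1B averages from the table values (the function and variable names are hypothetical):

```python
def macro_average(scores):
    """Unweighted mean of per-task scores, rounded to two decimals as in the tables."""
    return round(sum(scores.values()) / len(scores), 2)


# Per-task scores copied from the Sarashina2.1-1B rows of the two tables.
sarashina_ja = {
    "AIO": 54.70, "abc": 58.44, "JEMHopQA": 41.88,
    "NIILC": 48.82, "JComQA": 64.70, "JSQuAD": 81.34,
}
sarashina_en = {
    "PIQA": 74.10, "OpenBookQA": 37.20, "HellaSwag": 63.16,
    "Winogrande": 61.01, "ARC-easy": 63.64, "ARC-challenge": 36.95,
}

print(macro_average(sarashina_ja))  # 58.31, matching the Japanese table
print(macro_average(sarashina_en))  # 56.01, matching the English table
```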

## Ethical Considerations and Limitations

Sarashina2.1 has not yet been tuned to follow instructions.
As a result, it may generate meaningless sequences, inaccurate content, or biased or objectionable outputs.
We ask developers to tune the model based on human preferences and safety considerations before using Sarashina2.1.

## License

[Sarashina Model NonCommercial License Agreement](https://huggingface.co/sbintuitions/sarashina2.1-1B/blob/main/LICENSE)