---
license: apache-2.0
---

# Reka Flash 3

Reka Flash 3 is our latest general-purpose reasoning model that excels in general chat, coding, instruction following, and function calling. At 21B parameters, it performs competitively with proprietary models such as OpenAI o1-mini, making it a strong foundation for applications that require low latency or on-device deployment. It is currently the best model in its size category.

![Performance](./evals.png)

Try it out at [Reka Space](https://space.reka.ai).

## Quickstart

For ease of deployment, the model is released in a Llama-compatible format. You may use any library compatible with Llama to run the model.

### Via Hugging Face

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("RekaAI/reka-flash-3")
model = transformers.AutoModelForCausalLM.from_pretrained("RekaAI/reka-flash-3", torch_dtype='auto', device_map='auto')

prompt = {"role": "user", "content": "Write a poem about a large language model."}
text = tokenizer.apply_chat_template([prompt], tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Via vLLM

```bash
docker run --rm -it --network=host --gpus '"device=0"' --shm-size=10.24gb vllm/vllm-openai:latest serve RekaAI/reka-flash-3 --dtype auto -tp 1
```
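The container above exposes an OpenAI-compatible API. Below is a minimal query sketch using only the Python standard library, assuming the server is reachable on the default port 8000 with no API key; the helper names are ours, not part of vLLM.

```python
import json
import urllib.request

def build_chat_request(prompt, model="RekaAI/reka-flash-3", max_tokens=512):
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_server(prompt, base_url="http://localhost:8000"):
    """Send a chat request to the vLLM server started above and return the reply text."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `query_server("Why is the sky blue?")` returns the assistant's reply.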

## Model Details

### Prompt Format

Reka Flash 3 uses the cl100k_base tokenizer and adds no additional special tokens. Its prompt format is as follows:

```
human: this is round 1 prompt <sep> assistant: this is round 1 response <sep> ...
```

Generation should stop upon seeing the string `<sep>` or the special token `<|endoftext|>`.
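If your inference stack cannot register `<sep>` as a stop string, the decoded text can be truncated in post-processing instead. A minimal sketch (the helper name is ours):

```python
def truncate_at_stop(text, stops=("<sep>", "<|endoftext|>")):
    """Cut generated text at the first occurrence of any stop marker."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

This keeps everything before the earliest stop marker and is a no-op when none appears.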

A system prompt can be added by prepending it to the first user round.

```
human: You are a friendly assistant blah ... this is round 1 user prompt <sep> assistant: this is round 1 response <sep> ...
```

For multi-round conversations, it is recommended to drop the reasoning traces from previous assistant rounds to save tokens for the model to think.

If you are using Hugging Face or vLLM, the built-in chat template will handle prompt formatting automatically.
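For stacks without a chat template, the raw format can be reproduced by hand. A minimal sketch, assuming generation should continue from a trailing `assistant:` marker (the helper name is ours):

```python
def format_prompt(turns, system_prompt=None):
    """Render a conversation in Reka Flash 3's raw prompt format.

    turns: list of (role, text) pairs with role in {"human", "assistant"}.
    The result ends with "assistant:" so the model continues from there.
    """
    turns = list(turns)
    if system_prompt and turns and turns[0][0] == "human":
        # The system prompt is prepended to the first user round.
        role, text = turns[0]
        turns[0] = (role, f"{system_prompt} {text}")
    rendered = " <sep> ".join(f"{role}: {text}" for role, text in turns)
    return f"{rendered} <sep> assistant:"
```

For example, `format_prompt([("human", "hi")])` yields `human: hi <sep> assistant:`.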

### Budget Forcing

Reka Flash 3 thinks before it produces an output. We use `<reasoning>` and `</reasoning>` tags to mark the beginning and end of its thinking process. For some problems the model might think for a long time. You can make it stop thinking by forcing it to output `</reasoning>` after a certain number of steps; we observe that such budget forcing still produces a reasonable output. Performance on AIME-2024 (cons@16) for various thinking budgets is shown below.

| Budget (tokens) | AIME-2024 (cons@16) |
|-----------------|---------------------|
| 4k  | 40   |
| 8k  | 46   |
| 12k | 50   |
| 16k | 56   |
| 24k | 60   |
| 32k | 60   |
| 48k | 63.3 |

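The budget-forcing loop itself is independent of any particular inference library. A minimal sketch where `step_fn` stands in for one decode step (both names are our abstractions, not a Reka or vLLM API):

```python
def generate_with_budget(step_fn, prompt_tokens, budget, end_think="</reasoning>"):
    """Decode until the model closes its reasoning or the budget runs out.

    step_fn(tokens) -> next token (stands in for one model decode step).
    If the budget is exhausted, we force the closing tag so the model
    can proceed to its final answer.
    """
    tokens = list(prompt_tokens)
    for _ in range(budget):
        nxt = step_fn(tokens)
        tokens.append(nxt)
        if nxt == end_think:
            return tokens
    # Budget exhausted without the model closing its reasoning: force it.
    tokens.append(end_think)
    return tokens
```

In practice you would then resume generation after the forced `</reasoning>` so the model emits its final answer.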
### Language Support

This model is primarily built for English, and you should consider it an English-only model. However, it can converse in and understand other languages to some degree, as reflected in its performance on WMT23 and Belebele.

### Release Notes

- As a smaller model, it is not the best choice for knowledge-intensive tasks. We recommend coupling Reka Flash 3 with web search for knowledge-related tasks.
- The model often thinks in English when asked questions in non-English languages. We observe that this sometimes affects output quality in non-English languages.
- The model has not undergone extensive alignment or persona training.