aberrio committed on
Commit
a3aa361
1 Parent(s): d373ccb

Add revisions and fixes

- Add details and links to overview
- Add vocab information
- Fix usage instructions
- Add MistralAI to citations for sentencepiece mistral model vocab
- Fix Citations and other references

Files changed (1)
  1. README.md +38 -31
README.md CHANGED
@@ -15,29 +15,28 @@ tags:
15
 
16
  ## Valerie v0.1 Model Card
17
 
18
- ### Overview
19
 
20
- Valerie v0.1 is a custom language model created using `llama.cpp` (commit: 532c173) with a context length of 256 tokens, embedding length of 256, 8 heads, and
21
- 16 layers. This model was pretrained on a dataset consisting of female V's dialog from Cyberpunk 2077, extracted using the [Voice Over Subtitle
22
- Map](https://www.nexusmods.com/cyberpunk2077/mods/2045) mod.
23
 
24
- The `ggml-valerie-v0.1-256x32-f32-LATEST.gguf` release represents a single pass through all 51443 samples, completing one iteration over the entire dataset,
25
- and took approximately 3 hours for training.
26
 
27
- ### Model Information
28
 
29
- | Model name | Adam iteration | Model filename |
30
- | ----------------------- | -------------- | ---------------------------------------- |
31
- | Valerie v0.1 Checkpoint | 950 | chk-valerie-v0.1-256x32-950.gguf |
32
- | Valerie v0.1 Model | 1700 | ggml-valerie-v0.1-256x32-f32-LATEST.gguf |
33
 
34
  ### Files and versions
35
 
 
36
  - ggml-valerie-v0.1-256x32-f32-950.gguf: The pretrained model checkpoint version 950.
37
  - ggml-valerie-v0.1-256x32-f32-LATEST.gguf: The latest pretrained model checkpoint.
38
 
39
- ### Settings
40
 
 
41
  - Context length: 256 tokens
42
  - Embedding length: 256
43
  - Heads: 8
@@ -46,25 +45,34 @@ and took approximately 3 hours for training.
46
  - Seed: 1
47
  - Saved checkpoint every 50 iterations
48
 
49
- ### Usage
50
 
51
  To use Valerie v0.1, follow these steps:
52
 
53
- 1. Install the required dependencies (e.g., GGML library).
54
 
55
- ```sh
56
- git clone https://github.com/ggerganov/llama.cpp
57
- ```
 
 
 
 
58
 
59
- Reference the `llama.cpp` README.md for more information.
 
 
 
 
60
 
61
- 2. Download or clone this repository.
62
 
63
- ```sh
64
- wget https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-LATEST.gguf\?download\=true -O ggml-valerie-v0.1-256x32-f32-LATEST.gguf
65
- ```
 
66
 
67
- This will download the latest available base model.
68
 
69
  3. Perform inference with the latest model checkpoint using the provided command:
70
 
@@ -72,13 +80,14 @@ To use Valerie v0.1, follow these steps:
72
  ./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf --color -e -s 1 -c 4096
73
  ```
74
 
75
- ### Citing Valerie v0.1
76
 
77
  When using Valerie v0.1 in your research, please remember to cite the following:
78
 
79
- - Aberrio. (2023). Valerie v0.1: A custom language model for female V's dialog from Cyberpunk 2077. https://huggingface.co/teleprint-me/cyberpunk.
80
- - julieisdead (2021). Voice Over Subtitle Map: Two files that contain the IDs for, Voice Over files the other Subtitles. https://www.nexusmods.com/cyberpunk2077/mods/2045
81
- - GGML team. (2023). `llama.cpp` version `532c173`. Georgi Gerganov Machine Learning Library. https://github.com/ggerganov/llama.cpp
 
82
 
83
  ### Contributors
84
 
@@ -86,10 +95,8 @@ Austin Berrio (teleprint-me) - Created and trained Valerie v0.1 using `llama.cpp
86
 
87
  ### Community
88
 
89
- Join the community of fellow language model enthusiasts and researchers by sharing your knowledge, asking questions, and collaborating on projects related to
90
- creating custom models using `llama.cpp`.
91
 
92
  ### License
93
 
94
- Valerie v0.1 is released under the CC-BY-NC-SA-3.0 license. You are free to use, modify, and redistribute this model for non-commercial purposes, but you must
95
- provide attribution to the original authors and release any derived works under the same license.
 
15
 
16
  ## Valerie v0.1 Model Card
17
 
18
+ ## Overview
19
 
20
+ Valerie v0.1 is a custom language model created using `llama.cpp` (commit: 532c173) with a context length of 256 tokens, embedding length of 256, 8 heads, and 16 layers. This model was pretrained on a dataset consisting of [female V's](https://cyberpunk.fandom.com/wiki/V_(character)) dialog from [Cyberpunk 2077](https://cyberpunk.fandom.com/wiki/Cyberpunk_Wiki), extracted using the [Voice Over Subtitle Map](https://www.nexusmods.com/cyberpunk2077/mods/2045) mod.
 
 
21
 
22
+ The `ggml-valerie-v0.1-256x32-f32-LATEST.gguf` release represents a single pass (one epoch) over all 51,443 samples and took approximately 3 hours to train.
 
23
 
24
+ ## Model Information
25
 
26
+ | Model name | Adam iteration | Model filename | Vocabulary size |
27
+ | ----------------------- | -------------- | ---------------------------------------- | --------------- |
28
+ | Valerie v0.1 Checkpoint | 950 | chk-valerie-v0.1-256x32-950.gguf | 32,000 |
29
+ | Valerie v0.1 Model | 1700 | ggml-valerie-v0.1-256x32-f32-LATEST.gguf | 32,000 |
30
 
31
  ### Files and versions
32
 
33
+ - ggml-vocab-mistral.gguf: Extracted Mistral 7B model vocabulary
34
  - ggml-valerie-v0.1-256x32-f32-950.gguf: The pretrained model checkpoint version 950.
35
  - ggml-valerie-v0.1-256x32-f32-LATEST.gguf: The latest pretrained model checkpoint.
36
 
37
+ ## Settings
38
 
39
+ - Vocabulary size: 32,000
40
  - Context length: 256 tokens
41
  - Embedding length: 256
42
  - Heads: 8
 
45
  - Seed: 1
46
  - Saved checkpoint every 50 iterations
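
The settings above imply a few derived quantities. A minimal sketch in POSIX shell arithmetic, assuming the usual multi-head attention convention that the per-head dimension is the embedding length divided by the head count. The variable names are illustrative and are not part of `llama.cpp`:

```sh
# Values copied from the settings list and the model table.
VOCAB_SIZE=32000
EMBED_LEN=256
HEADS=8
FINAL_ITER=1700
CHECKPOINT_EVERY=50

# Assumption: per-head dimension = embedding length / number of heads.
HEAD_DIM=$((EMBED_LEN / HEADS))

# Token embedding table: one EMBED_LEN-sized vector per vocabulary entry.
EMBED_PARAMS=$((VOCAB_SIZE * EMBED_LEN))

# Checkpoints written if one is saved every CHECKPOINT_EVERY iterations up to FINAL_ITER.
CHECKPOINTS=$((FINAL_ITER / CHECKPOINT_EVERY))

echo "head_dim=${HEAD_DIM} embed_params=${EMBED_PARAMS} checkpoints=${CHECKPOINTS}"
```

The 950 checkpoint listed in the model table falls on this 50-iteration grid (950 = 19 × 50).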
47
 
48
+ ## Usage
49
 
50
  To use Valerie v0.1, follow these steps:
51
 
52
+ 1. Clone the `llama.cpp` repository.
53
 
54
+ ```sh
55
+ git clone https://github.com/ggerganov/llama.cpp
56
+ ```
57
+
58
+ See the `llama.cpp` [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md) for build instructions. You can build for CPU only or with OpenBLAS; CUDA, ROCm, Vulkan, and other backends are also available.
59
+
60
+ Arch Linux Example:
61
 
62
+ ```sh
63
+ # CPU build using BLAS backend on Arch Linux
64
+ sudo pacman -S openblas openblas64
65
+ cd llama.cpp && make LLAMA_OPENBLAS=1
66
+ ```
67
 
68
+ 2. Download the latest model.
69
 
70
+ ```sh
71
+ wget "https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-LATEST.gguf?download=true" \
72
+   -O ggml-valerie-v0.1-256x32-f32-LATEST.gguf
73
+ ```
74
 
75
+ This will download the latest available base model.
76
 
77
  3. Perform inference with the latest model checkpoint using the provided command:
78
 
 
80
  ./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf --color -e -s 1 -c 4096
81
  ```
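
Step 2 saves the model to the current directory, while the command in step 3 reads it from `models/valerie/v0.1/`. A minimal sketch of one way to line the two up, assuming you run it from the `llama.cpp` checkout. The variable names are illustrative, and the script only prints the commands so that nothing is downloaded or executed by accident:

```sh
# Repository and file name as given in the README.
REPO="teleprint-me/cyberpunk-valerie-v0.1"
FILE="ggml-valerie-v0.1-256x32-f32-LATEST.gguf"

# Assumption: the directory matches the path used by the inference command.
MODEL_DIR="models/valerie/v0.1"
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}?download=true"
MODEL_PATH="${MODEL_DIR}/${FILE}"

# Print each step; copy the lines into a shell to actually run them.
echo "mkdir -p ${MODEL_DIR}"
echo "wget '${URL}' -O ${MODEL_PATH}"
echo "./main -m ${MODEL_PATH} --color -e -s 1 -c 4096"
```

Quoting the URL keeps shells that treat `?` as a glob character from trying to expand it.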
82
 
83
+ ## Citations
84
 
85
  When using Valerie v0.1 in your research, please remember to cite the following:
86
 
87
+ - Aberrio. (2024). Valerie v0.1: A custom language model for female V's dialog from Cyberpunk 2077. <https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1>
88
+ - GGML team. (2023). `llama.cpp` version `532c173`. Georgi Gerganov Machine Learning Library. <https://github.com/ggerganov/llama.cpp>
89
+ - MistralAI (2023). Extracted sentencepiece model vocabulary: <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2>
90
+ - julieisdead (2021). Voice Over Subtitle Map: Two files that contain the IDs for, Voice Over files the other Subtitles. <https://www.nexusmods.com/cyberpunk2077/mods/2045>
91
 
92
  ### Contributors
93
 
 
95
 
96
  ### Community
97
 
98
+ Join the community of fellow language model enthusiasts and researchers by sharing your knowledge, asking questions, and collaborating on projects related to creating custom models using `llama.cpp`.
 
99
 
100
  ### License
101
 
102
+ Valerie v0.1 is released under the CC-BY-NC-SA-3.0 license. You are free to use, modify, and redistribute this model for non-commercial purposes, but you must provide attribution to the original authors and release any derived works under the same license.