---
pipeline_tag: text-generation
inference: false
library_name: llama.cpp
license: cc-by-nc-sa-4.0
license_name: creative-commons
license_link: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
language:
    - en
tags:
    - text-generation
    - artificial-intelligence
    - not-for-all-audiences
---

## Valerie v0.1 Model Card

## Overview

Valerie v0.1 is a custom language model created using `llama.cpp` (commit: 532c173) with a context length of 256 tokens, embedding length of 256, 8 heads, and 16 layers. This model was pretrained on a dataset consisting of [female V's](https://cyberpunk.fandom.com/wiki/V_(character)) dialog from [Cyberpunk 2077](https://cyberpunk.fandom.com/wiki/Cyberpunk_Wiki), extracted using the [Voice Over Subtitle Map](https://www.nexusmods.com/cyberpunk2077/mods/2045) mod.

The `ggml-valerie-v0.1-256x32-f32-LATEST.gguf` release represents a single pass over all 51,443 samples (one full epoch) and took approximately 3 hours to train.

## Model Information

| Model name              | Adam iteration | Model filename                           | Vocabulary size |
| ----------------------- | -------------- | ---------------------------------------- | --------------- |
| Valerie v0.1 Checkpoint | 950            | chk-valerie-v0.1-256x32-950.gguf         | 32,000          |
| Valerie v0.1 Model      | 1700           | ggml-valerie-v0.1-256x32-f32-LATEST.gguf | 32,000          |

### Files and versions

-   `ggml-vocab-mistral.gguf`: The extracted Mistral 7B model vocabulary.
-   `ggml-valerie-v0.1-256x32-f32-950.gguf`: The pretrained model checkpoint at Adam iteration 950.
-   `ggml-valerie-v0.1-256x32-f32-LATEST.gguf`: The latest pretrained model checkpoint.
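
These files can be fetched directly from the repository. The commands below are a sketch and assume each file is published under the same `resolve/main` path as the latest model shown in the Usage section.

```sh
# Assumes the files are served from the repository's resolve/main path,
# like the LATEST model in the Usage section below.
wget "https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-vocab-mistral.gguf?download=true" -O ggml-vocab-mistral.gguf
wget "https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-950.gguf?download=true" -O ggml-valerie-v0.1-256x32-f32-950.gguf
```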

## Settings

-   Vocabulary size: 32,000
-   Context length: 256 tokens
-   Embedding length: 256
-   Heads: 8
-   Layers: 16
-   Batch size: 32
-   Seed: 1
-   Saved checkpoint every 50 iterations
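
Taken together, these settings correspond to the `train-text-from-scratch` example that ships with `llama.cpp`. The command below is only a sketch of what such a run can look like, not the exact invocation used for this model: the training-data filename is a placeholder, and flag names can differ between `llama.cpp` commits, so check the example's README in your checkout.

```sh
# Sketch of a train-text-from-scratch run using the settings listed above.
# The training-data filename is a placeholder; verify flag names against the
# train-text-from-scratch README in your llama.cpp checkout.
./train-text-from-scratch \
    --vocab-model ggml-vocab-mistral.gguf \
    --ctx 256 --embd 256 --head 8 --layer 16 \
    -b 32 --seed 1 --save-every 50 \
    --checkpoint-in  chk-valerie-v0.1-256x32-LATEST.gguf \
    --checkpoint-out chk-valerie-v0.1-256x32-ITERATION.gguf \
    --model-out ggml-valerie-v0.1-256x32-f32-ITERATION.gguf \
    --train-data "valerie-dialog.txt"
```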

## Usage

To use Valerie v0.1, follow these steps:

1. Clone the `llama.cpp` library

```sh
git clone https://github.com/ggerganov/llama.cpp
```

See the `llama.cpp` [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md) for build instructions. You can build for plain CPU or with OpenBLAS; CUDA, ROCm, Vulkan, and other backends are also available.

Arch Linux Example:

```sh
# CPU build using BLAS backend on Arch Linux
sudo pacman -S openblas openblas64
make LLAMA_OPENBLAS=1
```
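
The OpenBLAS build above is CPU-only. For a GPU-accelerated build, the Makefile of that era exposed backend switches such as cuBLAS; the line below is a sketch that assumes the CUDA toolkit is installed, so verify the current flag names against your checkout's README.

```sh
# Sketch: CUDA (cuBLAS) build; assumes the CUDA toolkit is installed.
# Build flags have changed across llama.cpp versions, so verify in the README.
make LLAMA_CUBLAS=1
```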

2. Download the latest model.

```sh
wget "https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-LATEST.gguf?download=true" -O ggml-valerie-v0.1-256x32-f32-LATEST.gguf
```

This will download the latest available base model.

3. Perform inference with the latest model checkpoint using the provided command:

```sh
./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf --color -e -s 1 -c 4096
```
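
For actual generation you will typically also pass a prompt and a token budget. The example below is a sketch: the prompt text is arbitrary, and `-c 256` is used because the model was trained with a 256-token context.

```sh
# Sketch: generate up to 128 tokens from a short, arbitrary prompt.
# -c 256 matches the model's trained context length.
./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf \
    --color -e -s 1 -c 256 \
    -p "V: " -n 128
```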

## Citations

When using Valerie v0.1 in your research, please remember to cite the following:

-   Aberrio. (2024). Valerie v0.1: A custom language model for female V's dialog from Cyberpunk 2077. <https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1>
-   GGML team. (2023). `llama.cpp` version `532c173`. Georgi Gerganov Machine Learning Library. <https://github.com/ggerganov/llama.cpp>
-   MistralAI. (2023). Extracted SentencePiece model vocabulary. <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2>
-   julieisdead. (2021). Voice Over Subtitle Map: Two files containing the IDs, one for the voice-over files and the other for the subtitles. <https://www.nexusmods.com/cyberpunk2077/mods/2045>

### Contributors

Austin Berrio (teleprint-me) - Created and trained Valerie v0.1 using `llama.cpp` and the referenced dataset.

### Community

Join the community of fellow language model enthusiasts and researchers by sharing your knowledge, asking questions, and collaborating on projects related to creating custom models using `llama.cpp`.

### License

Valerie v0.1 is released under the CC-BY-NC-SA-4.0 license. You are free to use, modify, and redistribute this model for non-commercial purposes, provided you credit the original authors and release any derivative works under the same license.