turboderp committed on
Commit b723172 · verified · 1 Parent(s): 3c2fd60

Update README.md

Files changed (1)
  1. README.md +9 -0
README.md CHANGED
@@ -23,6 +23,15 @@ the same) and initializing it as follows:
  - every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings
  - there are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete).
 
+ ```python
+ for idx in range(target_vocab_size):
+     decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
+     encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
+     new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
+     new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
+ ```
+ Full script is [here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct/blob/main/vocab_transplant.py).
+
  Swapping the vocabulary with the above method yields a mostly coherent but still very confused model. It especially
  struggles with numbers, and of course the embeddings for the Llama-3 control tokens do not have the significance they
  would in an instruct-tuned model.
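
For readers who want to see the mean-of-embeddings rule from the added loop in isolation, here is a minimal, dependency-free sketch. It is not the linked script. The toy vocabularies, the greedy `encode_source` tokenizer, and the `transplant` helper are hypothetical stand-ins invented for illustration, and plain Python lists take the place of the torch tensors used in the README snippet.

```python
from statistics import fmean

# Hypothetical toy "source" vocabulary (stand-in for Qwen2): token text -> id.
source_vocab = {"a": 0, "b": 1, "c": 2, "ab": 3}
# Hypothetical toy "target" vocabulary (stand-in for Llama-3): id -> token text.
target_vocab = ["a", "ab", "abc", "cb"]

# Toy source embedding table: one 2-dimensional row per source token id.
old_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 6.0]]


def encode_source(text):
    """Greedy longest-match tokenizer over source_vocab (illustrative only)."""
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in source_vocab:
                ids.append(source_vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


def transplant(old_rows):
    """Build one row per target token as the mean of its source-token rows."""
    new_rows = []
    for token_text in target_vocab:
        ids = encode_source(token_text)
        # Column-wise mean over the selected rows; a single id copies its row.
        columns = zip(*(old_rows[i] for i in ids))
        new_rows.append([fmean(col) for col in columns])
    return new_rows


new_emb = transplant(old_emb)
```

In this toy setup the target tokens `"a"` and `"ab"` each map to a single source token and simply copy its row, while `"abc"` and `"cb"` split into two source tokens and receive the average of both rows. The same function could be applied to an output-head matrix, mirroring how the README loop fills `new_head` alongside `new_emb`.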