An Experiment with the Recent Titans Architecture (Heavily Undertrained)

Note: this is just a fun/passion/hobby project, nothing too serious. I don't think I will train it any further due to the lack of compute for serious training!


Training Details:

Parameters: 153.39 million

Epochs: 5000

Dataset: HuggingFaceFW/fineweb-edu

  • subset: CC-MAIN-2016-26
  • split: train
  • total: 25%
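
A minimal loading sketch with the datasets library, assuming the subset above is exposed as a config name for this dataset:

from datasets import load_dataset

# stream the training split of the dump listed above (the config name is an assumption)
ds = load_dataset("HuggingFaceFW/fineweb-edu", name="CC-MAIN-2016-26", split="train", streaming=True)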

Loss: 3.89053 (lowest_recorded_loss_model.safetensors)

Loss: 4.38535 (latest_model_most_trained.safetensors)
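
Assuming these are average per-token cross-entropy losses in nats (the usual convention), the corresponding perplexities are roughly:

import math

print(math.exp(3.89053))  # ≈ 48.9 (lowest recorded loss)
print(math.exp(4.38535))  # ≈ 80.3 (latest checkpoint)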

Code: titans-pytorch | version = "0.1.18"

Tokenizer: tigerbot-13b-chat-v2

Inference:

  • Note: To run inference, install the dependencies below and then run the code that follows (tested in Colab):
pip install -q huggingface_hub titans-pytorch==0.1.18

git clone https://github.com/lucidrains/titans-pytorch.git
cd titans-pytorch

pip install -q ".[examples]"
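# note: installing from the cloned repo replaces the pinned titans-pytorch==0.1.18
# with whatever version the checkout is at; check out the matching release tag (if one exists) to keep them in sync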
  • code:
import os

from transformers import AutoTokenizer
from huggingface_hub import snapshot_download
from safetensors.torch import load_file

from titans_pytorch import MemoryAsContextTransformer

tokenizer = AutoTokenizer.from_pretrained("TigerResearch/tigerbot-13b-chat-v2")

# neural memory related

NEURAL_MEMORY_DEPTH = 2
NUM_PERSIST_MEM = 4
NUM_LONGTERM_MEM = 4
NEURAL_MEM_LAYERS = (2, 4)
NEURAL_MEM_GATE_ATTN_OUTPUT = True
NEURAL_MEM_MOMENTUM = True
WINDOW_SIZE = 32
NEURAL_MEM_SEGMENT_LEN = WINDOW_SIZE // 2 # set smaller for more granularity for learning rate / momentum etc
SLIDING_WINDOWS = True
STORE_ATTN_POOL_CHUNKS = True # whether to use attention pooling for chunk derived momentum, per-layer lr mod, decay
KV_RECON_LOSS_WEIGHT = 0.
LEARNED_MEM_MODEL_WEIGHTS = True

# perf related

USE_ACCELERATED_SCAN = True
USE_FLEX_ATTN = True
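# note: flex attention requires a recent PyTorch (2.5+) and a CUDA GPU;
# set USE_FLEX_ATTN = False if it is unavailable in your environment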

model = MemoryAsContextTransformer(
    num_tokens = tokenizer.vocab_size,
    dim = 768,  # dim = 384
    depth = 8,
    segment_len = WINDOW_SIZE,
    num_persist_mem_tokens = NUM_PERSIST_MEM,
    num_longterm_mem_tokens = NUM_LONGTERM_MEM,
    neural_memory_layers = NEURAL_MEM_LAYERS,
    neural_memory_segment_len = NEURAL_MEM_SEGMENT_LEN,
    neural_mem_gate_attn_output = NEURAL_MEM_GATE_ATTN_OUTPUT,
    aux_kv_recon_loss_weight = KV_RECON_LOSS_WEIGHT,
    use_flex_attn = USE_FLEX_ATTN,
    sliding_window_attn = SLIDING_WINDOWS,
    neural_memory_kwargs = dict(
        dim_head = 64,
        heads = 4,
        attn_pool_chunks = STORE_ATTN_POOL_CHUNKS,
        momentum = NEURAL_MEM_MOMENTUM,
        use_accelerated_scan = USE_ACCELERATED_SCAN,
        learned_mem_model_weights = LEARNED_MEM_MODEL_WEIGHTS,
        default_model_kwargs = dict(
            depth = NEURAL_MEMORY_DEPTH,
        )
    )
).cuda()
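
As a sanity check, the parameter count reported above can be reproduced from the constructed model:

# should print roughly 153.39M, matching the figure in Training Details
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.2f}M parameters")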

snapshot_download(repo_id="Lyte/Titans-MAC-test", local_dir="repo")

def load_pretrained_safetensors(model, path, model_name):
    """
    Loads the model's state dictionary using safetensors.
    """
    filepath = os.path.join(path, model_name)
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"Model file not found at {filepath}")
    
    state_dict = load_file(filepath)
    model.load_state_dict(state_dict)
    print(f"Model loaded from {filepath}")
    return model

model = load_pretrained_safetensors(model, 'repo', 'latest_model_most_trained.safetensors')
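
The lowest-loss checkpoint listed above can be swapped in the same way:

# optional: load the checkpoint with the lowest recorded loss instead
# model = load_pretrained_safetensors(model, 'repo', 'lowest_recorded_loss_model.safetensors')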

def inference(prompt_text, seq_len=512):
    model.eval()
    
    # Convert prompt text to input tensor
    inp = tokenizer.encode(prompt_text, truncation=True, max_length=seq_len, return_tensors="pt").cuda()

    # Generate text
    sample = model.sample(inp, seq_len)
    output_str = tokenizer.decode(sample[0])
    return output_str

# Example usage:
prompt = "researchers have found"
generated_text = inference(prompt)
print(generated_text)

Prompt:

researchers have found

Model output:

that 38-97 children of age have a different way of thinking.
A few of the people who are in a field of life are being given the same family as children, and are also in a new family of school.
There are two reasons for this that I am the first-sever.
There is a good chance that you have to be in the classroom.
In my book, I think I've just used it as a child who can help me and I will find my first school for my children and children to know what they will.
There are several ways to do with their own, but I'm not going to know the story that they are not.
Today, they'll get up to you to learn the first, and they are doing the best to come with.
If you think it is a big problem, then you will want to see them and have you on the market, the more you are in.
You don't know the most about your home, or you need to learn.
When I read the question, you know how you do that.
The more you'd like to go to a number, it is the most important thing.
It's an issue of how to understand how we do it.
It's just the way to the question you will give us all the time we're doing, so it's not going to be a lot of it.
If you have a good idea about the first, and you can't get you back to it and have to be sure you are. You need to know where to read the "Lord of the Aesth."
When you can take a new person who has not had to go back to the way, the children will get into the game.
How the family will help you know the number of people in the area.
Ask to your child’s school.
Most children will not be able to find and follow them all about.
These children have a few questions:
There’s a lot of good things you need.
Farmers and teachers in the classroom will help them to take in order to take care of their own.
Teachers are also the most common, the students, their children and children, who have no other people in their children, but they are all their children and the students will be in the school.
"