I need some advice

#8
by thebryanalvarado - opened
MLX Community org
edited 5 days ago

Hello community, I ran the code below, but the model takes too long to train.
This first part downloads the dataset and converts it to JSON format for the model:

import json
import os

from datasets import load_dataset

folder = "data/"
os.makedirs(folder, exist_ok=True)  # create the output folder if it is missing

system_message = """You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""

def create_conversation(sample):
    # Turn one dataset row into a system/user/assistant conversation
    return {
        "messages": [
            {"role": "system", "content": system_message.format(schema=sample["context"])},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]}
        ]
    }

# Load the dataset and keep a random sample of 150 examples
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.shuffle().select(range(150))
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)

# 100 examples for training; the remaining 50 are split evenly into test and valid
dataset = dataset.train_test_split(test_size=50/150)
dataset_test_valid = dataset["test"].train_test_split(test_size=0.5)
print(dataset["train"][45]["messages"])

dataset["train"].to_json(folder + "train.jsonl", orient="records")
dataset_test_valid["train"].to_json(folder + "test.jsonl", orient="records")
dataset_test_valid["test"].to_json(folder + "valid.jsonl", orient="records")

# Flatten each conversation into a single "text" field
folder_input = "./data/"
folder_output = "./data-text/"
os.makedirs(folder_output, exist_ok=True)  # create the output folder if it is missing
names = ["test", "train", "valid"]
for name in names:
    with open(folder_input + name + ".jsonl", "r") as json_file:
        json_list = list(json_file)
    with open(folder_output + name + ".jsonl", "w") as outfile:
        for json_str in json_list:
            result = json.loads(json_str)
            message = result["messages"]
            entry = {
                "text": message[0]["content"] + "\nUser:" + message[1]["content"] + "\nAssistant:" + message[2]["content"]
            }
            json.dump(entry, outfile)
            outfile.write("\n")
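
To check the converted files before training, something like the following could be used. It is only a minimal sketch: `summarize_jsonl` is a hypothetical helper (not part of mlx_lm or datasets), and it assumes the files are in `data-text/` with one `{"text": ...}` record per line, as written by the script above. It reports how many records each file has and how long the texts are in characters, since longer examples generally mean more tokens to process per training step.

```python
import json
import os


def summarize_jsonl(path):
    """Return record count and text-length stats for a JSONL file of {"text": ...} rows."""
    lengths = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            lengths.append(len(record["text"]))
    if not lengths:
        return {"records": 0, "max_chars": 0, "mean_chars": 0.0}
    return {
        "records": len(lengths),
        "max_chars": max(lengths),
        "mean_chars": sum(lengths) / len(lengths),
    }


if __name__ == "__main__":
    # Assumed layout: data-text/train.jsonl, data-text/valid.jsonl, data-text/test.jsonl
    for name in ["train", "valid", "test"]:
        path = os.path.join("data-text", name + ".jsonl")
        if os.path.exists(path):
            print(name, summarize_jsonl(path))
        else:
            print(name, "not found:", path)
```

The lengths are in characters, not tokens, so they are only a rough indication of sequence length.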

And this is the command I use to train the LLM:

python3 -m mlx_lm.lora --model mistralai/Mistral-7B-Instruct-v0.2 --train --data ./data-text --iters 1000

Please, I need advice on how to make the LLM training step run faster. I am using a MacBook Pro with an M4 chip.
