Error running InstructLab Training Python Library on Kaggle
Hello instructlab,
I am very interested in using InstructLab for my project. I have tried to use the library https://pypi.org/project/instructlab-training/ on Kaggle, following the directions in the documentation.
I have tried it in Kaggle on both the CPU and the GPU, with no success. Can someone answer this question: IS IT POSSIBLE TO RUN https://pypi.org/project/instructlab-training/ on KAGGLE?
-----------------------DETAILS----------------------------------------------
- I have successfully installed the library in the Kaggle notebook:
!pip install instructlab-training
- I have successfully imported the necessary tools without error:
from instructlab.training import run_training, TrainingArgs, TorchrunArgs
- I am getting ERRORs saying that neither 'torchrun_args' nor 'training_args' is a valid 'run_training' input:
# Run the training
run_training(
    torchrun_args=TorchrunArgs(
        nnodes=1,
        nproc_per_node=1,
        node_rank=0,  # Node rank
        rdzv_id=0,  # Changed rdzv_id to an integer
        rdzv_endpoint="localhost:29500",  # Endpoint
    ),
    training_args=training_args
)
TypeError Traceback (most recent call last)
Cell In[7], line 46
43 os.makedirs(training_args.data_output_dir, exist_ok=True)
45 # Run the training
---> 46 run_training(
47 torchrun_args=TorchrunArgs(
48 nnodes=1,
49 nproc_per_node=1,
50 node_rank=0, # Node rank
51 rdzv_id=0, # Changed rdzv_id to an integer
52 rdzv_endpoint="localhost:29500", # Endpoint
53 ),
54 training_args=training_args
55 )
57 print("Training completed successfully.")
TypeError: run_training() got an unexpected keyword argument 'torchrun_args'
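A TypeError like this one usually means the keyword names in the call do not match the parameter names declared by the installed version of the function. One way to check is to inspect the signature directly. The following is only a minimal sketch: `example_trainer` and its parameters `alpha` and `beta` are hypothetical stand-ins, not the real `run_training` signature, and the helper names are my own. In the notebook, `run_training` would be passed in place of the stand-in.

```python
import inspect


def accepted_keywords(func):
    """Return the parameter names that func accepts as keywords, in order."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters
    return [name for name, p in params.items() if p.kind in keyword_kinds]


def unexpected_keywords(func, **kwargs):
    """Return the keyword names in kwargs that func does not declare."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    allowed = set(accepted_keywords(func))
    return [name for name in kwargs if name not in allowed]


# Hypothetical stand-in for the library function (NOT the real signature).
def example_trainer(alpha, beta):
    return alpha, beta


print(accepted_keywords(example_trainer))
print(unexpected_keywords(example_trainer, alpha=1, gamma=2))
```

Running `accepted_keywords(run_training)` in the Kaggle notebook would list the keyword names the installed version actually expects, which can then be compared with the names used in the call.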
Here is my basic test code, which tries to use the Python library to train an IBM model with InstructLab:
-------------------------------------------------CODE--------------------------------------------------------------------------------------------------------------
!pip install instructlab-training
import json
import os
from instructlab.training import run_training, TrainingArgs, TorchrunArgs
# Step 1: Create a small hardcoded synthetic dataset in JSONL format
def create_synthetic_data(output_file="dataset.jsonl"):
    examples = [
        {"instruction": "Translate 'Hello' to Spanish.", "response": "Hola"},
        {"instruction": "What is the capital of France?", "response": "Paris"},
        {"instruction": "Solve 5 + 3.", "response": "8"},
        {"instruction": "Provide a synonym for 'happy'.", "response": "Joyful"},
        {"instruction": "List three primary colors.", "response": "Red, Blue, Yellow"}
    ]
    with open(output_file, 'w') as f:
        for example in examples:
            f.write(json.dumps(example) + '\n')
    print(f"Synthetic dataset created at {output_file}")

# Generate dataset
create_synthetic_data()
# Step 2: Define training arguments with all required fields
training_args = TrainingArgs(
    model_path="ibm-granite/granite-3.0-1b-a400m-instruct",
    data_path="dataset.jsonl",
    ckpt_output_dir="data/saved_checkpoints",
    data_output_dir="data/outputs",
    max_seq_len=512,
    max_batch_len=64,  # Added max_batch_len
    num_epochs=1,
    effective_batch_size=8,
    save_samples=1000,  # Added save_samples
    learning_rate=2e-6,
    warmup_steps=100,  # Added warmup_steps
    is_padding_free=True,  # Added is_padding_free
    random_seed=42,
)
# Ensure output directories exist
os.makedirs(training_args.ckpt_output_dir, exist_ok=True)
os.makedirs(training_args.data_output_dir, exist_ok=True)
# Run the training
run_training(
    torchrun_args=TorchrunArgs(
        nnodes=1,
        nproc_per_node=1,
        node_rank=0,  # Node rank
        rdzv_id=0,  # Changed rdzv_id to an integer
        rdzv_endpoint="localhost:29500",  # Endpoint
    ),
    training_args=training_args
)
print("Training test complete.")
THANK YOU! JEFF