Chess GPT-4.5M

Overview

Chess GPT-4.5M is a generative language model trained specifically to generate chess moves and analyze chess games. The model is based on the GPT architecture and was trained with a custom 32-token vocabulary reflecting key chess symbols and notations.

Model Details

  • Architecture: GPT-based language model (GPT2LMHeadModel)
  • Parameters: Approximately 4.5M parameters
  • Layers: 8 transformer layers
  • Heads: 4 attention heads per layer
  • Embedding Dimension: 256
  • Training Sequence Length: 1024 tokens per chess game
  • Vocabulary: 32 tokens (custom vocabulary)
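For reference, an equivalent architecture can be instantiated in Transformers as a small GPT-2-style configuration. The snippet below is an illustrative sketch assembled only from the figures listed above; the actual configuration used for training may differ.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative configuration built from the model details listed above.
config = GPT2Config(
    vocab_size=32,     # custom 32-token chess vocabulary
    n_positions=1024,  # training sequence length
    n_embd=256,        # embedding dimension
    n_layer=8,         # transformer layers
    n_head=4,          # attention heads per layer
)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```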

Training Data

The model was trained on tokenized chess game data prepared from the Lichess dataset. The preparation process involved:

  • Tokenizing chess games using a custom 32-token vocabulary.
  • Creating binary training files (train.bin and val.bin).
  • Saving vocabulary information to meta.pkl.
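As a rough sketch of this pipeline (the actual preparation script and vocabulary are defined in the repository and will differ in detail), game strings can be tokenized character by character and written to flat binary files:

```python
import pickle
import numpy as np

# Placeholder data; the real dataset is tokenized game text from Lichess.
games = [";1.e4 e5 2.Nf3 Nc6 3.Bb5 a6"]
text = "".join(games)

# Build the token set from the data (the real vocabulary contains 32 tokens).
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

# Encode and split into training and validation binaries.
ids = np.array([stoi[c] for c in text], dtype=np.uint16)
split = int(0.9 * len(ids))
ids[:split].tofile("train.bin")
ids[split:].tofile("val.bin")

# Save vocabulary metadata for later decoding.
with open("meta.pkl", "wb") as f:
    pickle.dump({"vocab_size": len(chars), "stoi": stoi, "itos": itos}, f)
```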

Training Configuration

The training configuration, found in config/mac_chess_gpt.py, includes:

  • Dataset: lichess_hf_dataset
  • Batch Size: 2 (optimized for Mac's memory constraints)
  • Block Size: 1023 (1024 including the positional embedding)
  • Learning Rate: 3e-4
  • Max Iterations: 140,000
  • Device: 'mps' (Mac-specific settings)
  • Other Settings: No dropout and compile set to False for Mac compatibility
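An illustrative reconstruction of such a configuration file is shown below; the variable names follow nanoGPT-style conventions and are not copied from the actual config/mac_chess_gpt.py.

```python
# Illustrative sketch of config/mac_chess_gpt.py (nanoGPT-style settings).
out_dir = "out-chess-mac"
dataset = "lichess_hf_dataset"

batch_size = 2        # small batch to fit Mac memory constraints
block_size = 1023     # context window used during training
learning_rate = 3e-4
max_iters = 140_000

dropout = 0.0         # no dropout
device = "mps"        # Apple Silicon backend
compile = False       # torch.compile disabled for Mac compatibility
```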

How to Use

Generating Chess Moves

After fine-tuning, use the generation script to sample chess moves. Example commands:

Sample from the model without a provided prompt:

```bash
python sample.py --out_dir=out-chess-mac
```

Generate a chess game sequence starting with a custom prompt:

```bash
python sample.py --out_dir=out-chess-mac --start=";1.e4"
```

Loading the Model in Transformers

Once the model card and converted model files are pushed to the Hugging Face Hub, you can load the model using:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("your-hf-username/chess-gpt-4.5M")
tokenizer = GPT2Tokenizer.from_pretrained("your-hf-username/chess-gpt-4.5M")
```

Note: The tokenizer uses a custom vocabulary provided in vocab.json.
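Once loaded, the model can be sampled like any other causal language model. The example below is illustrative: the prompt follows the ";1.e4" convention shown above, and the sampling parameters are placeholders.

```python
import torch

prompt = ";1.e4"  # game prefix in the same format used by sample.py
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        inputs["input_ids"],
        max_new_tokens=64,
        do_sample=True,
        temperature=0.8,
    )
print(tokenizer.decode(output[0]))
```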

Intended Use

The model is intended for:

  • Generating chess move sequences.
  • Assisting in automated chess analysis.
  • Educational purposes in understanding language model training on specialized domains.

Limitations

  • The model is trained on a relatively small (4.5M parameter) architecture and may not capture extremely complex chess strategies.
  • It is specialized for chess move generation and may not generalize to standard language tasks.

Training Process Summary

  1. Data Preparation: Tokenized the Lichess chess game dataset using a 32-token vocabulary.
  2. Model Training: Used custom training configurations specified in config/mac_chess_gpt.py.
  3. Model Conversion: Converted the checkpoint from out-chess-mac/ckpt.pt into a Hugging Face compatible format with convert_to_hf.py.
  4. Repository Setup: Pushed the converted model files (including custom tokenizer vocab) to the Hugging Face Hub with Git LFS handling large files.
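For step 4, a manual Git LFS push is one route; as an alternative sketch, the converted files can also be uploaded through the Transformers hub integration (the local folder name below is a placeholder for whatever convert_to_hf.py produces, and the custom vocab.json still needs to be uploaded alongside the model files).

```python
from transformers import GPT2LMHeadModel

# Hypothetical local folder written by convert_to_hf.py.
model = GPT2LMHeadModel.from_pretrained("out-chess-mac-hf")
model.push_to_hub("your-hf-username/chess-gpt-4.5M")
```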

Acknowledgements

This model was inspired by GPT-2 and adapted to the chess domain.

