JarvisX50M
JarvisX50M is a 50M-parameter language model built from scratch with the JarvisXCore architecture, designed to be lean, fast, and factual. Trained on WikiText-2, it aims to rival GPT-2 in accuracy (~85-95% on factual Q&A) while being ~5x faster and ~4x lighter. India's first custom AI, crafted for budget devices! 🇮🇳
Model Details
- Parameters: ~50M
- Architecture: JarvisXCore (custom multi-head attention, GELU, optimized FFNs)
- Training Data: WikiText-2 (~2M tokens)
- Vocabulary Size: 50,257 (GPT-2 tokenizer)
- Context Length: 256 tokens
- Training: 3 epochs, ~2,800 steps/epoch, CPU/GPU
- Final Loss: ~0.0010
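For a rough sense of where ~50M parameters come from, the back-of-the-envelope count below uses assumed dimensions; the hidden size, layer count, and FFN width are illustrative guesses, not the real JarvisXCore hyperparameters:

```python
# Back-of-the-envelope parameter count for a GPT-style decoder.
# d_model, n_layers, and d_ff are assumptions, not JarvisXCore's actual values.
vocab_size = 50_257        # GPT-2 tokenizer
context_len = 256
d_model = 512              # assumed hidden size
n_layers = 8               # assumed layer count
d_ff = 4 * d_model         # assumed FFN width

embeddings = vocab_size * d_model + context_len * d_model
attn_per_layer = 4 * d_model * d_model        # Q, K, V, and output projections
ffn_per_layer = 2 * d_model * d_ff            # up and down projections
total = embeddings + n_layers * (attn_per_layer + ffn_per_layer)

print(f"~{total / 1e6:.1f}M parameters")      # ~51M with these assumptions
```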
Try It Out!
Chat with JarvisX50M below (powered by Gradio):
Usage
```python
import torch
from model import JarvisX50M, Config
from transformers import AutoTokenizer

# Build the model and load the released weights
config = Config()
model = JarvisX50M(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# GPT-2 tokenizer hosted alongside the model
tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
```
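Continuing from the snippet above (with `model` and `tokenizer` loaded), here is a minimal greedy-decoding sketch. The forward signature is an assumption: it presumes `model(input_ids)` returns logits of shape `(batch, seq_len, vocab_size)`; adjust to whatever `model.py` actually returns.

```python
# Minimal greedy-decoding sketch (assumes model(input_ids) -> logits).
prompt = "Tell me about Rome"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(50):                       # up to 50 new tokens
        logits = model(input_ids[:, -256:])   # stay within the 256-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```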
Chat
Run the chat script:
```bash
python chat_jarvisx50m.py
```
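`chat_jarvisx50m.py` ships with the repository; its exact contents aren't reproduced here. If you want to roll your own terminal chat, a hypothetical minimal loop (reusing the greedy decoding above) could look like this:

```python
# Hypothetical terminal chat loop; the bundled chat_jarvisx50m.py may differ.
import torch
from model import JarvisX50M, Config
from transformers import AutoTokenizer

config = Config()
model = JarvisX50M(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()
tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")

while True:
    prompt = input("You: ")
    if prompt.strip().lower() in {"quit", "exit"}:
        break
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    with torch.no_grad():
        for _ in range(50):
            logits = model(ids[:, -256:])
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
    print("JarvisX50M:", tokenizer.decode(ids[0, prompt_len:], skip_special_tokens=True))
```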
Train
Retrain with:
```bash
python train_jarvisx50m.py
```
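`train_jarvisx50m.py` is the authoritative script; the sketch below is only a rough picture of what next-token training on WikiText-2 can look like. The dataset config name, batch size, learning rate, and the assumption that `model(inputs)` returns logits are all guesses, not the script's actual settings.

```python
# Hypothetical training sketch; the bundled train_jarvisx50m.py may differ.
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoTokenizer
from model import JarvisX50M, Config

tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
model = JarvisX50M(Config())
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Tokenize WikiText-2 and pack it into 256-token blocks (+1 for the shifted target).
text = "\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = torch.tensor(tokenizer(text).input_ids)
blocks = ids[: (len(ids) // 257) * 257].view(-1, 257)

model.train()
for epoch in range(3):
    for block in blocks.split(4):             # batch size is an assumption
        inputs, targets = block[:, :-1], block[:, 1:]
        logits = model(inputs)                # assumes logits of shape (B, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```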
Example
Prompt: "Tell me about Rome"
Output: "Rome's empire shaped law, architecture, and culture for centuries."
Note
Casual prompts (e.g., "What's up?") can produce less coherent replies, since training was limited to WikiText-2's encyclopedic text; fine-tuning on conversational data would help (see the data-prep sketch below). Try factual questions for best results!
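One way to do that fine-tuning (not part of this repository) is to flatten prompt/reply pairs into plain text and reuse the same next-token training loop; the pairs and format below are purely illustrative:

```python
# Illustrative conversational data prep; the pairs and format are made up.
pairs = [
    ("What's up?", "Not much! Ask me a factual question."),
    ("Hi there!", "Hello! I'm best at Wikipedia-style facts."),
]
chat_text = "\n".join(f"User: {q}\nJarvisX: {a}" for q, a in pairs)
with open("chat_finetune.txt", "w") as f:
    f.write(chat_text)   # feed this file to the same next-token training loop
```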
Author
Created by vihaan134354. Aiming to put India on the AI map!