JarvisX50M
JarvisX50M is a 50M-parameter language model built from scratch with the JarvisXCore architecture, designed to be lean, fast, and factual. Trained on WikiText-2, it aims to rival GPT-2 in accuracy (~85-95% on factual Q&A) while being ~5x faster and ~4x lighter. India's first custom AI, crafted for budget devices! 🇮🇳
Model Details
- Parameters: ~50M
- Architecture: JarvisXCore (custom multi-head attention, GELU, optimized FFNs)
- Training Data: WikiText-2 (~2M tokens)
- Vocabulary Size: 50,257 (GPT-2 tokenizer)
- Context Length: 256 tokens
- Training: 3 epochs, ~2,800 steps/epoch, CPU/GPU
- Final Loss: ~0.0010
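For a rough sense of where ~50M parameters come from, the back-of-the-envelope count below uses assumed dimensions; the hidden size, layer count, and FFN width are illustrative guesses, not the real JarvisXCore hyperparameters:

```python
# Back-of-the-envelope parameter count for a GPT-style decoder.
# d_model, n_layers, and d_ff are assumptions, not JarvisXCore's actual values.
vocab_size = 50_257        # GPT-2 tokenizer
context_len = 256
d_model = 512              # assumed hidden size
n_layers = 8               # assumed layer count
d_ff = 4 * d_model         # assumed FFN width

embeddings = vocab_size * d_model + context_len * d_model
attn_per_layer = 4 * d_model * d_model        # Q, K, V, and output projections
ffn_per_layer = 2 * d_model * d_ff            # up and down projections
total = embeddings + n_layers * (attn_per_layer + ffn_per_layer)

print(f"~{total / 1e6:.1f}M parameters")      # ~51M with these assumptions
```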
Try It Out!
Chat with JarvisX50M below (powered by Gradio):
Usage
```python
import torch
from model import JarvisX50M, Config
from transformers import AutoTokenizer

# Build the model and load the released weights
config = Config()
model = JarvisX50M(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# GPT-2 tokenizer hosted alongside the model
tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
```
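Continuing from the snippet above (with `model` and `tokenizer` loaded), here is a minimal greedy-decoding sketch. The forward signature is an assumption: it presumes `model(input_ids)` returns logits of shape `(batch, seq_len, vocab_size)`; adjust to whatever `model.py` actually returns.

```python
# Minimal greedy-decoding sketch (assumes model(input_ids) -> logits).
prompt = "Tell me about Rome"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(50):                       # up to 50 new tokens
        logits = model(input_ids[:, -256:])   # stay within the 256-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```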
Chat
Run the chat script:
```bash
python chat_jarvisx50m.py
```
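`chat_jarvisx50m.py` ships with the repository; its exact contents aren't reproduced here. If you want to roll your own terminal chat, a hypothetical minimal loop (reusing the greedy decoding above) could look like this:

```python
# Hypothetical terminal chat loop; the bundled chat_jarvisx50m.py may differ.
import torch
from model import JarvisX50M, Config
from transformers import AutoTokenizer

config = Config()
model = JarvisX50M(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()
tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")

while True:
    prompt = input("You: ")
    if prompt.strip().lower() in {"quit", "exit"}:
        break
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    with torch.no_grad():
        for _ in range(50):
            logits = model(ids[:, -256:])
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
    print("JarvisX50M:", tokenizer.decode(ids[0, prompt_len:], skip_special_tokens=True))
```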
Train
Retrain with:
```bash
python train_jarvisx50m.py
```
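`train_jarvisx50m.py` is the authoritative script; the sketch below is only a rough picture of what next-token training on WikiText-2 can look like. The dataset config name, batch size, learning rate, and the assumption that `model(inputs)` returns logits are all guesses, not the script's actual settings.

```python
# Hypothetical training sketch; the bundled train_jarvisx50m.py may differ.
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoTokenizer
from model import JarvisX50M, Config

tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
model = JarvisX50M(Config())
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Tokenize WikiText-2 and pack it into 256-token blocks (+1 for the shifted target).
text = "\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = torch.tensor(tokenizer(text).input_ids)
blocks = ids[: (len(ids) // 257) * 257].view(-1, 257)

model.train()
for epoch in range(3):
    for block in blocks.split(4):             # batch size is an assumption
        inputs, targets = block[:, :-1], block[:, 1:]
        logits = model(inputs)                # assumes logits of shape (B, T, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```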
Example
Prompt: "Tell me about Rome"
Output: "Rome's empire shaped law, architecture, and culture for centuries."
Note
Casual prompts (e.g., "What's up?") can produce less coherent replies, since training was limited to WikiText-2's encyclopedic text; fine-tuning on conversational data would help (see the data-prep sketch below). Try factual questions for best results!
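One way to do that fine-tuning (not part of this repository) is to flatten prompt/reply pairs into plain text and reuse the same next-token training loop; the pairs and format below are purely illustrative:

```python
# Illustrative conversational data prep; the pairs and format are made up.
pairs = [
    ("What's up?", "Not much! Ask me a factual question."),
    ("Hi there!", "Hello! I'm best at Wikipedia-style facts."),
]
chat_text = "\n".join(f"User: {q}\nJarvisX: {a}" for q, a in pairs)
with open("chat_finetune.txt", "w") as f:
    f.write(chat_text)   # feed this file to the same next-token training loop
```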
Author
Created by vihaan134354. Aiming to put India on the AI map!