JarvisX50M

JarvisX50M is a 50M-parameter language model built from scratch on the JarvisXCore architecture, designed to be lean, fast, and factual. Trained on WikiText-2, it aims to rival GPT-2 in factual Q&A accuracy (~85-95%) while being roughly 5x faster and 4x lighter. India's first custom AI, crafted for budget devices! 🇮🇳

Model Details

  • Parameters: ~50M
  • Architecture: JarvisXCore (custom multi-head attention, GELU, optimized FFNs)
  • Training Data: WikiText-2 (~2M tokens)
  • Vocabulary Size: 50,257 (GPT-2 tokenizer)
  • Context Length: 256 tokens
  • Training: 3 epochs, ~2,800 steps/epoch, CPU/GPU
  • Final Loss: ~0.0010
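
For orientation, the hyperparameters above roughly correspond to a configuration like the sketch below. The field names, layer count, and hidden width are assumptions chosen so the total lands near ~50M parameters; the real model.Config may differ.

from dataclasses import dataclass

# Illustrative sketch only -- not the actual model.Config.
# vocab_size and block_size come from the Model Details list;
# n_layer, n_head, and n_embd are assumed so the parameter count lands near ~50M.
@dataclass
class Config:
    vocab_size: int = 50257   # GPT-2 tokenizer vocabulary
    block_size: int = 256     # context length in tokens
    n_layer: int = 8          # assumed
    n_head: int = 8           # assumed number of attention heads
    n_embd: int = 512         # assumed embedding width
    dropout: float = 0.1      # assumed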

Try It Out!

Chat with JarvisX50M below (powered by Gradio):

Usage

import torch
from model import JarvisX50M, Config
from transformers import AutoTokenizer

# Build the model from its config and load the released weights
config = Config()
model = JarvisX50M(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Tokenizer is GPT-2 compatible (50,257-token vocabulary)
tokenizer = AutoTokenizer.from_pretrained("vihaan134354/JarvisX50M")
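
Once the model and tokenizer are loaded, a short greedy-decoding helper can generate text. The sketch below assumes the model's forward pass takes a (batch, seq) tensor of token ids and returns logits of shape (batch, seq, vocab); adapt it if JarvisX50M's actual forward() differs.

# Greedy decoding sketch -- assumes model(ids) returns logits
# of shape (batch, seq_len, vocab_size).
@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids[:, -256:])                     # stay within the 256-token context
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:      # stop at end-of-text
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(generate(model, tokenizer, "Tell me about Rome"))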

Chat

Run the chat script:

python chat_jarvisx50m.py
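
For reference, the core of such a chat loop might look like the minimal sketch below. It reuses the model, tokenizer, and generate helper from the Usage section above and is only an illustration, not the actual contents of chat_jarvisx50m.py.

# Minimal interactive loop sketch (illustrative only).
while True:
    prompt = input("You: ").strip()
    if prompt.lower() in {"quit", "exit"}:
        break
    print("JarvisX50M:", generate(model, tokenizer, prompt, max_new_tokens=60))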

Train

Retrain with:

python train_jarvisx50m.py
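
As a rough orientation, a training loop over WikiText-2 with the settings listed under Model Details (256-token context, 3 epochs, GPT-2 tokenizer) could look like the sketch below. The batch size, learning rate, and data-chunking scheme are assumptions; the actual train_jarvisx50m.py may be organized differently.

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer
from datasets import load_dataset
from model import JarvisX50M, Config

# Sketch of a next-token-prediction training loop; assumes model(x) returns logits.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = torch.tensor(tokenizer(text)["input_ids"])

block = 256
chunks = ids[: (len(ids) // (block + 1)) * (block + 1)].view(-1, block + 1)
loader = DataLoader(TensorDataset(chunks[:, :-1], chunks[:, 1:]), batch_size=16, shuffle=True)

model = JarvisX50M(Config())
optim = torch.optim.AdamW(model.parameters(), lr=3e-4)   # assumed learning rate

for epoch in range(3):
    for x, y in loader:
        logits = model(x)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.reshape(-1)
        )
        optim.zero_grad()
        loss.backward()
        optim.step()
    print(f"epoch {epoch + 1}: last batch loss {loss.item():.4f}")

torch.save(model.state_dict(), "pytorch_model.bin")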

Example

Prompt: "Tell me about Rome"
Output: "Rome's empire shaped law, architecture, and culture for centuries."

Note

Because the model was trained on WikiText-2, casual prompts (e.g., "What's up?") may produce less coherent replies without further fine-tuning. Try factual questions for best results!

Author

Created by vihaan134354. Aiming to put India on the AI map! 🚀
