pabl-o-ce
update gradio 5.5.0
ac9889d

metadata
title: Exllama
emoji: 😽
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
header: mini
fullWidth: true
license: apache-2.0
short_description: 'Chat: exllama v2'

Exllama Chat 😽


A Gradio-based chat interface for ExLlamaV2 that serves the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. ExLlamaV2 is built for fast inference on consumer GPUs, and this app uses its Flash Attention support.

🌟 Features

  • 🚀 Powered by ExLlamaV2 inference library
  • 💨 Flash Attention support for optimized performance
  • 🎯 Supports multiple instruction-tuned models:
    • Mistral-7B-Instruct-v0.3
    • Meta's Llama-3-70B-Instruct
  • ⚡ Dynamic text generation with adjustable parameters
  • 🎨 Clean, modern UI with dark mode support

🎮 Parameters

Customize your chat experience with these adjustable parameters:

  • System Message: Set the AI assistant's behavior and context
  • Max Tokens: Control response length (1-4096)
  • Temperature: Adjust response creativity (0.1-4.0)
  • Top-p: Fine-tune response diversity (0.1-1.0)
  • Top-k: Limit sampling to the k most likely tokens (0-100)
  • Repetition Penalty: Discourage repetitive text (0.0-2.0)
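
To make the effect of these sampling controls concrete, here is a minimal pure-Python sketch of how temperature, top-k, top-p, and a repetition penalty are commonly applied to a model's raw scores (logits). It is an illustration of the general technique, not the code this app or ExLlamaV2 actually runs. The function name `filter_logits`, the default values, the convention that `top_k=0` disables the top-k filter, and the handling of a penalty of `0` are all assumptions made for the example.

```python
import math


def filter_logits(logits, generated_ids, temperature=0.7, top_k=40,
                  top_p=0.95, repetition_penalty=1.1):
    """Return {token_id: probability} after applying the sampling controls.

    Hypothetical helper for illustration only.
    """
    adjusted = list(logits)

    # Repetition penalty: lower the score of tokens that were already generated.
    # A penalty of 0 is skipped here to avoid dividing by zero (assumption).
    if repetition_penalty > 0 and repetition_penalty != 1.0:
        for tok in set(generated_ids):
            if adjusted[tok] > 0:
                adjusted[tok] /= repetition_penalty
            else:
                adjusted[tok] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in adjusted]

    # Softmax (shifted by the maximum for numerical stability).
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(
        ((tok, w / total) for tok, w in enumerate(weights)),
        key=lambda pair: -pair[1],
    )

    # Top-k: keep only the k most likely tokens (0 is assumed to mean "off").
    if top_k > 0:
        ranked = ranked[:top_k]
        mass = sum(p for _, p in ranked)
        ranked = [(tok, p / mass) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}
```

A sampler would then draw the next token from the returned distribution, for example with `random.choices(list(dist), weights=list(dist.values()))`. Note that under this convention a repetition penalty below 1.0 makes repeated tokens more likely, not less.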

🛠️ Technical Details

  • Framework: Gradio 5.5.0
  • Models: ExLlamaV2-compatible models
  • UI: Custom-themed interface with Gradio's Soft theme
  • Optimization: Flash Attention for improved performance
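
As a rough idea of how such an interface can be wired up, the sketch below builds a Gradio `ChatInterface` with the Soft theme and one control per parameter listed earlier. It is written against Gradio 5.x and is not the contents of this Space's `app.py`. The names `SLIDERS`, `respond`, and `build_demo`, the default values, the step sizes, and the default system message are hypothetical; only the ranges come from the parameter list. The `respond` function is a placeholder that echoes the message where the real app would call ExLlamaV2.

```python
# Slider specs: (label, minimum, maximum, default, step).
# Ranges follow the parameter list; defaults and steps are illustrative assumptions.
SLIDERS = [
    ("Max Tokens", 1, 4096, 512, 1),
    ("Temperature", 0.1, 4.0, 0.7, 0.1),
    ("Top-p", 0.1, 1.0, 0.95, 0.05),
    ("Top-k", 0, 100, 40, 1),
    ("Repetition Penalty", 0.0, 2.0, 1.1, 0.1),
]


def respond(message, history, system_message, max_tokens, temperature,
            top_p, top_k, repetition_penalty):
    """Hypothetical placeholder: a real handler would run the model here."""
    return f"(echo) {message}"


def build_demo():
    """Assemble the chat UI; Gradio is imported lazily so the specs above
    can be inspected without Gradio installed."""
    import gradio as gr

    inputs = [gr.Textbox(value="You are a helpful assistant.",
                         label="System Message")]
    inputs += [
        gr.Slider(minimum=lo, maximum=hi, value=default, step=step, label=label)
        for label, lo, hi, default, step in SLIDERS
    ]
    # additional_inputs are passed to respond() after (message, history),
    # in the order they appear in the list.
    return gr.ChatInterface(respond, additional_inputs=inputs,
                            theme=gr.themes.Soft())
```

Calling `build_demo().launch()` would start the app locally. Keeping the slider specs in a plain list makes the ranges easy to check against the documentation.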

🔗 Links

📝 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

🙏 Acknowledgments


Made with ❤️ using ExLlamaV2 and Gradio