---
title: Exllama
emoji: 😽
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
header: mini
fullWidth: true
license: apache-2.0
short_description: 'Chat: exllama v2'
---
# Exllama Chat 😽
A Gradio-based chat interface for ExLlamaV2, featuring Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. Experience high-performance inference on consumer GPUs with Flash Attention support.
## 🌟 Features
- 🚀 Powered by ExLlamaV2 inference library
- 💨 Flash Attention support for optimized performance
- 🎯 Supports multiple instruction-tuned models:
  - Mistral-7B-Instruct-v0.3
  - Meta's Llama-3-70B-Instruct
- ⚡ Dynamic text generation with adjustable parameters
- 🎨 Clean, modern UI with dark mode support
## 🎮 Parameters
Customize your chat experience with these adjustable parameters:
- System Message: Set the AI assistant's behavior and context
- Max Tokens: Limit the response length in generated tokens (1-4096)
- Temperature: Adjust response creativity; higher values are more random (0.1-4.0)
- Top-p: Fine-tune response diversity via nucleus sampling (0.1-1.0)
- Top-k: Restrict sampling to the k most likely tokens; 0 disables the limit (0-100)
- Repetition Penalty: Discourage repetitive text (0.0-2.0)
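Before these values reach the inference backend, it is good practice to clamp them to the documented ranges. The sketch below is illustrative only — the function name and the dict layout are hypothetical, not the Space's actual `app.py` code — but it shows the ranges listed above applied as validation:

```python
# Hypothetical sketch: clamp user-supplied sampling parameters to the
# documented ranges before handing them to the generation backend.
# Names (build_sampler_settings, clamp) are illustrative assumptions.

def clamp(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))

def build_sampler_settings(max_tokens=512, temperature=0.7,
                           top_p=0.95, top_k=40, repetition_penalty=1.1):
    """Return a dict of sampling settings, clamped to the UI's ranges."""
    return {
        "max_tokens": int(clamp(max_tokens, 1, 4096)),
        "temperature": clamp(temperature, 0.1, 4.0),
        "top_p": clamp(top_p, 0.1, 1.0),
        "top_k": int(clamp(top_k, 0, 100)),
        "repetition_penalty": clamp(repetition_penalty, 0.0, 2.0),
    }
```

For example, an out-of-range request such as `build_sampler_settings(max_tokens=10000, temperature=0.05)` is silently pulled back to `max_tokens=4096` and `temperature=0.1`.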
## 🛠️ Technical Details
- Framework: Gradio 5.5.0
- Models: ExLlamaV2-compatible models
- UI: Custom-themed interface with Gradio's Soft theme
- Optimization: Flash Attention for improved performance
## 📝 License
This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
## 🙏 Acknowledgments
- ExLlamaV2 for the core inference library
- Hugging Face for hosting and model distribution
- Gradio for the web interface framework
Made with ❤️ using ExLlamaV2 and Gradio