license: apache-2.0
short_description: 'Chat: exllama v2'
---

# Exllama Chat 😽

[Open in Hugging Face Spaces](https://huggingface.co/spaces/pabloce/exllama)
[License: Apache 2.0](LICENSE)

A Gradio-based chat interface for ExLlamaV2, featuring the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. Experience high-performance inference on consumer GPUs with Flash Attention support.
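
Under the hood, serving a model this way is fairly compact. Below is a minimal, illustrative sketch of loading an EXL2-quantized model and generating text with exllamav2's dynamic generator; the model path is a placeholder, and the Space's actual loading code may differ:

```python
# Minimal ExLlamaV2 inference sketch (placeholder model path; illustrative,
# not the Space's actual code).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("models/Mistral-7B-Instruct-v0.3-exl2")  # placeholder
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as the model loads
model.load_autosplit(cache)               # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="[INST] Hello! [/INST]", max_new_tokens=128))
```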

## 🌟 Features

- 🚀 Powered by the ExLlamaV2 inference library
- 💨 Flash Attention support for optimized performance
- 🎯 Supports multiple instruction-tuned models:
  - Mistral-7B-Instruct v0.3
  - Meta's Llama-3-70B-Instruct
- ⚡ Dynamic text generation with adjustable parameters (see the streaming sketch after this list)
- 🎨 Clean, modern UI with dark mode support
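
Here "dynamic" refers to exllamav2's dynamic generator, which schedules jobs and streams tokens as they are produced. A hedged sketch of a streaming loop, assuming the `generator` and `tokenizer` objects from the loading sketch above (the Space's actual streaming code may differ):

```python
# Streaming sketch using exllamav2's dynamic job API; assumes `generator`
# and `tokenizer` from the loading sketch above (illustrative only).
from exllamav2.generator import ExLlamaV2DynamicJob

job = ExLlamaV2DynamicJob(
    input_ids=tokenizer.encode("[INST] Tell me a joke. [/INST]"),
    max_new_tokens=256,
)
generator.enqueue(job)

reply = ""
while generator.num_remaining_jobs():
    for result in generator.iterate():       # poll for newly decoded chunks
        if result["stage"] == "streaming":
            reply += result.get("text", "")  # decoded text for this step
print(reply)
```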

## 🎮 Parameters

Customize your chat experience with these adjustable parameters; the sketch after the list shows how they might map onto sampler settings:

- **System Message**: Set the AI assistant's behavior and context
- **Max Tokens**: Control response length (1-4096)
- **Temperature**: Adjust response creativity (0.1-4.0)
- **Top-p**: Fine-tune response diversity (0.1-1.0)
- **Top-k**: Control vocabulary sampling (0-100)
- **Repetition Penalty**: Prevent repetitive text (0.0-2.0)
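
As a rough illustration (not necessarily this Space's exact wiring), the controls above correspond to exllamav2's `ExLlamaV2Sampler.Settings` attributes like so; the values are arbitrary examples:

```python
# Illustrative mapping from UI controls to exllamav2 sampler settings.
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7               # Temperature
settings.top_p = 0.95                    # Top-p
settings.top_k = 40                      # Top-k
settings.token_repetition_penalty = 1.1  # Repetition Penalty

# The System Message is typically folded into the prompt via the model's
# chat template, and Max Tokens becomes max_new_tokens at generation time
# (assumes the `generator` object from the loading sketch above):
text = generator.generate(
    prompt="[INST] Hi there [/INST]",
    max_new_tokens=1024,
    gen_settings=settings,
)
```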

## 🛠️ Technical Details

- **Framework**: Gradio 5.4.0
- **Models**: ExLlamaV2-compatible models
- **UI**: Custom-themed interface with Gradio's Soft theme
- **Optimization**: Flash Attention for improved performance
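
The UI layer itself is plain Gradio. A minimal sketch of a Soft-themed chat interface follows; the `respond` callback is a hypothetical stand-in for the Space's real ExLlamaV2-backed generation function:

```python
# Minimal Gradio chat UI sketch with the Soft theme (hypothetical `respond`
# callback; the Space's real app wires this to the ExLlamaV2 generator).
import gradio as gr

def respond(message, history, system_message, max_tokens, temperature):
    # A real implementation would build a prompt from `history` and the
    # system message, then stream tokens from the ExLlamaV2 generator.
    return f"(echo) {message}"

demo = gr.ChatInterface(
    respond,
    theme=gr.themes.Soft(),
    additional_inputs=[
        gr.Textbox("You are a helpful assistant.", label="System message"),
        gr.Slider(1, 4096, value=1024, step=1, label="Max tokens"),
        gr.Slider(0.1, 4.0, value=0.7, step=0.1, label="Temperature"),
    ],
)

if __name__ == "__main__":
    demo.launch()
```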

## 🔗 Links

- [Try it on Hugging Face Spaces](https://huggingface.co/spaces/pabloce/exllama)
- [ExLlamaV2 GitHub Repository](https://github.com/turboderp/exllamav2)
- [Join our Discord](https://discord.gg/gmVgCk6X2x)

## 📝 License

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [ExLlamaV2](https://github.com/turboderp/exllamav2) for the core inference library
- [Hugging Face](https://huggingface.co/) for hosting and model distribution
- [Gradio](https://gradio.app/) for the web interface framework

---

Made with ❤️ using ExLlamaV2 and Gradio