pabl-o-ce committed
Commit e539845 · 1 Parent(s): fc9eab8

docs: better readme

Files changed (1): README.md (+54 -1)
README.md CHANGED
@@ -13,4 +13,57 @@ license: apache-2.0
 short_description: 'Chat: exllama v2'
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Exllama Chat 😽
+
+[![Open In Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/pabloce/exllama)
+[![Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+
+A Gradio-based chat interface for ExLlamaV2, featuring the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. Experience high-performance inference on consumer GPUs with Flash Attention support.
+
+## 🌟 Features
+
+- 🚀 Powered by the ExLlamaV2 inference library
+- 💨 Flash Attention support for optimized performance
+- 🎯 Supports multiple instruction-tuned models:
+  - Mistral-7B-Instruct-v0.3
+  - Meta's Llama-3-70B-Instruct
+- ⚡ Dynamic text generation with adjustable parameters
+- 🎨 Clean, modern UI with dark mode support
+
+## 🎮 Parameters
+
+Customize your chat experience with these adjustable parameters:
+
+- **System Message**: Set the AI assistant's behavior and context
+- **Max Tokens**: Control response length (1-4096)
+- **Temperature**: Adjust response creativity (0.1-4.0)
+- **Top-p**: Fine-tune response diversity (0.1-1.0)
+- **Top-k**: Control vocabulary sampling (0-100)
+- **Repetition Penalty**: Penalize repeated tokens (0.0-2.0)
+
+## 🛠️ Technical Details
+
+- **Framework**: Gradio 5.4.0
+- **Models**: ExLlamaV2-compatible models
+- **UI**: Custom-themed interface built on Gradio's Soft theme
+- **Optimization**: Flash Attention for improved performance
+
+## 🔗 Links
+
+- [Try it on Hugging Face Spaces](https://huggingface.co/spaces/pabloce/exllama)
+- [ExLlamaV2 GitHub Repository](https://github.com/turboderp/exllamav2)
+- [Join our Discord](https://discord.gg/gmVgCk6X2x)
+
+## 📝 License
+
+This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
+
+## 🙏 Acknowledgments
+
+- [ExLlamaV2](https://github.com/turboderp/exllamav2) for the core inference library
+- [Hugging Face](https://huggingface.co/) for hosting and model distribution
+- [Gradio](https://gradio.app/) for the web interface framework
+
+---
+
+Made with ❤️ using ExLlamaV2 and Gradio
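
The parameter ranges listed in the new README can be sketched as a small validation helper. This is an illustrative snippet only, not code from the Space; the class and field names are hypothetical, and only the numeric bounds come from the README above:

```python
from dataclasses import dataclass


@dataclass
class ChatParams:
    """Generation parameters, bounded to the ranges listed in the README."""
    max_tokens: int = 512            # 1-4096
    temperature: float = 0.7         # 0.1-4.0
    top_p: float = 0.95              # 0.1-1.0
    top_k: int = 40                  # 0-100
    repetition_penalty: float = 1.1  # 0.0-2.0

    def __post_init__(self):
        # Clamp each value into its documented range rather than raising,
        # mirroring how slider widgets bound their inputs.
        self.max_tokens = max(1, min(4096, self.max_tokens))
        self.temperature = max(0.1, min(4.0, self.temperature))
        self.top_p = max(0.1, min(1.0, self.top_p))
        self.top_k = max(0, min(100, self.top_k))
        self.repetition_penalty = max(0.0, min(2.0, self.repetition_penalty))


params = ChatParams(max_tokens=8000, temperature=0.0)
print(params.max_tokens, params.temperature)  # clamped to 4096 and 0.1
```

In the actual Space these values would be passed through to the ExLlamaV2 sampler; clamping at the boundary keeps out-of-range API callers within the same limits the UI sliders enforce.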