I have this small utility, no_more_typo: it runs in the background and can call an LLM to update the text on the clipboard. I think it is ideal for fixing typos and syntax. I have just added the option to use custom prompt templates to perform different tasks.
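A minimal sketch of how the custom-prompt-template option could work: the template has a placeholder that is filled with the current clipboard text before the prompt is sent to the LLM. All names here are illustrative, not no_more_typo's actual API.

```python
# Illustrative sketch: a template placeholder gets replaced with the
# clipboard text, so the same utility can perform different tasks.
DEFAULT_TEMPLATE = (
    "Fix typos and syntax in the following text. "
    "Return only the corrected text:\n\n{clipboard}"
)

def build_prompt(clipboard_text: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the template's {clipboard} placeholder with the clipboard content."""
    return template.format(clipboard=clipboard_text)

# A custom template repurposes the utility for another task, e.g. translation:
translate_template = "Translate the following text to German:\n\n{clipboard}"
print(build_prompt("Helo wrold", translate_template))
```

Swapping the template is the only change needed to turn typo fixing into translation, summarization, or anything else.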
Repurposed my older AI workstation into a homelab server; it has received 2x V100 + 1x P40. I can reach a huge 210k-token context size with MegaBeam-Mistral-7B-512k-GGUF at ~70+ tok/s, or run Llama-3.1-Nemotron-70B-Instruct-HF-GGUF with 50k context at ~10 tok/s (V100s only: 40k ctx and 15 tok/s). I am also able to LoRA-finetune with performance similar to an RTX 3090. It moved to the garage, so there are no complaints about the noise from the family. Will move it to a rack soon :D
Some time ago, I built a predictive LLM router that routes chat requests between small and large models based on prompt classification. It dynamically selects the most suitable model depending on the complexity of the user input, ensuring optimal performance while maintaining conversation context. I also fine-tuned a RoBERTa model to use with the package, but you can plug in any classifier of your choice.