That Time I got Reincarnated as a Hugging Face Organization
AI & ML interests
LowRes animated waifus (✿◡‿◡)
Recent Activity
not-lain updated a Space 6 days ago
Post · 2233
Happy New Year 2025 🤗
For the Huggingface community.
Post · 1766
Welcome back,
Small Language Model enthusiasts and GPU-poor OSS enjoyers, let's connect.
Just created an organization whose main goal is to have fun with smaller models tunable on consumer-range GPUs. Feel free to join and let's have some fun, much love ;3
https://huggingface.co/SmolTuners
Post · 1607
spedrox-sac updated a Space about 1 month ago
Post · 1591
P104-100s are beasts: 8 GB of VRAM, 12 tok/s on Qwen 14B at Q4, and 18 tok/s on 7B at Q6. Best thing: 20 euros each.
https://furry.engineer/@cappuch/113500349547803802
louisbrulenaudet posted an update about 2 months ago
Post · 1814
I’ve published a new dataset to simplify model merging 🤗
This dataset facilitates the search for compatible architectures for model merging with @arcee_ai’s mergekit, streamlining the automation of high-performance merge searches 📖
Dataset : louisbrulenaudet/mergekit-configs
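For anyone who wants to poke at it programmatically, here is a minimal sketch using the datasets library; the split name and column layout are assumptions on my part, so check the dataset card for the actual schema:

# Minimal sketch: browsing the merge configurations with the `datasets` library.
# The split name ("train") and the exact columns are assumptions; see the dataset card.
from datasets import load_dataset

configs = load_dataset("louisbrulenaudet/mergekit-configs", split="train")

print(configs)      # number of rows and available columns
print(configs[0])   # one merge configuration entry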
Post · 3522
🙋🏻♂️hey there folks,
periodic reminder: if you are experiencing ⚠️ 500 errors ⚠️ or ⚠️ abnormal Spaces behavior on load or launch ⚠️ we have a thread 👉🏻 https://discord.com/channels/879548962464493619/1295847667515129877
if you can record the problem and share it there, or on the forums in your own post, please don't be shy; i'm not sure but i do think it helps 🤗🤗🤗
louisbrulenaudet posted an update 3 months ago
Post · 1203
Introducing Lemone-router, a series of classification models designed to produce an optimal multi-agent system for different branches of tax law.
Trained on a base of 49k lines comprising synthetic questions generated by GPT-4 Turbo and Llama 3.1 70B (further refined through evol-instruction tuning and manual curation), together with authority documents, these models are based on an 8-category decomposition of the classification scheme derived from the Bulletin officiel des finances publiques - impôts:
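# label2id / id2label: mapping between the 8 tax-law categories and the integer class ids used by the models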
label2id = {
"Bénéfices professionnels": 0,
"Contrôle et contentieux": 1,
"Dispositifs transversaux": 2,
"Fiscalité des entreprises": 3,
"Patrimoine et enregistrement": 4,
"Revenus particuliers": 5,
"Revenus patrimoniaux": 6,
"Taxes sur la consommation": 7
}
id2label = {
0: "Bénéfices professionnels",
1: "Contrôle et contentieux",
2: "Dispositifs transversaux",
3: "Fiscalité des entreprises",
4: "Patrimoine et enregistrement",
5: "Revenus particuliers",
6: "Revenus patrimoniaux",
7: "Taxes sur la consommation"
}
It achieves the following results on the evaluation set:
- Loss: 0.4734
- Accuracy: 0.9191
Link to the collection: louisbrulenaudet/lemone-router-671cce21d6410f3570514762
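As a rough sketch of how one of these routers could be called once a checkpoint is picked from the collection (the model id below is a placeholder, not an actual repository name):

# Minimal sketch: routing a tax question to one of the 8 categories with a text-classification pipeline.
# "louisbrulenaudet/<lemone-router-checkpoint>" is a placeholder; pick a real model from the collection.
from transformers import pipeline

router = pipeline("text-classification", model="louisbrulenaudet/<lemone-router-checkpoint>")

question = "Quelles sont les conditions d'application du régime mère-fille ?"
print(router(question))  # e.g. [{"label": "Fiscalité des entreprises", "score": ...}]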
Post · 831
🙋🏻♂️ hey there folks ,
really enjoying sharing cool genomics and protein datasets on the hub these days, check out our cool new org: https://huggingface.co/seq-to-pheno
scroll down for the datasets, still figuring out how to optimize for discoverability; i do think on that part it will be better than zenodo.org. it would be nice to write a tutorial about that and compare: we already have more downloads than most zenodo datasets from famous researchers!
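On the discoverability point, a minimal sketch of how the org's datasets can be listed programmatically with huggingface_hub (nothing here is specific to seq-to-pheno beyond the org name):

# Minimal sketch: listing the datasets published under the seq-to-pheno org,
# together with their download counts, via the huggingface_hub client.
from huggingface_hub import list_datasets

for ds in list_datasets(author="seq-to-pheno"):
    print(ds.id, ds.downloads)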
Post · 1455
hey there folks,
twitter is awful isn't it? just getting into the habit of using hf/posts for shares 🦙🦙
Tonic/on-device-granite-3.0-1b-a400m-instruct
new granite on-device instruct model demo, hope you like it 🚀🚀
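If you would rather try the underlying model outside the Space, here is a minimal sketch with the transformers pipeline; the checkpoint id is inferred from the demo name and should be verified on the hub:

# Minimal sketch, assuming the demo wraps ibm-granite/granite-3.0-1b-a400m-instruct
# (the model id is inferred from the Space name; verify it on the hub before running).
from transformers import pipeline

pipe = pipeline("text-generation", model="ibm-granite/granite-3.0-1b-a400m-instruct")
messages = [{"role": "user", "content": "Give me one tip for running small LLMs on-device."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply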
Post · 991
if you're encountering 500 errors on Spaces that seem to work otherwise, kindly consider screenshotting and sharing the link here: https://discord.com/channels/879548962464493619/1295847667515129877
louisbrulenaudet posted an update 3 months ago
Post · 3117
🚨 I have $3,500 in Azure credits, including access to an H100 (96 GB), expiring on November 12, 2024.
I won't be able to use it all myself, so I'm reaching out to the @huggingface community: are there any open-source projects with data ready for some compute power?
Let’s collaborate and make the most of it together 🔗
Post · 2740
🙋🏻♂️hey there folks ,
did you know that https://huggingface.co/lmms-lab released a new version of 🌋🌋 LLaVA on Thursday? Now it has 🎥 video understanding!
check it out 👇🏻
collection : lmms-lab/llava-video-661e86f5e8dabc3ff793c944
demo : Tonic/Llava-Video
Post · 1857
🙋🏻♂️ Hey there folks ,
🦎 Salamandra release by @mvillegas and team at @BSC_CNS (https://huggingface.co/BSC-LT) is absolutely impressive so far!
perhaps the largest single training dataset of high-quality text to date: 7.8 trillion tokens in 35 European languages plus code.
the best part: the data was correctly licensed, so it's actually future-proof!
the completions model is really creative and the instruct fine-tuned version is very good too.
now you can use such models for multilingual enterprise applications with further finetunes; long response generation and structured outputs (coding) also work.
check out 👇🏻
the collection : BSC-LT/salamandra-66fc171485944df79469043a
the repo : https://github.com/langtech-bsc/salamandra
7B-Instruct demo : Tonic/Salamandra-7B
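For anyone wanting to try the instruct model locally, a minimal sketch using transformers chat templating; the checkpoint name is assumed from the collection naming, so double-check the exact id on the hub:

# Minimal sketch, assuming the instruct checkpoint is published as BSC-LT/salamandra-7b-instruct
# (verify the exact repository id in the BSC-LT collection before running).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandra-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarise the Salamandra release in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))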
louisbrulenaudet posted an update 3 months ago
Post · 2111
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on Hugging Face 🤗
These new models aim to offer an open source alternative for in-domain semantic search from large text corpora and will improve RAG systems and context addition for large language models.
Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @intfloat ), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.
A big thank you to Microsoft for Startups for giving me access to state-of-the-art infrastructure to train these models, and to @julien-c , @clem 🤗, @thomwolf and the whole HF team for the inference endpoint API and the generous provision of Meta LLama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and Loss functions ❤️
Models are available on my personal HF page, in the Lemone-embed collection: louisbrulenaudet/lemone-embed-66fdc24000df732b395df29b
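A minimal sketch of how one of these embedders could be used for in-domain retrieval with sentence-transformers; the model id is a placeholder, pick a real checkpoint from the Lemone-embed collection:

# Minimal sketch: semantic search over tax snippets with sentence-transformers.
# "louisbrulenaudet/<lemone-embed-checkpoint>" is a placeholder; choose a model from the collection.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("louisbrulenaudet/<lemone-embed-checkpoint>")

corpus = [
    "Les plus-values de cession de titres sont soumises au prélèvement forfaitaire unique.",
    "La TVA est exigible lors de la livraison du bien.",
]
query = "Quel est le régime d'imposition des plus-values mobilières ?"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
print(util.cos_sim(query_emb, corpus_emb))  # cosine similarity of the query against each snippet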
Post · 1733
@mlabonne
hey there 🙋🏻♂️ I kinda got obsessed with your great model, and i found the endpoint for it on Lambda Labs, but basically i got rate limited / banned while trying to build my DPO dataset project. i was wondering if you all had an OpenAI-compatible solution for me to make a great "thinking" SFT + DPO dataset with all the splits 🙏🏻🙏🏻 kinda desperate, it's true, but i was looking forward to a nice write-up 🚀🚀🚀
Post · 2325
Big Congrats on the BIG RELEASE by @mlabonne and team at https://huggingface.co/liquidai ...
testing it out now to make a dataset, i can hardly wait... but one question 👇🏻 why / wen? 😅🚀🚀
check out the blog post : https://www.liquid.ai/liquid-foundation-models