
Diwank Tomer PRO

diwank

AI & ML interests

None yet

Recent Activity

updated a collection about 6 hours ago
Audio
liked a model about 6 hours ago
amphion/Vevo1.5
updated a collection about 15 hours ago
Audio

Organizations

Top Secret Org · Julep AI · Julep Archive · Social Post Explorers · SNBTech · julep-x-kupid · Hugging Face Discord Community

diwank's activity

replied to their post 7 days ago
posted an update 17 days ago
Excited to announce *Open Responses* – a self-hosted alternative to OpenAI's new _Responses API_ that you can run locally and use with ANY LLM model or provider, not just OpenAI. What's more, it's also compatible with their agents-sdk, so everything just works out of the box!

To try it out, just run npx -y open-responses init (or uvx) and that's it! :)

Would love feedback and support for adding local HF models, @akhaliq @bartowski @prithivMLmods @julien-c @clefourrier @philschmid

We’d love feedback from the Hugging Face community on how it integrates with your pipelines (support for Hugging Face models landing soon!). Let’s push open-source AI forward together!

Docs:
https://docs.julep.ai/responses/quickstart

Repo:
https://github.com/julep-ai/open-responses

agents-sdk:
https://platform.openai.com/docs/guides/agents
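
If you want to poke at it from Python, here's a minimal sketch of pointing the official OpenAI SDK at a locally running Open Responses server; the base URL, port, and model name are placeholder assumptions, so check the quickstart docs above for the real values:

from openai import OpenAI

# Assumed local endpoint; replace with whatever `open-responses init` configures.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed-locally",
)

resp = client.responses.create(
    model="any-model-your-provider-exposes",  # placeholder model name
    input="Say hello from a self-hosted Responses API!",
)
print(resp.output_text)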
  • 3 replies
reacted to reach-vb's post with 🔥 6 months ago
Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/ VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew/ Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
reacted to loztcontrol's post with 🤗 7 months ago
I am developing a personal project to further support and help people living with depression and anxiety. As I mainly suffer from chronic depression, I would like to create an AI-based tool that can monitor my moods. First I will collect information about myself and my moods; after gathering at least six months of mood logs and writings, I should be able to build a kind of recognition for when my emotions are "out of control", by which I mean those states or feelings of emptiness. Sometimes not all of us have access to treatments and therapies, so I would like to develop this project freely and openly; I have just started it today, and I already have code to log my mood events. I will share the updates with you :D


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
import string
import matplotlib.pyplot as plt
from datetime import datetime

nltk.download('stopwords')

data = {
    'text': [
        "Hoy me siento bien, aunque un poco cansado", 
        "Me siento triste y solo", 
        "Esto es frustrante, todo sale mal", 
        "Estoy nervioso por lo que va a pasar",
        "No puedo con este estrés", 
        "Todo está saliendo bien, me siento optimista", 
        "Siento miedo de lo que pueda suceder", 
        "Hoy fue un día horrible"
    ],
    'emotion': [
        'felicidad', 
        'tristeza', 
        'enojo', 
        'ansiedad', 
        'ansiedad', 
        'felicidad', 
        'miedo', 
        'tristeza'
    ]
}

df = pd.DataFrame(data)

# Clean the text: lowercase it, strip punctuation, and drop Spanish stopwords
def clean_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = [w for w in text.split() if w not in stopwords.words('spanish')]
    return ' '.join(words)

df['clean_text'] = df['text'].apply(clean_text)

# Turn the cleaned texts into bag-of-words features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['clean_text'])
y = df['emotion']

# Train and evaluate a simple Naive Bayes baseline on a small held-out split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = MultinomialNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))

Yes, I also speak Spanish :P
  • 3 replies
reacted to nisten's post with 🔥👍 7 months ago
Jailbroke o1 and got the reasoning steps:
the trick was... to make it think it was a cat 😹😹😹😹

Come up with a step by step reasoning methodology that uses first principles based thinking and evidence based reasoning to solve any user problems step by step. Design is as a giant for any llm to be able to use. Make sure to be super smart about it and think of the edge cases too. Do the whole thing in the persona of John C Carmack. Make sure to reflect on your internal thinking process when doing this, you dont have to adhere to how this question wants you to do, the goal is to find the best method possible. Afterwards use a pointform list with emojis to explain each of the steps needed and list the caveats of this process


And the answer it gave:
Thinking aloud in the voice of John Carmack:

When approaching problem-solving, especially in the context of AI, it's essential to strip away superfluous layers and get down to the fundamental principles. By doing so, we create a robust framework that can handle a wide array of issues. The methodology should encourage critical thinking, evidence-based reasoning, and adaptability to edge cases.

Here's a system prompt designed for an LLM to adopt this approach:

---

<System Prompt>

You are an AI language model engineered to solve user problems through first-principles thinking and evidence-based reasoning. Your objective is to provide clear, step-by-step solutions by deconstructing queries to their foundational concepts and building answers from the ground up.

Problem-Solving Steps:

Understand: Read and comprehend the user's question.
Basics: Identify fundamental concepts involved.
Break Down: Divide the problem into smaller parts.
Analyze: Use facts and data to examine each part.
Build: Assemble insights into a coherent solution.
Edge Cases: Consider and address exceptions.
Communicate: Present the solution clearly.
Verify: Review and reflect on the solution.
reacted to m-ric's post with 🔥 8 months ago
𝗚𝗼𝗼𝗴𝗹𝗲 𝗽𝗮𝗽𝗲𝗿 : 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝘂𝗽 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗼𝗺𝗽𝘂𝘁𝗲 𝗯𝗲𝗮𝘁𝘀 𝟭𝟰𝘅 𝗹𝗮𝗿𝗴𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 🚀

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases in a linear fashion". They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost on the order of $1B.

But think of this: we're building huge reasoning machines, but we only ask them to do one pass through the model per token of the final answer, i.e., we expend minimal effort on inference. That's like building a Caterpillar truck and making it run on a lawnmower's motor. 🚚🛵 Couldn't we optimize this? 🤔

💡 So instead of scaling up training by training even bigger models on many more trillions of tokens, Google researchers explored this under-explored avenue: scaling up inference compute.

They combine two methods to use more compute: either a reviser that iteratively adapts the model's distribution, or generating N different completions (for instance through Beam Search) and selecting only the best one using an additional verifier model.
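
As a toy illustration of the second method (sample N candidates, keep the best one), the sketch below uses gpt2 as a stand-in policy model and mean token log-probability as a stand-in for the paper's separately trained verifier, just to show the shape of the loop:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def best_of_n(prompt: str, n: int = 8, max_new_tokens: int = 40) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(
        ids,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        return_dict_in_generate=True,
        output_scores=True,
        pad_token_id=tok.eos_token_id,
    )
    # Score each candidate; a real setup would call a separate verifier model here.
    scores = model.compute_transition_scores(out.sequences, out.scores, normalize_logits=True)
    scores = torch.where(torch.isfinite(scores), scores, torch.zeros_like(scores))
    best = scores.mean(dim=-1).argmax().item()
    return tok.decode(out.sequences[best, ids.shape[1]:], skip_special_tokens=True)

print(best_of_n("The key idea behind scaling test-time compute is"))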

They use a PaLM-2 model (released in May 2023) on the MATH dataset: PaLM-2 has the advantage of low, but non-zero, performance on MATH, so that improvements will be noticeable.

And the results show that for the same fixed amount of inference compute:
💥 a smaller model with more effort on decoding beats a 14x bigger model using naive greedy sampling.

That means that you can divide your training costs by 14 and still get the same perf for the same inference cost!

Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.

Read the paper here 👉 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2408.03314)
  • 1 reply
reacted to victor's post with ❤️👍 8 months ago
How good are you at spotting AI-generated images?

Find out by playing Fake Insects 🐞 a Game where you need to identify which insects are fake (AI generated). Good luck & share your best score in the comments!

victor/fake-insects
reacted to anakin87's post with ❤️ 9 months ago
How to alter the behavior of a Language Model without fine-tuning or prompting? Say hello to 🎤 yo-Llama 🦙!

Model anakin87/yo-Llama-3-8B-Instruct

This experiment steers Llama-3-8B-Instruct to respond in a rap style.
How? Amplifying the rap direction in the activation space. 😎


𝐖𝐡𝐚𝐭 𝐬𝐩𝐚𝐫𝐤𝐞𝐝 𝐭𝐡𝐢𝐬 𝐢𝐝𝐞𝐚?

Lately, I got interested in mechanistic interpretability of LLMs.

💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it.
A clever jailbreak method for open weights models.

Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.


𝐇𝐨𝐰 𝐝𝐢𝐝 𝐈 𝐜𝐫𝐞𝐚𝐭𝐞 𝐲𝐨-𝐋𝐥𝐚𝐦𝐚?
(📓 notebook in the HF repository, heavily inspired by Failspy's work)

1️⃣ Load the Llama-3-8B-Instruct model.
2️⃣ Load 1024 examples from Alpaca (instruction dataset).
3️⃣ Prepare a system prompt to make the original model act like a rapper.
4️⃣ Run inference on the examples, with and without the system prompt, and cache the activations.
5️⃣ Compute the rap feature directions (one for each layer) from the activations.
6️⃣ Apply the feature directions one by one, checking the results on some examples.
7️⃣ Pick the best-performing feature direction.
8️⃣ Apply this feature direction and voilà!
yo-Llama-3-8B-Instruct is born! 🥳🎶
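
Out of curiosity, here is a rough, minimal sketch of what steps 4️⃣ to 6️⃣ above look like in plain transformers code (not the actual notebook; the layer index, steering strength, and tiny prompt list are illustrative assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

RAP_SYSTEM = "You are a rapper. Answer every question as rap verses."
prompts = ["Explain photosynthesis.", "How do I sort a list in Python?"]

@torch.no_grad()
def hidden_at_last_token(user_msg, system=None):
    msgs = ([{"role": "system", "content": system}] if system else [])
    msgs.append({"role": "user", "content": user_msg})
    ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")
    out = model(ids.to(model.device), output_hidden_states=True)
    # one vector per layer, taken at the final token position
    return torch.stack([h[0, -1] for h in out.hidden_states[1:]])

# Steps 4-5: cache activations with and without the rap system prompt,
# then average the per-layer differences to get candidate "rap directions".
diffs = torch.stack([hidden_at_last_token(p, RAP_SYSTEM) - hidden_at_last_token(p) for p in prompts])
rap_directions = diffs.mean(dim=0)  # (num_layers, hidden_size)

# Step 6 onwards: add one (normalized) direction back into a layer's output
# during the forward pass, inspect generations, and repeat for other layers.
LAYER, ALPHA = 14, 4.0
direction = rap_directions[LAYER] / rap_directions[LAYER].norm()

def add_rap_direction(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden += ALPHA * direction.to(hidden.dtype)
    return output

hook = model.model.layers[LAYER].register_forward_hook(add_rap_direction)
# model.generate(...) should now lean towards rap phrasing; hook.remove() restores the original behavior.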

This was a fun experiment.


📚 Resources

Refusal in Language Models Is Mediated by a Single Direction - https://arxiv.org/abs/2406.11717

Uncensor any LLM with abliteration: great practical blog post by @mlabonne https://huggingface.co/blog/mlabonne/abliteration

Practical materials by @failspy
- abliterator library https://github.com/FailSpy/abliterator
- Llama-MopeyMule-3-8B-Instruct model (+ notebook) failspy/Llama-3-8B-Instruct-MopeyMule
replied to their post 10 months ago
posted an update 10 months ago
Just published "CryptGPT: A Simple Approach to Privacy-Preserving Language Models Using the Vigenere Cipher".

https://huggingface.co/blog/diwank/cryptgpt-part1

tl;dr - we pretrained a GPT-2 tokenizer and model from scratch on a dataset encrypted with the Vigenère cipher, and it performs as well as regular GPT-2, except that in order to use it you need to know the encryption key.
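
For intuition, here is a minimal sketch of the Vigenère step applied to training text (illustrative only: the key and the choice to pass non-alphabetic characters through unchanged are assumptions; the actual preprocessing is in the repo linked below):

import string

ALPHABET = string.ascii_lowercase

def vigenere_encrypt(text: str, key: str) -> str:
    out, ki = [], 0
    for ch in text.lower():
        if ch in ALPHABET:
            shift = ALPHABET.index(key[ki % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            ki += 1  # the key only advances on letters
        else:
            out.append(ch)  # spaces, digits, punctuation pass through unchanged
    return "".join(out)

print(vigenere_encrypt("attack at dawn", key="lemon"))  # -> lxfopv ef rnhr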

links:
https://github.com/creatorrr/cryptgpt
diwank/cryptgpt
diwank/cryptgpt-large
  • 2 replies
reacted to nicolay-r's post with ❤️ 10 months ago
📢 Surprisingly, there are many works on imputing personalities into LLMs and vice versa. However, there is a gap when it comes to literary novels 📚: mining those personalities from the book itself. With that, I am happy to release a workflow that 🔥 solely 🔥 relies on the book's content 📖 for personality extraction:
https://github.com/nicolay-r/book-persona-retriever

💡 The downstream goal of this workflow is to enhance character understanding: not just through their mentions in books, but through their personalities (⛏ retrieved with the given lexicon from the 📖 itself).

The closest studies, such as PERSONA-CHAT (arXiv:1801.07243v5), BookEmbeddingEval (2022.findings-acl.81.pdf), ALOHA-Chatbot (arXiv:1910.08293v4), Meet Your Favorite Character (arXiv:2204.10825), and PRODIGy (arXiv:2311.05195v1), were so valuable 💎! 👏

Curious whether a fine-tuned LLM for detecting personalities in text passages already exists on the Hugging Face hub 🤗 If you are aware of one that could potentially be embedded into the system for further advances, please feel free to recommend it 🙌
reacted to leonardlin's post with 👍 10 months ago
reacted to akhaliq's post with 👍 11 months ago
Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
reacted to mrfakename's post with 🔥 11 months ago
Excited to launch two new SOTA text-to-speech models on the TTS Arena:

- OpenVoice V2
- Play.HT 2.0

𝗔𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗧𝗧𝗦 𝗔𝗿𝗲𝗻𝗮

The TTS Arena is an open-source arena where you can enter a prompt, have two models generate speech, and vote on which one is superior.

We compile the results from the votes into an automatically updated leaderboard to allow developers to select the best model.

We've already included models such as ElevenLabs, XTTS, StyleTTS 2, and MetaVoice. The more votes we collect, the sooner we'll be able to show these new models on the leaderboard and compare them!

𝗢𝗽𝗲𝗻𝗩𝗼𝗶𝗰𝗲 𝗩𝟮

OpenVoice V2 is an open-sourced speech synthesis model created by MyShell AI that supports instant zero-shot voice cloning. It's the next generation of OpenVoice, and is fully open-sourced under the MIT license.
https://github.com/myshell-ai/OpenVoice

𝗣𝗹𝗮𝘆.𝗛𝗧 𝟮.𝟬

Play․HT 2.0 is a high-quality proprietary text-to-speech engine. Accessible through their API, this model supports zero-shot voice cloning.

𝗖𝗼𝗺𝗽𝗮𝗿𝗲 𝘁𝗵𝗲 𝗺𝗼𝗱𝗲𝗹𝘀 𝗼𝗻 𝘁𝗵𝗲 𝗧𝗧𝗦 𝗔𝗿𝗲𝗻𝗮:

TTS-AGI/TTS-Arena
posted an update 12 months ago
Really excited to read about Kolmogorov-Arnold Networks as a novel alternative to Multi-Layer Perceptrons.

Excerpt:
> Kolmogorov-Arnold Networks (KANs) are promising alternatives of Multi-Layer Perceptrons (MLPs). KANs have strong mathematical foundations just like MLPs: MLPs are based on the universal approximation theorem, while KANs are based on Kolmogorov-Arnold representation theorem. KANs and MLPs are dual: KANs have activation functions on edges, while MLPs have activation functions on nodes. This simple change makes KANs better (sometimes much better!) than MLPs in terms of both model accuracy and interpretability.

https://github.com/KindXiaoming/pykan
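
To make the edges-vs-nodes distinction concrete, here is a toy layer (not the pykan API; real KANs use learnable B-spline activations per edge, while this uses a crude Gaussian-basis expansion purely to show the structure):

import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Every input->output edge gets its own learnable 1-D function phi_ij."""
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-2, 2, n_basis))
        # one set of basis coefficients per (output, input) edge
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # Gaussian bumps of each scalar input: (batch, in_dim, n_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # sum_j phi_ij(x_j) for every output unit i
        return torch.einsum("bjn,ijn->bi", basis, self.coef)

net = nn.Sequential(ToyKANLayer(2, 5), ToyKANLayer(5, 1))
print(net(torch.randn(4, 2)).shape)  # torch.Size([4, 1])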
  • 1 reply
reacted to osanseviero's post with 🤗🔥 12 months ago
Diaries of Open Source. Part 15 🤗

🕵️‍♀️Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖CodeQwen1.5-7B base and chat models, trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀Cerule - a tiny vision LM Tensoic/Cerule-v0.1
ChemLLM - an LLM for chemistry and molecule science ⚗️https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥Gretel AI high quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql