Expansion-LLM (ExpanLLM)

chansung

posted an update 20 days ago

Post

3420

A simple guide to the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper of vLLM with WeightSyncWorker is a pretty cool feature. Also, we have many predefined reward functions out of the box!

chansung

posted an update 26 days ago

Post

2561

Mistral AI's Small 3.1 24B is not only free for commercial use but also the best model for single-GPU deployment.

I packed up all the information you need to know in a single picture. Hope this helps! :)

chansung

posted an update about 1 month ago

Post

1569

Gemma 3 Release in a nutshell
(it seems function calling is not supported, even though the announcement said it is)

chansung

posted an update 2 months ago

Post

3017

Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" from HKU, UC Berkeley, Google DeepMind, and New York University, which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests SFT excels in memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT comes first so the model learns the prompt format, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161

chansung

posted an update 2 months ago

Post

4387

A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over o1-mini, reaching o1 performance levels. While generally good, it does not beat previous models (o1, o1-preview) or GPT-4o on every benchmark. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/

chansung

posted an update 3 months ago

Post

2031

Simple summary on DeepSeek AI's Janus-Pro: A fresh take on multimodal AI!

It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.

Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications:
✦ Stage 1 & 2: Focus on separate training for specific objectives, rather than mixing data.
✦ Stage 3: Fine-tuning with a careful balance of multimodal data.

Benchmarks show Janus-Pro holds its own against specialized models like TokenFlow XL and MetaMorph, and other multimodal models like SD3 Medium and DALL-E 3.

The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.

chansung

posted an update 3 months ago

Post

1739

New look for the AI-powered paper reviews of the list from Hugging Face Daily Papers (managed by @akhaliq).

Bookmark the webpage, check out comprehensive reviews generated by Google DeepMind's Gemini 1.5, and listen to audio podcasts made with the same tech used in NotebookLM.

Link: https://deep-diver.github.io/ai-paper-reviewer/

This is not an official service by Hugging Face. It is just a service developed by an individual developer using his own money :)

chansung

posted an update 3 months ago

Post

2033

Simple summarization of Evolving Deeper LLM Thinking (Google DeepMind)

The process starts by posing a question.
1) The LLM generates initial responses.
2) These generated responses are evaluated according to specific criteria (program-based checker).
3) The LLM critiques the evaluated results.
4) The LLM refines the responses based on the evaluation, critique, and original responses.

The refined response is then fed back into step 2). If it meets the criteria, the process ends. Otherwise, the algorithm generates more responses based on the refined ones (with some being discarded, some remaining, and some responses potentially being merged).
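The loop described above can be sketched roughly as follows. This is only a toy illustration, not the paper's implementation: `evolve`, the `keep` parameter, and the four callables are hypothetical names, the LLM calls are replaced by simple stand-in functions, and the merging of responses is left out.

```python
from typing import Callable, List, Optional, Tuple

# (passed, score, feedback): a higher score means closer to meeting the criteria.
CheckResult = Tuple[bool, float, str]


def evolve(
    question: str,
    generate: Callable[[str], List[str]],
    check: Callable[[str], CheckResult],
    critique: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    keep: int = 2,
    max_generations: int = 10,
) -> Optional[str]:
    candidates = generate(question)  # step 1: initial responses
    for _ in range(max_generations):
        # Step 2: program-based evaluation of every candidate.
        results = [(cand, check(cand)) for cand in candidates]
        for cand, (passed, _, _) in results:
            if passed:
                return cand  # criteria met, the process ends
        refined = []
        for cand, (_, _, feedback) in results:
            note = critique(cand, feedback)  # step 3: critique
            refined.append(refine(cand, feedback, note))  # step 4: refine
        # Keep the most promising refined responses, discard the rest;
        # the survivors are fed back into step 2 on the next iteration.
        refined.sort(key=lambda c: check(c)[1], reverse=True)
        candidates = refined[:keep]
    return None


# Toy stand-ins for the LLM and the checker: "responses" are integers
# written as strings, and the criterion is hitting TARGET exactly.
TARGET = 10


def toy_generate(question: str) -> List[str]:
    return ["3", "7", "12", "15"]


def toy_check(cand: str) -> CheckResult:
    diff = TARGET - int(cand)
    feedback = "ok" if diff == 0 else ("too low" if diff > 0 else "too high")
    return diff == 0, -abs(diff), feedback


def toy_critique(cand: str, feedback: str) -> str:
    return f"{cand} is {feedback}"


def toy_refine(cand: str, feedback: str, note: str) -> str:
    step = 1 if feedback == "too low" else -1
    return str(int(cand) + step)


best = evolve("Find the number 10", toy_generate, toy_check, toy_critique, toy_refine)
print(best)
```

Even in this toy form, every generation costs one critique call and one refine call per candidate, which is where the latency concern comes from.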

Through this process, it demonstrated excellent performance in complex scheduling problems (travel planning, meeting scheduling, etc.). It's a viable method for finding highly effective solutions in specific scenarios.

However, there are two major drawbacks:
🤔 An excessive number of API calls is required. (While the cost might not be very high, it leads to significant latency.)
🤔 The evaluator is program-based. (This limits its use as a general method. It could potentially be modified/implemented using LLM as Judge, but that would introduce additional API costs for evaluation.)

https://arxiv.org/abs/2501.09891

chansung

posted an update 3 months ago

Post

2070

Simple Summarization on DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models with the data generated by R1 (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.

Model: deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1

chansung

authored a paper 4 months ago

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

Paper • 2412.06071 • Published Dec 8, 2024 • 9

chansung

posted an update 5 months ago

Post

1973

🎙️ Listen to an audio "podcast" of every single paper in Hugging Face Daily Papers.

Now, the "AI Paper Reviewer" project can automatically generate audio podcasts for any paper published on arXiv, and this is integrated into the GitHub Actions pipeline. It sounds pretty similar to #NotebookLM in my opinion.

🎙️ Try out yourself at https://deep-diver.github.io/ai-paper-reviewer/

This audio podcast is powered by Google technologies: 1) the Google DeepMind Gemini 1.5 Flash model generates the podcast scripts, then 2) Google Cloud Vertex AI's Text-to-Speech model turns the scripts into natural-sounding voices (with the latest addition of the "Journey" voice style).

"AI Paper Reviewer" is also an open source project. Anyone can use it to build and own a personal blog on any papers of interest. Check out the project repository below if you are interested!
: https://github.com/deep-diver/paper-reviewer

This project will soon support other models, including open-weight ones, for both text-based content generation and voice synthesis for the podcast. The only reason I chose the Gemini model is that it offers a "free tier", which is enough to shape up this project with non-realtime batch generation. I'm excited to see how others will use this tool to explore the world of AI research, so feel free to share your feedback and suggestions!

chansung

posted an update 5 months ago

Post

4761

Effortlessly stay up-to-date with AI research trends using a new AI tool, "AI Paper Reviewer" !!

It analyzes the list of Hugging Face Daily Papers (w/ @akhaliq) and turns them into insightful blog posts. This project leverages Gemini models (1.5 Pro, 1.5 Flash, and 1.5 Flash-8B) for content generation and Upstage Document Parse for parsing the layout and contents.
blog link: https://deep-diver.github.io/ai-paper-reviewer/

Also, here is the link to the GitHub repository for the parsing and generation pipeline. With it, you can easily build your own GitHub static pages based on any arXiv papers that interest you!
: https://github.com/deep-diver/paper-reviewer

chansung

authored a paper 8 months ago

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24, 2024 • 26

chansung

posted an update 12 months ago

Post

4027

🦙🦙 LLaMA Duo project update

Last time, I gave a brief introduction to the LLaMA Duo project with @sayakpaul. It is a simple toolset for aligning an sLLM with a service LLM using a coverage dataset 👉🏻 (https://huggingface.co/posts/chansung/708646454991943).
- The coverage dataset is what we believe to be the most important/desired (instruction, response) pairs. In systems thinking, each instruction is analogous to a function in traditional programming. There, we write unit tests and measure the coverage % across all features/functions. Similarly, we need to check what % of the instructions in the coverage dataset our fine-tuned model handles satisfactorily (hence "coverage dataset").
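As a rough illustration of the coverage idea, the percentage could be computed like this. It is a minimal sketch under assumptions: `coverage_percent`, the pass `threshold` of 70, and the sample scores are all hypothetical, and the two metric names simply mirror the similarity and preciseness scores mentioned in this post.

```python
from typing import Dict, List


def coverage_percent(scores: List[Dict[str, float]], threshold: float = 70.0) -> float:
    """Share of instructions (in %) whose response is judged satisfactory.

    Each entry holds the judge's scores for one instruction of the coverage
    dataset. An instruction counts as covered only when both metrics reach
    the (hypothetical) threshold.
    """
    if not scores:
        return 0.0
    covered = sum(
        1
        for s in scores
        if s["similarity"] >= threshold and s["preciseness"] >= threshold
    )
    return 100.0 * covered / len(scores)


# Made-up judge scores for four instructions, for illustration only.
sample_scores = [
    {"similarity": 90, "preciseness": 85},
    {"similarity": 40, "preciseness": 75},
    {"similarity": 80, "preciseness": 72},
    {"similarity": 95, "preciseness": 60},
]
result = coverage_percent(sample_scores)
print(f"coverage: {result:.1f}%")
```

Just as with unit-test coverage, the number is only as meaningful as the dataset behind it, which is why picking the (instruction, response) pairs matters so much.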

We tested it with the "Coding" category of the HuggingFaceH4/no_robots dataset, which has about 300 SFT training data points under that category. After fine-tuning the Gemma 7B model on them, the result was very poor: LLaMA Duo's evaluation tool gave < 20% on the similarity and preciseness metrics on the test split.

So, we used LLaMA Duo's synthetic data generation tool to generate 60k data points that look similar to the original dataset. We first created ~10k synthetic data points, then created 50k more based on the synthetic dataset itself.

After fine-tuning Gemma 7B on the 60k synthetic dataset, the evaluation results rose to 80~90%. Also, when testing the model in the UI, it tends to give good responses.

It is a good showcase of transitioning from a service LLM to an sLLM, or of having a backup sLLM for service LLM failure scenarios. I am going to expand these experiments to all categories of the no_robots dataset, which will generate roughly > 100k data points.

Here are some links:
- LLaMA Duo project repo: https://github.com/deep-diver/llamaduo
- 60k Coding synthetic dataset: chansung/merged_ds_coding
- Fine-tuned Gemma 7B model: chansung/coding_llamaduo_60k_v0.2

chansung

posted an update 12 months ago

Post

4413

💻 Smoothing the Transition from Service LLM to Local LLM

Imagine your go-to LLM service is down, or you need to use it offline – yikes! This project is all about having that "Plan B" ready to go. Here's LLaMA Duo I've been building with @sayakpaul :

✨ Fine-tune a smaller LLM: We used Hugging Face's alignment-handbook to teach a smaller-sized LLM to mimic my favorite large language model. Think of it as that super-smart AI assistant getting a capable understudy.

🤖 Batch Inference: Let's get that fine-tuned LLM working! My scripts generate lots of text like a champ, and we've made sure things run smoothly even with bigger workloads.

🧐 Evaluation: How well is my small LLM doing? We integrated with the Gemini API to use it as an expert judge – it compares my model's work to the original. Talk about a tough critic!

🪄 Synthetic Data Generation: Need to boost that model's performance? Using Gemini's feedback, we can create even more training data, custom-made to make the LLM better.

🧱 Building Blocks: This isn't just a one-time thing – it's a toolkit for all kinds of LLMOps work. Want to change your evaluation metrics? Bring in models trained differently? Absolutely, let's make it happen.

Why this project is awesome:

💪 Reliability: Keep things running no matter what happens to your main LLM source.
🔒 Privacy: Process sensitive information on your own terms.
🗺️ Offline capable: No internet connection? No problem!
🕰️ Version Control: Lock in your favorite LLM's behavior, even if the service model changes.

We're excited to share the code on GitHub. Curious to see what you all think! 👉🏻 https://github.com/deep-diver/llamaduo

chansung

posted an update about 1 year ago

Post

2539

Realize your LLM-powered ideas on Hugging Face Spaces.

I made a Space for you to duplicate. It comes with Gradio and an LLM served by Hugging Face's efficient Text Generation Inference (TGI) framework, packed into a single machine.

It provides a sample app code snippet with gr.ChatInterface. However, it is not limited to chat usage: you can leverage the efficiency of TGI for any sort of app built in Gradio.

Have you ever enjoyed playing with Hugging Chat? Then you will enjoy building your own ideas with this, because both are built on top of TGI!

Focus on your app code, and go beyond chat!

chansung/gradio_together_tgi

chansung

posted an update about 1 year ago

Post

🎥 🤾 Vid2Persona: talk to a person from a video clip

A fun project over the last week with @sayakpaul . It has a simple pipeline from extracting traits of video characters to chatting with them.

Under the hood, this project leverages the power of both commercial and open source models. We used Google's Gemini 1.0 Pro Vision model to understand the video content directly, then we used HuggingFaceH4/zephyr-7b-beta model to make conversation!

Try it on Hugging Face Space and let us know what you think.
: chansung/vid2persona

The Space application is a dedicated implementation for the ZeroGPU environment + Hugging Face Inference API with a PRO account. If you wish to host it in your own environment, consider duplicating the Space or running it locally with the project repository
: https://github.com/deep-diver/Vid2Persona

chansung

posted an update about 1 year ago

Post

Updating PaperQA Gradio app and Hugging Face Space.
: Link ➡️ chansung/paper_qa
: Standalone repo ➡️ https://github.com/deep-diver/paperqa-ui

The final goal is to let people have their own paper archive. In the end, you will be able to easily *clone* it locally or on a Hugging Face Space with a Google Gemini API key (which is free) and a Hugging Face access token. You can just drop arXiv IDs at the bottom, and all the automatically analyzed papers are archived in a Hugging Face Dataset repo.

Here are a few of the updates included. Dig into the source code if you want similar features for your use cases!
🖥️ making a complex UI + fully responsive
+ making the UI as quick as possible (avoid server-client round trips when possible)
💬 Permanent chat history management with in-browser local storage
+ Chat history management *per* paper
+ Chat history management in lazy mode (with so many papers, it is impossible to create chat history for every single paper beforehand, so each history is created on demand)

Current plan is to support Gemini and any open source models on Hugging Face PRO account, but will expand it to GPT4 soon.

Any suggestions on this project are welcome! Possibly:
- hooking up a RAG system (open models' context length is small)
- hooking up an Internet search system
- image/figure analysis

chansung

posted an update about 1 year ago

Post

Understand research papers more easily with Q&As automatically generated by an LLM (Gemini 1.0 Pro). For this purpose, I have built two projects.

- [Auto Paper Analysis](https://github.com/deep-diver/auto-paper-analysis) lets you generate QAs on a list of papers. The paper list can be specified either from Hugging Face's Daily Papers or as a set of raw arXiv IDs. The generated QA dataset can then be pushed to a Hugging Face Dataset. Refer to the attached image.

- [PaperQA Space application](chansung/paper_qa) shows how to interact with the generated QA dataset. Search for a paper by keyword or date, then understand it with the QAs (in ELI5 and technical versions). Check out the attached video, or visit the Space directly.

This is a baby step toward automated paper analysis (summarization), to make it easier to consume the exploding amount of information in the field of AI. In the next phase, I will need to spend time enhancing the prompt engineering, UI/UX (such as a Like/Dislike system), ...

However, in the meantime, I hope this project can be helpful for anyone who struggles to understand papers (new papers come out before I have even finished reading yesterday's paper)!

Also, any suggestion to improve this, please let me know :)

chansung

posted an update about 1 year ago

Post

Update on the Newsletter of 🤗 Daily Paper

Automatic Korean translation is integrated. "KO" links now appear in the newsletter, and they take you to the translated version of the full paper. This is done with the following workflow.

1. Grab the list of arXiv IDs from the 🤗 Daily Paper API
2. Distribute sub-lists of arXiv IDs across VMs (possibly spot instances, since the job ends shortly)
3. Commit & push the translated paper in HTML to the designated GitHub repository
4. Newsletter will include the links to the HTML of each paper
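Step 2 of the workflow above, splitting the arXiv IDs into per-VM sub-lists, could look roughly like this. It is a minimal sketch, not the project's code: `split_into_sublists` is a hypothetical helper, the round-robin strategy is an assumption, and the sample IDs are simply ones that appear elsewhere on this page.

```python
from typing import List


def split_into_sublists(arxiv_ids: List[str], num_workers: int) -> List[List[str]]:
    """Split arXiv IDs round-robin so each VM gets a similar share of papers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    sublists = [arxiv_ids[i::num_workers] for i in range(num_workers)]
    # Drop empty sub-lists so no VM is started without work.
    return [sub for sub in sublists if sub]


ids = ["2412.06071", "2408.13467", "2501.17161", "2501.09891"]
batches = split_into_sublists(ids, num_workers=3)
for worker, batch in enumerate(batches):
    print(f"VM {worker}: {batch}")
```

Each sub-list would then be handed to one VM job, which handles the translation and the commit & push for its own papers.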

Job distribution across a number of VMs is done super easily with [dstack](https://dstack.ai/), and the translation sub-workflow consists of 1) downloading the PDF of each paper with the arxiv-dl package, 2) converting PDF => text with the nougat-ocr package, 3) translating the English text into Korean line by line with a custom-trained model (nlp-with-deeplearning/enko-t5-small-v0) in 🤗 transformers, and 4) reformatting the translation into HTML.
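The last two steps of the sub-workflow (line-by-line translation, then HTML output) can be sketched as below. This is only an assumption-laden illustration: `translate_to_html` is a hypothetical helper, and the `translate` argument is a stand-in for the real model call, replaced here by `str.upper` so the snippet runs without any model.

```python
import html
from typing import Callable


def translate_to_html(text: str, translate: Callable[[str], str]) -> str:
    """Translate text line by line and wrap every translated line in <p>."""
    paragraphs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from PDF-to-text conversion
        # Escape the translation so stray "<" or "&" cannot break the page.
        paragraphs.append(f"<p>{html.escape(translate(line))}</p>")
    return "\n".join(paragraphs)


# str.upper is only a placeholder for the English-to-Korean translator.
page = translate_to_html("First line.\n\nSecond <line>.", str.upper)
print(page)
```

Passing the translator in as a function keeps the HTML step independent of the model, so a different translation model could be swapped in without touching the formatting code.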

Many people in Korea are not fluent in English but want to learn about new stuff in AI, so they usually use Google Translate or other services. This is why I made this feature for easier and direct access to the SOTA knowledge.

Are there other countries with similar needs? If so, it would be wonderful to cooperate to support more languages. Please reach out if you are interested in this.

PS: I always wanted to show the usefulness of open ML models by building a well-working end-to-end product, and this newsletter shows it by featuring T5ForConditionalGeneration (translation) and SOLAR LLM (summarization).

If you want to subscribe to the newsletter
: https://groups.google.com/g/hf-daily-paper-newsletter

If you want to look into the source code
: https://github.com/deep-diver/hf-daily-paper-newsletter

ExpanLLM

AI & ML interests

Expansion-LLM's activity

Team members 2

Expansion-LLM's activity