optimum-internal-testing (Optimum Internal Testing)

optimum-internal-testing-user

updated a model 1 day ago

optimum-internal-testing/tiny_random_bert_neuronx

Feature Extraction • Updated 1 day ago • 617

dacorvo

updated a model 1 day ago

optimum-internal-testing/neuron-testing-cache

Updated 1 day ago

optimum-internal-testing-user

updated a model 1 day ago

optimum-internal-testing/neuron-testing-cache

Updated 1 day ago

sayakpaul

posted an update 1 day ago

Post

2272

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

regisss

posted an update 7 days ago

Post

867

Nice to see day 1 support of Falcon 3 on Gaudi with Optimum Habana!

👉 https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-falcon-3-fdn-models.html

sayakpaul

posted an update 7 days ago

Post

1549

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

1 reply

sayakpaul

posted an update 16 days ago

Post

2040

Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component for modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences

7 replies

sayakpaul

posted an update 16 days ago

Post

2096

The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130

sayakpaul

authored a paper 19 days ago

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published 20 days ago • 27

sayakpaul

posted an update 26 days ago

Post

1466

Let 2024 be the year of video model fine-tunes!

Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1

sayakpaul

posted an update about 1 month ago

Post

2600

It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes

1 reply

regisss

posted an update 2 months ago

Post

1378

Interested in performing inference with an ONNX model?⚡️

The Optimum docs on model inference with ONNX Runtime are now much clearer and simpler!

You want to deploy your favorite model on the hub but you don't know how to export it to the ONNX format? You can do it in one line of code as follows:

from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the model from the hub and export it to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

Check out the whole guide 👉 https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models

sayakpaul

posted an update 3 months ago

Post

2752

Did a little experimentation on resizing pre-trained LoRAs on Flux. I explored two themes:

* Decrease the rank of a LoRA
* Increase the rank of a LoRA

The first one is helpful in reducing memory requirements if the LoRA is of a high rank, while the second one is merely an experiment. Another implication of this study is in the unification of LoRA ranks when you would like to torch.compile() them.

Check it out here:
sayakpaul/flux-lora-resizing

1 reply

sayakpaul

authored a paper 4 months ago

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24 • 24

sayakpaul

posted an update 4 months ago

Post

2945

Here is a hackable and minimal implementation showing how to perform distributed text-to-image generation with Diffusers and Accelerate.

Full snippet is here: https://gist.github.com/sayakpaul/cfaebd221820d7b43fae638b4dfa01ba

With @JW17

sayakpaul

posted an update 5 months ago

Post

4478

Flux.1-Dev like images but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday 🤗

4 replies

sayakpaul

posted an update 5 months ago

Post

3793

With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.

We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.

We demonstrate excellent memory savings at a small cost in inference latency, which is expected to improve in the coming days.
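To make the memory trade-off concrete, here is a minimal sketch of symmetric absmax INT8 quantization in plain Python. It is an illustration of the general idea only, not Quanto's implementation; the function names and the toy weights are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical toy weights, standing in for one row of a transformer layer.
weights = [0.5, -1.27, 0.03, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each quantized weight fits in one byte instead of the two bytes of a BF16 value or the four bytes of an FP32 value, which is where the memory savings come from; the extra quantize and dequantize work is one source of the latency cost.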

Diffusers 🤝 Quanto ❤️

This was a juicy collaboration between @dacorvo and myself.

Check out the post to learn all about it
https://huggingface.co/blog/quanto-diffusers

3 replies

sayakpaul

posted an update 6 months ago

Post

2207

Were you aware that we have a dedicated guide on different prompting mechanisms to improve the image generation quality? 🧨

Takes you through simple prompt engineering, prompt weighting, prompt enhancement using GPT-2, and more.

Check out the guide here 🦯
https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts

1 reply

IlyasMoutawwakil

posted an update 6 months ago

Post

3995

Last week, Intel's new Xeon CPUs, Sapphire Rapids (SPR), landed on Inference Endpoints and I think they've got the potential to reduce the cost of your RAG pipelines 💸

Why? Because they come with Intel® AMX support, which is a set of instructions that support and accelerate BF16 and INT8 matrix multiplications on CPU ⚡

I went ahead and built a Space to showcase how to efficiently deploy embedding models on SPR for both retrieving and ranking documents, with Haystack-compatible components: https://huggingface.co/spaces/optimum-intel/haystack-e2e

Here's how it works:

- Document Store: A FAISS document store containing the seven-wonders dataset, embedded, indexed and stored on the Space's persistent storage to avoid unnecessary re-computation of embeddings.

- Retriever: It embeds the query at runtime and retrieves from the dataset N documents that are most semantically similar to the query's embedding.
We use the small variant of the BGE family here because we want a model that's fast to run on the entire dataset and has a small embedding space for fast similarity search. Specifically, we use an INT8 quantized bge-small-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance.

- Ranker: It re-embeds the retrieved documents at runtime and re-ranks them based on semantic similarity to the query's embedding. We use the large variant of the BGE family here because it's optimized for accuracy, allowing us to filter the k most relevant documents that we'll use in the LLM prompt. Specifically, we use an INT8 quantized bge-large-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance.
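The retrieve-then-rank flow described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the Space's code: the hand-written vectors stand in for bge-small and bge-large embeddings, a dictionary stands in for the FAISS store, and every name below is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "small model" embeddings: cheap, low-dimensional, precomputed and stored.
SMALL = {
    "pyramid": [1.0, 0.0],
    "colossus": [0.8, 0.6],
    "gardens": [0.0, 1.0],
    "lighthouse": [0.6, 0.8],
}

# Toy "large model" embeddings: computed at query time for candidates only.
LARGE = {
    "pyramid": [1.0, 0.0, 0.0],
    "colossus": [0.2, 0.9, 0.1],
    "gardens": [0.0, 0.1, 1.0],
    "lighthouse": [0.7, 0.7, 0.1],
}


def retrieve_then_rank(query_small, query_large, n, k):
    """Stage 1: fetch n candidates with the small model. Stage 2: keep the best k."""
    candidates = top_k(query_small, SMALL, n)
    rerank_pool = {doc_id: LARGE[doc_id] for doc_id in candidates}
    return candidates, top_k(query_large, rerank_pool, k)


candidates, final = retrieve_then_rank([1.0, 0.2], [0.3, 0.9, 0.0], n=3, k=1)
print(candidates, final)
```

In this toy run the small model puts "pyramid" first, while the larger model promotes "colossus" during re-ranking, which is the point of the second stage: the expensive model only ever scores the N candidates, never the whole dataset.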

Space: https://huggingface.co/spaces/optimum-intel/haystack-e2e
Retriever IE: optimum-intel/fastrag-retriever
Ranker IE: optimum-intel/fastrag-ranker

sayakpaul

posted an update 6 months ago

Post

3131

What is your favorite part of our Diffusers integration of Stable Diffusion 3?

My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.

Learn more about it here:
https://huggingface.co/blog/sd3

Team members 11
