Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

Recent Activity

Articles

Organizations

Hugging Face's profile picture PyTorch Image Models's profile picture Spaces-explorers's profile picture Flax Community's profile picture LAION eV's profile picture Pixel Parsing's profile picture

rwightman's activity

posted an update 7 days ago
view post
Post
1224
I re-worked the JuptyerLab Space template recently. It's optimized for timm use, but will work great with transformers and other libs. Updated the base image, Python 3.12, Pillow-SIMD before better CPU use with image preprocessing, and made a number of other tweaks. From the Jupyter launcher you can run the terminal and setup a timm environment in moments with setup_timm_dev or setup_timm_scripts helpers. Give it a try, timm/jupyterlab-timm
reacted to ariG23498's post with ๐Ÿš€ 8 days ago
reacted to merve's post with ๐Ÿ”ฅ 14 days ago
view post
Post
1778
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with MIT license ๐Ÿ’— ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models are capable of tasks involving vision-language understanding and visual referrals (referring segmentation) both for images and videos โฏ๏ธ

> The models come in 1B, 4B and 8B and are based on InternVL2.5 for base architecture and Qwen2, Qwen2.5 and InternLM2 for language model part (depending on the checkpoint)

> The model is very interesting, it has different encoders for different modalities each (visual prompt, text prompt, image and video) then it concatenates these to feed into LLM ๐Ÿ’ฌ

the output segmentation tokens are passed to SAM2, to sort of match text (captions or semantic classes) to masks โคต๏ธ

> Their annotation pipeline is also interesting, they seems to use two open large vision LMs to refine the annotations, and have different levels of descriptions to provide consistency.
  • 1 reply
ยท
posted an update 15 days ago
view post
Post
1114
New timm 1.0.13 and OpenCLIP 2.30.0 releases to start the year. Both modest but worthwhile updates.

timm added a number of new model weights, supporting loading of:
* PaliGemma2 encoders (ported from google/paligemma-2-release-67500e1e1dbfdd4dee27ba48)
* AIMv-2 encoders (ported from apple/aimv2-6720fe1558d94c7805f7688c)

A few higher resolution 384x384 ConvNeXt-Nano ImageNet-12k pretrain & finetunes. See other changes here: https://github.com/huggingface/pytorch-image-models/releases/tag/v1.0.13

And support added in both OpenCLIP and timm for two CLIP models that were missed. The DFN L/14 is ๐Ÿ”ฅ
* DFN CLIP L/14 w/ 39B samples seen - apple/DFN2B-CLIP-ViT-L-14-39B, timm/vit_large_patch14_clip_224.dfn2b_s39b
* MetaCLIP H/14 (altogether) - timm/vit_huge_patch14_clip_224.metaclip_altogether

And last, ~70-80 models that were relying on timm remapping from OpenCLIP got their own timm hub instances to allow use with the upcoming Transformers TimmWrapperModel
reacted to aaditya's post with ๐Ÿ”ฅ about 1 month ago
view post
Post
3310
Last Week in Medical AI: Top Research Papers/Models ๐Ÿ”ฅ
๐Ÿ… (December 7 โ€“ December 14, 2024)

Medical LLM & Other Models
- PediaBench: Chinese Pediatric LLM
- Comprehensive pediatric dataset
- Advanced benchmarking platform
- Chinese healthcare innovation
- BiMediX: Bilingual Medical LLM
- Multilingual medical expertise
- Diverse medical knowledge integration
- Cross-cultural healthcare insights
- MMedPO: Vision-Language Medical LLM
- Clinical multimodal optimization
- Advanced medical image understanding
- Precision healthcare modeling

Frameworks and Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization

Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking

Medical LLM Applications
- BRAD: Digital Biology Language Model
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model

Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice

Full thread in detail: https://x.com/OpenlifesciAI/status/1867999825721242101
ยท
reacted to julien-c's post with ๐Ÿ”ฅ about 1 month ago
view post
Post
8636
After some heated discussion ๐Ÿ”ฅ, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community ๐Ÿ”ฅ

cc: @reach-vb @pierric @victor and the HF team
ยท
replied to their post about 2 months ago
view reply

Yeah, it's been working out well in runs so far, but as is often the case with new optimizers or optimizer enhancements milage can vary depending on many variables, curious to know how it works for your case. Case in point I had some great fine-tune results with adopt, but in this mini-imagenet case it rather flopped. But MARS, is actually doing really well here, and MARS w/ caution even better so it's very hard to cover all ground with new optimizers. MARS results to be added soon though

posted an update about 2 months ago
view post
Post
1407
There's a new timm release, v 1.0.12, with a focus on optimizers. The optimizer factory has been refactored, there's now a timm.optim.list_optimizers() and new way to register optimizers and their attributes. As always you can use an timm optimizer like a torch one, just replace torch.optim with timm.optim

New optimizers include:
* AdafactorBigVision - adfactorbv
* ADOPT - adopt / adoptw (decoupled decay)
* MARS - mars
* LaProp - laprop
* Cautious Optimizers - a modification to all of the above, prefix with c as well as cadamw, cnadamw, csgdw, clamb, crmsproptf

I shared some caution comparisons in this model repo: rwightman/timm-optim-caution

For details, references, see the code: https://github.com/huggingface/pytorch-image-models/tree/main/timm/optim

  • 3 replies
ยท
reacted to jeffboudier's post with ๐Ÿค— 2 months ago
reacted to merve's post with ๐Ÿ”ฅ 2 months ago
view post
Post
2619
What a week! A recap for everything you missed โ„๏ธ
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal โœจ
> Mistral AI
released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina has released Jina-CLIP-v2 0.98B multilingual multimodal embeddings
> Apple released new SotA vision encoders AIMv2

LLMs ๐Ÿฆ™
> AllenAI dropped a huge release of models, datasets and scripts for Tรผlu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they have developed called RLVR
> Jina has released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: synthetic dataset used to align SmolLM2 using supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation ๐Ÿ–ผ๏ธ
> Black Forest Labs released Flux 1. tools: four new models for different image modifications and two LoRAs to do image conditioning and better steer generations

Lastly Hugging Face released a new library Observers: a lightweight SDK for monitoring interactions with AI APIs and easily store and browse them ๐Ÿ“š
$ pip install observers
  • 3 replies
ยท
reacted to BrigitteTousi's post with ๐Ÿš€ 2 months ago
posted an update 2 months ago
view post
Post
1344
I'm currently on a push to expand the scope of image based datasets on the Hub. There's certainly a lot already, but for anyone who's looked closely, there's not a whole lot of standardization. I am to fix that, datasets under the https://huggingface.co/timm and https://huggingface.co/pixparse orgs will serve as canonical examples for various task / modality combinations and be useable without fuss in libraries like timm, OpenCLIP, and hopefully more.

I just uploaded the first multi-label dataset that I'll support with timm scripts soon: timm/plant-pathology-2021

Next up object detection & segmentation! I've got an annotation spec sorted out, a lot of datasets ready to rip, and yeah that means timm support for object detection, eventually segmentation, is finally under development :O
posted an update 2 months ago
view post
Post
1066
Want to validate some hparams or figure out what timm model to use before commiting to download or training with a large dataset? Try mini-imagenet: timm/mini-imagenet

I had this sitting on my drive and forgot where I pulled it together from. It's 100 classes of imagenet, 50k train and 10k val images (from ImageNet-1k train set), and 5k test images (from ImageNet-1k val set). 7.4GB instead of > 100GB for the full ImageNet-1k. This ver is not reduced resolution like some other 'mini' versions. Super easy to use with timm train/val scripts, checkout the dataset card.

I often check fine-tuning with even smaller datasets like:
* timm/resisc45
* timm/oxford-iiit-pet
But those are a bit small to train any modest size model w/o starting from pretrained weights.
reacted to dvilasuero's post with ๐Ÿš€ 2 months ago
reacted to sayakpaul's post with ๐Ÿš€ 2 months ago
view post
Post
2649
It's been a while we shipped native quantization support in diffusers ๐Ÿงจ

We currently support bistandbytes as the official backend but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
  • 1 reply
ยท
posted an update 2 months ago
view post
Post
1593
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below, not that different, but the Adopt 180 epochs closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request as a user reported impressive results using the 'Conv Large' ImagNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending, the weights do behave differently with the 180 vs 250 epochs and the Adopt vs AdamW optimizer.

reacted to merve's post with ๐Ÿš€ 3 months ago
posted an update 3 months ago
view post
Post
655
A new timm release (1.0.11) is out now. A also wrote an article on one of the included models: https://huggingface.co/blog/rwightman/mambaout

Featured in the release are:
* The MambaOut model, a cheeky arch inspired by SSM but without the SSM part, a ConvNeXt with gating.
* Several timm trained MambaOut variations with arch tweaks and ImageNet-12k pretrain to verify scaling, supplement ported weights.
* The smallest MobileNetV4, a 0.5x width scaled Conv-Small.
* Two impressive MobileNetV3 Large models outperforming all previous, using MNV4 Small recipe.
* 'Zepto,' a new compact ConvNeXt variant even smaller than the previous Atto, 2.2M params, RMSNorm, and solid results for its size.
* Newly ported SigLIP SO400M/16 ViT multi-lingual weights, the largest i18n weights, prevous was B/16.
* Two ImageNet-1k fine-tuned SigLIP SO400M models at 378x378
* InternViT 300M weight port. A really solid ViT encoder distilled from OpenGVLab 6B VL model encoder.
* An assortment of very small, sub 1M param pretrained test models to improve library unit tests and serve low-resource applications.
reacted to merve's post with ๐Ÿ”ฅ 4 months ago
view post
Post
3783
Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV ๐Ÿค—

Here's a compilation of my top picks:
- Sapiens is family of foundation models for human-centric depth estimation, segmentation and more, all models have open weights and demos ๐Ÿ‘

All models have their demos and even torchscript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is state-of-the-art consistent 3D generation model from images

Model: facebook/vfusion3d
Demo: facebook/VFusion3D

- CoTracker is the state-of-the-art point (pixel) tracking model

Demo: facebook/cotracker
Model: facebook/cotracker
reacted to Wauplin's post with ๐Ÿ”ฅ 4 months ago
view post
Post
2952
What a great milestone to celebrate! The huggingface_hub library is slowly becoming a cornerstone of the Python ML ecosystem when it comes to interacting with the @huggingface Hub. It wouldn't be there without the hundreds of community contributions and feedback! No matter if you are loading a model, sharing a dataset, running remote inference or starting jobs on our infra, you are for sure using it! And this is only the beginning so give a star if you wanna follow the project ๐Ÿ‘‰ https://github.com/huggingface/huggingface_hub
  • 1 reply
ยท