# RobustVLM (Foundation Models) via Object-centric Learning

## Table of Contents
- [Installation](#installation)
- [Stage 1: Get Object-centric Models](#stage-1-get-object-centric-models)
- [Dataset](#dataset)
- [Training](#training)

## Installation
Create and activate the anaconda environment:
```shell
conda create -n robustclip python=3.11
```
```shell
conda activate robustclip
```
The code is tested with Python 3.11. To install the required packages, run:
```shell
pip install -r requirements.txt
```
To install open_clip_torch locally, run:
```shell
cd ./open_clip_torch
```
```shell
python setup.py develop
```

## Stage 1: Get Object-centric Models

### Dataset
Prepare the ImageNet dataset in a torch `ImageFolder`-style format:
```
dataset_path
└─imagenet
    └─train
        └─n01440764
            xxxxxx.JPEG
            .....
        └─......
    └─val
        └─n04254680
            xxxxxx.JPEG
            .....
        └─......
```

### Training
- Slot-Attention on 4 GPUs
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train.training_clip_slots --clip_model_name ViT-L-14 --pretrained openai --dataset imagenet --imagenet_root /.../.../dataset_path/imagenet --template std --output_normalize False --steps 1000000 --warmup 10000 --batch_size 128 --loss l2 --opt adamw --lr 5e-5 --wd 1e-4 --attack pgd --inner_loss l2 --norm linf --eps 4 --iterations_adv 10 --stepsize_adv 1 --wandb False --output_dir ./output_slots --experiment_name SLOTS --log_freq 1000 --eval_freq 1000
```
The reconstruction results after slot attention and the checkpoints are stored in `./output_slots/ViT-L-14_openai_imagenet_l2_imagenet_SLOTS_xxxxx`.
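Before launching training, it can help to confirm that the ImageNet folders above are laid out the way `torchvision.datasets.ImageFolder` expects. The snippet below is a minimal sketch, not part of the training pipeline; the dataset path is a placeholder, and the transform is only illustrative.

```python
# Minimal sketch: verify the ImageFolder-style layout before training.
# The dataset path is a placeholder; point it at your own dataset_path/imagenet.
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("/path/to/dataset_path/imagenet/train", transform=preprocess)
val_set = datasets.ImageFolder("/path/to/dataset_path/imagenet/val", transform=preprocess)

print(f"train: {len(train_set)} images, {len(train_set.classes)} classes")
print(f"val:   {len(val_set)} images, {len(val_set.classes)} classes")

# A quick batch check confirms that images decode and collate correctly.
loader = torch.utils.data.DataLoader(train_set, batch_size=8, num_workers=2)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g. torch.Size([8, 3, 224, 224]) torch.Size([8])
```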
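To work with the saved outputs afterwards, the sketch below shows one way to inspect a checkpoint from the output directory. The checkpoint filename and its internal key layout are assumptions (they depend on how `train.training_clip_slots` saves its state), so inspect the file before loading anything into a model.

```python
# Hedged sketch: inspect a checkpoint written to ./output_slots.
# The filename "checkpoint.pt" and the key layout are assumptions, not the
# repository's documented format; adjust them to what the training script saves.
import torch
import open_clip

ckpt_path = "./output_slots/ViT-L-14_openai_imagenet_l2_imagenet_SLOTS_xxxxx/checkpoint.pt"  # hypothetical name
ckpt = torch.load(ckpt_path, map_location="cpu")

# Print the top-level structure to see whether it is a bare state_dict
# or a wrapped dict (e.g. with optimizer state and step counters).
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])

# If the checkpoint turns out to hold a plain state_dict for the CLIP vision
# tower, it could be loaded into an open_clip ViT-L-14 roughly like this:
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
# model.visual.load_state_dict(ckpt)  # uncomment only once the key layout is confirmed
```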