---
license: apache-2.0
language:
- en
---
# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)
[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)
This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation" accepted by CVPR 2024.
## 🔥What's New
- (2024.3.14) Our checkpoints are available to the public!
- (2024.3.12) Our code is available to the public 🌲!
- (2024.2.27) Our paper (COMBO) has been accepted by CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.
<!-- ## 🪵 TODO List -->
## 🛠️ Getting Started
### 1. Environments
- Linux or macOS with Python ≥ 3.6
```shell
# recommended
pip install -r requirements.txt
pip install soundfile
# build MSDeformAttention
cd model/modeling/pixel_decoder/ops
sh make.sh
```
- Preprocessing for detectron2
To use the Siam-Encoder Module (SEM), we modify one line of detectron2.
The file that needs attention is located at:
`conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py`
(replace `xxx` with your own environment)
Commenting out the following line at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) allows the code to run without errors:
```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```
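Rather than editing the installed file by hand, the offending line can be commented out programmatically. Below is a minimal stdlib sketch; the helper name `comment_out_line` and the path in the commented-out example are illustrative, not part of this repository:

```python
from pathlib import Path

def comment_out_line(path, needle):
    """Comment out every line in `path` containing `needle` (idempotent)."""
    src = Path(path)
    patched = []
    for line in src.read_text().splitlines(keepends=True):
        if needle in line and not line.lstrip().startswith("#"):
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + "# " + line.lstrip()  # preserve indentation
        patched.append(line)
    src.write_text("".join(patched))

# Example (replace with your own site-packages path):
# comment_out_line(
#     "conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py",
#     "Cannot match one checkpoint key",
# )
```

Running it a second time is a no-op, so it is safe to include in a setup script.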
- Install Semantic-SAM (Optional)
```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```
Find out more at [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM)
### 2. Datasets
Please refer to [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench) to download the datasets. You can put the data under the `data` folder or use your own folder; remember to modify the paths in the config files accordingly. The `data` directory layout is as below:
```
|--AVS_dataset
|--AVSBench_semantic/
|--AVSBench_object/Multi-sources/
|--AVSBench_object/Single-source/
```
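A quick stdlib check that the layout above is in place. The root path `data` is an assumption; adjust it to wherever you placed `AVS_dataset`:

```python
from pathlib import Path

def check_avs_layout(root="data"):
    """Return the expected AVSBench sub-folders missing under `root`."""
    expected = [
        "AVS_dataset/AVSBench_semantic",
        "AVS_dataset/AVSBench_object/Multi-sources",
        "AVS_dataset/AVSBench_object/Single-source",
    ]
    root = Path(root)
    return [p for p in expected if not (root / p).is_dir()]

missing = check_avs_layout()
if missing:
    print("Missing:", *missing, sep="\n  ")
```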
### 3. Download Pre-Trained Models
- The pretrained backbones are available from the AVSBench benchmark's pretrained backbones [TODO].
```
|--pretrained
|--detectron2/R-50.pkl
|--detectron2/d2_pvt_v2_b5.pkl
|--vggish-10086976.pth
|--vggish_pca_params-970ea276.pth
```
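The same kind of sanity check works for the checkpoint files (filenames taken from the tree above; the `pretrained` root is an assumption):

```python
from pathlib import Path

CHECKPOINTS = [
    "detectron2/R-50.pkl",
    "detectron2/d2_pvt_v2_b5.pkl",
    "vggish-10086976.pth",
    "vggish_pca_params-970ea276.pth",
]

def missing_checkpoints(root="pretrained"):
    """List expected checkpoint files not found under `root`."""
    return [f for f in CHECKPOINTS if not (Path(root) / f).is_file()]

print(missing_checkpoints() or "All pretrained weights found.")
```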
### 4. Maskiges Pre-generation
- Generate class-agnostic masks (Optional)
```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```
- Generate Maskiges (Optional)
```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```
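The scripts above convert class-agnostic masks into RGB "Maskiges". As a rough illustration of the idea only (the actual palette and encoding live in `avs_tools/pre_mask2rgb/` and may differ), mapping each mask ID to a fixed color might look like:

```python
def mask_to_rgb(mask, palette=None):
    """Map a 2-D integer mask (list of lists) to an RGB image (nested lists).

    `palette` maps mask IDs to (R, G, B) tuples; unknown IDs get a
    deterministic pseudo-random color so distinct segments stay distinct.
    """
    palette = dict(palette or {0: (0, 0, 0)})  # ID 0 = background (assumed)

    def color(i):
        if i not in palette:
            # deterministic hash -> color; illustrative only
            palette[i] = ((37 * i) % 256, (97 * i) % 256, (173 * i) % 256)
        return palette[i]

    return [[color(i) for i in row] for row in mask]

rgb = mask_to_rgb([[0, 1], [1, 2]])
```

Every pixel with the same mask ID receives the same color, so the resulting image preserves the segmentation while being a plain RGB input.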
- Move Maskiges to the following folder
Note: For convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets at the TODO Hugging Face link.
```
|--AVS_dataset
|--AVSBench_semantic/pre_SAM_mask/
|--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
|--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```
### 5. Train
```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```
### 6. Test
```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```
## 🤝 Citing COMBO
```
@misc{yang2023cooperation,
  title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
  author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
  year={2023},
  eprint={2312.06462},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```