---
license: apache-2.0
language:
- en
---
# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)
[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)
This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation" accepted by CVPR 2024.
## 🔥What's New
- (2024.03.14) Our checkpoints are available to the public!
- (2024.03.12) Our code is available to the public🌲!
- (2024.02.27) Our paper (COMBO) was accepted by CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.
<!-- ## 🪵 TODO List -->
## 🛠️ Getting Started
### 1. Environments
- Linux or macOS with Python ≥ 3.6
```shell
# recommended
pip install -r requirements.txt
pip install soundfile
# build MSDeformAttention
cd model/modeling/pixel_decoder/ops
sh make.sh
```
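After `make.sh` completes, a quick import check can confirm the op compiled. The module name `MultiScaleDeformableAttention` is an assumption based on the standard MSDeformAttention build; adjust it if your build names the extension differently.

```python
# Sanity check (a sketch): after `sh make.sh`, the compiled CUDA extension
# is typically importable as `MultiScaleDeformableAttention` (assumed name).
import importlib.util

def op_is_built(module_name="MultiScaleDeformableAttention"):
    """Return True if the compiled extension can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    print("MSDeformAttention built:", op_is_built())
```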
- Preprocessing for detectron2
To use the Siam-Encoder Module (SEM), we modify one line of detectron2's code.
The file that requires attention is located at:
`conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py`
(replace `xxx` with your own environment name)
Commenting out the following code at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) will allow the code to run without errors:
```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```
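If you prefer not to edit the installed file by hand, a small helper (a sketch; the path must be adapted to your own environment, and `comment_out` is a hypothetical name) can comment the offending line out in place:

```python
# Hypothetical patch helper: comments out any line containing `needle`
# in the given file, preserving its indentation. Point `path` at your
# environment's copy of detectron2/checkpoint/c2_model_loading.py.
from pathlib import Path

def comment_out(path, needle):
    """Prefix each uncommented line containing `needle` with '# '."""
    patched = []
    for line in Path(path).read_text().splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "# " + stripped
        patched.append(line)
    Path(path).write_text("".join(patched))
```

For example, `comment_out("<your env path>/c2_model_loading.py", "Cannot match one checkpoint key")` would comment out the `raise ValueError(...)` line above; running it twice is harmless because already-commented lines are skipped.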
- Install Semantic-SAM (Optional)
```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```
Find out more at [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM).
### 2. Datasets
Please refer to [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench) to download the datasets. You can put the data under the `data` folder or use a folder of your own; remember to modify the paths in the config files accordingly. The `data` directory is organized as below:
```
|--AVS_dataset
|--AVSBench_semantic/
|--AVSBench_object/Multi-sources/
|--AVSBench_object/Single-source/
```
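A quick pre-flight check can catch a misplaced dataset before training starts. This is a sketch: the folder names follow the tree above, and the root is assumed to be the current directory.

```python
# Verify the AVSBench layout described above; reports any missing folders.
from pathlib import Path

EXPECTED_DIRS = [
    "AVS_dataset/AVSBench_semantic",
    "AVS_dataset/AVSBench_object/Multi-sources",
    "AVS_dataset/AVSBench_object/Single-source",
]

def missing_dataset_dirs(root="."):
    """Return the expected sub-directories that do not exist under `root`."""
    return [p for p in EXPECTED_DIRS if not (Path(root) / p).is_dir()]

if __name__ == "__main__":
    missing = missing_dataset_dirs()
    print("All datasets found" if not missing else f"Missing: {missing}")
```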
### 3. Download Pre-Trained Models
- The pretrained backbones are available from the AVSBench benchmark's pretrained backbones [TODO].
```
|--pretrained
|--detectron2/R-50.pkl
|--detectron2/d2_pvt_v2_b5.pkl
|--vggish-10086976.pth
|--vggish_pca_params-970ea276.pth
```
### 4. Maskiges Pre-generation
- Generate class-agnostic masks (Optional)
```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```
- Generate Maskiges (Optional)
```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```
- Move the Maskiges to the following folders
Note: for convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets at the TODO Hugging Face link.
```
|--AVS_dataset
|--AVSBench_semantic/pre_SAM_mask/
|--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
|--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```
### 5. Train
```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```
### 6. Test
```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```
## 🤝 Citing COMBO
```
@misc{yang2023cooperation,
title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
year={2023},
eprint={2312.06462},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```