happyme531
committed
Commit: 50704de
Parent(s): fe8d810
Upload 12 files
- .gitattributes +3 -0
- README.md +158 -3
- convert_rknn.py +98 -0
- dog.jpg +0 -0
- export_onnx.py +278 -0
- sam2.1_hiera_large_decoder.onnx +3 -0
- sam2.1_hiera_large_encoder.rknn +3 -0
- sam2.1_hiera_small_decoder.onnx +3 -0
- sam2.1_hiera_small_encoder.rknn +3 -0
- sam2.1_hiera_tiny_decoder.onnx +3 -0
- sam2.1_hiera_tiny_encoder.rknn +3 -0
- test_onnx.py +195 -0
- test_rknn.py +178 -0
.gitattributes
CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_large_encoder.rknn filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_small_encoder.rknn filter=lfs diff=lfs merge=lfs -text
+sam2.1_hiera_tiny_encoder.rknn filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,158 @@
# Segment Anything 2.1 RKNN2

## (English README see below)

Run the powerful Segment Anything 2.1 image segmentation model on RK3588!

- Inference speed (RK3588):
    - Encoder (Tiny) (single NPU core): 3s
    - Encoder (Small) (single NPU core): 3.5s
    - Encoder (Large) (single NPU core): 12s
    - Decoder (CPU): 0.1s

- Memory usage (RK3588):
    - Encoder (Tiny): 0.95GB
    - Encoder (Small): 1.1GB
    - Encoder (Large): 4.1GB
    - Decoder: negligible

## Usage

1. Clone or download this repository. The models are large, so make sure you have enough disk space.

2. Install dependencies

```bash
pip install "numpy<2" pillow matplotlib opencv-python onnxruntime rknn-toolkit-lite2
```

3. Run

```bash
python test_rknn.py
```

You can modify this part of `test_rknn.py`
```python
def main():
    # 1. Load the original image
    path = "dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(path)
    decoder_path = "sam2.1_hiera_small_decoder.onnx"
    encoder_path = "sam2.1_hiera_small_encoder.rknn"
    ...
```

to test different models and images. Note that, unlike SAM1, the encoder and decoder here must come from the same model variant.


## Model Conversion

1. Install dependencies

```bash
pip install "numpy<2" onnxslim onnxruntime rknn-toolkit2 sam2
```

2. Download the SAM2.1 .pt model files. They can be downloaded from [here](https://github.com/facebookresearch/sam2?tab=readme-ov-file#model-description).

3. Convert the .pt models to ONNX models. Taking the Tiny model as an example:

```bash
python ./export_onnx.py --model_type sam2.1_hiera_tiny --checkpoint ./sam2.1_hiera_tiny.pt --output_encoder ./sam2.1_hiera_tiny_encoder.onnx --output_decoder sam2.1_hiera_tiny_decoder.onnx
```

4. Convert the ONNX models to RKNN models. Taking the Tiny model as an example:

```bash
python ./convert_rknn.py sam2.1_hiera_tiny
```
If constant folding fails with an error, try updating onnxruntime to the latest version.

## Known Issues

- Only image segmentation is implemented; video segmentation is not.
- Due to an RKNN-Toolkit2 issue, converting the decoder model fails, so for now it has to run on the CPU with onnxruntime, which slightly increases CPU usage.

## References

- [samexporter/export_sam21_cvat.py](https://github.com/hashJoe/samexporter/blob/cvat/samexporter/export_sam21_cvat.py)
- [SAM 2](https://github.com/facebookresearch/sam2)

## English README

Run the powerful Segment Anything 2.1 image segmentation model on RK3588!

- Inference Speed (RK3588):
    - Encoder (Tiny) (Single NPU Core): 3s
    - Encoder (Small) (Single NPU Core): 3.5s
    - Encoder (Large) (Single NPU Core): 12s
    - Decoder (CPU): 0.1s

- Memory Usage (RK3588):
    - Encoder (Tiny): 0.95GB
    - Encoder (Small): 1.1GB
    - Encoder (Large): 4.1GB
    - Decoder: Negligible

## Usage

1. Clone or download this repository. The models are large, so please ensure you have sufficient disk space.

2. Install dependencies

```bash
pip install "numpy<2" pillow matplotlib opencv-python onnxruntime rknn-toolkit-lite2
```

3. Run

```bash
python test_rknn.py
```

You can modify this part in `test_rknn.py`
```python
def main():
    # 1. Load original image
    path = "dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(path)
    decoder_path = "sam2.1_hiera_small_decoder.onnx"
    encoder_path = "sam2.1_hiera_small_encoder.rknn"
    ...
```

to test different models and images. Note that, unlike SAM1, the encoder and decoder must come from the same model variant.
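
The prompt point can be changed in the same place. In `test_rknn.py` the coordinates are given in pixels of the original image and are mapped into the letterboxed 1024x1024 model input using the `scale`, `offset_x` and `offset_y` values returned by `load_image`; the relevant lines look like this:

```python
# Prompt point in original-image pixel coordinates (label 1 = foreground point)
input_point_orig = [[189, 394]]
input_label = [1]

# Mapping into the padded 1024x1024 input, as done in test_rknn.py
input_point = [[
    int(x * scale + offset_x),
    int(y * scale + offset_y)
] for x, y in input_point_orig]
```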

## Model Conversion

1. Install dependencies

```bash
pip install "numpy<2" onnxslim onnxruntime rknn-toolkit2 sam2
```

2. Download the SAM2.1 .pt model files. You can download them from [here](https://github.com/facebookresearch/sam2?tab=readme-ov-file#model-description).

3. Convert the .pt models to ONNX models. Taking the Tiny model as an example:

```bash
python ./export_onnx.py --model_type sam2.1_hiera_tiny --checkpoint ./sam2.1_hiera_tiny.pt --output_encoder ./sam2.1_hiera_tiny_encoder.onnx --output_decoder sam2.1_hiera_tiny_decoder.onnx
```

4. Convert the ONNX models to RKNN models. Taking the Tiny model as an example:

```bash
python ./convert_rknn.py sam2.1_hiera_tiny
```
If you encounter errors during constant folding, try updating onnxruntime to the latest version.
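
Before the RKNN conversion, it can be worth a quick sanity check that the exported ONNX files load and expose the expected tensor names. A minimal sketch (not one of the repo scripts; the filename here is just an example):

```python
import onnxruntime

session = onnxruntime.InferenceSession(
    "sam2.1_hiera_tiny_encoder.onnx", providers=["CPUExecutionProvider"]
)
# export_onnx.py names the encoder I/O like this:
print([i.name for i in session.get_inputs()])   # ['image']
print([o.name for o in session.get_outputs()])  # ['high_res_feats_0', 'high_res_feats_1', 'image_embed']
```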

## Known Issues

- Only image segmentation is implemented; video segmentation is not supported.
- Due to issues with RKNN-Toolkit2, the decoder model conversion fails. Currently the decoder needs to run on the CPU using onnxruntime, which slightly increases CPU usage (see the sketch below).
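
The decoder is small, so CPU execution is cheap. A minimal sketch of driving it with onnxruntime on the CPU, using the input names and shapes defined in `export_onnx.py` / `convert_rknn.py` (in `test_rknn.py` the three feature maps come from the RKNN encoder instead of the placeholders used here):

```python
import numpy as np
import onnxruntime

# Placeholder encoder outputs; in practice these come from the RKNN encoder.
image_embed = np.zeros((1, 256, 64, 64), dtype=np.float32)
high_res_feats_0 = np.zeros((1, 32, 256, 256), dtype=np.float32)
high_res_feats_1 = np.zeros((1, 64, 128, 128), dtype=np.float32)

decoder = onnxruntime.InferenceSession(
    "sam2.1_hiera_tiny_decoder.onnx", providers=["CPUExecutionProvider"]
)
low_res_masks, iou_predictions = decoder.run(None, {
    "image_embed": image_embed,
    "high_res_feats_0": high_res_feats_0,
    "high_res_feats_1": high_res_feats_1,
    "point_coords": np.array([[[512, 512]]], dtype=np.float32),  # one point in the 1024x1024 input
    "point_labels": np.array([[1]], dtype=np.float32),           # 1 = foreground
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),
    "has_mask_input": np.zeros(1, dtype=np.float32),
})
print(low_res_masks.shape, iou_predictions.shape)  # low-res masks are 256x256 logits
```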

## References

- [samexporter/export_sam21_cvat.py](https://github.com/hashJoe/samexporter/blob/cvat/samexporter/export_sam21_cvat.py)
- [SAM 2](https://github.com/facebookresearch/sam2)
convert_rknn.py
ADDED
@@ -0,0 +1,98 @@
#!/usr/bin/env python
# coding: utf-8

import datetime
import argparse
from rknn.api import RKNN
from sys import exit
import os
import onnxslim

num_pointss = [1]
num_labelss = [1]

def convert_to_rknn(onnx_model, model_part, dataset="/home/zt/rk3588-nn/rknn_model_zoo/datasets/COCO/coco_subset_20.txt", quantize=False):
    """Convert a single ONNX model to RKNN format."""
    rknn_model = onnx_model.replace(".onnx", ".rknn")
    timedate_iso = datetime.datetime.now().isoformat()

    print(f"\nConverting {onnx_model} to {rknn_model}")

    input_shapes = None

    if model_part == "encoder":
        input_shapes = None
    elif model_part == "decoder":
        input_shapes = [
            [
                [1, 256, 64, 64],             # image_embedding
                [1, 32, 256, 256],            # high_res_feats_0
                [1, 64, 128, 128],            # high_res_feats_1
                [num_labels, num_points, 2],  # point_coords
                [num_labels, num_points],     # point_labels
                [num_labels, 1, 256, 256],    # mask_input
                [num_labels],                 # has_mask_input
            ]
            for num_labels in num_labelss
            for num_points in num_pointss
        ]

    rknn = RKNN(verbose=True)
    rknn.config(
        dynamic_input=input_shapes,
        std_values=[[255, 255, 255]] if model_part == "encoder" else None,
        quantized_dtype='w8a8',
        quantized_algorithm='normal',
        quantized_method='channel',
        quantized_hybrid_level=0,
        target_platform='rk3588',
        quant_img_RGB2BGR=False,
        float_dtype='float16',
        optimization_level=3,
        custom_string=f"converted at {timedate_iso}",
        remove_weight=False,
        compress_weight=False,
        inputs_yuv_fmt=None,
        single_core_mode=False,
        model_pruning=False,
        op_target=None,
        quantize_weight=False,
        remove_reshape=False,
        sparse_infer=False,
        enable_flash_attention=False,
    )

    ret = rknn.load_onnx(model=onnx_model)
    ret = rknn.build(do_quantization=quantize, dataset=dataset, rknn_batch_size=None)
    ret = rknn.export_rknn(rknn_model)
    print(f"Finished converting {rknn_model}\n")

def main():
    parser = argparse.ArgumentParser(description='Convert SAM models from ONNX to RKNN format')
    parser.add_argument('model_name', type=str, help='Model name, e.g.: sam2.1_hiera_tiny')
    args = parser.parse_args()

    # Build the encoder and decoder filenames
    encoder_onnx = f"{args.model_name}_encoder.onnx"
    decoder_onnx = f"{args.model_name}_decoder.onnx"

    # Check that the files exist
    for model in [encoder_onnx, decoder_onnx]:
        if not os.path.exists(model):
            print(f"Error: file {model} not found")
            exit(1)

    # Convert the encoder and decoder
    # The encoder has to be run through onnxslim first
    print("Converting encoder...")
    onnxslim.slim(encoder_onnx, output_model="encoder_slim.onnx", skip_fusion_patterns=["EliminationSlice"])
    convert_to_rknn("encoder_slim.onnx", model_part="encoder")
    os.rename("encoder_slim.rknn", encoder_onnx.replace(".onnx", ".rknn"))
    os.remove("encoder_slim.onnx")

    # convert_to_rknn(decoder_onnx, model_part="decoder")  # broken: RKNN-Toolkit2 fails on the decoder

    print("All models converted!")

if __name__ == "__main__":
    main()
dog.jpg
ADDED
export_onnx.py
ADDED
@@ -0,0 +1,278 @@
from typing import Any
import argparse
import pathlib

import torch
from torch import nn
from sam2.build_sam import build_sam2
from sam2.modeling.sam2_base import SAM2Base


class SAM2ImageEncoder(nn.Module):
    def __init__(self, sam_model: SAM2Base) -> None:
        super().__init__()
        self.model = sam_model
        self.image_encoder = sam_model.image_encoder
        self.no_mem_embed = sam_model.no_mem_embed

    def forward(self, x: torch.Tensor) -> tuple[Any, Any, Any]:
        backbone_out = self.image_encoder(x)
        backbone_out["backbone_fpn"][0] = self.model.sam_mask_decoder.conv_s0(
            backbone_out["backbone_fpn"][0]
        )
        backbone_out["backbone_fpn"][1] = self.model.sam_mask_decoder.conv_s1(
            backbone_out["backbone_fpn"][1]
        )

        feature_maps = backbone_out["backbone_fpn"][
            -self.model.num_feature_levels :
        ]
        vision_pos_embeds = backbone_out["vision_pos_enc"][
            -self.model.num_feature_levels :
        ]

        feat_sizes = [(x.shape[-2], x.shape[-1]) for x in vision_pos_embeds]

        # flatten NxCxHxW to HWxNxC
        vision_feats = [x.flatten(2).permute(2, 0, 1) for x in feature_maps]
        vision_feats[-1] = vision_feats[-1] + self.no_mem_embed

        feats = [
            feat.permute(1, 2, 0).reshape(1, -1, *feat_size)
            for feat, feat_size in zip(vision_feats[::-1], feat_sizes[::-1])
        ][::-1]

        return feats[0], feats[1], feats[2]


class SAM2ImageDecoder(nn.Module):
    def __init__(self, sam_model: SAM2Base, multimask_output: bool) -> None:
        super().__init__()
        self.mask_decoder = sam_model.sam_mask_decoder
        self.prompt_encoder = sam_model.sam_prompt_encoder
        self.model = sam_model
        self.img_size = sam_model.image_size
        self.multimask_output = multimask_output

    @torch.no_grad()
    def forward(
        self,
        image_embed: torch.Tensor,
        high_res_feats_0: torch.Tensor,
        high_res_feats_1: torch.Tensor,
        point_coords: torch.Tensor,
        point_labels: torch.Tensor,
        orig_im_size: torch.Tensor,
        mask_input: torch.Tensor,
        has_mask_input: torch.Tensor,
    ):
        sparse_embedding = self._embed_points(point_coords, point_labels)
        self.sparse_embedding = sparse_embedding
        dense_embedding = self._embed_masks(mask_input, has_mask_input)

        high_res_feats = [high_res_feats_0, high_res_feats_1]
        image_embed = image_embed

        masks, iou_predictions, _, _ = self.mask_decoder.predict_masks(
            image_embeddings=image_embed,
            image_pe=self.prompt_encoder.get_dense_pe(),
            sparse_prompt_embeddings=sparse_embedding,
            dense_prompt_embeddings=dense_embedding,
            repeat_image=False,
            high_res_features=high_res_feats,
        )

        if self.multimask_output:
            masks = masks[:, 1:, :, :]
            iou_predictions = iou_predictions[:, 1:]
        else:
            masks, iou_predictions = (
                self.mask_decoder._dynamic_multimask_via_stability(
                    masks, iou_predictions
                )
            )

        masks = torch.clamp(masks, -32.0, 32.0)

        return masks, iou_predictions

    def _embed_points(
        self, point_coords: torch.Tensor, point_labels: torch.Tensor
    ) -> torch.Tensor:

        point_coords = point_coords + 0.5

        padding_point = torch.zeros(
            (point_coords.shape[0], 1, 2), device=point_coords.device
        )
        padding_label = -torch.ones(
            (point_labels.shape[0], 1), device=point_labels.device
        )
        point_coords = torch.cat([point_coords, padding_point], dim=1)
        point_labels = torch.cat([point_labels, padding_label], dim=1)

        point_coords[:, :, 0] = point_coords[:, :, 0] / self.model.image_size
        point_coords[:, :, 1] = point_coords[:, :, 1] / self.model.image_size

        point_embedding = self.prompt_encoder.pe_layer._pe_encoding(
            point_coords
        )
        point_labels = point_labels.unsqueeze(-1).expand_as(point_embedding)

        point_embedding = point_embedding * (point_labels != -1)
        point_embedding = (
            point_embedding
            + self.prompt_encoder.not_a_point_embed.weight
            * (point_labels == -1)
        )

        for i in range(self.prompt_encoder.num_point_embeddings):
            point_embedding = (
                point_embedding
                + self.prompt_encoder.point_embeddings[i].weight
                * (point_labels == i)
            )

        return point_embedding

    def _embed_masks(
        self, input_mask: torch.Tensor, has_mask_input: torch.Tensor
    ) -> torch.Tensor:
        mask_embedding = has_mask_input * self.prompt_encoder.mask_downscaling(
            input_mask
        )
        mask_embedding = mask_embedding + (
            1 - has_mask_input
        ) * self.prompt_encoder.no_mask_embed.weight.reshape(1, -1, 1, 1)
        return mask_embedding


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Export the SAM2 prompt encoder and mask decoder to an ONNX model."
    )
    parser.add_argument(
        "--checkpoint",
        type=str,
        required=True,
        help="The path to the SAM model checkpoint.",
    )

    parser.add_argument(
        "--output_encoder",
        type=str,
        required=True,
        help="The filename to save the encoder ONNX model to.",
    )

    parser.add_argument(
        "--output_decoder",
        type=str,
        required=True,
        help="The filename to save the decoder ONNX model to.",
    )

    parser.add_argument(
        "--model_type",
        type=str,
        required=True,
        help="In the form of sam2.1_hiera_{tiny, small, base_plus, large}.",
    )

    parser.add_argument(
        "--opset",
        type=int,
        default=17,
        help="The ONNX opset version to use. Must be >=11",
    )

    args = parser.parse_args()

    input_size = (1024, 1024)
    multimask_output = False
    model_type = args.model_type
    if model_type == "sam2.1_hiera_tiny":
        model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"
    elif model_type == "sam2.1_hiera_small":
        model_cfg = "configs/sam2.1/sam2.1_hiera_s.yaml"
    elif model_type == "sam2.1_hiera_base_plus":
        model_cfg = "configs/sam2.1/sam2.1_hiera_b+.yaml"
    elif model_type == "sam2.1_hiera_large":
        model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
    else:
        model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

    sam2_model = build_sam2(model_cfg, args.checkpoint, device="cpu")
    img = torch.randn(1, 3, input_size[0], input_size[1]).cpu()
    sam2_encoder = SAM2ImageEncoder(sam2_model).cpu()
    high_res_feats_0, high_res_feats_1, image_embed = sam2_encoder(img)

    pathlib.Path(args.output_encoder).parent.mkdir(parents=True, exist_ok=True)
    torch.onnx.export(
        sam2_encoder,
        img,
        args.output_encoder,
        export_params=True,
        opset_version=args.opset,
        do_constant_folding=True,
        input_names=["image"],
        output_names=["high_res_feats_0", "high_res_feats_1", "image_embed"],
    )
    print("Saved encoder to", args.output_encoder)

    sam2_decoder = SAM2ImageDecoder(
        sam2_model, multimask_output=multimask_output
    ).cpu()

    embed_dim = sam2_model.sam_prompt_encoder.embed_dim
    embed_size = (
        sam2_model.image_size // sam2_model.backbone_stride,
        sam2_model.image_size // sam2_model.backbone_stride,
    )
    mask_input_size = [4 * x for x in embed_size]
    print(embed_dim, embed_size, mask_input_size)

    point_coords = torch.randint(
        low=0, high=input_size[1], size=(1, 5, 2), dtype=torch.float
    )
    point_labels = torch.randint(low=0, high=1, size=(1, 5), dtype=torch.float)
    mask_input = torch.randn(1, 1, *mask_input_size, dtype=torch.float)
    has_mask_input = torch.tensor([1], dtype=torch.float)
    orig_im_size = torch.tensor([input_size[0], input_size[1]], dtype=torch.int)

    pathlib.Path(args.output_decoder).parent.mkdir(parents=True, exist_ok=True)
    torch.onnx.export(
        sam2_decoder,
        (
            image_embed,
            high_res_feats_0,
            high_res_feats_1,
            point_coords,
            point_labels,
            orig_im_size,
            mask_input,
            has_mask_input,
        ),
        args.output_decoder,
        export_params=True,
        opset_version=args.opset,
        do_constant_folding=True,
        input_names=[
            "image_embed",
            "high_res_feats_0",
            "high_res_feats_1",
            "point_coords",
            "point_labels",
            "orig_im_size",
            "mask_input",
            "has_mask_input",
        ],
        output_names=["masks", "iou_predictions"],
        dynamic_axes={
            "point_coords": {0: "num_labels", 1: "num_points"},
            "point_labels": {0: "num_labels", 1: "num_points"},
            "mask_input": {0: "num_labels"},
            "has_mask_input": {0: "num_labels"},
        },
    )
    print("Saved decoder to", args.output_decoder)
sam2.1_hiera_large_decoder.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c039b2455b4e92dfeb8cb8e4d10a98a92a79ec1550a7119c997bad4352811554
size 16526061
sam2.1_hiera_large_encoder.rknn
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ce5ae036eb273f4e017481c8cb744e50c84a93e81e2f6a84ff4b89a118e756a
size 1419024037
sam2.1_hiera_small_decoder.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e7ba7a80bfae89c1a660d3b64291fa4f5a2de15022a4e8eab933218d4f34582
size 16526003
sam2.1_hiera_small_encoder.rknn
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8b9efce9e5d12900a508dc1b79dfbd389057136a6d2ab4cb66654961f3106ef
size 374531749
sam2.1_hiera_tiny_decoder.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f594db10b3c7b4d9de7f8854693ea6f7a880e5e228ad08d7823393233e65f4fa
size 16525993
sam2.1_hiera_tiny_encoder.rknn
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3750eef90b87ab63cfefbf4f89858072a4891818c315d96dddeea172119cba1
size 339018597
test_onnx.py
ADDED
@@ -0,0 +1,195 @@
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

import numpy as np
import torch
import onnxruntime
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor


def load_image(url):
    """Load and preprocess the image."""
    response = requests.get(url)
    image = Image.open(BytesIO(response.content)).convert("RGB")
    print(f"Original image size: {image.size}")

    # Compute the resized dimensions, keeping the aspect ratio
    target_size = (1024, 1024)
    w, h = image.size
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w = int(w * scale)
    new_h = int(h * scale)
    print(f"Scale factor: {scale}")
    print(f"Resized dimensions: {new_w}x{new_h}")

    # Resize the image
    resized_image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)

    # Create a 1024x1024 black canvas
    processed_image = Image.new("RGB", target_size, (0, 0, 0))
    # Paste the resized image at the center
    paste_x = (target_size[0] - new_w) // 2
    paste_y = (target_size[1] - new_h) // 2
    print(f"Paste position: ({paste_x}, {paste_y})")
    processed_image.paste(resized_image, (paste_x, paste_y))

    # Save the processed image for inspection
    processed_image.save("debug_processed_image.png")

    # Convert to a numpy array and normalize to [0, 1]
    img_np = np.array(processed_image).astype(np.float32) / 255.0
    # Reorder dimensions from HWC to CHW
    img_np = img_np.transpose(2, 0, 1)
    # Add the batch dimension
    img_np = np.expand_dims(img_np, axis=0)

    print(f"Final input tensor shape: {img_np.shape}")

    return image, img_np, (scale, paste_x, paste_y)

def prepare_point_input(point_coords, point_labels, image_size=(1024, 1024)):
    """Prepare the point-prompt inputs."""
    point_coords = np.array(point_coords, dtype=np.float32)
    point_labels = np.array(point_labels, dtype=np.float32)

    # Add the batch dimension
    point_coords = np.expand_dims(point_coords, axis=0)
    point_labels = np.expand_dims(point_labels, axis=0)

    # Prepare the mask inputs
    mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)
    has_mask_input = np.zeros(1, dtype=np.float32)
    orig_im_size = np.array(image_size, dtype=np.int32)

    return point_coords, point_labels, mask_input, has_mask_input, orig_im_size

def main():
    # 1. Load the original image
    url = "https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(url)

    # 2. Prepare the input point - the click coordinates must be adjusted by scale and offset
    input_point_orig = [[750, 400]]
    input_point = [[
        int(x * scale + offset_x),
        int(y * scale + offset_y)
    ] for x, y in input_point_orig]
    print(f"Original point: {input_point_orig}")
    print(f"Transformed point: {input_point}")
    input_label = [1]

    # 3. Run the PyTorch model
    print("Running PyTorch model...")
    checkpoint = "sam2.1_hiera_large.pt"
    model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
    predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

    with torch.inference_mode():
        predictor.set_image(orig_image)
        masks_pt, iou_scores_pt, low_res_masks_pt = predictor.predict(
            point_coords=np.array(input_point),
            point_labels=np.array(input_label),
            multimask_output=True
        )

    # 4. Run the ONNX model
    print("Running ONNX model...")
    encoder_path = "sam2.1_hiera_tiny_encoder.s.onnx"
    decoder_path = "sam2.1_hiera_tiny_decoder.onnx"

    # Create the ONNX Runtime sessions
    encoder_session = onnxruntime.InferenceSession(encoder_path)
    decoder_session = onnxruntime.InferenceSession(decoder_path)

    # Run the encoder
    encoder_inputs = {'image': input_image}
    high_res_feats_0, high_res_feats_1, image_embed = encoder_session.run(None, encoder_inputs)

    # Prepare the decoder inputs
    point_coords, point_labels, mask_input, has_mask_input, orig_im_size = prepare_point_input(
        input_point, input_label, orig_image.size[::-1]
    )

    # Run the decoder
    decoder_inputs = {
        'image_embed': image_embed,
        'high_res_feats_0': high_res_feats_0,
        'high_res_feats_1': high_res_feats_1,
        'point_coords': point_coords,
        'point_labels': point_labels,
        # 'orig_im_size': orig_im_size,
        'mask_input': mask_input,
        'has_mask_input': has_mask_input,
    }

    low_res_masks, iou_predictions = decoder_session.run(None, decoder_inputs)

    # Post-processing: upscale low_res_masks to the original image size
    w, h = orig_image.size

    # 1. First upscale the masks to 1024x1024
    masks_1024 = torch.nn.functional.interpolate(
        torch.from_numpy(low_res_masks),
        size=(1024, 1024),
        mode="bilinear",
        align_corners=False
    )

    # 2. Remove the padding
    new_h = int(h * scale)
    new_w = int(w * scale)
    start_h = (1024 - new_h) // 2
    start_w = (1024 - new_w) // 2
    masks_no_pad = masks_1024[..., start_h:start_h+new_h, start_w:start_w+new_w]

    # 3. Resize to the original image size
    masks_onnx = torch.nn.functional.interpolate(
        masks_no_pad,
        size=(h, w),
        mode="bilinear",
        align_corners=False
    )

    # 4. Binarize
    masks_onnx = masks_onnx > 0.0
    masks_onnx = masks_onnx.numpy()

    # Print the output shapes after running the ONNX model
    print(f"\nOutput shapes:")
    print(f"PyTorch masks shape: {masks_pt.shape}")
    print(f"ONNX masks shape: {masks_onnx.shape}")

    # Visualization (the difference plot is commented out for now)
    plt.figure(figsize=(10, 5))

    # PyTorch result
    plt.subplot(121)
    plt.imshow(orig_image)
    plt.imshow(masks_pt[0], alpha=0.5)
    plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
    plt.title('PyTorch Output')
    plt.axis('off')

    # ONNX result
    plt.subplot(122)
    plt.imshow(orig_image)
    plt.imshow(masks_onnx[0,0], alpha=0.5)
    plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
    plt.title('ONNX Output')
    plt.axis('off')

    plt.tight_layout()
    plt.show()

    # 6. Print some statistics
    print("\nStatistics:")
    print(f"PyTorch IoU scores: {iou_scores_pt}")
    print(f"ONNX IoU predictions: {iou_predictions}")

if __name__ == "__main__":
    main()
test_rknn.py
ADDED
@@ -0,0 +1,178 @@
import os
import time
os.chdir(os.path.dirname(os.path.abspath(__file__)))

import numpy as np
import onnxruntime
from rknnlite.api import RKNNLite
from PIL import Image
import matplotlib.pyplot as plt
import cv2


def load_image(path):
    """Load and preprocess the image."""
    image = Image.open(path).convert("RGB")
    print(f"Original image size: {image.size}")

    # Compute the resized dimensions, keeping the aspect ratio
    target_size = (1024, 1024)
    w, h = image.size
    scale = min(target_size[0] / w, target_size[1] / h)
    new_w = int(w * scale)
    new_h = int(h * scale)
    print(f"Scale factor: {scale}")
    print(f"Resized dimensions: {new_w}x{new_h}")

    # Resize the image
    resized_image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)

    # Create a 1024x1024 black canvas
    processed_image = Image.new("RGB", target_size, (0, 0, 0))
    # Paste the resized image at the center
    paste_x = (target_size[0] - new_w) // 2
    paste_y = (target_size[1] - new_h) // 2
    print(f"Paste position: ({paste_x}, {paste_y})")
    processed_image.paste(resized_image, (paste_x, paste_y))

    # Save the processed image for inspection
    processed_image.save("debug_processed_image.png")

    # Convert to a numpy array; the [0, 1] normalization is folded into the model
    img_np = np.array(processed_image).astype(np.float32)  # / 255.0
    # Reorder dimensions from HWC to CHW
    img_np = img_np.transpose(2, 0, 1)
    # Add the batch dimension
    img_np = np.expand_dims(img_np, axis=0)

    print(f"Final input tensor shape: {img_np.shape}")

    return image, img_np, (scale, paste_x, paste_y)

def prepare_point_input(point_coords, point_labels, image_size=(1024, 1024)):
    """Prepare the point-prompt inputs."""
    point_coords = np.array(point_coords, dtype=np.float32)
    point_labels = np.array(point_labels, dtype=np.float32)

    # Add the batch dimension
    point_coords = np.expand_dims(point_coords, axis=0)
    point_labels = np.expand_dims(point_labels, axis=0)

    # Prepare the mask inputs
    mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)
    has_mask_input = np.zeros(1, dtype=np.float32)
    orig_im_size = np.array(image_size, dtype=np.int32)

    return point_coords, point_labels, mask_input, has_mask_input, orig_im_size

def main():
    # 1. Load the original image
    path = "dog.jpg"
    orig_image, input_image, (scale, offset_x, offset_y) = load_image(path)
    decoder_path = "sam2.1_hiera_small_decoder.onnx"
    encoder_path = "sam2.1_hiera_small_encoder.rknn"

    # 2. Prepare the input point
    # input_point_orig = [[750, 400]]
    input_point_orig = [[189, 394]]
    input_point = [[
        int(x * scale + offset_x),
        int(y * scale + offset_y)
    ] for x, y in input_point_orig]
    input_label = [1]

    # 3. Run the RKNN encoder
    print("Running RKNN encoder...")
    rknn_lite = RKNNLite(verbose=False)

    ret = rknn_lite.load_rknn(encoder_path)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)

    ret = rknn_lite.init_runtime()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    start_time = time.time()
    encoder_outputs = rknn_lite.inference(inputs=[input_image], data_format="nchw")
    end_time = time.time()
    print(f"RKNN encoder time: {end_time - start_time} seconds")
    high_res_feats_0, high_res_feats_1, image_embed = encoder_outputs
    rknn_lite.release()

    # 4. Run the ONNX decoder
    print("Running ONNX decoder...")
    decoder_session = onnxruntime.InferenceSession(decoder_path)

    point_coords, point_labels, mask_input, has_mask_input, orig_im_size = prepare_point_input(
        input_point, input_label, orig_image.size[::-1]
    )

    decoder_inputs = {
        'image_embed': image_embed,
        'high_res_feats_0': high_res_feats_0,
        'high_res_feats_1': high_res_feats_1,
        'point_coords': point_coords,
        'point_labels': point_labels,
        'mask_input': mask_input,
        'has_mask_input': has_mask_input,
    }
    start_time = time.time()
    low_res_masks, iou_predictions = decoder_session.run(None, decoder_inputs)
    end_time = time.time()
    print(f"ONNX decoder time: {end_time - start_time} seconds")
    print(low_res_masks.shape)
    # 5. Post-processing
    w, h = orig_image.size
    masks_rknn = []

    # Process all 3 masks
    for i in range(low_res_masks.shape[1]):
        # Upscale the mask to 1024x1024
        masks_1024 = cv2.resize(
            low_res_masks[0, i],
            (1024, 1024),
            interpolation=cv2.INTER_LINEAR
        )

        # Remove the padding
        new_h = int(h * scale)
        new_w = int(w * scale)
        start_h = (1024 - new_h) // 2
        start_w = (1024 - new_w) // 2
        masks_no_pad = masks_1024[start_h:start_h+new_h, start_w:start_w+new_w]

        # Resize to the original image size
        mask = cv2.resize(
            masks_no_pad,
            (w, h),
            interpolation=cv2.INTER_LINEAR
        )

        # Binarize
        mask = mask > 0.0
        masks_rknn.append(mask)

    # 6. Visualize the results
    plt.figure(figsize=(15, 5))

    # Indices sorted by IoU score
    sorted_indices = np.argsort(iou_predictions[0])[::-1]  # descending order

    for idx, mask_idx in enumerate(sorted_indices):
        plt.subplot(1, 3, idx + 1)
        plt.imshow(orig_image)
        plt.imshow(masks_rknn[mask_idx], alpha=0.5)
        plt.plot(input_point_orig[0][0], input_point_orig[0][1], 'rx')
        plt.title(f'Mask {mask_idx+1}\nIoU: {iou_predictions[0][mask_idx]:.3f}')
        plt.axis('off')

    plt.tight_layout()
    # plt.show()
    plt.savefig("result.png")

    print(f"\nIoU predictions: {iou_predictions}")

if __name__ == "__main__":
    main()