# Hulk: A Universal Knowledge Translator for Human-centric Tasks

[Yizhou Wang](https://scholar.google.com/citations?user=CQGaGMAAAAAJ&hl=zh-CN&authuser=1)<sup>1*</sup>, [Yixuan Wu](https://scholar.google.com/citations?user=zjAxJcwAAAAJ&hl=en&oi=ao)<sup>1*,2</sup>, [Shixiang Tang](https://github.com/tangshixiang)<sup>1</sup> :email:, [Weizhen He]()<sup>2,3</sup>, [Xun Guo](https://github.com/Space-Xun)<sup>1,4</sup>, [Feng Zhu](https://zhufengx.github.io/)<sup>3</sup>, [Lei Bai](http://leibai.site/)<sup>1</sup>, [Rui Zhao](http://zhaorui.xyz/)<sup>3</sup>, [Jian Wu]()<sup>2</sup>, [Tong He](http://tonghe90.github.io/)<sup>1</sup>, [Wanli Ouyang](https://wlouyang.github.io/)<sup>1</sup>

<sup>1</sup>[Shanghai AI Lab](https://www.shlab.org.cn/), <sup>2</sup>[ZJU](https://www.zju.edu.cn/), <sup>3</sup>[SenseTime](https://www.sensetime.com), <sup>4</sup>[USTC](https://www.ustc.edu.cn/)

[ArXiv](https://arxiv.org/abs/2312.01697) | [Project Page](https://humancentricmodels.github.io/Hulk/)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/pose-estimation-on-aic)](https://paperswithcode.com/sota/pose-estimation-on-aic?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/human-part-segmentation-on-cihp)](https://paperswithcode.com/sota/human-part-segmentation-on-cihp?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/skeleton-based-action-recognition-on-ntu-rgbd)](https://paperswithcode.com/sota/skeleton-based-action-recognition-on-ntu-rgbd?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/semantic-segmentation-on-lip-val)](https://paperswithcode.com/sota/semantic-segmentation-on-lip-val?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/human-part-segmentation-on-human3-6m)](https://paperswithcode.com/sota/human-part-segmentation-on-human3-6m?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/pedestrian-attribute-recognition-on-rapv2)](https://paperswithcode.com/sota/pedestrian-attribute-recognition-on-rapv2?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/pedestrian-attribute-recognition-on-pa-100k)](https://paperswithcode.com/sota/pedestrian-attribute-recognition-on-pa-100k?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/pose-estimation-on-coco)](https://paperswithcode.com/sota/pose-estimation-on-coco?p=hulk-a-universal-knowledge-translator-for)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/hulk-a-universal-knowledge-translator-for/object-detection-on-crowdhuman-full-body)](https://paperswithcode.com/sota/object-detection-on-crowdhuman-full-body?p=hulk-a-universal-knowledge-translator-for)

Welcome to **Hulk**! Hulk is a multimodal human-centric generalist model capable of addressing 2D vision, 3D vision, skeleton-based, and vision-language human-centric tasks. Unlike most existing human-centric foundation models, which do not cover 3D or vision-language tasks and require task-specific finetuning, Hulk condenses the various task-specific heads into two general heads: one for discrete representations, e.g., languages, and the other for continuous representations, e.g., location coordinates. This unification allows Hulk to treat diverse human-centric tasks as modality translation, integrating knowledge across a wide range of tasks. For more details, please take a look at our paper [Hulk: A Universal Knowledge Translator for Human-centric Tasks](https://arxiv.org/abs/2312.01697).
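To make the two-head idea concrete, below is a minimal, illustrative PyTorch sketch. All class and argument names (`DiscreteHead`, `ContinuousHead`, `ModalityTranslator`) are hypothetical simplifications and do not mirror the actual tokenizers, de-tokenizers, or encoder-decoder in this repository.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the "two general heads" design described above.
# Names are hypothetical; see the paper and the code in core/ for the real model.

class DiscreteHead(nn.Module):
    """Predicts tokens from a discrete vocabulary (e.g., words, attribute labels)."""
    def __init__(self, dim: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)  # logits over the vocabulary


class ContinuousHead(nn.Module):
    """Regresses continuous values (e.g., keypoint coordinates, mesh vertices)."""
    def __init__(self, dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)  # real-valued outputs


class ModalityTranslator(nn.Module):
    """Shared encoder whose output is routed to one of the two general heads."""
    def __init__(self, dim: int = 768, vocab_size: int = 30522, out_dim: int = 3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.discrete_head = DiscreteHead(dim, vocab_size)
        self.continuous_head = ContinuousHead(dim, out_dim)

    def forward(self, tokens: torch.Tensor, output_modality: str) -> torch.Tensor:
        feats = self.encoder(tokens)
        if output_modality == "discrete":    # e.g., captions, attributes
            return self.discrete_head(feats)
        return self.continuous_head(feats)   # e.g., poses, boxes, meshes
```

The point of the sketch is only that a single routing decision (discrete vs. continuous output) replaces a collection of task-specific heads, which is what lets the tasks be phrased as modality translation.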

## News

- _Apr. 2024_ A pretrained Hulk is released on [🤗 Hugging Face Models](https://huggingface.co/OpenGVLab/Hulk/tree/main)!
- _Apr. 2024_ Project page with demos is released at [Hulk](https://humancentricmodels.github.io/Hulk/).
- _Mar. 2024_ Training and inference code are released!
- _Dec. 2023_ Hulk is released on [ArXiv](https://arxiv.org/abs/2312.01697)!

## Installation

This codebase has been developed with Python 3.9, PyTorch 2.0.0, CUDA 11.8, and torchvision 0.15.0. We recommend using the same versions to avoid potential issues.

```bash
pip install -r requirements.txt
```

Also, download [bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) from Hugging Face and put it under `experiments/release/`.

## Datasets

Please refer to [datasets](docs/datasets.md) for more details.

## Training

Download the pre-trained MAE weights from [here](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth) and put them under `core/models/backbones/pretrain_weights/`. We use 10 nodes (80 A100 GPUs) for training with the following command:

```bash
cd experiments/release
sh train.sh 80 Hulk_vit-B
```

## Evaluation

A pretrained Hulk is available at [🤗 Hugging Face Models](https://huggingface.co/OpenGVLab/Hulk/tree/main). Download it, put it under `experiments/release/checkpoints/Hulk_vit-B` (first `mkdir -p experiments/release/checkpoints/Hulk_vit-B`), then use the following command to evaluate the model on the test set.

```bash
cd experiments/release
sh batch_eval.sh 1 Hulk_vit-B
```
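For convenience, the downloads mentioned above (the `bert-base-uncased` weights, the MAE pre-trained backbone, and the released Hulk checkpoint) can also be scripted. The sketch below assumes `huggingface_hub` is installed; the local folder names are taken from the paths above where given, and the `bert-base-uncased` subfolder name is an assumption, so adjust it to whatever your configs expect.

```python
import urllib.request
from pathlib import Path

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# 1) bert-base-uncased, expected under experiments/release/
#    (exact subfolder name is an assumption; adjust to your config)
snapshot_download(
    repo_id="google-bert/bert-base-uncased",
    local_dir="experiments/release/bert-base-uncased",
)

# 2) MAE ViT-B pre-trained weights, needed for training from scratch
weights_dir = Path("core/models/backbones/pretrain_weights")
weights_dir.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(
    "https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth",
    str(weights_dir / "mae_pretrain_vit_base.pth"),
)

# 3) Released Hulk checkpoint, needed for evaluation
snapshot_download(
    repo_id="OpenGVLab/Hulk",
    local_dir="experiments/release/checkpoints/Hulk_vit-B",
)
```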
## Model Performance

We use the plain ViT as our backbone and develop four modality-specific tokenizers and de-tokenizers to cover 2D vision, 3D vision, skeleton-based, and vision-language human-centric tasks. Hulk has achieved state-of-the-art results on various human-centric tasks.

### Direct Evaluation

| Task | Dataset | Metric | Hulk (ViT-B) | Hulk (ViT-L) |
|---|---|---|---|---|
| pedestrian detection | CrowdHuman | mAP | 90.7 | 92.2 |
| | CrowdHuman | MR-2 | 43.8 | 40.1 |
| | CrowdHuman | JI | 84.0 | 85.8 |
| 2D pose | COCO | AP | 77.0 | 78.3 |
| | AIC | AP | 34.5 | 36.3 |
| skeleton-based action | NTU60-XSub | acc. | 93.8 | 94.1 |
| human parsing | H3.6M | mIoU | 68.08 | 69.31 |
| | LIP | mIoU | 63.95 | 65.86 |
| | CIHP | mIoU | 70.58 | 72.33 |
| attribute recognition | PA-100k | mA | 82.85 | 84.36 |
| | RAPv2 | mA | 80.90 | 82.85 |
| image caption | CUHK-PEDES | B@4 | 31.1 | 31.6 |
| monocular 3D human pose and mesh recovery | 3DPW | MPVPE↓ | 79.8 | 77.4 |
| | 3DPW | MPJPE↓ | 67.0 | 66.3 |
| | 3DPW | PA-MPJPE↓ | 39.9 | 38.5 |
| | H3.6M | MPJPE↓ | 43.6 | 40.3 |
| | H3.6M | PA-MPJPE↓ | 31.9 | 28.8 |

### Finetune Performance
| Task | Dataset | Metric | Hulk (ViT-B) | Hulk (ViT-L) |
|---|---|---|---|---|
| pedestrian detection | CrowdHuman | mAP | 92.4 | 93.0 |
| | CrowdHuman | MR-2 | 40.7 | 36.5 |
| | CrowdHuman | JI | 86.0 | 87.0 |
| 2D pose | COCO | AP | 77.5 | 78.7 |
| | AIC | AP | 35.6 | 37.1 |
| skeleton-based action | NTU60-XSub | acc. | 94.0 | 94.3 |
| human parsing | H3.6M | mIoU | 68.56 | 69.89 |
| | LIP | mIoU | 63.98 | 66.02 |
| | CIHP | mIoU | 71.26 | 72.68 |
| attribute recognition | PA-100k | mA | 87.85 | 88.97 |
| | RAPv2 | mA | 85.26 | 85.86 |
| image caption ♣ | CUHK-PEDES | B@4 | 28.3 | 30.5 |
| monocular 3D human pose and mesh recovery ♣ | 3DPW | MPVPE↓ | 80.7 | 79.9 |
| | 3DPW | MPJPE↓ | 68.9 | 68.3 |
| | 3DPW | PA-MPJPE↓ | 41.3 | 40.6 |
| | H3.6M | MPJPE↓ | 44.9 | 41.4 |
| | H3.6M | PA-MPJPE↓ | 32.0 | 30.2 |

♣: We find that the finetuned performance on image captioning and monocular 3D human pose and mesh recovery is not as good as the direct evaluation, indicating that overfitting may occur during finetuning.

## Contact

If you have any problems with our paper or code, feel free to contact [Yizhou Wang](mailto:wangyizhou@pjlab.org.cn) and [Yixuan Wu](mailto:wyx_chloe@zju.edu.cn).

## Citation

If you find this work useful, please consider citing:

```bibtex
@article{wang2023hulk,
  title={Hulk: A Universal Knowledge Translator for Human-Centric Tasks},
  author={Wang, Yizhou and Wu, Yixuan and Tang, Shixiang and He, Weizhen and Guo, Xun and Zhu, Feng and Bai, Lei and Zhao, Rui and Wu, Jian and He, Tong and others},
  journal={arXiv preprint arXiv:2312.01697},
  year={2023}
}
```