---
license: apache-2.0
---
# SynthPose (MMPose HRNet48+DarkPose variant)
The SynthPose model was proposed in [OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics](https://arxiv.org/abs/2406.09788) by Yoni Gozlan, Antoine Falisse, Scott Uhlrich, Anthony Gatti, Michael Black, Akshay Chaudhari.
# Intended use cases
This model uses DarkPose with an HRNet backbone.
SynthPose is a new approach that enables finetuning of pre-trained 2D human pose models to predict an arbitrarily denser set of keypoints for accurate kinematic analysis through the use of synthetic data.
More details are available in [OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics](https://arxiv.org/abs/2406.09788).
This particular variant was finetuned on a set of keypoints usually found in motion capture setups, and includes the COCO keypoints as well.
The model predicts the following 52 markers:
```
[
'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle',
'sternum',
'rshoulder',
'lshoulder',
'r_lelbow',
'l_lelbow',
'r_melbow',
'l_melbow',
'r_lwrist',
'l_lwrist',
'r_mwrist',
'l_mwrist',
'r_ASIS',
'l_ASIS',
'r_PSIS',
'l_PSIS',
'r_knee',
'l_knee',
'r_mknee',
'l_mknee',
'r_ankle',
'l_ankle',
'r_mankle',
'l_mankle',
'r_5meta',
'l_5meta',
'r_toe',
'l_toe',
'r_big_toe',
'l_big_toe',
'l_calc',
'r_calc',
'C7',
'L2',
'T11',
'T6',
]
```
The first 17 keypoints are the standard COCO keypoints, and the remaining 35 are anatomical markers.
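Since the two groups occupy fixed index ranges in the prediction order, they can be separated by a simple slice. A minimal sketch (the `keypoints` array here is a hypothetical stand-in for one person's predicted coordinates):

```python
import numpy as np

# Hypothetical (52, 2) array of predicted (x, y) pixel coordinates for one person.
keypoints = np.zeros((52, 2))

coco_keypoints = keypoints[:17]      # the 17 standard COCO keypoints
anatomical_markers = keypoints[17:]  # the 35 mocap-style anatomical markers

print(coco_keypoints.shape, anatomical_markers.shape)  # (17, 2) (35, 2)
```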
# Usage
## Installation
This implementation is based on [MMPose](https://mmpose.readthedocs.io/en/latest/).
MMPose requires PyTorch; with PyTorch installed, the remaining dependencies can be installed as follows:
```bash
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
```
## Image inference
Here's how to load the model and run inference on an image:
```python
from huggingface_hub import snapshot_download
from mmpose.apis import MMPoseInferencer

# Download the model config and weights from the Hugging Face Hub
snapshot_download(repo_id="yonigozlan/synthpose-hrnet-48-mmpose", local_dir="./synthpose-hrnet-48-mmpose")

inferencer = MMPoseInferencer(
    pose2d='./synthpose-hrnet-48-mmpose/td-hm_hrnet-w48_dark-8xb32-210e_synthpose_inference.py',
    pose2d_weights='./synthpose-hrnet-48-mmpose/hrnet-w48_dark.pth'
)

url = "http://farm4.staticflickr.com/3300/3416216247_f9c6dfc939_z.jpg"
result_generator = inferencer([url], pred_out_dir='predictions', vis_out_dir='visualizations')
results = next(result_generator)
```
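The keypoints can be read back out of the returned result dict. The sketch below uses a hand-built sample dict mimicking the MMPose 1.x result schema (`result['predictions']` holds, per image, a list of detected-person instances); the exact schema can vary across MMPose versions, so treat the field names as an assumption:

```python
# Hand-built sample mimicking an MMPoseInferencer result for one image
# containing one detected person (52 keypoints each).
sample_result = {
    'predictions': [[
        {
            'keypoints': [[100.0, 200.0]] * 52,   # 52 (x, y) pairs
            'keypoint_scores': [0.9] * 52,        # per-keypoint confidence
        },
    ]],
}

def keypoints_of(result, image_idx=0, person_idx=0):
    """Return the (x, y) keypoints of one detected person."""
    return result['predictions'][image_idx][person_idx]['keypoints']

kps = keypoints_of(sample_result)
print(len(kps))  # 52
```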
The following visualization will be saved:
<p>
<img src="inference_example.jpg" width=375>
</p>
The keypoints connected by the skeleton are the COCO keypoints, and the pink ones are the anatomical markers.
## Video inference
To run inference on a video, replace the last two lines of the snippet above with:
```python
result_generator = inferencer("football.mp4", pred_out_dir='predictions', vis_out_dir='visualizations')
results = [result for result in result_generator]
```
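For kinematic analysis it is often convenient to stack the per-frame predictions into a single trajectory array. A sketch, assuming one tracked person per frame and the MMPose 1.x result schema (the `frame_results` list below is a hand-built stand-in for the inferencer's per-frame outputs):

```python
import numpy as np

# Hand-built stand-ins for per-frame results (one detected person per frame).
frame_results = [
    {'predictions': [[{'keypoints': [[float(t), 0.0]] * 52}]]}
    for t in range(3)
]

# Stack the first person's keypoints from each frame into a
# (num_frames, 52, 2) trajectory array.
trajectory = np.array([
    r['predictions'][0][0]['keypoints'] for r in frame_results
])
print(trajectory.shape)  # (3, 52, 2)
```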
## Training
Finetuning a model using SynthPose can be done by adapting the `td-hm_hrnet-w48_dark-8xb32-210e_merge_bedlam_infinity_coco_3DPW_eval_rich-384x288_pretrained.py` config on the following [MMPose fork](https://github.com/yonigozlan/mmpose).
To create annotations on a synthetic dataset (such as BEDLAM) using SynthPose, the tools present in [this repository](https://github.com/yonigozlan/OpenCapBench/tree/main/synthpose) can be used (better documentation to come).