Commit 5f028d6 · ThunderVVV committed · Parent(s): 014faee
update

This view is limited to 50 files because it contains too many changes. See the raw diff for the full changeset.
- .gitattributes +2 -0
- README.md +94 -12
- _DATA/data/mano/.gitkeep +0 -0
- _DATA/data/mano/MANO_RIGHT.pkl +3 -0
- _DATA/data/mano_mean_params.npz +3 -0
- _DATA/data_left/mano_left/.gitkeep +0 -0
- _DATA/data_left/mano_left/MANO_LEFT.pkl +3 -0
- assets/teaser.png +3 -0
- demo.py +113 -0
- example/video_0.mp4 +3 -0
- hawor/configs/__init__.py +120 -0
- hawor/configs/__pycache__/__init__.cpython-310.pyc +0 -0
- hawor/utils/__pycache__/geometry.cpython-310.pyc +0 -0
- hawor/utils/__pycache__/process.cpython-310.pyc +0 -0
- hawor/utils/__pycache__/pylogger.cpython-310.pyc +0 -0
- hawor/utils/__pycache__/render_openpose.cpython-310.pyc +0 -0
- hawor/utils/__pycache__/rotation.cpython-310.pyc +0 -0
- hawor/utils/geometry.py +102 -0
- hawor/utils/process.py +198 -0
- hawor/utils/pylogger.py +17 -0
- hawor/utils/render_openpose.py +225 -0
- hawor/utils/rotation.py +293 -0
- imgui.ini +15 -0
- infiller/hand_utils/geometry.py +412 -0
- infiller/hand_utils/geometry_utils.py +102 -0
- infiller/hand_utils/mano_wrapper.py +52 -0
- infiller/hand_utils/process.py +171 -0
- infiller/hand_utils/rotation.py +293 -0
- infiller/lib/misc/sampler.py +79 -0
- infiller/lib/model/__pycache__/network.cpython-310.pyc +0 -0
- infiller/lib/model/network.py +276 -0
- infiller/lib/model/positional_encoding.py +42 -0
- infiller/lib/model/preprocess.py +189 -0
- infiller/lib/model/skeleton.py +349 -0
- infiller/lib/vis/pose.py +248 -0
- lib/core/__pycache__/constants.cpython-310.pyc +0 -0
- lib/core/constants.py +78 -0
- lib/datasets/__pycache__/track_dataset.cpython-310.pyc +0 -0
- lib/datasets/track_dataset.py +78 -0
- lib/eval_utils/__pycache__/custom_utils.cpython-310.pyc +0 -0
- lib/eval_utils/__pycache__/filling_utils.cpython-310.pyc +0 -0
- lib/eval_utils/custom_utils.py +99 -0
- lib/eval_utils/filling_utils.py +306 -0
- lib/eval_utils/video_utils.py +85 -0
- lib/models/__pycache__/hawor.cpython-310.pyc +0 -0
- lib/models/__pycache__/mano_wrapper.cpython-310.pyc +0 -0
- lib/models/__pycache__/modules.cpython-310.pyc +0 -0
- lib/models/backbones/__init__.py +8 -0
- lib/models/backbones/__pycache__/__init__.cpython-310.pyc +0 -0
- lib/models/backbones/__pycache__/vit.cpython-310.pyc +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,12 +1,94 @@
<div align="center">

# HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos

[Jinglei Zhang]()<sup>1</sup>   [Jiankang Deng](https://jiankangdeng.github.io/)<sup>2</sup>   [Chao Ma](https://scholar.google.com/citations?user=syoPhv8AAAAJ&hl=en)<sup>1</sup>   [Rolandos Alexandros Potamias](https://rolpotamias.github.io)<sup>2</sup>

<sup>1</sup>Shanghai Jiao Tong University, China
<sup>2</sup>Imperial College London, UK <br>

<a href='https://hawor-project.github.io/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
</div>

This is the official implementation of **[HaWoR](https://hawor-project.github.io/)**, a hand motion reconstruction model in world coordinates:

![teaser](assets/teaser.png)

## Installation

### Clone the repository
```
git clone --recursive https://github.com/ThunderVVV/HaWoR.git
cd HaWoR
```

The code has been tested with PyTorch 1.13 and CUDA 11.7. It is suggested to use an anaconda environment to install the required dependencies:
```bash
conda create --name hawor python=3.10
conda activate hawor

pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
# Install requirements
pip install -r requirements.txt
pip install pytorch-lightning==2.2.4 --no-deps
pip install lightning-utilities torchmetrics==1.4.0
```

### Install masked DROID-SLAM

```
cd thirdparty/DROID-SLAM
python setup.py install
```

Download the official DROID-SLAM weights [droid.pth](https://drive.google.com/file/d/1PpqVt1H4maBa_GbPJp4NwxRsd9jk-elh/view?usp=sharing) and put them under `./weights/external/`.

### Install Metric3D

Download the official Metric3D weights [metric_depth_vit_large_800k.pth](https://drive.google.com/file/d/1eT2gG-kwsVzNy5nJrbm4KC-9DbNKyLnr/view?usp=drive_link) and put them under `thirdparty/Metric3D/weights`.

### Download the model weights

```bash
wget https://huggingface.co/spaces/rolpotamias/WiLoR/resolve/main/pretrained_models/detector.pt -P ./weights/external/
wget https://huggingface.co/ThunderVVV/HaWoR/resolve/main/hawor/checkpoints/hawor.ckpt -P ./weights/hawor/checkpoints/
wget https://huggingface.co/ThunderVVV/HaWoR/resolve/main/hawor/checkpoints/infiller.pt -P ./weights/hawor/checkpoints/
wget https://huggingface.co/ThunderVVV/HaWoR/resolve/main/hawor/model_config.yaml -P ./weights/hawor/
```
It is also required to download the MANO model from the [MANO website](https://mano.is.tue.mpg.de).
Create an account by clicking Sign Up and download the models (mano_v*_*.zip). Unzip them and place the hand models at `_DATA/data/mano/MANO_RIGHT.pkl` and `_DATA/data_left/mano_left/MANO_LEFT.pkl`.

Note that the MANO model falls under the [MANO license](https://mano.is.tue.mpg.de/license.html).
## Demo

For visualization in world view, run:
```bash
python demo.py --video_path ./example/video_0.mp4 --vis_mode world
```

For visualization in camera view, run:
```bash
python demo.py --video_path ./example/video_0.mp4 --vis_mode cam
```

## Training
The training code will be released soon.

## Acknowledgements
Parts of the code are taken or adapted from the following repos:
- [HaMeR](https://github.com/geopavlakos/hamer/)
- [WiLoR](https://github.com/rolpotamias/WiLoR)
- [SLAHMR](https://github.com/vye16/slahmr)
- [TRAM](https://github.com/yufu-wang/tram)
- [CMIB](https://github.com/jihoonerd/Conditional-Motion-In-Betweening)


## License
HaWoR models fall under the [CC-BY-NC-ND License](./license.txt). This repository also depends on the [MANO Model](https://mano.is.tue.mpg.de/license.html), which falls under its own license. By using this repository, you must also comply with the terms of these external licenses.
## Citing
If you find HaWoR useful for your research, please consider citing our paper:

```bibtex

```
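To confirm that the weights and MANO files described in the README above ended up where `demo.py` expects them, a minimal check like the following can be run from the repository root. This is a sketch, not part of the commit; every path is taken verbatim from the installation steps.

```python
# Minimal sanity check: confirm the asset paths used by demo.py exist.
import os

expected = [
    "./weights/external/droid.pth",
    "./weights/external/detector.pt",
    "./weights/hawor/checkpoints/hawor.ckpt",
    "./weights/hawor/checkpoints/infiller.pt",
    "./weights/hawor/model_config.yaml",
    "_DATA/data/mano/MANO_RIGHT.pkl",
    "_DATA/data_left/mano_left/MANO_LEFT.pkl",
]
for path in expected:
    print(("ok  " if os.path.exists(path) else "MISS") + " " + path)
```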
_DATA/data/mano/.gitkeep
ADDED
File without changes
_DATA/data/mano/MANO_RIGHT.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45d60aa3b27ef9107a7afd4e00808f307fd91111e1cfa35afd5c4a62de264767
size 3821356
_DATA/data/mano_mean_params.npz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:efc0ec58e4a5cef78f3abfb4e8f91623b8950be9eff8b8e0dbb0d036ebc63988
size 1178
_DATA/data_left/mano_left/.gitkeep
ADDED
File without changes
_DATA/data_left/mano_left/MANO_LEFT.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c4022f7083f2ca7c78b2b3d595abbab52debd32b09d372b16923a801f0ea6a30
size 3821391
assets/teaser.png
ADDED
Git LFS Details
demo.py
ADDED
@@ -0,0 +1,113 @@
import argparse
import sys
import os

import torch
sys.path.insert(0, os.path.dirname(__file__))
import numpy as np
import joblib
from scripts.scripts_test_video.detect_track_video import detect_track_video
from scripts.scripts_test_video.hawor_video import hawor_motion_estimation, hawor_infiller
from scripts.scripts_test_video.hawor_slam import hawor_slam
from hawor.utils.process import get_mano_faces, run_mano, run_mano_left
from lib.eval_utils.custom_utils import load_slam_cam
from lib.vis.run_vis2 import run_vis2_on_video, run_vis2_on_video_cam


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--img_focal", type=float)
    parser.add_argument("--video_path", type=str, default='example/video_0.mp4')
    parser.add_argument("--input_type", type=str, default='file')
    parser.add_argument("--checkpoint", type=str, default='./weights/hawor/checkpoints/hawor.ckpt')
    parser.add_argument("--infiller_weight", type=str, default='./weights/hawor/checkpoints/infiller.pt')
    parser.add_argument("--vis_mode", type=str, default='world', help='cam | world')
    args = parser.parse_args()

    start_idx, end_idx, seq_folder, imgfiles = detect_track_video(args)

    frame_chunks_all, img_focal = hawor_motion_estimation(args, start_idx, end_idx, seq_folder)

    hawor_slam(args, start_idx, end_idx)
    slam_path = os.path.join(seq_folder, f"SLAM/hawor_slam_w_scale_{start_idx}_{end_idx}.npz")
    R_w2c_sla_all, t_w2c_sla_all, R_c2w_sla_all, t_c2w_sla_all = load_slam_cam(slam_path)

    pred_trans, pred_rot, pred_hand_pose, pred_betas, pred_valid = hawor_infiller(args, start_idx, end_idx, frame_chunks_all)

    # vis sequence for this video
    hand2idx = {
        "right": 1,
        "left": 0
    }
    vis_start = 0
    vis_end = pred_trans.shape[1] - 1

    # get faces
    faces = get_mano_faces()
    faces_new = np.array([[92, 38, 234],
                          [234, 38, 239],
                          [38, 122, 239],
                          [239, 122, 279],
                          [122, 118, 279],
                          [279, 118, 215],
                          [118, 117, 215],
                          [215, 117, 214],
                          [117, 119, 214],
                          [214, 119, 121],
                          [119, 120, 121],
                          [121, 120, 78],
                          [120, 108, 78],
                          [78, 108, 79]])
    faces_right = np.concatenate([faces, faces_new], axis=0)

    # get right hand vertices
    hand = 'right'
    hand_idx = hand2idx[hand]
    pred_glob_r = run_mano(pred_trans[hand_idx:hand_idx+1, vis_start:vis_end], pred_rot[hand_idx:hand_idx+1, vis_start:vis_end], pred_hand_pose[hand_idx:hand_idx+1, vis_start:vis_end], betas=pred_betas[hand_idx:hand_idx+1, vis_start:vis_end])
    right_verts = pred_glob_r['vertices'][0]
    right_dict = {
        'vertices': right_verts.unsqueeze(0),
        'faces': faces_right,
    }

    # get left hand vertices
    faces_left = faces_right[:,[0,2,1]]
    hand = 'left'
    hand_idx = hand2idx[hand]
    pred_glob_l = run_mano_left(pred_trans[hand_idx:hand_idx+1, vis_start:vis_end], pred_rot[hand_idx:hand_idx+1, vis_start:vis_end], pred_hand_pose[hand_idx:hand_idx+1, vis_start:vis_end], betas=pred_betas[hand_idx:hand_idx+1, vis_start:vis_end])
    left_verts = pred_glob_l['vertices'][0]
    left_dict = {
        'vertices': left_verts.unsqueeze(0),
        'faces': faces_left,
    }

    R_x = torch.tensor([[1, 0, 0],
                        [0, -1, 0],
                        [0, 0, -1]]).float()
    R_c2w_sla_all = torch.einsum('ij,njk->nik', R_x, R_c2w_sla_all)
    t_c2w_sla_all = torch.einsum('ij,nj->ni', R_x, t_c2w_sla_all)
    R_w2c_sla_all = R_c2w_sla_all.transpose(-1, -2)
    t_w2c_sla_all = -torch.einsum("bij,bj->bi", R_w2c_sla_all, t_c2w_sla_all)
    left_dict['vertices'] = torch.einsum('ij,btnj->btni', R_x, left_dict['vertices'].cpu())
    right_dict['vertices'] = torch.einsum('ij,btnj->btni', R_x, right_dict['vertices'].cpu())

    # Here we use aitviewer (https://github.com/eth-ait/aitviewer) for simple visualization.
    if args.vis_mode == 'world':
        output_pth = os.path.join(seq_folder, f"vis_{vis_start}_{vis_end}")
        if not os.path.exists(output_pth):
            os.makedirs(output_pth)
        image_names = imgfiles[vis_start:vis_end]
        print(f"vis {vis_start} to {vis_end}")
        run_vis2_on_video(left_dict, right_dict, output_pth, img_focal, image_names, R_c2w=R_c2w_sla_all[vis_start:vis_end], t_c2w=t_c2w_sla_all[vis_start:vis_end])
    elif args.vis_mode == 'cam':
        output_pth = os.path.join(seq_folder, f"vis_{vis_start}_{vis_end}")
        if not os.path.exists(output_pth):
            os.makedirs(output_pth)
        image_names = imgfiles[vis_start:vis_end]
        print(f"vis {vis_start} to {vis_end}")
        run_vis2_on_video_cam(left_dict, right_dict, output_pth, img_focal, image_names, R_w2c=R_w2c_sla_all[vis_start:vis_end], t_w2c=t_w2c_sla_all[vis_start:vis_end])

    print("finish")
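`demo.py` above flips the SLAM world frame with `R_x` and then derives world-to-camera extrinsics from the camera-to-world ones. The following standalone sketch (illustrative only, random inputs) restates that relationship and its round trip:

```python
# Sketch of the extrinsics inversion used in demo.py (illustrative only).
import torch

n = 4
R_c2w = torch.linalg.qr(torch.randn(n, 3, 3)).Q        # random orthonormal camera-to-world rotations
t_c2w = torch.randn(n, 3)                              # camera positions in world coordinates

R_w2c = R_c2w.transpose(-1, -2)                        # inverse of an orthonormal matrix is its transpose
t_w2c = -torch.einsum("bij,bj->bi", R_w2c, t_c2w)      # so that x_cam = R_w2c @ x_world + t_w2c

# Round trip: a world point mapped to camera coordinates and back is unchanged.
x_world = torch.randn(n, 3)
x_cam = torch.einsum("bij,bj->bi", R_w2c, x_world) + t_w2c
x_back = torch.einsum("bij,bj->bi", R_c2w, x_cam) + t_c2w
print(torch.allclose(x_world, x_back, atol=1e-5))      # expected: True
```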
example/video_0.mp4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:13ff124a68e4b48190e0c3f0ce9f38db59c5e3bb8a093b3c7fc9c67276be2062
size 6515891
hawor/configs/__init__.py
ADDED
@@ -0,0 +1,120 @@
import os
from typing import Dict
from yacs.config import CfgNode as CN

CACHE_DIR_HAWOR = "./_DATA"

def to_lower(x: Dict) -> Dict:
    """
    Convert all dictionary keys to lowercase
    Args:
      x (dict): Input dictionary
    Returns:
      dict: Output dictionary with all keys converted to lowercase
    """
    return {k.lower(): v for k, v in x.items()}

_C = CN(new_allowed=True)

_C.GENERAL = CN(new_allowed=True)
_C.GENERAL.RESUME = True
_C.GENERAL.TIME_TO_RUN = 3300
_C.GENERAL.VAL_STEPS = 100
_C.GENERAL.LOG_STEPS = 100
_C.GENERAL.CHECKPOINT_STEPS = 20000
_C.GENERAL.CHECKPOINT_DIR = "checkpoints"
_C.GENERAL.SUMMARY_DIR = "tensorboard"
_C.GENERAL.NUM_GPUS = 1
_C.GENERAL.NUM_WORKERS = 4
_C.GENERAL.MIXED_PRECISION = True
_C.GENERAL.ALLOW_CUDA = True
_C.GENERAL.PIN_MEMORY = False
_C.GENERAL.DISTRIBUTED = False
_C.GENERAL.LOCAL_RANK = 0
_C.GENERAL.USE_SYNCBN = False
_C.GENERAL.WORLD_SIZE = 1

_C.TRAIN = CN(new_allowed=True)
_C.TRAIN.NUM_EPOCHS = 100
_C.TRAIN.BATCH_SIZE = 32
_C.TRAIN.SHUFFLE = True
_C.TRAIN.WARMUP = False
_C.TRAIN.NORMALIZE_PER_IMAGE = False
_C.TRAIN.CLIP_GRAD = False
_C.TRAIN.CLIP_GRAD_VALUE = 1.0
_C.LOSS_WEIGHTS = CN(new_allowed=True)

_C.DATASETS = CN(new_allowed=True)

_C.MODEL = CN(new_allowed=True)
_C.MODEL.IMAGE_SIZE = 224

_C.EXTRA = CN(new_allowed=True)
_C.EXTRA.FOCAL_LENGTH = 5000

_C.DATASETS.CONFIG = CN(new_allowed=True)
_C.DATASETS.CONFIG.SCALE_FACTOR = 0.3
_C.DATASETS.CONFIG.ROT_FACTOR = 30
_C.DATASETS.CONFIG.TRANS_FACTOR = 0.02
_C.DATASETS.CONFIG.COLOR_SCALE = 0.2
_C.DATASETS.CONFIG.ROT_AUG_RATE = 0.6
_C.DATASETS.CONFIG.TRANS_AUG_RATE = 0.5
_C.DATASETS.CONFIG.DO_FLIP = False
_C.DATASETS.CONFIG.FLIP_AUG_RATE = 0.5
_C.DATASETS.CONFIG.EXTREME_CROP_AUG_RATE = 0.10

def default_config() -> CN:
    """
    Get a yacs CfgNode object with the default config values.
    """
    # Return a clone so that the defaults will not be altered
    # This is for the "local variable" use pattern
    return _C.clone()

def dataset_config() -> CN:
    """
    Get dataset config file
    Returns:
      CfgNode: Dataset config as a yacs CfgNode object.
    """
    cfg = CN(new_allowed=True)
    config_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets_tar.yaml')
    cfg.merge_from_file(config_file)
    cfg.freeze()
    return cfg

def dataset_eval_config() -> CN:
    cfg = CN(new_allowed=True)
    config_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'datasets_eval.yaml')
    cfg.merge_from_file(config_file)
    cfg.freeze()
    return cfg

def get_config(config_file: str, merge: bool = True, update_cachedir: bool = False) -> CN:
    """
    Read a config file and optionally merge it with the default config file.
    Args:
      config_file (str): Path to config file.
      merge (bool): Whether to merge with the default config or not.
    Returns:
      CfgNode: Config as a yacs CfgNode object.
    """
    if merge:
        cfg = default_config()
    else:
        cfg = CN(new_allowed=True)
    cfg.merge_from_file(config_file)

    if update_cachedir:
        def update_path(path: str) -> str:
            if os.path.basename(CACHE_DIR_HAWOR) in path:
                return path
            if os.path.isabs(path):
                return path
            return os.path.join(CACHE_DIR_HAWOR, path)

        cfg.MANO.MODEL_PATH = update_path(cfg.MANO.MODEL_PATH)
        cfg.MANO.MEAN_PARAMS = update_path(cfg.MANO.MEAN_PARAMS)

    cfg.freeze()
    return cfg
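For illustration, the helpers above could be used to load the model config downloaded in the README. The contents of `model_config.yaml` are not part of this view, so the `MODEL.*` and `MANO.*` accesses below are assumptions based only on the defaults and on `update_path()`:

```python
# Illustrative use of the config helpers (assumes ./weights/hawor/model_config.yaml from the README).
from hawor.configs import get_config

cfg = get_config("./weights/hawor/model_config.yaml", merge=True, update_cachedir=True)
print(cfg.MODEL.IMAGE_SIZE)   # 224 unless overridden by the file
print(cfg.MANO.MODEL_PATH)    # relative paths are rewritten to live under ./_DATA
```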
hawor/configs/__pycache__/__init__.cpython-310.pyc
ADDED
Binary file (3.55 kB)
hawor/utils/__pycache__/geometry.cpython-310.pyc
ADDED
Binary file (4.09 kB)
hawor/utils/__pycache__/process.cpython-310.pyc
ADDED
Binary file (5.54 kB)
hawor/utils/__pycache__/pylogger.cpython-310.pyc
ADDED
Binary file (655 Bytes)
hawor/utils/__pycache__/render_openpose.cpython-310.pyc
ADDED
Binary file (7.24 kB)
hawor/utils/__pycache__/rotation.cpython-310.pyc
ADDED
Binary file (7.65 kB)
hawor/utils/geometry.py
ADDED
@@ -0,0 +1,102 @@
from typing import Optional
import torch
from torch.nn import functional as F

def aa_to_rotmat(theta: torch.Tensor):
    """
    Convert axis-angle representation to rotation matrix.
    Works by first converting it to a quaternion.
    Args:
        theta (torch.Tensor): Tensor of shape (B, 3) containing axis-angle representations.
    Returns:
        torch.Tensor: Corresponding rotation matrices with shape (B, 3, 3).
    """
    norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
    angle = torch.unsqueeze(norm, -1)
    normalized = torch.div(theta, angle)
    angle = angle * 0.5
    v_cos = torch.cos(angle)
    v_sin = torch.sin(angle)
    quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
    return quat_to_rotmat(quat)

def quat_to_rotmat(quat: torch.Tensor) -> torch.Tensor:
    """
    Convert quaternion representation to rotation matrix.
    Args:
        quat (torch.Tensor) of shape (B, 4); 4 <===> (w, x, y, z).
    Returns:
        torch.Tensor: Corresponding rotation matrices with shape (B, 3, 3).
    """
    norm_quat = quat
    norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
    w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3]

    B = quat.size(0)

    w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
    wx, wy, wz = w*x, w*y, w*z
    xy, xz, yz = x*y, x*z, y*z

    rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
                          2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
                          2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3)
    return rotMat


def rot6d_to_rotmat(x: torch.Tensor) -> torch.Tensor:
    """
    Convert 6D rotation representation to 3x3 rotation matrix.
    Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
    Args:
        x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
    Returns:
        torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
    """
    x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
    a1 = x[:, :, 0]
    a2 = x[:, :, 1]
    b1 = F.normalize(a1)
    b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
    b3 = torch.linalg.cross(b1, b2)
    return torch.stack((b1, b2, b3), dim=-1)

def perspective_projection(points: torch.Tensor,
                           translation: torch.Tensor,
                           focal_length: torch.Tensor,
                           camera_center: Optional[torch.Tensor] = None,
                           rotation: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Computes the perspective projection of a set of 3D points.
    Args:
        points (torch.Tensor): Tensor of shape (B, N, 3) containing the input 3D points.
        translation (torch.Tensor): Tensor of shape (B, 3) containing the 3D camera translation.
        focal_length (torch.Tensor): Tensor of shape (B, 2) containing the focal length in pixels.
        camera_center (torch.Tensor): Tensor of shape (B, 2) containing the camera center in pixels.
        rotation (torch.Tensor): Tensor of shape (B, 3, 3) containing the camera rotation.
    Returns:
        torch.Tensor: Tensor of shape (B, N, 2) containing the projection of the input points.
    """
    batch_size = points.shape[0]
    if rotation is None:
        rotation = torch.eye(3, device=points.device, dtype=points.dtype).unsqueeze(0).expand(batch_size, -1, -1)
    if camera_center is None:
        camera_center = torch.zeros(batch_size, 2, device=points.device, dtype=points.dtype)
    # Populate intrinsic camera matrix K.
    K = torch.zeros([batch_size, 3, 3], device=points.device, dtype=points.dtype)
    K[:,0,0] = focal_length[:,0]
    K[:,1,1] = focal_length[:,1]
    K[:,2,2] = 1.
    K[:,:-1, -1] = camera_center

    # Transform points
    points = torch.einsum('bij,bkj->bki', rotation, points)
    points = points + translation.unsqueeze(1)

    # Apply perspective distortion
    projected_points = points / points[:,:,-1].unsqueeze(-1)

    # Apply camera intrinsics
    projected_points = torch.einsum('bij,bkj->bki', K, projected_points)

    return projected_points[:, :, :-1]
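A shape-level illustration of `perspective_projection` and `aa_to_rotmat` above, using random data only (the 778 points stand in for a MANO vertex set; focal length and camera center values are arbitrary):

```python
# Illustrative shapes for the geometry helpers (random data, not repo outputs).
import torch
from hawor.utils.geometry import aa_to_rotmat, perspective_projection

B, N = 2, 778
points = torch.randn(B, N, 3) * 0.1
translation = torch.tensor([[0.0, 0.0, 1.0], [0.0, 0.0, 1.2]])   # keep points in front of the camera
focal_length = torch.full((B, 2), 1000.0)                        # pixels
camera_center = torch.full((B, 2), 112.0)                        # e.g. half of a 224x224 crop

proj = perspective_projection(points, translation, focal_length, camera_center=camera_center)
print(proj.shape)                     # torch.Size([2, 778, 2])

R = aa_to_rotmat(torch.zeros(B, 3))   # zero axis-angle -> (near-)identity rotations
print(R.shape)                        # torch.Size([2, 3, 3])
```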
hawor/utils/process.py
ADDED
@@ -0,0 +1,198 @@
import torch
from lib.models.mano_wrapper import MANO
from hawor.utils.geometry import aa_to_rotmat
import numpy as np
import sys
import os

def block_print():
    sys.stdout = open(os.devnull, 'w')

def enable_print():
    sys.stdout = sys.__stdout__

def get_mano_faces():
    block_print()
    MANO_cfg = {
        'DATA_DIR': '_DATA/data/',
        'MODEL_PATH': '_DATA/data/mano',
        'GENDER': 'neutral',
        'NUM_HAND_JOINTS': 15,
        'CREATE_BODY_POSE': False
    }
    mano_cfg = {k.lower(): v for k,v in MANO_cfg.items()}
    mano = MANO(**mano_cfg)
    enable_print()
    return mano.faces


def run_mano(trans, root_orient, hand_pose, is_right=None, betas=None, use_cuda=True):
    """
    Forward pass of the MANO model and populates pred_data accordingly with
    joints3d, verts3d, points3d.

    trans : B x T x 3
    root_orient : B x T x 3
    hand_pose : B x T x J*3
    betas : (optional) B x D
    """
    block_print()
    MANO_cfg = {
        'DATA_DIR': '_DATA/data/',
        'MODEL_PATH': '_DATA/data/mano',
        'GENDER': 'neutral',
        'NUM_HAND_JOINTS': 15,
        'CREATE_BODY_POSE': False
    }
    mano_cfg = {k.lower(): v for k,v in MANO_cfg.items()}
    mano = MANO(**mano_cfg)
    if use_cuda:
        mano = mano.cuda()

    B, T, _ = root_orient.shape
    NUM_JOINTS = 15
    mano_params = {
        'global_orient': root_orient.reshape(B*T, -1),
        'hand_pose': hand_pose.reshape(B*T*NUM_JOINTS, 3),
        'betas': betas.reshape(B*T, -1),
    }
    rotmat_mano_params = mano_params
    rotmat_mano_params['global_orient'] = aa_to_rotmat(mano_params['global_orient']).view(B*T, 1, 3, 3)
    rotmat_mano_params['hand_pose'] = aa_to_rotmat(mano_params['hand_pose']).view(B*T, NUM_JOINTS, 3, 3)
    rotmat_mano_params['transl'] = trans.reshape(B*T, 3)

    if use_cuda:
        mano_output = mano(**{k: v.float().cuda() for k,v in rotmat_mano_params.items()}, pose2rot=False)
    else:
        mano_output = mano(**{k: v.float() for k,v in rotmat_mano_params.items()}, pose2rot=False)

    faces_right = mano.faces
    faces_new = np.array([[92, 38, 234],
                          [234, 38, 239],
                          [38, 122, 239],
                          [239, 122, 279],
                          [122, 118, 279],
                          [279, 118, 215],
                          [118, 117, 215],
                          [215, 117, 214],
                          [117, 119, 214],
                          [214, 119, 121],
                          [119, 120, 121],
                          [121, 120, 78],
                          [120, 108, 78],
                          [78, 108, 79]])
    faces_right = np.concatenate([faces_right, faces_new], axis=0)
    faces_n = len(faces_right)
    faces_left = faces_right[:,[0,2,1]]

    outputs = {
        "joints": mano_output.joints.reshape(B, T, -1, 3),
        "vertices": mano_output.vertices.reshape(B, T, -1, 3),
    }

    if not is_right is None:
        # outputs["vertices"][..., 0] = (2*is_right-1)*outputs["vertices"][..., 0]
        # outputs["joints"][..., 0] = (2*is_right-1)*outputs["joints"][..., 0]
        is_right = (is_right[:, :, 0].cpu().numpy() > 0)
        faces_result = np.zeros((B, T, faces_n, 3))
        faces_right_expanded = np.expand_dims(np.expand_dims(faces_right, axis=0), axis=0)
        faces_left_expanded = np.expand_dims(np.expand_dims(faces_left, axis=0), axis=0)
        faces_result = np.where(is_right[..., np.newaxis, np.newaxis], faces_right_expanded, faces_left_expanded)
        outputs["faces"] = torch.from_numpy(faces_result.astype(np.int32))

    enable_print()
    return outputs

def run_mano_left(trans, root_orient, hand_pose, is_right=None, betas=None, use_cuda=True, fix_shapedirs=True):
    """
    Forward pass of the MANO model and populates pred_data accordingly with
    joints3d, verts3d, points3d.

    trans : B x T x 3
    root_orient : B x T x 3
    hand_pose : B x T x J*3
    betas : (optional) B x D
    """
    block_print()
    MANO_cfg = {
        'DATA_DIR': '_DATA/data_left/',
        'MODEL_PATH': '_DATA/data_left/mano_left',
        'GENDER': 'neutral',
        'NUM_HAND_JOINTS': 15,
        'CREATE_BODY_POSE': False,
        'is_rhand': False
    }
    mano_cfg = {k.lower(): v for k,v in MANO_cfg.items()}
    mano = MANO(**mano_cfg)
    if use_cuda:
        mano = mano.cuda()

    # fix MANO shapedirs of the left hand bug (https://github.com/vchoutas/smplx/issues/48)
    if fix_shapedirs:
        mano.shapedirs[:, 0, :] *= -1

    B, T, _ = root_orient.shape
    NUM_JOINTS = 15
    mano_params = {
        'global_orient': root_orient.reshape(B*T, -1),
        'hand_pose': hand_pose.reshape(B*T*NUM_JOINTS, 3),
        'betas': betas.reshape(B*T, -1),
    }
    rotmat_mano_params = mano_params
    rotmat_mano_params['global_orient'] = aa_to_rotmat(mano_params['global_orient']).view(B*T, 1, 3, 3)
    rotmat_mano_params['hand_pose'] = aa_to_rotmat(mano_params['hand_pose']).view(B*T, NUM_JOINTS, 3, 3)
    rotmat_mano_params['transl'] = trans.reshape(B*T, 3)

    if use_cuda:
        mano_output = mano(**{k: v.float().cuda() for k,v in rotmat_mano_params.items()}, pose2rot=False)
    else:
        mano_output = mano(**{k: v.float() for k,v in rotmat_mano_params.items()}, pose2rot=False)

    faces_right = mano.faces
    faces_new = np.array([[92, 38, 234],
                          [234, 38, 239],
                          [38, 122, 239],
                          [239, 122, 279],
                          [122, 118, 279],
                          [279, 118, 215],
                          [118, 117, 215],
                          [215, 117, 214],
                          [117, 119, 214],
                          [214, 119, 121],
                          [119, 120, 121],
                          [121, 120, 78],
                          [120, 108, 78],
                          [78, 108, 79]])
    faces_right = np.concatenate([faces_right, faces_new], axis=0)
    faces_n = len(faces_right)
    faces_left = faces_right[:,[0,2,1]]

    outputs = {
        "joints": mano_output.joints.reshape(B, T, -1, 3),
        "vertices": mano_output.vertices.reshape(B, T, -1, 3),
    }

    if not is_right is None:
        # outputs["vertices"][..., 0] = (2*is_right-1)*outputs["vertices"][..., 0]
        # outputs["joints"][..., 0] = (2*is_right-1)*outputs["joints"][..., 0]
        is_right = (is_right[:, :, 0].cpu().numpy() > 0)
        faces_result = np.zeros((B, T, faces_n, 3))
        faces_right_expanded = np.expand_dims(np.expand_dims(faces_right, axis=0), axis=0)
        faces_left_expanded = np.expand_dims(np.expand_dims(faces_left, axis=0), axis=0)
        faces_result = np.where(is_right[..., np.newaxis, np.newaxis], faces_right_expanded, faces_left_expanded)
        outputs["faces"] = torch.from_numpy(faces_result.astype(np.int32))

    enable_print()
    return outputs

def run_mano_twohands(init_trans, init_rot, init_hand_pose, is_right, init_betas, use_cuda=True, fix_shapedirs=True):
    outputs_left = run_mano_left(init_trans[0:1], init_rot[0:1], init_hand_pose[0:1], None, init_betas[0:1], use_cuda=use_cuda, fix_shapedirs=fix_shapedirs)
    outputs_right = run_mano(init_trans[1:2], init_rot[1:2], init_hand_pose[1:2], None, init_betas[1:2], use_cuda=use_cuda)
    outputs_two = {
        "vertices": torch.cat((outputs_left["vertices"], outputs_right["vertices"]), dim=0),
        "joints": torch.cat((outputs_left["joints"], outputs_right["joints"]), dim=0)
    }
    return outputs_two
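An illustrative call of `run_mano_twohands` above with dummy parameters. It requires the MANO files under `_DATA/` described in the README; the shapes follow the docstrings (index 0 = left hand, index 1 = right hand, T frames), and the 10 shape coefficients and 21 output joints are assumptions based on the standard MANO setup rather than anything shown in this commit:

```python
# Illustrative two-hand MANO forward pass (dummy zero parameters).
import torch
from hawor.utils.process import run_mano_twohands

T = 8
trans = torch.zeros(2, T, 3)            # wrist translations
rot = torch.zeros(2, T, 3)              # global orientation, axis-angle
hand_pose = torch.zeros(2, T, 15 * 3)   # 15 joints per hand, axis-angle
betas = torch.zeros(2, T, 10)           # MANO shape parameters (assumed 10-dim)

out = run_mano_twohands(trans, rot, hand_pose, None, betas, use_cuda=False, fix_shapedirs=True)
print(out["vertices"].shape)            # (2, T, 778, 3)
print(out["joints"].shape)              # joint count depends on the MANO wrapper
```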
hawor/utils/pylogger.py
ADDED
@@ -0,0 +1,17 @@
import logging

from pytorch_lightning.utilities import rank_zero_only


def get_pylogger(name=__name__) -> logging.Logger:
    """Initializes multi-GPU-friendly python command line logger."""

    logger = logging.getLogger(name)

    # this ensures all logging levels get marked with the rank zero decorator
    # otherwise logs would get multiplied for each GPU process in multi-GPU setup
    logging_levels = ("debug", "info", "warning", "error", "exception", "fatal", "critical")
    for level in logging_levels:
        setattr(logger, level, rank_zero_only(getattr(logger, level)))

    return logger
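Typical use of the helper above: a logger whose calls only emit on rank 0 in multi-GPU runs.

```python
# Minimal usage of get_pylogger.
from hawor.utils.pylogger import get_pylogger

log = get_pylogger(__name__)
log.info("initialized HaWoR logger")
```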
hawor/utils/render_openpose.py
ADDED
@@ -0,0 +1,225 @@
"""
Render OpenPose keypoints.
Code was ported to Python from the official C++ implementation https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/src/openpose/utilities/keypoint.cpp
"""
import cv2
import math
import numpy as np
from typing import List, Tuple

def get_keypoints_rectangle(keypoints: np.array, threshold: float) -> Tuple[float, float, float]:
    """
    Compute rectangle enclosing keypoints above the threshold.
    Args:
        keypoints (np.array): Keypoint array of shape (N, 3).
        threshold (float): Confidence visualization threshold.
    Returns:
        Tuple[float, float, float]: Rectangle width, height and area.
    """
    valid_ind = keypoints[:, -1] > threshold
    if valid_ind.sum() > 0:
        valid_keypoints = keypoints[valid_ind][:, :-1]
        max_x = valid_keypoints[:,0].max()
        max_y = valid_keypoints[:,1].max()
        min_x = valid_keypoints[:,0].min()
        min_y = valid_keypoints[:,1].min()
        width = max_x - min_x
        height = max_y - min_y
        area = width * height
        return width, height, area
    else:
        return 0,0,0

def render_keypoints(img: np.array,
                     keypoints: np.array,
                     pairs: List,
                     colors: List,
                     thickness_circle_ratio: float,
                     thickness_line_ratio_wrt_circle: float,
                     pose_scales: List,
                     threshold: float = 0.1,
                     alpha: float = 1.0) -> np.array:
    """
    Render keypoints on input image.
    Args:
        img (np.array): Input image of shape (H, W, 3) with pixel values in the [0,255] range.
        keypoints (np.array): Keypoint array of shape (N, 3).
        pairs (List): List of keypoint pairs per limb.
        colors: (List): List of colors per keypoint.
        thickness_circle_ratio (float): Circle thickness ratio.
        thickness_line_ratio_wrt_circle (float): Line thickness ratio wrt the circle.
        pose_scales (List): List of pose scales.
        threshold (float): Only visualize keypoints with confidence above the threshold.
    Returns:
        (np.array): Image of shape (H, W, 3) with keypoints drawn on top of the original image.
    """
    img_orig = img.copy()
    width, height = img.shape[1], img.shape[2]
    area = width * height

    lineType = 8
    shift = 0
    numberColors = len(colors)
    thresholdRectangle = 0.1

    person_width, person_height, person_area = get_keypoints_rectangle(keypoints, thresholdRectangle)
    if person_area > 0:
        ratioAreas = min(1, max(person_width / width, person_height / height))
        thicknessRatio = np.maximum(np.round(math.sqrt(area) * thickness_circle_ratio * ratioAreas), 2)
        thicknessCircle = np.maximum(1, thicknessRatio if ratioAreas > 0.05 else -np.ones_like(thicknessRatio))
        thicknessLine = np.maximum(1, np.round(thicknessRatio * thickness_line_ratio_wrt_circle))
        radius = thicknessRatio / 2

        img = np.ascontiguousarray(img.copy())
        for i, pair in enumerate(pairs):
            index1, index2 = pair
            if keypoints[index1, -1] > threshold and keypoints[index2, -1] > threshold:
                thicknessLineScaled = int(round(min(thicknessLine[index1], thicknessLine[index2]) * pose_scales[0]))
                colorIndex = index2
                color = colors[colorIndex % numberColors]
                keypoint1 = keypoints[index1, :-1].astype(int)
                keypoint2 = keypoints[index2, :-1].astype(int)
                cv2.line(img, tuple(keypoint1.tolist()), tuple(keypoint2.tolist()), tuple(color.tolist()), thicknessLineScaled, lineType, shift)
        for part in range(len(keypoints)):
            faceIndex = part
            if keypoints[faceIndex, -1] > threshold:
                radiusScaled = int(round(radius[faceIndex] * pose_scales[0]))
                thicknessCircleScaled = int(round(thicknessCircle[faceIndex] * pose_scales[0]))
                colorIndex = part
                color = colors[colorIndex % numberColors]
                center = keypoints[faceIndex, :-1].astype(int)
                cv2.circle(img, tuple(center.tolist()), radiusScaled, tuple(color.tolist()), thicknessCircleScaled, lineType, shift)
    return img

def render_hand_keypoints(img, right_hand_keypoints, threshold=0.1, use_confidence=False, map_fn=lambda x: np.ones_like(x), alpha=1.0):
    if use_confidence and map_fn is not None:
        #thicknessCircleRatioLeft = 1./50 * map_fn(left_hand_keypoints[:, -1])
        thicknessCircleRatioRight = 1./50 * map_fn(right_hand_keypoints[:, -1])
    else:
        #thicknessCircleRatioLeft = 1./50 * np.ones(left_hand_keypoints.shape[0])
        thicknessCircleRatioRight = 1./50 * np.ones(right_hand_keypoints.shape[0])
    thicknessLineRatioWRTCircle = 0.75
    pairs = [0,1, 1,2, 2,3, 3,4, 0,5, 5,6, 6,7, 7,8, 0,9, 9,10, 10,11, 11,12, 0,13, 13,14, 14,15, 15,16, 0,17, 17,18, 18,19, 19,20]
    pairs = np.array(pairs).reshape(-1,2)

    colors = [100., 100., 100.,
              100., 0., 0.,
              150., 0., 0.,
              200., 0., 0.,
              255., 0., 0.,
              100., 100., 0.,
              150., 150., 0.,
              200., 200., 0.,
              255., 255., 0.,
              0., 100., 50.,
              0., 150., 75.,
              0., 200., 100.,
              0., 255., 125.,
              0., 50., 100.,
              0., 75., 150.,
              0., 100., 200.,
              0., 125., 255.,
              100., 0., 100.,
              150., 0., 150.,
              200., 0., 200.,
              255., 0., 255.]
    colors = np.array(colors).reshape(-1,3)
    #colors = np.zeros_like(colors)
    poseScales = [1]
    #img = render_keypoints(img, left_hand_keypoints, pairs, colors, thicknessCircleRatioLeft, thicknessLineRatioWRTCircle, poseScales, threshold, alpha=alpha)
    img = render_keypoints(img, right_hand_keypoints, pairs, colors, thicknessCircleRatioRight, thicknessLineRatioWRTCircle, poseScales, threshold, alpha=alpha)
    #img = render_keypoints(img, right_hand_keypoints, pairs, colors, thickness_circle_ratio, thickness_line_ratio_wrt_circle, pose_scales, 0.1)
    return img

def render_hand_landmarks(img, right_hand_keypoints, threshold=0.1, use_confidence=False, map_fn=lambda x: np.ones_like(x), alpha=1.0):
    if use_confidence and map_fn is not None:
        #thicknessCircleRatioLeft = 1./50 * map_fn(left_hand_keypoints[:, -1])
        thicknessCircleRatioRight = 1./50 * map_fn(right_hand_keypoints[:, -1])
    else:
        #thicknessCircleRatioLeft = 1./50 * np.ones(left_hand_keypoints.shape[0])
        thicknessCircleRatioRight = 1./50 * np.ones(right_hand_keypoints.shape[0])
    thicknessLineRatioWRTCircle = 0.75
    pairs = []
    pairs = np.array(pairs).reshape(-1,2)

    colors = [255, 0, 0]
    colors = np.array(colors).reshape(-1,3)
    #colors = np.zeros_like(colors)
    poseScales = [1]
    #img = render_keypoints(img, left_hand_keypoints, pairs, colors, thicknessCircleRatioLeft, thicknessLineRatioWRTCircle, poseScales, threshold, alpha=alpha)
    img = render_keypoints(img, right_hand_keypoints, pairs, colors, thicknessCircleRatioRight * 0.1, thicknessLineRatioWRTCircle * 0.1, poseScales, threshold, alpha=alpha)
    #img = render_keypoints(img, right_hand_keypoints, pairs, colors, thickness_circle_ratio, thickness_line_ratio_wrt_circle, pose_scales, 0.1)
    return img

def render_body_keypoints(img: np.array,
                          body_keypoints: np.array) -> np.array:
    """
    Render OpenPose body keypoints on input image.
    Args:
        img (np.array): Input image of shape (H, W, 3) with pixel values in the [0,255] range.
        body_keypoints (np.array): Keypoint array of shape (N, 3); 3 <====> (x, y, confidence).
    Returns:
        (np.array): Image of shape (H, W, 3) with keypoints drawn on top of the original image.
    """

    thickness_circle_ratio = 1./75. * np.ones(body_keypoints.shape[0])
    thickness_line_ratio_wrt_circle = 0.75
    pairs = []
    pairs = [1,8,1,2,1,5,2,3,3,4,5,6,6,7,8,9,9,10,10,11,8,12,12,13,13,14,1,0,0,15,15,17,0,16,16,18,14,19,19,20,14,21,11,22,22,23,11,24]
    pairs = np.array(pairs).reshape(-1,2)
    colors = [255., 0., 85.,
              255., 0., 0.,
              255., 85., 0.,
              255., 170., 0.,
              255., 255., 0.,
              170., 255., 0.,
              85., 255., 0.,
              0., 255., 0.,
              255., 0., 0.,
              0., 255., 85.,
              0., 255., 170.,
              0., 255., 255.,
              0., 170., 255.,
              0., 85., 255.,
              0., 0., 255.,
              255., 0., 170.,
              170., 0., 255.,
              255., 0., 255.,
              85., 0., 255.,
              0., 0., 255.,
              0., 0., 255.,
              0., 0., 255.,
              0., 255., 255.,
              0., 255., 255.,
              0., 255., 255.]
    colors = np.array(colors).reshape(-1,3)
    pose_scales = [1]
    return render_keypoints(img, body_keypoints, pairs, colors, thickness_circle_ratio, thickness_line_ratio_wrt_circle, pose_scales, 0.1)

def render_openpose(img: np.array,
                    hand_keypoints: np.array) -> np.array:
    """
    Render keypoints in the OpenPose format on input image.
    Args:
        img (np.array): Input image of shape (H, W, 3) with pixel values in the [0,255] range.
        hand_keypoints (np.array): Keypoint array of shape (N, 3); 3 <====> (x, y, confidence).
    Returns:
        (np.array): Image of shape (H, W, 3) with keypoints drawn on top of the original image.
    """
    #img = render_body_keypoints(img, body_keypoints)
    img = render_hand_keypoints(img, hand_keypoints)
    return img

def render_openpose_landmarks(img: np.array,
                              hand_keypoints: np.array) -> np.array:
    """
    Render keypoints in the OpenPose format on input image.
    Args:
        img (np.array): Input image of shape (H, W, 3) with pixel values in the [0,255] range.
        hand_keypoints (np.array): Keypoint array of shape (N, 3); 3 <====> (x, y, confidence).
    Returns:
        (np.array): Image of shape (H, W, 3) with keypoints drawn on top of the original image.
    """
    #img = render_body_keypoints(img, body_keypoints)
    img = render_hand_landmarks(img, hand_keypoints)
    return img
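An illustrative call of `render_openpose` above, drawing 21 hand keypoints on a blank image. The keypoint values are random and only demonstrate the expected (21, 3) layout of (x, y, confidence):

```python
# Illustrative hand keypoint rendering on a blank image (random keypoints).
import numpy as np
from hawor.utils.render_openpose import render_openpose

img = np.zeros((256, 256, 3), dtype=np.uint8)
kps = np.concatenate([np.random.uniform(50, 200, (21, 2)),   # x, y in pixels
                      np.ones((21, 1))], axis=1)             # confidence
vis = render_openpose(img, kps)
print(vis.shape)  # (256, 256, 3)
```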
hawor/utils/rotation.py
ADDED
@@ -0,0 +1,293 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import torch
|
2 |
+
import numpy as np
|
3 |
+
from torch.nn import functional as F
|
4 |
+
|
5 |
+
|
6 |
+
def batch_rodrigues(rot_vecs, epsilon=1e-8, dtype=torch.float32):
|
7 |
+
"""
|
8 |
+
Taken from https://github.com/mkocabas/VIBE/blob/master/lib/utils/geometry.py
|
9 |
+
Calculates the rotation matrices for a batch of rotation vectors
|
10 |
+
- param rot_vecs: torch.tensor (N, 3) array of N axis-angle vectors
|
11 |
+
- returns R: torch.tensor (N, 3, 3) rotation matrices
|
12 |
+
"""
|
13 |
+
batch_size = rot_vecs.shape[0]
|
14 |
+
device = rot_vecs.device
|
15 |
+
|
16 |
+
angle = torch.norm(rot_vecs + 1e-8, dim=1, keepdim=True)
|
17 |
+
rot_dir = rot_vecs / angle
|
18 |
+
|
19 |
+
cos = torch.unsqueeze(torch.cos(angle), dim=1)
|
20 |
+
sin = torch.unsqueeze(torch.sin(angle), dim=1)
|
21 |
+
|
22 |
+
# Bx1 arrays
|
23 |
+
rx, ry, rz = torch.split(rot_dir, 1, dim=1)
|
24 |
+
K = torch.zeros((batch_size, 3, 3), dtype=dtype, device=device)
|
25 |
+
|
26 |
+
zeros = torch.zeros((batch_size, 1), dtype=dtype, device=device)
|
27 |
+
K = torch.cat([zeros, -rz, ry, rz, zeros, -rx, -ry, rx, zeros], dim=1).view(
|
28 |
+
(batch_size, 3, 3)
|
29 |
+
)
|
30 |
+
|
31 |
+
ident = torch.eye(3, dtype=dtype, device=device).unsqueeze(dim=0)
|
32 |
+
rot_mat = ident + sin * K + (1 - cos) * torch.bmm(K, K)
|
33 |
+
return rot_mat
|
34 |
+
|
35 |
+
|
36 |
+
def quaternion_mul(q0, q1):
|
37 |
+
"""
|
38 |
+
EXPECTS WXYZ
|
39 |
+
:param q0 (*, 4)
|
40 |
+
:param q1 (*, 4)
|
41 |
+
"""
|
42 |
+
r0, r1 = q0[..., :1], q1[..., :1]
|
43 |
+
v0, v1 = q0[..., 1:], q1[..., 1:]
|
44 |
+
r = r0 * r1 - (v0 * v1).sum(dim=-1, keepdim=True)
|
45 |
+
v = r0 * v1 + r1 * v0 + torch.linalg.cross(v0, v1)
|
46 |
+
return torch.cat([r, v], dim=-1)
|
47 |
+
|
48 |
+
|
49 |
+
def quaternion_inverse(q, eps=1e-8):
|
50 |
+
"""
|
51 |
+
EXPECTS WXYZ
|
52 |
+
:param q (*, 4)
|
53 |
+
"""
|
54 |
+
conj = torch.cat([q[..., :1], -q[..., 1:]], dim=-1)
|
55 |
+
mag = torch.square(q).sum(dim=-1, keepdim=True) + eps
|
56 |
+
return conj / mag
|
57 |
+
|
58 |
+
|
59 |
+
def quaternion_slerp(t, q0, q1, eps=1e-8):
|
60 |
+
"""
|
61 |
+
:param t (*, 1) must be between 0 and 1
|
62 |
+
:param q0 (*, 4)
|
63 |
+
:param q1 (*, 4)
|
64 |
+
"""
|
65 |
+
dims = q0.shape[:-1]
|
66 |
+
t = t.view(*dims, 1)
|
67 |
+
|
68 |
+
q0 = F.normalize(q0, p=2, dim=-1)
|
69 |
+
q1 = F.normalize(q1, p=2, dim=-1)
|
70 |
+
dot = (q0 * q1).sum(dim=-1, keepdim=True)
|
71 |
+
|
72 |
+
# make sure we give the shortest rotation path (< 180d)
|
73 |
+
neg = dot < 0
|
74 |
+
q1 = torch.where(neg, -q1, q1)
|
75 |
+
dot = torch.where(neg, -dot, dot)
|
76 |
+
angle = torch.acos(dot)
|
77 |
+
|
78 |
+
# if angle is too small, just do linear interpolation
|
79 |
+
collin = torch.abs(dot) > 1 - eps
|
80 |
+
fac = 1 / torch.sin(angle)
|
81 |
+
w0 = torch.where(collin, 1 - t, torch.sin((1 - t) * angle) * fac)
|
82 |
+
w1 = torch.where(collin, t, torch.sin(t * angle) * fac)
|
83 |
+
slerp = q0 * w0 + q1 * w1
|
84 |
+
return slerp
|
85 |
+
|
86 |
+
|
87 |
+
def rotation_matrix_to_angle_axis(rotation_matrix):
|
88 |
+
"""
|
89 |
+
This function is borrowed from https://github.com/kornia/kornia
|
90 |
+
|
91 |
+
Convert rotation matrix to Rodrigues vector
|
92 |
+
"""
|
93 |
+
quaternion = rotation_matrix_to_quaternion(rotation_matrix)
|
94 |
+
aa = quaternion_to_angle_axis(quaternion)
|
95 |
+
aa[torch.isnan(aa)] = 0.0
|
96 |
+
return aa
|
97 |
+
|
98 |
+
|
99 |
+
def quaternion_to_angle_axis(quaternion):
|
100 |
+
"""
|
101 |
+
This function is borrowed from https://github.com/kornia/kornia
|
102 |
+
|
103 |
+
Convert quaternion vector to angle axis of rotation.
|
104 |
+
Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
|
105 |
+
|
106 |
+
:param quaternion (*, 4) expects WXYZ
|
107 |
+
:returns angle_axis (*, 3)
|
108 |
+
"""
|
109 |
+
# unpack input and compute conversion
|
110 |
+
q1 = quaternion[..., 1]
|
111 |
+
q2 = quaternion[..., 2]
|
112 |
+
q3 = quaternion[..., 3]
|
113 |
+
sin_squared_theta = q1 * q1 + q2 * q2 + q3 * q3
|
114 |
+
|
115 |
+
sin_theta = torch.sqrt(sin_squared_theta)
|
116 |
+
cos_theta = quaternion[..., 0]
|
117 |
+
two_theta = 2.0 * torch.where(
|
118 |
+
cos_theta < 0.0,
|
119 |
+
torch.atan2(-sin_theta, -cos_theta),
|
120 |
+
torch.atan2(sin_theta, cos_theta),
|
121 |
+
)
|
122 |
+
|
123 |
+
k_pos = two_theta / sin_theta
|
124 |
+
k_neg = 2.0 * torch.ones_like(sin_theta)
|
125 |
+
k = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
|
126 |
+
|
127 |
+
angle_axis = torch.zeros_like(quaternion)[..., :3]
|
128 |
+
angle_axis[..., 0] += q1 * k
|
129 |
+
angle_axis[..., 1] += q2 * k
|
130 |
+
angle_axis[..., 2] += q3 * k
|
131 |
+
return angle_axis
|
132 |
+
|
133 |
+
|
134 |
+
def angle_axis_to_rotation_matrix(angle_axis):
|
135 |
+
"""
|
136 |
+
:param angle_axis (*, 3)
|
137 |
+
return (*, 3, 3)
|
138 |
+
"""
|
139 |
+
quat = angle_axis_to_quaternion(angle_axis)
|
140 |
+
return quaternion_to_rotation_matrix(quat)
|
141 |
+
|
142 |
+
|
143 |
+
def quaternion_to_rotation_matrix(quaternion):
|
144 |
+
"""
|
145 |
+
Convert a quaternion to a rotation matrix.
|
146 |
+
Taken from https://github.com/kornia/kornia, based on
|
147 |
+
https://github.com/matthew-brett/transforms3d/blob/8965c48401d9e8e66b6a8c37c65f2fc200a076fa/transforms3d/quaternions.py#L101
|
148 |
+
https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/geometry/transformation/rotation_matrix_3d.py#L247
|
149 |
+
:param quaternion (N, 4) expects WXYZ order
|
150 |
+
returns rotation matrix (N, 3, 3)
|
151 |
+
"""
|
152 |
+
# normalize the input quaternion
|
153 |
+
quaternion_norm = F.normalize(quaternion, p=2, dim=-1, eps=1e-12)
|
154 |
+
*dims, _ = quaternion_norm.shape
|
155 |
+
|
156 |
+
# unpack the normalized quaternion components
|
157 |
+
w, x, y, z = torch.chunk(quaternion_norm, chunks=4, dim=-1)
|
158 |
+
|
159 |
+
# compute the actual conversion
|
160 |
+
tx = 2.0 * x
|
161 |
+
ty = 2.0 * y
|
162 |
+
tz = 2.0 * z
|
163 |
+
twx = tx * w
|
164 |
+
twy = ty * w
|
165 |
+
twz = tz * w
|
166 |
+
txx = tx * x
|
167 |
+
txy = ty * x
|
168 |
+
txz = tz * x
|
169 |
+
tyy = ty * y
|
170 |
+
tyz = tz * y
|
171 |
+
tzz = tz * z
|
172 |
+
one = torch.tensor(1.0)
|
173 |
+
|
174 |
+
matrix = torch.stack(
|
175 |
+
(
|
176 |
+
one - (tyy + tzz),
|
177 |
+
txy - twz,
|
178 |
+
txz + twy,
|
179 |
+
txy + twz,
|
180 |
+
one - (txx + tzz),
|
181 |
+
tyz - twx,
|
182 |
+
txz - twy,
|
183 |
+
tyz + twx,
|
184 |
+
one - (txx + tyy),
|
185 |
+
),
|
186 |
+
dim=-1,
|
187 |
+
).view(*dims, 3, 3)
|
188 |
+
return matrix
|
189 |
+
|
190 |
+
|
191 |
+
def angle_axis_to_quaternion(angle_axis):
|
192 |
+
"""
|
193 |
+
This function is borrowed from https://github.com/kornia/kornia
|
194 |
+
Convert angle axis to quaternion in WXYZ order
|
195 |
+
:param angle_axis (*, 3)
|
196 |
+
:returns quaternion (*, 4) WXYZ order
|
197 |
+
"""
|
198 |
+
theta_sq = torch.sum(angle_axis**2, dim=-1, keepdim=True) # (*, 1)
|
199 |
+
# need to handle the zero rotation case
|
200 |
+
valid = theta_sq > 0
|
201 |
+
theta = torch.sqrt(theta_sq)
|
202 |
+
half_theta = 0.5 * theta
|
203 |
+
ones = torch.ones_like(half_theta)
|
204 |
+
# fill zero with the limit of sin ax / x -> a
|
205 |
+
k = torch.where(valid, torch.sin(half_theta) / theta, 0.5 * ones)
|
206 |
+
w = torch.where(valid, torch.cos(half_theta), ones)
|
207 |
+
quat = torch.cat([w, k * angle_axis], dim=-1)
|
208 |
+
return quat
|
209 |
+
|
210 |
+
|
211 |
+
def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
|
212 |
+
"""
|
213 |
+
This function is borrowed from https://github.com/kornia/kornia
|
214 |
+
Convert rotation matrix to 4d quaternion vector
|
215 |
+
This algorithm is based on algorithm described in
|
216 |
+
https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
|
217 |
+
|
218 |
+
:param rotation_matrix (N, 3, 3)
|
219 |
+
"""
|
220 |
+
*dims, m, n = rotation_matrix.shape
|
221 |
+
rmat_t = torch.transpose(rotation_matrix.reshape(-1, m, n), -1, -2)
|
222 |
+
|
223 |
+
mask_d2 = rmat_t[:, 2, 2] < eps
|
224 |
+
|
225 |
+
mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
|
226 |
+
mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
|
227 |
+
|
228 |
+
t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
229 |
+
q0 = torch.stack(
|
230 |
+
[
|
231 |
+
rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
232 |
+
t0,
|
233 |
+
rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
234 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
|
235 |
+
],
|
236 |
+
-1,
|
237 |
+
)
|
238 |
+
t0_rep = t0.repeat(4, 1).t()
|
239 |
+
|
240 |
+
t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
241 |
+
q1 = torch.stack(
|
242 |
+
[
|
243 |
+
rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
244 |
+
rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
245 |
+
t1,
|
246 |
+
rmat_t[:, 1, 2] + rmat_t[:, 2, 1],
|
247 |
+
],
|
248 |
+
-1,
|
249 |
+
)
|
250 |
+
t1_rep = t1.repeat(4, 1).t()
|
251 |
+
|
252 |
+
t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
253 |
+
q2 = torch.stack(
|
254 |
+
[
|
255 |
+
rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
|
256 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
|
257 |
+
rmat_t[:, 1, 2] + rmat_t[:, 2, 1],
|
258 |
+
t2,
|
259 |
+
],
|
260 |
+
-1,
|
261 |
+
)
|
262 |
+
t2_rep = t2.repeat(4, 1).t()
|
263 |
+
|
264 |
+
t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
265 |
+
q3 = torch.stack(
|
266 |
+
[
|
267 |
+
t3,
|
268 |
+
rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
269 |
+
rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
270 |
+
rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
|
271 |
+
],
|
272 |
+
-1,
|
273 |
+
)
|
274 |
+
t3_rep = t3.repeat(4, 1).t()
|
275 |
+
|
276 |
+
mask_c0 = mask_d2 * mask_d0_d1
|
277 |
+
mask_c1 = mask_d2 * ~mask_d0_d1
|
278 |
+
mask_c2 = ~mask_d2 * mask_d0_nd1
|
279 |
+
mask_c3 = ~mask_d2 * ~mask_d0_nd1
|
280 |
+
mask_c0 = mask_c0.view(-1, 1).type_as(q0)
|
281 |
+
mask_c1 = mask_c1.view(-1, 1).type_as(q1)
|
282 |
+
mask_c2 = mask_c2.view(-1, 1).type_as(q2)
|
283 |
+
mask_c3 = mask_c3.view(-1, 1).type_as(q3)
|
284 |
+
|
285 |
+
q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
|
286 |
+
q /= torch.sqrt(
|
287 |
+
t0_rep * mask_c0
|
288 |
+
+ t1_rep * mask_c1
|
289 |
+
+ t2_rep * mask_c2 # noqa
|
290 |
+
+ t3_rep * mask_c3
|
291 |
+
) # noqa
|
292 |
+
q *= 0.5
|
293 |
+
return q.reshape(*dims, 4)
|
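Not part of the commit: a quick round-trip sanity check for the conversion helpers above (angle_axis_to_quaternion, quaternion_to_rotation_matrix, rotation_matrix_to_quaternion), assuming they are imported from this rotation module.

import torch

aa = torch.tensor([[0.3, -0.2, 0.1]])           # (N, 3) axis-angle
quat = angle_axis_to_quaternion(aa)             # (N, 4), WXYZ order
R = quaternion_to_rotation_matrix(quat)         # (N, 3, 3)
quat_back = rotation_matrix_to_quaternion(R)    # (N, 4), WXYZ order
# q and -q encode the same rotation, so compare up to sign
assert torch.allclose(quat.abs(), quat_back.abs(), atol=1e-4)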
imgui.ini
ADDED
@@ -0,0 +1,15 @@
[Window][Debug##Default]
Pos=60,60
Size=400,400
Collapsed=0

[Window][Editor]
Pos=50,50
Size=250,700
Collapsed=0

[Window][Playback]
Pos=50,800
Size=400,175
Collapsed=1
infiller/hand_utils/geometry.py
ADDED
@@ -0,0 +1,412 @@
1 |
+
import numpy as np
|
2 |
+
import torch
|
3 |
+
from torch.nn import functional as F
|
4 |
+
|
5 |
+
|
6 |
+
def perspective_projection(points, rotation, translation,
|
7 |
+
focal_length, camera_center, distortion=None):
|
8 |
+
"""
|
9 |
+
This function computes the perspective projection of a set of points.
|
10 |
+
Input:
|
11 |
+
points (bs, N, 3): 3D points
|
12 |
+
rotation (bs, 3, 3): Camera rotation
|
13 |
+
translation (bs, 3): Camera translation
|
14 |
+
focal_length (bs,) or scalar: Focal length
|
15 |
+
camera_center (bs, 2): Camera center
|
16 |
+
"""
|
17 |
+
batch_size = points.shape[0]
|
18 |
+
|
19 |
+
# Extrinsic
|
20 |
+
if rotation is not None:
|
21 |
+
points = torch.einsum('bij,bkj->bki', rotation, points)
|
22 |
+
|
23 |
+
if translation is not None:
|
24 |
+
points = points + translation.unsqueeze(1)
|
25 |
+
|
26 |
+
if distortion is not None:
|
27 |
+
kc = distortion
|
28 |
+
points = points[:,:,:2] / points[:,:,2:]
|
29 |
+
|
30 |
+
r2 = points[:,:,0]**2 + points[:,:,1]**2
|
31 |
+
dx = (2 * kc[:,[2]] * points[:,:,0] * points[:,:,1]
|
32 |
+
+ kc[:,[3]] * (r2 + 2*points[:,:,0]**2))
|
33 |
+
|
34 |
+
dy = (2 * kc[:,[3]] * points[:,:,0] * points[:,:,1]
|
35 |
+
+ kc[:,[2]] * (r2 + 2*points[:,:,1]**2))
|
36 |
+
|
37 |
+
x = (1 + kc[:,[0]]*r2 + kc[:,[1]]*r2.pow(2) + kc[:,[4]]*r2.pow(3)) * points[:,:,0] + dx
|
38 |
+
y = (1 + kc[:,[0]]*r2 + kc[:,[1]]*r2.pow(2) + kc[:,[4]]*r2.pow(3)) * points[:,:,1] + dy
|
39 |
+
|
40 |
+
points = torch.stack([x, y, torch.ones_like(x)], dim=-1)
|
41 |
+
|
42 |
+
# Intrinsic
|
43 |
+
K = torch.zeros([batch_size, 3, 3], device=points.device)
|
44 |
+
K[:,0,0] = focal_length
|
45 |
+
K[:,1,1] = focal_length
|
46 |
+
K[:,2,2] = 1.
|
47 |
+
K[:,:-1, -1] = camera_center
|
48 |
+
|
49 |
+
# Apply camera intrinsics
|
50 |
+
points = points / points[:,:,-1].unsqueeze(-1)
|
51 |
+
projected_points = torch.einsum('bij,bkj->bki', K, points)
|
52 |
+
projected_points = projected_points[:, :, :-1]
|
53 |
+
|
54 |
+
return projected_points
|
55 |
+
|
56 |
+
|
57 |
+
def avg_rot(rot):
|
58 |
+
# input [B,...,3,3] --> output [...,3,3]
|
59 |
+
rot = rot.mean(dim=0)
|
60 |
+
U, _, V = torch.svd(rot)
|
61 |
+
rot = U @ V.transpose(-1, -2)
|
62 |
+
return rot
|
63 |
+
|
64 |
+
|
65 |
+
def rot9d_to_rotmat(x):
|
66 |
+
"""Convert 9D rotation representation to 3x3 rotation matrix.
|
67 |
+
Based on Levinson et al., "An Analysis of SVD for Deep Rotation Estimation"
|
68 |
+
Input:
|
69 |
+
(B,9) or (B,J*9) Batch of 9D rotation (interpreted as 3x3 est rotmat)
|
70 |
+
Output:
|
71 |
+
(B,3,3) or (B*J,3,3) Batch of corresponding rotation matrices
|
72 |
+
"""
|
73 |
+
x = x.view(-1,3,3)
|
74 |
+
u, _, vh = torch.linalg.svd(x)
|
75 |
+
|
76 |
+
sig = torch.eye(3).expand(len(x), 3, 3).clone()
|
77 |
+
sig = sig.to(x.device)
|
78 |
+
sig[:, -1, -1] = (u @ vh).det()
|
79 |
+
|
80 |
+
R = u @ sig @ vh
|
81 |
+
|
82 |
+
return R
|
83 |
+
|
84 |
+
|
85 |
+
"""
|
86 |
+
Deprecated in favor of: rotation_conversions.py
|
87 |
+
|
88 |
+
Useful geometric operations, e.g. differentiable Rodrigues formula
|
89 |
+
Parts of the code are taken from https://github.com/MandyMo/pytorch_HMR
|
90 |
+
"""
|
91 |
+
def batch_rodrigues(theta):
|
92 |
+
"""Convert axis-angle representation to rotation matrix.
|
93 |
+
Args:
|
94 |
+
theta: size = [B, 3]
|
95 |
+
Returns:
|
96 |
+
Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
|
97 |
+
"""
|
98 |
+
l1norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
|
99 |
+
angle = torch.unsqueeze(l1norm, -1)
|
100 |
+
normalized = torch.div(theta, angle)
|
101 |
+
angle = angle * 0.5
|
102 |
+
v_cos = torch.cos(angle)
|
103 |
+
v_sin = torch.sin(angle)
|
104 |
+
quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
|
105 |
+
return quat_to_rotmat(quat)
|
106 |
+
|
107 |
+
def quat_to_rotmat(quat):
|
108 |
+
"""Convert quaternion coefficients to rotation matrix.
|
109 |
+
Args:
|
110 |
+
quat: size = [B, 4] 4 <===>(w, x, y, z)
|
111 |
+
Returns:
|
112 |
+
Rotation matrix corresponding to the quaternion -- size = [B, 3, 3]
|
113 |
+
"""
|
114 |
+
norm_quat = quat
|
115 |
+
norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
|
116 |
+
w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3]
|
117 |
+
|
118 |
+
B = quat.size(0)
|
119 |
+
|
120 |
+
w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
|
121 |
+
wx, wy, wz = w*x, w*y, w*z
|
122 |
+
xy, xz, yz = x*y, x*z, y*z
|
123 |
+
|
124 |
+
rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
|
125 |
+
2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
|
126 |
+
2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3)
|
127 |
+
return rotMat
|
128 |
+
|
129 |
+
def rot6d_to_rotmat(x):
|
130 |
+
"""Convert 6D rotation representation to 3x3 rotation matrix.
|
131 |
+
Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
|
132 |
+
Input:
|
133 |
+
(B,6) Batch of 6-D rotation representations
|
134 |
+
Output:
|
135 |
+
(B,3,3) Batch of corresponding rotation matrices
|
136 |
+
"""
|
137 |
+
x = x.view(-1,3,2)
|
138 |
+
a1 = x[:, :, 0]
|
139 |
+
a2 = x[:, :, 1]
|
140 |
+
b1 = F.normalize(a1)
|
141 |
+
b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
|
142 |
+
b3 = torch.cross(b1, b2)
|
143 |
+
return torch.stack((b1, b2, b3), dim=-1)
|
144 |
+
|
145 |
+
def rot6d_to_rotmat_hmr2(x: torch.Tensor) -> torch.Tensor:
|
146 |
+
"""
|
147 |
+
Convert 6D rotation representation to 3x3 rotation matrix.
|
148 |
+
Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
|
149 |
+
Args:
|
150 |
+
x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
|
151 |
+
Returns:
|
152 |
+
torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
|
153 |
+
"""
|
154 |
+
x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
|
155 |
+
a1 = x[:, :, 0]
|
156 |
+
a2 = x[:, :, 1]
|
157 |
+
b1 = F.normalize(a1)
|
158 |
+
b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
|
159 |
+
b3 = torch.cross(b1, b2)
|
160 |
+
return torch.stack((b1, b2, b3), dim=-1)
|
161 |
+
|
162 |
+
def rotmat_to_rot6d(rotmat):
|
163 |
+
""" Inverse function of the above.
|
164 |
+
Input:
|
165 |
+
(B,3,3) Batch of corresponding rotation matrices
|
166 |
+
Output:
|
167 |
+
(B,6) Batch of 6-D rotation representations
|
168 |
+
"""
|
169 |
+
# rot6d = rotmat[:, :, :2]
|
170 |
+
rot6d = rotmat[...,:2]
|
171 |
+
rot6d = rot6d.reshape(rot6d.size(0), -1)
|
172 |
+
return rot6d
|
173 |
+
|
174 |
+
|
175 |
+
def rotation_matrix_to_angle_axis(rotation_matrix):
|
176 |
+
"""
|
177 |
+
This function is borrowed from https://github.com/kornia/kornia
|
178 |
+
|
179 |
+
Convert 3x4 rotation matrix to Rodrigues vector
|
180 |
+
|
181 |
+
Args:
|
182 |
+
rotation_matrix (Tensor): rotation matrix.
|
183 |
+
|
184 |
+
Returns:
|
185 |
+
Tensor: Rodrigues vector transformation.
|
186 |
+
|
187 |
+
Shape:
|
188 |
+
- Input: :math:`(N, 3, 4)`
|
189 |
+
- Output: :math:`(N, 3)`
|
190 |
+
|
191 |
+
Example:
|
192 |
+
>>> input = torch.rand(2, 3, 4) # Nx4x4
|
193 |
+
>>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3
|
194 |
+
"""
|
195 |
+
if rotation_matrix.shape[1:] == (3,3):
|
196 |
+
rot_mat = rotation_matrix.reshape(-1, 3, 3)
|
197 |
+
hom = torch.tensor([0, 0, 1], dtype=torch.float32,
|
198 |
+
device=rotation_matrix.device).reshape(1, 3, 1).expand(rot_mat.shape[0], -1, -1)
|
199 |
+
rotation_matrix = torch.cat([rot_mat, hom], dim=-1)
|
200 |
+
|
201 |
+
quaternion = rotation_matrix_to_quaternion(rotation_matrix)
|
202 |
+
aa = quaternion_to_angle_axis(quaternion)
|
203 |
+
aa[torch.isnan(aa)] = 0.0
|
204 |
+
return aa
|
205 |
+
|
206 |
+
|
207 |
+
def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
|
208 |
+
"""
|
209 |
+
This function is borrowed from https://github.com/kornia/kornia
|
210 |
+
|
211 |
+
Convert quaternion vector to angle axis of rotation.
|
212 |
+
|
213 |
+
Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
|
214 |
+
|
215 |
+
Args:
|
216 |
+
quaternion (torch.Tensor): tensor with quaternions.
|
217 |
+
|
218 |
+
Return:
|
219 |
+
torch.Tensor: tensor with angle axis of rotation.
|
220 |
+
|
221 |
+
Shape:
|
222 |
+
- Input: :math:`(*, 4)` where `*` means, any number of dimensions
|
223 |
+
- Output: :math:`(*, 3)`
|
224 |
+
|
225 |
+
Example:
|
226 |
+
>>> quaternion = torch.rand(2, 4) # Nx4
|
227 |
+
>>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3
|
228 |
+
"""
|
229 |
+
if not torch.is_tensor(quaternion):
|
230 |
+
raise TypeError("Input type is not a torch.Tensor. Got {}".format(
|
231 |
+
type(quaternion)))
|
232 |
+
|
233 |
+
if not quaternion.shape[-1] == 4:
|
234 |
+
raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
|
235 |
+
.format(quaternion.shape))
|
236 |
+
# unpack input and compute conversion
|
237 |
+
q1: torch.Tensor = quaternion[..., 1]
|
238 |
+
q2: torch.Tensor = quaternion[..., 2]
|
239 |
+
q3: torch.Tensor = quaternion[..., 3]
|
240 |
+
sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3
|
241 |
+
|
242 |
+
sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
|
243 |
+
cos_theta: torch.Tensor = quaternion[..., 0]
|
244 |
+
two_theta: torch.Tensor = 2.0 * torch.where(
|
245 |
+
cos_theta < 0.0,
|
246 |
+
torch.atan2(-sin_theta, -cos_theta),
|
247 |
+
torch.atan2(sin_theta, cos_theta))
|
248 |
+
|
249 |
+
k_pos: torch.Tensor = two_theta / sin_theta
|
250 |
+
k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
|
251 |
+
k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
|
252 |
+
|
253 |
+
angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
|
254 |
+
angle_axis[..., 0] += q1 * k
|
255 |
+
angle_axis[..., 1] += q2 * k
|
256 |
+
angle_axis[..., 2] += q3 * k
|
257 |
+
return angle_axis
|
258 |
+
|
259 |
+
|
260 |
+
def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
|
261 |
+
"""
|
262 |
+
This function is borrowed from https://github.com/kornia/kornia
|
263 |
+
|
264 |
+
Convert 3x4 rotation matrix to 4d quaternion vector
|
265 |
+
|
266 |
+
This algorithm is based on algorithm described in
|
267 |
+
https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
|
268 |
+
|
269 |
+
Args:
|
270 |
+
rotation_matrix (Tensor): the rotation matrix to convert.
|
271 |
+
|
272 |
+
Return:
|
273 |
+
Tensor: the rotation in quaternion
|
274 |
+
|
275 |
+
Shape:
|
276 |
+
- Input: :math:`(N, 3, 4)`
|
277 |
+
- Output: :math:`(N, 4)`
|
278 |
+
|
279 |
+
Example:
|
280 |
+
>>> input = torch.rand(4, 3, 4) # Nx3x4
|
281 |
+
>>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4
|
282 |
+
"""
|
283 |
+
if not torch.is_tensor(rotation_matrix):
|
284 |
+
raise TypeError("Input type is not a torch.Tensor. Got {}".format(
|
285 |
+
type(rotation_matrix)))
|
286 |
+
|
287 |
+
if len(rotation_matrix.shape) > 3:
|
288 |
+
raise ValueError(
|
289 |
+
"Input size must be a three dimensional tensor. Got {}".format(
|
290 |
+
rotation_matrix.shape))
|
291 |
+
if not rotation_matrix.shape[-2:] == (3, 4):
|
292 |
+
raise ValueError(
|
293 |
+
"Input size must be a N x 3 x 4 tensor. Got {}".format(
|
294 |
+
rotation_matrix.shape))
|
295 |
+
|
296 |
+
rmat_t = torch.transpose(rotation_matrix, 1, 2)
|
297 |
+
|
298 |
+
mask_d2 = rmat_t[:, 2, 2] < eps
|
299 |
+
|
300 |
+
mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
|
301 |
+
mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
|
302 |
+
|
303 |
+
t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
304 |
+
q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
305 |
+
t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
306 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
|
307 |
+
t0_rep = t0.repeat(4, 1).t()
|
308 |
+
|
309 |
+
t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
310 |
+
q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
311 |
+
rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
312 |
+
t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
|
313 |
+
t1_rep = t1.repeat(4, 1).t()
|
314 |
+
|
315 |
+
t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
316 |
+
q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
|
317 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
|
318 |
+
rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
|
319 |
+
t2_rep = t2.repeat(4, 1).t()
|
320 |
+
|
321 |
+
t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
322 |
+
q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
323 |
+
rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
324 |
+
rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
|
325 |
+
t3_rep = t3.repeat(4, 1).t()
|
326 |
+
|
327 |
+
mask_c0 = mask_d2 * mask_d0_d1
|
328 |
+
mask_c1 = mask_d2 * ~mask_d0_d1
|
329 |
+
mask_c2 = ~mask_d2 * mask_d0_nd1
|
330 |
+
mask_c3 = ~mask_d2 * ~mask_d0_nd1
|
331 |
+
mask_c0 = mask_c0.view(-1, 1).type_as(q0)
|
332 |
+
mask_c1 = mask_c1.view(-1, 1).type_as(q1)
|
333 |
+
mask_c2 = mask_c2.view(-1, 1).type_as(q2)
|
334 |
+
mask_c3 = mask_c3.view(-1, 1).type_as(q3)
|
335 |
+
|
336 |
+
q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
|
337 |
+
q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa
|
338 |
+
t2_rep * mask_c2 + t3_rep * mask_c3) # noqa
|
339 |
+
q *= 0.5
|
340 |
+
return q
|
341 |
+
|
342 |
+
|
343 |
+
def estimate_translation_np(S, joints_2d, joints_conf, focal_length=5000., img_size=224.):
|
344 |
+
"""
|
345 |
+
This function is borrowed from https://github.com/nkolot/SPIN/utils/geometry.py
|
346 |
+
|
347 |
+
Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d.
|
348 |
+
Input:
|
349 |
+
S: (25, 3) 3D joint locations
|
350 |
+
joints: (25, 3) 2D joint locations and confidence
|
351 |
+
Returns:
|
352 |
+
(3,) camera translation vector
|
353 |
+
"""
|
354 |
+
|
355 |
+
num_joints = S.shape[0]
|
356 |
+
# focal length
|
357 |
+
f = np.array([focal_length,focal_length])
|
358 |
+
# optical center
|
359 |
+
center = np.array([img_size/2., img_size/2.])
|
360 |
+
|
361 |
+
# transformations
|
362 |
+
Z = np.reshape(np.tile(S[:,2],(2,1)).T,-1)
|
363 |
+
XY = np.reshape(S[:,0:2],-1)
|
364 |
+
O = np.tile(center,num_joints)
|
365 |
+
F = np.tile(f,num_joints)
|
366 |
+
weight2 = np.reshape(np.tile(np.sqrt(joints_conf),(2,1)).T,-1)
|
367 |
+
|
368 |
+
# least squares
|
369 |
+
Q = np.array([F*np.tile(np.array([1,0]),num_joints), F*np.tile(np.array([0,1]),num_joints), O-np.reshape(joints_2d,-1)]).T
|
370 |
+
c = (np.reshape(joints_2d,-1)-O)*Z - F*XY
|
371 |
+
|
372 |
+
# weighted least squares
|
373 |
+
W = np.diagflat(weight2)
|
374 |
+
Q = np.dot(W,Q)
|
375 |
+
c = np.dot(W,c)
|
376 |
+
|
377 |
+
# square matrix
|
378 |
+
A = np.dot(Q.T,Q)
|
379 |
+
b = np.dot(Q.T,c)
|
380 |
+
|
381 |
+
# solution
|
382 |
+
trans = np.linalg.solve(A, b)
|
383 |
+
|
384 |
+
return trans
|
385 |
+
|
386 |
+
|
387 |
+
def estimate_translation(S, joints_2d, focal_length=5000., img_size=224.):
|
388 |
+
"""Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d.
|
389 |
+
Input:
|
390 |
+
S: (B, 49, 3) 3D joint locations
|
391 |
+
joints: (B, 49, 3) 2D joint locations and confidence
|
392 |
+
Returns:
|
393 |
+
(B, 3) camera translation vectors
|
394 |
+
"""
|
395 |
+
|
396 |
+
device = S.device
|
397 |
+
# Use only joints 25:49 (GT joints)
|
398 |
+
S = S[:, -24:, :3].cpu().numpy()
|
399 |
+
joints_2d = joints_2d[:, -24:, :].cpu().numpy()
|
400 |
+
|
401 |
+
joints_conf = joints_2d[:, :, -1]
|
402 |
+
joints_2d = joints_2d[:, :, :-1]
|
403 |
+
trans = np.zeros((S.shape[0], 3), dtype=np.float32)
|
404 |
+
# Find the translation for each example in the batch
|
405 |
+
for i in range(S.shape[0]):
|
406 |
+
S_i = S[i]
|
407 |
+
joints_i = joints_2d[i]
|
408 |
+
conf_i = joints_conf[i]
|
409 |
+
trans[i] = estimate_translation_np(S_i, joints_i, conf_i, focal_length=focal_length, img_size=img_size)
|
410 |
+
return torch.from_numpy(trans).to(device)
|
411 |
+
|
412 |
+
|
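Not part of the commit: an illustrative call of perspective_projection above with made-up camera values (identity rotation, zero translation, a 224x224 crop centred at 112 px).

import torch

B, N = 1, 21
points = torch.rand(B, N, 3) + torch.tensor([0., 0., 2.])   # keep z > 0 (in front of the camera)
rotation = torch.eye(3).expand(B, 3, 3)
translation = torch.zeros(B, 3)
camera_center = torch.full((B, 2), 112.)
proj = perspective_projection(points, rotation, translation,
                              focal_length=1000., camera_center=camera_center)
print(proj.shape)   # (1, 21, 2) pixel coordinates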
infiller/hand_utils/geometry_utils.py
ADDED
@@ -0,0 +1,102 @@
1 |
+
from typing import Optional
|
2 |
+
import torch
|
3 |
+
from torch.nn import functional as F
|
4 |
+
|
5 |
+
def aa_to_rotmat(theta: torch.Tensor):
|
6 |
+
"""
|
7 |
+
Convert axis-angle representation to rotation matrix.
|
8 |
+
Works by first converting it to a quaternion.
|
9 |
+
Args:
|
10 |
+
theta (torch.Tensor): Tensor of shape (B, 3) containing axis-angle representations.
|
11 |
+
Returns:
|
12 |
+
torch.Tensor: Corresponding rotation matrices with shape (B, 3, 3).
|
13 |
+
"""
|
14 |
+
norm = torch.norm(theta + 1e-8, p = 2, dim = 1)
|
15 |
+
angle = torch.unsqueeze(norm, -1)
|
16 |
+
normalized = torch.div(theta, angle)
|
17 |
+
angle = angle * 0.5
|
18 |
+
v_cos = torch.cos(angle)
|
19 |
+
v_sin = torch.sin(angle)
|
20 |
+
quat = torch.cat([v_cos, v_sin * normalized], dim = 1)
|
21 |
+
return quat_to_rotmat(quat)
|
22 |
+
|
23 |
+
def quat_to_rotmat(quat: torch.Tensor) -> torch.Tensor:
|
24 |
+
"""
|
25 |
+
Convert quaternion representation to rotation matrix.
|
26 |
+
Args:
|
27 |
+
quat (torch.Tensor) of shape (B, 4); 4 <===> (w, x, y, z).
|
28 |
+
Returns:
|
29 |
+
torch.Tensor: Corresponding rotation matrices with shape (B, 3, 3).
|
30 |
+
"""
|
31 |
+
norm_quat = quat
|
32 |
+
norm_quat = norm_quat/norm_quat.norm(p=2, dim=1, keepdim=True)
|
33 |
+
w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3]
|
34 |
+
|
35 |
+
B = quat.size(0)
|
36 |
+
|
37 |
+
w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
|
38 |
+
wx, wy, wz = w*x, w*y, w*z
|
39 |
+
xy, xz, yz = x*y, x*z, y*z
|
40 |
+
|
41 |
+
rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz,
|
42 |
+
2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx,
|
43 |
+
2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3)
|
44 |
+
return rotMat
|
45 |
+
|
46 |
+
|
47 |
+
def rot6d_to_rotmat(x: torch.Tensor) -> torch.Tensor:
|
48 |
+
"""
|
49 |
+
Convert 6D rotation representation to 3x3 rotation matrix.
|
50 |
+
Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
|
51 |
+
Args:
|
52 |
+
x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
|
53 |
+
Returns:
|
54 |
+
torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
|
55 |
+
"""
|
56 |
+
x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
|
57 |
+
a1 = x[:, :, 0]
|
58 |
+
a2 = x[:, :, 1]
|
59 |
+
b1 = F.normalize(a1)
|
60 |
+
b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
|
61 |
+
b3 = torch.linalg.cross(b1, b2)
|
62 |
+
return torch.stack((b1, b2, b3), dim=-1)
|
63 |
+
|
64 |
+
def perspective_projection(points: torch.Tensor,
|
65 |
+
translation: torch.Tensor,
|
66 |
+
focal_length: torch.Tensor,
|
67 |
+
camera_center: Optional[torch.Tensor] = None,
|
68 |
+
rotation: Optional[torch.Tensor] = None) -> torch.Tensor:
|
69 |
+
"""
|
70 |
+
Computes the perspective projection of a set of 3D points.
|
71 |
+
Args:
|
72 |
+
points (torch.Tensor): Tensor of shape (B, N, 3) containing the input 3D points.
|
73 |
+
translation (torch.Tensor): Tensor of shape (B, 3) containing the 3D camera translation.
|
74 |
+
focal_length (torch.Tensor): Tensor of shape (B, 2) containing the focal length in pixels.
|
75 |
+
camera_center (torch.Tensor): Tensor of shape (B, 2) containing the camera center in pixels.
|
76 |
+
rotation (torch.Tensor): Tensor of shape (B, 3, 3) containing the camera rotation.
|
77 |
+
Returns:
|
78 |
+
torch.Tensor: Tensor of shape (B, N, 2) containing the projection of the input points.
|
79 |
+
"""
|
80 |
+
batch_size = points.shape[0]
|
81 |
+
if rotation is None:
|
82 |
+
rotation = torch.eye(3, device=points.device, dtype=points.dtype).unsqueeze(0).expand(batch_size, -1, -1)
|
83 |
+
if camera_center is None:
|
84 |
+
camera_center = torch.zeros(batch_size, 2, device=points.device, dtype=points.dtype)
|
85 |
+
# Populate intrinsic camera matrix K.
|
86 |
+
K = torch.zeros([batch_size, 3, 3], device=points.device, dtype=points.dtype)
|
87 |
+
K[:,0,0] = focal_length[:,0]
|
88 |
+
K[:,1,1] = focal_length[:,1]
|
89 |
+
K[:,2,2] = 1.
|
90 |
+
K[:,:-1, -1] = camera_center
|
91 |
+
|
92 |
+
# Transform points
|
93 |
+
points = torch.einsum('bij,bkj->bki', rotation, points)
|
94 |
+
points = points + translation.unsqueeze(1)
|
95 |
+
|
96 |
+
# Apply perspective distortion
|
97 |
+
projected_points = points / points[:,:,-1].unsqueeze(-1)
|
98 |
+
|
99 |
+
# Apply camera intrinsics
|
100 |
+
projected_points = torch.einsum('bij,bkj->bki', K, projected_points)
|
101 |
+
|
102 |
+
return projected_points[:, :, :-1]
|
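Not part of the commit: a hedged sketch of the 6D rotation convention used by rot6d_to_rotmat above; the six numbers are the first two columns of the target matrix, which the function re-orthonormalizes (Gram-Schmidt plus a cross product for the third column).

import torch

R = aa_to_rotmat(torch.tensor([[0.2, 0.1, -0.3]]))       # (1, 3, 3) reference rotation
rot6d = torch.cat([R[:, :, 0], R[:, :, 1]], dim=-1)      # (1, 6): columns 0 and 1
R_back = rot6d_to_rotmat(rot6d)                           # (1, 3, 3)
assert torch.allclose(R, R_back, atol=1e-4)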
infiller/hand_utils/mano_wrapper.py
ADDED
@@ -0,0 +1,52 @@
import torch
import numpy as np
import pickle
from typing import Optional
import smplx
from smplx.lbs import vertices2joints
from smplx.utils import MANOOutput, to_tensor
from smplx.vertex_ids import vertex_ids


class MANO(smplx.MANOLayer):
    def __init__(self, *args, joint_regressor_extra: Optional[str] = None, **kwargs):
        """
        Extension of the official MANO implementation to support more joints.
        Args:
            Same as MANOLayer.
            joint_regressor_extra (str): Path to extra joint regressor.
        """
        super(MANO, self).__init__(*args, **kwargs)
        mano_to_openpose = [0, 13, 14, 15, 16, 1, 2, 3, 17, 4, 5, 6, 18, 10, 11, 12, 19, 7, 8, 9, 20]

        # 2, 3, 5, 4, 1
        if joint_regressor_extra is not None:
            self.register_buffer('joint_regressor_extra', torch.tensor(pickle.load(open(joint_regressor_extra, 'rb'), encoding='latin1'), dtype=torch.float32))
        self.register_buffer('extra_joints_idxs', to_tensor(list(vertex_ids['mano'].values()), dtype=torch.long))
        self.register_buffer('joint_map', torch.tensor(mano_to_openpose, dtype=torch.long))

    def forward(self, *args, **kwargs) -> MANOOutput:
        """
        Run forward pass. Same as MANO and also append an extra set of joints if joint_regressor_extra is specified.
        """
        mano_output = super(MANO, self).forward(*args, **kwargs)
        extra_joints = torch.index_select(mano_output.vertices, 1, self.extra_joints_idxs)
        joints = torch.cat([mano_output.joints, extra_joints], dim=1)
        joints = joints[:, self.joint_map, :]
        if hasattr(self, 'joint_regressor_extra'):
            extra_joints = vertices2joints(self.joint_regressor_extra, mano_output.vertices)
            joints = torch.cat([joints, extra_joints], dim=1)
        mano_output.joints = joints
        return mano_output

    def query(self, hmr_output):
        batch_size = hmr_output['pred_rotmat'].shape[0]
        pred_rotmat = hmr_output['pred_rotmat'].reshape(batch_size, -1, 3, 3)
        pred_shape = hmr_output['pred_shape'].reshape(batch_size, 10)

        mano_output = self(global_orient=pred_rotmat[:, [0]],
                           hand_pose=pred_rotmat[:, 1:],
                           betas=pred_shape,
                           pose2rot=False)

        return mano_output
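Not part of the commit: a minimal usage sketch for the wrapper above, mirroring the config that process.py (below) passes to it, and assuming the MANO pickle under _DATA/data/mano is available.

import torch

MANO_cfg = {'DATA_DIR': '_DATA/data/', 'MODEL_PATH': '_DATA/data/mano',
            'GENDER': 'neutral', 'NUM_HAND_JOINTS': 15, 'CREATE_BODY_POSE': False}
mano = MANO(**{k.lower(): v for k, v in MANO_cfg.items()})

B = 1
out = mano(global_orient=torch.eye(3).repeat(B, 1, 1, 1),   # identity root rotation, (B, 1, 3, 3)
           hand_pose=torch.eye(3).repeat(B, 15, 1, 1),      # flat hand, (B, 15, 3, 3)
           betas=torch.zeros(B, 10),
           pose2rot=False)
print(out.vertices.shape, out.joints.shape)   # (B, 778, 3), (B, 21, 3)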
infiller/hand_utils/process.py
ADDED
@@ -0,0 +1,171 @@
1 |
+
import torch
|
2 |
+
from hand_utils.mano_wrapper import MANO
|
3 |
+
from hand_utils.geometry_utils import aa_to_rotmat
|
4 |
+
import numpy as np
|
5 |
+
|
6 |
+
def run_mano(trans, root_orient, hand_pose, is_right=None, betas=None, use_cuda=True):
|
7 |
+
"""
|
8 |
+
Forward pass of the MANO model and populates pred_data accordingly with
|
9 |
+
joints3d, verts3d, points3d.
|
10 |
+
|
11 |
+
trans : B x T x 3
|
12 |
+
root_orient : B x T x 3
|
13 |
+
hand_pose : B x T x J*3
|
14 |
+
betas : (optional) B x D
|
15 |
+
"""
|
16 |
+
MANO_cfg = {
|
17 |
+
'DATA_DIR': '_DATA/data/',
|
18 |
+
'MODEL_PATH': '_DATA/data/mano',
|
19 |
+
'GENDER': 'neutral',
|
20 |
+
'NUM_HAND_JOINTS': 15,
|
21 |
+
'CREATE_BODY_POSE': False
|
22 |
+
}
|
23 |
+
mano_cfg = {k.lower(): v for k,v in MANO_cfg.items()}
|
24 |
+
mano = MANO(**mano_cfg)
|
25 |
+
if use_cuda:
|
26 |
+
mano = mano.cuda()
|
27 |
+
|
28 |
+
B, T, _ = root_orient.shape
|
29 |
+
NUM_JOINTS = 15
|
30 |
+
mano_params = {
|
31 |
+
'global_orient': root_orient.reshape(B*T, -1),
|
32 |
+
'hand_pose': hand_pose.reshape(B*T*NUM_JOINTS, 3),
|
33 |
+
'betas': betas.reshape(B*T, -1),
|
34 |
+
}
|
35 |
+
rotmat_mano_params = mano_params
|
36 |
+
rotmat_mano_params['global_orient'] = aa_to_rotmat(mano_params['global_orient']).view(B*T, 1, 3, 3)
|
37 |
+
rotmat_mano_params['hand_pose'] = aa_to_rotmat(mano_params['hand_pose']).view(B*T, NUM_JOINTS, 3, 3)
|
38 |
+
rotmat_mano_params['transl'] = trans.reshape(B*T, 3)
|
39 |
+
|
40 |
+
if use_cuda:
|
41 |
+
mano_output = mano(**{k: v.float().cuda() for k,v in rotmat_mano_params.items()}, pose2rot=False)
|
42 |
+
else:
|
43 |
+
mano_output = mano(**{k: v.float() for k,v in rotmat_mano_params.items()}, pose2rot=False)
|
44 |
+
|
45 |
+
faces_right = mano.faces
|
46 |
+
faces_new = np.array([[92, 38, 234],
|
47 |
+
[234, 38, 239],
|
48 |
+
[38, 122, 239],
|
49 |
+
[239, 122, 279],
|
50 |
+
[122, 118, 279],
|
51 |
+
[279, 118, 215],
|
52 |
+
[118, 117, 215],
|
53 |
+
[215, 117, 214],
|
54 |
+
[117, 119, 214],
|
55 |
+
[214, 119, 121],
|
56 |
+
[119, 120, 121],
|
57 |
+
[121, 120, 78],
|
58 |
+
[120, 108, 78],
|
59 |
+
[78, 108, 79]])
|
60 |
+
faces_right = np.concatenate([faces_right, faces_new], axis=0)
|
61 |
+
faces_n = len(faces_right)
|
62 |
+
faces_left = faces_right[:,[0,2,1]]
|
63 |
+
|
64 |
+
outputs = {
|
65 |
+
"joints": mano_output.joints.reshape(B, T, -1, 3),
|
66 |
+
"vertices": mano_output.vertices.reshape(B, T, -1, 3),
|
67 |
+
}
|
68 |
+
|
69 |
+
if not is_right is None:
|
70 |
+
# outputs["vertices"][..., 0] = (2*is_right-1)*outputs["vertices"][..., 0]
|
71 |
+
# outputs["joints"][..., 0] = (2*is_right-1)*outputs["joints"][..., 0]
|
72 |
+
is_right = (is_right[:, :, 0].cpu().numpy() > 0)
|
73 |
+
faces_result = np.zeros((B, T, faces_n, 3))
|
74 |
+
faces_right_expanded = np.expand_dims(np.expand_dims(faces_right, axis=0), axis=0)
|
75 |
+
faces_left_expanded = np.expand_dims(np.expand_dims(faces_left, axis=0), axis=0)
|
76 |
+
faces_result = np.where(is_right[..., np.newaxis, np.newaxis], faces_right_expanded, faces_left_expanded)
|
77 |
+
outputs["faces"] = torch.from_numpy(faces_result.astype(np.int32))
|
78 |
+
|
79 |
+
|
80 |
+
return outputs
|
81 |
+
|
82 |
+
def run_mano_left(trans, root_orient, hand_pose, is_right=None, betas=None, use_cuda=True, fix_shapedirs=True):
|
83 |
+
"""
|
84 |
+
Forward pass of the MANO model and populates pred_data accordingly with
|
85 |
+
joints3d, verts3d, points3d.
|
86 |
+
|
87 |
+
trans : B x T x 3
|
88 |
+
root_orient : B x T x 3
|
89 |
+
hand_pose : B x T x J*3
|
90 |
+
betas : (optional) B x D
|
91 |
+
"""
|
92 |
+
MANO_cfg = {
|
93 |
+
'DATA_DIR': '_DATA/data_left/',
|
94 |
+
'MODEL_PATH': '_DATA/data_left/mano_left',
|
95 |
+
'GENDER': 'neutral',
|
96 |
+
'NUM_HAND_JOINTS': 15,
|
97 |
+
'CREATE_BODY_POSE': False,
|
98 |
+
'is_rhand': False
|
99 |
+
}
|
100 |
+
mano_cfg = {k.lower(): v for k,v in MANO_cfg.items()}
|
101 |
+
mano = MANO(**mano_cfg)
|
102 |
+
if use_cuda:
|
103 |
+
mano = mano.cuda()
|
104 |
+
|
105 |
+
# fix MANO shapedirs of the left hand bug (https://github.com/vchoutas/smplx/issues/48)
|
106 |
+
if fix_shapedirs:
|
107 |
+
mano.shapedirs[:, 0, :] *= -1
|
108 |
+
|
109 |
+
B, T, _ = root_orient.shape
|
110 |
+
NUM_JOINTS = 15
|
111 |
+
mano_params = {
|
112 |
+
'global_orient': root_orient.reshape(B*T, -1),
|
113 |
+
'hand_pose': hand_pose.reshape(B*T*NUM_JOINTS, 3),
|
114 |
+
'betas': betas.reshape(B*T, -1),
|
115 |
+
}
|
116 |
+
rotmat_mano_params = mano_params
|
117 |
+
rotmat_mano_params['global_orient'] = aa_to_rotmat(mano_params['global_orient']).view(B*T, 1, 3, 3)
|
118 |
+
rotmat_mano_params['hand_pose'] = aa_to_rotmat(mano_params['hand_pose']).view(B*T, NUM_JOINTS, 3, 3)
|
119 |
+
rotmat_mano_params['transl'] = trans.reshape(B*T, 3)
|
120 |
+
|
121 |
+
if use_cuda:
|
122 |
+
mano_output = mano(**{k: v.float().cuda() for k,v in rotmat_mano_params.items()}, pose2rot=False)
|
123 |
+
else:
|
124 |
+
mano_output = mano(**{k: v.float() for k,v in rotmat_mano_params.items()}, pose2rot=False)
|
125 |
+
|
126 |
+
faces_right = mano.faces
|
127 |
+
faces_new = np.array([[92, 38, 234],
|
128 |
+
[234, 38, 239],
|
129 |
+
[38, 122, 239],
|
130 |
+
[239, 122, 279],
|
131 |
+
[122, 118, 279],
|
132 |
+
[279, 118, 215],
|
133 |
+
[118, 117, 215],
|
134 |
+
[215, 117, 214],
|
135 |
+
[117, 119, 214],
|
136 |
+
[214, 119, 121],
|
137 |
+
[119, 120, 121],
|
138 |
+
[121, 120, 78],
|
139 |
+
[120, 108, 78],
|
140 |
+
[78, 108, 79]])
|
141 |
+
faces_right = np.concatenate([faces_right, faces_new], axis=0)
|
142 |
+
faces_n = len(faces_right)
|
143 |
+
faces_left = faces_right[:,[0,2,1]]
|
144 |
+
|
145 |
+
outputs = {
|
146 |
+
"joints": mano_output.joints.reshape(B, T, -1, 3),
|
147 |
+
"vertices": mano_output.vertices.reshape(B, T, -1, 3),
|
148 |
+
}
|
149 |
+
|
150 |
+
if not is_right is None:
|
151 |
+
# outputs["vertices"][..., 0] = (2*is_right-1)*outputs["vertices"][..., 0]
|
152 |
+
# outputs["joints"][..., 0] = (2*is_right-1)*outputs["joints"][..., 0]
|
153 |
+
is_right = (is_right[:, :, 0].cpu().numpy() > 0)
|
154 |
+
faces_result = np.zeros((B, T, faces_n, 3))
|
155 |
+
faces_right_expanded = np.expand_dims(np.expand_dims(faces_right, axis=0), axis=0)
|
156 |
+
faces_left_expanded = np.expand_dims(np.expand_dims(faces_left, axis=0), axis=0)
|
157 |
+
faces_result = np.where(is_right[..., np.newaxis, np.newaxis], faces_right_expanded, faces_left_expanded)
|
158 |
+
outputs["faces"] = torch.from_numpy(faces_result.astype(np.int32))
|
159 |
+
|
160 |
+
|
161 |
+
return outputs
|
162 |
+
|
163 |
+
def run_mano_twohands(init_trans, init_rot, init_hand_pose, is_right, init_betas, use_cuda=True, fix_shapedirs=True):
|
164 |
+
outputs_left = run_mano_left(init_trans[0:1], init_rot[0:1], init_hand_pose[0:1], None, init_betas[0:1], use_cuda=use_cuda, fix_shapedirs=fix_shapedirs)
|
165 |
+
outputs_right = run_mano(init_trans[1:2], init_rot[1:2], init_hand_pose[1:2], None, init_betas[1:2], use_cuda=use_cuda)
|
166 |
+
outputs_two = {
|
167 |
+
"vertices": torch.cat((outputs_left["vertices"], outputs_right["vertices"]), dim=0),
|
168 |
+
"joints": torch.cat((outputs_left["joints"], outputs_right["joints"]), dim=0)
|
169 |
+
|
170 |
+
}
|
171 |
+
return outputs_two
|
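Not part of the commit: shape conventions for run_mano_twohands above, assuming the MANO model files under _DATA/ are in place (batch dimension 2 = [left, right], T frames, axis-angle inputs).

import torch

T = 16
init_trans = torch.zeros(2, T, 3)        # per-hand world translation
init_rot = torch.zeros(2, T, 3)          # global orientation, axis-angle
init_hand_pose = torch.zeros(2, T, 45)   # 15 joints x 3, axis-angle
init_betas = torch.zeros(2, T, 10)       # MANO shape parameters
out = run_mano_twohands(init_trans, init_rot, init_hand_pose,
                        None, init_betas, use_cuda=False)
print(out["vertices"].shape)   # (2, T, 778, 3): index 0 = left hand, 1 = right hand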
infiller/hand_utils/rotation.py
ADDED
@@ -0,0 +1,293 @@
1 |
+
import torch
|
2 |
+
import numpy as np
|
3 |
+
from torch.nn import functional as F
|
4 |
+
|
5 |
+
|
6 |
+
def batch_rodrigues(rot_vecs, epsilon=1e-8, dtype=torch.float32):
|
7 |
+
"""
|
8 |
+
Taken from https://github.com/mkocabas/VIBE/blob/master/lib/utils/geometry.py
|
9 |
+
Calculates the rotation matrices for a batch of rotation vectors
|
10 |
+
- param rot_vecs: torch.tensor (N, 3) array of N axis-angle vectors
|
11 |
+
- returns R: torch.tensor (N, 3, 3) rotation matrices
|
12 |
+
"""
|
13 |
+
batch_size = rot_vecs.shape[0]
|
14 |
+
device = rot_vecs.device
|
15 |
+
|
16 |
+
angle = torch.norm(rot_vecs + 1e-8, dim=1, keepdim=True)
|
17 |
+
rot_dir = rot_vecs / angle
|
18 |
+
|
19 |
+
cos = torch.unsqueeze(torch.cos(angle), dim=1)
|
20 |
+
sin = torch.unsqueeze(torch.sin(angle), dim=1)
|
21 |
+
|
22 |
+
# Bx1 arrays
|
23 |
+
rx, ry, rz = torch.split(rot_dir, 1, dim=1)
|
24 |
+
K = torch.zeros((batch_size, 3, 3), dtype=dtype, device=device)
|
25 |
+
|
26 |
+
zeros = torch.zeros((batch_size, 1), dtype=dtype, device=device)
|
27 |
+
K = torch.cat([zeros, -rz, ry, rz, zeros, -rx, -ry, rx, zeros], dim=1).view(
|
28 |
+
(batch_size, 3, 3)
|
29 |
+
)
|
30 |
+
|
31 |
+
ident = torch.eye(3, dtype=dtype, device=device).unsqueeze(dim=0)
|
32 |
+
rot_mat = ident + sin * K + (1 - cos) * torch.bmm(K, K)
|
33 |
+
return rot_mat
|
34 |
+
|
35 |
+
|
36 |
+
def quaternion_mul(q0, q1):
|
37 |
+
"""
|
38 |
+
EXPECTS WXYZ
|
39 |
+
:param q0 (*, 4)
|
40 |
+
:param q1 (*, 4)
|
41 |
+
"""
|
42 |
+
r0, r1 = q0[..., :1], q1[..., :1]
|
43 |
+
v0, v1 = q0[..., 1:], q1[..., 1:]
|
44 |
+
r = r0 * r1 - (v0 * v1).sum(dim=-1, keepdim=True)
|
45 |
+
v = r0 * v1 + r1 * v0 + torch.linalg.cross(v0, v1)
|
46 |
+
return torch.cat([r, v], dim=-1)
|
47 |
+
|
48 |
+
|
49 |
+
def quaternion_inverse(q, eps=1e-8):
|
50 |
+
"""
|
51 |
+
EXPECTS WXYZ
|
52 |
+
:param q (*, 4)
|
53 |
+
"""
|
54 |
+
conj = torch.cat([q[..., :1], -q[..., 1:]], dim=-1)
|
55 |
+
mag = torch.square(q).sum(dim=-1, keepdim=True) + eps
|
56 |
+
return conj / mag
|
57 |
+
|
58 |
+
|
59 |
+
def quaternion_slerp(t, q0, q1, eps=1e-8):
|
60 |
+
"""
|
61 |
+
:param t (*, 1) must be between 0 and 1
|
62 |
+
:param q0 (*, 4)
|
63 |
+
:param q1 (*, 4)
|
64 |
+
"""
|
65 |
+
dims = q0.shape[:-1]
|
66 |
+
t = t.view(*dims, 1)
|
67 |
+
|
68 |
+
q0 = F.normalize(q0, p=2, dim=-1)
|
69 |
+
q1 = F.normalize(q1, p=2, dim=-1)
|
70 |
+
dot = (q0 * q1).sum(dim=-1, keepdim=True)
|
71 |
+
|
72 |
+
# make sure we give the shortest rotation path (< 180d)
|
73 |
+
neg = dot < 0
|
74 |
+
q1 = torch.where(neg, -q1, q1)
|
75 |
+
dot = torch.where(neg, -dot, dot)
|
76 |
+
angle = torch.acos(dot)
|
77 |
+
|
78 |
+
# if angle is too small, just do linear interpolation
|
79 |
+
collin = torch.abs(dot) > 1 - eps
|
80 |
+
fac = 1 / torch.sin(angle)
|
81 |
+
w0 = torch.where(collin, 1 - t, torch.sin((1 - t) * angle) * fac)
|
82 |
+
w1 = torch.where(collin, t, torch.sin(t * angle) * fac)
|
83 |
+
slerp = q0 * w0 + q1 * w1
|
84 |
+
return slerp
|
85 |
+
|
86 |
+
|
87 |
+
def rotation_matrix_to_angle_axis(rotation_matrix):
|
88 |
+
"""
|
89 |
+
This function is borrowed from https://github.com/kornia/kornia
|
90 |
+
|
91 |
+
Convert rotation matrix to Rodrigues vector
|
92 |
+
"""
|
93 |
+
quaternion = rotation_matrix_to_quaternion(rotation_matrix)
|
94 |
+
aa = quaternion_to_angle_axis(quaternion)
|
95 |
+
aa[torch.isnan(aa)] = 0.0
|
96 |
+
return aa
|
97 |
+
|
98 |
+
|
99 |
+
def quaternion_to_angle_axis(quaternion):
|
100 |
+
"""
|
101 |
+
This function is borrowed from https://github.com/kornia/kornia
|
102 |
+
|
103 |
+
Convert quaternion vector to angle axis of rotation.
|
104 |
+
Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
|
105 |
+
|
106 |
+
:param quaternion (*, 4) expects WXYZ
|
107 |
+
:returns angle_axis (*, 3)
|
108 |
+
"""
|
109 |
+
# unpack input and compute conversion
|
110 |
+
q1 = quaternion[..., 1]
|
111 |
+
q2 = quaternion[..., 2]
|
112 |
+
q3 = quaternion[..., 3]
|
113 |
+
sin_squared_theta = q1 * q1 + q2 * q2 + q3 * q3
|
114 |
+
|
115 |
+
sin_theta = torch.sqrt(sin_squared_theta)
|
116 |
+
cos_theta = quaternion[..., 0]
|
117 |
+
two_theta = 2.0 * torch.where(
|
118 |
+
cos_theta < 0.0,
|
119 |
+
torch.atan2(-sin_theta, -cos_theta),
|
120 |
+
torch.atan2(sin_theta, cos_theta),
|
121 |
+
)
|
122 |
+
|
123 |
+
k_pos = two_theta / sin_theta
|
124 |
+
k_neg = 2.0 * torch.ones_like(sin_theta)
|
125 |
+
k = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
|
126 |
+
|
127 |
+
angle_axis = torch.zeros_like(quaternion)[..., :3]
|
128 |
+
angle_axis[..., 0] += q1 * k
|
129 |
+
angle_axis[..., 1] += q2 * k
|
130 |
+
angle_axis[..., 2] += q3 * k
|
131 |
+
return angle_axis
|
132 |
+
|
133 |
+
|
134 |
+
def angle_axis_to_rotation_matrix(angle_axis):
|
135 |
+
"""
|
136 |
+
:param angle_axis (*, 3)
|
137 |
+
return (*, 3, 3)
|
138 |
+
"""
|
139 |
+
quat = angle_axis_to_quaternion(angle_axis)
|
140 |
+
return quaternion_to_rotation_matrix(quat)
|
141 |
+
|
142 |
+
|
143 |
+
def quaternion_to_rotation_matrix(quaternion):
|
144 |
+
"""
|
145 |
+
Convert a quaternion to a rotation matrix.
|
146 |
+
Taken from https://github.com/kornia/kornia, based on
|
147 |
+
https://github.com/matthew-brett/transforms3d/blob/8965c48401d9e8e66b6a8c37c65f2fc200a076fa/transforms3d/quaternions.py#L101
|
148 |
+
https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/geometry/transformation/rotation_matrix_3d.py#L247
|
149 |
+
:param quaternion (N, 4) expects WXYZ order
|
150 |
+
returns rotation matrix (N, 3, 3)
|
151 |
+
"""
|
152 |
+
# normalize the input quaternion
|
153 |
+
quaternion_norm = F.normalize(quaternion, p=2, dim=-1, eps=1e-12)
|
154 |
+
*dims, _ = quaternion_norm.shape
|
155 |
+
|
156 |
+
# unpack the normalized quaternion components
|
157 |
+
w, x, y, z = torch.chunk(quaternion_norm, chunks=4, dim=-1)
|
158 |
+
|
159 |
+
# compute the actual conversion
|
160 |
+
tx = 2.0 * x
|
161 |
+
ty = 2.0 * y
|
162 |
+
tz = 2.0 * z
|
163 |
+
twx = tx * w
|
164 |
+
twy = ty * w
|
165 |
+
twz = tz * w
|
166 |
+
txx = tx * x
|
167 |
+
txy = ty * x
|
168 |
+
txz = tz * x
|
169 |
+
tyy = ty * y
|
170 |
+
tyz = tz * y
|
171 |
+
tzz = tz * z
|
172 |
+
one = torch.tensor(1.0)
|
173 |
+
|
174 |
+
matrix = torch.stack(
|
175 |
+
(
|
176 |
+
one - (tyy + tzz),
|
177 |
+
txy - twz,
|
178 |
+
txz + twy,
|
179 |
+
txy + twz,
|
180 |
+
one - (txx + tzz),
|
181 |
+
tyz - twx,
|
182 |
+
txz - twy,
|
183 |
+
tyz + twx,
|
184 |
+
one - (txx + tyy),
|
185 |
+
),
|
186 |
+
dim=-1,
|
187 |
+
).view(*dims, 3, 3)
|
188 |
+
return matrix
|
189 |
+
|
190 |
+
|
191 |
+
def angle_axis_to_quaternion(angle_axis):
|
192 |
+
"""
|
193 |
+
This function is borrowed from https://github.com/kornia/kornia
|
194 |
+
Convert angle axis to quaternion in WXYZ order
|
195 |
+
:param angle_axis (*, 3)
|
196 |
+
:returns quaternion (*, 4) WXYZ order
|
197 |
+
"""
|
198 |
+
theta_sq = torch.sum(angle_axis**2, dim=-1, keepdim=True) # (*, 1)
|
199 |
+
# need to handle the zero rotation case
|
200 |
+
valid = theta_sq > 0
|
201 |
+
theta = torch.sqrt(theta_sq)
|
202 |
+
half_theta = 0.5 * theta
|
203 |
+
ones = torch.ones_like(half_theta)
|
204 |
+
# fill zero with the limit of sin ax / x -> a
|
205 |
+
k = torch.where(valid, torch.sin(half_theta) / theta, 0.5 * ones)
|
206 |
+
w = torch.where(valid, torch.cos(half_theta), ones)
|
207 |
+
quat = torch.cat([w, k * angle_axis], dim=-1)
|
208 |
+
return quat
|
209 |
+
|
210 |
+
|
211 |
+
def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
|
212 |
+
"""
|
213 |
+
This function is borrowed from https://github.com/kornia/kornia
|
214 |
+
Convert rotation matrix to 4d quaternion vector
|
215 |
+
This algorithm is based on algorithm described in
|
216 |
+
https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
|
217 |
+
|
218 |
+
:param rotation_matrix (N, 3, 3)
|
219 |
+
"""
|
220 |
+
*dims, m, n = rotation_matrix.shape
|
221 |
+
rmat_t = torch.transpose(rotation_matrix.reshape(-1, m, n), -1, -2)
|
222 |
+
|
223 |
+
mask_d2 = rmat_t[:, 2, 2] < eps
|
224 |
+
|
225 |
+
mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
|
226 |
+
mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
|
227 |
+
|
228 |
+
t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
229 |
+
q0 = torch.stack(
|
230 |
+
[
|
231 |
+
rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
232 |
+
t0,
|
233 |
+
rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
234 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
|
235 |
+
],
|
236 |
+
-1,
|
237 |
+
)
|
238 |
+
t0_rep = t0.repeat(4, 1).t()
|
239 |
+
|
240 |
+
t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
|
241 |
+
q1 = torch.stack(
|
242 |
+
[
|
243 |
+
rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
244 |
+
rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
|
245 |
+
t1,
|
246 |
+
rmat_t[:, 1, 2] + rmat_t[:, 2, 1],
|
247 |
+
],
|
248 |
+
-1,
|
249 |
+
)
|
250 |
+
t1_rep = t1.repeat(4, 1).t()
|
251 |
+
|
252 |
+
t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
253 |
+
q2 = torch.stack(
|
254 |
+
[
|
255 |
+
rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
|
256 |
+
rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
|
257 |
+
rmat_t[:, 1, 2] + rmat_t[:, 2, 1],
|
258 |
+
t2,
|
259 |
+
],
|
260 |
+
-1,
|
261 |
+
)
|
262 |
+
t2_rep = t2.repeat(4, 1).t()
|
263 |
+
|
264 |
+
t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
|
265 |
+
q3 = torch.stack(
|
266 |
+
[
|
267 |
+
t3,
|
268 |
+
rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
|
269 |
+
rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
|
270 |
+
rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
|
271 |
+
],
|
272 |
+
-1,
|
273 |
+
)
|
274 |
+
t3_rep = t3.repeat(4, 1).t()
|
275 |
+
|
276 |
+
mask_c0 = mask_d2 * mask_d0_d1
|
277 |
+
mask_c1 = mask_d2 * ~mask_d0_d1
|
278 |
+
mask_c2 = ~mask_d2 * mask_d0_nd1
|
279 |
+
mask_c3 = ~mask_d2 * ~mask_d0_nd1
|
280 |
+
mask_c0 = mask_c0.view(-1, 1).type_as(q0)
|
281 |
+
mask_c1 = mask_c1.view(-1, 1).type_as(q1)
|
282 |
+
mask_c2 = mask_c2.view(-1, 1).type_as(q2)
|
283 |
+
mask_c3 = mask_c3.view(-1, 1).type_as(q3)
|
284 |
+
|
285 |
+
q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
|
286 |
+
q /= torch.sqrt(
|
287 |
+
t0_rep * mask_c0
|
288 |
+
+ t1_rep * mask_c1
|
289 |
+
+ t2_rep * mask_c2 # noqa
|
290 |
+
+ t3_rep * mask_c3
|
291 |
+
) # noqa
|
292 |
+
q *= 0.5
|
293 |
+
return q.reshape(*dims, 4)
|
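Not part of the commit: a small interpolation sketch using quaternion_slerp from this module (illustrative values only).

import torch

q0 = angle_axis_to_quaternion(torch.tensor([[0.0, 0.0, 0.0]]))   # identity
q1 = angle_axis_to_quaternion(torch.tensor([[0.0, 1.0, 0.0]]))   # 1 rad about y
t = torch.tensor([[0.5]])                                         # interpolation factor in [0, 1]
q_mid = quaternion_slerp(t, q0, q1)
R_mid = quaternion_to_rotation_matrix(q_mid)                      # (1, 3, 3) halfway rotation
print(quaternion_to_angle_axis(q_mid))                            # approx. [0, 0.5, 0]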
infiller/lib/misc/sampler.py
ADDED
@@ -0,0 +1,79 @@
1 |
+
import argparse
|
2 |
+
import os
|
3 |
+
from pathlib import Path
|
4 |
+
|
5 |
+
import imageio
|
6 |
+
import numpy as np
|
7 |
+
import torch
|
8 |
+
import torch.nn as nn
|
9 |
+
from PIL import Image
|
10 |
+
from sklearn.preprocessing import LabelEncoder
|
11 |
+
|
12 |
+
from cmib.data.lafan1_dataset import LAFAN1Dataset
|
13 |
+
from cmib.data.utils import write_json
|
14 |
+
from cmib.lafan1.utils import quat_ik
|
15 |
+
from cmib.model.network import TransformerModel
|
16 |
+
from cmib.model.preprocess import (lerp_input_repr, replace_constant,
|
17 |
+
slerp_input_repr, vectorize_representation)
|
18 |
+
from cmib.model.skeleton import (Skeleton, sk_joints_to_remove, sk_offsets, joint_names,
|
19 |
+
sk_parents)
|
20 |
+
from cmib.vis.pose import plot_pose_with_stop
|
21 |
+
|
22 |
+
|
23 |
+
def test(opt, device):
|
24 |
+
|
25 |
+
save_dir = Path(os.path.join('runs', 'train', opt.exp_name))
|
26 |
+
wdir = save_dir / 'weights'
|
27 |
+
weights = os.listdir(wdir)
|
28 |
+
weights_paths = [wdir / weight for weight in weights]
|
29 |
+
latest_weight = max(weights_paths , key = os.path.getctime)
|
30 |
+
ckpt = torch.load(latest_weight, map_location=device)
|
31 |
+
print(f"Loaded weight: {latest_weight}")
|
32 |
+
|
33 |
+
# Load Skeleton
|
34 |
+
skeleton_mocap = Skeleton(offsets=sk_offsets, parents=sk_parents, device=device)
|
35 |
+
skeleton_mocap.remove_joints(sk_joints_to_remove)
|
36 |
+
|
37 |
+
# Load LAFAN Dataset
|
38 |
+
Path(opt.processed_data_dir).mkdir(parents=True, exist_ok=True)
|
39 |
+
lafan_dataset = LAFAN1Dataset(lafan_path=opt.data_path, processed_data_dir=opt.processed_data_dir, train=False, device=device)
|
40 |
+
total_data = lafan_dataset.data['global_pos'].shape[0]
|
41 |
+
|
42 |
+
# Replace with noise to In-betweening Frames
|
43 |
+
from_idx, target_idx = ckpt['from_idx'], ckpt['target_idx'] # default: 9-40, max: 48
|
44 |
+
horizon = ckpt['horizon']
|
45 |
+
print(f"HORIZON: {horizon}")
|
46 |
+
|
47 |
+
test_idx = []
|
48 |
+
for i in range(total_data):
|
49 |
+
test_idx.append(i)
|
50 |
+
|
51 |
+
# Compare Input data, Prediction, GT
|
52 |
+
save_path = os.path.join(opt.save_path, 'sampler')
|
53 |
+
for i in range(len(test_idx)):
|
54 |
+
Path(save_path).mkdir(parents=True, exist_ok=True)
|
55 |
+
|
56 |
+
start_pose = lafan_dataset.data['global_pos'][test_idx[i], from_idx]
|
57 |
+
target_pose = lafan_dataset.data['global_pos'][test_idx[i], target_idx]
|
58 |
+
gt_stopover_pose = lafan_dataset.data['global_pos'][test_idx[i], from_idx]
|
59 |
+
|
60 |
+
gt_img_path = os.path.join(save_path)
|
61 |
+
plot_pose_with_stop(start_pose, target_pose, target_pose, gt_stopover_pose, i, skeleton_mocap, save_dir=gt_img_path, prefix='gt')
|
62 |
+
print(f"ID {test_idx[i]}: completed.")
|
63 |
+
|
64 |
+
def parse_opt():
|
65 |
+
parser = argparse.ArgumentParser()
|
66 |
+
parser.add_argument('--project', default='runs/train', help='project/name')
|
67 |
+
parser.add_argument('--exp_name', default='slerp_40', help='experiment name')
|
68 |
+
parser.add_argument('--data_path', type=str, default='ubisoft-laforge-animation-dataset/output/BVH', help='BVH dataset path')
|
69 |
+
parser.add_argument('--skeleton_path', type=str, default='ubisoft-laforge-animation-dataset/output/BVH/walk1_subject1.bvh', help='path to reference skeleton')
|
70 |
+
parser.add_argument('--processed_data_dir', type=str, default='processed_data_original/', help='path to save pickled processed data')
|
71 |
+
parser.add_argument('--save_path', type=str, default='runs/test', help='path to save model')
|
72 |
+
parser.add_argument('--motion_type', type=str, default='jumps', help='motion type')
|
73 |
+
opt = parser.parse_args()
|
74 |
+
return opt
|
75 |
+
|
76 |
+
if __name__ == "__main__":
|
77 |
+
opt = parse_opt()
|
78 |
+
device = torch.device("cpu")
|
79 |
+
test(opt, device)
|
infiller/lib/model/__pycache__/network.cpython-310.pyc
ADDED
Binary file (7.82 kB).
infiller/lib/model/network.py
ADDED
@@ -0,0 +1,276 @@
1 |
+
import math
|
2 |
+
import numpy as np
|
3 |
+
import torch
|
4 |
+
from torch import nn, Tensor
|
5 |
+
from torch.nn import TransformerEncoder, TransformerEncoderLayer
|
6 |
+
# from cmib.model.positional_encoding import PositionalEmbedding
|
7 |
+
|
8 |
+
class SinPositionalEncoding(nn.Module):
|
9 |
+
def __init__(self, d_model, dropout=0.1, max_len=100):
|
10 |
+
super(SinPositionalEncoding, self).__init__()
|
11 |
+
self.dropout = nn.Dropout(p=dropout)
|
12 |
+
|
13 |
+
pe = torch.zeros(max_len, d_model)
|
14 |
+
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
|
15 |
+
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model))
|
16 |
+
pe[:, 0::2] = torch.sin(position * div_term)
|
17 |
+
pe[:, 1::2] = torch.cos(position * div_term)
|
18 |
+
pe = pe.unsqueeze(0).transpose(0, 1)
|
19 |
+
|
20 |
+
self.register_buffer('pe', pe)
|
21 |
+
|
22 |
+
def forward(self, x):
|
23 |
+
# not used in the final model
|
24 |
+
x = x + self.pe[:x.shape[0], :]
|
25 |
+
return self.dropout(x)
|
26 |
+
|
27 |
+
|
28 |
+
class MultiHeadedAttention(nn.Module):
|
29 |
+
def __init__(self, n_head, d_model, d_head, dropout=0.1,
|
30 |
+
pre_lnorm=True, bias=False):
|
31 |
+
"""
|
32 |
+
Multi-headed attention with relative positional encoding and
|
33 |
+
memory mechanism.
|
34 |
+
|
35 |
+
Args:
|
36 |
+
n_head (int): Number of heads.
|
37 |
+
d_model (int): Input dimension.
|
38 |
+
d_head (int): Head dimension.
|
39 |
+
dropout (float, optional): Dropout value. Defaults to 0.1.
|
40 |
+
pre_lnorm (bool, optional):
|
41 |
+
Apply layer norm before rest of calculation. Defaults to True.
|
42 |
+
In original Transformer paper (pre_lnorm=False):
|
43 |
+
LayerNorm(x + Sublayer(x))
|
44 |
+
In tensor2tensor implementation (pre_lnorm=True):
|
45 |
+
x + Sublayer(LayerNorm(x))
|
46 |
+
bias (bool, optional):
|
47 |
+
Add bias to q, k, v and output projections. Defaults to False.
|
48 |
+
|
49 |
+
"""
|
50 |
+
super(MultiHeadedAttention, self).__init__()
|
51 |
+
|
52 |
+
self.n_head = n_head
|
53 |
+
self.d_model = d_model
|
54 |
+
self.d_head = d_head
|
55 |
+
self.dropout = dropout
|
56 |
+
self.pre_lnorm = pre_lnorm
|
57 |
+
self.bias = bias
|
58 |
+
self.atten_scale = 1 / math.sqrt(self.d_model)
|
59 |
+
|
60 |
+
self.q_linear = nn.Linear(d_model, n_head * d_head, bias=bias)
|
61 |
+
self.k_linear = nn.Linear(d_model, n_head * d_head, bias=bias)
|
62 |
+
self.v_linear = nn.Linear(d_model, n_head * d_head, bias=bias)
|
63 |
+
self.out_linear = nn.Linear(n_head * d_head, d_model, bias=bias)
|
64 |
+
|
65 |
+
self.droput_layer = nn.Dropout(dropout)
|
66 |
+
self.atten_dropout_layer = nn.Dropout(dropout)
|
67 |
+
|
68 |
+
self.layer_norm = nn.LayerNorm(d_model)
|
69 |
+
|
70 |
+
def forward(self, hidden, memory=None, mask=None,
|
71 |
+
extra_atten_score=None):
|
72 |
+
"""
|
73 |
+
Args:
|
74 |
+
hidden (Tensor): Input embedding or hidden state of previous layer.
|
75 |
+
Shape: (batch, seq, dim)
|
76 |
+
pos_emb (Tensor): Relative positional embedding lookup table.
|
77 |
+
Shape: (batch, (seq+mem_len)*2-1, d_head)
|
78 |
+
pos_emb[:, seq+mem_len]
|
79 |
+
|
80 |
+
memory (Tensor): Memory tensor of previous layer.
|
81 |
+
Shape: (batch, mem_len, dim)
|
82 |
+
mask (BoolTensor, optional): Attention mask.
|
83 |
+
Set item value to True if you DO NOT want keep certain
|
84 |
+
attention score, otherwise False. Defaults to None.
|
85 |
+
Shape: (seq, seq+mem_len).
|
86 |
+
"""
|
87 |
+
combined = hidden
|
88 |
+
# if memory is None:
|
89 |
+
# combined = hidden
|
90 |
+
# mem_len = 0
|
91 |
+
# else:
|
92 |
+
# combined = torch.cat([memory, hidden], dim=1)
|
93 |
+
# mem_len = memory.shape[1]
|
94 |
+
|
95 |
+
if self.pre_lnorm:
|
96 |
+
hidden = self.layer_norm(hidden)
|
97 |
+
combined = self.layer_norm(combined)
|
98 |
+
|
99 |
+
# shape: (batch, q/k/v_len, dim)
|
100 |
+
q = self.q_linear(hidden)
|
101 |
+
k = self.k_linear(combined)
|
102 |
+
v = self.v_linear(combined)
|
103 |
+
|
104 |
+
# reshape to (batch, q/k/v_len, n_head, d_head)
|
105 |
+
q = q.reshape(q.shape[0], q.shape[1], self.n_head, self.d_head)
|
106 |
+
k = k.reshape(k.shape[0], k.shape[1], self.n_head, self.d_head)
|
107 |
+
v = v.reshape(v.shape[0], v.shape[1], self.n_head, self.d_head)
|
108 |
+
|
109 |
+
# transpose to (batch, n_head, q/k/v_len, d_head)
|
110 |
+
q = q.transpose(1, 2)
|
111 |
+
k = k.transpose(1, 2)
|
112 |
+
v = v.transpose(1, 2)
|
113 |
+
|
114 |
+
# add n_head dimension for relative positional embedding lookup table
|
115 |
+
# (batch, n_head, k/v_len*2-1, d_head)
|
116 |
+
# pos_emb = pos_emb[:, None]
|
117 |
+
|
118 |
+
# (batch, n_head, q_len, k_len)
|
119 |
+
atten_score = torch.matmul(q, k.transpose(-1, -2))
|
120 |
+
|
121 |
+
# qpos = torch.matmul(q, pos_emb.transpose(-1, -2))
|
122 |
+
# DEBUG
|
123 |
+
# ones = torch.zeros(q.shape)
|
124 |
+
# ones[:, :, :, 0] = 1.0
|
125 |
+
# qpos = torch.matmul(ones, pos_emb.transpose(-1, -2))
|
126 |
+
# atten_score = atten_score + self.skew(qpos, mem_len)
|
127 |
+
atten_score = atten_score * self.atten_scale
|
128 |
+
|
129 |
+
# if extra_atten_score is not None:
|
130 |
+
# atten_score = atten_score + extra_atten_score
|
131 |
+
|
132 |
+
if mask is not None:
|
133 |
+
# print(atten_score.shape)
|
134 |
+
# print(mask.shape)
|
135 |
+
# apply attention mask
|
136 |
+
atten_score = atten_score.masked_fill(mask, float("-inf"))
|
137 |
+
atten_score = atten_score.softmax(dim=-1)
|
138 |
+
atten_score = self.atten_dropout_layer(atten_score)
|
139 |
+
|
140 |
+
# (batch, n_head, q_len, d_head)
|
141 |
+
atten_vec = torch.matmul(atten_score, v)
|
142 |
+
# (batch, q_len, n_head*d_head)
|
143 |
+
atten_vec = atten_vec.transpose(1, 2).flatten(start_dim=-2)
|
144 |
+
|
145 |
+
# linear projection
|
146 |
+
output = self.droput_layer(self.out_linear(atten_vec))
|
147 |
+
|
148 |
+
if self.pre_lnorm:
|
149 |
+
return hidden + output
|
150 |
+
else:
|
151 |
+
return self.layer_norm(hidden + output)
|
152 |
+
|
153 |
+
|
154 |
+
class FeedForward(nn.Module):
|
155 |
+
def __init__(self, d_model, d_inner, dropout=0.1, pre_lnorm=True):
|
156 |
+
"""
|
157 |
+
Positionwise feed-forward network.
|
158 |
+
|
159 |
+
Args:
|
160 |
+
d_model(int): Dimension of the input and output.
|
161 |
+
d_inner (int): Dimension of the middle layer(bottleneck).
|
162 |
+
dropout (float, optional): Dropout value. Defaults to 0.1.
|
163 |
+
pre_lnorm (bool, optional):
|
164 |
+
Apply layer norm before rest of calculation. Defaults to True.
|
165 |
+
In original Transformer paper (pre_lnorm=False):
|
166 |
+
LayerNorm(x + Sublayer(x))
|
167 |
+
In tensor2tensor implementation (pre_lnorm=True):
|
168 |
+
x + Sublayer(LayerNorm(x))
|
169 |
+
"""
|
170 |
+
super(FeedForward, self).__init__()
|
171 |
+
self.d_model = d_model
|
172 |
+
self.d_inner = d_inner
|
173 |
+
self.dropout = dropout
|
174 |
+
self.pre_lnorm = pre_lnorm
|
175 |
+
|
176 |
+
self.layer_norm = nn.LayerNorm(d_model)
|
177 |
+
self.network = nn.Sequential(
|
178 |
+
nn.Linear(d_model, d_inner),
|
179 |
+
nn.ReLU(),
|
180 |
+
nn.Dropout(dropout),
|
181 |
+
nn.Linear(d_inner, d_model),
|
182 |
+
nn.Dropout(dropout),
|
183 |
+
)
|
184 |
+
|
185 |
+
def forward(self, x):
|
186 |
+
if self.pre_lnorm:
|
187 |
+
return x + self.network(self.layer_norm(x))
|
188 |
+
else:
|
189 |
+
return self.layer_norm(x + self.network(x))
|
190 |
+
class TransformerModel(nn.Module):
|
191 |
+
def __init__(
|
192 |
+
self,
|
193 |
+
seq_len: int,
|
194 |
+
input_dim: int,
|
195 |
+
d_model: int,
|
196 |
+
nhead: int,
|
197 |
+
d_hid: int,
|
198 |
+
nlayers: int,
|
199 |
+
dropout: float = 0.5,
|
200 |
+
out_dim=91,
|
201 |
+
masked_attention_stage=False,
|
202 |
+
):
|
203 |
+
super().__init__()
|
204 |
+
self.model_type = "Transformer"
|
205 |
+
self.seq_len = seq_len
|
206 |
+
self.d_model = d_model
|
207 |
+
self.nhead = nhead
|
208 |
+
self.d_hid = d_hid
|
209 |
+
self.nlayers = nlayers
|
210 |
+
self.pos_embedding = SinPositionalEncoding(d_model=d_model, dropout=0.1, max_len=seq_len)
|
211 |
+
if masked_attention_stage:
|
212 |
+
self.input_layer = nn.Linear(input_dim+1, d_model)
|
213 |
+
# visible to invisible attention
|
214 |
+
self.att_layers = nn.ModuleList()
|
215 |
+
self.pff_layers = nn.ModuleList()
|
216 |
+
self.pre_lnorm = True
|
217 |
+
self.layer_norm = nn.LayerNorm(d_model)
|
218 |
+
for i in range(self.nlayers):
|
219 |
+
self.att_layers.append(
|
220 |
+
MultiHeadedAttention(
|
221 |
+
self.nhead, self.d_model,
|
222 |
+
self.d_model // self.nhead, dropout=dropout,
|
223 |
+
pre_lnorm=True,
|
224 |
+
bias=False
|
225 |
+
)
|
226 |
+
)
|
227 |
+
|
228 |
+
self.pff_layers.append(
|
229 |
+
FeedForward(
|
230 |
+
self.d_model, d_hid,
|
231 |
+
dropout=dropout,
|
232 |
+
pre_lnorm=True
|
233 |
+
)
|
234 |
+
)
|
235 |
+
else:
|
236 |
+
self.att_layers = None
|
237 |
+
self.input_layer = nn.Linear(input_dim, d_model)
|
238 |
+
encoder_layers = TransformerEncoderLayer(
|
239 |
+
d_model, nhead, d_hid, dropout, activation="gelu"
|
240 |
+
)
|
241 |
+
self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)
|
242 |
+
self.decoder = nn.Linear(d_model, out_dim)
|
243 |
+
|
244 |
+
self.init_weights()
|
245 |
+
|
246 |
+
def init_weights(self) -> None:
|
247 |
+
initrange = 0.1
|
248 |
+
self.decoder.bias.data.zero_()
|
249 |
+
self.decoder.weight.data.uniform_(-initrange, initrange)
|
250 |
+
|
251 |
+
def forward(self, src: Tensor, src_mask: Tensor, data_mask=None, atten_mask=None) -> Tensor:
|
252 |
+
"""
|
253 |
+
Args:
|
254 |
+
src: Tensor, shape [seq_len, batch_size, embedding_dim]
|
255 |
+
src_mask: Tensor, shape [seq_len, seq_len]
|
256 |
+
|
257 |
+
Returns:
|
258 |
+
output Tensor of shape [seq_len, batch_size, embedding_dim]
|
259 |
+
"""
|
260 |
+
if not data_mask is None:
|
261 |
+
src = torch.cat([src, data_mask.expand(*src.shape[:-1], data_mask.shape[-1])], dim=-1)
|
262 |
+
src = self.input_layer(src)
|
263 |
+
output = self.pos_embedding(src)
|
264 |
+
# output = src
|
265 |
+
if self.att_layers:
|
266 |
+
assert not atten_mask is None
|
267 |
+
output = output.permute(1, 0, 2)
|
268 |
+
for i in range(self.nlayers):
|
269 |
+
output = self.att_layers[i](output, mask=atten_mask)
|
270 |
+
output = self.pff_layers[i](output)
|
271 |
+
if self.pre_lnorm:
|
272 |
+
output = self.layer_norm(output)
|
273 |
+
output = output.permute(1, 0, 2)
|
274 |
+
output = self.transformer_encoder(output)
|
275 |
+
output = self.decoder(output)
|
276 |
+
return output
|
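For orientation, the snippet below is a minimal, hedged usage sketch of the masked-attention infiller defined above. The constructor arguments and the `forward(src, src_mask, data_mask, atten_mask)` signature come from the code; the concrete sizes, tensor values, and the import path `infiller.lib.model.network` are assumptions for illustration. Note that `src_mask` is accepted but not applied on this code path, and that `True` entries of `atten_mask` are blocked in attention.

```python
# Hedged sketch (not repository code): run TransformerModel over a sequence with a masked span.
import torch
from infiller.lib.model.network import TransformerModel  # assumed import path

seq_len, batch, in_dim = 120, 2, 96              # illustrative sizes only
model = TransformerModel(seq_len=seq_len, input_dim=in_dim, d_model=256, nhead=4,
                         d_hid=512, nlayers=6, dropout=0.1, out_dim=in_dim,
                         masked_attention_stage=True)

src = torch.randn(seq_len, batch, in_dim)        # [seq_len, batch, feature]
data_mask = torch.ones(seq_len, batch, 1)        # 1 = observed frame, 0 = missing frame
data_mask[40:80] = 0.0                           # pretend frames 40-79 are occluded
src_mask = torch.zeros(seq_len, seq_len).bool()  # placeholder; unused by this path

# True entries are blocked: stop every query from attending to missing frames.
missing = (data_mask[:, 0, 0] == 0)              # (seq_len,)
atten_mask = missing[None, :].expand(seq_len, seq_len)

out = model(src, src_mask, data_mask=data_mask, atten_mask=atten_mask)
print(out.shape)                                 # torch.Size([120, 2, 96])
```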
infiller/lib/model/positional_encoding.py
ADDED
@@ -0,0 +1,42 @@
1 |
+
import torch
|
2 |
+
from torch import nn, Tensor
|
3 |
+
import math
|
4 |
+
|
5 |
+
|
6 |
+
class PositionalEmbedding(nn.Module):
|
7 |
+
def __init__(self, seq_len: int = 32, d_model: int = 96):
|
8 |
+
super().__init__()
|
9 |
+
self.pos_emb = nn.Embedding(seq_len + 1, d_model)
|
10 |
+
|
11 |
+
def forward(self, inputs):
|
12 |
+
positions = (
|
13 |
+
torch.arange(inputs.size(0), device=inputs.device)
|
14 |
+
.expand(inputs.size(1), inputs.size(0))
|
15 |
+
.contiguous()
|
16 |
+
+ 1
|
17 |
+
)
|
18 |
+
outputs = inputs + self.pos_emb(positions).permute(1, 0, 2)
|
19 |
+
return outputs
|
20 |
+
|
21 |
+
|
22 |
+
class PositionalEncoding(nn.Module):
|
23 |
+
def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
|
24 |
+
super().__init__()
|
25 |
+
self.dropout = nn.Dropout(p=dropout)
|
26 |
+
|
27 |
+
position = torch.arange(max_len).unsqueeze(1)
|
28 |
+
div_term = torch.exp(
|
29 |
+
torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
|
30 |
+
)
|
31 |
+
pe = torch.zeros(max_len, 1, d_model)
|
32 |
+
pe[:, 0, 0::2] = torch.sin(position * div_term)
|
33 |
+
pe[:, 0, 1::2] = torch.cos(position * div_term)
|
34 |
+
self.register_buffer("pe", pe)
|
35 |
+
|
36 |
+
def forward(self, x: Tensor) -> Tensor:
|
37 |
+
"""
|
38 |
+
Args:
|
39 |
+
x: Tensor, shape [seq_len, batch_size, embedding_dim]
|
40 |
+
"""
|
41 |
+
x = x + self.pe[: x.size(0)]
|
42 |
+
return self.dropout(x)
|
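A quick sanity check of the sinusoidal table built above (a sketch, not repository code; the import path is assumed): with dropout set to 0, channel 0 of the output carries sin(position) exactly.

```python
import torch
from infiller.lib.model.positional_encoding import PositionalEncoding  # assumed path

pe = PositionalEncoding(d_model=8, dropout=0.0, max_len=16)
x = torch.zeros(16, 1, 8)                      # [seq_len, batch, d_model]
y = pe(x)                                      # adds pe[:16]; dropout(p=0) is a no-op
# even channels hold sin(pos / 10000^(2i/d)), odd channels the matching cos;
# channel 0 uses divisor 1, so it is simply sin(position)
assert torch.allclose(y[:, 0, 0], torch.sin(torch.arange(16.0)))
```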
infiller/lib/model/preprocess.py
ADDED
@@ -0,0 +1,189 @@
1 |
+
import torch
|
2 |
+
|
3 |
+
|
4 |
+
def replace_constant(minibatch_pose_input, mask_start_frame):
|
5 |
+
|
6 |
+
seq_len = minibatch_pose_input.size(1)
|
7 |
+
interpolated = (
|
8 |
+
torch.ones_like(minibatch_pose_input, device=minibatch_pose_input.device) * 0.1
|
9 |
+
)
|
10 |
+
|
11 |
+
if mask_start_frame == 0 or mask_start_frame == (seq_len - 1):
|
12 |
+
interpolate_start = minibatch_pose_input[:, 0, :]
|
13 |
+
interpolate_end = minibatch_pose_input[:, seq_len - 1, :]
|
14 |
+
|
15 |
+
interpolated[:, 0, :] = interpolate_start
|
16 |
+
interpolated[:, seq_len - 1, :] = interpolate_end
|
17 |
+
|
18 |
+
assert torch.allclose(interpolated[:, 0, :], interpolate_start)
|
19 |
+
assert torch.allclose(interpolated[:, seq_len - 1, :], interpolate_end)
|
20 |
+
|
21 |
+
else:
|
22 |
+
interpolate_start1 = minibatch_pose_input[:, 0, :]
|
23 |
+
interpolate_end1 = minibatch_pose_input[:, mask_start_frame, :]
|
24 |
+
|
25 |
+
interpolate_start2 = minibatch_pose_input[:, mask_start_frame, :]
|
26 |
+
interpolate_end2 = minibatch_pose_input[:, seq_len - 1, :]
|
27 |
+
|
28 |
+
interpolated[:, 0, :] = interpolate_start1
|
29 |
+
interpolated[:, mask_start_frame, :] = interpolate_end1
|
30 |
+
|
31 |
+
interpolated[:, mask_start_frame, :] = interpolate_start2
|
32 |
+
interpolated[:, seq_len - 1, :] = interpolate_end2
|
33 |
+
|
34 |
+
assert torch.allclose(interpolated[:, 0, :], interpolate_start1)
|
35 |
+
assert torch.allclose(interpolated[:, mask_start_frame, :], interpolate_end1)
|
36 |
+
|
37 |
+
assert torch.allclose(interpolated[:, mask_start_frame, :], interpolate_start2)
|
38 |
+
assert torch.allclose(interpolated[:, seq_len - 1, :], interpolate_end2)
|
39 |
+
return interpolated
|
40 |
+
|
41 |
+
|
42 |
+
def slerp(x, y, a):
|
43 |
+
"""
|
44 |
+
Performs spherical linear interpolation (SLERP) between x and y, with proportion a
|
45 |
+
|
46 |
+
:param x: quaternion tensor
|
47 |
+
:param y: quaternion tensor
|
48 |
+
:param a: indicator (between 0 and 1) of completion of the interpolation.
|
49 |
+
:return: tensor of interpolation results
|
50 |
+
"""
|
51 |
+
device = x.device
|
52 |
+
len = torch.sum(x * y, dim=-1)
|
53 |
+
|
54 |
+
neg = len < 0.0
|
55 |
+
len[neg] = -len[neg]
|
56 |
+
y[neg] = -y[neg]
|
57 |
+
|
58 |
+
a = torch.zeros_like(x[..., 0]) + a
|
59 |
+
amount0 = torch.zeros(a.shape, device=device)
|
60 |
+
amount1 = torch.zeros(a.shape, device=device)
|
61 |
+
|
62 |
+
linear = (1.0 - len) < 0.01
|
63 |
+
omegas = torch.arccos(len[~linear])
|
64 |
+
sinoms = torch.sin(omegas)
|
65 |
+
|
66 |
+
amount0[linear] = 1.0 - a[linear]
|
67 |
+
amount0[~linear] = torch.sin((1.0 - a[~linear]) * omegas) / sinoms
|
68 |
+
|
69 |
+
amount1[linear] = a[linear]
|
70 |
+
amount1[~linear] = torch.sin(a[~linear] * omegas) / sinoms
|
71 |
+
# res = amount0[..., np.newaxis] * x + amount1[..., np.newaxis] * y
|
72 |
+
res = amount0.unsqueeze(3) * x + amount1.unsqueeze(3) * y
|
73 |
+
|
74 |
+
return res
|
75 |
+
|
76 |
+
|
77 |
+
def slerp_input_repr(minibatch_pose_input, mask_start_frame):
|
78 |
+
seq_len = minibatch_pose_input.size(1)
|
79 |
+
minibatch_pose_input = minibatch_pose_input.reshape(
|
80 |
+
minibatch_pose_input.size(0), seq_len, -1, 4
|
81 |
+
)
|
82 |
+
interpolated = torch.zeros_like(
|
83 |
+
minibatch_pose_input, device=minibatch_pose_input.device
|
84 |
+
)
|
85 |
+
|
86 |
+
if mask_start_frame == 0 or mask_start_frame == (seq_len - 1):
|
87 |
+
interpolate_start = minibatch_pose_input[:, 0:1]
|
88 |
+
interpolate_end = minibatch_pose_input[:, seq_len - 1 :]
|
89 |
+
|
90 |
+
for i in range(seq_len):
|
91 |
+
dt = 1 / (seq_len - 1)
|
92 |
+
interpolated[:, i : i + 1, :] = slerp(
|
93 |
+
interpolate_start, interpolate_end, dt * i
|
94 |
+
)
|
95 |
+
|
96 |
+
assert torch.allclose(interpolated[:, 0:1], interpolate_start)
|
97 |
+
assert torch.allclose(interpolated[:, seq_len - 1 :], interpolate_end)
|
98 |
+
else:
|
99 |
+
interpolate_start1 = minibatch_pose_input[:, 0:1]
|
100 |
+
interpolate_end1 = minibatch_pose_input[
|
101 |
+
:, mask_start_frame : mask_start_frame + 1
|
102 |
+
]
|
103 |
+
|
104 |
+
interpolate_start2 = minibatch_pose_input[
|
105 |
+
:, mask_start_frame : mask_start_frame + 1
|
106 |
+
]
|
107 |
+
interpolate_end2 = minibatch_pose_input[:, seq_len - 1 :]
|
108 |
+
|
109 |
+
for i in range(mask_start_frame + 1):
|
110 |
+
dt = 1 / mask_start_frame
|
111 |
+
interpolated[:, i : i + 1, :] = slerp(
|
112 |
+
interpolate_start1, interpolate_end1, dt * i
|
113 |
+
)
|
114 |
+
|
115 |
+
assert torch.allclose(interpolated[:, 0:1], interpolate_start1)
|
116 |
+
assert torch.allclose(
|
117 |
+
interpolated[:, mask_start_frame : mask_start_frame + 1], interpolate_end1
|
118 |
+
)
|
119 |
+
|
120 |
+
for i in range(mask_start_frame, seq_len):
|
121 |
+
dt = 1 / (seq_len - mask_start_frame - 1)
|
122 |
+
interpolated[:, i : i + 1, :] = slerp(
|
123 |
+
interpolate_start2, interpolate_end2, dt * (i - mask_start_frame)
|
124 |
+
)
|
125 |
+
|
126 |
+
assert torch.allclose(
|
127 |
+
interpolated[:, mask_start_frame : mask_start_frame + 1], interpolate_start2
|
128 |
+
)
|
129 |
+
assert torch.allclose(interpolated[:, seq_len - 1 :], interpolate_end2)
|
130 |
+
|
131 |
+
interpolated = torch.nn.functional.normalize(interpolated, p=2.0, dim=3)
|
132 |
+
return interpolated.reshape(minibatch_pose_input.size(0), seq_len, -1)
|
133 |
+
|
134 |
+
|
135 |
+
def lerp_input_repr(minibatch_pose_input, mask_start_frame):
|
136 |
+
seq_len = minibatch_pose_input.size(1)
|
137 |
+
interpolated = torch.zeros_like(
|
138 |
+
minibatch_pose_input, device=minibatch_pose_input.device
|
139 |
+
)
|
140 |
+
|
141 |
+
if mask_start_frame == 0 or mask_start_frame == (seq_len - 1):
|
142 |
+
interpolate_start = minibatch_pose_input[:, 0, :]
|
143 |
+
interpolate_end = minibatch_pose_input[:, seq_len - 1, :]
|
144 |
+
|
145 |
+
for i in range(seq_len):
|
146 |
+
dt = 1 / (seq_len - 1)
|
147 |
+
interpolated[:, i, :] = torch.lerp(
|
148 |
+
interpolate_start, interpolate_end, dt * i
|
149 |
+
)
|
150 |
+
|
151 |
+
assert torch.allclose(interpolated[:, 0, :], interpolate_start)
|
152 |
+
assert torch.allclose(interpolated[:, seq_len - 1, :], interpolate_end)
|
153 |
+
else:
|
154 |
+
interpolate_start1 = minibatch_pose_input[:, 0, :]
|
155 |
+
interpolate_end1 = minibatch_pose_input[:, mask_start_frame, :]
|
156 |
+
|
157 |
+
interpolate_start2 = minibatch_pose_input[:, mask_start_frame, :]
|
158 |
+
interpolate_end2 = minibatch_pose_input[:, -1, :]
|
159 |
+
|
160 |
+
for i in range(mask_start_frame + 1):
|
161 |
+
dt = 1 / mask_start_frame
|
162 |
+
interpolated[:, i, :] = torch.lerp(
|
163 |
+
interpolate_start1, interpolate_end1, dt * i
|
164 |
+
)
|
165 |
+
|
166 |
+
assert torch.allclose(interpolated[:, 0, :], interpolate_start1)
|
167 |
+
assert torch.allclose(interpolated[:, mask_start_frame, :], interpolate_end1)
|
168 |
+
|
169 |
+
for i in range(mask_start_frame, seq_len):
|
170 |
+
dt = 1 / (seq_len - mask_start_frame - 1)
|
171 |
+
interpolated[:, i, :] = torch.lerp(
|
172 |
+
interpolate_start2, interpolate_end2, dt * (i - mask_start_frame)
|
173 |
+
)
|
174 |
+
|
175 |
+
assert torch.allclose(interpolated[:, mask_start_frame, :], interpolate_start2)
|
176 |
+
assert torch.allclose(interpolated[:, -1, :], interpolate_end2)
|
177 |
+
return interpolated
|
178 |
+
|
179 |
+
|
180 |
+
def vectorize_representation(global_position, global_rotation):
|
181 |
+
|
182 |
+
batch_size = global_position.shape[0]
|
183 |
+
seq_len = global_position.shape[1]
|
184 |
+
|
185 |
+
global_pos_vec = global_position.reshape(batch_size, seq_len, -1).contiguous()
|
186 |
+
global_rot_vec = global_rotation.reshape(batch_size, seq_len, -1).contiguous()
|
187 |
+
|
188 |
+
global_pose_vec_gt = torch.cat([global_pos_vec, global_rot_vec], dim=2)
|
189 |
+
return global_pose_vec_gt
|
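As a concrete illustration of the slerp() helper above (shapes follow its use in slerp_input_repr; the wxyz component order here is only an assumption, since slerp itself relies just on the quaternion dot product): interpolating halfway between the identity and a 90-degree rotation should give a 45-degree rotation.

```python
import torch
from infiller.lib.model.preprocess import slerp  # assumed import path

q0 = torch.tensor([1.0, 0.0, 0.0, 0.0]).view(1, 1, 1, 4)        # identity
q1 = torch.tensor([0.7071, 0.7071, 0.0, 0.0]).view(1, 1, 1, 4)  # 90 deg about x
q_half = slerp(q0, q1.clone(), 0.5)        # clone: slerp may flip y in place when dot < 0
print(q_half.squeeze())                    # ~ [0.9239, 0.3827, 0, 0] = [cos 22.5deg, sin 22.5deg, 0, 0]
```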
infiller/lib/model/skeleton.py
ADDED
@@ -0,0 +1,349 @@
1 |
+
import torch
|
2 |
+
import numpy as np
|
3 |
+
from cmib.data.quaternion import qmul, qrot
|
4 |
+
import torch.nn as nn
|
5 |
+
|
6 |
+
amass_offsets = [
|
7 |
+
[0.0, 0.0, 0.0],
|
8 |
+
|
9 |
+
[0.058581, -0.082280, -0.017664],
|
10 |
+
[0.043451, -0.386469, 0.008037],
|
11 |
+
[-0.014790, -0.426874, -0.037428],
|
12 |
+
[0.041054, -0.060286, 0.122042],
|
13 |
+
[0.0, 0.0, 0.0],
|
14 |
+
|
15 |
+
[-0.060310, -0.090513, -0.013543],
|
16 |
+
[-0.043257, -0.383688, -0.004843],
|
17 |
+
[0.019056, -0.420046, -0.034562],
|
18 |
+
[-0.034840, -0.062106, 0.130323],
|
19 |
+
[0.0, 0.0, 0.0],
|
20 |
+
|
21 |
+
[0.004439, 0.124404, -0.038385],
|
22 |
+
[0.004488, 0.137956, 0.026820],
|
23 |
+
[-0.002265, 0.056032, 0.002855],
|
24 |
+
[-0.013390, 0.211636, -0.033468],
|
25 |
+
[0.010113, 0.088937, 0.050410],
|
26 |
+
[0.0, 0.0, 0.0],
|
27 |
+
|
28 |
+
[0.071702, 0.114000, -0.018898],
|
29 |
+
[0.122921, 0.045205, -0.019046],
|
30 |
+
[0.255332, -0.015649, -0.022946],
|
31 |
+
[0.265709, 0.012698, -0.007375],
|
32 |
+
[0.0, 0.0, 0.0],
|
33 |
+
|
34 |
+
[-0.082954, 0.112472, -0.023707],
|
35 |
+
[-0.113228, 0.046853, -0.008472],
|
36 |
+
[-0.260127, -0.014369, -0.031269],
|
37 |
+
[-0.269108, 0.006794, -0.006027],
|
38 |
+
[0.0, 0.0, 0.0]
|
39 |
+
]
|
40 |
+
|
41 |
+
sk_offsets = [
|
42 |
+
[-42.198200, 91.614723, -40.067841],
|
43 |
+
|
44 |
+
[0.103456, 1.857829, 10.548506],
|
45 |
+
[43.499992, -0.000038, -0.000002],
|
46 |
+
[42.372192, 0.000015, -0.000007],
|
47 |
+
[17.299999, -0.000002, 0.000003],
|
48 |
+
[0.000000, 0.000000, 0.000000],
|
49 |
+
|
50 |
+
[0.103457, 1.857829, -10.548503],
|
51 |
+
[43.500042, -0.000027, 0.000008],
|
52 |
+
[42.372257, -0.000008, 0.000014],
|
53 |
+
[17.299992, -0.000005, 0.000004],
|
54 |
+
[0.000000, 0.000000, 0.000000],
|
55 |
+
|
56 |
+
[6.901968, -2.603733, -0.000001],
|
57 |
+
[12.588099, 0.000002, 0.000000],
|
58 |
+
[12.343206, 0.000000, -0.000001],
|
59 |
+
[25.832886, -0.000004, 0.000003],
|
60 |
+
[11.766620, 0.000005, -0.000001],
|
61 |
+
[0.000000, 0.000000, 0.000000],
|
62 |
+
|
63 |
+
[19.745899, -1.480370, 6.000108],
|
64 |
+
[11.284125, -0.000009, -0.000018],
|
65 |
+
[33.000050, 0.000004, 0.000032],
|
66 |
+
[25.200008, 0.000015, 0.000008],
|
67 |
+
[0.000000, 0.000000, 0.000000],
|
68 |
+
|
69 |
+
[19.746099, -1.480375, -6.000073],
|
70 |
+
[11.284138, -0.000015, -0.000012],
|
71 |
+
[33.000092, 0.000017, 0.000013],
|
72 |
+
[25.199780, 0.000135, 0.000422],
|
73 |
+
[0.000000, 0.000000, 0.000000],
|
74 |
+
]
|
75 |
+
|
76 |
+
sk_parents = [
|
77 |
+
-1,
|
78 |
+
0,
|
79 |
+
1,
|
80 |
+
2,
|
81 |
+
3,
|
82 |
+
4,
|
83 |
+
0,
|
84 |
+
6,
|
85 |
+
7,
|
86 |
+
8,
|
87 |
+
9,
|
88 |
+
0,
|
89 |
+
11,
|
90 |
+
12,
|
91 |
+
13,
|
92 |
+
14,
|
93 |
+
15,
|
94 |
+
13,
|
95 |
+
17,
|
96 |
+
18,
|
97 |
+
19,
|
98 |
+
20,
|
99 |
+
13,
|
100 |
+
22,
|
101 |
+
23,
|
102 |
+
24,
|
103 |
+
25,
|
104 |
+
]
|
105 |
+
|
106 |
+
sk_joints_to_remove = [5, 10, 16, 21, 26]
|
107 |
+
|
108 |
+
joint_names = [
|
109 |
+
"Hips",
|
110 |
+
"LeftUpLeg",
|
111 |
+
"LeftLeg",
|
112 |
+
"LeftFoot",
|
113 |
+
"LeftToe",
|
114 |
+
"RightUpLeg",
|
115 |
+
"RightLeg",
|
116 |
+
"RightFoot",
|
117 |
+
"RightToe",
|
118 |
+
"Spine",
|
119 |
+
"Spine1",
|
120 |
+
"Spine2",
|
121 |
+
"Neck",
|
122 |
+
"Head",
|
123 |
+
"LeftShoulder",
|
124 |
+
"LeftArm",
|
125 |
+
"LeftForeArm",
|
126 |
+
"LeftHand",
|
127 |
+
"RightShoulder",
|
128 |
+
"RightArm",
|
129 |
+
"RightForeArm",
|
130 |
+
"RightHand",
|
131 |
+
]
|
132 |
+
|
133 |
+
|
134 |
+
class Skeleton:
|
135 |
+
def __init__(
|
136 |
+
self,
|
137 |
+
offsets,
|
138 |
+
parents,
|
139 |
+
joints_left=None,
|
140 |
+
joints_right=None,
|
141 |
+
bone_length=None,
|
142 |
+
device=None,
|
143 |
+
):
|
144 |
+
assert len(offsets) == len(parents)
|
145 |
+
|
146 |
+
self._offsets = torch.Tensor(offsets).to(device)
|
147 |
+
self._parents = np.array(parents)
|
148 |
+
self._joints_left = joints_left
|
149 |
+
self._joints_right = joints_right
|
150 |
+
self._compute_metadata()
|
151 |
+
|
152 |
+
def num_joints(self):
|
153 |
+
return self._offsets.shape[0]
|
154 |
+
|
155 |
+
def offsets(self):
|
156 |
+
return self._offsets
|
157 |
+
|
158 |
+
def parents(self):
|
159 |
+
return self._parents
|
160 |
+
|
161 |
+
def has_children(self):
|
162 |
+
return self._has_children
|
163 |
+
|
164 |
+
def children(self):
|
165 |
+
return self._children
|
166 |
+
|
167 |
+
def convert_to_global_pos(self, unit_vec_rerp):
|
168 |
+
"""
|
169 |
+
Convert the unit offset matrix to global position.
|
170 |
+
First row(root) will have absolute position value in global coordinates.
|
171 |
+
"""
|
172 |
+
bone_length = self.get_bone_length_weight()
|
173 |
+
batch_size = unit_vec_rerp.size(0)
|
174 |
+
seq_len = unit_vec_rerp.size(1)
|
175 |
+
unit_vec_table = unit_vec_rerp.reshape(batch_size, seq_len, 22, 3)
|
176 |
+
global_position = torch.zeros_like(unit_vec_table, device=unit_vec_table.device)
|
177 |
+
|
178 |
+
for i, parent in enumerate(self._parents):
|
179 |
+
if parent == -1: # if root
|
180 |
+
global_position[:, :, i] = unit_vec_table[:, :, i]
|
181 |
+
|
182 |
+
else:
|
183 |
+
global_position[:, :, i] = global_position[:, :, parent] + (
|
184 |
+
nn.functional.normalize(unit_vec_table[:, :, i], p=2.0, dim=-1)
|
185 |
+
* bone_length[i]
|
186 |
+
)
|
187 |
+
|
188 |
+
return global_position
|
189 |
+
|
190 |
+
def convert_to_unit_offset_mat(self, global_position):
|
191 |
+
"""
|
192 |
+
Convert the global position of the skeleton to a unit offset matrix.
|
193 |
+
First row(root) will have absolute position value in global coordinates.
|
194 |
+
"""
|
195 |
+
|
196 |
+
bone_length = self.get_bone_length_weight()
|
197 |
+
unit_offset_mat = torch.zeros_like(
|
198 |
+
global_position, device=global_position.device
|
199 |
+
)
|
200 |
+
|
201 |
+
for i, parent in enumerate(self._parents):
|
202 |
+
|
203 |
+
if parent == -1: # if root
|
204 |
+
unit_offset_mat[:, :, i] = global_position[:, :, i]
|
205 |
+
else:
|
206 |
+
unit_offset_mat[:, :, i] = (
|
207 |
+
global_position[:, :, i] - global_position[:, :, parent]
|
208 |
+
) / bone_length[i]
|
209 |
+
|
210 |
+
return unit_offset_mat
|
211 |
+
|
212 |
+
def remove_joints(self, joints_to_remove):
|
213 |
+
"""
|
214 |
+
Remove the joints specified in 'joints_to_remove', both from the
|
215 |
+
skeleton definition and from the dataset (which is modified in place).
|
216 |
+
The rotations of removed joints are propagated along the kinematic chain.
|
217 |
+
"""
|
218 |
+
valid_joints = []
|
219 |
+
for joint in range(len(self._parents)):
|
220 |
+
if joint not in joints_to_remove:
|
221 |
+
valid_joints.append(joint)
|
222 |
+
|
223 |
+
index_offsets = np.zeros(len(self._parents), dtype=int)
|
224 |
+
new_parents = []
|
225 |
+
for i, parent in enumerate(self._parents):
|
226 |
+
if i not in joints_to_remove:
|
227 |
+
new_parents.append(parent - index_offsets[parent])
|
228 |
+
else:
|
229 |
+
index_offsets[i:] += 1
|
230 |
+
self._parents = np.array(new_parents)
|
231 |
+
|
232 |
+
self._offsets = self._offsets[valid_joints]
|
233 |
+
self._compute_metadata()
|
234 |
+
|
235 |
+
def forward_kinematics(self, rotations, root_positions):
|
236 |
+
"""
|
237 |
+
Perform forward kinematics using the given trajectory and local rotations.
|
238 |
+
Arguments (where N = batch size, L = sequence length, J = number of joints):
|
239 |
+
-- rotations: (N, L, J, 4) tensor of unit quaternions describing the local rotations of each joint.
|
240 |
+
-- root_positions: (N, L, 3) tensor describing the root joint positions.
|
241 |
+
"""
|
242 |
+
assert len(rotations.shape) == 4
|
243 |
+
assert rotations.shape[-1] == 4
|
244 |
+
|
245 |
+
positions_world = []
|
246 |
+
rotations_world = []
|
247 |
+
|
248 |
+
expanded_offsets = self._offsets.expand(
|
249 |
+
rotations.shape[0],
|
250 |
+
rotations.shape[1],
|
251 |
+
self._offsets.shape[0],
|
252 |
+
self._offsets.shape[1],
|
253 |
+
)
|
254 |
+
|
255 |
+
# Parallelize along the batch and time dimensions
|
256 |
+
for i in range(self._offsets.shape[0]):
|
257 |
+
if self._parents[i] == -1:
|
258 |
+
positions_world.append(root_positions)
|
259 |
+
rotations_world.append(rotations[:, :, 0])
|
260 |
+
else:
|
261 |
+
positions_world.append(
|
262 |
+
qrot(rotations_world[self._parents[i]], expanded_offsets[:, :, i])
|
263 |
+
+ positions_world[self._parents[i]]
|
264 |
+
)
|
265 |
+
if self._has_children[i]:
|
266 |
+
rotations_world.append(
|
267 |
+
qmul(rotations_world[self._parents[i]], rotations[:, :, i])
|
268 |
+
)
|
269 |
+
else:
|
270 |
+
# This joint is a terminal node -> it would be useless to compute the transformation
|
271 |
+
rotations_world.append(None)
|
272 |
+
|
273 |
+
return torch.stack(positions_world, dim=3).permute(0, 1, 3, 2)
|
274 |
+
|
275 |
+
def forward_kinematics_with_rotation(self, rotations, root_positions):
|
276 |
+
"""
|
277 |
+
Perform forward kinematics using the given trajectory and local rotations.
|
278 |
+
Arguments (where N = batch size, L = sequence length, J = number of joints):
|
279 |
+
-- rotations: (N, L, J, 4) tensor of unit quaternions describing the local rotations of each joint.
|
280 |
+
-- root_positions: (N, L, 3) tensor describing the root joint positions.
|
281 |
+
"""
|
282 |
+
assert len(rotations.shape) == 4
|
283 |
+
assert rotations.shape[-1] == 4
|
284 |
+
|
285 |
+
positions_world = []
|
286 |
+
rotations_world = []
|
287 |
+
|
288 |
+
expanded_offsets = self._offsets.expand(
|
289 |
+
rotations.shape[0],
|
290 |
+
rotations.shape[1],
|
291 |
+
self._offsets.shape[0],
|
292 |
+
self._offsets.shape[1],
|
293 |
+
)
|
294 |
+
|
295 |
+
# Parallelize along the batch and time dimensions
|
296 |
+
for i in range(self._offsets.shape[0]):
|
297 |
+
if self._parents[i] == -1:
|
298 |
+
positions_world.append(root_positions)
|
299 |
+
rotations_world.append(rotations[:, :, 0])
|
300 |
+
else:
|
301 |
+
positions_world.append(
|
302 |
+
qrot(rotations_world[self._parents[i]], expanded_offsets[:, :, i])
|
303 |
+
+ positions_world[self._parents[i]]
|
304 |
+
)
|
305 |
+
if self._has_children[i]:
|
306 |
+
rotations_world.append(
|
307 |
+
qmul(rotations_world[self._parents[i]], rotations[:, :, i])
|
308 |
+
)
|
309 |
+
else:
|
310 |
+
# This joint is a terminal node -> it would be useless to compute the transformation
|
311 |
+
rotations_world.append(
|
312 |
+
torch.Tensor([1, 0, 0, 0])
|
313 |
+
.expand(rotations.shape[0], rotations.shape[1], 4)
|
314 |
+
.to(rotations.device)
|
315 |
+
)
|
316 |
+
|
317 |
+
return torch.stack(positions_world, dim=3).permute(0, 1, 3, 2), torch.stack(
|
318 |
+
rotations_world, dim=3
|
319 |
+
).permute(0, 1, 3, 2)
|
320 |
+
|
321 |
+
def get_bone_length_weight(self):
|
322 |
+
bone_length = []
|
323 |
+
for i, parent in enumerate(self._parents):
|
324 |
+
if parent == -1:
|
325 |
+
bone_length.append(1)
|
326 |
+
else:
|
327 |
+
bone_length.append(
|
328 |
+
torch.linalg.norm(self._offsets[i : i + 1], ord="fro").item()
|
329 |
+
)
|
330 |
+
return torch.Tensor(bone_length)
|
331 |
+
|
332 |
+
def joints_left(self):
|
333 |
+
return self._joints_left
|
334 |
+
|
335 |
+
def joints_right(self):
|
336 |
+
return self._joints_right
|
337 |
+
|
338 |
+
def _compute_metadata(self):
|
339 |
+
self._has_children = np.zeros(len(self._parents)).astype(bool)
|
340 |
+
for i, parent in enumerate(self._parents):
|
341 |
+
if parent != -1:
|
342 |
+
self._has_children[parent] = True
|
343 |
+
|
344 |
+
self._children = []
|
345 |
+
for i, parent in enumerate(self._parents):
|
346 |
+
self._children.append([])
|
347 |
+
for i, parent in enumerate(self._parents):
|
348 |
+
if parent != -1:
|
349 |
+
self._children[parent].append(i)
|
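A minimal sketch of driving the Skeleton defined above with identity rotations (assuming the cmib quaternion utilities imported at the top of this file are installed; after remove_joints the skeleton has 22 joints, matching joint_names):

```python
import torch
from infiller.lib.model.skeleton import (  # assumed import path
    Skeleton, sk_offsets, sk_parents, sk_joints_to_remove)

skel = Skeleton(sk_offsets, sk_parents, device="cpu")
skel.remove_joints(sk_joints_to_remove)

N, L, J = 1, 4, skel.num_joints()              # J == 22 after joint removal
rotations = torch.zeros(N, L, J, 4)
rotations[..., 0] = 1.0                        # identity quaternions, (w, x, y, z) order assumed
root_positions = torch.zeros(N, L, 3)

joints = skel.forward_kinematics(rotations, root_positions)
print(joints.shape)                            # torch.Size([1, 4, 22, 3])
```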
infiller/lib/vis/pose.py
ADDED
@@ -0,0 +1,248 @@
1 |
+
import os
|
2 |
+
import pathlib
|
3 |
+
|
4 |
+
import matplotlib.pyplot as plt
|
5 |
+
import numpy as np
|
6 |
+
|
7 |
+
|
8 |
+
def project_root_position(position_arr: np.array, file_name: str):
|
9 |
+
"""
|
10 |
+
Take a batch of root arrays and project it onto a 2D plane
|
11 |
+
|
12 |
+
N: samples
|
13 |
+
L: trajectory length
|
14 |
+
J: joints
|
15 |
+
|
16 |
+
position_arr: [N,L,J,3]
|
17 |
+
"""
|
18 |
+
|
19 |
+
root_joints = position_arr[:, :, 0]
|
20 |
+
|
21 |
+
x_pos = root_joints[:, :, 0]
|
22 |
+
y_pos = root_joints[:, :, 2]
|
23 |
+
|
24 |
+
fig = plt.figure()
|
25 |
+
|
26 |
+
for i in range(x_pos.shape[1]):
|
27 |
+
|
28 |
+
if i == 0:
|
29 |
+
plt.scatter(x_pos[:, i], y_pos[:, i], c="b")
|
30 |
+
elif i == x_pos.shape[1] - 1:
|
31 |
+
plt.scatter(x_pos[:, i], y_pos[:, i], c="r")
|
32 |
+
else:
|
33 |
+
plt.scatter(x_pos[:, i], y_pos[:, i], c="k", marker="*", s=1)
|
34 |
+
|
35 |
+
plt.title(f"Root Position: {file_name}")
|
36 |
+
plt.xlabel("X Axis")
|
37 |
+
plt.ylabel("Y Axis")
|
38 |
+
plt.xlim((-300, 300))
|
39 |
+
plt.ylim((-300, 300))
|
40 |
+
plt.grid()
|
41 |
+
plt.savefig(f"{file_name}.png", dpi=200)
|
42 |
+
|
43 |
+
|
44 |
+
def plot_single_pose(
|
45 |
+
pose,
|
46 |
+
frame_idx,
|
47 |
+
skeleton,
|
48 |
+
save_dir,
|
49 |
+
prefix,
|
50 |
+
):
|
51 |
+
fig = plt.figure()
|
52 |
+
ax = fig.add_subplot(111, projection="3d")
|
53 |
+
|
54 |
+
parent_idx = skeleton.parents()
|
55 |
+
|
56 |
+
for i, p in enumerate(parent_idx):
|
57 |
+
if i > 0:
|
58 |
+
ax.plot(
|
59 |
+
[pose[i, 0], pose[p, 0]],
|
60 |
+
[pose[i, 2], pose[p, 2]],
|
61 |
+
[pose[i, 1], pose[p, 1]],
|
62 |
+
c="k",
|
63 |
+
)
|
64 |
+
|
65 |
+
x_min = pose[:, 0].min()
|
66 |
+
x_max = pose[:, 0].max()
|
67 |
+
|
68 |
+
y_min = pose[:, 1].min()
|
69 |
+
y_max = pose[:, 1].max()
|
70 |
+
|
71 |
+
z_min = pose[:, 2].min()
|
72 |
+
z_max = pose[:, 2].max()
|
73 |
+
|
74 |
+
ax.set_xlim(x_min, x_max)
|
75 |
+
ax.set_xlabel("$X$ Axis")
|
76 |
+
|
77 |
+
ax.set_ylim(z_min, z_max)
|
78 |
+
ax.set_ylabel("$Y$ Axis")
|
79 |
+
|
80 |
+
ax.set_zlim(y_min, y_max)
|
81 |
+
ax.set_zlabel("$Z$ Axis")
|
82 |
+
|
83 |
+
plt.draw()
|
84 |
+
|
85 |
+
title = f"{prefix}: {frame_idx}"
|
86 |
+
plt.title(title)
|
87 |
+
prefix = prefix
|
88 |
+
pathlib.Path(save_dir).mkdir(parents=True, exist_ok=True)
|
89 |
+
plt.savefig(os.path.join(save_dir, prefix + str(frame_idx) + ".png"), dpi=60)
|
90 |
+
plt.close()
|
91 |
+
|
92 |
+
|
93 |
+
def plot_pose(
|
94 |
+
start_pose,
|
95 |
+
inbetween_pose,
|
96 |
+
target_pose,
|
97 |
+
frame_idx,
|
98 |
+
skeleton,
|
99 |
+
save_dir,
|
100 |
+
prefix,
|
101 |
+
):
|
102 |
+
fig = plt.figure()
|
103 |
+
ax = fig.add_subplot(111, projection="3d")
|
104 |
+
|
105 |
+
parent_idx = skeleton.parents()
|
106 |
+
|
107 |
+
for i, p in enumerate(parent_idx):
|
108 |
+
if i > 0:
|
109 |
+
ax.plot(
|
110 |
+
[start_pose[i, 0], start_pose[p, 0]],
|
111 |
+
[start_pose[i, 2], start_pose[p, 2]],
|
112 |
+
[start_pose[i, 1], start_pose[p, 1]],
|
113 |
+
c="b",
|
114 |
+
)
|
115 |
+
ax.plot(
|
116 |
+
[inbetween_pose[i, 0], inbetween_pose[p, 0]],
|
117 |
+
[inbetween_pose[i, 2], inbetween_pose[p, 2]],
|
118 |
+
[inbetween_pose[i, 1], inbetween_pose[p, 1]],
|
119 |
+
c="k",
|
120 |
+
)
|
121 |
+
ax.plot(
|
122 |
+
[target_pose[i, 0], target_pose[p, 0]],
|
123 |
+
[target_pose[i, 2], target_pose[p, 2]],
|
124 |
+
[target_pose[i, 1], target_pose[p, 1]],
|
125 |
+
c="r",
|
126 |
+
)
|
127 |
+
|
128 |
+
x_min = np.min(
|
129 |
+
[start_pose[:, 0].min(), inbetween_pose[:, 0].min(), target_pose[:, 0].min()]
|
130 |
+
)
|
131 |
+
x_max = np.max(
|
132 |
+
[start_pose[:, 0].max(), inbetween_pose[:, 0].max(), target_pose[:, 0].max()]
|
133 |
+
)
|
134 |
+
|
135 |
+
y_min = np.min(
|
136 |
+
[start_pose[:, 1].min(), inbetween_pose[:, 1].min(), target_pose[:, 1].min()]
|
137 |
+
)
|
138 |
+
y_max = np.max(
|
139 |
+
[start_pose[:, 1].max(), inbetween_pose[:, 1].max(), target_pose[:, 1].max()]
|
140 |
+
)
|
141 |
+
|
142 |
+
z_min = np.min(
|
143 |
+
[start_pose[:, 2].min(), inbetween_pose[:, 2].min(), target_pose[:, 2].min()]
|
144 |
+
)
|
145 |
+
z_max = np.max(
|
146 |
+
[start_pose[:, 2].max(), inbetween_pose[:, 2].max(), target_pose[:, 2].max()]
|
147 |
+
)
|
148 |
+
|
149 |
+
ax.set_xlim(x_min, x_max)
|
150 |
+
ax.set_xlabel("$X$ Axis")
|
151 |
+
|
152 |
+
ax.set_ylim(z_min, z_max)
|
153 |
+
ax.set_ylabel("$Y$ Axis")
|
154 |
+
|
155 |
+
ax.set_zlim(y_min, y_max)
|
156 |
+
ax.set_zlabel("$Z$ Axis")
|
157 |
+
|
158 |
+
plt.draw()
|
159 |
+
|
160 |
+
title = f"{prefix}: {frame_idx}"
|
161 |
+
plt.title(title)
|
162 |
+
prefix = prefix
|
163 |
+
pathlib.Path(save_dir).mkdir(parents=True, exist_ok=True)
|
164 |
+
plt.savefig(os.path.join(save_dir, prefix + str(frame_idx) + ".png"), dpi=60)
|
165 |
+
plt.close()
|
166 |
+
|
167 |
+
|
168 |
+
def plot_pose_with_stop(
|
169 |
+
start_pose,
|
170 |
+
inbetween_pose,
|
171 |
+
target_pose,
|
172 |
+
stopover,
|
173 |
+
frame_idx,
|
174 |
+
skeleton,
|
175 |
+
save_dir,
|
176 |
+
prefix,
|
177 |
+
):
|
178 |
+
fig = plt.figure()
|
179 |
+
ax = fig.add_subplot(111, projection="3d")
|
180 |
+
|
181 |
+
parent_idx = skeleton.parents()
|
182 |
+
|
183 |
+
for i, p in enumerate(parent_idx):
|
184 |
+
if i > 0:
|
185 |
+
ax.plot(
|
186 |
+
[start_pose[i, 0], start_pose[p, 0]],
|
187 |
+
[start_pose[i, 2], start_pose[p, 2]],
|
188 |
+
[start_pose[i, 1], start_pose[p, 1]],
|
189 |
+
c="b",
|
190 |
+
)
|
191 |
+
ax.plot(
|
192 |
+
[inbetween_pose[i, 0], inbetween_pose[p, 0]],
|
193 |
+
[inbetween_pose[i, 2], inbetween_pose[p, 2]],
|
194 |
+
[inbetween_pose[i, 1], inbetween_pose[p, 1]],
|
195 |
+
c="k",
|
196 |
+
)
|
197 |
+
ax.plot(
|
198 |
+
[target_pose[i, 0], target_pose[p, 0]],
|
199 |
+
[target_pose[i, 2], target_pose[p, 2]],
|
200 |
+
[target_pose[i, 1], target_pose[p, 1]],
|
201 |
+
c="r",
|
202 |
+
)
|
203 |
+
|
204 |
+
ax.plot(
|
205 |
+
[stopover[i, 0], stopover[p, 0]],
|
206 |
+
[stopover[i, 2], stopover[p, 2]],
|
207 |
+
[stopover[i, 1], stopover[p, 1]],
|
208 |
+
c="indigo",
|
209 |
+
)
|
210 |
+
|
211 |
+
x_min = np.min(
|
212 |
+
[start_pose[:, 0].min(), inbetween_pose[:, 0].min(), target_pose[:, 0].min()]
|
213 |
+
)
|
214 |
+
x_max = np.max(
|
215 |
+
[start_pose[:, 0].max(), inbetween_pose[:, 0].max(), target_pose[:, 0].max()]
|
216 |
+
)
|
217 |
+
|
218 |
+
y_min = np.min(
|
219 |
+
[start_pose[:, 1].min(), inbetween_pose[:, 1].min(), target_pose[:, 1].min()]
|
220 |
+
)
|
221 |
+
y_max = np.max(
|
222 |
+
[start_pose[:, 1].max(), inbetween_pose[:, 1].max(), target_pose[:, 1].max()]
|
223 |
+
)
|
224 |
+
|
225 |
+
z_min = np.min(
|
226 |
+
[start_pose[:, 2].min(), inbetween_pose[:, 2].min(), target_pose[:, 2].min()]
|
227 |
+
)
|
228 |
+
z_max = np.max(
|
229 |
+
[start_pose[:, 2].max(), inbetween_pose[:, 2].max(), target_pose[:, 2].max()]
|
230 |
+
)
|
231 |
+
|
232 |
+
ax.set_xlim(x_min, x_max)
|
233 |
+
ax.set_xlabel("$X$ Axis")
|
234 |
+
|
235 |
+
ax.set_ylim(z_min, z_max)
|
236 |
+
ax.set_ylabel("$Y$ Axis")
|
237 |
+
|
238 |
+
ax.set_zlim(y_min, y_max)
|
239 |
+
ax.set_zlabel("$Z$ Axis")
|
240 |
+
|
241 |
+
plt.draw()
|
242 |
+
|
243 |
+
title = f"{prefix}: {frame_idx}"
|
244 |
+
plt.title(title)
|
245 |
+
prefix = prefix
|
246 |
+
pathlib.Path(save_dir).mkdir(parents=True, exist_ok=True)
|
247 |
+
plt.savefig(os.path.join(save_dir, prefix + str(frame_idx) + ".png"), dpi=60)
|
248 |
+
plt.close()
|
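Hedged example of calling the plotting helper above on a single frame; the 22x3 joint array is random, and the skeleton comes from infiller.lib.model.skeleton, which provides a compatible parents() list (paths and sizes are assumptions, not repository defaults):

```python
import numpy as np
from infiller.lib.model.skeleton import Skeleton, sk_offsets, sk_parents, sk_joints_to_remove
from infiller.lib.vis.pose import plot_single_pose  # assumed import path

skel = Skeleton(sk_offsets, sk_parents, device="cpu")
skel.remove_joints(sk_joints_to_remove)

pose = np.random.rand(22, 3) * 50.0            # one frame of global joint positions
plot_single_pose(pose, frame_idx=0, skeleton=skel, save_dir="debug_vis", prefix="demo_")
# writes debug_vis/demo_0.png
```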
lib/core/__pycache__/constants.cpython-310.pyc
ADDED
Binary file (2.87 kB). View file
|
|
lib/core/constants.py
ADDED
@@ -0,0 +1,78 @@
1 |
+
FOCAL_LENGTH = 5000.
|
2 |
+
|
3 |
+
# Mean and standard deviation for normalizing input image
|
4 |
+
IMG_NORM_MEAN = [0.485, 0.456, 0.406]
|
5 |
+
IMG_NORM_STD = [0.229, 0.224, 0.225]
|
6 |
+
|
7 |
+
"""
|
8 |
+
We create a superset of joints containing the OpenPose joints together with the ones that each dataset provides.
|
9 |
+
We keep a superset of 24 joints such that we include all joints from every dataset.
|
10 |
+
If a dataset doesn't provide annotations for a specific joint, we simply ignore it.
|
11 |
+
The joints used here are the following:
|
12 |
+
"""
|
13 |
+
JOINT_NAMES = [
|
14 |
+
'OP Nose', 'OP Neck', 'OP RShoulder', #0,1,2
|
15 |
+
'OP RElbow', 'OP RWrist', 'OP LShoulder', #3,4,5
|
16 |
+
'OP LElbow', 'OP LWrist', 'OP MidHip', #6, 7,8
|
17 |
+
'OP RHip', 'OP RKnee', 'OP RAnkle', #9,10,11
|
18 |
+
'OP LHip', 'OP LKnee', 'OP LAnkle', #12,13,14
|
19 |
+
'OP REye', 'OP LEye', 'OP REar', #15,16,17
|
20 |
+
'OP LEar', 'OP LBigToe', 'OP LSmallToe', #18,19,20
|
21 |
+
'OP LHeel', 'OP RBigToe', 'OP RSmallToe', 'OP RHeel', #21, 22, 23, 24 ##Total 25 joints for openpose
|
22 |
+
'Right Ankle', 'Right Knee', 'Right Hip', #0,1,2
|
23 |
+
'Left Hip', 'Left Knee', 'Left Ankle', #3, 4, 5
|
24 |
+
'Right Wrist', 'Right Elbow', 'Right Shoulder', #6
|
25 |
+
'Left Shoulder', 'Left Elbow', 'Left Wrist', #9
|
26 |
+
'Neck (LSP)', 'Top of Head (LSP)', #12, 13
|
27 |
+
'Pelvis (MPII)', 'Thorax (MPII)', #14, 15
|
28 |
+
'Spine (H36M)', 'Jaw (H36M)', #16, 17
|
29 |
+
'Head (H36M)', 'Nose', 'Left Eye', #18, 19, 20
|
30 |
+
'Right Eye', 'Left Ear', 'Right Ear' #21,22,23 (Total 24 joints)
|
31 |
+
]
|
32 |
+
|
33 |
+
# Dict containing the joints in numerical order
|
34 |
+
JOINT_IDS = {JOINT_NAMES[i]: i for i in range(len(JOINT_NAMES))}
|
35 |
+
|
36 |
+
# Map joints to SMPL joints
|
37 |
+
JOINT_MAP = {
|
38 |
+
'OP Nose': 24, 'OP Neck': 12, 'OP RShoulder': 17,
|
39 |
+
'OP RElbow': 19, 'OP RWrist': 21, 'OP LShoulder': 16,
|
40 |
+
'OP LElbow': 18, 'OP LWrist': 20, 'OP MidHip': 0,
|
41 |
+
'OP RHip': 2, 'OP RKnee': 5, 'OP RAnkle': 8,
|
42 |
+
'OP LHip': 1, 'OP LKnee': 4, 'OP LAnkle': 7,
|
43 |
+
'OP REye': 25, 'OP LEye': 26, 'OP REar': 27,
|
44 |
+
'OP LEar': 28, 'OP LBigToe': 29, 'OP LSmallToe': 30,
|
45 |
+
'OP LHeel': 31, 'OP RBigToe': 32, 'OP RSmallToe': 33, 'OP RHeel': 34,
|
46 |
+
'Right Ankle': 8, 'Right Knee': 5, 'Right Hip': 45,
|
47 |
+
'Left Hip': 46, 'Left Knee': 4, 'Left Ankle': 7,
|
48 |
+
'Right Wrist': 21, 'Right Elbow': 19, 'Right Shoulder': 17,
|
49 |
+
'Left Shoulder': 16, 'Left Elbow': 18, 'Left Wrist': 20,
|
50 |
+
'Neck (LSP)': 47, 'Top of Head (LSP)': 48,
|
51 |
+
'Pelvis (MPII)': 49, 'Thorax (MPII)': 50,
|
52 |
+
'Spine (H36M)': 51, 'Jaw (H36M)': 52,
|
53 |
+
'Head (H36M)': 53, 'Nose': 24, 'Left Eye': 26,
|
54 |
+
'Right Eye': 25, 'Left Ear': 28, 'Right Ear': 27
|
55 |
+
}
|
56 |
+
|
57 |
+
# Joint selectors
|
58 |
+
# Indices to get the 14 LSP joints from the 17 H36M joints
|
59 |
+
H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9]
|
60 |
+
H36M_TO_J14 = H36M_TO_J17[:14]
|
61 |
+
# Indices to get the 14 LSP joints from the ground truth joints
|
62 |
+
J24_TO_J17 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18, 14, 16, 17]
|
63 |
+
J24_TO_J14 = J24_TO_J17[:14]
|
64 |
+
|
65 |
+
# Permutation of SMPL pose parameters when flipping the shape
|
66 |
+
SMPL_JOINTS_FLIP_PERM = [0, 2, 1, 3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 14, 13, 15, 17, 16, 19, 18, 21, 20, 23, 22]
|
67 |
+
SMPL_POSE_FLIP_PERM = []
|
68 |
+
for i in SMPL_JOINTS_FLIP_PERM:
|
69 |
+
SMPL_POSE_FLIP_PERM.append(3*i)
|
70 |
+
SMPL_POSE_FLIP_PERM.append(3*i+1)
|
71 |
+
SMPL_POSE_FLIP_PERM.append(3*i+2)
|
72 |
+
# Permutation indices for the 24 ground truth joints
|
73 |
+
J24_FLIP_PERM = [5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6, 12, 13, 14, 15, 16, 17, 18, 19, 21, 20, 23, 22]
|
74 |
+
# Permutation indices for the full set of 49 joints
|
75 |
+
J49_FLIP_PERM = [0, 1, 5, 6, 7, 2, 3, 4, 8, 12, 13, 14, 9, 10, 11, 16, 15, 18, 17, 22, 23, 24, 19, 20, 21]\
|
76 |
+
+ [25+i for i in J24_FLIP_PERM]
|
77 |
+
|
78 |
+
|
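For reference, the lookups above are consumed like this (a small sketch; the indices follow directly from the lists defined in this file):

```python
from lib.core import constants

# position of the OpenPose right wrist inside the 49-joint superset
print(constants.JOINT_IDS['OP RWrist'])    # 4
# index of the same joint in the SMPL joint regressor output
print(constants.JOINT_MAP['OP RWrist'])    # 21
# take the 14 LSP joints out of a 17-joint H36M prediction
lsp_subset = constants.H36M_TO_J14         # first 14 entries of H36M_TO_J17
```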
lib/datasets/__pycache__/track_dataset.cpython-310.pyc
ADDED
Binary file (2.28 kB). View file
|
|
lib/datasets/track_dataset.py
ADDED
@@ -0,0 +1,78 @@
1 |
+
import torch
|
2 |
+
from torch.utils.data import Dataset
|
3 |
+
from torchvision.transforms import Normalize, ToTensor, Compose
|
4 |
+
import numpy as np
|
5 |
+
import cv2
|
6 |
+
|
7 |
+
from lib.core import constants
|
8 |
+
from lib.utils.imutils import crop, boxes_2_cs
|
9 |
+
|
10 |
+
|
11 |
+
class TrackDatasetEval(Dataset):
|
12 |
+
"""
|
13 |
+
Track Dataset Class - Load images/crops of the tracked boxes.
|
14 |
+
"""
|
15 |
+
def __init__(self, imgfiles, boxes,
|
16 |
+
crop_size=256, dilate=1.0,
|
17 |
+
img_focal=None, img_center=None, normalization=True,
|
18 |
+
item_idx=0, do_flip=False):
|
19 |
+
super(TrackDatasetEval, self).__init__()
|
20 |
+
|
21 |
+
self.imgfiles = imgfiles
|
22 |
+
self.crop_size = crop_size
|
23 |
+
self.normalization = normalization
|
24 |
+
self.normalize_img = Compose([
|
25 |
+
ToTensor(),
|
26 |
+
Normalize(mean=constants.IMG_NORM_MEAN, std=constants.IMG_NORM_STD)
|
27 |
+
])
|
28 |
+
|
29 |
+
self.boxes = boxes
|
30 |
+
self.box_dilate = dilate
|
31 |
+
self.centers, self.scales = boxes_2_cs(boxes)
|
32 |
+
|
33 |
+
self.img_focal = img_focal
|
34 |
+
self.img_center = img_center
|
35 |
+
self.item_idx = item_idx
|
36 |
+
self.do_flip = do_flip
|
37 |
+
|
38 |
+
def __len__(self):
|
39 |
+
return len(self.imgfiles)
|
40 |
+
|
41 |
+
|
42 |
+
def __getitem__(self, index):
|
43 |
+
item = {}
|
44 |
+
imgfile = self.imgfiles[index]
|
45 |
+
scale = self.scales[index] * self.box_dilate
|
46 |
+
center = self.centers[index]
|
47 |
+
|
48 |
+
img_focal = self.img_focal
|
49 |
+
img_center = self.img_center
|
50 |
+
|
51 |
+
img = cv2.imread(imgfile)[:,:,::-1]
|
52 |
+
if self.do_flip:
|
53 |
+
img = img[:, ::-1, :]
|
54 |
+
img_width = img.shape[1]
|
55 |
+
center[0] = img_width - center[0] - 1
|
56 |
+
img_crop = crop(img, center, scale,
|
57 |
+
[self.crop_size, self.crop_size],
|
58 |
+
rot=0).astype('uint8')
|
59 |
+
# cv2.imwrite('debug_crop.png', img_crop[:,:,::-1])
|
60 |
+
|
61 |
+
if self.normalization:
|
62 |
+
img_crop = self.normalize_img(img_crop)
|
63 |
+
else:
|
64 |
+
img_crop = torch.from_numpy(img_crop)
|
65 |
+
item['img'] = img_crop
|
66 |
+
|
67 |
+
if self.do_flip:
|
68 |
+
# center[0] = img_width - center[0] - 1
|
69 |
+
item['do_flip'] = torch.tensor(1).float()
|
70 |
+
item['img_idx'] = torch.tensor(index).long()
|
71 |
+
item['scale'] = torch.tensor(scale).float()
|
72 |
+
item['center'] = torch.tensor(center).float()
|
73 |
+
item['img_focal'] = torch.tensor(img_focal).float()
|
74 |
+
item['img_center'] = torch.tensor(img_center).float()
|
75 |
+
|
76 |
+
|
77 |
+
return item
|
78 |
+
|
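A hedged usage sketch for the dataset above: wrap per-frame image paths and tracked boxes into normalized 256x256 crops. The file paths, the (T, 5) box layout, and the intrinsics values are placeholders, not repository defaults.

```python
import numpy as np
import torch
from lib.datasets.track_dataset import TrackDatasetEval

imgfiles = [f"frames/{i:06d}.jpg" for i in range(100)]          # hypothetical frame paths
boxes = np.tile([100.0, 120.0, 400.0, 420.0, 1.0], (100, 1))    # x1, y1, x2, y2, score (assumed)
dataset = TrackDatasetEval(imgfiles, boxes, crop_size=256, dilate=1.2,
                           img_focal=600.0, img_center=(960.0, 540.0))
loader = torch.utils.data.DataLoader(dataset, batch_size=16, num_workers=2)
for batch in loader:
    print(batch['img'].shape, batch['center'].shape)            # (16, 3, 256, 256), (16, 2)
    break
```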
lib/eval_utils/__pycache__/custom_utils.cpython-310.pyc
ADDED
Binary file (2.96 kB). View file
|
|
lib/eval_utils/__pycache__/filling_utils.cpython-310.pyc
ADDED
Binary file (6.88 kB). View file
|
|
lib/eval_utils/custom_utils.py
ADDED
@@ -0,0 +1,99 @@
1 |
+
import copy
|
2 |
+
import numpy as np
|
3 |
+
import torch
|
4 |
+
|
5 |
+
from hawor.utils.process import run_mano, run_mano_left
|
6 |
+
from hawor.utils.rotation import angle_axis_to_quaternion, rotation_matrix_to_angle_axis
|
7 |
+
from scipy.interpolate import interp1d
|
8 |
+
|
9 |
+
|
10 |
+
def cam2world_convert(R_c2w_sla, t_c2w_sla, data_out, handedness):
|
11 |
+
init_rot_mat = copy.deepcopy(data_out["init_root_orient"])
|
12 |
+
init_rot_mat = torch.einsum("tij,btjk->btik", R_c2w_sla, init_rot_mat)
|
13 |
+
init_rot = rotation_matrix_to_angle_axis(init_rot_mat)
|
14 |
+
init_rot_quat = angle_axis_to_quaternion(init_rot)
|
15 |
+
# data_out["init_root_orient"] = rotation_matrix_to_angle_axis(data_out["init_root_orient"])
|
16 |
+
# data_out["init_hand_pose"] = rotation_matrix_to_angle_axis(data_out["init_hand_pose"])
|
17 |
+
data_out_init_root_orient = rotation_matrix_to_angle_axis(data_out["init_root_orient"])
|
18 |
+
data_out_init_hand_pose = rotation_matrix_to_angle_axis(data_out["init_hand_pose"])
|
19 |
+
|
20 |
+
init_trans = data_out["init_trans"] # (B, T, 3)
|
21 |
+
if handedness == "right":
|
22 |
+
outputs = run_mano(data_out["init_trans"], data_out_init_root_orient, data_out_init_hand_pose, betas=data_out["init_betas"])
|
23 |
+
elif handedness == "left":
|
24 |
+
outputs = run_mano_left(data_out["init_trans"], data_out_init_root_orient, data_out_init_hand_pose, betas=data_out["init_betas"])
|
25 |
+
root_loc = outputs["joints"][..., 0, :].cpu() # (B, T, 3)
|
26 |
+
offset = init_trans - root_loc # It is a constant, no matter what the rotation is.
|
27 |
+
init_trans = (
|
28 |
+
torch.einsum("tij,btj->bti", R_c2w_sla, root_loc)
|
29 |
+
+ t_c2w_sla[None, :]
|
30 |
+
+ offset
|
31 |
+
)
|
32 |
+
|
33 |
+
data_world = {
|
34 |
+
"init_root_orient": init_rot, # (B, T, 3)
|
35 |
+
"init_hand_pose": data_out_init_hand_pose, # (B, T, 15, 3)
|
36 |
+
"init_trans": init_trans, # (B, T, 3)
|
37 |
+
"init_betas": data_out["init_betas"] # (B, T, 10)
|
38 |
+
}
|
39 |
+
|
40 |
+
return data_world
|
41 |
+
|
42 |
+
def quaternion_to_matrix(quaternions):
|
43 |
+
"""
|
44 |
+
Convert rotations given as quaternions to rotation matrices.
|
45 |
+
|
46 |
+
Args:
|
47 |
+
quaternions: quaternions with real part first,
|
48 |
+
as tensor of shape (..., 4).
|
49 |
+
|
50 |
+
Returns:
|
51 |
+
Rotation matrices as tensor of shape (..., 3, 3).
|
52 |
+
"""
|
53 |
+
r, i, j, k = torch.unbind(quaternions, -1)
|
54 |
+
two_s = 2.0 / (quaternions * quaternions).sum(-1)
|
55 |
+
|
56 |
+
o = torch.stack(
|
57 |
+
(
|
58 |
+
1 - two_s * (j * j + k * k),
|
59 |
+
two_s * (i * j - k * r),
|
60 |
+
two_s * (i * k + j * r),
|
61 |
+
two_s * (i * j + k * r),
|
62 |
+
1 - two_s * (i * i + k * k),
|
63 |
+
two_s * (j * k - i * r),
|
64 |
+
two_s * (i * k - j * r),
|
65 |
+
two_s * (j * k + i * r),
|
66 |
+
1 - two_s * (i * i + j * j),
|
67 |
+
),
|
68 |
+
-1,
|
69 |
+
)
|
70 |
+
return o.reshape(quaternions.shape[:-1] + (3, 3))
|
71 |
+
|
72 |
+
def load_slam_cam(fpath):
|
73 |
+
print(f"Loading cameras from {fpath}...")
|
74 |
+
pred_cam = dict(np.load(fpath, allow_pickle=True))
|
75 |
+
pred_traj = pred_cam['traj']
|
76 |
+
t_c2w_sla = torch.tensor(pred_traj[:, :3]) * pred_cam['scale']
|
77 |
+
pred_camq = torch.tensor(pred_traj[:, 3:])
|
78 |
+
R_c2w_sla = quaternion_to_matrix(pred_camq[:,[3,0,1,2]])
|
79 |
+
R_w2c_sla = R_c2w_sla.transpose(-1, -2)
|
80 |
+
t_w2c_sla = -torch.einsum("bij,bj->bi", R_w2c_sla, t_c2w_sla)
|
81 |
+
return R_w2c_sla, t_w2c_sla, R_c2w_sla, t_c2w_sla
|
82 |
+
|
83 |
+
|
84 |
+
def interpolate_bboxes(bboxes):
|
85 |
+
T = bboxes.shape[0]
|
86 |
+
|
87 |
+
zero_indices = np.where(np.all(bboxes == 0, axis=1))[0]
|
88 |
+
|
89 |
+
non_zero_indices = np.where(np.any(bboxes != 0, axis=1))[0]
|
90 |
+
|
91 |
+
if len(zero_indices) == 0:
|
92 |
+
return bboxes
|
93 |
+
|
94 |
+
interpolated_bboxes = bboxes.copy()
|
95 |
+
for i in range(5):
|
96 |
+
interp_func = interp1d(non_zero_indices, bboxes[non_zero_indices, i], kind='linear', fill_value="extrapolate")
|
97 |
+
interpolated_bboxes[zero_indices, i] = interp_func(zero_indices)
|
98 |
+
|
99 |
+
return interpolated_bboxes
|
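To make the camera convention in load_slam_cam above explicit: each row of 'traj' is read as [tx, ty, tz, qx, qy, qz, qw] (the [3, 0, 1, 2] re-ordering puts the real part first before quaternion_to_matrix), and 'scale' rescales the SLAM translation. A tiny self-contained check with an identity trajectory (the file name is made up):

```python
import numpy as np
import torch
from lib.eval_utils.custom_utils import load_slam_cam

traj = np.zeros((10, 7))
traj[:, 6] = 1.0                                   # qw = 1: identity rotations
np.savez("slam_cam_demo.npz", traj=traj, scale=1.0)

R_w2c, t_w2c, R_c2w, t_c2w = load_slam_cam("slam_cam_demo.npz")
assert torch.allclose(R_c2w[0], torch.eye(3, dtype=R_c2w.dtype))
assert torch.allclose(t_c2w, torch.zeros_like(t_c2w))
```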
lib/eval_utils/filling_utils.py
ADDED
@@ -0,0 +1,306 @@
1 |
+
import copy
|
2 |
+
import os
|
3 |
+
import joblib
|
4 |
+
import numpy as np
|
5 |
+
from scipy.spatial.transform import Slerp, Rotation
|
6 |
+
import torch
|
7 |
+
|
8 |
+
from hawor.utils.process import run_mano, run_mano_left
|
9 |
+
from hawor.utils.rotation import angle_axis_to_quaternion, angle_axis_to_rotation_matrix, quaternion_to_rotation_matrix, rotation_matrix_to_angle_axis
|
10 |
+
from lib.utils.geometry import rotmat_to_rot6d
|
11 |
+
from lib.utils.geometry import rot6d_to_rotmat
|
12 |
+
|
13 |
+
def slerp_interpolation_aa(pos, valid):
|
14 |
+
|
15 |
+
B, T, N, _ = pos.shape # B: batch size, T: timesteps, N: joints, last dim: rotation parameters
|
16 |
+
pos_interp = pos.copy() # make a copy to store the interpolation results
|
17 |
+
|
18 |
+
for b in range(B):
|
19 |
+
for n in range(N):
|
20 |
+
quat_b_n = pos[b, :, n, :]
|
21 |
+
valid_b_n = valid[b, :]
|
22 |
+
|
23 |
+
invalid_idxs = np.where(~valid_b_n)[0]
|
24 |
+
valid_idxs = np.where(valid_b_n)[0]
|
25 |
+
|
26 |
+
if len(invalid_idxs) == 0:
|
27 |
+
continue
|
28 |
+
|
29 |
+
if len(valid_idxs) > 1:
|
30 |
+
valid_times = valid_idxs # valid timesteps
|
31 |
+
valid_rots = Rotation.from_rotvec(quat_b_n[valid_idxs]) # valid rotations
|
32 |
+
|
33 |
+
slerp = Slerp(valid_times, valid_rots)
|
34 |
+
|
35 |
+
for idx in invalid_idxs:
|
36 |
+
if idx < valid_idxs[0]: # timestep before the first valid one: extrapolate
|
37 |
+
pos_interp[b, idx, n, :] = quat_b_n[valid_idxs[0]] # copy the first valid rotation
|
38 |
+
elif idx > valid_idxs[-1]: # timestep after the last valid one: extrapolate
|
39 |
+
pos_interp[b, idx, n, :] = quat_b_n[valid_idxs[-1]] # copy the last valid rotation
|
40 |
+
else:
|
41 |
+
interp_rot = slerp([idx])
|
42 |
+
pos_interp[b, idx, n, :] = interp_rot.as_rotvec()[0]
|
43 |
+
# print("#######")
|
44 |
+
# if N > 1:
|
45 |
+
# print(pos[1,0,11])
|
46 |
+
# print(pos_interp[1,0,11])
|
47 |
+
|
48 |
+
return pos_interp
|
49 |
+
|
50 |
+
def slerp_interpolation_quat(pos, valid):
|
51 |
+
|
52 |
+
# wxyz to xyzw
|
53 |
+
pos = pos[:, :, :, [1, 2, 3, 0]]
|
54 |
+
|
55 |
+
B, T, N, _ = pos.shape # B: batch size, T: timesteps, N: joints, 4: quaternion dims
|
56 |
+
pos_interp = pos.copy() # make a copy to store the interpolation results
|
57 |
+
|
58 |
+
for b in range(B):
|
59 |
+
for n in range(N):
|
60 |
+
quat_b_n = pos[b, :, n, :]
|
61 |
+
valid_b_n = valid[b, :]
|
62 |
+
|
63 |
+
invalid_idxs = np.where(~valid_b_n)[0]
|
64 |
+
valid_idxs = np.where(valid_b_n)[0]
|
65 |
+
|
66 |
+
if len(invalid_idxs) == 0:
|
67 |
+
continue
|
68 |
+
|
69 |
+
if len(valid_idxs) > 1:
|
70 |
+
valid_times = valid_idxs # valid timesteps
|
71 |
+
valid_rots = Rotation.from_quat(quat_b_n[valid_idxs]) # valid quaternions
|
72 |
+
|
73 |
+
slerp = Slerp(valid_times, valid_rots)
|
74 |
+
|
75 |
+
for idx in invalid_idxs:
|
76 |
+
if idx < valid_idxs[0]: # timestep before the first valid one: extrapolate
|
77 |
+
pos_interp[b, idx, n, :] = quat_b_n[valid_idxs[0]] # copy the first valid quaternion
|
78 |
+
elif idx > valid_idxs[-1]: # timestep after the last valid one: extrapolate
|
79 |
+
pos_interp[b, idx, n, :] = quat_b_n[valid_idxs[-1]] # copy the last valid quaternion
|
80 |
+
else:
|
81 |
+
interp_rot = slerp([idx])
|
82 |
+
pos_interp[b, idx, n, :] = interp_rot.as_quat()[0]
|
83 |
+
|
84 |
+
# xyzw to wxyz
|
85 |
+
pos_interp = pos_interp[:, :, :, [3, 0, 1, 2]]
|
86 |
+
return pos_interp
|
87 |
+
|
88 |
+
|
89 |
+
def linear_interpolation_nd(pos, valid):
|
90 |
+
B, T = pos.shape[:2] # batch size B and number of timesteps T
|
91 |
+
feature_dim = pos.shape[2] # arbitrary feature dimension
|
92 |
+
pos_interp = pos.copy() # make a copy to store the interpolation results
|
93 |
+
|
94 |
+
for b in range(B):
|
95 |
+
for idx in range(feature_dim): # 针对任意维度
|
96 |
+
pos_b_idx = pos[b, :, idx] # 取出第b批次对应的**维度下的一个时间序列
|
97 |
+
valid_b = valid[b, :] # 当前批次的有效标志
|
98 |
+
|
99 |
+
# 找到无效的索引(False)
|
100 |
+
invalid_idxs = np.where(~valid_b)[0]
|
101 |
+
valid_idxs = np.where(valid_b)[0]
|
102 |
+
|
103 |
+
if len(invalid_idxs) == 0:
|
104 |
+
continue
|
105 |
+
|
106 |
+
# 对无效部分进行线性插值
|
107 |
+
if len(valid_idxs) > 1: # 确保有足够的有效点用于插值
|
108 |
+
pos_b_idx[invalid_idxs] = np.interp(invalid_idxs, valid_idxs, pos_b_idx[valid_idxs])
|
109 |
+
pos_interp[b, :, idx] = pos_b_idx # 保存插值结果
|
110 |
+
|
111 |
+
return pos_interp
|
112 |
+
|
113 |
+
def world2canonical_convert(R_c2w_sla, t_c2w_sla, data_out, handedness):
    init_rot_mat = copy.deepcopy(data_out["init_root_orient"])
    init_rot_mat = torch.einsum("tij,btjk->btik", R_c2w_sla, init_rot_mat)
    init_rot = rotation_matrix_to_angle_axis(init_rot_mat)
    init_rot_quat = angle_axis_to_quaternion(init_rot)
    # data_out["init_root_orient"] = rotation_matrix_to_angle_axis(data_out["init_root_orient"])
    # data_out["init_hand_pose"] = rotation_matrix_to_angle_axis(data_out["init_hand_pose"])
    data_out_init_root_orient = rotation_matrix_to_angle_axis(data_out["init_root_orient"])
    data_out_init_hand_pose = rotation_matrix_to_angle_axis(data_out["init_hand_pose"])

    init_trans = data_out["init_trans"]  # (B, T, 3)
    if handedness == "left":
        outputs = run_mano_left(data_out["init_trans"], data_out_init_root_orient, data_out_init_hand_pose, betas=data_out["init_betas"])
    elif handedness == "right":
        outputs = run_mano(data_out["init_trans"], data_out_init_root_orient, data_out_init_hand_pose, betas=data_out["init_betas"])
    root_loc = outputs["joints"][..., 0, :].cpu()  # (B, T, 3)
    offset = init_trans - root_loc  # It is a constant, no matter what the rotation is.
    init_trans = (
        torch.einsum("tij,btj->bti", R_c2w_sla, root_loc)
        + t_c2w_sla[None, :]
        + offset
    )

    data_world = {
        "init_root_orient": init_rot,  # (B, T, 3)
        "init_hand_pose": data_out_init_hand_pose,  # (B, T, 15, 3)
        "init_trans": init_trans,  # (B, T, 3)
        "init_betas": data_out["init_betas"]  # (B, T, 10)
    }

    return data_world

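Aside (illustrative sketch, not part of the file): world2canonical_convert is reused below with both the world-to-canonical and canonical-to-world transforms; the inverse pair obeys the usual rigid-transform identity, for example:

import torch

R_w2c = torch.eye(3)                              # placeholder rotation
t_w2c = torch.tensor([0.1, 0.0, -0.2])            # placeholder translation
R_c2w = R_w2c.transpose(-1, -2)
t_c2w = -torch.einsum("ij,j->i", R_c2w, t_w2c)

x = torch.randn(3)
x_rt = torch.einsum("ij,j->i", R_c2w, torch.einsum("ij,j->i", R_w2c, x) + t_w2c) + t_c2w
print(torch.allclose(x, x_rt))                    # True: the two maps compose to identity
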
def filling_preprocess(item):

    num_joints = 15

    global_trans = item['trans']  # (2, seq_len, 3)
    global_rot = item['rot']  # (2, seq_len, 3)
    hand_pose = item['hand_pose']  # (2, seq_len, 45)
    betas = item['betas']  # (2, seq_len, 10)
    valid = item['valid']  # (2, seq_len)

    N, T, _ = global_trans.shape
    R_canonical2world_left_aa = torch.from_numpy(global_rot[0, 0])
    R_canonical2world_right_aa = torch.from_numpy(global_rot[1, 0])
    R_world2canonical_left = angle_axis_to_rotation_matrix(R_canonical2world_left_aa).t()
    R_world2canonical_right = angle_axis_to_rotation_matrix(R_canonical2world_right_aa).t()

    # transform left hand to canonical
    hand_pose = hand_pose.reshape(N, T, num_joints, 3)
    data_world_left = {
        "init_trans": torch.from_numpy(global_trans[0:1]),
        "init_root_orient": angle_axis_to_rotation_matrix(torch.from_numpy(global_rot[0:1])),
        "init_hand_pose": angle_axis_to_rotation_matrix(torch.from_numpy(hand_pose[0:1])),
        "init_betas": torch.from_numpy(betas[0:1]),
    }

    data_left_init_root_orient = rotation_matrix_to_angle_axis(data_world_left["init_root_orient"])
    data_left_init_hand_pose = rotation_matrix_to_angle_axis(data_world_left["init_hand_pose"])
    outputs = run_mano_left(data_world_left["init_trans"], data_left_init_root_orient, data_left_init_hand_pose, betas=data_world_left["init_betas"])
    init_trans = data_world_left["init_trans"][0, 0]  # (3,)
    root_loc = outputs["joints"][0, 0, 0, :].cpu()  # (3,)
    offset = init_trans - root_loc  # It is a constant, no matter what the rotation is.
    t_world2canonical_left = -torch.einsum("ij,j->i", R_world2canonical_left, root_loc) - offset

    R_world2canonical_left = R_world2canonical_left.repeat(T, 1, 1)
    t_world2canonical_left = t_world2canonical_left.repeat(T, 1)
    data_canonical_left = world2canonical_convert(R_world2canonical_left, t_world2canonical_left, data_world_left, "left")

    # transform right hand to canonical
    data_world_right = {
        "init_trans": torch.from_numpy(global_trans[1:2]),
        "init_root_orient": angle_axis_to_rotation_matrix(torch.from_numpy(global_rot[1:2])),
        "init_hand_pose": angle_axis_to_rotation_matrix(torch.from_numpy(hand_pose[1:2])),
        "init_betas": torch.from_numpy(betas[1:2]),
    }

    data_right_init_root_orient = rotation_matrix_to_angle_axis(data_world_right["init_root_orient"])
    data_right_init_hand_pose = rotation_matrix_to_angle_axis(data_world_right["init_hand_pose"])
    outputs = run_mano(data_world_right["init_trans"], data_right_init_root_orient, data_right_init_hand_pose, betas=data_world_right["init_betas"])
    init_trans = data_world_right["init_trans"][0, 0]  # (3,)
    root_loc = outputs["joints"][0, 0, 0, :].cpu()  # (3,)
    offset = init_trans - root_loc  # It is a constant, no matter what the rotation is.
    t_world2canonical_right = -torch.einsum("ij,j->i", R_world2canonical_right, root_loc) - offset

    R_world2canonical_right = R_world2canonical_right.repeat(T, 1, 1)
    t_world2canonical_right = t_world2canonical_right.repeat(T, 1)
    data_canonical_right = world2canonical_convert(R_world2canonical_right, t_world2canonical_right, data_world_right, "right")

    # merge left and right canonical data
    global_rot = torch.cat((data_canonical_left['init_root_orient'], data_canonical_right['init_root_orient']))
    global_trans = torch.cat((data_canonical_left['init_trans'], data_canonical_right['init_trans'])).numpy()

    # global_rot = angle_axis_to_quaternion(global_rot).numpy().reshape(N, T, 1, 4)
    global_rot = global_rot.reshape(N, T, 1, 3).numpy()

    hand_pose = hand_pose.reshape(N, T, 15, 3)
    # hand_pose = angle_axis_to_quaternion(torch.from_numpy(hand_pose)).numpy()

    # lerp and slerp
    global_trans_lerped = linear_interpolation_nd(global_trans, valid)
    betas_lerped = linear_interpolation_nd(betas, valid)
    global_rot_slerped = slerp_interpolation_aa(global_rot, valid)
    hand_pose_slerped = slerp_interpolation_aa(hand_pose, valid)

    # convert to rot6d
    global_rot_slerped_mat = angle_axis_to_rotation_matrix(torch.from_numpy(global_rot_slerped.reshape(N*T, -1)))
    # global_rot_slerped_mat = quaternion_to_rotation_matrix(torch.from_numpy(global_rot_slerped.reshape(N*T, -1)))
    global_rot_slerped_rot6d = rotmat_to_rot6d(global_rot_slerped_mat).reshape(N, T, -1).numpy()
    hand_pose_slerped_mat = angle_axis_to_rotation_matrix(torch.from_numpy(hand_pose_slerped.reshape(N*T*num_joints, -1)))
    # hand_pose_slerped_mat = quaternion_to_rotation_matrix(torch.from_numpy(hand_pose_slerped.reshape(N*T*num_joints, -1)))
    hand_pose_slerped_rot6d = rotmat_to_rot6d(hand_pose_slerped_mat).reshape(N, T, -1).numpy()

    # concat to (T, concat_dim)
    global_pose_vec_input = np.concatenate((global_trans_lerped, betas_lerped, global_rot_slerped_rot6d, hand_pose_slerped_rot6d), axis=-1).transpose(1, 0, 2).reshape(T, -1)

    R_canon2w_left = R_world2canonical_left.transpose(-1, -2)
    t_canon2w_left = -torch.einsum("tij,tj->ti", R_canon2w_left, t_world2canonical_left)
    R_canon2w_right = R_world2canonical_right.transpose(-1, -2)
    t_canon2w_right = -torch.einsum("tij,tj->ti", R_canon2w_right, t_world2canonical_right)

    transform_w_canon = {
        "R_w2canon_left": R_world2canonical_left,
        "t_w2canon_left": t_world2canonical_left,
        "R_canon2w_left": R_canon2w_left,
        "t_canon2w_left": t_canon2w_left,

        "R_w2canon_right": R_world2canonical_right,
        "t_w2canon_right": t_world2canonical_right,
        "R_canon2w_right": R_canon2w_right,
        "t_canon2w_right": t_canon2w_right,
    }

    return global_pose_vec_input, transform_w_canon

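Aside (illustrative sketch, not part of the file): a rough sketch of how filling_preprocess is driven. The item keys and shapes follow the comments above, the values here are placeholders, and in practice the result feeds the motion infiller:

import numpy as np

seq_len = 64
item = {
    "trans": np.zeros((2, seq_len, 3), dtype=np.float32),
    "rot": np.zeros((2, seq_len, 3), dtype=np.float32),
    "hand_pose": np.zeros((2, seq_len, 45), dtype=np.float32),
    "betas": np.zeros((2, seq_len, 10), dtype=np.float32),
    "valid": np.ones((2, seq_len), dtype=bool),
}
pose_vec, transform_w_canon = filling_preprocess(item)
print(pose_vec.shape)   # (seq_len, 2 * (3 + 10 + 6 + 90)) = (64, 218)
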
def custom_rot6d_to_rotmat(rot6d):
    original_shape = rot6d.shape[:-1]
    rot6d = rot6d.reshape(-1, 6)
    mat = rot6d_to_rotmat(rot6d)
    mat = mat.reshape(*original_shape, 3, 3)
    return mat

def filling_postprocess(output, transform_w_canon):
    # output = output.numpy()
    output = output.permute(1, 0, 2)  # (2, T, -1)
    N, T, _ = output.shape
    canon_trans = output[:, :, :3]
    betas = output[:, :, 3:13]
    canon_rot_rot6d = output[:, :, 13:19]
    hand_pose_rot6d = output[:, :, 19:109].reshape(N, T, 15, 6)

    canon_rot_mat = custom_rot6d_to_rotmat(canon_rot_rot6d)
    hand_pose_mat = custom_rot6d_to_rotmat(hand_pose_rot6d)

    data_canonical_left = {
        "init_trans": canon_trans[[0], :, :],
        "init_root_orient": canon_rot_mat[[0], :, :, :],
        "init_hand_pose": hand_pose_mat[[0], :, :, :, :],
        "init_betas": betas[[0], :, :]
    }

    data_canonical_right = {
        "init_trans": canon_trans[[1], :, :],
        "init_root_orient": canon_rot_mat[[1], :, :, :],
        "init_hand_pose": hand_pose_mat[[1], :, :, :, :],
        "init_betas": betas[[1], :, :]
    }

    R_canon2w_left = transform_w_canon['R_canon2w_left']
    t_canon2w_left = transform_w_canon['t_canon2w_left']
    R_canon2w_right = transform_w_canon['R_canon2w_right']
    t_canon2w_right = transform_w_canon['t_canon2w_right']

    world_left = world2canonical_convert(R_canon2w_left, t_canon2w_left, data_canonical_left, "left")
    world_right = world2canonical_convert(R_canon2w_right, t_canon2w_right, data_canonical_right, "right")

    global_rot = torch.cat((world_left['init_root_orient'], world_right['init_root_orient'])).numpy()
    global_trans = torch.cat((world_left['init_trans'], world_right['init_trans'])).numpy()

    pred_data = {
        "trans": global_trans,  # (2, T, 3)
        "rot": global_rot,  # (2, T, 3)
        "hand_pose": rotation_matrix_to_angle_axis(hand_pose_mat).flatten(-2).numpy(),  # (2, T, 45)
        "betas": betas.numpy(),  # (2, T, 10)
    }

    return pred_data
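Aside (illustrative sketch, not part of the file): filling_postprocess is the inverse direction. Given an infiller output of shape (T, 2, 109) and the transforms returned by filling_preprocess, it restores world-space trajectories:

import torch

T = 64
pred = torch.randn(T, 2, 109)                     # stand-in for the infiller's prediction
pred_world = filling_postprocess(pred, transform_w_canon)
print(pred_world["trans"].shape)                  # (2, 64, 3)
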
lib/eval_utils/video_utils.py
ADDED
@@ -0,0 +1,85 @@
import cv2
import os
import subprocess

def make_video_grid_2x2(out_path, vid_paths, overwrite=False):
    """
    Tile four videos at their original resolution into a 2x2 grid.

    :param out_path: Path of the output video.
    :param vid_paths: List of input video paths (must contain exactly 4).
    :param overwrite: If True, overwrite an existing output file.
    """
    if os.path.isfile(out_path) and not overwrite:
        print(f"{out_path} already exists, skipping.")
        return

    if any(not os.path.isfile(v) for v in vid_paths):
        print("Not all inputs exist!", vid_paths)
        return

    # make sure exactly 4 video paths were given
    if len(vid_paths) != 4:
        print("Error: Exactly 4 video paths are required!")
        return

    # unpack the video paths
    v1, v2, v3, v4 = vid_paths

    # ffmpeg stacking command; the videos are tiled as-is, without resizing
    cmd = (
        f"ffmpeg -i {v1} -i {v2} -i {v3} -i {v4} "
        f"-filter_complex '[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]' "
        f"-map '[v]' {out_path} -y"
    )

    print(cmd)
    subprocess.call(cmd, shell=True, stdin=subprocess.PIPE)

def create_video_from_images(image_list, output_path, fps=15, target_resolution=(540, 540)):
    """
    Assemble a list of images into an MP4 video.

    :param image_list: List of image paths.
    :param output_path: Path of the output video file (e.g. output.mp4).
    :param fps: Frame rate of the video (default 15 FPS).
    """
    # if not image_list:
    #     print("Image list is empty!")
    #     return

    # read the first image to get the width and height
    first_image = cv2.imread(image_list[0])
    if first_image is None:
        print(f"Cannot read image: {image_list[0]}")
        return

    height, width, _ = first_image.shape
    if height != width:
        if height < width:
            vis_w = target_resolution[0]
            vis_h = int(target_resolution[0] / width * height)
        elif height > width:
            vis_h = target_resolution[0]
            vis_w = int(target_resolution[0] / height * width)
    else:
        vis_h = target_resolution[0]
        vis_w = target_resolution[0]
    target_resolution = (vis_w, vis_h)

    # set up the video codec and writer
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # use the mp4v codec
    video_writer = cv2.VideoWriter(output_path, fourcc, fps, target_resolution)

    # iterate over the images and write them to the video
    for image_path in image_list:
        frame = cv2.imread(image_path)
        if frame is None:
            print(f"Cannot read image: {image_path}")
            continue
        frame_resized = cv2.resize(frame, target_resolution)
        video_writer.write(frame_resized)

    # release the video writer
    video_writer.release()
    print(f"Video saved to: {output_path}")
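Aside (illustrative sketch, not part of the file): typical usage of the two helpers above; all paths are placeholders:

frames = [f"renders/{i:04d}.jpg" for i in range(120)]
create_video_from_images(frames, "render.mp4", fps=15)

make_video_grid_2x2(
    "grid.mp4",
    ["input.mp4", "render.mp4", "world_view.mp4", "mesh_view.mp4"],
    overwrite=True,
)
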
lib/models/__pycache__/hawor.cpython-310.pyc
ADDED
Binary file (15.3 kB).
lib/models/__pycache__/mano_wrapper.cpython-310.pyc
ADDED
Binary file (2.43 kB).
lib/models/__pycache__/modules.cpython-310.pyc
ADDED
Binary file (4.71 kB).
lib/models/backbones/__init__.py
ADDED
@@ -0,0 +1,8 @@
from .vit import vit


def create_backbone(cfg):
    if cfg.MODEL.BACKBONE.TYPE == 'vit':
        return vit(cfg)
    else:
        raise NotImplementedError('Backbone type is not implemented')
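Aside (illustrative sketch, not part of the file): create_backbone only needs a config object exposing cfg.MODEL.BACKBONE.TYPE; a yacs-style CfgNode is assumed here for illustration, and the ViT constructor reads further MODEL.BACKBONE fields in practice:

from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.BACKBONE = CN()
cfg.MODEL.BACKBONE.TYPE = "vit"

backbone = create_backbone(cfg)
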
lib/models/backbones/__pycache__/__init__.cpython-310.pyc
ADDED
Binary file (445 Bytes).
lib/models/backbones/__pycache__/vit.cpython-310.pyc
ADDED
Binary file (11.3 kB).