Update README.md
README.md
CHANGED
@@ -1,11 +1,41 @@
 ---
 license: mit
 pipeline_tag: video-classification
-tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
 ---
 
-
-
-
+## Introduction
+
+This repository contains the stage-2 6B model from the paper [InternVideo2](https://arxiv.org/pdf/2403.15377).
+
+Code: https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality
+
+## 🚀 Installation
+
+Please refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md
+
+## Usage
+
+```python
+import cv2
+from transformers import AutoModel
+from modeling_internvideo2 import (retrieve_text, vid2tensor, _frame_from_video,)
+
+
+if __name__ == '__main__':
+    model = AutoModel.from_pretrained("OpenGVLab/InternVideo2-Stage2_6B", trust_remote_code=True).eval()
+
+    video = cv2.VideoCapture('example1.mp4')
+    frames = [x for x in _frame_from_video(video)]
+    text_candidates = ["A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.",
+                       "A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.",
+                       "A person dressed in a blue jacket shovels the snow-covered pavement outside their house.",
+                       "A cat excitedly runs through the yard, chasing a rabbit.",
+                       "A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery."]
+
+    texts, probs = retrieve_text(frames, text_candidates, model=model, topk=5)
+    for t, p in zip(texts, probs):
+        print(f'text: {t} ~ prob: {p:.4f}')
+
+    vidtensor = vid2tensor('example1.mp4', fnum=4)
+    feat = model.get_vid_feat(vidtensor)
+```
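
The usage snippet ranks candidate captions against a video and prints a probability for each. As a rough illustration of what that kind of ranking typically involves, the sketch below scores toy embeddings by cosine similarity, applies a softmax, and keeps the top-k entries. It is not the `retrieve_text` implementation from `modeling_internvideo2`: the names `cosine` and `rank_texts`, the `temperature` value, and the 3-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_texts(video_feat, text_feats, texts, topk=5, temperature=100.0):
    """Hypothetical helper: return (texts, probs) sorted by descending probability.

    Probabilities are a softmax over all candidates, so the top-k subset
    does not necessarily sum to 1.
    """
    logits = [temperature * cosine(video_feat, t) for t in text_feats]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(texts)), key=lambda i: probs[i], reverse=True)[:topk]
    return [texts[i] for i in order], [probs[i] for i in order]


# Toy embeddings, made up for illustration; real features come from the model.
video_feat = [0.9, 0.1, 0.0]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = ["dog in snow", "man with sleigh", "cat and rabbit"]

top_texts, top_probs = rank_texts(video_feat, text_feats, texts, topk=2)
for t, p in zip(top_texts, top_probs):
    print(f"text: {t} ~ prob: {p:.4f}")
```

With real features, `video_feat` would presumably come from something like `model.get_vid_feat(...)` and the text vectors from the model's text encoder; the exact calls and scaling should be checked against the repository code linked in the README.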