thanks to DAMO-NLP-SG ❤
- README.md +33 -0
- finetune-billa7b-zh.pth +3 -0
- finetune-vicuna13b-v2.pth +3 -0
- finetune-vicuna7b-v2.pth +3 -0
- finetune-ziya13b-zh.pth +3 -0
- finetune_vicuna7b_audiobranch.pth +3 -0
- pretrain-billa7b-zh.pth +3 -0
- pretrain-vicuna13b.pth +3 -0
- pretrain-ziya13b-zh.pth +3 -0
- pretrain_vicuna7b-v2.pth +3 -0
- pretrain_vicuna7b_audiobranch.pth +3 -0
README.md
ADDED
@@ -0,0 +1,33 @@
---
license: bsd-3-clause
language:
- en
- zh
pipeline_tag: visual-question-answering
---

# Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

This is the Hugging Face repo for storing the pre-trained and fine-tuned checkpoints of [Video-LLaMA](https://arxiv.org/abs/2306.02858), a multi-modal conversational large language model with video understanding capabilities.

## Vision-Language Branch
| Checkpoint | Link | Note |
|:------------|-------------|-------------|
| pretrain-vicuna7b | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain_vicuna7b-v2.pth) | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
| finetune-vicuna7b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-vicuna7b-v2.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything) |
| pretrain-vicuna13b | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-vicuna13b.pth) | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
| finetune-vicuna13b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-vicuna13b-v2.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything) |
| pretrain-ziya13b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-ziya13b-zh.pth) | Pre-trained with the Chinese LLM [Ziya-13B](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) |
| finetune-ziya13b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-ziya13b-zh.pth) | Fine-tuned on a machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese) |
| pretrain-billa7b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-billa7b-zh.pth) | Pre-trained with the Chinese LLM BiLLA-7B |
| finetune-billa7b-zh | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-billa7b-zh.pth) | Fine-tuned on a machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese) |

## Audio-Language Branch
| Checkpoint | Link | Note |
|:------------|-------------|-------------|
| pretrain-vicuna7b | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain_vicuna7b_audiobranch.pth) | Pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
| finetune-vicuna7b-v2 | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune_vicuna7b_audiobranch.pth) | Fine-tuned on the instruction-tuning data from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4), [LLaVA](https://github.com/haotian-liu/LLaVA) and [VideoChat](https://github.com/OpenGVLab/Ask-Anything) |

## Usage
To launch the pre-trained or fine-tuned Video-LLaMA on your own machine, please refer to our [GitHub repo](https://github.com/DAMO-NLP-SG/Video-LLaMA).
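As an illustrative sketch (not part of the official setup instructions, which live in the GitHub repo above), the checkpoint names in the tables can be mapped to their direct download URLs in this repo. The `CHECKPOINT_FILES` labels, including the `-audio` suffixes used to keep keys unique, are informal names chosen here; only the file names and base URL come from this repo.

```python
# Build direct download URLs for the checkpoints listed in the tables above.
BASE_URL = "https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/"

CHECKPOINT_FILES = {
    # Vision-Language branch
    "pretrain-vicuna7b": "pretrain_vicuna7b-v2.pth",
    "finetune-vicuna7b-v2": "finetune-vicuna7b-v2.pth",
    "pretrain-vicuna13b": "pretrain-vicuna13b.pth",
    "finetune-vicuna13b-v2": "finetune-vicuna13b-v2.pth",
    "pretrain-ziya13b-zh": "pretrain-ziya13b-zh.pth",
    "finetune-ziya13b-zh": "finetune-ziya13b-zh.pth",
    "pretrain-billa7b-zh": "pretrain-billa7b-zh.pth",
    "finetune-billa7b-zh": "finetune-billa7b-zh.pth",
    # Audio-Language branch ("-audio" suffix added here to keep keys unique)
    "pretrain-vicuna7b-audio": "pretrain_vicuna7b_audiobranch.pth",
    "finetune-vicuna7b-v2-audio": "finetune_vicuna7b_audiobranch.pth",
}

def checkpoint_url(name: str) -> str:
    """Return the direct download URL for one of the checkpoints above."""
    return BASE_URL + CHECKPOINT_FILES[name]
```

If you have `huggingface_hub` installed, the same files can be fetched into the local cache with `hf_hub_download(repo_id="DAMO-NLP-SG/Video-LLaMA-Series", filename="finetune-vicuna7b-v2.pth")`.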
finetune-billa7b-zh.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:91f1047d8e1d6970680db961ab9057fdf78919069cfc4c164e08023b66ff6e5d
size 265435817
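Each `.pth` entry in this commit is a Git LFS pointer file (a three-line `version`/`oid`/`size` record), not the weights themselves. A minimal stdlib-only sketch for parsing such a pointer and verifying a downloaded file against it, assuming the pointer text is available as a string:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the oid/size recorded in an LFS pointer."""
    algo, _, expected = pointer["oid"].partition(":")  # e.g. "sha256:91f1..."
    digest = hashlib.new(algo)
    total = 0
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
            total += len(block)
    return total == int(pointer["size"]) and digest.hexdigest() == expected

# The pointer for finetune-billa7b-zh.pth, as recorded in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:91f1047d8e1d6970680db961ab9057fdf78919069cfc4c164e08023b66ff6e5d\n"
    "size 265435817\n"
)
```

After downloading a checkpoint, `verify_download("finetune-billa7b-zh.pth", pointer)` confirms the file is complete and uncorrupted.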
finetune-vicuna13b-v2.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2ebf848c8affaaa00194ffd6d3e1f5148ebd64bff08050fc12523a28d0023285
size 274898177
finetune-vicuna7b-v2.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0680ad8eb14c2a3273b7be71309ab6b06c9f426e87ad4675a903371fe0fa8162
size 265436777
finetune-ziya13b-zh.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a773de8e84dec9d980d4f040b522d0dc9d600161bc8ebe13ebb149bf1dfa3fc2
size 274897409
finetune_vicuna7b_audiobranch.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72877c69ae31ea436507af14ac9f1f5275feed98955e2271f4e79294b994c404
size 274578593
pretrain-billa7b-zh.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f50a51db3055e1be6461f6dec833fbbbba28650287d26c8787664c8ee31dcf0f
size 265435689
pretrain-vicuna13b.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6bc8fafd174e08e076b0b46b02330376a4813bf61d230eaea46a8e919721931c
size 274897345
pretrain-ziya13b-zh.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2db583659e4b6d9bfb24f765f077c9ae3c0810618d2cf769b21bdde92e7c9d24
size 274897281
pretrain_vicuna7b-v2.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ab4d69838d4281eb62d0da8a26c15cbd4e46f9e6168fb89919199da9899de089
size 265435753
pretrain_vicuna7b_audiobranch.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85cf6cf68906042f107928ffa635ed539ed104ae1fecacd22bb488ce80131e5a
size 274577569