
TRACE: Temporal Grounding Video LLM via Causal Event Modeling

If our project helps you, please give us a star ⭐ on GitHub and cite our paper!

📰 News

  • [2024.10.19] 🔥 We release trace-retrieval, which forces the predicted timestamps to align with the input frame timestamps. Results show that trace-retrieval achieves better performance on dense video captioning tasks.
  • [2024.10.10] 🔥 Our code and paper are released!
  • [2024.10.10] 🔥 Our checkpoints are now available!

Overview

In this work:

  • We model videos as a series of events and propose a causal event modeling framework to capture the inherent structure of videos.
  • We present TRACE, a novel task-interleaved video LLM tailored to implement the causal event modeling framework through the sequential encoding/decoding of timestamps, salient scores, and textual captions.
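To make the event-based view concrete, here is a minimal sketch of how a video could be represented as a series of events and flattened into one interleaved sequence of timestamps, salient scores, and captions. The `Event` class, the `interleave` helper, and the tagged-tuple layout are hypothetical names chosen for illustration; they are not taken from the TRACE codebase, and the real model operates on token sequences rather than Python tuples.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One video event (hypothetical structure, for illustration only)."""
    start: float   # event start time in seconds
    end: float     # event end time in seconds
    score: float   # salient score of the event
    caption: str   # textual description of the event


def interleave(events):
    """Flatten events into one sequence: time, score, caption per event.

    Events are ordered by start time so that each event is emitted after
    all earlier events, mirroring sequential (causal) decoding.
    """
    sequence = []
    for event in sorted(events, key=lambda e: e.start):
        sequence.append(("time", (event.start, event.end)))
        sequence.append(("score", event.score))
        sequence.append(("caption", event.caption))
    return sequence


events = [
    Event(12.0, 20.5, 0.6, "stir the onions"),
    Event(0.0, 11.5, 0.9, "chop the onions"),
]
for item in interleave(events):
    print(item)
```

The ordering within each event (timestamps, then score, then caption) follows the order in which the three components are listed above; whether the model decodes them in exactly this order is an assumption of the sketch.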

Model Zoo

| Checkpoints     | Description                                                     | URL                   |
|-----------------|-----------------------------------------------------------------|-----------------------|
| Initialization  | Weights initialized from VideoLLaMA2                            | trace-init            |
| Stage-1         | Model checkpoints trained after stage-1                         | trace-stage1          |
| Stage-2         | Model checkpoints trained after stage-2                         | trace                 |
| FT-Charades     | Fine-tuned on Charades-STA dataset                              | trace-ft-charades     |
| FT-Youcook2     | Fine-tuned on Youcook2 dataset                                  | trace-ft-youcook2     |
| FT-QVHighlights | Fine-tuned on QVHighlights dataset                              | trace-ft-qvhighlights |
| TRACE-retrieval | Forces the predicted timestamps to align with input timestamps  | trace-retrieval       |
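The idea behind TRACE-retrieval, aligning predicted timestamps with the timestamps of the input frames, can be pictured with a small sketch. The code below snaps each predicted time to the nearest sampled frame time as a post-processing step. This is only an illustration of the alignment idea under that assumption: the model card does not say how the constraint is enforced in the model, and `snap_to_frames` is a hypothetical helper, not part of the TRACE code.

```python
from bisect import bisect_left


def snap_to_frames(pred_times, frame_times):
    """Snap each predicted time (seconds) to the nearest frame timestamp.

    frame_times must be non-empty and sorted in ascending order.
    On a tie, the earlier frame is chosen.
    """
    snapped = []
    for t in pred_times:
        i = bisect_left(frame_times, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = frame_times[max(i - 1, 0): i + 1]
        snapped.append(min(candidates, key=lambda f: abs(f - t)))
    return snapped


# Frames sampled every 2 seconds (made-up values for the example).
frame_times = [0.0, 2.0, 4.0, 6.0, 8.0]
print(snap_to_frames([1.2, 4.9, 7.6], frame_times))  # [2.0, 4.0, 8.0]
```

With this kind of alignment, every output timestamp is one of the input frame timestamps, so predictions can never fall between sampled frames or outside the sampled range.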

Results

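| Youcook2 (Zero-Shot) | CIDER | METEOR | SODA_c | F1   |
|----------------------|-------|--------|--------|------|
| TRACE                | 8.1   | 2.8    | 2.2    | 22.4 |
| TRACE-retrieval      | 8.3   | 2.9    | 2.3    | 24.1 |

| Charades-STA (Zero-Shot) | 0.3  | 0.5  | 0.7  | mIOU |
|--------------------------|------|------|------|------|
| TRACE                    | 58.6 | 40.3 | 19.4 | 38.7 |
| TRACE-retrieval          | 57.9 | 37.4 | 17.3 | 37.4 |

| QVHighlights (Zero-Shot) | mAP  | Hit@1 |
|--------------------------|------|-------|
| TRACE                    | 26.8 | 42.7  |
| TRACE-retrieval          | 27.9 | 44.3  |

| ActivityNet-DVC | CIDER | METEOR | SODA_c | F1   |
|-----------------|-------|--------|--------|------|
| TRACE           | 25.9  | 6.0    | 6.4    | 39.3 |
| TRACE-retrieval | 25.7  | 5.9    | 6.5    | 40.1 |

| ActivityNet-MR  | 0.3  | 0.5  | 0.7  | mIOU |
|-----------------|------|------|------|------|
| TRACE           | 54.0 | 37.7 | 24.0 | 39.0 |
| TRACE-retrieval | 54.4 | 39.8 | 24.9 | 40.2 |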
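The 0.3 / 0.5 / 0.7 / mIOU columns in the Charades-STA and ActivityNet-MR results are not defined on this card. The sketch below assumes the common moment-retrieval convention: the percentage of queries whose predicted segment reaches a temporal IoU of at least each threshold, plus the mean IoU over all queries. The function names and the `>=` comparison are assumptions made for illustration, not the evaluation code used for the reported numbers.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments in seconds."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def summarize(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """Recall at each IoU threshold and mean IoU, both as percentages.

    preds and gts are equal-length lists with one (start, end) pair per query.
    """
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    report = {f"R@{t}": 100.0 * sum(iou >= t for iou in ious) / n
              for t in thresholds}
    report["mIoU"] = 100.0 * sum(ious) / n
    return report


# Two made-up queries: one exact match, one partial overlap (IoU = 1/3).
preds = [(0.0, 10.0), (5.0, 15.0)]
gts = [(0.0, 10.0), (10.0, 20.0)]
print(summarize(preds, gts))
```

In this toy example both predictions clear the 0.3 threshold, while only the exact match clears 0.5 and 0.7, which shows why the numbers in these columns decrease from left to right.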
Model size: 7.55B params, tensor type: BF16 (Safetensors)