	---
	license: cc-by-nc-sa-4.0
	language:
	- en
	---
	# The LVDR Benchmark (Long Video Description Ranking)

	This benchmark was proposed in [VideoCLIP-XL](https://arxiv.org/abs/2410.00741).
	For each video and its ground-truth description, we run a synthesis process for p − 1 iterations, altering q words in each iteration to introduce hallucinations. This yields p descriptions in total, with gradually increasing degrees of hallucination. We denote such a subset as p × q and construct five subsets: {4 × 1, 4 × 2, 4 × 3, 4 × 4, 4 × 5}. A video CLIP model should rank these descriptions correctly, in descending order of their similarity to the video.
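The ranking requirement above can be checked with a short Python sketch. It assumes the similarity scores are listed in the same order as the descriptions, from the ground-truth description to the most hallucinated one. The function names and the example scores are hypothetical, and the pairwise agreement shown here is only an illustrative measure, not necessarily the metric used in the paper.

```python
from itertools import combinations
from typing import Sequence


def is_correctly_ranked(similarities: Sequence[float]) -> bool:
    """Return True if scores strictly decrease as hallucination increases."""
    return all(a > b for a, b in zip(similarities, similarities[1:]))


def pairwise_agreement(similarities: Sequence[float]) -> float:
    """Fraction of pairs (i < j) where the less hallucinated description scores higher."""
    pairs = list(combinations(range(len(similarities)), 2))
    if not pairs:
        return 1.0  # zero or one description: nothing to misorder
    correct = sum(similarities[i] > similarities[j] for i, j in pairs)
    return correct / len(pairs)


# Hypothetical video-text similarities for one video with p = 4 descriptions.
scores_good = [0.82, 0.74, 0.61, 0.40]
scores_bad = [0.82, 0.61, 0.74, 0.40]  # second and third descriptions swapped

print(is_correctly_ranked(scores_good))
print(is_correctly_ranked(scores_bad))
print(pairwise_agreement(scores_bad))
```

The strict check is all-or-nothing per video, while the pairwise fraction gives partial credit when only some descriptions are misordered.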

	# Format
	```json
	{
	    "long_captions": [
	        "...",
	    ],
	    "video_id": "..."
	},
	{
	    .....
	},
	.....
	```
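As a minimal sketch of reading records in this shape, the Python snippet below parses and validates them. It assumes the annotation data is a single top-level JSON array of objects; the schematic above does not confirm this, so adjust the parsing if the files are laid out differently. The function name `parse_records` and the sample record are hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_records(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of records and check the two documented fields."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array (assumed layout)")
    for idx, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {idx} is not a JSON object")
        captions = rec.get("long_captions")
        if not isinstance(captions, list) or not all(
            isinstance(c, str) for c in captions
        ):
            raise ValueError(f"record {idx}: 'long_captions' must be a list of strings")
        if not isinstance(rec.get("video_id"), str):
            raise ValueError(f"record {idx}: 'video_id' must be a string")
    return records


# Hypothetical sample with placeholder captions, for illustration only.
sample = json.dumps(
    [
        {
            "long_captions": ["caption a", "caption b", "caption c", "caption d"],
            "video_id": "demo_0001",
        }
    ]
)

records = parse_records(sample)
print(len(records), records[0]["video_id"], len(records[0]["long_captions"]))
```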


	# Source
	~~~
	@misc{wang2024videoclipxladvancinglongdescription,
	title={VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models},
	author={Jiapeng Wang and Chengyu Wang and Kunzhe Huang and Jun Huang and Lianwen Jin},
	year={2024},
	eprint={2410.00741},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2410.00741},
	}
	~~~