CCCCCC nielsr (HF Staff) committed
Commit 9a43e1b · verified · Parent(s): e440bcc

Add pipeline tag, link to paper and code (#1)


- Add pipeline tag, link to paper and code (a3c7474a5a49721ab31666e07c5c55b4f366485e)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +89 -86
README.md CHANGED
@@ -1,87 +1,90 @@
- ---
- license: apache-2.0
- language:
- - en
- inference: false
- ---
- <h1>VPO: Aligning Text-to-Video Generation Models with Prompt Optimization</h1>
-
- - **Repository:** https://github.com/thu-coai/VPO
- <!-- - **Paper:** -->
- - **Data:** https://huggingface.co/datasets/CCCCCC/VPO
-
- # VPO
- VPO is a prompt optimization framework grounded in the principles of harmlessness, accuracy, and helpfulness.
- VPO employs a two-stage process: it first constructs a supervised fine-tuning dataset guided by safety and alignment, and then conducts preference learning with both text-level and video-level feedback. As a result, VPO preserves user intent while enhancing video quality and safety.
-
- ## Model Details
-
- ### Video Generation Model
- This model is trained to optimize user prompts for CogVideoX-5B; [VPO-2B](https://huggingface.co/CCCCCC/VPO-2B) is the corresponding model for CogVideoX-2B.
-
- ### Data
- Our dataset can be found [here](https://huggingface.co/datasets/CCCCCC/VPO).
-
- ### Language
- English
-
- ## Intended Use
-
- ### Prompt Template
- We adopt the following prompt template:
- ```
- In this task, your goal is to expand the user's short query into a detailed and well-structured English prompt for generating short videos.
-
- Please ensure that the generated video prompt adheres to the following principles:
-
- 1. **Harmless**: The prompt must be safe, respectful, and free from any harmful, offensive, or unethical content.
- 2. **Aligned**: The prompt should fully preserve the user's intent, incorporating all relevant details from the original query while ensuring clarity and coherence.
- 3. **Helpful for High-Quality Video Generation**: The prompt should be descriptive and vivid to facilitate high-quality video creation. Keep the scene feasible and well-suited for a brief duration, avoiding unnecessary complexity or unrealistic elements not mentioned in the query.
-
- User Query:{user prompt}
-
- Video Prompt:
- ```
-
- ### Inference Code
- Here is example code for inference:
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_path = ''  # path or Hugging Face Hub id of this model
-
- prompt_template = """In this task, your goal is to expand the user's short query into a detailed and well-structured English prompt for generating short videos.
-
- Please ensure that the generated video prompt adheres to the following principles:
-
- 1. **Harmless**: The prompt must be safe, respectful, and free from any harmful, offensive, or unethical content.
- 2. **Aligned**: The prompt should fully preserve the user's intent, incorporating all relevant details from the original query while ensuring clarity and coherence.
- 3. **Helpful for High-Quality Video Generation**: The prompt should be descriptive and vivid to facilitate high-quality video creation. Keep the scene feasible and well-suited for a brief duration, avoiding unnecessary complexity or unrealistic elements not mentioned in the query.
-
- User Query:{}
-
- Video Prompt:"""
-
- device = 'cuda:0'
- model = AutoModelForCausalLM.from_pretrained(model_path).half().eval().to(device)
- # for 8-bit loading (requires bitsandbytes)
- # model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, load_in_8bit=True)
- tokenizer = AutoTokenizer.from_pretrained(model_path)
-
- text = "a cute dog on the grass"
- message = [{'role': 'user', 'content': prompt_template.format(text)}]
-
- model_inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(device)
- output = model.generate(model_inputs, max_new_tokens=1024, do_sample=True, top_p=1.0, temperature=0.7, num_beams=1)
- resp = tokenizer.decode(output[0]).split('<|start_header_id|>assistant<|end_header_id|>')[1].split('<|eot_id|>')[0].strip()
-
- print(resp)
- ```
- See our [GitHub repo](https://github.com/thu-coai/VPO) for more detailed usage (e.g., inference with vLLM).
-
-
- <!-- ## Citation
- If you find our model useful in your work, please cite it with:
- ```
-
  ``` -->
 
+ ---
+ language:
+ - en
+ license: apache-2.0
+ inference: false
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ <h1>VPO: Aligning Text-to-Video Generation Models with Prompt Optimization</h1>
+
+ - **Repository:** https://github.com/thu-coai/VPO
+ - **Paper:** [VPO: Aligning Text-to-Video Generation Models with Prompt Optimization](https://huggingface.co/papers/2503.20491)
+ - **Data:** https://huggingface.co/datasets/CCCCCC/VPO
+
+ # VPO
+ VPO is a prompt optimization framework grounded in the principles of harmlessness, accuracy, and helpfulness.
+ VPO employs a two-stage process: it first constructs a supervised fine-tuning dataset guided by safety and alignment, and then conducts preference learning with both text-level and video-level feedback. As a result, VPO preserves user intent while enhancing video quality and safety.
+
+ ## Model Details
+
+ ### Video Generation Model
+ This model is trained to optimize user prompts for CogVideoX-5B; [VPO-2B](https://huggingface.co/CCCCCC/VPO-2B) is the corresponding model for CogVideoX-2B.
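+ For illustration, an optimized prompt from this model can be passed to CogVideoX-5B via the `diffusers` library. This is a minimal sketch, not taken from the VPO repository; the `THUDM/CogVideoX-5b` checkpoint id and the generation settings are assumptions you may need to adjust.
+ ```python
+ import torch
+ from diffusers import CogVideoXPipeline
+ from diffusers.utils import export_to_video
+
+ # Assumed public CogVideoX-5B checkpoint on the Hub.
+ pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
+
+ # Replace with the optimized prompt produced by the inference code further below.
+ optimized_prompt = "A cute dog ..."
+ video = pipe(prompt=optimized_prompt, num_inference_steps=50, guidance_scale=6.0, num_frames=49).frames[0]
+ export_to_video(video, "output.mp4", fps=8)
+ ```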
+
+ ### Data
+ Our dataset can be found [here](https://huggingface.co/datasets/CCCCCC/VPO).
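+ For illustration, the dataset can typically be loaded with the `datasets` library; this is a minimal sketch, and the actual configuration, split, and column names should be checked on the dataset page.
+ ```python
+ from datasets import load_dataset
+
+ # Download the VPO data from the Hub and inspect its splits and columns.
+ ds = load_dataset("CCCCCC/VPO")
+ print(ds)
+ ```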
+
+ ### Language
+ English
+
+ ## Intended Use
+
+ ### Prompt Template
+ We adopt the following prompt template:
+ ```
+ In this task, your goal is to expand the user's short query into a detailed and well-structured English prompt for generating short videos.
+
+ Please ensure that the generated video prompt adheres to the following principles:
+
+ 1. **Harmless**: The prompt must be safe, respectful, and free from any harmful, offensive, or unethical content.
+ 2. **Aligned**: The prompt should fully preserve the user's intent, incorporating all relevant details from the original query while ensuring clarity and coherence.
+ 3. **Helpful for High-Quality Video Generation**: The prompt should be descriptive and vivid to facilitate high-quality video creation. Keep the scene feasible and well-suited for a brief duration, avoiding unnecessary complexity or unrealistic elements not mentioned in the query.
+
+ User Query:{user prompt}
+
+ Video Prompt:
+ ```
+
+ ### Inference Code
+ Here is example code for inference:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = ''  # path or Hugging Face Hub id of this model
+
+ prompt_template = """In this task, your goal is to expand the user's short query into a detailed and well-structured English prompt for generating short videos.
+
+ Please ensure that the generated video prompt adheres to the following principles:
+
+ 1. **Harmless**: The prompt must be safe, respectful, and free from any harmful, offensive, or unethical content.
+ 2. **Aligned**: The prompt should fully preserve the user's intent, incorporating all relevant details from the original query while ensuring clarity and coherence.
+ 3. **Helpful for High-Quality Video Generation**: The prompt should be descriptive and vivid to facilitate high-quality video creation. Keep the scene feasible and well-suited for a brief duration, avoiding unnecessary complexity or unrealistic elements not mentioned in the query.
+
+ User Query:{}
+
+ Video Prompt:"""
+
+ device = 'cuda:0'
+ model = AutoModelForCausalLM.from_pretrained(model_path).half().eval().to(device)
+ # for 8-bit loading (requires bitsandbytes)
+ # model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, load_in_8bit=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ text = "a cute dog on the grass"
+ message = [{'role': 'user', 'content': prompt_template.format(text)}]
+
+ model_inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(device)
+ output = model.generate(model_inputs, max_new_tokens=1024, do_sample=True, top_p=1.0, temperature=0.7, num_beams=1)
+ resp = tokenizer.decode(output[0]).split('<|start_header_id|>assistant<|end_header_id|>')[1].split('<|eot_id|>')[0].strip()
+
+ print(resp)
+ ```
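+ If GPU memory is limited, the commented-out 8-bit option above can also be written with the `BitsAndBytesConfig` API; this is a minimal sketch assuming `bitsandbytes` is installed.
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_path = ''  # same model path / Hub id as above
+
+ # Quantize linear layers to 8-bit at load time to reduce GPU memory use.
+ quant_config = BitsAndBytesConfig(load_in_8bit=True)
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", quantization_config=quant_config)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ ```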
+ See our [GitHub repo](https://github.com/thu-coai/VPO) for more detailed usage (e.g., inference with vLLM).
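+ For reference, a minimal vLLM sketch (an illustration only, reusing `prompt_template` from the example above; see the repository for the actual scripts and recommended settings):
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_path = ''  # same model path / Hub id as above
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ # Render the chat-formatted prompt as plain text for vLLM.
+ message = [{'role': 'user', 'content': prompt_template.format("a cute dog on the grass")}]
+ prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
+
+ llm = LLM(model=model_path)
+ sampling = SamplingParams(temperature=0.7, top_p=1.0, max_tokens=1024)
+ outputs = llm.generate([prompt], sampling)
+ print(outputs[0].outputs[0].text.strip())
+ ```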
+
+
+ <!-- ## Citation
+ If you find our model useful in your work, please cite it with:
+ ```
+
  ``` -->