Diangle committed on
Commit
66b35be
·
1 Parent(s): 7c9c3f9

Update README.md

Files changed (1)
  1. README.md +29 -26
README.md CHANGED
@@ -11,6 +11,7 @@ pipeline_tag: text-to-video
11
 
12
  # Model Card for CLIP4Clip/WebVid-150k
13
  ## Model Details
 
14
  A CLIP4Clip video-text retrieval model trained on a subset of the WebVid dataset.
15
  The model and training method are described in the paper ["CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"](https://arxiv.org/pdf/2104.08860.pdf) by Luo et al., and implemented in the accompanying [GitHub repository](https://github.com/ArrowLuo/CLIP4Clip).
16
 
@@ -27,6 +28,33 @@ visual-temporal concepts from videos, thereby improving video-based searches.
27
  Training on the WebVid dataset improved the model's capabilities beyond those described in the paper, thanks to the dataset's large scale and diversity.
28
 
29
 
30
  ## Model Intended Use
31
 
32
  This model is intended for use in large-scale video-text retrieval applications.
@@ -34,6 +62,7 @@ This model is intended for use in large scale video-text retrieval applications.
34
  To illustrate its functionality, see the accompanying [**Video Search Space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid), which provides a search demonstration over a collection of approximately 1.5 million videos.
35
  This interactive demo showcases the model's capability to effectively retrieve videos based on text queries, highlighting its potential for handling substantial video datasets.
36
 
 
37
  ## Evaluations
38
 
39
  To evaluate the model's performance, we used the last 10,000 video clips and their accompanying text from the WebVid dataset.
@@ -58,32 +87,6 @@ For an elaborate description of the evaluation refer to the notebook
58
  <p>[1] For overall search acceleration capabilities to boost your search application, please refer to searchium.ai</p>
59
  </div>
60
 
61
- ### How to use
62
- ### Extracting Text Embeddings:
63
-
64
- ```python
65
- import numpy as np
66
- import torch
67
- from transformers import CLIPTokenizer, CLIPTextModelWithProjection
68
-
69
-
70
- search_sentence = "a basketball player performing a slam dunk"
71
-
72
- model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
73
- tokenizer = CLIPTokenizer.from_pretrained("Diangle/clip4clip-webvid")
74
-
75
- inputs = tokenizer(text=search_sentence, return_tensors="pt")
76
- outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
77
-
78
- # Normalize embeddings for retrieval:
79
- final_output = outputs[0] / outputs[0].norm(dim=-1, keepdim=True)
80
- final_output = final_output.cpu().detach().numpy()
81
- print("final_output: ", final_output)
82
- ```
83
-
84
- ### Extracting Video Embeddings:
85
-
86
- Because extracting video embeddings is moderately complex, example usage with utility functions is provided in the additional notebook [GSI_VideoRetrieval_VideoEmbedding.ipynb](https://huggingface.co/Diangle/clip4clip-webvid/blob/main/Notebooks/GSI_VideoRetrieval_VideoEmbedding.ipynb).
87
 
88
  ## Acknowledgements
89
  We thank Diana Mazenko of [Searchium](https://www.searchium.ai) for adapting the model and uploading it to Hugging Face, and for creating a Hugging Face [**SPACE**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid) for a large-scale video-search demo.
 
11
 
12
  # Model Card for CLIP4Clip/WebVid-150k
13
  ## Model Details
14
+
15
  A CLIP4Clip video-text retrieval model trained on a subset of the WebVid dataset.
16
  The model and training method are described in the paper ["CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"](https://arxiv.org/pdf/2104.08860.pdf) by Luo et al., and implemented in the accompanying [GitHub repository](https://github.com/ArrowLuo/CLIP4Clip).
17
 
 
28
  Training on the WebVid dataset improved the model's capabilities beyond those described in the paper, thanks to the dataset's large scale and diversity.
29
 
30
 
31
+ ### How to use
32
+ ### Extracting Text Embeddings:
33
+
34
+ ```python
35
+ import numpy as np
36
+ import torch
37
+ from transformers import CLIPTokenizer, CLIPTextModelWithProjection
38
+
39
+
40
+ search_sentence = "a basketball player performing a slam dunk"
41
+
42
+ model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
43
+ tokenizer = CLIPTokenizer.from_pretrained("Diangle/clip4clip-webvid")
44
+
45
+ inputs = tokenizer(text=search_sentence, return_tensors="pt")
46
+ outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
47
+
48
+ # Normalize embeddings for retrieval:
49
+ final_output = outputs[0] / outputs[0].norm(dim=-1, keepdim=True)
50
+ final_output = final_output.cpu().detach().numpy()
51
+ print("final_output: ", final_output)
52
+ ```
53
+
54
+ ### Extracting Video Embeddings:
55
+
56
+ An additional [notebook](https://huggingface.co/Diangle/clip4clip-webvid/blob/main/Notebooks/GSI_VideoRetrieval_VideoEmbedding.ipynb) shows how to extract video embeddings.
57
+
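Once text and video embeddings are both L2-normalized, retrieval reduces to ranking videos by their dot product with the query embedding, which then equals cosine similarity. The sketch below illustrates that ranking step on toy vectors. It is not taken from the model repository: the helper names `l2_normalize` and `rank_videos` and the 4-dimensional example values are hypothetical, and a real application would likely use a NumPy or PyTorch matrix product over the actual embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_videos(text_embedding, video_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k videos most similar to the query."""
    text = l2_normalize(text_embedding)
    scored = []
    for idx, video in enumerate(video_embeddings):
        video = l2_normalize(video)
        score = sum(t * v for t, v in zip(text, video))
        scored.append((idx, score))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical values).
video_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
]
text_embedding = [0.9, 0.1, 0.0, 0.0]

results = rank_videos(text_embedding, video_embeddings, top_k=2)
for idx, score in results:
    print(f"video {idx}: similarity {score:.3f}")
```

Normalizing inside the helper keeps the ranking correct even if a caller passes unnormalized vectors; with embeddings that are already normalized, as in the text-embedding snippet, that step is redundant but harmless.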
58
  ## Model Intended Use
59
 
60
  This model is intended for use in large-scale video-text retrieval applications.
 
62
  To illustrate its functionality, see the accompanying [**Video Search Space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid), which provides a search demonstration over a collection of approximately 1.5 million videos.
63
  This interactive demo showcases the model's capability to effectively retrieve videos based on text queries, highlighting its potential for handling substantial video datasets.
64
 
65
+
66
  ## Evaluations
67
 
68
  To evaluate the model's performance, we used the last 10,000 video clips and their accompanying text from the WebVid dataset.
 
87
  <p>[1] For overall search acceleration capabilities to boost your search application, please refer to searchium.ai</p>
88
  </div>
89
 
90
 
91
  ## Acknowledgements
92
  We thank Diana Mazenko of [Searchium](https://www.searchium.ai) for adapting the model and uploading it to Hugging Face, and for creating a Hugging Face [**SPACE**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid) for a large-scale video-search demo.