youval committed on
Commit 3ac3c2c
1 parent: caf4064

update model card

Files changed (1)
  1. README.md +21 -15
README.md CHANGED
@@ -8,8 +8,7 @@ language:

  # Model Card for `passage-ranker-v1-L-multilingual`

- This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is
- used to order search results.

  Model name: `passage-ranker-v1-L-multilingual`

@@ -33,23 +32,32 @@ Note that the relevance score is computed as an average over 14 retrieval datase

  ## Inference Times

- | GPU | Batch size 32 |
- |:-----------|--------------:|
- | NVIDIA A10 | 83 ms |
- | NVIDIA T4 | 357 ms |

- The inference times only measure the time the model takes to process a single batch, it does not include pre- or
- post-processing steps like the tokenization.

- ## Requirements
-
- - Minimal Sinequa version: 11.10.0
- - GPU memory usage: 1130 MiB

  Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
  size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
  can be around 0.5 to 1 GiB depending on the used GPU.

  ## Model Details

  ### Overview
@@ -92,9 +100,7 @@ To determine the relevance score, we averaged the results that we obtained when
  | TREC-COVID | 0.711 |
  | Webis-Touche-2020 | 0.334 |

- We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its
- multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics
- for the existing languages.

  | Language | NDCG@10 |
  |:---------|--------:|
 

  # Model Card for `passage-ranker-v1-L-multilingual`

+ This model is a passage ranker developed by Sinequa. It produces a relevance score given a query-passage pair and is used to order search results.

  Model name: `passage-ranker-v1-L-multilingual`

 

  ## Inference Times

+ | GPU | Quantization type | Batch size 1 | Batch size 32 |
+ |:------------------------------------------|:------------------|---------------:|---------------:|
+ | NVIDIA A10 | FP16 | 2 ms | 31 ms |
+ | NVIDIA A10 | FP32 | 4 ms | 82 ms |
+ | NVIDIA T4 | FP16 | 3 ms | 65 ms |
+ | NVIDIA T4 | FP32 | 14 ms | 364 ms |
+ | NVIDIA L4 | FP16 | 2 ms | 38 ms |
+ | NVIDIA L4 | FP32 | 5 ms | 124 ms |

+ ## Gpu Memory usage

+ | Quantization type | Memory |
+ |:-------------------------------------------------|-----------:|
+ | FP16 | 550 MiB |
+ | FP32 | 1050 MiB |

  Note that GPU memory usage only includes how much GPU memory the actual model consumes on an NVIDIA T4 GPU with a batch
  size of 32. It does not include the fix amount of memory that is consumed by the ONNX Runtime upon initialization which
  can be around 0.5 to 1 GiB depending on the used GPU.

+ ## Requirements
+
+ - Minimal Sinequa version: 11.10.0
+ - Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
+ - [Cuda compute capability](https://developer.nvidia.com/cuda-gpus): above 5.0 (above 6.0 for FP16 use)
+
  ## Model Details

  ### Overview
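
The Requirements list added in this hunk combines a Sinequa version floor with a CUDA compute capability floor. The rule can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `minimal_sinequa_version` is hypothetical, "above 5.0" and "above 6.0" are read as including the boundary value, and the 11.11.0 requirement is read as applying to FP16 models or to GPUs with compute capability 8.9 and higher. The model card does not spell out either reading.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn a dotted string such as "8.9" into a comparable tuple (8, 9)."""
    return tuple(int(part) for part in version.split("."))


def minimal_sinequa_version(compute_capability: str, precision: str) -> str:
    """Return the minimal Sinequa version listed in the card for a GPU and precision.

    Hypothetical helper: the thresholds come from the Requirements list,
    the inclusive comparisons are an assumption.
    """
    precision = precision.upper()
    if precision not in ("FP16", "FP32"):
        raise ValueError(f"unknown precision: {precision}")

    cc = _parse(compute_capability)
    # Compute capability floor: 5.0 in general, 6.0 for FP16 use.
    floor = (6, 0) if precision == "FP16" else (5, 0)
    if cc < floor:
        raise ValueError(
            f"compute capability {compute_capability} is too low for {precision}"
        )

    # FP16 models and GPUs with compute capability 8.9+ need Sinequa 11.11.0.
    if precision == "FP16" or cc >= (8, 9):
        return "11.11.0"
    return "11.10.0"
```

For example, `minimal_sinequa_version("7.5", "FP32")` covers an NVIDIA T4 (compute capability 7.5) running the FP32 model, and `minimal_sinequa_version("8.9", "FP32")` covers an NVIDIA L4. Comparing tuples rather than floats keeps a future value such as "10.0" ordered correctly above "8.9".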
 
  | TREC-COVID | 0.711 |
  | Webis-Touche-2020 | 0.334 |

+ We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capacities. Note that not all training languages are part of the benchmark, so we only report the metrics for the existing languages.

  | Language | NDCG@10 |
  |:---------|--------:|
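
The tables in this model card report NDCG@10. For readers unfamiliar with the metric, one common formulation (linear gain with a log2 rank discount) can be sketched as follows. The model card does not say which variant or evaluation tool produced its numbers, so the function names `dcg_at_k` and `ndcg_at_k` and the exact formula are assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top item is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query.

    `ranked_relevances` holds the relevance labels of the passages in the
    order produced by the ranker. Assumption: the list contains every judged
    passage for the query, so sorting it gives the ideal ordering.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant passage for this query
    return dcg_at_k(ranked_relevances, k) / ideal
```

In use, the passages for a query would be sorted by the ranker's relevance score in descending order, their labels passed to `ndcg_at_k`, and the per-query values averaged over the dataset.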