Commit cceea9d by petermcaughan
1 Parent(s): a44ba12
Update README.md

README.md CHANGED
@@ -35,14 +35,14 @@ Below is average latency of generating a token using a prompt of varying size us
 
 | Prompt Length | Batch Size | PyTorch 2.1 torch.compile | ONNX Runtime CUDA |
 |-------------|------------|----------------|-------------------|
-|
-| 256 | 1 |
-| 1024 | 1 |
-| 2048 | 1 |
-|
-| 256 | 4 |
-| 1024 | 4 |
-| 2048 | 4 | N/A |
+| 32 | 1 | 53.64ms | 15.68ms |
+| 256 | 1 | 59.55ms | 26.05ms |
+| 1024 | 1 | 89.82ms | 99.05ms |
+| 2048 | 1 | 208.0ms | 227.0ms |
+| 32 | 4 | 70.8ms | 19.62ms |
+| 256 | 4 | 78.6ms | 81.29ms |
+| 1024 | 4 | 373.7ms | 369.6ms |
+| 2048 | 4 | N/A | 879.2ms |
 
 ## Usage Example
 
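
As a reading aid for the updated latency rows in this diff, the following minimal sketch computes the ratio of PyTorch 2.1 torch.compile latency to ONNX Runtime CUDA latency for each row. The numbers are copied from the added rows; the names `ROWS`, `speedup`, and `speedups` are hypothetical helpers introduced here for illustration and are not part of the README or of either library.

```python
# Latencies in milliseconds, copied from the added ("+") rows of the diff.
# Keys are (prompt_length, batch_size); values are
# (pytorch_compile_ms, onnxruntime_cuda_ms). None marks the N/A entry.
ROWS = {
    (32, 1): (53.64, 15.68),
    (256, 1): (59.55, 26.05),
    (1024, 1): (89.82, 99.05),
    (2048, 1): (208.0, 227.0),
    (32, 4): (70.8, 19.62),
    (256, 4): (78.6, 81.29),
    (1024, 4): (373.7, 369.6),
    (2048, 4): (None, 879.2),
}


def speedup(pytorch_ms, ort_ms):
    """Return PyTorch latency divided by ONNX Runtime latency.

    Returns None when either measurement is missing (the N/A row).
    """
    if pytorch_ms is None or ort_ms is None:
        return None
    return pytorch_ms / ort_ms


def speedups(rows):
    """Map each (prompt_length, batch_size) key to its latency ratio."""
    return {key: speedup(*values) for key, values in rows.items()}


if __name__ == "__main__":
    for (prompt_len, batch), ratio in speedups(ROWS).items():
        label = "N/A" if ratio is None else f"{ratio:.2f}x"
        print(f"prompt={prompt_len:>4} batch={batch}: {label}")
```

A ratio above 1 means ONNX Runtime CUDA had the lower per-token latency for that row; a ratio below 1 means torch.compile did.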