Commit History

- 94a34ed  Update Dockerfile (yusufs, verified)
- 89c17e2  Update Dockerfile (yusufs, verified)
- b983fc2  Update Dockerfile (yusufs, verified)
- 258633e  Update Dockerfile (yusufs, verified)
- f6ddd47  Update Dockerfile (yusufs, verified)
- bc37efd  Update Dockerfile (yusufs, verified)
- 04254aa  Update Dockerfile (yusufs, verified)
- 344825e  Update Dockerfile (yusufs, verified)
- 3044680  --extra-index-url https://download.pytorch.org/whl/cu113 (yusufs, verified)
- 053dce6  nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 (yusufs, verified)
- bf084e5  debug (yusufs, verified)
- 39a1959  python3 -m vllm.entrypoints.openai.api_server (yusufs, verified)
- 7b16e9f  fix(Dockerfile): use cmd single line (yusufs, verified)
- 4bd51f5  fix(Dockerfile): revision (yusufs, verified)
- a48cf7b  fix(entrypoint) Dockerile (yusufs, verified)
- 4dd2e29  (feat:vllm serve) Dockerfile (yusufs, verified)
- 46e845e  feat(llama32-3b-instruct): change llama32-3b-instruct (yusufs)
- 0f8187a  feat(sailor2-3b-chat): change readme (yusufs)
- 78963b9  fix(float16): Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half. (yusufs)
- 8132d1f  fix(using sail/Sailor2-3B-Chat): sail/Sailor2-3B-Chat (yusufs)
- 22ac900  docs(add-comment): add comment (yusufs)
- 148829b  feat(runner.sh): DeepSeek-R1-Distill-Qwen-32B d66bcfc2f3fd52799f95943264f32ba15ca0003d (yusufs)
- 1530e6e  feat(runner.sh): --trust-remote-code (yusufs)
- 57f9fa5  feat(runner.sh): add deepseek-ai/DeepSeek-R1 and deepseek-ai/DeepSeek-V3 (yusufs)
- e8cd3e0  feat(Dockerfile): install gcc (yusufs)
- c0cde8e  feat(runner.sh): only enable prefix caching and disable log request (yusufs)
- 8c5a84b  feat(runner.sh): --enable-chunked-prefill and --enable-prefix-caching for faster generate (yusufs)
- 5bd7bc7  fix(runner.sh): enable eager mode (disabling cuda graph) (yusufs)
- cb15911  fix(runner.sh): --enforce-eager not support values (yusufs)
- 266e7dd  fix(runner.sh): explicitly disabling enforce_eager (yusufs)
- 6bb48e9  fix(runner.sh): disable eager-loading so it using cuda graph (in order for parallel and faster processing) (yusufs)
- dc19c1d  feat(runner.sh): add specific task and code revision (yusufs)
- 490e6a3  feat(runner.sh): using MODEL_ID only (yusufs)
- 69c6372  feat(runner.sh): using runner.sh to select llm in the run time (yusufs)
- d4b0956  feat(seed): Random seed for reproducibility. (yusufs)
- cab183f  feat(/app/run-llama.sh): /app/run-llama.sh (yusufs)
- 6d92442  feat(/app/run-sailor.sh): /app/run-sailor.sh (yusufs)
- 92a4a4a  feat(llama3.2): using llama model first for cost saving, until we want test sailor (yusufs)
- 6dac0d0  docs(sailor): add not about minimum resources of sailor (yusufs)
- 0f3cd25  feat(sailorchat): using sailor chat model (yusufs)
- 0345d26  feat(quantization): T4 not support bfloat16 (yusufs)
- 38d356a  feat(llama3.2): run llama3.2 using bfloat16 with cache dtype fp8 with same model len (yusufs)
- 4a9e328  feat(sail/Sailor-4B-Chat): try increase gpu-memory-utilization to 0.9 before changing the token length (yusufs)
- 811d851  feat(sailor-8B): using sailor-8b (yusufs)
- 8b37c20  feat(llama3.2): using Llama-3.2-3B-Instruct 0cb88a4f764b7a12671c53f0838cd831a0843b95 (yusufs)
- b826155  feat(llama3.2): change model to llama3.2 (yusufs)
- 8e49b3b  feat(dep_sizes.txt): removes dep_sizes.txt during build, it not needed (yusufs)
- c360fd3  feat(download_model.py): remove download_model.py during build, it causing big image size (yusufs)
- 8dc2050  docs(Dockerfile): add comment about estimated image size after compile (yusufs)
- 8679a35  feat(add-model): always download model during build, it will be cached in the consecutive builds (yusufs)
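Many of these commits tweak individual flags on the same vLLM launch command (`python3 -m vllm.entrypoints.openai.api_server`). A minimal runner.sh sketch assembling the flags mentioned across the log; the model ID, seed, and numeric values are illustrative assumptions, not taken verbatim from the repository:

```shell
#!/usr/bin/env sh
# Sketch of a vLLM launch combining flags referenced in the commits above.
# MODEL_ID and all values are illustrative assumptions.
MODEL_ID="${MODEL_ID:-meta-llama/Llama-3.2-3B-Instruct}"

# --dtype=half: Tesla T4 (compute capability 7.5) lacks bfloat16 support
# --enable-prefix-caching / --enable-chunked-prefill: faster generation
# --trust-remote-code: required by models that ship custom modeling code
python3 -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_ID" \
  --dtype=half \
  --seed 42 \
  --gpu-memory-utilization 0.9 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --trust-remote-code \
  --disable-log-requests
```

Selecting the model via a `MODEL_ID` environment variable matches the log's move from per-model scripts (run-llama.sh, run-sailor.sh) to a single runner.sh that picks the LLM at run time.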