Problem with Text Embeddings Inference
Hi, I'm trying to load the model in Text Embeddings Inference (https://huggingface.co/docs/text-embeddings-inference), but I got the following error output:
2024-02-21T17:38:04.364056Z INFO text_embeddings_router: router/src/main.rs:112: Args { model_id: "jin***/-****--e-es", revision: Some("main"), tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 4, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, hf_api_token: Some("hf_U***********************HHh"), hostname: "0.0.0.0", port: 3001, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), json_output: false, otlp_endpoint: None }
2024-02-21T17:38:04.364114Z INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-02-21T17:38:04.402470Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:9: Starting download
2024-02-21T17:38:04.644138Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:18: model.safetensors not found. Using pytorch_model.bin instead. Model loading will be significantly slower.
2024-02-21T17:38:04.644157Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Model artifacts downloaded in 241.687733ms
2024-02-21T17:38:04.704679Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:22: Starting 12 tokenization workers
2024-02-21T17:38:04.782218Z INFO text_embeddings_router: router/src/lib.rs:239: Starting model backend
2024-02-21T17:38:04.858720Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:112: Starting JinaBert model on Cuda
Error: Could not create backend
Caused by:
Could not start backend: cannot retrieve non-contiguous tensors Layout { shape: [2, 768], stride: [768, 1], start_offset: 46891008 }
When I try jinaai/jina-embeddings-v2-base-en on Text Embeddings Inference, it loads with no problems or errors, using the same docker init script:
#!/bin/sh
docker run --rm --network llm_network -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  --name emb_server --gpus device=0 --shm-size 1g -p 3001:3001 \
  -v "$(pwd)/datafolder:/data" ghcr.io/huggingface/text-embeddings-inference:89-0.6 \
  --model-id jinaai/jina-embeddings-v2-base-en --revision main --port 3001 \
  --hf-api-token xxxxxxxxxx \
  --max-concurrent-requests 4 --hostname 0.0.0.0
I only changed jinaai/jina-embeddings-v2-base-en to jinaai/jina-embeddings-v2-base-es.
I also tried jinaai/jina-embeddings-v2-base-de and it works like a charm, so it seems only the Spanish version has the reported issue.
Could not start backend: cannot retrieve non-contiguous tensors Layout { shape: [2, 768], stride: [768, 1], start_offset: 46891008 }
That sounds like an architecture/layers problem.
@prudant can you remove the cache (~/.cache/huggingface/hub and ~/.cache/huggingface/modules) and try again?
Also, can you please set your Hugging Face token as an environment variable (and make sure you have accepted the repo license, since it is a gated repo)?
2024-02-21T17:38:04.364114Z INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
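For anyone following along, here is a rough sketch of both suggestions adapted to the docker setup above. The startup log reports huggingface_hub_cache: Some("/data") and the script mounts $(pwd)/datafolder at /data, so the files the container reuses are probably in the host's datafolder rather than under ~/.cache. The function names clear_model_cache and run_tei are made up for illustration. The models--<org>--<name> folder layout and the HF_API_TOKEN variable name are assumptions to verify, for example by listing datafolder and by running the image with --help.

```shell
#!/bin/sh
# Sketch only. Assumptions: the host folder mounted at /data holds the hub
# cache, and each cached repo is stored as models--<org>--<name>.

# Remove one model's cached files so the next start downloads them again.
# $1: host cache folder, $2: model id in the form org/name
clear_model_cache() {
    cache_dir=$1
    repo_dir="models--$(printf '%s' "$2" | sed 's|/|--|g')"
    # ${cache_dir:?} aborts instead of deleting from / if $1 is empty.
    rm -rf "${cache_dir:?}/${repo_dir}"
}

# Start the server with the token taken from the environment instead of the
# command line. "-e HF_API_TOKEN" forwards the host variable unchanged.
# The variable name HF_API_TOKEN is an assumption; confirm it with --help.
# $1: model id in the form org/name
run_tei() {
    docker run --rm --network llm_network \
        -e HF_HUB_ENABLE_HF_TRANSFER=0 -e HF_API_TOKEN \
        --name emb_server --gpus device=0 --shm-size 1g -p 3001:3001 \
        -v "$(pwd)/datafolder:/data" \
        ghcr.io/huggingface/text-embeddings-inference:89-0.6 \
        --model-id "$1" --revision main --port 3001 \
        --max-concurrent-requests 4 --hostname 0.0.0.0
}

# Example:
#   clear_model_cache "$(pwd)/datafolder" jinaai/jina-embeddings-v2-base-es
#   export HF_API_TOKEN=xxxxxxxxxx
#   run_tei jinaai/jina-embeddings-v2-base-es
```

Only the folder for the given model is removed, so the cached -en and -de downloads that already work are left alone.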
Thanks, will try. Regards!