# Faster Whisper Server
`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
Features:
- GPU and CPU support.
- Easily deployable using Docker.
- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
- OpenAI API compatible.
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed, so you don't need to wait for the whole file to be transcribed before receiving results).
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically; it is then unloaded after a period of inactivity. See the sketch below.
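To make dynamic model loading concrete, here's a minimal Python sketch (assuming a server from the Quick Start below and the OpenAI SDK; `Systran/faster-whisper-small` is just an example second model name) that sends the same file to two different models. Each model is loaded on first use:

```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

with open("audio.wav", "rb") as audio_file:
    for model in ("Systran/faster-distil-whisper-large-v3", "Systran/faster-whisper-small"):
        audio_file.seek(0)  # rewind so each request uploads the whole file
        transcript = client.audio.transcriptions.create(model=model, file=audio_file)
        print(f"{model}: {transcript.text}")
```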
Please create an issue if you find a bug, have a question, or have a feature suggestion.
## OpenAI API Compatibility ++
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
    - Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you process a large audio file and would rather receive the transcription in chunks as it's processed, instead of waiting for the whole file to be transcribed. It works similarly to streamed chat responses from LLMs. See the Python sketch after this list.
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
    - LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
    - Only single-channel, 16000 Hz, raw, 16-bit little-endian PCM audio is supported.
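To illustrate the streaming behaviour, here's a minimal Python sketch that posts a file with `stream=true` (as in the cURL examples below) and prints SSE events as they arrive. It uses the third-party `requests` library; treat it as a starting point rather than a reference client:

```python
import requests  # pip install requests

# Post the file with stream=true so the server responds with SSE events.
with open("audio.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": audio_file},
        data={"stream": "true"},
        stream=True,  # read the response incrementally instead of buffering it
    )

# SSE events arrive as lines of the form "data: ...".
for line in response.iter_lines(decode_unicode=True):
    if line.startswith("data: "):
        print(line[len("data: "):])  # payload shape depends on the response format
```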
## Quick Start
[Hugging Face Space](https://huggingface.co/spaces/Iatalking/fast-whisper-server)
![image](https://github.com/fedirz/faster-whisper-server/assets/76551385/6d215c52-ded5-41d2-89a5-03a6fd113aa0)
Using Docker
```bash
docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cuda
# or
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu
```
Using Docker Compose
```bash
curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
docker compose up --detach faster-whisper-server-cuda
# or
docker compose up --detach faster-whisper-server-cpu
```
Using Kubernetes: [tutorial](https://substratus.ai/blog/deploying-faster-whisper-on-k8s)
## Usage
If you are looking for a step-by-step walkthrough, check out [this](https://www.youtube.com/watch?app=desktop&v=vSN-oAl6LVs) YouTube video.
### OpenAI API CLI
```bash
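# Any non-empty API key works; the server doesn't validate it, but the SDK requires one.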
export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/
```
```bash
openai api audio.transcriptions.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format text
openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format verbose_json
```
### OpenAI API Python SDK
```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Use a context manager so the file handle is closed after the request.
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-large-v3", file=audio_file
    )
print(transcript.text)
```
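Translations work the same way through the SDK; a minimal sketch, assuming the same server and model as above:

```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Translate the audio into English text.
with open("audio.wav", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="Systran/faster-distil-whisper-large-v3", file=audio_file
    )
print(translation.text)
```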
### cURL
```bash
# If `model` isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]"
curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]"
curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]" -F "stream=true"
curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]" -F "model=Systran/faster-distil-whisper-large-v3"
# It's recommended to always specify the language, as doing so reduces transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]" -F "language=en"
curl http://localhost:8000/v1/audio/translations -F "[email protected]"
```
### Live Transcription (using WebSocket)
From the [live-audio](./examples/live-audio) example:
https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
Installing [websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) is required.
Live transcription of audio captured from a microphone:
```bash
ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
```
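If you'd rather drive the WebSocket endpoint from Python, below is a minimal sketch using the third-party `websockets` library. It assumes `audio.raw` contains single-channel, 16000 Hz, 16-bit little-endian PCM (e.g. produced with `ffmpeg -i audio.wav -ac 1 -ar 16000 -f s16le audio.raw`); the exact message format the server sends back isn't specified here, so the sketch just prints whatever it receives:

```python
import asyncio

import websockets  # pip install websockets

CHUNK_BYTES = 32000  # ~1 second of 16 kHz mono s16le audio (16000 samples * 2 bytes)

async def main() -> None:
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:

        async def send_audio() -> None:
            # Stream raw PCM in ~1 second chunks, pacing roughly like real time.
            with open("audio.raw", "rb") as f:
                while chunk := f.read(CHUNK_BYTES):
                    await ws.send(chunk)
                    await asyncio.sleep(1)

        async def receive_transcripts() -> None:
            # Print transcription messages as the server emits them.
            async for message in ws:
                print(message)

        # Runs until the server closes the connection (Ctrl-C to stop early).
        await asyncio.gather(send_audio(), receive_transcripts())

asyncio.run(main())
```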