# ASTRA ATC Models
Fine-tuned ASR and LLM models for Singapore military air traffic control, built for the ASTRA training simulator. The two models work as a pipeline:
```
Audio --> ASR (Whisper) --> normalized text --> LLM (Qwen3) --> display text
          "camel climb flight level zero nine zero"    "CAMEL climb FL090"
```
## Models

### ASR/
Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with faster-whisper.
| Metric | Value |
|---|---|
| WER | 0.24% |
| Base model | jacktol/whisper-large-v3-finetuned-for-ATC |
| Size | 2.9 GB |
| Training data | 6,730 entries (6,680 synthetic + 50 real recordings) |
### LLM/
Converts normalized ASR output into structured ATC display text (uppercases callsigns, contracts flight levels, formats frequencies, etc.).
| Metric | Value |
|---|---|
| Exact match | 100% (161/161) |
| Base model | unsloth/Qwen3-1.7B |
| Size | 3.3 GB |
| Training data | 1,915 examples |
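To make the transformation concrete, here is a toy sketch of the kind of normalization involved (uppercasing callsigns, contracting spoken flight levels). The callsign set and rules are illustrative assumptions, not the trained model's behavior or the production rule set.

```python
# Toy normalizer: illustrative only, not the production rules or the LLM.
CALLSIGNS = {"camel", "ninja", "beetle", "taipan", "honda"}

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def format_transcript(text: str) -> str:
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        if w in CALLSIGNS:
            out.append(w.upper())  # uppercase known callsigns
            i += 1
        elif w == "flight" and i + 1 < len(words) and words[i + 1] == "level":
            # contract "flight level zero nine zero" -> "FL090"
            digits = []
            j = i + 2
            while j < len(words) and words[j] in DIGITS:
                digits.append(DIGITS[words[j]])
                j += 1
            out.append("FL" + "".join(digits))
            i = j
        else:
            out.append(w)
            i += 1
    return " ".join(out)

print(format_transcript("camel climb flight level zero nine zero"))
# -> "CAMEL climb FL090"
```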
## Pipeline Architecture
In production, the models are chained with confidence-based routing:
- ASR confidence >= 90% → rule-based formatter (23 deterministic rules, <1 ms, 0 VRAM)
- ASR confidence < 90% → LLM formatter (handles noisy/ambiguous ASR output better)
```
Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing
                                                       |
                                            confidence >= 0.90?
                                               /              \
                                             yes               no
                                              |                 |
                                      Rule formatter      LLM formatter
                                               \               /
                                                --> Display text
```
| State | VRAM |
|---|---|
| ASR only (startup) | ~2 GB |
| ASR + LLM (after first low-confidence call) | ~5.5 GB |
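The confidence routing and lazy LLM loading described above can be sketched as follows. The class and callable names are placeholders for illustration, not the production API.

```python
# Sketch of confidence-based routing with lazy LLM loading.
# Names (FormatterRouter, rule_format, load_llm) are illustrative assumptions.
CONFIDENCE_THRESHOLD = 0.90

class FormatterRouter:
    """Route ASR text to the rule-based or LLM formatter by confidence."""

    def __init__(self, rule_format, load_llm):
        self.rule_format = rule_format  # deterministic formatter, always loaded
        self.load_llm = load_llm        # callable that loads the LLM on demand
        self.llm_format = None          # unloaded at startup (keeps VRAM ~2 GB)

    def format(self, text: str, confidence: float) -> str:
        if confidence >= CONFIDENCE_THRESHOLD:
            return self.rule_format(text)   # <1 ms, no extra VRAM
        if self.llm_format is None:
            # First low-confidence call loads the LLM (VRAM grows to ~5.5 GB)
            self.llm_format = self.load_llm()
        return self.llm_format(text)
```

This keeps the common high-confidence path cheap while deferring the LLM's memory cost until a noisy utterance actually needs it.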
## Domain
Singapore military ATC covering:
- Airbases: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- Aircraft: F-16C/D, F-15SG, C-130
- Approaches: ILS, GCA, PAR, TACAN, DVOR/DME, Visual Straight-in
- 60 callsigns: CAMEL, NINJA, BEETLE, TAIPAN, HONDA, etc.
- Categories: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops
## Training History

### ASR
| Run | WER | Key Change |
|---|---|---|
| ct2_run5 | 0.48% | Initial fine-tune, pitch shift augmentation |
| ct2_run6 | 0.40% | Removed pitch shift, added BPF/silence padding, weight decay |
| ct2_run7 | 0.24% | Continued training, frozen encoder, +50 real recordings |
### LLM
| Run | Accuracy | Key Change |
|---|---|---|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
## Quick Start
### ASR

```python
from faster_whisper import WhisperModel

model = WhisperModel("./ASR", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)
```
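The routing pipeline above needs a per-utterance confidence from the ASR. faster-whisper segments expose an `avg_logprob` field; `exp()` of the mean log-probability is one common way to map it to a 0-1 score. The exact production mapping is an assumption here, not documented behavior.

```python
import math

def asr_confidence(avg_logprobs: list[float]) -> float:
    """Map faster-whisper segment avg_logprob values to a 0-1 confidence.

    exp() of the mean log-probability is a common heuristic; the actual
    production formula may differ.
    """
    if not avg_logprobs:
        return 0.0
    return math.exp(sum(avg_logprobs) / len(avg_logprobs))

# Usage with faster-whisper segments:
# conf = asr_confidence([seg.avg_logprob for seg in segments])
```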
### LLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./LLM", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./LLM")

messages = [
    {"role": "system", "content": "Convert the following air traffic control transcript into structured display text."},
    {"role": "user", "content": "camel climb flight level zero nine zero"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.3, top_p=0.9, top_k=30)
result = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```
## Download

```shell
# Full repo
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models

# ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models

# LLM only
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```