# ASTRA ATC Models
Fine-tuned ASR and LLM models for Singapore military air traffic control, built for the ASTRA training simulator. The two models work as a pipeline:
```
Audio --> ASR (Whisper) --> normalized text --> LLM (Qwen3) --> display text
          "camel climb flight level zero nine zero"    "CAMEL climb FL090"
```
## Models

### ASR/
Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with faster-whisper.
| Metric | Value |
|---|---|
| WER | 0.24% |
| Base model | jacktol/whisper-large-v3-finetuned-for-ATC |
| Size | 2.9 GB |
| Training data | 6,730 entries (6,680 synthetic + 50 real recordings) |
### LLM/
Converts normalized ASR output into structured ATC display text (uppercases callsigns, contracts flight levels, formats frequencies, etc.).
| Metric | Value |
|---|---|
| Exact match | 100% (161/161) |
| Base model | unsloth/Qwen3-1.7B |
| Size | 3.3 GB |
| Training data | 1,915 examples |
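To make the transformation concrete, here is a toy sketch of the kind of normalization involved (uppercasing callsigns, contracting spoken flight levels). The callsign set and rules are illustrative assumptions, not the trained model's behavior or the production rule set.

```python
# Toy normalizer: illustrative only, not the production rules or the LLM.
CALLSIGNS = {"camel", "ninja", "beetle", "taipan", "honda"}

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def format_transcript(text: str) -> str:
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        if w in CALLSIGNS:
            out.append(w.upper())  # uppercase known callsigns
            i += 1
        elif w == "flight" and i + 1 < len(words) and words[i + 1] == "level":
            # contract "flight level zero nine zero" -> "FL090"
            digits = []
            j = i + 2
            while j < len(words) and words[j] in DIGITS:
                digits.append(DIGITS[words[j]])
                j += 1
            out.append("FL" + "".join(digits))
            i = j
        else:
            out.append(w)
            i += 1
    return " ".join(out)

print(format_transcript("camel climb flight level zero nine zero"))
# -> "CAMEL climb FL090"
```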
## Pipeline Architecture
In production, the models are chained with confidence-based routing:
- ASR confidence >= 90% → rule-based formatter (23 deterministic rules, <1 ms, 0 VRAM)
- ASR confidence < 90% → LLM formatter (handles noisy/ambiguous ASR output better)
```
Audio --> VAD (Silero) --> ASR (Whisper ct2) --> Post-processing
                                                       |
                                            confidence >= 0.90?
                                               /              \
                                             yes               no
                                              |                 |
                                      Rule formatter      LLM formatter
                                               \               /
                                                --> Display text
```
| State | VRAM |
|---|---|
| ASR only (startup) | ~2 GB |
| ASR + LLM (after first low-confidence call) | ~5.5 GB |
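The confidence routing and lazy LLM loading described above can be sketched as follows. The class and callable names are placeholders for illustration, not the production API.

```python
# Sketch of confidence-based routing with lazy LLM loading.
# Names (FormatterRouter, rule_format, load_llm) are illustrative assumptions.
CONFIDENCE_THRESHOLD = 0.90

class FormatterRouter:
    """Route ASR text to the rule-based or LLM formatter by confidence."""

    def __init__(self, rule_format, load_llm):
        self.rule_format = rule_format  # deterministic formatter, always loaded
        self.load_llm = load_llm        # callable that loads the LLM on demand
        self.llm_format = None          # unloaded at startup (keeps VRAM ~2 GB)

    def format(self, text: str, confidence: float) -> str:
        if confidence >= CONFIDENCE_THRESHOLD:
            return self.rule_format(text)   # <1 ms, no extra VRAM
        if self.llm_format is None:
            # First low-confidence call loads the LLM (VRAM grows to ~5.5 GB)
            self.llm_format = self.load_llm()
        return self.llm_format(text)
```

This keeps the common high-confidence path cheap while deferring the LLM's memory cost until a noisy utterance actually needs it.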
## Domain
Singapore military ATC covering:
- Airbases: Tengah (WSAT, runway 18/36), Paya Lebar (WSAP, runway 02/20)
- Aircraft: F-16C/D, F-15SG, C-130
- Approaches: ILS, GCA, PAR, TACAN, DVOR/DME, Visual Straight-in
- 60 callsigns: CAMEL, NINJA, BEETLE, TAIPAN, HONDA, etc.
- Categories: departure, approach, handoff, maneuver, landing, emergency, ground, recovery, pilot reports, military-specific ops
## Training History

### ASR
| Run | WER | Key Change |
|---|---|---|
| ct2_run5 | 0.48% | Initial fine-tune, pitch shift augmentation |
| ct2_run6 | 0.40% | Removed pitch shift, added BPF/silence padding, weight decay |
| ct2_run7 | 0.24% | Continued training, frozen encoder, +50 real recordings |
### LLM
| Run | Accuracy | Key Change |
|---|---|---|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
## Quick Start
### ASR

```python
from faster_whisper import WhisperModel

model = WhisperModel("./ASR", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)
```
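The routing pipeline above needs a per-utterance confidence from the ASR. faster-whisper segments expose an `avg_logprob` field; `exp()` of the mean log-probability is one common way to map it to a 0-1 score. The exact production mapping is an assumption here, not documented behavior.

```python
import math

def asr_confidence(avg_logprobs: list[float]) -> float:
    """Map faster-whisper segment avg_logprob values to a 0-1 confidence.

    exp() of the mean log-probability is a common heuristic; the actual
    production formula may differ.
    """
    if not avg_logprobs:
        return 0.0
    return math.exp(sum(avg_logprobs) / len(avg_logprobs))

# Usage with faster-whisper segments:
# conf = asr_confidence([seg.avg_logprob for seg in segments])
```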
### LLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./LLM", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./LLM")

messages = [
    {"role": "system", "content": "Convert the following air traffic control transcript into structured display text."},
    {"role": "user", "content": "camel climb flight level zero nine zero"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.3, top_p=0.9, top_k=30)
result = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```
## Download

```shell
# Full repo
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models

# ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/*" --local-dir ./models

# LLM only
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```