---
library_name: transformers
license: llama3
language:
  - th
  - en
pipeline_tag: text-generation
---

# Typhoon-Audio Preview


**llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai audio-language model. It natively supports both text and audio input, while its output is text. This version (August 2024) is our first audio-language model, released as part of our multimodal effort, and it is a research preview. The base language model is our **llama-3-typhoon-v1.5-8b-instruct**.

More details can be found in our release blog and technical report. *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.

## Model Description

  • Model type: The LLM is based on Typhoon-1.5-8b-instruct, and the audio encoder is based on Whisper's encoder and BEATs.
  • Requirement: transformers 4.38.0 or newer (a quick version check is sketched below this list).
  • Primary Language(s): Thai 🇹🇭 and English 🇬🇧
  • Demo: https://audio.opentyphoon.ai/
  • License: Llama 3 Community License
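Because loading goes through `trust_remote_code`, an outdated `transformers` can fail in non-obvious ways. The following is a minimal sketch for checking the version up front; it assumes only that `transformers` (and its `packaging` dependency) is installed.

```python
# Minimal sketch: fail fast if the installed transformers is too old.
# Assumes `transformers` and its `packaging` dependency are installed.
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.38.0"), (
    f"Found transformers {transformers.__version__}; "
    "this model requires 4.38.0 or newer."
)
```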

## Usage Example

```python
import torch
from transformers import AutoModel

# Initialize from the trained model (custom modeling code ships with the checkpoint)
model = AutoModel.from_pretrained(
    "scb10x/llama-3-typhoon-v1.5-8b-audio-preview",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
model.to("cuda")
model.eval()

# The prompt pattern follows the Llama-3 chat template: the audio is inserted
# at <SpeechHere>, and {} is filled with the text prompt.
prompt_pattern = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<Speech><SpeechHere></Speech> {}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

# Run generation
response = model.generate(
    wav_path="path_to_your_audio.wav",
    prompt="transcribe this audio",
    prompt_pattern=prompt_pattern,
    do_sample=False,
    max_length=1200,
    repetition_penalty=1.1,
    num_beams=1,
    # temperature=0.4,
    # top_p=0.9,
    # streamer=streamer  # supports TextIteratorStreamer
)
print(response)
```
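The commented-out `streamer` argument indicates that token streaming via `TextIteratorStreamer` is supported. Below is a minimal streaming sketch; it assumes the checkpoint ships a compatible tokenizer and that the custom `generate` forwards `streamer` as the comment suggests, and it reuses `model` and `prompt_pattern` from above.

```python
# Streaming sketch (assumptions: the repo provides a tokenizer, and
# generate() accepts the `streamer` keyword as the comment above indicates).
from threading import Thread
from transformers import AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained(
    "scb10x/llama-3-typhoon-v1.5-8b-audio-preview", trust_remote_code=True
)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread so the main thread can consume tokens.
thread = Thread(target=model.generate, kwargs=dict(
    wav_path="path_to_your_audio.wav",
    prompt="transcribe this audio",
    prompt_pattern=prompt_pattern,
    do_sample=False,
    max_length=1200,
    streamer=streamer,
))
thread.start()
for new_text in streamer:  # text chunks arrive as they are generated
    print(new_text, end="", flush=True)
thread.join()
```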

## Evaluation Results

Evaluation results are reported in our technical report.

## Acknowledgements

In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:

  • Training recipe: SALMONN from ByteDance
  • Audio encoder: BEATs from Microsoft
  • Whisper encoder: Fine-tuned Whisper from Biomedical and Data Lab @ Mahidol University