---
library_name: transformers
license: llama3
language:
- th
- en
pipeline_tag: text-generation
---

# Typhoon-Audio Preview

<div align="center">
<img src="https://i.postimg.cc/DycZ98w2/typhoon-audio.png" alt="typhoon-audio" style="width: 100%; max-width: 20cm; margin-left: auto; margin-right: auto; display: block"/>
</div>

**llama-3-typhoon-v1.5-8b-audio-preview** is a 🇹🇭 Thai *audio-language* model. It natively supports both text and audio input modalities, while the output is text. This version (August 2024) is our first audio-language model as part of our multimodal effort, and it is a research *preview*. The base language model is our [llama-3-typhoon-v1.5-8b-instruct](https://huggingface.co/scb10x/llama-3-typhoon-v1.5-8b-instruct).

More details can be found in our [release blog]() and [technical report](). *To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3" in the model name.

## Model Description

- **Model type**: The LLM is based on Typhoon-1.5-8b-instruct, and the audio encoder is based on Whisper's encoder and BEATs.
- **Requirement**: transformers 4.38.0 or newer (a quick version check is shown below).
- **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
- **Demo**: https://audio.opentyphoon.ai/
- **License**: [Llama 3 Community License](https://llama.meta.com/llama3/license/)
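
Before loading the model, you can confirm that the installed transformers version meets this requirement. This is a minimal sketch; the `packaging` helper used here is a standard transformers dependency, not part of this model's API:

```python
import transformers
from packaging import version

# The remote code for this model requires transformers >= 4.38.0
required = "4.38.0"
installed = transformers.__version__
assert version.parse(installed) >= version.parse(required), (
    f"transformers {installed} is too old; please upgrade to {required} or newer"
)
```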

## Usage Example

```python
import torch
from transformers import AutoModel

# Initialize from the trained model
model = AutoModel.from_pretrained(
    "scb10x/llama-3-typhoon-v1.5-8b-audio-preview",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
model.to("cuda")
model.eval()

# Run generation
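# The prompt pattern below is the Llama-3 chat template: the audio is inserted
# at <SpeechHere> (between the <Speech> tags) and the text instruction replaces
# the {} placeholder.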
prompt_pattern = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<Speech><SpeechHere></Speech> {}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
response = model.generate(
    wav_path="path_to_your_audio.wav",
    prompt="transcribe this audio",
    prompt_pattern=prompt_pattern,
    do_sample=False,
    max_length=1200,
    repetition_penalty=1.1,
    num_beams=1,
    # temperature=0.4,
    # top_p=0.9,
    # streamer=streamer  # supports TextIteratorStreamer
)
print(response)
```
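
As noted in the commented `streamer` argument above, generation supports transformers' `TextIteratorStreamer` for token-by-token output. Below is a minimal streaming sketch, assuming the tokenizer can be loaded from the same repository and reusing `model` and `prompt_pattern` from the example above:

```python
from threading import Thread

from transformers import AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained(
    "scb10x/llama-3-typhoon-v1.5-8b-audio-preview",
    trust_remote_code=True
)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread and consume tokens as they are produced
thread = Thread(
    target=model.generate,
    kwargs=dict(
        wav_path="path_to_your_audio.wav",
        prompt="transcribe this audio",
        prompt_pattern=prompt_pattern,
        max_length=1200,
        streamer=streamer,
    ),
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```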

## Evaluation Results

## Acknowledgements

In addition to common libraries and tools, we would like to thank the following projects for releasing model weights and code:

- Training recipe: [SALMONN](https://github.com/bytedance/SALMONN) from ByteDance
- Audio encoder: [BEATs](https://github.com/microsoft/unilm/tree/master/beats) from Microsoft
- Whisper encoder: [Fine-tuned Whisper](https://huggingface.co/biodatlab/whisper-th-large-v3-combined) from the Biomedical and Data Lab @ Mahidol University