
# AudioLLM Model

This repository contains the trained weights for AudioLLM, a model that combines LLaMA and Whisper for audio-enhanced language understanding and generation.

## Model Details

  • Base LLaMA model: meta-llama/Llama-3.2-3B-Instruct
  • Base Whisper model: openai/whisper-large-v3-turbo
  • LoRA rank: 32
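The LoRA rank above means the adapter's weight updates are constrained to rank 32. As a minimal NumPy sketch (the matrix shapes here are illustrative, not the model's actual dimensions), a LoRA update augments a frozen weight `W` with a low-rank product:

```python
import numpy as np

# LoRA augments a frozen weight W with a low-rank update:
#     W' = W + (alpha / r) * B @ A,   with rank r = 32 for this model.
# Shapes are illustrative only; alpha is a hypothetical scaling value.
d_out, d_in, r, alpha = 128, 128, 32, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable down-projection
B = rng.standard_normal((d_out, r))         # trainable up-projection

delta = (alpha / r) * (B @ A)               # full-size matrix, but rank <= 32
W_adapted = W + delta

assert delta.shape == W.shape
assert np.linalg.matrix_rank(delta) <= r
```

Because only `A` and `B` are trained, the adapter stores far fewer parameters than a full fine-tune of `W`.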

## Usage

You can use this model with the inference.py script available in this repository:

```python
from inference import load_audio_llm, transcribe_and_generate

# Load the combined LLaMA + Whisper model with its LoRA weights
model = load_audio_llm(
    repo_id="cdreetz/audio-llama-v1.1",
    llama_path="meta-llama/Llama-3.2-3B-Instruct",
    whisper_path="openai/whisper-large-v3-turbo"
)

# Generate text conditioned on an audio file and a text prompt
response = transcribe_and_generate(
    model=model,
    audio_path="path/to/audio.wav",
    prompt="Describe what you hear in this audio:"
)

print(response)
```

For more details, see the included inference script.
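The inference script is the authoritative reference for how audio reaches the language model. As a rough sketch under stated assumptions, models of this kind typically project Whisper encoder states into the LLaMA embedding space and prepend them to the text-token embeddings. The hidden sizes below (1280 for whisper-large-v3-turbo, 3072 for Llama-3.2-3B) are the published dimensions of the two base models, but the projection itself is hypothetical shape bookkeeping, not this repository's actual fusion code:

```python
import numpy as np

# Hypothetical sketch: project Whisper encoder states into LLaMA's
# embedding space and prepend them to the prompt embeddings. The real
# fusion logic lives in inference.py; this only illustrates the shapes.
d_whisper, d_llama = 1280, 3072  # whisper-large-v3-turbo / Llama-3.2-3B

rng = np.random.default_rng(0)
audio_states = rng.standard_normal((50, d_whisper))  # 50 encoder frames
text_embeds = rng.standard_normal((12, d_llama))     # 12 prompt tokens

proj = 0.01 * rng.standard_normal((d_whisper, d_llama))  # learned in training
audio_embeds = audio_states @ proj                        # (50, 3072)

# The LLM then attends over audio frames and text tokens jointly
inputs_embeds = np.concatenate([audio_embeds, text_embeds], axis=0)
assert inputs_embeds.shape == (62, d_llama)
```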
