
# AudioLLM Model

This repository contains the trained weights for AudioLLM, a model that combines LLaMA and Whisper for audio-enhanced language understanding and generation.

## Model Details

  • Base LLaMA model: meta-llama/Llama-3.2-3B-Instruct
  • Base Whisper model: openai/whisper-large-v3-turbo
  • LoRA rank: 32
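The LoRA rank above means the adapter's weight updates are constrained to rank 32. As a minimal NumPy sketch (the matrix shapes here are illustrative, not the model's actual dimensions), a LoRA update augments a frozen weight `W` with a low-rank product:

```python
import numpy as np

# LoRA augments a frozen weight W with a low-rank update:
#     W' = W + (alpha / r) * B @ A,   with rank r = 32 for this model.
# Shapes are illustrative only; alpha is a hypothetical scaling value.
d_out, d_in, r, alpha = 128, 128, 32, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable down-projection
B = rng.standard_normal((d_out, r))         # trainable up-projection

delta = (alpha / r) * (B @ A)               # full-size matrix, but rank <= 32
W_adapted = W + delta

assert delta.shape == W.shape
assert np.linalg.matrix_rank(delta) <= r
```

Because only `A` and `B` are trained, the adapter stores far fewer parameters than a full fine-tune of `W`.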

## Usage

You can use this model with the inference.py script available in this repository:

```python
from inference import load_audio_llm, transcribe_and_generate

# Load the combined LLaMA + Whisper model with its LoRA weights
model = load_audio_llm(
    repo_id="cdreetz/audio-llama-v1.1",
    llama_path="meta-llama/Llama-3.2-3B-Instruct",
    whisper_path="openai/whisper-large-v3-turbo"
)

# Generate text conditioned on an audio file and a text prompt
response = transcribe_and_generate(
    model=model,
    audio_path="path/to/audio.wav",
    prompt="Describe what you hear in this audio:"
)

print(response)
```

For more details, see the included inference script.
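The inference script is the authoritative reference for how audio reaches the language model. As a rough sketch under stated assumptions, models of this kind typically project Whisper encoder states into the LLaMA embedding space and prepend them to the text-token embeddings. The hidden sizes below (1280 for whisper-large-v3-turbo, 3072 for Llama-3.2-3B) are the published dimensions of the two base models, but the projection itself is hypothetical shape bookkeeping, not this repository's actual fusion code:

```python
import numpy as np

# Hypothetical sketch: project Whisper encoder states into LLaMA's
# embedding space and prepend them to the prompt embeddings. The real
# fusion logic lives in inference.py; this only illustrates the shapes.
d_whisper, d_llama = 1280, 3072  # whisper-large-v3-turbo / Llama-3.2-3B

rng = np.random.default_rng(0)
audio_states = rng.standard_normal((50, d_whisper))  # 50 encoder frames
text_embeds = rng.standard_normal((12, d_llama))     # 12 prompt tokens

proj = 0.01 * rng.standard_normal((d_whisper, d_llama))  # learned in training
audio_embeds = audio_states @ proj                        # (50, 3072)

# The LLM then attends over audio frames and text tokens jointly
inputs_embeds = np.concatenate([audio_embeds, text_embeds], axis=0)
assert inputs_embeds.shape == (62, d_llama)
```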
