# AudioLLM Model
This repository contains the trained weights for an AudioLLM model, which combines LLaMA and Whisper models for audio-enhanced language understanding and generation.
## Model Details

- Base LLaMA model: `meta-llama/Llama-3.2-3B-Instruct`
- Base Whisper model: `openai/whisper-large-v3-turbo`
- LoRA rank: 32
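The LoRA rank above determines how many trainable parameters the adapter adds on top of the frozen base weights. As a rough illustration (not this repo's training code), here is a minimal NumPy sketch of a rank-32 low-rank update; the 3072 hidden dimension matches Llama-3.2-3B but the matrix shapes and scaling are assumptions for illustration only:

```python
import numpy as np

# Sketch of a rank-32 LoRA update: instead of training the full weight
# matrix W, train two small factors A (r x d_in) and B (d_out x r).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 3072, 3072, 32, 32  # r = 32 matches this card

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so W_eff == W at init

W_eff = W + (alpha / r) * (B @ A)        # effective weight at inference

# Trainable parameters per adapted matrix vs. the full matrix:
trainable = A.size + B.size              # 2 * r * 3072 = 196,608
full = W.size                            # 3072 * 3072 = 9,437,184
print(trainable, full)
```

At rank 32 the adapter trains roughly 2% of the parameters of each matrix it touches, which is why the weights in this repo are small relative to the base models.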
## Usage

You can use this model with the `inference.py` script available in this repository:
```python
from inference import load_audio_llm, transcribe_and_generate

# Load the model
model = load_audio_llm(
    repo_id="cdreetz/audio-llama-v1.1",
    llama_path="meta-llama/Llama-3.2-3B-Instruct",
    whisper_path="openai/whisper-large-v3-turbo",
)

# Generate text from an audio file
response = transcribe_and_generate(
    model=model,
    audio_path="path/to/audio.wav",
    prompt="Describe what you hear in this audio:",
)
print(response)
```
For more details, see the included inference script.
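Whisper models expect 16 kHz mono audio, so input files at other sample rates should be resampled first. Whether `transcribe_and_generate` does this internally is not stated in this card; if you need to pre-process yourself, here is a minimal sketch using linear interpolation (a proper resampler such as librosa or torchaudio is preferable in practice):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a 1-D mono signal to 16 kHz via linear interpolation."""
    target_sr = 16_000
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of a 440 Hz tone at 44.1 kHz becomes 16,000 samples
sine = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 44_100, endpoint=False))
print(len(resample_to_16k(sine, 44_100)))  # 16000
```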