# Synchformer Hugging Face Model
This repository contains a Synchformer model for audio-visual synchronization. The model predicts the temporal offset between the audio and video tracks of a clip.
## Usage
```python
import os
import sys

import torch

# Add the repository directory to the path so the custom model code can be found
current_dir = os.path.dirname(os.path.abspath(__file__))
if current_dir not in sys.path:
    sys.path.insert(0, current_dir)

# Import the auto factory to register the model with transformers
import auto_factory

from transformers import AutoModel

# Load the model and move it to the GPU if one is available
model = AutoModel.from_pretrained("AmrMKayid/synchformer-hf")
model.to("cuda" if torch.cuda.is_available() else "cpu")

# Predict the audio-visual offset for a video
results = model.predict_offset(
    "path/to/your/video.mp4",
    offset_sec=0.0,      # Ground-truth offset (if known)
    v_start_i_sec=0.0,   # Start time in seconds for the video
)

# Print the per-class predictions
print("\nPrediction Results:")
for pred in results["predictions"]:
    print(f'p={pred["probability"]:.4f}, "{pred["offset_sec"]:.2f}" (class {pred["class_idx"]})')
```
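The snippet above relies on `__file__`, so it must run as a script from inside the cloned repository. If you are working in a notebook or from another directory, one option is to fetch the repository files with `huggingface_hub` first. A minimal sketch, assuming the custom `auto_factory` module lives at the root of the `AmrMKayid/synchformer-hf` repo:

```python
import sys

from huggingface_hub import snapshot_download

# Fetch the repo files (model weights plus the custom auto_factory module)
repo_dir = snapshot_download("AmrMKayid/synchformer-hf")

# Make the custom code importable, then register the model as above
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)
import auto_factory  # registers the Synchformer classes with transformers

from transformers import AutoModel

model = AutoModel.from_pretrained(repo_dir)
```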
## Model Details

This model is based on the Synchformer architecture, which extracts features from short audio and video segments and uses a synchronization transformer to predict the temporal offset between the two tracks.
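For intuition, offset prediction is treated as classification over a discrete grid of candidate offsets, and `predict_offset` returns one probability per grid point. Below is a minimal sketch of that post-processing, assuming a grid of 21 classes from -2.0 s to +2.0 s in 0.2 s steps (the grid in the original Synchformer setup; the grid used by this checkpoint may differ):

```python
import torch

# Hypothetical logits over the offset grid (one value per candidate offset)
num_classes = 21
logits = torch.randn(num_classes)

# Candidate offsets: -2.0 s to +2.0 s in 0.2 s steps (assumed grid)
offsets_sec = torch.linspace(-2.0, 2.0, num_classes)

# Softmax over classes gives the per-offset probabilities reported above
probs = logits.softmax(dim=-1)
best = probs.argmax()
print(f"predicted offset: {offsets_sec[best]:.2f}s (p={probs[best]:.4f})")
```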
## Requirements

- torch
- torchaudio
- torchvision
- transformers
- omegaconf
- ffmpeg (external binary, used for video processing; see the sanity check below)
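Since ffmpeg is an external binary rather than a Python package, a quick environment check can catch missing dependencies before loading the model. This is just a sketch, not part of the model's API:

```python
import importlib.util
import shutil

# Python packages the usage example imports
for pkg in ("torch", "torchaudio", "torchvision", "omegaconf", "transformers"):
    assert importlib.util.find_spec(pkg) is not None, f"missing package: {pkg}"

# ffmpeg must be on PATH for video decoding
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
print("all dependencies found")
```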