Submit media inputs to generate text and speech responses
SOTA real-time object detection model
An Agentic Framework with Tools for Complex Reasoning
Interact with AI using text, images, or audio
Magma-8B model for UI agents
OmniParser: turn your LLM into a GUI agent
@image @rAgent @web @text @tts1 @tts2 @3d
Wan: Open and Advanced Large-Scale Video Generative Models
Talk to OpenAI (Gradio UI)
Say computer (Gradio)
Talk to Gemini using Google's multimodal API