Hello everyone! With the rapid advancement of AI agent technology, two architectures have come into the spotlight: MCP (Model Context Protocol) and MCO (Model Context Open-json). Today, we’ll introduce the key features and differences of these two approaches.
MCP: The Traditional Approach 🏛️ Centralized Function Registry: All functions are hardcoded into the core system.
Static Function Definitions & Tight Coupling: New features require changes to the core application code, limiting scalability.
Monolithic Design: Deployment and version management are complex, and a single error can affect the whole system.
Code Example:
```py
# Core application code: every handler must be defined (or imported) here
# and registered by hand, so each new feature means editing this file.
def existing_function():
    ...  # previously registered handler

def new_function():
    ...  # the handler being added

FUNCTION_REGISTRY = {
    "existing_function": existing_function,
    "new_function": new_function,  # adding a new function
}
```
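For illustration, dispatch through such a registry is a plain dictionary lookup (a minimal sketch using the names from the example above):

```py
# Look up the handler by name and call it; an unknown name fails here,
# in the core system, rather than inside an isolated module.
def dispatch(func_name, *args, **kwargs):
    handler = FUNCTION_REGISTRY.get(func_name)
    if handler is None:
        raise KeyError(f"Unknown function: {func_name!r}")
    return handler(*args, **kwargs)

dispatch("new_function")
```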
MCO: A Revolutionary Approach 🆕 JSON-based Function Definitions: Function details are stored in external JSON files, enabling dynamic module loading.
Loose Coupling & Microservices: Each function can be developed, tested, and deployed as an independent module.
Flexible Scalability: Add new features by simply updating the JSON and module files, without modifying the core system.
JSON Example:
```json
[
  {
    "name": "analyze_sentiment",
    "module_path": "nlp_tools",
    "func_name_in_module": "sentiment_analysis",
    "example_usage": "analyze_sentiment(text=\"I love this product!\")"
  }
]
```
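A loader for such definitions can be very small. Here is a minimal sketch, assuming each `module_path` is an importable Python module that exposes `func_name_in_module`; the file name and usage are illustrative:

```py
import importlib
import json

def load_functions(json_path: str) -> dict:
    """Build a name -> callable registry from external JSON definitions."""
    with open(json_path, encoding="utf-8") as f:
        definitions = json.load(f)

    registry = {}
    for spec in definitions:
        # Import each module independently: a broken module only breaks
        # its own entry, not the core system.
        module = importlib.import_module(spec["module_path"])
        registry[spec["name"]] = getattr(module, spec["func_name_in_module"])
    return registry

# Hypothetical usage, with the JSON above saved as functions.json:
# registry = load_functions("functions.json")
# registry["analyze_sentiment"](text="I love this product!")
```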
Why MCO? 💡 Enhanced Development Efficiency: Developers can focus on their own modules with independent testing and deployment.
Simplified Error Management: Errors remain confined within their modules, enabling quick hotfixes.
Future-Proofing: With potential features like remote function calls (RPC), access control, auto-documentation, and a function marketplace, MCO paves the way for rapid innovation.
Practical Use & Community 🤝 The MCO implementation has been successfully tested on Vidraft’s LLM (based on Google Gemma-3)
FramePack, from the legendary lllyasviel, now has a full Windows local tutorial with a very advanced Gradio app that generates consistent videos from images, up to 120 seconds long, on GPUs with as little as 6 GB of VRAM. The tutorial shows step by step how to install and use FramePack locally with the Gradio app. I have also published installers for cloud services such as RunPod and Massed Compute, for those who are GPU-poor or want to scale.
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation: FramePack is a method for training next-frame (or next-frame-section) prediction models for video generation. It compresses input frames so that the transformer context length stays fixed regardless of the video length.
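As a toy illustration of the packing idea (my own sketch, not the paper's code): if each frame further from the prediction target gets a geometrically smaller token budget, the total context length converges to a constant no matter how long the video is.

```py
def packed_context_length(num_frames: int, base_tokens: int = 1024) -> int:
    # age 0 = frame closest to the prediction target, kept at full budget;
    # each older frame gets half the token budget of the previous one.
    return sum(base_tokens >> age for age in range(num_frames))

for n in (1, 8, 30, 120):
    # context length approaches 2 * base_tokens and then stays fixed
    print(n, packed_context_length(n))
```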
multimodal > Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct, both 16B MoEs with 3B active params (OS) > InternVL3 released, based on Qwen2.5VL, 7 checkpoints in various sizes (1B to 78B)
LLMs > NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat, and tool use > Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distilled-Qwen-14B on problem-test pairs, along with the compiled dataset > Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS) > Skywork-OR1-32B-Preview is a new reasoning model by Skywork
Image Generation > HiDream released three new models for image generation: HiDream I1 Dev, I1 Full, and I1 Fast (OS)
Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:
1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents Covers agents, their functions, tool use and how they differ from models
3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more
5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty
7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge
RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step by step and adjusts it as it goes, deciding how much compute to spend at test time; if needed, it reformulates queries (see the sketch below).
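In code terms, the chain-of-retrieval loop might look like this (my own hedged reading of the idea; the retriever, the generation step, and the stopping rule are trivial stand-ins, not the paper's implementation):

```py
# Sketch of a CoRAG-style loop: retrieve, reason, reformulate, repeat.
def retrieve(query: str) -> list[str]:
    return [f"<doc for: {query}>"]  # placeholder for a real retriever

def generate_step(query: str, evidence: list[str]) -> tuple[str, bool]:
    # Placeholder for an LLM call that either reformulates the sub-query
    # or signals that enough evidence has been gathered.
    done = len(evidence) >= 2  # toy stopping rule
    return f"refined: {query}", done

def corag_answer(query: str, max_steps: int = 4) -> list[str]:
    evidence, sub_query = [], query
    for _ in range(max_steps):  # max_steps caps test-time compute
        evidence.extend(retrieve(sub_query))
        sub_query, done = generate_step(sub_query, evidence)
        if done:  # the model decides it has gathered enough
            break
    return evidence

print(corag_answer("Who directed the film adapted from the 1985 novel?"))
```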
GRPO helped DeepSeek R1 learn reasoning. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
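For a sense of what the training signal can look like, here is a minimal sketch of an IoU-based reward for visual grounding under GRPO; the box output format and the parsing are my assumptions, not necessarily the exact setup used here:

```py
import re

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def grounding_reward(completion: str, gt_box) -> float:
    """Score a sampled completion by the IoU of its predicted box vs. ground truth.
    Assumes the model emits a box as four numbers, e.g. '[12, 30, 96, 150]'."""
    nums = re.findall(r"-?\d+\.?\d*", completion)
    if len(nums) < 4:
        return 0.0  # unparseable completion gets zero reward
    pred = tuple(float(n) for n in nums[:4])
    return iou(pred, gt_box)

# GRPO then normalizes these rewards within each group of sampled completions
# to form advantages, e.g. advantage_i = (r_i - mean(r)) / (std(r) + eps).
print(grounding_reward("[10, 20, 100, 150]", (12, 30, 96, 150)))
```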