Hello everyone! With the rapid advancement of AI agent technology, two architectures have come into the spotlight: MCP (Model Context Protocol) and MCO (Model Context Open-json). Today, we’ll introduce the key features and differences of these two approaches.
MCP: The Traditional Approach 🏛️ Centralized Function Registry: All functions are hardcoded into the core system.
Static Function Definitions & Tight Coupling: New features require changes to the core application code, limiting scalability.
Monolithic Design: Deployment and version management are complex, and a single error can affect the whole system.
Code Example:
```py
# Core application code: every handler must be defined (or imported) here
# and registered by hand, so each new feature means editing this file.
def existing_function():
    ...  # previously registered handler

def new_function():
    ...  # the handler being added

FUNCTION_REGISTRY = {
    "existing_function": existing_function,
    "new_function": new_function,  # adding a new function
}
```
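For illustration, dispatch through such a registry is a plain dictionary lookup (a minimal sketch using the names from the example above):

```py
# Look up the handler by name and call it; an unknown name fails here,
# in the core system, rather than inside an isolated module.
def dispatch(func_name, *args, **kwargs):
    handler = FUNCTION_REGISTRY.get(func_name)
    if handler is None:
        raise KeyError(f"Unknown function: {func_name!r}")
    return handler(*args, **kwargs)

dispatch("new_function")
```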
MCO: A Revolutionary Approach 🆕 JSON-based Function Definitions: Function details are stored in external JSON files, enabling dynamic module loading.
Loose Coupling & Microservices: Each function can be developed, tested, and deployed as an independent module.
Flexible Scalability: Add new features by simply updating the JSON and module files, without modifying the core system.
JSON Example:
```json
[
  {
    "name": "analyze_sentiment",
    "module_path": "nlp_tools",
    "func_name_in_module": "sentiment_analysis",
    "example_usage": "analyze_sentiment(text=\"I love this product!\")"
  }
]
```
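A loader for such definitions can be very small. Here is a minimal sketch, assuming each `module_path` is an importable Python module that exposes `func_name_in_module`; the file name and usage are illustrative:

```py
import importlib
import json

def load_functions(json_path: str) -> dict:
    """Build a name -> callable registry from external JSON definitions."""
    with open(json_path, encoding="utf-8") as f:
        definitions = json.load(f)

    registry = {}
    for spec in definitions:
        # Import each module independently: a broken module only breaks
        # its own entry, not the core system.
        module = importlib.import_module(spec["module_path"])
        registry[spec["name"]] = getattr(module, spec["func_name_in_module"])
    return registry

# Hypothetical usage, with the JSON above saved as functions.json:
# registry = load_functions("functions.json")
# registry["analyze_sentiment"](text="I love this product!")
```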
Why MCO? 💡 Enhanced Development Efficiency: Developers can focus on their own modules with independent testing and deployment.
Simplified Error Management: Errors remain confined within their modules, enabling quick hotfixes.
Future-Proofing: With potential features like remote function calls (RPC), access control, auto-documentation, and a function marketplace, MCO paves the way for rapid innovation.
Practical Use & Community 🤝 The MCO implementation has been successfully tested on Vidraft’s LLM (based on Google Gemma-3)
FramePack, from the legendary lllyasviel, now has a full Windows local tutorial with a very advanced Gradio app that generates consistent videos from images, up to 120 seconds long, on GPUs with as little as 6 GB of VRAM. The tutorial shows step by step how to install and use FramePack locally with the Gradio app. I have also published installers for cloud services such as RunPod and Massed Compute, for those who are GPU-poor or want to scale.
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation: FramePack is a method for training next-frame (or next-frame-section) prediction models for video generation. It compresses input frames so that the transformer context length stays fixed regardless of the video length.
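As a toy illustration of the packing idea (my own sketch, not the paper's code): if each frame further from the prediction target gets a geometrically smaller token budget, the total context length converges to a constant no matter how long the video is.

```py
def packed_context_length(num_frames: int, base_tokens: int = 1024) -> int:
    # age 0 = frame closest to the prediction target, kept at full budget;
    # each older frame gets half the token budget of the previous one.
    return sum(base_tokens >> age for age in range(num_frames))

for n in (1, 8, 30, 120):
    # context length approaches 2 * base_tokens and then stays fixed
    print(n, packed_context_length(n))
```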
multimodal > Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct, both 16B MoEs with 3B active params (OS) > InternVL3 released, based on Qwen2.5VL, 7 checkpoints in various sizes (1B to 78B)
LLMs > NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat, and tool use > Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distilled-Qwen-14B on problem-test pairs, along with the compiled dataset > Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS) > Skywork-OR1-32B-Preview is a new reasoning model by Skywork
Image Generation > HiDream released three new models for image generation: HiDream I1 Dev, I1 Full, and I1 Fast (OS)
Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:
1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents Covers agents, their functions, tool use and how they differ from models
3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more
5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty
7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge
RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step by step and adjusts it as it goes, deciding how much compute to spend at test time; if needed, it reformulates queries (see the sketch below).
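In code terms, the chain-of-retrieval loop might look like this (my own hedged reading of the idea; the retriever, the generation step, and the stopping rule are trivial stand-ins, not the paper's implementation):

```py
# Sketch of a CoRAG-style loop: retrieve, reason, reformulate, repeat.
def retrieve(query: str) -> list[str]:
    return [f"<doc for: {query}>"]  # placeholder for a real retriever

def generate_step(query: str, evidence: list[str]) -> tuple[str, bool]:
    # Placeholder for an LLM call that either reformulates the sub-query
    # or signals that enough evidence has been gathered.
    done = len(evidence) >= 2  # toy stopping rule
    return f"refined: {query}", done

def corag_answer(query: str, max_steps: int = 4) -> list[str]:
    evidence, sub_query = [], query
    for _ in range(max_steps):  # max_steps caps test-time compute
        evidence.extend(retrieve(sub_query))
        sub_query, done = generate_step(sub_query, evidence)
        if done:  # the model decides it has gathered enough
            break
    return evidence

print(corag_answer("Who directed the film adapted from the 1985 novel?"))
```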
GRPO helped DeepSeek R1 learn reasoning. Can it also help VLMs perform better on general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
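For a sense of what the training signal can look like, here is a minimal sketch of an IoU-based reward for visual grounding under GRPO; the box output format and the parsing are my assumptions, not necessarily the exact setup used here:

```py
import re

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def grounding_reward(completion: str, gt_box) -> float:
    """Score a sampled completion by the IoU of its predicted box vs. ground truth.
    Assumes the model emits a box as four numbers, e.g. '[12, 30, 96, 150]'."""
    nums = re.findall(r"-?\d+\.?\d*", completion)
    if len(nums) < 4:
        return 0.0  # unparseable completion gets zero reward
    pred = tuple(float(n) for n in nums[:4])
    return iou(pred, gt_box)

# GRPO then normalizes these rewards within each group of sampled completions
# to form advantages, e.g. advantage_i = (r_i - mean(r)) / (std(r) + eps).
print(grounding_reward("[10, 20, 100, 150]", (12, 30, 96, 150)))
```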