Ksenia Se

Kseniase

Organizations: Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity

reacted to their post with ➕👍 about 9 hours ago
8 Free Sources about AI Agents:

Agents seem to be everywhere, and this collection is for a deep dive into their theory and practice:

1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents
Covers agents, their functions, tool use and how they differ from models

2. "Agents in the Long Game of AI. Computational Cognitive Modeling for Trustworthy, Hybrid AI" book by Marjorie McShane, Sergei Nirenburg, and Jesse English -> https://direct.mit.edu/books/oa-monograph/5833/Agents-in-the-Long-Game-of-AIComputational
Explores building AI agents using hybrid AI, which combines ML with knowledge-based reasoning

3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw
Expert talks sharing insights on the latest agent engineering advancements, such as Google Deep Research, scaling tips and more

4. AI Agents Course from Hugging Face -> https://huggingface.co/learn/agents-course/en/unit0/introduction
Covers agent theory and practice, teaching how to build agents using top libraries and tools

5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html
Covers agent architectures and how agents learn, reason, plan and act under certainty and uncertainty

6. "Intelligent Agents: Theory and Practice" paper by Michael Wooldridge and Nicholas R. Jennings -> https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/ker95-html.html
A fascinating look at how agents were seen in 1995, covering their theory, architectures and agent languages

7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> https://huggingface.co/Kseniase
We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge

8. Our collection "8 Free Sources to Master Building AI Agents" -> https://www.turingpost.com/p/building-ai-agents-sources
posted an update 1 day ago
reacted to their post with 😎👍🚀🔥 4 days ago
8 New Applications of Test-Time Scaling

We've noticed huge interest in test-time scaling (TTS), so we decided to explore this concept further. Test-time compute (TTC) refers to the amount of computational power an AI model uses when generating a response. Many researchers are now focused on scaling TTC, as it enables slow, deep "thinking" and step-by-step reasoning, which improves overall model performance.

Here are 8 fresh studies on test-time scaling:

1. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)
Introduces an LM that scales TTC by reasoning in latent space instead of generating more tokens, with no special training. Here, a recurrent block processes information iteratively.

2. Generating Symbolic World Models via Test-time Scaling of Large Language Models (2502.04728)
Shows how TTS is applied to enhance a model's Planning Domain Definition Language (PDDL) reasoning capabilities, which can be used to generate a symbolic world model.

3. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (2502.06703)
Analyzes optimal TTS strategies and shows how small models can outperform much larger ones.

4. Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (2502.04128)
Shows how TTS improves expressiveness, timbre consistency and accuracy in speech synthesis with the Llasa framework. It also dives into the benefits of scaling train-time compute.

5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning (2502.07154)
Suggests a modified training loss to improve LLM reasoning when scaling TTC.

6. Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2502.05078)
Unifies the strengths of chain, tree, and graph paradigms into one framework that expands reasoning only on necessary subproblems.

7. Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification (2502.01839)
Explores scaling trends of self-verification and how to improve its capabilities with TTC.

8. CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2501.14723)
Explores how scaling serial compute (iterations) and parallel compute (trajectories) can improve accuracy on real-world software engineering issues.

Also, explore our article about TTS for more -> https://huggingface.co/blog/Kseniase/testtimecompute
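
One simple way to spend extra test-time compute is to sample several answers and then pick one, either by majority vote (self-consistency) or with a verifier (best-of-N). The sketch below is a toy illustration under our own assumptions, not the method of any paper listed above: the sample answers and the `square_check` verifier are hypothetical stand-ins for real model outputs and a real verifier.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Self-consistency: return the most frequent answer among the samples."""
    # Counter.most_common orders equal counts by first appearance.
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-N: return the candidate that the verifier scores highest."""
    return max(candidates, key=score)


def square_check(candidate: str) -> float:
    """Hypothetical verifier for 'find x with x * x == 1369'.

    Checking a proposed answer is cheaper than solving the problem,
    which is why verification is a natural place to spend compute.
    """
    return -abs(int(candidate) ** 2 - 1369)


# Hypothetical samples, as if drawn from a model several times.
vote_samples = ["102", "96", "102", "112", "102"]
voted = majority_vote(vote_samples)

verify_samples = ["36", "37", "39"]
verified = best_of_n(verify_samples, square_check)

print(voted, verified)
```

Drawing more samples raises the cost per query, so the number of samples is the knob that trades compute for accuracy in this setup.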
posted an update 8 days ago
reacted to their post with 🚀🤗🔥 12 days ago
8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of the 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it, also deciding how much compute power to use at test time. If needed, it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using a dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A Tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within a graph structure and capture citation relationships

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots
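
The basic RAG loop that all of these methods build on (retrieve relevant external data, then pass it to the model with the question) can be sketched in a few lines. This is a minimal toy example under our own assumptions, not the implementation of any paper above: the token-overlap scorer stands in for a real sparse or dense retriever, and the documents, query and prompt wording are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, doc: str) -> int:
    """Count tokens shared by query and document (toy relevance score)."""
    return sum((tokenize(query) & tokenize(doc)).values())


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 1) -> str:
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini-corpus and query.
docs = [
    "Cuckoo filters support fast approximate set membership checks.",
    "Citation graphs link research papers to the works they reference.",
    "Video corpora can be indexed by transcripts and visual features.",
]
query = "How do citation graphs link papers?"

print(build_prompt(query, docs))
```

The methods in the list change different parts of this loop: when to call `retrieve`, what it searches over (images, video, graphs), and how the results are filtered before generation.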
replied to their post 14 days ago

Other important RAG advancements:

  • SafeRAG (a benchmark) -> https://huggingface.co/papers/2501.18636
    Establishes a security benchmark revealing how RAG systems are vulnerable to attacks like adversarial data injection, inter-context conflicts, and soft ad poisoning. Evaluates weaknesses in 14 RAG components, emphasizing the need for better filtering and security measures.

  • Topic-FlipRAG: Adversarial Opinion Manipulation -> https://huggingface.co/papers/2502.01386
    Demonstrates a two-stage adversarial attack that manipulates RAG-generated opinions on sensitive topics. Alters retrieval rankings and LLM reasoning to subtly flip the stance of generated answers, exposing the difficulty of mitigating semantic-level manipulation.

  • Experiments with LLMs on RAG for Closed-Source Simulation Software -> https://huggingface.co/papers/2502.03916
    Tests how RAG can support proprietary software by injecting relevant documentation dynamically. Shows that retrieval helps mitigate hallucinations in closed-source contexts, though some knowledge gaps remain, necessitating further improvements.

  • Health-RAG -> https://huggingface.co/papers/2502.04666
    Focuses on medical information retrieval by introducing a three-stage pipeline: retrieve, generate a reference summary (GenText), and re-rank based on factual alignment. Ensures accurate, evidence-backed health answers while mitigating misinformation risks.

posted an update 15 days ago
posted an update 22 days ago
8 Free Sources on Reinforcement Learning

DeepSeek-R1's top reasoning capabilities showed us all the true power of RL. At its core, RL is a type of machine learning where a model/agent learns to make decisions by interacting with an environment to maximize a reward. RL learns through trial and error, receiving feedback in the form of rewards or penalties.

Here's a list of free sources that will help you dive into RL and how to use it:

1. "Reinforcement Learning: An Introduction" book by Richard S. Sutton and Andrew G. Barto -> https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

2. Hugging Face Deep Reinforcement Learning Course -> https://huggingface.co/learn/deep-rl-course/unit0/introduction
You'll learn how to train agents in unique environments using the best libraries, share your results, compete in challenges, and earn a certificate.

3. OpenAI Spinning Up in Deep RL -> https://spinningup.openai.com/en/latest/index.html
A comprehensive overview of RL with many useful resources

4. "Reinforcement Learning and Optimal Control" books, video lectures and course material by Dimitri P. Bertsekas from ASU -> https://web.mit.edu/dimitrib/www/RLbook.html
Explores approximate Dynamic Programming (DP) and RL with key concepts and methods like rollout, tree search, and neural network training for RL and more.

5. RL Course by David Silver (Google DeepMind) -> https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPeb
Many recommend these video lectures as a good foundation

6. RL theory seminars -> https://sites.google.com/view/rltheoryseminars/home?authuser=0
Provides virtual seminars from different experts about RL advancements

7. "Reinforcement Learning Specialization" (a 4-course series on Coursera) -> https://www.coursera.org/learn/fundament

8. Concepts: RLHF, RLAIF, RLEF, RLCF -> https://www.turingpost.com/p/rl-f
Our flashcards explain these four RL approaches, which use different kinds of feedback
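
To make the trial-and-error idea concrete, here is a minimal epsilon-greedy bandit: the agent repeatedly picks an action, receives a reward, and updates its value estimates. It is a toy sketch under our own assumptions rather than code from any of the sources above; the arm means, noise range, step count and seed are hypothetical values chosen for illustration.

```python
import random
from typing import List, Tuple


def run_bandit(
    means: List[float], steps: int = 500, epsilon: float = 0.1, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Epsilon-greedy agent on a multi-armed bandit.

    Returns the estimated value of each arm and how often it was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # Environment feedback: the arm's mean reward plus small bounded noise.
        reward = means[arm] + rng.uniform(-0.05, 0.05)
        counts[arm] += 1
        # Incremental average of the rewards observed for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print("best arm:", best_arm, "pull counts:", counts)
```

Setting `epsilon` to 0 removes exploration, so the agent can lock onto the first arm it tries; that is the exploration versus exploitation trade-off the courses above cover in depth.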
reacted to their post with 👍🚀 28 days ago
7 Open-source Methods to Improve Video Generation and Understanding

The AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week's studies showed that working with videos is now one of the main focuses for improving AI models. Another highlight of the week is that open source, once again, proves its value. For those who were impressed by DeepSeek-R1, we're with you!

Today, we're combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:

1. VideoLLaMA 3 model: Excels in various video and image tasks thanks to a vision-centric training approach. VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (2501.13106)

2. The FILMAGENT framework assigns roles to multiple AI agents, like a director, screenwriter, actor, and cinematographer, to automate the filmmaking process in 3D virtual environments. FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces (2501.12909)

3. Improving Video Generation with Human Feedback (2501.13918) proposes a new VideoReward Model and approach that uses human feedback to refine video generation models.

4. The DiffuEraser video inpainting model, based on Stable Diffusion, is designed to fill in missing areas with detailed, realistic content and to ensure consistent structures across frames. DiffuEraser: A Diffusion Model for Video Inpainting (2501.10018)

5. MAGI is a hybrid video generation model that combines masked and causal modeling. Its key innovation, Complete Teacher Forcing (CTF), conditions masked frames on fully visible frames. Taming Teacher Forcing for Masked Autoregressive Video Generation (2501.12389)

6. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise (2501.08331) proposes motion control, allowing users to guide how objects or the camera move in generated videos. Its noise warping algorithm replaces random noise in videos with structured noise based on motion info.

7. The Video Depth Anything model estimates depth consistently in super-long videos (several minutes or more) without sacrificing quality or speed. Video Depth Anything: Consistent Depth Estimation for Super-Long Videos (2501.12375)
reacted to their post with 🔥 29 days ago
posted an update 29 days ago
reacted to their post with 👀 about 1 month ago
10 Recent Advancements in Math Reasoning

Over the last few weeks, we have witnessed a surge in AI models' math reasoning capabilities. Top companies like Microsoft, NVIDIA, and Alibaba Qwen have already joined this race to make models "smarter" in mathematics. But why is this shift happening now?

Complex math calculations require advanced multi-step reasoning, making mathematics an ideal domain for demonstrating a model's strong "thinking" capabilities. Additionally, as AI continues to evolve and is applied in math-intensive fields such as machine learning and quantum computing (which is predicted to see significant growth in 2025), it must meet the demands of complex reasoning.
Moreover, AI models can be integrated with external tools like symbolic solvers or computational engines to tackle large-scale math problems, which also requires high-quality math reasoning.

So here's a list of 10 recent advancements in the math reasoning of AI models:

1. NVIDIA: AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (2412.15084)

2. Qwen, Alibaba: Qwen2.5-Math-PRM -> The Lessons of Developing Process Reward Models in Mathematical Reasoning (2501.07301), and the PROCESSBENCH evaluation -> ProcessBench: Identifying Process Errors in Mathematical Reasoning (2412.06559)

3. Microsoft Research: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (2501.04519)

4. BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning (2501.03226)

5. URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)

6. U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs (2412.03205)

7. Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs (2501.06430)

8. End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach (2501.04425)

9. Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning (2501.03035)

10. System-2 Mathematical Reasoning via Enriched Instruction Tuning (2412.16964)
posted an update about 1 month ago