<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="description" content="A collection of research papers from DeepSeek.">
  <meta name="keywords" content="DeepSeek, AI Research, Machine Learning">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>DeepSeek Research Papers</title>
  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>
<body>
  <section class="hero">
    <div class="hero-body">
      <div class="container is-max-desktop">
        <div class="columns is-centered">
          <div class="column has-text-centered">
            <h1 class="title is-1 publication-title">DeepSeek Research Papers</h1>
            <div class="is-size-5 publication-authors">
              <span class="author-block">DeepSeek Research Team</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>
  <section class="hero teaser">
    <div class="container is-max-desktop">
      <div class="hero-body">
        <h2 class="subtitle has-text-centered">
          Advancing AI through Open Research and Innovation
        </h2>
      </div>
    </div>
  </section>
  <section class="section">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column is-full-width">
          <h2 class="title is-3 has-text-centered">Publications</h2>
          <!-- Paper List Start -->
          <div class="content">
            <!-- Paper 1 -->
            <div class="paper-item box">
              <h3 class="title is-4">1. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism</h3>
              <p>Scaling open-source language models with a focus on longtermism.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://huggingface.co/papers/2401.02954" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jan 6, 2024</span>
              </div>
            </div>
            <!-- Paper 2 -->
            <div class="paper-item box">
              <h3 class="title is-4">2. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models</h3>
              <p>Exploring expert specialization in Mixture-of-Experts language models.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2401.06066" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jan 11, 2024</span>
              </div>
            </div>
            <!-- Paper 3 -->
            <div class="paper-item box">
              <h3 class="title is-4">3. DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence</h3>
              <p>Investigating the intersection of large language models and programming.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2401.14196" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jan 25, 2024</span>
              </div>
            </div>
            <!-- Paper 4 -->
            <div class="paper-item box">
              <h3 class="title is-4">4. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models</h3>
              <p>Advancing mathematical reasoning capabilities in open language models.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2402.03300" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Feb 6, 2024</span>
              </div>
            </div>
            <!-- Paper 5 -->
            <div class="paper-item box">
              <h3 class="title is-4">5. DeepSeek-VL: Towards Real-World Vision-Language Understanding</h3>
              <p>Focusing on real-world vision-language understanding.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2403.05525" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Mar 9, 2024</span>
              </div>
            </div>
            <!-- Paper 6 -->
            <div class="paper-item box">
              <h3 class="title is-4">6. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model</h3>
              <p>Developing a strong, economical, and efficient Mixture-of-Experts language model.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2405.04434" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">May 7, 2024</span>
              </div>
            </div>
            <!-- Paper 7 -->
            <div class="paper-item box">
              <h3 class="title is-4">7. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data</h3>
              <p>Using large-scale synthetic data to advance theorem proving in LLMs.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2405.14333" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">May 23, 2024</span>
              </div>
            </div>
            <!-- Paper 8 -->
            <div class="paper-item box">
              <h3 class="title is-4">8. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence</h3>
              <p>Aiming to surpass closed-source models in code intelligence.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2406.11931" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jun 17, 2024</span>
              </div>
            </div>
            <!-- Paper 9 -->
            <div class="paper-item box">
              <h3 class="title is-4">9. Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models</h3>
              <p>Fine-tuning sparse-architecture large language models with expert specialization.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2407.01906" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jul 2, 2024</span>
              </div>
            </div>
            <!-- Paper 10 -->
            <div class="paper-item box">
              <h3 class="title is-4">10. DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search</h3>
              <p>Utilizing proof assistant feedback for reinforcement learning and Monte-Carlo tree search.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2408.08152" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Aug 15, 2024</span>
              </div>
            </div>
            <!-- Paper 11 -->
            <div class="paper-item box">
              <h3 class="title is-4">11. Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation</h3>
              <p>Decoupling visual encoding for unified multimodal understanding and generation.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2410.13848" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Oct 17, 2024</span>
              </div>
            </div>
            <!-- Paper 12 -->
            <div class="paper-item box">
              <h3 class="title is-4">12. JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation</h3>
              <p>Harmonizing autoregression and rectified flow for unified multimodal understanding and generation.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2411.07975" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Nov 12, 2024</span>
              </div>
            </div>
            <!-- Paper 13 -->
            <div class="paper-item box">
              <h3 class="title is-4">13. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding</h3>
              <p>Mixture-of-Experts vision-language models for advanced multimodal understanding.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2412.10302" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Dec 13, 2024</span>
              </div>
            </div>
            <!-- Paper 14 -->
            <div class="paper-item box">
              <h3 class="title is-4">14. DeepSeek-V3 Technical Report</h3>
              <p>Technical report for DeepSeek-V3.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2412.19437" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Dec 27, 2024</span>
              </div>
            </div>
            <!-- Paper 15 -->
            <div class="paper-item box">
              <h3 class="title is-4">15. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning</h3>
              <p>Incentivizing reasoning capability in LLMs via reinforcement learning.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2501.12948" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jan 27, 2025</span>
              </div>
            </div>
            <!-- Paper 16 -->
            <div class="paper-item box">
              <h3 class="title is-4">16. Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling</h3>
              <p>Unified multimodal understanding and generation with data and model scaling.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2501.17811" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Jan 31, 2025</span>
              </div>
            </div>
            <!-- Paper 17 -->
            <div class="paper-item box">
              <h3 class="title is-4">17. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention</h3>
              <p>Hardware-aligned and natively trainable sparse attention.</p>
              <div class="publication-links">
                <span class="link-block">
                  <a href="https://arxiv.org/abs/2502.11089" target="_blank" rel="noopener"
                     class="external-link button is-normal is-rounded is-dark">
                    <span class="icon">
                      <i class="fas fa-file-pdf"></i>
                    </span>
                    <span>Paper</span>
                  </a>
                </span>
                <span class="is-size-6 has-text-grey">Feb 16, 2025</span>
              </div>
            </div>
          </div>
          <!-- Paper List End -->
        </div>
      </div>
    </div>
  </section>
  <footer class="footer">
    <div class="container">
      <div class="content has-text-centered">
        <p>
          &copy; 2024 DeepSeek. All rights reserved.
        </p>
        <p>
          This website is built using the Bulma CSS framework.
        </p>
      </div>
    </div>
  </footer>
</body>
</html>