Jade

euclaise

AI & ML interests

None yet

Recent Activity

liked a model 4 days ago

ibm-granite/granite-3.1-2b-base

liked a model 4 days ago

tiiuae/Falcon3-1B-Base

liked a model 4 days ago

kyutai/helium-1-preview-2b

euclaise's activity

upvoted 4 papers 4 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published 15 days ago • 74

Transformer²: Self-adaptive LLMs

Paper • 2501.06252 • Published 17 days ago • 50

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published 17 days ago • 89

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published 22 days ago • 87

upvoted 4 papers 16 days ago

Deliberation in Latent Space via Differentiable Cache Augmentation

Paper • 2412.17747 • Published Dec 23, 2024 • 29

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 40

SDPO: Segment-Level Direct Preference Optimization for Social Agents

Paper • 2501.01821 • Published 23 days ago • 18

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 18 days ago • 248

upvoted a paper about 1 month ago

Normalizing Flows are Capable Generative Models

Paper • 2412.06329 • Published Dec 9, 2024 • 9

upvoted 3 papers about 2 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 31

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

Paper • 2411.19943 • Published Nov 29, 2024 • 57

TinyFusion: Diffusion Transformers Learned Shallow

Paper • 2412.01199 • Published Dec 2, 2024 • 14

upvoted a collection about 2 months ago

Skywork-o1-Open

Skywork o1 open model collections • 3 items • Updated Nov 27, 2024 • 20

upvoted 2 papers 2 months ago

Cautious Optimizers: Improving Training with One Line of Code

Paper • 2411.16085 • Published Nov 25, 2024 • 15

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published Nov 13, 2024 • 44

upvoted 4 papers 3 months ago

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Paper • 2410.23168 • Published Oct 30, 2024 • 24

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Paper • 2410.20672 • Published Oct 28, 2024 • 6

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 89

MiniPLM: Knowledge Distillation for Pre-Training Language Models

Paper • 2410.17215 • Published Oct 22, 2024 • 14

upvoted a paper 4 months ago

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Paper • 2410.05229 • Published Oct 7, 2024 • 22