AlphaGaO/DeepSeek-V3-0324-Fused-4E-29B-Unhealed-Preview Text Generation • Updated 12 days ago
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling Paper • 2503.06121 • Published Mar 8
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published Jan 26