Hugging Face
bird-of-paradise/deepseek-mla
Tags: Text Generation, Transformers, PyTorch, English, deepseek-mla, attention-mechanism, mla, efficient-attention, arxiv:2405.04434
License: MIT
Branch: main
Path: deepseek-mla/src
Contributors: 2
History: 2 commits
Latest commit: 2d7348d by bird-of-paradise, "Update class names to MultiHeadLatentAttention" (about 1 month ago)
Files:
__pycache__/ (directory): "Initial commit: DeepSeek Multi-Latent Attention implementation", about 1 month ago
tests/ (directory): "Update class names to MultiHeadLatentAttention", about 1 month ago
__init__.py (393 bytes): "Update class names to MultiHeadLatentAttention", about 1 month ago
mla.py (13.3 kB): "Update class names to MultiHeadLatentAttention", about 1 month ago