Tempo14's Collections
small models
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • 2310.10837 • Published • 10

BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27

LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper • 2310.16836 • Published • 14

FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 33

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • 2310.19102 • Published • 11

Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 17

Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Paper • 2312.12682 • Published • 9

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 57

TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 91

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 71

Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 46

Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 13

Scaling Laws for Downstream Task Performance of Large Language Models
Paper • 2402.04177 • Published • 18

HARE: HumAn pRiors, a key to small language model Efficiency
Paper • 2406.11410 • Published • 39