---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- PolyCom
- PolyNorm
- PolyReLU
---
# Introduction
This repository contains the checkpoints of the ICLR 2025 paper **[Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models](https://arxiv.org/pdf/2411.03884)**.
In this work, we introduce a novel activation function, **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks without adding significant computational overhead.
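For intuition only, here is a minimal PyTorch sketch of a PolyReLU-style activation, i.e. a learnable polynomial applied to ReLU outputs. The class name, default order, and coefficient initialization below are illustrative assumptions; the exact formulation used in the released checkpoints is defined in the official source code.

```python
import torch
import torch.nn as nn

class PolyReLU(nn.Module):
    """Illustrative PolyReLU-style activation: sum_i a_i * ReLU(x)**i with learnable a_i.

    This is a sketch for intuition, not the reference implementation.
    """

    def __init__(self, order: int = 3):
        super().__init__()
        # One learnable coefficient per polynomial degree (0..order); uniform init is an assumption.
        self.coeffs = nn.Parameter(torch.ones(order + 1) / (order + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_x = torch.relu(x)
        out = torch.zeros_like(x)
        for i, a in enumerate(self.coeffs):
            out = out + a * relu_x**i
        return out

# Example: drop-in replacement for an MLP activation.
act = PolyReLU(order=3)
print(act(torch.randn(2, 4)))
```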
# Datasets and Training
We pretrain the PolyCom models on 250B tokens from the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
# Inference
Here is an example of how to use the PolyCom model for inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path_of_model = "path/to/PolyCom-checkpoint"  # local path or Hub ID of this checkpoint

model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

greedy_output = model.generate(input_ids, max_new_tokens=32)  # greedy decoding by default
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```
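The call above uses greedy decoding for a short continuation. Standard `transformers` generation arguments such as `max_new_tokens`, `do_sample`, `temperature`, and `top_p` can be passed for longer or sampled outputs, for example:

```python
# Sampled generation with a longer continuation (standard transformers generate arguments).
sampled_output = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(sampled_output[0], skip_special_tokens=True))
```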
# Citing this work
If you find this work helpful or use it in your research, please consider citing our paper:
```bibtex
@inproceedings{zhuo2025polycom,
  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```