---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- PolyCom
- PolyNorm
- PolyReLU
---
# Introduction
This repository contains the checkpoints of the ICLR 2025 paper **[Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models](https://arxiv.org/pdf/2411.03884)**.
In this work, we introduce a novel activation function, **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks without adding significant computational overhead.
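For intuition only, here is a minimal PyTorch sketch of a PolyReLU-style activation, i.e. a learnable polynomial applied to ReLU outputs. The class name, default order, and coefficient initialization below are illustrative assumptions; the exact formulation used in the released checkpoints is defined in the official source code.

```python
import torch
import torch.nn as nn

class PolyReLU(nn.Module):
    """Illustrative PolyReLU-style activation: sum_i a_i * ReLU(x)**i with learnable a_i.

    This is a sketch for intuition, not the reference implementation.
    """

    def __init__(self, order: int = 3):
        super().__init__()
        # One learnable coefficient per polynomial degree (0..order); uniform init is an assumption.
        self.coeffs = nn.Parameter(torch.ones(order + 1) / (order + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_x = torch.relu(x)
        out = torch.zeros_like(x)
        for i, a in enumerate(self.coeffs):
            out = out + a * relu_x**i
        return out

# Example: drop-in replacement for an MLP activation.
act = PolyReLU(order=3)
print(act(torch.randn(2, 4)))
```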
# Datasets and Training
We pretrain the PolyCom models on 250B tokens from the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
# Inference
Here is an example of how to use the PolyCom model for inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path_of_model = "path/to/PolyCom-checkpoint"  # local path or Hub ID of this checkpoint

model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

greedy_output = model.generate(input_ids, max_new_tokens=32)  # greedy decoding by default
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```
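The call above uses greedy decoding for a short continuation. Standard `transformers` generation arguments such as `max_new_tokens`, `do_sample`, `temperature`, and `top_p` can be passed for longer or sampled outputs, for example:

```python
# Sampled generation with a longer continuation (standard transformers generate arguments).
sampled_output = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(sampled_output[0], skip_special_tokens=True))
```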
# Citing this work
If you find this work helpful or use it in your research, please consider citing our paper:
```bibtex
@inproceedings{zhuo2025polycom,
  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```