|
--- |
|
language: |
|
- en |
|
library_name: transformers |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
tags: |
|
- PolyCom |
|
- PolyNorm |
|
- PolyReLU |
|
--- |
|
|
|
# Introduction |
|
|
|
This repository contains the checkpoints for the ICLR 2025 paper **[Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models](https://arxiv.org/pdf/2411.03884)**.
|
In this work, we introduce **Polynomial Composition (PolyCom)** activations, a novel family of activation functions that enhance the expressiveness of large language models (LLMs) by composing polynomials with conventional activations (instantiated as PolyReLU and PolyNorm). PolyCom improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks while adding negligible computational overhead.
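As a rough illustration of the idea, the sketch below implements a PolyReLU-style activation: a learnable linear combination of powers of `ReLU(x)`. The default order and coefficient initialization here are assumptions; the exact formulation (including the PolyNorm variant and its normalization) is defined in the official source code, so treat this as a sketch rather than the reference implementation.

```python
import torch
import torch.nn as nn


class PolyReLU(nn.Module):
    """Sketch of a PolyCom-style activation (PolyReLU variant):
    a learnable linear combination of powers of ReLU(x).
    The order and coefficient initialization are assumptions."""

    def __init__(self, order: int = 3):
        super().__init__()
        self.order = order
        # One learnable coefficient per degree 0..order.
        self.coeffs = nn.Parameter(torch.full((order + 1,), 1.0 / (order + 1)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_x = torch.relu(x)
        out = self.coeffs[0] * torch.ones_like(x)       # degree-0 (constant) term
        for i in range(1, self.order + 1):
            out = out + self.coeffs[i] * relu_x.pow(i)  # degree-i term
        return out


# Example: drop-in replacement for the activation inside an MLP block.
activation = PolyReLU(order=3)
print(activation(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```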
|
|
|
# Datasets and Training |
|
|
|
We pretrain the PolyCom models on 250B tokens from the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
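If you want to inspect the pretraining corpus without downloading all of it, you can stream it with the `datasets` library. The subset name (`"arxiv"`) and the `"text"` field below are assumptions based on the dataset card; adjust them to the slices you actually need.

```python
from datasets import load_dataset

# Stream one subset of RedPajama-Data-1T instead of downloading the full corpus.
# The subset name "arxiv" and the "text" field are assumptions; check the dataset card.
stream = load_dataset(
    "togethercomputer/RedPajama-Data-1T",
    "arxiv",
    split="train",
    streaming=True,
    trust_remote_code=True,
)

for example in stream.take(2):
    print(example["text"][:200])
```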
|
|
|
|
|
# Inference |
|
|
|
Here is an example of how to use the PolyCom model for inference: |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the local path or Hugging Face repo id of the PolyCom checkpoint.
path_of_model = "path/to/PolyCom-checkpoint"

model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# With no sampling arguments, generate() performs greedy decoding.
greedy_output = model.generate(input_ids)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```
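Note that `generate()` with no extra arguments uses greedy decoding and stops after a short default length. For longer or more varied continuations, you can pass standard generation arguments, for example:

```python
# Optional: a longer, sampled continuation instead of the short greedy default.
sampled_output = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(sampled_output[0], skip_special_tokens=True))
```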
|
|
|
|
|
# Citing this work |
|
|
|
If you find this work helpful or use it in your research, please consider citing our paper: |
|
```bibtex
@inproceedings{zhuo2025polycom,
  title     = {Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author    = {Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle = {ICLR 2025},
  year      = {2025}
}
```