BinT5

BinT5 is a binary code summarization model. It is based on CodeT5 and fine-tuned on Capybara.

We offer five variants of the model:

| Name | Training Data |
|---|---|
| BinT5-C | C Source |
| BinT5-Decom | Decompiled C Binaries |
| BinT5-Stripped | Stripped Decompiled C Binaries |
| BinT5-Demi | Demi-stripped Decompiled C Binaries |
| BinT5-NoFunName | Decompiled C Binaries with the Function Name removed |
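
The following is a minimal sketch of how one of these variants might be loaded and used for summarization with the Hugging Face `transformers` library. It rests on several assumptions: the checkpoints load through `AutoTokenizer` and `AutoModelForSeq2SeqLM` (plausible for a CodeT5-based model, but not confirmed here); only the repository ID `AISE-TUDelft/BinT5-Decom` is known, and the other IDs are assumed to follow the same naming pattern; the input-kind keys, the helper names `repo_id` and `summarize`, and the generation settings are all hypothetical choices for illustration.

```python
# Input kind -> variant name, taken from the table above.
# The keys are hypothetical labels chosen for this sketch.
VARIANTS = {
    "source": "BinT5-C",
    "decompiled": "BinT5-Decom",
    "stripped": "BinT5-Stripped",
    "demi-stripped": "BinT5-Demi",
    "no-function-name": "BinT5-NoFunName",
}

# Assumption: every variant lives under this organisation.
# Only "AISE-TUDelft/BinT5-Decom" is a confirmed repository ID.
ORGANISATION = "AISE-TUDelft"


def repo_id(kind: str) -> str:
    """Return the (assumed) Hub repository ID for a given input kind."""
    if kind not in VARIANTS:
        raise ValueError(f"unknown input kind: {kind!r}")
    return f"{ORGANISATION}/{VARIANTS[kind]}"


def summarize(code: str, kind: str = "decompiled", max_new_tokens: int = 64) -> str:
    """Generate a natural-language summary for a C function.

    Requires `transformers` and `torch`; downloads the checkpoint on first use.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = repo_id(kind)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate long functions; 512 tokens is an assumed input limit.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(decompiled_function_text)` would then return a one-line description of the function, provided the checkpoint loads as assumed. Matching the variant to the kind of input (for example `kind="stripped"` for stripped binaries) is likely to matter, since each variant was fine-tuned on a different form of the code.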

Citation Information

@inproceedings{alkaswan2023extending,
  title={Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries},
  author={Al-Kaswan, Ali and Ahmed, Toufique and Izadi, Maliheh and Sawant, Anand Ashok and Devanbu, Premkumar and van Deursen, Arie},
  booktitle={2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={260--271},
  year={2023},
  organization={IEEE}
}
