ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency

ComBack is a large-scale multi-platform compiler backend code dataset. This repository contains all fine-tuned models and scripts for reproducing experimental results.

Dataset Information

Details can be found at https://huggingface.co/datasets/docz-ict/ComBack

Task Example

  • Statement-Level Completion: complete the current statement.
//Inputs:
...
adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() 
//Ground Truth:
MachineInstr::FrameDestroy);
  • Next-Statement Suggestion: predict the next statement.
  //Inputs:
  ...
  maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask;
  //Ground Truth:
  MFI -> setMaxCallFrameSize(maxCallFrameSize);
  • Code Generation: generate a function from a function description in natural language.
  //Inputs:
  getPointerRegClass: Returns a TargetRegisterClass used for pointer values.
  Target-Specific Value: Sparc, SP::I64RegsRegClass, SP::IntRegsRegClass.
  //Ground Truth:
  TargetRegisterClass *SparcRegisterInfo::getPointerRegClass(MachineFunction &MF ,unsigned Kind) {
   return Subtarget.is64Bit() ? &SP::I64RegsRegClass : &SP::IntRegsRegClass;
  }

1. Dependency

  • python version == 3.8.1
  • pip install -r requirements.txt

2. Fine-Tuning

We fine-tuned six pre-trained code language models on 8 Tesla V100 GPUs (16GB each). You can fine-tune each model on our datasets by running:

# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+
# Task Options: code-generation, code-completion, new-target-completion (only for CodeT5+), new-target-generation (only for CodeT5+)
bash ./Script/Model/{Model Type}/{Task}/run_fine_tuning*.sh

3. Reproducing Results in Table 2

Dataset split scheme

Split the data of all 178 backends into train/valid/test sets in a ratio of 80%:10%:10%.

  • Dataset Info

| Task | Train | Valid | Test |
| --- | --- | --- | --- |
| Statement-Level Comp. | 128,899 (11.36M tokens) | 16,112 (1.43M tokens) | 16,113 (1.43M tokens) |
| Next-Statement Sugg. | 173,052 (15.69M tokens) | 21,631 (1.99M tokens) | 21,632 (1.98M tokens) |
| Code Generation | 36,236 (5.10M tokens) | 4,530 (0.64M tokens) | 4,530 (0.64M tokens) |
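As a rough sketch, an 80%:10%:10% split amounts to shuffling the records and slicing. The helper below is purely illustrative (hypothetical function and data; the released ComBack splits are fixed files, so use the dataset's own splits to reproduce the counts above):

```python
import random

def split_80_10_10(records, seed=0):
    """Shuffle records and slice them into train/valid/test
    in an 80%:10%:10% ratio (illustrative sketch only)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    n_valid = int(len(shuffled) * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train, valid, test = split_80_10_10(range(1000))
```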

Reproduce the results in Table 2 by running:

# Model Type Options: CodeBert, GraphCodeBert, UnixCoder, CodeT5, NatGen, CodeT5+
# Task Options: code-generation, code-completion
bash ./Script/Model/{Model Type}/{Task}/run_test.sh

Results

  • Without Fine-Tuning

| Model | Stmt. Comp. EM | Stmt. Comp. ED | Next. Sugg. EM | Next. Sugg. ED | Code Gen. BLEU4 | Code Gen. ED |
| --- | --- | --- | --- | --- | --- | --- |
| CodeBert-c | 0.00 | 0.97 | 0.00 | 1.31 | 0.00 | 0.44 |
| GraphCodeBert-c | 0.00 | 0.35 | 0.00 | 0.54 | 0.00 | 2.41 |
| UnixCoder-base-nine | 0.07 | 27.56 | 15.93 | 29.11 | 0.00 | 31.81 |
| CodeT5-base | 0.65 | 21.45 | 7.23 | 23.50 | 0.00 | 13.57 |
| NatGen | 0.00 | 13.52 | 0.02 | 15.95 | 0.01 | 28.76 |
| CodeT5+-220m | 0.02 | 7.24 | 0.12 | 9.87 | 0.00 | 12.33 |
  • Fine-Tuned

| Model | Stmt. Comp. EM | Stmt. Comp. ED | Next. Sugg. EM | Next. Sugg. ED | Code Gen. BLEU4 | Code Gen. ED |
| --- | --- | --- | --- | --- | --- | --- |
| CodeBert-c | 53.84 | 77.44 | 52.67 | 70.82 | 23.54 | 54.63 |
| GraphCodeBert-c | 43.00 | 71.89 | 47.10 | 61.31 | 20.73 | 48.83 |
| UnixCoder-base-nine | 67.84 | 85.06 | 58.51 | 75.31 | 56.24 | 73.45 |
| CodeT5-base | 66.38 | 84.34 | 58.52 | 76.03 | 70.87 | 80.45 |
| NatGen | 67.47 | 84.83 | 60.30 | 76.84 | 71.73 | 81.39 |
| CodeT5+-220m | 66.93 | 84.45 | 59.57 | 76.41 | 75.28 | 82.95 |
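The tables report EM (exact match), ED (edit-distance similarity), and BLEU4. A minimal sketch of how EM and ED are commonly computed in code-completion work (the paper's exact normalization may differ, so treat this as an assumption rather than the evaluation script):

```python
def exact_match(pred: str, ref: str) -> float:
    """EM: 1.0 if prediction equals the reference after whitespace
    normalization, else 0.0 (normalization details are an assumption)."""
    return float(" ".join(pred.split()) == " ".join(ref.split()))

def edit_sim(pred: str, ref: str) -> float:
    """ED: character-level edit-distance similarity in [0, 100],
    100 * (1 - levenshtein(pred, ref) / max(len(pred), len(ref)))."""
    m, n = len(pred), len(ref)
    if max(m, n) == 0:
        return 100.0
    prev = list(range(n + 1))  # DP row for the empty prefix of pred
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return 100.0 * (1 - prev[n] / max(m, n))
```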

4. Reproducing Results in Table 3

Dataset split scheme

Use the RISC-V, ARC, and NVPTX data from both GCC and LLVM as the test set; split the remaining CPU, MPU, and GPU targets into train/valid sets in a ratio of 85%:15%, excluding RI5CY (RI5CY is customized based on RISC-V).

  • Dataset Info

| Task | Train | Valid | Test |
| --- | --- | --- | --- |
| Statement-Level Comp. | 114,016 (10.20M tokens) | 20,121 (1.81M tokens) | 6,645 (0.58M tokens) |
| Next-Statement Sugg. | 152,114 (14.10M tokens) | 26,844 (2.49M tokens) | 9,313 (0.83M tokens) |
| Code Generation | 30,633 (4.44M tokens) | 5,406 (0.79M tokens) | 2,819 (0.37M tokens) |

Input examples for ChatGPT-3.5-Turbo and Code-LLaMA-34B-Instruct

Statement-Level Completion

//Prompt: Complete the last statement of this code snippet:
...
adjustReg(MBB,LastFrameDestroy, DL, SPReg, FPReg, -StackSize+RVFI->getVarArgsSaveSize() 

Next-Statement Suggestion

//Prompt: Predict the next statement of this code snippet:
...
maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask;

Code Generation

//Prompt: Create a function named "getPointerRegClass" for "Sparc" backend of LLVM Compiler. 
//The description of this function is "Returns a TargetRegisterClass used for pointer values". 
//It contains "Sparc", "SP::I64RegsRegClass", "SP::IntRegsRegClass" as target specific values.
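The prompts above follow fixed templates. A hypothetical helper for assembling the code-generation prompt (the repository's own scripts build the actual prompts; this function's name and signature are illustrative):

```python
def build_codegen_prompt(name, target, compiler, description, values):
    """Assemble a code-generation prompt in the style shown above
    (hypothetical helper, not the repository's prompt builder)."""
    quoted = ", ".join(f'"{v}"' for v in values)
    return (
        f'Create a function named "{name}" for "{target}" backend of '
        f"{compiler} Compiler. "
        f'The description of this function is "{description}". '
        f"It contains {quoted} as target specific values."
    )

prompt = build_codegen_prompt(
    "getPointerRegClass", "Sparc", "LLVM",
    "Returns a TargetRegisterClass used for pointer values",
    ["Sparc", "SP::I64RegsRegClass", "SP::IntRegsRegClass"],
)
```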

Reproduce the results in Table 3 by running:

# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh

# ChatGPT
bash ./Script/Exp_Script/ChatGPT/run_chatgpt.sh

# Code-LLaMA
bash ./Script/Exp_Script/ChatGPT/run_codellama.sh

Results

  • GCC
| Model | Stmt. Comp. RISC-V (EM / ED) | Stmt. Comp. ARC (EM / ED) | Stmt. Comp. NVPTX (EM / ED) | Next. Sugg. RISC-V (EM / ED) | Next. Sugg. ARC (EM / ED) | Next. Sugg. NVPTX (EM / ED) | Code Gen. RISC-V (BLEU4 / ED) | Code Gen. ARC (BLEU4 / ED) | Code Gen. NVPTX (BLEU4 / ED) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChatGPT-3.5-Turbo | 10.34 / 38.41 | 15.35 / 42.94 | 12.01 / 41.47 | 6.44 / 12.9 | 9.75 / 20.79 | 7.97 / 17.79 | 1.37 / 24.12 | 1.67 / 28.26 | 1.57 / 26.97 |
| Code-LLaMA-34B | 0.41 / 19.07 | 0.85 / 16.77 | 0.56 / 18.22 | 1.58 / 13.54 | 2.66 / 17.95 | 2.47 / 16.59 | 1.67 / 27.89 | 1.71 / 30.49 | 1.57 / 27.65 |
| CodeT5+-220m | 51.16 / 75.32 | 52.45 / 74.57 | 50.56 / 75.52 | 49.11 / 67.84 | 38.26 / 59.21 | 38.33 / 56.31 | 32.56 / 58.67 | 19.94 / 50.27 | 25.47 / 52.60 |
  • LLVM
| Model | Stmt. Comp. RISC-V (EM / ED) | Stmt. Comp. ARC (EM / ED) | Stmt. Comp. NVPTX (EM / ED) | Next. Sugg. RISC-V (EM / ED) | Next. Sugg. ARC (EM / ED) | Next. Sugg. NVPTX (EM / ED) | Code Gen. RISC-V (BLEU4 / ED) | Code Gen. ARC (BLEU4 / ED) | Code Gen. NVPTX (BLEU4 / ED) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChatGPT-3.5-Turbo | 12.08 / 41.39 | 16.77 / 42.02 | 14.73 / 43.72 | 9.80 / 21.86 | 10.81 / 20.66 | 11.39 / 22.82 | 1.23 / 25.12 | 1.30 / 27.19 | 1.43 / 25.45 |
| Code-LLaMA-34B | 0.45 / 17.61 | 0.61 / 17.21 | 0.99 / 17.23 | 1.75 / 15.04 | 0.42 / 11.27 | 2.42 / 16.25 | 1.43 / 27.24 | 1.61 / 32.12 | 1.59 / 28.08 |
| CodeT5+-220m | 62.68 / 82.02 | 71.34 / 85.98 | 64.45 / 81.53 | 48.71 / 68.95 | 58.68 / 74.57 | 47.81 / 65.5 | 50.34 / 72.98 | 55.38 / 74.41 | 44.33 / 66.36 |

5. Reproducing Results in Figure 6

Reproduce the results in Figure 6 by running:

# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_existing_type.sh

# Fork-Flow
bash ./Script/Exp_Script/ForkFlow/run_forkflow.sh

Results

  • GCC
| Method | RISC-V (BLEU4 / ED) | ARC (BLEU4 / ED) | NVPTX (BLEU4 / ED) |
| --- | --- | --- | --- |
| ForkFlow Avg | 3.48 / 5.79 | 1.77 / 3.73 | 4.7 / 3.81 |
| ForkFlow Max | 28.77 / 34.8 | 4.94 / 8.85 | 4.7 / 3.81 |
| CodeT5+ | 32.56 / 58.67 | 25.47 / 52.6 | 19.94 / 50.27 |
  • LLVM
| Method | RISC-V (BLEU4 / ED) | ARC (BLEU4 / ED) | NVPTX (BLEU4 / ED) |
| --- | --- | --- | --- |
| ForkFlow Avg | 12.45 / 22.18 | 19.98 / 33.43 | 15.06 / 28.73 |
| ForkFlow Max | 27.32 / 46.47 | 41.8 / 60.62 | 18.81 / 39.04 |
| CodeT5+ | 50.34 / 72.98 | 55.38 / 74.41 | 44.33 / 66.36 |

6. Reproducing Results in Table 4

Dataset split scheme

Use the ARC and NVPTX data from both GCC and LLVM as the test set; split the CPU targets (excluding RISC-V and RI5CY) into train/valid sets in a ratio of 85%:15%.

  • Dataset Info

| Task | Train | Valid | Test |
| --- | --- | --- | --- |
| Statement-Level Comp. | 87,018 (7.78M tokens) | 15,357 (1.37M tokens) | 2,764 (0.26M tokens) |
| Next-Statement Sugg. | 113,684 (10.65M tokens) | 20,063 (1.87M tokens) | 4,029 (0.38M tokens) |
| Code Generation | 21,184 (3.14M tokens) | 3,739 (0.55M tokens) | 1,372 (0.18M tokens) |

Reproduce the results in Table 4 by running:

# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_new_type.sh

Results

  • GCC

| Dataset | Stmt. Comp. ARC (MPU) (EM / ED) | Stmt. Comp. NVPTX (GPU) (EM / ED) | Next. Sugg. ARC (MPU) (EM / ED) | Next. Sugg. NVPTX (GPU) (EM / ED) | Code Gen. ARC (MPU) (BLEU4 / ED) | Code Gen. NVPTX (GPU) (BLEU4 / ED) |
| --- | --- | --- | --- | --- | --- | --- |
| -w GPU and MPU | 52.45 / 74.57 | 50.56 / 75.52 | 38.26 / 59.21 | 38.33 / 56.31 | 19.94 / 50.27 | 25.47 / 52.6 |
| -w/o GPU and MPU | 50.53 / 74.09 | 46.37 / 72.45 | 37.22 / 58.21 | 38.33 / 56.83 | 19.29 / 49.12 | 22.46 / 50.33 |
| Decrease | 1.92 / 0.48 | 4.19 / 3.07 | 1.04 / 1.00 | 0.00 / -0.52 | 0.65 / 1.15 | 3.01 / 3.37 |
  • LLVM

| Dataset | Stmt. Comp. ARC (MPU) (EM / ED) | Stmt. Comp. NVPTX (GPU) (EM / ED) | Next. Sugg. ARC (MPU) (EM / ED) | Next. Sugg. NVPTX (GPU) (EM / ED) | Code Gen. ARC (MPU) (BLEU4 / ED) | Code Gen. NVPTX (GPU) (BLEU4 / ED) |
| --- | --- | --- | --- | --- | --- | --- |
| -w GPU and MPU | 71.34 / 85.98 | 64.45 / 81.53 | 58.68 / 74.57 | 47.81 / 65.50 | 55.38 / 74.41 | 44.33 / 66.36 |
| -w/o GPU and MPU | 69.82 / 85.59 | 60.04 / 79.85 | 58.26 / 73.75 | 46.28 / 63.92 | 49.62 / 70.26 | 42.94 / 65.43 |
| Decrease | 1.52 / 0.39 | 4.41 / 1.68 | 0.42 / 0.82 | 1.53 / 1.58 | 5.76 / 4.15 | 1.39 / 0.93 |

7. Reproducing Results in Table 5

Dataset split scheme

Use the RI5CY data in LLVM as the test set; split the CPU targets into train/valid sets in a ratio of 85%:15%, once excluding RISC-V and once including RISC-V.

  • Dataset Info

    • Excluding RISC-V

| Task | Train | Valid | Test |
| --- | --- | --- | --- |
| Statement-Level Comp. | 87,018 (7.78M tokens) | 15,357 (1.37M tokens) | 721 (0.04M tokens) |
| Next-Statement Sugg. | 113,684 (10.65M tokens) | 20,063 (1.87M tokens) | 1,035 (0.06M tokens) |
| Code Generation | 21,184 (3.14M tokens) | 3,739 (0.55M tokens) | 219 (0.02M tokens) |

    • Including RISC-V

| Task | Train | Valid | Test |
| --- | --- | --- | --- |
| Statement-Level Comp. | 90,316 (8.06M tokens) | 15,940 (1.42M tokens) | 721 (0.04M tokens) |
| Next-Statement Sugg. | 118,175 (11.04M tokens) | 20,856 (1.94M tokens) | 1,035 (0.06M tokens) |
| Code Generation | 22,413 (3.30M tokens) | 3,957 (0.58M tokens) | 219 (0.02M tokens) |

Reproduce the results in Table 5 by running:

# Task Options: new-target-completion, new-target-generation
bash ./Script/Model/CodeT5+/{Task}/run_test_itr_exp.sh

Results

| Dataset | Stmt. Comp. EM | Stmt. Comp. ED | Next. Sugg. EM | Next. Sugg. ED | Code Gen. BLEU4 | Code Gen. ED |
| --- | --- | --- | --- | --- | --- | --- |
| -w/o RISC-V | 66.16 | 83.79 | 57.29 | 74.73 | 54.41 | 75.41 |
| -w RISC-V | 74.06 | 87.91 | 67.25 | 81.28 | 79.46 | 89.92 |
| Diff | 7.90 | 4.12 | 9.96 | 6.55 | 25.05 | 14.51 |

Citation

@inproceedings{zhong2024comback,
  title={ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency},
  author={Ming Zhong and Fang Lyu and Lulin Wang and Hongna Geng and Lei Qiu and Huimin Cui and Xiaobing Feng},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}