arxiv:2309.05557

An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

Published on Sep 11, 2023
Authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, et al.

Abstract

Large language models (LLMs) can respond to human language queries and have shown powerful potential applications in network operations (NetOps). Thanks to the large amount of commonsense knowledge inherent in them, LLMs achieve much better inference accuracy than traditional models and exhibit strong abilities in generalization, reasoning, and code generation. These abilities may provide a crucial boost to automated and intelligent NetOps. However, how well LLMs perform on various NetOps tasks remains under-explored. In this work, we make a systematic assessment of the capabilities, strengths, and limitations of selected LLMs in the field of NetOps. The evaluation is conducted on a collection of 5,732 questions about NetOps, encompassing 26 publicly available general-domain LLMs, including ChatGPT, LLaMA, Falcon, etc. We also fine-tune some of these LLMs on our collected NetOps corpus and evaluate the resulting models. The evaluation method follows the widely adopted benchmarks for general-domain LLMs, combined with Chain-of-Thought prompts and Retrieval-Augmented Generation. The results show that only GPT-4 achieves accuracy high enough to pass the NetOps certification exam for humans, while all the other LLMs score much lower. However, some open models, such as LLaMA 2, still demonstrate significant potential. Furthermore, we evaluate the impact of factors such as model parameters, prompt engineering, and instruction fine-tuning. This work should be treated as an initial effort toward the systematic evaluation of LLMs in NetOps, and a more rigorous study is required before production use. The evaluation code and dataset will be released to benefit future research.
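For intuition only, here is a minimal sketch (not the authors' actual evaluation harness) of how a single-choice exam item might be scored under a Chain-of-Thought prompt. The `question` item and the `build_cot_prompt` and `grade` helpers are hypothetical:

```python
import re

# Hypothetical single-choice NetOps question; the real items live in NetEval.
question = {
    "prompt": "Which protocol resolves an IPv4 address to a MAC address?",
    "choices": {"A": "DNS", "B": "ARP", "C": "DHCP", "D": "ICMP"},
    "answer": "B",
}

def build_cot_prompt(q):
    """Format the question with a Chain-of-Thought instruction appended."""
    lines = [q["prompt"]]
    lines += [f"{key}. {text}" for key, text in q["choices"].items()]
    lines.append("Let's think step by step, then give the final answer as a single letter.")
    return "\n".join(lines)

def grade(model_output, q):
    """Take the last standalone A-D letter in the output as the model's choice."""
    picks = re.findall(r"\b([ABCD])\b", model_output)
    return bool(picks) and picks[-1] == q["answer"]

print(build_cot_prompt(question))
print(grade("ARP maps IP addresses to MAC addresses, so the answer is B", question))  # True
```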

Community

Hi there!

I'm looking for the dataset referenced in this paper:



@article{miao2023empirical,
  title={An empirical study of netops capability of pre-trained large language models},
  author={Miao, Yukai and Bai, Yu and Chen, Li and Li, Dan and Sun, Haifeng and Wang, Xizheng and Luo, Ziqiu and Ren, Yanyu and Sun, Dapeng and Xu, Xiuting and others},
  journal={arXiv preprint arXiv:2309.05557},
  year={2023}
}

The paper includes a footnote pointing to neteval-exam and states: "NetEval consists of 5,732 questions about NetOps, covering five different subdomains of NetOps". However, the questions/answers in the Hugging Face neteval-exam dataset do not add up to 5,732. Here are the row counts (404 total):

6 ./neteval-exam/dev/security.csv
6 ./neteval-exam/dev/network.csv
4 ./neteval-exam/test/security.csv
12 ./neteval-exam/test/network.csv
289 ./neteval-exam/val/security.csv
87 ./neteval-exam/val/network.csv
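For anyone who wants to double-check, here is a minimal Python sketch to reproduce these counts from a local clone (assuming the dev/test/val directory layout above; note that `csv.reader` counts records, while `wc -l` counts raw lines, so the numbers can differ by one per file if a header row is present):

```python
import csv
from pathlib import Path

# Recount the rows in a local clone of the neteval-exam dataset.
root = Path("./neteval-exam")
total = 0
for path in sorted(root.glob("*/*.csv")):
    with path.open(newline="", encoding="utf-8") as f:
        rows = sum(1 for _ in csv.reader(f))
    total += rows
    print(f"{rows:>4} {path}")
print(f"{total:>4} total")
```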

Is there another link to the complete dataset referenced in the paper? It would be an amazing contribution to the field.

