Papers
arxiv:1701.07852

An Empirical Analysis of Feature Engineering for Predictive Modeling

Published on Jan 26, 2017
Authors:

Abstract

Machine learning models, such as neural networks, decision trees, random forests, and gradient boosting machines, accept a feature vector, and provide a prediction. These models learn in a supervised fashion where we provide feature vectors mapped to the expected output. It is common practice to engineer new features from the provided feature set. Such engineered features will either augment or replace portions of the existing feature vector. These engineered features are essentially calculated fields based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different kinds of engineered features. This paper reports empirical research to demonstrate what kinds of engineered features are best suited to various machine learning model types. We provide this recommendation by generating several datasets that we designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model can synthesize the needed feature on its own. If a model can synthesize a planned feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/1701.07852 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/1701.07852 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1701.07852 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.