Model Card: Resume Classification Using BERT

Model Overview

This model is a fine-tuned version of bert-base-uncased designed for multiclass classification. It categorizes resumes into one of 24 predefined job categories, making it suitable for automated resume screening and classification tasks.

Dataset

The dataset used for fine-tuning consists of more than 2,400 resumes, provided as plain-text strings and as PDF files. Each resume is labeled with one of 24 job categories. The dataset is available on Kaggle: https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data

Classes:
['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']

The dataset underwent significant preprocessing to remove noise and improve text quality for tokenization.
Preprocessing steps include:

Removal of HTML tags, URLs, punctuation, unicode characters, escape sequences, stop words, and irrelevant white spaces.
All of these functions are available in preprocessing.py.
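The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the contents of preprocessing.py: the function name clean_resume_text, the tiny STOP_WORDS set, and the order of the steps are all assumptions made for this example.

```python
import html
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the real preprocessing.py may use a different (larger) list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume_text(text: str) -> str:
    """Hypothetical helper that mirrors the cleaning steps listed above."""
    text = html.unescape(text)                            # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)                  # strip HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)     # strip URLs
    text = text.replace("\\n", " ").replace("\\t", " ")   # literal escape sequences left by extraction
    text = text.encode("ascii", "ignore").decode()        # drop non-ASCII (unicode) characters
    text = text.translate(_PUNCT_TABLE)                   # replace punctuation with spaces
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)                               # split/join also collapses extra whitespace


print(clean_resume_text("<p>Senior Accountant at ACME &amp; Co.</p> See https://example.com for details!"))
```

Lowercasing is included here because bert-base-uncased lowercases its input anyway; whether the original functions do the same is not stated in this card.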

Model Configuration

Base Model: bert-base-uncased
Fine-tuning Task: Multiclass classification (24 classes)
Preprocessing Summary: The preprocessing steps applied to the training data are wrapped in preprocess_function, so the same cleaning can be applied consistently at inference time.
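Model Output: The raw output consists of one logit per class. To map the logits to scores between 0 and 1, apply the sigmoid activation function, for example with torch.nn.Sigmoid(). Sigmoid is applied to each class independently, so the resulting scores do not sum to 1 across the 24 classes.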
Postprocessing: A postprocessing utility, postprocess_function, converts the raw logits into the corresponding class names as text for easier interpretation.
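To make the logits-to-label step concrete, here is a small sketch in plain Python. It is not the card's postprocess_function: the name logits_to_labels is hypothetical, and the example assumes that class index i corresponds to the i-th entry of the Classes list in the order printed in this card. The authoritative mapping is whatever the training code and the model's configuration define.

```python
from typing import List, Sequence

# Class names copied from this card. ASSUMPTION: the model's output index i
# corresponds to CLASSES[i] in this order; verify against the training code.
CLASSES = [
    "ACCOUNTANT", "ADVOCATE", "AGRICULTURE", "APPAREL", "ARTS", "AUTOMOBILE",
    "AVIATION", "BANKING", "BPO", "BUSINESS-DEVELOPMENT", "CHEF", "CONSTRUCTION",
    "CONSULTANT", "DESIGNER", "DIGITAL-MEDIA", "ENGINEERING", "FINANCE", "FITNESS",
    "HEALTHCARE", "HR", "INFORMATION-TECHNOLOGY", "PUBLIC-RELATIONS", "SALES", "TEACHER",
]


def logits_to_labels(batch_logits: Sequence[Sequence[float]]) -> List[str]:
    """Hypothetical helper: pick the class with the largest logit in each row."""
    labels = []
    for row in batch_logits:
        if len(row) != len(CLASSES):
            raise ValueError(f"expected {len(CLASSES)} logits, got {len(row)}")
        best_index = max(range(len(row)), key=lambda i: row[i])  # argmax over the row
        labels.append(CLASSES[best_index])
    return labels


# Usage with made-up logits: one resume, strongest score at index 20.
example = [0.0] * len(CLASSES)
example[20] = 3.0
print(logits_to_labels([example]))
```

With a PyTorch tensor of logits, calling .tolist() on it first would give the nested-list input this sketch expects.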

Training Details

The fine-tuning process involved:

Input tokenization using bert-base-uncased tokenizer.
Feeding preprocessed text into the BERT model for contextual understanding.
Output logits normalized using the sigmoid activation function to produce probabilities for each class.
The complete training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification

Model Output

The model provides raw output logits for each job category. These logits can be converted into probabilities using:

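import torch
import torch.nn as nn

# logits: tensor of shape (batch_size, 24) returned by the model
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)
predicted_index = torch.argmax(probs, dim=-1)  # index of the highest-scoring class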

The highest probability corresponds to the predicted job category.
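As a side note on interpreting these values, the short sketch below compares sigmoid with softmax on made-up logits, in plain Python so it runs without PyTorch. Both functions are monotonic in each logit, so they select the same top class; the difference is that softmax scores sum to 1 across classes, while sigmoid scores generally do not. The logit values and helper names are illustrative assumptions, and softmax is shown only for comparison, not as the method used to train this model.

```python
import math
from typing import List


def sigmoid_scores(logits: List[float]) -> List[float]:
    """Apply the sigmoid to each logit independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def softmax_scores(logits: List[float]) -> List[float]:
    """Normalize logits into a distribution that sums to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


logits = [2.0, -1.0, 0.5]                     # made-up logits for three classes
sig = sigmoid_scores(logits)
soft = softmax_scores(logits)

print("same predicted class:", argmax(sig) == argmax(soft))
print("sigmoid sum:", round(sum(sig), 3), "| softmax sum:", round(sum(soft), 3))
```

In practice this means the predicted category is unaffected by the choice, but a sigmoid score should be read as a per-class confidence rather than as a share of a probability distribution over all 24 categories.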

Use Cases

Automated resume classification for HR platforms.
Sorting resumes into industry-specific categories for targeted hiring processes.
Candidate profiling and analysis for recruitment agencies.

Limitations

Model performance depends on the quality and diversity of the dataset. Biases in the dataset may affect predictions.
Preprocessing removes non-textual elements, which might strip out context-critical features.
PDFs with poor formatting or heavy graphical content may not preprocess effectively.

Citation

If you use this model in your work, please cite:
"Resume Classification Model using BERT for Multiclass Job Categorization."

Naandhu/bert-resume-classifier