Model Card: Resume Classification Using BERT

Model Overview

This model is a fine-tuned version of bert-base-uncased designed for multiclass classification. It categorizes resumes into one of 24 predefined job categories, making it suitable for automated resume screening and classification tasks.


Dataset

The dataset used for fine-tuning consists of more than 2,400 resumes in string and PDF formats, each labeled with one of 24 job categories. The dataset is available at https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data

  • Classes:
    ['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']

The dataset was preprocessed to remove noise and improve text quality before tokenization.
Preprocessing steps include:

  • Removal of HTML tags, URLs, punctuation, Unicode characters, escape sequences, stop words, and extra whitespace.
  • All of these functions are available in preprocessing.py.
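
The cleaning steps listed above can be sketched with the standard library alone. This is an illustrative sketch, not the code in preprocessing.py: the function name clean_resume_text, the tiny stop-word list, and the reading of "Unicode characters" as "non-ASCII characters" are all assumptions made for the example.

```python
import html
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would
# likely use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "with", "for", "on"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume_text(text: str) -> str:
    """Illustrative cleaner; NOT the actual code from preprocessing.py."""
    text = html.unescape(text)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)                # strip HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # strip URLs
    text = re.sub(r"\\[nrt]", " ", text)                # literal escape sequences like "\n"
    text = text.encode("ascii", "ignore").decode()      # assumption: drop non-ASCII characters
    text = text.translate(PUNCT_TABLE)                  # punctuation becomes spaces
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)                             # split/join collapses extra whitespace


raw = "<p>Managed the   accounts &amp; payroll.\\nSee https://example.com/cv for details</p>"
print(clean_resume_text(raw))  # managed accounts payroll see details
```

The order matters in this sketch: tags and URLs are removed before punctuation, because stripping punctuation first would break the patterns that recognize them.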

Model Configuration

  • Base Model: bert-base-uncased

  • Fine-tuning Task: Multiclass classification (24 classes)

  • Preprocessing Summary: The preprocessing steps applied to the training data have been encapsulated in the preprocess_function to simplify and standardize usage.

  • Model Output: The raw output consists of one logit per class. Applying torch.nn.Sigmoid() maps each logit to an independent score between 0 and 1. These scores do not sum to 1, so use torch.nn.Softmax(dim=-1) instead if a normalized probability distribution over the 24 classes is needed.

  • Postprocessing: A postprocessing utility, included as the postprocess_function, converts the raw logits into the corresponding predicted class names for easier interpretation.
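
To illustrate the postprocessing step, here is a minimal sketch of mapping a vector of 24 logits to a class name. The helper logits_to_label is a hypothetical stand-in, not the card's postprocess_function, and it assumes that logit index i corresponds to the i-th entry of the class list in the Dataset section. That ordering is not confirmed by the card and should be checked against the model's own label mapping.

```python
# Class names copied from the Dataset section.
# ASSUMPTION: logit index i corresponds to CLASSES[i]; verify this against
# the model's label mapping before relying on it.
CLASSES = [
    "ACCOUNTANT", "ADVOCATE", "AGRICULTURE", "APPAREL", "ARTS", "AUTOMOBILE",
    "AVIATION", "BANKING", "BPO", "BUSINESS-DEVELOPMENT", "CHEF",
    "CONSTRUCTION", "CONSULTANT", "DESIGNER", "DIGITAL-MEDIA", "ENGINEERING",
    "FINANCE", "FITNESS", "HEALTHCARE", "HR", "INFORMATION-TECHNOLOGY",
    "PUBLIC-RELATIONS", "SALES", "TEACHER",
]


def logits_to_label(logits):
    """Hypothetical helper: return the name of the highest-scoring class."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best_index]


# Made-up logits where the HR class scores highest.
example_logits = [0.0] * len(CLASSES)
example_logits[CLASSES.index("HR")] = 3.2
print(logits_to_label(example_logits))  # HR
```

No activation function is needed for this step, since the largest logit is also the largest score after sigmoid or softmax.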


Training Details

The fine-tuning process involved:


Model Output

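The model returns one raw logit per job category. These logits can be mapped to per-class scores between 0 and 1 using:

import torch
import torch.nn as nn

logits = torch.randn(1, 24)  # placeholder; in practice, the model's output of shape (batch_size, 24)
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)  # independent per-class scores in (0, 1)
predicted_index = probs.argmax(dim=-1)  # index of the highest-scoring class

The class with the highest score is the predicted job category. Because sigmoid scores each class independently, the scores do not sum to 1.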
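
The conversion above uses sigmoid, which scores each class on its own. For a single-label task like this one, softmax is the usual way to get a distribution that sums to 1. The sketch below compares the two on made-up logits. It is written without dependencies so it runs without PyTorch (torch.softmax(logits, dim=-1) is the PyTorch equivalent), and the helper names are illustrative only.

```python
import math


def sigmoid(logits):
    """Element-wise sigmoid: each score is computed independently."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def softmax(logits):
    """Softmax: scores are normalized across classes and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, -1.0, 0.5]  # made-up logits for three classes
sig = sigmoid(logits)
soft = softmax(logits)

# Both functions preserve the ordering of the logits, so the predicted
# class is the same either way; only the scale of the scores differs.
print(argmax(sig) == argmax(soft))  # True
print(sum(sig) > 1.0)               # True: sigmoid scores are not a distribution
```

The choice affects how the scores are read, for example when applying a confidence threshold, but not which class is predicted.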


Use Cases

  • Automated resume classification for HR platforms.
  • Sorting resumes into industry-specific categories for targeted hiring processes.
  • Candidate profiling and analysis for recruitment agencies.

Limitations

  • Model performance is reliant on the quality and diversity of the dataset. Biases in the dataset may affect predictions.
  • Preprocessing removes non-textual elements, which might strip out context-critical features.
  • PDFs with poor formatting or heavy graphical content may not preprocess effectively.

Citation

If you use this model in your work, please cite:
"Resume Classification Model using BERT for Multiclass Job Categorization."
