|
--- |
|
library_name: transformers |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- ml6team/keyphrase-extraction-kbir-kpcrowd |
|
- bloomberg/KBIR |
|
--- |
|
|
|
# Model Card: Keyphrase Extraction from Job Descriptions
|
|
|
Resume customization is a key step in the job search. A resume tailored to a specific job description can dramatically increase the chances of proceeding to the next stage of the application process.
|
Recruiters use sophisticated Applicant Tracking Systems (ATS) that rank resumes by how closely they match the job description in question, and may even reject outright any resume whose match falls below a certain threshold.
|
How, then, does one go about customizing a resume? We at Veersynd Bessa, a freelance project group at Humber Polytechnic in Toronto, asked the same question and found that one of the best (and most commonly recommended) approaches is to pick out keywords from the job description itself and work them into the resume.
|
|
|
One can, of course, pick the keywords out of each job description by hand and repeat the process for every application. This extractor, however, saves considerable time and effort that can be spent productively on other tasks.
|
Our project team has gone one step further and built a resume customizer that uses a Large Language Model (LLM) to weave the identified keywords into resume bullet points in the SAR (Story, Action, Result) format. The GitHub repository is available [here](https://github.com/arya19/veersynd-bessa).
|
Either way, the first step remains the same: identifying meaningful keywords in the job description.
|
|
|
We turned to artificial intelligence for the task of extracting keywords. Information theory tells us that specific semantic patterns in a body of text are what make the text as a whole meaningful.

It follows that specific semantic patterns also make some words in a text more important and relevant than others: these are the keyphrases. A deep learning model can learn these patterns and use them to predict the keyphrases in any given body of text.

The task could also be approached statistically (using measures such as entropy and term frequency), but in our opinion a deep learning model delivers much better results: its bottom-up approach learns patterns directly from data, overcoming the limitations of hand-crafted formulas.

The present model has thus been trained to identify keywords in job descriptions, which can then be used to populate a resume either manually or through an automation tool such as ours.
|
|
|
### Model Description |
|
|
|
The present model is based on the [KBIR](https://huggingface.co/bloomberg/KBIR) model, which [ml6team](https://huggingface.co/ml6team) fine-tuned on the [KPCrowd dataset](https://huggingface.co/datasets/midas/kpcrowd) to extract keyphrases from news articles.
|
You can read more about their work [here](https://huggingface.co/ml6team/keyphrase-extraction-kbir-kpcrowd). |
|
The present model is fine-tuned on a custom dataset painstakingly created by our team in our free time. At the time of writing, the dataset is still under development and will be released publicly once it reaches a higher quality.
|
|
|
Like the [KBIR-KPCrowd](https://huggingface.co/ml6team/keyphrase-extraction-kbir-kpcrowd) model, the present model is a transformer fine-tuned on a token classification task in which each word in the document is classified as being part of a keyphrase or not:
|
|
|
| Label | Description | |
|
| ----- | ------------------------------- | |
|
| B-KEY | At the beginning of a keyphrase | |
|
| I-KEY | Inside a keyphrase | |
|
| O | Outside a keyphrase | |
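
For example, if "machine learning" were the only keyphrase in the sentence "Experience with machine learning required", the words would be tagged `O`, `O`, `B-KEY`, `I-KEY`, `O`.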
|
|
|
|
|
## Uses |
|
|
|
The model is intended for token classification (in the style of Named Entity Recognition) on job descriptions written in English _**only**_.
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
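The snippet below is a minimal sketch using the Hugging Face `transformers` token-classification pipeline. The repository id `your-namespace/job-keyphrase-extractor` and the sample job description are placeholders, not this model's actual id; the `"simple"` aggregation strategy merges consecutive `B-KEY`/`I-KEY` tokens back into whole keyphrases.

```python
from transformers import pipeline

# Placeholder id -- replace with this model's actual Hugging Face repo id.
MODEL_ID = "your-namespace/job-keyphrase-extractor"

# "simple" aggregation merges consecutive B-KEY/I-KEY tokens into whole
# keyphrases instead of returning one prediction per sub-word token.
extractor = pipeline(
    "token-classification",
    model=MODEL_ID,
    aggregation_strategy="simple",
)

job_description = (
    "We are looking for a data engineer with hands-on experience in "
    "Python, Apache Spark, and cloud data warehouses such as Snowflake."
)

results = extractor(job_description)

# Deduplicate the extracted phrases while preserving their order.
keyphrases = list(dict.fromkeys(r["word"].strip() for r in results))
print(keyphrases)
```

Each entry in `results` also carries a confidence `score` and character offsets (`start`, `end`), which can be used to highlight the extracted keyphrases in the original job description.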
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
[More Information Needed] |
|
|
|
### Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
#### Preprocessing [optional] |
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> |
|
|
|
#### Speeds, Sizes, Times [optional] |
|
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
|
|
|
[More Information Needed] |
|
|
|
## Evaluation |
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Factors |
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
#### Summary |
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
[More Information Needed] |
|
|
|
## Environmental Impact |
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** [More Information Needed] |
|
- **Hours used:** [More Information Needed] |
|
- **Cloud Provider:** [More Information Needed] |
|
- **Compute Region:** [More Information Needed] |
|
- **Carbon Emitted:** [More Information Needed] |
|
|
|
## Technical Specifications [optional] |
|
|
|
### Model Architecture and Objective |
|
|
|
[More Information Needed] |
|
|
|
### Compute Infrastructure |
|
|
|
[More Information Needed] |
|
|
|
#### Hardware |
|
|
|
[More Information Needed] |
|
|
|
#### Software |
|
|
|
[More Information Needed] |
|
|
|
## Citation [optional] |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |
|
|
|
**APA:** |
|
|
|
[More Information Needed] |
|
|
|
## Glossary [optional] |
|
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
|
|
|
[More Information Needed] |
|
|
|
## More Information [optional] |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Authors [optional] |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Contact |
|
|
|
[More Information Needed] |