\\\\\\\\\\\\\\\\\\\\\\ \----------- Resume Parser ----------\ \\\\\\\\\\\\\\\\\\\\\\

Overview:

This project is a comprehensive Resume Parsing tool built using Python, integrating the Mistral-Nemo-Instruct-2407 model for primary parsing. If Mistral fails or encounters issues, the system falls back to a custom-trained spaCy model to ensure continued functionality. The tool is wrapped with a Flask API and has a user interface built using HTML and CSS.

Installation Guide:

Create and Activate a Virtual Environment python -m venv venv source venv/bin/activate # For Linux/Mac

or

venv\Scripts\activate # For Windows

NOTE: If the virtual environment (venv) is already created, you can skip the creation step and just activate.
```
 - For Linux/Mac:
     source venv/bin/activate
 - For Windows:
     venv\Scripts\activate
```
Install Required Libraries pip install -r requirements.txt

Ensure the following dependencies are included:
- Flask
- spaCy
- huggingface_hub
- PyMuPDF
- python-docx
- Tesseract-OCR (for image-based parsing)
Set up Hugging Face Token
- Add your Hugging Face token to the .env file as: HF_TOKEN=

File Structure Overview:

Mistral_With_Spacy/
│
├── Spacy_Models/
│   └── ner_model_05_3  # Pretrained spaCy model directory for resume parsing
│
├── templates/
│   ├── index.html  # UI for file upload
│   └── result.html  # Display parsed results in structured JSON
│
├── uploads/  # Directory for uploaded resume files
│
├── utils/
│   ├── mistral.py  # Code for calling Mistral API and handling responses
│   ├── spacy.py  # spaCy fallback model for parsing resumes
│   ├── error.py  # Error handling utilities
│   └── fileTotext.py  # Functions to extract text from different file formats (PDF, DOCX, etc.)
│
├── venv/  # Virtual environment
│
├── .env  # Environment variables file (contains Hugging Face token)
│
├── main.py  # Flask app handling API routes for uploading and processing resumes
│
└── requirements.txt  # Dependencies required for the project

Program Overview:

# Mistral Integration (utils/mistral.py)
    - Mistral API Calls: Uses Hugging Face’s Mistral-Nemo-Instruct-2407 model to parse resumes.
    - Personal and Professional Extraction: Two functions extract personal and professional information in structured JSON format.
    - Fallback Mechanism: If Mistral fails, spaCy NER model is used as a fallback.

# SpaCy Integration (utils/spacy.py)
    - Custom Trained Model: Uses a spaCy model (ner_model_05_3) trained specifically for resume parsing.
    - Named Entity Recognition: Extracts key information like Name, Email, Contact, Location, Skills, Experience, etc., from resumes.
    - Validation: Includes validation for extracted emails and contacts.

# File Conversion (utils/fileTotext.py)
   - Text Extraction: Handles different resume formats (PDF, DOCX, ODT, RSF, and images like PNG, JPG, JPEG) and extracts text for further processing.
      - PDF Files: Uses PyMuPDF to extract text and, if necessary, Tesseract-OCR for image-based PDF content.
      - DOCX Files: Uses `python-docx` to extract structured text from Word documents.
      - ODT Files: Uses `odfpy` to extract text from ODT (OpenDocument) files.
      - RSF Files: Reads plain text from RSF files.
      - Images (PNG, JPG, JPEG): Uses Tesseract-OCR to extract text from image-based resumes.

   - Hyperlink Extraction: Extracts hyperlinks from PDF files, capturing any embedded URLs during the parsing process.


# Error Handling (utils/error.py)
    - Handles API response errors, file format errors, and ensures smooth fallbacks without crashing the app.

# Flask API (main.py)
    Endpoints:
    - /upload for uploading resumes.
    - Displays parsed results in JSON format on the results page.
    - UI: Simple interface for uploading resumes and viewing the parsing results.

Tree map of your program:

main.py
├── Handles API side
├── File upload/remove
├── Process resumes
└── Show result

utils
├── fileTotext.py
│   └── Converts files to text
│       ├── PDF
│       ├── DOCX
│       ├── RTF
│       ├── ODT
│       ├── PNG
│       ├── JPG
│       └── JPEG
├── mistral.py
│   ├── Mistral API Calls
│   │   └── Uses Mistral-Nemo-Instruct-2407 model
│   ├── Personal and Professional Extraction
│   │   ├── Extracts personal information
│   │   └── Extracts professional information
│   └── Fallback Mechanism
│       └── Uses spaCy NER model if Mistral fails
└── spacy.py
    ├── Custom Trained Model
    │   └── Uses spaCy model (ner_model_05_3)
    ├── Named Entity Recognition
    │   └── Extracts key information (Name, Email, Contact, etc.)
    └── Validation
        └── Validates emails and contacts

Spaces:

WebashalarForML
/

ResumeExtractor2

Running

Overview:

Installation Guide:

or

NOTE: If the virtual environment (venv) is already created, you can skip the creation step and just activate.

Ensure the following dependencies are included:

File Structure Overview:

Program Overview:

Tree map of your program:

References: