---
title: Resume Profile Extractor
emoji: 📚
colorFrom: yellow
colorTo: pink
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🚀 Resume Profile Extractor

An AI-powered application that extracts professional profiles from PDF resumes. It uses LLMs (via Groq) to parse resume content and generate structured profile data that can be used for portfolio generation, professional websites, and more.

## ✨ Features

- **PDF Resume Parsing**: Extract text from PDF resumes automatically
- **AI-Powered Information Extraction**: Uses large language models to extract structured information
- **Interactive Web UI**: Clean Streamlit interface for uploading and editing profiles
- **RESTful API**: Access extracted profiles via a FastAPI backend
- **Grammar Correction**: Clean up extracted text with AI grammar correction
- **Data Storage**: Persistent SQLite storage for extracted profiles
- **Profile Image Support**: Upload and store profile images
- **Docker Ready**: Easy deployment with included Dockerfile
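
To illustrate the storage feature, here is a minimal sketch of persisting a profile in SQLite and reading it back by ID. The table layout and the helper names (`open_store`, `save_profile`, `load_profile`) are assumptions made for illustration; the project's actual storage service may be organized differently.

```python
import json
import sqlite3
import uuid

# Hypothetical schema: one row per profile, extracted fields stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS profiles (
    id   TEXT PRIMARY KEY,
    data TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_profile(conn, profile):
    """Store a profile dict and return its newly generated ID."""
    profile_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO profiles (id, data) VALUES (?, ?)",
            (profile_id, json.dumps(profile)),
        )
    return profile_id


def load_profile(conn, profile_id):
    """Return the stored profile dict, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT data FROM profiles WHERE id = ?", (profile_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Passing a file path instead of the default `":memory:"` keeps the data across restarts.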

## 🛠️ Architecture

The application consists of two main components:

1. **Streamlit Web UI**: A user-friendly interface for uploading resumes, editing extracted information, and managing profiles
2. **FastAPI Backend**: A RESTful API service for accessing profiles programmatically

Both components run simultaneously in a single container when deployed.
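
One way to run both services from a single entry point is to start each as a child process. The sketch below is not the contents of `run_combined.py`; it assumes the FastAPI instance in `api.py` is named `app`, and that the UI and API listen on ports 7860 and 8000 as in the Docker example.

```python
import subprocess
import sys


def build_commands(ui_port=7860, api_port=8000):
    """Return the argv lists for the API server and the Streamlit UI."""
    api_cmd = [
        sys.executable, "-m", "uvicorn", "api:app",  # assumes `app` in api.py
        "--host", "0.0.0.0", "--port", str(api_port),
    ]
    ui_cmd = [
        sys.executable, "-m", "streamlit", "run", "app.py",
        "--server.port", str(ui_port), "--server.address", "0.0.0.0",
    ]
    return [api_cmd, ui_cmd]


def run_all():
    """Start both services and wait; stop whatever is still running on exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands()]
    try:
        for proc in procs:
            proc.wait()
    finally:
        for proc in procs:
            if proc.poll() is None:  # still running
                proc.terminate()
```

Calling `run_all()` blocks until both processes exit, and interrupting it (for example with Ctrl+C) terminates any child that is still running.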

## 📋 Technical Stack

- **Python 3.9+**
- **Streamlit**: Web interface framework
- **FastAPI**: API framework
- **LangChain + Groq**: AI language models for text extraction & processing
- **SQLite**: Lightweight database for profile storage
- **PyPDF2**: PDF parsing
- **Pydantic**: Data validation and settings management
- **Uvicorn**: ASGI server
- **Docker**: Containerization

## 🏃‍♀️ Quick Start

### Local Development

1. Clone the repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Create a `.env` file from the sample:
   ```bash
   cp .env.sample .env
   ```
4. Add your Groq API key to the `.env` file
5. Run the application:
   ```bash
   python run_combined.py
   ```
6. Open http://localhost:7860 in your browser

### Using Docker

```bash
# Build the Docker image
docker build -t profile-extractor .

# Run the container
docker run -p 7860:7860 -p 8000:8000 -e GROQ_API_KEY=your_key_here profile-extractor
```

## 🚀 Deployment on Hugging Face Spaces

This application is designed to be easily deployed on Hugging Face Spaces:

1. Create a new Space on [Hugging Face](https://huggingface.co/spaces)
2. Select **Docker** as the Space SDK
3. Link your GitHub repository or upload the files directly
4. Add your `GROQ_API_KEY` in the **Settings** > **Variables** section
5. (Optional) Set `EXTERNAL_API_URL` to your Space's URL (e.g., `https://your-username-your-space-name.hf.space`)
6. Deploy the Space!

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `GROQ_API_KEY` | Your Groq API key for LLM access | Yes |
| `EXTERNAL_API_URL` | Public URL of your API (for production) | No |
| `DEBUG` | Enable debug logging (true/false) | No |
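
As a rough illustration of how these variables might be read at startup, the sketch below fails fast when the required key is missing and applies defaults to the optional ones. The function name `load_settings` and the fallback URL `http://localhost:8000` are assumptions, not taken from the project's `config.py`.

```python
import os


def load_settings(env=None):
    """Read the variables from the table above out of a mapping (default: os.environ)."""
    env = os.environ if env is None else env

    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is required but not set")

    return {
        "groq_api_key": api_key,
        # Assumed default: the local API port used elsewhere in this README.
        "external_api_url": env.get("EXTERNAL_API_URL", "http://localhost:8000").rstrip("/"),
        # Only the literal string "true" (case-insensitive) enables debug logging.
        "debug": env.get("DEBUG", "false").strip().lower() == "true",
    }
```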

## 🔄 API Endpoints

The API is available on port 8000 when running locally, or through the Hugging Face Space URL when deployed.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check endpoint |
| `/api/profile/{id}` | GET | Get a complete profile by ID |
| `/api/profile/{id}/image` | GET | Get just the profile image |
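
The endpoints above can be called from any HTTP client. Here is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes `/api/profile/{id}` responds with a JSON body.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def profile_url(base_url, profile_id, image=False):
    """Build the URL for a profile, or for its image when image=True."""
    url = f"{base_url.rstrip('/')}/api/profile/{quote(str(profile_id), safe='')}"
    return url + "/image" if image else url


def fetch_profile(base_url, profile_id, timeout=10):
    """GET a profile and decode the response body as JSON (assumed format)."""
    with urlopen(profile_url(base_url, profile_id), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_profile("http://localhost:8000", "<your-profile-id>")` would request the profile saved from the web UI, assuming the API is running locally.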

## 📚 Usage Guide

1. **Upload Resume**: Start by uploading a PDF resume
2. **Review & Edit**: The system will extract information and allow you to review and edit
3. **Save Profile**: Save your profile to get a unique profile ID
4. **Access API**: Use the API endpoints to access your profile data
5. **Build Portfolio**: Use the structured data to build dynamic portfolios and websites

## 🧩 Project Structure

```
agentAi/
├── agents/            # AI agents for extraction and processing
├── services/          # Backend services (storage, etc.)
├── utils/             # Utility functions
├── app.py             # Streamlit web application
├── api.py             # FastAPI endpoints
├── models.py          # Pydantic data models
├── config.py          # Application configuration
├── run_combined.py    # Script to run both services
├── requirements.txt   # Python dependencies
├── Dockerfile         # For containerized deployment
└── README.md          # Documentation
```

## 📝 License

MIT License

## 🙏 Acknowledgements

- [Groq](https://groq.com) for the LLM API
- [Streamlit](https://streamlit.io) for the web framework
- [FastAPI](https://fastapi.tiangolo.com) for the API framework
- [LangChain](https://langchain.com) for LLM interactions
- [Hugging Face](https://huggingface.co) for hosting capabilities