# Content Classification LoRA Adapter for Gemma-2B
A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.
Used in a pipeline.
## Technical Specifications
### Base Model
- Model: unsloth/gemma-2b
- LoRA Rank: 64
- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
- Task: CAUSAL_LM
- Dropout: 0
- Alpha: 32
### Input/Output Format
Input XML structure:
```xml
Determine true or false if the following content is suitable and should be indexed.
{input_text}
```
Output XML structure:
```xml
{reasoning_process}
{content_type}
{true|false}
```
The model then expects an indefinite list of ` ... ` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.
Your stop token should be ``.
## Deployment
### VLLM Server Setup
```bash
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
vllm serve unsloth/gemma-2-2b \
--gpu-memory-utilization=1 \
--port 6002 \
--served-model-name="gemma" \
--trust-remote-code \
--max-model-len 8192 \
--disable-log-requests \
--enable-lora \
--lora-modules lora=./dataset/output/unsloth/lora_model \
--max-lora-rank 64
```
### Processing Pipeline
1. Install Dependencies:
```bash
pip install requests tqdm concurrent.futures
```
2. Run Content Processor:
```bash
python process.py --input corpus.jsonl --output results.jsonl --threads 24
```
### Client Implementation
```python
import requests
def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
xml_content = (
'Determine true or false if the following content is '
'suitable and should be indexed.\n'
'\n'
f' {text}'
)
response = requests.post(
vllm_url,
json={
"prompt": xml_content,
"max_tokens": 6000,
"temperature": 1,
"model": "lora",
"stop": [""]
},
timeout=30000
)
completion = response.json()["choices"][0]["text"]
# Parse XML tags
import re
def extract_tag(tag: str) -> str:
match = re.search(f'<{tag}>(.*?){tag}>', completion, re.DOTALL)
return match.group(1).strip() if match else ""
return {
"thinking": extract_tag("thinking"),
"category": extract_tag("category"),
"should_index": extract_tag("should_index")
}
```
### Example Usage
```python
text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management
TABLE OF CONTENTS..."""
result = classify_content(text)
print(result)
```
Example output:
```json
{
"thinking": "This is a table of contents for a document, not the actual content.",
"category": "table of contents",
"should_index": "false"
}
```
## Batch Processing
The included processor supports parallel processing of JSONL files:
```python
from request_processor import RequestProcessor
processor = RequestProcessor(
input_file="corpus.jsonl",
output_file="results.jsonl",
num_threads=24
)
processor.process_file()
```
Input JSONL format:
```json
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
}
}
```
Output JSONL format:
```json
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
},
"thinking": "reasoning process",
"category": "content type",
"should_index": "true/false",
"processed_at": "2024-10-22 02:52:33"
}
```
## Implementation and Performance Considerations
- Use thread pooling for parallel processing
- Implement atomic writes with file locking
- Progress tracking with tqdm
- Automatic error handling and logging
- Configurable thread count for optimization
## Error Handling
Errors are captured in the output JSONL:
```json
{
"error": "error message",
"processed_at": "timestamp"
}
```
Monitor errors in real-time:
```bash
tail -f results.jsonl | grep error
```