File size: 4,431 Bytes
0543aa3 e0b1078 313af0d 0543aa3 313af0d 0543aa3 c46926b 313af0d c46926b 313af0d 0543aa3 6f81f69 0543aa3 6f81f69 0543aa3 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 |
# Content Classification LoRA Adapter for Gemma-2B
A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.
Used in a pipeline.
## Technical Specifications
### Base Model
- Model: unsloth/gemma-2b
- LoRA Rank: 64
- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
- Task: CAUSAL_LM
- Dropout: 0
- Alpha: 32
### Input/Output Format
Input XML structure:
```xml
<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
<suitable>
<content>{input_text}</content>
```
Output XML structure:
```xml
<thinking>{reasoning_process}</thinking>
<category>{content_type}</category>
<should_index>{true|false}</should_index>
</suitable>
```
The model then expects an indefinite list of `<suitable> ... </suitable>` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.
Your stop token should be `</suitable>`.
## Deployment
### VLLM Server Setup
```bash
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
vllm serve unsloth/gemma-2-2b \
--gpu-memory-utilization=1 \
--port 6002 \
--served-model-name="gemma" \
--trust-remote-code \
--max-model-len 8192 \
--disable-log-requests \
--enable-lora \
--lora-modules lora=./dataset/output/unsloth/lora_model \
--max-lora-rank 64
```
### Processing Pipeline
1. Install Dependencies:
```bash
pip install requests tqdm concurrent.futures
```
2. Run Content Processor:
```bash
python process.py --input corpus.jsonl --output results.jsonl --threads 24
```
### Client Implementation
```python
import requests
def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
xml_content = (
'<instruction>Determine true or false if the following content is '
'suitable and should be indexed.</instruction>\n'
'<suitable>\n'
f' <content>{text}</content>'
)
response = requests.post(
vllm_url,
json={
"prompt": xml_content,
"max_tokens": 6000,
"temperature": 1,
"model": "lora",
"stop": ["</suitable>"]
},
timeout=30000
)
completion = response.json()["choices"][0]["text"]
# Parse XML tags
import re
def extract_tag(tag: str) -> str:
match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
return match.group(1).strip() if match else ""
return {
"thinking": extract_tag("thinking"),
"category": extract_tag("category"),
"should_index": extract_tag("should_index")
}
```
### Example Usage
```python
text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management
TABLE OF CONTENTS..."""
result = classify_content(text)
print(result)
```
Example output:
```json
{
"thinking": "This is a table of contents for a document, not the actual content.",
"category": "table of contents",
"should_index": "false"
}
```
## Batch Processing
The included processor supports parallel processing of JSONL files:
```python
from request_processor import RequestProcessor
processor = RequestProcessor(
input_file="corpus.jsonl",
output_file="results.jsonl",
num_threads=24
)
processor.process_file()
```
Input JSONL format:
```json
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
}
}
```
Output JSONL format:
```json
{
"pid": "document_id",
"docid": "path/to/source",
"content": "document text",
"metadata": {
"key": "value"
},
"thinking": "reasoning process",
"category": "content type",
"should_index": "true/false",
"processed_at": "2024-10-22 02:52:33"
}
```
## Implementation and Performance Considerations
- Use thread pooling for parallel processing
- Implement atomic writes with file locking
- Progress tracking with tqdm
- Automatic error handling and logging
- Configurable thread count for optimization
## Error Handling
Errors are captured in the output JSONL:
```json
{
"error": "error message",
"processed_at": "timestamp"
}
```
Monitor errors in real-time:
```bash
tail -f results.jsonl | grep error
``` |