address-extraction

Next Geography

This is a simple library for extracting address components from text. The train.py file contains the training code; it is included for reference only and is not meant to be run, because the model was trained on our own dataset of addresses, which is not included in this repo. There is also predict.py, a simple script that runs the model on a single address.

The model is based on dbmdz/bert-base-turkish-cased from Hugging Face.

Example Results

(g:\projects\address-extraction\venv) G:\projects\address-extraction>python predict.py
Osmangazi Mahallesi, Hoca Ahmet Yesevi Cd. No:34, 16050 Osmangazi/Bursa
Osmangazi                                   Mahalle 98.80%
Hoca Ahmet Yesevi                             Cadde 98.55%
34                                    Bina Numarası 99.50%
16050                                    Posta Kodu 98.49%
Osmangazi                                      İlçe 98.71%
Bursa                                            İl 99.21%
Average Score:  0.9874102413654328
Labels Found:  6
----------------------------------------------------------------------
Karşıyaka Mahallesi, Mavişehir Caddesi No: 91, Daire 4, 35540 Karşıyaka/İzmir
Karşıyaka                                   Mahalle 98.93%
Mavişehir                                     Cadde 96.90%
91                                    Bina Numarası 99.25%
4                                     Bina Numarası 30.75%
35540                                    Posta Kodu 98.97%
Karşıyaka                                      İlçe 98.84%
İzmir                                            İl 98.86%
Average Score:  0.9173339426517486
Labels Found:  7
----------------------------------------------------------------------
Selçuklu Mahallesi, Atatürk Bulvarı No: 55, 42050 Selçuklu/Konya
Selçuklu                                    Mahalle 98.53%
Atatürk                                       Cadde 47.01%
55                                    Bina Numarası 99.49%
42050                                    Posta Kodu 98.78%
Selçuklu                                       İlçe 98.74%
Konya                                            İl 99.16%
Average Score:  0.9240859523415565
Labels Found:  6
----------------------------------------------------------------------
Alsancak Mahallesi, 1475. Sk. No:3, 35220 Konak/İzmir
Alsancak                                    Mahalle 99.35%
1475                                          Sokak 97.71%
3                                     Bina Numarası 99.18%
35220                                    Posta Kodu 99.00%
Konak                                          İlçe 98.90%
İzmir                                            İl 98.95%
Average Score:  0.9881603717803955
Labels Found:  6
----------------------------------------------------------------------
Kocatepe Mahallesi, Yaşam Caddesi 3. Sokak No:4, 06420 Bayrampaşa/İstanbul
Kocatepe                                    Mahalle 99.44%
Yaşam                                         Cadde 92.45%
3                                             Sokak 70.61%
4                                     Bina Numarası 99.18%
06420                                    Posta Kodu 99.00%
Bayrampaşa                                     İlçe 98.86%
İstanbul                                         İl 98.90%
Average Score:  0.9558616995811462
Labels Found:  7
----------------------------------------------------------------------
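
Some rows in the output above have a much lower confidence than the rest, for example the "4" from "Daire 4" tagged as Bina Numarası at 30.75%. The following is a minimal sketch of how such rows could be set aside for review. The `Entity` type, the `split_by_confidence` helper, and the 0.5 threshold are hypothetical and are not part of predict.py; the sample values are copied from the second example above.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    """One extracted address component (hypothetical container type)."""
    text: str
    label: str
    score: float  # confidence in the range [0, 1]


def split_by_confidence(
    entities: List[Entity], threshold: float = 0.5
) -> Tuple[List[Entity], List[Entity]]:
    """Split entities into (confident, uncertain) around an assumed threshold."""
    confident = [e for e in entities if e.score >= threshold]
    uncertain = [e for e in entities if e.score < threshold]
    return confident, uncertain


# Scores taken from the Karşıyaka example in the output above.
EXAMPLE = [
    Entity("Karşıyaka", "Mahalle", 0.9893),
    Entity("Mavişehir", "Cadde", 0.9690),
    Entity("91", "Bina Numarası", 0.9925),
    Entity("4", "Bina Numarası", 0.3075),
    Entity("35540", "Posta Kodu", 0.9897),
    Entity("Karşıyaka", "İlçe", 0.9884),
    Entity("İzmir", "İl", 0.9886),
]

confident, uncertain = split_by_confidence(EXAMPLE)
```

With these values, `uncertain` holds only the "4" row and the other six rows stay in `confident`. The right threshold depends on the application and would need tuning on real data.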

Installation & Usage

The environment.yml file contains the conda environment used to run the model. The environment is configured for CUDA-enabled GPUs but should also work without a GPU. To run the model, you can use the following commands:

conda env create -f environment.yml -p ./condaenv
conda activate ./condaenv

python predict.py
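
As an alternative to predict.py, the model could probably also be loaded directly through the Hugging Face transformers pipeline API. The sketch below assumes that the model is available on the Hub as nextgeo/address-extraction and that it works with the standard token-classification pipeline; predict.py may load and post-process the model differently, and the label names returned depend on the model's configuration. The helper names `load_extractor` and `format_entities` are hypothetical.

```python
from typing import Dict, List

MODEL_ID = "nextgeo/address-extraction"  # assumption: Hub id of this model


def load_extractor(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so format_entities() can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entities.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(entities: List[Dict]) -> List[str]:
    """Render aggregated pipeline output as 'text  label  score%' rows."""
    rows = []
    for ent in entities:
        percent = float(ent["score"]) * 100
        rows.append(f"{ent['word']:<20}{ent['entity_group']:>20} {percent:.2f}%")
    return rows
```

A typical call would be `extractor = load_extractor()` followed by `format_entities(extractor("Alsancak Mahallesi, 1475. Sk. No:3, 35220 Konak/İzmir"))`. The first call needs network access to fetch the weights.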

License

This project is licensed under the terms of the MIT license.
