---
library_name: transformers
license: apache-2.0
base_model:
- google-bert/bert-base-uncased
pipeline_tag: token-classification
---

# Model Card for Model ID

## Model Details

### Model Description

This model was fine-tuned on addresses from the Canada open data portal to parse Canadian addresses into the following BIO tags:

`B-STREET_NO`, `I-STREET_NO`, `B-STREET_NAME`, `I-STREET_NAME`, `B-STREET_TYPE`, `I-STREET_TYPE`, `B-STREET_DIR`, `I-STREET_DIR`, `B-CITY`, `I-CITY`

Tokens that share a tag need to be concatenated to produce meaningful output; see the section "How to Get Started with the Model" for an inference example.

- **Developed by:** Juntao Zhang
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** BERT-based token classification model
- **Language(s) (NLP):** [More Information Needed]
- **License:** apache-2.0
- **Finetuned from model [optional]:** bert-base-uncased

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

This model can be used for token classification tasks, such as named entity recognition (NER) or address token classification.

### Downstream Use [optional]

Address matching, address auto-correction, etc.

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
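The inference class in this section reads a JSON config file with three keys: `project_path`, `tokenizer_path`, and `model_path`, and joins the latter two onto the project path. A minimal sketch of creating such a file and resolving those paths follows. The path values are placeholders, and the helper names `write_config` and `resolve_paths` are hypothetical; neither is part of the original code.

```python
import json
import os
import tempfile

# Placeholder values (assumptions): replace them with your own locations.
EXAMPLE_CONFIG = {
    "project_path": "/path/to/project",
    "tokenizer_path": "tokenizer",
    "model_path": "model",
}


def write_config(config, path):
    """Write the config dict to `path` as JSON and return the path."""
    with open(path, "w") as config_file:
        json.dump(config, config_file, indent=4)
    return path


def resolve_paths(path):
    """Load the config and join the sub-paths onto the project path."""
    with open(path, "r") as config_file:
        config = json.load(config_file)
    project_path = config["project_path"]
    return {
        "tokenizer": os.path.join(project_path, config["tokenizer_path"]),
        "model": os.path.join(project_path, config["model_path"]),
    }


if __name__ == "__main__":
    # Write the example config to a temporary directory and show the resolved paths.
    with tempfile.TemporaryDirectory() as tmp_dir:
        config_path = write_config(EXAMPLE_CONFIG, os.path.join(tmp_dir, "config.json"))
        print(resolve_paths(config_path))
```

With a config file like this in place, pass its path to the inference class constructor.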
```python
import json
import os

import torch
from transformers import pipeline


class GeoLLMBertInference:
    def __init__(self, config_path='config.json'):
        # Load the project, tokenizer, and model paths from the JSON config
        with open(config_path, 'r') as config_file:
            config = json.load(config_file)

        self.project_path = config['project_path']
        self.tokenizer_path = os.path.join(self.project_path, config['tokenizer_path'])
        self.model_path = os.path.join(self.project_path, config['model_path'])

        # Check if a GPU is available and set the device accordingly
        self.device = 0 if torch.cuda.is_available() else -1

        self.ner_pipeline = pipeline("ner", model=self.model_path, tokenizer=self.tokenizer_path, device=self.device)
        self.result = None
        self.concatenate_result = None

    def get_ner_result(self, address):
        # The address is upper-cased before it is passed to the pipeline
        self.result = self.ner_pipeline(address.upper())
        return self.result

    def concatenate_entities(self):
        if self.result is None:
            raise ValueError("NER result is not available. Please run get_ner_result first.")

        concatenated_result = {}
        for entity in self.result:
            tag = entity['entity']
            # Strip the WordPiece continuation marker and any commas
            word = entity['word'].replace('##', '').replace(',', '')
            # Pieces that share a tag are joined without a separator
            if tag not in concatenated_result:
                concatenated_result[tag] = word.upper()
            else:
                concatenated_result[tag] += word.upper()

        self.concatenate_result = concatenated_result
        return self.concatenate_result

    def get_json_result(self):
        if self.concatenate_result is None:
            raise ValueError("Concatenated result is not available. Please run concatenate_entities first.")
        return json.dumps(self.concatenate_result, indent=4)


# Example Usage
if __name__ == "__main__":
    geo_llm = GeoLLMBertInference('code/geo_llm/config.json')
    address = "16 ChSeAStREtST.CATHARINE"

    result = geo_llm.get_ner_result(address)
    print(result)

    concatenate_result = geo_llm.concatenate_entities()
    print(concatenate_result)

    # Get the concatenated result in JSON format and print it
    json_result = geo_llm.get_json_result()
    print(json_result)
```

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]