---
language: zh
license: mit
tags:
- text-classification
- bert
- chinese
- vehicle-control
- user-instructions
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
---

# Vehicle User Instructions Classification - BERT (Chinese)

This repository contains a fine-tuned BERT model for classifying vehicle user instructions written in Chinese. The model is trained on a dataset of user instructions that map to vehicle control commands.

## Preface

This model was fine-tuned for our team's UOW CSIT998 Professional Capstone Project.

## Dataset

The dataset used for training and evaluation consists of Chinese text instructions, each labeled with a vehicle control command. It is split as follows:

- Training set: 4499 samples
- Validation set: 2249 samples
- Test set: 2250 samples

The instructions cover 27 vehicle control commands, mapped to class indices as follows:

```
{'开车窗': 0, '关左车门': 1, '关右前车窗': 2, '关闭引擎': 3,
 '关左前车窗': 4, '开右前车窗': 5, '关左后车窗': 6, '开左后车窗': 7,
 '开后备箱': 8, '关车门': 9, '关车窗': 10, '开左前车窗': 11,
 '关右后车窗': 12, '开敞篷': 13, '开左侧车窗': 14, '关敞篷': 15,
 '喇叭': 16, '开右后车窗': 17, '开右车门': 18, '停车点1': 19,
 '关后备箱': 20, '关右车门': 21, '开左车门': 22, '停车点2': 23,
 '开车门': 24, '打开引擎': 25, '关左侧车窗': 26}
```

## Model

The model is based on the pre-trained Chinese BERT model (`bert-base-chinese`).
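To interpret predictions, the label dictionary above can be inverted into an index-to-command mapping. The following is a minimal sketch: the English glosses and the `describe` helper are our own illustrative additions, not part of the model or its configuration.

```python
# Command -> class index, copied from the Dataset section of this card.
LABEL2ID = {
    '开车窗': 0, '关左车门': 1, '关右前车窗': 2, '关闭引擎': 3,
    '关左前车窗': 4, '开右前车窗': 5, '关左后车窗': 6, '开左后车窗': 7,
    '开后备箱': 8, '关车门': 9, '关车窗': 10, '开左前车窗': 11,
    '关右后车窗': 12, '开敞篷': 13, '开左侧车窗': 14, '关敞篷': 15,
    '喇叭': 16, '开右后车窗': 17, '开右车门': 18, '停车点1': 19,
    '关后备箱': 20, '关右车门': 21, '开左车门': 22, '停车点2': 23,
    '开车门': 24, '打开引擎': 25, '关左侧车窗': 26,
}

# Inverse mapping: class index -> command.
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}

# Illustrative English glosses (our own translations, hypothetical helper data).
ENGLISH_GLOSS = {
    '开车窗': 'open windows',
    '关左车门': 'close left door',
    '关右前车窗': 'close front-right window',
    '关闭引擎': 'turn off engine',
    '关左前车窗': 'close front-left window',
    '开右前车窗': 'open front-right window',
    '关左后车窗': 'close rear-left window',
    '开左后车窗': 'open rear-left window',
    '开后备箱': 'open trunk',
    '关车门': 'close doors',
    '关车窗': 'close windows',
    '开左前车窗': 'open front-left window',
    '关右后车窗': 'close rear-right window',
    '开敞篷': 'open convertible top',
    '开左侧车窗': 'open left-side windows',
    '关敞篷': 'close convertible top',
    '喇叭': 'horn',
    '开右后车窗': 'open rear-right window',
    '开右车门': 'open right door',
    '停车点1': 'parking spot 1',
    '关后备箱': 'close trunk',
    '关右车门': 'close right door',
    '开左车门': 'open left door',
    '停车点2': 'parking spot 2',
    '开车门': 'open doors',
    '打开引擎': 'start engine',
    '关左侧车窗': 'close left-side windows',
}


def describe(class_index: int) -> str:
    """Return 'command (English gloss)' for a predicted class index."""
    label = ID2LABEL[class_index]
    return f"{label} ({ENGLISH_GLOSS[label]})"
```

For example, `describe(25)` returns `'打开引擎 (start engine)'`. When fine-tuning from `bert-base-chinese`, mappings like these can be passed as the `id2label` and `label2id` arguments of `AutoModelForSequenceClassification.from_pretrained` so that the saved config carries readable label names; we have not confirmed that the published checkpoint was configured this way.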
The model has been fine-tuned on the vehicle user instructions dataset using the following training arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='',                # left blank in the original; set your own path
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=100,
    weight_decay=0.01,
    logging_strategy='steps',
    logging_dir='',               # left blank in the original; set your own path
    logging_steps=50,
    evaluation_strategy="steps",  # renamed to `eval_strategy` in newer transformers releases
    eval_steps=50,
    save_strategy="steps",
    save_steps=200,               # must be a multiple of eval_steps when load_best_model_at_end=True
    fp16=True,                    # mixed-precision training; requires a supported GPU
    load_best_model_at_end=True,
)
```

## Training Results

The model was trained for 3 epochs. Selected checkpoints are summarized below; accuracy, F1, precision, and recall are measured on the validation set.

| Step | Training Loss | Validation Loss | Accuracy | F1       | Precision | Recall   |
|------|---------------|-----------------|----------|----------|-----------|----------|
| 50   | 3.257000      | 2.964479        | 0.168519 | 0.089801 | 0.229036  | 0.126555 |
| 100  | 2.525000      | 1.711695        | 0.648288 | 0.532127 | 0.595545  | 0.590985 |
| 150  | 1.197200      | 0.628560        | 0.921298 | 0.888212 | 0.892879  | 0.890719 |
| ...  | ...           | ...             | ...      | ...      | ...       | ...      |
| 8000 | 0.045900      | 0.136842        | 0.969320 | 0.969658 | 0.969638  | 0.970056 |

## Evaluation

The trained model was evaluated on the training, validation, and test sets:

| Split      | Loss     | Accuracy | F1       | Precision | Recall   |
|------------|----------|----------|----------|-----------|----------|
| Train      | 0.036020 | 0.991331 | 0.991048 | 0.991615  | 0.990673 |
| Validation | 0.136842 | 0.969320 | 0.969658 | 0.969638  | 0.970056 |
| Test       | 0.126695 | 0.974222 | 0.975473 | 0.975814  | 0.975435 |

On the held-out test set, the model reaches 97.4% accuracy and an F1 score of 0.975, in line with its validation performance (96.9% accuracy) and only slightly below its training accuracy (99.1%).

## Usage

To use the fine-tuned model for inference, you can call the Hugging Face Inference API.
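If you prefer to run the model locally rather than through the hosted API, the `transformers` `pipeline` can load it from the Hub. This is a minimal sketch, assuming `transformers` and PyTorch are installed and that the repository ID used in the API URL also loads with `from_pretrained`; `get_classifier` and `classify_locally` are hypothetical helper names.

```python
from functools import lru_cache

# Repository ID taken from the Inference API URL in this card.
MODEL_ID = "lindsey-chang/vehicle-user-instructions-classification-bert-chinese"


@lru_cache(maxsize=1)
def get_classifier():
    """Build the text-classification pipeline once and reuse it on later calls."""
    # Imported lazily so this module can be imported before transformers is installed.
    from transformers import pipeline
    return pipeline("text-classification", model=MODEL_ID)


def classify_locally(text: str) -> dict:
    """Return the top prediction for `text` as a dict with 'label' and 'score' keys."""
    # For a single string, the pipeline returns a list holding one dict
    # for the highest-scoring label.
    return get_classifier()(text)[0]
```

For example, `classify_locally("请打开车窗")` downloads the model on first use and returns its top label and score. Caching the pipeline with `lru_cache` avoids reloading the weights on every call.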
The following example sends a request to the API using Python:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/lindsey-chang/vehicle-user-instructions-classification-bert-chinese"
API_TOKEN = "YOUR_API_TOKEN"  # replace with your Hugging Face access token
headers = {"Authorization": f"Bearer {API_TOKEN}"}


def query(payload):
    """Send a JSON payload to the Inference API and return the decoded response."""
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors (e.g. an invalid token) early
    return response.json()


# Example usage: "请打开车窗" means "Please open the car window"
input_text = "请打开车窗"
output = query({"inputs": input_text})
print(output)
```

Replace `YOUR_API_TOKEN` with your personal access token, which you can create in your Hugging Face account settings.

The API returns candidate labels together with their confidence scores rather than a bare class index. If the labels come back as generic names such as `LABEL_25`, map the index back to the corresponding vehicle control command using the class label dictionary in the Dataset section.
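To turn an API response into a command, you can pick the highest-scoring label and, if it uses the generic `LABEL_<n>` form, look the index up in the label dictionary. This is a minimal sketch: `top_command` is a hypothetical helper, and it assumes the response shape typical of text-classification endpoints (a list of `{"label": ..., "score": ...}` dicts, possibly wrapped in an outer list).

```python
import re

# Command -> class index, copied from the Dataset section of this card.
LABEL2ID = {
    '开车窗': 0, '关左车门': 1, '关右前车窗': 2, '关闭引擎': 3,
    '关左前车窗': 4, '开右前车窗': 5, '关左后车窗': 6, '开左后车窗': 7,
    '开后备箱': 8, '关车门': 9, '关车窗': 10, '开左前车窗': 11,
    '关右后车窗': 12, '开敞篷': 13, '开左侧车窗': 14, '关敞篷': 15,
    '喇叭': 16, '开右后车窗': 17, '开右车门': 18, '停车点1': 19,
    '关后备箱': 20, '关右车门': 21, '开左车门': 22, '停车点2': 23,
    '开车门': 24, '打开引擎': 25, '关左侧车窗': 26,
}
# Inverse mapping: class index -> command.
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def top_command(api_output):
    """Return the vehicle command for the highest-scoring label in an API response."""
    if isinstance(api_output, dict):
        # Errors (for example, a model that is still loading) arrive as a JSON object.
        raise RuntimeError(f"Inference API error: {api_output.get('error', api_output)}")
    # Predictions for a single input are usually wrapped in an outer list.
    predictions = api_output[0] if isinstance(api_output[0], list) else api_output
    best = max(predictions, key=lambda p: p["score"])
    match = re.fullmatch(r"LABEL_(\d+)", best["label"])
    if match:
        # Generic label name: recover the numeric index and look it up.
        return ID2LABEL[int(match.group(1))]
    # Otherwise assume the checkpoint already returns the command name itself.
    return best["label"]
```

With the request example above, `top_command(output)` would yield the predicted command string, for instance `'开车窗'` if `LABEL_0` scores highest.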