---
library_name: transformers
license: mit
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
---

# Model Card for multi-modal-llama-tp1

## Model Details

### Model Description

This is the model card of a πŸ€— transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** ASUS, NTHU, NTU
- **Model type:** Multimodal model based on Llama-3.2-11B-Vision-Instruct, with added support for voice input.
- **Language(s) (NLP):** Supports multiple languages, optimized for Traditional Chinese.
- **License:** MIT
- **Finetuned from model:** meta-llama/Llama-3.2-11B-Vision-Instruct

## Uses

The purpose of this multimodal model is to enrich knowledge about tourist attractions in Taiwan and to engage travelers through interactive voice responses. You can provide a picture of a Taiwanese landmark, together with a spoken question, to initiate a conversation.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import pipeline
import librosa
from PIL import Image

model_path = "taipei-1-mllama-project-2024/multi-modal-llama-tp1"

# The model ships its own pipeline code, so trust_remote_code must be enabled.
pipe = pipeline(model=model_path, trust_remote_code=True, device_map='auto')

# Load the spoken question, resampled to 16 kHz, and the image of the attraction.
# Example audio: "θ«‹ε•εœ–η‰‡δΈ­ηš„ζ™―ι»žζ˜―ε“ͺ裑.wav" ("Which attraction is shown in the picture?")
# Example image: "ε°ε—ε­”ε»Ÿ.jpg" (Tainan Confucius Temple)
audio, sr = librosa.load("/path/to/θ«‹ε•εœ–η‰‡δΈ­ηš„ζ™―ι»žζ˜―ε“ͺ裑.wav", sr=16000)
image = Image.open("/path/to/ε°ε—ε­”ε»Ÿ.jpg")

# The user turn references the image and the audio clip through placeholder tokens.
turns = [
    dict(
        role='system',
        content="You are a travel expert who can accurately analyze the attractions in the pictures. All conversations should be conducted in Traditional Chinese.",
    ),
    dict(
        role='user',
        content='<|image|><|begin_of_audio|><|audio|><|end_of_audio|>'
    )
]

y_pred = pipe({'audio': [audio], 'images': [image], 'turns': turns, 'sampling_rate': sr}, max_new_tokens=300)
print(y_pred)
# ι€™εΌ΅η…§η‰‡δΈ­ηš„ζ™―ι»žζ˜―ε°η£ηš„γ€Œε°ε—ε­”ε»Ÿγ€γ€‚...
# ("The attraction in this photo is Taiwan's Tainan Confucius Temple." ...)
```

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Technical Specifications

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[Taipei-1 supercomputer](https://en.wikipedia.org/wiki/Taipei-1_(supercomputer))

#### Hardware

16 nodes with 8 NVIDIA H100 GPUs per node (128 GPUs in total)