---
license: cc-by-nc-4.0
base_model: lmms-lab/llava-onevision-qwen2-7b-mid-stage-a4
model-index:
- name: llama3-siglip-taco-8b
  results: []
---

# 🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

Figure 1. TACO vs. other multi-modal models

## Usage

See our [GitHub repository](https://github.com/SalesforceAIResearch/TACO).

## Intended uses & limitations

This model is intended for complex, multi-step, multi-modal question answering tasks. It is trained to answer visual questions using some of the following 15 actions: ```OCR```, ```LocalizeObjects```, ```GetObjects```, ```EstimateRegionDepth```, ```EstimateObjectDepth```, ```Crop```, ```ZoomIn```, ```QueryLanguageModel```, ```GetImageToImagesSimilarity```, ```GetImageToTextsSimilarity```, ```GetTextToImagesSimilarity```, ```DetectFaces```, ```QueryKnowledgeBase```, ```Calculate```, and ```SolveMathEquation```. The ```Terminate``` action is also supported; the model uses it to provide a final answer.

For other types of tasks that don't benefit from the actions above, you might need to train a new model or further fine-tune this one with other actions.

## Training and evaluation data

See our [paper](https://arxiv.org/pdf/2412.05479) for details.

## Training procedure and hyperparameters

See our [paper](https://arxiv.org/pdf/2412.05479) for details.

## Training results

See our [paper](https://arxiv.org/pdf/2412.05479) for details.

### License information

This release is for research purposes only in support of an academic paper. This repository is licensed under the noncommercial license [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).

### Citation

Please cite us if you find our repository helpful. Thank you!

```
@misc{ma2024tacolearningmultimodalaction,
      title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action},
      author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese},
      year={2024},
      eprint={2412.05479},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05479},
}
```
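
As a companion to the action list under "Intended uses & limitations", here is a minimal Python sketch of how downstream code might check whether an action name is one the model was trained on. The action names are copied from that section; the constant and function names (`TOOL_ACTIONS`, `FINAL_ACTION`, `is_supported`, `is_final`) are hypothetical and are not part of the TACO codebase. How actions are formatted in the model's output is not covered here; see the GitHub repository for that.

```python
# Illustrative only: these names are assumptions, not the TACO repository's API.

# The 15 tool actions listed in "Intended uses & limitations".
TOOL_ACTIONS = frozenset({
    "OCR",
    "LocalizeObjects",
    "GetObjects",
    "EstimateRegionDepth",
    "EstimateObjectDepth",
    "Crop",
    "ZoomIn",
    "QueryLanguageModel",
    "GetImageToImagesSimilarity",
    "GetImageToTextsSimilarity",
    "GetTextToImagesSimilarity",
    "DetectFaces",
    "QueryKnowledgeBase",
    "Calculate",
    "SolveMathEquation",
})

# The action the model uses to provide its final answer.
FINAL_ACTION = "Terminate"

SUPPORTED_ACTIONS = TOOL_ACTIONS | {FINAL_ACTION}


def is_supported(name: str) -> bool:
    """Return True if `name` is one of the actions listed in this model card."""
    return name in SUPPORTED_ACTIONS


def is_final(name: str) -> bool:
    """Return True if `name` is the action that ends a chain with a final answer."""
    return name == FINAL_ACTION
```

A check like `is_supported("Crop")` can help flag tasks that would need actions outside this set, which is the case where further fine-tuning is suggested above.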