Engage in multi-modal conversations with images and videos
Generate chat responses using the Llama-2 13B model