PHI4

#9
by mans0987 - opened

How does this model compare with PHI4 multi-modal?

Microsoft org

Interesting question. The two models were developed with different objectives, so it is hard to make an apples-to-apples comparison.

Can you please elaborate? When should I use Phi4 and when should I use this model (assuming that I am only interested in text and vision, and not audio, which Phi4 has but this model doesn't)?

Microsoft org

If you are only interested in text and vision: Magma is good at spatial understanding and reasoning over multimodal inputs, while Phi4 seems better at reading text in images, based on my quick look.

I want to detect whether an image of a house shows the entrance to that house. What would you suggest?

Microsoft org

I think you can try both for this task.

I did try Magma, and it cannot distinguish between the front of a house and its back or side. I am having difficulty using Phi4 because flash attention takes too long to install.

Is there any documentation on how to write prompts that can distinguish objects in the image?
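
For reference, one common starting point (not taken from either model's documentation) is to ask a closed-ended question and parse the reply strictly. The sketch below is a minimal illustration under that assumption: the helper names `build_entry_prompt` and `parse_entry_answer` are hypothetical, the prompt wording is untested on Magma or Phi4, and the call that actually sends the image and prompt to a model is deliberately left out.

```python
from typing import Optional


def build_entry_prompt() -> str:
    """Return a closed-ended prompt that asks for exactly one label.

    Hypothetical wording; adjust it for the model you are testing.
    """
    return (
        "Look at the image of a house. "
        "Is the main entrance (front door) of the house visible? "
        "Answer with exactly one word: yes, no, or unsure."
    )


def parse_entry_answer(reply: str) -> Optional[bool]:
    """Map a free-text model reply to True/False, or None when ambiguous."""
    # Normalize case and strip simple punctuation before taking the first word.
    words = reply.strip().lower().replace(".", " ").replace(",", " ").split()
    if not words:
        return None
    first = words[0]
    if first == "yes":
        return True
    if first == "no":
        return False
    # Anything else ("unsure", a long description, ...) is treated as ambiguous.
    return None
```

Restricting the answer to a fixed set of words makes it easier to run the same prompt against both models on a small labeled set of front, back, and side images and compare how often each one is right.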

mans0987 changed discussion status to closed