PHI4

by mans0987 - opened 14 days ago

Discussion

mans0987

14 days ago

How does this model compare with PHI4 multi-modal?

jw2yang

Microsoft org 14 days ago

Interesting question. The two models were developed with different objectives, so it is hard to make an apples-to-apples comparison.

mans0987

14 days ago

Can you please elaborate? When should I use Phi4 and when should I use this model, assuming that I am only interested in text and vision and not audio (which Phi4 supports but this model doesn't)?

jw2yang

Microsoft org 14 days ago

If you are only interested in text and vision: Magma is good at spatial understanding and reasoning over multimodal inputs, but based on my rough glimpse, Phi4 is better at reading text from images.

mans0987

14 days ago

I am looking to detect whether an image of a house shows the entrance to that house. What is your suggestion?

jw2yang

Microsoft org 13 days ago

I think you can try both for this task.

mans0987

13 days ago

I did try Magma, and it cannot distinguish between the front of a house and the back or side of a house. I am having difficulty using Phi4 because flash attention takes too long to install.

Is there any documentation on how to write prompts that help the model distinguish objects in an image?

mans0987 changed discussion status to closed 13 days ago
