---
language: en
tags:
- multimodal
- text
- image
- image-to-text
datasets:
- HuggingFaceM4/OBELICS
- laion/laion2B-en
- coyo-700m
- mmc4
pipeline_tag: text-generation
inference: true
---

## Paper

More details can be found in our paper at https://arxiv.org/abs/2403.01487. We have released the pretrained model and the PyTorch code at https://github.com/InfiMM/infimm-hd/. Feel free to build your own model on top of our pretrained checkpoint.

## Quickstart

Use the code below to get started with the base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("Infi-MM/infimm-hd", trust_remote_code=True)

prompts = [
    {
        "role": "user",
        "content": [
            {"image": "/xxx/test.jpg"},  # replace with the path to your image
            "Please describe the image in detail.",
        ],
    }
]
inputs = processor(prompts)

# Load the model in bfloat16 and place it on GPU 0.
model = AutoModelForCausalLM.from_pretrained(
    "Infi-MM/infimm-hd",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(0).eval()

# Match the image tensors to the model's dtype, then move all inputs to its device.
inputs["batch_images"] = inputs["batch_images"].to(torch.bfloat16)
for k in inputs:
    inputs[k] = inputs[k].to(model.device)

generated_ids = model.generate(
    **inputs,
    min_new_tokens=0,
    max_new_tokens=256,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_text)
```
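
Because `content` is a list that interleaves image entries and text, you can in principle pass several images in one turn. This is a minimal sketch assuming the processor accepts multiple `{"image": ...}` entries per turn (the interleaved pretraining data suggests so, but verify the outputs); the file paths are placeholders:

```python
# Sketch: a multi-image prompt, reusing the processor and model loaded above.
# Assumption: the processor accepts several {"image": ...} entries in one turn.
prompts = [
    {
        "role": "user",
        "content": [
            {"image": "/xxx/first.jpg"},
            {"image": "/xxx/second.jpg"},
            "What are the differences between these two images?",
        ],
    }
]
inputs = processor(prompts)
# Then apply the same dtype/device handling and generate() call as above.
```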
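
On GPUs without native bfloat16 support, loading in float16 may be an option. Only the bfloat16 path above is the one we ship, so the float16 variant below is an untested assumption; check the generations for numerical issues before relying on it:

```python
# Untested assumption: the checkpoint is numerically stable in float16.
model = AutoModelForCausalLM.from_pretrained(
    "Infi-MM/infimm-hd",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to(0).eval()

inputs = processor(prompts)
inputs["batch_images"] = inputs["batch_images"].to(torch.float16)  # match the model dtype
for k in inputs:
    inputs[k] = inputs[k].to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```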
## License

<a href="https://creativecommons.org/licenses/by-nc/4.0/deed.en">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Cc_by-nc_icon.svg/600px-Cc_by-nc_icon.svg.png" width="160">
</a>

This project is licensed under the **CC BY-NC 4.0** license.

The copyright of the images belongs to the original authors.

See [LICENSE](LICENSE) for more information.

## Contact Us

Please feel free to contact us via email at [[email protected]](mailto:[email protected]) if you have any questions.

## Citation

```bibtex
@misc{liu2024infimmhd,
  title={InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding},
  author={Haogeng Liu and Quanzeng You and Xiaotian Han and Yiqi Wang and Bohan Zhai and Yongfei Liu and Yunzhe Tao and Huaibo Huang and Ran He and Hongxia Yang},
  year={2024},
  eprint={2403.01487},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```