chembra

deepufrk

AI & ML interests

None yet

Recent Activity

liked a Space 4 months ago
ai4bharat/indic-parler-tts
liked a Space 5 months ago
DmitryRyumin/MASAI
updated a Space about 1 year ago
deepufrk/gender_identification

Organizations

None yet

deepufrk's activity

reacted to vladbogo's post about 1 year ago
Post
Meta Reality Labs has developed Lumos, a system that merges Multimodal Large Language Models (MM-LLMs) with Scene Text Recognition (STR) to boost the efficiency of various tasks such as multimodal question-answering and text summarization.

Key aspects of Lumos include:

* Hybrid Computing: Utilizes a combination of on-device and cloud computing to process inputs, aiming to reduce latency.
* STR Components:
  * Region of Interest (ROI) Detection: Focuses on text-rich areas within images for optimized text extraction.
  * Text Detection and Recognition: Ensures high-quality text recognition within the ROI.
  * Reading Order Reconstruction: Arranges recognized text to mimic natural reading order, essential for context understanding.
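
To make the reading order step more concrete, here is a minimal sketch of one common heuristic: group recognized word boxes into lines by vertical position, then read each line left to right. This is not the method used in Lumos, which the post does not describe; the `WordBox` type, the `reading_order` function, and the `line_tolerance` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """A recognized word with its bounding box (hypothetical structure)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float  # box height


def _center(box: WordBox) -> float:
    # Vertical center of a box.
    return box.y + box.height / 2


def reading_order(boxes, line_tolerance=0.5):
    """Group boxes into lines by vertical center, then read left to right.

    line_tolerance is an assumed fraction of the box height within which
    two boxes are treated as belonging to the same line.
    """
    lines = []  # each entry is a list of WordBox on the same line
    for box in sorted(boxes, key=_center):
        if lines:
            last = lines[-1]
            last_center = sum(_center(b) for b in last) / len(last)
            if abs(_center(box) - last_center) <= line_tolerance * box.height:
                last.append(box)
                continue
        lines.append([box])
    # Within each line, order words by their left edge.
    return [
        " ".join(b.text for b in sorted(line, key=lambda b: b.x))
        for line in lines
    ]


if __name__ == "__main__":
    words = [
        WordBox("world", x=60, y=11, height=10),
        WordBox("Hello", x=0, y=10, height=10),
        WordBox("line", x=50, y=30, height=10),
        WordBox("Second", x=0, y=31, height=10),
    ]
    print(reading_order(words))
```

A heuristic like this only suits simple single-column layouts; multi-column or rotated text would need a more careful approach.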

Lumos achieves 80% accuracy on question-answering benchmarks, along with a low word error rate.

Paper: Lumos: Empowering Multimodal LLMs with Scene Text Recognition (2402.08017)

Congrats to the authors for their work!
  • 2 replies
New activity in HumanAIGC/OutfitAnyone about 1 year ago

Upload IMG_20230825_142611.jpg

1
#70 opened about 1 year ago by
deepufrk