chembra

deepufrk

AI & ML interests

None yet

Recent Activity

liked a Space 4 months ago

ai4bharat/indic-parler-tts

liked a Space 5 months ago

DmitryRyumin/MASAI

updated a Space about 1 year ago

deepufrk/gender_identification

Organizations

None yet

deepufrk's activity

liked a Space 4 months ago

Indic Parler-TTS

A demo of Indic Parler-TTS

liked a Space 5 months ago

MASAI

Intelligent system for Multimodal Affective States Analysis

updated a Space about 1 year ago

Gender Identification

reacted to vladbogo's post with 🤝 about 1 year ago

Post

Meta Reality Labs has developed Lumos, a system that merges Multimodal Large Language Models (MM-LLMs) with Scene Text Recognition (STR) to boost the efficiency of various tasks such as multimodal question-answering and text summarization.

Key aspects of Lumos include:

* Hybrid Computing: Utilizes a combination of on-device and cloud computing to process inputs, aiming to reduce latency.
* STR Components:
  * Region of Interest (ROI) Detection: Focuses on text-rich areas within images for optimized text extraction.
  * Text Detection and Recognition: Ensures high-quality text recognition within the ROI.
  * Reading Order Reconstruction: Arranges recognized text to mimic natural reading order, which is essential for understanding context.
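The post does not say how Lumos performs reading order reconstruction. As a hedged illustration only, the sketch below shows one simple, common heuristic (not Lumos's method): group recognized words into lines by vertical overlap, read lines top to bottom, and read words left to right within each line. The `Word` type, the `reconstruct_reading_order` function, and the `line_tol` parameter are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word with its bounding box (hypothetical format, y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def y_center(self) -> float:
        return (self.y0 + self.y1) / 2

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def reconstruct_reading_order(words: List[Word], line_tol: float = 0.5) -> str:
    """Return text in a top-to-bottom, left-to-right reading order.

    Hypothetical heuristic: a word joins the current line when its vertical
    center is within line_tol * (smallest box height) of the line's mean center.
    """
    lines: List[List[Word]] = []
    # Visit words from top to bottom so lines are built in order.
    for word in sorted(words, key=lambda w: w.y_center):
        if lines:
            line = lines[-1]
            line_center = sum(w.y_center for w in line) / len(line)
            min_height = min(min(w.height for w in line), word.height)
            if abs(word.y_center - line_center) < line_tol * min_height:
                line.append(word)
                continue
        lines.append([word])
    # Within each line, read left to right; separate lines with newlines.
    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x0)) for line in lines
    )
```

For example, four out-of-order boxes forming two lines come back as `"Hello world\nSecond line"`. This heuristic assumes a single-column layout; multi-column pages or rotated text would need a more elaborate approach.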

Lumos achieves 80% accuracy on question-answering benchmarks, along with a low word error rate.

Paper: Lumos: Empowering Multimodal LLMs with Scene Text Recognition (2402.08017)

Congrats to the authors for their work!

New activity in HumanAIGC/OutfitAnyone about 1 year ago

Upload IMG_20230825_142611.jpg

#70 opened about 1 year ago by deepufrk
