Stephen Fernandes

StephennFernandes

AI & ML interests

Natural Language Processing, Reinforcement Learning

StephennFernandes's activity

New activity in microsoft/Phi-4-multimodal-instruct 1 day ago

Experience with Phi-4-Multimodal vs. Whisper-1 for Speech-to-Text

#39 opened 28 days ago by

reacted to wassemgtk's post with 👀 2 days ago

Post

2601

I’ve been diving into the iRoPE architecture from Llama 4—a game-changer for long-context models! It interleaves local attention (with RoPE) for short contexts and global attention (with inference-time temp scaling) for long-range reasoning, aiming for infinite context. I’m going to try writing iRoPE—who wants to help?

Code: https://github.com/wassemgtk/iRoPE-try/blob/main/iRoPE.ipynb
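To make the interleaving described in the post concrete, here is a minimal sketch in plain Python. It is not taken from the linked notebook or from Llama 4's code: the one-in-four ratio of global layers, the chunk size, and the log-shaped temperature formula with its constants (`global_every`, `chunk_size`, `floor_scale`, `attn_scale`) are all assumptions chosen for illustration.

```python
import math


def layer_kinds(num_layers, global_every=4):
    """Label each layer as 'local' (RoPE, chunked attention) or 'global'.

    Assumption: every `global_every`-th layer is global; the rest are local.
    """
    return [
        "global" if (i + 1) % global_every == 0 else "local"
        for i in range(num_layers)
    ]


def can_attend(q_pos, k_pos, kind, chunk_size=4):
    """Causal attention rule for one query/key pair.

    Local layers only see keys inside the query's own chunk;
    global layers see every earlier position.
    """
    if k_pos > q_pos:  # never attend to the future
        return False
    if kind == "global":
        return True
    return q_pos // chunk_size == k_pos // chunk_size


def query_temperature(pos, floor_scale=8192, attn_scale=0.1):
    """Hypothetical inference-time scale applied to queries in global layers.

    It stays at 1.0 for short contexts and grows slowly (logarithmically)
    as the position moves past multiples of `floor_scale`.
    """
    return 1.0 + attn_scale * math.log(pos // floor_scale + 1)


if __name__ == "__main__":
    print(layer_kinds(8))
    print(can_attend(5, 3, "local"), can_attend(5, 3, "global"))
    print(query_temperature(0), query_temperature(100_000))
```

Under these assumptions, a local layer keeps attention cost bounded by the chunk size, while the occasional global layer is the only place where distant tokens can interact, which is where the position-dependent temperature is applied.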

1 reply

upvoted a collection 3 days ago

Llama 4

Llama 4 release • 10 items • Updated 3 days ago • 397

liked a dataset 5 days ago

glaiveai/reasoning-v1-20m

Viewer • Updated 20 days ago • 22.2M • 11k • 175

liked 2 models 3 months ago

MiniMaxAI/MiniMax-Text-01

Text Generation • Updated 22 days ago • 3.34k • 568

deepseek-ai/DeepSeek-V3

Text Generation • Updated 13 days ago • 753k • 3.8k

liked a dataset 8 months ago

argilla/magpie-ultra-v0.1

Viewer • Updated Nov 26, 2024 • 50k • 365 • 222

New activity in google/umt5-xxl 9 months ago

RuntimeError: Error(s) in loading state_dict for UMT5ForTokenClassification:

#1 opened 9 months ago by

StephennFernandes

New activity in Qwen/Qwen-72B 10 months ago

how did you guys pretrain the tokenizer using tiktoken ?

#9 opened 10 months ago by

StephennFernandes

New activity in facebook/w2v-bert-2.0 11 months ago

ValueError: negative dimensions are not allowed

#26 opened 11 months ago by

StephennFernandes

How to use an LM like n-gram LM with w2v-bert-2.0?

#22 opened about 1 year ago by

How to use n-gram with this model?

#24 opened about 1 year ago by

New activity in google/siglip-base-patch16-256-multilingual 11 months ago

number of languages supported ?

#3 opened 11 months ago by

StephennFernandes

New activity in nvidia/Llama3-ChatQA-1.5-8B 11 months ago

Megatron LM training (fine-tuning) code ?

#9 opened 11 months ago by

StephennFernandes

upvoted a paper 12 months ago

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 29

New activity in ai4bharat/sangraha 12 months ago

how to download sangraha synthetic dataset ?

#3 opened about 1 year ago by

StephennFernandes

upvoted a collection 12 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 737

New activity in ylacombe/w2v-bert-2.0 12 months ago

ValueError: negative dimensions are not allowed

#13 opened about 1 year ago by

StephennFernandes

liked a Space about 1 year ago

Beam Search Visualizer

View how beam search decoding works, in detail!

New activity in ai4bharat/sangraha about 1 year ago

datasets.utils.info_utils.NonMatchingSplitsSizesError:

#4 opened about 1 year ago by

StephennFernandes