
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (πŸ–Ό+πŸ“) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

upvoted a collection 1 day ago
DeepSeek-R1

Organizations

None yet

nicolay-r's activity

posted an update about 4 hours ago
πŸ“’ For those who wish to launch distilled DeepSeek R1 for reasoning with schema, sharing the Google Colab notebook:
πŸ“™ https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_colab.ipynb
This is a wrapper over the Hugging Face Qwen2 model provider via the bulk-chain framework.
Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
GPU: a T4 (15 GB) is just enough in float32 mode.
🚀 To boost performance, load in bf16 (set use_bf16=True).
🌟 Powered by bulk-chain: https://github.com/nicolay-r/bulk-chain
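As a rough sketch of what schema-driven reasoning looks like, here is a minimal chain where each step's answer feeds the next prompt. The field names (`prompt`, `out`) are hypothetical, not bulk-chain's actual schema format, which is defined in its README:

```python
# Minimal sketch of schema-driven chain-of-thought prompting.
# Field names are illustrative, not the real bulk-chain schema.
schema = [
    {"prompt": "Summarize the problem: {text}", "out": "summary"},
    {"prompt": "Given the summary '{summary}', state the final answer.", "out": "answer"},
]

def run_chain(schema, llm, **inputs):
    """Resolve each prompt against prior outputs and query the LLM."""
    state = dict(inputs)
    for step in schema:
        prompt = step["prompt"].format(**state)
        state[step["out"]] = llm(prompt)
    return state

# A stub LLM that echoes its prompt, just for demonstration.
result = run_chain(schema, llm=lambda p: f"[LLM: {p}]", text="2+2=?")
print(result["answer"])
```

With a real model plugged in as `llm`, the same two-step schema yields a summary followed by a grounded answer.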
posted an update 1 day ago
πŸ“’ For those who wish to apply DeepSeek-R1 for handling tabular / streaming data using schema of prompts (CoT), the OpenRouter AI hosts API for accessing:
https://openrouter.ai/deepseek/deepseek-r1

The no-string option for a quick start with DeepSeek-R1 involves three steps:
βœ… OpenRouter provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/open_router.py
βœ… Bulk-chain for infering data: https://github.com/nicolay-r/bulk-chain
βœ… Json Schema for Chain-of-Though reasoning (see screenshot πŸ“· below)

πŸ“Ί below is a screenshot of how to quick start the demo, in which you can test your schema for LLM responses. It would ask to type all the parameters first for completing the requests (which is text within this example).

πŸ“ƒ To apply it for JSONL/CSV data, you can use --src shell parameter for passing the related file

⏳ As for timing, OpenRouter feels relatively slow to me, at 30–40 seconds per request.

Models:
deepseek-ai/DeepSeek-R1
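For illustration, the raw request that the OpenRouter provider script wraps should look roughly like the sketch below, using only the standard library. The endpoint follows OpenRouter's OpenAI-compatible chat-completions format; the prompt text is a made-up example:

```python
import json
import os
import urllib.request

def build_request(prompt, model="deepseek/deepseek-r1"):
    """Build an OpenAI-compatible chat-completions payload for OpenRouter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Extract the sentiment of: 'great service!'")

# Only fire the request when an API key is configured.
api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```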
posted an update 3 days ago
πŸ“’ For those who tracking adances in Sentiment Analysis, this post might be relevant for you. So far we arranged RuOpinionNE-2024 competition with final stage that has just been completed several days ago. Here the quick findings we got from the top submissions. πŸŽ‰

πŸ” First, RuOpinonNE-2024 competition πŸ’» on extraction of opinion tuples (opinion source, opinion object, tonality, linguistic expression) from low resource domain news texts written in Russian language. The competition is hosted by codalab platform:

https://codalab.lisn.upsaclay.fr/competitions/20244

To assess the advances, we adopt F1 over sentiment classes, which also involves evaluation of the spans.
πŸ‘ Among 7 participants in total, the top three submissions showcase the following results:

πŸ₯‰msuai F1=0.33 🎊
πŸ₯ˆRefalMachine showcase +0.02 F1=0.35 🎊
πŸ†VatolinAlexey showcase +0.06 F1=0.41 🎊

πŸ“ At present, the competition organizers are working on:
1. 🟡 Collecting information about the models utilized by participants, to contribute pre-trained models / concepts here;
2. 🟑 Wrapping up findings from the submissions in a paper.

πŸ”” For more information and further updates, the most complete source that complements codalab is this github:
https://github.com/dialogue-evaluation/RuOpinionNE-2024

RuOpinionNE-2024 is now in the post-evaluation stage, so everyone interested in low-resource-domain evaluation of opinion extraction is welcome 🙌
posted an update 4 days ago
πŸ“’ Deligted to share the new version of the bulk-ner which represent a tiny framework that would save you time for deploing NER with any model.

πŸ“¦: https://pypi.org/project/bulk-ner/0.25.1/
🌟: https://github.com/nicolay-r/bulk-ner

Direct adaptation of an LM for NER results in spending a significant amount of time on formatting your texts according to the NER model's needs.
In particular:
1. Processing the CoNLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely
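Point 1 above is the kind of boilerplate the framework absorbs. A minimal B-I-O decoder (illustrative only, not bulk-ner's actual code) might look like:

```python
def decode_bio(tokens, tags):
    """Collapse token-level B-I-O tags into (entity_text, type) pairs."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the previous entity
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)  # continue the open entity
        else:  # an "O" tag closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Angela", "Merkel", "visited", "Paris"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(decode_bio(tokens, tags))  # [('Angela Merkel', 'PER'), ('Paris', 'LOC')]
```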

In 0.25.1 I made huge steps forward by providing:
✅ Enhanced integration via a function for casting extracted entities to your type (see picture below)
✅ Enhanced integration with AREkit pipelines
✅ Simplified API usage (example using DeepPavlov NER models): https://github.com/nicolay-r/bulk-ner/wiki#api

πŸ‘ The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit
reacted to cutechicken's post with πŸš€ 7 days ago
πŸ”¬ PaperImpact
: Scientific Impact Predictor Powered by Deep Learning 🎯

VIDraft/PaperImpact

πŸ“š Overview
A cutting-edge AI system that combines transformer architecture with citation pattern analysis to predict research impact. Our model, trained on 120,000+ CS papers, analyzes innovation potential, methodological robustness, and future impact, providing researchers with valuable insights before publication.
🧠 Scientific Foundation

BERT-based semantic analysis
Citation network pattern learning
NDCG optimization & MSE loss
Cross-validated prediction engine
GPU-accelerated inference

πŸ’« Why Researchers Need This

Pre-submission impact assessment
Research direction optimization
Time-saving paper evaluation
Competitive edge in academia
Trend identification advantage

🎯 Key Features

One-click arXiv paper analysis
Real-time impact scoring (0-1)
9-tier grading system (AAA-C)
Smart input validation
Instant visual feedback

🌟 Unique Benefits
"Don't wait years to know your paper's impact. Get instant, AI-powered insights to strengthen your research strategy and maximize your academic influence."
Perfect for:

Research authors
PhD students
Journal editors
Research institutions
Grant committees

#ResearchImpact #AcademicAI #ScienceMetrics #ResearchExcellence
posted an update 8 days ago
πŸ“’ I am happy to share the bulk-translate 0.25.1. 🎊
This is a framework that lets you adapt your LM, or use the default one (the googletrans API), for a quick translation of your dataset.

⭐ https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny Python 🐍 no-string framework that translates massive datasets of pre-annotated fixed spans with related metadata that are invariant under translation (see picture below). It supports an API 👨‍💻 for quick data translation with (optionally) annotated objects in texts.
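The span-invariant idea can be sketched as masking the fixed spans with placeholders before translation and restoring them afterwards (a simplified illustration, not bulk-translate's actual implementation; the translator here is a fake stand-in):

```python
def translate_with_spans(text, spans, translate):
    """Mask fixed spans with placeholders, translate, then restore them,
    so annotated spans survive translation untouched."""
    masked = text
    for i, span in enumerate(spans):
        masked = masked.replace(span, f"__SPAN{i}__")
    translated = translate(masked)
    for i, span in enumerate(spans):
        translated = translated.replace(f"__SPAN{i}__", span)
    return translated

# Fake "translator" for demonstration; swap in googletrans or your LM.
fake_translate = lambda s: s.replace("visited", "besuchte")
out = translate_with_spans("Merkel visited Paris", ["Merkel", "Paris"],
                           fake_translate)
print(out)  # 'Merkel besuchte Paris'
```

The annotated spans ("Merkel", "Paris") pass through the translator untouched, which is exactly the invariant the framework preserves.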

In the new release we:
1. Fixed sync type checking for span representation
2. Added compatibility with AREkit pipelines

πŸ€– The quick tutorial for applying it towards list of textual data with optional spans:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/translate_texts_with_spans_via_googletrans.ipynb
reacted to mkurman's post with πŸ”₯πŸ‘ 8 days ago
reacted to Kseniase's post with πŸ‘πŸ‘€ 8 days ago
10 Recent Advancements in Math Reasoning

Over the last few weeks, we have witnessed a surge in AI models' math reasoning capabilities. Top companies like Microsoft, NVIDIA, and Alibaba Qwen have already joined this race to make models "smarter" in mathematics. But why is this shift happening now?

Complex math calculations require advanced multi-step reasoning, making mathematics an ideal domain for demonstrating a model's strong "thinking" capabilities. Additionally, as AI continues to evolve and is applied in math-intensive fields such as machine learning and quantum computing (which is predicted to see significant growth in 2025), it must meet the demands of complex reasoning.
Moreover, AI models can be integrated with external tools like symbolic solvers or computational engines to tackle large-scale math problems, which also needs high-quality math reasoning.

So here’s a list of 10 recent advancements in math reasoning of AI models:

1. NVIDIA: AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (2412.15084)

2. Qwen, Alibaba: Qwen2.5-Math-PRM The Lessons of Developing Process Reward Models in Mathematical Reasoning (2501.07301) and PROCESSBENCH evaluation ProcessBench: Identifying Process Errors in Mathematical Reasoning (2412.06559)

3. Microsoft Research: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (2501.04519)

4. BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning (2501.03226)

5. URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)

6. U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs (2412.03205)

7. Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs (2501.06430)

8. End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach (2501.04425)

9. Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning (2501.03035)

10. System-2 Mathematical Reasoning via Enriched Instruction Tuning (2412.16964)
reacted to neph1's post with πŸ€— 8 days ago
There's a new version of the Swedish instruct model, bellman. Due to 'popular demand' (at least as opposed to 'no demand'), I based it off the latest mistral 7b, v0.3. The v0.2 seems to be the most popular of the bunch, despite being quite old by now. Why, I don't know. Must be a link in some old reddit post that is drawing clicks. :)
Anyway, here it is:
neph1/bellman-mistral-7b-instruct-v0.3
You can also try it out (on cpu), here:
neph1/bellman
reacted to singhsidhukuldeep's post with πŸ”₯ 8 days ago
Exciting breakthrough in large-scale recommendation systems! ByteDance researchers have developed a novel real-time indexing method called "Streaming Vector Quantization" (Streaming VQ) that revolutionizes how recommendations work at scale.

>> Key Innovations

Real-time Indexing: Unlike traditional methods that require periodic reconstruction of indexes, Streaming VQ attaches items to clusters in real time, enabling immediate capture of emerging trends and user interests.

Superior Balance: The system achieves remarkable index balancing through innovative techniques like merge-sort modification and popularity-aware cluster assignment, ensuring all clusters participate effectively in recommendations.

Implementation Efficiency: Built on VQ-VAE architecture, Streaming VQ features a lightweight and clear framework that makes it highly implementation-friendly for large-scale deployments.

>> Technical Deep Dive

The system operates in two key stages:
- An indexing step using a two-tower architecture for real-time item-cluster assignment
- A ranking step that employs sophisticated attention mechanisms and deep neural networks for precise recommendations.
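A toy sketch of the popularity-aware, online assignment idea (greatly simplified relative to the paper's VQ-VAE-based design; the penalty weight `alpha` is an assumption for illustration):

```python
def assign_streaming(item, centroids, counts, alpha=0.1):
    """Assign an item embedding to its nearest cluster, penalizing crowded
    clusters, and update the winning centroid online so the index stays
    fresh without periodic rebuilds."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Popularity-aware score: distance plus a penalty for popular clusters.
    best = min(range(len(centroids)),
               key=lambda k: dist(item, centroids[k]) + alpha * counts[k])
    counts[best] += 1
    lr = 1.0 / counts[best]  # running-mean update of the winning centroid
    centroids[best] = [c + lr * (x - c)
                       for c, x in zip(centroids[best], item)]
    return best

centroids = [[0.0, 0.0], [1.0, 1.0]]
counts = [0, 0]
idx = assign_streaming([0.9, 1.1], centroids, counts)
print(idx)  # 1: the item lands in the nearer cluster
```

The penalty term is what keeps all clusters participating: once a cluster grows popular, borderline items drift to its less-used neighbors.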

>> Real-world Impact

Already deployed in Douyin and Douyin Lite, replacing all major retrievers and delivering significant user engagement improvements. The system handles a billion-scale corpus while maintaining exceptional performance and computational efficiency.

This represents a significant leap forward in recommendation system architecture, especially for platforms dealing with dynamic, rapidly-evolving content. The ByteDance team's work demonstrates how rethinking fundamental indexing approaches can lead to substantial real-world improvements.
reacted to KnutJaegersberg's post with πŸ”₯ 8 days ago
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI

It's an interesting paper that argues "new approaches are required that can reliably solve a wide variety of problems without existing skills."
"It is therefore hoped that the benchmark outlined in this article contributes to further exploration of this direction of research and incentivises the development of new AGI approaches that focus on intelligence rather than skills."

https://arxiv.org/abs/2501.07458
posted an update 9 days ago
πŸ“’ So far I been passioned about making NLP pipeline for handling iterator of texts with no-string dependency from besides third-party providers of your choice.

Starting with text translation, I am delighted to share the related notebooks that might save you time handling your data:

⭐ https://github.com/nicolay-r/nlp-thirdgate

An example of using the GoogleTranslate API, no-string, for handling textual data iterators with spans:

πŸ“™ https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/translate_texts_with_spans_via_googletrans.ipynb

The key concept is that all these API examples can be tied into a single pipeline using AREkit:

πŸ“˜ https://github.com/nicolay-r/AREkit

πŸ› οΈ The further plan is to popualte this repo with
1. NER (DeepPavlov models wrapper)
2. LLM with fancy out-of-the-box chain-of-thought declaration support.
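The pipeline-of-iterators concept can be sketched in a few lines (a simplification; AREkit's actual pipeline API is richer, and the stages here are placeholders for translation, NER, or LLM handlers):

```python
def pipe(iterator, *stages):
    """Lazily chain text-processing stages over an iterator of texts.
    Each stage is a plain callable: text in, text out."""
    for stage in stages:
        iterator = map(stage, iterator)
    return iterator

# Toy stages standing in for real handlers (translator, NER, LLM, ...).
texts = iter(["  Hello ", " world  "])
out = list(pipe(texts, str.strip, str.upper))
print(out)  # ['HELLO', 'WORLD']
```

Because everything is an iterator, the same composition works for a list in memory or a stream read line-by-line from a JSONL file.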
posted an update 28 days ago
πŸ“’ Through the 2024 we attempting in advancing opinion mining by proposing evaluation which involves explanations!

A while ago we launched RuOpinionNE-2024, aimed at the extraction of sentiment opinions with spans (as explanations) from mass-media news written in Russian. The competition is at its final stage on the CodaLab platform:
πŸ“Š https://codalab.lisn.upsaclay.fr/competitions/20244

πŸ”Ž What we already observe? For the two type of sentiment labels (positive and negative), our recent findings were that the top performing submission results in F1=0.34 while the baseline LLM approach results in F1=0.17 (see screenshot of the leaderboard below πŸ“Έ)

⏰️ We have finally launched the final stage, with a decent number of submissions, which lasts until 15 January 2025.

πŸ™Œ Everyone who wish to evaluate most recent advances on explainable opinion mining during the final stage are welcome!

Codalab main page:
https://codalab.lisn.upsaclay.fr/competitions/20244#learn_the_details
More details on github:
https://github.com/dialogue-evaluation/RuOpinionNE-2024