Josef Kurk Edwards

Drjkedwards

AI & ML interests

https://ollama.com/bearycool11/FamousPersonLLM. Also further upgrading my GPT-4.5 with Sam Altman into GPT-5.

Recent Activity

Organizations

OpenAI

Drjkedwards's activity

reacted to openfree's post with 👍 16 days ago
Korean Exam Leaderboard: LLMs vs Civil Service and Professional Qualification Exams 📝

openfree/Korean-Exam-Leaderboard

## 📊 What is this leaderboard?
This leaderboard evaluates the performance of various AI models on 22 Korean civil service and professional qualification exams. All scores are converted to a 100-point scale to show how well different LLMs can solve actual Korean civil service and professional qualification tests!
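
The conversion to a 100-point scale mentioned above could be sketched as follows. This is a minimal illustration that assumes simple linear rescaling of a raw score against the exam's maximum; the post does not state the exact method, and the function name `to_100_point` and the sample numbers are hypothetical.

```python
def to_100_point(raw_score: float, max_score: float) -> float:
    """Rescale a raw exam score to a 100-point scale.

    Assumption: linear rescaling. The leaderboard's actual conversion
    rule is not described in the post.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    # Multiply before dividing to keep simple cases exact in floating point.
    return round(raw_score * 100 / max_score, 2)


# Hypothetical example: 42 points earned on an exam with a maximum of 60.
print(to_100_point(42, 60))
```

Putting every exam on the same scale is what makes scores from exams with different maximum marks comparable on one leaderboard.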

## 🏆 Current Top Performers
- **OpenAI/GPT-o1**: Bar Exam 52.5 points 🥇
- **OpenAI/GPT-4.5**: Bar Exam 49.33 points 🥈
- **OpenAI/GPT-4o**: Bar Exam 49.11 points 🥉
- **deepseek-ai/DeepSeek-R1**: Bar Exam 47.33 points

## 📋 Exams Being Evaluated
The leaderboard includes various Korean civil service and professional qualification exams:
- Korean Bar Exam
- Senior Civil Service Grade 5
- Judicial Service Grade 5
- National Assembly Grade 5
- Judicial Scrivener
- Police Executive Candidate
- And more exams!

## 🤖 Models Being Evaluated
We are testing a variety of models:
- OpenAI: GPT-o1, GPT-o3-mini, GPT-4.5, GPT-4o
- Anthropic: Claude 3.7 Sonnet
- Google: Gemini 2.0 Flash/PRO/Flash Thinking
- Meta: Llama 3.3 70B Instruct, Llama 3.2 90B Vision
- DeepSeek: DeepSeek-R1
- Qwen: QwQ-32B, Qwen2.5 Coder
- Mistral: Mistral-Small-3.1-24B
- NVIDIA models: NVIDIA Nemotron variant models
- And many more!

## 🔍 Why This Matters
Korean civil service exams are known for their high difficulty and comprehensive knowledge assessment. These exams test deep knowledge across legal, administrative, and public service domains. Success in these exams demonstrates not just language understanding but also domain expertise and reasoning ability.

## 🧪 Evaluation Methodology

## 🔜 Future Plans
We are continuously expanding our test coverage across all 22 exam categories. We will keep updating the scores marked "TBD" so please stay tuned!
replied to openfree's post 16 days ago
replied to openfree's post 16 days ago
reacted to openfree's post with ❤️ 16 days ago
New activity in Drjkedwards/o13reasoningorgan 17 days ago

system.md

#1 opened about 1 month ago by Drjkedwards