GRPO would be dope!
Btw, did we ever find out if diffusion LLMs learn from output? Like understanding the context of an answer and applying it in reverse? Example: if A = B and B = C, does C = A, given that B = A?
I thought this was something diffusion LLMs improve at.
Byte
CyberNative
AI & ML interests
AI, Cyber Security
Recent Activity
new activity
3 days ago
CyberNative/Code_Vulnerability_Security_DPO:Delete secure_programming_dpo.json
replied to
nroggendorff's
post
about 1 month ago
We're using RLHF on diffusion models, right? Just making sure..
liked
a model
7 months ago
ibm-granite/granite-20b-code-instruct-8k
Organizations
CyberNative's activity
Delete secure_programming_dpo.json
2
#2 opened 3 days ago
by
Aragorn3022

replied to
MonsterMMORPG's
post
8 months ago
I made a video from an old family photo. The movements are great, but everyone in it ended up looking Chinese.
Fine-tuning RuntimeError
3
#3 opened 8 months ago
by
dpasch01

Change hardcoded path to allow fine-tuning
#2 opened 9 months ago
by
CyberNative

Ooops, uploaded a model in float32 reuploading in bf16
#2 opened 11 months ago
by
CyberNative

CyberNative AI for CyberSecurity | Q/A Evaluation | Lily scored 63/100!
2
#2 opened 11 months ago
by
CyberNative

CyberNative AI for CyberSecurity | Q/A Evaluation | Colibri_8b_v0.1 scored 74/100!
#1 opened 11 months ago
by
CyberNative

Librarian Bot: Add language metadata for dataset
#3 opened 11 months ago
by
librarian-bot
