hangingardens (Baki)

New activity in Skylion007/openwebtext about 1 month ago

What's the most anyone has trained on this dataset, and what was the best loss they reached?

#17 opened 2 months ago by sahil26k

liked a model about 1 month ago

mistralai/Mistral-Large-Instruct-2411

Updated Nov 19 • 3.11M • 176

updated a model about 1 month ago

hangingardens/llama-ioi

Text Generation • Updated Nov 17 • 17

liked a dataset about 1 month ago

microsoft/orca-agentinstruct-1M-v1

Viewer • Updated Nov 1 • 1.05M • 12.9k • 404

updated 3 models about 1 month ago

replied to louisbrulenaudet's post about 2 months ago

I have a project idea: building a reasoning model like o1. Many similar projects exist, but mine is a bit different. First, I create some data manually, then train with a new formulation based on the REINFORCE algorithm (it's simple, by the way). I've included a sample from my data below. I could use help with the following:
1. Fine-tuning models like Mistral Large or Llama 3 on the data.
2. Creating more data with existing models. I already have a prompt ready for this: it asks the model to follow the format and style of the samples I wrote manually. Note that this does not work for hard questions, so I sometimes have to do those by hand. Here is my longest sample:

{
"user": "solve this sudoku: 000005920000902803000074000096210507057006000284050690670000409830000752040020316",
"model": {
"steps": {
"step1": "lets start with some one has most rows, 3x3 block or column filled",
"step2": "okay, 9. 3x3 block has only one number left",
"step3": "1, 2, 3, 4, 5, 6, 7, 9 are there",
…………….
"step105": "and they get filled by these numbers in order: 6 and 5",
"step106": "so full grid finished at end is: 763185924415962873928374165396218547157496238284753691672531489831649752549827316, and this is the answer."
},
"validity": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"answer": "The full grid as fully solved is: 763185924415962873928374165396218547157496238284753691672531489831649752549827316"
}
},
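Since every sample ends in a full grid, the final answer can be checked mechanically before the sample is used for training. Below is a minimal sketch of such a check, not part of the original post: the function names `sudoku_units` and `is_valid_solution` are made up for illustration, and it assumes the 81-character row-major encoding with `0` for empty cells used in the sample above. It only verifies the final grid, not the individual steps or the `validity` list.

```python
# Puzzle and answer strings copied from the sample above.
PUZZLE = "000005920000902803000074000096210507057006000284050690670000409830000752040020316"
SOLUTION = "763185924415962873928374165396218547157496238284753691672531489831649752549827316"


def sudoku_units():
    """Return the 27 index groups (9 rows, 9 columns, 9 boxes) of a 9x9 grid."""
    rows = [[r * 9 + c for c in range(9)] for r in range(9)]
    cols = [[r * 9 + c for r in range(9)] for c in range(9)]
    boxes = [
        [(br * 3 + r) * 9 + (bc * 3 + c) for r in range(3) for c in range(3)]
        for br in range(3)
        for bc in range(3)
    ]
    return rows + cols + boxes


def is_valid_solution(puzzle: str, solution: str) -> bool:
    """Check that `solution` is a complete, legal grid that keeps every clue of `puzzle`."""
    if len(puzzle) != 81 or len(solution) != 81:
        return False
    if any(ch not in "123456789" for ch in solution):
        return False
    # Every given clue (non-zero cell) must be unchanged in the solution.
    if any(p != "0" and p != s for p, s in zip(puzzle, solution)):
        return False
    # Each row, column, and 3x3 box must contain the digits 1-9 exactly once.
    digits = set("123456789")
    return all({solution[i] for i in unit} == digits for unit in sudoku_units())


if __name__ == "__main__":
    print(is_valid_solution(PUZZLE, SOLUTION))
```

A check like this could also be run on model-generated samples, so that only records whose final grid passes are kept for fine-tuning.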