but I think now we just merge it with DeepSeek R1 Distill Llama 8B something or other and see if anything neat happens.

eventually I'll probably end up somewhere in this lineage of mergekit merges

https://huggingface.co/grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

merged grimjim/HuatuoSkywork-o1-Llama-3.1-8B (i.e. not the above model) with the "decimated" Llama model (i.e. 90% base, 10% instruct) at just 1% to see what happens

chose 1% mainly because i wanted to call it llama-o1-8b-percentum
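
rough sketch of what those ratios work out to, assuming both steps are plain linear (weighted-average) merges and that the 1% is the o1 model's share, with the other 99% being the decimated model. neither assumption is confirmed above, and the names `linear_merge`, `base`, `instruct`, `o1` are just for illustration. the toy "models" are one-hot vectors so each source's final share can be read straight off the result:

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def linear_merge(models: List[Weights], ratios: List[float]) -> Weights:
    """Weighted average of parameter dicts that share the same keys and shapes."""
    if len(models) != len(ratios):
        raise ValueError("need exactly one ratio per model")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    merged: Weights = {}
    for name in models[0]:
        # Walk the same parameter position across all models at once.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(r * v for r, v in zip(ratios, col)) for col in columns]
    return merged


# Toy stand-ins for real checkpoints: one-hot vectors tag each source model.
base = {"w": [1.0, 0.0, 0.0]}
instruct = {"w": [0.0, 1.0, 0.0]}
o1 = {"w": [0.0, 0.0, 1.0]}

# Step 1: the "decimated" model, 90% base and 10% instruct.
decimated = linear_merge([base, instruct], [0.9, 0.1])

# Step 2 (assumed direction): 99% decimated, 1% o1.
percentum = linear_merge([decimated, o1], [0.99, 0.01])

shares = dict(zip(["base", "instruct", "o1"], percentum["w"]))
print(shares)
```

under those assumptions the result is roughly 89.1% base, 9.9% instruct and 1% o1, so it's still overwhelmingly the base model. a real merge would go through mergekit on actual checkpoints rather than toy lists like these.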

looking great so far, but still a long way to go

it shows excellent balance in terms of "minimal" instruction following & reasoning ability, without noticeably impacting the randomness & creativity of the base model

the above model would actually be a good one to test against it