☡ pfp

@stultulo

26 Following
85 Followers


@stultulo
khdjysbfshfsjtstjsjts
1 reply
0 recast
1 reaction

@stultulo
So I've learned that you can use mergekit to target attention separately from MLP layers? hm. Makes sense, but, hmm.
0 reply
0 recast
1 reaction
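[For reference: mergekit's SLERP method can indeed weight attention and MLP tensors separately, via `filter` rules on the interpolation parameter `t`. A minimal sketch, with placeholder model names (`model_a`, `model_b`) standing in for real repos:]

```yaml
# sketch: SLERP merge with separate interpolation schedules
# for attention vs. MLP tensors; model names are placeholders
merge_method: slerp
base_model: model_a
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn
      value: [0.0, 0.5, 0.3, 0.7, 1.0]  # attention: one schedule across depth
    - filter: mlp
      value: [1.0, 0.5, 0.7, 0.3, 0.0]  # MLP: the opposite schedule
    - value: 0.5                        # everything else
dtype: bfloat16
```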

@stultulo
I'm not sure if there's anything in, say, GPT-4.5 that you can't get from Llama 3.1 8B, it's just a question of how *minimally* you can align the base model. I mean this about people who say things like, "GPT-4.5 is the first model that's actually funny/has high EQ/etc." We shall see.
0 reply
0 recast
2 reactions

@stultulo
aww
0 reply
0 recast
2 reactions

@stultulo
it actually makes me a little emotional when it "gets it"
0 reply
0 recast
1 reaction

@stultulo
my llama-o1-8b-percentum model is impressive to me because it anticipates things that actually happen in the story, but are not easy to extrapolate from the notes i'm giving it

my benchmarks, for now, are nothing more than my taste in literature: can it get to the same conclusions that I did about the characters?
1 reply
0 recast
2 reactions

@stultulo
merged grimjim/HuatuoSkywork-o1-Llama-3.1-8B (i.e. not the above model) with the "decimated" Llama model (i.e. 90% base, 10% instruct) at just 1% to see what happens

chose 1% mainly because i wanted to call it llama-o1-8b-percentum

looking great so far, but still a long way to go

it shows excellent balance in terms of "minimal" instruction following & reasoning ability, without noticeably impacting the randomness & creativity of the base model

the above model would actually be a good one to test against it
0 reply
0 recast
1 reaction
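[A sketch of what that 1% merge could look like as a mergekit config, assuming SLERP with a constant `t`; the merge method and the local path to the decimated model are my guesses, not confirmed by the cast:]

```yaml
# sketch: 1% SLERP toward the o1 model (the "percentum" idea);
# the decimated model path is a placeholder
merge_method: slerp
base_model: ./llama-3.1-8b-decimated
slices:
  - sources:
      - model: ./llama-3.1-8b-decimated
        layer_range: [0, 32]
      - model: grimjim/HuatuoSkywork-o1-Llama-3.1-8B
        layer_range: [0, 32]
parameters:
  t: 0.01  # 99% decimated, 1% o1
dtype: bfloat16
```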

@stultulo
There's a reason why this prison is the worst hell on earth: hope. Every man who has rotted here over the centuries has looked up to the light and imagined climbing to freedom. So easy. So simple. And like shipwrecked men turning to sea water from uncontrollable thirst, many have died trying. I learned here that there can be no true despair without hope.
0 reply
0 recast
2 reactions

@stultulo
pretty sure that i still have tetralogy notes somewhere which include one of my characters’ favorite quotes from the Varèse translation of A Season in Hell which are not the same as my favorite quotes, of course, need to find dis
0 reply
0 recast
1 reaction

@stultulo
This makes me regret the world but little. I am lucky not to suffer more. My life was nothing but sweet follies, it's a pity.
0 reply
0 recast
1 reaction

@stultulo
plan tonight is to do a few tests, rent an H100, do a few more tests
0 reply
0 recast
3 reactions

@stultulo
i picked either the right day or the wrong day to wake up at 6 PM
0 reply
0 recast
1 reaction

@stultulo
indeed
0 reply
0 recast
1 reaction

@stultulo
Blaze softly past the suns of copper bro
0 reply
0 recast
2 reactions

@stultulo
one thing about me imma do my thing idc about a user score HAHH!!
0 reply
0 recast
2 reactions

@stultulo
let's fkn go baby
2 replies
0 recast
4 reactions

@stultulo
Currently messing with something i'm calling llama-3.1-8b-daydream

first we merged llama-3.1-8b and llama-3.1-8b-instruct using mergekit/SLERP like this, to create llama-3.1-8b-decimated:

Layers 0–7: passthrough base only
Layers 8–23: 90% base + 10% instruct
Layers 24–31: 80% base + 20% instruct

...then merged that result with Aion-RP-Llama-3.1-8B like this, to make llama-3.1-8b-daydream:

Layers 0–3: passthrough decimated only
Layers 4–7: 94% decimated + 6% Aion-RP
Layers 8–31: 98% decimated + 2% Aion-RP

documenting here because i'm not wanting to upload anything to HuggingFace unless it's been thoroughly benchmarked and also proven to not be psycho, as far as LLMs go. So far, they're very endearing and mostly well-behaved, though. I love the tone.
0 reply
0 recast
3 reactions
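[The "decimated" step above can be sketched as a single mergekit SLERP config with per-slice `t` values (`t` = fraction of instruct, so passthrough is `t: 0.0`); the `meta-llama` repo names are assumptions:]

```yaml
# sketch: llama-3.1-8b-decimated as per-slice SLERP weights
merge_method: slerp
base_model: meta-llama/Llama-3.1-8B
slices:
  - sources:
      - model: meta-llama/Llama-3.1-8B
        layer_range: [0, 8]
      - model: meta-llama/Llama-3.1-8B-Instruct
        layer_range: [0, 8]
    parameters:
      t: 0.0   # layers 0-7: base only
  - sources:
      - model: meta-llama/Llama-3.1-8B
        layer_range: [8, 24]
      - model: meta-llama/Llama-3.1-8B-Instruct
        layer_range: [8, 24]
    parameters:
      t: 0.1   # layers 8-23: 90% base + 10% instruct
  - sources:
      - model: meta-llama/Llama-3.1-8B
        layer_range: [24, 32]
      - model: meta-llama/Llama-3.1-8B-Instruct
        layer_range: [24, 32]
    parameters:
      t: 0.2   # layers 24-31: 80% base + 20% instruct
dtype: bfloat16
```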

@stultulo
but I think now we just merge it with DeepSeek R1 Distill Llama 8B something or other and see if anything neat happens.

eventually I'll probably end up somewhere in this lineage of mergekits https://huggingface.co/grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B
2 replies
0 recast
2 reactions

@stultulo
https://farcaster.xyz/stultulo/0xc3d83d0d
0 reply
0 recast
2 reactions

@stultulo
this decimated model (90% base, 10% instruct) is still a little too coherent for my taste

but it did get pretty close to the standard "writerly answer" for what makes writing (and reading) fiction worthwhile, imo, when i gave it some notes and asked what it'd say if it were a character in my tetralogy 🤔
0 reply
1 recast
2 reactions