ray @raysonsio

privacy at literally transformer speed: why cascade matters

ok so here's the thing… smpc was supposed to make llm prompts private. but the second you actually try to run it on real models? it's basically unusable yk

ok so, wtf is smpc? secure multi-party computation = split your prompt across multiple parties so no single one sees it all. sounds good, but in practice:

- >1000x slower than standard inference (like 2 minutes per token for llama-7b)
- breaks completely on bigger models

so yeah… math guarantee, zero practicality ggs

so how does cascade change that? ritual built cascade around one key insight: transformers are mostly per-token ops. that means you can shard the tokens themselves across nodes and keep the prompt private without the smpc slowdown. cascade uses (rough sketch after this post 👇):

compnodes → handle the pre-pass: split the query/key/value projections per token
attnnodes → handle the attention pass across the sharded keys/values
post-pass → merge the partial results + run the mlp, still sharded

to be continued 👇
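quick numpy sketch of the shard-and-merge math, just to show the idea checks out. heads up: everything here is my guess at the mechanics, not ritual's actual cascade code. the function names (pre_pass, partial_attention, merge_partials), the compnode/attnnode roles as plain functions, and the log-sum-exp merge are all assumptions; it skips causal masking, the mlp, and the real privacy/networking layer (queries get pooled in one place here purely to verify the math):

```python
import numpy as np

# illustrative assumptions only, not cascade's api
D = 8          # toy head dimension
N_SHARDS = 3   # token shards, one per hypothetical compnode
rng = np.random.default_rng(0)

def pre_pass(token_shard, Wq, Wk, Wv):
    # compnode role (sketch): project ONE token shard into q/k/v;
    # this node never sees the rest of the prompt
    return token_shard @ Wq, token_shard @ Wk, token_shard @ Wv

def partial_attention(q, k_shard, v_shard):
    # attnnode role (sketch): score queries against a SINGLE k/v shard;
    # return an unnormalized output plus local softmax stats (row max,
    # row sum) so shards can be merged exactly later
    scores = q @ k_shard.T / np.sqrt(q.shape[-1])
    m = scores.max(axis=-1, keepdims=True)   # local max, for stability
    p = np.exp(scores - m)
    return p @ v_shard, p.sum(axis=-1, keepdims=True), m

def merge_partials(partials):
    # post-pass (sketch): log-sum-exp merge of per-shard partials into
    # the exact full-attention output
    outs, sums, maxes = zip(*partials)
    m_global = np.maximum.reduce(maxes)
    scales = [np.exp(m - m_global) for m in maxes]
    num = sum(o * s for o, s in zip(outs, scales))
    den = sum(z * s for z, s in zip(sums, scales))
    return num / den

# toy prompt: 9 tokens split 3 ways, so no node holds the full prompt
tokens = rng.normal(size=(9, D))
shards = np.array_split(tokens, N_SHARDS)
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

# pre-pass: each compnode projects only its own shard
qkv = [pre_pass(s, Wq, Wk, Wv) for s in shards]
# queries are pooled here ONLY to sanity-check the math; a real
# protocol would route them without one party seeing everything
q_all = np.concatenate([q for q, _, _ in qkv])

# attention pass: each attnnode handles one k/v shard
partials = [partial_attention(q_all, k, v) for _, k, v in qkv]
out = merge_partials(partials)

# check: matches ordinary monolithic (non-causal) attention
k_all = np.concatenate([k for _, k, _ in qkv])
v_all = np.concatenate([v for _, _, v in qkv])
s = q_all @ k_all.T / np.sqrt(D)
p = np.exp(s - s.max(-1, keepdims=True))
assert np.allclose(out, (p / p.sum(-1, keepdims=True)) @ v_all)
```

the merge trick is the same online-softmax idea flashattention popularized: each shard hands back an unnormalized output plus its local max and sum, and those combine into the exact full-attention answer, so no single attnnode ever sees the whole prompt's keys/values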