Yhprum

@yhprumslaw

would models be delivered split? no full weight set exists anywhere; nodes host shards (e.g., tensor slices). delivery is inference via a network api, querying shards live over the internet. so users call an endpoint like ‘api.pluralis.ai/modelX’, the system routes inputs to the shard-hosting nodes, each node processes its piece, and the outputs are merged and returned, all without ever centralizing the model?
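a minimal sketch of the routing idea in the question above, not how Pluralis actually works (their internals aren't described here). it assumes a toy tensor-parallel split: each hypothetical node holds a column slice of a weight matrix, a coordinator fans the input out and concatenates the partial outputs, and no single party ever holds the full weight set. all names (`ShardNode`, `Coordinator`) are illustrative.

```python
# Toy sketch of shard-routed inference (illustrative, not Pluralis's design).
# Each "node" holds only a column slice of a weight matrix; the coordinator
# sends the input to every node and merges the partial outputs, so the full
# weight set is never assembled in one place.

def matvec(x, w_cols):
    """Multiply row vector x by a matrix given as a list of columns."""
    return [sum(xi * ci for xi, ci in zip(x, col)) for col in w_cols]

class ShardNode:
    """One node hosting a tensor slice (a subset of output columns)."""
    def __init__(self, w_cols):
        self.w_cols = w_cols  # this node's slice of the weights

    def forward(self, x):
        return matvec(x, self.w_cols)

class Coordinator:
    """Routes a request to every shard node and merges the pieces.

    In a real deployment these calls would be RPCs over the internet,
    behind an endpoint like the one named in the post."""
    def __init__(self, nodes):
        self.nodes = nodes

    def infer(self, x):
        parts = [node.forward(x) for node in self.nodes]
        return [v for part in parts for v in part]  # concatenate shard outputs

# The full weight matrix (as columns) is split across two shard nodes;
# it exists here only to build the example and is never given to any node whole.
full_w_cols = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
nodes = [ShardNode(full_w_cols[:2]), ShardNode(full_w_cols[2:])]
api = Coordinator(nodes)

print(api.infer([2, 3, 4]))  # [2, 3, 4, 9]
```

the sketch splits along the output dimension, so one broadcast and one concatenation suffice; a pipeline-style split by layers would instead chain the nodes sequentially, passing activations from one shard to the next.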