@metaend.eth
Concluding my Klearu experiment with this idea (latest update)
Learnings:
- Fine-tuned dense works well: data classifications are passing (~50s on average), but that's too slow for my specific use case
- Sparse is still broken for Qwen, *however* it doesn't really make sense here anyway: sparse inference was designed for 7B+ models, where MLP layers are 11K-28K wide and compute becomes the bottleneck. At 0.8B it's all overhead with negligible benefit.
It's not feasible to run 7B+ models on Klearu because of CPU/speed limitations, which means sparse would only add overhead as a feature, IMO.
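A rough back-of-envelope sketch of the overhead argument. All shapes and the activation fraction are illustrative assumptions (a ~0.8B-class model vs. a 7B-class model, 20% of MLP channels active), not measured values:

```python
# Compare per-token MLP FLOPs, dense vs. sparse, at two model scales.
# Hypothetical shapes: hidden/intermediate widths are assumptions, not
# the exact Qwen configs.

def mlp_flops(hidden: int, intermediate: int, active_frac: float = 1.0) -> int:
    """FLOPs for one gated MLP block: 3 matmuls (gate, up, down),
    2 FLOPs per multiply-accumulate, optionally restricted to a
    fraction of active intermediate channels."""
    active = int(intermediate * active_frac)
    return 3 * 2 * hidden * active

# ~0.8B-class model (assumed widths)
small_dense  = mlp_flops(1024, 3072)
small_sparse = mlp_flops(1024, 3072, active_frac=0.2)

# 7B-class model (assumed widths)
large_dense  = mlp_flops(4096, 11008)
large_sparse = mlp_flops(4096, 11008, active_frac=0.2)

# The fixed per-layer sparsity costs (activation predictor, gather/scatter,
# irregular memory access) are roughly constant per layer, so sparsity only
# pays off when the absolute FLOP savings dwarf them.
print(f"small model saves {small_dense - small_sparse:>12,} FLOPs/layer/token")
print(f"large model saves {large_dense - large_sparse:>12,} FLOPs/layer/token")
```

With these assumed shapes the 7B-class layer saves over 10x more compute per token than the 0.8B-class layer, while the fixed sparsity bookkeeping stays about the same, which is the "all overhead at 0.8B" point in numbers.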