@hmedmzhmedmzjxve
There is something satisfying about writing a CUDA kernel and then finding and removing the bottlenecks to make it go 🔥. If you enjoy saving tens of microseconds per kernel to understand the universe faster, apply!
https://t.co/NAYJ070weg