Eh, yes, but in my experience its lack of prefetch leads to significant memory stalls while waiting for the copy. It might be suitable if your entire dataset fits in VRAM after you do a “manual prefetch”, but it hurt performance in my application (ML training) so badly that we were actually given the time to move to streaming loads.
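For anyone wondering what "streaming loads" with prefetch looks like in general, here is a minimal, hypothetical sketch. It is not the setup described above and does not touch any GPU API. It assumes data is split into chunks that a user-supplied `load_chunk` function can fetch by index, and it uses a background thread plus a bounded queue so that loading chunk i+1 overlaps with computing on chunk i. The names `prefetching_stream`, `load_chunk`, and `depth` are made up for illustration.

```python
import queue
import threading
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def prefetching_stream(
    load_chunk: Callable[[int], T],
    chunk_ids: Sequence[int],
    depth: int = 2,
) -> Iterator[T]:
    """Yield load_chunk(i) for each i in order while a background thread
    loads up to `depth` chunks ahead (hypothetical helper)."""
    buf: "queue.Queue[tuple[str, object]]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for cid in chunk_ids:
                # put() blocks once `depth` chunks are waiting, bounding memory use.
                buf.put(("item", load_chunk(cid)))
        except BaseException as exc:
            # Forward loader errors to the consumer instead of dying silently.
            buf.put(("error", exc))
            return
        buf.put(("done", None))

    # Daemon thread so an abandoned stream does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        kind, payload = buf.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload  # type: ignore[misc]
        yield payload  # type: ignore[misc]


def fake_load(i: int) -> list:
    time.sleep(0.01)  # stand-in for disk or host-to-device transfer latency
    return [i] * 4


if __name__ == "__main__":
    total = 0
    for chunk in prefetching_stream(fake_load, range(8)):
        total += sum(chunk)  # stand-in for a training step on this chunk
    print(total)  # 4 * (0 + 1 + ... + 7) = 112
```

With Python threads, the overlap only helps when the loader releases the GIL (file or network I/O, or native copies). The bounded queue is the design choice that keeps memory use at roughly `depth` chunks instead of requiring the whole dataset to be resident.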

