> Missing @cython.cdivision(True) inserts a zero-division check before every floating-point divide in the inner loop. Millions of branches that are never taken.
I thought never taken branches were essentially free. Does this mean something in the loop is messing with the branch predictor?
They're cheap but not free, especially at the front end of the CPU, where they're simply more instructions to churn through. What the branch predictor buys you is that branches, which would normally cause a pipeline bubble, execute like straight-line code when predicted correctly. It's a bit like a tracing JIT. But you still pay for the extra instructions that compute the branch predicate.
Worse, IMO, is the never-taken branch taking up space in the branch prediction buffers, which causes worse predictions elsewhere (when this branch's IP collides with a legitimate one). Unless I've missed a subtlety and never-taken branches don't get assigned any resources until they're first taken (which would be pretty smart, actually).
From when I was optimizing one or two things with Cython years ago, it wasn't per se the branch cost that hurt: it was that the branch impeded the compiler from applying various loop optimisations, potentially being the thing that stopped it going all the way to auto-vectorisation.
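A C sketch of why the check gets in the vectoriser's way (assumed function names, per-element divisors so the check can't be hoisted). The early exit makes the loop's trip count data-dependent, which typically blocks auto-vectorisation; without it, the loop is a plain countable loop the vectoriser can handle:

```c
#include <stddef.h>

/* Check inside the loop: the early return makes the trip count depend on
   the data, which usually prevents the compiler from vectorising. */
int div_checked(const double *x, const double *y, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (y[i] == 0.0) return -1;   /* data-dependent early exit */
        out[i] = x[i] / y[i];
    }
    return 0;
}

/* No check: a straight countable loop, a good vectorisation candidate.
   This is the shape cdivision(True) leaves behind. */
int div_unchecked(const double *x, const double *y, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] / y[i];
    return 0;
}
```

(In Cython's actual generated code the taken path raises a Python exception, a side effect the compiler can't reorder or drop, so it can't simply hoist or strip the check itself.)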