rep movsb/rep movsd work well for moving data. However, you obviously can't use that approach to search for a 0. That's why the code is optimized the way it is. My point is that using rep scasb is suboptimal.
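To make it concrete, here's a rough C sketch of the kind of zero search I mean as the alternative to rep scasb: align the pointer, then test four bytes per iteration with a bit trick. This is not the routine from the thread; the name strlen_wordwise is made up for illustration. It also assumes the aligned 32-bit load may read up to three bytes past the terminator. An aligned load can't cross a page boundary, so that is safe in practice on x86, but ISO C does not strictly guarantee it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a sketch of a word-at-a-time search for the 0 byte. */
size_t strlen_wordwise(const char *s)
{
    const char *p = s;

    /* Step one byte at a time until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Test four bytes per iteration. */
    for (;;) {
        uint32_t w;
        /* Aligned 32-bit load; may read past the terminator (assumption above). */
        memcpy(&w, p, sizeof w);
        /* Nonzero exactly when at least one byte of w is zero. */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += 4;
    }

    /* The zero is somewhere in this word; find its exact position. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Calling strlen_wordwise(buf) on a NUL-terminated buffer should return the same value as strlen(buf). The point is only that the inner loop handles four bytes per compare instead of one.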
I don't know what you mean by "lower cost to invoke the routine", and the cache/BTB entries used would be negligible for a small routine like this.
You seem kinda angry and bitter whenever you reply to me :/ Chill out eh.