I will shed a small tear for the passing of AArch32 as it was the first processor architecture I really enjoyed programming.
I wrote hundreds of thousands of lines of assembler for it, and what a fun architecture it was to program for. Enough registers that you rarely needed extra storage in a function, conditional instructions to lose those branches, and three register operands per instruction meant there was lots of opportunity for optimisation. Add the (not very RISC but extremely useful) load and store multiple instructions, and it was a joy to work with.
AArch64 is quite nice too, but they took some of the fun out of the instruction set: conditional execution and load and store multiple. They did that for good reason though, in order to make faster superscalar processors, so I don't blame them!
Also fun bits for optimization: the barrel shifter, which can be applied to the second operand of most data-processing instructions, and the possibility to make almost any instruction conditional on the flags.
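For anyone who never wrote AArch32 by hand, here is a rough Python model of those two features. It is only a sketch of the semantics: the function names (`add_shifted`, `cmp_flags`, `abs_branchless`) are made up for illustration, only a handful of condition codes are modelled, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left, as applied by the barrel shifter."""
    return (value << amount) & MASK32

def lsr(value, amount):
    """Logical shift right."""
    return (value & MASK32) >> amount

def asr(value, amount):
    """Arithmetic shift right: the sign bit is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> amount) & MASK32

def ror(value, amount):
    """Rotate right within 32 bits."""
    value &= MASK32
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def add_shifted(rn, rm, shift, amount):
    """Models ADD Rd, Rn, Rm, <shift> #amount: shift and add in one instruction."""
    return (rn + shift(rm, amount)) & MASK32

def cmp_flags(a, b):
    """Flags after CMP a, b (computes a - b and discards the result)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": bool(result >> 31),
        "Z": result == 0,
        "C": a >= b,  # carry set means "no borrow"
        "V": bool(((a ^ b) & (a ^ result)) >> 31),
    }

# A few of the condition codes, expressed as predicates on the flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "AL": lambda f: True,
}

def abs_branchless(r0):
    """Models CMP r0, #0 followed by RSBLT r0, r0, #0 (no branch needed)."""
    flags = cmp_flags(r0, 0)
    if CONDITIONS["LT"](flags):  # on Arm this test is part of the RSB encoding
        r0 = (0 - r0) & MASK32
    return r0
```

So `add_shifted(x, x, lsl, 2)` is the classic multiply by 5 in a single ADD, and `abs_branchless` is an absolute value in two instructions with no branch.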
...At some point I started an x86-on-Arm emulator, and managed to run x86 instructions in about 5-7 Arm instructions each, without JIT, including reading the next instruction and jumping to it, all thanks to the powerful Arm instruction set.
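To show what "reading next instruction and jumping to it" means structurally, here is a tiny table-dispatch interpreter in Python. This is not the emulator described above, just a sketch under assumptions: it handles only four real single-byte 32-bit x86 opcodes (0x40+r INC, 0x48+r DEC, 0x90 NOP, 0xF4 HLT), ignores the flags entirely, and the names `TABLE` and `run` are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def op_inc(cpu, opcode):
    """INC r32: the register number is in the low 3 bits of the opcode."""
    r = opcode & 7
    cpu["regs"][r] = (cpu["regs"][r] + 1) & MASK32

def op_dec(cpu, opcode):
    """DEC r32: same register encoding as INC."""
    r = opcode & 7
    cpu["regs"][r] = (cpu["regs"][r] - 1) & MASK32

def op_nop(cpu, opcode):
    """NOP: nothing to do."""

def op_hlt(cpu, opcode):
    """HLT: stop the interpreter loop."""
    cpu["halted"] = True

def op_unimplemented(cpu, opcode):
    raise NotImplementedError(f"opcode {opcode:#04x}")

# One handler per opcode byte; the opcode indexes the table directly.
TABLE = [op_unimplemented] * 256
for r in range(8):
    TABLE[0x40 + r] = op_inc
    TABLE[0x48 + r] = op_dec
TABLE[0x90] = op_nop
TABLE[0xF4] = op_hlt

def run(code):
    """Fetch a byte, advance pc, jump through the table; repeat until HLT."""
    cpu = {"regs": [0] * 8, "pc": 0, "halted": False}
    while not cpu["halted"]:
        opcode = code[cpu["pc"]]   # fetch
        cpu["pc"] += 1
        TABLE[opcode](cpu, opcode)  # dispatch
    return cpu
```

For example, `run(bytes([0x40, 0x40, 0x41, 0x48, 0x90, 0xF4]))` does INC EAX twice, INC ECX, DEC EAX, NOP, HLT. On Arm the fetch and the table jump can each be very few instructions, which is presumably where the small per-instruction count comes from.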
I was going to make a snarky comment about how all CPU architectures die, but it's not so easy to find a major one that isn't still used somewhere by someone, either in its original form or as a distant derivation of it. Whether it's the Zilog Z80 or the Motorola 68000 series, there's probably someone still actively using them embedded in cheap devices, industrial computing, automotive parts, military systems, or rad-hardened aerospace.
Fun story: pachinko, a Japanese gambling machine similar to a slot machine, is still required to use a Z80 for calculating the chance of winning, so that regulators can check the calculation logic.
I've never played them, but I've heard it works something like this: launching the metal ball is completely physical, but if a ball falls into a specific pocket (maybe at a specific time or in a specific order?), the computer determines whether it is a win or not. If it decides it's a win, a special mode starts in which you can possibly earn more metal balls.
At least in the M0 range, A64 won't be competitive on a gate count basis. M0s can be ridiculously tiny, down to ~12k gates, which is how they were able to take a huge chunk out of the market that had previously been 8/16 bit processors like 8051s and the like.
Power consumption is a hard constraint. The benefits of A64 have to outweigh the cost in power consumption, and the benefits just aren't there. Coin cells, small solar collectors and supercaps are among the extremely limited power sources for many Cortex M applications.
Many reasons. Cost is the biggest (64-bit dies would cost way more than 32-bit ones), and then power consumption (having a complex pipeline with 64-bit registers isn't great when you have to run on a coin cell for a year).
Similar reasons to why 8 and 16 bit micros stuck around for so long in low cost and low power devices before Cortex M0+ became cheap and frugal enough.
My current Cortex-M4 project is about 40kB of code, and it runs at 48MHz. If I needed more performance, I would just get one with a higher clock rate. 64-bit has zero advantages in this context.
Also, worth mentioning that our safety-critical auditor recommends staying on 90nm or larger chips. That covers many of the Cortex chips, and it's the same process node as the original iPhone.
> not very RISC but extremely useful.. load and store multiple
Who was it that said the R in RISC was always hogwash as nomenclature? A more apt name would be “load-store architecture”.
One random thing I remember about LDM/STM: on the earliest rev silicon of the Motorola (now Freescale) ARM-based DragonBall (then better named the MX line), the ARM9 core had a bug where LDM/STM would not work with the cache enabled. That was of course horrible, so we hacked gcc to not emit these instructions as a temporary workaround.