> "sharing data" is quite illusory. Even if two threads have a reference to the same object, the CPU deals with it internally by making several copies, asynchronous message passing and locking,
No. Two threads on the same CPU core really do access the same data(1) without delay, and no locking happens unless the programmer wrote some locking code.
Synchronizing different cores or processors is another topic, but that too typically depends on the software.
Most of the time the techniques the CPU implements, such as reordering and out-of-order execution, significantly speed up execution. And it's mostly software design that initiates the slowdowns, not the CPU.
Even on a single core, there are several copies in different cache layers, and synchronizing them is done by sending asynchronous messages. Sure, in the one particular edge case where the threads share a core, you're right, but this is not a typical scenario for multi-threaded applications. Most of the time, for high multi-threaded performance you want exactly the opposite: one thread per core, with threads pinned to cores. And if you don't do anything, you can never be sure whether your threads run on the same core or not, so you should assume the worst.
> And it's mostly software design that initiates the slowdowns, not the CPU.
This is quite a vague statement and I'm not sure what you really meant here. Software written against a simplified abstraction model (e.g. flat memory with stuff shared between threads, ordered sequential execution) that differs greatly from how the CPU really works (hierarchical memory, out-of-order execution, implicit parallelism, etc.) is very likely to cause "magic" slowdowns. See e.g. false sharing.
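To make the false sharing point concrete, here is a minimal sketch in Rust. The names (`Packed`, `Padded`, `CacheLine`, `hammer`) are made up for illustration, and the 64-byte figure is an assumption: it is a common cache-line size, not a universal one. Each thread touches only its own counter, so the program's logic shares nothing. In the packed layout the two counters will most likely sit on the same cache line anyway.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Two counters laid out back to back (16 bytes total on typical 64-bit
/// targets), so they will most likely share one cache line.
#[derive(Default)]
struct Packed {
    a: AtomicU64,
    b: AtomicU64,
}

/// Wrapper that gives its counter a 64-byte-aligned, 64-byte-sized slot.
/// ASSUMPTION: 64 bytes matches the cache-line size of the target CPU.
#[derive(Default)]
#[repr(align(64))]
struct CacheLine(AtomicU64);

/// Two counters that cannot share a 64-byte line.
#[derive(Default)]
struct Padded {
    a: CacheLine,
    b: CacheLine,
}

/// Runs two threads, each incrementing only its own counter, and returns
/// how long both took to finish.
fn hammer(a: &AtomicU64, b: &AtomicU64, iterations: u64) -> Duration {
    let start = Instant::now();
    thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..iterations {
                a.fetch_add(1, Ordering::Relaxed);
            }
        });
        s.spawn(|| {
            for _ in 0..iterations {
                b.fetch_add(1, Ordering::Relaxed);
            }
        });
    });
    start.elapsed()
}
```

Comparing `hammer(&packed.a, &packed.b, n)` against `hammer(&padded.a.0, &padded.b.0, n)` for a large `n` will often show the packed layout running slower on a multi-core machine. The size of the gap depends on the hardware and on where the scheduler places the two threads, so treat any measurement as indicative only.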
Also, algorithms designed around the concept of shared mutability do not scale. Sure, you may hide some of the problems with reordering, out-of-order execution, etc. To some degree it will help, but not when you go to the scale of several thousand cores in a geographically distributed system.
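As a rough illustration of the alternative to shared mutability, here is a sketch in Rust where each worker gets a private slice and a private accumulator, and the partial results are merged once at the end. `parallel_sum` and `default_workers` are hypothetical names chosen for this example. The sketch starts one worker per available core but does not pin threads, since the Rust standard library has no affinity API.

```rust
use std::thread;

/// Sums `data` without any shared mutable state: every worker reads its own
/// chunk and accumulates into its own local value.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk; `chunks`
    // panics on a chunk size of 0, hence the final `max(1)`.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // The only cross-thread communication: one result per worker.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

/// Number of workers to start: one per core the OS reports as available,
/// falling back to 1 if that cannot be determined.
fn default_workers() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

A call such as `parallel_sum(&data, default_workers())` then touches shared memory only for reading the input and for handing back one number per worker.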
That's also an effect of badly written software, not something that is constantly present in CPU execution. You based the claim I replied to on "sharing is illusory", "if two threads" and "the CPU deals with it", as if it necessarily happens in the CPU all the time, as soon as threads exist and access the same data.
> when you go to the scale of several thousand cores in a geographically distributed system.
There you are not describing "a CPU" (as in, the thing that's in the CPU slot of the motherboard) which is all I discussed, and I'm not interested in changing the topic.
----
1) But there is also reordering https://en.wikipedia.org/wiki/Memory_barrier and out-of-order execution https://en.wikipedia.org/wiki/Out-of-order_execution
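A small sketch of how software opts in to ordering where reordering would otherwise matter, written in Rust. `publish_and_read` is a hypothetical name for this example. No lock is taken. The writer publishes a value with a `Release` store on a flag, and the reader spins until an `Acquire` load sees the flag. That pairing guarantees the reader also sees the earlier write to `data`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// One thread writes `value` and raises a flag; another waits for the flag
/// and then reads the value. Returns what the reader observed.
fn publish_and_read(value: u64) -> u64 {
    let data = AtomicU64::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            // Plain (relaxed) write of the payload.
            data.store(value, Ordering::Relaxed);
            // Release: the payload write above may not be reordered after this.
            ready.store(true, Ordering::Release);
        });
        let reader = s.spawn(|| {
            // Acquire: once `true` is seen, the payload write is visible too.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            data.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}
```

With the `Release`/`Acquire` pair in place, `publish_and_read(42)` returns `42`. If both flag operations were `Relaxed`, the language would no longer guarantee that, even if many machines would still happen to produce it.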