Brilliant approach, really. Never occurred to me to try something like this!
Are you affected? Very likely. What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent. Should you do it? Probably not unless you're particularly vulnerable (journalist, human rights activist, etc.)
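For concreteness, a minimal sketch of what "disabling turbo boost" looks like on Linux. The paths assume the intel_pstate driver; they differ under acpi-cpufreq or on AMD systems, so treat this as illustrative, not authoritative:

```shell
# Sketch, assuming Linux with the intel_pstate driver (paths vary by driver/vendor).
# Disable turbo boost (1 = disabled):
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# On acpi-cpufreq systems (including many AMD boxes) the equivalent knob is usually:
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
```

Both settings revert on reboot unless persisted via a boot script or tuning daemon.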
One thing I found interesting, and that may get changed later (so I'm documenting it here), is that in their FAQ they say:
> Why did Intel ask for a long embargo, considering they are not deploying patches?
>
> Ask Intel.
So Intel did ask for a long embargo, then apparently did nothing about it. My guess is they investigated "can we actually mitigate this thing with a microcode update?" and arrived at the conclusion after actually trying - or possibly after external influences were exerted (you be the judge) - that no, there's not much you can really do about this one.
Later in the document another FAQ says:
> [...] Both Cloudflare and Microsoft deployed the mitigation suggested by De Feo et al. (who, while our paper was under the long Intel embargo, independently re-discovered how to exploit anomalous 0s in SIKE for power side channels). [...]
Which is again telling us that there indeed WAS a long embargo placed on this research by Intel.
Only mentioning this here just in case the PR spin doctors threaten the researchers into removing mention of Intel on this one. Which honestly I hope doesn't happen because my interpretation is that Intel asked for that long embargo so they could investigate really fixing the problem (state agencies have more methods at their disposal and wouldn't need much time to exert influence over Intel if they decided to). Which speaks well of them IMO. But then again, not everybody's going to come to that same conclusion which is why I'm slightly concerned those facts may get memory-holed.
> … again telling us that there indeed WAS a long embargo placed on this research by Intel.
These are worded as if this wasn’t clear? No guesswork, the article states it plainly:
> "We disclosed our findings, together with proof-of-concept code, to Intel, Cloudflare and Microsoft in Q3 2021 and to AMD in Q1 2022. Intel originally requested our findings be held under embargo until May 10, 2022. Later, Intel requested a significant extension of that embargo, and we coordinated with them on publicly disclosing our findings on June 14, 2022."
> Only mentioning this here just in case the PR spin doctors threaten the researchers into removing mention of Intel on this one. Which honestly I hope doesn't happen because my interpretation is that Intel asked for that long embargo...
Key part of that closing paragraph:
> ... my interpretation is that Intel asked for that long embargo ... and ... not everybody's going to come to that same conclusion...
If you're mentioning it here to mitigate spin doctoring, the important thing to record, in case Intel made them remove it, would be the paragraph I cited above, which is explicitly not open to interpretation.
>What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent.
Eh? Doesn't this require an attacker to actually be able to talk to the targeted system? You mention classes of actor ("journalist, human rights activist, etc") that aren't online public service providers or at least perhaps shouldn't be. Private devices connecting out to the greater net are effectively universally behind firewalls and often (CG)NAT as well and not unilaterally addressable the other direction. For private services, access should be exclusively via VPN (direct WG or mesh like nebula or something) with PSK alongside public keys and perhaps something like port knocking in front as well.
Same as with other hot class side channel attacks, the attacker does in fact need to be able to get the targeted system to run something somehow. So the most basic thing to do about it is simply not allow attackers to do that. The players for whom this is fundamentally impossible are general public online service providers, but that isn't the class most people or businesses fall into. If attackers are getting sensitive servers to respond to arbitrary code then they have already gotten access credentials of some kind.
> the attacker does in fact need to be able to get the targeted system to run something somehow
Unfortunately that includes Javascript, and now that affects virtually everybody. Speculation: if you can find a Javascript call that uses protected keys, you might be able to extract secrets from that route.
OK, but can you walk me through the threat model here? This isn't a rhetorical question; it's easy to see how servers in general, and shared hosting, colocated VMs, etc. in particular, might theoretically face a threat here. I'm just trying to get a better understanding of how GP would be correct for end-user devices. The individual in question on the smartphone or computer specifically chooses to initiate a connection to a desired web server, and you're imagining said server was hacked or otherwise compromised/untrustworthy and begins running Hertzbleed via JS, and that the JS has the precision in practice for it to work. So vs other RCEs and such, what is the path here? What short secrets (AFAICT these attacks aren't about pulling gigabytes of RAM but a few bits of key material) is it going after that, once obtained, represent a serious compromise by themselves or allow further escalation? Nothing done on the user device is going to affect the network hardware they're going through, and this attack is about reading, not writing, so what is the attacker getting that will then let them break out of the web browser/OS sandboxes and start pulling more in the other direction?
I can see that if other persistent access credentials were sitting unencrypted in memory that could be an issue if the attacker can use those to hop into other services the user has access to, but that's manageable via care with device usage, proper MFA etc right? Or I can see how true external IP address might be a dangerous short secret for someone trying to hide (vs merely secure) via VPN. But I think in those cases externalizing the VPN to the network level is a good idea anyway since being 100% sure of zero leaks in a modern OS/application stack is challenging, and then the user device doesn't have to know anything about the real IP at all. JS also already allows a huge amount of fingerprinting, so if someone is allowing JS to run and is deadly worried about privacy they must be thinking about mitigations there already.
Again, not at all denying that incredibly tricky stuff can be done using little leaks like this to escalate in surprising ways. But for a dedicated web browsing device with only outgoing and session access to WAN, likely through a VPN but not necessarily on-device, what new threat scenario is this such that completely disabling dynamic frequency is the only response? Although I suppose for a dedicated web browsing device using tor browser or the like, disabling that might not actually be a big deal anyway.
The exploit comes from a hacked server, a bad ad, social engineering, etc.
As for the attack, imagine a browser that encrypts local storage with a system key. If I understand correctly, by storing different patterns of bits, Hertzbleed might be able to extract the system key from the timings to save data.
This might sound very theoretical, but modern OS'es (and password managers) have lots of keys like that. There's a good chance one or more of them are reachable from Javascript. And that's just what popped in my mind in two minutes, I'm sure red teams will have better ideas.
The scary part is that this is another attack in the same ugly class as Meltdown and Spectre, where the antidote is nearly as damaging as the poison.
> imagine a browser that encrypts local storage with a system key. If I understand correctly, by storing different patterns of bits, Hertzbleed might be able to extract the system key from the timings to save data.
Don't you need precise clocks for this in JS? The ones that were disabled in browsers after Meltdown/Spectre.
>by storing different patterns of bits, Hertzbleed might be able to extract the system key from the timings to save data.
Isn't that practically impossible given the amount of random software running on a regular computer nowadays? This isn't based on HDD write speeds but on processor timings, which are affected by every single cat video you are watching.
Many password managers are using end-to-end encryption and have browser extensions written in js. It would be bad if hertzbleed can be used to extract keys used by those password manager extensions.
On most shared infrastructure this would be even harder to exploit. In an ideal world you are sharing the infrastructure to maximize CPU and other resource utilization. When running workloads on VMWare, for example, the best practice is to disable deep C-states and not allow the CPU to dynamically scale down. This prevents all kinds of clock drift issues in guest VMs that expect a CPU cycle to be relatively constant.
This is not about power saving; it is about dynamically balancing the available thermal dissipation across cores, at the granularity of individual instructions. When some cores are relatively idle (e.g. waiting on a cache fill), it uses the available amps to run other cores at "turbo boost" frequencies.
For some single threaded code we force other cores to idle so that one thread can get maximum cache and maximum frequency. A similar approach could be used to minimise the power analysis signal.
untrustworthy server that runs malicious JS: potentially half the links one clicks on HN or Reddit
short secrets which aren't mitigated by MFA: session cookies, TLS client certificates, secret keys of e2ee IM apps (eg. Element), Zerobin URLs, ... Maybe even TLS session keys?
Just like Spectre/Meltdown, it assumes you have an idea where to extract the secrets from, and more importantly what they secure. A string of random bytes is worth nothing to someone who doesn't know what they're the key to.
...and someone who is running JS is probably also running tons of other JS, adding even more noise to what already exists.
When Spectre came out, it turned out to be very straightforward to implement the relevant attacks in JS. A script can use workers with shared memory access to monitor execution and get a timer with less than 100 ns resolution. As a result, shared memory was disabled. Later, under the presumption that the relevant issues were mitigated, shared memory was re-enabled.
So I wonder if shared memory will be disabled again, as it may allow monitoring frequency changes.
My understanding was that the timer precision was limited, and that full precision was never re-enabled.
From MDN.
"It's important to keep in mind that to mitigate potential security threats such as Spectre, browsers typically round the returned value by some amount in order to be less predictable. This inherently introduces a degree of inaccuracy by limiting the resolution or precision of the timer. For example, Firefox rounds the returned time to 1 millisecond increments."
Same as any security: making an attack more expensive to mount means people are less likely to try it. If high-resolution timers allow you to mount an attack in the three minutes the target takes to read a listicle page, then rounded timers require the target to keep the page open in the foreground for 3000 minutes, or 50 hours. That's much more difficult to do.
Isn't rounding pretty much the same thing as throwing away some precision in this case? So if I drop some (enough) bits and add .5 then no amount of averaging is going to recover the lost precision. Or maybe I misunderstand?
Example: You need to know whether an operation takes 30 or 31 milliseconds, but your timer is rounded to the nearest 100 ms, so you just get 0 or 100.
If you repeat the operation 100 times and time how long that takes, you should either get 3000 or 3100 milliseconds.
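The arithmetic above can be sketched with plain shell integer math (a toy model of the rounded timer, not an attack):

```shell
# Toy model: a timer that rounds to the nearest 100 ms hides whether one
# operation takes 30 ms or 31 ms, but not whether 100 of them take 3000 or 3100 ms.
round100() { echo $(( (($1 + 50) / 100) * 100 )); }

round100 30     # -> 0: a single 30 ms operation reads as 0
round100 31     # -> 0: indistinguishable from 30 ms
round100 3000   # -> 3000: 100 repetitions of the 30 ms operation
round100 3100   # -> 3100: 100 repetitions of the 31 ms operation survive rounding
```

So repetition pushes the signal above the rounding granularity, at the cost of a proportionally longer measurement window.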
...and a dramatically increased likelihood of something unrelated going on in the system ruining your results. It's easy to dismiss mitigations like this as "not a solution", but at least those that add uncertainty to side-channel attacks seem to complement each other quite nicely. Uncertainties don't add, they multiply.
If it rounds up and down with the same probability, the error gets canceled out in the long run.
I suppose this does mitigate the risk a little, but it risks breaking other things. For example, you no longer have the simple guarantee that:
a < b => round(a) < round(b)
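A quick illustration of the lost invariant, using the same nearest-100 rounding as the timer discussion above (the numbers are arbitrary, chosen only to show the effect):

```shell
# a < b, and yet round(a) == round(b): strict ordering is lost under rounding.
round100() { echo $(( (($1 + 50) / 100) * 100 )); }

round100 101   # -> 100
round100 149   # -> 100  (101 < 149, but both round to 100)
```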
Never. Nothing ever recovered by a rootkit. Nothing seen for sale. Even the original PoCs were horribly biased and didn't work. Simply moving the targeted buffer around in memory would have made it virtually impossible to exfiltrate.
I don't have a problem running JS, but it is getting to the point where, if you can't prove your JS is worth running, perhaps the browser should refuse to.
Why have we gone from 'oh, only run programs you trust, not anything from the web' to 'oh, just run every bit of bloatware out there any time you move around the web'?
99% of JS should not exist, 0.9% of it does anything useful, and the other 0.1% is straight up malicious...
I do this with uBlock Origin. It is easy and I'd say less impactful than blocking third party domain (de-CNAMEd) CDNs by default (that I also do). Some sites just don't show anything without javascript, though, so it works best if you are often willing to just ignore those sites (not common but not all that uncommon either).
It gives certain customers time to mitigate - and certain others time to exploit - the issue before the embargo is lifted. It maintains the customer/vendor relationship. It also allows Intel, even if current products are still impacted, a head start on R&D for future products before disclosure. Remember Intel also has their own Linux, has their own compiler suite, and directly supports development of Linux and Windows (and probably macOS for Apple) for their industry partners. So they could have been working on figuring out software mitigations during that time which they can now share.
To demonstrate the effectiveness of the attack they deliberately chose SIKE as a protocol that was thought to be very resilient against side channel shenanigans.
If anything, real-world contexts are likely to be lower hanging fruit.
> What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent. Should you do it? Probably not unless you're particularly vulnerable (journalist, human rights activist, etc.)
The most likely to be targeted (and probably easiest to target) systems are probably cloud hosts. This might be an argument for disabling frequency scaling and fixing clock speed on cloud VM hosts or bare metal servers.
Less of a performance hit there too since those tend to run at a sustained max anyway, and turbo boost can be problematic in those environments due to heat. It can reduce overall throughput in a sustained load scenario.
There's so much variation (read, noise) intrinsic to response times for network requests to be satisfied on most cloud hosts anyway that I'm very skeptical about any practical attacks being made in the short term.
"Our attack is practical; an unoptimized version recovers the full key from a CIRCL server
in 36 hours and from a PQCrypto-SIDH server in 89 hours ... The target server and the attacker are both
connected to the same network, and we measure an average round-trip time of 688 µs between the two machines."
Note that:
• The server in this case does absolutely nothing except use the cryptographic library. Would it work on a real server that actually does something useful with the requests? We don't know, the paper doesn't try that.
• We aren't told if it works if other people are using the server simultaneously.
• They show the attack against obscure post-quantum algorithms nobody actually uses (as far as I know). Why not RSA or ECDSA or something more standard? Presumably they don't have a technique that works on those, as otherwise it'd have been a big upgrade to their paper.
• What about if you aren't running your attack physically right next to your target? Is <1msec of latency what people think of when they hear "remote attack"?
I'm not hugely surprised Intel has limited themselves to issuing guidance. This paper continues a trend that's emerged since the first Meltdown/Spectre breaks in 2018 in which attacks become ever more convoluted, theoretical and unlikely to work outside of a lab yet they're all presented as equally important by the academics who develop them. I used to follow this area of research quite closely but eventually got sick of it. Way too many papers had some bizarre caveat buried deep in the paper, e.g. eventually I noticed that a lot of attacks on Intel SGX that claimed to leak cryptographic keys turned out to be using an extremely specific version of GnuTLS. I got curious why that might be and discovered that it was absolutely ancient, dating from many years before the papers were written. They were using it because that version had no hardening against side channel attacks of any kind whatsoever. Was that a realistic assumption to make for these attack papers? Probably not, but to notice this sort of trick you had to read well beyond the headlines.
I also remember some years ago, Google researchers got worried people weren't taking Spectre seriously enough, so they released a demo that claimed it would show a Spectre attack in action inside the browser. I was keen to see this because so many posited attacks seemed to rely on extremely specific situations that didn't seem particularly plausible in the real world. I visited it in Chrome on macOS, i.e. one of the most predictable hardware and software environments the developers could have, and it didn't work. Checked reddit, and it was filling up with people saying it didn't work for them either.
In the ~5 years since these attacks came out and started being patched in software and hardware, have there been any real world attackers found using them? Maybe but I don't remember hearing about any. State sponsored attackers seem to be sticking with more conventional techniques, which probably says a lot.
The obvious target for these is the cloud, especially second-tier cloud vendors more likely to be using "stock" KVM/XEN and therefore easier to target. The obvious target within these clouds would be cryptocurrency nodes.
I feel like if this had been exploited in the wild you would have already heard stories of people using it to zark someone's bitcoin off a Digital Ocean or Vultr node.
I don't think there's a way to apply this to cryptocurrency nodes, because they won't sign messages given to them by third parties over and over with their private keys (usually at least).
Novel statistical techniques are a different concern from practical attacks. (And I appreciate the relativity in what is meant by 'practical' -- nation-state resources are in a distinct category of capability.)
But I would like to see some statistical expectations on 'how long you'd have to wait on an average open network for each key bit to reach 95% confidence'.
> Hertzbleed is a real, and practical, threat to the security of cryptographic software. We have demonstrated how a clever attacker can use a novel chosen-ciphertext attack against SIKE to perform full key extraction via remote timing, despite SIKE being implemented as “constant time”.
Please. If you actually read the paper you'll come to learn that "practical" here means "we've conclusively shown under strict laboratory conditions that this works".
> since those tend to run at a sustained max anyway
Really? I've never been on the cloud-provider side of cloud computing, but every application I've developed that ran on the cloud was rarely if ever running at a sustained maximum of the resources allocated to it. We always wanted a buffer to be able to absorb load spikes and users performing unusually expensive actions.
Dynamic scaling involves bringing the CPU frequency down but not off - you can get almost as much power savings for some loads by using the old HLT instruction, so your CPU/core is either at full speed or basically off.
I am on the cloud provider side; we would sometimes limit the upper and lower range of frequency but completely disabling scaling would be very unusual.
Same. Our compute hosts are generally not using 100% of their cores at all time.
There are computes that are not full, computes that run rather diverse tenants, and even the fully utilized computes responsible for CPU-optimized VMs have enough variance in their workload for frequency scaling to occur.
I think this means you were paying for the over-provisioning i.e. paying for a full CPU or baremetal server?
"The Cloud" is all about vCPU - "2 vCPUs" feels somewhat standard for a base-tier VPS... and 2 vCPUs means "2 virtual CPUs" or rather "roughly equivalent to 2 CPU cores" I think. I understand that jargon to mean they are always cramming 11 x 2vCPU clients onto 20 physical cores.
Thanks for the link, that's great to know about AWS.
I don't think all other VPS providers are that good about things - googling around for some other definitions of vCPU (in VPS context) I see a lot of examples of 16 thread server CPUs handling "128 vCPUs".
No, 2 vCPUs is 2 logical threads, which is equivalent to a single physical core on x86. So yeah, they are cramming 11 x 2 vCPUs onto 20 physical cores. In fact, it's more like 20 x 2 vCPUs.
That only means you were making space for others to phase in and use the remaining resources that you spared. You're not the one deciding which process sits on which resource, after all.
The only reason why resources might be left unused are usage spikes that all customers share.
Absolutely, my observation is in a way that "most workloads don't seem (to me) to be like that".
The number of servers serving interactive queries (frontends, rest api servers, databases, etc) seems (to me) to greatly outnumber the number of batch jobs, and I've always seen those intentionally "over" provisioning CPU because otherwise you get latency issues if load increases at all.
I don't actually know that cloud providers don't either have some clever way around this (e.g. spending spare CPU cycles on some other form of work), or that it isn't the typical usage pattern, but I strongly suspect it.
> What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent.
A server running a multithreaded load is probably disabling turbo boost anyway because of the thermal load already on the package. Instead, you should disable SpeedStep and set your systems to maximum performance. However, this will increase the heat and your power bill considerably.
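On Linux, "set to maximum performance" usually means pinning the performance governor. A minimal sketch, assuming the cpupower utility is installed (package name varies by distro):

```shell
# Pin all cores to the performance governor (max frequency, no downscaling):
sudo cpupower frequency-set -g performance

# Or, without cpupower, per-core via sysfs:
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$g"
done
```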
I was thinking about busy servers running mixed workloads. I would think that, with the CPU running a bunch of workloads on different cores, context switching, etc, it wouldn't be a practical attack. Maybe that's incorrect.
Mostly idle servers are a different story, obviously.
Sometimes response-critical VMs are pinned to cores at the hypervisor level, and Intel's chips have supported independent frequency scaling of individual cores for some time.
In that scenario, mixed loads won't help. You'll have at least one pinned core, and it can scale relative to the VM's load (considering you also pin hypervisor cores, etc). So it's possible to hit that pinned core and execute the same timing attack.
I know it's a niche scenario, but it's not an impossible or implausible one. Another possibility is servers which fill critical roles but are idle or in a constant low-load state to keep headroom for high loads. Again, attacking these servers is plausible. Considering these servers are not open to the internet most of the time, we're bordering on corporate espionage, but that's not the subject here.
Unless you do something special, a lot of interrupts are handled by CPU 0. There are techniques like Receive-Side Scaling to balance this load across the cores, but that's specific to NICs.
Is that because CPU 0 tends to be scheduled by default? Or is that because the CPU usually uses core CPU 0, and Linux schedules to different cores?
Redhat Linux docs[1]:
The /proc/interrupts file lists the number of interrupts per CPU per I/O device. It displays the IRQ number, the number of that interrupt handled by each CPU core, the interrupt type, and a comma-delimited list of drivers that are registered to receive that interrupt.
The default value for smp_affinity is f, meaning that the IRQ can be serviced on any of the CPUs in the system. To view it, run cat /proc/irq/32/smp_affinity (using interrupt 32 as an example). Setting this value to 1, as in echo 1 >/proc/irq/32/smp_affinity, means that only CPU 0 can service interrupt 32.
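To make the bitmask semantics concrete, a sketch using the quoted docs' example IRQ number (32 is illustrative; it may not exist on a given machine, and writing the mask needs root):

```shell
# smp_affinity is a hex bitmask of CPUs allowed to service the IRQ:
#   f = 1111b -> CPUs 0-3 (the default "any CPU" value on a 4-CPU box)
#   1 = 0001b -> CPU 0 only
#   2 = 0010b -> CPU 1 only
cat /proc/irq/32/smp_affinity        # show the current mask
echo 2 > /proc/irq/32/smp_affinity   # steer IRQ 32 to CPU 1
```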
However, I can pin a VM to cores far away from CPU0 (e.g. socket 3, cores 20-23), hence isolating it from all the interrupt handling, and attack that VM instead, at least in theory, no?
You can tune the normal ondemand governor with a high hysteresis to keep the frequency up for a long time. The ondemand governor is already trigger-happy enough to jump to maximum frequency on a slight increase in load, so one just needs to add more delay to the calming-down step.
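A sketch of that hysteresis tuning, assuming the legacy ondemand governor under acpi-cpufreq (intel_pstate exposes different knobs); sampling_down_factor is the usual control for how reluctantly the governor scales back down:

```shell
# Select the ondemand governor on core 0 (repeat per core as needed):
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Keep the CPU at high frequency ~10x longer before it steps back down:
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
```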
A random hysteresis is bad from a system responsiveness aspect, because frequency scaling is not free (in terms of time) when observed from the CPU perspective.
I've always turned off Turbo Boost on my Intel laptops anyway, because of the heat/battery hit. If I was doing something that I really wanted the extra speed and I was plugged in, I would turn it back on for that task, but I never really felt I was missing anything by having it off.
Considering how easy it is to turn on/off (on macOS at least I used Turbo Boost Switcher which added a button in the menu bar to toggle it) I don't think you would have a noticeable performance hit by keeping it off except when you need it.
Turbo Boost is not something you can reliably kick in on any load spike, since its operation is constrained by the thread count and thermal load of the CPU. It's also affected by the instructions you're already running; the AVX family is especially power- and thermal-heavy.
You can configure your CPU to always boost if you're really paranoid. It shouldn't run much hotter since the load doesn't increase, just the frequency.
I wonder if this is such a widespread issue - frequency attacks mean the CPU must actually change speed frequently in order for any attack to occur.
For a laptop, how I thought it worked was that CPU frequency was constant for a given performance level - so the CPU should shift from low to high depending on idle state...
For a server trying to aggressively save power, frequently changing speed per operation could leak this information.
Turbo boost, I thought, was a setting over long periods of time to change power levels; if you change too frequently you don't actually save much power.
Turbo will move the processor above the rated TDP when there is thermal headroom to do so. Turning it off means you'll max out at the rated TDP.
Now, TDP used to mean the max power of the chip, but as Intel's process failures left them holding the bag with no significant performance updates to speak of, they started overclocking their chips more and more so they could claim that the new gen was faster than the last.
Try turning off Turbo Boost on a 2020 i9 Macbook Pro - you actually get a usable machine with reasonable battery life with it off, instead of the hot toaster with 2hr battery life that Intel gave you. But it'll max at something like 2.2GHz when you paid for just over 4.
Correct. This makes benchmarks, at least on thermally limited machines like laptops, very unreliable. High-quality review sites like notebookcheck spend a lot of time dealing with this by doing prolonged benchmarks and measuring thermals.
And there's an honest question to ask: how do you use your computer? If you're just browsing the web 95% of the time and occasionally opening Word/Excel, then short bursts of high power when you need it is perfect. But if you run longer tasks like many programmers or artists do, these machines simply fall down in sustained use.
This is one reason why the M1/M2 architecture has been such a revelation for professionals who primarily work on laptops. It can run full-bore for hours, because the lower-end chips (which are faster than any Intel released at the time) barely hit 10W at max load.
You can keep it up if everything can handle the current and heat.
A typical Intel chip on default behavior will go to maximum boost, limited by watts, for about half a minute. Then it will drop to a lower number of watts. Note that base clock gets ignored here; in this mode the base clock is just a minimum promise.
Many desktop motherboards easily or even automatically remove the time limit.
Disabling turbo boost/frequency boosting would actually _decrease_ power consumption, as well as performance. The idea with boosting is to allow certain cores to exceed the maximum frequency, so long as certain parameters such as package temp, core temp, and power usage are within certain thresholds. This allows workloads that don't push the entire CPU to its limits to run faster, as the few cores that are in use can run at higher frequencies and increased performance per core, at the cost of higher power usage per core and lower efficiency.
Is that all you need to do? Because many overclockers permanently disable Turbo Boost anyway so that they can run a higher clock ratio all the time (can't have turbo accidentally crashing your system once you have really overclocked it a lot). This does not, of course, disable the low-power states for idle or low load. I probably have Turbo disabled right now!
> > Why did Intel ask for a long embargo, considering they are not deploying patches?
> > Ask Intel.
Indeed, I really found this unnecessarily snarky on their part. I don’t think Intel was acting in bad faith.
In my experience, security researchers are very /particular/. They like telling everyone that no matter what you do, you are vulnerable for umpteen reasons, whether practical or not.
This paper relies on Turbo P-states, where they measure the oscillation when that is active; it is not measuring general SpeedStep (OS software controlled) as some seem to have taken away from it. Turbo state is the HWP (hardware P-state controlled) layer above SpeedStep; turning off Turbo in the BIOS still fully allows OS controlled SpeedStep P-states to function, it just disables the hardware level bursting P-states above that max listed CPU level for short periods of time. As others have noted, Turbo state can really kill a laptop battery and/or drive up the thermals pretty quick, a lot of folks disable it anyways if they've tinkered around before.
The abstract puts it as "When frequency boost is disabled, the frequency stays fixed at the base frequency during workload execution, preventing leakage via Hertzbleed. This is not a recommended mitigation strategy as it will very significantly impact performance." This is a confusingly worded way to state it, since SpeedStep will still work at the OS layer: you'll scale min to max "as usual" and just lose the temporary hardware boost above max when under stress (full load at the P0 state) - not really "fixed", as it were, in layperson's terms. That would be more akin to saying SpeedStep had to be disabled, IMHO.
does the trick. Note that this is not the same as power-saving mode in Gnome settings.
I have found that for heavy C++ compilation that lasts for many minutes the slowdown was about 20% on my ThinkPad X1 laptop. The big plus is that it made the laptop almost silent.
I think you're running into the governor mode here, which is a related but different part of the same ballpark. Modern Intel even allows a "bias hint" in addition to just a governor, where the user can help tell the power-saving features what tradeoffs they prefer; power-saving mode is an additional limitation in conjunction with SpeedStep (or Turbo) P-state use. If the laptop is almost silent (no fans), you're surely clocking it down to avoid heat/thermal buildup - this is usually used to conserve/extend battery to the max possible, at the expense of CPU clock speed.
It's the Gnome power settings dialog that changes the governor. The above command just disables turbo boost while allowing the CPU to spend 100% of its time at the base frequency.
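For reference, the knob being discussed is presumably the `intel_pstate` sysfs toggle; the exact path depends on the CPU frequency driver in use, these commands need root, and this is a sketch rather than a recommendation:

```shell
# Disable hardware turbo boost (intel_pstate driver); OS-governed
# SpeedStep P-states keep working, only the boost states above the
# base frequency are gated off.
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# On acpi-cpufreq systems (e.g. many AMD boxes) the equivalent knob is:
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost

# Sanity check: under full load the cores should now sit at the base clock.
grep "cpu MHz" /proc/cpuinfo
```

Write `0` to `no_turbo` (or `1` to `boost`) to undo; the setting does not persist across reboots.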
I use it on my laptop and run it to disable turbo boost most of the time, interestingly, for performance reasons. Turbo boost leads to very erratic behaviour on laptops when you have long-running CPU-intensive tasks (e.g. the cores run hot and it has to throttle down hard to cool them down again).
This has almost no similarities to stuxnet. A more analogous hypothetical attack to stuxnet would be if they repeatedly cycled spinning rust drive heads in a certain way to cause the motors to fail and corrupt data, all the while faking the SMART data of the drive to not report drive head parking cycles.
Your understanding of this attack and/or Stuxnet is flawed.
Also, if this counts as "cause some crazy bullshit at a very small level, targeting specific people, such that I get the outcome I want!" then so does every vulnerability.
I’m quite fatigued by the recent (?) increase in comparisons of current vulnerabilities, attacks, and adversarial capabilities to Stuxnet and can’t help but tune out when it’s invoked. Yes, the ‘96 Bulls were the best team of all time. That has no bearing on how good the Bulls are now and sure as hell shouldn’t blind you to how good other teams have gotten since…
Being unnecessarily cryptic and sounding like a crackpot while calling everyone else out for being uninformed is generally unlikely to get you support on Hacker News.
Also advised: "Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community."
Sam: (1) you can't do this; (2) you know you can't do this; (3) do you want us to ban you? because I don't want to ban you but your recent posts are way over the line; (4) I've put the rate limit back on your account; (5) please stop.
I suspect what we are seeing in the last few years is the slow death of purely symmetric multiprocessing. At the end of this I wonder if we'll see processors with one or two cores dedicated to cryptographic primitives, where the ALU has a fixed IPC, the core has a very limited number of clock rates, and the caches are sized to prevent eviction when running common cryptographic algorithms.
Unfortunately, side-channel attacks like this (or like Meltdown, Spectre, TLBleed, Foreshadow, etc...) only have negligible real-world impact. They are very interesting from a theoretical point of view, but are usually totally impractical for a plethora of reasons. Therefore, chip designers aren't really pressured into thinking about new chip designs. The sad reality is that something like Log4Shell, which is super boring from a theoretical point of view, is much more practical for attackers to exploit.
I think that this is shortsighted. It is a new area, and there is a lot of work on improving the effectiveness of these things. If somebody had told me that ROP was totally impractical for a plethora of reasons when it was first proposed, I would have believed them. Now we've got completely automated tools that generate ROP chains with hardly any access to a binary whatsoever.
These sorts of attacks will get more sophisticated.
Side channels through mis-speculation also have the fun property of being virtually undetectable, since the problematic code never actually executes. This is attractive for very powerful actors who might want to spend the extra effort on a covert attack even when simpler exploits are available.
Their impact on you or me may be negligible. If you were targeted by a determined adversary however - their impact can be total breakdown of your privacy.
If I were a journalist / human rights / opposition activist, I would assume this is being actively used by at least some adversary out there, and would gladly pay the price on perf.
I would say fortunately, but I agree. These security flaws need to be analyzed but I don't think it can compete in threat level with the usual phishing mail.
Analysis of such vectors is important, but the threat is limited. I still favor running encryption in software and find hardware support often quite dubious, because you can never be sure there, while any runtime attack can just as well be mitigated at a higher level. That doesn't mean it's more secure out of the box, but security is about trust as well.
One or two cores for crypto would likely be susceptible to the same attacks, unless you don't let any user (or kernel) programs run crypto on those cores, making them useless.
Any resource that needs to be scheduled will likely be attackable, either by timing on context switches, or by flooding the resource with users and measuring things, and so on. Likely any scheduling method for those resources can leak information.
I don't see how a fixed-frequency crypto core would be susceptible to the same attack, assuming proper constant-time crypto code.
This attack exploits the fact that cycles are not constant time: although crypto primitives are constant in terms of cycles, due to DVFS they're not really constant in terms of time.
If the crypto core doesn't have DVFS and runs constant-cycle crypto, it doesn't matter that the core is contended and that you can measure the contention. You'll be measuring how many people are using the resource, but that won't tell you anything about the secret data, just about how much data there is.
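The cycles-vs-time distinction can be caricatured in a few lines of arithmetic (illustrative numbers, not the paper's measurements): a routine that always costs the same number of cycles still has data-dependent wall time once the clock itself varies with the data's power draw.

```python
# Toy model: wall time of a constant-cycle routine under two clock speeds.
def wall_time_ns(cycles: int, freq_ghz: float) -> float:
    # 1 GHz = 1 cycle per nanosecond.
    return cycles / freq_ghz

cycles = 1_000_000  # "constant-time" crypto routine, fixed cycle count

base = wall_time_ns(cycles, 3.0)   # core pinned at its base frequency
boost = wall_time_ns(cycles, 3.9)  # data-dependent power draw let it boost

# Same cycle count, different wall time: this gap is the Hertzbleed signal.
assert boost < base
```

On a fixed-frequency core the second case never happens, which is why the argument above holds, assuming the cycle count really is data-independent.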
> fixed-frequency crypto core would be susceptible to the same attack,
I also added that there are other attacks. Once you allow multiple processes to utilize these limited crypto cores, you're gonna leak information. And fixed frequency makes many attacks easier: the attacker no longer has to work through variances in performance due to all the randomness in chips from power and caches and other timing things.
> assuming proper constant-time crypto code
Yeah, that's exactly what the SIKE authors had assumed too. Turns out that it broke.
The point is once you allow things to be scheduled, it's nearly impossible to prevent information from leaking. My task asks for some crypto to be done - if fast, there was less in front. If slow, there was more in front. "Randomize!" the geek says - this nearly never works because random assumes some distribution, and again I can now keep poking at the scheduling to find the differences in behavior by statistical sampling.
There's lots of attacks currently on existing systems exploiting this.
Leaking any information about other processes or supposedly hidden state of the system means you are leaking - and attacks always get better, not worse. The point is once you have shared, scheduled resources, others are going to get knowledge that they should not have.
The rough idea is, say some other process is repeatedly running some known code with an unknown key, and you want to get that key. By fiddling with how you schedule your requests, you can interrupt or interject his work and your work, and the timing issues due to scheduling have been shown to leak things. Say one process is dealing with web requests, signing things fairly often. An attacker on the same machine can craft web requests, learn how the shared system is responding, and glean information about the web server via timing. This type of poking has been used to leak AES keys by exploiting things thought safe until they were on shared resources.
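The scheduling leak sketched above can be caricatured with a toy FIFO model (purely illustrative; names and numbers are made up): the attacker shares a serial resource with a victim and learns the victim's queue depth from nothing but its own request latency.

```python
# Toy model of a shared, serially scheduled resource (e.g. a crypto core).
# The attacker submits one job and measures how long it takes to come back.
def attacker_latency(victim_jobs_ahead: int, service_time: float = 1.0) -> float:
    # FIFO scheduling: the attacker waits behind everything already queued,
    # plus its own service time.
    return (victim_jobs_ahead + 1) * service_time

quiet = attacker_latency(0)  # victim idle
busy = attacker_latency(5)   # victim has 5 signing jobs in flight

# The latency difference leaks exactly how much work the victim had queued.
assert busy > quiet
assert round((busy - quiet) / 1.0) == 5  # recovered queue depth
```

Real attacks have to fight noise with statistical sampling, as the comment says, but the information channel itself is this simple.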
You're making a general hand-wavy argument here, but this is a rather specific issue. "Constant time" in cryptography means that there is no information flow from secrets to timings. Keys are secret. Plaintexts are secret. Ciphertexts, and most importantly their sizes, are public.
You're basically saying that leaking public information is dangerous, which amounts to saying it should be private. In some specific cases you'd be right (I'm thinking of variable-length audio encoding, where you could recover parts of conversations or voice prints from network analysis alone), and in those cases you must hide sizes as well (basically, use constant-length audio encodings).
But in the general case, message sizes are much less important than you make them sound.
> "Constant time" in cryptography means that there is no information flow from secrets to timings
If "constant time" cryptography were achievable don't you think we'd have it and there'd be no more timing attacks breaking encryption schemes?
"Constant time" cryptography is a mathematical abstraction, a goal, like "unbreakable cipher" and "unbreakable hash" and "frictionless surface." They don't occur in practice. The article itself breaks a piece of "constant time" cryptography with a timing attack.
The problem is, as this paper demonstrates (along with many others) coding up a constant time crypto and especially making it portable over time and architectures, is nearly impossible. Caches, chip nuances, power draw mixed with power scaling, and other chip architecture complexity, contribute to attacks. Compiler changes, architecture changes (some even unpublished), architecture variety, user settings, even flaws in any part of the chain, all contribute to making holes in crypto in the real world.
This paper [1], for example, is one of many that shows the "constant time" goal is likely not possible, and is certainly not possible in portable code.
Here's [2] a paper trying to make simple AES "timing-attack resistant" - and note they did not claim they could make it "constant time", because they realize that is not possible. "Timing-attack resistant" is at least professionally defensible.
Here's [3] a paper referencing [2], trying to make systems more resistant to cross-process leaks by using Intel SGX to hide the things that are leaking.
And here [4] is the attack on Intel SGX that shows there are still exploitable leaks.
This type of chain is not unique.
If you want to read literally thousands of papers on such things use google scholar or surf the cryptology eprint archive. Both make searching on such topics pretty easy.
We could go on and on. The literature of crypto is littered with such threads - "constant time" crypto is the goal, but so is "unbreakable encryption" - both are mathematical fantasies that do not play out in practice.
If you're going that route, everything is influenced by anything, and with a sufficiently advanced sensor array you could detect a butterfly flapping its wing across the globe.
If instead we get serious for a minute, we can notice that cryptography is not magic, and neither is the way data flows from secrets to timings. Quite obviously, whether a program's timings depends on its inputs or not is a function of the hardware it runs on more than anything else.
As long as energy consumption does not meaningfully influence timings, we're actually in very good shape. Most CPUs have constant-time arithmetic (multiplication may be more problematic), and the only ways data flows from secrets to timings are branches and the cache. All we have to do is avoid secret-dependent branches and secret-dependent indices.
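A minimal sketch of the branch rule, using the classic bytestring comparison example (in real Python you'd reach for `hmac.compare_digest`; this is just to show the shape of the two styles):

```python
# Naive comparison: exits at the first mismatch, so execution time leaks
# how long the matching prefix is (a secret-dependent branch).
def naive_eq(a: bytes, b: bytes) -> bool:
    for x, y in zip(a, b):
        if x != y:          # branch condition depends on secret data
            return False
    return len(a) == len(b)

# Constant-time style: always touches every byte, accumulating differences
# with XOR/OR instead of branching on them.
def ct_eq(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):    # lengths are public; branching on them is fine
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y       # no data-dependent branch, no data-dependent index
    return diff == 0

assert ct_eq(b"secret", b"secret")
assert not ct_eq(b"secret", b"secreT")
```

Hertzbleed's point is that even `ct_eq`-style code leaks once the processed values influence power draw and therefore frequency, which is exactly the "help from the hardware" the comment asks for.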
When energy does influence timings (frequency scaling, listening at an audio feed…), we're basically screwed, because no CPU instruction is constant energy. No way we can fix this without help from the hardware.
> coding up a constant time crypto and especially making it portable over time and architectures, is nearly impossible.
Sure. I'll settle for constant time now with my hardware. And I'll ask hardware vendors to pretty please sell me hardware that makes it possible.
---
In the meantime, I'll see what this new finding actually leads to. I don't anticipate major disruption, to be honest. The attack demonstrated here required 36 hours, in the lab. That's a far cry from AES cache timing attacks, which took 65 milliseconds. I'll wait and see what actually breaks in realistic threat models.
This stuff is complicated. You can still get some timing data by trying to schedule additional cryptographic work on the same core where some sensitive operation is going on, and looking at the delays you get.
Every one of the recent leaking boundaries were assumed to be non-leaking. You cannot just inject "non-leaking" into a statement and assume that solves anything.
Sure, but you can mitigate against all known attacks. You can also mitigate against the class of attacks by, for example, not allowing multitasking and forcing single task to completion (within a dedicated core for a subset of operations).
Yeah, it's good enough for decrypting message headers then handing the rest of the decryption to the much more powerful main CPU. But having it handle the whole encryption? You're going to wait a while.
This transition has already begun and is to large extent already usable on many platforms. Macs have secure enclave, Android has Trusty and Strongbox, Intel TPM, ARM Trustzone, etc. Some of these are just implemented as VMs on same core though, so could in theory be vulnerable to same type of attack.
The questions are incredibly weak from the interviewers. They first state that it's not practical because the attack could take many hours, even days. But they don't describe why a day-long attack is not practical.
They then bring in the researchers and ask them the same question. The researchers say that the attack is very practical because it only takes... a few hours or days to execute. Here's the specific part: https://youtu.be/BiRPr839dSU?t=1476
Instead of chatting more about this discrepancy, they just ignore it and ask the researchers how they feel about their new popularity.
From what I can tell from the advisory from Intel, it's simply that people should understand the attack and mitigate it in software. It's very vague. The specifics (i.e. a list of example popular programs that are vulnerable) seem entirely missing.
What you're seeing here is a collision between academic cryptography culture and real world engineering culture. In particular, the word "practical" has very different meanings in those two worlds, hence the discrepancy.
In engineering, the word "practical" has an expansive definition that takes into account end goals, likely costs, rewards and risks of getting there, whether better approaches exist and so on. In academic cryptography the word practical is used far more narrowly and means something like: this algorithm doesn't only exist on a whiteboard, we wrote a toy implementation of it as well.
There are people in this thread telling each other how to disable power scaling and stuff. They're probably people who take the claim of "real and practical" literally without realizing what this does(n't) mean when coming from academics. If you read the paper you'll notice a lot of aspects about the attack that aren't actually practical at all, so to believe this is a threat worth spending time on requires a lot of assumptions about unknown developments that may not hold.
To name just a few aspects of "practicality" that engineers might care about but the paper authors do not:
1. The attack requires DoSing the target server for extended periods, like days at a time, without being detected. Do you have CPU load or bandwidth monitoring in place? Then you're going to detect the attack within minutes of starting before it got anywhere at all and can simply block the attacking IPs.
2. The attack is only demonstrated against specific crypto libraries and algorithms that you're almost certainly not using. You're asked to assume it can be easily applied against normal algorithms, but their technique relies heavily on the exact mathematics and implementation schemes they're attacking, so it's not entirely obvious how easily it can be adapted. Presumably they chose this obscure target for a reason.
3. The attack was demonstrated on a perfectly unloaded system in which the server does nothing except cryptography and has no other users. Given how sensitive it is to tiny timing fluctuations, it seems like more or less any other activity would raise the noise level so much that days of DoS attacks might turn into months or years. You're asked to assume this isn't a problem for the attackers, but that seems like a very unsafe assumption.
4. The attack was demonstrated on a machine that's in the same datacenter as the machine being attacked (~600 microseconds of latency to the server). Are your machines in a private colo facility where the owners know who is renting their servers? Well then, the attackers are going to be pretty quickly detected and investigated by the authorities aren't they, because there are no valid use cases for DoSing a server right next to your own for days at a time with carefully crafted crypto packets.
5. What about the cloud? Pretty easy to get machines there, but you also can't control whereabouts you get placed. I read another paper where researchers tried to do remote timing attacks on machines in AWS. It requires massive amounts of descheduling and rescheduling VMs in the hope that eventually you get lucky and the scheduler places you near enough to the victim. That pattern is extremely distinctive, has no real legitimate use cases, and AWS could very easily detect it and shut it down if this sort of attack ever became an actual problem. But of course, such obvious mitigations don't get mentioned in these papers.
6. Is this really the easiest way to snoop on traffic? Why not just search for a classical vuln in the client or server software itself? It's not like there's a shortage of those. Just weeks ago it turned out Jira was vulnerable because it was shipping a library last updated in 2005. If this attack is the best way to achieve a specific goal it means you're going up against an unusually well hardened target such that all other means of entry like phishing, hacking, government intervention, physical attack etc are less practical than this. Very few organizations will meet that level of security.
As you can see, once you expand the definition of "practical" to include consideration of everything a real attacker would care about end-to-end, like not being detected, and succeeding against real servers doing actual work that are monitored by humans, the whole thing starts to look very questionable indeed.
Frankly I find it a bit irresponsible that they've named it Hertzbleed. The original Heartbleed attack was quite practical and let you dump the memory contents of real-world servers at will. People demoed it on random Cloudflare edge nodes and the like. It required an immediate response by many, many people. Now we have a website that looks nearly identical to the Heartbleed website - it has a similar name, a similar logo, a similar FAQ, talk of "patches" by CPU vendors, etc. But when we read the paper there's no similarity between the attacks really. It's just another case of academics exaggerating their work for the sake of getting a paper, and it needs to stop.
Wow, amazing response. This was exactly what I was looking for. It's odd I have to get someone from HN to help me understand instead of, say, Intel/AMD. Their recommendations didn't seem to mention any of these important details. Maybe I missed something. Thank you!
My experience has been that large companies won't directly argue with academic research, even when they easily could. Most people will automatically side with academics in any dispute, because they'll intuit that of course the company would say there's no real problem, they're conflicted, whereas the researchers aren't so the latter must be correct. Many people aren't too savvy about the publish-or-perish problem and don't care about the details. Corporate PR people also hate picking public fights, so tell staff to just roll with it and engage in damage control. After all, you're arguing with people who can literally spend all day writing up clever sounding papers about why their claimed problem is real, whereas you have customers to satisfy.
Yeah, I'd argue this is "practical" for state-level surveillance. And they are GOING to get you if they want you, as the various leaks over the years have shown.
Heck, isn't spying on keyboards and display signals through a wall still "practical"?
Interestingly, AWS takes no such actions against massive scans of infrastructure. One can acquire millions of cloud servers in search of co-residency without action being taken.
Sure, probably there are no people mounting such attacks today.
My point was more like - the moment it becomes known that people are doing that sort of thing, they would implement mitigations. Sucks if you're literally the first victim who detects what happened, but that's not many people, especially because this sort of "flood the server with data and measure timing" attacks are so noisy and visible.
Ok, I see how this works in theory. But until I see an exploit that uses this method in real life to extract keys (or maybe any memory content) from a server running real-life workloads, I am extremely skeptical. How many samples are needed to get anything useful? And wouldn't the time required to acquire those samples be longer than the time required to detect the attack (or for all the keys to be rotated)?
> How much samples are needed to get anything useful?
There is proof-of-concept code for reproducing. I don't think sample count is a big concern.
That said, I believe the real caveat lies in the "workload must run for long enough to trigger frequency scaling" part. Usual crypto primitives are just too fast on our processors, which is likely why they picked SIKE to demo the attack.
SIKE is a very relevant example because we are slowly creeping toward a world where Quantum computing will be ubiquitous and existing asymmetric cryptography will face serious challenges.
We are nowhere near "ubiquitous" quantum computing. We aren't near rare quantum computing.
Quantum computing as a practical platform has yet to be proven feasible. When you ask people who know what they're talking about and aren't pitching for grant money, quantum computing is somewhere between decades away [1] and never happening [2].
You need on the order of millions of qubits for quantum error correction algorithms to work.
We have, with superconducting circuits operating at 20 milli-kelvin, managed to corral 53 qubits into a circuit. In the error-correcting model, we must perform simultaneous gate operations on at least thousands of qubits. We have managed to perform simultaneous gate operations on two.
The levels of engineering effort required, and the orders of magnitude separating what has been realized by those efforts and what is required by theory, lends itself towards narratives of impossibility. Unlike the transistor revolution, there is no clear path forward upon which we might improve these initial results.
To quote my second source:
> I believe that, appearances to the contrary, the quantum-computing fervor is nearing its end. That’s because a few decades is the maximum lifetime of any big bubble in technology or science. After a certain period, too many unfulfilled promises have been made, and anyone who has been following the topic starts to get annoyed by further announcements of impending breakthroughs. What’s more, by that time all the tenured faculty positions in the field are already occupied. The proponents have grown older and less zealous, while the younger generation seeks something completely new and more likely to succeed.
> All these problems, as well as a few others I’ve not mentioned here, raise serious doubts about the future of quantum computing. There is a tremendous gap between the rudimentary but very hard experiments that have been carried out with a few qubits and the extremely developed quantum-computing theory, which relies on manipulating thousands to millions of qubits to calculate anything useful. That gap is not likely to be closed anytime soon.
> To my mind, quantum-computing researchers should still heed an admonition that IBM physicist Rolf Landauer made decades ago when the field heated up for the first time. He urged proponents of quantum computing to include in their publications a disclaimer along these lines: “This scheme, like all other schemes for quantum computation, relies on speculative technology, does not in its current form take into account all possible sources of noise, unreliability and manufacturing error, and probably will not work.”
Ok, as a scientist, I'd say "never" is too harsh. What I mean by QCs becoming a thing probably isn't what most people on HN think of as QCs, but rather their being objects for simulating quantum systems. For that, they already have uses (those QC systems you keep hearing about on the news already exist and are being used) and will probably get better to the point (god willing) where we can simulate many-electron systems. That is _my_ dream. I feel like QCs in HN minds lean a lot more toward the "computer" part of QC, like an actual Turing-complete computer that can run Shor's algorithm and break modern encryption, and on that I sort of agree with your assessment that it is somewhere between decades away and never.
Sorry for that, I have to context switch when I talk to people outside physics. I always forget that. Also definitely the context of the convo was about QCs breaking encryption so my bad.
I am curious as to your perspective as a physicist, do you think it is feasible to have a QC computer from an energy perspective?
There is the cost to consider, yes, and there is also an energy cost to a stable QC system. Asymmetric/symmetric schemes are not unbeatable; they have an energy cost to break. Shor's algorithm is theoretically great, but rarely if ever have I seen an associated energy cost... even beyond the question "will we build one?" there is "can you efficiently build one?", i.e. what does a QC capable of executing Shor's algorithm look like, a small planet or star perhaps?
So, as nickelpro states, I feel like a QC that is actually general purpose (which is probably a better way to state it) is so difficult at this point to even imagine that it's hard to say it will become a thing, absent some heretofore unknown breakthrough. You probably could frame it as an energy-cost question by deriving how much energy it would take to keep millions of qubits from decohering, extrapolating from how much energy it takes to keep a few from decohering, but I'm not even sure you can extrapolate that far out, since as you increase the number of qubits the required energy probably isn't linearly related to the count; it's some power law or worse. Remember, the number of qubits we can run today is in the dozens; the numbers you need for Shor's algorithm, or for general-purpose computing, are likely in the millions.
For quantum people, QCs are already pretty cool because they can do simulations of quantum systems, like molecules and atoms, that are just infeasible on classical computing (high performance computing, i.e. supercomputer) systems, things that would take probably years (yes, years) of wall time on an HPC system. The thing is, the number of qubits required for modeling these types of molecules is likely in the dozens to 100+, which looks possible now, since there are systems out there that, while noisy, do have dozens of operating qubits.
If you're curious what these simulations are for, it's doing things like calculating energy levels for certain molecules, which materials science people care about and which will help them make the next generation of substrates for computer chips, etc. etc. So it's not entirely esoteric stuff; it will be things which eventually make it into actual products and technology people use. But it definitely is NOT general-purpose computing, even less so Shor's algorithm or breaking encryption.
My bet is that you could write all of your passwords on your front door and still not be victimized in any meaningful way. But, in many/most cases, it's cheaper to thwart the attack than to analyze if it can be used to exploit your systems.
I haven't seen ANY side-channel timing attacks performed in the real world, but that doesn't stop the Security Theater crowd from costing us hundreds of millions of dollars and megatons of unnecessary carbon emissions by slowing everyone's CPU performance on the grounds that everyone's threat model is the same.
There are dozens of papers showing the practicality of various timing attacks, written by highly respected academics. Just because you haven't stumbled across an attack in the wild does not somehow invalidate the fact that there are practical attacks.
Do you expect those who do carry out a successful attack to email you and let you know of their success? Or perhaps you think they'll exploit someone, and follow it up with an academic write-up of how they carried out that exploitation, to be widely published?
While security theatre does exist, it's laughable to write off an entire class of vulnerabilities as theatre.
None of the attacks are feasible in a trusted environment. If your code isn't running in an environment where other processes from untrusted sources are also running, these timing side-channels and their mitigations are irrelevant.
If an untrusted source gets shell access to your trusted platform/server/container and can run payloads, you're already screwed six ways from Sunday and the rest of the discussion is moot. It's security theater specifically because individuals and organizations following these blind mitigation recommendations don't assess the attack surface that's being exposed.
A school teacher wearing a condom is strictly speaking safer than the alternative, and yet someone should still be fired.
Not all timing attacks require any sort of privileged access. As one example, OpenSSH had a timing attack where under certain configurations a query for a non-existent user returned faster than an existing user, allowing attackers to enumerate user accounts.
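A hedged sketch of that class of bug (not OpenSSH's actual code; SHA-256 stands in for a real, slow password hash just to keep the sketch short): the leaky version returns early for unknown users, so failures on the fast path are measurably quicker than real password checks, while the fixed version burns the same work either way by verifying against a dummy hash.

```python
import hashlib

# Hypothetical user store; hashes are illustrative, not a real KDF.
USERS = {"alice": hashlib.sha256(b"hunter2").hexdigest()}
DUMMY_HASH = hashlib.sha256(b"dummy").hexdigest()

def login_leaky(user: str, pw: str) -> bool:
    if user not in USERS:   # fast path: timing reveals the user doesn't exist
        return False
    return USERS[user] == hashlib.sha256(pw.encode()).hexdigest()

def login_fixed(user: str, pw: str) -> bool:
    # Unknown users still pay for a full hash comparison against a dummy,
    # so existing and non-existing accounts take roughly the same time.
    stored = USERS.get(user, DUMMY_HASH)
    ok = stored == hashlib.sha256(pw.encode()).hexdigest()
    return ok and user in USERS

assert login_fixed("alice", "hunter2")
assert not login_fixed("mallory", "hunter2")
```

Both variants return the same answers; only the timing profile of the failure cases differs, which is the entire side channel.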
I'm not saying this specific attack is something to get worked up over. But, as I have already said, writing off an entire class of vulnerabilities because you think it's all theatre is naive. Weighing each exploit against your attack surface, risks, and risk tolerance is not.
>It's security theater specifically because individuals and organizations following these blind mitigation recommendations don't assess the attack surface that's being exposed.
Blaming researchers for security theatre when it is the organizations which are not doing their due diligence is, at least to me, a weird way to look at things.
I don't blame the researchers, this is specifically against the nonsense discussions that plague this thread and others like it talking about the performance impact on personal computers. These side-channel bugs are minor annoyances, and mostly a problem for cloud providers.
I wouldn't want Intel or AMD or anyone else to abandon speculative execution, clock boosting, or any of the other "vulnerable" technologies just because they're unsafe in specific application spaces, which seems to be what half of HN starts advocating for whenever this stuff comes up.
An application bug like the OpenSSH one is spiritually separate from the hardware bugs that inspire these mitigation discussions.
Another example: I work in scientific high performance computing. The worst that can happen with my work (although people in more defense-oriented research might care) is someone might see my data before I publish it... woopy doo, and I guess if they do, they'll have to spend the few hours needed to process the TBs of data I make so they can what, scoop me? Oh, and they'd need access to the same supercomputer too... the risk I face of anything bad happening is minuscule. On the other hand, removing modern processor features like speculative execution and frequency scaling would mean going from 3 weeks or so to 4 weeks or more per simulation. No, I am NOT okay with that at fucking all; it's hard enough dealing with the multi-week delay I have before I can iterate, and making that even longer for very little risk is not worth it.
The person I replied to asserted that all timing attacks are theatre, which I disagree with (and, evidently, poorly communicated my stance). Perhaps they did not mean the entire class of vulnerabilities which rely on some sort of exploitable timing difference, but only those that require privileged (or physical) access. In that case, I still believe it is foolish to completely dismiss them simply for being a 'timing attack' (and therefor theatre), but I also believe it is foolish to blindly follow mitigation recommendations without analysis.
nickelpro did not blame researchers, but I will point out researchers are under a number of incentives to push this out and hype up the potential threat level of their work, because it boosts the work's credibility and thus citations and ultimately funding. Researchers are better than most bad actors, but they are not, and cannot be, completely pure actors free of even a tinge of bad incentives.
> If your code isn't running in an environment where other processes from untrusted sources are also running, these timing side-channels and their mitigations are irrelevant.
And then you put 'mitigations=off' in your kernel command line and go on your way. I do it for all my BOINC compute nodes, because they literally have nothing sensitive on them.
But remember, L1TF/Foreshadow could reach across virtual machine boundaries. It's not just inter-process speculation that's a problem.
I think the one-dimensional severity classification is part of the problem. If you're running a cloud provider, it's a much bigger deal. Call it "high severity" issue for those use cases. No objection to that, better safe than sorry.
Probably 90% of PCs are single-user Windows desktops, though. It's a "nonexistent severity" issue for those use cases... yet we all get to pay.
If you're running a cloud provider, it's a much bigger deal.
On the other hand, if you're a cloud provider that multiplexes tons of virtual cores on your physical hardware, I suspect anyone trying to do the sort of careful timing analysis required for these types of attacks would find themselves drowning in noise, as their processes get migrated arbitrarily between cores of hardware shared with tons of others.
> If you're running a cloud provider, it's a much bigger deal. Call it "high severity" issue for those use cases. No objection to that, better safe than sorry.
It's odd, because this agrees with what I wrote, and the parent to your comment says they "fully concur", yet they are arguing that I'm incorrect. I did a poor job in communicating.
As an attempt to better clarify what I wrote: I agree with you that for the vast majority of people this specific attack is a non-issue. But, there are plenty of different timing attacks, and some of those may affect some people. It would follow then that some timing attacks should not be abruptly dismissed simply because it's classified as a timing attack.
However, my initial comment was replying to someone who wrote off the entire class of vulnerabilities, asserting that no timing attack of any variety has been used successfully. I find this a naive approach to vulnerability management. Instead of dismissing all attacks that are classified as timing attacks, vulnerabilities should be assessed for what they can do, the ease of doing it, and the potential impact of a successful attack.
Fully concur, although now that I've read some of the white paper some of this doesn't even appear to be a real issue? Like the claimed "remote" attacks against, "Cloudflare’s Interoperable Reusable Cryptographic Library (CIRCL) [28], written in Go, and Microsoft’s PQCrypto-SIDH [65], written in C ... [are] meant to run in constant time"
But they just straight up don't run in constant time, so they're vulnerable to a timing attack across the network. That's clearly just a library bug? Like surely the dumbest part of a "constant time" algorithm is double checking that you ran for a constant wall clock amount of time?
> But they just straight up don't run in constant time, so they're vulnerable to a timing attack across the network. That's clearly just a library bug? Like surely the dumbest part of a "constant time" algorithm is double checking that you ran for a constant wall clock amount of time?
It's... hard. A lot of the "constant cache behavior" and "constant time behavior" algorithms were written back in the day when the CPU speeds didn't change randomly on you, or at worst toggled between "idle" and "running hard." Think... oh, even the Core 2 days, really. They didn't switch that fast.
And then the hardware behavior changed out from under the algorithms, and nobody noticed. Now the throttling is far more rapid. So they may still be "constant instruction count," but that no longer implies constant time.
It's... complicated. :( And what's worse, even the people in charge of managing the complexity don't understand all the details anymore. When stuff like this surprises Intel, we've got problems.
Sure, but you can just check a high-precision wall-clock timer at the end of your computation, pick a budget of X nanoseconds that is always greater than the wall-clock time the actual computation takes, and then, following a computation, sleep until X nanoseconds have elapsed.
While this won't fool timing attacks that are operating on the same machine as your process, the computation time becomes completely opaque to the network which is what the "remote" attacks are built on.
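A minimal sketch of that padding idea (the 5 ms budget and the helper name are my own illustrative assumptions, not from any real library):

```python
import time

# Hypothetical sketch: pad a secret-dependent computation out to a fixed
# wall-clock budget, so a remote observer sees a constant response time.
PAD_NS = 5_000_000  # 5 ms budget; must exceed the worst-case runtime

def padded(fn, *args):
    start = time.monotonic_ns()
    result = fn(*args)
    elapsed = time.monotonic_ns() - start
    if elapsed < PAD_NS:
        # sleep off the remainder of the budget
        time.sleep((PAD_NS - elapsed) / 1e9)
    return result
```

As the comment above notes, this only blinds the network: a co-resident attacker can still see the real computation time, and the budget has to be chosen above the true worst case.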
Not trying to be disrespectful, this is genuine curiosity: what is your role such that you have to deal with this type of attack (as much as can be disclosed), and could you please quantify "much"?
Something about this doesn't bother me as much as other side channels.
To me, this reads like trying to predict the presence, make, model & operational schedule of someone's washing machine just by observing how fast their power meter spins over time. Unless you have an intimate awareness of all of the other power consuming appliances, as well as habits of the homeowner, you would have a hell of a time reaching any meaningful conclusions.
You can say the same thing about all of these attacks: that they are tedious ways of collecting data. The problem is that computers can be made to repeat operations over and over again, leaking keys fractional bit by fractional bit or what have you. That's why the attack doesn't work against someone's laundry machine - unless it's connected to the internet, that is.
> To me, this reads like trying to predict the presence, make, model & operational schedule of someone's washing machine just by observing how fast their power meter spins over time.
That sounds almost trivially easy provided you can afford to buy each and every washing machine on the market so you can measure its power consumption profile for each of its programs.
> predict the presence, make, model & operational schedule of someone's washing machine just by observing how fast their power meter spins over time
I think this is a really excellent analogy that explains the situation well. However, I think doing exactly that would be really straightforward, and your analogy explains why. Imagine an ML model constantly adjusting the probabilities for the set of possible washing machines... after a large number of washing machine runs, it will be narrowed down to a really small subset of the possibilities. Given that this is a cryptographic key, they can then trivially brute force the remaining possibilities.
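A toy sketch of that narrowing process, assuming each candidate machine has a Gaussian power profile (the candidate names, power figures, and noise level are all invented for illustration):

```python
import math
import random

random.seed(0)  # reproducible toy example

# Hypothetical mean power draw (kW) for each candidate washing machine
candidates = {"A": 5.0, "B": 5.5, "C": 6.0}
true_machine = "B"

posterior = {m: 1.0 / len(candidates) for m in candidates}
for _ in range(500):
    # noisy meter reading produced by the true machine
    obs = random.gauss(candidates[true_machine], 1.0)
    # likelihood of the reading under each candidate's Gaussian model
    like = {m: math.exp(-((obs - mu) ** 2) / 2) for m, mu in candidates.items()}
    total = sum(posterior[m] * like[m] for m in candidates)
    posterior = {m: posterior[m] * like[m] / total for m in candidates}
# after enough readings the posterior concentrates on the true machine
```

Even with noise much larger than the gap between candidates, a few hundred observations are enough for the posterior to single out "B".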
It's more like discerning the washing machine based on the power meter, but you know exactly when and how many washing machines turn various bits on and off.
Could be doable, with some expensive equipment.
For the ghost side channel attacks we did see in situ proofs of concept. It's an open question how many people have the skill to do either those side channel exploits or the power meter washing machine guess above and are also engaged in crime.
I would claim to know less than nothing about what's happening here, but to press a bit on the analogy -- aren't there workloads where clearly you're going to be more sure about what's happening? E.g. consider a bastion host proxying SSH connections into an environment. If you can observe the power meter on that laundry machine, you're much more likely to know what's using the power, no? (Especially so if the bastion isn't used flatly throughout the day).
Often, these side channel attacks work using an oracle or by forcing the system into a vulnerable state. Forcing a CPU to scale frequencies would do that here.
I think it's worth noting that the main attack described in the paper, against SIKE, depends on exploiting some behavior peculiar to that particular algorithm (what the paper calls "anomalous 0s"):
> The attacker simultaneously sends n requests with a challenge ciphertext meant to trigger an anomalous 0 and measures the time t it takes to receive responses for all n requests. When an anomalous 0 is triggered, power decreases, frequency increases, SIKE decapsulation executes faster, and t should be smaller. Based on the observed t and the previously recovered secret key bits, the attacker can infer the value of the target bit, then repeat the attack for the next bit.
While any leakage of information can in principle be exploited, it might be that this technique is impractical against a target which doesn't exhibit some sort of behavior that facilitates it.
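To make the quoted loop concrete, here is a toy simulation of the bit-by-bit recovery, with the timing oracle replaced by a made-up model (the real attack measures network response times over many requests; none of this is from the actual SIKE code):

```python
import random

random.seed(1)
SECRET = [random.randrange(2) for _ in range(16)]  # toy 16-bit "key"

def response_time(guess_prefix, i):
    # Invented oracle: the "anomalous 0" fast path fires only when the
    # crafted ciphertext matches the bits recovered so far and the target
    # bit is 1 -- power drops, frequency rises, so the response is faster.
    fast = guess_prefix == SECRET[:i] and SECRET[i] == 1
    return 90 if fast else 100  # t in arbitrary units

recovered = []
for i in range(len(SECRET)):
    t = response_time(recovered, i)
    recovered.append(1 if t < 100 else 0)  # smaller t => bit was 1
```

The loop recovers the key one bit at a time, which is exactly why the attack needs the algorithm to expose a per-bit distinguishable state.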
I think SIKE was just chosen because of its relevance, not because it has any particular issues that make it more susceptible. I'd be curious to hear from an expert on this.
SIKE is definitely not the most widely used cryptographic algorithm.
And as the paper points out:
> In our attack, we show that, when provided with a specially-crafted input, SIKE’s decapsulation algorithm produces anomalous 0 values that depend on single bits of the key.
It was clearly selected for this property.
The attack makes it possible to determine the number of 0s and 1s in words processed by an algorithm, so they chose an algorithm that has specific data outcomes which will produce a measurable power effect.
I didn't think it was widely used, I'm sure it's very very rarely used. I was saying it was relevant because it represents the "future" of cryptography.
I would think that any kind of key exchange algorithm that relies on a constant time algorithm is vulnerable to this. I could be wrong.
Not a cryptography specialist, but I doubt that all crypto algorithms have the same property of causing 0s (or 1s) to massively appear for some inputs in a way that could lead to a key leakage with this attack.
I believe that SIKE is an extreme case which allows this attack to be performed with relative ease.
However, I suspect that by refining the attack it could be extended to other algorithms less sensitive to power side-channels.
Why do we never get proactive defense against this sort of thing? As with speculative execution, caching, out-of-order execution, dispatching instructions to multiple ALUs depending on availability, etc, it was clear from the get-go that in principle the timing can depend on the payload so in principle it can be a problem for crypto.
The need for constant time should have first class support on the language/compiler level, the OS level, the ISA level, and the hardware level. E.g. the processor could guarantee that the instructions of a certain section of code are executed at a constant rate, the OS could guarantee that the thread remains pinned to one core and the frequency fixed, and the compiler could guarantee that only branchless assembly gets emitted.
This is engineering, there's a lot of things that could happen but don't, we don't all run ECC RAM either. The problem is that speculative execution is really good and if Intel didn't have it they would've been selling worse CPUs. And to be clear, it was about 20 years from the point where people were seriously publishing theories about speculative execution attacks to the point where it was a practical attack.
Think about how much benefit we gained during that time. And even then, anyone running in a trusted environment would rather have the optimization, consequences be damned. Do you think HFTs patched their boxes to cripple their performance? No.
Sure, now we know it's a problem we'll offer solutions for people who really need it. But it'll be a long while before the average person needs to think about this and in the meantime billions of people benefitted from better CPUs.
The other way of looking at it is that a huge portion of the market is running non-ECC ram and it hasn't resulted in any measurable reduction of security or stability of operating systems worldwide. So maybe it really isn't necessary for your average user, and manufacturing ECC ram for users who ultimately don't need it would be just a waste(both financial and environmental).
Google researched the topic over 2.5 years last decade and did find a notable amount [1].
"Bitsquatting" has also been seen in the wild in the past decade [2].
That's why it should be a concept known to all levels of the architecture so any mitigations can be applied topically and don't need to affect anything else.
Until consumers demand this as a requirement, it won't happen. Almost everyone would rather have a compiler/language/OS/ISA/CPU that finishes faster some of the time than one that finishes in the same time all the time. It would just appear (especially in benchmarks) to be slower for no apparent benefit.
Maybe we can introduce a new set of instructions that are guaranteed to be constant time, but good luck convincing the compiler/language/OS to use these slower instructions even if just for the code that is important for security.
And for this particular attack, constant time isn't even enough! You would need either constant power, or limit the frequency when running secure code (which again reduces performance).
Constant time comparisons take practically no time at all. I hardly see how it would noticeably reduce performance if software could command a CPU to lock to a low frequency for a certain period of time or when the sensitive code finishes, whichever happens first. The OS could track how often this happens and give a simple UI so that we can blame those applications that abuse it.
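For reference, a branchless comparison looks something like this sketch (in real code you'd reach for the stdlib's `hmac.compare_digest` rather than rolling your own):

```python
def ct_equal(a: bytes, b: bytes) -> bool:
    # Examine every byte regardless of where the first mismatch sits,
    # so the running time does not reveal the secret's contents.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulate differences without branching
    return diff == 0
```

Of course, Hertzbleed's point is that a constant instruction count is no longer sufficient: data-dependent power can still leak through the frequency, which is why the thread is discussing frequency locking on top of this.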
Simplest solution here is to implement the algorithm in hardware, with a new instruction that has all the security attributes. (Including resistance to power differential and timing differential attacks.)
Seriously, this has been talked about for ages now. If every platform had a good enough FPGA, it could be used for cryptography and/or to accelerate some specific computations. And without having a set of algorithms baked into the silicon, it would not make the device eventually obsolete as the world moves to better algorithms.
> This means that, on modern processors, the same program can run at a different CPU frequency (and therefore take a different wall time) when computing, for example, 2022 + 23823 compared to 2022 + 24436.
I'm not a hardware expert, and I was a bit surprised at this.
Is that because the transistors heat up more with certain input values, which then results in a lower frequency when the CPU gets hot enough? Something like AND(1,1) using more energy than AND(1,0) on the transistor level?
As far as I can tell [1], addition typically takes a constant number of cycles on x86 CPUs at least, so any difference should happen at a very low level.
I've got some background in FPGA development, and you can actually design for this effect. We literally have tools where you can plug in simulated activity and the tool will tell you how much dynamic power the chip will use and therefore what power & cooling you'll need. It works exactly as you say - different inputs are going to cause different numbers of transistors to need to switch every cycle and switching draws more power. So if you have a transistor that goes 1->0->1 every other cycle, that'll draw more power than a transistor that's just sitting at 0 the entire time.
Then, since modern CPUs have frequency scaling (fpgas generally don't) you can observe that a high switching rate would increase the power consumption which increases the heat and therefore causes the CPU to scale down the frequency.
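The switching-activity idea can be shown with a toy model: dynamic power roughly tracks how many bits flip between consecutive values on a bus (the linear "flips ≈ power" model is an illustrative simplification):

```python
def toggles(words):
    # Hamming distance between consecutive words = number of bit flips,
    # a rough proxy for switching (dynamic) power drawn per cycle.
    total = 0
    for prev, cur in zip(words, words[1:]):
        total += bin(prev ^ cur).count("1")
    return total

# A bus alternating 1010 <-> 0101 toggles every line each cycle...
busy = toggles([0b1010, 0b0101, 0b1010])  # 8 flips
# ...while an idle bus toggles nothing and draws far less dynamic power.
idle = toggles([0b0000, 0b0000, 0b0000])  # 0 flips
```

This is the same accounting the FPGA power-estimation tools do, just with real capacitances and clock rates attached to each flip.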
Is it not possible to add noise by running other processes in parallel that will also cause frequency boosts to occur and colour the results? Basically the mitigation is to disable boost, but boosting more often, or boosting in a controlled way (with another process triggering it), should also help mitigate... That said, if it were that trivial, surely Intel or someone would suggest it.
Noise helps to increase the difficulty, but with a large enough sample size you can statistically exclude it, greatly simplified but essentially by doing X-avg(all X).
Imagine you do the above bit for bit. So first you sample 1M times to find the baseline, then flip one bit and sample another 1M times to see if any deviations, and so on.
There is also the chance of your PRNG being predictable, so attacker can predict what noise will be generated if they have seen enough of it.
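A quick numerical sketch of why averaging beats noise (the 50 ns secret-dependent difference and the 1000 ns noise level are invented for illustration):

```python
import random

random.seed(2)

def sample(bit):
    # Hypothetical measurement: a 50 ns data-dependent timing difference
    # buried under Gaussian noise with a 1000 ns standard deviation.
    return 10_000 + (50 if bit else 0) + random.gauss(0, 1000)

def estimate(bit, n=200_000):
    # The noise averages toward zero as n grows (central limit theorem),
    # leaving the tiny bit-dependent difference clearly visible.
    return sum(sample(bit) for _ in range(n)) / n
```

With n = 200,000 the standard error of each mean is about 1000/sqrt(200000), roughly 2.2 ns, so the 50 ns gap stands out by many standard deviations.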
True, but is it reasonable to assume the same key would be used for long enough on a given operation to allow averaging to help? The mitigation only needs to work practically, not in a worst-case theoretical situation.
Sometimes I wonder if the main use of quantum computers will just be to verifiably have no side channels, because any such side channel would act as a measurement (which produces phase errors that you can check for). It wouldn't be efficient but the computation would be provably isolated from the rest of the universe. Well... other than the input at the start and the output at the end.
Would it help to slightly reduce the granularity of the frequency adjustment? Just enough to make the analysis infeasible? It doesn't have to be all or nothing. We had a similar issue with browsers restricting access to high-precision timers in JavaScript.
You can pull off attacks like this from JavaScript by repeatedly recording the time and training a machine learning model on traces of instruction throughput over time, which my group did in a recent paper: https://jackcook.github.io/bigger-fish/
Could you elaborate on this attack? It’s an interesting read, but I’m curious about practicality.
How would you ensure that the user loads your malicious script, and has a running web worker for it?
I see that you trained it on 100 websites. Would you need to retrain for every new version deployed or different paths with varying content?
If your intention is to detect sensitive website accesses, wouldn’t you need those websites to be public to train the model first? I’m not convinced that detecting porn access is particularly malicious, but I acknowledge that it is illegal in some places.
You'd just need to put the script on any webpage the user might access and leave open, such as Google, or Facebook, or whatever. The attack isn't specific to JavaScript, so really you could put this in a desktop app too, think Slack, Spotify, etc. Any app or website that you know the target user is likely to open. CDNs are also a great target.
We evaluated on 100 websites as a proof of concept, but we also included experiments in an "open world" setup where the classifier has to predict whether the activity is from one of 100 sensitive websites, or whether it's none of them, and found that it's still very accurate in that more realistic setup. You would need to retrain to identify more websites outside of your set of 100.
The websites would need to be public, which is basically the same limitation as hertzbleed, since they need to know what they're looking for in order to identify an activity. Some use cases with this limitation aren't too hard to imagine: maybe you're in a country that bans access to major Western news sites but you're evading censorship with a VPN.
I’m a little confused about your attack vector - how feasible would you reckon it is to place such a malicious script on the largest public websites in existence, versus just getting the victim to install a Trojan? The latter could just literally monitor the user.
I’m not saying your paper is technically wrong, just practically infeasible.
Right now, you’ve chosen very specific websites. Have you explored if there is a correlation between specific scripts (react, jquery, etc) and whether websites with similar setups cannot be differentiated? I was also curious about content/non-homepage paths. Your conclusion seems to be that interrupts/etc are the primary indicators, so I suspect there’s a connection.
Edit:
In my experience, large websites and most web apps don’t use CDNJS/etc, but bundle their code - this would make injecting your script much harder without a supply chain attack.
On second thought, given CORS I think this attack is actually impossible. How would your embedded script communicate your findings with your server? You would need to control the originating domain itself…
I don't think any of these side channels are really easy to pull off without the technical capabilities of a nation state or something similar. I personally think embedding a malicious script in a CDN (e.g. https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/) that serves a script for a large website, or something similar (https://blog.igorescobar.com/2016/08/21/ive-the-chance-to-tr...), is more realistic than getting the victim to install your program -- I would imagine sensitive individuals are very concerned about installing arbitrary software.
We did get a comment about this in our rebuttal but didn't end up including it in our final paper -- we found that we distinguished sites with the same frameworks (such as react, angular, and jquery) at the same accuracy at sites that used different frameworks.
We didn't do much research into content/non-homepage paths but it's a good area for future research. I would suspect it'll still do pretty well.
And yes, we concluded that the source came from interrupts (in Table 3 of our paper you can see we ran an experiment with frequency scaling turned off), which does make me question the practicality of hertzbleed. I wouldn't doubt it can be exploited somehow though.
The attackers do not read the CPU frequency, they estimate it based on the latency of the replies to their queries.
The attack works only for certain combinations of CPUs and cryptographic algorithms that contain a mixture of instructions that cause a CPU to lower its clock frequency, with instructions that allow the CPU to raise its clock frequency.
For such combinations, algorithms that are supposed to be executed in constant time are actually executed in a variable time, creating a side-channel.
As a response to Hertzbleed, Intel has published a guide for those who write cryptographic libraries, about how to mitigate this vulnerability:
The main problem that creates this vulnerability is that the CPU vendors publish very little information about their turbo algorithms, so, unless you make your own measurements, it is very difficult for a software writer to predict at which clock frequency a certain segment of a program will be executed, and what should be done to avoid changes in the clock frequency.
The frequency change is observable by the whole algorithm taking a different time to run - the algorithm is constant-time, but because the clock speed is changing based on the data, it's not constant-wall-clock-time and you can perform a timing attack.
And also when I set the scaling governor to "performance" (under Linux)? Is the frequency in that case still adjusted based on the data or always "maximum"?
With the performance governor, the clock frequency is continuously adjusted between the "base frequency" and the "maximum turbo frequency", e.g. between 3.7 GHz and 4.8 GHz, for my computer.
With the powersave governor, the clock frequency is continuously adjusted between a frequency much lower than the "base frequency", and also the "maximum turbo frequency", e.g. between 2.2 GHz and 4.8 GHz, for my computer.
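On Linux you can check which governor is active and the frequency limits through the cpufreq sysfs files; a small helper like this sketch works (returning None where cpufreq isn't exposed, e.g. in some containers or VMs):

```python
from pathlib import Path

def read_cpufreq(cpu=0, field="scaling_governor"):
    # Standard Linux cpufreq sysfs location; other useful fields include
    # scaling_cur_freq, cpuinfo_min_freq and cpuinfo_max_freq (in kHz).
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/{field}")
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, or cpufreq not exposed here
```

Comparing `scaling_cur_freq` against `cpuinfo_max_freq` over time is an easy way to watch the boost behavior described above for yourself.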
> We disclosed our findings, together with proof-of-concept code, to Intel, Cloudflare and Microsoft in Q3 2021 and to AMD in Q1 2022.
Why did they choose to disclose their findings to just two software companies (Cloudflare and Microsoft)? Why not other software companies like Amazon or Google? Or developers behind open source cryptography libraries?
The attack in question was only tested on SIKE, so it seems logical to start targeted disclosure on the community using and developing it, while using the general disclosures to target the broader cryptographic community.
Both Cloudflare and Microsoft are one of the few companies that have put significant investments into developing SIKE for post-quantum cryptography. Microsoft has a SIKE research team, and Cloudflare has been exploring SIKE for post-quantum TLS for years.
Both companies also maintain the key open-source implementations of SIKE [1][2], and Microsoft is spearheading the effort to standardize SIKE through NIST. Most open source cryptographic libraries don't implement SIKE.
Seems like the simplest way to mitigate is to randomly throw some junk at the problem. Some random crypto code, some random no-purpose cryptographic calculations, should prevent any listener from gaining any useful information. It shouldn't take much; a single-digit percentage increase during crypto functions would be enough IMHO.
well, yes. if you’re an NSA-level actor your AES implementation hasn’t been `AES_encode(key, input)`, but `AES_encode(key, input, random)`. you then XOR the randomness into the input, do all your (modified) AES operations, and then XOR the randomness out [1]. the modified AES operations take about double the area/power because your “input” is effectively twice as long as it used to be, but there’s now next to zero correlation between input/key and power use.
like most things, i expect the reason they’re not adopted for consumer devices is because they use notably more power/area/are slower.
[1] enter "Provably Secure Masking of AES" into scihub and you'll find a useful paper by Blömer, Merchan and Krummel from 2004.
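The core trick is easy to show for a linear operation like XOR: split the secret into two random shares and compute share-wise, so no single intermediate value correlates with the secret (a first-order sketch only, nowhere near a hardened masked AES):

```python
import secrets

def mask(x):
    # Split x into two shares; each share alone is uniformly random,
    # so observing one leaks nothing about x.
    r = secrets.randbits(8)
    return (x ^ r, r)

def masked_xor(a, b):
    # XOR is linear, so it can be applied share-wise without unmasking.
    return (a[0] ^ b[0], a[1] ^ b[1])

def unmask(shares):
    return shares[0] ^ shares[1]
```

The nonlinear steps (like the AES S-box) are where masking gets expensive, which is exactly the extra area/power cost the comment above mentions.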
There are a dozen names for it. In the intelligence world, if you know that the enemy is listening on an unencrypted communications pipe, but you cannot afford to stop using that pipe, you throw random junk down the pipe until they cannot tell real from fake.
In this case, the name used here is masking, referring to what is called data masking, and the reference was to adding noise. There are other operations(as you point out) that could also be used (substitution, shuffling, etc.).
This is probably a naive question, but could this be mitigated by fencing a part of code by some “frequency fence” of some sorts? This is of course a long-term mitigation as it may require compiler support, may affect performance and other threads and whatnot, but I wonder what a proper solution would look like.
I'm curious whether this type of thing would work as well. It sounds like you're suggesting to be able to wrap sections of code in compiler-specific declarations (e.g., UNSAFE blocks in C#) that force the underlying hardware to operate at a constant frequency.
I have PRECISELY no idea whether that is coherent or makes sense. It's just interesting at a glance.
A server accepting TLS connections would have almost always some thread doing “frequency fenced” code so the CPU would always be frequency locked. It's much more practical to just disable Turbo-boost.
A simpler mitigation is to just add noise: perform random computations so the whole algorithm is constant time + random time. That greatly increases the difficulty of gathering timing data.
Agreed. Especially considering that the main threats are still good old social engineering and gullible users downloading and running malware.
Attack vectors like Hertzbleed require considerable resources, detailed knowledge about the target (in order to get the required preconditions right), and as others pointed out are easily detectable.
Interesting that the mitigation is to turn off Turbo/Precision Boost.
Four or five years ago there was an article submitted here (I wish I could find it) about a developer who keeps a machine with Turbo Boost disabled specifically because it seemed to interfere with their performance testing. By keeping it disabled they were able to eliminate a number of factors that prevented them from getting consistent results. It sounded like they preferred this approach for working on optimizing their code.
I am not pointing this out to disparage this performance boosting feature, only calling it out as a point of interest
Yes, this is a common technique in optimization. With frequency scaling enabled, a profiled function may appear to have more than one hot region, implying 'hot' and 'cold' code paths, which are really just manifestations of CPU speed being non-constant.
These "optimizations" might not matter in the end though if the production environment runs with Turbo Boost enabled. Unless they are verified on another machine of course.
Neither AES-NI nor ChaPoly can be influenced by this vulnerability, because they do not use the secret key with different kinds of instructions that might consume different amounts of power. The secret key is used only in XOR operations. Other secret state of the ciphers is also used only in simple operations, e.g. XOR, additions and rotations, where there is very little variation of the power consumption depending on the operand values.
The cryptographic algorithms that have chances to be influenced are those based on public keys, which compute arithmetic operations with large numbers that can cause changes in the clock frequency.
Interesting, and seems like a natural followup to this side channel: http://www.cs.tau.ac.il/~tromer/papers/acoustic-20131218.pdf (RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis), in which researchers deduced that the high-pitched sounds made by CPUs could leak the operations that GPG was performing to decrypt some encrypted content, and thus leak the private key. All you need is a microphone and the ability to trick the victim into decrypting some known content.
We need an industry-wide effort for coordination between cryptography library owners & device/chip vendors to ensure the use of constant CPU frequencies during cryptographic operations.
It's odd that the authors haven't chosen to initiate this themselves, as it seems like the proper solution to this vulnerability.
There are no "cryptographic operations" on the hardware level. It's just normal math, therefore cryptographic code would have to give hints to the processor to enable such countermeasures. Such a facility does not seem to exist yet, and this is why this vulnerability is considered to be unfixable. In comparison, there were workarounds for Spectre because compilers can emit code where the processor cannot apply the dangerous optimizations.
There are special CPU instructions to help speed up cryptographic algorithms, but applying countermeasures to these is not always crucial. They only matter if an attacker could otherwise create a side-channel. This applies in TLS, but not, e.g., when verifying checksums of downloaded software.
> therefore cryptographic code would have to give hints to the processor to enable such countermeasures. Such a facility does not seem to exist yet, and this is why this vulnerability is considered to be unfixable
That is what I am describing. I am proposing that we need to implement these facilities in firmware/microcode.
I think it's still intended as a pun as Hertz (the physical unit and the name of Heinrich Hertz) and Herz are pronounced alike.
Wikipedia and Wiktionary also have some suggestion that the name Hertz is etymologically related to the word Herz, although they disagree about exactly how.
Can't wait until 2050 when all of our computers are bogged down with energy hungry security chips and processors that barely get any real work done because the security arms race demands ever increasing resources...
Honestly, I don't think this is some universal remote exploit, despite it being remotely exploitable. "Under certain circumstances" seems to be the keyword here.
This is incredibly clever and devious, but I think it's mostly practical locally. Since different CPUs have different power usage, and systems run different configurations, I'd expect the most reliable use case would be, for instance, to run a custom OS on a confiscated device to learn what power-throttling patterns it exhibits under this kind of attack, and then perform the attack against the system's original installation to decrypt it (something along those lines). As a counterexample, I think it's unlikely that someone's online service or personal system would ever be exploited by this. Why? Because a system in use generally runs a lot of threads, making it much harder to predict what the measurements mean. If a system is not idle during the attack, it's hard to deduce whether timing differences are related to the attack or just to other tasks/threads executing during it.
Unfortunately, trying to defeat side-channel attacks by adding random noise usually only increases the number of samples required to extract information, rather than preventing the attack entirely. (You can blame the central limit theorem for this.)
This is not entirely true. Or rather: it is true when the countermeasure is to add random delays to pad out overall timing, since one can simply collect more samples to obtain an average. And that may be what the OP is suggesting: just scale the frequency to random levels that are not quite the pre-programmed ones, which is very similar to adding random delays. (In practice this might actually work well enough to defeat delicate attacks.) However what I hoped the OP was suggesting is to add random instructions as a way to prevent the processor from switching power modes: sort of like tapping your phone screen occasionally to keep it from dimming.
There are also other (unrelated) techniques that use randomness to eliminate side channels. One of the most basic anti-timing-attack countermeasures is to use RSA blinding in which a base C is first randomized by computing C^r mod N before that (random) result is combined with the secret key. The randomness can then be removed from the final result. This defeats attacks that depend on choosing or knowing the value C.
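For concreteness, here is a minimal sketch of the blinding idea with toy (completely insecure) RSA parameters; the function name and numbers are mine for illustration, not from any particular library:

```python
import secrets
from math import gcd

def blinded_rsa_decrypt(c, d, e, n):
    """Decrypt RSA ciphertext c using blinding, so the secret-key
    exponentiation never operates on the attacker-chosen value c directly."""
    # Pick a random blinding factor r coprime to n.
    while True:
        r = secrets.randbelow(n - 2) + 2
        if gcd(r, n) == 1:
            break
    c_blind = (c * pow(r, e, n)) % n      # C' = C * r^e mod N
    m_blind = pow(c_blind, d, n)          # (C')^d = m * r mod N
    return (m_blind * pow(r, -1, n)) % n  # remove r to recover m (Python 3.8+)

# Toy parameters (n = 3233 = 61*53, e = 17, d = 2753) -- NOT secure.
n, e, d = 3233, 17, 2753
m = 1234
c = pow(m, e, n)
assert blinded_rsa_decrypt(c, d, e, n) == m
```

Because r is fresh for every decryption, the timing of the exponentiation is decorrelated from the ciphertext the attacker chose.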
How about making operations constant time in application code by picking an upper bound which is acceptable for the application, but that is certainly longer than the actual CPU computation and then waiting until the upper bound to return the result?
Eg. my app is performing digital signatures and I'm sure that they take <1ms CPU time, but performing digital signatures in 10ms is acceptable for my application, so when I perform a signature I measure the CPU time elapsed, say 0.5ms and then wait for 9.5ms.
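The scheme described above can be sketched like this (the 10ms budget is the hypothetical application-chosen upper bound from the example):

```python
import time

def pad_to_deadline(op, budget_s=0.010):
    """Run op(), then sleep until a fixed wall-clock budget has elapsed,
    so callers observe a roughly constant response time."""
    start = time.monotonic()
    result = op()
    elapsed = time.monotonic() - start
    if elapsed < budget_s:
        time.sleep(budget_s - elapsed)
    return result
```

Caveats: sleep granularity is OS-dependent, and an observer with other access (e.g. a concurrent connection) may still be able to tell "sleeping" apart from "computing".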
What you are describing is called “quantization”. It has some disadvantages, one of which is that people tend to disable it the second they hit a use case that requires better performance. It is also sometimes possible to distinguish “doing nothing and waiting for a timer” from “actively computing the result” if you have other access (e.g. you can measure response time by making another connection).
Does that necessarily mean that all is lost? It sounds possible to arbitrarily extend the time necessary. Don't we already do this when we choose encryption keys with lengths that take an arbitrary amount of time to crack by brute force?
Any noise strong enough to have a good chance of hiding the signal would completely defeat the benefit of having dynamic frequency scaling in the first place, I think.
Except that the processor can be doing other work during artificially induced noise delays, something not possible with delays introduced by lower mean frequency.
The attack isn't nicely asking the computer "hey how fast are you running right now?" and then deriving the private key from that data. If that was the case the fix would be as simple as you laid out here.
This attack works by measuring the absolute (wall) time that elapses during many crypto operations and deriving the speed / private keys based on statistical methods applied to that timing data.
Side-channel attacks are by definition, attacks against unintentional information leakage by a machine. The laws of thermodynamics virtually ensure that side channel attacks will be a persistent issue as long as computers are made of matter and consume electricity, multi-tenant computing exacerbates the issue.
Assume you’re a tenant on a cloud service provider and you don’t care about power consumption… can you mitigate this by running a process with a busy loop that forces the CPU into max frequency at all times, with `nice` set to run it at lower priority than your actual workload?
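A rough POSIX sketch of that idea — one low-priority busy loop per logical CPU. Whether the frequency governor actually stays pinned at max under this load (and whether that defeats the attack) is exactly the open question:

```python
import multiprocessing
import os
import time

def spin(seconds=None):
    """Busy-wait (optionally for a limited time) at the lowest scheduling
    priority, trying to keep the core out of its low-power states."""
    try:
        os.nice(19)          # POSIX only; equivalent of `nice -n 19`
    except (AttributeError, OSError):
        pass
    deadline = None if seconds is None else time.monotonic() + seconds
    while deadline is None or time.monotonic() < deadline:
        pass

if __name__ == "__main__":
    # One spinner per logical CPU, running until the main process exits.
    for _ in range(os.cpu_count() or 1):
        multiprocessing.Process(target=spin, daemon=True).start()
```

Note this burns a full machine's worth of power budget, which is the trade-off the question accepts up front.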
Wow, I didn't know that frequency scaling on CPUs was a function of the workload being processed; I thought it was a function of CPU temperature, which would be much harder to glean meaningful data from (presumably it has a great deal of hysteresis, and you'd have to somehow run one computation millions of times, then another computation millions of times, and compare them). I'm not convinced that I'm wrong.
The paper is pretty good and does a great job explaining this:
Basically, P-state / frequency governor side effects cause "constant-time" implementations of some algorithms like SIKE not to be constant time anymore - because in reality, these implementations were never "constant-time" but rather "constant-cycles" and with clock speed changing, so does the observed wall-clock time.
Once this observation is made and the timing oracle understood, it's just a normal remote timing attack - spam the service with constructed data, measure the response time, and eventually your oracle tells you when you got bits right or not.
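In skeleton form, such a remote timing attack looks something like the following. Here `query` and `make_payload` are hypothetical stand-ins for the network round-trip and the chosen-ciphertext construction; the "slower guess wins" direction is also an assumption that depends on the specific oracle:

```python
import statistics
import time

def measure(query, payload, samples=1000):
    """Median response time for one candidate payload; the median damps
    network jitter better than the mean."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        query(payload)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

def recover_bits(query, make_payload, nbits, samples=1000):
    """Guess key bits one at a time, keeping whichever guess produces the
    slower median timing (direction is oracle-specific)."""
    bits = []
    for _ in range(nbits):
        t0 = measure(query, make_payload(bits + [0]), samples)
        t1 = measure(query, make_payload(bits + [1]), samples)
        bits.append(0 if t0 > t1 else 1)
    return bits
```

The 36- and 89-hour figures in the paper come from how many samples per bit are needed before the signal climbs out of the noise.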
This reminds me of a story I remember hearing in the 90s about someone hacking a supposedly 'impossible' remote machine for a competition - they did it by analysing the response times and using that info to derive the key, a novel approach at the time. Can anyone remember the story I must be dimly recalling?
Isn’t the real long-term mitigation here to do all crypto operations on a separate chip? Rewire platform libraries to use the TPM/SecureEnclave backends exclusively. Then if you need “soft crypto” you are kinda on your own, in “you better know what you’re doing” territory?
Can someone explain this to a non-crypto expert? I understand the concept that information can leak via timing measurements. However I don’t understand how this can extract the exact bits of a signing key from this?
So if the encryption function would look at an actual timer, and insert bogus calculations at random places during encryption to pad the execution time, would that remove the information this attack needs?
From a theory point of view, adding "bogus calculations at random places" would probably just increase the number of measurements required - it would introduce additional jitter above and beyond the large amount already accounted for in the attack documented in the paper, but the central limit/tendency over a large enough set of repeated measurements would still have multiple peaks.
Adding a minimum wall clock floor (i.e. simply waiting to release the decrypted data to a client until a given wall clock time has passed from initiation) would close the door on this particular remote exploitation, although it would leave the door open to local/hardware attacks (power, frequency-analysis, parallel process checking P-states as the oracle instead of overall timing).
> What can you do about it? Nerf your CPU performance by disabling "turbo boost" or equivalent. Should you do it? Probably not unless you're particularly vulnerable (journalist, human rights activist, etc.)
Would not another option be to do something that temporarily maxes out the CPU and forces it into boost mode, immediately prior to executing the crypto operation? But not for such a long duration that the CPU reaches any thermal limits and decreases its speed again.
Obviously energy inefficient and not good for laptops or portable devices.
"This particular attack demo succeeded with toy models and toy signal processing, so I'd expect state-of-the-art models and state-of-the-art signal processing to extract secrets from many more programs, _except_ when users protect themselves by setting constant CPU frequencies."
> This means that, on modern processors, the same program can run at a different CPU frequency (and therefore take a different wall time) when computing, for example, 2022 + 23823 compared to 2022 + 24436.
I'm a layman when it comes to things this low-level. However, I always assumed that different addition inputs would take different amounts of wall time; looking it up, it turns out that in theory I was wrong, but I guess I'm actually correct. ¯\_(ツ)_/¯
Adding individual bits can be parallelized, as long as there is no carry. If there's a carry, then we have to wait for it to be computed and propagated. Compare adding 0b01111011 + 0b00000001, versus 0b01111111 + 0b00000001. If we first compute the sum of each pair of bits, then recompute if there's a carry bit, the first will complete after 3 cycles, whereas the latter will complete after 8.
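Here is a toy simulation of that carry-chain length. This models a ripple-carry adder, not how a real single-cycle ALU behaves; it just makes the "wait for the carry to propagate" intuition concrete:

```python
def carry_chain_length(a, b, width=8):
    """Length of the longest run of consecutive carries when adding
    a and b on a toy ripple-carry adder."""
    carry, longest, run = 0, 0, 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        new_carry = (x & y) | (carry & (x ^ y))   # full-adder carry-out
        run = run + 1 if new_carry else 0          # consecutive carries
        longest = max(longest, run)
        carry = new_carry
    return longest

# The carry in the first sum dies early; in the second it ripples
# through almost the whole word.
assert carry_chain_length(0b01111011, 0b00000001) < \
       carry_chain_length(0b01111111, 0b00000001)
```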
It doesn't seem that this affects wall time for a single addition though, at least on typical x86 CPUs. If you look at Agner's performance tables [1], ADD instructions always take the same number of cycles.
I'm not a hardware expert, but I'm guessing that what's happening here is that transistors get hotter with certain input values more than others. Eventually this results in higher overall CPU temperature and a lowering of CPU frequency to compensate.
Isn't this sort of a stand-in for power draw side-channel analysis? I guess it is cool that you can do it purely from software rather than needing physical access.
My understanding from reading the page is that modern processors process certain data at higher frequency, and somehow that allows an attacker to guess private keys.
However, I don't understand the connection between those two things. How would an attacker trigger a lot of almost-identical CPU runs without hitting some rate limit somewhere? And how is this different than just guessing the password?
So I take it when they say "constant time" for things like SIKE, they aren't sleeping for X milliseconds, but are just using some operation that is thought to be effectively constant time, hence this vulnerability? What is the countermeasure for this? Are crypto systems that always wait a full second using system timers, for example, immune to this sort of thing, or is it still detectable even in those circumstances?
>Are crypto systems that always wait a full second using system timers, for example, immune to this sort of thing
No. Such a crypto system would still leak information via the amount of power it consumes, which might change the frequency of the cpu, which could be measured by an attacker through the other processes of the computer.
It probably didn't matter too much since Microsoft and Cloudflare were notified at the same time as Intel. Both of them run AMD hardware in their datacenters. It does seem weird though.
Is it not possible to add noise by running other processes in parallel that will also cause frequency boosts to occur and colour the results? Basically, the mitigation is to disable boost, but boosting more often or boosting in a controlled way (with another process triggering it) should also help mitigate it... That said, if it were that trivial, surely Intel or someone would have suggested it.
I haven't looked at the article but this sounds like a local exploit, right? Those were important in the timesharing era, but with personal computers we temporarily had an era when we didn't have to let hostile code run on our computers. When will we learn that we shouldn't have given that up? Local exploits will never go away, at least on high performance machines.
What’s constant time? Crypto libraries need to do operations to encrypt and decrypt your data. The simple, naive implementation of these operations will work - giving correct input and output. However, a person can time the operation being performed and learn about the key being used. If you’ve deployed on a server and the other person can submit any text they want, whenever they want, they would be able to extract the key from your naive implementation. That’s bad, the worst outcome possible.
That’s why good libraries will make sure that these operations take the same amount of time, regardless of input. So we thought we were safe.
And now these authors tell us, no. That’s not the case. The guidelines used by crypto library developers don’t protect against the attack being described here.
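The classic illustration of the guideline those libraries follow is secret comparison: the naive version returns early at the first mismatch (leaking how long the matching prefix is), while the constant-time version touches every byte regardless. Hertzbleed's point is that even the second version is really "constant cycles", not constant wall time:

```python
import hmac

def naive_compare(a: bytes, b: bytes) -> bool:
    """Leaks timing: returns as soon as a byte differs, so an attacker
    can discover a secret prefix byte by byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def ct_compare(a: bytes, b: bytes) -> bool:
    """Examines every byte no matter where the first mismatch is.
    In real code, prefer the stdlib's hmac.compare_digest."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

assert ct_compare(b"secret", b"secret")
assert not ct_compare(b"secret", b"secreX")
assert ct_compare(b"tag", b"tag") == hmac.compare_digest(b"tag", b"tag")
```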
If your cryptographic library is not constant time then it is already vulnerable. This new attack is able to target the even previously unaffected constant time libraries - that's why they call it out specifically in their FAQ, but this is saying that _all_ cryptographic libraries are vulnerable to timing/side channel attacks (when running on processors which don't have these performance features disabled).
I do wonder, if only the Turbo P-States are what cause the vulnerability. Is relying on Deep C-states for instance an alternative to get power savings?
On my server during idle, when cores enter C6, the power savings are at their maximum and no frequency scaling can match that. Why not just rely on that? (Ignoring the loss of turbo boost ofc)
So would this rely on a known compiled binary they could reliably project/simulate/anticipate?
This seems vaguely like when they would use page fault boundaries to extract passwords. An OS/hardware event that occurs within some if-then to leak parts of the "key".
Do we need some sort of "random delays" or binary execution randomization?
A lot of people here commenting about shared hosting in clouds, but I don't see any actual text that shared environments are more vulnerable.
It sounds like a black box timing attack that could target my laptop, my phone, my server, anything that does cpu frequency scaling and is performing a computation that is susceptible to this attack.
Shared hosting is where an attack like this is most useful. Because you don't need remote code execution on a (virtual) machine. You just need to happen to be colocated with it.
For RCE on a laptop, server, phone, etc. You just need privilege escalation to get equivalent access, which tends to be easy.
I’m wondering the same thing but I’m curious if the heterogeneous nature of modern ARM processors is essentially equivalent- if you can get the same crypto primitive to run first on a P-core and then an E-core, can you measure the difference for a similar effect?
Some cryptographic implementations are blinded such that as the number of attempts increase the amount of 'secret' data recovered (e.g. via power/emi sidechannels-- which this acts like) also increases. If the rate of uncertainty increases faster than the rate of leaked data, then the attack should fail.
That first paragraph is perfect. It's an exact description of the concept, and it's impossible to know whether this is a shower thought or whether 1,000 Intel engineers are going to spend the next 3 years adding RNGs to their clock generation circuitry.
Given that cloud providers oversubscribe their rack power supplies for $ reasons, I'm waiting for the cloud-level equivalent of this DVFS attack, where you throttle a competitor cloud instances by bursting on collocated instances of yours :)
As I've said before, these announcements could benefit from better "action items" or "TLDR" for the average person with other problems to think about. What libraries are affected, what do I need to upgrade exactly, on Ubuntu, etc etc. And I'm guessing this is intended to reach those people (among others) given the effort they put into the graphics, etc.
In this case: "Am I affected?" "Likely, yes. It's on all the CPUs." Okay, but how does this work exactly? Is the Hertzbleed going to take over my computer and steal my private keys from my hard drive? Do I need to be running a server? Do I need to be in the middle of a cryptographic operation with the key in question? Etc.
"What is the impact" Ah, this sounds like the useful part. "...modern x86 cpus... side channel attacks...power consumption...constant-time execution". Nope, this isn't it either.
I think this is simply a matter of being so deeply embedded in something, one forgets how much is assumed. If they showed it to an outsider first they'd get the necessary feedback.
I'm sorry, sometimes (often) it's my reading comprehension. The answer I was looking for was in the first f-ing paragraph. (And I checked archive.org, it was there yesterday)
"In the worst case, these attacks can allow an attacker to extract cryptographic keys from remote servers that were previously believed to be secure."
I'll probably have more to complain about the messaging when a fix comes out, but for now mea culpa.
There really isn't anything for the "average person" to do here, who wouldn't understand any of your questions, either (the library? The one with all the dirty books?)
Here’s a simple mitigation — don’t have your encryption depend on 2022 + 23823 being compared to 2022 + 24436.
The idea that a cpu frequency change (based on cpu load) could be detected, and if detected — that it could lead to any useful information by an attacker is laughably preposterous.
The only theoretical vulnerability is if someone in a shared data center was able to gain control over a system on dedicated hardware that had nothing else running on it — exploit some code on it that triggers an expected frequency — open the cage and case, and detect the frequency (by turning off all other hardware in the vicinity — meaning you already know which machine it is) — and then, by exploiting the machine you already control (and have already isolated), physically identify the machine you have exploited.
Can't Intel and AMD just change how long a core stays at a turbo frequency to mitigate this? I.e.: if it scales up by 1 Hz, it can't scale back down by that much until N cycles have passed.
Pardon my ignorance, but why wouldn't sufficient noise make timing attacks practically impossible? Very short secret material being processed multiple times?
If the frequency scale is known to user applications, I presume jittering response times in proportion to the scale factor just before write() would be effective.
I think the intention here would be to provoke random jitter. So rather than trying to fight it with constant-time algorithms that turn out not to be under certain conditions, we make all the timing unreliably measurable.
I think the terminology is little awkward. It's not algorithmic constant time, and it's not wall-clock constant time, but, I suppose, clock rate-relative input-independent time. So the options are 1) don't change the frequency, which has systemic negative effects, or 2) start with input-independent timing and purposefully skew it.
It wouldn't, not by itself. The attack would take more measurements to create a profile, however. Extending the time required to mount an attack is probably not sufficient to thwart an attack. It could be for some workloads, but not for all.
If you're already executing native code on the machine, you probably have the ability to read and write all the other memory of every other user mode process, so you don't need this to attack cryptographic keys stored there. This attack is more against secure enclaves.
We did exactly this in a recent paper we're presenting at ISCA next week (see https://jackcook.github.io/bigger-fish/) -- it's very possible for an attacker to do this. However, we didn't find that the signal the attacker found was due to frequency variations (and we did run an experiment to test this), but rather due to system interrupts.
The 'S' in 'RSA' is Adi Shamir, who has spent a lot of his career analyzing side-channel attacks. It is especially a problem with special-purpose cryptographic hardware because it tends to be within a small multiple of 'just enough' hardware to do the task. It's a lot easier to spot a 2% increase in processing time (or for that matter, current draw) when the hardware only runs one task, and the task is dominated by CPU time rather than other factors.
But analysis tools only get better over time, so the scenarios where they are useful multiply.
That is my reading of the text on the linked page.
>We have demonstrated how a clever attacker can use a novel chosen-ciphertext attack against SIKE to perform full key extraction via remote timing, despite SIKE being implemented as “constant time”.
If you created an algorithm that evaluated all possible 32 bit inputs in parallel and then picked the correct value at the end based on the input, you'd still have some funky corner case where the branch predictor in your x64 processor spilled the beans. Are we going to have to design our crypto algorithms entirely on SIMD instructions to combat this sort of thing?
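The branchless building block for that style of code is a masked select: compute both results, then combine them with a mask derived from the condition, so there is no data-dependent branch for the predictor to leak. A sketch in Python (real implementations do this in C or assembly on fixed-width words):

```python
def ct_select(cond_bit: int, a: int, b: int, width: int = 32) -> int:
    """Branchless select: returns a if cond_bit == 1 else b.
    cond_bit must be exactly 0 or 1."""
    full = (1 << width) - 1
    mask = (-cond_bit) & full        # all-ones if cond_bit == 1, else zero
    return (a & mask) | (b & ~mask & full)

assert ct_select(1, 0xDEAD, 0xBEEF) == 0xDEAD
assert ct_select(0, 0xDEAD, 0xBEEF) == 0xBEEF
```

Hertzbleed's contribution is precisely that even code built entirely from such branch-free, constant-cycle primitives can still leak, because cycle time itself becomes data-dependent.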
>Are we going to have to design our crypto algorithms entirely on SIMD instructions to combat this sort of thing?
There is likely still potential for side channel attacks. From a 'first principles' approach a computer is always going to leak information about it's current state (power, noise, emi, etc) and the methods / tools / techniques for analyzing that leaking information are only getting better.
The multi-tenant nature of modern infrastructure is the bigger issue in play here.
It sounds like there's some mitigations available for the crypto libraries, but perhaps defense-in-depth is going to require the libraries to do "junk work" to obfuscate what's happening against future attacks like this.
(I wonder if one is possible if the same key were to be used on different processors, if that would leak certain information, for example.)
Well, I haven’t finished reading the paper, but this is the kind of thing that basically gets us to: security iff sharing is disabled and performance is compromised. So, pick one.
I didn't care about Spectre, Meltdown, or any of the other obscure timing side-channels that came after them either, because they relied on so much detailed information about the environment being attacked that you'd almost certainly be able to get the information you wanted by some much easier way.
Attacking something that doesn't seem to be in much use either doesn't make me any more worried either. Go after e.g. TLS, SSH, AES, RSA, etc. if you want to get our attention, but I suspect that trying this in practice, you're going to be overwhelmed by all the other sources of noise --- especially over a network connection -- that you won't be very successful at all. They mention 36h and 89h to get the key (few dozen bytes), and I assume that was in a basically ideal environment with nothing else to measure.
Those of us familiar with hardware would know that things like this are pretty natural; but unlike these people, we don't go feeding the paranoia machine and driving us even more towards the growing dystopia.
I was rather disappointed as well. The original *bleed attack had private keys coming right out of the response stream, but this hasn't demonstrated anything close to that. Sure, the theory is sound but the practice seems to be more of an educational setup.
>we don't go feeding the paranoia machine and driving us even more towards the growing dystopia.
It's an academic research paper with a website and a logo, it's not like they're broadcasting on the 6pm news. Would you rather the research not be done at all? Or just not posted on the websites you visit?
It also requires having access to continuously "challenge", for many hours, a server which seemingly has no other processing to do but running this one crypto algorithm in an otherwise noise-free environment.
I think you overestimate what kind of information most employees -- even those who build the software that runs on their servers -- have about their execution environments.
Sure, but at one point in time I had the names, addresses, social security numbers, DoBs and in-patient statuses of around 20k people. I didn't want it or like that it was there, but it was due to carelessness.
Or the time I found a client database was actually a flat file with usernames, emails and passwords in plain text.
Hertzbleed is out of a sci-fi movie. The stuff a lot of developers come across is not exploited sheerly out of professionalism.
Same here. I just can't get worked up about these anymore. It was a while before Spectre and Meltdown were fully mitigated in most OSes, and I imagine there are a lot of appliance-like devices out there that aren't fixed and will never get fixed. And yet where's the news of all the active exploits floating around, being used to ruin people's day? Sure, no evidence doesn't mean evidence of nothing, but I think we have a lot more to worry about than stuff like this. Especially given that Hertzbleed's target for their research was SIKE, which... I'd barely heard of it until now.
It has a certain beauty to it! Confusingly enough, Herz and Hertz are homophones, but the former is the spelling used for heart. However, it's possible the name Hertz is derived from an archaic spelling of heart.
And yet folks will keep using cloud services and multi tenant offerings until we have regulations forbidding multi tenant computing for sensitive data.
It's so cool that x86 is completely fucked security-wise because of all the perf hacks that have been introduced - and yet, computers never seem to get any faster.
So this can be used on so called ‘airgapped’ devices, but what if you house the machine in a giant Faraday cage to prevent this? Maybe a little paranoid, but if your threat model requires it, then surely Faraday cages would make sense no?
Sure in very specific threat models you want to run in a Faraday cage. People already do so if they build for example alternative LTE network or they use device that leak in the RF. Also you need to isolate the power supply. But it has nothing to do with the article
Thanks for pointing that out. I was just thinking if you want to exfiltrate secrets then you need some sort of network to pass them on remotely. An air gap stops the secrets being leaked. Are you saying you can exfil by merely having access to the power?