I say this as a lover of FHE and the wonderful cryptography around it:
While it’s true that FHE schemes continue to get faster, they don’t really have hope of being comparable to plaintext speeds as long as they rely on bootstrapping. For deep, fundamental reasons, bootstrapping isn’t likely to ever be less than ~1000x overhead.
When folks realized they couldn’t speed up bootstrapping much more, they started talking about hardware acceleration, but it’s a tough sell at a time when every last drop of compute is going into LLMs. What $/token cost increase would folks pay for computation under FHE? Unless the answer is >1000x, it’s really pretty grim.
For anything like private LLM inference, confidential computing approaches are really the only feasible option. I don’t like trusting hardware, but it’s the best we’ve got!
There is an even more fundamental reason why FHE cannot realistically be used for arbitrary computation: some computations have much larger asymptotic complexity on encrypted data than on plaintext.
A critical example is database search: searching a database of n elements normally takes O(log n), but it becomes O(n) when the search key is encrypted. This means that fully homomorphic Google search is fundamentally impractical, although the same cannot be said of fully homomorphic DNN inference.
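To make that concrete, here is a rough Python sketch of why the asymptotics change: with a plaintext key the server can binary-search and skip most of the data, but with an encrypted key it cannot branch on comparisons, so it must touch every record. The eq_enc / select_enc / enc_zero helpers are hypothetical stand-ins for homomorphic operations, not a real FHE library API.

    from bisect import bisect_left

    def plaintext_search(sorted_keys, key):
        """Binary search: touches only O(log n) entries."""
        i = bisect_left(sorted_keys, key)
        return i if i < len(sorted_keys) and sorted_keys[i] == key else -1

    def encrypted_search(records, enc_key, eq_enc, select_enc, enc_zero):
        """With an encrypted key the server cannot branch or exit early,
        so it must homomorphically process every record: O(n) work."""
        result = enc_zero
        for k, v in records:
            match = eq_enc(enc_key, k)             # encrypted 0/1 flag
            result = select_enc(match, v, result)  # obliviously keep v on a match
        return result                              # still encrypted; only the client can decrypt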
There has been a theoretical breakthrough that actually makes search an O(log n) problem (https://eprint.iacr.org/2022/1703), but it is pretty impractical (and not getting much faster).
Good point. Note however that PIR is a rather restricted form of search (e.g., with no privacy for the server), but even so, DEPIR has polylog(n) queries (not log n), and requires superlinear preprocessing and a polynomial blowup in the size of the database. I think recent concrete estimates are around a petabyte of storage for a database of 2^20 words. So as you say, pretty impractical.
Even without bootstrapping, FHE will never be as fast as plaintext computation: the ciphertext is about three orders of magnitude larger than the plaintext data it encrypts, which means you need more memory bandwidth and more compute. You can’t bridge this gap.
Technically, there are rate-1 homomorphic encryption schemes, where ‘rate’ refers to the size ratio between the plaintext and the ciphertext. They’re not super practical, so your general point stands.
That actually sounds pretty reasonable and feels almost standard at this point?
To pick one out of a dozen possible examples: I regularly read 500-word news articles from 8MB web pages with autoplaying videos, analytics beacons, and JS sludge.
That’s about 3 orders of magnitude for data and 4-5 orders of magnitude for compute.
Sure, but downloading a lot of data is not the same as computing on that data. With the web you simply download the data and pass pointers to it around. With FHE, you have to compute on extremely large ciphertexts, using every byte of them. That’s roughly 1000x more data to process, and it takes about 1000x more time.
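To put rough numbers on that (illustrative figures, not measurements): at ~1000x ciphertext expansion, just streaming the data through memory already costs three orders of magnitude more, before any of the extra per-byte compute.

    plaintext_bytes  = 1 * 2**30       # 1 GiB of application data
    expansion        = 1000            # rough ciphertext/plaintext size ratio from this thread
    ciphertext_bytes = plaintext_bytes * expansion

    mem_bw = 100 * 2**30               # assume ~100 GiB/s of memory bandwidth
    print(plaintext_bytes / mem_bw)    # ~0.01 s just to stream the plaintext
    print(ciphertext_bytes / mem_bw)   # ~10 s just to stream the ciphertexts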
Don't you think there is a market for people who want services that have provable privacy even if it costs 1,000 times more? It's not as big a segment as Dropbox but I imagine it's there.
FHE solves privacy-from-compute-provider and doesn't affect any other privacy risks of the services. The trivial way to get privacy from the compute provider is to run that compute yourself - we delegate compute to cloud services for various reasonable efficiency and convenience reasons, but a 1000-fold less efficient cloud service usually isn't competitive with just getting a local device that can do that.
If Python devs/users had to actually use all pure Python libraries, no C bindings or Rust bindings, no RPC to binaries written in faster languages, it would get dropped for a ton of use cases, absolutely including its most prominent ones (machine learning, bioinformatics, numeric analysis, etc.).
Yeah, because people use Python when it doesn't matter and C++ when it does (including implicitly, by calling modules that are backed by C implementations).
That is not an option with FHE. You have to go all in.
Yes but with FHE it also depends on the use-case and how valuable the output is and who is processing it and decrypting the final output.
There are plenty of viable schemes like proxy re-encryption, where you operate on a symmetric key and not on a large blob of encrypted data.
Or financial applications where you are operating on a small set of integers, the speed is not an issue and the output is valuable enough to make it worth it.
It only becomes a problem when operating FHE on a large encrypted dataset to extract encrypted information.
The data extracted will need to offset the costs. As long as companies don't care about privacy, this use case is non-existent, so it's not a problem that it's slow.
For military operations, on the other hand, it might be worth the wait to run a long-running process.
You're not joking. If you're like most people and have only a few TiB of data in total, self hosting on a NAS or spare PC is very viable. There are even products for non-technical people to set this up (e.g. software bundled with a NAS). The main barrier is having an ISP with a sufficient level of service.
However if you actually follow the 3-2-1 rule with your backups, then you need to include a piece of real estate in your calculation as well, which ain’t cheap.
I have true 3-2-1 backups on a server running Proxmox with 32 cores, 96GB of RAM, and 5TB of SSDs (2TB usable for VMs). Cost me $1500 for the new server hardware 2 years ago. Runs in my basement and uses ~30W of power on average (roughly $2.50/mo). The only cloud part is the encrypted backups at Backblaze, which cost about $15/mo.
It's a huge saving over a cloud instance of comparable performance. The closest match on AWS is ~$1050/mo, and I'd still have to back it up.
The only outage in 2 years was last week when there was a hardware failure of the primary ssd. I was back up and running within a few hours and had to leverage the full 3-2-1 backup depth, so I am confident it works.
If I was really desperate I could have deployed on a cloud machine temporarily while I got the hardware back online.
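For what it's worth, the rough amortised arithmetic from the figures above (hardware spread over the two years it has run so far):

    months      = 24                 # hardware has been in service ~2 years
    server_hw   = 1500 / months      # $1500 up-front, amortised
    power       = 2.50               # $/month electricity
    backblaze   = 15.00              # $/month encrypted offsite backups
    self_hosted = server_hw + power + backblaze
    print(round(self_hosted, 2))     # ~$80/month vs ~$1050/month for the closest AWS match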
Some quick checking on Newegg and I came up with this. https://newegg.io/64113a4
About $1200.
I didn’t look into the power draw for this setup.
Added bonus there is space for a GPU if you want to do some AI stuff.
Each 2TB of SSD is like $85, double it if you want local redundancy in your software RAID.
The rest is basically a nice custom PC minus a cool case and a high end GPU. A 9950X is $500, a 2x48GB kit is maybe $200. A few hundo more for a mobo, PSU and basic case.
If you self-host your NAS, then your server has access to the data in clear to do fancy stuff, and you can make encrypted backups to any cloud you like, right?
Some people I know make a deal with a friend or relative to do cross backups to each others' homes. I use AWS Glacier as my archival backup, costs like 3 bucks a month for my data; you could make a copy onto two clouds if you like. There are tools to encrypt the backups transparently, like the rclone crypt backend.
It can be quite high, but it doesn't have to be. For instance, I have a 7TB storage server from Hosthatch that's $190 for 2 years. That's $7.92 per month, or £5.88 at today's exchange rates. That's under 20p per day.
Just on electricity costs alone, this is good value. My electricity costs are 22.86p/kWh which is pretty cheap for the UK. That means that if having that drive plugged in and available 24/7 uses more than 37W, it's more expensive to self host at home than rent the space via a server. Also, I've not needed to buy the drive or a NAS, nor do I have to worry about replacing hardware if it fails.
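The break-even wattage works out roughly like this (same figures as above):

    monthly_cost_gbp = 5.88            # £/month for the 7TB storage server
    price_per_kwh    = 0.2286          # £/kWh electricity
    hours_per_month  = 24 * 365 / 12   # ~730 h

    kwh_budget      = monthly_cost_gbp / price_per_kwh   # ~25.7 kWh/month
    breakeven_watts = kwh_budget / hours_per_month * 1000
    print(round(breakeven_watts))      # ~35 W, in the same ballpark as the ~37 W above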
They tend to do promotions a couple of times per year, typically only valid for 24h and only advertised on certain forums like LET - usually around their company anniversary date or Black Friday.
There are others too, e.g. Servarica who keep their Black Friday offers running all year round.
> There are others too, e.g. Servarica who keep their Black Friday offers running all year round.
I don’t understand the logic here so I’m going to assume I’m being obtuse. Doesn’t that just mean that’s their standard price? Why or how would you ever pay more?
Yeah, I kind of agree in the latter case. Black Friday deals often have lower priority support etc.
I guess with Servarica, they have their standard deals, and their Black Friday deals are generally thin-margin but still enough to cover costs. Typically every year they have special deals that are a bit different from their previous offerings. As a result, some people prefer the previous deals and some prefer the new ones, so they keep them all going. It's a bit unusual. They've also got a few interesting deals, like starting with N TB of storage that grows a bit every day. If you keep these more than about 3-4 years, they're probably better value for money, but I think you're paying too much in the first few years. It's interesting if your primary use case is incremental backups.
Hosthatch's deals are a bit different, as they're usually preorders sold almost at cost with basically minimal support, whereas they keep their normal stuff in stock and offer higher support levels.
I should also add that I've not personally used Servarica, even though they look interesting - just because they only have a Canadian datacenter. I have 4 Hosthatch servers spread all over the globe so that I have more redundancy in my backups. I only buy them when they have deals, assuming I don't miss them as they're only for 24h.
For very large amounts of data, the cloud provider can hit economies of scale using tape drives ($$$$ to buy a tape drive yourself) or enterprise-class hard drives (very loud + high price of entry if you want redundancy + higher failure rate than other storage). That's why storing data in the slower storage classes in S3 and other object stores is so cheap compared to buying and replacing drives.
The statements made in the linked description cannot be true, such as Google not being able to read what you sent them or what they responded with.
Having privacy is a reasonable goal, but VPNs and SSL/TLS provide enough for most, and at some point you're also just making yourself a target for someone with the power to undo your privacy and watch you more closely - why else would you go to the trouble unless you were hiding something? It's the same story with Tor, VPN services, etc. - those can be compromised at will. Not to say you shouldn't use them if you need some level of security functionally, but no one with adequate experience believes in absolute security.
> The statements made in the linked description of this cannot be true, such as Google not being able to read what you sent them and not being able to read what they responded with.
If Google’s services can respond to queries, they must be able to read them.
If A uses a cereal box cipher and B has the same cereal box cipher, B can make sense of the encoded messages A sends: A can ask about the weather, and B can reply with an encoded response that A can decode and read. B is able to read A's decoded query, and B knew what the weather was and responded to A with that information.
There is: it's called governments. However, this technology is so slow that using it in mission-critical systems (think communication / coordinates during warfare) is not feasible IMO.
The parent post is right: confidential compute is really what we've got.
Honestly, no? Unless you get everyone using said services, a market that is only viable for people trying to hide bad behavior becomes the place you look for people doing bad things.
This is a large part of why you have to convince people to hide things even if "they have nothing to hide."
I get that there is a big LLM hype, but is there really no other application for FHE? Like, for example, trading algorithms (not the high-speed ones) that you can host on random servers knowing your stuff will be safe, or something similar?
I speak as someone who used to build trading algorithms (not the high-speed ones) for a living for several years, so I know that world pretty well. I highly doubt anyone who does that would host their stuff on random servers even if you had something like FHE. Why? Because it's not just the code that is confidential.
1) If you are a registered broker dealer, you will just incur a massive amount of additional regulatory burden if you want to host this stuff on any sort of "random server".
2) Whoever you are, you need the pipe from your server to the exchange to be trustworthy, so no-one can MITM your connection and front-run your (client's) orders.
3) This is an industry where when people host servers in something like an exchange data center it's reasonably common to put them in a locked cage to ensure physical security. No-one is going to host on a server that could be physically compromised. Remember that big money is at stake and data center staff typically aren't well paid (compared to someone working for an IB or hedge fund), so social engineering would be very effective if someone wanted to compromise your servers.
4) Even if you are able to overcome #1 and are very confident about #2 and #3, even for slow market participants you need to have predictable latency in your execution or you will be eaten for breakfast by the fast players[1]. You won't want to be on a random server controlled by anyone else in case they suddenly do something that affects your latency.
[1] For example, we used to have quite slow execution ability compared with HFTs and people who were co-located at exchanges, so we used to introduce delays when we routed orders to multiple exchanges so the orders would arrive at their destinations at precisely the same time. Even though our execution latency was high, this meant no-one who was colocated at the exchange could see the order at one exchange and arb us at another exchange.
But shouldn't proper FHE address most of these concerns? I mean, most of those extra measures are exactly because if you can physically access the server, it's game over. With FHE, if the code is trusted, even tampering with the hardware should not compromise the software.
How does FHE help with someone executing a process on the server that affects the latency of your trading algo? eg by sucking up the CPU resources you need to do FHE.
How does FHE help with the fact that regulators generally want single-tenant shared-nothing for registered broker/dealers? Have you tried to explain a technical mitigation like FHE to a financial regulator? I have, there are 2 standard responses:
1) (in the US) "We strongly prefer single-tenant shared nothing. I won't officially say whether or not we deem your technical mitigation of using FHE to be sufficient. If we think it's insufficient we may take regulatory action against you in the future. Us not taking action doesn't mean we think it's sufficient."
2) (in places like Switzerland) "We strongly prefer single-tenant shared nothing. I'm not sure I fully understand the technical mitigation of FHE you are putting in place, but I'm going to increase your regulatory capital reserves. Send us some more white papers describing the solution and we may not increase your capital reserves further".
Singapore is the only exception, where you have a regulator who is tech-savvy and will give you a clear answer as to whether or not something is OK.
I give a concrete example in the GP post but the reason is that the high-speed people can take advantage of you in certain circumstances if you don’t have extremely accurate timing of things like order placement.
As another example, imagine you are placing an options order on one exchange and a cash hedge on another exchange (eg for a delta hedge). If someone sees one half of your order and has faster execution than you, they can trade ahead of you on the other leg of your trade, which increases your execution cost. This is even more important if you’re doing something like an index options trade on one side and the cash basket (all the stocks in the index) on the hedge side.
The fix for this is to use hi-res exchange timestamps (which the exchange gives you on executed trades) to tune a delay on one leg of your execution so both halves hit at precisely the same time. This ensures that HFTs can’t derive an information advantage from seeing one half of your trade before you place the other half of the order.
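A minimal sketch of that delay-equalisation idea in Python (the latency numbers are made up; in practice you'd tune them from the exchange's hi-res execution timestamps):

    one_way_latency_us = {        # measured send->exchange latency per venue (hypothetical figures)
        "options_exchange": 850,
        "cash_exchange":    220,
    }

    def send_delays(latencies_us):
        """Hold back the fast legs so every leg lands at the same instant."""
        slowest = max(latencies_us.values())
        return {venue: slowest - lat for venue, lat in latencies_us.items()}

    print(send_delays(one_way_latency_us))
    # {'options_exchange': 0, 'cash_exchange': 630} -> delay the cash leg by 630 microseconds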
I encountered a situation where one company had the data, considered it really valuable, and did not want to show or share it.
Another company had a model, which was considered very valuable, and did not want to show it. So they were stuck in a catch-22.
Eventually they solved the perceived risk via contracts, but it could have been solved technically if FHE were viable.
I'd also like to comment on how everything used to be a PCIE expansion card.
Your GPU was, and we also used to have dedicated math coprocessor accelerators. Now most of the expansion card tech is all done by general purpose hardware, which while cheaper will never be as good as a custom dedicated silicon chip that's only focused on 1 task.
It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are founded on GPU hardware.
I'd argue the only real AI accelerators are something like what goes into modern SXM sockets. This ditches the power issues and opens up more bandwidth. However, only servers have SXM sockets... and those are not cheap.
> most of the expansion card tech is all done by general purpose hardware, which while cheaper will never be as good as a custom dedicated silicon chip that's only focused on 1 task
I think one reason they can be as good as or better than dedicated silicon is that they can be adjusted on the fly. If a hardware bug is found in your network chip, too bad. If one is found in your software emulation of a network chip, you can update it easily. What if a new network protocol comes along?
Don't forget the design, verification, mask production, and other one-time costs of making a new type of chip are immense ($millions at least).
> It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are founded on GPU hardware.
I think you may have the wrong impression of what modern GPUs are like. They may be descended from graphics cards, but today they are designed fully with the AI market in mind. And they are designed to strike an optimal balance between fixed functionality for super-efficient calculations that we believe AI will always need, and programmability to allow innovation in algorithms. Anything more fixed would be unviable immediately because AI would have moved on by the time it could hit the market (and anything less fixed would be too slow).
I think the only thing that could make FHE truly world-changing is if someone figures out how to implement something like multi-party garbled circuits under FHE, where anyone can verify the output of functions over many hidden inputs, since that opens up a realm of provably secure HSMs, voting schemes, etc.