Uber Engineering Tricks of the Trade: Tuning JVM Memory for Large-Scale Services

asguy · on May 18, 2021

“Our Apache Hadoop-based data platform ingests hundreds of petabytes of analytical data with minimum latency and stores it in a data lake built on top of the Hadoop Distributed File System (HDFS).” “Did you just tell me to go fuck myself, Bob?” “I believe I did.”

pc86 · on May 18, 2021

I knew I recognized this from somewhere.

http://howfuckedismydatabase.com/nosql/

https://news.ycombinator.com/item?id=1636113

bpodgursky · on May 18, 2021

This isn't especially strange or complicated. I guess you're just trying to be witty? But it comes across as pretty ignorant.

Yeah if Uber was born today and didn't want to use any on-prem resources they'd use S3, but HDFS is the best alternative to object storage out there, and there's a huge ecosystem of tools around it. If you're running your own datacenter, there's not any serious alternative.

_old_dude_ · on May 18, 2021

Tuning GCs like it was 2014 again.

It's a little bit sad that they did not hire an expert.

vajrabum · on May 18, 2021

I don't understand what you're trying to say here. Is the methodology they're using outdated? Or perhaps their approach seems naive? Or are you saying something else?

_old_dude_ · on May 18, 2021

The whole section on how GCs work is far from accurate. It only describes how one GC, CMS, works or worked because it had been removed. The permanent generation does not exist anymore since Java 8, etc

And i'm far from being an expert in GCs, but the basics are just wrong.

viscanti · on May 18, 2021

Maybe they have to use some kind of legacy JVM library that doesn't work on Java 8? It looks like they're a GoLang shop, so I'm not sure why we'd assume their Java stuff is representative of anything other than what they have to do because some useful JVM library for mapping or something needs to be supported.

_old_dude_ · on May 18, 2021

The problem is that Java 7 is 10 years old, it means that you are missing 10 years of optimization. For GCs that's a lot.

GCs before ~2010 have been optimized for throughput more than for latency. Since then, you can choose.

Since 10 years, G1 let you set your maximum pause time and you get a huge warning if it miss that target, ZGC or Shenandoah are from the beginning latency first, throughput second.

lamontcg · on May 19, 2021

I last tuned Java 6 in about ~2009 and I was a bit shocked at how familiar most of this article sounded...

Like your handle, too.

acamillo · on May 19, 2021

Any advise on good / recent sources of information about the various GC algorithm in the recent JVM ?

vsto · on May 18, 2021

It is Hadoop that requires (mostly) Java 8 and Java 11 (runtime only). https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Ja...

tomnipotent · on May 19, 2021

That's for Hadoop 3 - Uber is on CDH 5 last I read, which has min support for Java 7 and it sounds like that's what they're using if the CMS GC was being used (unless it just happened to work better for them in Java 8+).

erik_seaberg · on May 18, 2021

Yikes. Fortunately Java is a supported language at Uber, though some teams are more serious about it than others. Scala is also accepted because of Spark.

viscanti · on May 18, 2021

I guess I misunderstood things then. It's difficult to reconcile why they'd be using a legacy version of Java that required these specific GC tunings if they weren't forced to use it because of libraries that couldn't be replicated or replaced in GoLang. Sounds like it's a much bigger mess there than what I had assumed (I guess I was giving them too much of the benefit of the doubt).

throwdbaaway · on May 19, 2021

Well, at least they take the cake when it comes to running CMS with the largest heap size?

It is also quite ironic that they paid for Zing C4, and then came up with "P90 average RPC queue time" as the metric, ignoring Gil Tene's wisdom that you should never average percentiles.

Another problematic statement is this:

> which in turn decreases request latency and increases throughput

Gil Tene wants you to measure p99.99 or pmax, so that you will pay for Zing C4 to get lower tail latency, while sacrificing some throughput. If you end up getting both lower tail latency and more throughput, then something is already very wrong with the existing setup (i.e. the full GC).

StreamBright · on May 18, 2021

Using Hadoop like it was 2014 again too.

I am surprised how much legacy inefficient crap is lingering around in companies like Uber.

0xbadcafebee · on May 18, 2021

Large companies operate almost entirely on legacy inefficient crap. It's often the case that the thing that makes them the most money is running on giant old barely-supported piles of shit.

People constantly underestimate the fact that >80% of the cost of software is maintenance. Upgrades and patches and redesigns/refactors and following modern conventions costs 4x more than the cost it took to build it. But nobody puts that in their budget. And then later people find this huge pile of legacy shit, and think, wow, this company really screwed up. But in fact this is totally normal.

By the way, this is the case because of how we develop software and hardware, not because they are some inscrutable element that can't possibly be built once and maintained for a lifetime (like, say, a building). We just don't build it to last.

pavitheran · on May 18, 2021

What should be used instead of Hadoop DFS in 2021?

throwaway7783 · on May 18, 2021

Oh, these cloud kids...

Edit: instant downvotes. Okay, S3/GCS/Azure is the typical answer (egress costs be damned)

StreamBright · on May 18, 2021

I guess the same was said about those non-mainframe kids 30 years ago. Tech gets cheaper and better. You can rent CPU time instead of buying a server. It is just that simple.

notyourday · on May 19, 2021

And overpay in orders of magnitude on a 5 year depreciation schedule. And the amount of fun you are going to have getting charged for the S3 API calls and pulling that data out of S3 for processing would make anyone reasonable CFO head spin 360 degrees every minute.

ppf · on May 18, 2021

You mean, like a mainframe? Tech goes round in circles, more like.

StreamBright · on May 18, 2021

Depends on multiple factors: - S3 or compatible would be trivial choice for storage

- if on-prem is a must there are multiple options, generally something with erasure codes (it is a game changer for storage)

So far I have been using enterprise storage (that has some potential problems when mounted as nfs volumes), works for petabytes, already decouples storage from compute.

More recently I was experimenting with MinIO. No conclusion so far.

The problems are with Hadoop:

- unfortunate design choices (namenode??)

- extremely unfortunate implementation (I probably spent more time in the Hadoop codebase than any other, found many bugs, some I could fix, most I couldn’t)

I think I have migrated away from Hadoop 10 PB worth of data infra in the last 5 years, mostly to AWS, some to Azure. Average cost saving is between 10-30% yoy.

Some comments point out the network cost. The reality is most companies collect a giant amount of data (ingress) and publish dashboards (egress). It makes cloud pretty viable.

S3 is beating the shit out of HDFS in reliability and cost, even though most Hadoop shops spread the fud that it is slow. Same way these companies used to spread the fud that snappy is best for data compression.

As of 2021 even the latest adopters (banks and insurance companies) use cloud. Maybe extremely few dogmatic companies remain in the onprem crowd. Even those will eventually give up.

bogomipz · on May 18, 2021

It's quite odd that your calling out Uber for using HDFS because it's so "2014", legacy and inefficient and yet your solution involves NFS and an enterprise storage vendor? Do you not see any irony in that? I think that many would argue your solution is far more legacy. It's also far more inefficient from a cost perspective. I have yet to see a storage vendor who could match the price of commodity hardware.

StreamBright · on May 19, 2021

I agree, it is quite ironic that I can beat HDFS with a SAN + NFS mounts.

sershe · on May 19, 2021

Did S3 really become that much faster over the years? I was optimizing a cache that worked on top of HDFS and cloud FSes at some point few years ago, and I remember making slides to present the improvements. For HDFS I had this complex slide that tried to show cache is actually achieving some fractional improvement at some point. For the unnamed cloud FS, I just had to make a 2-bar chart that looked like the cloud FS is giving you a middle finger sideways... A really tall bar for the time it took to read data off the cloud FS. A really small bar with cache.

junon · on May 19, 2021

Uber has its own datacenters. They've self-hosted for quite a while now, with few exceptions.

AWS is incredibly expensive in comparison. Uber is /not/ a small company technology-wise, for better or for worse.

commandlinefan · on May 19, 2021

> extremely unfortunate implementation (I probably spent more time in the Hadoop codebase

Well in fairness, have you ever seen the S3 codebase? I mean honestly it could be a fork of HDFS for all we know.

StreamBright · on May 19, 2021

I used to work for Amazon. The code quality at places like Google and Amazon tend to be good.

S3 has a really good architecture and a great implementation.

HDFS has a meh architecture with a bad implementation.

There were obvious signs. I remember when Twitter decided to investigate why HDFS was slow and they figured out some details about how Hadoop guys decided to implement their own dictionary for configuration that had a much worse time complexity than the default dictionary in Java. There might be a video about this somewhere.

And there are more things like that. I used to have 5-10 years old HDFS Jira tickets open. I just gave up.

Here is a video:

https://www.youtube.com/watch?v=jupArYWxoq0

Hadoop is full of these things.

One more thing:

https://lamport.azurewebsites.net/tla/formal-methods-amazon....

I would love to see similar approach to Hadoop.

geoduck14 · on May 19, 2021

You mention that you have migrated to both AWS and Azure.

I've seen distributed file systems on S3 - can it also be done with Azure Blob storage?

StreamBright · on May 19, 2021

Hadoop natively supports S. Azure Blob storage is compatible with S3.

https://doc.dataiku.com/dss/latest/hadoop/hadoop-fs-connecti...

https://devblogs.microsoft.com/cse/2016/05/22/access-azure-b...

elmalto · on May 18, 2021

Did you move the compute layer to AWS as well? Did you see similar savings there as well for non-burst payloads?

marcinzm · on May 19, 2021

>works for petabytes

Per the article Uber has hundreds of petabytes of data.

menzoic · on May 18, 2021

"legacy inefficient crap"

Do you consider monetary cost as part of efficiency? The scale that these companies operate at make it worthwhile.

viscanti · on May 18, 2021

> I am surprised how much legacy inefficient crap is lingering around in companies like Uber.

Maybe there are some JVM specific libraries they need for things like mapping that don't exist on GoLang? From reading their tech blog, it sounds like they're mostly a GoLang shop and they were apparently Python before that. So it seems like they're probably forced into using Java because some of the libraries they use aren't worth rewriting in-house to avoid having to use the JVM.

StreamBright · on May 18, 2021

Don’t get me wrong I have nothing against Java/JVM. I pretty much appreciate that tech. The criticism was specifically against Hadoop. I spent nearly 10 years on it. Luckily it is stack that most companies migrating away from.

gher-shyu3i · on May 18, 2021

What do they migrate to if they need on-prem solutions?

closeparen · on May 18, 2021

Well, throwing away and recreating all the data pipelines that have ben written since 2014 would be pretty inefficient.

SilurianWenlock · on May 18, 2021

I thought the concurrent GC algorithms (zing/azul, g1?) avoided these stop the world events by using concurrency?

rchiang · on May 19, 2021

This presentation from 2019 gives an idea about some of the GC avg/max runtimes using the NameNode: https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dy...

dharmab · on May 18, 2021

As discussed in the link, these algorithms reduce but do not entirely eliminate STW.

_old_dude_ · on May 18, 2021

Recent GC algorithms slow down the application, either by not scheduling coroutines that allocate too much memory or by forcing them to do the marking or the evacuation (helping the GC) when running (or both).

With that, STW pause time does not depend on the heap size or on the root set size (the stack size). In practice, it means pause < 1ms, at that point the OS becomes the bottleneck, not the GC.

So latency is good but throughput can be reduced by 30%.

suyash · on May 18, 2021

@Uber - have you considered GraalVM ?

notdang · on May 18, 2021

Will it solve the problem? As I understand the same GC tuning will need to take place.

chrisseaton · on May 18, 2021

Graal has better escape analysis and so produces less garbage, which is the very best kind of garbage collection!

That might be what they meant.

throwaway7783 · on May 18, 2021

GP might've meant AOT

cellularmitosis · on May 18, 2021

grall removes the need for JIT but does it also remove the need for GC?

suyash · on May 19, 2021

almost, it has something called native images

crummybowley · on May 18, 2021

Just use go.

domano · on May 19, 2021

I love go and it is my main language, but Uber uses go and clearly has some need for Hadoop.

Also the go GC is behind the JVM GCs if you want to really tune it IMHO