Scaling Erlang Developer Experience at WhatsApp [pdf]

amgreg · on Sept 11, 2020

What I like most about some experienced engineers who have been at companies from the beginning is that they realize the needs of a small-team, cash-constrained, rapidly growing company are different from that of a dozens-of-teams, complex, but also rapidly scaling enterprise. The devil is in understanding when and how to transition your tooling, architecture, and — dare I say it — attitudes, from the one to the other.

I am curious to hear more from these engineers; how decisions were made about transitioning/scaling; and what decisions were made.

thdrdt · on Sept 11, 2020

There are things that could (should?) be done from the start. Most has to do with automation. For example automated build tools. Maybe you don't want automated unit tests in the beginning but those parts can be added later without changing the company's culture and stucture too much.

Or automated invoices. It is much easier to change this than changing a co-worker who manually created all the invoices.

outworlder · on Sept 11, 2020

> Most has to do with automation.

This.

Don't ever put anything into production that's not automated. Ever. "Oh it's a simple HTTP server that's going to be used by 5 people". Cool. Automate it. "Oh but it's going to take time!". I thought you said it was simple...

When things are not automated, we treat them differently. If a server is not automated, now it becomes time-consuming to setup another, which means it's now a pet, not cattle. But if it is automated and is misbehaving, we can easily replace it (and save the current for troubleshooting later if necessary). Or we can spin up 100 of them.

Same goes for development. If builds are not automated, they become more difficult to do, more error prone, which means people will avoid making contributions unless they are forced to do so (by some ticket). When contributing to a code base that has no guardrails becomes a lose/neutral proposition, people won't want to.

And now it's been 5 years and it's difficult to automate things because there's a 100 different workflows that people use and you are going to break them.

jagtesh · on Sept 12, 2020

Yes, yes, yes. Just CI/CD alone has been a huge productivity boost for one of the teams I lead. The thing is, as you scale (50 people to 200+), people adapt to depend on processes rather than individuals.

If the process is automated and well documented, you'll be have even more power to tweak the inner workings without retraining anyone.

Like someone else said, even if it doesn't save you time, adding automation simplifies your processes and creates a powerful abstraction, which will come in handy when you do start hitting scaling challenges.

AlchemistCamp · on Sept 12, 2020

WhatsApp never scaled to even 50 people. Facebook acquired them before that point.

tangjurine · on Sept 11, 2020

What exactly do you mean by automate? Like the http server for example?

jagtesh · on Sept 12, 2020

Other examples: linting code, configuration, documentation, etc.

Or updating a registry, release version somewhere, even sending an email/message to someone when something happens (a new client signs up, emails with a certain keywords).

So many processes beyond development can be automated. SalesOps is very simply doing that for Sales teams (managing sales pipelines, distributing leads, generating reports). Marketing Automation has become its own field (targetted drip campaigns, now add another layer: account based targeting based on rules).

rudasn · on Sept 11, 2020

Not OP, but I think s/he means automating the processes of provisioning servers (settings, common software), and deploying changes to your infra/code/apps.

chromatin · on Sept 11, 2020

Image building (eg packer, docker) and/or deployment (eg, terraform, ansible)

spinningslate · on Sept 11, 2020

> Maybe you don't want automated unit tests

Interested why you'd say this, can you elaborate? Do you mean tests in general or unit tests specifically?

I can't imagine automating a CD pipeline without having some automated testing as an integral step. If you don't have automated tests, how do you remain confident that you haven't introduced bugs/regressions?

Even in a small codebase that's easy to do - especially when you're evolving quickly.

Like I say: not tacit criticism, genuinely interested.

ra0x3 · on Sept 12, 2020

I think the key point in the OP’s statement is “early on”

I think unit tests (or just more generally a test suite) is invaluable. But when you’re starting out small (1-3 person dev team) and the API is changing so rapidly (we learned something this week we didn’t know last week so that new service now only does 1 thing instead of 5 things), tests can _really_ slow down development. And as mentioned by the OP as well, adding unit tests (or a test suite in general) shouldn’t be too hard once you get a stable API

I might be preaching to the choir so to speak, but that’s just my $0.02

deepGem · on Sept 12, 2020

As someone who has mostly written early stage software with 1 to 3 developers, the single most reason to skip tests is the rapidly changing APIs. Also it is ok to ship somewhat buggy code. Use case validation and speed trumps accuracy.

Also invariably a lot of early stage code is in the front end and FE testing is very time consuming.

What has been immensely helpful is an automated build and deploy system, which quite honestly is very expensive to build in the iOS world. I just want a one click deploy to a clean machine every time I decide to deploy. This is kinda trivial in the backend world using containers but not so in the iOS world.

jayd16 · on Sept 11, 2020

You can have automated builds and manual testing and deploy. It goes a long long way, especially in a small team that's still prototyping rapidly.

spinningslate · on Sept 12, 2020

Appreciate the responses. Most of the things I work on involve financial modelling, so tests are as much part of specification as they are validation. I can see them causing some drag for evolving less algorithmic code, particularly UI. Always interesting to get different perspectives - thanks.

jacquesm · on Sept 11, 2020

This comment is gold. Yes, that's exactly it. Trying to use the tooling and processes that are a bad match for your situation is the way to waste a fortune or to end up being overtaken on the right by a competitor that made smarter choices.

ConradKilroy · on Sept 11, 2020

@amgreg, fantastic observation! "scalable attitudes"

kornish · on Sept 11, 2020

One of my favorite talks is by Rick Reed about scaling Erlang at WhatsApp. What an absolute savage. He flies through an articulate and in-depth curriculum on system performance and bottleneck mitigation.

The talk is called "That's 'Billion' with a 'B'" and makes for a great lunchtime watch: https://www.youtube.com/watch?v=c12cYAUTXXs

strmpnk · on Sept 11, 2020

I remember that year. I was giving a talk at EF during the same time slot but the schedule originally had me in the large room and they had a much smaller one.

When the news of the acquisition hit, everyone wanted to see the WhatsApp talk. The organizers knew this so we swapped rooms. So, I started my talk by asking if anyone in the room was here for the WhatsApp talk and told them they could quietly leave and I wouldn't mind and a bunch of people got up.

Heheh. I don't blame them. I didn't really like my talk and Rick Reed is very good at what he does and the talk is no exception.

staticassertion · on Sept 11, 2020

Yeah, not quite related to the linked slides, which are about scaling in terms of productivity, but still a great talk.

I really found that "isolation" is just a key optimization and one of the most useful properties of a system, which they call out in that talk. I wrote about it a bit more in depth here: https://www.graplsecurity.com/post/architecting-for-performa...

I've written our data processing layer and our event orchestration layer as basically an actor oriented system, with push/pull systems in a couple of key places where it makes sense. It's incredible what you can do with strong isolated infrastructure in terms of performance, security, reliability, and quality.

secondcoming · on Sept 11, 2020

I know nothing about Erlang, but I do know high-volume systems. Needing 550 beefy servers to handle all that load is - in my mind - not that impressive.

an_opabinia · on Sept 11, 2020

Erlang is the Actors-model version of Greenspun's tenth rule of programming:

"Any sufficiently complicated server backend contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang."

Is it really the end-all-be-all model of server architecture? What are microservices, if not Actors Clunkified? "Serverless" functions, if not the simplest actors in disguise? Docker, actors except over IP addresses and clunky APIs?

The real takeaway: Valuable stuff is stateful. You know you're doing valuable work for a real human end-user if you are dealing with tricky state problems. Extracting money out of that is a bunch of developer-productivity-limited business concerns. It seems actors are a good model to solve that.

davidw · on Sept 11, 2020

That quote regarding Erlang is attributed to Robert Virding.

acjohnson55 · on Sept 12, 2020

I agree that serverless / cloud native feels a lot like what you would build in a distributed monolithic codebase using Erlang or Akka. I guess you get something more polyglot and modular for your trouble, but I think a lot of companies have skipped right over much less complex setups for building resilient, scalable distributed systems.

stevencorona · on Sept 11, 2020

I absolutely love the BEAM and Elixir, but one of my common complaints and source of errors is the lack of compile-time type sanity checking, so I'm excited to see movement in this direction. Dialyzer is sorta okay, but the speed, cryptic errors, and syntax have kept me from fully embracing it.

findjashua · on Sept 11, 2020

Gleam may be what you're looking for:

https://gleam.run/

faitswulff · on Sept 11, 2020

I'd be curious how WhatsApp's internal statically typed Erlang solution looks compared to Gleam. If I had to guess based on current trends, I'd guess that it's a gradually typed, opt-in toolchain a la Sorbet or Typescript, in order to maintain backwards compatibility and gradually update their legacy codebase as opposed to a whole different language.

callamdelaney · on Sept 11, 2020

This isn't ready for prime time yet iirc.

sodapopcan · on Sept 11, 2020

It's not but is in constant development. The author is very active on elixirforum.com

AshamedCaptain · on Sept 11, 2020

I find it funny they say Whatsapp "chose" Erlang. They "chose" Erlang because ejabberd was written on it, and they chose ejabberd because of a Jabber mailing list suggestion. That's about it...

chii · on Sept 12, 2020

are you saying that the choice was purely due to luck? Would whatsapp have been as popular and successful had it not been in erlang but some other language/stack?

krn · on Sept 12, 2020

> are you saying that the choice was purely due to luck?

What he is saying is, that they chose the best open source tool available for their use case at that time, and not a programming language.

> Would whatsapp have been as popular and successful had it not been in erlang but some other language/stack?

I would argue, that they might not have been as efficient in terms of computational resources and the number of technical employees needed, but they would still have succeeded even if the back-end was implemented in Java.

Because what all WhatsApp users actually interacted with were the clients – and they were not written in Erlang.

tiffanyh · on Sept 11, 2020

> "We are working on a prototype, open-sourcing in November"

I'm curious to see how Elixir integration with such static type system would work.

seanclayton · on Sept 11, 2020

I absolutely love Elixir and use it all the time, I really just wish it had a type system (that's not the slow behemoth Dialyzer). I understand the difficulty of typing processes and I don't envy the task, but as a user of the language it would make it that much better to work with! Just imagine working with a language like Elixir that has the type system of F#!

acjohnson55 · on Sept 12, 2020

Sounds like Scala / Akka to me

rch · on Sept 11, 2020

I think it's more natural to gradually implement components as Erlang NIFs in Rust via https://github.com/rusterlium/rustler.

derefr · on Sept 11, 2020

Last I heard (years ago), I though that WhatsApp's backend was being rewritten in some other language, because Facebook didn't think maintaining an Erlang codebase was tenable. Did that change?

ramchip · on Sept 11, 2020

Perhaps you're thinking of Facebook chat?

https://www.quora.com/Why-was-Erlang-chosen-for-use-in-Faceb...

https://www.quora.com/When-did-Facebook-switch-away-from-usi...

gautamcgoel · on Sept 11, 2020

Can someone explain what Erlang offers that Go does not? Both have great support for concurrency, but Go has type checking and a more familiar syntax.

masklinn · on Sept 11, 2020

* Erlang builds reliability into the language and runtime through concepts of liking, supervision, …

* Erlang is a much simpler language

* Despite lacking a static type system, Erlang's functional bend, immutability, SSA and pattern matching makes for a rather safe language

* Erlang has always been fully preempted, it didn't have to wait until 2020 for that to happen

* Erlang also builds distribution in, though it's "trusted distribution" so probably of fairly limited use at large

* Erlang uses shared-nothing concurrency, greatly limiting side-effects across processes, which again helps with reliability: there is no action at a distance in sequential erlang, no third-party can come in and mess with your state

* when dealing with binary data, binary patterns are absolutely amazing (and rather unique)

bigbizisverywyz · on Sept 14, 2020

* Hot code reloading - you can modify an existing running system with a new version of code with no downtime.

I don't use Erlang, but that runtime feature always impressed me. The only other are system I know can do that is updating a stored proc on a running DB, and that can be pretty useful.

coder543 · on Sept 11, 2020

> * Erlang is a much simpler language

Go is widely considered to be a very simple language, often too simple, so I find this claim interesting.

> * Despite lacking a static type system, Erlang's functional bend, immutability, SSA and pattern matching makes for a rather safe language

This is fine stuff, but I would rather have a type system, so this is just making up for a deficiency with some other stuff, so it's really just a sidegrade, not so much an upgrade or a downgrade.

> * Erlang has always been fully preempted, it didn't have to wait until 2020 for that to happen

With extremely rare exceptions, Go has always been fully preempted from a developer experience point of view. The fact that the underlying implementation was using function calls as yield points simply didn't matter. The fact that it is fully preempted now means it's kind of weird to still hold it against Go that they are doing the same thing as BEAM now.

It is done, this is no longer a bullet point.

> * Erlang also builds distribution in, though it's "trusted distribution" so probably of fairly limited use at large

Go has always "built distribution in" by offering to build single static binaries. If anything, BEAM languages are harder to deploy.

> * Erlang uses shared-nothing concurrency, greatly limiting side-effects across processes, which again helps with reliability: there is no action at a distance in sequential erlang, no third-party can come in and mess with your state

I agree that immutability is an advantage. In general, third parties can't come in and unexpectedly mess with your state in Go either, as long as you're either maintaining your state local to a function or keeping your global variables private to each package.

Keep in mind that Go's package imports must form a tree with no loops. Third party packages literally cannot see your code or any of the exported global variables, since you are importing them, and not the other way around.

Go does not offer any way for packages to reflect over the whole program's state in memory, so there's no way for third party packages to go hunting around in memory and playing with your values, which would be stupid, unless they use "unsafe" to start poking at random memory addresses, and I'm certain you can do something equally dumb in Erlang with a sufficient amount of effort. It's just as crazy to think of doing something useful that way in Go as it is in Erlang.

> * Erlang builds reliability into the language and runtime through concepts of liking, supervision, …

This is literally the only compelling talking point I've seen for Erlang/BEAM vs Go.

However, I've written large Go applications with distinct services running within the same binary, and it is possible to make extremely reliable systems. Go's context package allows you to build trees of processes that you want to be able to cancel, including cancelling specific sub-trees without affecting adjacent trees or parent trees.

The single biggest footgun with building reliable systems in Go is that it prefers to crash early. So, every Goroutine that you launch, you must make sure that the first thing it does is catch any panics to avoid taking down the entire program, including unrelated Goroutines. When correctly written, a human will be immediately alerted to the bug that caused this panic (which shouldn't ever happen anyways), and the program will continue operating normally in the mean time until a human gets a chance to push out an update to fix that bug.

In practice, enforcing just one rule (Goroutines must catch panics) in code reviews is not that hard, and it isn't common in my experience for most code to arbitrarily launch Goroutines -- Goroutines should be managed.

In return, Go offers much better performance than BEAM, as well as most of the practical benefits of BEAM that I've seen discussed online.

I think Rust is a better language than either Go or any BEAM language, but the absence of a preemptive concurrency model like Go and Erlang offer just makes it hard for me to enjoy the prospect of using Rust for general purpose network services.

Once Go adds generics in the next year-ish, that should solve my biggest real annoyances with the language. I really look forward to being able to package up a number of the concurrent patterns I write into reusable packages, which will largely eliminate once and for all the need for me manually spin up a new Goroutine in the code that I'm writing and potentially forget to catch any panics.

Jtsummers · on Sept 11, 2020

> > * Erlang also builds distribution in, though it's "trusted distribution" so probably of fairly limited use at large

> Go has always "built distribution in" by offering to build single static binaries. If anything, BEAM languages are harder to deploy.

I suspect GP meant distribution in the distributed computing sense, not the deployment sense. What, baked into Go not via a library, handles multiple Go programs communicating within the same physical node or across multiple physical nodes? Erlang gives you the same communication method whether distributed or within a single node, Go does not (that is, Go channels are internal to a specific program, they cannot be referenced from different programs or computers).

If they did mean it in the deployment sense, I'd still argue it's a point in favor of Erlang. Two nodes communicating can execute code on the other.

coder543 · on Sept 11, 2020

Yes, I definitely misunderstood the point being made there.

> What, baked into Go not via a library, handles multiple Go programs communicating within the same physical node or across multiple physical nodes?

Go does have an RPC package built into the standard library, for what it's worth, but I know it's not the same thing as the runtime transparently stitching distributed machines together under the hood.

masklinn · on Sept 11, 2020

> This is fine stuff, but I would rather have a type system, so this is just making up for a deficiency with some other stuff, so it's really just a sidegrade, not so much an upgrade or a downgrade.

Sure, I'm just saying the lack of a type system in erlang is not quite as bad as, say, in javascript.

> Go has always "built distribution in" by offering to build single static binaries. If anything, BEAM languages are harder to deploy.

Distribution in the sense of distributed computing, not in the sense of giving somebody else binaries to run. That is not something erlang is good at. But you can rather easily join nodes together and not just have them communicate but have them leverage one another e.g. spawn processes on an other node in the cluster.

> I agree that immutability is an advantage.

Immutability and isolation are different things, though Erlang has both.

> In general, third parties can't come in and unexpectedly mess with your state in Go either, as long as you're either maintaining your state local to a function or keeping your global variables private to each package.

Go is a shared-memory concurrency language, with no restrictions on the sharing.

coder543 · on Sept 11, 2020

> Distribution in the sense of distributed computing, not in the sense of giving somebody else binaries to run. That is not something erlang is good at. But you can rather easily join nodes together and not just have them communicate but have them leverage one another e.g. spawn processes on an other node in the cluster.

Ah, I misunderstood, sorry. Personally, I see limited benefits to making things fully distributed at the runtime level. I'm sure it's nice sometimes.

> Go is a shared-memory concurrency language, with no restrictions on the sharing.

Sharing is the critical word there. You don't have to give access to that shared memory to anything, but the point I was making is that nothing can take shared access to your stuff without you handing it to them. It's not the wild west of interpreted languages where any piece of code can access all of your code and variables and edit them just for fun without you realizing.

Go also supports channels. How you design your system is up to you, and doesn't BEAM also support shared memory concurrency? A little thing called ETS? I would guess that there are other escape hatches as well, but I'm not familiar enough with Erlang or the BEAM to say for sure.

If you want to make a poorly designed system, I'm sure that Erlang won't stop you, but it will obviously try to guide you into certain accepted patterns by making other things harder.

toast0 · on Sept 11, 2020

> Personally, I see limited benefits to making things fully distributed at the runtime level. I'm sure it's nice sometimes.

I think it's nice because when you write code as processes that receive requests and send responses, they can handle requests from the same node or different nodes with no change in code, you only change where you start that process (and make sure you register the process somewhere either as a named registered process or pg2/pg or whatever). Certainly there can be some scaling difficulties (see Rick Reed's talk linked elsewhere), but it generally works. And, you can move the processes while the system is running to address changing needs. Building on top of asynchronous messaging makes it easy to do things like normally wait for a reply after sending for a synchronous feel, but occassionally doing multiple sends and then waiting for parallelism, or just sending without waiting for a reply when that's appropriate.

> BEAM also support shared memory concurrency? A little thing called ETS? I would guess that there are other escape hatches as well, but I'm not familiar enough with Erlang or the BEAM to say for sure.

Ets is built with shared memory, yes, but interaction with your code is as if you were messaging a table process; the data you get back from an ets read won't change while you have it, but it might not be what you wrote, because another process may have written something else; same as if another process messaged a storage process in between your write and read. ETS is less of an escape hatch and more of a performance optimisation -- acknowledging shared memory concurency is helpful for data storage and retrieval, but it's tricky to get right and should be written once, so BEAM doesn't give the tools to do it and it needs to be done in C.

You can also do fun things with erlang:process_info or tracing to snoop on some data from other processes (but I don't think modify?). And, BEAM allows native code (NIFs) which is the ultimate escape hatch --- you can do lots of fun things there, of course.

coder543 · on Sept 11, 2020

> Ets is built with shared memory, yes, but interaction with your code is as if you were messaging a table process

To me, this is a distinction without difference.

Shared, mutable memory access without some form of synchronization is a bug and is always invalid.

You either hold a lock the entire time you're using a shared value, or you use a lock to take a copy of the value and then go about your business. With the singular exception of atomic values, the value should never change while you have it without you taking an explicit action to update it, and atomics are rarely used outside of either metric tracking or very specialized code. Besides atomics, everything you described about ETS is the same as any shared memory system.

toast0 · on Sept 11, 2020

> Shared, mutable memory access without some form of synchronization is a bug and is always invalid.

That doesn't prevent it from being really easy to write, and sometimes very hard to notice in a lot of languages. In BEAM languages, you can, of course, set up similar bugs through the use of other processes (or ETS), but you have to work harder to do it wrong than to do it right.

Jtsummers · on Sept 11, 2020

Erlang is a system and language for distributed (not just concurrent) programming. Go is a language for concurrent (not distributed) programming. Unless things have changed substantially since I last investigated Go, the language does not sit within a system designed for distributed computing. There are libraries you can use, but the language itself does not attempt to solve the same problems as Erlang.

If you use Erlang within one node, it's roughly equivalent to Go. But Erlang is designed so that the same concurrency mechanisms you use on one node also work across a cluster of nodes. There are more differences than this, but this is probably the most critical one.

ericlewis · on Sept 11, 2020

So you’re saying erlang can automatically distribute async tasks across nodes? Probably too much to comment, but how or where can I learn more about this way of thinking of it?

Jtsummers · on Sept 11, 2020

Not automatically, but if you connect multiple nodes into a cluster, when you spawn a process you can select which node it will be spawned on. Additionally, once the nodes and processes are running you can send messages across nodes to different processes using the same method you use for processes on a single node. [0] Shows an example of communication across nodes, as well as some preliminary material that would be helpful if you don't know Erlang.

I mostly learned Erlang from Joe Armstrong's book [1], though the first edition. I'm confident the second edition is as good as the first. Learn You Some Erlang [2] is a great free (also available for purchase, but free online) book.

[0] https://erlang.org/doc/getting_started/conc_prog.html

[1] https://pragprog.com/titles/jaerlang2/programming-erlang-2nd...

[2] https://learnyousomeerlang.com/

tonyarkles · on Sept 12, 2020

I’m not at home and don’t recall the library name off the top of my head (maybe libcluster?), but I recently inherited an Elixir project where the nodes were run in a k8s cluster and they automatically joined up. I was a little skeptical at first, having always manually managed that before, but... it’s buttery smooth!

darkmarmot · on Sept 12, 2020

We are doing this right now at scale for healthcare integration. My team lead gave a keynote on it at CodeBEAM this week!

epidemian · on Sept 11, 2020

There's this talk, The Sould of Erlang and Elixir, that opened my eyes to what makes this platform great:

https://youtu.be/JvBT4XBdoUE

It's not so much about what Erlang, as a language, "offers that Go does not". It's about the BEAM VM that runs it that makes all the difference.

I very much recommend watching the talk if you're interested in the topic :)

coder543 · on Sept 11, 2020

I watched that talk just now, and it was basically a retelling of everything I really like about Go. Except, Go is statically typed, and a lot faster than BEAM in benchmarks that I've seen. Go is also a very "normal" language, so it's usually easy to onboard developers into it, no matter what their background is. JetBrains GoLand makes the IDE experience essentially impeccable. I can refactor code with a few clicks, and GoLand precisely updates all references thanks to the static type system.

Go can run massive numbers of Goroutines on a single system, and it preemptively timeslices individual Goroutines, so I'm never worried about any single task preventing other tasks from making progress, which was like 75% of that talk.

Go has a fantastic production runtime debuggability story, which was the other 25% of that talk.

Go's pprof support feels magical: https://golang.org/pkg/net/http/pprof/

You can connect to running systems and see flamegraphs of where CPU time is being spent, you can take traces of the goroutines to understand why the system is idle when you think it shouldn't be, you can see how much memory different parts of the system are using, and so much more. It's not mentioned as prominently as I think it should be, but `go tool pprof` has this `-http=:` option that will open a web browser to let you just see everything graphically and click around, and it's an amazing experience.

There were really only three things in there that Go doesn't offer:

1. Forcefully terminating processes

Go doesn't allow anyone to forcefully terminate a Goroutine except the Goroutine itself. Personally, if I identified a buggy Goroutine, I would just focus on writing a patch and deploying it. Manually terminating a process is cool, but then the user types "-1" again and again, and you're just sitting there playing whack-a-mole instead of fixing the problem and deploying the fix.

2. Patching the running system without restarting it

The advantages of this are difficult to enumerate. It sounds nice and clean conceptually, but I would worry about existing processes / goroutines being in unexpected states because of erroneous code that they had previously run. Having a way to gracefully shut down a system and restart it without abruptly terminating responsive client connections is much more predictable to me.

3. Seamlessly distributed execution

He briefly talks about distributed Erlang towards the end of the talk. but mentions that it has serious problems and that a number of people don't recommend using it in production. Personally, even if it were nearly perfect, I think the benefits of it are limited. It's not difficult to have nodes communicate with each other using any number of other things, including proper RPC libraries all the way down to just raw TCP connections. The hard thing is service discovery, and I don't think distributed erlang meaningfully solves the problem different than other solutions. If everything is running homogeneous code, why bother your nodes with knowing about the existence of other nodes? Just use a load balancer, and then do everything internally on each node. If your systems are heterogeneous, you need a way to send the right messages to the right nodes, and that's where a named load balancer for each type of node can do just fine.

tokenrove · on Sept 11, 2020

I've worked a bunch in both, and Go really doesn't compare; you can't judge based on watching a talk. I find it much, much easier to write safe, robust code in Erlang, and operationally, you have so much more power to inspect the state of the BEAM. Another key BEAM thing is the process heap, which amounts to region allocation when used carefully. The reduction mechanism used for scheduling processes, though crude, is also still much lighter-weight than Go's approach.

On the other hand, Erlang is a weird language and a weird environment, that takes a lot of time to really understand, so I understand why it's never going to "win", and golang tools and the runtime will continue to improve. YMMV.

coder543 · on Sept 11, 2020

> The reduction mechanism used for scheduling processes, though crude, is also still much lighter-weight than Go's approach.

I'm not sure how it could be lighter weight than what Go has implemented natively at the machine level, instead of in what amounts to a bytecode interpreter. If you have any links, I would be very interested to read more.

> I've worked a bunch in both, and Go really doesn't compare

I respect your experience, so it is probably a failure of imagination on my part that I simply can't imagine how the BEAM tools could be that much better. I haven't had a problem with a rogue task (or really anything else) in production with Go where I was unable to figure out what was going on extremely quickly. The tooling has been amazing for me, unlike basically every other language I've worked seriously with. The YouTube talk certainly didn't do the BEAM tools justice if they are that much better.

With Go, I actually get really nice GUIs to look around the running Go process and analyze the situation, instead of just an interactive CLI session where I have to craft my own commands to find the top tasks and such. The Go pprof tools also work as an interactive shell (but for analysis, not for remote arbitrary code execution), but I would rather just have the flamegraph in front of me 99% of the time. I fully admit that I've never had a chance to use Erlang/Elixir/BEAM in any meaningful way, but I have tried to understand what they offer, and I haven't seen the compelling magic that some people talk about.

Now, if someone is running a Go service without the HTTP pprof server running on some port that they can access, then yes... it wouldn't even come close to comparing to what BEAM offers when you have the option to connect to a running BEAM instance.

tokenrove · on Sept 11, 2020

And, note, I'm not saying Go is bad, here, just that there is a lot that is underrated and misunderstood about Erlang.

On the Erlang side, check out the BEAM Book chapter on scheduling: https://blog.stenmans.org/theBeamBook/#CH-Scheduling and the core scheduling loop in the BEAM: https://github.com/erlang/otp/blob/master/erts/emulator/beam...

On the Go side, check out https://github.com/golang/go/tree/master/src/runtime/proc.go and the asm* implementations.

It's been a little while since I looked at it, but I recall that much less state had to be saved in an Erlang process switch in the usual case; I seem to recall it can be done in a handful of instructions in many cases. Go of course has to save a bunch of registers much as you'd have to do in any native context switch.

Edited to add: it can be useful to look at that part of the BEAM disassembled in objdump or gdb, to appreciate it, since it's hard to tell how much work is happening with all those macros.

sodapopcan · on Sept 11, 2020

I only started with Elixir in recent months and it's the first language that has ever made me comfortable about writing concurrent code. I didn't spend a lot of time with Go, but the idea of just calling a function that is now running in parallel but was always disconcerting to me. Of course, I could have spent more time with it and gotten more comfortable learning the ins-and-outs, but Erlang/Elixir's addressable processes running with their own stack/heap/gc and passing messages between each other is something that clicked very quickly with me. It's such an incredibly simple idea. For being a "weird" language, I think there is a lot of power the simplicity of its design, especially around learning. You just have to get over the weird syntax (which is a hot topic).

For transparency, I've never written any production code in either Elixir or Go.

rightbyte · on Sept 11, 2020

The funny thing is that Erlang is the Ericsson language. Go is the Google language.

Ericsson never had the Google "street cred" or K& not R. Ericsson is boring phone interchanges and boring radio. Google was hipster latte with software freedom before it turned out to be worse than MS, IBM and Oracle combined.

One might wonder what could have been without all the smoke and mirrors.

emit_time · on Sept 11, 2020

Doesn’t erlang have a REPL, allows you to do live deployments. Individual processes crashing doesn’t bring everything down, you can tell it how to handle those. Dynamic typing. Functional programming. Concise syntax, with “let it fail” mentality.

juhatl · on Sept 11, 2020

One major thing would probably be OTP, which would be very difficult to implement as-is on top of Go. A recent comment thread [1] over at Lobste.rs explored this in more depth.

Saša Jurić made an absolutely wonderful hands-on presentation [2] of what (oftentimes rather unique) value propositions Erlang/Elixir/BEAM bring to the table. It's a very tightly-packed presentation, but I strongly recommend having a look if you're curious.

1. https://lobste.rs/s/ntati1/even_go_concurrency_is_still_not_...

2. https://www.youtube.com/watch?v=JvBT4XBdoUE

horlux · on Sept 11, 2020

processes are lightweight and stateless

fbn79 · on Sept 11, 2020

Working in a small Company always let me astonished learning things about unicorn apps. But really an application like Whatsapp need 1000+ developers!? I can think about teams about android, ios and web app. X2 if you think about teams for business oriented services. But 1000+ developers for what?

jamestimmins · on Sept 11, 2020

Just some of the needs at that scale:

  Fraud
  Security
  Integrations
  Infrastructure
  Data
  Accessibility
  Growth
  I18n
  API

In addition, there won't be individual teams for android, ios, etc. Those domains will be broken across many task-specific teams.

ramraj07 · on Sept 11, 2020

Even whatsapp the app is deceptively simple. It works seamlessly in all the dialects around the world, on a swathe of devices quite possibly matched by no other app; and it manages to handle every actually useful feature anyone wants in a messaging app with amazing aplomb. In India for the majority of the people whatsapp is not just their communication platform, it's also their photo album, dropbox, task manager, social networking platform and contacts manager. All within the simplest UIs I've ever seen.

gt565k · on Sept 11, 2020

Whatsapp had a total of 55 employees when it was acquired by facebook for $16B.

_pd19 · on Sept 11, 2020

> But 1000+ developers for what?

Managerial prestige

pjmlp · on Sept 12, 2020

> shift to modern languages with integrated tooling, e.g. Erlang competition:

>

> C++, Java => Go, Rust, Kotlin

My first C++ IDE was Turbo C++ 1.0 for MS-DOS in 1993, and Borland, Zortech, Symatec, Microsoft, IBM, and plenty of others had well known C++ IDEs.

Many of the ideas of modern C++ IDEs is basically rediscovering the work done by Lucid and IBM with their Smalltalk like capabilities for C++ tooling in the late 80's/early 90's.

Java is well known for being one of the languages with best IDE support out of box. It didn't start like that in 1996, but the last 25 years have been good to it.

Go and Rust are yet to have similar offerings and Kotlin, well it is a JVM language and profits from the Java eco-system and Google's push to replace it on Android due to non-technical reasons.

Maybe better examples should have been provided.

saagarjha · on Sept 12, 2020

Java is interesting because writing code in it is generally absolutely miserable without an IDE.

AzzieElbab · on Sept 11, 2020

I am curios why they need 1K people instead of 10? What changed on the server side of WhatsApp?

ramraj07 · on Sept 11, 2020

Scaling, reliability, support, dialects and localisation, accessibility, edge cases, device support, security, moderation (not just moderators but moderation tools) infrastructure, research and exploration, monetization, commercialization. Basically everything no one in a startup thinks about for years if not more. And to that add the simple fact that as the team grows the per engineer productivity just has to drop due to communication overhead among other things.

eitland · on Sept 12, 2020

> device support

For the record I think the device support used to be better before when they supported pre-android Nokias as well.

octorian · on Sept 12, 2020

Yes, in the early days it ran on 6-7 different platforms. Back then it was about 1-3 developers per platform, however. Small focused teams like that can have significantly higher per-developer productivity, but are far more limited in the breath and nuance of what they can work on.

As a project grows in scale, there are a lot of development tasks that take a significant amount of time and effort for relatively little user-visible impact. However, in aggregate, there is a benefit to investing in those sorts of things.

eitland · on Sept 12, 2020

For me WhatsApp has gone from my favorite app to something I won't touch except almost as last resort.

I had no issues with old WhatsApp whatsoever and I'm not aware of anyone else having any either and I'm part of multiple groups that used to use it extensively.

The only thing that has improved meaningfully since then is end-to-end encryption, but as much as I love that it is only nice-to-have for me and I would feel a lot safer with just ordinary encryption as long as Facebook wasn't snooping in my contacts and metadata.

abrolhos · on Sept 11, 2020

I understand that the 1K+ engineer number comes from Facebook as a whole, when mentioning the use of Hack as a typed PHP.

More like to give a perspective on the transition on how to come from a small company into a huge one.

user5994461 · on Sept 11, 2020

WhatApps got audio and video calls after the Facebook acquisition.

Usage increased by a few orders of magnitude too.

stjohnswarts · on Sept 12, 2020

Yeah I think people get usage confused with number of users.

AzzieElbab · on Sept 12, 2020

That explains it. I haven’t used WhatsApp until few months ago

user5994461 · on Sept 11, 2020

No global service can run with 10 people. Just to have a good coverage 24/7 you're looking at a 50 people company.

oblio · on Sept 11, 2020

Yeah, just ops would be 3 timezones x 8 hours x 2 (ideally 3) people per timezone. So that's 6-9 people just for a barebones ops team.

yibg · on Sept 12, 2020

Do you know of any company that runs a product with this many users globally on 100 people let alone 10?

AzzieElbab · on Sept 13, 2020

Old WhatsApp?

ramon · on Sept 19, 2020

I think the concepts are very interesting of Erlang but I still don't understand it fully. And also about the multi-node os only an internal network scenario is not like a cloud mesh thing. You would need docker.

btbuildem · on Sept 11, 2020

> some {T} | undefined

Can we please avoid TS pitfalls?

phowat · on Sept 11, 2020

Would you mind expanding a little bit on why this is bad ? I'm fairly new to TS and find myself doing it a lot.

angio · on Sept 11, 2020

The advantage of the Option type is that you don't need to check for nullability, you simply pattern match on Some(_) or None. A type of `some{T} | undefined` means that the expression can have values `Some(_)` or `None` or `undefined`, which defeats the point of the Option type!

masklinn · on Sept 11, 2020

Except Erlang doesn't have a none. Nor an undefined for that matter. So `some {T} | undefined` means exactly that, there is no underlying null pointer which could sneak up on you.

`undefined` here is not built-in type or value of the language, it's a standard atom like any other.

So are null, nil, value or false (although false does have a special status in that it's the customary output of boolean operations alongside true).

angio · on Sept 11, 2020

I'm aware of that, I was explaining why `some{T} | undefined` is problematic in Typescript. If you have a signature with both option type and undefined it's a signal of a code smell.

mbo · on Sept 20, 2020

But `some{T}` isn't an option type, it's a member of the option type!

mbo · on Sept 11, 2020

> A type of `some{T} | undefined` means that the expression can have values `Some(_)` or `None` or `undefined`

Can you please explain this further? Why is `None` an inhabitant of this union?

btbuildem · on Sept 11, 2020

It can lead to cascading checks for undefined; best case scenario is a lot of boilerplate type code.

masklinn · on Sept 11, 2020

> It can lead to cascading checks for undefined

Do you know anything about Erlang? Because unlike typescript, `undefined` in erlang is not a special value, it's just an atom like any other.

> best case scenario is a lot of boilerplate type code.

Do you know anything about Erlang(2)? Because unlike typescript erlang has pattern matching, both faillible and not.

I'm not talking "it can kinda unpack objects", erlang has actual pattern matching (though not unification despite the prolog-inspired syntax), and a simple-looking assignment is a pattern match (and a potential assertion).

airstrike · on Sept 11, 2020

Couple things: personal attacks are not cool, and "please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

https://news.ycombinator.com/newsguidelines.html

masklinn · on Sept 11, 2020

> Couple things: personal attacks are not cool, and "please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

Couple things: asking somebody if they know anything about a subject is not a personal attack, and that GP doesn't know anything about Erlang and completely miunderstands the snippet is the strongest plausible interpretation of their comments which assumes good faith.

The alternative interpretations which do not are that they're actively lying, or that they can't be trusted with any device more complicated than a rock or a small stick.

dang · on Sept 11, 2020

It sounds like you didn't intend it as a personal attack, but inserting rhetorical questions like "Do you know anything about $foo" in an argument about $foo definitely comes across that way on the internet, and isn't allowed here. The fact that you repeated it twice makes it much worse—this is an attempt to expose the other commenter as "someone who doesn't know anything", which is needlessly personal.

If you know more than someone else, that's great, but please just share some of what you know so the rest of us can learn. Putting others down (even implicitly) doesn't serve learning, it's just distracting and builds up toxins in the ecosystem. Consider how much better your comment would read without those bits. The best way to react to bad information is respectfully with better information.