Internals of Go's new fuzzing system

hit8run · on Feb 18, 2022

Nice feature. But that string reverse example is exactly one thing that annoys me in Go: the lack of higher-level, batteries-included functions for strings, maps and slices.

mseepgood · on Feb 18, 2022

They are in the works:

https://pkg.go.dev/golang.org/x/exp/slices

https://pkg.go.dev/golang.org/x/exp/maps

However, reversing a string is not a commonly useful operation.

ezo · on Feb 18, 2022

> However, reversing a string is not a commonly useful operation.

But not for an interview :)

tialaramex · on Feb 18, 2022

If you ask me to reverse a string at interview, you're going to get my opinion about at least:

* Why a string isn't (shouldn't be treated as) really just a sequence of characters (even if yes, internally it's probably some structure like a vector of bytes, or 16-bit unsigned integers, or whatever) and so "reversing" it is probably nonsense.

* Dangers that fall out of that, starting with: Oops my reverse function actually produces invalid trash because that's not how text works.

* Bad interview code exercises. Do you actually reverse strings here? No? Then why are you wasting my time?
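To make the first two points concrete, here is a minimal Go sketch. The helper names `reverseBytes` and `reverseRunes` are made up for illustration. It compares a byte-wise reversal with a code-point-wise reversal on two spellings of "hé": one with a precomposed é, one with a plain e followed by a combining accent.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// reverseBytes reverses the raw bytes of s, ignoring character boundaries.
func reverseBytes(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// reverseRunes reverses s one Unicode code point at a time.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	precomposed := "h\u00e9" // é stored as a single code point (two bytes in UTF-8)
	combining := "he\u0301"  // 'e' followed by a combining acute accent

	// Byte reversal splits the two-byte encoding of é, so the result
	// is no longer valid UTF-8.
	fmt.Println(utf8.ValidString(reverseBytes(precomposed)))

	// Code-point reversal keeps the precomposed é intact...
	fmt.Printf("%+q\n", reverseRunes(precomposed))

	// ...but moves the combining accent to the front, away from its 'e'.
	fmt.Printf("%+q\n", reverseRunes(combining))
}
```

Neither version handles grapheme clusters; doing that properly needs Unicode segmentation data that the Go standard library does not ship.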

UncleMeat · on Feb 18, 2022

The first two are good discussions. I used to have an interview problem that was framed around money computations and I always appreciated it when interviewees considered the challenges with doing floating point math for money.

I also don't really expect "reverse a string" to be on any interview except the "have you ever coded before in your life" phone screen.

The last one, frankly, makes you come across as a jerk. If somebody spends their interview time telling me I am an idiot or mean or foolish for choosing a particular interview question, that's not going to go well. People who show up to a design review with a shallow understanding of the problem and assume that the other people are just stupid for not doing it a certain way are terrible to work with. Assuming that the other person has a reason for doing something is a better starting point.

camgunz · on Feb 18, 2022

I think you can do this stuff in a respectful way. I've been in and around a lot of hiring situations where no one knew what they were doing, and were also aware they didn't know what they were doing. If your candidates start showing up saying "this is... not exactly what people do" that's actually really helpful.

For example, I'm fully against whiteboarding now. I'm pretty good at it, but I think it's irrelevant and ableist (lots of people have anxiety issues and so on). When companies ask me to take a live test, I decline respectfully, talk about all this, and offer alternatives (pairing, take home assignments, review of past work, references). If this doesn't fly, well it wasn't meant to be, and it's better we both found out early on.

mikepurvis · on Feb 22, 2022

I think I'm pretty against whiteboarding code, but I do feel like there's value in having someone sketch an architecture diagram or something, since that is pretty relevant to most software engineering jobs.

Like sure, you're usually going to start the design review with a prepared document that everyone's looked at in advance, but if it becomes contentious, you need people to be able to quickly pitch their alternatives and hash out the various tradeoffs in a synchronous way (eg, not running off and making a whole new slide deck for each stage of iteration).

camgunz · on Feb 22, 2022

Oh, yeah I guess I've been using "whiteboarding" to mean "live code test"--which these days are done on like CodeSignal or whatever. I guess for teams that do architecture stuff w/ an actual whiteboard then like, yeah that skill seems relevant.

And I 100% think that both parties getting a feel for how work is done is very important, so I think like letting interviewees have a look at some code reviews, maybe sit in on a retro, do a pair exercise or whatever, this stuff makes sense. And I like your point about "if things become contentious": a lot of teams can fall apart in these moments, and they're kind of a test of the maturity of the participants and the strength of the culture.

xxgreg · on Feb 18, 2022

If given a question about reversing a string, it lets you ask the interviewer questions like:

* Do you want to do it in-place or as a copy?

* Is it an ASCII or Unicode string? If using Unicode, I assume you want to reverse on grapheme cluster boundaries?

Asking these questions lets you as a candidate demonstrate your knowledge.

If the candidate doesn't ask these questions, the interviewer can ask follow-up questions like:

* What is the time/space complexity of the algorithm? (Easy answer!)

* How does this work with UTF strings?

I don't particularly like the question, but have been in interviews where this exact question was used. From the discussion with the candidate it did very quickly weed out inexperienced developers. I was amazed at how many people applied for roles and had no knowledge of these concepts.

> Do you actually reverse strings here?

We've also tried doing more in-depth code exercises, which are more applicable to our business domain. That didn't work much better, and required upfront work from the candidate, which isn't always fair on them.

Anyways - happily not involved in the recruiting process any more.

Intermernet · on Feb 20, 2022

"Welcome to Reversr. We provide string reversal as a service!"

tialaramex · on Feb 20, 2022

LOL. I mean, if it existed Reversr is potentially a really interesting business, like they presumably have some very sophisticated language analysis stuff and they've got a bunch of different reverses they can do, for example maybe there are cases where (020) 6543 2109 should be reversed as 9012 3456 (020) notice the parentheses swapped there to still make sense. Or they can reverse "black cat named Noir" to "white cat named Blanc" or a whole bunch of interesting work.

But, I'm guessing Reversr don't want me to write some awful slice swapping algorithm that disrespects their hard won knowledge about human writing systems.

ASalazarMX · on Feb 18, 2022

This would be a great answer in my book, it shows advanced knowledge and the willingness to voice and objectively justify your opinion. I would openly say so, but insist on the coding just to see you pop a vein.

morelisp · on Feb 18, 2022

A good programmer knows to call out the bullshit task, but also do it anyway when needs must.

joppy · on Feb 18, 2022

So reverse a list/slice of anything else then?

tialaramex · on Feb 18, 2022

Sure, reversing slices of say integers makes lots of sense, people definitely use that. I don't remember enough Go to know how tricky that is - presumably it is not a built-in feature?
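For reference, a minimal sketch of an in-place slice reversal in Go, assuming Go 1.18 type parameters (the generics mentioned elsewhere in this thread). The name `Reverse` is chosen here for illustration only.

```go
package main

import "fmt"

// Reverse reverses s in place. It works for a slice of any element type.
func Reverse[T any](s []T) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	nums := []int{1, 2, 3, 4, 5}
	Reverse(nums)
	fmt.Println(nums)
}
```

Before generics, the same loop had to be written once per element type or routed through reflection.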

Rust's slices have reverse() but the implementation is a little hairier than you might expect: https://doc.rust-lang.org/src/core/slice/mod.rs.html#625 explains why, it wants to persuade LLVM that the things we're swapping are definitely different things, so it cuts the slice in half (if there's an odd middle element no matter, it needn't move anywhere by definition) and swaps between halves, so that LLVM can see OK, this necessarily is two different things, no aliasing is possible.

I can't think an interviewer is expecting you to show that unless you're interviewing for a job working on optimisations in the compiler or something.
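The "cut it in half and swap between the halves" shape can be sketched in Go as below. This is only a transliteration of the idea as described above, not Rust's actual code, and `reverseHalves` is a made-up name. Go's standard compiler is not LLVM-based, so no claim is made that this form optimises any better in Go.

```go
package main

import "fmt"

// reverseHalves reverses s in place by swapping elements between two
// sub-slices that cannot overlap.
func reverseHalves(s []int) {
	half := len(s) / 2
	front := s[:half]       // first half
	back := s[len(s)-half:] // last half; an odd middle element is in neither
	for i := 0; i < half; i++ {
		front[i], back[half-1-i] = back[half-1-i], front[i]
	}
}

func main() {
	odd := []int{1, 2, 3, 4, 5}
	reverseHalves(odd)
	fmt.Println(odd)

	even := []int{1, 2, 3, 4}
	reverseHalves(even)
	fmt.Println(even)
}
```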

morelisp · on Feb 18, 2022

I'm really curious why LLVM can't figure out that a[i:j] and a[j:k] are disjoint without help. Is something about the use of Range making it opaque?

tialaramex · on Feb 18, 2022

Mmm, I don't recognise your notation, are a[i:j] and a[j:k] really disjoint?

Isn't a[j] in both of these slices?

morelisp · on Feb 18, 2022

In all programming languages I know of using this syntax the upper bound is exclusive.
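In Go specifically, the half-open convention can be checked with a few lines. The helper `split` below is hypothetical; it returns `a[i:j]` and `a[j:k]` so that the element at index `j` can be seen to land only in the second slice.

```go
package main

import "fmt"

// split returns a[i:j] and a[j:k]. With an exclusive upper bound,
// a[j] belongs to the second slice only.
func split(a []int, i, j, k int) (left, right []int) {
	return a[i:j], a[j:k]
}

func main() {
	a := []int{10, 11, 12, 13, 14, 15}

	left, right := split(a, 1, 3, 5)
	fmt.Println(left, right)
	fmt.Println(len(left)+len(right) == 5-1) // together they cover a[1:5] exactly

	// When all three indices are equal, both slices are empty.
	left, right = split(a, 2, 2, 2)
	fmt.Println(len(left), len(right))
}
```

One Go-specific caveat: the two slices share a backing array, so appending to `left` can overwrite elements visible through `right`. Only the elements within each slice's length are disjoint.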

cbolton · on Feb 18, 2022

It's inclusive in R, Julia, Matlab, Fortran and presumably most languages use 1-based indices by default.

morelisp · on Feb 18, 2022

Fair I guess - of those I only used Fortran and not in an environment which supported the colon syntax. (Once you've chosen the wrong way to address array elements of course you'll end up with the wrong way to denote bounds, to take the contrapositive of the classic Dijkstra note.)

But in this thread we're talking about Go - and the syntax is also used in Python and Ruby and with the same semantics `..` in Rust - and above all else the notation was clear from my question when I explicitly said those sets were disjoint - and that question is the much more interesting thing, I think?

thetallstick · on Feb 18, 2022

If i==j==k then they’re not disjoint.

morelisp · on Feb 18, 2022

Then they're both zero length and a zero-length set is disjoint from all other sets.

Anyone want to explain why Rust needs to explicitly de-alias this yet? What a useless digression...

tialaramex · on Feb 19, 2022

Why LLVM needs this explaining? Probably somewhere an optimiser could inspect something and it doesn't.

Rust cuts the slice in half and swaps between the halves, so probably LLVM doesn't convince itself that if you did those swaps directly they aren't ever aliased, but once there are two slices which can't overlap it can see it's fine.

ss108 · on Feb 18, 2022

Can you elaborate so that I may regurgitate your response in an interview? Thanks :)

mseepgood · on Feb 18, 2022

You wouldn't be allowed to use it in an interview anyway, so its absence is a feature.

pantsforbirds · on Feb 18, 2022

I'm rather confused by this. Do you all not ever have a need to reverse strings? I find it somewhat common in my work. Maybe it's because I work with text data for NLP models?

fulafel · on Feb 18, 2022

In case you missed it, the link to the list of bugs found: https://github.com/golang/go/wiki/Fuzzing-trophy-case

gilgad13 · on Feb 18, 2022

> The coordinator communicates with each worker using an improvised JSON-based RPC protocol over a pair of pipes. The protocol is pretty basic because we didn't need anything sophisticated like gRPC, and we didn't want to introduce anything new into the standard library.

Interesting that this does not use `encoding/gob` by these criteria. I think `encoding/gob` is a nice example of what is possible with reflection, and I've certainly learned techniques from reading its implementation, but I haven't seen very many uses in the wild and this certainly would seem like a vote of no-confidence.
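For anyone wondering what "JSON over a pair of pipes" looks like in practice, here is a rough sketch using only `encoding/json` and `io.Pipe`. The `request` and `response` types, `serve`, and `roundTrip` are all invented for illustration. This is not the actual protocol or message format used by the Go fuzzing coordinator, which talks to separate worker processes rather than a goroutine.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// request and response are hypothetical message shapes.
type request struct {
	Input []byte `json:"input"`
}

type response struct {
	Length int `json:"length"`
}

// serve reads JSON requests from r and writes one JSON response per
// request to w until r reaches EOF.
func serve(r io.Reader, w io.Writer) error {
	dec := json.NewDecoder(r)
	enc := json.NewEncoder(w)
	for {
		var req request
		if err := dec.Decode(&req); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := enc.Encode(response{Length: len(req.Input)}); err != nil {
			return err
		}
	}
}

// roundTrip starts a worker goroutine connected by two in-memory pipes,
// sends one request, and returns the reply.
func roundTrip(input []byte) (response, error) {
	reqR, reqW := io.Pipe()   // coordinator -> worker
	respR, respW := io.Pipe() // worker -> coordinator
	go serve(reqR, respW)
	defer reqW.Close() // EOF ends the worker loop

	var resp response
	if err := json.NewEncoder(reqW).Encode(request{Input: input}); err != nil {
		return resp, err
	}
	err := json.NewDecoder(respR).Decode(&resp)
	return resp, err
}

func main() {
	resp, err := roundTrip([]byte("hello"))
	fmt.Println(resp.Length, err)
}
```

Swapping `encoding/json` for `encoding/gob` here would change only the encoder and decoder constructors, which makes the choice between them mostly a question of debuggability and taste.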

siscia · on Feb 18, 2022

Can we get the minimisation features also for the generative/property based testing?

I found that property-based testing is so much better than unit testing and only marginally harder to write, after some practice at least.

However, the property-based testing in the Go standard library is so unadvanced that it is basically impractical.

Hopefully with generics the status will improve.

ParetoOptimal · on Feb 18, 2022

Not a fan of Go, but I'll give credit where it's due:

This is a good feature for the language to afford.

mettamage · on Feb 18, 2022

Why are you not a fan?

Asking as someone who likes a few things and dislikes a few things about it. So basically asking out of curiosity.

ParetoOptimal · on Feb 18, 2022

Enums and iota weirdness made my love for Go start to crack.

I learned about abstract data types and discriminated unions, then wondered why Go didn't have those.

The prevalence and overuse of interface{} undermined type safety at jobs I had writing Go and errors from this apparently popular style of Go programming filled the logs.

Then I saw someone say "Go's error handling on values is good, but it's better in a language with Either".

I looked up examples, and indeed it seemed nicer and retained the value proposition of errors as values.

I wish Go had either.

Then bugs started happening in code that was copy pasted... but why was it copy pasted?

Lack of generics was the final straw and I started playing with Haskell in the evenings.

It gave me the type safety I wanted and more. It's not perfect, but it's based on a philosophy I believe is more correct than the worse is better values of Go that seek micro-level simplicity at the cost of macro-level complexity.

Groxx · on Feb 18, 2022

>micro-level simplicity at the cost of macro-level complexity

That pretty succinctly covers a lot of my not-fond-about-Go reasons too. There's a lot to like about Go for small projects - I've replaced a lot of my CLI-writing languages/tools with Go code and it's a dramatic improvement and I'm quite happy with it for that.

But it falls apart badly when you're trying to do highly-stable, highly-correct, or highly-complex things. Not being able to know what errors can be returned is a HUGE source of missing error handling. Fatal errors containing only "not found" after passing through dozens of calls wastes a ridiculous amount of troubleshooting time. Not-so-minor mismatches between reality and Go's SDK are covered up in awkward, error-prone ways with no real alternatives (because the core library can and does do things you cannot). https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-... for a much more detailed decline into madness because of all this.

If you are sufficiently careful, almost none of this is a problem, true. Experts are consistently able to write safe C code, right? ........right?

---

Honestly I expect generics to be a pretty significant improvement. E.g. intrusive concurrency control is unbelievably error-prone in bulk, and generics will let us finally have safe and easy higher level constructs for most code. I also know of a few projects I've touched that will likely be able to remove 10-20% of the codebase just because we won't have to go to great lengths to maintain (or check after abandoning) type safety. It's a simple and reasonable implementation of generics, especially since there's no inheritance to worry about.

I'm looking forward to it... but I probably still won't like Go.

mseepgood · on Feb 18, 2022

Probably something about generics, which is no longer an issue if you use the Go 1.18 Release Candidate today or the final release in four weeks.

vsnf · on Feb 18, 2022

It's about a lot more than just generics, at least for me. I'm not interested in derailing this thread into one bashing Go, but there's a bunch of things that are distasteful about the language. Go appeals to certain kinds of programmers, and is actively repulsive to different kinds. This division breeds animosity when the kind of programmers who are repulsed are forced to interact with it.

Also, the community has a big attitude problem.

zibzab · on Feb 18, 2022

I think that is a good thing.

Most people trying to "fix" go are basically trying to make it into that other language they are more comfortable with.

If I wanted to use ruby, I would use ruby. I use go because even its presumed limitations are an advantage (for me).

kodablah · on Feb 18, 2022

I, probably like many others, want struct-based params.

I am thinking of a lib that can fuzz a func w/ struct param(s). It takes the struct params, extracts fields into a deterministic list, and builds a dynamic function via reflect.MakeFunc where each field is an arg and the body puts the args on structs and invokes the original function. That dynamically created function is then passed to f.Fuzz.
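A rough sketch of the flattening step, to show it is mechanically possible with `reflect.FuncOf` and `reflect.MakeFunc`. Everything here (`params`, `flatten`) is hypothetical. It handles a single struct argument with exported fields only, and it leaves out the leading `*testing.T` parameter and the final hand-off to f.Fuzz, so whether the fuzzer accepts such a function is not shown.

```go
package main

import (
	"fmt"
	"reflect"
)

// params is a hypothetical struct argument with exported fields only.
type params struct {
	Name  string
	Count int
}

// flatten takes a func(S) for some struct type S and returns a function
// value that accepts one argument per field of S, in declaration order.
func flatten(fn interface{}) interface{} {
	fnVal := reflect.ValueOf(fn)
	structType := fnVal.Type().In(0)

	// Build the flattened signature: one parameter per struct field.
	in := make([]reflect.Type, structType.NumField())
	for i := range in {
		in[i] = structType.Field(i).Type
	}
	flatType := reflect.FuncOf(in, nil, false)

	return reflect.MakeFunc(flatType, func(args []reflect.Value) []reflect.Value {
		s := reflect.New(structType).Elem()
		for i, arg := range args {
			s.Field(i).Set(arg) // panics for unexported fields
		}
		fnVal.Call([]reflect.Value{s})
		return nil
	}).Interface()
}

func main() {
	target := func(p params) { fmt.Printf("%+v\n", p) }
	flat := flatten(target).(func(string, int))
	flat("abc", 3)
}
```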

Does a lib like this exist?

morelisp · on Feb 18, 2022

testing/quick contains an interface and basic implementation for randomly generating arbitrary structs. https://pkg.go.dev/testing/quick#Value

For traditional fuzzing you usually want a "blacker" barrier between the input and processing. E.g. in an algorithm with exponential growth it's not helpful to know that a raw 1000000 element input slice runs out of memory; it is helpful to know if a mere 100 bytes over the wire can somehow trigger that. If you do need arbitrary input, it's easy enough to treat the fuzz input as JSON or whatever.
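A minimal sketch of the "treat the fuzz input as JSON" route. The `job` type and `decodeJob` helper are hypothetical. The idea is that a fuzz target taking `[]byte` would call something like this first and skip any input that does not decode.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// job is a hypothetical structured input for the code under test.
type job struct {
	Name    string `json:"name"`
	Retries int    `json:"retries"`
}

// decodeJob turns raw fuzz bytes into a job. ok is false when the bytes
// are not valid JSON for a job, in which case a fuzz target would skip
// the input instead of reporting a failure.
func decodeJob(data []byte) (j job, ok bool) {
	if err := json.Unmarshal(data, &j); err != nil {
		return job{}, false
	}
	return j, true
}

func main() {
	fmt.Println(decodeJob([]byte(`{"name":"build","retries":3}`)))
	fmt.Println(decodeJob([]byte(`not json`)))
}
```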

But most code that you're interested in testing starting from a truly arbitrary state takes only a small number of scalars or a byte stream, not complex structures. If you have complex structures it's probably because you have complex invariants arbitrary data won't satisfy. (And you're probably as much or more interested in verifying your code maintains those invariants during whatever preparatory steps you have which outputs those structures, not just their use as inputs.)

kodablah · on Feb 18, 2022

> For traditional fuzzing you usually want a "blacker" barrier between the input and processing.

They clearly didn't want it that opaque and bytes only since they accept primitives and accept arbitrary param count. Why not just accept byte slice only and make people binary encode even primitives? The reason is it takes a long time to build a corpus when unnecessarily making the fuzzer figure out UTF-8, JSON, little-endian ints, etc. Using structured typing is a reasonable approach for an in-language fuzzer. They currently have a user-defined length of user-defined types, so it's only natural to translate from more complex types and save the generator time.

morelisp · on Feb 18, 2022

> Why not just accept byte slice only and make people binary encode even primitives?

It's a good question since this is what go-fuzz did and it worked pretty well; I don't see the reason in the design documentation. But I don't think it's "clearly" because

> it takes a long time to build a corpus when unnecessarily making the fuzzer figure out UTF-8, JSON, little-endian ints, etc.

Not because it doesn't in some cases (JSON it can, UTF-8 and le ints not really), but because having the fuzzer be aware of that doesn't seem to be the best way to solve the problem; this has been discussed in lots of fuzzing literature from AFL onwards. Structurally-aware mutators might as well also be []byte mutators so you can also more quickly generate JSONL, JSON-in-HTTP-request, JSON-in-CSV-column, and not only bare JSON.

For the specific issue of corpus bootstrapping and measuring coverage, it would seem to be more greatly helped by the approach mentioned in "Instrument specific packages only" in the draft proposal.

And this is putting aside my other point: For any complex structure, you have non-well-formed values with other semantic invariants which randomly-generated structures won't satisfy. (And on a more minor implementation note, you may also have some private fields.) To solve both you'll need to go all the way to custom mutators, which is interesting but a much larger issue - specifically, issue 48815.

What kind of use case are you considering where you want to start with a completely-unconstrained-yet-complex struct and not some external representation of that struct?

gorgonzolachz · on Feb 18, 2022

I'm cautiously excited/optimistic for this.

That said, I wonder if this feature will end up seeing all that much usage? Once your inputs are sanitized/bounded in a Golang application, it's pretty hard to get the language to do things it wasn't meant to do - and if the fuzzing system is built like the unit testing system, the number of useful fuzzing cases you can run won't be that large.

I've always thought that fuzzing primarily benefits E2E/integration testing, and that with modern languages' type systems and lack of pointer arithmetic the usage of fuzz testing is useful for niche cases (embedded programming and cryptography come to mind). The examples in the article (integer overflows, truncated input, invalid unicode) may be issues, but they won't break the integrity of the program or cause catastrophic failure (assuming there are no panic() calls in the code) due to Go's type system.

nemo1618 · on Feb 18, 2022

The type system can't catch nil dereferences or out-of-bounds accesses, which (in my experience) are the most common causes of runtime panics in Go. I assure you, there are many ways to make a Go program crash. :) If your program is handling untrusted input -- and almost every useful program does -- then you really should be fuzzing your input handlers.
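As a small illustration, here is a sketch of an input handler with an easy-to-miss bounds check, plus a fuzz target for it using the Go 1.18 `testing.F` API. The frame format and the names `parseFrame` and `FuzzParseFrame` are invented for this example. In a real project the fuzz function would live in a `_test.go` file and run with `go test -fuzz=FuzzParseFrame`; it sits next to `main` here only to keep the sketch in one file.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// parseFrame reads a hypothetical length-prefixed frame: one length
// byte followed by that many payload bytes.
func parseFrame(data []byte) ([]byte, error) {
	if len(data) == 0 {
		return nil, errors.New("empty input")
	}
	n := int(data[0])
	// Without this check, data[1:1+n] would slice past len(data):
	// a run-time panic, or stale bytes if the backing array has
	// spare capacity. The type system accepts either version.
	if len(data)-1 < n {
		return nil, errors.New("truncated frame")
	}
	return data[1 : 1+n], nil
}

// FuzzParseFrame checks that a successful parse returns exactly the
// number of bytes the length prefix promised.
func FuzzParseFrame(f *testing.F) {
	f.Add([]byte{2, 'h', 'i'}) // seed corpus entry
	f.Fuzz(func(t *testing.T, data []byte) {
		payload, err := parseFrame(data)
		if err == nil && len(payload) != int(data[0]) {
			t.Errorf("payload length %d, want %d", len(payload), data[0])
		}
	})
}

func main() {
	fmt.Println(parseFrame([]byte{2, 'h', 'i'}))
	fmt.Println(parseFrame([]byte{5, 'h', 'i'}))
}
```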

stingraycharles · on Feb 18, 2022

I have been a big proponent of generative testing / fuzzing, where rather than hard-coding inputs you simply always generate them. You can see it in Haskell’s QuickCheck or Clojure’s test.check, but there are probably many others.

It is absolutely useful outside of just a few parsers; a type system is not able to catch all bugs. Just a recent example was a scheduling system I was working on, where the input fuzzing was able to put the system in states that I did not anticipate.

Practically speaking, with fuzzing, you’re changing tests from manually crafting certain inputs and validating the results, to defining “laws” about how your system should behave.

Here’s a good resource on the types of tests you can develop using this technique: https://fsharpforfunandprofit.com/posts/property-based-testi...

nsajko · on Feb 18, 2022

> rather than hard-coding inputs you simply always generate them

Why not both?

In any case, a rudimentary implementation is actually in Go's standard library already for a long time: https://pkg.go.dev/testing/quick@master

Though, note:

> The testing/quick package is frozen and is not accepting new features.

stingraycharles · on Feb 18, 2022

> Why not both?

Absolutely fine and often good enough, these tests are typically very simple and easier to reason about, and make more sense in a whole range of situations (eg regression tests).

However, I would take a single input fuzzing test over one that uses hard coded inputs.

jrockway · on Feb 18, 2022

> Why not both?

Go's fuzz tester takes this approach. When a failing input is found, it's added to the source code directory with the intent of you checking it in.

morelisp · on Feb 18, 2022

testing/quick is not coverage-driven and so not good at truly "interesting" input cases. It's OK for simple invariants (e.g. anything arithmetic with an inverse) but I would not trust it to tell me anything interesting about a parser. At this stage I might even rely on the fuzzer to test simple invariants because the tooling is nicer.
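For the "simple invariant with an inverse" case, a minimal `testing/quick` sketch looks like this. The property `roundTrips` is an example chosen here (hex encode then decode), not something from the article.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"testing/quick"
)

// roundTrips states the law: decoding an encoded value yields the original.
func roundTrips(data []byte) bool {
	decoded, err := hex.DecodeString(hex.EncodeToString(data))
	return err == nil && string(decoded) == string(data)
}

func main() {
	// quick.Check calls roundTrips with randomly generated arguments and
	// returns a non-nil error describing the first input that fails.
	if err := quick.Check(roundTrips, nil); err != nil {
		fmt.Println("law violated:", err)
		return
	}
	fmt.Println("law held for all generated inputs")
}
```

The inputs are purely random, with no coverage feedback, which is why this works for a property like the one above and is much less convincing for a parser.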

chmike · on Feb 18, 2022

There is a limit to the errors a compiler can detect. Here is an example where fuzzing is required.

I'm the author of [qjson](https://github.com/qjson). It converts a human readable json text (qjson) into json. The intended use is for configuration files or data input.

For instance, numeric values may be simple math expressions. I used go-fuzz and it detected right away that I forgot to deal with division by zero in these math expressions.

It is impossible for the compiler to detect such errors in the program. It fully depends on the data, and when it is complex, the risk of errors is high.
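The class of bug is easy to reproduce in isolation. The sketch below assumes integer operands and uses a hypothetical `divide` helper; it is not qjson's actual code. In Go, integer division by zero compiles fine when the divisor is only known at run time and then panics, so an evaluator has to reject it explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

var errDivideByZero = errors.New("division by zero")

// divide is a hypothetical helper an expression evaluator might use
// for the "/" operator on integers.
func divide(a, b int64) (int64, error) {
	if b == 0 {
		// Integer division by zero panics at run time in Go,
		// so turn it into an ordinary error instead.
		return 0, errDivideByZero
	}
	return a / b, nil
}

func main() {
	fmt.Println(divide(6, 3))
	fmt.Println(divide(1, 0))
}
```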

tialaramex · on Feb 18, 2022

> I forgot to deal with division by zero in these math expressions. It is impossible for the compiler to detect such errors in the program.

I'm sure there's a general class of problem where it really is impossible, but you haven't found it.

All you needed to prevent this is dependent types, which Go doesn't have. With dependent types the compiler sees a = b / c and it immediately can conclude that c's type is non-zero, since if it was zero that's a divide-by-zero error. Having refined the type to non-zero, an attempt earlier to load it with a value despite not being sure if that value is zero will fail. The buggy qjson won't compile.

potamic · on Feb 18, 2022

I have the same question as well. Dare say it seems to be anti-go to package such niche capabilities in the stdlib, but maybe their direction is changing?

TheDong · on Feb 18, 2022

> seems to be anti-go to package such niche capabilities in the stdlib

Go seems to tend heavily on the side of packaging things into the stdlib.

It has stuff like "net/rpc/jsonrpc" (a json-rpc 1.0 impl, even though you should use grpc or json-rpc 2.0, both of which are not in the stdlib). It has crap like net/smtp and archive/tar. It has image/jpeg for some unknown reason.

If you look at rust, basically half of go's stdlib exists as external crates (even if they're officially maintained, akin to the golang.org/x/ packages in go). Everything from json, to http, to image support in rust is outside the stdlib, and honestly that has ultimately let rust evolve better support for each of those things.

C, C++, (to a lesser extent) java, D, etc.. basically every language in a similar space to Go has a much smaller and more compact stdlib.

The go stdlib clearly doesn't err on the side of being small.

isabellat · on Feb 18, 2022

Deno is another project that is trying to add more to a stdlib. In some ways this is really fantastic because in the JavaScript ecosystem there is so much fatigue in deciding what libraries to use. On the flip side, if you couple too much into the stdlib then you’re limiting the huge benefit of an open market.

vsnf · on Feb 18, 2022

> It has image/jpeg for some unknown reason.

I don't know why they added this and the other image packages specifically, but I used this the other day and it was nice to not have to bring in a separate package.

TheDong · on Feb 18, 2022

I mean, they realized their mistake for other image formats, hence: golang.org/x/image: https://pkg.go.dev/golang.org/x/image

Now it's really confusing to have to figure out to import stdlib image for jpeg, but x/image for bmp.

For bonus points, check out "image/draw" vs "golang.org/x/image/draw".

jrockway · on Feb 18, 2022

Fuzz testing is probably the #1 software security innovation of the last 10 years. It's "niche" only because it's currently hard to do. For people that have set it up, it's essential. With Go 1.18, it's easy for anyone to set it up, and a lot of people are going to find a lot of dumb bugs. Pretty much everyone that has written fuzz tests for their software has found at least one crash. I found one in a program with 100% test coverage within minutes of writing the first fuzz test. Sometimes you miss dumb things, and the human-written "I'm going to trick my program into malfunctioning" tests can simply forget things. (In my case, I had a few branches with similar logic. The unit tests did run all the code in the branches, but only tested the boundary case for two of the three. The fuzz testing found the third case immediately.)

Crashes in Go vary in severity (memory isn't usually corrupted, code isn't usually modified, the other goroutines don't stop serving requests because you probably have "recover" somewhere up the chain), but at the very least, by identifying the input that can cause crashes, you gain the ability to turn invalid input into an easy-to-understand error message, saving users and operators time and frustration. Users can retry the request with valid input. Operators don't have to freak out about log spam. And, of course, sometimes the crashes are a big deal; maybe you missed the "recover", or maybe you're calling out to unsafe code that DOES corrupt memory.

To me, fuzz testing is the headline feature in the 1.18 release. And it's the release that introduces generics.

masklinn · on Feb 18, 2022

Seems completely in line to me:

- go is a broad stdlib language

- go has a bespoke toolchain with little to no hooks, meaning anything which needs to integrate directly into the build process or to instrument the runtime benefits extensively from being brought in-tree e.g. profilers, sanitisers, and (guided) fuzzers

- finally go’s “niche” are mainly network daemons and CLI utilities, so lots of interacting with network streams and file processing, which are the main use cases for fuzzing

derekperkins · on Feb 18, 2022

Great description

unboxingelf · on Feb 18, 2022

Excellent write up