An overview of the science on function length

freetime2 · on Aug 31, 2020

I find that a lot of junior developers tend to go overboard with creating small functions, and the motivation is largely aesthetic. They have a mental image of what “clean” code is supposed to look like, and it doesn’t involve lots of indentation, curly braces, mathematical notation, etc. So they often try to bury these messy details inside a function.

They don’t have a lot of experience reading and debugging other peoples’ code, so they might not consider how much of a pain it can be to have to go jumping through a whole bunch of deeply nested one or two-liner functions just to see what the code actually does.

CuriouslyC · on Aug 31, 2020

Small functions are a form of semantic annotation when used properly.

If I'm working with code that modifies an image, I'd much rather see "rotate" than a matrix multiply. Heck, if I'm dividing a number by some special value, say to normalize it, I'd prefer to see "normalize" than a mystery division. I do agree that I'd rather see x + 1 than "addOne" though, since that doesn't tell me anything the code itself didn't.

freetime2 · on Aug 31, 2020

If a block of code is short (a few lines or less), and doesn't get reused in multiple places, I often prefer to add a clarifying comment rather than extracting the code out into a separate function. That way they reader doesn't need to go jumping around to see what the code does. And you can fit more information on a comment line than you can in aSuperDuperDescriptiveFunctionName.

klyrs · on Aug 31, 2020

> "... if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program." - Linus Torvalds

I've had that quote thrown at me by a petty reviewer. Okay advice in C, utter shitshow in python classes. Like pep8 and Dijkstra's rant on goto, it's generic advice, not a holy mandate.

besulzbach · on Sept 1, 2020

I agree. I am most annoyed by the use of maxims with something so subjective and context-sensitive as code. I've read and written code with 4 (and maybe even more) levels of indentation that made complete sense and code with 2 levels that should probably be refactored into only having one.

I've seen people propose things as dubious as a character limit per function body. Most programming languages are so flexible that no matter what guidelines you set for your project, there will still be very bad code that conforms to all of them.

Forcing function splitting by rules is a good example of this. Very large functions are probably doing too much, but where and how you split them matters incomparably more than if you split them at all.

sgk284 · on Aug 31, 2020

There are so many compounding factors here that the conclusion seems unreliable. It also reminds me a lot of the streetlight effect[1], in that defects may simply be easier to detect in smaller functions so that's where we find them.

Many of the conclusions center around density and lines of code. This is weird because if smaller functions lead to less overall code, then if you have 2 bugs in a 1,000 sloc codebase it'll measure worse than 6 bugs in a 10,000 sloc codebase that does the exact same thing.

It might be more valuable to compute the defect rate per cyclomatic or kolmogorov complexity. Or some other measure that's independent of line length and then figure out how function size impacts those complexity measures.

[1] https://en.wikipedia.org/wiki/Streetlight_effect

leto_ii · on Aug 31, 2020

I think you make a valuable point. If somebody writes concise, clear code with little duplication, but still has a bug, their stats might look worse simply because their code is shorter.

It's also not clear that bug rates should be the only/main thing to look at. Complexity measures can also help to indicate if the code is easy to read/extend. In my experience low cyclomatic complexity does correlate with more readable code.

tsimionescu · on Aug 31, 2020

> If somebody writes concise, clear code with little duplication, but still has a bug, their stats might look worse simply because their code is shorter.

Fortunately, the science here me to indicate the opposite: the less code you have, the fewer defects per LoC you'll have. This is also indicated in the article, which mentions that both function size and function length were found to be correlated to higher defect density, meaning that both may in fact simply be estimators of overall code length - that overall code length correlates with defect density per LoC.

> It's also not clear that bug rates should be the only/main thing to look at. Complexity measures can also help to indicate if the code is easy to read/extend. In my experience low cyclomatic complexity does correlate with more readable code.

The article does actually cite some (necessarily small-scale) studies on this as well, which have also tended to find that larger functions are easier to read for debugging purposes, and that it was easier to add new functionality to the system with larger functions; but that it was easier to modify the existing functionality of the system with smaller functions. This is all in the section entitled 'Practical Effects'.

leto_ii · on Aug 31, 2020

> Fortunately, the science here me to indicate the opposite: the less code you have, the fewer defects per LoC you'll have

I judged things conditional on the fact that a bug exists. If you apply the technique of counting bugs/loc blindly you may end up favouring longer, more tangled code. Think about it this way: if your goal is low bugs/loc then when a bug shows up you will be incentivized to fix it in a way that increases loc, instead maybe of simplifying thing. This will lead to fewer bugs and lower bug density.

eesmith · on Aug 31, 2020

What do you consider to be the conclusion(s)?

I ask because the conclusion is only firm about very short functions, otherwise saying "[f]or longer functions, the picture is less clear" - which you agree with, yes?

Part of the conclusion concerns how short functions appear to be harder to debug, which seems rather the opposite of your hypothesis that defects are easier to find in them. (One of the contributing arguments is that the indirection of a function name adds its own mental overhead.)

Nor do I see a conclusion that smaller functions lead to less overall code. The article seems more concerned about questions like "should [we] refactor our long methods into short ones to avoid defects", with that study concluding no.

And the article addresses code complexity several times, most directly as:

"""The literature doesn’t provide a straightforward way to measure which feature (length or number of methods) is more significant when predicting defects. It’s quite possible that, for example, measures such as “average method complexity” and “number of methods in a class” simply act as 2nd order estimators for number of lines, in which case we’d simply be comparing which measure correlates better with the underlying metric."""

I don't see how your answer fits in, so did I misread the article that badly?

tsimionescu · on Aug 31, 2020

I think that this result is rather intuitive, even in the common paradigm of 'a function should do one thing and have a good name that describes that thing'.

Essentially, the idea that code is easier to understand with small, well-named, cohesive functions relies on the functions being bug-free themselves, and on the relevant behavioral details being perfectly caught in the name and types.

However, if small functions have bugs, it is pretty intuitive that it takes more time to explore a call graph than a linear code listing to find that bug.

Furthermore, when chaining many small functions to achieve a more complex functionality, bugs can more easily slip in the chain itself. For example, a function may modify a list you pass in and also return it, when your chain assumed it would return a copy. A sort function may be unstable when the calling code assumed it was stable. A function may move to another thread when the calling code assumes that locking is unnecessary.

All of these would be easier to catch if the code were inline, instead of abstracted behind a function signature. Of course, this has to balance somehow with not rewriting sorting procedures in every function of your code base.

I think that the answer probably lies with some idea of separating library code and application code. Library code should be composed of many small functions, well documented and well unit-tested. Application programmers should all be familiar with the library and its semantics. In contrast, application code should probably favor larger functions and avoid ad-hoc abstractions. If there is a small piece of functionality that seems re-usable, it should usually not be moved to a separate function, but to a separate library.

leto_ii · on Aug 31, 2020

I'm not sure the first part of the article is based on a sound methodology. For example, we don't have the number of classes for each bucket in the plots [1] and [2]; perhaps there are only a few classes of one liners - probably not a statistically representative sample.

It's also not clear what type of code is counted. I'm not convinced boilerplate (setters, getters, simple constructors) should be taken into account. It will artificially decrease the average method length in a class. Imagine a class with 10 fields (hence 10 setters, 10 getters, maybe a few constructors) and one method of 100 lines. The average method length will be 120 (100 + 20*1 from the boilerplate) / 21 ~= 6 lines of code. If there are bugs in this class, they basically have to be in the large method; still, in statistics this would count as a class with pretty short methods overall.

I'm also surprised by what counts as a long method - in my experience code starts getting messy with lengths > 20 - 30. These methods don't even seem to be represented in the data.

> application code should probably favor larger functions and avoid ad-hoc abstractions

Why favor large functions? Maybe allow them, but why encourage? Also, we haven't established what large is (20 lines - perfectly fine; 200 lines - I don't think this can ever be justified).

> If there is a small piece of functionality that seems re-usable, it should usually not be moved to a separate function, but to a separate library.

This does seem excessive. Don't you think there might be code that can be locally reused? Does every little thing have to go to a library?

[1] https://i2.wp.com/softwarebyscience.com/wp-content/uploads/2...

[2] https://i1.wp.com/softwarebyscience.com/wp-content/uploads/2...

tsimionescu · on Aug 31, 2020

Your points about getters and setters are valid, and may be skewing the results for Java especially. However, I believe they should be expected to be skewing them in the direction of improving the numbers for very small methods, not harming them.

> I'm also surprised by what counts as a long method - in my experience code starts getting messy with lengths > 20 - 30. These methods don't even seem to be represented in the data.

This is a good point, and I believe it is the main reason the article talks about Very short functions being sub-optimal. I think almost everyone wod agree that a function in the range 1-5 lines of code is very short, while deciding if a 20-Loc method is short or long is going to be more disputed.

> Why favor large functions? Maybe allow them, but why encourage? Also, we haven't established what large is (20 lines - perfectly fine; 200 lines - I don't think this can ever be justified).

My idea is to favor expressing serial logic serially. That is, if you have to do A then B then C then D, prefering to write it that way, rather than A, then foo() [which does B and C] then bar() [which does D]. This implicitly favors large functions over short ones, assuming that the code is required at all, of course. Short code is still preferable to long code, but at the module level, not function level.

> This does seem excessive. Don't you think there might be code that can be locally reused? Does every little thing have to go to a library?

There might be, of course, this is not about strict commandments. But I think that looking at many code bases, there are numerous functions called only once, which essentially act only as comments. A lot of the time, it may be worth it to refractor the code more deeply, and instead of simply extracting a piece of a larger function into a non-reused smaller function, to try to extract some common pattern into a library, and to keep the business logic in-line in the original function.

Just as an example, if you have something like

    for(int i=0; i <n1; i++) {
        for(int j=0; j <n2; j++) {
          arr3[count++] = arr1[i] + arr2[j] ;
        }
    }

As part of a large function, instead of extracting this into an 'addArr1AndArr2' function, to extract the general pattern and leave so thing like this in the original context:

    arr3 = zip(arr1, arr2).map(a1,a2=>a1+a2);

Of course, this will not always be possible, and sometimes a function can just become too long even in serial logic. But a lot of the time, the effort of reducing line count by extracting or applying library functions will be more worth it than simply reducing function line count by more simplistic refactoring.

ajuc · on Aug 31, 2020

Splitting 100-lines-long function into let's optimistically say ten 5-lines-long functions is only a win in readibility if you strongly restrict the number of ways in which these 10 new functions can be combined (by scopes, typing, good naming conventions, etc.).

If you just extract the functions and change nothing else - you have 50 lines of code to read instead of 100 but you have 10! possible permutations of these function calls instead of 1 to think about.

christophilus · on Aug 31, 2020

And you have 10 new, probably misleading names to think about. Naming is one of the hard problems, and a preference for small functions means you run into this problem more frequently.

asgard1024 · on Aug 31, 2020

I am glad that somebody is trying to challenge the conventional wisdom.

This might also be relevant: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

ajmurmann · on Aug 31, 2020

I saw this discussed elsewhere where it was pointed out that Java has a tendency to have objects with lots of small getters and setters. This might effectively be a proxy metric for objects that do too much and expose too much of their state.

cjfd · on Aug 31, 2020

When there is a correlation between A and B it could be that A causes B or that B causes A or that A and B have a common cause. It could also be that the correlation is just a coincidence.

I tend to write longer functions when the thing being done is simple and shorter ones when the thing being done is complex.....

outsomnia · on Aug 31, 2020

It's true one huge function repeating things and excessive fragmentation into subfunctions are both less than ideal to maintain and there's a desirable middle ground. But

>> Very short functions are a code smell

that's definitely not unconditionally correct. There are many cases where dereference helpers of the form x_to_y(x), wrapping type conversions that might go through three or four levels of struct members are going to eliminate mistakes in bulk code. They may be done as preprocessor defines or as inline functions but either way they are usually a green flag, not a red flag.

eesmith · on Aug 31, 2020

You quoted the title, but the title isn't the complete conclusion, which is:

> As such, software developers should be wary of breaking their code into too small pieces, and actively avoid introducing very short (1-3 lines) functions when given the choice. At the very least unnecessary single-line functions (ie. excluding getters, setters etc.) should be all but banned.

I believe a x_to_y(x) fits into the "etc." of acceptable single-line functions.

neilwilson · on Aug 31, 2020

Does anybody do coupling and cohesion any more?

That was always the dividing line I was taught. A cohesive function you could name to increase the abstraction level and stop your brain overloading.

brundolf · on Aug 31, 2020

Something I didn't appreciate until several years into my career is that even decoupling has a cost. By adding a layer of abstraction, you're making a bet that the mental overhead of tracking that new idea is smaller than the mental overhead of tracking its implementation directly. You're introducing a new concept that the programmer didn't have to think about before, in hopes that it allows them to ignore a larger set of information some of the time. For tiny functions, especially if their concept isn't already familiar to the programmer, this may often not be a good tradeoff.

roughly · on Aug 31, 2020

I'm unconvinced.

Two notes - first, the experimental evidence seems to involve taking people unfamiliar with the codebase and asking them to debug or investigate code with either long or short functions. I can certainly conceive of it being easier to learn and debug a single long function than multiple smaller ones if one is unfamiliar with the codebase. I'd expect that effect to go away once one is asked to debug the same code again.

Second, the article asserts more bugs are found in shorter functions. However, this is only taking lines of code into account, not functional complexity. As code lives, bugs are found and corrected; often those bugs are to do with unforeseen runtime circumstances. Joel Spolsky covered this memorably a couple decades ago [1]. Again, taken as a static snapshot, I am not surprised that short code correlates with bugs, but I don't think that tells us any meaningful information about how to code. A codebase is not a static entity.

As pointed out by several people already, the point of short functions is abstraction and comprehensibility - we want one and only one place to talk about reading a file, one and only one place to talk about handling a particular message type. Similarly, we want to know when we look at a function that it does one and only one thing, and we want to know when we look at our code that what it says it's doing is what it's actually doing.

I think this is one of those studies similar to the "judges sentence harsher before lunch"[2] study - the type of effect discussed fails under any reasonable comparison to commonly-experienced reality. Consider your own personal experience working in codebases and how this article aligns with that - phenomenological reporting can indeed be suspect, but when it contrasts starkly with experimental results it often indicates poor study design.

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...

[2] http://nautil.us/blog/impossibly-hungry-judges

(I hope I can use this level of motivated reasoning to critique the next study I agree with, too.)

legulere · on Aug 31, 2020

> the point of short functions is abstraction and comprehensibility

If you have clean code style short functions you get nothing of that. There you split functions into more functions just for the sake of it. You end up with function names that describe the code worse than how the code describes itself, similar to comments beginner programmers make. Why is a comment "Increments i by one" for i++ not ok, while a function "incrementIByOne" is supposed to be good style?

Even if you don't make functions clean code-style ridiculous short, abstractions still come with a cost. You have to do weigh costs and benefits each time.

Someone · on Aug 31, 2020

“Why is a comment "Increments i by one" for i++ not ok, while a function "incrementIByOne" is supposed to be good style?”

It isn’t, but nextOrderNumber could be, especially if it is used in multiple places. That function could easily evolve into _doing_ “increment i by one, atomically”, “generate a random UUID”, “get a new unique ID from the database”, etc.

eesmith · on Aug 31, 2020

Several of the papers referenced, at least, use empirical evidence not based on the experimental procedure you described.

Eg https://www.microsoft.com/en-us/research/wp-content/uploads/... and https://www.inf.usi.ch/faculty/lanza/Downloads/DAmb2010c.pdf .

I think the author is not saying "more bugs are found in shorter functions" but that more defects are found in software modules with shorter functions. Hypothesized reasons for this include ""interface errors"" and ""it’s plausible that creating functions will always carry some [cognitive] overhead in itself"".

Even then, the only firm conclusion is for very small functions, at most a few lines long. Are those the same sorts of short functions you mean?

You write: "A codebase is not a static entity." And that's true. Which is why the cited paper https://www.inf.usi.ch/faculty/lanza/Downloads/DAmb2010c.pdf (for example) tracks repository changes for several open-source projects for several years, and correlates them with reported defects. That was used in the essay, eg, ""this is likely because we consider all bugs over the ~4 year period"".

My own history is that working with code bases which follow the 'functions should be only a few lines long' approach drive me batty because I can't remember all those function names which are only used once, and because many of those code bases treat instance methods as pseudo-global variables, and modify them will-nilly.

So yes, the conclusions given here agree with my personal experience, so suggest that it is a good summary of research which was reasonably well-defined.

Which is a rather different conclusion that yours, yes?

jake_morrison · on Aug 31, 2020

This post describes a process of decomposing code into very simple functions: http://www.gar1t.com/blog/solving-embarrassingly-obvious-pro...

This results in functions which are, I feel, a bit too small for comfort. On the other hand, Erlang's pattern matching on function parameters has an effect similar to Eiffel design by contract. It has the potential to reduce the number of tests that need to be written (many of which are often very short functions).

Ideomatic Erlang/Elixir code tends toward smaller functions and pattern matching on multiple function heads instead of if/then/else logic.

aaron-santos · on Aug 31, 2020

Is it difficult to believe that functions can be "too small"? I've found myself in both code bases with enormous 5kloc functions as well as code bases where I had to juggle 10 1loc functions in my head in order to figure out what the hell was going on. Both are their own forms of torture.

A 5loc/function sweetspot sounds about right.

christophilus · on Aug 31, 2020

Extremes are generally suboptimal, but the prevailing wisdom seems to be to have many small functions, so I’m all for a study that pushes us back to the middle.

awinter-py · on Aug 31, 2020

Hmmm I wonder if test coverage is a factor here

One long function gets run every time you change any piece of it, whereas a more complex call graph can be edited without testing every permutation

(yes there can be branches in the function too)

gumby · on Aug 31, 2020

I lop Ike this author:

> Time to scare the wikipedia editors among you and do some original research. All the code and data can be found at...

tcldr · on Sept 1, 2020

By this measure, Haskell is an abomination and Combinators are a 'code smell'. Not sure I agree.

zelphirkalt · on Aug 31, 2020

The heading is sort of misleading.

> To put it plainly, if we have a long function and split it into smaller ones, we’re not removing a source of defects, but we would simply be switching from one to another.

Of course ... But when you use short functions, you try to abstract things and make these functions usable in general cases. Who thinks, that simply splitting up a long function with zero changes fixes any bugs? Hopefully very few people. The only advantage in that case would be, that one can think in the context of that smaller function about what it is doing and that might help to find a bug _and then fix it_ by changing the logic. By then we have done already more than simply splitting up the long function though.

Long functions usually do more than one thing to achieve a more complex thing, which requires doing multiple steps. That makes them less reusable. They become too specific in what they achieve. If ones naming skills (those are very important) are not up to the task of naming long functions precisely, then perhaps one will forget, that this long function does something very specific inside to reach its goal. It will silently have lost composability. Other developers for sure will not know, unless they read all the code. I have seen atrocious 300+ loc functions, which serve exactly one purpose and interact with so many parts of the system, that I can read them 10 times and still do not get what they do in their entirety.

By not using small functions, one gives away readability as well. Every function is a chance to give a name to some short program. Short functions can be looked at separately, if written well. Of course, if you modify a lot of global state in your short function then no one can help you any longer.

> All of the studies measuring defect density found increased defect density for smaller functions. One possible explanation was suggested by [2], who proposed that the increase in errors was due to what they called “interface errors” – that is, “those that were associated with structures existing outside the module’s local environment but which the module used.” This would include errors such as calling the wrong function.

Well, that happens usually, when the naming is off or difficult to remember or follow. That is why naming things is an incredibly important skill. The name of a function should, if possible, give a good picture (by convention if there is any) of what the function is doing. Do not tell me, that the 100 loc function can be described in a single verb. It is most likely doing much more than one thing.

So in general I am not convinced by how the title and intro of the article put it. If we look at the article, the title is also actually different than here. It says "Very short [...]" not "short".

They would have to limit the scope of their study a lot, to make a good point. For example:

(1) procedural code / mainstream "every noun is a class" code / functional code

(2) the kind of expertise of coding people allowed to take part (Was the code written by capable people?)

(3) the programming languages they look at

It seems that this research might be biased by what code they looked at. Some programming communities are not as well known for keeping code clean as others. They are often associated with "only learned one programming language ever, never widened their horizon to a different methodology or paradigm". Those are mainly found in languages, which are very widely used and up at the top of the programming language usage lists like TIOBE. The reason is, that being at the top of the list is used by many as a justification to not needing to learn a different tool. Many people stop learning. Here I quote the article:

> However, nowadays the vast majority of functions are under 50 lines. A quick analysis of Eclipse, a popular open source IDE, reveals it averages about 8.6 lines per method in its source code.

Relating to an IDE, which is mainly used for Java.

> This shift in function sizes is perhaps partially due to changes in programming languages. In the 80s a Fortran “module” was commonly considered a function and some variables (see eg. https://www.tutorialspoint.com/fortran/fortran_modules.htm) and function was the basic building block of software, whereas nowadays most Java or C++ programmers would define “module” as a class consisting of multiple functions.

They do that, because for a long time their languages were so limited, that they did not have any other means of expressing a module. This also discouraged a style which does not use objects, but only functions. Add to that, that Java (and I think neither C++) does not have TCO and you have yet another reason, why Java or C++ code is not a good argument, when talking about short functions. Java got modules now, as far as I know. Not sure developers make proper use of them.

I would like to see the same research being done on a language like Haskell or Scheme, one language at a time.

tziki · on Aug 31, 2020

>So in general I am not convinced by how the title and intro of the article put it. If we look at the article, the title is also actually different than here. It says "Very short [...]" not "short".

The "very" word was automatically stripped by hackernews. I'm guessing some system to tune down clickbait posts?

>By not using small functions, one gives away readability as well. Every function is a chance to give a name to some short program. Short functions can be looked at separately, if written well. Of course, if you modify a lot of global state in your short function then no one can help you any longer.

I disagree about readability - I think the empirical experiments show enough evidence about short functions not being more readable than long functions, that we can't take that as granted.

For reusability, you might be right, and maybe that's why short functions did seem to perform better when modifying existing functionality.

>Well, that happens usually, when the naming is off or difficult to remember or follow. That is why naming things is an incredibly important skill. The name of a function should, if possible, give a good picture (by convention if there is any) of what the function is doing. Do not tell me, that the 100 loc function can be described in a single verb. It is most likely doing much more than one thing.

Sure, but naming things well is also a difficult skill, which means not everyone can do it well. If we look at programming as a worldwide phenomenon, the vast majority of programming is done in English. However, the majority of programmers aren't native speakers, meaning they face an even steeper climb to become good at naming things. There will always be cases where programmers misname functions, or don't update the name when changing functionality, or typo, and I think the data here shows that things like that do happen.

>I would like to see the same research being done on a language like Haskell or Scheme, one language at a time.

I do agree that all the research focuses on mainstream languages, and the results might be different for functional languages. However, the point of this post is to look at science and there's simply no science there. There aren't any bug datasets focusing on functional languages either.

The point of this post is to gather what we know about function length and what we don't. I feel you've pointed at many of the areas we don't know much about - and maybe we'll get there one day. However, for now, we need to look at what we know and what kind of generalizations we can make from what we currently know.

jghn · on Aug 31, 2020

> I think the empirical experiments show enough evidence about short functions not being more readable than long functions, that we can't take that as granted.

Pure anecdata on my part, my observation over the years is that developers tend to cluster on this cluster with high correlation to if they're a top down or bottom up thinker when it comes to coding in that top down devs tend towards smaller functions and bottom up devs tend towards larger functions.

If there is in any validity to my anecdata, my hypothesis has been that the top down/smaller function crowd is more likely to trust underlying implementation of sub-functions for one reason or another, while the bottom up/larger function crowd wants to be able to visually verify how things work.