But can't the compiler work that bit out? Source code is for humans. Put all the...

atilaneves · on June 5, 2020

> Source code is for humans

The reason why making the scope as local as possible is important is for humans, not compilers.

> Put all the vars at the top to tell the humans "here's all the scratch space I'll be needing in this block"

Why would humans care about how much scratch space is needed? That's for the compiler to know.

jagged-chisel · on June 5, 2020

We're not talking about scope. The scope is already decided. "Vars at the top of the block" doesn't mean "take those extra vars out of the while{} scope and elevate them to the next enclosing scope."

You've taken my "scratch space" too literally. Very few people need to count bytes for local vars. I'm talking about future maintainers reading and understanding code. Grouping the current block's variables at the top says nothing about how the compiler might organize the resulting code and storage. But it does inform future readers of the code.
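
A minimal sketch of the distinction being drawn here, using a hypothetical `count_words` function (not from the thread). In C89 style the declarations sit at the top of the `while` body itself, so they belong to the loop's block rather than being hoisted to the function's outer scope.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count whitespace-separated words.
   C89 style: declarations grouped at the top of each block. */
int count_words(const char *s)
{
    int words = 0;

    assert(s != NULL);
    while (*s != '\0') {
        /* Scoped to the while body only; NOT visible in the
           function's outer block. */
        int is_space;
        int next_is_space;

        is_space = isspace((unsigned char)s[0]);
        next_is_space = (s[1] == '\0') || isspace((unsigned char)s[1]);
        if (!is_space && next_is_space) {
            words++;   /* this character ends a word */
        }
        s++;
    }
    return words;
}
```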

nybble41 · on June 8, 2020

The scope of a variable is from where it is declared to the end of the block. Moving a variable to the top of the enclosing block means that it can be referenced from more places in the code, which increases its scope.
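
A small sketch of that point, with a hypothetical `reverse_ints` function: `tmp` declared inside the loop body cannot be referenced outside a single iteration, whereas hoisting it to the top of the function would leave it (and its stale value) in scope after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: reverse an int array in place. */
void reverse_ints(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n / 2; i++) {
        /* tmp exists only for one iteration of this loop body.
           Declared at the top of the function instead, it would stay
           in scope, holding a stale value, after the loop. */
        const int tmp = a[i];
        a[i] = a[n - 1 - i];
        a[n - 1 - i] = tmp;
    }
    /* tmp is not in scope here; referencing it is a compile error. */
}
```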

Warnings about uninitialized variables help, but don't catch everything. For example, you don't usually get a warning for passing the address of an uninitialized variable to an external function (since it might be an output parameter), but that would be undefined behavior if the function expects the variable to be initialized. Initializing variables at the point where they are declared ensures that they can't be referenced at all in an uninitialized state.
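
A minimal sketch of the hazard, assuming a hypothetical `add_sample` function that treats its pointer argument as an in/out parameter (it reads the old value before updating it). When such a function lives in another translation unit the compiler cannot see that, so initialising at the declaration is the reliable fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: *acc is an in/out parameter, so its current
   value is READ before being updated. */
void add_sample(int *acc, int sample)
{
    assert(acc != NULL);
    *acc += sample;
}

int sum_samples(const int *samples, size_t n)
{
    /* Risky old style, shown only as a comment:
           int total;
           ...
           add_sample(&total, samples[0]);  // reads an indeterminate value
       Declaring and initialising at first use rules that out. */
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        add_sample(&total, samples[i]);
    }
    return total;
}
```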

Rust has a slightly more nuanced (and IMHO superior) system: non-mutable ("const") variables can be assigned exactly once, possibly but not necessarily at the point where they are declared, and all variables must be initialized before use, including passing references to other functions. This permits more flexibility in how the code is arranged while simultaneously offering stronger guarantees against undefined or otherwise erroneous behavior.

cesarb · on June 5, 2020

> Why would humans care about how much scratch space is needed?

In some contexts, it's important. For instance, each thread within the Linux kernel has a very limited fixed-size stack (it used to be 4K bytes; IIRC it's been increased to 8K and then 16K), which resides in physical memory (it cannot be swapped out or lazily allocated). Avoiding large stack frames is necessary.

atilaneves · on June 5, 2020

1. That's a rare use case. 2. There are ways of measuring that without hampering readability. I mean, are programmers supposed to add up all the variables' sizes in their heads??

MaxBarraclough · on June 5, 2020

That wouldn't work anyway. You could rewrite code to use fewer variables without changing the resulting assembly.

    i = get_thing();
    j = do_stuff(i);

vs

    j = do_stuff(get_thing());
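
A compilable version of the snippet above, with hypothetical stand-ins for `get_thing` and `do_stuff`. Both forms compute the same value, and with optimisation enabled a compiler will typically emit the same code for each, so the number of declared variables says little about real stack use.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the functions in the snippet. */
int get_thing(void) { return 20; }

int do_stuff(int x)
{
    assert(x < INT_MAX);   /* keep x + 1 free of signed overflow */
    return x + 1;
}

/* Form 1: named temporary i. */
int with_temp(void)
{
    const int i = get_thing();
    const int j = do_stuff(i);
    return j;
}

/* Form 2: one fewer declared variable, same computation. */
int without_temp(void)
{
    const int j = do_stuff(get_thing());
    return j;
}
```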

Someone · on June 5, 2020

Also, unless your thread has only a single function that has local variables or arguments, and that function doesn’t contain sub-blocks that declare variables (say, inside a while block), putting all variable declarations at the top of the block doesn’t help much in gauging a thread’s stack space usage.

Actually, not even that helps: you would also have to know how much stack a function call takes (which might be non-trivial in the presence of stack alignment rules) and which functions get inlined. Edit: if you declare all your locals at the start of a function, chances are the compiler will check whether it can make some of them share memory, so you’d have to take that into account, too.
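
A sketch of the memory-sharing effect, using a hypothetical `mix_bytes` function: the two buffers live in sibling blocks whose lifetimes never overlap, so a compiler is free to place them in the same stack slot. Whether it actually does depends on the compiler and optimisation level, which is why adding up declared sizes by eye is unreliable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example. buf_a and buf_b are never alive at the same
   time, so the compiler MAY overlap their storage; summing their sizes
   (2 * 256 bytes) can overestimate the real frame size. */
unsigned mix_bytes(const char *s)
{
    unsigned sum = 0;
    unsigned x = 0;

    assert(s != NULL);
    {   /* first scratch block */
        char buf_a[256];
        strncpy(buf_a, s, sizeof buf_a - 1);
        buf_a[sizeof buf_a - 1] = '\0';
        for (size_t i = 0; buf_a[i] != '\0'; i++)
            sum += (unsigned char)buf_a[i];
    }
    {   /* second scratch block: buf_a is already dead here */
        char buf_b[256];
        strncpy(buf_b, s, sizeof buf_b - 1);
        buf_b[sizeof buf_b - 1] = '\0';
        for (size_t i = 0; buf_b[i] != '\0'; i++)
            x ^= (unsigned char)buf_b[i];
    }
    return sum + x;
}
```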

If you’re concerned about stack overflows in your threading code, tooling is what you need, not manually counting stack usage.
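
As one example of such tooling, GCC can report per-function stack usage with `-fstack-usage` (written to a `.su` file next to the object file) and warn about oversized frames with `-Wframe-larger-than=N`; the Linux kernel's `CONFIG_FRAME_WARN` option sets the latter. The function below is a hypothetical test case with a deliberately large frame; the file name `frame.c` is also an assumption.

```c
/* Compile with, for example:
       gcc -c -fstack-usage -Wframe-larger-than=1024 frame.c
   -fstack-usage writes frame.su listing each function's frame size.
   At -O0 the buffer below stays on the stack, so the warning should
   fire for big_frame(); an optimiser might shrink the frame. */
#include <assert.h>
#include <string.h>

/* Hypothetical function with a deliberately large (4 KiB) frame. */
int big_frame(int seed)
{
    unsigned char scratch[4096];
    int acc = 0;

    assert(seed >= 0);
    memset(scratch, seed & 0xff, sizeof scratch);
    for (size_t i = 0; i < sizeof scratch; i += 1024)
        acc += scratch[i];   /* sums 4 sampled bytes */
    return acc;
}
```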

MaxBarraclough · on June 7, 2020

> if you declare all your locals at the start of a function, chances are the compiler will check whether it can make some of them share memory, so you’d have to take that into account, too.

And that's ignoring registers. Not every local ever needs to reside in memory.

MaxBarraclough · on June 7, 2020

> can't the compiler work that bit out

Technically no, not in all cases, due to the halting problem. In practical terms, read-before-write issues do happen in real C code, so it makes sense to take steps to avoid them. (Languages like Java force the programmer to write code where the compiler can guarantee the absence of read-before-write errors, sometimes just synthesising an assignment of zero, but it's still possible the programmer will assign a dummy value and accidentally end up using it.)
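
A minimal illustration of a read-before-write, using a hypothetical `sign_of` function. The old-style version assigns the result on only some paths; computing the value in the declaration means there is no path on which an unassigned local can be read.

```c
#include <assert.h>

/* Old style, shown only as a comment because it is buggy:

       int sign_of_old(int x)
       {
           int result;
           if (x > 0)
               result = 1;
           else if (x < 0)
               result = -1;
           return result;    // x == 0: read-before-write
       }
*/

/* New style: the value is computed where it is declared, so every
   path through the function sees an initialised result. */
int sign_of(int x)
{
    const int result = x > 0 ? 1 : (x < 0 ? -1 : 0);
    return result;
}
```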

> Source code is for humans.

Yes, that's precisely my point. It's about making the code readable and easy for a programmer to reason about. It's unlikely there will be any performance impact either way; decent compilers should be good at lifetime analysis and register allocation.

It's more readable to declare a short-lived local on its first use. This makes its precise type more apparent, as you don't need to scroll up to its declaration. This is particularly important in C, where using the wrong type can have especially nasty consequences.
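
One classic example of a wrong type biting in C, sketched with a hypothetical `last_index_of` function: counting down with an unsigned index. With the declaration right next to the loop, the `size_t` type is in plain view where the loop condition is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: index of the last occurrence of c in s,
   or -1 if absent. */
ptrdiff_t last_index_of(const char *s, char c)
{
    assert(s != NULL);
    /* If i were a size_t declared far above, a loop like
           for (i = len - 1; i >= 0; i--)
       would not stop at zero: an unsigned i >= 0 is always true, so i
       wraps around and indexes out of bounds. Declaring the index in
       the loop keeps its type next to the condition. */
    const size_t len = strlen(s);
    for (size_t i = len; i-- > 0; ) {
        if (s[i] == c)
            return (ptrdiff_t)i;
    }
    return -1;
}
```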

The new style also makes it immediately clear over what scope the variable is relevant, as the local does not exist in scope until it is declared and assigned. That is to say, it only exists when it should. I expand on this in my other comment in this thread.

Related to this, the new style helps prevent undefined behaviour by making it less likely you'll accidentally introduce a read-before-write. Again, those errors do happen in real production code. It's the kind of error static analysers pick up in long-trusted codebases.

The old style makes your code less dense, artificially increasing the number of lines in a function.

The new style also enables you to use const, which of course requires assignment at the point of declaration. If you use const with your locals, you do not have to scan the code to determine if the local is modified later on, you know at a glance that it will not be. This lets you reason about values, rather than the current state of a local. If you can access the local, you know it holds the right value. [0]
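
A short sketch of const locals in C, with a hypothetical `compute_fee` function. Since a const local must receive its value in the declaration, a conditional value is written as an expression (here a ternary) rather than assigned later in an if/else; this is the flexibility the Rust approach described earlier in the thread adds.

```c
#include <assert.h>

/* Hypothetical example: fee in whole currency units. */
int compute_fee(int amount, int is_member)
{
    assert(amount >= 0);
    const int rate_percent = is_member ? 5 : 10;
    const int fee = amount * rate_percent / 100;
    /* Neither rate_percent nor fee can be reassigned below this
       point; the compiler rejects any attempt to do so. */
    return fee;
}
```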

If it turns out the lifetime of a local needs to be broadened, you can move the declaration up to a broader scope, but in my experience this is surprisingly rare.
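
A typical case where the broader scope is genuinely needed, sketched with a hypothetical `find_first` function: the loop index is used after the loop, so its declaration moves up out of the `for` statement into the enclosing block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: index of the first element equal to target,
   or n if there is none. */
size_t find_first(const int *a, size_t n, int target)
{
    assert(a != NULL || n == 0);
    size_t i = 0;                 /* broadened scope: used after loop */
    for (; i < n; i++) {
        if (a[i] == target)
            break;
    }
    return i;
}
```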

It's not exactly relevant, but in C++, with RAII, you don't really have a choice, and you pretty much must use the new style rather than the old-school C style. But that doesn't tell us much here. In a similar vein, Java and C# programmers could use the old-school declare-at-the-top style, but none of them ever do.

It's just a style that used to be necessary in old versions of the C standard, which people got accustomed to. For what it's worth, the Linux kernel seems to use both styles. [1] [2]

> here's all the scratch space I'll be needing in this block

For the reasons I've given above, I don't think this is a good way to approach locals. It makes sense to leverage scope and constness to improve readability, not to just introduce a free-form set of uninitialised locals with overly broad lifetimes. That approach opens the door to avoidable bugs, and needlessly burdens the reader with having to scan the code to determine basic properties of the locals (which they may then get wrong).

[0] https://www.infoq.com/presentations/Value-Values/

[1] https://github.com/torvalds/linux/blob/master/init/do_mounts...

[2] https://github.com/torvalds/linux/blob/master/kernel/sched/c...