The "hackaday" blog focuses on cool things that are typically (but not necessarily) impractical. He isn't suggesting that this "lambda" be used. Instead, this is a stealth blog-post about "__anon", as far as I can tell.
Which is really what hackaday is about: finding weird features in hardware/compilers/etc. and using them in some manner. There's a whole lot of obscure features of GCC that are being touched upon in this blogpost (nested functions, whatever is going on with $__anon$, etc.). I can't say that I can figure out exactly what is going on yet, but it's kind of exciting to see all of these features get used at once.
The "lambda$__anon$" identifier is just the name of the local function; it could just as well have been "elephant" or anything else. The first line defines the nested function:
{
double elephant (double x){ return x/3; }
And the second line references that same identifier:
&elephant;
}
Normally an expression statement that includes neither an assignment nor a function call is legal but doesn't do anything. But as the article mentions, GCC uses it as the return value of the block.
The commenters seem to have identified the undefined behaviour here: the resulting value is a pointer to a function that's only valid within the block but is being used outside it.
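For anyone who hasn't met statement expressions before, here is a minimal sketch of the rule described above, using plain integers instead of a function pointer. The function name last_value_demo is made up for illustration; the ({ ... }) syntax is a GNU C extension, so this assumes GCC or a compatible compiler.

```c
/* GNU C statement expression: the value of the last expression
   statement inside ({ ... }) becomes the value of the whole construct. */
static int last_value_demo(void)
{
    int result = ({
        int a = 2;
        int b = 3;
        a * b;          /* this value is what the block yields */
    });
    return result;
}
```

Here the value escaping the block is a plain int, so nothing dangles. The problem in the article's macro is that the escaping value is the address of something scoped to the block.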
What do the dollar signs do? If they're really just part of the identifier it doesn't seem necessary to make sure the compiler supports it, rather than just use a more normal name like elephant.
It does make a difference for some unknown reason. :)
> However, using the Linux system for Windows 10, the same code would seg fault. In fact, if you didn’t set the gcc -O2 option, the other examples would seg fault, too.
Strangely enough, it works with -O2. There's clearly some kind of undefined-behavior going on (that depends on the optimizer!), since the code doesn't work with -O0.
I'm not convinced that GCC defines the behavior of this, because the trick relies on defining a local function in a block scope, and then allowing it to escape from that block scope:
{
rettype foo(args ...) { ... }
foo;
}
GCC local functions are "downward funarg only", as far as I know. This would definitely be wrong:
{
int local = 42;
rettype foo(args ...) { ... reference local ... }
foo;
}
then, when foo is called, local no longer exists, which is bad news. The lambda macro doesn't do this (the block doesn't extend the environment; nothing is captured from there), and so it maybe works by fluke.
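For contrast, a rough sketch of the "downward" use that should be fine: the nested function is only called while the enclosing function's frame is still live. The names sum_mapped, sum_scaled and scale are invented for this example, and it assumes GCC, since nested functions are a GNU C extension.

```c
/* GNU C only: nested functions are a GCC extension. */

/* Ordinary helper that accepts a plain function pointer. */
static int sum_mapped(const int *values, int count, int (*fn)(int))
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += fn(values[i]);
    return total;
}

/* Hypothetical example function, not taken from the article. */
static int sum_scaled(const int *values, int count, int factor)
{
    /* Nested function that reads `factor` from the enclosing frame. */
    int scale(int x) { return x * factor; }

    /* Downward use: sum_scaled's frame outlives every call made through
       the pointer, so `factor` is still valid when scale runs. */
    return sum_mapped(values, count, scale);
}
```

The pointer never outlives sum_scaled, which is the opposite of what the lambda macro does with its block.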
Another thing to note is that pointers to GCC local functions work via trampolines: pieces of executable machine code installed into the stack. When you use GCC local functions, the linker has to mark the executable with a bit which says "allow stacks to be executable". The default in most distros is non-executable stacks, which guards against stack overflow exploits.
(Speaking of trampolines, I'm not sure about the effective scope of those. If we lift a pointer to a local function inside a block, requiring a trampoline, and then that block terminates, is that trampoline scoped to the block or the function? If it's scoped to the function, won't it be overwritten if we execute that logic multiple times? If the trampoline is scoped to the block, then the invocation of foo is using an out-of-scope trampoline.)
By the way, I compiled and ran the program (Ubuntu 18.04, x86_64) with various optimization options and whatnot, such as -fstack-protector. It runs cleanly under Valgrind.
So, this doesn't work because the scope of the statement-expression is the scope of the local function, so to use the function outside that scope (as TFA shows) is UB.
C w/ GCC's local functions extensions is just not enough for lambda expressions. You have to declare the local function earlier than (and in scope of) the use site.
may well assign 6.0 to x rather than 5.0 because the first lambda gets overwritten on the stack with the second. That's if it works at all -- after all, we have UB here, and this could just summon cthulhu or anything else.
There actually appears to be a `gcc` bug here: `gcc` doesn't warn if you return the address of a local function even though it's clearly bogus usage due to it being implemented via a trampoline on the stack.
Interesting note: some quick testing shows that if the local function doesn't require any variables from the outside scope, it will actually be stored in the `.text` segment, which would allow this to work in a defined way. That said, I view this as just an implementation detail that you can't rely on, as the docs don't mention this and only talk about trampolines. It's also super easy to mess up, obviously.
I still wonder why C still doesn't have lambdas implemented by the standard. I understand it's quite a slow-moving language, but it would make programming in it much nicer (see C++11).
Are there any underlying ‘issues’ with lambdas, I wonder?
There was an analysis of this and the C++11 lambda specification done shortly after at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1483.htm, but it was inconclusive and there doesn't seem to have been any followup since then.
> Are there anh underlying ‘issues’ with lambdas, I wonder?
A lambda object is intrinsically an object with an unnameable type and an overloaded call operator. C doesn't have any mechanisms for parameterizing function bodies over types, or even any mechanism for defining a variable without declaring its type (although one is suggested for C2x). Without such mechanisms, it's impossible to actually use a lambda.
Not really. It's true for any language where types have statically-shaped memory layouts, and where names are statically bound. Lambdas need to bind their names to locations; if this is to be done statically, then the bound names need to be packed into some reference type. The type of the bound name environment cannot be named, because it is dependent on what specifically the environment is naming. Languages that fall into this category would include C, C++, Java, C#, and Rust.
By contrast, languages such as Python or JavaScript that rely on dynamic name binding have implicit environment objects attached to their objects that allow different lambdas to share the same type, since the function body gets to acquire a map that it can ask for the bound names.
> the bound names need to packed into some reference type
Exactly, and that's how we refer to them. We already do the same for arrays–it would be overly pedantic to refuse to call an address/size pair an array because it doesn't actually contain the elements it refers to, just as statically-typed languages with lambdas are actually just passing pointers around. C could do this too–we'd just need sugar that would convert lightweight lambda syntax into code somewhere that would have the same type as a function pointer.
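To make the "code pointer plus packed environment" idea concrete, here is a hand-written sketch in standard C of what such sugar might expand to. Everything here (struct closure, struct adder_env, add_offset, invoke) is hypothetical naming for illustration, not an existing or proposed C feature.

```c
/* Environment for one particular "lambda": the names it has bound. */
struct adder_env {
    double offset;
};

/* Uniform closure type: a code pointer plus an opaque environment.
   Different lambdas share this type because the environment is erased
   to const void *. */
struct closure {
    double (*call)(const void *env, double x);
    const void *env;
};

/* Body of the "lambda": recovers its environment and uses it. */
static double add_offset(const void *env, double x)
{
    const struct adder_env *e = env;
    return x + e->offset;
}

/* Calling a closure means passing its own environment back to it. */
static double invoke(struct closure c, double x)
{
    return c.call(c.env, x);
}
```

A caller would build it as `struct adder_env env = { 1.5 }; struct closure c = { add_offset, &env };` and then call `invoke(c, 2.0)`. The lifetime of env is still the caller's problem, which is exactly the upward-funarg issue discussed earlier in the thread.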
Perhaps because Go does this properly, where by properly I mean that you can have both stack and heap allocations of scope-local variables, depending upon their needed availability for functions within the scope and the dissociation of their lifetimes from the scope? In particular, the Go compiler performs escape analysis to determine where variables must be allocated, in honor of upward funargs.
Go also has no way to catch memory allocation failure. Many things become easier when you can pretend memory is infinite, especially from a language design standpoint.
As C++ has been contemplating moving away from exceptions, they've effectively been forced to choose between concise language abstractions or strict memory management. They seem to be moving toward the former, which is to say C++ may soon begin to behave like Go, Perl, and other high-level languages--OOM will simply crash your application.
> As C++ has been contemplating moving away from exceptions, they've effectively been forced to choose between concise language abstractions or strict memory management. They seem to be moving toward the former
You don't need exception handling to handle memory allocation failure: just make your allocator return null (that's what new (nothrow) X does, after all). The question of how to handle allocation failure is easily the single most divisive question presented in the proposal, and in the face of stark division, status quo usually wins the day.
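In C terms this is just the usual malloc contract. A small sketch, with copy_string as a made-up helper name, of reporting allocation failure through the return value rather than an exception:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL if allocation fails.
   The caller owns the result and must free() it. */
static char *copy_string(const char *src)
{
    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *dst = malloc(len);
    if (dst == NULL)
        return NULL;                /* failure is reported, not thrown */
    memcpy(dst, src, len);
    return dst;
}
```

Every caller then has to check for NULL, which is the verbosity cost that the concise-abstraction side of the argument objects to.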
One: Compiler development is driven by the C++ standards committee. And they all hate C and wish it would die already. More to the point, the things you would do to make C a better, more powerful language are orthogonal to the direction C++ is being pushed.
Two: Being tied to C++ also means being tied to the same ABI as C++. And improvements to the C language probably would need some extensions to the ABI.
Three: I can't wrap my head around this but a lot of people are extremely hostile to attempts to extend and improve C.
> One: Compiler development is driven by the C++ standards committee. And they all hate C and wish it would die already.
The latter statement is not true. But it is true that most of the evolution of C/C++ is driven by the C++ committee, with the C committee mostly adapting features from C++ and very little innovation in C being adapted for C++. (As one C++ committee member confided to me, the C committee does have a bit of a tendency to completely screw things up when the C++ committee liaisons leave the room). But there is still coordination and cooperation between the committees--for example, the recent proposals to replace the current EH model in C++ include a coordinating proposal to modify the C ABI to provide access to a Result-esque exception model.
> Two: Being tied to C++ also means being tied to the same ABI as C++. And improvements to the C language probably would need some extensions to the ABI.
The C ABI desperately needs extensions anyways, especially because it is the de facto platform ABI and languages usually only support FFI features using the C ABI. The biggest missing features here are SIMD vector support and multiple return value support.
I apologize for the slight against the C++ standard people.
I do like your comment about the ABI needing to be extended to improve FFI features. I feel that way too. Also think that a clean (non clunky) method for FFI is exactly what C has needed for a long time.
Regarding three, I believe it's because C ultimately aims to be the most low-level, high-level abstraction of machine code.
Stray too far from that, and you're already in C++ territory.
Except for multi-core/-threading support, I can't really think of anything that has changed in the past 25 years to add to this, in my opinion, essentially near-perfect language.
Every single SYSV x64 ABI platform supports at least two uint64_t return values, which are the registers rdi and rax (Actually it might be rbx, I haven't done assembly for a while). So C is behind the curve.
C is considered to be "High level assembly", up until it isn't. To be honest, YASM and other assemblers do "high level assembly" much better.
C is a simple, flexible low-level language. There is a feeling about C that I do not get from other languages, a specific way of thinking and a trend towards simplicity of feature (ignoring GNU) that other languages (including Rust) do not encourage or notice. It would be nice to see it importing some of the type semantics of ML and OCaml.
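On the multiple-return-value point: the closest thing C has today is returning a small struct by value. A sketch with made-up names (struct u64_pair, divmod_u64); as far as I understand the System V AMD64 ABI, a 16-byte struct of two integers like this comes back in the rax and rdx register pair rather than through memory, but that is a property of the ABI and not something the C language promises.

```c
#include <stdint.h>

/* Hypothetical pair type: two 64-bit results returned by value. */
struct u64_pair {
    uint64_t quotient;
    uint64_t remainder;
};

/* Returns both results of an unsigned division in one struct.
   The caller must ensure d is nonzero. */
static struct u64_pair divmod_u64(uint64_t n, uint64_t d)
{
    struct u64_pair r = { n / d, n % d };
    return r;
}
```

So the registers are there, but the language only reaches them indirectly through a named struct type.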
> Every single SYSV x64 ABI platform supports at least two uint64_t return values, which are the registers rdi and rax (Actually it might be rbx, I haven't done assembly for a while).
Which is really what hackaday is about: finding weird features in hardware/compilers/etc. and using them in some manner. There's a whole lot of obscure features of GCC that are being touched upon in this blogpost (nested functions, whatever is going on with $__anon$, etc.). I can't say that I can figure out exactly what is going on yet, but it's kind of exciting to see all of these features get used at once.
https://github.com/wd5gnr/clambda/blob/master/clambda2.c
EDIT: Unfortunately, it just segfaults for me at the moment.
This is Ubuntu on Windows, but I doubt that would make a difference.