Lambdas for C – sort of

dragontamer · on Sept 11, 2019

The Hackaday blog focuses on cool things that are typically (but not necessarily) impractical. The author isn't suggesting that this "lambda" actually be used. Instead, as far as I can tell, this is a stealth blog post about "__anon".

Which is really what Hackaday is about: finding weird features in hardware, compilers, etc. and using them in some manner. This blog post touches on a whole lot of obscure GCC features (nested functions, whatever is going on with $__anon$, etc.). I can't say that I've figured out exactly what is going on yet, but it's kind of exciting to see all of these features used at once.

https://github.com/wd5gnr/clambda/blob/master/clambda2.c

EDIT: Unfortunately, it just segfaults for me at the moment.

    $ gcc --std=gnu99 clambda2.c
    $ ./a.out
    Segmentation fault (core dumped)
    $ gcc --version
    gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

This is Ubuntu on Windows, but I doubt that would make a difference.

quietbritishjim · on Sept 11, 2019

> whatever is going on with $__anon

The "lambda$__anon$" identifier is just the name of the local function; it could just as well have been "elephant" or anything else. The first line defines the nested function:

    ({
        double elephant (double x){ return x/3; }

And the second line references that same identifier:

        &elephant;
    })

Normally an expression statement that contains neither an assignment nor a function call is legal but does nothing. But, as the article mentions, inside a GCC statement expression the last such statement supplies the value of the whole block.
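A minimal sketch of that rule, assuming GCC or Clang (both accept statement expressions as a GNU extension; this is not standard C). The function name `scaled_third` is hypothetical, chosen only for illustration:

```c
/* scaled_third is a hypothetical name used only for this illustration.
   Inside ({ ... }), the final expression statement supplies the value
   of the whole block (a GNU C statement expression). */
static double scaled_third(double x)
{
    double y = ({
        double t = x / 3;  /* ordinary declaration scoped to the block */
        t * 2;             /* last expression: becomes the block's value */
    });
    return y;
}
```

Here `scaled_third(9.0)` evaluates to 6.0. The lambda macro relies on the same rule, except that its last expression is the address of a function whose scope ends with the block.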

The commenters seem to have identified the undefined behaviour here: the resulting value is a pointer to a function that's only valid within the block but is being used outside it.

ramshorns · on Sept 11, 2019

What do the dollar signs do? If they're really just part of the identifier, it seems unnecessary to depend on the compiler supporting them instead of just using a more normal name like elephant.

gpderetta · on Sept 12, 2019

It is just a way to uglify symbols to make collisions with surrounding code less likely. C macros are not hygienic.
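To make the hygiene problem concrete, here is a hedged sketch of a swap macro whose temporary collides with a caller's variable, next to an "uglified" version. Both macro names and `swap_demo` are hypothetical, and the `$` in `swap$tmp$` relies on GCC and Clang accepting `$` in identifiers, which is an extension rather than standard C:

```c
/* Hypothetical naive swap: if the caller has a variable named tmp,
   SWAP_NAIVE(tmp, x) expands to { int tmp = (tmp); ... }, where the
   inner tmp shadows the caller's and is initialized from itself. */
#define SWAP_NAIVE(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

/* Hypothetical "uglified" swap: a temporary name no caller is likely
   to use (the $ is a GCC/Clang extension, as in lambda$__anon$). */
#define SWAP(a, b) do { int swap$tmp$ = (a); (a) = (b); (b) = swap$tmp$; } while (0)

/* Hypothetical demo: swaps a caller variable that happens to be named tmp. */
static void swap_demo(int *out_tmp, int *out_x)
{
    int tmp = 1, x = 2;
    SWAP(tmp, x);  /* safe: no clash with the caller's tmp */
    *out_tmp = tmp;
    *out_x = x;
}
```

After `swap_demo`, the caller's `tmp` holds 2 and `x` holds 1; the naive macro would have produced garbage for the same call.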

archgoon · on Sept 11, 2019

It does make a difference for some unknown reason. :)

> However, using the Linux system for Windows 10, the same code would seg fault. In fact, if you didn’t set the gcc -O2 option, the other examples would seg fault, too.

dragontamer · on Sept 11, 2019

Strangely enough, it works with -O2. There's clearly some kind of undefined behavior going on (one that depends on the optimizer!), since the code doesn't work with -O0.

johnisgood · on Sept 11, 2019

It works for me. I am using Linux.

    $ gcc clambda2.c && ./a.out
    Running sum=6.282000
    Running sum=18.322001
    Running sum=102.321999
    Running sum=103.722000
    25.930500
    Running sum=1.047000
    Running sum=3.053667
    Running sum=17.053667
    Running sum=17.287001
    4.321750
    $ gcc --version
    gcc (GCC) 9.1.0
    [...]

kazinator · on Sept 11, 2019

I'm not convinced that GCC defines the behavior of this, because the trick relies on defining a local function in a block scope and then allowing it to escape from that block scope:

   ({
     double foo(double x) { return x / 3; }
     foo;
   })

GCC local functions are "downward funarg only", as far as I know. This would definitely be wrong:

   ({
     int local = 42;
     int foo(int x) { return x + local; }  /* refers to local */
     foo;
   })

Then, when foo is called, local no longer exists, which is bad news. The lambda macro doesn't do this (the block doesn't extend the environment; nothing is captured from it), so it maybe works by fluke.
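For contrast, a sketch of the "downward funarg" usage GCC does support, assuming GCC specifically (Clang rejects nested functions). The nested function reads a variable of its enclosing function and is only called while that function is still running; because its address is taken and it needs `threshold`, GCC will typically build a trampoline for it on the stack. `apply_count`, `count_above`, and `is_above` are hypothetical names:

```c
#include <stddef.h>

/* Hypothetical helper: counts the elements for which pred returns nonzero. */
static size_t apply_count(const int *xs, size_t n, int (*pred)(int))
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (pred(xs[i]))
            count++;
    return count;
}

/* Hypothetical caller using a GNU C nested function. is_above captures
   threshold from this frame; passing it *down* into apply_count is fine
   because count_above's frame is still live for the whole call. */
static size_t count_above(const int *xs, size_t n, int threshold)
{
    int is_above(int x) { return x > threshold; }
    return apply_count(xs, n, is_above);
}
```

Returning `is_above` from `count_above` instead would be the upward case described above, where `threshold` and the trampoline are gone by the time the pointer is used.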

Another thing to note is that pointers to GCC local functions work via trampolines: pieces of executable machine code installed on the stack. When you use GCC local functions, the linker has to mark the executable with a bit that says "allow stacks to be executable". The default in most distros is non-executable stacks, which guards against stack overflow exploits.

(Speaking of trampolines, I'm not sure about their effective scope. If we lift a pointer to a local function inside a block, requiring a trampoline, and then that block terminates, is the trampoline scoped to the block or to the function? If it's scoped to the function, won't it be overwritten if we execute that logic multiple times? If it's scoped to the block, then the invocation of foo is using an out-of-scope trampoline.)

ndesaulniers · on Sept 11, 2019

There are quite a few GNU C extensions with unspecified behavior for edge cases. Source: have implemented and debugged/fixed some in Clang.

kazinator · on Sept 11, 2019

By the way, I compiled and ran the program (Ubuntu 18.04, x86_64) with various optimization options and whatnot, such as -fstack-protector. It runs cleanly under Valgrind.

iforgotpassword · on Sept 12, 2019

Valgrind is pretty bad at detecting stack corruption, or at least was a couple years ago. Did you try -fsanitize=address too?

cryptonector · on Sept 11, 2019

So this doesn't work, because the scope of the local function is the statement expression, and using the function outside that scope (as TFA does) is UB.

C with GCC's local-function extension is just not enough for lambda expressions. You have to declare the local function earlier than (and in scope of) the use site.

For example, an expression like this:

  float x = add_fns(1,
                    lambda(float,(float x),{ return 2*x; }),
                    lambda(float,(float x),{ return 3*x; }));

may well assign 6.0 to x rather than 5.0 because the first lambda gets overwritten on the stack with the second. That's if it works at all -- after all, we have UB here, and this could just summon Cthulhu or anything else.
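A minimal sketch of the declare-first alternative, assuming GCC nested functions: both helpers are declared up front in the scope where `add_fns` is called, so their pointers stay valid for the whole call. The thread never defines `add_fns`; the version here (apply each function to the first argument and sum the results, using `double` instead of `float`) is a hypothetical stand-in, as are the other names:

```c
/* Hypothetical add_fns: evaluates f(x) + g(x). */
static double add_fns(double x, double (*f)(double), double (*g)(double))
{
    return f(x) + g(x);
}

/* Hypothetical caller: the nested functions live in the same scope as
   the call, so nothing escapes and nothing is overwritten mid-call. */
static double sum_of_lambdas(void)
{
    double twice(double x) { return 2 * x; }
    double thrice(double x) { return 3 * x; }
    return add_fns(1, twice, thrice);  /* 2*1 + 3*1 */
}
```

With this arrangement the result is 5.0, the value the lambda version was supposed to produce.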

DSMan195276 · on Sept 11, 2019

There actually appears to be a `gcc` bug here: `gcc` doesn't warn if you return the address of a local function, even though that is clearly bogus usage, since it is implemented via a trampoline on the stack.

Interesting note: some quick testing shows that if the local function doesn't use any variables from the enclosing scope, it is actually stored in the `.text` segment, which would allow this to work in a defined way. That said, I view this as just an implementation detail that you can't rely on, since the docs don't mention it and only talk about trampolines. It's also super easy to mess up, obviously.

cryptonector · on Sept 11, 2019

Good points all around.

pjmlp · on Sept 11, 2019

Apparently the author forgot to look into Clang's blocks language extension.

https://clang.llvm.org/docs/BlockLanguageSpec.html

basementcat · on Sept 11, 2019

There are a variety of ways to use lambdas in C, each uniquely horrifying.

https://codegolf.stackexchange.com/questions/2203/tips-for-g...

eyegor · on Sept 12, 2019

Hmm, I wonder if you couldn't wrap all those horrible approaches in a unified header macro with

  #ifdef __GNUC__
  #ifdef __clang__
  etc.
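A hedged sketch of the dispatch part of such a header. One detail matters for the ordering: Clang also defines `__GNUC__`, so a GCC-only branch has to exclude `__clang__` explicitly, and Clang's blocks are only available when `-fblocks` is on (which defines `__BLOCKS__`). `LAMBDA_BACKEND` and `lambda_backend_available` are hypothetical names:

```c
#include <string.h>

/* Hypothetical backend selection. Clang defines __GNUC__ too, so the
   GCC nested-function branch must rule out Clang explicitly. */
#if defined(__clang__) && defined(__BLOCKS__)
#  define LAMBDA_BACKEND "clang-blocks"
#elif defined(__GNUC__) && !defined(__clang__)
#  define LAMBDA_BACKEND "gcc-nested"
#else
#  define LAMBDA_BACKEND "none"
#endif

/* Hypothetical query: 1 if some lambda-ish extension was detected. */
static int lambda_backend_available(void)
{
    return strcmp(LAMBDA_BACKEND, "none") != 0;
}
```

The actual lambda macro would then be defined differently in each branch; this only shows the detection step.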

saagarjha · on Sept 11, 2019

The "GCC specific" one mentioned is the same as the technique mentioned here.

mpfundstein · on Sept 11, 2019

I still wonder why C doesn't have lambdas in the standard. I understand it's quite a slow-moving language, but they would make programming in it much nicer (see C++11).

Are there any underlying ‘issues’ with lambdas, I wonder?

0x09 · on Sept 12, 2019

Apple did submit a proposal based on the then-new blocks extension in 2010: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1451.pdf

There was an analysis of this and the C++11 lambda specification done shortly after at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1483.htm, but it was inconclusive and there doesn't seem to have been any followup since then.

jcranmer · on Sept 11, 2019

> Are there any underlying ‘issues’ with lambdas, I wonder?

A lambda object is intrinsically an object with an unnameable type and an overloaded call operator. C doesn't have any mechanisms for parameterizing function bodies over types, or even any mechanism for defining a variable without declaring its type (although one is suggested for C2x). Without such mechanisms, it's impossible to actually use a lambda.
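For a sense of what a "define a variable without declaring its type" mechanism buys, here is a sketch using GNU C's `__auto_type` (supported by GCC and by newer Clang releases; not standard C) together with a statement expression, to build a type-generic maximum that evaluates each argument exactly once. `MAX_ONCE` and `next_counter` are hypothetical names:

```c
/* Hypothetical macro: __auto_type infers each temporary's type from its
   initializer, so this works for int, double, etc., and a and b are
   each evaluated exactly once. */
#define MAX_ONCE(a, b) ({                                     \
    __auto_type max_once_a_ = (a);                            \
    __auto_type max_once_b_ = (b);                            \
    max_once_a_ > max_once_b_ ? max_once_a_ : max_once_b_;    \
})

/* Hypothetical helper with a side effect, to show single evaluation. */
static int next_counter(int *c) { return ++*c; }
```

With `int c = 0;`, `MAX_ONCE(next_counter(&c), 0)` yields 1 and leaves `c` at 1, whereas a classic `((a) > (b) ? (a) : (b))` macro would call `next_counter` twice.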

saagarjha · on Sept 11, 2019

> A lambda object is intrinsically an object with an unnameable type and an overloaded call operator.

This sounds like a really restrictive, C++-centric way of defining lambdas…

jcranmer · on Sept 11, 2019

Not really. It's true for any language where types have statically-shaped memory layouts, and where names are statically bound. Lambdas need to bind their names to locations; if this is to be done statically, then the bound names need to be packed into some reference type. The type of the bound name environment cannot be named, because it is dependent on what specifically the environment is naming. Languages that fall into this category would include C, C++, Java, C#, and Rust.
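In plain C, the usual workaround is to do that packing by hand: a function pointer plus a `void *` pointing at an environment struct. The `void *` erases the environment's otherwise unnameable type, at the cost of type safety. A sketch; `closure`, `adder_env`, `adder_fn`, and `map_in_place` are all hypothetical names:

```c
/* Hypothetical hand-rolled closure: code pointer + type-erased environment. */
struct closure {
    int (*fn)(void *env, int arg);
    void *env;
};

static int closure_call(struct closure c, int arg)
{
    return c.fn(c.env, arg);
}

/* Environment for one particular "lambda": it captures n. */
struct adder_env { int n; };

static int adder_fn(void *env, int arg)
{
    const struct adder_env *e = env;  /* recover the concrete type */
    return arg + e->n;
}

/* Applies c to each element in place. This is a downward use: the
   caller's environment outlives the call. */
static void map_in_place(int *xs, int len, struct closure c)
{
    for (int i = 0; i < len; i++)
        xs[i] = closure_call(c, xs[i]);
}
```

Every "lambda" built this way shares the single type `struct closure`; each environment's real type stays private to the function that unpacks it.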

By contrast, languages such as Python or JavaScript that rely on dynamic name binding have implicit environment objects attached to their objects that allow different lambdas to share the same type, since the function body gets to acquire a map that it can ask for the bound names.

saagarjha · on Sept 11, 2019

> the bound names need to be packed into some reference type

Exactly, and that's how we refer to them. We already do the same for arrays: it would be overly pedantic to refuse to call an address/size pair an array because it doesn't actually contain the elements it refers to, just as statically typed languages with lambdas are actually just passing pointers around. C could do this too; we'd just need sugar that converts lightweight lambda syntax into code somewhere that has the same type as a function pointer.

jjtheblunt · on Sept 11, 2019

Perhaps because Go does this properly? By properly I mean that scope-local variables can be allocated on either the stack or the heap, depending on whether functions within the scope need them and whether their lifetimes must be dissociated from the scope. In particular, the Go compiler performs escape analysis to determine where variables must be allocated, in honor of upward funargs.
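What Go's escape analysis decides automatically has to be spelled out by hand in C: if a closure must outlive the function that creates it (an upward funarg), its captured state has to live on the heap. A minimal sketch with hypothetical names (`counter`, `counter_next`, `make_counter`); it reports allocation failure by returning NULL:

```c
#include <stdlib.h>

/* Hypothetical heap-allocated closure: code pointer plus captured,
   mutable state that outlives make_counter's stack frame. */
struct counter {
    int (*next)(struct counter *self);
    int count;
};

static int counter_next(struct counter *self)
{
    return ++self->count;
}

/* Returns NULL if allocation fails; the caller must free() the result. */
static struct counter *make_counter(int start)
{
    struct counter *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->next = counter_next;
    c->count = start;
    return c;
}
```

Calling `c->next(c)` on a counter made with `make_counter(5)` returns 6, then 7; the state survives because it was never on `make_counter`'s stack.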

wahern · on Sept 12, 2019

Go also has no way to catch memory allocation failure. Many things become easier when you can pretend memory is infinite, especially from a language design standpoint.

As C++ has been contemplating moving away from exceptions, they've effectively been forced to choose between concise language abstractions or strict memory management. They seem to be moving toward the former, which is to say C++ may soon begin to behave like Go, Perl, and other high-level languages--OOM will simply crash your application.

jcranmer · on Sept 12, 2019

> As C++ has been contemplating moving away from exceptions, they've effectively been forced to choose between concise language abstractions or strict memory management. They seem to be moving toward the former

You don't need exception handling to handle memory allocation failure: just make your allocator return null (that's what new (nothrow) X does, after all). The question of how to handle allocation failure is easily the single most divisive question presented in the proposal, and in the face of stark division, status quo usually wins the day.

Gibbon1 · on Sept 11, 2019

Having kicked this around, I think there are three problems.

One: Compiler development is driven by the C++ standards committee. And they all hate C and wish it would die already. More to the point, the things you would do to make C a better, more powerful language are orthogonal to the direction C++ is being pushed.

Two: Being tied to C++ also means being tied to the same ABI as C++. And improvements to the C language probably would need some extensions to the ABI.

Three: I can't wrap my head around this but a lot of people are extremely hostile to attempts to extend and improve C.

jcranmer · on Sept 11, 2019

> One: Compiler development is driven by the C++ standards committee. And they all hate C and wish it would die already.

The latter statement is not true. But it is true that most of the evolution of C/C++ is driven by the C++ committee, with the C committee mostly adapting features from C++ and very little innovation in C being adopted by C++. (As one C++ committee member confided to me, the C committee does have a bit of a tendency to completely screw things up when the C++ committee liaisons leave the room.) But there is still coordination and cooperation between the committees; for example, the recent proposals to replace the current EH model in C++ include a coordinating proposal to modify the C ABI to provide access to a Result-esque exception model.

> Two: Being tied to C++ also means being tied to the same ABI as C++. And improvements to the C language probably would need some extensions to the ABI.

The C ABI desperately needs extensions anyways, especially because it is the de facto platform ABI and languages usually only support FFI features using the C ABI. The biggest missing features here are SIMD vector support and multiple return value support.

Gibbon1 · on Sept 11, 2019

I apologize for the slight against the C++ standard people.

I do like your comment about the ABI needing to be extended to improve FFI features. I feel that way too. I also think that a clean (non-clunky) method for FFI is exactly what C has needed for a long time.

klingonopera · on Sept 11, 2019

Regarding three, I believe it's because C ultimately aims to be the most low-level, high-level abstraction of machine code.

Stray too far from that, and you're already in C++ territory.

Except for multi-core/-threading support, I can't really think of anything that has changed in the past 25 years to add to this, in my opinion, essentially near-perfect language.

fao_ · on Sept 11, 2019

Every single SYSV x64 ABI platform supports at least two uint64_t return values, which are the registers rdi and rax (Actually it might be rbx, I haven't done assembly for a while). So C is behind the curve.

C is considered to be "High level assembly", up until it isn't. To be honest, YASM and other assemblers do "high level assembly" much better.

C is a simple, flexible low-level language. There is a feeling about C that I do not get from other languages, a specific way of thinking and a trend towards simplicity of features (ignoring GNU) that other languages (including Rust) do not encourage or notice. It would be nice to see it import some of the type semantics of ML and OCaml.

saagarjha · on Sept 11, 2019

> Every single SYSV x64 ABI platform supports at least two uint64_t return values, which are the registers rdi and rax (Actually it might be rbx, I haven't done assembly for a while).

For integer returns, System-V uses rax/rdx.
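C can already get two integer results back in those registers, just without dedicated multiple-return syntax: under the System V x86-64 ABI, a 16-byte struct made of two integer fields is returned in rax:rdx rather than through memory. A sketch with hypothetical names; the register assignment is an ABI detail, and the code itself is portable C:

```c
#include <stdint.h>

/* Hypothetical two-value result. Under the System V x86-64 ABI this
   16-byte, all-integer struct is returned in rax (quot) and rdx (rem). */
struct divmod_result {
    uint64_t quot;
    uint64_t rem;
};

static struct divmod_result divmod_u64(uint64_t a, uint64_t b)
{
    struct divmod_result r = { a / b, a % b };
    return r;
}
```

For example, `divmod_u64(17, 5)` yields `{ 3, 2 }`.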

fao_ · on Sept 12, 2019

Ah! Yes, of course :)

lgeorget · on Sept 11, 2019

> However, it seems like if it compiles it ought to work and — mostly — it does.

I'm taking this out of context of course but that looks like a very dangerous assumption to make...