How fast can you allocate a large block of memory in C++? (lemire.me)
77 points by ibobev on Jan 16, 2020 | 89 comments


Any C++ programmer could write code that “allocates” 500MB in a few instructions if the pages already exist.

If the pages don’t already exist (as indicated by the given timings), this is a test of the OS and has little to do with the language.

It’s a poorly posed question. C++ runs on many environments.


If you need to allocate lots of memory, you should be using mmap(). If you need it faulted in, you use MAP_POPULATE. Try to use MAP_HUGETLB to economize on TLB cache entries; you may need to set "vm.nr_hugepages=16384" (or something) in /etc/sysctl.conf (or someplace) to reserve them.
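
Roughly, on Linux, that looks like the sketch below (the function name is made up; MAP_POPULATE and MAP_HUGETLB are Linux-only, the length should be a multiple of the huge page size, and the huge-page request fails with ENOMEM unless pages have been reserved, hence the fallback):

    #include <sys/mman.h>
    #include <cstddef>

    void *alloc_prefaulted(std::size_t bytes) {
        void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_HUGETLB,
                       -1, 0);
        if (p == MAP_FAILED) {
            // no reserved huge pages: fall back to ordinary 4 KiB pages
            p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        }
        return p == MAP_FAILED ? nullptr : p;
    }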

If you are allocating lots of memory and don't know about hugepages, you badly need to learn.

If your program has threads, unmapping memory is generally to be avoided.


> If your program has threads, unmapping memory is generally to be avoided.

Could you expand on this? I recently encountered a situation where allocating and freeing many large allocations using mmap() seemed to eventually cause problems with thread creation, but I had assumed that it was probably because the virtual address space had become too fragmented, which of course would not solely be a result of unmapping. Or maybe that fragmentation is what you're referring to, and I'm just reading too much into that sentence.


It just creates long stalls as the TLBs of all cores that threads in the process run on are flushed and must be reloaded from backing RAM.


mmap is an operating system interface, or at the very least a part of POSIX. It isn't part of the C++ standard, so you cannot rely on it being present on systems that don't claim extensive POSIX compatibility. As for MAP_POPULATE and MAP_HUGETLB, try using those on a POSIX-ish OS that isn't Linux and see how far you get.


What, are there any left?

Every OS nowadays has mmap, by some spelling, and they all support hugepages, likewise. A program that doesn't use its host OS doesn't, generally, do much.


Perhaps you've not come across the idea of cross-platform development?

I have a large fairly successful cross-platform program that does a hell of a lot. All of our interactions with the OS are wrapped, either by libstdc++ or by the portability library that we use.

We would never allow a direct call to mmap() from the main application code ... we don't even allow direct calls to most of the POSIX API since we have to run on Windows too (without WSL or whatever MS' current POSIX layer is).

"By some spelling" is precisely why almost all programmers are better off using "new X[N]" than mmap, unless you actually think that code like this is good:

    #ifdef __APPLE__
    #define LOCAL_HUGETLB 0
    #else
    #define LOCAL_HUGETLB MAP_HUGETLB
    #endif
    ...

    mmap(..., foo | bar | LOCAL_HUGETLB)

This cannot be a serious suggestion regarding the recommended way to allocate memory.


Seems pretty easy and reasonable to write a wrapper function for mmap for all 3 OSes in common use...


Exactly. There's nothing new about portability shims. Really nothing, been using them since the '80s.
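
A minimal sketch of such a shim, covering just POSIX and Windows (big_alloc/big_free are made-up names for illustration):

    #include <cstddef>
    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <sys/mman.h>
    #endif

    void *big_alloc(std::size_t bytes) {
    #ifdef _WIN32
        return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    #else
        void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
    #endif
    }

    void big_free(void *p, std::size_t bytes) {
    #ifdef _WIN32
        (void)bytes;
        VirtualFree(p, 0, MEM_RELEASE);   // size must be 0 with MEM_RELEASE
    #else
        munmap(p, bytes);
    #endif
    }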


> Every OS nowadays has mmap, by some spelling

I mean, I'm fairly sure that's the problem. (Plus a lot of systems support C++ but not virtual memory…)


Just to take one (small) example, the Arduino IDE is C++, and you will not mmap() a lot there.


You also won't new[] a lot there. How much can it even physically address?


I've actually used an STL port to Arduino and was able to create a standard vector of length three but not four; that crashes the system.

I believe it uses a custom allocator.


Actually Arduino code does tend to use dynamic allocation for things like strings - it's a lot easier than the alternative, they tend to have quite a lot of memory these days (e.g. 1 MB) and hobby projects don't need to be super-reliable usually.


One megabyte or ten isn't enough to bother with mapping, and on Arduino you don't typically have memory mapping hardware anyway. "Big" means gigabytes these days, sometimes hundreds of them.


But often you might write code without knowing it will end up there, like a library.


Many C stdlibs will fall back to mmap above a certain threshold :)


The OP should have added “Linux” somewhere in the title.

Here on Windows, the OS doesn’t overcommit and the first new[] call actually allocates these pages (possibly in a page file).

Also, if the requested size is large enough (>512KB on Win32, >1MB on Win64), the OS guarantees zero initialization even though C++ doesn't: https://docs.microsoft.com/en-us/windows/win32/api/memoryapi...


Even on Windows the pagefile space is reserved, but it doesn't have to write to disk, right? It can zero on first access.


Correct me if I'm wrong, but won't the speed depend on the compiler, the operating system, the hardware...?

There is no particular reason you can't allocate 500GB in 1 cycle. Just get rid of all memory management.

It seems silly to answer a question about C++ in operations per second, besides.


> If you actually want to measure the memory allocation in C++, then you need to ask the system to give you s bytes of allocated and initialized memory. You can achieve the desired result in C++ by adding parentheses after the call to the new operator

Is this an actual thing? You can force reification of overcommitted memory just by calling operator() on it?


It's not `operator()`, it's value initialization. If you're particularly masochistic you can read more about it at https://en.cppreference.com/w/cpp/language/value_initializat....


I don’t think that guarantees the memory gets committed. The compiler/standard library combination is allowed to know how the OS behaves, so if it, say, initializes new pages with 0x00, it can use that information to skip the initialization of that memory.


Ah, that makes more sense. Thanks.


The example given:

    char *buf = new char[s]();
Put into Compiler Explorer:

    https://gcc.godbolt.org/z/QAs9gz
However, doing this is a very strong code smell. At the very least, using `new` and assigning to a raw pointer is a sign that the C++ developer is managing memory manually and is likely to hit a lot of problems including memory leaks or segmentation faults. Also, many would forget that this is calling `operator new[]()`, not `operator new()`, and might confuse it with placement-new `operator new(...)` or `operator new[](...)`. And the developer might also forget that `new` could throw an exception... [0]

The developer should instead be using, at a minimum, a `std::unique_ptr<char[]>` [1]. Or, IMO, a `std::vector<char>` which reminds the developer not only of the pointer but also of the count of bytes which have been allocated (.capacity()) and also of the valid range which has been initialized (.size()).
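
For the article's buffer, that would look something like this (with <memory> / <vector> included; note both of these value-initialize, i.e. zero, the s bytes):

    auto buf = std::make_unique<char[]>(s);   // does new char[s](), freed automatically
    std::vector<char> vec(s);                 // s zeroed bytes; tracks its own size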

IMO, if the developer wanted a pointer to a byte array then it's a lot easier to use `malloc()` than trying to remember all the different ways you can get screwed by `operator new`:

    std::unique_ptr<char, std::function<void(char*)>> m{
        (char*)std::malloc(count), std::free
    };

[0]: https://en.cppreference.com/w/cpp/language/new

[1]: https://en.cppreference.com/w/cpp/memory/unique_ptr


Eh, I'd say using malloc in place of new is more of a code smell than anything here. If you want a byte array, make a byte array:

    std::make_unique<std::array<char, SIZE>>()
If you don't know how big you want the byte array, make a vector:

    std::vector<char>(size) 
It's baffling to me how many C++ developers don't use the standard collections. The only one that really sucks is unordered_map.


Small vectors are also bad. There should be some kind of small vector optimization, just like what is common for std::string.


vector<bool> also sucks


That's fair. But in my experience that's rather rare and easy to work around.


> using `new` and assigning to a raw pointer is a sign that the C++ developer is managing memory manually and is likely to hit a lot of problems including memory leaks

There is no need to badmouth all C++ developers. Plenty of us can keep our memory straight just fine. Just because you don't like pointers does not mean the rest of us can't use them perfectly safely.


The rest of us isn't that big a number, according to Microsoft and Google security reports.

I have surely made my share of C++ pointer mistakes since I got hold of the C++ ARM and Turbo C++, after years of Assembly and Turbo Pascal programming.

Not only in C++, but in any language with manual memory management, because manually keeping track of where everything is going just doesn't scale.


I think the evidence is pretty clear that programmers can't, in general, use bare pointers properly. The very best write use-after-frees, double frees, stack smashers, and all manner of other memory-related bugs. We can see the evidence of this in the CVEs, the syzkaller bugs, etc. If you haven't been bitten by a serious instance of one of these, then you just haven't written very much C++.

IMO the debate about whether programmers can safely handle bare pointers is over. They can't. The only question is whether smart pointers help enough to make the extra line noise worth it.


And the guys that implement smart pointers are what, exactly? Uberprogrammers? Is there a certificate to get into that club? What about compiler authors, that deal with, gasp, optimizations of these bare pointers? Kernel developers? Embedded system engineers?

It is OK to stay above a certain level of abstraction consciously, but stating that one cannot go below it safely is just enshrining mediocrity.


> And the guys that implement smart pointers are what, exactly? Uberprogrammers? Is there a certificate to get into that club? What about compiler authors, that deal with, gasp, optimizations of these bare pointers? Kernel developers? Embedded system engineers?

One cannot do any of those things safely by hand. Compilers, kernels, and embedded systems do indeed do these things; they also have bugs.


You are contradicting yourself. If one cannot do these things safely by hand, then no one can implement the automatic abstractions either.


You cannot safely control an internal combustion engine's valve timings by hand, but it is clearly possible to implement this automatically (and fairly simply). Some things are much easier for machines to do than humans, even though the machines are designed and built by humans.


> Is there a certificate to get into that club?

There is, and the large majority fails at it, as proven by the CVE database, in spite of the rigorous process to contribute code to the Linux kernel, for example.


I call bullshit. And arrogant bullshit at that. The vast majority of bare-metal components is highly functional and relatively bug-free.


You can call whatever you like, CVE database, yearly security reports, and OS release notes from all major OSes provide the necessary facts.

Sadly, liability still isn't a thing in software development.


That's not the point. Smart pointer implementations aren't some dark magic, they're just the opposite: mostly fairly simple to implement (shared_ptr is a little trickier), and obviously right by construction: the compiler does the hard work of making sure the constructors and destructors get called appropriately (and this is a little harder but still far far easier than making sure that malloc() and free() match in all actual uses).
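
To illustrate, here is a bare-bones single-object sketch of the idea (nothing like the real std::unique_ptr, which also handles custom deleters, arrays, swap, comparisons, and so on):

    template <typename T>
    class scoped_ptr {
        T *p_;
    public:
        explicit scoped_ptr(T *p = nullptr) : p_(p) {}
        ~scoped_ptr() { delete p_; }   // the compiler guarantees this runs on every exit path

        scoped_ptr(const scoped_ptr &) = delete;              // no accidental copies
        scoped_ptr &operator=(const scoped_ptr &) = delete;

        scoped_ptr(scoped_ptr &&o) noexcept : p_(o.p_) { o.p_ = nullptr; }
        scoped_ptr &operator=(scoped_ptr &&o) noexcept {
            if (this != &o) { delete p_; p_ = o.p_; o.p_ = nullptr; }
            return *this;
        }

        T &operator*() const { return *p_; }
        T *operator->() const { return p_; }
        T *get() const { return p_; }
    };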


> just because you don't like pointers, does not mean the rest of us can't use them perfectly safely.

I actually use pointers quite safely. I just no longer see a need to ever assign the result of `new` to a raw pointer unless I'm implementing my own pointer class.

Herb Sutter's correct [0]. `std::unique_ptr` or `std::shared_ptr` or some other pointer container should __always__ hold new-allocated objects. Just like you should __always__ wear your seatbelt. It's no danger to you or anyone else, it's slightly inconvenient, and it saves a metric ton of headaches about "what if?". Because in reality a pointer container explicitly marks the scope of the allocation and if you're not wanting to use C++'s scoping rules then why are you using C++?

[0] https://www.youtube.com/watch?v=JfmTagWcqoE


The statement you quoted is simply a fact, not badmouthing. If there is any badmouthing by implication, it wouldn't be "all" C++ developers, just those who don't realize they're doing something spooky when their code looks like Foo *foo = new Foo.


This is one of the best troll posts that is getting taken seriously ever... wow.


The std::function<void(char*)> is the tell :-)


Why do you think that marks a troll? Have you never had to use a C library which wants you to use the library's structure allocator / deallocator? In that case, this translates very well to malloc/dealloc.

@unlinked_dll has it correct: the C++ way to allocate an array of chars is `std::vector<char>` (or `std::array<char, size>`).


It's a bit bloated and verbose as well; you could just use void(&)(void*).


Doesn't that break when std::free does not have the default C++ calling convention? I think `decltype(std::free)` would be easier to read anyway, but writing a deleter class that calls std::free in its operator() has the advantage that it will not use any memory in the unique_ptr thanks to the empty base class optimization.
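
For what it's worth, such a deleter class is tiny; a sketch (count as in the snippet upthread):

    #include <cstdlib>
    #include <memory>

    struct FreeDeleter {
        void operator()(void *p) const noexcept { std::free(p); }
    };

    // The deleter is an empty class, so on the usual implementations
    // this unique_ptr is the same size as a raw char*.
    std::unique_ptr<char, FreeDeleter> m{
        static_cast<char *>(std::malloc(count))
    };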


I don't think that's possible, and in that case the std::function wouldn't work either.


I was thinking about calling conventions because std::free can use extern "C", but the function pointer declaration in C++ code points to an extern "C++" function, and the standard says these are two distinct types. Are they guaranteed to be compatible, since they're only linkage specifiers, not calling conventions? I'm not sure.

std::function would still work in this case because pointers to C functions have a normal operator().


Oh, right, it would work. Anyway, it looks like compilers have a known bug and they’re supposed to treat the types as incompatible.


Sometimes you just need a buffer for a syscall. Newing a buffer is canonical, and unique_ptr isn't free enough to use in the most performance-sensitive contexts.


Can you give an example of unique_ptr being too heavy?


I worked with a team which used raw pointers instead of unique_ptr because unique_ptr used more memory (twice as much, I think). The core data structure was a tree, and the tree was several hundred GB in size. So they were doing whatever they could to shave off memory.


Unique pointers imply ownership of the underlying memory. If you have multiple references to the same data, or in your case a tree, it might make sense to use raw pointers (or references) to access the tree data. It would not make sense to grant ownership of the same memory to two unique_ptr, or make copies of data unnecessarily.


In that case, you'd have something like `std::vector` or `std::unique_ptr` which owns the actual allocation (and governs the scoping and therefore deallocation), while using `vector.data()` or `unique_ptr.get()` to pass around an unowned pointer. All raw pointers can then be considered to be unowned by convention.


Without a custom deleter unique_ptr should have zero overhead.


unique_ptr does not have any memory overhead. (unless you use a custom deleter, but even then you could use one which does not use memory)
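
Concretely, something like the following holds on the mainstream standard libraries (the standard doesn't strictly promise the first line, but libstdc++, libc++ and MSVC all satisfy it):

    #include <functional>
    #include <memory>

    static_assert(sizeof(std::unique_ptr<int>) == sizeof(int *),
                  "default deleter: no overhead over a raw pointer");
    static_assert(sizeof(std::unique_ptr<char, std::function<void(char *)>>)
                      > sizeof(char *),
                  "a std::function deleter does add per-pointer storage");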


I gave up on C++ in part because the language changes too much.

I paid for lessons on it. The std:: stuff didn't exist, templates were a recent invention, and people were starting to agree that putting the overload keyword everywhere was bad style. Mostly, people were just excited about // comments and cout.

I keep hearing about how C++ is supposed to be done, and it seems to change every year. I don't know what people do with the older code as it becomes unfashionable with "very strong code smell". Updating it is risky busywork, kind of like porting to Python 3, with potential for serious bugs. Leaving it in place will make developers differently unhappy.

It isn't easy getting everybody on a team to agree on what subset of C++ is OK to use. People will sneak in their must-have feature. You'd have better luck getting agreement between emacs and vi.


C++ makes no pretense of ever planning to be "done". "Done" is another word for dead. Even COBOL is still evolving.

Failing to keep up with evolving languages is called stagnation. Everyone is free to stagnate, but I do not advise it.

You use the subset of the language supported by the compiler you have. Code using newer features is better because the new features were added for sound engineering reasons, not just to be different.


It's the opposite of sound engineering. Every version of C++ contains a new feature that does 11% of the proper solution because the feature in the last version only did 9% of the proper solution. Despite all that churn the language is still as far behind ML as ever.


I see that you have not been keeping up.


What is the parent comment not keeping up with? Was it a reference to the ‘11%’ or to C++ being ‘behind ML’? Just curious to see what your objection was.


11%


If you think C++ changes too much, you're going to love today's Java/Typescript flavor-of-the-month web UI toolkit.


Yes, it is hard to keep up, but so it is with any other language in mainstream usage.

Languages either evolve to fulfill market needs or they die.

Even C17 has plenty of differences compared to K&R, not counting all the compiler-specific extensions.


C has changed much less over the years. Arguably C89 was the most important change, C99 brought a bunch of small quality-of-life improvements (which you might or might not need; arguably the most important are stdint and snprintf), C11 got us a threading model, and C17 is a bugfix release. After C89, no new version of the standard really changed the way you write idiomatic C without code smells.


Fact is, it did change, and it even introduced breaking changes, like the gets() and Annex K removals in C11.

And you are missing the compiler extensions that I also mentioned.

It did not change even more because nowadays C is left for UNIX clones and embedded development, having been superseded by other languages at most corporations; even its major compilers have been rewritten in C++.


I don’t know if parent is ‘missing the compiler extensions [you] mentioned’ or is just not addressing them. But I will take a stab: compiler extensions may be neat/effective/revolutionary, but they are unequivocally not a part of a programming language. That is why they are compiler-specific extensions. I could code a compiler extension for LLVM that is a conservative, generational garbage collector and have the extension flag be --no-more-manual-memory, but no one is then going to say that because that extension exists, C++ is a garbage-collected language.

tl;dr -> compiler extensions are by definition not part of a language, no matter how useful, and therefore do not count towards the argument.


C++ supports garbage collection since ISO C++11.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n267...


C++ changes too much???

It's the weirdest thing I've read today.


Perhaps I can give you something weirder to read.

The other big complaint I have about C++ is that it is too high-level. It isn't very good for low-level control. :-)

Writing start-up code and linker scripts to support C++ code running without the libraries is not easy. All sorts of things, such as tables of constructors, need a solution. Typically that would involve a lot of assembly code. I'd need to disable lots of C++ functionality unless I wanted to write complicated things like a stack unwinder for exceptions.

It's just easier to use plain old C. The result is smaller too, which is important.


And someone said I was trolling hahaha


It's weird feeling surrounded by web programmers who think Python and Javascript are the lowest you'd ever want to go.

C++ is annoyingly high-level if you are trying to write code to go in a flash chip. The code starts running at a specific address as soon as the CPU starts. There is no OS unless your code implements one. There is no C++ library unless you port one.


That's a fair point and it's a project I'd like to tackle someday in spare or paid time.

I don't think C++ is annoyingly high-level in that case: just annoyingly lacking in the ecosystem to support it.


Very minor nitpick, but IMHO, this article creates a little unnecessary confusion by insisting upon a particular meaning of the word "allocate" without being very clear it's doing that.

That is, it uses "allocate" to mean "make ready for immediate use with no further (lazy) processing". Another perfectly reasonable definition would be "guarantee to be possible to use".

The distinction would arise on, for example, a system which doesn't over-commit memory but also doesn't fault pages in upon allocation. On such a system, allocation might give you a rock solid guarantee that you can write to (and read from) that memory but it wouldn't give you any guarantee about how fast or slow that would happen on initial access.

Personally, I prefer the second definition, but that's not really the point. The point is to be clear and avoid confusion.


This is a completely ridiculous article. The whole concept is flawed.

New/delete like this are for smallish general purpose allocation, for example of objects, where we want to keep the code at a fairly high level in C++ and not think about low-level concepts like "bytes" or allocation mechanics. Or performance.

Conversely, an app written in C++ that needs huge swaths of memory for manipulating raw bytes and needs high performance would not likely use new char[] or calloc/malloc directly at all. It would directly interface with the OS via mmap or indirectly via some domain-relevant library, for example OpenCV.

We might even still exclusively use the C++ language to write the low-level portions of the code, but if you need to interface with the OS in specific controlled ways or write performance-oriented code, you are not going to do it using only high-level C++ operations. You are going to call exactly what you need to interface with the system directly. If you want mmap(), you'd just call mmap() from C++. If you want sbrk(), you'd call sbrk(). If you don't like the system new/delete and malloc/free, you could use something like dlmalloc and even remap new/delete to it, or to some custom slab allocator.

Secondly, as pointed out by others, the author isn't benchmarking C++. He's benchmarking glibc malloc and the Linux mmap. We'd expect a C program using malloc/calloc to have exactly the same timings.

Thirdly, fallacious reasoning about initialization. If your app needs to allocate (for some ??? reason) 32 GB of memory, you would NOT automatically zero it first, unless you actually needed it to be zero, or you wanted your app to waste a bunch of time. Zeroing huge arrays is not required for good security. We're mostly benchmarking memset here, not even malloc/mmap. So it's completely apples/oranges to compare an mmap benchmark with a zero-fill benchmark. I almost expect a follow-up article that points out accessing x[i] is much faster when x is a raw array of integers than when x is a std::map of strings.

Now read the footnotes. The author is misunderstanding what "idiomatic C++" means. RAII is the best answer I can come up with for idiomatic - C++ doesn't have a built-in guard concept so we creatively mis-use constructors and destructors to get a similar result.

Footnote 2 proves the author knows bupkis about how C++ or any part of the system actually works. He's semi-admitting as much.

I'd hope for much better from a CS professor. Stick to benchmarking this:

  for (int i = 0; i < 1E9; ++i);


> the system may then decide

This makes me wonder whether the whole question is really about C++ or about the OS implementation of memory allocations. This could have little to do with C++.

On a bare-metal system, malloc/sbrk (and thus new) can be something as simple as:

  uint32_t retptr = heap;   // bump allocator: hand out the next free address
  heap += size;             // no bookkeeping, nothing is ever reclaimed
  return retptr;


Anonymous mmap() zeros memory, so why is new wasting its cycles doing so too? I doubt any modern C++ standard library is using brk/sbrk in this day and age.


The thing is, a C++ standard library implementation cannot assume a specific platform or implementation of libc. It's too portable.

Yes, you could add some platform- and type-specific speedups, but it ends up a big mess.

In C++20 you get uninitialized allocation functions which do not have to initialize.
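
Presumably referring to the C++20 `_for_overwrite` helpers, e.g.:

    auto buf = std::make_unique_for_overwrite<char[]>(s);  // new char[s], no zeroing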


By definition the libraries present a portable interface to underlying system resources and are full of platform-specific code (I was a Cygnus founder so spent plenty of time deep in these issues at a time when there was more diversity). All POSIX-ish systems I know of (e.g. Linux, BSD, Solaris, etc.) return pages of zeros from anonymous mmap. I consider it a bug if they post-process in this case.


Most std libs rely on a malloc library underneath, so I am not sure what's being tested here, especially given that the standard allocator is being used.


Anyone want to take a stab at how malloc actually works? What’s under the hood? What data structures and algorithms does it use?


In a comment the blog author says he's using Ubuntu and glibc malloc. And glibc malloc has a threshold (128 kB or thereabouts) where it switches to using mmap(). So in practice this benchmark is a test of how the Linux kernel implements the mmap() syscall, and how the compiler implements the zeroing loop.
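
For what it's worth, that cutover is tunable (glibc-specific, declared in <malloc.h>):

    #include <malloc.h>

    int main() {
        // serve allocations below 1 MiB from the heap instead of mmap()
        mallopt(M_MMAP_THRESHOLD, 1 << 20);
        // ... run the benchmark ...
    }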


> And glibc malloc has a threshold (128 kB or thereabouts) where it switches to using mmap()

Very similar on Windows. In modern CRT, malloc is a thin wrapper over HeapAlloc WinAPI. That one has a threshold (512kb or 1MB depending on 32- or 64-bit process) where it switches to VirtualAlloc.
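
Going straight to the bottom layer on Windows looks roughly like this (the pages come back committed and zeroed; `size` here stands for whatever was requested):

    #include <windows.h>

    char *buf = static_cast<char *>(
        VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    // ... use buf ...
    VirtualFree(buf, 0, MEM_RELEASE);   // size must be 0 with MEM_RELEASE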


K&R actually contains an example malloc()/free() implementation using a "freelist" - i.e. a list of spare memory blocks held as a linked list in the "unallocated" (or de-allocated) blocks. Memory is retrieved from the OS using the ugly sbrk() interface.
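
Heavily simplified (not the actual book code; it ignores alignment and never splits or coalesces blocks), the shape is something like:

    #include <unistd.h>   // sbrk
    #include <cstddef>

    struct Header {
        Header *next;         // next block on the free list
        std::size_t size;     // usable bytes following this header
    };

    static Header *freelist = nullptr;

    void *toy_malloc(std::size_t n) {
        // first fit: reuse the first big-enough block on the free list
        for (Header **p = &freelist; *p; p = &(*p)->next) {
            if ((*p)->size >= n) {
                Header *h = *p;
                *p = h->next;      // unlink it
                return h + 1;      // usable memory starts right after the header
            }
        }
        // nothing suitable: ask the OS for more core
        void *mem = sbrk(sizeof(Header) + n);
        if (mem == reinterpret_cast<void *>(-1)) return nullptr;
        Header *h = static_cast<Header *>(mem);
        h->size = n;
        return h + 1;
    }

    void toy_free(void *ptr) {
        Header *h = static_cast<Header *>(ptr) - 1;
        h->next = freelist;        // push back on the free list; no coalescing here
        freelist = h;
    }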



What is the point in initializing memory here?


It ensures that the pages actually exist, because otherwise the OS will just overcommit them.


On a barebones system, 0 cpu cycles.



