In the realm of death by a thousand cuts, please stop using std::endl and remove it from code.
Per cppreference.com, std::endl "inserts a newline character into the output sequence... and flushes it". Okay, so why exactly shouldn't it be used?
Numerous articles have been written about endl that echo the sentiment against using it. Here are some:
- Don't use std::endl: https://accu.org/journals/overload/27/149/sharpe_2619/
- "std::endl" vs "\n": https://stackoverflow.com/questions/213907/stdendl-vs-n
- C++ Weekly With Jason Turner: Stop Using std::endl: https://www.youtube.com/watch?v=GMqQOEZYVJQ
- C++ Core Guidelines SL.io.50: Avoid endl.
They highlight the following:
- Flushing the stream slows performance and is usually neither intended nor necessary.
- Using the newline character, \n, is just as cross-platform compatible for stream output as using endl, but it doesn't force a flush of the stream.
Admittedly, for environments where speed isn't a concern, these points may not seem like that big of a deal. There are, however, additional less-mentioned concerns that I'd like to point out.
Before flushing the stream, and depending on the C++ standard, the library implementation, and the compiler, endl also:
- Executes code to possibly widen the newline (using std::basic_ios<CharT,Traits>::widen). This code:
  - Gets the current locale associated with the stream (std::ios_base::getloc).
  - Uses this locale's "facet" (via a call to std::use_facet) for the character type of the stream to do the possible widening. For details, see the source code for use_facet in the LLVM or Microsoft standard C++ libraries. This in turn:
    - Checks that std::has_facet<Facet>(loc) is true for the locale and the facet identified for the stream.
    - Throws std::bad_cast if this check is false.
    - Returns the available identified facet otherwise.
- Executes code to output the possibly widened newline to the output stream (through a call to std::basic_ostream<CharT,Traits>::put).
So using endl comes with more baggage than just force flushing the stream, and most programmers using endl that I've spoken with don't seem to be aware of any of it.
Not to mention:
- endl arguably doesn't express intent as well as the alternatives.
- endl's use appears to be everywhere. If we don't put more effort into getting rid of it, it's more likely to be perpetuated, especially by newer programmers, into even more code where its use is a big deal.
All this can result in things like unintended behavior, surprising performance loss, and unnecessarily enlarged executables. At the time of writing this, Wikipedia redirected the disambiguation of "death by a thousand cuts" for psychology to Creeping normality. That seems spot on for endl unless we do more to curtail its use.
Instead of using std::endl, just use \n:
void someFunction(std::ostream &os)
{
    os << "Hello world!\n";
}
Or, if you really do need to force flush the output stream, be explicit about it and use the std::flush I/O manipulator:
void someFunction(std::ostream &os)
{
    os << "Hello world!\n" << std::flush;
}
Take the following code for example:
#include <iostream>
#include <ostream>

void doEndl()
{
    std::cout << "Hello world!" << std::endl;
}

void doNewline()
{
    std::cout << "Hello world!\n";
}
Compare the resulting assembly of the two functions for yourself.
For the record, here's what Compiler Explorer shows gcc 12.1 generates for the x86-64 target just for the doEndl function:
.LC0:
.string "Hello world!"
doEndl():
push rbx
mov edx, 12
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:std::cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
mov rax, QWORD PTR std::cout[rip]
mov rax, QWORD PTR [rax-24]
mov rbx, QWORD PTR std::cout[rax+240]
test rbx, rbx
je .L10
cmp BYTE PTR [rbx+56], 0
je .L5
movsx esi, BYTE PTR [rbx+67]
.L6:
mov edi, OFFSET FLAT:std::cout
call std::basic_ostream<char, std::char_traits<char> >::put(char)
pop rbx
mov rdi, rax
jmp std::basic_ostream<char, std::char_traits<char> >::flush()
.L5:
mov rdi, rbx
call std::ctype<char>::_M_widen_init() const
mov rax, QWORD PTR [rbx]
mov esi, 10
mov rax, QWORD PTR [rax+48]
cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc
je .L6
mov rdi, rbx
call rax
movsx esi, al
jmp .L6
.L10:
call std::__throw_bad_cast()
_GLOBAL__sub_I_doEndl():
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Meanwhile, here's what Compiler Explorer shows gcc 12.1 generates for the x86-64 target just for the doNewline function:
.LC0:
.string "Hello world!\n"
doNewline():
mov edx, 13
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
jmp std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
_GLOBAL__sub_I_doNewline():
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
That's just sixteen lines of assembly for doNewline compared to forty-five for doEndl!
@grantrostig I like the questions you're asking! Biggest reason I like them: C++'s as-if rule.
In short, my interpretation of this rule is that a standards-conforming compiler is at liberty to do what it wants, within only a few limits. This really excites me about C++ and gives rise to possibilities like zero-overhead abstractions such as Boost's units library. Applied to your questions, the answer is: it depends.
It depends on things like:
I know all this, yet I admittedly guessed compilers like gcc and clang would avoid emitting assembly that exhibited an "N copies of M lines of code" pattern in favor of calling a function of M lines N times. I was wrong! I suspect their inlining logic works something like this: inline until the resulting binary is believed to be slower. Note that I have -O3 enabled. Changing to -O0, I see less inlining and more function calling, which is more like what I was expecting. Note also that this is just looking at the lines of assembly code and using that as a gauge of how large the resulting binary output file would be. That's more reasonable for conventional CPUs than for less conventional ones, which may add a whole extra layer of complexity to the picture. For more specifics, try the compiler of your preference, with the compiler options of your choosing, and check the size of the resulting binary executable output file.