In the realm of death by a thousand cuts, please stop using std::endl and remove it from code.
Per cppreference.com, std::endl "inserts a newline character into the output sequence... and flushes it". Okay, so why exactly shouldn't it be used?
Numerous articles have been written about endl that echo the sentiment against using it. Here are some:
- Don't use std::endl: https://accu.org/journals/overload/27/149/sharpe_2619/
- "std::endl" vs "\n": https://stackoverflow.com/questions/213907/stdendl-vs-n
- C++ Weekly With Jason Turner: Stop Using std::endl: https://www.youtube.com/watch?v=GMqQOEZYVJQ
- C++ Core Guidelines SL.io.50: Avoid endl.
They highlight the following:
- Flushing the stream slows performance and usually is neither intended nor necessary.
- Using the newline character, \n, is just as cross-platform compatible for stream output as using endl, but it doesn't force a flush of the stream.
Admittedly, in environments where speed isn't a concern, these points may not seem like a big deal. However, there are additional, less often mentioned concerns that I'd like to point out.
Before flushing the stream, and depending on the C++ standard version, the library implementation, and the compiler, endl also:
- Executes code to possibly widen the newline (using std::basic_ios<CharT,Traits>::widen). This code:
  - Gets the current locale associated with the stream (std::ios_base::getloc).
  - Uses this locale's "facet" for the character type of the stream (via a call to std::use_facet) to do the possible widening. For details, see the source code for use_facet in the LLVM or Microsoft standard C++ libraries. This in turn:
    - Checks that std::has_facet<Facet>(loc) is true for the locale and the facet identified for the stream.
    - Throws std::bad_cast if this check is false.
    - Returns the identified facet otherwise.
- Executes code to output the possibly widened newline to the output stream (through a call to std::basic_ostream<CharT,Traits>::put).
So using endl comes with more baggage than just force flushing the stream, none of which most of the endl-using programmers I've spoken with seem aware of.
Not to mention:
- endl arguably doesn't express intent as well as the alternatives.
- endl's use appears to be everywhere. If we don't put more effort into getting rid of it, it's more likely to be perpetuated, especially by newer programmers, into even more code, including code where its use is a big deal.
All this can result in things like unintended behavior, surprising performance loss, and unnecessarily enlarged executables. At the time of writing, Wikipedia's disambiguation page for "death by a thousand cuts" pointed its psychology sense to "Creeping normality". That seems spot on for endl unless we do more to curtail its use.
Instead of using std::endl, just use \n:
```cpp
#include <ostream>

void someFunction(std::ostream &os)
{
    os << "Hello world!\n";
}
```
Or, if you really do need to force flush the output stream, be explicit about it and use the std::flush I/O manipulator:
```cpp
#include <ostream>

void someFunction(std::ostream &os)
{
    os << "Hello world!\n" << std::flush;
}
```
Take the following code for example:
```cpp
#include <iostream>
#include <ostream>

void doEndl()
{
    std::cout << "Hello world!" << std::endl;
}

void doNewline()
{
    std::cout << "Hello world!\n";
}
```
Compare the resulting assembly of the two functions for yourself.
For the record, here's what Compiler Explorer shows gcc 12.1 generates for the x86-64 target just for the doEndl function:
```asm
.LC0:
        .string "Hello world!"
doEndl():
        push rbx
        mov edx, 12
        mov esi, OFFSET FLAT:.LC0
        mov edi, OFFSET FLAT:std::cout
        call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov rax, QWORD PTR std::cout[rip]
        mov rax, QWORD PTR [rax-24]
        mov rbx, QWORD PTR std::cout[rax+240]
        test rbx, rbx
        je .L10
        cmp BYTE PTR [rbx+56], 0
        je .L5
        movsx esi, BYTE PTR [rbx+67]
.L6:
        mov edi, OFFSET FLAT:std::cout
        call std::basic_ostream<char, std::char_traits<char> >::put(char)
        pop rbx
        mov rdi, rax
        jmp std::basic_ostream<char, std::char_traits<char> >::flush()
.L5:
        mov rdi, rbx
        call std::ctype<char>::_M_widen_init() const
        mov rax, QWORD PTR [rbx]
        mov esi, 10
        mov rax, QWORD PTR [rax+48]
        cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc
        je .L6
        mov rdi, rbx
        call rax
        movsx esi, al
        jmp .L6
.L10:
        call std::__throw_bad_cast()
_GLOBAL__sub_I_doEndl():
        sub rsp, 8
        mov edi, OFFSET FLAT:_ZStL8__ioinit
        call std::ios_base::Init::Init() [complete object constructor]
        mov edx, OFFSET FLAT:__dso_handle
        mov esi, OFFSET FLAT:_ZStL8__ioinit
        mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        add rsp, 8
        jmp __cxa_atexit
```
Meanwhile, here's what Compiler Explorer shows gcc 12.1 generates for the x86-64 target just for the doNewline function:
```asm
.LC0:
        .string "Hello world!\n"
doNewline():
        mov edx, 13
        mov esi, OFFSET FLAT:.LC0
        mov edi, OFFSET FLAT:_ZSt4cout
        jmp std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
_GLOBAL__sub_I_doNewline():
        sub rsp, 8
        mov edi, OFFSET FLAT:_ZStL8__ioinit
        call std::ios_base::Init::Init() [complete object constructor]
        mov edx, OFFSET FLAT:__dso_handle
        mov esi, OFFSET FLAT:_ZStL8__ioinit
        mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        add rsp, 8
        jmp __cxa_atexit
```
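That's just sixteen lines of assembly for doNewline compared to forty-five for doEndl! Both listings end with the same nine-line static initializer for <iostream>, so the function bodies themselves come to five lines versus thirty-four.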
Would every occurrence of std::endl in a compilation unit (perhaps 50) and/or linked object create that many assembly instructions? In other words, 50 * 45? Or just 45 once? Would an optimizer remove most or all of it when using standard ASCII on Linux or Windows?