Please explain in detail what will happen if the following program is executed:
#include <iostream>
int main() {
std::cout << "Hello, world!" << std::endl;
}
It will print out "Hello, world!".
Assuming a unix system, the program will write the string "Hello, world!\n" to the standard output stream, which is connected to file descriptor 1. Afterwards, the stream is flushed.
The given text is not a program, but rather UTF-8 encoded C++ source code. After being turned into a program by a C++ compiler, it's impossible to tell what will happen after it is executed. One possibility would be that it receives a SIGABRT signal immediately after it started, in which case the effect would probably be the creation of a core dump in the current directory.
Let's analyze the program according to the C++ Draft Standard, version N4296.
Since <iostream> is one of the 53 C++ standard library headers listed in §17.6.1.2/2 [headers], its contents will be made available to the translation unit. (§17.6.2.2/1 [using.headers])
Including this header has the same effect as defining an instance of std::ios_base::Init with static storage duration. (§27.4.1/2 [iostream.objects.overview])
During or before construction of this instance, the object std::cout of type std::ostream is constructed and associated with the object stdout declared in the <cstdio> header. (§27.4.2/1 [narrow.stream.objects])
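As a rough illustration of that guarantee, code that might run before the streams are set up (for example during static initialization in another translation unit) can construct its own std::ios_base::Init object first. The helper name streams_ready is made up for this sketch and is not part of the program under discussion.

```cpp
#include <ios>
#include <iostream>

// Hypothetical helper (an assumption for illustration): makes sure the
// standard stream objects are constructed, then reports std::cout's state.
bool streams_ready() {
    // Constructing an Init object initializes std::cout and friends if that
    // has not happened yet; otherwise it has no visible effect.
    static std::ios_base::Init init;
    return std::cout.good();
}
```

In the Hello World program none of this is needed by hand, since including <iostream> already provides the equivalent of such an object.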
Next, we see the definition of a global function called main returning an int and taking no arguments. The program thus fulfills the requirements of §3.6.1 [basic.start.main] and this function will be the designated start of the program.
The body of the main function consists of an expression statement as in §6.2/1 [stmt.expr]. The expression inside that statement refers to three distinct entities by name:
- The object std::cout of type std::ostream (§27.4.2 [narrow.stream.objects]),
- The string literal "Hello, world!" (a static null-terminated byte string according to §17.5.2.1.4.1/3 Footnote 170), of type const char[14], and
- The function template std::endl with the signature std::basic_ostream<C,T>&(std::basic_ostream<C,T>&)
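The three type claims in this list can be checked by the compiler itself. The following is a small sketch; the names narrow_endl and literal_size are invented for the example.

```cpp
#include <cstddef>
#include <iostream>
#include <type_traits>

// std::cout is declared as an object of type std::ostream.
static_assert(std::is_same<decltype(std::cout), std::ostream>::value,
              "std::cout has type std::ostream");

// The literal is an lvalue array of 14 chars: 13 visible ones plus '\0'.
static_assert(std::is_same<decltype("Hello, world!"), const char (&)[14]>::value,
              "string literal has type const char[14]");

// std::endl is a template; naming a target type selects one specialization.
std::ostream& (*const narrow_endl)(std::ostream&) = std::endl;

// Hypothetical helper for illustration: size of the literal in bytes.
constexpr std::size_t literal_size() { return sizeof("Hello, world!"); }
```

If any of the claims were wrong, the translation unit would simply fail to compile.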
These are joined by two instances of the binary left-shift operator, which groups left-to-right. Therefore, to determine what will happen, we first have to look at the sub-expression std::cout << "Hello, world!", which is the left shift operator with operands of type std::ostream and const char[14].
Since at least one operand has class or enumeration type, overload resolution is used to determine which operator-function or built-in operator is invoked. (§13.3.1.2/2 [over.match.oper])
The set of candidate functions is constructed according to the rules detailed in §13.3.1.2/3. They consist of the result of the qualified lookup of std::ostream::operator<< (§13.3.1.2/3.1), together with the result of the unqualified lookup of operator<< in the context of the expression. (§13.3.1.2/3.2)
Since the operands can't be converted to a pair of promoted integral types, the requirement of clause §13.3.1.2/3.3.3 is not fulfilled and there are no built-in operator candidates.
The best match is the function template specialization std::ostream& std::operator<< (std::ostream& out, char const*) from §27.7.3.6.4 [ostream.inserters.character], which behaves like a formatted inserter of out. (§27.7.3.6.1 [ostream.formatted.reqmts])
Therefore, calling this function will begin by constructing an object of class std::ostream::sentry. (§27.7.3.4 [ostream.sentry]) If this object returns true when converted to bool, the function will proceed to create a character sequence seq of 13 characters (the terminating null of the literal is not part of it), each widened using out.widen(), to insert seq into out, and to call width(0). (§27.7.3.6.4/3) Finally, the sentry object is destroyed before leaving the function, and it returns its first argument out.
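The sequence of steps in a formatted inserter can be sketched roughly as follows. This is not how any particular standard library implements it: insert_cstring and render are hypothetical names, padding and exception handling are left out, and the length is measured with traits::length so the terminating null is not counted.

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for the library's inserter; padding is left out.
std::ostream& insert_cstring(std::ostream& out, const char* s) {
    typedef std::char_traits<char> traits;
    std::ostream::sentry guard(out);               // prepares the stream for output
    if (guard) {
        const std::size_t n = traits::length(s);   // excludes the trailing '\0'
        for (std::size_t i = 0; i < n; ++i) {
            // Widen each character and hand it to the stream buffer.
            if (traits::eq_int_type(out.rdbuf()->sputc(out.widen(s[i])),
                                    traits::eof())) {
                out.setstate(std::ios_base::badbit);
                break;
            }
        }
        out.width(0);                              // formatted output resets the width
    }
    return out;                                    // guard is destroyed on the way out
}

// Convenience wrapper for trying the sketch on an in-memory stream.
std::string render(const char* s) {
    std::ostringstream os;
    insert_cstring(os, s);
    return os.str();
}
```

Calling render("Hello, world!") exercises the same path the program takes, just with a string buffer instead of stdout behind the stream.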
The same procedure is repeated for the next left shift operator, which has a left operand of type std::ostream& and a right operand that refers to a function template with two template parameters and the signature template<class C, class T> std::basic_ostream<C,T>&(std::basic_ostream<C,T>&).
Here, the selected overload is std::ostream::operator<<(std::ostream&(*f)(std::ostream& os)) from §27.7.3.6.3 [ostream.inserters]. This function returns f(*this), and calling std::endl has the effect of calling os.put(os.widen('\n')) followed by os.flush(). (§27.7.3.8/1 [ostream.manip])
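Putting the two overload resolution results together, the statement in main can be written with explicit operator-function calls. The sketch below does that and also spells out the specified effect of std::endl; the function names greet_desugared, endl_by_hand and greeting_text are invented for this example.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Writes the greeting using explicit operator-function calls instead of <<.
std::ostream& greet_desugared(std::ostream& out) {
    // Non-member inserter for const char*, then the member overload that
    // takes a manipulator; std::endl is deduced from the parameter type.
    return std::operator<<(out, "Hello, world!").operator<<(std::endl);
}

// What std::endl is specified to do, spelled out by hand.
std::ostream& endl_by_hand(std::ostream& os) {
    os.put(os.widen('\n'));
    return os.flush();
}

// Convenience wrapper: collects the output in a string instead of stdout.
std::string greeting_text() {
    std::ostringstream os;
    greet_desugared(os);
    return os.str();
}
```

Passing std::cout to greet_desugared should behave like the original statement.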
Finally, control reaches the end of main without encountering a return statement, which has the effect of destroying any objects with automatic storage duration and calling std::exit() with the argument 0. (§3.6.1/5)
It will introduce side-effects, so let's re-write it in a purely functional way.
I can tell you what the program does, but not what it should do, because it's lacking unit tests and documentation.
From the lack of any platform specific initialization code, we can infer that the program is intended to be run in a hosted as opposed to a free-standing environment.
Let's for simplicity assume we're on a standard GNU/Linux system on x86_64.
This means our process began life when a previously running process called the exec() syscall.
That process had to store the syscall number 59 (execve) in register $rax, the virtual memory addresses of the file name, the argument array, and the environment array in the registers $rdi, $rsi and $rdx, and execute the SYSCALL instruction.
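From user space, the closest one can get to this without writing assembly is glibc's generic syscall() wrapper, which loads the registers and executes the instruction on the caller's behalf. The sketch below assumes Linux with glibc; run_via_execve is a hypothetical helper that forks first, so that the calling process survives the exec.

```cpp
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern "C" char** environ;

// Hypothetical helper: runs `path` in a child process and returns its exit
// status, or -1 on failure. Assumes Linux with glibc's syscall() wrapper.
int run_via_execve(const char* path, char* const argv[]) {
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // syscall() places SYS_execve (59 on x86_64) in rax and the three
        // arguments in rdi, rsi and rdx before executing SYSCALL.
        syscall(SYS_execve, path, argv, environ);
        _exit(127);  // only reached if the exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Ordinary programs would call one of the exec*() library functions instead; the raw form is used here only because it mirrors the register-level description.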
This crosses the border from user space to the kernel by setting the instruction pointer to the address stored in the IA32_LSTAR register, which was set up by the kernel to contain the address of entry_SYSCALL_64, the syscall entry function. (<linux>/arch/x86/kernel/cpu/common.c:syscall_init())
The kernel is now responsible for walking the file system to the given path, and opening the file that was the argument to exec() for reading. (<linux>/fs/exec.c:open_exec())
If the file exists, has the right permissions etc., the binary format of the executable needs to be determined. To do this, the first BINPRM_BUF_SIZE bytes are loaded into memory (<linux>/fs/exec.c:prepare_binprm()), and the list of registered binfmt-handlers is walked to see if one of them recognizes the format.
Probably the compiler will have transformed the program into an ELF file, which can be recognized by the magic bytes "\x7fELF" at the start of the file.
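The recognition step boils down to comparing the first four bytes of the buffer. A minimal user-space sketch of that check follows; has_elf_magic is a made-up name, and the kernel's own handler of course checks a good deal more than this.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the buffer starts with the 4-byte ELF magic.
bool has_elf_magic(const unsigned char* buf, std::size_t len) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    return len >= sizeof magic && std::memcmp(buf, magic, sizeof magic) == 0;
}
```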
In this case, the loading will be performed by <linux>/fs/binfmt_elf.c:load_elf_binary(), where the ELF header and the program header table are loaded into memory.
The first thing that is done is to look for a PT_INTERP segment, which contains the name of the program interpreter, another ELF executable identified by a fixed path on the file system, in our example "/lib64/ld-linux-x86-64.so.2". If there is an interpreter, again the kernel needs to locate the correct file, check permissions, etc.
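The same lookup can be imitated from user space by walking the program header table with the definitions from <elf.h>. The sketch below is a simplification under several assumptions: find_interp is a hypothetical helper, it only handles 64-bit ELF files in the host's byte order, and it caps the path length at an arbitrary 4096 bytes.

```cpp
#include <elf.h>      // Linux-specific ELF definitions
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns the PT_INTERP path of a 64-bit ELF file, or an
// empty string if there is none. Assumes the file uses the host's byte order.
std::string find_interp(const char* path) {
    std::ifstream f(path, std::ios::binary);
    Elf64_Ehdr eh;
    if (!f.read(reinterpret_cast<char*>(&eh), sizeof eh)) return "";
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return "";
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) return "";

    for (std::uint64_t i = 0; i < eh.e_phnum; ++i) {
        Elf64_Phdr ph;
        f.seekg(static_cast<std::streamoff>(eh.e_phoff + i * eh.e_phentsize));
        if (!f.read(reinterpret_cast<char*>(&ph), sizeof ph)) return "";
        if (ph.p_type != PT_INTERP) continue;
        if (ph.p_filesz == 0 || ph.p_filesz > 4096) return "";  // sanity limit
        std::string name(static_cast<std::size_t>(ph.p_filesz), '\0');
        f.seekg(static_cast<std::streamoff>(ph.p_offset));
        if (!f.read(&name[0], static_cast<std::streamsize>(name.size()))) return "";
        return std::string(name.c_str());  // drop the trailing '\0'
    }
    return "";
}
```

For a dynamically linked executable this should yield the interpreter path; a statically linked one has no such entry and gives an empty string.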
After all checks are done and passed, the page table of the old process is cleared, and a new mapping set up. All PT_LOAD segments of the binary are mapped into their respective places, and a memory region for the stack is allocated at a random address. Then, the load segments of the interpreter, which is position-independent, are mapped into private, write-protected pages at some free part of the address space.
When the memory is set up, control is transferred back to user space, in particular to the entry point of the interpreter.
The interpreter reads the DT_NEEDED tags of the binary to determine the shared library dependencies, which will in our case consist of libstdc++.so.6, libc.so.6, libm.so.6, and libgcc_s.so.1. The interpreter tries to locate each of these libraries and map them into memory at a randomly chosen address. A list of library load addresses is maintained in the static global struct _r_debug. (/usr/include/link.h)
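A running program can inspect the result of this step. Rather than reading _r_debug directly, the sketch below uses glibc's dl_iterate_phdr(), a different but documented interface declared in the same header, to collect the names of the loaded objects. The helper names collect_name and loaded_objects are invented for the example.

```cpp
#include <link.h>     // glibc: dl_iterate_phdr, struct dl_phdr_info
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded object; appends its name to the vector.
static int collect_name(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->push_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // a non-zero value would stop the iteration
}

// Hypothetical helper: names of all objects currently mapped by the loader.
// The main program is typically reported with an empty name.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

For the Hello World binary one would expect the list to include the libraries named above, though the exact set depends on the toolchain.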
However, unless the environment variable LD_BIND_NOW is set to 1, the function symbols will not be resolved right now but lazily on the first call to the respective function.
After doing its thing, the dynamic loader passes control to the entry point of the actual binary, which is the symbol _start defined by glibc. (<glibc>/sysdeps/x86_64/start.S)
This starting point will set up an initial stack frame, compute the correct values for argc, argv and env from the information in the auxiliary vector, and call the C runtime initialization function __libc_start_main. (<glibc>/csu/libc-start.c)
This will run static initialization functions, in particular constructors of all static objects, and install atexit-handlers for static destruction functions (again, in particular destructors of static objects).
Inside main(), the two functions
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
defined in the shared library libstdc++.so.6 are called for the first time, so when the program jumps to their PLT-slots, a symbol lookup will be triggered. (<glibc>/elf/dl-lookup.c)
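For readers who do not speak Itanium ABI mangling fluently, the two symbol names can be turned back into declarations with abi::__cxa_demangle from <cxxabi.h>, which GCC and Clang provide. The wrapper name demangle is an invention of this sketch.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // The result is allocated with malloc(), so it is released with free().
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string(mangled);
}
```

Fed the two names above, it should produce declarations of std::endl and of the operator<< overload taking a char const*; the exact spacing of the output differs between implementations.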
What these functions do is more or less up to the standard library implementors, but ultimately the syscall write(1, p, 14) will be issued, where the arguments are the file descriptor 1, which is mapped to stdout, a pointer p containing the address of the string "Hello, world!\n", and the number of bytes that should be written.
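Stripped of all the stream machinery, the end result is a single POSIX write() call. A minimal sketch follows, with the file descriptor left as a parameter so it can be pointed at something other than stdout; write_greeting is a hypothetical name.

```cpp
#include <unistd.h>   // POSIX write()
#include <cstddef>

// Hypothetical helper: writes the greeting to the given file descriptor with
// a single write() call and returns the number of bytes written (or -1).
ssize_t write_greeting(int fd) {
    static const char text[] = "Hello, world!\n";
    const std::size_t len = sizeof text - 1;   // 14 bytes, '\0' not included
    return write(fd, text, len);
}
```

Calling write_greeting(1) corresponds to the write(1, p, 14) described above.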
Finally, the program signals to the operating system that it is finished and that all of its resources should be freed and cleaned up by executing the system call exit_group(), with the only argument being the value returned by main(), which is 0.
A program must run on a CPU, and a CPU is made of metal. Information is transmitted through metal by letting electrons flow along local gradients, increasing the entropy of the system. All of these electrons, together with the atoms of the CPU, form a huge quantum system which will evolve according to its wave function. Therefore, we can't know what the program does until we measure its outcome.
It will print out "Hello, world!".
Since <iostream> is one of the 53 C++ standard library header
You forgot to mark <iostream> as code, which means it's confusingly not displayed: