sugar700/cout-difference.md

Created February 23, 2018 11:10


I'm surprised that everyone answering this question claims that std::cout is way better than printf, even though the question only asked for the differences. Now, there is one obvious difference: std::cout is C++, and printf is C (although you can use it in C++, just like almost anything else from C). I'll be honest here; both printf and std::cout have their advantages.

Disclaimer: I'm more experienced with C than C++, so if there is a problem with my answer, feel free to edit or comment.

Real differences

Extensibility

std::cout is extensible. I know people will say that printf is extensible too, but such extensions are not part of the C standard (so you would have to use non-standard features, and there isn't even a common non-standard mechanism), and each such extension is a single letter (so it's easy to conflict with an already-existing format).

Unlike printf, std::cout relies entirely on operator overloading, so there is no issue with custom formats: all you do is define an operator<< function that takes a std::ostream& as its first argument and your type as its second, and returns the stream. As such, there are no namespace problems; as long as you have a class (whose name isn't limited to one character), you can have a working std::ostream overload for it.

That said, I doubt many people want to extend ostream (to be honest, I have rarely seen such extensions, even though they are easy to write). Still, it's there if you need it.
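As a minimal sketch of such an overload (the Point type and the describe helper are hypothetical, made up for illustration), this is all it takes to make a user-defined type printable with <<:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// A hypothetical user-defined type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Takes the stream by reference and returns it, so that insertions
// can be chained: out << a << b << c.
std::ostream& operator<<(std::ostream& out, const Point& p) {
    return out << '(' << p.x << ", " << p.y << ')';
}

// Formats a Point through the overload above into a std::string.
std::string describe(const Point& p) {
    std::ostringstream out;
    out << "Point " << p;
    return out.str();
}
```

With this in place, std::cout << Point{3, 4} << '\n'; prints (3, 4), and the overload works with any other std::ostream, such as a file stream or a string stream.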

Syntax

As is easy to notice, printf and std::cout use different syntax. printf uses standard function syntax with a format string and a variable-length argument list. In fact, printf is one reason why C has variable-length argument lists; printf formats would be too unwieldy without them. std::cout, on the other hand, uses a different API: operator <<, which returns the stream itself so that calls can be chained.

Generally, that means the C version will be shorter, but in most cases it won't matter. The difference becomes noticeable when you print many arguments. Suppose you want to print something like Error 2: File not found., where the error number and its description are placeholders. The code would look like this. Both examples work identically (well, almost: std::endl also flushes the buffer).

printf("Error %d: %s.\n", id, errors[id]);
std::cout << "Error " << id << ": " << errors[id] << "." << std::endl;
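To check that the two styles really produce the same text, here is a small sketch that formats into strings instead of printing. The function names are hypothetical, and the description is passed in directly rather than read from an errors array:

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Builds "Error <id>: <description>.\n" with a printf-style format string.
std::string with_printf_style(int id, const char* description) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, "Error %d: %s.\n", id, description);
    return buffer;
}

// Builds the same text with operator<<. '\n' is used instead of std::endl,
// because std::endl would also flush the stream.
std::string with_stream_style(int id, const char* description) {
    std::ostringstream out;
    out << "Error " << id << ": " << description << "." << '\n';
    return out.str();
}
```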

While this doesn't look too bad (it's just about twice as long), things get much worse when you actually format arguments instead of just printing them. For example, printing something like 0x0424 is just crazy. This is because std::cout mixes formatting state with the actual values. I have never seen a language other than C++ where something like std::setfill is a value you write into the output stream. printf clearly separates the format from the arguments. I would much rather maintain the printf version (even if it looks kind of cryptic) than the iostream version (which contains too much noise).

printf("0x%04x\n", 0x424);
std::cout << "0x" << std::hex << std::setfill('0') << std::setw(4) << 0x424 << std::endl;
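The "mixing state and values" problem is easy to trip over. In this sketch (function names are hypothetical), std::setw only affects the next insertion, while std::hex and the fill character stay set on the stream, so a number printed afterwards comes out in hex as well:

```cpp
#include <cassert>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// The printf version: all formatting lives in the format string.
std::string hex_with_printf(unsigned value) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "0x%04x", value);
    return buffer;
}

// The iostream version, followed by a second number on the same stream.
std::string hex_with_stream_then(unsigned value, int next) {
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0') << std::setw(4) << value;
    // The width was reset by the insertion above, but std::hex is still
    // in effect, so `next` is printed in hexadecimal too.
    out << ' ' << next;
    return out.str();
}
```

For example, hex_with_stream_then(0x424, 255) yields 0x0424 ff rather than 0x0424 255.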

Translation

This is where the real advantage of printf lies. The printf format string is, well... a string. That makes it really easy to translate, compared to iostream's abuse of operator <<. Assuming that the gettext() function does the translation, and you want to show Error 2: File not found., the code to get the translation of the previously shown format string would look like this:

printf(gettext("Error %d: %s.\n"), id, errors[id]);

Now, let's assume that we translate to Fictionish, where the error number comes after the description. The translated string would look like %2$s oru %1$d.\n. Now, how do you do that in C++? Well, I have no idea. I guess you could make a fake iostream that builds a printf format string you can pass to gettext, or something like that, for the purposes of translation. Of course, positional $ arguments are not standard C (they come from POSIX), but they are so common that it's safe to use them, in my opinion.
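A minimal sketch of the printf side, assuming a POSIX C library (positional %1$d / %2$s conversions are specified by POSIX, not ISO C, and some C libraries do not accept them). format_error is a hypothetical helper, and the format string stands in for whatever gettext() would return:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats an error message from a (possibly translated) format string.
// The translation may reorder the arguments with %1$ / %2$, but the call
// site always passes them in the same order: id first, description second.
std::string format_error(const char* format, int id, const char* description) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, format, id, description);
    return buffer;
}
```

The English and Fictionish strings then differ only in data: format_error("Error %1$d: %2$s.\n", 2, "File not found") and format_error("%2$s oru %1$d.\n", 2, "File not found") use the exact same code.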

Not having to remember/look-up specific integer type syntax

C has lots of integer types, and so does C++. std::cout handles all types for you, while printf requires a specific syntax depending on the integer type (there are non-integer types too, but apart from floating-point numbers, the only non-integer type you will commonly pass to printf is const char *, a C string, which can be obtained from a std::string using its c_str method). For instance, to print a size_t you need to use %zu, while int64_t requires "%" PRId64 (and uint64_t requires "%" PRIu64). The tables are available at http://en.cppreference.com/w/cpp/io/c/fprintf and http://en.cppreference.com/w/cpp/types/integer.
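A small sketch showing the specifiers for size_t and int64_t next to the iostream version, which needs none. It assumes a C++11 compiler and library, where <cinttypes> provides the PRId64 macro; the function names are hypothetical:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>

std::string sizes_with_printf(std::size_t count, std::int64_t offset) {
    char buffer[64];
    // %zu is for size_t; PRId64 expands to the right conversion for int64_t.
    std::snprintf(buffer, sizeof buffer, "%zu %" PRId64, count, offset);
    return buffer;
}

std::string sizes_with_stream(std::size_t count, std::int64_t offset) {
    std::ostringstream out;
    out << count << ' ' << offset;  // overload resolution picks the right output
    return out.str();
}
```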

You can't print the NUL byte, `\0`

Because printf uses C strings as opposed to C++ strings, it cannot print a NUL byte without specific tricks. In certain cases it's possible to use %c with '\0' as the argument, although that's clearly a hack.
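A sketch contrasting the three cases (function names are hypothetical): a std::string with an embedded NUL written through a stream, the same bytes passed to %s, and the %c hack:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// A std::string stores its length, so it can contain '\0';
// operator<< writes all three bytes.
std::string stream_with_nul() {
    const std::string text("a\0b", 3);
    std::ostringstream out;
    out << text;
    return out.str();
}

// %s stops at the first '\0', so only "a" is written.
std::string printf_with_nul() {
    char buffer[16];
    int written = std::snprintf(buffer, sizeof buffer, "%s", "a\0b");
    return std::string(buffer, static_cast<std::size_t>(written));
}

// The %c hack: a '\0' passed to %c is written as an ordinary byte,
// and it is included in snprintf's return value.
std::string printf_char_hack() {
    char buffer[16];
    int written = std::snprintf(buffer, sizeof buffer, "a%cb", '\0');
    return std::string(buffer, static_cast<std::size_t>(written));
}
```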

Differences nobody cares about

Performance

Update: It turns out that iostream is so slow that it's usually slower than your hard drive (if you redirect your program's output to a file). Disabling synchronization with stdio (std::ios_base::sync_with_stdio(false)) may help if you need to output lots of data. If performance is a real concern (as opposed to writing several lines to STDOUT), just use printf.
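A minimal sketch of the two usual iostream tweaks: turning off stdio synchronization (it should be done before any other I/O, and afterwards you should not interleave printf and std::cout output) and writing '\n' instead of std::endl to avoid a flush per line. The helper names are hypothetical, and how much this helps depends on your standard library, so measure it:

```cpp
#include <cassert>
#include <ios>       // std::ios_base::sync_with_stdio
#include <ostream>
#include <sstream>   // std::ostringstream, to capture output when checking it

// Call once, before any other I/O. Afterwards the C++ streams are no longer
// kept in step with C stdio. Returns the previous synchronization setting.
bool unsync_from_stdio() {
    return std::ios_base::sync_with_stdio(false);
}

// Writes `count` lines ending in '\n' rather than std::endl,
// so the stream is not flushed after every line.
void write_lines(std::ostream& out, int count) {
    for (int i = 0; i < count; ++i) {
        out << "line " << i << '\n';
    }
}
```

In a real program you would call unsync_from_stdio() at the top of main and then, for example, write_lines(std::cout, 1000000).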

Everyone thinks they care about performance, but nobody bothers to measure it. My answer is that I/O is the bottleneck anyway, no matter whether you use printf or iostream. Still, from a quick look at the assembly (compiled with clang using the -O3 compiler option), I think printf could be faster. Taking my error example, the printf version makes far fewer calls than the cout version. This is int main with printf:

main:                                   @ @main
@ BB#0:
        push    {lr}
        ldr     r0, .LCPI0_0
        ldr     r2, .LCPI0_1
        mov     r1, #2
        bl      printf
        mov     r0, #0
        pop     {lr}
        mov     pc, lr
        .align  2
@ BB#1:

You can easily notice that two strings and the number 2 are passed to printf as arguments (in registers, as this is ARM code). That's about it; there is nothing else. For comparison, this is the iostream version compiled to assembly. No, there is no inlining; every single operator << means another call with another set of arguments.

main:                                   @ @main
@ BB#0:
        push    {r4, r5, lr}
        ldr     r4, .LCPI0_0
        ldr     r1, .LCPI0_1
        mov     r2, #6
        mov     r3, #0
        mov     r0, r4
        bl      _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
        mov     r0, r4
        mov     r1, #2
        bl      _ZNSolsEi
        ldr     r1, .LCPI0_2
        mov     r2, #2
        mov     r3, #0
        mov     r4, r0
        bl      _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
        ldr     r1, .LCPI0_3
        mov     r0, r4
        mov     r2, #14
        mov     r3, #0
        bl      _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
        ldr     r1, .LCPI0_4
        mov     r0, r4
        mov     r2, #1
        mov     r3, #0
        bl      _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l
        ldr     r0, [r4]
        sub     r0, r0, #24
        ldr     r0, [r0]
        add     r0, r0, r4
        ldr     r5, [r0, #240]
        cmp     r5, #0
        beq     .LBB0_5
@ BB#1:                                 @ %_ZSt13__check_facetISt5ctypeIcEERKT_PS3_.exit
        ldrb    r0, [r5, #28]
        cmp     r0, #0
        beq     .LBB0_3
@ BB#2:
        ldrb    r0, [r5, #39]
        b       .LBB0_4
.LBB0_3:
        mov     r0, r5
        bl      _ZNKSt5ctypeIcE13_M_widen_initEv
        ldr     r0, [r5]
        mov     r1, #10
        ldr     r2, [r0, #24]
        mov     r0, r5
        mov     lr, pc
        mov     pc, r2
.LBB0_4:                                @ %_ZNKSt5ctypeIcE5widenEc.exit
        lsl     r0, r0, #24
        asr     r1, r0, #24
        mov     r0, r4
        bl      _ZNSo3putEc
        bl      _ZNSo5flushEv
        mov     r0, #0
        pop     {r4, r5, lr}
        mov     pc, lr
.LBB0_5:
        bl      _ZSt16__throw_bad_castv
        .align  2
@ BB#6:

However, to be honest, this means nothing, as I/O is the bottleneck anyway. I just wanted to show that iostream is not faster because it's "type safe". Some C implementations (glibc, for example) dispatch on printf format specifiers using computed goto, so printf is about as fast as it can be, even without the compiler being aware of printf (not that compilers aren't aware of it: some compilers can optimize printf in certain cases; for instance, a constant format string with no conversions that ends with \n is usually turned into a call to puts).

Inheritance

I don't know why you would want to inherit from ostream, but I don't care. It's possible with FILE too.

class MyFile : public FILE {};

Type safety

True, variable-length argument lists have no type safety, but that doesn't matter much, as popular C compilers can detect problems with printf format strings if you enable warnings. In fact, Clang does it without any warnings being enabled explicitly.

$ cat safety.c

#include <stdio.h>

int main(void) {
    printf("String: %s\n", 42);
    return 0;
}

$ clang safety.c

safety.c:4:28: warning: format specifies type 'char *' but the argument has type 'int' [-Wformat]
    printf("String: %s\n", 42);
                    ~~     ^~
                    %d
1 warning generated.
$ gcc -Wall safety.c
safety.c: In function ‘main’:
safety.c:4:5: warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat=]
     printf("String: %s\n", 42);
     ^
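The same checking can be extended to your own printf-style wrappers with the GCC/Clang format attribute. This is a compiler extension (not standard C or C++), and format_message is a hypothetical helper; with the attribute in place, a call like format_message("String: %s", 42) gets the same -Wformat warning as printf does:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// format(printf, 1, 2): parameter 1 is a printf-style format string and the
// variadic arguments start at position 2, so calls are checked like printf.
std::string format_message(const char* format, ...)
    __attribute__((format(printf, 1, 2)));

std::string format_message(const char* format, ...) {
    char buffer[256];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    return buffer;
}
```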


octal.md

First of all, you really don't need parseInt() in most cases. Its algorithm is full of quirks. Treating a leading 0 as octal is even forbidden by the specification ("the specification of the function parseInt no longer allows implementations to treat Strings beginning with a 0 character as octal values."), but it will take a while for browser behavior to change (even though I'm sure nobody uses octals in parseInt() intentionally). And Internet Explorer 6 will never change (Internet Explorer 9, however, removed support for octals in parseInt()). The algorithm usually does more than you want from it, and in certain cases, using it is a bad idea.

The first argument is converted to a string if it isn't one already.
Leading whitespace is removed, so ' 4' becomes '4'.
If the string begins with - or +, that character is removed. If it was -, the output will be negative.
The radix is converted to an integer.
If the radix is 0 or NaN, the radix is guessed. This means looking (case-insensitively) for a 0x prefix and (non-standard) a 0 prefix. If no prefix is found, 10 is used (which is most likely what you want).
If the radix is 16, a leading 0x is stripped if it exists.
Find the first character that is not a valid digit in the radix.
If there is nothing before that character, return NaN.
Convert the digits before that character to a number.

For example, parseInt('012z', 27) gives 0 * Math.pow(27, 2) + 1 * Math.pow(27, 1) + 2 * Math.pow(27, 0), which is 29 ('z' is not a valid digit in radix 27, so it and everything after it is ignored).
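To make the steps above concrete, here is a rough model of them in C++ (the language of the other examples in this gist). It is a sketch under several assumptions: parse_int_sketch and digit_value are hypothetical names, the argument is assumed to already be a string, a radix of 0 stands for a missing or NaN radix, only ASCII whitespace is skipped, and a leading 0 is never treated as octal (the ES5 behavior):

```cpp
#include <cassert>
#include <cctype>
#include <cmath>    // std::isnan, handy for checking the NaN results
#include <cstddef>
#include <limits>
#include <string>

// Digit value of c: 0-9, then a/A = 10 up to z/Z = 35.
// Anything else gets 36, which is invalid in every radix.
int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return 36;
}

// Rough model of parseInt(input, radix) for an input that is already a string.
double parse_int_sketch(const std::string& input, int radix) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::size_t i = 0;

    // Skip leading whitespace (only ASCII whitespace in this sketch).
    while (i < input.size() && std::isspace(static_cast<unsigned char>(input[i]))) {
        ++i;
    }

    // Optional sign; '-' makes the result negative.
    double sign = 1.0;
    if (i < input.size() && (input[i] == '+' || input[i] == '-')) {
        if (input[i] == '-') sign = -1.0;
        ++i;
    }

    // Fall back to 10 when the radix is missing, otherwise validate it.
    bool strip_prefix = true;
    if (radix == 0) {
        radix = 10;
    } else if (radix < 2 || radix > 36) {
        return nan;
    } else {
        strip_prefix = (radix == 16);
    }

    // A 0x / 0X prefix switches to (or confirms) radix 16.
    if (strip_prefix && i + 1 < input.size() && input[i] == '0' &&
        (input[i + 1] == 'x' || input[i + 1] == 'X')) {
        i += 2;
        radix = 16;
    }

    // Accumulate digits until the first character outside the radix.
    const std::size_t start = i;
    double result = 0.0;
    while (i < input.size() && digit_value(input[i]) < radix) {
        result = result * radix + digit_value(input[i]);
        ++i;
    }

    // No valid digits at all means NaN.
    if (i == start) return nan;
    return sign * result;
}
```

For instance, parse_int_sketch("012z", 27) gives 29, and parse_int_sketch("2e+30", 10) gives 2, which is exactly what happens to parseInt(2e30) once the number has been turned into the string "2e+30".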

The algorithm itself is not really quick, but performance varies (optimizations work wonders). I put a test on JSPerf, and the results are interesting. + and ~~ are the fastest, with the exception of Chrome, where parseFloat() is somehow much faster than the other options (2 to 5 times faster, with + actually being 5 times slower). In Firefox, the ~~ test is very fast; in certain cases, I got a score of Infinity.

The other thing is correctness. parseInt(), ~~ and parseFloat() fail silently. With parseInt() and parseFloat(), everything after an invalid character is ignored. You can call it a feature (for me, in most cases it's an anti-feature, just like fallthrough in switch statements), and if you need it, use one of those. With ~~, an error means returning 0, so be careful.

In certain cases, parseInt() might hurt you. Badly. For example, when a number is so big that its string form uses exponential notation: parseInt() converts it to a string first (2e30 becomes "2e+30") and then stops at the e. Use Math methods then.

parseInt(2e30); // will return 2

Anyway, to finish, here is a list of methods for converting strings to numbers (both integers and floats). They have different uses, and you may want to know which method to use. In most cases, the simplest one, the +number method, is the one to use if you can. Whatever you do (except for the first method), all of them should give the correct result.

parseInt('08', 10); // 8
+'08';              // 8
~~'08';             // 8
parseFloat('08');   // 8
Number('08');       // 8
new Number('08');   // 8... I meant Object container for 8
Math.ceil('08');    // 8

`parseInt(number)`

Don't use it. Simple as that. Either use parseInt(number, 10) or this workaround, which will magically fix the parseInt function. Please note that this workaround will not pass JSLint. Please don't complain about it.

(function () {
    "use strict";
    var oldParseInt = parseInt;
    // Don't use function parseInt() {}. It will make local variable.
    parseInt = function (number, radix) {
        return oldParseInt(number, radix || 10);
    };
}());

`parseInt(number, radix)`

parseInt converts its argument to a number using the algorithm described above. Avoid using it on large numbers, as it can give incorrect results in cases like parseInt(2e30). Also, never pass it as the callback to Array.prototype.map (or the Underscore.js equivalent), as you may get weird results: ['1', '2', '3'].map(parseInt) returns [1, NaN, NaN], because map passes the element index as the second argument, which parseInt treats as the radix (to see the arguments, replace parseInt with console.log).

Use it when either:

You need to read data written in a different radix.
You need to ignore errors (for example, to turn 123px into 123).

Otherwise, use one of the safer methods (if you need an integer, use Math.floor instead).

`+number`

The + prefix (+number) converts the string to a number (a floating-point value). On error it returns NaN, which you can check with either isNaN() or simply number !== number (which is true only for NaN). It's very fast in Opera.

Use it unless you want specific features of the other methods.

`~~number`

~~ is a hack that applies ~ twice. Since the bitwise ~ operation can only be done on integers, the value is automatically converted to a 32-bit integer, and the two NOTs cancel out. Most browsers have optimizations for this case. As bitwise operations work on signed 32-bit integers, values outside the range -Math.pow(2, 31) to Math.pow(2, 31) - 1 wrap around, so never use this method with big numbers. It's blazingly fast in the SpiderMonkey engine.
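Here is a rough C++ model of the ToInt32 conversion that ~ applies to its operand, to show where the wraparound comes from. to_int32 and double_tilde are hypothetical names, and this is a sketch of the specification's steps, not any engine's code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Rough model of ECMAScript ToInt32: NaN and infinities become 0, the value
// is truncated toward zero, reduced modulo 2^32, and finally reinterpreted
// as a signed 32-bit integer.
std::int32_t to_int32(double value) {
    if (std::isnan(value) || std::isinf(value)) return 0;
    const double two32 = 4294967296.0;                      // 2^32
    double wrapped = std::fmod(std::trunc(value), two32);   // in (-2^32, 2^32)
    if (wrapped < 0) wrapped += two32;                      // now in [0, 2^32)
    if (wrapped >= 2147483648.0) wrapped -= two32;          // now in [-2^31, 2^31)
    return static_cast<std::int32_t>(wrapped);
}

// ~~x is ~(~x): the two bitwise NOTs cancel out, leaving just ToInt32(x).
std::int32_t double_tilde(double value) {
    return ~~to_int32(value);
}
```

For example, double_tilde(3e9) gives -1294967296 rather than 3000000000, and double_tilde(-8.9) gives -8, since the conversion truncates toward zero.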

Use it when either:

You're writing code where performance on SpiderMonkey matters (like Firefox plugins) and you don't need error detection.
You need an integer and care about the size of the resulting JavaScript.

`parseFloat(number)`

parseFloat() works like + with one main exception: it processes the string up to the first invalid character instead of returning NaN. It's very fast in V8 (though not as fast as ~~ on Firefox). Unlike parseInt, it takes a single argument, so it is safe with Array.prototype.map.

Use it when either:

You're writing performance-critical code for Node.js or Google Chrome plugins (V8).
You need to ignore errors (for example, to turn 42.13px into 42.13).

`Number(number)`

Avoid it. It works just like the + prefix and is usually slower. The only case where it could be useful is as a callback for Array.prototype.map, since you cannot use + as a callback.

`new Number(number)`

Use it when you want to confuse everybody with 0 being a truthy value and typeof returning 'object' instead of 'number'. Seriously, don't.

Math methods, like `Math.ceil(number)`

Use them when you need an integer, as they are safer than parseInt() because they don't ignore unexpected characters. Please note that technically this involves a long conversion (string → float → integer → float, since numbers in JavaScript are floats), but most browsers optimize it, so usually it's not noticeable. They're also safe with Array.prototype.map.


vexing-parse.md

C function declarators

First of all, there is C. In C, A a() is a function declaration. For example, putchar has the following declaration. Normally, such declarations live in header files, but nothing stops you from writing them manually if you know what the function's declaration looks like. Parameter names are optional in declarations, so I omitted them in this example.

int putchar(int);

This lets you write code like this:

int puts(const char *);
int main() {
    puts("Hello, world!");
}

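C also allows you to define functions that take functions as arguments, with nice readable syntax that looks like a function call (well, it's readable as long as you don't return a pointer to a function). A parameter declared with a function type is automatically adjusted to a pointer to a function.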

#include <stdio.h>

int eighty_four() {
    return 84;
}

int output_result(int callback()) {
    printf("Returned: %d\n", callback());
    return 0;
}

int main() {
    return output_result(eighty_four);
}

As I mentioned, C allows omitting parameter names in declarations, so output_result could be declared like this in a header file:

int output_result(int());

One argument in constructor

Don't you recognize that one? Well, let me remind you.

A a(B());

Yep, it's exactly the same function declaration. A is int, a is output_result, and B is int.

You can easily see a conflict between C and new C++ features. To be exact: constructing an object with the class name followed by parentheses, and the alternative initialization syntax with () instead of =. By design, C++ tries to be compatible with C code, and therefore it has to deal with this case, even if practically nobody cares. As a result, the old C feature takes priority over the new C++ ones: the rule is that anything that can be parsed as a declaration is parsed as a declaration. The compiler tries to match the declarator as a function first, and only falls back to the () initialization syntax if that fails.

If either of those features didn't exist, or had a different syntax (like C++11's {}), this issue would never have arisen for the one-argument case.

Now you may ask why A a((B())) works. Well, let's declare output_result with useless parentheses.

int output_result((int()));

It won't work. In C, a parameter declaration must start with a type (declaration specifiers), not with a parenthesis:

<stdin>:1:19: error: expected declaration specifiers or ‘...’ before ‘(’ token

C++, however, expects an ordinary expression in an initializer, and extra parentheses around an expression are fine. In C++, you can write the following code.

int value = int();

And the following code.

int value = ((((int()))));

C++ expects the thing inside the parentheses to be... well... an expression, as opposed to the type C expects. The extra parentheses don't mean anything here. However, by inserting them, the C function declaration no longer matches, and the C++ initialization syntax (which simply expects an expression, such as 2 + 2) can be matched properly.
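You can ask the compiler which parse it picked. In this sketch, A and B are hypothetical empty types standing in for the ones above, and std::is_function from <type_traits> tells a function declaration apart from an object:

```cpp
#include <cassert>
#include <type_traits>

struct B {};
struct A {
    explicit A(B) {}
};

// Parsed as a declaration of a function named a1 that returns A and takes
// a pointer to a function returning B. No object is created.
A a1(B());

// The extra parentheses cannot start a parameter declaration, so this can
// only be an object initialized from the expression B().
A a2((B()));

// C++11 braces avoid the ambiguity entirely.
A a3{B{}};

static_assert(std::is_function<decltype(a1)>::value, "a1 is a function");
static_assert(!std::is_function<decltype(a2)>::value, "a2 is an object");
```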

More arguments in constructor

Surely one argument is nice, but what about two? Constructors aren't limited to one argument. One standard library class with a two-argument constructor is std::string:

std::string hundred_dots(100, '.');

This is all well and fine (technically, it would hit the most vexing parse if it were written as std::string wat(int(), char()), but let's be honest: who would write that?). But let's assume this code has a vexing parse problem. You would assume that you have to put everything in parentheses.

std::string hundred_dots((100, '.'));

Not quite so.

<stdin>:2:36: error: invalid conversion from ‘char’ to ‘const char*’ [-fpermissive]
In file included from /usr/include/c++/4.8/string:53:0,
                 from <stdin>:1:
/usr/include/c++/4.8/bits/basic_string.tcc:212:5: error:   initializing argument 1 of ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ [-fpermissive]
     basic_string<_CharT, _Traits, _Alloc>::
     ^

I'm not sure why g++ tries to convert char to const char *. Either way, the constructor was called with just one value, of type char. There is no overload with a single char parameter, so the compiler is confused. You may ask: why is the argument of type char?

(100, '.')

Yes, the comma here is the comma operator. The comma operator takes two operands, evaluates both, and yields the right-hand one. It isn't really useful, but you need to know about it for my explanation.

Instead, to solve the most vexing parse, the following code is needed.

std::string hundred_dots((100), ('.'));

The individual arguments are in parentheses, not the entire argument list. In fact, just one of the arguments needs to be in parentheses, as it's enough to break from the C grammar slightly for the C++ feature to apply. This brings us to the case of zero arguments.
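A small sketch of both halves of this explanation (function names are hypothetical): the parenthesized pair collapses to a single char through the comma operator, while parenthesizing each argument keeps two arguments. Compilers may warn that the left operand of the comma has no effect, which is rather the point:

```cpp
#include <cassert>
#include <string>

// (100, '.') is a single expression: the comma operator evaluates 100,
// discards it, and yields '.', so the whole thing has type char.
char comma_result() {
    return (100, '.');
}

// Parenthesizing each argument keeps two arguments, and (100) cannot start
// a parameter declaration, so this is an object, not a function.
std::string hundred_dots() {
    std::string dots((100), ('.'));
    return dots;
}
```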

Zero arguments in constructor

You may have noticed the eighty_four function in my explanation.

int eighty_four();

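Yes, this is also affected by the most vexing parse. It's a valid declaration, and one you have most likely seen if you have written header files (and you should). In the same way, A a(); declares a function returning A instead of creating a default-constructed object. Adding parentheses doesn't fix it.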

int eighty_four(());

Why is that so? Well, () is not an expression. In C++, you have to put an expression between parentheses. You cannot write auto value = () in C++, because () doesn't mean anything (and even if it did, like an empty tuple in Python, it would be one argument, not zero). In practice, that means you cannot use the shorthand syntax without C++11's {} syntax, as there is no expression to put in the parentheses, and the C grammar for function declarations will always apply.
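A sketch of the zero-argument case with a hypothetical type A: the empty parentheses declare a function, while the C++11 braces, or the older A a = A(); spelling, create an object:

```cpp
#include <cassert>
#include <type_traits>

struct A {
    int value = 42;  // default member initializer (C++11)
};

// Declares a function named a0 that takes no arguments and returns A.
A a0();

// C++11 brace syntax: an object, value-initialized.
A a1{};

// The spelling that already worked before C++11: copy-initialize
// from a value-initialized temporary.
A a2 = A();

static_assert(std::is_function<decltype(a0)>::value, "a0 is a function");
static_assert(std::is_same<decltype(a1), A>::value, "a1 is an object of type A");
```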
