A critique of "How to C in 2016" by Matt

(Translated from this version of the English edition) (The Japanese edition is here)

Keith Thompson, Sat 2016-01-09, updated Fri 2016-01-15

(koba, Sun 2016-02-21)

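Matt (whose web site does not, as far as I can see, give his last name) has written an article, "How to C in 2016". It has been linked from Reddit and Hacker News; the latter is where I saw it.

Update: Matt has kindly added a link to this critique to his article.

Update: Someone has found Matt's last name, but since he doesn't include it in his article, I won't mention it.

To avoid confusion: I am not Ken Thompson, nor am I related to him.

This critique has been linked from https://reddit.com/r/programming/ and is currently at the top of the page. There are a lot of comments, some of them constructive.

As is inevitable with any statement of opinion about C, there are some points I disagree with. This critique is intended to be constructive criticism. It is entirely possible that Matt is right and I am wrong on some points.

I am not going to quote Matt's entire article. In particular, I will omit points on which I agree. I suggest reading this alongside Matt's article.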

The first rule of C is don't write C if you can avoid it.

I disagree, but this is too broad a topic for meaningful argument.

C99 is the default C implementation for clang, no extra options needed.

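That depends on the version of clang. clang 3.5 defaults to C99; clang 3.6 defaults to C11. I'm not sure how strictly either one conforms by default.

If you want to use a particular standard with gcc or clang, specify it explicitly (-std=cNN -pedantic).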

gcc-5 defaults to -std=gnu11, but you should still specify a non-GNU c99 or c11 for practical usage.

That's right, unless you want to use gcc-specific extensions, which is a perfectly legitimate thing to do.

If you find yourself typing char or int or short or long or unsigned into new code, you're doing it wrong.

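Sorry, but that's nonsense. int in particular is going to be the most "natural" integer type for the current platform. If you want a signed integer type that is reasonably fast and at least 16 bits wide, there's nothing wrong with using int. (Or you can use int_least16_t, which may well be the same type as int, but in my opinion that's more verbose than it needs to be.)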

For modern programs, you should #include <stdint.h> then use standard types.

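The fact that int doesn't have "std" in its name doesn't make it non-standard. Types such as int and long are built into the language. The types in <stdint.h>, which are defined as typedefs, are later additions. That doesn't make them any less standard than the predefined types, but it doesn't make them more standard either.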

float — standard 32-bit floating point

double - standard 64-bit floating point

float and double are usually the IEEE 32-bit and 64-bit floating-point types, particularly on modern systems, but the C language doesn't guarantee that. I've worked on systems where float is 64 bits.

Notice we don't have char anymore. char is actually misnamed and misused in C.

C's conflation of characters and bytes is unfortunate, but we're stuck with it. The type char is guaranteed to be exactly one byte, where a "byte" is at least 8 bits.

Developers routinely abuse char to mean "byte" even when they are doing unsigned byte manipulations. It's much cleaner to use uint8_t to mean single a unsigned-byte/octet-value and uint8_t * to mean sequence-of-unsigned-byte/octet-values.

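If you want bytes, use unsigned char. If you want octets, use uint8_t. If CHAR_BIT > 8, uint8_t won't exist and your code won't compile (which is probably the behavior you want). If you want something that is at least 8 bits wide, use uint_least8_t. If you want to assume that bytes are octets, add something like this to your code: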

#include <limits.h>
#if CHAR_BIT != 8
    #error "This program assumes 8-bit bytes"
#endif

Note that POSIX requires CHAR_BIT == 8.

the C type of string literals ("hello") is char *.

No. String literals are of type char[]. In particular, the type of "hello" is char[6]. Arrays are not pointers. See Section 6 of the comp.lang.c FAQ for more on this topic.
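One way to see the difference is to apply sizeof to the literal itself. The following is a minimal sketch; the function names literal_size and pointer_size are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of the string literal "hello". */
size_t literal_size(void)
{
    /* "hello" has type char[6]: five letters plus the terminating '\0'. */
    assert(sizeof "hello" == 6);
    return sizeof "hello";
}

/* The array is converted to a pointer when it initializes p, so this
   measures a pointer object rather than the array. */
size_t pointer_size(void)
{
    const char *p = "hello";
    return sizeof p;
}
```

On a typical 64-bit system the two functions return different values (6 and 8), which would not be possible if the literal were itself a pointer.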

At no point should you be typing the word unsigned into your code. We can now write code without the ugly C convention of multi-word types that impair readability as well as usage.

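Many types in C have names that consist of two or more words. There's nothing wrong with that. Saving a few keystrokes is not a good reason to use a shorter abbreviation.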

Who wants to type unsigned long long int when you can type uint64_t?

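First, you can just write unsigned long long; the int is implied. Second, the two mean different things. unsigned long long is at least 64 bits wide, and may or may not have padding bits. uint64_t is exactly 64 bits wide, has no padding bits, and is not guaranteed to exist.

And unsigned long long is a type defined by the C language itself, one that any C programmer will recognize.

You could also use uint_least64_t, which may or may not be the same type as unsigned long long.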
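These guarantees can be probed directly. The sketch below assumes a hosted C99-or-later implementation; the function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bits of storage occupied by unsigned long long. */
unsigned ull_storage_bits(void)
{
    unsigned bits = (unsigned)(sizeof(unsigned long long) * CHAR_BIT);
    assert(bits >= 64);     /* guaranteed: ULLONG_MAX is at least 2^64 - 1 */
    return bits;
}

/* Returns 1 if the exact-width type uint64_t exists here, 0 otherwise.
   UINT64_MAX is defined only when uint64_t is. */
int have_uint64(void)
{
#ifdef UINT64_MAX
    return 1;
#else
    return 0;
#endif
}

/* uint_least64_t, unlike uint64_t, is required to exist. */
uint_least64_t largest_least64(void)
{
    return UINT_LEAST64_MAX;
}
```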

The <stdint.h> types are more explicit, more exact in meaning, convey intentions better, and are more compact for typographic usage and readability.

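Certainly the intN_t and uintN_t types are more explicit. But that's not always a good thing. Don't specify what you don't care about. Use uint64_t only when you really need a type that is exactly 64 bits wide, no more and no less.

Sometimes you do need a type with a specific exact width, for example when you have to conform to an externally imposed format. (Sometimes you also need to specify endianness, alignment, and so on; C's <stdint.h> provides no way to do that.) More often, though, what you need is a particular range of values. For that, you can use either the [u]int_leastN_t types or one of the predefined types.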

The correct type for pointer math is uintptr_t defined in <stddef.h>.

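This is terribly wrong.

First, a minor point: uintptr_t is defined in <stdint.h>, not <stddef.h>.

The advice also assumes that uintptr_t is defined at all. An implementation on which void* cannot be converted to any integer type without loss of information won't define uintptr_t. (Such implementations are admittedly rare, perhaps nonexistent.)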

Instead of:

        long diff = (long)ptrOld - (long)ptrNew;

Yes, that's bad.

Use:

        ptrdiff_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew;

This is no better.

If you want the difference between what two pointers point to, in units of the pointed-to type, use:

ptrdiff_t diff = ptrOld - ptrNew;

If you want the difference in bytes, use:

ptrdiff_t diff = (char*)ptrOld - (char*)ptrNew;

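If ptrOld and ptrNew don't point into, or just past the end of, the same object, the behavior of the pointer subtraction is undefined. Converting to uintptr_t before subtracting does give you some result, but that result is meaningless. Unless you're writing low-level system code, don't compare or do arithmetic on pointers that don't point into the same object. (Exception: == and != work correctly on pointers to different objects.)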
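As a minimal sketch of the two well-defined forms, the helpers below subtract pointers that point into the same array. The names element_diff, byte_diff and demo_element_diff are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Difference in elements. Both pointers must point into (or one past
   the end of) the same array; otherwise the behavior is undefined. */
ptrdiff_t element_diff(const int *hi, const int *lo)
{
    return hi - lo;
}

/* Difference in bytes between the same two positions. */
ptrdiff_t byte_diff(const int *hi, const int *lo)
{
    return (const char *)hi - (const char *)lo;
}

/* Hypothetical demo: distance between elements 7 and 2 of one array. */
ptrdiff_t demo_element_diff(void)
{
    int values[10] = {0};
    ptrdiff_t d = element_diff(&values[7], &values[2]);

    /* The byte distance is the element distance scaled by sizeof(int). */
    assert(byte_diff(&values[7], &values[2]) == d * (ptrdiff_t)sizeof(int));
    return d;
}
```

No conversion to an integer type is involved in either form.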

In these situations, you should use intptr_t — the integer type defined to be the word size of your current platform.

No, it isn't. The concept of "word size" is not well defined. intptr_t is a signed integer type big enough that a void* value converted to it and back to void* loses no information. It may be bigger than void*.
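The round trip is the only property the standard promises for intptr_t. A minimal sketch, assuming the implementation provides intptr_t at all (the function names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* intptr_t is optional, so check for its limit macro before using it. */
#ifdef INTPTR_MAX
/* Returns 1 if a void* survives the trip through intptr_t unchanged. */
int round_trip_ok(void *p)
{
    intptr_t as_integer = (intptr_t)p;  /* the value is implementation-defined */
    void *back = (void *)as_integer;
    return back == p;                   /* required to compare equal */
}
#endif

/* Hypothetical demo using the address of a local object. */
int demo_round_trip(void)
{
#ifdef INTPTR_MAX
    int object = 0;
    int ok = round_trip_ok(&object);
    assert(ok);
    return ok;
#else
    return 1;   /* nothing to demonstrate without intptr_t */
#endif
}
```

Nothing here depends on intptr_t having any particular width.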

On 32-bit platforms, intptr_t is int32_t.

Probably, but that's not guaranteed.

On 64-bit platforms, intptr_t is int64_t.

Again, probably, but there's no guarantee.

size_t is defined as "an integer capable of holding the largest array index"

No, it isn't.

which also means it's capable of holding the largest memory offset in your program.

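size_t is a type that can hold the size of the largest object the implementation supports. (Some argue that even this isn't strictly guaranteed, but for practical purposes you can assume it.) It can hold the largest memory offset only if every offset falls within a single object.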

In either case: size_t is practically defined to be the same as uintptr_t on all modern platforms, so on a 32-bit platform size_t is uint32_t and on a 64-bit platform size_t is uint64_t.

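Probably, but there's no guarantee.

To put it more plainly, size_t can represent the size of any single object. uintptr_t can represent any pointer value, and so can distinguish every byte of every object. Most modern systems have a monolithic address space, where the theoretical maximum size of an object is the same as the size of the entire memory space. But the C standard is careful not to require that. There could be a 64-bit system that doesn't support objects whose size needs more than 32 bits to represent.

By emphasizing "modern" systems, the original article ignores both older systems (such as old x86 systems that use segmented addressing with "near" and "far" pointers) and possible future systems that fully conform to the C standard without meeting "modern" assumptions.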

You should never cast types during printing. You should use proper type specifiers.

That's one approach, but it's not the only good one. (And the original article itself says you need to cast to void* when using "%p".)

raw pointer value - %p (prints hex value; cast your pointer to (void *) first)

Good advice, but the output format is implementation-defined. It's usually hex, but don't assume that.

        printf("Local number: %" PRIdPTR "\n\n", someIntPtr);

The name someIntPtr implies a type of int*, but in fact it's of type intptr_t.

There's an alternative which means you don't have to remember the alphabet soup of macro names:

some_signed_type n;
some_unsigned_type u;
printf("n = %jd, u = %ju\n", (intmax_t)n, (uintmax_t)u);

intmax_t and uintmax_t are typically 64 bits. The conversions are going to be far cheaper than the physical I/O.

Notice you put the '%' inside your format string, but the type specifier is outside your format string.

It's all part of the format string. The macros are defined as string literals, which are concatenated with adjacent string literals.
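Both approaches can be put side by side. This is a minimal sketch; the function name same_output and the buffer sizes are choices made for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Formats n twice, once with the PRId64 macro and once via intmax_t,
   and returns 1 if both approaches produce the same text. */
int same_output(int64_t n)
{
    char with_macro[32];
    char with_cast[32];

    /* "%" PRId64 becomes a single format string once the adjacent
       string literals are concatenated. */
    int len1 = snprintf(with_macro, sizeof with_macro, "%" PRId64, n);
    int len2 = snprintf(with_cast, sizeof with_cast, "%jd", (intmax_t)n);

    assert(len1 > 0 && len2 > 0);
    return strcmp(with_macro, with_cast) == 0;
}
```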

Modern compilers support #pragma once

That doesn't mean you should use it. Even the GNU cpp manual doesn't recommend it. The section on "Once-Only Headers" doesn't even mention #pragma once; it discusses the #ifndef idiom. The following section, "Alternatives to Wrapper #ifndef", briefly mentions #pragma once but points out that it's not portable.

This pragma is widely supported across all compilers across all platforms and is recommended over manually naming header guards.

Recommended by whom? The #ifndef trick isn't pretty, but it's reliable and portable.
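For reference, the #ifndef idiom looks roughly like this. The guard macro WIDGET_H, struct widget and widget_id are hypothetical names for this sketch.

```c
#ifndef WIDGET_H
#define WIDGET_H

#include <assert.h>
#include <stddef.h>

/* Hypothetical type. A second #include of this header finds WIDGET_H
   already defined and skips everything down to the #endif. */
struct widget {
    int id;
};

static inline int widget_id(const struct widget *w)
{
    assert(w != NULL);
    return w->id;
}

#endif /* WIDGET_H */
```

The guard name just has to be unique across the project; the idiom relies on nothing beyond the standard preprocessor.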

IMPORTANT NOTE: If your struct has padding, the {0} method does not zero out extra padding bytes. For example, struct thing has 4 bytes of padding after counter (on a 64-bit platform) because structs are padded to word-sized increments. If you need to zero out an entire struct including unused padding, use memset(&localThing, 0, sizeof(localThing)) because sizeof(localThing) == 16 bytes even though the addressable contents is only 8 + 4 = 12 bytes.

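This gets a bit tricky. Usually there's no reason to care about padding bytes. If you do need to care, memset is indeed the way to zero them. But note that zeroing a structure with memset sets all integer members to zero, while it is not guaranteed to set floating-point members to 0.0 or pointers to NULL (though it will on most systems).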
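A minimal sketch of both halves of that point follows. The struct layouts and function names are hypothetical, and whether struct thing actually has padding depends on the ABI.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct; on many 64-bit ABIs it has trailing padding. */
struct thing {
    unsigned long long index;
    unsigned int counter;
};

/* Returns 1 if every byte of a memset-zeroed struct, padding included,
   reads back as zero. */
int zeroed_including_padding(void)
{
    struct thing t;
    const unsigned char *bytes = (const unsigned char *)&t;

    memset(&t, 0, sizeof t);
    for (size_t i = 0; i < sizeof t; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    /* All-bits-zero always represents the value 0 for integer types. */
    assert(t.index == 0 && t.counter == 0);
    return 1;
}

/* For members that must be 0.0 or NULL, give them those values
   explicitly instead of relying on all-bits-zero. */
struct sample {
    double ratio;
    void *next;
};

struct sample make_sample(void)
{
    struct sample s = { 0.0, NULL };
    return s;
}
```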

C99 allows variable length array initializers

(The original article has since been corrected on this point.) No, C99 doesn't allow initializers for VLAs (variable length arrays). But Matt isn't actually talking about VLA initializers, just about VLAs.

VLAs are controversial. Unlike malloc, VLAs provide no mechanism for detecting allocation failures. So if you need to allocate N bytes of data, then this:

{
    unsigned char *buf = malloc(N);
    if (buf == NULL) { /* allocation failed */ }
    /* ... */
    free(buf);
}

is at least in principle safer than this:

{
    unsigned char buf[N];
    /* ... */
}

Certainly VLAs are dangerous when used incorrectly. The same is true of just about every feature in every language.

But old-fashioned fixed-length arrays have exactly the same problem. As long as you check the size before creating the array, a VLA of some variable size N is no more dangerous than a fixed-length array of the same size. And it's common to define a fixed-length array that's bigger than it needs to be, so that your actual data is sure to fit in part of it. With a VLA, you can allocate just what you need. I agree with Matt's advice here.

Except for one thing: C11 made VLAs optional. I doubt that many C11 compilers will actually decide not to implement them, except perhaps for small embedded systems, but it's something to keep in mind if you want your code to be as portable as possible.
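Both caveats can be handled in a few lines: check the size before creating the array, and fall back to malloc where __STDC_NO_VLA__ is defined. This is only a sketch; MAX_STACK_ITEMS and sum_indices are hypothetical, and the limit of 1024 is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STACK_ITEMS 1024    /* hypothetical limit chosen for this sketch */

/* Sums 0 .. n-1 through a scratch buffer. Returns -1 if n is out of
   range or (in the malloc fallback) if allocation fails. */
long sum_indices(size_t n)
{
    if (n == 0 || n > MAX_STACK_ITEMS) {
        return -1;              /* check the size before creating the array */
    }
#ifndef __STDC_NO_VLA__
    long buf[n];                /* VLA: exactly as big as needed */
#else
    long *buf = malloc(n * sizeof *buf);    /* implementation without VLAs */
    if (buf == NULL) {
        return -1;
    }
#endif
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (long)i;
    }
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
#ifdef __STDC_NO_VLA__
    free(buf);
#endif
    assert(total == (long)(n * (n - 1) / 2));
    return total;
}
```

The n == 0 check matters as well, since a VLA with zero elements has undefined behavior.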

If a function accepts arbitrary input data and a length to process, don't restrict the type of the parameter.

So, do NOT do this:

        void processAddBytesOverflow(uint8_t *bytes, uint32_t len)

Do THIS instead:

        void processAddBytesOverflow(void *input, uint32_t len)

(I've omitted the bodies of the functions.)

I agree, void* is the right type to use for a parameter pointing to an arbitrary chunk of memory. See the mem* functions in the standard library. (But len should be size_t, not uint32_t.)

By declaring your input type as void * and re-casting inside your function, you save the users of your function from having to think about abstractions inside your own library.

A minor quibble: there is no cast in Matt's function. What it has is an implicit conversion from void* to uint8_t*. (Note: this has been fixed in the Japanese version.)

Some readers have pointed out alignment problems with this example.

Some readers are mistaken. Accessing a chunk of memory as a sequence of bytes is always safe.
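A minimal sketch of such a function, with a size_t length and no cast, might look like this. The name sum_bytes is invented for illustration and is not Matt's function.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes of an arbitrary object of len bytes. */
unsigned long sum_bytes(const void *input, size_t len)
{
    /* Implicit conversion from const void*; no cast is needed in C. */
    const unsigned char *bytes = input;
    unsigned long total = 0;

    assert(input != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        total += bytes[i];  /* any object may be read as unsigned char */
    }
    return total;
}
```

Because unsigned char has no alignment requirement beyond 1, the loop is valid whatever the type of the object behind input.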

C99 gives us the power of <stdbool.h> which defines true to 1 and false to 0.

Yes, and it also defines bool as an alias for the predefined Boolean type _Bool.

For success/failure return values, functions should return true or false, not an int32_t return type with manually specifying 1 and 0 (or worse, 1 and -1 (or is it 0 success and 1 failure? or is it 0 success and -1 failure?)).

There is a widespread convention, particularly in Unix-like systems, for functions to return 0 for success and some non-zero value (often -1) for failure. In many cases different non-zero results denote different kinds of failure. It's important to follow this convention when adding new functions to such an interface. (0 is used for success because typically there's only one way for a function to succeed, and multiple ways for it to fail.)

A function that tests whether some condition is true or false should return true or false. But success vs. failure is often a different thing.

A bool function should have a name that's a predicate. In English, it would be something that answers a yes/no question. Examples are is_foo() and has_widget(). A function that tries to do something, and that needs to let you know whether it succeeded or not, should probably use a different convention. In some languages raising or throwing an exception is the right approach. In C, you should follow some existing convention -- and zero for success is the most common one.
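The two conventions can sit side by side. In this sketch, is_digit_char and parse_digit are hypothetical names, and -1 is just one common choice of failure value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A predicate: the name asks a yes/no question, so it returns bool. */
bool is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}

/* An operation that can fail: returns 0 for success, -1 for failure. */
int parse_digit(char c, int *out)
{
    assert(out != NULL);
    if (!is_digit_char(c)) {
        return -1;
    }
    *out = c - '0';     /* the digits '0'..'9' are contiguous in C */
    return 0;
}
```

A caller writes if (is_digit_char(c)) for the predicate and if (parse_digit(c, &value) != 0) for the operation, which keeps the two meanings visibly distinct.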

The only usable C formatter as of 2016 is clang-format. clang-format has the best defaults of any automatic C formatter and is still actively developed.

I haven't used clang-format myself. I'll have to look into it.

I have my own fairly strong opinions about C code layout:

Opening brace goes at the end of the line;
Spaces, not tabs;
4 columns per level;
Always use curly braces (except in rare cases where putting a statement on one line improves readability).

These are just my own personal preferences, which can be overridden by one important rule:

Follow the conventions of the project you're working on.

I don't often use automatic formatting tools myself. Perhaps I should.

Never use malloc

You should always use calloc.

I disagree. Initializing allocated memory to all-bits-zero is somewhat arbitrary, and it's typically not a meaningful value. If you write your code correctly, you won't access any object unless you have first assigned a meaningful value to it. Using calloc means that a bug in your code will give you a value of zero, rather than some arbitrary garbage. This isn't necessarily an improvement.

Zeroing memory often means that buggy code will have consistent behavior; by definition it will not have correct behavior. And consistently incorrect behavior can be more difficult to track down.

Of course there's no such thing as bug-free code. But if you're trying to program defensively, you might consider initializing allocated memory to some value that's known to be invalid rather than one that might be valid.

On the other hand, if all-bits-zero happens to be a reasonable initial value, calloc might be a good approach.
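As a sketch of the defensive alternative, a small wrapper can fill new allocations with a recognizable pattern. POISON_BYTE and alloc_poisoned are hypothetical names, and 0xA5 is an arbitrary choice; whether that pattern is actually invalid depends on the data being stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xA5    /* hypothetical "not yet initialized" pattern */

/* Allocates n bytes and fills them with POISON_BYTE so that reads of
   uninitialized data stand out. Returns NULL if malloc fails. */
void *alloc_poisoned(size_t n)
{
    assert(n > 0);
    void *p = malloc(n);
    if (p != NULL) {
        memset(p, POISON_BYTE, n);
    }
    return p;
}
```

The caller still checks for NULL and calls free as with malloc.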
