@catamorphism
Last active April 4, 2024 22:53
Crashes in ICU tools when running under asan
# Steps to reproduce
Using Ubuntu 23.10 and clang 16.0.6.
```
$ git clone https://github.com/unicode-org/icu.git
$ cd icu
$ mkdir build
$ cd build
$ CPPFLAGS=-fsanitize=address LDFLAGS=-fsanitize=address ../icu4c/source/runConfigureICU --enable-debug --disable-release Linux/clang --disable-renaming --enable-tracing
$ make tests
```
(Note: the results are the same with `make -j -l4.5 tests`, as far as I can tell.)
# Crashes
When building the ICU data, the tools (usually `genrb`, but occasionally `gendict` or `makeconv`) sometimes segfault. Which commands segfault varies from run to run, but at least one command has segfaulted every time I've tried this.
For example, here's the tail of the output on one `make` attempt:
```
echo "$BRKITR_INDEX_TXT_CONTENT" > ./out/tmp/brkitr/res_index.txt
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg -tl ../../icu4c/source/data/in/coll/ucadata-unihan.icu ./out/build/icudt75l/coll/ucadata.icu
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/makeconv -s ../../icu4c/source/data -d ./out/build/icudt75l -c mappings/euc-tw-2014.ucm
genrb number of files: 906
echo "$ICUDATA_LIST_CONTENT" > ./out/tmp/icudata.lst
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ./out/tmp/brkitr -d ./out/build/icudt75l/brkitr/ -i ./out/build/icudt75l -k res_index.txt
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ./out/tmp/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l -k res_index.txt
LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ../../icu4c/source/data/zone -d ./out/build/icudt75l/zone -i ./out/build/icudt75l -k tzdbNames.txt
[snip]
genrb number of files: 510
make[1]: *** [../data/rules.mk:554: out/build/icudt75l/euc-tw-2014.cnv] Segmentation fault (core dumped)
```
The command that crashed in this case is `LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/makeconv -s ../../icu4c/source/data -d ./out/build/icudt75l -c mappings/euc-tw-2014.ucm`.
# Debugging the crash
I found that running `gdb` on the resulting core file didn't help; the only backtrace I could get was:
```
#0 0x0000615616cc55a0 in ?? ()
#1 <signal handler called>
#2 0x0000615616cc55a0 in ?? ()
#3 <signal handler called>
[snip]
#32 0x0000615616cc55a0 in ?? ()
#33 <signal handler called>
#34 0x0000615616cdab65 in ?? ()
#35 0x0000000000000000 in ?? ()
```
Also, I was never able to reproduce the segfaults by running `gdb` on any of the failing commands. The segfaults only seemed to happen when running from the makefiles.
I was able to get more debugging info by running `rr`. Since it isn't predictable which commands actually segfault, I did the following (in my `icu/build` directory):
```
$ cd data
$ make -n
```
This prints all the `genrb` (etc.) commands that would be run, without executing them. Next I pasted the first 20 lines or so into a shell script and replaced every occurrence of `../bin/genrb` with `rr record ../bin/genrb`. The script is [here](https://gist.github.com/catamorphism/a48ee66a12614686846e8d75424c0c1b). I also had to set `ASAN_OPTIONS` to `detect_leaks=0`, because LeakSanitizer (LSan) doesn't work when the process is being traced, as it is under `rr` or a debugger.
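The manual copy-and-replace step could also be scripted. The following is a minimal sketch, not the script linked above: `wrap_genrb_with_rr` is a hypothetical helper name, and unlike the manual approach it keeps only `genrb` lines rather than the first 20 lines of any kind.

```shell
#!/usr/bin/env bash
# Sketch: turn `make -n` output (read from stdin) into lines that record
# each genrb invocation with rr.
# wrap_genrb_with_rr is a hypothetical helper name, not part of ICU or rr.
wrap_genrb_with_rr() {
    local limit="${1:-20}"   # how many genrb commands to keep (default 20)
    # Keep only lines that invoke ../bin/genrb, take the first $limit of them,
    # and prefix the binary with `rr record`.
    grep -F '../bin/genrb ' \
        | head -n "$limit" \
        | sed 's|\.\./bin/genrb |rr record ../bin/genrb |'
}
```

From the `data` directory, something like `make -n | wrap_genrb_with_rr 20 > ~/genrb.sh` followed by `ASAN_OPTIONS=detect_leaks=0 bash ~/genrb.sh` should approximate the manual steps.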
When running the script, some of the `rr` runs crash and can be replayed, for example:
```
tjc@tjc-ThinkPad:~/icu/build_clang_asan_no_enable_static/data$ bash ~/genrb.sh
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1002'.
/home/tjc/genrb.sh: line 2: 662978 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k af.txt
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1003'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1004'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1005'.
/home/tjc/genrb.sh: line 5: 663026 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k agq.txt
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1006'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1007'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1008'.
/home/tjc/genrb.sh: line 8: 663074 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k ak_GH.txt
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1009'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1010'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1011'.
rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1012'.
/home/tjc/genrb.sh: line 12: 663136 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k ar_001.txt
```
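Since only some runs crash, a small retry helper can make it less manual to collect a crashing trace for one specific command. This is a sketch under the assumption, consistent with the output above, that a segfaulting `rr record` run is reported to bash as a SIGSEGV death (exit status 139, i.e. 128 + 11). `run_until_segv` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: repeat a command until it dies with SIGSEGV.
# In bash, a child killed by signal 11 has exit status 139 (128 + 11).
# run_until_segv is a hypothetical helper name.
run_until_segv() {
    local max_attempts="$1"; shift
    local attempt status
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        "$@"
        status=$?
        if [ "$status" -eq 139 ]; then
            echo "segfault on attempt $attempt"
            return 0
        fi
    done
    echo "no segfault in $max_attempts attempts"
    return 1
}
```

For example, `run_until_segv 20 rr record ../bin/genrb ...` (with the same arguments and `LD_LIBRARY_PATH` as in the script) would stop at the first crashing recording, which is then the most recent trace in rr's trace directory.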
If I pick one of the runs that crashed, run `rr replay genrb-1002`, and ask for a backtrace, I get:
```
(rr) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00006387310c0b65 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
(rr) bt
#0 0x00006387310c0b65 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
#1 0x00006387310c241d in __sanitizer::MmapNamed(void*, unsigned long, int, int, char const*) ()
#2 0x00006387310cc45c in __sanitizer::ReservedAddressRange::Init(unsigned long, char const*, unsigned long) ()
#3 0x0000638731010ea1 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::Init(int, unsigned long) ()
#4 0x000063873100e1d1 in __asan::Allocator::InitLinkerInitialized(__asan::AllocatorOptions const&) ()
#5 0x00006387310b17e9 in __asan::AsanInitInternal() ()
#6 0x00007f76cfca64ba in _dl_init (main_map=0x7f76cfcd92d0, argc=11, argv=0x7ffd1f3fbc78, env=0x7ffd1f3fbcd8) at ./elf/dl-init.c:122
#7 0x00007f76cfcbfb70 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#8 0x000000000000000b in ?? ()
#9 0x00007ffd1f3fcf05 in ?? ()
#10 0x00007ffd1f3fcf12 in ?? ()
#11 0x00007ffd1f3fcf15 in ?? ()
#12 0x00007ffd1f3fcf35 in ?? ()
#13 0x00007ffd1f3fcf38 in ?? ()
#14 0x00007ffd1f3fcf4e in ?? ()
#15 0x00007ffd1f3fcf51 in ?? ()
#16 0x00007ffd1f3fcf66 in ?? ()
#17 0x00007ffd1f3fcf76 in ?? ()
#18 0x00007ffd1f3fcf8c in ?? ()
#19 0x00007ffd1f3fcf8f in ?? ()
#20 0x0000000000000000 in ?? ()
```
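The crash occurs inside an `mmap` call made while ASan sets up its allocator, and it varies from run to run, so one thing that may be worth recording alongside the backtrace is the kernel's address-space randomization configuration. This was not part of the original investigation; it is only a sketch of a possible next diagnostic, and whether these settings are related to the crash is an assumption. `show_aslr_settings` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the kernel's address-space randomization settings (Linux only).
# Whether these settings relate to the crash is an untested assumption.
# show_aslr_settings is a hypothetical helper name.
show_aslr_settings() {
    local f
    for f in /proc/sys/kernel/randomize_va_space /proc/sys/vm/mmap_rnd_bits; do
        if [ -r "$f" ]; then
            echo "$f = $(cat "$f")"
        else
            echo "$f = unavailable"
        fi
    done
}

show_aslr_settings
```

A related experiment would be to run a failing command with randomization disabled for that process, for example via `setarch "$(uname -m)" -R <command>`, and compare how often it crashes.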
# Is this a bug in asan, or a bug in ICU?
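I don't know. The backtrace shows the crash happening inside ASan's own initialization (`__asan::AsanInitInternal`, called from `_dl_init`), before `main` is reached, but I can't tell from that alone where the fault lies. I initially thought this was because I was also using the `--enable-static` configure flag, but the same behavior happens if that flag is omitted.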