Last active
          April 4, 2024 22:53 
        
    Crashes in ICU tools when running under asan
  
        
  
    
    
  
  
    
  | # Steps to reproduce | |
| Using Ubuntu 23.10 and clang 16.0.6. | |
| ``` | |
| $ git clone https://github.com/unicode-org/icu.git | |
| $ cd icu | |
| $ mkdir build | |
| $ cd build | |
| $ CPPFLAGS=-fsanitize=address LDFLAGS=-fsanitize=address ../icu4c/source/runConfigureICU --enable-debug --disable-release Linux/clang --disable-renaming --enable-tracing | |
| $ make tests | |
| ``` | |
| (Note: the results are the same with `make -j -l4.5 tests`, as far as I can tell.) | |
| # Crashes | |
| When building the ICU data, the tools (usually `genrb` but sometimes `gendict` or `makeconv`) sometimes segfault. Exactly which commands segfault is non-deterministic, but I've seen at least one command segfault every time I've tried this. | |
| For example, here's the tail of the output on one `make` attempt: | |
| ``` | |
| echo "$BRKITR_INDEX_TXT_CONTENT" > ./out/tmp/brkitr/res_index.txt | |
| LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg -tl ../../icu4c/source/data/in/coll/ucadata-unihan.icu ./out/build/icudt75l/coll/ucadata.icu | |
| LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/makeconv -s ../../icu4c/source/data -d ./out/build/icudt75l -c mappings/euc-tw-2014.ucm | |
| genrb number of files: 906 | |
| echo "$ICUDATA_LIST_CONTENT" > ./out/tmp/icudata.lst | |
| LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ./out/tmp/brkitr -d ./out/build/icudt75l/brkitr/ -i ./out/build/icudt75l -k res_index.txt | |
| LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ./out/tmp/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l -k res_index.txt | |
| LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb -s ../../icu4c/source/data/zone -d ./out/build/icudt75l/zone -i ./out/build/icudt75l -k tzdbNames.txt | |
| [snip] | |
| genrb number of files: 510 | |
| make[1]: *** [../data/rules.mk:554: out/build/icudt75l/euc-tw-2014.cnv] Segmentation fault (core dumped) | |
| ``` | |
| The command that crashed in this case is `LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/makeconv -s ../../icu4c/source/data -d ./out/build/icudt75l -c mappings/euc-tw-2014.ucm`. | |
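Because the crashes are non-deterministic, it can help to rerun a single suspect command several times and count how often it fails. The following is a minimal sketch; `count_failures` is a hypothetical helper name, not part of ICU or its build system, and it counts any non-zero exit, not only segfaults.

```shell
# Run a command N times and print how many of those runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (run from the build/data directory; not executed here):
#   count_failures 20 env LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH \
#       ../bin/makeconv -s ../../icu4c/source/data -d ./out/build/icudt75l -c mappings/euc-tw-2014.ucm
```

A process killed by SIGSEGV is normally reported by the shell with exit status 139 (128 + 11), so a stricter variant could count only that status.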
| # Debugging the crash | |
| I found that running `gdb` on the resulting core file didn't help; the only backtrace I could get was: | |
| ``` | |
| #0 0x0000615616cc55a0 in ?? () | |
| #1 <signal handler called> | |
| #2 0x0000615616cc55a0 in ?? () | |
| #3 <signal handler called> | |
| [snip] | |
| #32 0x0000615616cc55a0 in ?? () | |
| #33 <signal handler called> | |
| #34 0x0000615616cdab65 in ?? () | |
| #35 0x0000000000000000 in ?? () | |
| ``` | |
| Also, I was never able to reproduce the segfaults by running any of the failing commands under `gdb`. The segfaults only seemed to happen when the commands were run from the makefiles. | |
| I was able to get more debugging info by running `rr`. Since it isn't predictable which commands actually segfault, I did the following (in my `icu/build` directory): | |
| ``` | |
| $ cd data | |
| $ make -n | |
| ``` | |
| This prints all the `genrb` (etc.) commands that `make` would run, without running them. Next I pasted the first 20 lines or so into a shell script and replaced every occurrence of `../bin/genrb` with `rr record ../bin/genrb`. The script is [here](https://gist.github.com/catamorphism/a48ee66a12614686846e8d75424c0c1b). I also had to set `ASAN_OPTIONS` to `detect_leaks=0`, because LeakSanitizer does not work under `ptrace`-based tools such as debuggers and `rr`. | |
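The manual paste-and-replace step could also be scripted. This is a rough sketch, not the script linked above; `wrap_genrb_with_rr` is a hypothetical name, and the `head -n 20` cutoff simply mirrors the "first 20 lines or so" in the description.

```shell
# Read build commands on stdin and prefix every ../bin/genrb invocation with
# "rr record", leaving all other text untouched. Any environment assignment
# in front of the command (such as LD_LIBRARY_PATH=...) stays in front.
wrap_genrb_with_rr() {
    sed 's#\.\./bin/genrb#rr record ../bin/genrb#g'
}

# Example (run from the build/data directory; not executed here):
#   {
#       echo 'export ASAN_OPTIONS=detect_leaks=0'
#       make -n | head -n 20 | wrap_genrb_with_rr
#   } > ~/genrb.sh
```

The same substitution could be repeated for `gendict` or `makeconv` if those are the tools of interest.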
| When running the script, some of the `rr` runs crash and can be replayed, for example: | |
| ``` | |
| tjc@tjc-ThinkPad:~/icu/build_clang_asan_no_enable_static/data$ bash ~/genrb.sh | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1002'. | |
| /home/tjc/genrb.sh: line 2: 662978 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k af.txt | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1003'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1004'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1005'. | |
| /home/tjc/genrb.sh: line 5: 663026 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k agq.txt | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1006'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1007'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1008'. | |
| /home/tjc/genrb.sh: line 8: 663074 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k ak_GH.txt | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1009'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1010'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1011'. | |
| rr: Saving execution to trace directory `/home/tjc/.local/share/rr/genrb-1012'. | |
| /home/tjc/genrb.sh: line 12: 663136 Segmentation fault LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH rr record ../bin/genrb -s ../../icu4c/source/data/locales -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k ar_001.txt | |
| ``` | |
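In the output above, each `Segmentation fault` message follows the `rr: Saving execution to trace directory` line of the run that crashed. Assuming that ordering holds, a small filter can list only the traces worth replaying. `crashed_traces` is a hypothetical helper name, and the sketch assumes trace paths contain no spaces.

```shell
# Read the combined output of the recording script on stdin and print the
# trace name (for example genrb-1002) of each run that was followed by a
# "Segmentation fault" message.
crashed_traces() {
    awk '
        /Saving execution to trace directory/ {
            trace = $NF                 # last field holds the quoted trace path
            sub(/.*\//, "", trace)      # keep only the final path component
            sub(/[^0-9]+$/, "", trace)  # drop the closing quote and period
            next
        }
        /Segmentation fault/ && trace != "" {
            print trace
            trace = ""                  # report each trace at most once
        }
    '
}

# Example (not executed here):
#   bash ~/genrb.sh 2>&1 | crashed_traces
```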
| If I pick one of the runs that crashed, do `rr replay genrb-1002`, and continue to the crash, I get this backtrace: | |
| ``` | |
| (rr) c | |
| Continuing. | |
| Program received signal SIGSEGV, Segmentation fault. | |
| 0x00006387310c0b65 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) () | |
| (rr) bt | |
| #0 0x00006387310c0b65 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) () | |
| #1 0x00006387310c241d in __sanitizer::MmapNamed(void*, unsigned long, int, int, char const*) () | |
| #2 0x00006387310cc45c in __sanitizer::ReservedAddressRange::Init(unsigned long, char const*, unsigned long) () | |
| #3 0x0000638731010ea1 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::Init(int, unsigned long) () | |
| #4 0x000063873100e1d1 in __asan::Allocator::InitLinkerInitialized(__asan::AllocatorOptions const&) () | |
| #5 0x00006387310b17e9 in __asan::AsanInitInternal() () | |
| #6 0x00007f76cfca64ba in _dl_init (main_map=0x7f76cfcd92d0, argc=11, argv=0x7ffd1f3fbc78, env=0x7ffd1f3fbcd8) at ./elf/dl-init.c:122 | |
| #7 0x00007f76cfcbfb70 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 | |
| #8 0x000000000000000b in ?? () | |
| #9 0x00007ffd1f3fcf05 in ?? () | |
| #10 0x00007ffd1f3fcf12 in ?? () | |
| #11 0x00007ffd1f3fcf15 in ?? () | |
| #12 0x00007ffd1f3fcf35 in ?? () | |
| #13 0x00007ffd1f3fcf38 in ?? () | |
| #14 0x00007ffd1f3fcf4e in ?? () | |
| #15 0x00007ffd1f3fcf51 in ?? () | |
| #16 0x00007ffd1f3fcf66 in ?? () | |
| #17 0x00007ffd1f3fcf76 in ?? () | |
| #18 0x00007ffd1f3fcf8c in ?? () | |
| #19 0x00007ffd1f3fcf8f in ?? () | |
| #20 0x0000000000000000 in ?? () | |
| ``` | |
| # Is this a bug in asan, or a bug in ICU? | |
| I don't know. I initially thought this was because I was also using the `--enable-static` configure flag, but the same behavior happens if that flag is omitted. | 
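One experiment that might help narrow this down (a suggestion only; it was not tried above): the backtrace shows the crash inside ASan's own startup, while it reserves address space and before `main` runs. If the outcome depends on where the kernel happens to place mappings, running a failing command with address-space layout randomization disabled should change how often it crashes. The helper names below are hypothetical; `setarch -R` and `/proc/sys/vm/mmap_rnd_bits` are standard Linux facilities, but whether they are relevant to this crash is an assumption.

```shell
# Print the kernel's mmap randomization setting, or "unknown" if it cannot
# be read on this system.
show_mmap_rnd_bits() {
    cat /proc/sys/vm/mmap_rnd_bits 2>/dev/null || echo "unknown"
}

# Run a command with ASLR disabled for that process only.
run_without_aslr() {
    setarch "$(uname -m)" -R "$@"
}

# Example (run from the build/data directory; not executed here):
#   show_mmap_rnd_bits
#   LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH \
#       setarch "$(uname -m)" -R ../bin/genrb -s ../../icu4c/source/data/locales \
#       -d ./out/build/icudt75l/ -i ./out/build/icudt75l --usePoolBundle ./out/build/icudt75l/ -k af.txt
```

If the crashes stop with ASLR disabled, that would point toward the sanitizer runtime's interaction with the kernel's address-space layout rather than toward ICU; if they persist, this experiment says nothing either way.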
  