[BOLT] Assertion '__n < this->size()' failed during addDebugFilenameToUnit in CPython #121554

Open
zanieb opened this issue Jan 3, 2025 · 10 comments
Labels
BOLT crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

zanieb commented Jan 3, 2025

Similar to #121213, but with a different trace.

This assertion crashes BOLT during a CPython build. I'm using the 3.13.1 tag for reproducibility.

My usage looks like this:

git clone https://github.com/python/cpython
cd cpython
git checkout v3.13.1

export CC=clang
export CXX=clang++

./configure py_cv_module__openssl=n/a py_cv_module__hashlib=n/a py_cv_module__gdbm=n/a py_cv_module__tkinter=n/a \
    --without-ensurepip \
    --enable-optimizations --enable-bolt

make -j8

I'll try to create a reproduction in a Dockerfile, if that'd be helpful.

I'm using LLVM 19.1.6 on an x86_64 Arch Linux host:

❯ llvm-bolt --version
LLVM (http://llvm.org/):
  LLVM version 19.1.6
  Optimized build.
BOLT revision 6a0964d75628b15bafd078342120888c0e6d126f

Here's the stack trace:

/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_vector.h:1149: const_reference std::vector<llvm::DWARFDebugLine::FileNameEntry>::operator[](size_type) const [_Tp = llvm::DWARFDebugLine::FileNameEntry, _Alloc = std::allocator<llvm::DWARFDebugLine::FileNameEntry>]: Assertion '__n < this->size()' failed.
 #0 0x00005a51dd3fd860 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00005a51dd3fb49b llvm::sys::RunSignalHandlers() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00005a51dd3fe165 SignalHandler(int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x000074697ac4c1d0 (/usr/bin/../lib/libc.so.6+0x3d1d0)
 #4 0x000074697aca53f4 __pthread_kill_implementation /usr/src/debug/glibc/glibc/nptl/pthread_kill.c:44:76
 #5 0x000074697ac4c120 raise /usr/src/debug/glibc/glibc/signal/../sysdeps/posix/raise.c:27:6
 #6 0x000074697ac334c3 abort /usr/src/debug/glibc/glibc/stdlib/abort.c:81:7
 #7 0x000074697aed3af0 std::chrono::_V2::system_clock::now() /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/chrono.cc:52:5
 #8 0x00005a51dda10cf9 std::vector<llvm::DWARFFormValue, std::allocator<llvm::DWARFFormValue>>::operator[](unsigned long) const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_vector.h:1149:2
 #9 0x00005a51dda10cf9 llvm::bolt::BinaryContext::addDebugFilenameToUnit(unsigned int, unsigned int, unsigned int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryContext.cpp:1587:13
#10 0x00005a51dda220bd (anonymous namespace)::BinaryEmitter::emitLineInfo(llvm::bolt::BinaryFunction const&, llvm::SMLoc, llvm::SMLoc, bool, llvm::MCSymbol*&) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:678:25
#11 0x00005a51dda220bd (anonymous namespace)::BinaryEmitter::emitFunctionBody(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&, bool) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:461:25
#12 0x00005a51dda22e49 (anonymous namespace)::BinaryEmitter::emitFunction(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:376:24
#13 0x00005a51dda226e2 (anonymous namespace)::BinaryEmitter::emitFunctions()::$_0::operator()(std::vector<llvm::bolt::BinaryFunction*, std::allocator<llvm::bolt::BinaryFunction*>> const&) const /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:240:18
#14 0x00005a51dda20fe7 (anonymous namespace)::BinaryEmitter::emitFunctions() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:275:8
#15 0x00005a51dda20fe7 (anonymous namespace)::BinaryEmitter::emitAll(llvm::StringRef) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:205:3
#16 0x00005a51dda20fe7 llvm::bolt::emitBinaryContext(llvm::MCStreamer&, llvm::bolt::BinaryContext&, llvm::StringRef) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:1174:31
#17 0x00005a51dd47278a std::__uniq_ptr_impl<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::_M_ptr() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:193:51
#18 0x00005a51dd47278a std::unique_ptr<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::get() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:464:21
#19 0x00005a51dd47278a std::unique_ptr<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::operator->() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:457:9
#20 0x00005a51dd47278a llvm::bolt::RewriteInstance::emitAndLink() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Rewrite/RewriteInstance.cpp:3461:3
#21 0x00005a51dd46a0ee llvm::bolt::RewriteInstance::run() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Rewrite/RewriteInstance.cpp:710:3
#22 0x00005a51dcc56a37 llvm::Error::getPtr() const /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/include/llvm/Support/Error.h:282:12
#23 0x00005a51dcc56a37 llvm::Error::operator bool() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/include/llvm/Support/Error.h:242:16
#24 0x00005a51dcc56a37 main /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/tools/driver/llvm-bolt.cpp:267:17
#25 0x000074697ac34e08 __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#26 0x000074697ac34ecc call_init /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:128:20
#27 0x000074697ac34ecc __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:347:5
#28 0x00005a51dcc547f5 (/usr/bin/llvm-bolt+0x497f5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
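
When comparing this trace with traces from other builds, it can help to reduce it to the frames that involve BOLT itself. The following is a minimal Python sketch, not part of the original report: `bolt_frames`, `FRAME_RE`, and `SAMPLE_TRACE` are hypothetical names, and the parser assumes every symbolized frame ends with a single space-separated location token, which holds for the trace above.

```python
import re

# A few frames copied from the stack trace above.
SAMPLE_TRACE = """\
 #3 0x000074697ac4c1d0 (/usr/bin/../lib/libc.so.6+0x3d1d0)
 #7 0x000074697aed3af0 std::chrono::_V2::system_clock::now() /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/chrono.cc:52:5
 #8 0x00005a51dda10cf9 std::vector<llvm::DWARFFormValue, std::allocator<llvm::DWARFFormValue>>::operator[](unsigned long) const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_vector.h:1149:2
 #9 0x00005a51dda10cf9 llvm::bolt::BinaryContext::addDebugFilenameToUnit(unsigned int, unsigned int, unsigned int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryContext.cpp:1587:13
#10 0x00005a51dda220bd (anonymous namespace)::BinaryEmitter::emitLineInfo(llvm::bolt::BinaryFunction const&, llvm::SMLoc, llvm::SMLoc, bool, llvm::MCSymbol*&) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:678:25
"""

# "#<index> 0x<address> <symbol and location>"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+\s+(.*)$")


def bolt_frames(trace):
    """Return (frame index, symbol) for frames whose symbol mentions llvm::bolt."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        # Assumption: the location is the last space-separated token,
        # so everything before it is the demangled symbol.
        symbol, sep, _location = match.group(2).rpartition(" ")
        if sep and "llvm::bolt::" in symbol:
            frames.append((int(match.group(1)), symbol))
    return frames


if __name__ == "__main__":
    for index, symbol in bolt_frames(SAMPLE_TRACE):
        print(f"#{index} {symbol}")
```

Applied to the full trace, the innermost frame this reports is `addDebugFilenameToUnit` (frame #9), directly below the failing `std::vector::operator[]`.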
Additional logs
make profile-bolt-stamp
make[1]: Entering directory '/big/workspace/cpython'
# Ensure a pristine, pre-BOLT copy of the binary and no profile data from last run.
for bin in python; do \
  prebolt="${bin}.prebolt"; \
  if [ -e "${prebolt}" ]; then \
    echo "Restoring pre-BOLT binary ${prebolt}"; \
    mv "${bin}.prebolt" "${bin}"; \
  fi; \
  cp "${bin}" "${prebolt}"; \
  rm -f ${bin}.bolt.*.fdata ${bin}.fdata; \
done
# Instrument each binary.
for bin in python; do \
  /usr/bin/llvm-bolt "${bin}" -instrument -instrumentation-file-append-pid -instrumentation-file=/big/workspace/cpython/${bin}.bolt -o ${bin}.bolt_inst -update-debug-sections; \
  mv "${bin}.bolt_inst" "${bin}"; \
done
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 6a0964d75628b15bafd078342120888c0e6d126f
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0xa00000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: enabling lite mode
BOLT-INFO: 0 out of 7746 functions in the binary (0.0%) have non-empty execution profile
BOLT-INFO: validate-mem-refs updated 2 object references
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 1995
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 7717
BOLT-INSTRUMENTER: Number of function descriptors: 7717
BOLT-INSTRUMENTER: Number of branch counters: 150914
BOLT-INSTRUMENTER: Number of ST leaf node counters: 47237
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 198151
BOLT-INSTRUMENTER: Total size of counters: 1585208 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 166866 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 10930728 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /big/workspace/cpython/python.bolt
BOLT-INFO: 66826 instructions were shortened
BOLT-INFO: removed 84 empty blocks
BOLT-INFO: UCE removed 844 blocks and 51394 bytes of code
BOLT-INFO: padding code to 0x1600000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x18e1950
BOLT-INFO: clear procedure is 0x18dd390
BOLT-INFO: patched build-id (flipped last bit)
BOLT-INFO: setting _end to 0x190345c
BOLT-INFO: setting _end to 0x190345c
BOLT-INFO: setting __bolt_runtime_start to 0x18e1900
BOLT-INFO: setting __bolt_runtime_fini to 0x18e1950
BOLT-INFO: setting __hot_start to 0xc00000
BOLT-INFO: setting __hot_end to 0x14b46c6
# Run instrumented binaries to collect data.
./python -m test --pgo --timeout=
Using random seed: 717582732
0:00:00 load avg: 3.11 Run 44 tests sequentially in a single process
0:00:00 load avg: 3.11 [ 1/44] test_array
0:00:01 load avg: 2.94 [ 2/44] test_base64
0:00:07 load avg: 2.95 [ 3/44] test_binascii
0:00:07 load avg: 2.95 [ 4/44] test_binop
0:00:07 load avg: 2.95 [ 5/44] test_bisect
0:00:08 load avg: 2.95 [ 6/44] test_bytes
0:00:14 load avg: 2.87 [ 7/44] test_bz2
0:00:15 load avg: 2.87 [ 8/44] test_cmath
0:00:15 load avg: 2.87 [ 9/44] test_codecs
0:00:17 load avg: 2.88 [10/44] test_collections
0:00:18 load avg: 2.88 [11/44] test_complex
0:00:19 load avg: 2.88 [12/44] test_dataclasses
0:00:19 load avg: 2.88 [13/44] test_datetime
0:00:26 load avg: 2.59 [14/44] test_decimal
0:00:30 load avg: 2.46 [15/44] test_difflib
0:00:32 load avg: 2.46 [16/44] test_embed
0:00:35 load avg: 2.43 [17/44] test_float
0:00:36 load avg: 2.43 [18/44] test_fstring
0:00:39 load avg: 2.43 [19/44] test_functools
0:00:40 load avg: 2.43 [20/44] test_generators
0:00:40 load avg: 2.43 [21/44] test_hashlib
0:00:41 load avg: 2.31 [22/44] test_heapq
0:00:42 load avg: 2.31 [23/44] test_int
0:00:43 load avg: 2.31 [24/44] test_itertools
0:00:51 load avg: 2.11 [25/44] test_json
0:01:02 load avg: 2.09 [26/44] test_long
0:01:05 load avg: 2.09 [27/44] test_lzma
0:01:05 load avg: 2.09 [28/44] test_math
0:01:08 load avg: 2.00 [29/44] test_memoryview
0:01:09 load avg: 2.00 [30/44] test_operator
0:01:09 load avg: 2.00 [31/44] test_ordered_dict
0:01:11 load avg: 1.92 [32/44] test_patma
0:01:12 load avg: 1.92 [33/44] test_pickle
0:01:18 load avg: 1.85 [34/44] test_pprint
0:01:18 load avg: 1.85 [35/44] test_re
0:01:20 load avg: 1.85 [36/44] test_set
0:01:26 load avg: 1.80 [37/44] test_sqlite3
0:01:33 load avg: 1.73 [38/44] test_statistics
0:01:41 load avg: 1.62 [39/44] test_str
0:01:45 load avg: 1.62 [40/44] test_struct
0:01:47 load avg: 1.57 [41/44] test_tabnanny
0:01:50 load avg: 1.57 [42/44] test_time
0:01:52 load avg: 1.45 [43/44] test_xml_etree
0:01:53 load avg: 1.45 [44/44] test_xml_etree_c
Total duration: 1 min 55 sec
Total tests: run=9,394 skipped=214
Total test files: run=44/44
Result: SUCCESS
# Merge all the data files together.
for bin in python; do \
  /usr/bin/merge-fdata ${bin}.*.fdata > "${bin}.fdata"; \
  rm -f ${bin}.*.fdata; \
done
Using legacy profile format.
Profile from 52 files merged.
# Run bolt against the merged data to produce an optimized binary.
for bin in python; do \
  /usr/bin/llvm-bolt "${bin}.prebolt" -o "${bin}.bolt" -data="${bin}.fdata" -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot ; \
  mv "${bin}.bolt" "${bin}"; \
done
BOLT-WARNING: '-reorder-functions=hfsort+' is deprecated, please use '-reorder-functions=cdsort' instead
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 6a0964d75628b15bafd078342120888c0e6d126f
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: 4651 out of 7746 functions in the binary (60.0%) have non-empty execution profile
BOLT-INFO: 14 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: 62009 instructions were shortened
BOLT-INFO: removed 933 empty blocks
BOLT-INFO: ICF folded 228 out of 8119 functions in 4 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 21.04 KB of code space. Folded functions were called 96915620 times based on profile.
BOLT-INFO: ICP Total indirect calls = 2651973684, 357 callsites cover 99% of all indirect calls
BOLT-INFO: ICP total indirect callsites with profile = 408
BOLT-INFO: ICP total jump table callsites = 327
BOLT-INFO: ICP total number of calls = 1801366538
BOLT-INFO: ICP percentage of calls that are indirect = 57.6%
BOLT-INFO: ICP percentage of indirect calls that can be optimized = 66.0%
BOLT-INFO: ICP percentage of indirect callsites that are optimized = 38.2%
BOLT-INFO: ICP number of method load elimination candidates = 0
BOLT-INFO: ICP percentage of method calls candidates that have loads eliminated = 0.0%
BOLT-INFO: ICP percentage of indirect branches that are optimized = 51.3%
BOLT-INFO: ICP percentage of jump table callsites that are optimized = 38.8%
BOLT-INFO: ICP number of jump table callsites that can use hot indices = 0
BOLT-INFO: ICP percentage of jump table callsites that use hot indices = 0.0%
BOLT-INFO: inlined 441716628 calls at 5714 call sites in 3 iteration(s). Change in binary size: 258240 bytes.
BOLT-INFO: ICF folded 4 out of 7891 functions in 3 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 0.59 KB of code space. Folded functions were called 55974 times based on profile.
BOLT-INFO: basic block reordering modified layout of 2702 functions (58.10% of profiled, 34.26% of total)
BOLT-INFO: UCE removed 55 blocks and 0 bytes of code
BOLT-INFO: splitting separates 1563500 hot bytes from 796858 cold bytes (66.24% of split functions is hot).
BOLT-INFO: 44 Functions were reordered by LoopInversionPass
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
         19344703631 : executed forward branches
          2659191733 : taken forward branches
          3760694037 : executed backward branches
          2291469612 : taken backward branches
           850469448 : executed unconditional branches
          3272559894 : all function calls
          1322937184 : indirect calls
           167879416 : PLT calls
        162099894623 : executed instructions
         42144050223 : executed load instructions
         21878573462 : executed store instructions
          1603155126 : taken jump table branches
                   0 : taken unknown indirect branches
         23955867116 : total branches
          5801130793 : taken branches
         18154736323 : non-taken conditional branches
          4950661345 : taken conditional branches
         23105397668 : all conditional branches
         20489079093 : executed forward branches (+5.9%)
          2115361961 : taken forward branches (-20.5%)
          4910822367 : executed backward branches (+30.6%)
          2500360925 : taken backward branches (+9.1%)
           796202750 : executed unconditional branches (-6.4%)
          2832643280 : all function calls (-13.4%)
           609200227 : indirect calls (-54.0%)
           167879160 : PLT calls (-0.0%)
        164775133622 : executed instructions (+1.7%)
         41318829231 : executed load instructions (-2.0%)
         21878527330 : executed store instructions (-0.0%)
           781427138 : taken jump table branches (-51.3%)
                   0 : taken unknown indirect branches (=)
         26196104210 : total branches (+9.4%)
          5411925636 : taken branches (-6.7%)
         20784178574 : non-taken conditional branches (+14.5%)
          4615722886 : taken conditional branches (-6.8%)
         25399901460 : all conditional branches (+9.9%)
BOLT-INFO: SCTC: patched 171 tail calls (170 forward) tail calls (1 backward) from a total of 171 while removing 81 double jumps and removing 204 basic blocks totalling 924 bytes of code. CTCs total execution count is 54997775 and the number of times CTCs are taken is 53824796
BOLT-INFO: FOP optimized 16 redundant load(s) and 0 unused store(s)
BOLT-INFO: Frequency of redundant loads is 17334971 and frequency of unused stores is 0
BOLT-INFO: Frequency of loads changed to use a register is 17334971 and frequency of loads changed to use an immediate is 0
BOLT-INFO: FOP deleted 12 load(s) (dyn count: 17334372) and 0 store(s)
BOLT-INFO: FRAME ANALYSIS: 3468 function(s) were not optimized.
BOLT-INFO: FRAME ANALYSIS: 1229 function(s) (59.0% dyn cov) could not have its frame indices restored.
BOLT-INFO: Shrink wrapping moved 127 spills inserting load/stores and 0 spills inserting push/pops
BOLT-INFO: Shrink wrapping reduced 1314488115 store executions (0.8% total instructions executed, 6.0% store instructions)
BOLT-INFO: Shrink wrapping failed at reducing 0 store executions (0.0% total instructions executed, 0.0% store instructions)
BOLT-INFO: Allocation combiner: 137 empty spaces coalesced (dyn count: 1534146234).
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_vector.h:1149: const_reference std::vector<llvm::DWARFDebugLine::FileNameEntry>::operator[](size_type) const [_Tp = llvm::DWARFDebugLine::FileNameEntry, _Alloc = std::allocator<llvm::DWARFDebugLine::FileNameEntry>]: Assertion '__n < this->size()' failed.
 #0 0x00005a51dd3fd860 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00005a51dd3fb49b llvm::sys::RunSignalHandlers() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00005a51dd3fe165 SignalHandler(int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x000074697ac4c1d0 (/usr/bin/../lib/libc.so.6+0x3d1d0)
 #4 0x000074697aca53f4 __pthread_kill_implementation /usr/src/debug/glibc/glibc/nptl/pthread_kill.c:44:76
 #5 0x000074697ac4c120 raise /usr/src/debug/glibc/glibc/signal/../sysdeps/posix/raise.c:27:6
 #6 0x000074697ac334c3 abort /usr/src/debug/glibc/glibc/stdlib/abort.c:81:7
 #7 0x000074697aed3af0 std::chrono::_V2::system_clock::now() /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/chrono.cc:52:5
 #8 0x00005a51dda10cf9 std::vector<llvm::DWARFFormValue, std::allocator<llvm::DWARFFormValue>>::operator[](unsigned long) const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/stl_vector.h:1149:2
 #9 0x00005a51dda10cf9 llvm::bolt::BinaryContext::addDebugFilenameToUnit(unsigned int, unsigned int, unsigned int) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryContext.cpp:1587:13
#10 0x00005a51dda220bd (anonymous namespace)::BinaryEmitter::emitLineInfo(llvm::bolt::BinaryFunction const&, llvm::SMLoc, llvm::SMLoc, bool, llvm::MCSymbol*&) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:678:25
#11 0x00005a51dda220bd (anonymous namespace)::BinaryEmitter::emitFunctionBody(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&, bool) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:461:25
#12 0x00005a51dda22e49 (anonymous namespace)::BinaryEmitter::emitFunction(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:376:24
#13 0x00005a51dda226e2 (anonymous namespace)::BinaryEmitter::emitFunctions()::$_0::operator()(std::vector<llvm::bolt::BinaryFunction*, std::allocator<llvm::bolt::BinaryFunction*>> const&) const /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:240:18
#14 0x00005a51dda20fe7 (anonymous namespace)::BinaryEmitter::emitFunctions() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:275:8
#15 0x00005a51dda20fe7 (anonymous namespace)::BinaryEmitter::emitAll(llvm::StringRef) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:205:3
#16 0x00005a51dda20fe7 llvm::bolt::emitBinaryContext(llvm::MCStreamer&, llvm::bolt::BinaryContext&, llvm::StringRef) /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Core/BinaryEmitter.cpp:1174:31
#17 0x00005a51dd47278a std::__uniq_ptr_impl<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::_M_ptr() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:193:51
#18 0x00005a51dd47278a std::unique_ptr<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::get() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:464:21
#19 0x00005a51dd47278a std::unique_ptr<llvm::MCStreamer, std::default_delete<llvm::MCStreamer>>::operator->() const /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:457:9
#20 0x00005a51dd47278a llvm::bolt::RewriteInstance::emitAndLink() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Rewrite/RewriteInstance.cpp:3461:3
#21 0x00005a51dd46a0ee llvm::bolt::RewriteInstance::run() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/lib/Rewrite/RewriteInstance.cpp:710:3
#22 0x00005a51dcc56a37 llvm::Error::getPtr() const /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/include/llvm/Support/Error.h:282:12
#23 0x00005a51dcc56a37 llvm::Error::operator bool() /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/llvm/include/llvm/Support/Error.h:242:16
#24 0x00005a51dcc56a37 main /usr/src/debug/llvm-bolt/llvm-project-19.1.6.src/bolt/tools/driver/llvm-bolt.cpp:267:17
#25 0x000074697ac34e08 __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#26 0x000074697ac34ecc call_init /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:128:20
#27 0x000074697ac34ecc __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:347:5
#28 0x00005a51dcc547f5 (/usr/bin/llvm-bolt+0x497f5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
make[1]: *** [Makefile:942: profile-bolt-stamp] Error 134
make[1]: Leaving directory '/big/workspace/cpython'
make: *** [Makefile:965: bolt-opt] Error 2
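
Since the crash is reached through `emitLineInfo`, one way to narrow it down might be to re-run the failing `llvm-bolt` command without `-update-debug-sections`. That this avoids the crash is an assumption and has not been verified in this report. The Python sketch below only rebuilds the argument list from the `Program arguments:` line of the crash dump; `without_flags` and `PROGRAM_ARGUMENTS` are hypothetical names.

```python
import shlex

# Copied from the "Program arguments:" line of the crash dump above.
PROGRAM_ARGUMENTS = (
    "/usr/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata "
    "-update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ "
    "-split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size "
    "-peepholes=none -jump-tables=aggressive -inline-ap "
    "-indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot"
)


def without_flags(command, drop):
    """Split a command line and drop the named flags (matched before any '=')."""
    kept = []
    for arg in shlex.split(command):
        name = arg.split("=", 1)[0]
        if name not in drop:
            kept.append(arg)
    return kept


if __name__ == "__main__":
    # Print a command line that can be pasted into a shell in the build directory.
    print(shlex.join(without_flags(PROGRAM_ARGUMENTS, {"-update-debug-sections"})))
```

The same helper can drop other flags one at a time (for example `-inline-all`) to check whether a particular optimization pass is needed to trigger the failure.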
@github-actions github-actions bot added the BOLT label Jan 3, 2025

zanieb commented Jan 3, 2025

Note that there's a warning in CPython, as reported in python/cpython#128437. I resolved it with the following patch, but the failure was not affected.

diff --git a/configure b/configure
index ae70f02f70e..d3ef63d9575 100755
--- a/configure
+++ b/configure
@@ -9330,7 +9330,7 @@ fi
 printf %s "checking BOLT_INSTRUMENT_FLAGS... " >&6; }
 if test -z "${BOLT_INSTRUMENT_FLAGS}"
 then
-  BOLT_INSTRUMENT_FLAGS=
+  BOLT_INSTRUMENT_FLAGS="-update-debug-sections"
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $BOLT_INSTRUMENT_FLAGS" >&5
 printf "%s\n" "$BOLT_INSTRUMENT_FLAGS" >&6; }
diff --git a/configure.ac b/configure.ac
index a764028e49f..9f4ca14b1dc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2221,7 +2221,7 @@ AC_ARG_VAR(
 AC_MSG_CHECKING([BOLT_INSTRUMENT_FLAGS])
 if test -z "${BOLT_INSTRUMENT_FLAGS}"
 then
-  BOLT_INSTRUMENT_FLAGS=
+  BOLT_INSTRUMENT_FLAGS="-update-debug-sections"
 fi
 AC_MSG_RESULT([$BOLT_INSTRUMENT_FLAGS])

liusy58 commented Jan 3, 2025

It would help if you could provide a Docker image.

zanieb commented Jan 3, 2025

Yep! Here's the Docker reproduction (forgive that it's a little messy; it's slow to iterate on):

FROM archlinux:base-devel-20241229.0.293060

RUN rm -fr /etc/pacman.d/gnupg \
    && pacman-key --init \
    && pacman-key --populate archlinux \
    && pacman -Syyu --noconfirm archlinux-keyring

# Install build dependencies
RUN pacman -Sy --needed --noconfirm \
    git \
    gcc-libs \
    ncurses \
    zlib \
    zstd \
    clang \
    cmake \
    llvm \
    llvm-libs \
    ninja \
    python

# Build llvm-bolt
ADD https://github.com/llvm/llvm-project/releases/download/llvmorg-19.1.6/llvm-project-19.1.6.src.tar.xz .

RUN tar -xf llvm-project-19.1.6.src.tar.xz \
    && cd llvm-project-19.1.6.src \
    && cmake \
        -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=/usr \
        -DLLVM_INSTALL_UTILS=ON \
        -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DBUILD_SHARED_LIBS=OFF \
        -DLLVM_LINK_LLVM_DYLIB=OFF \
        -DLLVM_EXTERNAL_LIT=/usr/bin/lit \
        -DLLVM_ENABLE_PROJECTS="bolt" \
        -DLLVM_TARGETS_TO_BUILD="X86;AArch64" \
        llvm \
    && ninja bolt

RUN cd llvm-project-19.1.6.src && ninja install-bolt

# Download CPython
RUN git clone https://github.com/python/cpython
WORKDIR /cpython
RUN git checkout v3.13.1

ENV CC=clang
ENV CXX=clang++

# Build CPython
RUN ./configure py_cv_module__openssl=n/a py_cv_module__hashlib=n/a py_cv_module__gdbm=n/a py_cv_module__tkinter=n/a \
    --without-ensurepip \
    --enable-optimizations --enable-bolt

RUN make -j8

Note that the stack trace is a little different:

BOLT-INFO: SCTC: patched 171 tail calls (170 forward) tail calls (1 backward) from a total of 171 while removing 81 double jumps and removing 204 basic blocks totalling 924 bytes of code. CTCs total execution count is 55080780 and the number of times CTCs are taken is 53908487
BOLT-INFO: FOP optimized 16 redundant load(s) and 0 unused store(s)
BOLT-INFO: Frequency of redundant loads is 17298146 and frequency of unused stores is 0
BOLT-INFO: Frequency of loads changed to use a register is 17298146 and frequency of loads changed to use an immediate is 0
BOLT-INFO: FOP deleted 12 load(s) (dyn count: 17297556) and 0 store(s)
BOLT-INFO: FRAME ANALYSIS: 3467 function(s) were not optimized.
BOLT-INFO: FRAME ANALYSIS: 1231 function(s) (59.0% dyn cov) could not have its frame indices restored.
BOLT-INFO: Shrink wrapping moved 128 spills inserting load/stores and 0 spills inserting push/pops
BOLT-INFO: Shrink wrapping reduced 1318084588 store executions (0.8% total instructions executed, 6.0% store instructions)
BOLT-INFO: Shrink wrapping failed at reducing 0 store executions (0.0% total instructions executed, 0.0% store instructions)
BOLT-INFO: Allocation combiner: 139 empty spaces coalesced (dyn count: 1541301822).
 #0 0x000055c7d79247a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/sbin/llvm-bolt+0x9177a6)
 #1 0x000055c7d792251c llvm::sys::RunSignalHandlers() (/usr/sbin/llvm-bolt+0x91551c)
 #2 0x000055c7d7924fe4 SignalHandler(int) Signals.cpp:0:0
 #3 0x000070b98f9f41d0 (/usr/bin/../lib/libc.so.6+0x3d1d0)
 #4 0x000055c7d7f2a4bc llvm::bolt::BinaryContext::addDebugFilenameToUnit(unsigned int, unsigned int, unsigned int) (/usr/sbin/llvm-bolt+0xf1d4bc)
 #5 0x000055c7d7f3ac73 (anonymous namespace)::BinaryEmitter::emitFunctionBody(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&, bool) BinaryEmitter.cpp:0:0
 #6 0x000055c7d7f3b9f8 (anonymous namespace)::BinaryEmitter::emitFunction(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&) BinaryEmitter.cpp:0:0
 #7 0x000055c7d7f3b317 (anonymous namespace)::BinaryEmitter::emitFunctions()::$_0::operator()(std::vector<llvm::bolt::BinaryFunction*, std::allocator<llvm::bolt::BinaryFunction*>> const&) const BinaryEmitter.cpp:0:0
 #8 0x000055c7d7f39b5a llvm::bolt::emitBinaryContext(llvm::MCStreamer&, llvm::bolt::BinaryContext&, llvm::StringRef) (/usr/sbin/llvm-bolt+0xf2cb5a)
 #9 0x000055c7d7995b76 llvm::bolt::RewriteInstance::emitAndLink() (/usr/sbin/llvm-bolt+0x988b76)
#10 0x000055c7d798d063 llvm::bolt::RewriteInstance::run() (/usr/sbin/llvm-bolt+0x980063)
#11 0x000055c7d71a4bc9 main (/usr/sbin/llvm-bolt+0x197bc9)
#12 0x000070b98f9dce08 (/usr/bin/../lib/libc.so.6+0x25e08)
#13 0x000070b98f9dcecc __libc_start_main (/usr/bin/../lib/libc.so.6+0x25ecc)
#14 0x000055c7d71a2ad5 _start (/usr/sbin/llvm-bolt+0x195ad5)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/sbin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
make[1]: *** [Makefile:942: profile-bolt-stamp] Error 139
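
The two `make` errors differ: 134 in the original run and 139 here. By shell convention, an exit status above 128 means the process was terminated by signal `status - 128`. The minimal Python sketch below decodes both values; `describe_exit_status` is a hypothetical helper. One possible reading, which is an assumption and not confirmed in this thread, is that the Arch package is built with libstdc++ assertions enabled, so the out-of-range `operator[]` aborts, while this from-source Release build performs the same access unchecked and segfaults.

```python
import signal


def describe_exit_status(status):
    """Name the signal behind a shell-style exit status, if there is one."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


if __name__ == "__main__":
    for status in (134, 139):
        print(status, describe_exit_status(status))
```

On Linux this maps 134 to `SIGABRT` (the failed assertion calling `abort`) and 139 to `SIGSEGV`.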

liusy58 commented Jan 3, 2025

By the way, are you on x86?

zanieb commented Jan 3, 2025

Here's my system information


❯ uname -a
Linux zbvlka 6.12.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 09 Dec 2024 14:31:57 +0000 x86_64 GNU/Linux

❯ cat /proc/cpuinfo | head -n 28
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 113
model name	: AMD Ryzen 7 3800X 8-Core Processor
stepping	: 0
microcode	: 0x8701034
cpu MHz		: 2200.000
cache size	: 512 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso ibpb_no_ret
bogomips	: 7788.78
TLB size	: 3072 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

liusy58 commented Jan 3, 2025

Thank you, I will work on this now.

@EugeneZelenko EugeneZelenko added the crash Prefer [crash-on-valid] or [crash-on-invalid] label Jan 3, 2025

zanieb commented Jan 3, 2025

I also reproduced this on the latest CPython 3.14 commit: python/cpython@4c14f03

liusy58 commented Jan 4, 2025

Yeah, I can reproduce it. I'm working on it now.

zanieb commented Jan 4, 2025

👍 Sweet. In case it helps narrow things down, I also reproduced the segfault on aarch64, in a Debian container on an M3 MacBook:

102.6 BOLT-INFO: Target architecture: aarch64
102.6 BOLT-INFO: BOLT version: <unknown>
102.6 BOLT-INFO: first alloc address is 0x400000
102.6 BOLT-INFO: enabling relocation mode
102.7 BOLT-INFO: pre-processing profile using branch profile reader
103.2 BOLT-INFO: number of removed linker-inserted veneers: 0
103.2 BOLT-INFO: 4610 out of 7994 functions in the binary (57.7%) have non-empty execution profile
103.2 BOLT-INFO: 86 functions with profile could not be optimized
103.2 BOLT-INFO: profile for 1 objects was ignored
103.3 BOLT-INFO: removed 1 empty block
103.3 BOLT-INFO: ICF folded 469 out of 8369 functions in 6 passes. 0 functions had jump tables.
103.3 BOLT-INFO: Removing all identical functions will save 35.09 KB of code space. Folded functions were called 113496264 times based on profile.
103.4 BOLT-INFO: ICP Total indirect calls = 829823055, 157 callsites cover 99% of all indirect calls
103.4  #0 0x0000aaaac1454dac (/usr/lib/llvm-19/bin/llvm-bolt+0x1ab4dac)
103.4  #1 0x0000aaaac1453060 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ab3060)
103.4  #2 0x0000aaaac1455634 (/usr/lib/llvm-19/bin/llvm-bolt+0x1ab5634)
103.4  #3 0x0000ffffa67297a0 (linux-vdso.so.1+0x7a0)
103.4  #4 0x0000aaaac19713b0 (/usr/lib/llvm-19/bin/llvm-bolt+0x1fd13b0)
103.4  #5 0x0000aaaac1974d44 (/usr/lib/llvm-19/bin/llvm-bolt+0x1fd4d44)
103.4  #6 0x0000aaaac14febb8 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b5ebb8)
103.4  #7 0x0000aaaac1502cec (/usr/lib/llvm-19/bin/llvm-bolt+0x1b62cec)
103.4  #8 0x0000aaaac14adf74 (/usr/lib/llvm-19/bin/llvm-bolt+0x1b0df74)
103.4  #9 0x0000aaaac00ee1e0 (/usr/lib/llvm-19/bin/llvm-bolt+0x74e1e0)
103.4 #10 0x0000ffffa472229c (/lib/aarch64-linux-gnu/libc.so.6+0x2229c)
103.4 #11 0x0000ffffa472237c __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x2237c)
103.4 #12 0x0000aaaac00ec330 (/usr/lib/llvm-19/bin/llvm-bolt+0x74c330)
103.4 PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
103.4 Stack dump:
103.4 0.	Program arguments: /usr/lib/llvm-19/bin/llvm-bolt python.prebolt -o python.bolt -data=python.fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot
103.4 Segmentation fault

Is there an obvious way to bypass or work around this in the meantime? I was doing some other work with BOLT but am stuck now :)

I can work around this with the relatively obvious patch, which drops -update-debug-sections from BOLT_APPLY_FLAGS:

diff --git a/configure b/configure
index aa88c74c611..0fcd10b61cb 100755
--- a/configure
+++ b/configure
@@ -9403,7 +9403,7 @@ printf "%s\n" "$BOLT_INSTRUMENT_FLAGS" >&6; }
 printf %s "checking BOLT_APPLY_FLAGS... " >&6; }
 if test -z "${BOLT_APPLY_FLAGS}"
 then
-  BOLT_APPLY_FLAGS=" -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot "
+  BOLT_APPLY_FLAGS=" -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot "
 
 fi
 { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $BOLT_APPLY_FLAGS" >&5
diff --git a/configure.ac b/configure.ac
index 9e131ed1a2d..72f1d5d91fb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2181,7 +2181,6 @@ then
   AS_VAR_SET(
     [BOLT_APPLY_FLAGS],
     [m4_normalize("
-     -update-debug-sections
      -reorder-blocks=ext-tsp
      -reorder-functions=cdsort
      -split-functions
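
Since the generated configure script only fills in BOLT_APPLY_FLAGS when the variable is empty (the `test -z` check in the diff above), it may also be possible to get the same effect without patching the tree by setting the variable before running configure. The following is a rough sketch under that assumption and has not been verified against a full CPython build; DEFAULT_BOLT_APPLY_FLAGS is a hypothetical helper name used only for illustration, and the flag list is copied from the diff.

```shell
# Default flags as they appear in CPython's configure (copied from the diff above).
# DEFAULT_BOLT_APPLY_FLAGS is a hypothetical name, not something configure reads.
DEFAULT_BOLT_APPLY_FLAGS="-update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=cdsort -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot"

# Drop only the flag that leads to the crash; every other flag is left untouched.
BOLT_APPLY_FLAGS=$(printf '%s\n' "$DEFAULT_BOLT_APPLY_FLAGS" | sed 's/-update-debug-sections *//')
export BOLT_APPLY_FLAGS

# Show the flags that configure would pick up from the environment.
echo "$BOLT_APPLY_FLAGS"

# Then, from the cpython checkout (assumption: configure honours the
# pre-set variable, as the `test -z` check suggests):
#   ./configure --enable-optimizations --enable-bolt
```

As with the patch, this trades away updated debug sections in the BOLT-optimized binary, so it is only a stopgap until the crash itself is fixed.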

@liusy58
Contributor

liusy58 commented Jan 4, 2025

The error is related to DWARF. I am not sure what causes it yet; my guess is a DWARF type mismatch. I will try to fix it.
