本文总结经典的区间第k小值数据结构题。
给定一个长为n的数组,元素为范围为[0,σ)的整数。有m个询问:求区间[l,r)中第k小的元素。
一些方法支持扩展问题:有m个操作,或者修改某个位置上的元素,或者询问区间[l,r)中第k小的元素。
本文总结经典的区间第k小值数据结构题。
给定一个长为n的数组,元素为范围为[0,σ)的整数。有m个询问:求区间[l,r)中第k小的元素。
一些方法支持扩展问题:有m个操作,或者修改某个位置上的元素,或者询问区间[l,r)中第k小的元素。
Updated in 2025-05.
A control-flow graph (CFG) is a graph representation of all paths that might be traversed through a program during its execution. Control-flow integrity (CFI) refers to security policy dictating that program execution must follow a control-flow graph. This article describes some features that compilers and hardware can use to enforce CFI, with a focus on llvm-project implementations.
CFI schemes are typically divided into forward-edge (e.g. indirect calls) and backward-edge (mainly function returns). It should be noted that exception handling and symbol interposition are not included in these categories, as far as my understanding goes.
Updated in 2025-02.
In GNU ld, -r produces a relocatable object file. This
is known as relocatable linking or partial linking. This mode suppresses
many passes done for an executable or shared object output (in
-no-pie/-pie/-shared modes). -r,
-no-pie, -pie, and -shared
specify 4 different modes. The 4 options are mutually exclusive.
The relocatable output can be used for analysis and binary manipulation. Then, the output can be used to link the final executable or shared object.
1 | clang -pie a.o b.o |
Let's go through various linker passes and see how relocatable linking changes the operation.
This article describes how to detect C++ One Definition Rule (ODR) violations. There are many good resources on the Internet about how ODR violations can introduce subtle bugs, so I will not repeat that here.
Updated in 2024-08.
glibc 2.3.4 introduced _FORTIFY_SOURCE in 2004
to catch security errors due to misuse of some C library functions. The
initially supported functions were
fprintf, gets, memcpy, memmove, mempcpy, memset, printf, snprintf, sprintf, stpcpy, strcat, strcpy, strncat, strncpy, vfprintf, vprintf, vsnprintf, vsprintf
and focused on buffer overflow detection and dangerous printf
%n uses. The implementation leverages inline functions and
__builtin_object_size (see [PATCH]
Object size checking to prevent (some) buffer overflows). More
functions were added over time and __builtin_constant_p was
used as well. As of 2022-11 glibc defines 79 default version
*_chk functions.
I was asked about a segfault related to lld linked musl libc.so on PowerPC64.
/usr/lib/ld-musl-powerpc64le.so.1 /path/to/thing
worked. The kernel ELF loader loads rtld and rtld loads the
executable./path/to/thing segfaulted. The kernel ELF loader loads
both rtld and the executable.Therefore the bug is likely due to a difference between the two modes.
Updated in 2024-08.
Note: The article will likely get frequent updates in the next few days.
This article describes some approaches to distribute debug information. Commands below will use two simple C files for demonstration.
1 | cat > a.c <<eof |
I recently revamped Competitive programming in Nim. In short, I can create a C amalgamation from a Nim program and submit the C source code to various competitive programming websites.
Then I use a Clang based tool to shorten the C source code. It does two things:
clangFormat library to remove some
whitespaceFor the first step, the tool uses a derived
ASTFrontendAction to traverse the AST twice, one for
collecting function/var/type names and the other for renaming. Building
clang::CompilerInstance from command lines needs some
boilerplate. An alternative is to use
clang::tooling::CommonOptionsParser and
clang::tooling::ClangTool.
1 | /* |
CMakeLists.txt 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61cmake_minimum_required(VERSION 3.14)
project(cminify LANGUAGES C CXX)
add_executable(cminify "")
set(DEFAULT_CMAKE_BUILD_TYPE Release)
set_property(TARGET cminify PROPERTY CXX_STANDARD 17)
set_property(TARGET cminify PROPERTY CXX_STANDARD_REQUIRED ON)
set_property(TARGET cminify PROPERTY CXX_EXTENSIONS OFF)
find_package(Clang REQUIRED)
if(CLANG_LINK_CLANG_DYLIB)
target_link_libraries(cminify PRIVATE clang-cpp)
else()
target_link_libraries(cminify PRIVATE
clangIndex
clangFormat
clangTooling
clangToolingInclusions
clangToolingCore
clangFrontend
clangParse
clangSerialization
clangSema
clangAST
clangLex
clangDriver
clangBasic
)
endif()
if(LLVM_LINK_LLVM_DYLIB)
target_link_libraries(cminify PRIVATE LLVM)
else()
target_link_libraries(cminify PRIVATE LLVMOption LLVMSupport)
endif()
if(NOT LLVM_ENABLE_RTTI)
# releases.llvm.org libraries are compiled with -fno-rtti
# The mismatch between lib{clang,LLVM}* and cminify can make libstdc++ std::make_shared return nullptr
# _Sp_counted_ptr_inplace::_M_get_deleter
if(MSVC)
target_compile_options(cminify PRIVATE /GR-)
else()
target_compile_options(cminify PRIVATE -fno-rtti)
endif()
endif()
target_sources(cminify PRIVATE main.cc)
foreach(include_dir ${LLVM_INCLUDE_DIRS} ${CLANG_INCLUDE_DIRS})
get_filename_component(include_dir_realpath ${include_dir} REALPATH)
# Don't add as SYSTEM if they are in CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES.
# It would reorder the system search paths and cause issues with libstdc++'s
# use of #include_next. See https://github.com/MaskRay/ccls/pull/417
if(NOT "${include_dir_realpath}" IN_LIST CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES)
target_include_directories(cminify SYSTEM PRIVATE ${include_dir})
endif()
endforeach()
install(TARGETS cminify RUNTIME DESTINATION bin)
Define LLVM as the llvm-project repository and
LLVMOUT as the build directory (make sure you have at least
built these targets:
ninja clang clangFormat clangIndex clangTooling).
1
2cmake -GNinja -S. -Bout/release -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$LLVMOUT;$LLVMOUT/tools/clang;$LLVM/llvm;$LLVM/clang"
ninja -C out/release
If LLVM and Clang's CMake, library, and header files are installed in
well-known locations, then -DCMAKE_PREFIX_PATH can be
omitted.
It's certainly not straightforward to find all these APIs. I mainly
use ccls as a reference which was inspired by clangIndex.
For writing this tool, I read a bit code of clang-rename,
clang-format, and C-Reduce clang_delta.
C-Reduce provides clang_delta/RenameFun.cpp
and two other passes (RenameVar, RenameParam) which do similar stuff.
Its code was a bit old now as it was written based on a Clang in circa
2012.
Let's see an example. Unfortunately I don't find clangFormat options
removing whitespace after = and ,. That can
perhaps be done by a post-processing string substitution tool without
introducing too much risk.
1 | % cat test/a.c |
Updated in 2023-07.
This article describes some Clang header modules
features that apply to #include. These features enforce a
more explicit dependency graph, which provide documentation purposes and
makes refactoring convenient. The benefits of clean header inclusions
are well described in Include
What You Use as well, so I won't repeat them here.
When using C++20 modules, these features apply to
#include in a global module fragment (module;)
but have no effect for import declarations.
-fmodules-decluseFor a #include directive, this option emits an error if
the following conditions are satisfied (see
clang/lib/Lex/ModuleMap.cpp
diagnoseHeaderInclusion):
A).B.A does not have a use-declaration of B (no
use B).For the first condition, -fmodule-map-file= is needed to
load the source module map and -fmodule-name=A is needed to
indicate that the source file is logically part of module
A.
For the second condition, the module map defining B must
be loaded by specifying -fimplicit-module-maps (implied by
-fmodules and -fcxx-modules) or a
-fmodule-map-file=.
Updated in 2022-10.
In January I wrote Compressed debug
sections. The venerable zlib shows its age and there are
replacements which are better in every metric except adoption and a
larger memory footprint. The obvious choice was Zstandard, but I was not
so confident about adoptinig it and solving the ecosystem issue. At any
rate, I slowly removed some legacy .zdebug support from
llvm-project so that a new format could be more easily introduced.