Bugs in Julia with AFL and C-Reduce

Nov 16, 2018 #julia

Recently, I’ve been looking at some existing tools to efficiently find bugs in the Julia language implementation: American Fuzzy Lop (AFL) for fuzzing the compiler, and C-Reduce for reducing test cases. In this post, I’ll detail the set-up and use of these tools. Skip to the bottom of this post for a demonstration.

American Fuzzy Lop

Fuzzing Julia used to be impractical due to the significant start-up time of the Julia VM, rendering typical fuzzing approaches inefficient and calling for specialized tools such as TypeFuzz.jl. However, start-up latency has been improving lately, and AFL now has some features to cope with slow binaries.

Quick start

Working with AFL 2.52b, the naive approach of using afl-gcc and afl-g++ to compile Julia by setting the CC and CXX build variables, results in a functioning julia binary. This only instruments those parts of the compiler that have been written in C/C++; code written in Julia is not covered.

To trap interesting errors, I built with LLVM_ASSERTIONS=1 and applied the following patch to at least detect errors in the Julia parts of the compiler:

--- a/julia/src/gf.c
+++ b/julia/src/gf.c
@@ -280,10 +280,11 @@ jl_code_info_t *jl_type_infer(...)
 jl_printf(JL_STDERR, "Internal error: encountered unexpected error in runtime:\n");
 jl_static_show(JL_STDERR, jl_current_exception());
 jl_printf(JL_STDERR, "\n");
 jlbacktrace(); // written to STDERR_FILENO
 linfo_src = NULL;
+abort();

Running the instrumented binary with afl-fuzz crashes the AFL forkserver. This is due to an incompatibility with the LD_BIND_NOW linker option that AFL uses. It can be disabled by specifying LD_BIND_LAZY=1 before running the fuzzer. Furthermore, Julia preallocates quite a bit of memory, and interesting test-cases might take a while due to infer and compile, so I specify -m 2048 to use up to 2G per process and -t 5000 to let them run for up to 5 seconds:

$ LD_BIND_LAZY=1 afl-fuzz -i inputs/ -o outputs/
                          -m 2048 -t 5000 -- julia @@
afl-fuzz 2.52b by <lcamtuf@google.com>
[+] You have 16 CPU cores and 2 runnable tasks (utilization: 12%).
[+] Try parallel jobs - see docs/parallel_fuzzing.txt.
[*] Checking CPU core loadout...
[+] Found a free CPU core, binding to #4.
[*] Checking core_pattern...
[*] Checking CPU scaling governor...
[*] Setting up output directories...
[*] Scanning 'inputs'...
[+] No auto-generated dictionary tokens to reuse.
[*] Creating hard links for all input files...
[*] Validating target binary...
[+] Deferred forkserver binary detected.
[*] Attempting dry run with 'id:000000,orig:binary_trees.jl'...
[*] Spinning up the fork server...
[+] All right - fork server is up.
    len = 1324, map size = 12396, exec speed = 249839 us
[+] All test cases processed.
[+] All set and ready to roll!

Sadly, performance is pretty low: between 0.2 and 2 executions per second, respectively executions that error (not crash) and succeed.

Let’s try and optimize that.

Only instrument the compiler

A first trivial optimization is to only instrument code that matters. Specifically, compiling dependencies (such as LLVM and OpenBLAS) with the system compiler bumps the speed up a notch:

julia$ make -C deps
julia$ make CC=afl/afl-gcc CXX=afl/afl-g++

Fail hard and fast

Fuzzing code results in lots of invalid sources getting processed by the Julia compiler. Each of these result in a nice exception, which relies on expensive stack unwinding to provide back traces. We can disable this functionality by building with DISABLE_LIBUNWIND=1.

Another costly element of errors is displaying them, as this code is not part of the precompiled system image. This functionality, courtesy of the display_error function in julia/base/client.jl, is easily disabled by preloading a file that redefines the relevant methods:

$ cat fuzz.jl
Base.display_error(er, bt) = println("ERROR")
$ julia -L fuzz.jl -e "error()"
ERROR

The above also applies to deprecation warnings, but those can be disabled more easily by passing --depwarn=no to Julia.

Deferred fork server

Much of the time launching Julia is spent setting-up libraries, initializing code generation, etc. This recurring cost can be avoided by using a deferred fork server, where AFL starts the binary at a manually-specified point during execution. In the case of Julia, this would be right before loading the input file:

--- a/julia/ui/repl.c
+++ b/julia/ui/repl.c
@@ -89,6 +89,10 @@ static NOINLINE int true_main(int argc, char *argv[])
     jl_function_t *start_client = jl_base_module ?
         (jl_function_t*)jl_get_global(jl_base_module, jl_symbol("_start")) : NULL;
 
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+  __AFL_INIT();
+#endif
+
     if (start_client) {

This approach relies on the binary being compiled with AFL’s LLVM instrumenter, available as afl-clang-fast and afl-clang-fast++ for compiling respectively C and C++ code, again specified using the CC and CXX build variables. Note that this instrumentation seems incompatible with OpenBLAS, so if you were to instrument dependencies you would need to compile that library using a regular compiler.

Use of the deferred fork server necessitated several changes. For one, the symbols AFL defines need to be exported globally:

--- a/julia/src/julia.expmap
+++ b/julia/src/julia.expmap
@@ -1,5 +1,6 @@
 {
   global:
+    __afl*;

Furthermore, despite the __afl_manual_init symbol that enables the deferred fork server being part of the Julia binary, AFL does not pick it up and defaults to regular forking. I did not debug this issue, and instead patched AFL to unconditionally enter deferred mode:

--- a/afl/afl-fuzz.c
+++ b/afl/afl-fuzz.c
@@ -6950,7 +6950,7 @@ EXP_ST void check_binary(u8* fname) {
-  if (memmem(f_data, f_len, DEFER_SIG, strlen(DEFER_SIG) + 1)) {
+  if (1) {
 
     OKF(cPIN "Deferred forkserver binary detected.")

You should now see the following in the afl-fuzz output:

[+] Deferred forkserver binary detected.

Note that the AFL documentation suggests putting the __AFL_INIT call before the initialization of important threads, as they don’t survive the fork. The patch above does not do that, as threads are spawned by restore_signals as well as different module initializers. To be safe, I built with JULIA_THREADS=0, configured LLVM with ENABLE_THREADS unset and executed with OPENBLAS_NUM_THREADS=1 to reduce possible issues. Still, your mileage might vary and you might have to move the initialization point further upwards.

Multithreaded fuzzing

With the above fixes, we go from 0.2-2 executions/s to 3-30; still considered slow but a significant improvement nonetheless. Luckily, AFL scales pretty well and can be easily executed across all threads of a system. This brings the final execution speed to around 500 per second.

Now let’s focus on improving fuzzing quality.

Instrumenting Julia code

AFL can more accurately measure the impact of changes to inputs if all relevant code is instrumented. Nowadays, part of the Julia compiler is written in Julia, and compiled during initial bootstrap of the julia binary. Julia code is compiled with LLVM, so we can re-use the LLVM-based instrumentation passes that are part of AFL. However, Julia uses its own (heavily patched) version of LLVM, so we need to recompile AFL and link against Julia’s LLVM library:

afl/llvm_mode$ PATH=julia/usr/tools:$PATH make
# building the tests will fail

In order to use this pass, which is now linked against Julia’s LLVM, we need a compatible build of Clang that can load the instrumentation pass. We can do so by rebuilding Julia’s copy of LLVM with the BUILD_LLVM_CLANG variable set:

julia$ make -C deps BUILD_LLVM_CLANG=1

# we can now run AFL's tests
afl/llvm_mode$ PATH=julia/usr/tools:$PATH make

AFL needs to be modified to expose its instrumentation pass:

--- a/afl/llvm_mode/afl-llvm-pass.so.cc
+++ b/afl/llvm_mode/afl-llvm-pass.so.cc
@@ -177,6 +177,10 @@
+namespace llvm {
+Pass *createAFLCoveragePass() {
+  return new AFLCoverage();
+}
+};

The following changes make Julia use these instrumentation passes:

--- a/julia/src/Makefile
+++ b/julia/src/Makefile
@@ -95,6 +95,8 @@
 endif # USE_LLVM_SHLIB == 1
 endif

+LLVMLINK += afl/afl-llvm-pass.so

--- a/julia/src/jitlayers.cpp
+++ b/julia/src/jitlayers.cpp
@@ -32,6 +32,7 @@
 namespace llvm {
     extern Pass *createLowerSimdLoopPass();
+    extern Pass *createAFLCoveragePass();
 }

@@ -230,6 +231,8 @@ void addOptimizationPasses(...)
+    PM->add(createAFLCoveragePass());
 }

Now build Julia while pointing AFL to the compatible clang binary:

julia$ export AFL_CC=julia/usr/tools/clang
julia$ export AFL_CXX=julia/usr/tools/clang++
julia$ make CC=afl/afl-clang-fast CXX=afl/afl-clang-fast++

For some reason, compiled Julia code uses AFL’s afl_prev_loc TLS variable differently then how it is defined in the contained binary (instead using __emutls_v.__afl_prev_loc). I worked around this issue by changing the definition to a regular variable, since we’re not using threads anyway:

--- a/afl/llvm_mode/afl-llvm-rt.o.c
+++ b/afl/llvm_mode/afl-llvm-rt.o.c
@@ -52,7 +52,7 @@
-__thread u32 __afl_prev_loc;
+u32 __afl_prev_loc;

--- a/afl/llvm_mode/afl-llvm-pass.so.cc
+++ b/afl/llvm_mode/afl-llvm-pass.so.cc
@@ -101,8 +101,7 @@ bool AFLCoverage::runOnModule(Module &M) {
 GlobalVariable *AFLPrevLoc = new GlobalVariable(
-M, Int32Ty, false, GlobalValue::ExternalLinkage, 0, "__afl_prev_loc",
-0, GlobalVariable::GeneralDynamicTLSModel, 0, false);
+M, Int32Ty, false, GlobalValue::ExternalLinkage, 0, "__afl_prev_loc");

Make sure to rebuild all of Julia after applying this patch.

We can now compare the generated code with and without instrumentation:

--- julia-original     -e "@code_llvm sin(1.)"
+++ julia-instrumented -e "@code_llvm sin(1.)"
 L4:                                               ; preds = %top
 ; Location: special/trig.jl:32
 ; Function <; {
 ; Location: float.jl:452
-  %4 = fcmp uge double %2, 0x3E50000000000000
+  %10 = load i32, i32* @__afl_prev_loc
+  %11 = load i8*, i8** @__afl_area_ptr
+  %12 = xor i32 %10, 56933
+  %13 = getelementptr i8, i8* %11, i32 %12
+  %14 = load i8, i8* %13
+  %15 = add i8 %14, 1
+  store i8 %15, i8* %13
+  store i32 28466, i32* @__afl_prev_loc
+  %16 = fcmp uge double %8, 0x3E50000000000000
 ;}

Dictionary for Julia code

A dictionary makes it easier for AFL to explore verbose grammars such as programming languages. I created a dictionary for Julia, with most keywords, important operators, and common syntax such as values and definitions. You can use this dictionary by passing -x julia.dict to afl-fuzz.

Test-case reduction

Although AFL provides a test-case minimizer (afl-tmin), I decided to look into a independent tool. C-Reduce is such a tool, blindly reducing test cases based on a user-specified condition. As such, it is more flexible, and independent from the complicated set-up from above.

Using C-Reduce is fairly simple: provide a script whose return code indicates the state of the current test case, and run it over some files. I’ve created some scripts that improve integration with Julia and its code loading mechanism: maleadt/creduce_julia. For more information, refer to the repository’s README, or have a look at this Discourse thread.

Demo

After running AFL for about a day on a 16-core machine, afl-whatsup reported the following:

$ afl-whatsup -s outputs/
status check tool for afl-fuzz by <lcamtuf@google.com>

Summary stats
=============

       Fuzzers alive : 16
      Total run time : 13 days, 9 hours
         Total execs : 3 million
    Cumulative speed : 42 execs/sec
       Pending paths : 2023 faves, 62710 total
  Pending per fuzzer : 126 faves, 3919 total (on average)
       Crashes found : 1 locally unique

We found a crash! The actual input is written in one of the crashes subdirectories, and contained the following code:

abstract type BTree end

mutable struct Empty <: BTree
end

mutable struct Node <: BTree
  info
  left::BTree
  right::BTree
end

function make(val, d)
  if d == 0
    Node(val, Empty(), Empty())
  else
    nval = val * 2
    Node(val, make(nval-1, d-1), make(nval, d-1))
  end
end

check(t::Empty) = 0
check(t::Node) = t.info + check(t.left) - check(t.right)

function loop_depths(d, min_depth, max_depth)
  for i = 0:div(max_depth - d, 2)
    niter = 1 << (max_depth - d + min_depth)
    c = 0
    for j = 1:niter
      c += check(make(i, d)) + check(make(-i, d))
    end,#@printf("%i\t trees of depth %i\t check: %i\n", 2*niter, d, c)
    d += 2
  end
end

const N=10
min_depth = 4
max_depth = N
stretch_depth = max_depth + 1

let c = check(make(0, stretch_depth))
  #@printf("stretch tree of depth %i\t check: %i\n", stretch_depth, c)
end

long_lived_tree = make(0, max_depth)

loop_depths(min_depth, min_depth, max_depth)

As you may notice, I used the BaseBenchmark’s shootout benchmarks as inputs to AFL. Pasting the reduced code in a Julia REPL indeed crashes the compiler:

Internal error: encountered unexpected error in runtime:
BoundsError(a=Array{Core.Compiler.NewNode, (0,)}[], i=(3,))

signal (4): Illegal instruction

To reduce this code, I check-out maleadt/creduce_julia and use the crashing code as the initial input. We need to make sure that the run script successfully catches the error condition; in this case it is sufficient to grep for encountered unexpected error in runtime:

--- a/creduce_julia/run
+++ b/creduce_julia/run
@@ -30,4 +30,4 @@ JULIA_LOAD_PATH=$TEMP \
-julia main.jl |& grep "example error message"
+julia main.jl |& grep "encountered unexpected error in runtime"

creduce_julia$ ./run
Internal error: encountered unexpected error in runtime:

creduce_julia$ echo $?
0

Finally, we run the reduce script to kick-off the process. It will automatically work in parallel on all available cores:

creduce_julia$ ./reduce
===< 30926 >===
running 8 interestingness tests in parallel
...
===================== done ====================

pass statistics:
  method pass_balanced :: parens-inside worked 1 times and failed 2 times
  method pass_indent :: final worked 1 times and failed 0 times
  method pass_clex :: rm-toks-3 worked 1 times and failed 61 times
  method pass_lines :: 4 worked 2 times and failed 50 times
  method pass_lines :: 1 worked 2 times and failed 50 times
  method pass_clex :: rename-toks worked 2 times and failed 3 times
  method pass_blank :: 0 worked 2 times and failed 0 times
  method pass_indent :: regular worked 2 times and failed 0 times
  method pass_lines :: 8 worked 2 times and failed 50 times
  method pass_lines :: 10 worked 2 times and failed 50 times
  method pass_lines :: 6 worked 2 times and failed 50 times
  method pass_lines :: 3 worked 2 times and failed 50 times
  method pass_clex :: rm-toks-2 worked 2 times and failed 64 times
  method pass_lines :: 2 worked 2 times and failed 50 times
  method pass_balanced :: parens-to-zero worked 2 times and failed 6 times
  method pass_clex :: rm-toks-4 worked 3 times and failed 49 times
  method pass_clex :: rm-toks-13 worked 3 times and failed 18 times
  method pass_lines :: 0 worked 15 times and failed 78 times
  method pass_clex :: rm-toks-1 worked 16 times and failed 68 times

The above took about 2 minutes on an 4-core workstation. Doing some manual clean-up, we end up with the issue as reported in JuliaLang/julia#30062:

for a = 1 end, b += 2

Future work

Although a tool like AFL isn’t ideal for fuzzing a compiler as done in this blogpost, the technique and its set-up can be used to fuzz other libraries and tools that are written in Julia (eg. HTTP.jl or Images.jl). That would of course require productionisation of this proof-of-concept and its many workarounds.

A hypothetical AFL.jl package would:

build AFL’s instrumentation pass against an LLVMBuilder toolchain
provide native functionality a la python-afl
conditionally instrument code (Julia has some great tools for that)

Fuzzing the compiler itself could also be improved:

deferring the fork server even more: there’s significant initialization done by Julia code, and the manual initialization point could move right before the input eval loop
persistent fuzzing: having AFL input new test cases without restarting the process. Although Julia is far from stateless, AFL could be modified to verify crashes in a fresh process.

maleadt posts about