Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting

Overview

StochFuzz: A New Solution for Binary-only Fuzzing

test benchmark

loading-ag-167

StochFuzz is a (probabilistically) sound and cost-effective fuzzing technique for stripped binaries. It is facilitated by a novel incremental and stochastic rewriting technique that is particularly suitable for binary-only fuzzing. Any AFL-based fuzzer, which takes edge coverage (defined by AFL) as runtime feedback, can acquire benefits from StochFuzz to directly fuzz stripped binaries.

More data and the results of the experiments can be found here. Example cases of leveraging StochFuzz to improve advanced AFL-based fuzzers (AFL++ and Polyglot) can be found in system.md.

Clarifications

  • We adopt a new system design than the one from the paper. Details can be found at system.md.
  • In the paper, when we are talking about e9patch, we are actually talking about the binary-only fuzzing tool built upon e9patch, namely e9tool. Please refer to its website for more details.
  • StochFuzz provides sound rewriting for binaries without inlined data, and probabilistically sound rewriting for the rest.

Building StochFuzz

StochFuzz is built upon Keystone, Capstone, GLib, and libunwind.

These dependences can be built by build.sh. If you are trying to build StochFuzz in a clean container, make sure some standard tools like autoreconf and libtool are installed.

$ git clone https://github.com/ZhangZhuoSJTU/StochFuzz.git
$ cd StochFuzz
$ ./build.sh

StochFuzz itself can be built by GNU Make.

$ cd src
$ make release

We have tested StochFuzz on Ubuntu 18.04. If you have any issue when running StochFuzz on other systems, please kindly let us know.

How to Use

StochFuzz provides multiple rewriting options, which follows the AFL's style of passing arguments.

$ ./stoch-fuzz -h
stoch-fuzz 1.0.0 by <[email protected]>

./stoch-fuzz [ options ] -- target_binary [ ... ]

Mode settings:

  -S            - start a background daemon and wait for a fuzzer to attach (defualt mode)
  -R            - dry run target_binary with given arguments without an attached fuzzer
  -P            - patch target_binary without incremental rewriting
  -D            - probabilistic disassembly without rewriting
  -V            - show currently observed breakpoints

Rewriting settings:

  -g            - trace previous PC
  -c            - count the number of basic blocks with conflicting hash values
  -d            - disable instrumentation optimization
  -r            - assume the return addresses are only used by RET instructions
  -e            - install the fork server at the entrypoint instead of the main function
  -f            - forcedly assume there is data interleaving with code
  -i            - ignore the call-fallthrough edges to defense RET-misusing obfuscation

Other stuff:

  -h            - print this help
  -x execs      - set the number of executions after which a checking run will be triggered
                  set it as zero to disable checking runs (default: 200000)
  -t msec       - set the timeout for each daemon-triggering execution
                  set it as zero to ignore the timeout (default: 2000 ms)
  -l level      - set the log level, including INFO, WARN, ERROR, and FATAL (default: INFO)

Basic Usage

- It is worth first trying the advanced strategy (see below) because that is much more cost-effective.

To fuzz a stripped binary, namely example.out, we need to cd to the directory of the target binary. For example, if the full path of example.out is /root/example.out, we need to first cd /root/. Furthermore, it is dangerous to run two StochFuzz instances under the same directory. These restrictions are caused by some design faults and we will try to relax them in the future.

Assuming StochFuzz is located at /root/StochFuzz/src/stoch-fuzz, execute the following command to start rewriting the target binary.

$ cd /root/
$ /root/StochFuzz/src/stoch-fuzz -- example.out # do not use ./example.out here

After the initial rewriting, we will get a phantom file named example.out.phantom. This phantom file can be directly fuzzed by AFL or any AFL-based fuzzer. Note that the StochFuzz process would not stop during fuzzing, so please make sure the process is alive during fuzzing.

Here is a demo that shows how StochFuzz works.

asciicast

Advanced Usage

Compared with the compiler-based instrumentation (e.g., afl-clang-fast), StochFuzz has additional runtime overhead because it needs to emulate each CALL instruction to support stack unwinding.

Inspired by a recent work, we provide an advanced rewriting strategy where we do not emulate CALL instructions but wrap the _ULx86_64_step function from libunwind to support stack unwinding. This strategy works for most binaries but may fail in some cases like fuzzing statically linked binaries.

To enable such strategy, simply provide a -r option to StochFuzz.

$ cd /root/
$ /root/StochFuzz/src/stoch-fuzz -r -- example.out # do not use ./example.out here

Addtionally, before fuzzing, we need to prepare the AFL_PRELOAD environment variable for AFL.

$ export STOCHFUZZ_PRELOAD=$(/root/StochFuzz/scritps/stochfuzz_env.sh)
$ AFL_PRELOAD=$STOCHFUZZ_PRELOAD afl-fuzz -i seeds -o output -t 2000 -- example.out.phantom @@

Following demo shows how to apply this advanced strategy.

asciicast

Troubleshootings

Common issues can be referred to trouble.md. If it cannot help solve your problem, please kindly open a Github issue.

Besides, we provide some tips on using StochFuzz, which can be found at tips.md

Development

Currently, we have many todo items. We present them in todo.md.

We also present many pending decisions which we are hesitating to take, in todo.md. If you have any thought/suggestion, do not hesitate to let us know. It would be very appreciated if you can help us improve StochFuzz.

StochFuzz should be considered an alpha-quality software and it is likely to contain bugs.

I will try my best to maintain StochFuzz timely, but sometimes it may take me more time to respond. Thanks for your understanding in advance.

Cite

Zhang, Zhuo, et al. "STOCHFUZZ: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting." 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021.

References

  • Duck, Gregory J., Xiang Gao, and Abhik Roychoudhury. "Binary rewriting without control flow recovery." Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 2020.
  • Meng, Xiaozhu, and Weijie Liu. "Incremental CFG patching for binary rewriting." Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2021.
  • Aschermann, Cornelius, et al. "Ijon: Exploring deep state spaces via fuzzing." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020.
  • Google. “Google/AFL.” GitHub, github.com/google/AFL.
Owner
Zhuo Zhang
Zhuo Zhang
Code for Overinterpretation paper Overinterpretation reveals image classification model pathologies

Overinterpretation This repository contains the code for the paper: Overinterpretation reveals image classification model pathologies Authors: Brandon

Gifford Lab, MIT CSAIL 17 Dec 10, 2022
TLXZoo - Pre-trained models based on TensorLayerX

Pre-trained models based on TensorLayerX. TensorLayerX is a multi-backend AI fra

TensorLayer Community 13 Dec 07, 2022
WaveFake: A Data Set to Facilitate Audio DeepFake Detection

WaveFake: A Data Set to Facilitate Audio DeepFake Detection This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper

Chair for Sys­tems Se­cu­ri­ty 27 Dec 22, 2022
190 Jan 03, 2023
Scalable Optical Flow-based Image Montaging and Alignment

SOFIMA SOFIMA (Scalable Optical Flow-based Image Montaging and Alignment) is a tool for stitching, aligning and warping large 2d, 3d and 4d microscopy

Google Research 16 Dec 21, 2022
Unofficial Pytorch Implementation of WaveGrad2

WaveGrad 2 — Unofficial PyTorch Implementation WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Unofficial PyTorch+Lightning Implementati

MINDs Lab 104 Nov 29, 2022
[ACM MM 2021] TSA-Net: Tube Self-Attention Network for Action Quality Assessment

Tube Self-Attention Network (TSA-Net) This repository contains the PyTorch implementation for paper TSA-Net: Tube Self-Attention Network for Action Qu

ShunliWang 18 Dec 23, 2022
Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

69 Oct 19, 2022
Pytorch Implementation for CVPR2018 Paper: Learning to Compare: Relation Network for Few-Shot Learning

LearningToCompare Pytorch Implementation for Paper: Learning to Compare: Relation Network for Few-Shot Learning Howto download mini-imagenet and make

Jackie Loong 246 Dec 19, 2022
Planner_backend - Academic planner application designed for students and counselors.

Planner (backend) Academic planner application designed for students and advisors.

2 Dec 31, 2021
Deep Halftoning with Reversible Binary Pattern

Deep Halftoning with Reversible Binary Pattern ICCV Paper | Project Website | BibTex Overview Existing halftoning algorithms usually drop colors and f

Menghan Xia 17 Nov 22, 2022
The PyTorch implementation of paper REST: Debiased Social Recommendation via Reconstructing Exposure Strategies

REST The PyTorch implementation of paper REST: Debiased Social Recommendation via Reconstructing Exposure Strategies. Usage Download dataset Download

DMIRLAB 2 Mar 13, 2022
RoBERTa Marathi Language model trained from scratch during huggingface 🤗 x flax community week

RoBERTa base model for Marathi Language (मराठी भाषा) Pretrained model on Marathi language using a masked language modeling (MLM) objective. RoBERTa wa

Nipun Sadvilkar 23 Oct 19, 2022
A short code in python, Enchpyter, is able to encrypt and decrypt words as you determine, of course

Enchpyter Enchpyter is a program do encrypt and decrypt any word you want (just letters). You enter how many letters jumps and write the word, so, the

João Assalim 2 Oct 10, 2022
Probabilistic Gradient Boosting Machines

PGBM Probabilistic Gradient Boosting Machines (PGBM) is a probabilistic gradient boosting framework in Python based on PyTorch/Numba, developed by Air

Olivier Sprangers 112 Dec 28, 2022
Implementation of the Chamfer Distance as a module for pyTorch

Chamfer Distance for pyTorch This is an implementation of the Chamfer Distance as a module for pyTorch. It is written as a custom C++/CUDA extension.

Christian Diller 205 Jan 05, 2023
Registration Loss Learning for Deep Probabilistic Point Set Registration

RLLReg This repository contains a Pytorch implementation of the point set registration method RLLReg. Details about the method can be found in the 3DV

Felix Järemo Lawin 35 Nov 02, 2022
The official implementation of Theme Transformer

Theme Transformer This is the official implementation of Theme Transformer. Checkout our demo and paper : Demo | arXiv Environment: using python versi

Ian Shih 85 Dec 08, 2022
A modular application for performing anomaly detection in networks

Deep-Learning-Models-for-Network-Annomaly-Detection The modular app consists for mainly three annomaly detection algorithms. The system supports model

Shivam Patel 1 Dec 09, 2021
Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

The goal is to classify different birds species based on their songs/calls. Spectrograms have been extracted from the audio samples and used as features for classification.

Aditya Dutt 9 Dec 27, 2022