BinCmp Tutorial — Compare Binaries, Find Differences, Export Reports
BinCmp is a binary comparison tool aimed at reverse engineers, security researchers, and developers who need to understand differences between two compiled binaries. This tutorial covers installation, core concepts, workflows for comparing binaries, interpreting results, exporting reports, and integrating BinCmp into automation pipelines. Examples use practical scenarios (patch analysis, tracking compiler changes, and regression detection).
What BinCmp does (high-level)
BinCmp performs semantic and structural comparison of binary executables. Instead of relying only on byte-level diffs, it matches functions and code regions by their behavior and structure, allowing you to identify:
- which functions were added, removed, or modified,
- semantic equivalence (functions that changed but still do the same thing),
- non-semantic differences introduced by compiler optimizations or different build flags.
1. Installation and setup
System requirements
BinCmp typically runs on Linux and macOS. You’ll want:
- Python 3.8+ (if BinCmp is Python-based),
- common reverse-engineering tools installed (e.g., radare2, IDA Pro, Ghidra — depending on integration),
- standard build tools (gcc/clang) if you’ll build from source.
Installation (typical steps)
- Clone the repository:
git clone https://example.com/bincmp.git
cd bincmp
- Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Install optional integrations (radare2, r2pipe, ghidra headless scripts) per README.
If BinCmp provides pre-built packages, use the platform-specific installer.
2. Key concepts
- Function matching: the process of mapping functions in binary A to corresponding functions in binary B. BinCmp uses signatures, control-flow-graph (CFG) similarity, and semantic features.
- Semantic similarity: comparing the effect and behavior rather than exact instruction sequence. This helps to ignore incidental differences (register allocation, instruction reordering).
- Confidence score: many tools assign a score (0–1 or 0–100) indicating how closely two functions match.
- Unmatched functions: functions present in one binary but not matched to anything in the other — likely added/removed or heavily changed.
- Heuristics vs exact matches: BinCmp may report an exact match, a fuzzy match, or no match for a given function pair (a toy similarity sketch follows this list).
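To make the matching idea concrete, here is a toy structural-similarity signal: a Jaccard index over the sets of callees of two functions. This is purely illustrative of the kind of signal a matcher combines; it is not BinCmp's actual scoring algorithm, and the callee sets below are made up.

# Toy structural-similarity signal: Jaccard index over two functions' callee sets.
# Illustrative only; BinCmp's real scoring combines signatures, CFG, and semantic features.
def jaccard_similarity(callees_a, callees_b):
    if not callees_a and not callees_b:
        return 1.0
    return len(callees_a & callees_b) / len(callees_a | callees_b)

# Hypothetical callee sets extracted from the same function in two builds.
v1 = {"memcpy", "strlen", "log_error"}
v2 = {"memcpy", "strnlen", "log_error", "check_bounds"}
print(f"similarity: {jaccard_similarity(v1, v2):.2f}")  # 2 shared / 5 total -> 0.40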
3. Typical workflows
A. Quick binary diff (overview)
- Run BinCmp on two binaries:
bincmp compare binary_v1.bin binary_v2.bin -o report.json
- Review the summary: counts of matched, added, removed, and modified functions (a short parsing sketch follows).
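If you want to consume that report from a script, a minimal sketch looks like the following. The field names ("summary", "matched", and so on) are assumptions about the JSON layout, not a documented schema; adjust them to whatever your BinCmp version actually emits.

import json

# Load the comparison report produced by `bincmp compare ... -o report.json`.
with open("report.json") as fh:
    report = json.load(fh)

# Field names below are assumed for illustration; adjust to the real schema.
summary = report.get("summary", {})
for key in ("matched", "added", "removed", "modified"):
    print(f"{key:>9}: {summary.get(key, 'n/a')}")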
B. Function-level analysis
- Generate function maps (disassembly + function boundaries) using your preferred disassembler (an r2pipe sanity check is sketched after this list):
bincmp analyze binary_v1.bin -d radare2 -o v1_funcs.json
bincmp analyze binary_v2.bin -d radare2 -o v2_funcs.json
- Perform matching with finer options (CFG, semantic):
bincmp match v1_funcs.json v2_funcs.json --method semantic --threshold 0.75 -o matches.json
- Inspect high-delta functions manually in a disassembler or decompiler to confirm behavior changes.
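Before matching, it helps to confirm that the disassembler found sensible function boundaries. Here is a small r2pipe sketch, independent of BinCmp itself, that runs radare2's auto-analysis and lists the functions it recovers:

import r2pipe

# Open the binary in radare2 and run its standard auto-analysis.
r2 = r2pipe.open("binary_v1.bin")
r2.cmd("aaa")

# `aflj` returns the function list as JSON: name, offset, size, etc.
functions = r2.cmdj("aflj") or []
print(f"{len(functions)} functions found")
for fn in functions[:10]:
    print(f"{fn['offset']:#010x}  {fn.get('size', 0):6d}  {fn['name']}")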
C. Patch verification and regression detection
- Compare a patched build against the unpatched build to verify that only intended functions changed.
- Use confidence thresholds to flag potentially unintended modifications (a minimal allowlist check is sketched below).
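One way to automate the "only intended functions changed" check is to compare the set of modified functions against an allowlist. This is a minimal sketch: the matches.json layout, the "name"/"status" fields, and the validate_token function name are assumptions for illustration.

import json
import sys

# Functions we intentionally touched in this patch (names are examples).
INTENDED_CHANGES = {"do_auth", "validate_token"}

with open("matches.json") as fh:
    matches = json.load(fh)

# Assumed layout: a list of per-function records with "name" and "status".
modified = {m["name"] for m in matches if m.get("status") == "modified"}
unexpected = modified - INTENDED_CHANGES

if unexpected:
    print("Unexpected modifications:", ", ".join(sorted(unexpected)))
    sys.exit(1)
print("Only intended functions changed.")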
4. Interpreting results
A BinCmp output typically contains:
- Summary metrics (total functions in each binary, matches, additions, deletions).
- Per-function entries with:
- function name/address in both binaries,
- match type (exact/semantic/fuzzy),
- confidence score,
- change notes (e.g., “CFG changed”, “calls removed”, “size increased by 24 bytes”),
- optional similarity map of basic blocks.
How to read common outcomes:
- High confidence, small size delta: likely unchanged or only cosmetic changes.
- Low confidence or no match: likely modified logic, inlined/outlined code, or function removed.
- Many matches with structural differences but high semantic similarity: compiler optimizations or different compiler versions.
5. Practical examples
Example 1 — Patch validation
Scenario: You patched a vulnerability in function do_auth() in binary_v2. Steps:
- Compare v1 and v2.
- Find do_auth entry: confirm it’s marked modified.
- Review diff: ensure only expected changes exist (e.g., input validation added).
- Export a focused report for auditors.
Example 2 — Compiler change analysis
Scenario: Two builds compiled with different optimization flags.
Observation: Many functions show CFG and instruction changes but remain semantically similar.
Action: Adjust the semantic-matching threshold so these functions are classified as equivalent rather than modified.
6. Exporting reports
BinCmp usually supports multiple report formats: JSON, CSV, HTML. Example commands:
- JSON (for machine processing):
bincmp compare a.bin b.bin -f json -o diff.json
- HTML (for a human-readable, interactive report):
bincmp compare a.bin b.bin -f html -o report.html
- CSV (for spreadsheets or quick overviews):
bincmp compare a.bin b.bin -f csv -o diff.csv
Report contents to expect (an illustrative entry follows this list):
- Summary section with totals and percentages.
- Function-level rows with match type, confidence, and notes.
- For HTML, interactive navigation and links to disassembly/decompilation snippets (if integrated).
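For orientation, a single per-function entry in the JSON report might look roughly like this. The shape, field names, addresses, and values are illustrative, not BinCmp's documented schema; they simply mirror the fields listed above.

# Illustrative shape of one per-function entry (field names and values are assumptions).
example_entry = {
    "name": "do_auth",
    "address_a": "0x4012a0",
    "address_b": "0x4013f0",
    "match_type": "semantic",      # exact | semantic | fuzzy | none
    "confidence": 0.82,
    "size_delta": 24,              # bytes, binary B minus binary A
    "notes": ["CFG changed", "calls removed"],
}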
7. Advanced configuration and tips
- Tweak thresholds: start with a conservative similarity threshold (0.8) and lower if you get too many false negatives.
- Use symbol information if available (stripped vs unstripped binaries behave differently).
- Normalize binaries before comparison: strip timestamps, deterministic build artifacts, or use linker flags to reduce noise.
- Combine static and dynamic information: if you can run the binaries, collecting execution traces and using them as an extra matching signal improves results.
- Parallelize comparisons for large codebases; many tools support multi-threaded matching, and independent comparisons can also be driven in parallel from a small script (sketched below).
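If your BinCmp version does not parallelize on its own, you can fan out independent comparisons from a driver script. This is a minimal sketch using only the standard library; the binary pairs and output naming are placeholders.

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Pairs of (old, new) binaries to compare; placeholders for illustration.
PAIRS = [
    ("build_v1/app.bin", "build_v2/app.bin"),
    ("build_v1/libcore.so", "build_v2/libcore.so"),
]

def compare(pair):
    old, new = pair
    out = new.replace("/", "_") + ".diff.json"
    # Each bincmp invocation is independent, so they can run concurrently.
    subprocess.run(["bincmp", "compare", old, new, "-f", "json", "-o", out],
                   check=True)
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    for report in pool.map(compare, PAIRS):
        print("wrote", report)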
8. Integrating BinCmp into CI
- Add a compare step in your pipeline to run BinCmp between golden/build artifacts and current builds.
- Fail the build or raise alerts when:
- Unexpected new functions appear,
- Critical functions’ confidence scores drop below a threshold,
- Binary size or control-flow changes exceed set limits.
Example GitLab CI snippet:
bincmp:
  stage: test
  script:
    - bincmp compare golden.bin current.bin -f json -o diff.json
    - python scripts/check_bincmp_thresholds.py diff.json
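The pipeline defers the pass/fail decision to scripts/check_bincmp_thresholds.py, which is not shown in this tutorial. A minimal sketch of what such a script might do, reusing the assumed JSON field names from earlier, looks like this:

#!/usr/bin/env python3
"""Fail the CI job if the BinCmp diff exceeds configured limits.

Minimal sketch: the field names ("summary", "functions", "confidence", ...)
are assumptions about the report layout, not a documented schema.
"""
import json
import sys

MAX_UNEXPECTED_ADDED = 0
MIN_CONFIDENCE = 0.75

def main(path):
    with open(path) as fh:
        report = json.load(fh)

    summary = report.get("summary", {})
    if summary.get("added", 0) > MAX_UNEXPECTED_ADDED:
        sys.exit(f"FAIL: {summary['added']} unexpected new functions")

    weak = [f["name"] for f in report.get("functions", [])
            if f.get("confidence", 1.0) < MIN_CONFIDENCE]
    if weak:
        sys.exit("FAIL: low-confidence matches: " + ", ".join(weak))

    print("OK: diff within configured thresholds")

if __name__ == "__main__":
    main(sys.argv[1])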
9. Common pitfalls and how to avoid them
- False positives from compiler differences — mitigate by normalizing builds or using semantic matching.
- Misinterpreting confidence scores — treat them as guidance, not absolute truth.
- Assuming no-match means bug — it may be inlining, symbol stripping, or heavy optimization.
10. Summary checklist before you start
- Ensure you have disassembly/function boundaries for both binaries.
- Decide on similarity thresholds and matching methods (structural vs semantic).
- Normalize builds if possible.
- Choose report format suitable for your audience (JSON for automation, HTML for reviewers).
- Run a few known-case comparisons to calibrate thresholds (a threshold-sweep sketch follows).
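One way to calibrate is to sweep the match threshold over a pair of binaries you already understand (for example, two trivially rebuilt versions of the same source) and record how many functions match at each setting. The command-line flags mirror the bincmp match example earlier; the JSON layout and "status" field are assumptions.

import json
import subprocess

V1_FUNCS = "v1_funcs.json"   # produced by `bincmp analyze` as shown earlier
V2_FUNCS = "v2_funcs.json"

for threshold in (0.6, 0.7, 0.8, 0.9):
    out = f"matches_{threshold}.json"
    subprocess.run(["bincmp", "match", V1_FUNCS, V2_FUNCS,
                    "--method", "semantic",
                    "--threshold", str(threshold), "-o", out], check=True)
    with open(out) as fh:
        matches = json.load(fh)
    # Assumed layout: a list of per-function records with a "status" field.
    matched = sum(1 for m in matches if m.get("status") != "unmatched")
    print(f"threshold {threshold}: {matched}/{len(matches)} functions matched")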