Reverse Engineering on macOS

Some notes, tools, and techniques for reverse engineering macOS binaries.

Reverse Engineering Tools

Binary Ninja

  • https://binary.ninja/
    • Binary Ninja is an interactive decompiler, disassembler, debugger, and binary analysis platform built by reverse engineers, for reverse engineers. Developed with a focus on delivering a high-quality API for automation and a clean and usable GUI, Binary Ninja is in active use by malware analysts, vulnerability researchers, and software developers worldwide. Decompile software built for many common architectures on Windows, macOS, and Linux for a single price, or try out our limited (but free!) Cloud version.

    • https://binary.ninja/free/
      • There are two ways to try Binary Ninja for free! Binary Ninja Cloud supports all architectures, but requires you to upload your binaries. Binary Ninja Free is a downloadable app that runs locally, but has architecture restrictions. Neither free option supports our powerful API / Plugin ecosystem.

    • https://cloud.binary.ninja/
      • Binary Ninja Cloud is our free, online reverse engineering tool.

    • https://sidekick.binary.ninja/
      • Sidekick Makes Reverse Engineering Easy Don't open that binary alone! Take Sidekick, your AI-powered assistant, with you. Sidekick can help answer your questions about the binary, recover structures, name things, describe and comment code, find points of interest, and much more.

  • https://binary.ninja/blog/
  • https://docs.binary.ninja/guide/
    • User Guide

    • https://docs.binary.ninja/guide/types/
      • There's so many things to learn about working with Types in Binary Ninja that we've organized it into several sections!

        • Basic Type Editing: Brief overview of the basics

          • https://docs.binary.ninja/guide/types/basictypes.html
            • Basic Type Editing The biggest culprit of bad decompilation is often missing type information. Therefore, some of the most important actions you can take while reverse engineering are renaming symbols/variables, applying types, and creating new types to apply.

        • Working with Types: Interacting with types in disassembly and decompilation

          • https://docs.binary.ninja/guide/types/type.html
            • Working with Types, Structures, and Symbols in Decompilation There are two main ways to interact with types in decompilation or disassembly. The first is to use the types view, and the second is to take advantage of the smart structures workflow or otherwise annotate types directly in a disassembly or IL view.

        • Importing/Exporting Types: How to import or export types from header files, archives, or other BNDBs

          • https://docs.binary.ninja/guide/types/typeimportexport.html
            • Importing Type Information Type information can be imported from a variety of sources. If you have header files, you can import a header. If your types exist in an existing BNDB, you can use import from a bndb. With the introduction of type archives we recommend migrating away from importing via BNDB to type archives as they allow types to remain synced between different databases.

            • https://docs.binary.ninja/guide/types/typeimportexport.html#import-bndb-file
              • Import BNDB File The Import BNDB File feature imports types from a previous BNDB into your currently open file. In addition, it will apply types for matching symbols in functions and variables. Import BNDB will not port symbols from a BNDB with symbols to one without -- the names must already match. Matching functions and porting symbols is beyond the scope of this feature.

            • https://docs.binary.ninja/guide/types/typeimportexport.html#import-header-file
              • Import Header File If you already have a collection of headers containing types you want to use, you can import them directly. You can specify the compiler flags that would be used if a compiler were compiling a source file that uses this header.

              • After specifying the file(s) and flag(s), pressing Preview will give a list of all the types and functions defined in the file(s). You may check or uncheck the box next to any of the types/functions to control whether they will be imported to your analysis.

            • https://docs.binary.ninja/guide/types/typeimportexport.html#finding-system-headers
              • Finding System Headers Since you need to specify the include paths for system headers, you will need to deduce them for the target platform of your analysis. Here are a few tricks that may help:

              • Systems with GCC/Clang (macOS, Linux, etc) On these systems, you can run a command to print the default search path for compilation:

                gcc -Wp,-v -E -
                clang -Wp,-v -E -
                

                For the directories printed by this command, you should include them with -isystem<path> in the order specified.

              • ⇒ gcc -Wp,-v -E -
                clang -cc1 version 15.0.0 (clang-1500.3.9.4) default target x86_64-apple-darwin23.4.0
                ignoring nonexistent directory "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/local/include"
                ignoring nonexistent directory "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/Library/Frameworks"
                #include "..." search starts here:
                #include <...> search starts here:
                 /usr/local/include
                 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/15.0.0/include
                 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
                 /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
                 /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks (framework directory)
                End of search list.
                
              • -isystem/usr/local/include
                -isystem/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/15.0.0/include
                -isystem/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
                -isystem/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include
                
      • Additionally, several types of containers for type information are documented here:

        • Debug Info: Debug Info can provide additional type information (examples include DWARF and PDB files)

          • https://docs.binary.ninja/guide/types/debuginfo.html
            • Debug Info Debug Info is a mechanism for importing types, function signatures, and data variables from either the original binary (eg. an ELF compiled with DWARF) or a supplemental file (eg. a PDB).

              Currently debug info plugins are limited to types, function signatures, and data variables, but in the future will include line number information, comments, local variables, and possibly more.

        • Type Libraries: Type Libraries contain types from commonly-used dynamic libraries

          • https://docs.binary.ninja/guide/types/typelibraries.html
            • Type Libraries Type Libraries are collections of type information (structs, enums, function types, etc.), corresponding to specific dynamic libraries that are imported into your analysis. You can browse and import them in the Types View.

            • Most of your usage of Type Libraries will be performed automatically by Binary Ninja when you analyze a binary. They are automatically imported based on the libraries that your binary uses. Any library functions or global variables your binary references will have their type signature imported, and any structures those functions and variables reference are imported as well.

        • Platform Types: Types that automatically apply to a platform

          • https://docs.binary.ninja/guide/types/platformtypes.html
            • Platform Types Binary Ninja pulls type information from a variety of sources. The highest-level source is the platform types loaded for the given platform (which includes operating system and architecture). There are two sources of platform types. The first is shipped with the product in its install path. The second is in your user folder and is intended for your custom platform types.

            • Platform types are used to define types that should be available to all programs available on that particular platform. They are only for global common types.

        • Type Archives: How you can use type archives to share types between analysis databases

        • Signature Libraries: Signature libraries are used to match names of functions with signatures for code that is statically compiled

          • https://docs.binary.ninja/dev/annotation.html#signature-library
            • Signature Library While many signatures are built-in and require no interaction to automatically match functions, you may wish to add or modify your own. First, install the SigKit plugin from the plugin manager.

            • Once the signature matcher runs, it will print a brief report to the console detailing how many functions it matched and will rename matched functions.

            • To generate a signature library for the currently-open binary, use Tools > Signature Library > Generate Signature Library. This will generate signatures for all functions in the binary that have a name attached to them. Note that functions with automatically-chosen names such as sub_401000 will be skipped. Once it's generated, you'll be prompted where to save the resulting signature library.

      • Additionally, make sure to see the applying annotations section of the developer guide for information about using the API with types and covering the creation of many of the items described below.

    • https://docs.binary.ninja/guide/cpp.html
    • https://docs.binary.ninja/guide/objectivec.html
      • Objective-C (Beta) Recent versions of Binary Ninja ship with an additional plugin for assisting with Objective-C analysis. It provides both a workflow and a plugin command for enhancing Objective-C binary analysis.

    • https://docs.binary.ninja/guide/debugger.html
      • Debugger Binary Ninja Debugger is a plugin that can debug executables on Windows, Linux, and macOS, and more!

        The debugger plugin is shipped with Binary Ninja. It is open-source under an Apache License 2.0. Bug reports and pull requests are welcome!

    • https://docs.binary.ninja/dev/bnil-overview.html
      • Binary Ninja Intermediate Language: Overview

      • https://docs.binary.ninja/dev/bnil-overview.html#reading-il
        • Reading IL All of the various ILs (with the exception of the SSA forms) are intended to be easily human-readable and look much like pseudo-code. There is some shorthand notation that is used throughout the ILs, though, explained below.

    • https://docs.binary.ninja/dev/uidf.html
      • User Informed Data Flow Binary Ninja now implements User-Informed DataFlow (UIDF) to improve the static reverse engineering experience of our users. This feature allows users to set the value of a variable and have the internal dataflow engine propagate it through the control-flow graph of the function. Besides constant values, Binary Ninja supports various PossibleValueSet states as containers to help inform complex variable values.

    • https://docs.binary.ninja/dev/workflows.html
      • Binary Ninja Workflows Documentation

      • Binary Ninja Workflows is an analysis orchestration framework which simplifies the definition and execution of a computational binary analysis pipeline. The extensible pipeline accelerates program analysis and reverse engineering of binary blobs at various levels of abstraction. Workflows supports hybridized execution models, where the ordering of activities in the pipeline can be well-known and procedural, or dynamic and reactive. Currently, the core Binary Ninja analysis is made available as a procedural model and is the aggregate of both module and function-level analyses.

      • https://github.com/Vector35/binaryninja-api/tree/dev/examples/workflows
      • I saw a note somewhere that suggested this feature would allow implementing deoptimisers / similar (eg. fast modulo / division, etc) that could simplify the view of the decompiled output
  • https://github.com/Vector35/binaryninja-api
  • https://github.com/Vector35/official-plugins
    • Official Binary Ninja Plugins

  • https://github.com/Vector35/community-plugins
    • Binary Ninja Community Plugins

  • https://github.com/Vector35/community-themes
    • Binary Ninja Community Themes

  • https://www.youtube.com/@Vector35

Ghidra

  • https://ghidra-sre.org/
    • A software reverse engineering (SRE) suite of tools developed by NSA's Research Directorate in support of the Cybersecurity mission

Hex-Rays IDA

  • https://hex-rays.com/
    • https://hex-rays.com/ida-free/
      • This (completely!) free version of IDA offers a privileged opportunity to see IDA in action. This light but powerful tool can quickly analyze binary code samples, and users can save and take a closer look at the analysis results.

    • https://hex-rays.com/ida-home/
      • IDA Home draws on the experience Hex-Rays has gained over the years to offer hobbyists a solution that combines speed and reliability with the level of quality and support responsiveness that professional reverse engineers expect.

    • https://hex-rays.com/ida-pro/
      • IDA Pro as a disassembler is capable of creating maps of a program's execution to show the binary instructions that are actually executed by the processor in a symbolic representation (assembly language). Advanced techniques have been implemented into IDA Pro so that it can generate assembly language source code from machine-executable code and make this complex code more human-readable.

        The debugger augments IDA with dynamic analysis. It supports multiple debugging targets and can handle remote applications. Its cross-platform debugging capability enables instant debugging, easy connection to both local and remote processes, and support for 64-bit systems and new connection possibilities.

    • https://www.hex-rays.com/products/ida/debugger/mac/
    • https://hex-rays.com/products/ida/news/8_3/

radare2

Frida / etc

  • https://frida.re/
    • Dynamic instrumentation toolkit for developers, reverse-engineers, and security researchers.

    • Scriptable Inject your own scripts into black box processes. Hook any function, spy on crypto APIs or trace private application code, no source code needed. Edit, hit save, and instantly see the results. All without compilation steps or program restarts.

      Portable Works on Windows, macOS, GNU/Linux, iOS, watchOS, tvOS, Android, FreeBSD, and QNX. Install the Node.js bindings from npm, grab a Python package from PyPI, or use Frida through its Swift bindings, .NET bindings, Qt/Qml bindings, Go bindings, or C API. We also have a scalable footprint.

      Free Frida is and will always be free software (free as in freedom). We want to empower the next generation of developer tools, and help other free software developers achieve interoperability through reverse engineering.

      Battle-tested We are proud that NowSecure is using Frida to do fast, deep analysis of mobile apps at scale. Frida has a comprehensive test-suite and has gone through years of rigorous testing across a broad range of use-cases.

  • https://github.com/frida
  • https://github.com/Ch0pin/medusa
    • medusa Binary instrumentation framework based on FRIDA

    • MEDUSA is an extensible and modularized framework that automates processes and techniques practiced during the dynamic analysis of Android and iOS Applications.

  • https://github.com/rsenet/FriList
    • Collection of useful FRIDA Mobile Scripts

    • Categories: Observer, Security Bypass, Static Analysis, Specific Software, Other

Reversing C++ Binaries

Unsorted

C++ vtables

std::string

  • https://shaharmike.com/cpp/std-string/
    • Exploring std::string

    • Every C++ developer knows that std::string represents a sequence of characters in memory. It manages its own memory, and is very intuitive to use. Today we’ll explore std::string as defined by the C++ Standard, and also by looking at 4 major implementations.

    • One particular optimization found its way to pretty much all implementations: small objects optimization (aka small buffer optimization). Simply put, Small Object Optimization means that the std::string object has a small buffer for small strings, which saves dynamic allocations.

    • Recent GCC versions use a union of buffer (16 bytes) and capacity (8 bytes) to store small strings. Since reserve() is mandatory (more on this later), the internal pointer to the beginning of the string either points to this union or to the dynamically allocated string.

    • clang is by-far the smartest and coolest. While std::string has the size of 24 bytes, it allows strings up to 22 bytes(!!) with no allocation. To achieve this libc++ uses a neat trick: the size of the string is not saved as-is but rather in a special way: if the string is short (< 23 bytes) then it stores size() * 2. This way the least significant bit is always 0. The long form always bitwise-ors the LSB with 1, which in theory might have meant unnecessarily larger allocations, but this implementation always rounds allocations to be of form 16*n - 1 (where n is an integer). By the way, the allocated string is actually of form 16*n, the last character being '\0'
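
      A minimal sketch (mine, not from the article) to observe this directly: it guesses whether a string is using its inline buffer by checking whether data() points inside the std::string object itself. On libc++ the 22-character boundary described above should be visible; other implementations will show their own thresholds.

      // sso_probe.cpp: illustrative only; relies on implementation details of std::string
      #include <cstdint>
      #include <iostream>
      #include <string>

      // With small-string optimization the character buffer lives inside the
      // std::string object, so data() points into the object's own footprint.
      static bool looks_inlined(const std::string& s) {
          auto obj = reinterpret_cast<std::uintptr_t>(&s);
          auto buf = reinterpret_cast<std::uintptr_t>(s.data());
          return buf >= obj && buf < obj + sizeof(std::string);
      }

      int main() {
          std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';
          for (std::size_t n : {1, 15, 22, 23, 64}) {
              std::string s(n, 'x');
              std::cout << n << " chars -> "
                        << (looks_inlined(s) ? "inline buffer (SSO)" : "heap allocation") << '\n';
          }
      }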

  • https://tastycode.dev/memory-layout-of-std-string/
    • Memory Layout of std::string

    • Discover how std::string is represented in the most popular C++ Standard Libraries, such as MSVC STL, GCC libstdc++, and LLVM libc++.

    • In this post of Tasty C++ series we’ll look inside of std::string, so that you can more effectively work with C++ strings and take advantage and avoid pitfalls of the C++ Standard Library you are using.

    • In C++ Standard Library, std::string is one of the three contiguous containers (together with std::array and std::vector). This means that a sequence of characters is stored in a contiguous area of the memory and an individual character can be efficiently accessed by its index at O(1) time. The C++ Standard imposes more requirements on the complexity of string operations, which we will briefly focus on later in this post.

    • If we are talking about the C++ Standard, it’s important to remember that it doesn’t impose exact implementation of std::string, nor does it specify the exact size of std::string. In practice, as we’ll see, the most popular implementations of the C++ Standard Library allocate 24 or 32 bytes for the same std::string object (excluding the data buffer). On top of that, the memory layout of string objects is also different, which is a result of a tradeoff between optimal memory and CPU utilization, as we’ll also see below.

    • For people just starting to work with strings in C++, std::string is usually associated with three data fields:

      • Buffer – the buffer where string characters are stored, allocated on the heap.
      • Size – the current number of characters in the string.
      • Capacity – the max number of characters the buffer can fit, i.e. the size of the buffer.

      In C++ terms, this picture could be expressed as the following class:

      class TastyString {
        char *    m_buffer;     //  string characters
        size_t    m_size;       //  number of characters
        size_t    m_capacity;   //  m_buffer size
      };
      

      This representation takes 24 bytes and is very close to the production code.

  • https://stackoverflow.com/questions/5058676/stdstring-implementation-in-gcc-and-its-memory-overhead-for-short-strings
    • std::string implementation in GCC and its memory overhead for short strings

    • At least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of indirection in that file, but what it comes down to is that nonempty std::strings store a pointer to one of these:

      struct _Rep_base
      {
        size_type       _M_length;
        size_type       _M_capacity;
        _Atomic_word        _M_refcount;
      };
      

      Followed in-memory by the actual string data. So std::string is going to have at least three words of overhead for each string, plus any overhead for having a higher capacity than length (probably not, depending on how you construct your strings -- you can check by asking the capacity() method).

      There's also going to be overhead from your memory allocator for doing lots of small allocations; I don't know what GCC uses for C++, but assuming it's similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.
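
      Related to the capacity() point above, a small sketch (mine, not from the answer) that prints each reallocation, making the slack between size() and capacity() visible for whichever implementation you build it against:

      // capacity_probe.cpp: prints a line every time push_back forces a reallocation
      #include <iostream>
      #include <string>

      int main() {
          std::string s;
          std::cout << "empty: size=" << s.size() << " capacity=" << s.capacity() << '\n';
          std::size_t last = s.capacity();
          for (int i = 0; i < 200; ++i) {
              s.push_back('x');
              if (s.capacity() != last) {              // a reallocation just happened
                  last = s.capacity();
                  std::cout << "grew at size=" << s.size() << " -> capacity=" << last << '\n';
              }
          }
      }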

std::vector

Universal (Fat) Binaries

  • https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary
    • Building a Universal macOS Binary

    • Create macOS apps and other executables that run natively on both Apple silicon and Intel-based Mac computers.

    • https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary#Update-the-Architecture-List-of-Custom-Makefiles
      • To create a universal binary for your project, merge the resulting executable files into a single executable binary using the lipo tool.

      • lipo -create -output universal_app x86_app arm_app

    • https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary#Determine-Whether-Your-Binary-Is-Universal
      • Determine Whether Your Binary Is Universal To users, a universal binary looks no different than a binary built for a single architecture. When you build a universal binary, Xcode compiles your source files twice—once for each architecture. After linking the binaries for each architecture, Xcode then merges the architecture-specific binaries into a single executable file using the lipo tool. If you build the source files yourself, you must call lipo as part of your build scripts to merge your architecture-specific binaries into a single universal binary.

        To see the architectures present in a built executable file, run the lipo or file command-line tools. When running either tool, specify the path to the actual executable file, not to any intermediate directories such as the app bundle. For example, the executable file of a macOS app is in the Contents/MacOS/ directory of its bundle. When running the lipo tool, include the -archs parameter to see the architectures.

      • % lipo -archs /System/Applications/Mail.app/Contents/MacOS/Mail
        x86_64 arm64
      • To obtain more information about each architecture, pass the -detailed_info argument to lipo.

    • https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary#Specify-the-Launch-Behavior-of-Your-App
      • Specify the Launch Behavior of Your App For universal binaries, the system prefers to execute the slice that is native to the current platform. On an Intel-based Mac computer, the system always executes the x86_64 slice of the binary. On Apple silicon, the system prefers to execute the arm64 slice when one is present. Users can force the system to run the app under Rosetta translation by enabling the appropriate option from the app’s Get Info window in the Finder.

        If you never want users to run your app under Rosetta translation, add the LSRequiresNativeExecution key to your app’s Info.plist file. When that key is present and set to YES, the system prevents your app from running under translation. In addition, the system removes the Rosetta translation option from your app’s Get Info window. Don’t include this key until you verify that your app runs correctly on both Apple silicon and Intel-based Mac computers.

        If you want to prioritize one architecture, without preventing users from running your app under translation, add the LSArchitecturePriority key to your app’s Info.plist file. The value of this key is an ordered array of strings, which define the priority order for selecting an architecture.
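
        As an aside to the launch-behavior notes above: a process can also check at runtime whether it is currently running under Rosetta translation via the sysctl.proc_translated sysctl, as documented by Apple. A minimal sketch (the wrapper function name is my own):

        // rosetta_check.cpp: returns 1 if translated, 0 if native, -1 if the sysctl is unavailable
        #include <cstdio>
        #include <sys/sysctl.h>

        static int process_is_translated() {
            int ret = 0;
            size_t size = sizeof(ret);
            if (sysctlbyname("sysctl.proc_translated", &ret, &size, nullptr, 0) == -1)
                return -1;      // sysctl not present (e.g. older macOS)
            return ret;
        }

        int main() {
            std::printf("proc_translated = %d\n", process_is_translated());
        }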

  • https://ss64.com/osx/lipo.html
    • lipo Create or operate on a universal file: convert a universal binary to a single architecture file, or vice versa.

    • lipo produces one output file, and never alters the input file.

    • lipo can: list the architecture types in a universal file; create a single universal file from one or more input files; thin out a single universal file to one specified architecture type; and extract, replace, and/or remove architecture types from the input file to create a single new universal output file.

  • https://github.com/konoui/lipo
    • LIPO This lipo is designed to be compatible with macOS lipo, which is a utility for creating Universal Binaries, also known as Fat Binaries.
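
      As a companion to the lipo/file commands above, here is a minimal C++ sketch (mine, not from the linked pages) of what those tools read from a universal binary's fat header. It only handles the 32-bit fat header; FAT_MAGIC_64 and thin (single-architecture) Mach-O files are left out, and all fields are stored big-endian on disk:

      // fat_slices.cpp: list the slices of a universal (fat) Mach-O binary
      #include <arpa/inet.h>     // ntohl
      #include <cstdint>
      #include <cstdio>
      #include <mach-o/fat.h>    // fat_header, fat_arch, FAT_MAGIC
      #include <mach/machine.h>  // CPU_TYPE_*

      static const char* cpu_name(cpu_type_t t) {
          switch (t) {
              case CPU_TYPE_X86_64: return "x86_64";
              case CPU_TYPE_ARM64:  return "arm64";
              case CPU_TYPE_I386:   return "i386";
              case CPU_TYPE_ARM:    return "arm";
              default:              return "unknown";
          }
      }

      int main(int argc, char** argv) {
          if (argc != 2) { std::fprintf(stderr, "usage: %s <executable>\n", argv[0]); return 1; }
          std::FILE* f = std::fopen(argv[1], "rb");
          if (!f) { std::perror("fopen"); return 1; }

          fat_header fh{};
          if (std::fread(&fh, sizeof fh, 1, f) != 1 || ntohl(fh.magic) != FAT_MAGIC) {
              std::puts("not a 32-bit universal (fat) binary");
              return 0;
          }
          uint32_t n = ntohl(fh.nfat_arch);
          for (uint32_t i = 0; i < n; ++i) {
              fat_arch a{};
              if (std::fread(&a, sizeof a, 1, f) != 1) break;
              std::printf("slice %u: %-7s offset=0x%08x size=0x%08x\n", i,
                          cpu_name(static_cast<cpu_type_t>(ntohl(static_cast<uint32_t>(a.cputype)))),
                          ntohl(a.offset), ntohl(a.size));
          }
          std::fclose(f);
      }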

Reverse Engineering Audio VST Plugins

Compiler Optimisations

Fast Division / Modulus

  • https://binary.ninja/2023/09/15/3.5-expanded-universe.html#moddiv-deoptimization
    • Mod/Div Deoptimization

    • One of the many things compilers do that can make reverse engineering harder is use a variety of algorithmic optimizations, in particular for modulus and division calculations. Instead of implementing them with the native CPU instructions, they will use shifts and multiplications with magic constants that, when operating on a fixed integer size, have the same effect as a native division instruction.

      There are several ways to try to recover the original division, which is far more intuitive and easier to reason about.
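
      As a concrete instance of the pattern being undone, here is a sketch (mine, not Binary Ninja's code) of how compilers commonly lower 32-bit unsigned division by 10: a 64-bit multiply by the magic constant ceil(2^35 / 10) = 0xCCCCCCCD followed by a right shift of 35. Recognizing the (magic, shift) pair and mapping it back to a plain division is what the deoptimization recovers.

      // magic_div10.cpp: the multiply-and-shift form of x / 10 for 32-bit unsigned x
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      static uint32_t div10(uint32_t x) {
          // 0xCCCCCCCD = ceil(2^35 / 10); exact for every 32-bit x
          return static_cast<uint32_t>((static_cast<uint64_t>(x) * 0xCCCCCCCDull) >> 35);
      }

      static uint32_t mod10(uint32_t x) {
          return x - 10 * div10(x);   // remainder reconstructed from the quotient
      }

      int main() {
          for (uint64_t x = 0; x <= 0xFFFFFFFFull; x += 9973) {   // sparse sweep of the 32-bit range
              assert(div10(static_cast<uint32_t>(x)) == static_cast<uint32_t>(x) / 10);
              assert(mod10(static_cast<uint32_t>(x)) == static_cast<uint32_t>(x) % 10);
          }
          std::printf("1234567 / 10 = %u, 1234567 %% 10 = %u\n", div10(1234567), mod10(1234567));
      }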

  • https://lemire.me/blog/2020/02/26/fast-divisionless-computation-of-binomial-coefficients/
    • Fast divisionless computation of binomial coefficients

    • We would prefer to avoid divisions entirely. If we assume that k is small, then we can just use the fact that we can always replace a division by a known value with a shift and a multiplication. All that is needed is that we precompute the shift and the multiplier. If there are few possible values of k, we can precompute it with little effort.

    • I provide a full portable implementation complete with some tests. Though I use C, it should work as-is in many other programming languages. It should only take tens of CPU cycles to run. It is going to be much faster than implementations relying on divisions.

    • Another trick that you can put to good use is that the binomial coefficient is symmetric: you can replace k by n–k and get the same value. Thus if you can handle small values of k, you can also handle values of k that are close to n. That is, the above function will also work for n smaller than 100 and k larger than 90, if you just replace k by n–k.

    • Is that the fastest approach? Not at all. Because n is smaller than 100 and k smaller than 10, we can precompute (memoize) all possible values. You only need an array of 1000 values. It should fit in 8kB without any attempt at compression. And I am sure you can make it fit in 4kB with a little bit of compression effort. Still, there are instances where relying on a precomputed table of several kilobytes and keeping them in cache is inconvenient. In such cases, the divisionless function would be a good choice.

    • Alternatively, if you are happy with approximations, you will find floating-point implementations.

    • https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2020/02/26/binom.c
    • https://github.com/dmikushin/binom/blob/master/include/binom.h
    • https://github.com/bmkessler/fastdiv
    • https://github.com/jmtilli/fastdiv/blob/master/fastdiv.c
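
    A tiny sketch of the memoization approach described above (mine, not Lemire's code): precompute C(n, k) for n < 100 and k <= 10 with Pascal's rule, so lookups afterwards need no division (or multiplication) at all.

      // binom_table.cpp: memoized binomial coefficients for n < 100, k <= 10
      #include <cstdint>
      #include <cstdio>

      static uint64_t binom_table[100][11];

      static void build_table() {
          for (int n = 0; n < 100; ++n) {
              binom_table[n][0] = 1;
              for (int k = 1; k <= 10; ++k) {
                  if (k > n) { binom_table[n][k] = 0; continue; }
                  uint64_t up_right = (k <= n - 1) ? binom_table[n - 1][k] : 0;
                  binom_table[n][k] = binom_table[n - 1][k - 1] + up_right;   // Pascal's rule
              }
          }
      }

      int main() {
          build_table();
          std::printf("C(10, 3)  = %llu\n", static_cast<unsigned long long>(binom_table[10][3]));   // 120
          std::printf("C(99, 10) = %llu\n", static_cast<unsigned long long>(binom_table[99][10]));
      }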

Unsorted

  • https://github.com/mroi/apple-internals
    • Apple Internals This repository provides tools and information to help understand and analyze the internals of Apple’s operating system platforms.

    • https://mroi.github.io/apple-internals/
      • Collected knowledge about the internals of Apple’s platforms.

        Sorted by keyword, abbreviation, or codename.

  • https://opensource.apple.com/source/objc4/
  • https://github.com/smx-smx/ezinject
    • Modular binary injection framework, successor of libhooker

    • ezinject is a lightweight and flexible binary injection framework. It can be thought of as a lightweight, less featured version of Frida.

      Its main goal is to load a user module (.dll, .so, .dylib) inside a target process. These modules can augment ezinject by providing additional features, such as hooks, scripting languages, RPC servers, and so on. They can also be written in multiple languages such as C, C++, Rust, etc... as long as the ABI is respected.

      NOTE: the ezinject core is purposely small, and only implements the "kernel-mode" (debugger) features it needs to run the "user-mode" program, aka the user module.

      It requires no dependencies other than the OS C library (capstone is optionally used only by user modules).

      Porting ezinject is simple: no assembly code is required other than a few inline assembly statements, and an abstraction layer separates the implementations for multiple OSes.

  • https://github.com/evelyneee/ellekit
    • ElleKit yet another tweak injector / tweak hooking library for darwin systems

    • What this is

      • A C function hooker that patches memory pages directly
      • An Objective-C function hooker
      • An arm64 assembler
      • A JIT inline assembly implementation for Swift
      • A Substrate and libhooker API reimplementation
  • http://diaphora.re/
    • Diaphora A Free and Open Source Program Diffing Tool

    • Diaphora (διαφορά, Greek for 'difference') version 3.0 is the most advanced program diffing tool (working as an IDA plugin) available as of today (2023). It was first released during SyScan 2015 and has been actively maintained since then: it has been ported to every single minor version of IDA from 6.8 to 8.3.

      Diaphora supports versions of IDA >= 7.4 because the code only runs in Python 3.X (Python 3.11 was the latest version tested).

    • https://github.com/joxeankoret/diaphora
      • Diaphora, the most advanced Free and Open Source program diffing tool.

      • Diaphora has many of the most common program diffing (bindiffing) features you might expect, like:

        • Diffing assembler.
        • Diffing control flow graphs.
        • Porting symbol names and comments.
        • Adding manual matches.
        • Similarity ratio calculation.
        • Batch automation.
        • Call graph matching calculation.
        • Dozens of heuristics based on graph theory, assembler, bytes, functions' features, etc...

        However, Diaphora also has many features that are unique and not available in any other public tool. The following is a non-exhaustive list of unique features:

        • Ability to port structs, enums, unions and typedefs.
        • Potentially fixed vulnerabilities detection for patch diffing sessions.
        • Support for compilation units (finding and diffing compilation units).
        • Microcode support.
        • Parallel diffing.
        • Pseudo-code based heuristics.
        • Pseudo-code patches generation.
        • Diffing pseudo-codes (with syntax highlighting!).
        • Scripting support (for both the exporting and diffing processes).

See Also

My StackOverflow/etc answers

  • https://stackoverflow.com/questions/46802472/recursively-find-hexadecimal-bytes-in-binary-files/77706906#77706906
    • Recursively searching through binary files for hex strings (with potential wildcards) using radare2's rafind2
    • Crossposted: https://twitter.com/_devalias/status/1738458619958751630
    • SEARCH_DIRECTORY="./path/to/bins"
      GREP_PATTERN='\x5B\x27\x21\x3D\xE9'
      
      # Remove all instances of '\x' from PATTERN for rafind2
      # Eg. Becomes 5B27213DE9
      PATTERN="${GREP_PATTERN//\\x/}"
      
      grep -rl "$GREP_PATTERN" "$SEARCH_DIRECTORY" | while read -r file; do
        echo "$file:"
        rafind2 -x "$PATTERN" "$file"
      done
    • SEARCH_DIRECTORY="./path/to/bins"
      PATTERN='5B27213DE9'
      
      # Using find
      find "$SEARCH_DIRECTORY" -type f -exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && echo "$2:" && echo "$output"' sh "$PATTERN" {} \;
      
      # Using fd
      fd --type f --exec sh -c 'output=$(rafind2 -x "$1" "$2"); [ -n "$output" ] && (echo "$2:"; echo "$output")' sh "$PATTERN" {} "$SEARCH_DIRECTORY"
    • time ./test-grep-and-rafind2
      # ..snip..
      ./test-grep-and-rafind2  7.33s user 0.19s system 99% cpu 7.578 total
      
      ⇒ time ./test-find-and-rafind2
      # ..snip..
      ./test-find-and-rafind2  3.24s user 0.72s system 98% cpu 4.041 total
      
      ⇒ time ./test-fd-and-rafind2
      # ..snip..
      ./test-fd-and-rafind2  3.85s user 1.04s system 488% cpu 1.002 total

My Other Related Deepdive Gists and Projects
