This lesson is targeted at reverse engineering iOS tweaks that were written in Logos and use the MobileSubstrate framework. Logos also has an "internal" generator configuration. We will not be exploring that output today, but you should be able to figure out the differences yourself after completing this lesson.
Most modern tweaks are written in Logos. To understand the code we'll be looking at from reversed tweaks, we need to understand what a normal "hook" looks like in native code.
This is the example Logos code we'll be working with:
%hook NSString
- (NSUInteger)length {
return 2;
}
%end
I put this in the default Tweak.xm file that Theos provides. To look at the generated Objective-C, I ran $THEOS/bin/logos.pl Tweak.xm > TweakOut.mm. If you're not familiar with a Bourne shell, the > operator writes the output of the command on the left into the file named on the right. I formatted my TweakOut.mm and ran the clang preprocessor over it for clarity.
#include <substrate.h>
static NSUInteger (*_logos_orig$_ungrouped$NSString$length)(NSString *, SEL);
static NSUInteger _logos_method$_ungrouped$NSString$length(NSString *, SEL);
static NSUInteger _logos_method$_ungrouped$NSString$length(NSString *__unused self, SEL __unused _cmd) {
return 2;
}
static __attribute__((constructor)) void _logosLocalInit() {
{
Class _logos_class$_ungrouped$NSString = objc_getClass("NSString");
MSHookMessageEx(_logos_class$_ungrouped$NSString, @selector(length), (IMP)&_logos_method$_ungrouped$NSString$length, (IMP *)&_logos_orig$_ungrouped$NSString$length);
}
}
The first line includes substrate.h. If you're familiar with C++, you may notice it's a C++ header, which is why we use .xm (converts to .mm) files. You can use .x/.m files and import substrate, however you won't be able to use things like MSHookIvar. When reverse engineering, I find it helpful to understand how the other engineer was thinking. Using the default options is a big part of that. C++ mangles function names, so we'll see that in Hopper. It doesn't matter whether the tweak uses Objective-C or Objective-C++, so don't be scared if you see slightly odd function names in Hopper.
The next two lines are two C declarations: a function pointer that will hold the original implementation, and a forward declaration of our replacement function. Logos uses the static keyword for all functions and global variables (the only global in this example is that function pointer; we'll see a user-defined global later). The C static keyword gives file-scope symbols internal linkage, which makes them "file private". GeeksForGeeks explains in more detail what that means. Logos constructs function names from four components for organizational reasons, and to avoid naming conflicts.
Logos constructed function naming components (in order):
- original or hook. _logos_orig indicates the original implementation. _logos_method indicates our swizzled code.
- group name. Logos supports "grouping" hooks. Any code that is not in a group is implicitly put in the "_ungrouped" group.
- class name. The name of the class being hooked.
- method name. The name of the method in the class being hooked.
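When I'm staring at a symbol in a debug build, it helps to split it back into those four components. Here's a minimal sketch of that in plain C. The struct and function names (logos_name, split_logos_name) are my own and not part of Logos or Theos. I'm also assuming the components are always joined with $, as they are in the generated code above, so everything after the third $ is treated as the method name.

```c
#include <stddef.h>
#include <string.h>

/* The four parts of a Logos-generated function name (hypothetical helper type). */
struct logos_name {
    char kind[32];    /* "_logos_orig" or "_logos_method" */
    char group[64];   /* "_ungrouped" when no group was used */
    char cls[64];     /* class being hooked */
    char method[128]; /* method being hooked */
};

/* Copy at most cap-1 bytes from [start, end) into dst and terminate it. */
static void copy_part(char *dst, size_t cap, const char *start, const char *end) {
    size_t n = (size_t)(end - start);
    if (n >= cap) n = cap - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split "kind$group$class$method" on the first three '$' characters.
 * Returns 0 on success, -1 if the symbol does not have four parts. */
int split_logos_name(const char *symbol, struct logos_name *out) {
    const char *a = strchr(symbol, '$');
    if (!a) return -1;
    const char *b = strchr(a + 1, '$');
    if (!b) return -1;
    const char *c = strchr(b + 1, '$');
    if (!c) return -1;
    copy_part(out->kind, sizeof out->kind, symbol, a);
    copy_part(out->group, sizeof out->group, a + 1, b);
    copy_part(out->cls, sizeof out->cls, b + 1, c);
    copy_part(out->method, sizeof out->method, c + 1, c + 1 + strlen(c + 1));
    return 0;
}
```

Feeding it _logos_method$_ungrouped$NSString$length fills in _logos_method, _ungrouped, NSString and length, in that order.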
The length method takes no arguments, but we see two arguments on these C functions. These are implicit arguments passed in by the Objective-C runtime. This article says:
"Although these arguments aren’t explicitly declared, source code can still refer to them (just as it can refer to the receiving object’s instance variables). A method refers to the receiving object as self, and to its own selector as _cmd."
The next bit of code should look familiar. It's the code we actually wrote: return 2 for all lengths.
The last function has __attribute__((constructor)). This is a "constructor attribute". It indicates to the compiler that this code should be run as soon as the binary is loaded into memory. I say binary, because this can be used in apps (binaries with a main function) or libraries. Notably, when a library is loaded with dlopen, its constructor functions run during that call.
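If you've never used a constructor attribute, here's a minimal sketch in plain C that you can compile with clang or gcc. The names (constructor_ran, example_init, did_constructor_run) are made up for the example. The only thing it demonstrates is that the function has already run by the time any normal code gets to ask.

```c
/* Set by the constructor below; stays 0 if the constructor never runs. */
static int constructor_ran = 0;

/* Runs when the image is loaded, which is before main() in an executable. */
static __attribute__((constructor)) void example_init(void) {
    constructor_ran = 1;
}

/* Report whether the constructor has already run. */
int did_constructor_run(void) {
    return constructor_ran;
}
```

Calling did_constructor_run() from main returns 1, even though nothing in main ever calls example_init. That's the same trick _logosLocalInit uses to install hooks without being called by anyone.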
{
Class _logos_class$_ungrouped$NSString = objc_getClass("NSString");
MSHookMessageEx(_logos_class$_ungrouped$NSString, @selector(length), (IMP)&_logos_method$_ungrouped$NSString$length, (IMP *)&_logos_orig$_ungrouped$NSString$length);
}
This is odd. Why is there another pair of curly braces? This is the body of %init(_ungrouped), which can be put anywhere. The extra curly braces open a new scope. Logos uses fairly unique names, but to be safe, having another scope avoids almost all naming issues.
There are two lines in the body of this function. The first gets the class we hooked from the runtime. The second calls MSHookMessageEx. It's important that we know what is passed into this function, because it's how we'll usually find out how a tweak works. When compiling a release (not debug) build, function names won't be visible, so we won't be able to look at the function names to know what's being hooked. The first argument is the Class to hook. Second is the selector name. Third is the replacement (hook) function. Last is a pointer to where the original implementation will be stored.
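To make the argument order stick, here's a toy model of what a call like that does, written in plain C so it runs anywhere. This is not Substrate's implementation, and none of these names (toy_class, toy_imp, toy_hook_message) are real API. It's a sketch assuming a "class" with a single method slot. The point is only the shape: save the old function pointer through the last argument, then install the new one.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an Objective-C method implementation:
 * receiver first, selector second, like self and _cmd. */
typedef size_t (*toy_imp)(const char *self, const char *cmd);

/* Toy "class" with a single method slot. */
struct toy_class {
    const char *selector;
    toy_imp imp;
};

/* The "real" implementation: the actual length of the string. */
static size_t real_length(const char *self, const char *cmd) {
    (void)cmd;
    return strlen(self);
}

/* Filled in by toy_hook_message, like _logos_orig$... in the generated code. */
static toy_imp orig_length;

/* Our replacement, like _logos_method$...: always report 2. */
static size_t hooked_length(const char *self, const char *cmd) {
    (void)self;
    (void)cmd;
    return 2;
}

/* Same argument order as the MSHookMessageEx call above:
 * class, selector, replacement, out-pointer for the original. */
void toy_hook_message(struct toy_class *cls, const char *selector,
                      toy_imp replacement, toy_imp *original) {
    if (strcmp(cls->selector, selector) != 0) return;
    if (original) *original = cls->imp; /* save the old implementation */
    cls->imp = replacement;             /* install the new one */
}
```

After hooking a toy_class whose slot held real_length, calling through the class gives 2 for any string, while orig_length still gives the real length. That saved pointer is what %orig calls.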
Hopper is a disassembler for macOS and Linux. There's a free trial. The main limitation is only being able to use it for thirty minutes at a time. This is usually not an issue when working with small tweak binaries. Read through the quick tutorial for Hopper.
Continuing with the example code above, I ran make in the project directory. I'm on macOS, so I ran open .theos/obj/debug/HopMe.dylib -a /Applications/Hopper\ Disassembler\ v4.app to open the dylib with Hopper. My project is called HopMe, so the dylib is called that.
Depending on your Theos setup, you likely compiled for armv7 and arm64. If not, that's fine, you might get a slightly different screen in Hopper. I was asked if I want to look at ARM v7 or AArch64. I'm usually using arm64, so I picked the latter. We'll be looking at the pseudo code in Hopper though, so it shouldn't matter too much. A second option panel will come up with about six check boxes. Leave all the defaults and click continue.
Make sure you've selected "Proc." in the left viewer. There should be four procedures there. MSHookMessageEx and objc_getClass are not ours. If you click on them, you'll see that they're very small. Let's switch to the pseudo code viewer in Hopper. Refer to the Hopper tutorial mentioned above if you're unsure how to do this. Uncheck "Remove potentially dead code".
This is what I see for MSHookMessageEx:
void MSHookMessageEx() {
r16 = *_MSHookMessageEx_ptr;
r0 = _MSHookMessageEx_ptr();
return;
}
The objc_getClass function looks similar. These stubs call the functions in the libraries/frameworks our binary is linked against.
Because this is a debug build, we can see the function names of the other two functions. This will not always be the case. Click on the logos constructor function. It should look like this:
int __ZL15_logosLocalInitv() {
saved_fp = r29;
r29 = &saved_fp;
r0 = objc_getClass("NSString");
r0 = r0;
r1 = @selector(length);
r2 = 0x7f3c;
r3 = 0x8038;
r0 = MSHookMessageEx();
r29 = saved_fp;
r30 = r30;
r31 = (r31 - 0x20) + 0x20;
return r0;
}
I mentioned earlier that C++ changes (mangles) function names. This is an example of that: the __ZL15 prefix and the trailing v were not in our original function name. Nothing to worry about, just something I like to note.
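If you're curious how to read that prefix, here's a minimal sketch in plain C that pulls the source-level name back out. I'm assuming the usual clang layout for static functions: _Z, then L for internal linkage, then the name's length in decimal, then the name itself, with one extra leading underscore added on Apple platforms. The function name extract_static_name is made up. For anything beyond this simple case, a real demangler such as c++filt is the better tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extract the source-level name from a symbol such as
 * "__ZL15_logosLocalInitv". Returns 0 on success, -1 otherwise. */
int extract_static_name(const char *symbol, char *out, size_t cap) {
    const char *p = symbol;
    if (strncmp(p, "__ZL", 4) == 0) p += 4;      /* extra leading '_' on Apple platforms */
    else if (strncmp(p, "_ZL", 3) == 0) p += 3;
    else return -1;

    /* Decimal length prefix, e.g. "15". */
    size_t len = 0;
    if (!isdigit((unsigned char)*p)) return -1;
    while (isdigit((unsigned char)*p)) {
        len = len * 10 + (size_t)(*p - '0');
        p++;
    }

    /* The next len characters are the name; whatever follows encodes the parameters. */
    if (len == 0 || len >= cap || strlen(p) < len) return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

For __ZL15_logosLocalInitv, the 15 is the length of _logosLocalInit, and the leftover v says the function takes no parameters.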
These "variable names" are usually registers. If you want to understand this well, I strongly recommend Azeria Labs' lessons.
The first two lines deal with the frame pointer, so don't worry about them. I mentioned earlier the order of arguments for MSHookMessageEx. ARM passes arguments in consecutive registers. This is good for us, because it means we just look at r0 through r3 in order, and we know all the arguments.
- r0 is holding the NSString class
- r1 is holding the selector name "length"
- r2 is holding the pointer to our hooked code
- r3 is holding the pointer to where the original implementation will be put
We don't care about the instructions after that. So now we know that this code will hook length on NSString. I'm going to double click on 0x7f3c (this may be a different address for you), which should be pointing to our hooked code. Unfortunately we can't look at it from the pseudo code viewer. While that address is still highlighted, switch back to the assembly viewer. This is the line that was highlighted for me:
0000000000007f04 add x8, x8, #0xf3c; 0x7f3c@PAGEOFF, __ZL40_logos_method$_ungrouped$NSString$lengthP8NSStringP13objc_selector
Looks good! We have the function that's hooking length on NSString. Double click on the function address here again. You should be taken to the function body. Switch to the pseudo code again. It should look similar to:
int __ZL40_logos_method$_ungrouped$NSString$lengthP8NSStringP13objc_selector(void * arg0, void * arg1) {
var_8 = arg0;
var_10 = arg1;
r0 = 0x2;
r31 = (r31 - 0x10) + 0x10;
return 0x2;
}
Since this is a debug build, the mangled C++ name gives us a lot of information. I'm not going to get into it, because we typically won't be looking at debug builds. The first two lines store the arguments into unused variables. The third line sets the return register (r0) to 0x2, which is the hexadecimal representation of 2. The fourth line doesn't pertain to us, and the last line returns 2, as we expect.
Next, we'll look at a tweak with multiple hooks, a preferences setup, and compiled without debug symbols. Here's the code I'll be using:
// user preference, default to off
static BOOL wantsBadLocale = NO;
static void updatePreferences() {
// obviously fake preference update
wantsBadLocale = YES;
}
%hook NSString
// always return two longer than how long the string actually is
- (NSUInteger)length {
NSUInteger ret = %orig;
return ret+2;
}
// return "Very bad", if the locale passed in is NULL and user wants, otherwise default implementation
- (NSString *)lowercaseStringWithLocale:(NSLocale *)locale {
if (!locale && wantsBadLocale) {
return @"Very bad";
}
return %orig;
}
%end
%hook NSDateFormatter
// return "My birthday!" if the date is the day I was born, otherwise default
- (NSString *)stringFromDate:(NSDate *)date {
NSCalendar *calendar = [NSCalendar currentCalendar];
NSDateComponents *dateComs = [calendar components:(NSCalendarUnitMonth | NSCalendarUnitDay) fromDate:date];
if ((dateComs.month == 4) && (dateComs.day == 3)) {
return @"My birthday!";
}
return %orig;
}
%end
%ctor {
updatePreferences();
%init;
}
I built a dylib using make DEBUG=0, and opened it in Hopper using the same command as before, but without the "debug" directory: open .theos/obj/HopMe.dylib -a /Applications/Hopper\ Disassembler\ v4.app
I picked AArch64 (arm64) and the defaults on the second page, again. In our procedures, there are three "stub" functions: MSHookMessageEx, objc_getClass, and objc_msgSend. That means these are the only external functions we used. We can set these aside for now.
Moving into the EntryPoint procedure, be sure to uncheck "Remove potentially dead code". With the box checked, the code is almost unusable to us:
int EntryPoint() {
*(int8_t *)0x80d0 = 0x1;
objc_getClass("NSString");
MSHookMessageEx();
MSHookMessageEx();
objc_getClass("NSDateFormatter");
r0 = MSHookMessageEx();
return r0;
}
With the box unchecked, we see the selectors and addresses:
int EntryPoint() {
stack[-24] = r19;
saved_fp = r29;
r29 = &saved_fp;
*(int8_t *)0x80d0 = 0x1;
r19 = objc_getClass("NSString");
r1 = @selector(length);
r2 = 0x7db8;
r3 = 0x80b8;
r0 = MSHookMessageEx();
r1 = @selector(lowercaseStringWithLocale:);
r2 = 0x7dd8;
r3 = 0x80c0;
r0 = r19;
r0 = MSHookMessageEx();
r0 = objc_getClass("NSDateFormatter");
r1 = @selector(stringFromDate:);
r2 = 0x7e00;
r3 = 0x80c8;
r29 = saved_fp;
r30 = r30;
r20 = r20;
r19 = stack[-24];
r31 = r31 + 0x0;
r0 = MSHookMessageEx();
return r0;
}
From this, you should be able to reconstruct the hooks:
%hook NSString
- (?)length {
}
- (?)lowercaseStringWithLocale:(?) {
}
%end
%hook NSDateFormatter
- (?)stringFromDate:(?) {
}
%end
What I normally do is open up the headers, and check the types:
%hook NSString
// almost always, I replace `unsigned long long` with `NSUInteger`
- (unsigned long long)length {
}
- (id)lowercaseStringWithLocale:(id)arg {
}
%end
%hook NSDateFormatter
- (id)stringFromDate:(id)arg {
}
%end
Now, this is awkward. I realized that our updatePreferences function got inlined at the top:
*(int8_t *)0x80d0 = 0x1;
This is important to note: the code you write might get inlined or optimized away by the compiler, so a function in the source may not exist as a separate procedure in the binary.
Since we're reverse engineering this, it's not an issue. 0x80d0 is a memory address, and we can safely assume it holds a numeric global variable. It's set to 1 now.
Jumping back into the assembly, let's find the function that's swizzling the length method on NSString.
0000000000007d50 ldr x1, =aLength; "length",@selector(length)
0000000000007d54 adr x2, #0x7db8
In register 1, we see the length selector, and in register 2 is the function pointer for the swizzled code. Let's double click on it (#0x7db8). Your cursor should move down a bit. Switch back to pseudo code. It looks like this for me:
int sub_7db8(int arg0) {
saved_fp = r29;
r29 = &saved_fp;
r8 = *qword_80b8;
r0 = qword_80b8(arg0);
r0 = r0 + 0x2;
r29 = saved_fp;
r30 = r30;
r31 = r31 + 0x0;
return r0;
}
Hopper named a pointer "qword_80b8" after its address. The function it points to is called with the first argument, then 2 is added to the result, and that value is returned. 0x80b8 is also the address that was passed as the last argument to MSHookMessageEx for length, so this pointer holds the original implementation. This hook looks to me like:
%hook NSString
- (NSUInteger)length {
return %orig + 2;
}
%end
Using the same technique, let's look at the second hook. Here's the function I got for lowercaseStringWithLocale:
void sub_7dd8() {
if ((r2 == 0x0) && ((*(int8_t *)byte_80d0 & 0x1) != 0x0)) {
r0 = @"Very bad";
}
else {
r3 = *qword_80c0;
r0 = qword_80c0();
}
return;
}
Let's start with the bodies and come back to the if condition. The if body sets the return value to @"Very bad", and the else body sets it to whatever the original implementation returns. The condition of the if is a little confusing, and that's something you'll see a lot with Hopper pseudo code. There are two boolean components in this condition: (r2 == 0x0) and ((*(int8_t *)byte_80d0 & 0x1) != 0x0). The first part is straightforward: is r2 equal to NULL (you should know that 0x0 is the hexadecimal representation of 0, which is NULL)? What's r2 though? r0, r1 and r2 hold the first three C-style arguments, so r2 is the third. Hopper didn't populate the arguments, unfortunately. We know from the selector that there's one Objective-C argument. Since self and _cmd are our first two arguments, r2 is that Objective-C argument, and this first condition is checking if it is NULL. The second condition is basically testing that byte_80d0 is non-zero. It's fairly straightforward C, so I'm not going to get into it. What is byte_80d0 though? If you look at the assembly and double click on the corresponding address, you should be taken to this line:
00000000000080d0 db 0x00; DATA XREF=EntryPoint+20, sub_7dd8+8
This is the 0x80d0 that we set to 1 earlier. We can see that in the cross-reference (XREF) list. You can double click on either of those references to jump to the code. This assembly tells us that the default value is 0. I can deduce this code:
/* I usually call bools "test" before I know exactly
* what they're used for, because they're just being
* used to "test" an unknown situation */
BOOL tester = NO; // note that I don't really know that this was "static"
%hook NSString
- (id)lowercaseStringWithLocale:(id)arg {
if (arg == NULL && tester) {
return @"Very bad";
} else {
return %orig;
}
}
%end
%ctor {
tester = YES;
%init;
}
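If the jump from Hopper's condition to arg == NULL && tester feels too quick, here's a minimal sketch in plain C that puts the two forms side by side. The names are made up, tester stands in for byte_80d0, and arg stands in for r2. I'm assuming the byte only ever holds 0 or 1, which is what a BOOL set to NO or YES gives you. For other values the & 0x1 test and a plain truth test could disagree.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the byte Hopper labelled byte_80d0; assumed to hold 0 or 1. */
static int8_t tester = 0;

/* The condition as Hopper printed it; arg plays the role of r2. */
int condition_as_decompiled(const void *arg) {
    return (arg == 0x0) && ((*(int8_t *)&tester & 0x1) != 0x0);
}

/* The same condition as it would normally be written by hand. */
int condition_as_written(const void *arg) {
    return arg == NULL && tester;
}
```

With tester at 0 both return 0 for any argument. With tester at 1 both return 1 only when the argument is NULL.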
Finally, the last hook was for stringFromDate: on NSDateFormatter. Hopper generated this pseudo code for me:
int sub_7e00(int arg0, int arg1, int arg2) {
var_20 = r22;
stack[-40] = r21;
r31 = r31 + 0xffffffffffffffd0;
var_10 = r20;
stack[-24] = r19;
saved_fp = r29;
stack[-8] = r30;
r29 = &saved_fp;
r19 = arg2;
r20 = arg1;
r21 = arg0;
r0 = [NSCalendar currentCalendar];
r0 = [r0 components:0x18 fromDate:r19];
r22 = r0;
if (([r0 month] == 0x4) && ([r22 day] == 0x3)) {
r0 = @"My birthday!";
r29 = saved_fp;
r30 = stack[-8];
r20 = var_10;
r19 = stack[-24];
r22 = var_20;
r21 = stack[-40];
r31 = r31 + 0x30;
}
else {
r0 = r21;
r1 = r20;
r2 = r19;
r29 = saved_fp;
r30 = stack[-8];
r20 = var_10;
r19 = stack[-24];
r22 = var_20;
r21 = stack[-40];
r31 = r31 + 0x30;
r0 = qword_80c8(r0, r1, r2, *qword_80c8);
}
return r0;
}
The first thing I noticed was that qword_80c8 is called (last line in the else body) with four arguments, when it should only be three (one Obj-C argument plus the two implicit ones). My guess is that qword_80c8 happened to be loaded into register 3, and Hopper thought it was the fourth argument. These things happen. Hopper is not going to be fully accurate with some things, and it's not anyone's fault. There are a lot of lines that say stack[-#] in this output. I typically ignore those lines. They save and restore registers on the (memory) stack and usually don't need to be looked at. I feel like this is all straightforward Obj-C, and I'm confident I can write the code:
%hook NSDateFormatter
- (id)stringFromDate:(id)arg {
NSCalendar *currentCalendar = [NSCalendar currentCalendar];
// I'm going to see if -[NSCalendar components:fromDate:] is public API
// nice, it is: https://developer.apple.com/documentation/foundation/nscalendar/1414841-components?language=objc
// 0x18 is hex for 24, not 18, be careful. I'm going to leave it in hex here, because the C compiler is perfectly happy with that
// arg2 was loaded into r19, which is what was passed in as the date in the pseudo code above
NSDateComponents *components = [currentCalendar components:(0x18) fromDate:arg];
if ([components month] == 4 && [components day] == 3) {
return @"My birthday!";
} else {
return %orig;
}
}
%end
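I left 0x18 as a raw number above, but you can turn it back into named flags. Here's a minimal sketch in plain C that does that. The bit values are copied from Foundation's NSCalendarUnit as I know them (month is 1 << 3 and day is 1 << 4, which add up to 0x18), so the sketch compiles without the framework. The table and the function name describe_units are my own. Check the values against the NSCalendar header if you need a unit that isn't listed here.

```c
#include <stddef.h>
#include <string.h>

/* One NSCalendarUnit bit and its name (values copied from Foundation). */
struct unit_name {
    unsigned long bit;
    const char *name;
};

static const struct unit_name unit_names[] = {
    { 1UL << 1, "NSCalendarUnitEra" },
    { 1UL << 2, "NSCalendarUnitYear" },
    { 1UL << 3, "NSCalendarUnitMonth" },
    { 1UL << 4, "NSCalendarUnitDay" },
    { 1UL << 5, "NSCalendarUnitHour" },
    { 1UL << 6, "NSCalendarUnitMinute" },
    { 1UL << 7, "NSCalendarUnitSecond" },
};

/* Write the names of the set bits into out, joined by " | ".
 * cap must be at least 1. Returns the number of flags recognised. */
int describe_units(unsigned long mask, char *out, size_t cap) {
    int count = 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof unit_names / sizeof unit_names[0]; i++) {
        if (!(mask & unit_names[i].bit)) continue;
        if (count > 0) strncat(out, " | ", cap - strlen(out) - 1);
        strncat(out, unit_names[i].name, cap - strlen(out) - 1);
        count++;
    }
    return count;
}
```

Passing 0x18 gives NSCalendarUnitMonth | NSCalendarUnitDay, so the components: argument can be written with the named constants instead of the raw hex.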
All done!