This document illustrates the methodology followed to solve the 11 levels that constitute the 2015 edition of the Flare Challenge.

Since the end of the contest, a few other write-ups were published on the Internet:

The binaries are still provided by the Flare team at

Level 1

The first level of the challenge requires to examine a file named i_am_happy_you_are_to_playing_the_flareon_challenge.exe.

When started, the program asks for a password and will print You are failure, as shown below:

Let's start out easy
Enter the password> hello_flare
You are failure

Using IDA, it is possible to decompile the code from the entry point:

BOOL start()
  HANDLE v0; // ST18_4@1
  int v1; // ecx@1
  HANDLE hFile; // [sp+8h] [bp-8h]@1
  DWORD NumberOfBytesWritten; // [sp+Ch] [bp-4h]@1

  v0 = GetStdHandle(0xFFFFFFF6);
  hFile = GetStdHandle(0xFFFFFFF5);
  WriteFile(hFile, aLetSStartOutEa, 0x2Au, &NumberOfBytesWritten, 0);
  ReadFile(v0, byte_402158, 0x32u, &NumberOfBytesWritten, 0);
  v1 = 0;
  while ( ((unsigned __int8)byte_402158[v1] ^ 0x7D) == byte_402140[v1] )
    if ( ++v1 >= 24 )
      return WriteFile(hFile, aYouAreSuccess, 0x12u, &NumberOfBytesWritten, 0);
  return WriteFile(hFile, aYouAreFailure, 0x12u, &NumberOfBytesWritten, 0);

The input string is stored at address 0x402158 and each byte is compared to a byte array at address 0x402140, after being xor-ed with the value 0x7d.

The ruby script below can be used to read the 0x18 bytes at address 0x402140 (offset 0x540 in the file), to apply a xor operation on each byte with the value 0x7d and then return the expected string.

#!/usr/bin/env ruby

OFFSET = 0x540
COUNT = 0x18

f ="i_am_happy_you_are_to_playing_the_flareon_challenge.exe", "rb"), IO::SEEK_CUR)

p'C*').map {|x| x ^ 0x7d}.pack('C*')


By running this script, we can obtain the validation email for the first level:

C:\temp>ruby solve_01.rb

Level 2

This level requires to analyze the provided program, very_success.exe. At first, it looks quite similar to the previous level: when started, it asks for a password and then prints the same message.

You crushed that last one! Let's up the game.
Enter the password> AAAAA
You are failure

As shown below, the entry point starts by calling the function at address 0x401000:


Interestingly, this function pops the return address from the stack, which is equal to 0x4010E4, and stores it in a local variable (renamed ref_data in IDA) :


The rest of the function prints a bunch of string on the standard output, reads at most 50 bytes from the standard input and calls the function at address 0x401084 :


This function takes three arguments: the return address from function 0x401000, the string read from the standard input and the number of bytes read. This last value is compared to 0x25: if equal, the execution continues at address 0x401098, otherwise the function returns 0.

The function then starts a loop, reading each byte from the input, applying a certain operation on it and then comparing the result with the data referenced by ref_data, starting from the end.


The comparison is :

input_string[i] ^ 0xC7 + ROL(1, bx & 3) + 1 == ref_data[0x25 - 1 - i]

As this is clearly reversible, it is possible to obtain the expected string by applying the reverse operation on the data referenced at address 0x4010E4, using the Ruby script below:

#!/usr/bin/env ruby

OFFSET = 0x2E4 # ref_data

input = "very_success.exe"

f=, "rb")
data ='C*')

bx = 0
res = []
data.reverse.each do |b|

  shift = bx & 0x3
  c = (b - (1 + (1 << shift))) ^ 0xc7

  res << c
  bx += b
p res.pack('C*')

Executing the script returns the validation email address for this level:

C:\temp>ruby solve_02.rb

Level 3

The next file to analyze, elfie.exe, is quite heavy, weighing around 12MB.

When executed, the program will only display the picture below:


It can take a while to figure out what to do: opening the binary in a disassembler and performing a top-down analysis from the entry point is not a very good idea because of the vast quantity of code.

When examining the strings contained in the program, many references to Python libraries can be found:

$ strings elfie.exe

At this point, you can google a few strings to figure out what is going on, but long story short, this program was built using the PyInstaller tool which is used to embed all the dependencies required to run a Python script into a big binary file.

These dependencies can be extracted using :

C:\temp>c:\Python27\python.exe elfie.exe
Successfully extracted Pyinstaller archive : elfie.exe

Now use Easy Python Decompiler v1.1 to decompile the pyc files
Choose Uncompyle2 as the decompiler engine as the other engine
is unstable and can crash although it is very fast.

Among the extracted resources, the elfie file looks interesting: it is a large obfuscated Python script (4MB, over 57K lines) which looks as shown below:

O00OO00OOO0OOOO0OOOO0OO00000OOO0 += 'aTBTMXd3OC8yY0ZqdzBIU0JMT0tEcktGckJUTkpvRGw2d'
O00OO00OOO0OOOO0OOOO0OO00000OOO0 += 'nNocTB'
import base64
exec(base64.b64decode(OOO0OOOOOOOO0000O000O00O0OOOO00O + O0O00OO0OO00OO00OO00O000OOO0O000 + [...] + O00OOOOO000O00O0O00000OOO0000OOO + O0O0OOO000O000OO0O0O0OOOOO0OO000))

Obtaining the decoded Python instructions to will be executed only requires the exec call to be replaced by print and that the standard output be redirected to a new file:

C:\temp>c:\Python27\python.exe elfie >

The resulting Python file is still obfuscated, but a method at line 13 looks interesting:

def O000OOOOOO0OOOO00000OO0O0O000OO0(self):
      O0O0O0000OOO000O00000OOO000OO000 = getattr(self, 'txeTnialPot'[::-1])()
      if (O0O0O0000OOO000O00000OOO000OO000 == ''.join((OO00O00OOOO00OO000O00OO0OOOO0000 for OO00O00OOOO00OO000O00OO0OOOO0000 in reversed('')))):
              self.OO0O0O0O0OO0OO00000OO00O0O0000O0.setWindowTitle('!sseccus taerg'[::-1])
              self.OOOOOOOOOO0O0OOOOO000OO000OO0O00 = True

The validation email is present in the source code of the script, in a reversed form. Once reversed, the correct email address appears:

irb(main):001:0> ''.reverse
=> ""

Level 4

The provided file, youPecks, can be opened in PeID for identification:


According to PeID, the binary seems packed with UPX.

Running the program only prints 2 + 2 = 4, as illustrated below:

2 + 2 = 4

However, after unpacking the binary with upx, the resulting program seems to behave differently:

C:\temp>upx.exe -d -o youPecks-upx.exe youPecks.exe
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2013
UPX 3.91w       Markus Oberhumer, Laszlo Molnar & John Reiser   Sep 30th 2013

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
     25088 <-     12800   51.02%    win32/pe     youPecks-upx.exe

Unpacked 1 file.

2 + 2 = 5

This time, the program prints 2 + 2 = 5 which, obviously, is wrong. Something seems fishy here. When opening the file in IDA, a reference to byte 5 can be found near the string 2 + 2.


At this point, it can be assumed that the upx stub in the original youPecks.exe is patching the data section during the unpacking process, whereas the upx tool is still using the standard decompression algorithm.

A manual unpacking can be attempted at this point. Before doing so, it is a good precaution to disable ASLR, to ensure that the sections are mapped at the same base address during each execution. This can be achieved using the tool setdllcharacteristics provided by Didier Stevens.

C:\temp>setdllcharacteristics.exe -d youPecks.exe
 NX_COMPAT       = 1
 NX_COMPAT       = 1

The program is then debugged through OllyDbg, a breakpoint being set just before the jump to the original entry point (first unconditional jump after the popad instruction).


The unpacked sections are then dumped to a file using the plugin ollydump :


Finally, LordPE is used to rebuild the import table :


The resulting program can be run to print the string 2 + 2 = 4 :

2 + 2 = 4

It is then possible to open the binary in IDA for analysis. First, we can check that the value 4 is present in the data section:


The assembly can then be analyzed, starting from the entry point. Interesting code can be found at location 0x401560: the first argument of the program is converted to an integer and a MD5 digest is computed from this value. Additionally, the current hour is obtained through a call to _localtime64_s and then stored in the edi register.


Further down, an index is computed from the value contained in edi and a string is referenced using said index. The string content is then decoded from base64.


Next, each byte of the data decoded from base64 (md5_ref in the screenshot below) is compared to the MD5 digest calculated from the first argument.


As the digest is computed on a single byte, it is possible to find the expected value for argv[1] by computing every digest from 0 to 255 and then look for the digest corresponding to md5_ref:

irb(main):008:0> (0..255) {|x| [x, Digest::MD5.hexdigest([ x ].pack('C*'))]}.select {|x| x[1] =~ /47ed73/}
 => [[17, "47ed733b8d10be225eceba344d533586"]]

To avoid a premature exit, the first argument of the program must be 17. When debugging the program, it has been observed that this value matches the current hour obtained with the call to _localtime64_s. To summarize, the program must be ran with the current hour as first argument.

As there is only one call to the function responsible for calculating a MD5 digest, we can only assume at this point that the digest for every hour is precomputed and stored in base64. For example, for value 17, we expect to find a string corresponding to :

irb(main):004:0> Base64.encode64(Digest::MD5.digest([17].pack('C')))
=> "R+1zO40QviJezro0TVM1hg==\n

The same exact string cannot be found but a similar string with upper/lowercase swapped is present in the binary:


This can be explained by looking at the function responsible for base64 decoding. The string used as alphabet is not in the regular order (uppercase before lowercase letters), as shown below:


Continuing the analysis, another base64 string is decoded (referenced by the local variable ciphertext in the screenshot below) and a loop is executed to apply a xor operation on each byte of the decoded data.


The result can be observed when debugging the program:


Finally, these operations can be reproduced by the Ruby script below:

#!/usr/bin/env ruby

require 'base64'


f =, "rb") START_OFFSET

data = - START_OFFSET + 1)


b64_strings = data.split("\x00\x00\x00\x00")

s1 = b64_strings[0]
s2 = b64_strings[24]

key = Base64.decode64(s1.swapcase).bytes
cipher = Base64.decode64(s2.swapcase).bytes

puts {|b, i| b ^ key[i % 16]}.pack('C*')

The complete validation email can be obtained by running the script:

$ ./solve_04.rb youPecks-ollydump_.exe

Level 5

The archive for this level provides two files: challenge.cap, a network capture file, and sender which is a win32 portable executable.

The capture file contains twelve HTTP POST requests, each one transmitting 4 bytes corresponding to 4 printable characters (UDYs in the screenshot below).


To understand the meaning of these 4 characters, it is necessary to disassemble the provided binary.

The main function only calls another function, sub_821100 :


IDA does a pretty good job at decompiling this function, as shown below:

signed int sub_821100()
  HANDLE v0; // esi@1
  DWORD bytes_count; // esi@3
  unsigned int v3; // ecx@5
  unsigned int total_size; // edi@7
  char *dest_buffer; // ebx@7
  unsigned int count; // esi@8
  DWORD NumberOfBytesRead; // [sp+4h] [bp-80008h]@1
  unsigned __int8 key_data[524288]; // [sp+8h] [bp-80004h]@3

  NumberOfBytesRead = 0;
  v0 = CreateFileA("key.txt", GENERIC_READ, 0, 0, 3u, 0x80u, 0);
  ReadFile(v0, key_data, 0x80000u, &NumberOfBytesRead, 0);
  bytes_count = NumberOfBytesRead;

  sub_821250((char *)key_data, NumberOfBytesRead);
  if ( bytes_count % 3 )
    v3 = 3 * (bytes_count / 3) - bytes_count + 3;
    v3 = 0;
  total_size = 4 * ((bytes_count + v3) / 3);
  dest_buffer = (char *)sub_821427(1, total_size + 1);
  sub_8212A0(bytes_count, key_data, dest_buffer, total_size);

  count = 0;
  while ( sub_821000(&dest_buffer[count], count) ) {
      count += 4;
      if ( count >= total_size )
	  return 1;
  return 0;

Four functions then need to be analyzed to understand the whole program: sub_821250, sub_821427, sub_8212A0 and sub_821000.

The function sub_821250 can be decompiled to the C code below:

void __fastcall sub_821250(char *a1, unsigned int a2)
  unsigned int v2; // esi@1

  v2 = 0;
  if ( a2 )
      a1[v2] += aFlarebearstare[v2 % 14];
    while ( v2 < a2 );

It takes a string as input and will add the corresponding byte in the string "flarebearstare" to each byte in the input string.

The function sub_821427 is used to allocate the specified amount of memory.

The function sub_8212A0 implements a base64 encoding function. The encoding alphabet is constructed by storing 4 different 128 bits integers on the stack, using SSE instructions. As with the previous level, the alphabet used for encoding is different than the regular base64 alphabet: lowercase letters have been swapped with the uppercase letters.


Finally, the function sub_821000 is decompiled by IDA as below (without the error handling code) :

signed int __usercall sub_821000@<eax>(void *a1@<ecx>, char a2@<sil>)
  void *v2; // eax@1
  void *v3; // edi@1
  signed int result; // eax@2
  void *v5; // eax@3
  void *v6; // ebx@3
  void *v7; // eax@5
  void *v8; // esi@5
  LPVOID lpOptional; // [sp+4h] [bp-4h]@1

  lpOptional = a1;
  v2 = InternetOpenA("Mozilla/5.0 (Windows NT 6.1; WOW64) KEY", 0, 0, 0, 0);
  v3 = v2;
  v5 = InternetConnectA(v2, "localhost", 0x50u, 0, 0, 3u, 0, 1u);
  v6 = v5;
  v7 = HttpOpenRequestA(v5, "POST", "/", 0, 0, 0, 0, 1u);
  v8 = v7;
  HttpSendRequestA(v7, 0, 0, lpOptional, 4u);
  result = 1;

  return result;

In a nutshell, the following steps are executed by the program:

  1. the content of the file key.txt is read

  2. this data read previously is mixed with the string flarebearstare

  3. a new buffer is allocated to store the data encoded in base64, with the case swapped

  4. finally, the base64 encoded string is sent via HTTP POST requests, 4 bytes at a time

As the base64 encoding and the mixing operation with the string flarebearstare are reversible , it is then possible to extract the payloads from the network capture file and obtain the content of the original key.txt file.

The Ruby script below implements these manipulations:

require 'base64'

FLARE_BEAR_STARE = "flarebearstare".unpack('C*')

s = "55:44:59:73" +
    "31:44:37:62" +
    "4e:6d:64:45" +
    "31:6f:33:67" +
    "35:6d:73:31" +
    "56:36:52:72" +
    "59:43:56:76" +
    "4f:44:4a:46" +
    "31:44:70:78" +
    "4b:54:78:41" +
    "4a:39:78:75" +

s = s.gsub(':', '').scan(/../).map {|x| x.to_i(16)}
s = s.pack('C*').swapcase

t = Base64.decode64(s)
t = t.unpack('C*')

v = []
t.size.times do |i|
  v << (t[i] - FLARE_BEAR_STARE[i % 14])

p v.pack('C*')

Running this script returns the validation email for this level:

C:\temp>ruby solve_05.rb

Level 6

Solving the level 6 requires the analysis of an android application, android.apk.

When started through an emulator, the application asks for a string to validate:


The content of the application can simply be extracted using unzip:

$ unzip android.apk
  inflating: AndroidManifest.xml
  inflating: res/anim/abc_fade_in.xml
  inflating: res/anim/abc_fade_out.xml
  inflating: res/anim/abc_grow_fade_in_from_bottom.xml
  extracting: resources.arsc
  inflating: classes.dex
  inflating: lib/armeabi/
  inflating: META-INF/CERT.SF
  inflating: META-INF/CERT.RSA

Two files are particularly interesting: classes.dex and

The first one can be converted to a JAR file using dex2jar :

$ ./ -o ../classes.jar ../classes.dex
dex2jar ../classes.dex -> ../classes.jar

The result can be decompiled with jd-gui and the string entered through the application seems to be checked against the validate method which is implemented in a native library,


This library exports a few symbols, especially the function Java_com_flareon_flare_Va:

$ readelf -s ./lib/armeabi/

Symbol table '.dynsym' contains 65 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_finalize
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_atexit
     3: 00000fd1     0 FUNC    GLOBAL DEFAULT    8 __aeabi_uidiv
     4: 00001051    18 FUNC    GLOBAL DEFAULT    8 __aeabi_uidivmod
     5: 00000e65   364 FUNC    GLOBAL DEFAULT    8 Java_com_flareon_flare_Va
     6: 00000000     0 FUNC    GLOBAL DEFAULT  UND memset
     7: 00000000     0 FUNC    GLOBAL DEFAULT  UND memcpy
     8: 00000000     0 FUNC    GLOBAL DEFAULT  UND strle

This function can be debugged remotely using IDA by pushing the android_server binary on the smartphone and starting it.

$ adb push $HOME/ida/dbgsrv/android_server /data/local/tmp
$ adb shell
# su -
# cd /data/local/tmp
# chmod 755 android_server
# ls -al
-rw-rw-rw- root     root      1078129 2015-07-27 09:40 android.apk
-rwxr-xr-x root     root       523480 2015-04-28 15:17 android_server
# ./android_server
IDA Android 32-bit remote debug server(ST) v1.17. Hex-Rays (c) 2004-2014
Listening on port #23946...

Once done, a TCP port forward must be set up with adb.

$ ./adb forward tcp:23946 tcp:23946

Finally, the debugger can attach itself to the process corresponding to the Flare application:


A breakpoint must then be set up at the beginning of the validate method that will be hit when the user clicks on the Validate button in the application:


The function can be debugged step by step to facilitate its analysis. A few instructions below, the function strlen is called on the string entered through the application and the result is compared to 46. Hence, the application expects a string that is 46 characters long.


The page describes the prototype of a JNI method:

//C++ code
extern "C"
JNIEXPORT void JNICALL Java_ClassName_MethodName
  (JNIEnv *env, jobject obj, jstring javaString)
    //Get the native string from javaString
    const char *nativeString = env->GetStringUTFChars(javaString, 0);

    //Do something with the nativeString

    env->ReleaseStringUTFChars(javaString, nativeString);

Using this information, it is possible to decompile the function under IDA and redefine the type of the arguments. The result is presented below:

int __fastcall Java_com_flareon_flare_ValidateActivity_validate(JNIEnv *env, jobject jobj, jstring str)
  JNIEnv *v3; // r5@1
  const char *v4; // r0@1
  const char *v5; // r6@1
  jstring (*v6)(JNIEnv *, const char *); // r3@4
  JNIEnv *v7; // r0@4
  _UNKNOWN *v8; // r1@4
  int v9; // r4@5
  int i_1; // r7@6
  size_t i; // [sp+0h] [bp-1BB8h]@3
  int v13; // [sp+4h] [bp-1BB4h]@3
  jstring v14; // [sp+8h] [bp-1BB0h]@1
  signed int v15; // [sp+Ch] [bp-1BACh]@3
  unsigned int v16; // [sp+10h] [bp-1BA8h]@7
  _DWORD dest[23]; // [sp+1Ch] [bp-1B9Ch]@1
  _WORD s[3476]; // [sp+78h] [bp-1B40h]@1

  v3 = env;
  v14 = str;
  j_j_memset(s, 0, 6952u);
  j_j_memcpy(dest, &off_4B5DA004, 0x5Cu);
  v4 = (const char *)((int (__fastcall *)(JNIEnv *, jstring, _DWORD))(*v3)->GetStringUTFChars)(v3, v14, 0);
  v5 = v4;
  if ( v4 && j_j_strlen(v4) <= 0x2E )
    v13 = 0;
    v15 = 1;
    for ( i = 0; i < j_j_strlen(v5); i += 2 )
      j_j_memset(s, 0, 6952u);
      v9 = 0;
      if ( v5[i] )
        v9 = v5[i];
        if ( v5[i + 1] )
          v9 = ((v5[i] << 8) | (unsigned int)v5[i + 1]) <= 0x7E7E ? (v5[i] << 8) | v5[i + 1] : 0;
      i_1 = 0;
        v16 = (unsigned __int16)word_4B5D7214[i_1];
        while ( !(v9 % v16 & 0xFFFF) )
          v9 = v9 / v16 & 0xFFFF;
          if ( (unsigned int)v9 <= 1 )
            goto LABEL_10;
      while ( i_1 != 3476 );
      if ( j_j_memcmp((const void *)dest[v13], s, 0xD94u) )
        v15 = 0;
    ((void (__fastcall *)(_DWORD, _DWORD, _DWORD))(*v3)->ReleaseStringUTFChars)(v3, v14, v5);
    v6 = (*v3)->NewStringUTF;
    v7 = v3;
    if ( v13 == 23 && v15 )
      v8 = (_UNKNOWN *)"That's it!";
      v8 = &unk_4B5D8D3C;
    ((void (__fastcall *)(_DWORD, _DWORD, _DWORD))(*v3)->ReleaseStringUTFChars)(v3, v14, v5);
    v6 = (*v3)->NewStringUTF;
    v7 = v3;
    v8 = &unk_4B5D8D3C;
  return ((int (__fastcall *)(JNIEnv *, _UNKNOWN *))v6)(v7, v8);

The function starts by clearing the content of a buffer (variable s). Then the data at address 0x4B5DA004 is copied to another buffer, dest. This data corresponds to an array of 23 valid memory addresses:


Each one of these addresses references another array containing 3476 words, most of them equal to 0:


Next, the function starts a loop. At the beginning of each iteration, a word (variable v9) is set using two characters of the input string, from the current position. Then another loop is started, with 3476 iterations at most. For each iteration, a word (variable v16) is looked up from an array at address 0x4B5D7214. If the word v9 can be divided by v16, then the value at v[i_1] is incremented. This inner loop will stop if v9 is equal or less than 1 or after 3476 iterations.

The beginning of the array at address 0x4B5D7214 is shown below:


A list of prime integers can be recognized.

After exiting the inner loop, the contents of the buffer s and the one at address dest[v13] are compared. If equal, v13 is incremented. Otherwise, v15 is set to 0.

At the end, if v13 is equal to 23 and v15 equal to 1, the string That’s it! is returned.

The function implements a basic integer factoring algorithm: each pair of bytes from the input string is transformed into a word and the result of the factorization is stored in the array s. For each iteration, this result is compared to the expected factorization (dest[v13]).

The Ruby script below uses the reference factorization to reconstruct the expected input string :

#!/usr/bin/env ruby

T1_ADDR = 0x5004

input = ARGV.shift

f =, "rb") (T1_ADDR - 0x1000), IO::SEEK_SET

t1_data = * 4)

t1 = t1_data.unpack('L*') 0x2214, IO::SEEK_SET

word_2214_data =
word_2214 = word_2214_data.unpack('S*')

res = []

t1.each do |v|
  offset = v - 0x1000 offset, IO::SEEK_SET
  data =
  a = data.unpack('S*')
  v = 1
  a.each_with_index do |x, i|
    if x != 0 then
      v *=  (word_2214[i] ** x)
  res << (v >> 8)
  res << (v & 0xff)

p res.pack('C*')


Running the script returns the validation email:

$ ruby solve_06.rb

Level 7

The provided file for this level is a Win32 portable executable, coded in .NET.

Opening the file with ILSpy is not very helpful:


The binary seems obfuscated. A few references to SmartAssembly can be found, which is indeed a .NET obfuscator.

The tool de4dot can be used to get a clean binary:

c:\temp>de4dot.exe -f e:\level_07\YUSoMeta -o e:\level_07\YUSoMeta-cleaned.exe

de4dot v3.1.41592.3405 Copyright (C) 2011-2014
Latest version and source code:
21 deobfuscator modules loaded!

Detected SmartAssembly (e:\level_07\YUSoMeta)
Cleaning e:\level_07\YUSoMeta
Renaming all obfuscated symbols
Saving e:\level_07\YUSoMeta-cleaned.exe

This time, the decompilation with ILSpy works perfectly:


The Main method reads a string from the console that is then compared to another string, generated by the methods smethod_0 and smethod_3. At this point, we want to analyze these two methods in order to generate the expected string.

The code from smethod_0 is shown below:

// ns2.Class3
static string smethod_0(Class1 class1_0, byte[] byte_0)
	byte[] array = Class3.smethod_2();
	string text = "";
	for (int i = 0; i < byte_0.Length; i++)
		text += (char)(byte_0[i] ^ array[i % array.Length]);
	return text;

This method will simply get a byte array from smethod_2 and apply a xor operation on the parameter using this array.

The code of smethod_2 is:

// ns2.Class3
static byte[] smethod_2()
	return Assembly.GetExecutingAssembly().ManifestModule.ResolveMethod(100663297).GetMethodBody().GetILAsByteArray();

Interestingly, the byte array returned is the bytecode of a specific method from the program. We can expect that this array will not be the same in the deobfuscated version of the binary compared to the original one.

The code of method smethod_3 is shown below:

// ns2.Class3
static string smethod_3()
	StringBuilder stringBuilder = new StringBuilder();
	MD5 mD = MD5.Create();
	foreach (CustomAttributeData current in CustomAttributeData.GetCustomAttributes(Assembly.GetExecutingAssembly()))
	byte[] bytes = Encoding.Unicode.GetBytes(stringBuilder.ToString());
	byte[] value = mD.ComputeHash(bytes);
	return BitConverter.ToString(value).Replace("-", "");

As with the previously seen smethod_2, the output of this method will vary between the two versions of the binary, because of the call to GetExecutingAssembly.

A few different methods can then be followed to recover the expected string. I chose to code a small program that will read the assembly from a specified file, the methods smethod_2 and smethod_3 being slightly modified to operate on this assembly. The program source is presented below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Reflection;
using System.Security.Cryptography;

namespace level07_calc
    class Program
        static byte[] smethod_2(Assembly asm)
            return asm.ManifestModule.ResolveMethod(0x6000001).GetMethodBody().GetILAsByteArray();

        static string smethod_0(Assembly asm, byte[] byte_0)
            byte[] array = smethod_2(asm);
            string text = "";
            for (int i = 0; i < byte_0.Length; i++)
                text += (char)(byte_0[i] ^ array[i % array.Length]);
            return text;

        static string smethod_3(Assembly asm)
            StringBuilder stringBuilder = new StringBuilder();
            MD5 mD = MD5.Create();
            foreach (CustomAttributeData current in CustomAttributeData.GetCustomAttributes(asm))
            byte[] bytes = Encoding.Unicode.GetBytes(stringBuilder.ToString());
            byte[] value = mD.ComputeHash(bytes);
            return BitConverter.ToString(value).Replace("-", "");

        static void Main(string[] args)
            String input = args[0];

            var asm = Assembly.LoadFile(args[0]);

            byte[] byte_2 = new byte[]
              { 31, 100, 116, 97, 0, 84, 69, 21,
                115, 97, 109, 29, 79, 68, 21, 104,
                115, 104, 21, 84, 78

            var s1 = Program.smethod_0(asm, byte_2);
            var s2 = Program.smethod_3(asm);
            Console.WriteLine(s1 + "_" + s2);

Running the program on the YUSoMeta file will return the expected string.

c:\temp>level07_calc.exe YUSoMeta.exe

As expected, the result is different on the deobfuscated version:

c:\temp>level07_calc.exe c:\temp\YUSoMeta-cleaned.exe
↔L{a ^o↨[nm↔En↨@|h§^d_6E51105290B0D056B93C06ED630404C6

Finally, entering the expected string when running YUSoMeta returns the validation address:

Warning! This program is 100% tamper-proof!
Please enter the correct password: metaprogrammingisherd_DD9BE1704C690FB422F1509
Thank you for providing the correct password.
Use the following email address to proceed to the next challenge: Justr3adth3sou

Level 8

The program for this level is a Win32 executable named gdssagh. As shown below, the file contains strings that seem to be encoded in base64.

$ strings gdssagh
!This program cannot be run in DOS mode.

Decoding this data will result in a PNG file shown below:


At this point, we want to use the zteg tool on the image to check if any data is encoded using steganographic techniques:

$ zsteg 0.png
imagedata           .. text: " $%NPP>=<"
b1,r,msb,xy         .. file: Applesoft BASIC program data, first line number 64
b1,rgb,msb,xy       .. file: PE32 executable Intel 80386, for MS Windows
b1,bgr,lsb,xy       .. file: GLS_BINARY_LSB_FIRST
b2,rgb,msb,xy       .. text: "UDDADPAE"
b2,bgr,msb,xy       .. text: "|IAEQ@DDD"
b4,r,msb,xy         .. text: "Ab@pT&we-b e"
b4,g,msb,xy         .. text: "%`$Q\"wTf@"
b4,b,msb,xy         .. text: "C$qFqgf#0wpq"
b4,rgb,msb,xy       .. text: "BcrpAPpv#"
b4,bgr,msb,xy       .. text: "@CrbqP@v s"

With the b1,rgb,msb,xy method, zsteg detects an embedded PE32 executable. This method means that the first bit of the red, green and blue components must be extracted and that the bitstream must be reconstructed starting by the most significant bit.

The Ruby script below reconstructs the bitstream according to this schema and saves the result in a file:

#!/usr/bin/env ruby

require 'chunky_png'

class BitStream
  def initialize(basename)
    @basename = basename
    @msb_first = []

  def <<(bit)
    @current_msb |= (bit << @count)
    @count += 1
    if @count == 8
      @msb_first << @current_msb

  def reset
    @count = 0
    @current_msb = 0

  def finalize"#{@basename}-msb.bin", "wb") do |f|
      f.write @msb_first.pack('C*')

image = ChunkyPNG::Image.from_file('0.png')

rgb ="rgb")

(0..image.dimension.height-1).each do |y|
  (0..image.dimension.width-1).each do |x|
    red = ChunkyPNG::Color.r(image[x,y])
    green = ChunkyPNG::Color.g(image[x,y])
    blue = ChunkyPNG::Color.b(image[x,y])

    rb = red & 1
    gb = green & 1
    bb = blue & 1

    rgb << rb << gb << bb


The file extracted is indeed a PE32 executable:

$ ruby extract_pe.rb
$ file rgb-msb.bin
rgb-msb.bin: PE32 executable (console) Intel 80386, for MS Windows

Running the executable returns the validation email:

$ wine rgb-msb.bin

Level 9

The goal of this level is to analyze the Win32 executable you_are_very_good_at_this. When started, the program asks for a password, as shown below:

I have evolved since the first challenge. You have not. Bring it.
Enter the password> helloworld
You are failure

The next step is to open the file in IDA in order to analyze the disassembly. However, the code uses many anti-disassembly tricks, like jump label+1:


A full static analysis does not feel like the right approach here. Even debugging is made difficult because of the presence of numerous junk instructions.

Using OllyDbg, it is possible to record a trace showing the executed instructions and the memory accesses. This trace was useful to analyze the behaviour of the program.

Inside the trace, it can be observed that a specific sequence of instructions repeats itself several times (many junk instructions have been removed):

main  00401A99   MOV ECX,DWORD PTR SS:[EBP-14]      [0018FF74]=1                ECX=00000001
main  00401A9C   MOV AL,BYTE PTR DS:[ECX+EAX]       [00402187]=41 ('A')         EAX=00402141
main  0018FDC0   MOV AH,BYTE PTR SS:[EBX+ESP+0B4]   [0018FE82]=15               EAX=00401541
main  0018FDC4   XOR AL,AH                                                      EAX=00401554
main  0018FDC0   MOV CL,BYTE PTR SS:[EBX+ESP+88]    [0018FE56]=F5               ECX=000000F5
main  00401B14   ROL AL,CL                                                      EAX=0040158A
main  00401B16   MOV EBX,DWORD PTR SS:[EBX+ESP+2C]  [0018FDFE]=F080B8CC         EBX=F080B8CC
main  0018FDC4   CMPXCHG BL,DL                                                  EAX=004015CC
main  0018FDC7   RETN                               [0018FDCC]=you_are_very_good_at_this.00401B38;ESP=0018FDD0

It seems that the instruction at address 00401A9C reads one byte from the password given as input, depending on the value stored in the ecx register (during the recording of this trace, the password was "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"). This sequence can be found forty and one times in the trace file:

$ grep -F 'MOV AL,BYTE PTR DS:[ECX+EAX]' trace.txt
main  00401A9C                    MOV AL,BYTE PTR DS:[ECX+EAX]            [00402186]=41 ('A')         EAX=00402141
main  00401A9C                    MOV AL,BYTE PTR DS:[ECX+EAX]            [00402187]=41 ('A')         EAX=00402141
main  00401A9C                    MOV AL,BYTE PTR DS:[ECX+EAX]            [00402188]=41 ('A')         EAX=00402141
main  00401A9C                    MOV AL,BYTE PTR DS:[ECX+EAX]            [004021AE]=41 ('A')         EAX=00402141
$ grep -F 'MOV AL,BYTE PTR DS:[ECX+EAX]' trace.txt | wc -l

The program loops on each byte of the input string, applies certain arithmetic operations (ROL(AH ^ AL, CL)) and compares the result to BL with the CMPXCHG instruction. For each iteration, it is possible to reverse these operations to obtain the expected value of AL = ROR(BL, CL) ^ AH, being given the values of BL, AH and CL that can be extracted from the recorded trace.

The Ruby script below extracts the values of the registers BL, AH and CL for each iteration and computes the expected value for AL:

#!/usr/bin/env ruby


def ror(b, count)
  count = count % BIT_WIDTH
  (b >> count ) | (b << (BIT_WIDTH - count)) & 0xFF

bl = []
ah = []
cl = []"trace.txt", "r").each_line do |line|
  case line
  when /MOV EBX,DWORD PTR SS:\[EBX\+ESP\+2C\].+EBX=......(..)/
    bl << $1.to_i(16)
  when /MOV AH,BYTE PTR SS:\[EBX\+ESP\+0B4\].+\]=(..)/
    ah << $1.to_i(16)
  when /MOV CL,BYTE PTR SS:\[EBX\+ESP\+88\].+\]=(..)/
    cl << $1.to_i(16)

al = []
bl.each_with_index do |b, i|
  al << (ror(b, cl[i]) ^ ah[i])

p al.pack('C*')

Running this script returns the validation email:

$ ruby solve_09.rb

Level 10

For this level, a new Win32 executable named loader.exe must be analyzed. Running the program under Windows 7 64-bits shows the following message box:


When started under Windows XP, the program silently returns:


According to the strings command, an AutoIt script seems embedded in the binary.

$ strings loader.exe
This is a third-party compiled AutoIt script.

This script can be recovered using the tool Exe2Aut:


The script begins by copying two files (challenge.sys and ioctl.exe) in the system directory (c:\windows\system32), then the function dothis is called several times. This function takes two parameters $data and $key, decrypts $data using $key and executes the decrypted data after converting it to a string.

The decryption process is handled by the function decrypt. This function prepares a call to CallWindowProc with 5 parameters:

  • $codebuffer object from opcodes harcoded in the function decrypt,

  • $buffer object from the $data argument,

  • size of $data

  • $key as string

  • 0

To get information about the decryption algorithm, it is necessary to dump the hardcoded opcodes to a file and disassemble it. The result can be seen in IDA:


Further down, the astute reverser can recognize a piece of code that looks like the initialization function of the RC4 algorithm:


To test this hypothesis, a decryption is attempted on a string passed to the dothis function:

irb(main):001:0> require 'rc4'
 => true
irb(main):002:0> key = "flarebearstare"
 => "flarebearstare"
irb(main):003:0> dec =
 => #<RC4:0x00000002017b90 @q2=0, @q1=0, @key=[102, 108, [...], 112, 34]>
irb(main):004:0> data = "96d587b8139933d17e3598505e729da736bb66aa6cfa5180289fb6845530".scan(/../).map {|x| x.to_i(16)}.pack('C*')
 => "\x96\xD5\x87\xB8\x13\x993\xD1~5\x98P^r\x9D\xA76\xBBf\xAAl\xFAQ\x80(\x9F\xB6\x84U0"
irb(main):005:0> dec.decrypt(data)
 => "_StartService("", "challenge")"

The calls to dothis can then be replaced by the decrypted strings in order to obtain the following result :

$nret = Execute('_CreateService("", "challenge", "challenge", @SystemDir & "\\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START)')
If $nret Then
  If Execute('_StartService("", "challenge")') Then
    Execute('ShellExecute(@SystemDir & "\\ioctl.exe", "22E0DC")')

The AutoIt script creates a service to load the driver challenge.sys, starts the service and executes the program ioctl.exe with 22E0DC as parameter.

The file challenge.sys and ioctl.exe are indeed present on the filesystem but running the latter with the given parameter does not return any result:


The ioctl.exe program is very basic and can be decompiled cleanly using IDA:

int __cdecl main(int argc, const char **argv, const char **envp)
  const char *v3; // ST18_4@1
  unsigned __int32 param; // ebx@1
  HANDLE v5; // esi@1
  int result; // eax@2
  HANDLE dev_handle; // edi@3
  struct _OVERLAPPED Overlapped; // [sp+8h] [bp-24h]@5
  DWORD BytesReturned; // [sp+1Ch] [bp-10h]@1
  int OutBuffer; // [sp+20h] [bp-Ch]@1
  int v11; // [sp+24h] [bp-8h]@1

  OutBuffer = 0;
  v11 = 0;
  v3 = argv[1];
  BytesReturned = 0;
  param = strtoul(v3, 0, 16);
  v5 = CreateEventA(0, 1, 0, 0);
  if ( v5 )
    dev_handle = CreateFileA("\\\\.\\challenge", 0xC0000000, 3u, 0, 3u, 0x40000000u, 0);
    if ( dev_handle == (HANDLE)-1 )
      printf("open device fail!\n");
      result = 1;
      Overlapped.Internal = 0;
      Overlapped.InternalHigh = 0;
      Overlapped.Offset = 0;
      Overlapped.OffsetHigh = 0;
      Overlapped.hEvent = v5;
      if ( DeviceIoControl(dev_handle, param, 0, 0, &OutBuffer, 8u, &BytesReturned, &Overlapped) )
        GetOverlappedResult(dev_handle, &Overlapped, &BytesReturned, 1);
        result = 0;
        printf("device ioctl fail!\n");
        result = 1;
    printf("CreateEvent fail!\n");
    result = 1;
  return result;

At this point, reversing the driver challenge.sys seems required to identify the code path that is executed when receiving the ioctl 0x22E0DC.

The driver entry calls the initialization function sub_29ED85 and then jumps at location sub_29CC90:


This function can be decompiled with IDA:

NTSTATUS __stdcall sub_29CC90(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
  NTSTATUS result; // eax@5
  int v3; // [sp+0h] [bp-30h]@4
  NTSTATUS v4; // [sp+0h] [bp-30h]@6
  UNICODE_STRING DestinationString; // [sp+4h] [bp-2Ch]@4
  PDEVICE_OBJECT DeviceObject; // [sp+Ch] [bp-24h]@1
  UNICODE_STRING SymbolicLinkName; // [sp+10h] [bp-20h]@6
  int i; // [sp+18h] [bp-18h]@1
  int v9; // [sp+1Ch] [bp-14h]@1
  __int16 v10; // [sp+20h] [bp-10h]@1
  __int16 v11; // [sp+22h] [bp-Eh]@1
  char v12; // [sp+24h] [bp-Ch]@1
  char v13; // [sp+25h] [bp-Bh]@1
  char v14; // [sp+26h] [bp-Ah]@1
  char v15; // [sp+27h] [bp-9h]@1
  char v16; // [sp+28h] [bp-8h]@1
  char v17; // [sp+29h] [bp-7h]@1
  char v18; // [sp+2Ah] [bp-6h]@1
  char v19; // [sp+2Bh] [bp-5h]@1

  DeviceObject = 0;
  v9 = -571561217;
  v10 = 4919;
  v11 = -16657;
  v12 = -120;
  v13 = 119;
  v14 = 102;
  v15 = 85;
  v16 = 17;
  v17 = 34;
  v18 = 51;
  v19 = 68;
  for ( i = 0; i < 27; ++i )
    DriverObject->MajorFunction[i] = (PDRIVER_DISPATCH)sub_29C1A0;
  DriverObject->DriverUnload = (PDRIVER_UNLOAD)sub_29B5C0;
  RtlInitUnicodeString(&DestinationString, off_29D684);
  v3 = sub_29D9E4(DriverObject, 0, &DestinationString, 34, 256, 1, L"68", &v9, &DeviceObject);
  if ( v3 >= 0 )
    RtlInitUnicodeString(&SymbolicLinkName, SourceString);
    v4 = IoCreateSymbolicLink(&SymbolicLinkName, &DestinationString);
    if ( v4 >= 0 )
      result = 0;
      DbgPrint("Cannot create symbolic link: %x", v4);
      result = 0;
    DbgPrint("Cannot create device: %x", v3);
    result = 0;
  return result;

The driver dispatch function seems to be sub_29C1A0 (renamed driver_dispatch for the rest of this analysis) and can also be decompiled :

int __stdcall driver_dispatch(int a1, PIRP Irp)
  char v2; // ST0F_1@3
  char v94; // ST0F_1@101
  PIO_STACK_LOCATION v96; // [sp+Ch] [bp-14h]@1

  Irp->IoStatus.Status = 0;
  Irp->IoStatus.Information = 0;
  v96 = (PIO_STACK_LOCATION)sub_29CC30(Irp);
  if ( v96->MajorFunction == 14 )
    switch ( v96->Parameters.Read.ByteOffset.LowPart - (_DWORD)&loc_22E004 )
      case 0u:
        v2 = sub_105E0(65);
      case 4u:
      case 8u:
        v3 = sub_10C40(65);
      case 384u:
        v93 = sub_239B0(65);
      case 388u:
        v94 = sub_23CD0(65);
      case 392u:
  IofCompleteRequest(Irp, 0);
  return 0;

The dispatch of IRP is executed through a switch-case structure, the index being calculated relatively to 0x22E004. For example, for the ioctl 22E0DC, the index would be:

irb(main):001:0> 0x22E0DC - 0x22E004
 => 216

The corresponding function is sub_29B620, as shown below:


This function starts by clearing a 24-bytes buffer on the stack. Then, on the first 21 bytes, each bit position is tested with an and instruction, with the operand 1 to 0x80. If the next instruction is jz, the bit needs to be cleared to avoid a premature exit. If the next instruction is jnz, the bits needs to be set.


Using these indications, it is possible to reconstruct the expected buffer. The Ruby script below implements these operations that can be tedious when done manually.

#!/usr/bin/env ruby

require 'metasm'
include Metasm


exefmt = Metasm.const_get("PE")

exe = exefmt.decode_file(ARGV.shift)

exe.disassemble([ IOCTL_22E0DC_HANDLER ])

dasm = exe.disassembler

init_state = { values: { |h,k| h[k] = 0 } }
dasm.function_walk(IOCTL_22E0DC_HANDLER, init_state) do |args|
  event_type = args.first
  next if event_type != :di

  di = args[2]
  state = args[3]
  ins = di.instruction

  case ins.opname
  when "movzx"
    offset = ins.args[1].imm.reduce
    state[:curr_offset] = offset
  when "and"
    state[:bit_pos] = ins.args[1].reduce
  when "jnz"
    offset = state[:curr_offset]
    state[:values][offset] |= state[:bit_pos]

values = init_state[:values]
puts values.sort_by {|offset, value| offset }.map {|offset, value| value }.pack('C*')

Running the script returns the string.

$ ruby -I metasm solve_10.rb challenge.sys
try this ioctl: 22E068

Unfortunately, the string obtained is not the validation email address but an indication to try another ioctl. As shown previously, the index to the switch-case can be computed as below:

irb(main):001:0> 0x22E068 - 0x22E004
 => 100

The function sub_2c760 seems to be the handler for this ioctl:


This functions starts by setting many variables on the stack:


Then many instructions are executed to transform these variables:


Finally, the function calls another function, sub_10570 with three parameters: address 0x29D890, [ebp+a2], [ebp+a3].


This function can be decompiled as below by IDA:

unsigned int __stdcall sub_10570(int *arg1, unsigned int *arg2, int *arg3)
  unsigned int result; // eax@4
  unsigned int n_iter; // [sp+0h] [bp-Ch]@1
  unsigned int index; // [sp+4h] [bp-8h]@1

  n_iter = *arg2 >> 3;
  for ( index = 0; index < n_iter; ++index )
    sub_10490(&arg1[2 * index], arg3);
  result = arg1[2 * n_iter - 1];
  *arg2 = result;
  return result;

The next function, sub_10490, is decompiled as below:

unsigned int __stdcall sub_10490(int *crypted_data, int *key_data)
  unsigned int result; // eax@4
  unsigned int dw_0; // [sp+0h] [bp-24h]@1
  unsigned int v4; // [sp+14h] [bp-10h]@1
  unsigned int dw_1; // [sp+18h] [bp-Ch]@1
  unsigned int i; // [sp+20h] [bp-4h]@1

  dw_0 = *crypted_data;
  dw_1 = crypted_data[1];
  v4 = 0xC6EF3720;
  for ( i = 0; i < 32; ++i )
    dw_1 -= (key_data[3] + (dw_0 >> 5)) ^ (v4 + dw_0) ^ (key_data[2] + 16 * dw_0);
    dw_0 -= (key_data[1] + (dw_1 >> 5)) ^ (v4 + dw_1) ^ (*key_data + 16 * dw_1);
    v4 += 0x61C88647;
  *crypted_data = dw_0;
  result = dw_1;
  crypted_data[1] = dw_1;
  return result;

This function decrypts two double words, using the provided key. By looking up the constants in Google, many references to the TEA encryption algorithm can be found. However, the decrypting operations do not look similar.

To obtain the decrypted data, the three parameters to the function sub_10570 must be resolved.

According to the cross-references to the local variable, the second argument, which represents the crypted data length, is not modified by the function and is equal to 0x28:


Similary, the stack variables used as decryption key are kept unmodified from the beginning of the function:


From that conclusion, we can obtain the decryption key which is 30313233343536373839414243444546 in hexadecimal or 0123456789ABCDEF in ASCII.

The last difficulty is to identify the data that will decrypted by the pseudo-TEA algorithm. According to IDA, this data is stored in a buffer at address 0x29D890 and, from the value of the second argument, we already know that the size of this buffer is 0x28 (40) bytes.

In the data section, this buffer is initialized to zero, as shown below:


One way of obtaining the data to decrypt is to start a kernel debugger and set up a breakpoint before calling the decryption function. Unfortunately, the buffer still seems initialized to zero:


Going back to IDA, we can see that each byte of the buffer has a data cross-reference to an instruction at the end of a sub-function that modifies the value of the byte:


By following each cross-reference to each byte, it is possible to reconstruct the buffer.

By using the backtracking feature of Metasm, these steps can be reproduced by the script below:

#!/usr/bin/env ruby

require 'metasm'
include Metasm


  DELTA = 0x9e3779b9

  def initialize(key)
    @key = key.unpack('L*')
    puts "[+] key: " << b_to_h(key)

  def decrypt_chunk(dw0, dw1)
    y, z = dw0, dw1
    sum = (DELTA << 5) & BIT_MASK
    ITERATIONS.times do |i|
      z -= (@key[3] + (y >> 5)) ^ (y + sum) ^ (@key[2] + 16 * y)
      z &=  BIT_MASK
      y -= (@key[1] + (z >> 5)) ^ (z + sum) ^ (@key[0] + 16 * z)
      y &= BIT_MASK
      sum -= DELTA
      sum &= BIT_MASK
    return [y, z]

  def decrypt(data)
    puts "[+] data: " << b_to_h(data)
    res = []
    dwords = data.unpack('L*')
    while not dwords.empty?
      dw0, dw1 = dwords.shift, dwords.shift
      y, z = decrypt_chunk(dw0, dw1)
      res << y << z
    # pack and remove padding
    return res.pack('L*').strip

  def b_to_h(s)
    s.unpack('C*').map {|x| "%2.2x" % x}.join(" ")

KEY = "0123456789ABCDEF"
IOCTL_22E068_HANDLER = 0x2c760
BUFFER_ADDR = 0x29D890

exefmt = Metasm.const_get("PE")
file = ARGV.shift || "challenge-xp.sys"
exe = exefmt.decode_file(file)
exe.disassemble([ IOCTL_22E068_HANDLER ])

dasm = exe.disassembler

a = []
(0..40).each do |offset|
  dasm.each_xref(BUFFER_ADDR + offset, :w) do |xref|
    next if xref.len != 1
    di = dasm.decoded[xref.origin]
    ins = di.instruction
    arg = ins.args[1]
    next unless ins.opname == "mov" and arg and arg.kind_of? Ia32::Reg

    exp = Expression[arg.symbolic]
    res = dasm.backtrace(exp, xref.origin).first
    a << res.reduce

crypted = a.pack('C*')
flare_tea =
puts flare_tea.decrypt(crypted)

Running the script returns the validation email:

$ ruby -I~/git/hub/metasm solve_10_1.rb challenge-xp.sys
[+] key: 30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46
[+] data: 56 7f dc fa aa 27 99 c4 6c 7c fc 92 61 61 47 1a 19 b9 63 fd 0c f2 b6 20 c0 2d 5c fd d9 71 54 96 4f 43 f7 ff bb 4c 5d 31

Level 11

For this level, FireEye provides a new Win32 PE to analyze named CryptoGraph.exe.

The program needs at least one argument to run. With just one argument, it loops forever and does not return:


Cryptographic algorithms

As the filename seems to imply, the code may implement some cryptographic algorithms. The KANAL plugin of PeID detects RC5 and MD5 implementations in the binary:


KANAL can produce an IDC script to automatically add comments and mark positions in IDA:

 1 #include <idc.idc>
 3 static main(void)
 4 {
 5 	auto slotidx;
 6 	slotidx = 1;
 8 	MarkPosition(0x00410008, 0, 0, 0, slotidx + 0, "CryptGenRandom [Import]");
 9 	MakeComm(PrevNotTail(0x00410009), "CryptGenRandom [Import]\nMicrosoft CryptoAPI import");
10 	MarkPosition(0x004021E4, 0, 0, 0, slotidx + 1, "MD5");
11 	MakeComm(PrevNotTail(0x004021E5), "MD5\nMD5 transform (\"compress\") constants");
12 	MarkPosition(0x00402A17, 0, 0, 0, slotidx + 2, "RC5 / RC6 [Init, -Delta]");
13 	MakeComm(PrevNotTail(0x00402A18), "RC5 / RC6 [Init, -Delta]\nRC5/6 32bit magic constants, negative Delta");
14 }

The result can be seen in IDA, for the MD5 constant (used in the MD5Transform function) :


And for RC5 (used in the RC5Setup function) :


It can be noticed that the key is initialized in an special way, compared to the standard implementations (as rc5ref.c).

The usual setup looks like the code below :

void RC5_SETUP(unsigned char *K) /* secret input key K[0...b-1]      */
{  WORD i, j, k, u=w/8, A, B, L[c];
   /* Initialize L, then S, then mix key into S */
   for (i=b-1,L[c-1]=0; i!=-1; i--) L[i/u] = (L[i/u]<<8)+K[i];
   for (S[0]=0xb7e15163,i=1; i<t; i++) S[i] = S[i-1]+0x9e3779b9;
   for (A=B=i=j=k=0; k<3*t; k++,i=(i+1)%t,j=(j+1)%c)   /* 3*t > 3*c */
     { A = S[i] = ROTL(S[i]+(A+B),3);
       B = L[j] = ROTL(L[j]+(A+B),(A+B));

In CryptoGraph.exe, the sub-keys are initialized with a sub operation and the negative value of the constant Q:

irb(main):001:0> "%x" % (-0x9e3779b9 & 0xFFFFFFFF)
 => "61c88647"

As mentioned by the Equation Group: questions and answers report from Kapersky, this implementation is unusual and was found in several advanced malwares like Regin and those used by the Equation Group. This may be some kind of reference coming from the author of this challenge or just some kind of coincidence.

By following the cross-references to the MD5Transform (0x4021A0) and RC5Setup (0x402A10) functions and comparing the code to reference implementations of the two algorithms, a few other functions can be identified :

  • MD5Init at 0x401FE0

  • MD5Update at 0x402040

  • MD5Final at 0x4020F0

  • RC5Key at 0x402AC0

  • RC5Decrypt_dwords at 0x402A60, called to decrypt two unsigned 32-bit integers

Additionally, the two structures below were loaded in IDA and mapped to the variables referenced by these functions:

/* MD5 context. */
typedef struct {
  unsigned long state[4];     /* state (ABCD) */
  unsigned long count[2]; /* number of bits, modulo 2^64 (lsb first) */
  unsigned char buffer[64];   /* input buffer */
} MD5_CTX;

/* RC5 context */
typedef struct {
    unsigned long *S;     /* subkeys */
    unsigned long rounds; /* number of rounds */
} RC5_CTX;

Concerning the algorithm implementations in the CryptoGraph.exe binary, some specificities need to be addressed. At the end of the MD5Final function, each double-word of the final state is byte-swapped and then copied to the destination buffer, as illustrated by the screenshot below:


In a standard implementation (see md5.c), the result must be in little endian byte order. The byte-swap operation is only required on big endian architectures, as shown below:

#ifdef __BIG_ENDIAN
# define SWAP(n)							\
    (((n) << 24) | (((n) & 0xff00) << 8) | (((n) >> 8) & 0xff00) | ((n) >> 24))
# define SWAP(n) (n)


/* Put result from CTX in first 16 bytes following RESBUF.  The result
   must be in little endian byte order.

   IMPORTANT: On some systems it is required that RESBUF is correctly
   aligned for a 32 bits value.  */
void *
md5_read_ctx (ctx, resbuf)
     const struct md5_ctx *ctx;
     void *resbuf;
  ((u32 *) resbuf)[0] = SWAP (ctx->A);
  ((u32 *) resbuf)[1] = SWAP (ctx->B);
  ((u32 *) resbuf)[2] = SWAP (ctx->C);
  ((u32 *) resbuf)[3] = SWAP (ctx->D);

  return resbuf;

Consequently, the implementation of the MD5 algorithm in CryptoGraph.exe is not standard.

Another interesting observation is that the function MD5Update is called several times by another function at address 0x402870:


The hashes are computed from data initialized by two constants, xmmword_414690 and xmmword_4146A0:


These two constants are respectively ipad and opad: they are used to compute hash-based message authentication code, as described on Wikipedia.

The function at address 0x402870 can be called to compute a HMAC-MD5 code, given a specify key and a message to authenticate.

Concerning the RC5 algorithm, the function RC5Decrypt_dwords is called by a single function at address 0x402BE0, renamed to RC5Decrypt. This function is used to decrypt a block of data and is shown below (after being decompiled by IDA):

int __stdcall RC5Decrypt(void *crypted, void *plaintext, int size_in_bytes, _DWORD *a4)
  unsigned int rounded_size; // edi@1
  _DWORD *crypted_dwords; // ebx@2
  _DWORD *plaintext_dwords; // esi@2
  unsigned int iter_count; // edi@3
  int v8; // eax@4
  _DWORD *v9; // edx@5
  int result; // eax@5

  rounded_size = size_in_bytes & 0xFFFFFFF8;
  if ( size_in_bytes & 0xFFFFFFF8 )
  { /* last two dwords */
    crypted_dwords = (char *)crypted + rounded_size - 8;
    plaintext_dwords = (char *)plaintext + rounded_size - 8;
    if ( rounded_size > 8 )
      iter_count = ((rounded_size - 9) >> 3) + 1;
        RC5Decrypt_dwords(crypted_dwords, plaintext_dwords);
        *plaintext_dwords ^= *(crypted_dwords - 2);
        v8 = *(crypted_dwords - 1);
        crypted_dwords -= 2;
        plaintext_dwords[1] ^= v8;
        plaintext_dwords -= 2;
      while ( iter_count );
    /* Decrypt the first two dwords */
    RC5Decrypt_dwords(crypted_dwords, plaintext_dwords);
    v9 = plaintext;
    *v9 ^= *a4;
    result = a4[1];
    v9[1] ^= result;
  return result;

The data is decrypted two double-words at a time, starting from the end of the buffer. After each decryption, the result is xor-ed with the previous double-words in the crypted buffer (that will be decrypted at the next iteration).

The first two double-words are decrypted then xor-ed with the values specified by the last argument.

To complement this bottom-up approach and understand how the cryptographic algorithms are used through CryptoGraph.exe, the next step is to follow the code flow, starting from the entry point.

Top-down analysis

After the usual initialization functions (computing the value of security cookie, ___tmainCRTStartup, etc), the main function is finally identified at address 0x401F60. Its code can be decompiled cleanly by IDA and is shown below:

int __cdecl main(int argc, const char **argv, const char **envp)
  HANDLE handle; // esi@1
  char v4; // al@3

  handle = CreateFileW(L"secret.jpg", 0x40000000u, 2u, 0, 2u, 0x80u, 0);
  if ( handle != (HANDLE)-1 )
    if ( argc > 1 )
      v4 = atoi((int)argv[1]);
      sub_401910((int)handle, v4);
      print_string(L"The \"secret.jpg\" has been written to the current execution folder.\n");
      return 0;
    print_string(L"The number of parameters passed in is incorrect.\n");
  return 0;

A handle is opened on the file secret.jpg, the first argument is converted to an integer and the function sub_401910 (renamed decrypt_and_write_secret during the analysis) is called with these two values as parameters. After returning from this function, the program should print The "secret.jpg" has been written to the current execution folder.

The callgraph from the main function is shown below:

main l2

Obvisouly, the next step is to take a look at the function decrypt_and_write_secret.

Function decrypt_and_write_secret

The function starts by reading the contents of the resource number 120 into a buffer referenced by the register esi:


and continues by doing the same for resource 121, this time referencing the data read with edi:


Further down, a crypto provider is initialized and some computations are realized on the data of resource 120: the data is divided into 3 arrays of 16 bytes, each one stored in a xmmword (128-bits). Finally, the three xmmwords are xor-ed and the result is stored in the memory area pointed by esi.

After that, the function sub_4015D0 (renamed decrypt_blocks) is called as shown on the screenshot below:


After returning, the function sub_401CF0 (renamed write_secret) is called. This one is added to the backlog of functions for further analysis.

Function decrypt_blocks

The callgraph of the function decrypt_blocks is shown below:

decrypt blocks l2

First, the function sub_401000 (renamed check_res_121_data) is called with res_121_data as argument.


This function can be decompiled as such:

 1 char __thiscall check_res_121_data(char *res_121_data)
 2 {
 3   char *res_121_data_; // esi@1
 4   _DWORD *ref_md5; // edx@3
 5   unsigned int bytes_to_check; // esi@3
 6   _DWORD *pdigest; // ecx@3
 7   bool finished; // cf@5
 8   MD5_CTX md5_ctx; // [sp+8h] [bp-70h]@1
 9   char digest; // [sp+64h] [bp-14h]@1
11   res_121_data_ = res_121_data;
12   MD5Init(&md5_ctx);
13   MD5Update(res_121_data_ + 48, 0x600);
14   MD5Final(&md5_ctx, &digest);
15   /* compare res_121_data to "FLARE-ON" */
16   if ( *(_DWORD *)res_121_data_ == 0x52414C46 && *((_DWORD *)res_121_data_ + 1) == 0x4E4F2D45 )
17   {
18     ref_md5 = res_121_data_ + 32;
19     bytes_to_check = 12;
20     pdigest = &digest;
21     while ( *pdigest == *ref_md5 )
22     {
23       ++pdigest;
24       ++ref_md5;
25       finished = bytes_to_check < 4;
26       bytes_to_check -= 4;
27       if ( finished )
28         return 1;
29     }
30   }
31   return 0;
32 }

This gives us a few indications on the content of the resource 121. The first 8 bytes must be equal to the string FLARE-ON. Starting from offset 48, 0x600 bytes are hashed through the MD5 algorithm. The resulting digest is compared to a reference digest at offset 32.

Back to the function decrypt_blocks, the beginning of the decompilation is shown below:

 1 int __fastcall decrypt_blocks(void *ctx, int unused, char first_arg, void *res_120_xor, int unused_1, void *res_121_data, int unused_2)
 2 {
 3   [...]
 4   v7 = 0;
 5   result_value = 0;
 6   ctx_ = ctx;
 7   ctx__ = ctx;
 8   if ( !res_121_data )
 9     goto LABEL_29;
10   if ( !check_res_121_data((char *)res_121_data) )
11   {
12     v7 = result_value;
13 LABEL_29:
14     result = v7 + 1;
15     result_value = result;
16     return result;
17   }
18   memcpy_s((char *)ctx_ + 8, 0x630u, res_121_data, 0x630u);
19   *((_DWORD *)ctx_ + 398) = *((_DWORD *)ctx_ + 4);
20   *((_BYTE *)ctx_ + 24) |= first_arg;
21   sub_4014E0(res_120_xor, 16, (char *)ctx_ + 24, v9, *((_DWORD *)ctx_ + 398), (int)rc5_key);
22   rc5_ctx.rounds = *((_DWORD *)res_121_data + 3);
23   KR = 2 * rc5_ctx.rounds + 2;
24   S_mem = (unsigned __int32 *)alloc_mem(4 * KR | -((unsigned __int64)KR >> 30 != 0));
25   rc5_ctx.S = S_mem;
26   if ( S_mem )
27   {
28     RC5Setup(&rc5_ctx);
29     RC5Key(rc5_key, 16u);
30   }
31   dwords[0] = 1;
32   dwords[1] = 1;
33   RC5_decrypt((char *)ctx_ + 56, (char *)ctx_ + 56, 48, dwords);
34   if ( !*((_DWORD *)ctx_ + 14) )
35   {
36     MD5Init(&md5_ctx);
37     MD5Update((char *)ctx_ + 56, 32);
38     MD5Final(&md5_ctx, digest);
39     pref_digest = (char *)ctx_ + 88;
40     bytes_to_check = 12;
41     pdigest = (_DWORD *)digest;
42     while ( *pdigest == *pref_digest )
43     {
44       ++pdigest;
45       ++pref_digest;
46       finished = bytes_to_check < 4;
47       bytes_to_check -= 4;
48       if ( finished )
49         goto LABEL_11;
50     }
51   }
52   ++result_value;
53 LABEL_11:
54   [...]
55   return result;
56 }

The pointer to ctx seems to be used as a structure. At line 18, the entire content of resource 121 is copied to ctx + 8. From the previous analysis of the check_res_121_data function, we obtained a few indications on the meaning of this data. Furthermore, by examining the way the ctx pointer is accessed, its structure can be guessed. For example, a block of 48 bytes is decrypted by the RC5 algorithm at offset 56.

The decompilation can be cleaned a little bit by defining the two structures below and applying this definition to the ctx pointer:

typedef struct {
    _DWORD dw0;
    char data[28];
    char ref_md5sum[16];

typedef struct {
    void *unk;           /* +0 */
    HCRYPTPROV provider; /* +4 */
    char str_1[8];       /* +8, res_121_data, FLARE-ON */
    int v1;              /* +16, 0x8000 */
    int v2;              /* +20, 0xF */
    char str_2[16];      /* +24, random data */
    char ref_md5sum[16]; /* +40, md5 of the 0x600 bytes below */
    BLOCK bl0;           /* +56 */
    char data[0x5d0];    /* +104, random data */
    int v3;              /* +1592 */

The fields v1 and v2 are obtained by debugging the program and examining the memory.

The result is:

 1 int __fastcall decrypt_blocks(void *ctx, int unused, char first_arg, void *res_120_xor, int unused_1, void *res_121_data, int unused_2)
 2   [...]
 3   memcpy_s(ctx_->str_1, 0x630u, res_121_data, 0x630u);
 4   ctx_->v3 = ctx_->v1; /* 0x8000 */
 5   ctx_->str_2[0] |= first_arg; /* atoi(argv[1]) */
 6   /* res_120_xor = 73 22 F7 E5 E9 47 61 76  E6 D2 EB 86 FB EB DF 7E */
 7   sub_4014E0(res_120_xor, 16, ctx_->str_2, v9, ctx_->v3, (int)rc5_key);
 8   rc5_ctx.rounds = *((_DWORD *)res_121_data + 3); /* res_121_data + 3 <=> ctx->v2 == 0xF */
 9   KR = 2 * rc5_ctx.rounds + 2;
10   S_mem = (unsigned __int32 *)alloc_mem(4 * KR | -((unsigned __int64)KR >> 30 != 0));
11   rc5_ctx.S = S_mem;
12   if ( S_mem )
13   {
14     RC5Setup(&rc5_ctx);
15     RC5Key(rc5_key, 16u);
16   }
17   dwords[0] = 1;
18   dwords[1] = 1;
19   RC5_decrypt(&ctx_->bl0, &ctx_->bl0, 48, dwords);
20   if ( !ctx_->bl0.dw0 )
21   {
22     MD5Init(&md5_ctx);
23     MD5Update(&ctx_->bl0, 32);
24     MD5Final(&md5_ctx, digest);
25     pref_digest = (_DWORD *)ctx_->bl0.ref_md5sum;
26     bytes_to_check = 12;
27     pdigest = (_DWORD *)digest;
28     while ( *pdigest == *pref_digest )
29     {
30       ++pdigest;
31       ++pref_digest;
32       finished = bytes_to_check < 4;
33       bytes_to_check -= 4;
34       if ( finished )
35         goto LABEL_11;
36     }
37   }
38   ++result_value;
39 LABEL_11:
40   [...]
41 }

This code can be summarized as follows:

  1. a context is initialized from the content of resource 121

  2. the first byte of atoi(argv[1]) is mixed to the first byte of the ctx→str_2 string

  3. the function sub_4014E0 (renamed calc_rc5_key) is called with the following arguments: res_120_xor = 73 22 F7 E5 E9 47 61 76 E6 D2 EB 86 FB EB DF 7E, 16, ctx→str_2, v9 = 0, ctx→v3 = 0x8000 and a pointer to a buffer on the stack which is later used as an RC5 key

  4. a RC5 context is initialized, with the previously calculated key from the call to sub_4014E0 and the number of rounds specified by ctx→v2 which is equal to 15

  5. the context of ctx→bl0 is decrypted using the modified RC5 algorithm (see Source code of RC5Decrypt function)

  6. the field ctx→bl0.dw0 must be equal to 0 after decryption. If so,

    1. a MD5 digest is computed on the first 32 bytes of ctx→bl0

    2. the resulting digest is checked against a reference digest in the decrypted data (ctx→bl0.ref_md5sum)

    3. if the two digests match, execution continues to LABEL_11. If not, a variable is incremented

At this point, we can make the reasonable assumption that, if the decryption succeeds, it is expected that the double word ctx→bl0.dw0 must be equal to 0. When testing with a random value as first argument, this is clearly not the case:


The function sub_4014E0 seems to be used to derivate a RC5 key from the arguments. IDA manages to decompile this function in a clean manner:

int __usercall sub_4014E0@<eax>(char *key@<ecx>, int len@<edx>, void *message, int unused, int iter_count, void *rc5_key)
  char *key_; // edi@1
  int len_; // esi@1
  int v8; // ecx@1
  MD5_CTX md5_ctx; // [sp+10h] [bp-84h]@1
  char tmp[16]; // [sp+6Ch] [bp-28h]@1
  char hmac_key[16]; // [sp+7Ch] [bp-18h]@1

  key_ = key;
  len_ = len;
  MD5Update(key_, len_);
  MD5Final(&md5_ctx, hmac_key);
  HMAC_MD5_ntimes(tmp, (int)hmac_key, v8, message, v8, iter_count, 1);
  return memcpy_s(rc5_key, 16u, tmp, 16u);

First, the data pointed by the first argument (res_120_xor) is hashed to obtain a MD5 digest, then a message authentication code is computed using the digest as key and the third argument as message. The HMAC-MD5 operation is in fact repeated as many times as specified by the 5th argument (iter_count = 0x8000) using the same key but with the previous HMAC as message.

As seen before, only the first byte of atoi(argv[1]) is mixed with ctx→str_2 which is used as input to sub_4014E0. The other parameters of this function being constant, it must be then possible to test each possible value for argv[1] (0 to 255) and check which one leads to a successful decryption (ctx→bl0.dw0 == 0).

This can be done manually, by debugging the program started with each possibility but for obvious reasons, this is quite tedious. To automate this process, a script can be used to start the program with all the values from 0 to 255. Moreover, for each run, we need to tell if the decryption was successful or not.

From the previous analysis, we know that a different code path is followed if the decryption is unsuccessful: in that case, the variable result_value is incremented.


It may then be possible to patch the code to replace the increment with an invalid instruction. By doing so, only the right value will not crash the program.

By setting a breakpoint before executing this instruction and running the program, we can examine the status of the process. Specifically, the eax register is interesting because it does not reference a valid memory address. It may be possible to crash the program by writing a value at the location pointed by eax. The binary code is patched to obtain the result below:


As the original instruction referenced a memory location in the data section, the operand is modified by the PE loader, when applying relocations. The new instruction will also be modified by the relocation process but this should only affect the operand (value written to the memory), so the instruction will still remain invalid.

The batch script below starts the patched version of CryptoGraph.exe with each possible value:

@echo off
for /l %%x in (0, 1, 255) do (
   echo %%x
   CryptoGraphPatched %%x

Before running the script, it is recommended to disable error reporting and the error pop-up by setting by following the instructions at

The script only stops when the argument is equal to 205:


Debugging the program with the right value shows that the decryption is successful this time:


The digest can also be computed on the first 32 bytes of the decrypted data:

irb(main):001:0> s = "00 00 00 00 00 00 01 00 39 6C 1A DA B5 01 08 02 B0 54 48 92 F0 0F B7 D3 71 C9 80 74 B9 A2 22 29"
 => "00 00 00 00 00 00 01 00 39 6C 1A DA B5 01 08 02 B0 54 48 92 F0 0F B7 D3 71 C9 80 74 B9 A2 22 29"
irb(main):002:0> require 'digest'
 => true
irb(main):003:0> Digest::MD5.hexdigest( s.gsub(/\s+/, '').scan(/../).map {|x| x.to_i(16)}.pack('C*') )
 => "88d48c0cae734f5b57d8e3cc8c711759"

This is the same digest but with a different byte order, as expected (last line in the memory dump of the previous screenshot).

At this point, we know that by starting the program with 205 as argument, the location LABEL_11 will be reached in the expected state (first block being decrypted successfully). The analysis of sub_4015D0 can continue from this location.

After reaching LABEL_11, the program will try to decrypt the next block, that is to say the next 48 bytes pointed by ctx→data, this times with different parameters:

  • the HMAC key is bl0→data + 4

  • the HMAC message is bl0→data + 12

  • the HMAC iteration count is pointed by bl0→data (0x10000 = 65536)

  • the RC5 decryption function is called 3 times on the same block

With these indications, the previously defined structures can be refined as follow:

typedef struct {
    _DWORD block_idx;
    _DWORD hmac_iter_count;
    char hmac_key[8];
    char hmac_mess[16];
    char ref_md5sum[16];

typedef struct {
    void *unk;             /* +0 */
    HCRYPTPROV provider;   /* +4 */
    char header[8];        /* +8, res_121_data, FLARE-ON */
    int hmac_iter_count;   /* +16 */
    int rc5_rounds;        /* +20 */
    char hmac_mess[16];    /* +24 */
    char ref_md5sum[16];   /* +40, md5 of the 0x600 bytes below */
    BLOCK blocks[32];         /* +56 */
    int current_hmac_iter; /* +1592 */

Loading these new definitions in IDA is helpful to get a cleaner decompilation:

int __fastcall decrypt_blocks(void *ctx, int unused, char first_arg, void *res_120_xor, int unused_1, void *res_121_data, int unused_2)
  if ( S_mem )
  result = 1;
  v27 = 2;
  block_idx = 1;
  pblock = &ctx_->blocks[1];
    v17 = pblock[-1].hmac_iter_count + ctx__->current_hmac_iter * ((unsigned int)result >> 4);
    ctx__->current_hmac_iter = v17;
    MD5Update(pblock[-1].hmac_key, 8);
    MD5Final(&md5_ctx, hmac_key);
    /* computes the RC5 key */
    HMAC_MD5_ntimes(hmac_result, (int)hmac_key, v18, pblock[-1].hmac_mess, v18, v17, 1);
    memcpy_s(rc5_key, 16u, hmac_result, 16u);
    block_idx_ = block_idx;
    rc5_ctx.rounds = block_idx + ctx__->rc5_rounds;
    KR = 2 * rc5_ctx.rounds + 2;
    rc5_ctx.S = (unsigned __int32 *)alloc_mem(4 * KR | -((unsigned __int64)KR >> 30 != 0));
    if ( rc5_ctx.S )
      RC5Key(rc5_key, 0x10u);
    dwords[0] = block_idx_ + 1;
    rc5_iter = v27 + 1;
    dwords[1] = block_idx_ + 1;
      RC5_decrypt(pblock, pblock, 48, dwords);
    while ( rc5_iter );
    /* checks decryption result */
    if ( pblock->block_idx == block_idx_ )
      MD5Update(pblock, 32);
      MD5Final(&md5_ctx, &v34);
      v21 = pblock->ref_md5sum;
      v22 = 12;
      v23 = &v34;
      /* compare the two digests */
      while ( *(_DWORD *)v23 == *(_DWORD *)v21 )
        v23 += 4;
        v21 += 4;
        finished = v22 < 4;
        v22 -= 4;
        if ( finished )
          goto LABEL_24;
    /* wrong digest */
    if ( rc5_ctx.S )
    result = dwords[0];
    v24 = __ROL4__(v27, 1);
    block_idx = dwords[0];
    v27 = v24;
  while ( dwords[0] < 32u );
  return result;

Starting from location LABEL_11, the program tries to decrypt the 31 remaining blocks, each block being decrypted in function of the content of the previously decrypted block.

This function can be rewritten in Ruby as below (full code: cryptograph.rb):

  # sub_4015D0
  def decrypt_blocks(count = 31)
    @init_message[0] |= 205

    key = @init_key # res 120 data, xored
    message = @init_message.pack('C*') # res 121 data, at offset 16
    rc5_iter = 1

    (0..count).each do |b_idx|
      bl = @blocks[b_idx]

      rc5_key = calc_rc5_key(FlareMD5.digest(key), message, @iter, 1)
      rc5 =, @rc5_rounds + b_idx)

      crypted = bl.crypted
      rc5_iter.times do
        crypted = decrypt_data(rc5, crypted, b_idx + 1, b_idx + 1)

      key, message = bl.key, bl.message
      @iter = bl.iter_count + @iter * (b_idx >> 4)
      rc5_iter = 1.rotl_32(b_idx+1) + 1

  # sub_4014E0
  def calc_rc5_key(key, message, iter_count, val)
    expanded_message = message + [ val ].pack('N')
    mac =, FlareMD5)

    mac << expanded_message
    prev_hmac = mac.digest

    (iter_count - 1).times do |i|
      mac << prev_hmac
      curr_hmac = mac.digest

      qw0, qw1 = prev_hmac.unpack('Q2'), curr_hmac.unpack('Q2')
      prev_hmac = [ qw0, qw1 ] {|x,y| x ^ y}.pack('Q2')
    return prev_hmac

  # sub_402BE0
  def decrypt_data(rc5, crypted, y0, y1)
    raise "needs padding" if crypted.size % 8 != 0
    ct = crypted.unpack('L*')
    iter = ct.size / 2 - 1
    (0..iter).reverse_each do |i|
      pt0, pt1 = rc5.decrypt_words(ct[2*i], ct[2*i+1])
      x0 = i > 0 ? ct[2*(i-1)]   : y0
      x1 = i > 0 ? ct[2*(i-1)+1] : y1
      ct[2*i]   = pt0 ^ x0
      ct[2*i+1] = pt1 ^ x1

As the first block is successfully decrypted (with argv[1] == 205), every next block is also correctly decrypted but the program never returns.

The problem is that the time required to compute the RC5 key increases greatly for each block, the bottleneck being the number of HMAC-MD5 computed (ctx→current_hmac_iter field). For example, for the 10th block, this value is equal to 0x2000000. Computing the key for the 11th block seems to take forever.

As it seems that we have met a dead-end here, the next step is to analyze the function write_secret that was put aside before.

Function write_secret

This function is called just after having decrypted all the blocks, as shown below:


As decrypting the 32 blocks takes forever, this function is never reached (at least for now). For now, a static analysis can be attempted.

The callgraph of the function is shown below:

write secret l2

The beginning of this function is decompiled as follows, in a simplified version without the error checking code:

char __thiscall sub_401CF0(FLARE_CTX *ctx, HANDLE *fhandle, char *res_120_data, int res_120_size)
  res_122_h = FindResourceW(0, (LPCWSTR)122, L"RT_RCDATA");
  res_122_size = SizeofResource(0, res_122_h);
  res_122_data = (char *)alloc_mem(res_122_size);
  read_res_data(res_122_data, res_122_h);

  idx = sub_401B60(ctx, res_120_data, res_120_size);
  RC5_init(&rc5_ctx, idx + 2 * ctx->rc5_rounds, ctx->blocks[idx].hmac_key, 8u);

The content of resource 122 is read into a memory buffer and an index is computed by the function sub_401B60 from the data of resource 120. This index seems to be used to select a key from the decrypted blocks.

Before analyzing this function, you may recall that the first 16 bytes of resource 120 have been xor-ed to obtain the following result: res_120_xor = 73 22 F7 E5 E9 47 61 76 E6 D2 EB 86 FB EB DF 7E.

The function sub_401B60 (renamed calc_index) starts by calculating some values: the eax register is equal to res_120_xor[2] | 0x31000C01 = 0x86ebd2e6 | 0x31000C01 = 0xb7ebdee7. Using a well-known technique, the number of bits set in eax is computed, the result being 24. Using this result, the value of the field ctx→current_hmac_iter is shifted to the right: the function exits prematurely if the result is equal to 0.


From the previous analysis, we know that the field ctx→current_hmac_iter is updated as follow (with result being equal to the current block index):

  v17 = pblock[-1].hmac_iter_count + ctx__->current_hmac_iter * ((unsigned int)result >> 4);
  ctx__->current_hmac_iter = v17;

The field pblock[-1].hmac_iter_count is obtained by decrypting the data of the previous block so it is not predictable. However, by looking at each block decrypted, we can obtain the following results:

  • block 0 : hmac_iter_count = 0x8000

  • block 1 : hmac_iter_count = 0x10000 = 0x8000 << 1

  • block 2 : hmac_iter_count = 0x20000 = 0x8000 << 2

  • etc

so the previous formula can be simplified as such:

current_hmac_iter = (0x8000 << block_index) + current_hmac_iter * (block_index >> 4).

For the function to continue its execution, we need current_hmac_iter >> 24 != 0. The first block index satisfying this condition is 9:

irb(main):001:0> (0x8000 << 9) >> 24
 => 1

Similarly, a new condition is tested on ctx→current_hmac_iter. This time, when shifted to the right 26 times, the result must not be equal to zero.


The first block index satisfying this condition is 11:

irb(main):004:0> (0x8000 << 11) >> 26
 => 1

From these two conditions, we can deduce that the valid block index is either 9 or 10.

Finally, if the two conditions are met, the function raises an exception with the content of the register esi which is equal to res_120_xor[1] | 16 = 0x766147f9. The catch clause of the exception calls a function to count the number bits set in 0x766147f9, which is 18. This value is divided by 2 and the resulting value 9 is returned.


The first 9 blocks must be decrypted in order for the right index to be returned by calc_index. The file CryptoGraph.exe is patched once again, this time to change the limit of the number of blocks decrypted:


Using OllyDbg, the result of the function calc_index can be confirmed:


The value in register eax is indeed equal to 9.

Now that the right index is known, the function continues its execution with the code below (simplified decompilation without error checking):

   idx = calc_index(ctx_, res_120_data_1, res_120_size);
   /* Initialize a RC5 context using the key corresponding to the index */
   RC5_init(&rc5_ctx, idx + 2 * ctx_->rc5_rounds, ctx_->blocks[idx].hmac_key, 8u);
   v26 = 0;
   dwords[0] = idx;
   dwords[1] = idx;
   RC5_decrypt(&res_122_data[16 * idx], &rc5_key, 16, dwords);

   /* Reads resource 124 content to memory */
   res_124_h = FindResourceW(0, (LPCWSTR)124, L"RT_RCDATA");
   res_124_size = SizeofResource(0, res_124_h);
   res_124_data = (void *)alloc_mem(res_124_size);
   read_res_data(res_124_data, res_124_h);

   /* Initialize a RC5 context from the previously decrypted key */
   rc5_ctx.rounds = 15;
   rc5_ctx.S = alloc_mem(0x80u);
   RC5Key(&rc5_key, 0x16);

   /* Decrypt res_124_data in place */
   dwords[0] = res_124_size;
   dwords[1] = res_124_size;
   RC5_decrypt(res_124_data, res_124_data, res_124_size, dwords);

   /* Write plaintext to file */
   WriteFile(hFile, res_124_data, res_124_size, &numberOfBytesWritten, 0);

16 bytes, extracted from the content of resource 122 at offset 16 * idx, are decrypted to obtain a new RC5 key. This key is then used to decrypt the content of resource 124: the resulting plaintext is then written to the file secret.jpg.

By running the patched version of CryptoGraph.exe, that will only decrypt the 10 first blocks, with the correct argument (205), the file secret.jpg can be correctly decrypted:

C:\temp>CryptoGraphPatched.exe 205
The "secret.jpg" has been written to the current execution folder.

The resulting picture is shown below:


TL;DR version

The binary needs to be patched as follow to stop the main loop after having decrypted 10 blocks (0 to 9):

$ diff -u <(xxd CryptoGraph.exe) <(xxd CryptoGraphPatched.exe)
--- /proc/self/fd/11    2015-09-20 18:12:55.439580562 +0200
+++ /proc/self/fd/12    2015-09-20 18:12:55.439580562 +0200
@@ -202,7 +202,7 @@
 0000c90: 73ef eb06 ff05 3080 4100 8b44 2410 85c0  s.....0.A..D$...
 0000ca0: 7409 50e8 b513 0000 83c4 048b 4c24 1c83  t.P.........L$..
 0000cb0: c330 8b44 2420 d1c1 8944 2428 894c 241c  .0.D$ ...D$(.L$.
-0000cc0: 83f8 200f 8287 feff ff5f 5e5b 8b8c 24d0  .. ......_^[..$.
+0000cc0: 83f8 0a0f 8287 feff ff5f 5e5b 8b8c 24d0  ........._^[..$.
 0000cd0: 0000 0033 cce8 7413 0000 8be5 5dc2 1400  ...3..t.....]...
 0000ce0: a130 8041 008b 8c24 dc00 0000 405f 5e5b  .0.A...$....@_^[
 0000cf0: 33cc a330 8041 00e8 5213 0000 8be5 5dc2  3..0.A..R.....].

When the patched binary is started with 205 as argument, it will write the secret.jpg file in the current directory.


As reference, a full reimplementation in Ruby of CryptoGraph.exe is provided (full code: cryptograph.rb).

Running this script against the binary returns the secret image:

$ ruby cryptograph.rb CryptoGraph.exe
[0] iter: 0x00010000, key: 396c1adab5010802, message: b0544892f00fb7d371c98074b9a22229
[1] iter: 0x00020000, key: b643d24dd4ef10c6, message: 30f52ae8ab8cca75fa596a459951d5dc
[2] iter: 0x00040000, key: 772d13a8324adf43, message: 1156266cb601a46fcd1b23008b85de09
[3] iter: 0x00080000, key: 7725f7e27c707591, message: fd140b83b05d2ea07d34bcfe530be7d3
[4] iter: 0x00100000, key: 55886ef1395bb1d7, message: 9a63c6f727567a4e1257af5caeecceb6
[5] iter: 0x00200000, key: 183af674ee81c54e, message: 65cf49c81d638dc218753fdffcb7e558
[6] iter: 0x00400000, key: b8b7e07fa7498f88, message: 81c86cd216d4b12ae132b71b575acc76
[7] iter: 0x00800000, key: cd9bd81201f22b84, message: 5d07aaf3a4a131fde86b16097ef3af0e
[8] iter: 0x01000000, key: 81eaa8163d61df39, message: 44f6d113555c9aa1949d0b10a74797c9
[9] iter: 0x02000000, key: be7572ff6af5002e, message: 5582baa870d7d71c31bfad4659724eda
[*] File written to secret.jpg