04 October 2012

A pure assembly equivalent of libc's `getenv` (GNU assembler format)

As the comments explain, this assembly function can be used as a substitute for getenv if libc support is not present, since getenv is not a syscall on Linux. To use it, create a buffer in memory containing the name of the property requested followed by `=` (for example, `HOME=` can be used to get the current user's home directory path). This string does not need to be null terminated, and the length should not include the null terminator even if one is present (the comparison used is more like `strncmp` than `strcmp`).

# getenvp is a pure assembly version of the libc function
# `getenv`. Returns 0 if the variable could not be found, 
# otherwise returns the address of the value in memory. 
# Preserves protected registers.

# Function parameters are passed via registers as follows:
#   eax = buffer with name of variable to find
#   ecx = length of buffer
#   edx = address of envp list in memory

getenvp:
    push ebx
    push esi
    push edi
    getenvp_loop:
        # Check for the last entry in the list.
        mov edi, DWORD PTR [edx]
        cmp edi, 0x00
        je end_getenvp_loop
        # This is a valid string, so we'll start the comparison.
        mov esi, 0
        getenvp_comp_loop:
            # Are we done with the comparison?
            cmp esi, ecx
            je getenvp_comp_string_match
            mov bl, BYTE PTR [eax + esi]
            cmp BYTE PTR [edi + esi], bl
            je getenvp_comp_char_match
            # This character isn't a match, so go to the next
            # string.
            add edx, 4
            jmp getenvp_loop
            getenvp_comp_char_match:
                # Character matches, so increment and loop.
                inc esi
                jmp getenvp_comp_loop
            getenvp_comp_string_match:
            # A match was found! Return the address of the
            # first character of this variable's value.
            mov eax, edi
            add eax, esi
            jmp getenvp_exit

    end_getenvp_loop:
        mov eax, 0
    getenvp_exit:
    pop edi
    pop esi
    pop ebx
    ret
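
For comparison, here is a rough C sketch of the same lookup. It is not part of the assembly routine; the name getenvp_c and its signature are made up for illustration, and it assumes envp is the usual NULL-terminated array of "NAME=value" strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of getenvp: `name` is the variable name
 * including the trailing '=', `len` is its length without any null
 * terminator, and `envp` is the NULL-terminated environment list. */
const char *getenvp_c(const char *name, size_t len, char *const *envp)
{
    assert(name != NULL && envp != NULL);
    for (; *envp != NULL; envp++) {
        /* Prefix comparison, like the byte-by-byte loop in the assembly. */
        if (strncmp(*envp, name, len) == 0)
            return *envp + len;   /* address of the value after "NAME=" */
    }
    return NULL;                  /* end of list reached without a match */
}
```

Calling getenvp_c("HOME=", 5, envp) mirrors loading eax, ecx, and edx before calling the assembly version.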

27 August 2012

Project 002 Update (27/8/2012)

Okay, I'm the most delinquent blogger ever, and I'm really sorry for that. I've been making so much progress lately that I was too excited/busy to write it all down.

At the request of a friend, I've been studying the Linux ELF loader and its inner workings. I've written my own adaptation for PE executables, based on the ELF, a.out, and flat loaders (located in fs/binfmt_elf.c, fs/binfmt_aout.c, and fs/binfmt_flat.c, respectively). It was pretty challenging at first: I've never written a Linux kernel module before. However, it can load some basic executables now and run them (basic register math, memory access, private function calls, etc.) with a simple ./program.exe at the terminal. The next logical step is enabling API calls.

I want to, at some point, write Linux versions of the basic Windows libraries (kernel32, user32, and so on), much like Wine does. First, though, I need to be able to load a library at all, so I'm going to start small. ELF files, as it turns out, aren't directly loaded by the kernel in most cases. Instead, the ELF interpreter is loaded, which in turn loads the executable's sections and any library dependencies (loading in user mode instead of kernel mode). At the moment, my code loads the executable itself, but that will need to change, I think.

So my next task will be writing a PE interpreter, in pure assembly, relocatable, and not linked to anything. All I'll be able to do, so far as interacting with the system goes, is syscalls (which will allow me to use mmap to map all the executable sections to memory). As far as I can tell (do an strace if you don't believe me), that's exactly what Linux does natively. I'll try to update more frequently as I work on this part.

09 August 2012

Project 002 Update (9/8/2012)

After some research and StackOverflow questions (here and here), I've come to learn of binary formats, and the methods by which Linux allows specifying custom loaders for new formats. I'm going through articles now; it's not much, but the best one I've found is here. I'm also going through the kernel modules associated with process loading and such. It's dense, but kind of cool, since this is exactly what I needed for this project.

24 July 2012

Project 002 Update (24/7/2012)

I've been delinquent in my posting again, and (surprise) everything about the disassembler has changed. The file format is quite a bit more complicated, but I think it's as simple as it can be while still encoding the complexity of the instruction set. There are, at present, four input files: one-byte opcodes, two-byte opcodes, three-byte opcodes, and extended opcodes. Sampling a bit of the one-byte opcodes file:

5 add rAX Iz
6 push aES
7 pop aES
8 or Eb Gb
9 or Ev Gv
10 or Gb Eb
11 or Gv Ev

The first field is the opcode value in decimal (I may change to hexadecimal later; this is left over from a previous attempt). The second field is the mnemonic. The rest of the line contains operands, in the same format as the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2. The first letter (capital) gives the addressing method and the following (lowercase) letters give the operand type. When an r or e precedes a register code, it is replaced by an R or E, or removed completely, when the instruction is interpreted, based on the current operand size. When an a precedes a register code, the code that follows is used as-is with no modification. Flags, if any, are preceded by $ and may appear anywhere on the line. Comments are preceded by # and are ignored to the end of the line.
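
As a rough illustration of how a line in this format could be read, here is a minimal C sketch. It is not the disassembler's actual parser; the struct opcode_line, the function parse_opcode_line, and the size limits are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 4   /* assumed limit on operands and on flags */
#define TOKEN_LEN  8   /* assumed maximum token length, with terminator */

/* Hypothetical record for one line of the opcode file. */
struct opcode_line {
    int  opcode;                          /* decimal opcode value */
    char mnemonic[16];
    int  num_operands;
    char operands[MAX_TOKENS][TOKEN_LEN]; /* e.g. "Ev", "rAX", "aES" */
    int  num_flags;
    char flags[MAX_TOKENS][TOKEN_LEN];    /* tokens that began with '$' */
};

/* Parses a line such as "5 add rAX Iz  # comment".
 * Returns 1 if an opcode and a mnemonic were found, 0 otherwise. */
int parse_opcode_line(const char *line, struct opcode_line *out)
{
    char buf[128];
    char *tok;
    int pos = 0;   /* counts non-flag tokens seen so far */

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    snprintf(buf, sizeof buf, "%s", line);
    buf[strcspn(buf, "#")] = '\0';        /* drop the comment, if any */

    for (tok = strtok(buf, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (tok[0] == '$') {              /* flags may appear anywhere */
            if (out->num_flags < MAX_TOKENS)
                snprintf(out->flags[out->num_flags++], TOKEN_LEN, "%s", tok + 1);
        } else if (pos == 0) {
            if (sscanf(tok, "%d", &out->opcode) != 1)
                return 0;
            pos++;
        } else if (pos == 1) {
            snprintf(out->mnemonic, sizeof out->mnemonic, "%s", tok);
            pos++;
        } else if (out->num_operands < MAX_TOKENS) {
            snprintf(out->operands[out->num_operands++], TOKEN_LEN, "%s", tok);
        }
    }
    return pos == 2;
}
```

The sketch only splits the line into tokens; expanding the r, e, and a prefixes according to the operand size would happen later, when an instruction is decoded.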

The extended opcode file uses a slightly different format, sampled below.

*247 &mem &11b
0 test Ev Iz
2 not Ev
3 neg Ev
4 mul rAX Ev
5 imul rAX Ev
6 div rAX Ev
7 idiv rAX Ev

A line that begins with * starts a new opcode, given by the decimal integer that follows. Flags that begin with & may appear on that line. The ones above indicate that the mod field of the ModR/M byte may be either 11b (so the r/m field encodes a general register) or one of 00b, 01b, or 10b (so the r/m field encodes a memory address). For some instructions, different values of the mod field will change the interpretation of the opcode.
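
To make the role of the ModR/M byte concrete, here is a small C sketch that splits the byte into its three fields and uses the reg field to pick a mnemonic from the sample above. The names split_modrm, f7_mnemonic, and rm_is_register are invented for this example, and the table covers only the entries shown in the sample for opcode 247 (0xF7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a ModR/M byte: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
struct modrm {
    unsigned mod, reg, rm;
};

struct modrm split_modrm(uint8_t byte)
{
    struct modrm m = { byte >> 6, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/* Mnemonics for opcode 247 (0xF7), indexed by the reg field, copied from
 * the sample above; NULL marks an entry the sample does not list. */
static const char *const group_f7[8] = {
    "test", NULL, "not", "neg", "mul", "imul", "div", "idiv"
};

const char *f7_mnemonic(uint8_t modrm_byte)
{
    struct modrm m = split_modrm(modrm_byte);
    assert(m.reg < sizeof group_f7 / sizeof group_f7[0]);
    return group_f7[m.reg];
}

/* Nonzero when r/m names a register (mod == 11b) rather than memory. */
int rm_is_register(uint8_t modrm_byte)
{
    return split_modrm(modrm_byte).mod == 3;
}
```

For instance, the byte pair 0xF7 0xD8 has mod = 11b and reg = 011b, so the lookup selects neg with a register operand.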

There's still a lot of work to be done, but to give you an idea of how far it's come, consider the following code (the code section of a program to create a simple window):

    ; Load common controls library
    invoke InitCommonControlsEx, icc

    ; Register the main window class
    mov [wc.cbSize], sizeof.WNDCLASSEX
    mov [wc.lpszClassName], pszClassName
    mov [wc.lpfnWndProc], WndProc
    mov [wc.hbrBackground], COLOR_WINDOW
    mov [wc.hIcon], 0
    mov [wc.hIconSm], 0

    invoke GetModuleHandle, 0
    mov [wc.hInstance], eax

    invoke LoadCursor, 0, IDC_ARROW
    mov [wc.hCursor], eax

    invoke RegisterClassEx, wc

    ; Create the main window
    invoke CreateWindowEx, WS_EX_CLIENTEDGE, pszClassName, pszWindowName, WS_VISIBLE or WS_MINIMIZEBOX or WS_SYSMENU, CW_USEDEFAULT, CW_USEDEFAULT, 600, 500, 0, 0, [wc.hInstance], 0
    mov [hwnd], eax

    ; Update the main window
    invoke ShowWindow, [hwnd], SW_SHOW
    invoke UpdateWindow, [hwnd]

    msgloop:
        invoke GetMessage, msg, 0, 0, 0

        cmp eax, 0
        jle endloop

        invoke TranslateMessage, msg
        invoke DispatchMessage, msg

        jmp msgloop

    endloop:
        invoke ExitProcess, [msg.wParam]

proc WndProc, hwnd:DWORD, msg:DWORD, wParam:DWORD, lParam:DWORD
    push ebx esi edi

    cmp [msg], WM_CLOSE
    je wmCLOSE
    cmp [msg], WM_DESTROY
    je wmDESTROY

        invoke DefWindowProc, [hwnd], [msg], [wParam], [lParam]
        jmp wmFINISH

    wmCLOSE:
        invoke DestroyWindow, [hwnd]

        xor eax, eax
        jmp wmFINISH

    wmDESTROY:
        invoke PostQuitMessage, 0
        xor eax, eax

    wmFINISH:
        pop edi esi ebx
        ret
endp

After compiling and extracting the code section of the executable, the disassembler outputs:

push 0x0040107c
calln 0x00404080
mov 0x0040104c, 0x00000030
mov 0x00401074, 0x00401020
mov 0x00401054, 0x00402104
mov 0x0040106c, 0x00000005
mov 0x00401064, 0x00000000
mov 0x00401078, 0x00000000
push 0x00
calln 0x004040b0
mov 0x00401060, EAX
push 0x00007f00
push 0x00
calln 0x00404120
mov 0x00401068, EAX
push 0x0040104c
calln 0x00404128
push 0x00
push 0x00401060
push 0x00
push 0x00
push 0x000001f4
push 0x00000258
push 0x80000000
push 0x80000000
push 0x100a0000
push 0x00401036
push 0x00401020
push 0x00000200
calln 0x0040410c
mov 0x00401000, EAX
push 0x05
push 0x00401000
calln 0x0040412c
push 0x00401000
calln 0x00404134
push 0x00
push 0x00
push 0x00
push 0x00401004
calln 0x0040411c
cmp EAX, 0x00
jle 0x18
push 0x00401004
calln 0x00404130
push 0x00401004
calln 0x00404118
jmp 0xd2
push 0x0040100c
calln 0x004040ac
push EBP
mov EBP, ESP
push EBX
push ESI
push EDI
cmp [EBP]+0x0c, 0x10
jz 0x1a
cmp [EBP]+0x0c, 0x02
jz 0x21
push [EBP]+0x14
push [EBP]+0x10
push [EBP]+0x0c
push [EBP]+0x08
calln 0x00404110
jmp 0x17
push [EBP]+0x08
calln 0x00404114
xor EAX, EAX
jmp 0x0a
push 0x00
calln 0x00404124
xor EAX, EAX
pop EDI
pop ESI
pop EBX
retn 0x0010

You must admit, that's fairly impressive. There's work to be done, but at least this project is going somewhere.

11 July 2012

COM in Win32 x86 Assembly

As much as I truly hate Microsoft COM, I must admit that some of the coolest Win32 features can only be implemented through it, so it's worth mentioning the basics of how to use it in Win32 ASM.

First of all, we have to initialize COM; that's pretty easy, since all you have to do is

invoke CoInitialize, 0

The MSDN documentation for CoInitialize says that new applications should use CoInitializeEx, so you might want to look into that, but, personally, I've never found it necessary and will probably use the older version until it's no longer supported. Next, we create an instance of an object through its CLSID and IID, which are just two long strings of hex numbers; the CLSID is an identifier for the object we want, and the IID is an identifier for the interface. For this example, I'll create a UIRibbonFramework object, which implements the IUIFramework interface, so I'll declare

clsidUIRibbonFramework  GUID 926749FA-2615-4987-8845-C33E65F2B957
iidIUIFramework         GUID F4F0385D-6872-43A8-AD09-4C339CB3F5C5

in the data section of my program. GUID is not a built-in directive like dd or db; it is a macro that I found in a FASM example program. It is defined as follows.

struc GUID def
 {
    match d1-d2-d3-d4-d5, def
     \{
        .Data1 dd 0x\#d1
        .Data2 dw 0x\#d2
        .Data3 dw 0x\#d3
        .Data4 db 0x\#d4 shr 8,0x\#d4 and 0FFh
        .Data5 db 0x\#d5 shr 40,0x\#d5 shr 32 and 0FFh,0x\#d5 shr 24 and 0FFh,0x\#d5 shr 16 and 0FFh,0x\#d5 shr 8 and 0FFh,0x\#d5 and 0FFh
     \}
 }
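
To show the byte layout the macro produces, here is a rough C sketch that parses the same textual form into a struct. The names struct guid and guid_from_string are invented for this illustration and are not taken from the Windows headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same field layout the macro emits: dd, dw, dw, then eight single bytes. */
struct guid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};

/* Parses "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; returns 1 on success. */
int guid_from_string(const char *text, struct guid *out)
{
    unsigned d1, d2, d3, d4;
    unsigned long long d5;
    int i;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%8x-%4x-%4x-%4x-%12llx", &d1, &d2, &d3, &d4, &d5) != 5)
        return 0;

    out->data1 = d1;
    out->data2 = (uint16_t)d2;
    out->data3 = (uint16_t)d3;
    /* The fourth group fills the first two bytes, high byte first... */
    out->data4[0] = (uint8_t)(d4 >> 8);
    out->data4[1] = (uint8_t)(d4 & 0xFF);
    /* ...and the fifth group fills the remaining six, also high byte first. */
    for (i = 0; i < 6; i++)
        out->data4[2 + i] = (uint8_t)(d5 >> (40 - 8 * i));
    return 1;
}
```

The first three groups are stored as integers (little-endian in memory on x86), while the last two groups are stored byte by byte in the order they are written, which is why the macro splits them with shr and and.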

With these, we can call CoCreateInstance:

invoke CoCreateInstance, clsidUIRibbonFramework, 0x00, 0x01, iidIUIFramework, fmwk

The second parameter is 0x00, which means that this object is not part of an aggregate, and the third parameter is 0x01, for CLSCTX_INPROC_SERVER. For more information, see the MSDN documentation for CLSCTX; these parameters are the ones used in most applications I've seen. fmwk was declared in the data section as a simple doubleword, so we pass its address to CoCreateInstance so that it receives the address of the object we want.

If all goes as we wished, we now have a pointer to a UIRibbonFramework object. How do we call functions with it, though? First, we need to understand a bit more about COM's structure. This CodeProject article explains it excellently; we'll take a similar approach with our ASM implementation. I define the virtual table for this object as

struct IUIFrameworkVtbl
    QueryInterface              dd 0
    AddRef                      dd 0
    Release                     dd 0
    Initialize                  dd 0
    Destroy                     dd 0
    LoadUI                      dd 0
    GetView                     dd 0
    GetUICommandProperty        dd 0
    SetUICommandProperty        dd 0
    InvalidateUICommand         dd 0
    FlushPendingInvalidations   dd 0
    SetModes                    dd 0
ends

The order of the functions is very important! To make sure you get the right order, you should consult the C-only section of the header file for the object you want. With that definition, all you need is a pointer to the virtual table for the object you're using, which you can obtain by

mov ebx, [fmwk]
mov ebx, [ebx]

so that the address of the table is stored in ebx. This is much more convenient than using eax or another non-protected register, because a non-protected register may be overwritten by each call, and you would end up repeating the code above a lot. Then, we simply call a function as follows.

push [app]
push [hwnd]
push [fmwk]
call [ebx + IUIFrameworkVtbl.Initialize]

app and hwnd are just parameters for this function. The interesting bit is the last two lines. The first parameter to any COM call is always the calling object itself, hence we push [fmwk]; [ebx + IUIFrameworkVtbl.Initialize] gives us a pointer to the Initialize method of the object so we can call it. I haven't yet found a way to use the invoke macro instead of pushing all the arguments separately, unfortunately. Finally, when you're done with COM, do

invoke CoUninitialize

and you're done!
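
If the double dereference above seems mysterious, this small C mock may help. It is not real COM: the Counter object, its two methods, and Counter_Init are invented for the example, and it uses the compiler's default calling convention rather than stdcall. It only shows the shape of an object whose first member points to a table of function pointers.

```c
#include <assert.h>
#include <stddef.h>

struct Counter;                       /* hypothetical COM-like object */

/* The "virtual table": function pointers in a fixed order. */
struct CounterVtbl {
    int (*Add)(struct Counter *self, int amount);
    int (*Get)(struct Counter *self);
};

/* The object's first member is a pointer to its table, which is why
 * `mov ebx, [fmwk]` followed by `mov ebx, [ebx]` reaches the table. */
struct Counter {
    const struct CounterVtbl *lpVtbl;
    int value;
};

static int Counter_Add(struct Counter *self, int amount)
{
    self->value += amount;
    return self->value;
}

static int Counter_Get(struct Counter *self)
{
    return self->value;
}

static const struct CounterVtbl counter_vtbl = { Counter_Add, Counter_Get };

/* Plays the role CoCreateInstance plays for a real object. */
void Counter_Init(struct Counter *c)
{
    assert(c != NULL);
    c->lpVtbl = &counter_vtbl;
    c->value = 0;
}
```

A call then looks like c.lpVtbl->Add(&c, 5): the object itself is passed as the first argument, just as [fmwk] is pushed last (and so ends up first) in the assembly version.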

13 June 2012

Project 002 Update (13/6/2012)

I've finally settled on a file format I like. Each line of the file lists the mnemonic and then the encoding, such as

add 100000:s:w 11000:reg imm8

Bytes are separated by spaces and fields are separated by colons; comments are denoted by the hash sign and are ignored to the end of the line. The program reads in lines in this format and stores the information in structures that I have defined by

typedef struct instruction32bytefield {
    int numBits;
    int type;
    char data[8];
} inst32bytefield;

typedef struct instruction32byte {
    int numFields;
    struct instruction32bytefield field[MAX_FIELDS];
} inst32byte;

typedef struct instruction32def {
    int numBytes;
    char mnemonic[MAX_MNEMONIC];
    struct instruction32byte byte[MAX_BYTES];
} inst32def;

All of this is subject to change, but it seems to be working for now.
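
As a sketch of how a line in this format might be read into those structures, consider the following C code. It is only an illustration under several assumptions: the values of MAX_FIELDS, MAX_BYTES, and MAX_MNEMONIC, the meaning of the type member (FIELD_BITS and FIELD_NAMED), and the function parse_definition are all made up here, and fields longer than seven characters are rejected so that data stays null-terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS   8    /* assumed limits; the post does not give them */
#define MAX_BYTES    8
#define MAX_MNEMONIC 16

typedef struct instruction32bytefield {
    int numBits;
    int type;
    char data[8];
} inst32bytefield;

typedef struct instruction32byte {
    int numFields;
    struct instruction32bytefield field[MAX_FIELDS];
} inst32byte;

typedef struct instruction32def {
    int numBytes;
    char mnemonic[MAX_MNEMONIC];
    struct instruction32byte byte[MAX_BYTES];
} inst32def;

enum { FIELD_BITS = 0, FIELD_NAMED = 1 };  /* hypothetical `type` values */

/* Parses e.g. "add 100000:s:w 11000:reg imm8"; returns 1 on success. */
int parse_definition(const char *line, inst32def *def)
{
    char buf[128];
    char *p;
    size_t n;

    assert(line != NULL && def != NULL);
    memset(def, 0, sizeof *def);
    snprintf(buf, sizeof buf, "%s", line);
    buf[strcspn(buf, "#\r\n")] = '\0';     /* strip comment and newline */

    /* The first token is the mnemonic. */
    p = buf + strspn(buf, " \t");
    n = strcspn(p, " \t");
    if (n == 0 || n >= MAX_MNEMONIC)
        return 0;
    memcpy(def->mnemonic, p, n);
    p += n;

    /* Each remaining space-separated token describes one byte. */
    for (;;) {
        inst32byte *b;

        p += strspn(p, " \t");
        if (*p == '\0')
            break;
        if (def->numBytes == MAX_BYTES)
            return 0;
        b = &def->byte[def->numBytes++];

        /* Fields within a byte are separated by colons. */
        for (;;) {
            inst32bytefield *f;

            n = strcspn(p, ": \t");
            if (n == 0 || n >= sizeof b->field[0].data ||
                b->numFields == MAX_FIELDS)
                return 0;
            f = &b->field[b->numFields++];
            memcpy(f->data, p, n);
            if (strspn(f->data, "01") == n) {
                f->type = FIELD_BITS;      /* literal bits such as 100000 */
                f->numBits = (int)n;
            } else {
                f->type = FIELD_NAMED;     /* s, w, reg, imm8: width unknown here */
            }
            p += n;
            if (*p != ':')
                break;
            p++;
        }
    }
    return def->numBytes > 0;
}
```

For the example line, this yields three bytes: the first with fields 100000, s, and w; the second with 11000 and reg; and the third with the single field imm8.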

01 June 2012

Project 001 Update (1/6/2012)

Well, Project 001 ended with a bit of a bang... I was innocently watching a YouTube video on my secondary laptop when, suddenly, the screen went black. Not "off" black, but "on" black; the screen obviously still had power, but nothing was displayed. Assuming that the Linux X server had crashed, I tried to make the computer sleep and reawaken, which had no effect. Finally, I resorted to a hard restart. Upon powering on, however, the BIOS did not even load. The HP splash screen was not displayed, and, after a few seconds, it powered itself off, then on, then off... I still have no explanation for this behavior, but the damage seems to be permanent and irreparable, so I'm declaring Project 001 closed yet, sadly, unsuccessful.

21 May 2012

Project 002 Update (21/5/2012)

I'm still struggling with the best file format to define an entire instruction set; it's a lot more difficult than I originally anticipated. As soon as I have one that encodes all the necessary information, I'll post it here.

13 May 2012

Project 002 Update (13/5/2012)

I've been coding when I should be doing homework/sleeping... But never mind that. I've actually been doing more reading than coding lately; those Intel IA-32 manuals are long! Of course, most of it doesn't apply to me, but still, a lot of it will be important. I've identified some challenges that need working:
  1. What happens if I want to execute 32-bit code in my 64-bit environment? I need a way to tell the processor to switch modes... and then switch back again when I'm done. I'm working on a way to do that; there are some Linux system calls that look promising.
  2. How will the external process know when to return to me, the PE-loader-emulator-thing? A lot of people don't tack a ret onto the end of their main procedure; I can say this because I don't, either. You're not really returning to anything, and the code runs just fine without it. Until you're emulating a PE loader, that is, because you are returning to something, and it's important that the processor not blindly execute code out into oblivion. To this end, I need to identify the end of the main procedure and add a return command to the end.
  3. How much of the instruction set does the loader need to "understand"? I need to identify addresses, jumps, calls, etc., so some processing and decompiling will be necessary; the Intel instruction set is quite long, however, and coding it will be tedious. I'm still not sure what the best way to encode it will be, but I'm working on that.

11 May 2012

Project 002 Update (11/5/2012)

A first success! I'm still working on the PE loader/disassembler, but I have successfully executed code dynamically copied into a memory block. Here's the code:

#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Executes:
 * mov eax, 4
 * add eax, 5
 * ret */
unsigned char math1[] = {0xb8, 0x04, 0x00, 0x00, 0x00, 0x83, 0xc0, 0x05, 0xc3};
unsigned char *buf = NULL;

int main(void)
{
    /* Allocate a page-aligned block of memory. */
    buf = (unsigned char *) valloc(sizeof(math1));
    if (buf == NULL)
        return -1;
    /* Copy the code buffer into the memory block. */
    memcpy(buf, math1, sizeof(math1));
    /* Allow execution of the memory block. */
    if (mprotect(buf, sizeof(math1), PROT_READ | PROT_EXEC) != 0)
        return -1;
    /* Call the "function" in memory. */
    int res = ((int (*)(void)) buf)();
    return res;
}

The program returns 9 (visible as its exit status), since the eax register contains the value 9 after the execution of the assembly code. This is pretty cool, to be honest, and is definitely needed for this project. Obviously, I'll need to test more complex code, but this is a very encouraging start.

I've used the following pages for inspiration/research: