Saturday, March 9, 2013

Conditionals In Assembly

if(var >= 1)
{
printf("Hello!");
}
else if(var <= -1)
{
printf("Goodbye!");
}
else
{
printf("Error");
}

Anyone with programming experience (i.e. everyone reading this) knows what this code is and how it works.  How does it work in assembly, though?

(These are not full functions.)
ARM:

cmp r0, #0              @ compare var against 0
ldrgt r0, =ptr_hello    @ var > 0
ldrlt r0, =ptr_goodbye  @ var < 0
ldreq r0, =ptr_error    @ var == 0
bl printf
ldmfd sp!, {lr}
bx lr


THUMB:

cmp r0, #0
bgt greater
blt less
b   zero

greater:
ldr r0, =ptr_hello
bl printf
b end

less:
ldr r0, =ptr_goodbye
bl printf
b end

zero:
ldr r0, =ptr_error
bl printf

end:

pop {r3}    @ THUMB pop can't load lr directly, so use a low register
bx r3


Big difference, huh?

Note how I'm using conditional suffixes on non-branch instructions in ARM, which makes it very compact. In THUMB you can only use the conditionals on branches, so you jump around in the code and run different code "banks", as I like to call them. This is a list of applicable conditionals:

eq  Z=1           Zero (EQual to 0)
ne  Z=0           Not zero (Not Equal to 0)
cs  C=1           Carry Set / unsigned Higher or Same  
cc  C=0           Carry Clear / unsigned Lower         
mi  N=1           Negative (MInus)
pl  N=0           Positive or zero (PLus)
vs  V=1           Signed overflow (oVerflow Set)
vc  V=0           No signed overflow (oVerflow Clear)
hi  C=1 & Z=0     Unsigned HIgher                              
ls  C=0 | Z=1     Unsigned Lower or Same                       
ge  N=V           Signed Greater or Equal                      
lt  N != V        Signed Less Than                             
gt  Z=0 & N=V     Signed Greater Than                          
le  Z=1 | N != V  Signed Less or Equal                         
al  -             ALways (default)
nv  -             NeVer 
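To make the chart a bit more concrete, here's a rough sketch of it as a lookup table.  It's in Python rather than assembly just so it can be run anywhere, and the names (CONDITIONS, condition_passes) are made up for this example.  "nv" is left out on purpose.

```python
# Each suffix is a test on the N, Z, C and V flags, exactly as in the chart.
# CONDITIONS and condition_passes are illustrative names, not part of any tool.
CONDITIONS = {
    "eq": lambda n, z, c, v: z,
    "ne": lambda n, z, c, v: not z,
    "cs": lambda n, z, c, v: c,
    "cc": lambda n, z, c, v: not c,
    "mi": lambda n, z, c, v: n,
    "pl": lambda n, z, c, v: not n,
    "vs": lambda n, z, c, v: v,
    "vc": lambda n, z, c, v: not v,
    "hi": lambda n, z, c, v: c and not z,
    "ls": lambda n, z, c, v: (not c) or z,
    "ge": lambda n, z, c, v: n == v,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: (not z) and n == v,
    "le": lambda n, z, c, v: z or n != v,
    "al": lambda n, z, c, v: True,
}


def condition_passes(suffix, n, z, c, v):
    """Return True if an instruction carrying this suffix would execute."""
    # Normalise the flags to booleans so that n == v compares like with like.
    return bool(CONDITIONS[suffix](bool(n), bool(z), bool(c), bool(v)))


# N set, everything else clear: "lt" passes, "ge" does not.
print(condition_passes("lt", n=1, z=0, c=0, v=0))  # True
print(condition_passes("ge", n=1, z=0, c=0, v=0))  # False
```

So something like ldrlt only does its load when the flags currently satisfy N != V; otherwise the CPU just skips it.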

The conditions that are being checked are based on the condition flags on the CPU.  A quick look at those:

Z:    zero- result was zero
C:    carry- (unsigned) result overflowed the destination register
N:    negative- result was negative
V:    overflow- same as C, but for signed math. 
 
And here's an example of them in IDA:

[Image: the condition flags as displayed in IDA]

Let's very briefly check out an example or two:

ex. 1:
ldr  r2, =0xFFFFFFFF
mov  r3, #0x1
adds r0, r2, r3

What does this set? 0xFFFFFFFF is the max number for a 32-bit register, but our result is 0x100000000.  The top bit is lost due to the size restriction, so the result is 0x00000000.  The flags are set as follows:

Z: The result was zero, so this is set.
C: We lost the top bit of the result due to the size, so the carry is set.
N: Zero is non-negative, so this is not set.
V: We're using two's complement negatives here, so 0xFFFFFFFF is -1.  So for this, the math would be -1 + 1 which is 0.  Nothing overflowed here, so this is not set.
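If you want to double-check ex. 1 by hand, here's a small Python sketch that models how an adds would set the four flags.  The function name adds_flags is made up for this example, and it only models a plain 32-bit add, nothing else about the CPU.

```python
MASK32 = 0xFFFFFFFF


def adds_flags(a, b):
    """Model a 32-bit `adds`: return (result, flags) for a + b."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, may need 33 bits
    result = full & MASK32       # what actually fits in the register

    def sign(x):
        return (x >> 31) & 1     # top bit of a 32-bit value

    flags = {
        "Z": int(result == 0),
        "C": int(full > MASK32),                 # a bit fell off the top
        "N": sign(result),
        # Signed overflow: both inputs share a sign and the result does not.
        "V": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


result, flags = adds_flags(0xFFFFFFFF, 0x1)
print(hex(result), flags)  # 0x0 {'Z': 1, 'C': 1, 'N': 0, 'V': 0}
```

Try adds_flags(0x7FFFFFFF, 1) for the opposite case: no carry, but the signed result flips negative, so V is set and C is not.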

ex. 2:
mov r4, #0

loop:
/* do stuff here */
add r4, r4, #1
cmp r4, #10
blt loop

What is set with this code and compare?  r4 is incremented before the compare, so the cmp sees the values 1 through 10 and the branch is taken while r4 is 1-9.  That sets the flags as such:

Z: Will be set once r4 is 10, which ends the loop.  Otherwise unset.
C: For a compare, the carry is set if the first operand is >= the second (unsigned).  It will be set on r4 = 10.
N: r4 - 10 is negative until the loop ends at r4 = 10, so this is set on every pass except the last.
V: No signed overflow.  This is not set.

Note in the chart:
lt  N != V 

In our loop N is set and V is not, so the "lt" condition works.
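Here's the same loop as a quick Python sketch, so you can watch the flags on every pass.  cmp_flags and run_loop are names invented for this example; the model treats cmp as a subtraction whose result is thrown away, which is how the flags for it are described above.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Flags from `cmp a, b`, i.e. a - b with the result discarded."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32

    def sign(x):
        return (x >> 31) & 1

    return {
        "Z": int(result == 0),
        "C": int(a >= b),        # set when the unsigned subtract needs no borrow
        "N": sign(result),
        # Signed overflow on subtract: inputs differ in sign and the
        # result's sign differs from the first operand's.
        "V": int(sign(a) != sign(b) and sign(result) != sign(a)),
    }


def run_loop(limit=10):
    """Mirror ex. 2 and record (r4, flags) at every compare."""
    r4 = 0                                   # mov r4, #0
    trace = []
    while True:
        r4 += 1                              # add r4, r4, #1
        flags = cmp_flags(r4, limit)         # cmp r4, #10
        trace.append((r4, flags))
        if flags["N"] == flags["V"]:         # blt needs N != V, so stop here
            break
    return trace


for r4, flags in run_loop():
    print(r4, flags)
```

The last line printed is the r4 = 10 pass, where Z and C are set and N is clear, so "lt" fails and execution falls out of the loop.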


Next time we'll go further in-depth on exactly how the flags are set and what exactly is happening in the different compare instructions. Also, a bit on efficiency with compares and the logic behind turning conditions in a higher level language into assembly.


Wednesday, January 30, 2013

My ARM Is Stronger Than My THUMB

In this chapter we'll have a look at ARM code vs THUMB code. For starters, the term ARM comes up a lot.  There's the ARMv5TE architecture, which is implemented by the ARM946E-S CPU, which is designed by a company called ARM.  Now to add one more level to that, the ARM assembly language is actually split up into 2 instruction sets called ARM and THUMB.  ARM instructions are 32-bit.  They look like "E02D4010", for example.  THUMB instructions are 16-bit.  They look like "B510", for example.  So then, if THUMB is half the size, why not do everything in THUMB?  It saves space, fits more code into the instruction cache (don't worry about what this is for now), and makes for a smaller ROM, so it sounds ideal.  The downside is that since there's less room for information in a 16-bit instruction, you can't do as much with each one.  Thus, in some situations, using ARM instructions for a routine is better.

So when is THUMB a better choice?  Simple functions should pretty much always be in THUMB.  Get and set functions, functions with little complex math, boolean tests, simple loads and stores, and simple comparisons all work better in THUMB.  In THUMB, only branches can have conditional suffixes, which is a big drawback for complex functions.  This is something we'll go into in the next part, but long story short, the instructions look like bne (branch if not equal), bhi (branch if higher), and bls (branch if lower or same).  They're actually b-ne, b-hi, and b-ls where the last part is a suffix which causes the instruction to trigger based on specific conditions evaluated by the CPU.  This is actually really simple and we'll get to it next time.

ARM is a better choice for complex algorithms and math, stuff like sound mixing and compression just to name a couple.  Any instruction can use those conditional suffixes which means that there's much, much less branching around. If there is more than one bit-related operation(shifting, masking, etc.) then ARM is likely the way to go. 

There are some limitations on THUMB, as well.  It can't read the program status registers, nor does it have full access to r8-r12.  There are a few instructions that can access those registers, but they are not used much.  Also, due to the lack of registers, there is a greater use of the stack.  As I was saying before, only branches can be conditional.  This is kind of a big deal in functions with lots of conditionals and loops.  Due to the reduced capability of creating immediate values (and also because of the reduced number of registers), THUMB functions will also often have slightly larger literal pools.  This isn't a big deal since THUMB functions are usually already 60-70% the size of the equivalent ARM function.  We'll be dealing with literal pools soon.  All you need to know now is that they're pools of constant data that sit at the end of each assembly function, which the function can load and use.


Next time we deal with conditionals in assembly and how that affects ARM vs THUMB.

Tuesday, January 29, 2013

*Instruction Joke Here* pt. 2

Last time, we discussed what assembly is at its core: a series of mnemonic devices that allow a programmer to direct a CPU at the lowest level.  Now that we're acquainted with it, let's see what the instructions actually do.  A simple example:

mov    r0, r1

This moves the contents of register 1 (r1) into r0.  This is the very basic layout of an instruction:

command rd, rs


The command is obviously the instruction type.  As a note, instructions are also sometimes called opcodes; the terms are used interchangeably here.  The first register is the destination and the second is the source.  This brings up a whole new question, though:  what exactly is a register?  A register is a tiny bit of storage on a CPU that is used as a sort of swap space.  There are 16 general-purpose 32-bit registers available at one time on the ARM7TDMI and ARM946E-S (though there are 31 registers in total) along with a couple of status registers, the CPSR and SPSR (current program status register and saved program status register).  The general-purpose registers are labelled in 2 ways:

r0   | a1
r1   | a2
r2   | a3
r3   | a4
r4   | v1
r5   | v2
r6   | v3
r7   | v4
r8   | v5
r9   | v6
r10  | sl
r11  | fp
r12  | ip
r13  | sp
r14  | lr
r15  | pc

r0-r15 is how they're labelled in debuggers, so that's our main concern.  The second set are more for identification and we will come back to them at a later point in time.

Now back to the actual use of the instructions.  Say we have some kind of algorithm.  Let's go with something simple to start:

n = 2x + 1

So we want to compute n using assembly.  First thing, r0 is the first register and it is the one that the result is always returned in, so we will be putting the final number in there.  We'll be taking 1 argument for x and returning the final value.  The way arguments work in assembly is that the first 4 are passed in the first 4 registers, r0-r3, and the rest are stored on the stack and loaded as needed.  So we will operate under the assumption that r0 holds x, the value we will be acting upon.
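As a quick illustration of that argument rule, here's a tiny Python sketch.  place_arguments is a made-up helper, and it assumes every argument is a single 32-bit word (bigger arguments follow extra rules that aren't covered here).

```python
def place_arguments(args):
    """Split word-sized arguments into register slots and stack slots."""
    # The first four arguments travel in r0-r3.
    registers = {f"r{i}": value for i, value in enumerate(args[:4])}
    # Anything past the fourth is stored on the stack for the callee to load.
    stack = list(args[4:])
    return registers, stack


registers, stack = place_arguments([10, 20, 30, 40, 50, 60])
print(registers)  # {'r0': 10, 'r1': 20, 'r2': 30, 'r3': 40}
print(stack)      # [50, 60]
```

Our function only takes x, so everything it needs arrives in r0 and the stack never comes into it.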

Here's the simple way of doing it:

mov r1, #2
mul  r0, r1
add  r0, #1
bx lr

Here's a more efficient way of doing it:

lsl r0, r0, #1
add r0, #1
bx lr

And finally, here it is with a more complex ARM instruction:

mov r1, #1
add r0, r1, r0, lsl #1
bx lr  

These all accomplish the same thing, but at different speeds.  We don't need to be concerned about that right now since we're just learning the basics, but just as a note it would be (fastest to slowest) 2, 3, 1.

We'll start with taking apart the first one.  It looks like this:

  1. move #2 into r1
  2. multiply the value of r0 by the value of r1 and put the result into r0
  3. add #1 to r0
  4. return

The second looks like this:

  1. left-shift r0 by #1 (this is equivalent to multiplying by #2)
  2. add #1 to r0
  3. return

The third looks like this:

  1. move #1 into r1
  2. left-shift r0 by #1(multiply by 2) and then add r0 and r1 together
  3. return 
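To convince yourself that the three versions really do the same thing, here's a Python sketch that walks through each one step by step.  The function names are invented for this example, and the & MASK32 is only there to imitate a 32-bit register wrapping around.

```python
MASK32 = 0xFFFFFFFF


def version1(r0):
    r1 = 2                                      # mov r1, #2
    r0 = (r0 * r1) & MASK32                     # mul r0, r1
    r0 = (r0 + 1) & MASK32                      # add r0, #1
    return r0                                   # bx lr


def version2(r0):
    r0 = (r0 << 1) & MASK32                     # lsl r0, r0, #1
    r0 = (r0 + 1) & MASK32                      # add r0, #1
    return r0                                   # bx lr


def version3(r0):
    r1 = 1                                      # mov r1, #1
    r0 = (r1 + ((r0 << 1) & MASK32)) & MASK32   # add r0, r1, r0, lsl #1
    return r0                                   # bx lr


for x in (0, 1, 5, 0x7FFFFFFF):
    print(x, version1(x), version2(x), version3(x))
```

With x = 5 all three return 11.  This only shows that the results match; it says nothing about the speed difference.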

There is a small order of operations sort of concern here, as well.  If you notice in example 3:

add r0, r1, r0, lsl #1

This has to be evaluated the correct way.  It's basically right to left.  r0 is shifted using what's called the "barrel shifter", then it's added to r1 and the result ends up in r0.  The barrel shifter is a feature on ARM CPUs that allows a register to be shifted as part of another instruction.  This becomes important because it shows up in common uses such as bit shifting.  There are no plain right and left shift instructions in ARM mode, so these shifts are taken care of using mov instructions with the barrel shifter.  In THUMB, we would see:

lsl r1, r1, #2

In ARM:


mov r1, r1, lsl #2

These are equivalent in the different instruction sets.  In the next part we'll get into the difference between ARM and THUMB. 
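For anyone who wants to poke at the barrel shifter before then, here's a rough Python model of it.  The names barrel_shift, mov_shifted, and add_shifted are made up, only immediate shift amounts of 1-31 are handled, and asr and ror are included even though this post only uses lsl and lsr.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value, kind, amount):
    """Shift a 32-bit value by an immediate amount of 1..31."""
    if not 1 <= amount <= 31:
        raise ValueError("this sketch only handles shifts of 1..31")
    value &= MASK32
    if kind == "lsl":                       # logical shift left
        return (value << amount) & MASK32
    if kind == "lsr":                       # logical shift right, zero fill
        return value >> amount
    if kind == "asr":                       # arithmetic shift right, sign fill
        if value & 0x80000000:
            value -= 1 << 32                # reinterpret as a negative number
        return (value >> amount) & MASK32
    if kind == "ror":                       # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("unknown shift type: " + kind)


def mov_shifted(rm, kind, amount):
    """mov rd, rm, <kind> #amount"""
    return barrel_shift(rm, kind, amount)


def add_shifted(rn, rm, kind, amount):
    """add rd, rn, rm, <kind> #amount  (the shift happens first)"""
    return (rn + barrel_shift(rm, kind, amount)) & MASK32


print(mov_shifted(3, "lsl", 2))       # 12, same as lsl r1, r1, #2 with r1 = 3
print(add_shifted(1, 5, "lsl", 1))    # 11, i.e. 2x + 1 with x = 5
```

Note that the shifted register is always worked out before the main operation uses it, which is the "right to left" order mentioned above.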

Sunday, January 27, 2013

*Instruction Joke Here*

For part 2, let's discuss assembly instructions.  Note that I am assuming you have at least some programming knowledge. I will not be explaining high-level code.  This is an example of ASM:

ORR     R1, R1, R0,LSR R3

This is an ARM instruction (as opposed to THUMB; we'll get to the difference soon).  It's also a fairly complex instruction.  It means this, basically:

r1 |= r0 >> r3  *or*  r1 = r1 | (r0 >> r3)

First of all, what IS an instruction?  It's a mnemonic device to assist humans in communicating directly with a CPU.  You see "ORR     R1, R1, R0,LSR R3", but the CPU sees "0xE1811330" / "b11100001100000010001001100110000".  So if you were going to write the code directly for the CPU to interpret it, you would need to write:

E1811330
E1A00210
E12FFF1E
etc..

instead of:

ORR     R1, R1, R0,LSR R3
MOV    R0, R0,LSL R2
BX        LR

So it's pretty easy to see why having an actual language that can be assembled by an assembler is much simpler and more convenient than simply writing every instruction out as hex or binary. 
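To show that the hex really does contain the same information as the text, here's a Python sketch that pulls the fields back out of the first two words.  decode_register_shift is a made-up name, and it only understands this one instruction shape (an ARM data-processing instruction whose last operand is a register shifted by another register), so it refuses anything else, including the BX.

```python
OPCODES = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
           "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]
SHIFTS = ["lsl", "lsr", "asr", "ror"]


def decode_register_shift(word):
    """Split one 32-bit ARM word of this shape into its bit fields."""
    opcode = (word >> 21) & 0xF
    s = (word >> 20) & 1
    right_shape = (
        (word >> 25) & 0b111 == 0b000      # data-processing, register operand
        and (word >> 4) & 1 == 1           # shift amount comes from a register
        and (word >> 7) & 1 == 0
    )
    # tst/teq/cmp/cmn without the S bit are really other instructions (bx etc.).
    if not right_shape or (8 <= opcode <= 11 and s == 0):
        raise ValueError("not a register-shifted data-processing instruction")
    return {
        "cond": (word >> 28) & 0xF,        # 0xE means "al" (always)
        "op": OPCODES[opcode],
        "s": s,
        "rn": (word >> 16) & 0xF,          # first source register
        "rd": (word >> 12) & 0xF,          # destination register
        "rm": word & 0xF,                  # register being shifted
        "shift": SHIFTS[(word >> 5) & 0x3],
        "rs": (word >> 8) & 0xF,           # register holding the shift amount
    }


print(decode_register_shift(0xE1811330))
print(decode_register_shift(0xE1A00210))
```

The first word comes back as orr with rd = 1, rn = 1, rm = 0, lsr, rs = 3, which is ORR R1, R1, R0,LSR R3 again.  An assembler does this in the other direction.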

I suppose that raises the question: how do we get from the text assembly instructions to the code for the CPU to run?  We use an assembler.  It creates code from the mnemonic devices.  Since I have (and you should also have) DevkitARM installed, we have a copy of GAS (GNU Assembler) already installed that we could assemble code with.  Our other option is ASMtoARDS if we want an AR code ready to go.  Building something with the assembler is a project we'll get to later.  Right now we just want the very simple what, why, and how all of this works so that we can get to something more engaging.

In the next post we'll have a look at how actual instructions function.

An Assembly Intro

This is part 1 of probably very many of a sort of course where I lay out basically everything I know about assembly. We'll start from the beginning at what instructions are and what they do and we'll end up very, very deep into some of the most complex stuff you can accomplish.  That said, let's get started.  We have a lot to do!


Tools
These are all the main tools I use when I'm working on stuff.

IDA
There's one main tool to use: IDA 6.1.  IDA is amazing in pretty much every way and you'll use it constantly.  It can work with ARMv4 and ARMv5 no problem, among many, many other architecture types.  It's very easy to use for debugging or plain disassembly, great for organization, can handle most of what you'll need for keeping notes, types, functions, almost anything you need.  Get this asap.

No$GBA
No$GBA debugger is a very valuable tool for working with GBA games.  There's no version of VBA that can connect properly with IDA, so you'll likely be using a mix of No$GBA(a.k.a. "Nocash") with a symbol file to do the debugging and IDA for keeping your notes, labels, symbols, etc.  This is a very good tool even though it hasn't been updated since 2008.  

Desmume
The best DS emulator.  Desmume can be compiled so that it can interact with GDB and thus, IDA.  So for NDS debugging, you would use IDA with a database connected to Desmume.  It's nice that you can use its RAM watch, cheat system, and other tools while linked with IDA.  Being able to very quickly (and temporarily) change a bit of code with an Action Replay cheat is very handy.  The cheat system is also great for adding/testing full ASM hacks that you build elsewhere.  Use the latest version that you possibly can.  As of this writing, the GDB version in use is 0.9.9 dev+ x86.

ASMtoARDS
This is a tool to build AR codes from plain assembly.  You can use ARM or THUMB  and it builds with a version of GAS(GNU Assembler) so you can use any macros or abilities that assembler has.  This is a very handy tool.  

Kodinator
This thing has 3 tabs, but the only useful one is the middle tab.  The branch builder is rather handy.  It can make 2 different kinds of branches depending on whether you're using ARM or THUMB.  This is something we'll come back to later.

DevkitARM
DevkitARM is part of a set of toolchains maintained by Wintermute.  They include DevkitARM, DevkitPPC, and DevkitPSP.  The ARM devkit is for building for the GBA and DS(and other ARM architectures- it's not just for making programs for the DS and GBA), the PPC kit is for the GameCube and Wii, and the PSP kit is obviously for the PSP.  Installing DevkitARM is useful for many, many reasons.  You can code things up and copy and paste them right into your ASM hacks, code up tests to see what the ASM looks like for given code, figure out how the systems work to a greater extent, and so forth.  You absolutely should install this.  If you're on Windows, just go get the installer.  It's really easy.

Tinke
This isn't specifically for assembly, but it's an amazing, amazing DS filesystem viewer.  It can open many kinds of files and can browse and replace files in sdats and is just extremely, extremely useful all-around.  

CrystalTile 2
An indispensable program.  It can extract and decompress many file types, can view the full filesystem for a ROM, and most importantly, it's the only tool that can show you the exact memory locations that overlays will be loaded to.  You can do that yourself using a table in a file in the filesystem, but it's so much easier this way.  This is extremely handy and we will be making good use of it later.

Friday, January 25, 2013

Thus It Begins

This is mostly going to be programming and assembly junk that I have no idea where else to put.