Code Optimization

Assembly language is generally used to write programs that are faster and more efficient than what a compiler would produce. The optimization rules below should help with that, but be aware that there are far more rules than the ones covered here.

NOTE: This page covers some basic rules described in the Intel documentation and does not consider µops. These rules alone will therefore not be enough if you are aiming for maximum optimization.

General Rule

The general rule is to use as few instructions as possible. Each instruction takes at least one cycle, while expensive instructions such as mul and especially div can take far longer, up to roughly 130 cycles in the worst case. Also avoid compare-and-branch sequences (CMP followed by a conditional jump) where possible: CMP itself only sets the flags, but the conditional jump that follows it can be mispredicted, which stalls the pipeline and limits how much work the CPU can do in parallel. Use the smallest possible floating-point or SIMD data type to enable more parallelism. Avoid conditional branches inside loops and consider using SSE instructions to eliminate branches. Avoid unnecessary MOV instructions, especially ones that access memory, because registers are much faster to access than RAM.
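The advice about conditional branches inside loops can be illustrated at the C level. The following is only a minimal sketch, not taken from the Intel documentation: the function names count_negative_branchy and count_negative_branchless are hypothetical, and whether a compiler actually emits branch-free machine code for either version depends on the compiler and its settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts negative values using a data-dependent conditional branch
   inside the loop body. */
size_t count_negative_branchy(const int32_t *v, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            count++;
        }
    }
    return count;
}

/* Same result without a data-dependent branch: the sign bit (bit 31)
   is shifted down to bit 0 and added directly to the counter. */
size_t count_negative_branchless(const int32_t *v, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (uint32_t)v[i] >> 31;
    }
    return count;
}
```

The only branch left in the second version is the loop condition itself, which is usually easy to predict. The data-dependent decision has been turned into plain arithmetic, which is the same idea that branch-eliminating SSE code relies on.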

Code Alternatives

Here are some ways to optimize code, for example by using logical operations or simple add and sub instructions. reg is a placeholder for a register, ? stands for a register number, and \ separates consecutive instructions.

| Code | Alternative | Description |
|---|---|---|
| mov reg, 0 | xor reg, reg | Sets the register to 0. |
| mov reg, 0 | sub reg, reg | Same as xor reg, reg. |
| movd xmm?, 0 | PXOR xmm?, xmm? | Clears the xmm? register. Note that movd does not accept an immediate operand, so the zero would otherwise have to come from a register or from memory. |
| CMP reg, 0 \ JE j_eq \ JNE j_ne | TEST reg, reg \ JZ j_eq \ JNZ j_ne | TEST reg, reg is better than CMP reg, 0 because the instruction is smaller (no immediate operand). Like CMP, it only changes the flags, not the registers. |
| AND reg32, 0x80000000 | TEST reg32, 0x80000000 | To check whether a single bit is set, use TEST and check the zero flag. Unlike AND, TEST does not modify the register. |
| INC reg | add reg, 1 | ADD and SUB overwrite all flags, whereas INC and DEC leave the carry flag unchanged, which creates a false dependency on earlier instructions that set the flags. |
| imul reg, [n] where n = 2^x | shl reg, [x] | Multiplies a value by a power of two. Division is similar, using shr for unsigned values or sar for signed values (sar rounds toward negative infinity, unlike idiv). |
| mov reg, eax | movd xmm?, eax | If additional registers are needed or results must be stored somewhere, the xmm registers might be useful. |
| shl reg, [n] | shl reg, [n] \ clc | If the carry flag produced by the shift is not needed, overwrite it afterwards with clc, add or sub so that later instructions do not depend on the flags written by the shift. |
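Two of the replacements in the table can be checked for equivalence at the C level. This is a hedged sketch rather than a definitive implementation: the function names are hypothetical, the shift count x is assumed to be less than 32, and an optimizing compiler will usually apply these transformations on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiplication by 2^x written as a multiply (imul reg, n). */
uint32_t mul_pow2_imul(uint32_t value, unsigned x) {
    return value * ((uint32_t)1 << x);
}

/* The same multiplication written as a left shift (shl reg, x).
   Assumes x < 32. */
uint32_t mul_pow2_shl(uint32_t value, unsigned x) {
    return value << x;
}

/* Unsigned division by 2^x written as a right shift (shr reg, x).
   Assumes x < 32. */
uint32_t div_pow2_shr(uint32_t value, unsigned x) {
    return value >> x;
}

/* Single-bit check in the style of TEST reg32, 0x80000000:
   the value is only read, and the result plays the role of the
   inverted zero flag. */
bool top_bit_set(uint32_t value) {
    return (value & UINT32_C(0x80000000)) != 0;
}
```

For unsigned values the shift and the multiply or divide always agree. Signed division is the case to be careful with, because an arithmetic right shift rounds differently from a division for negative operands.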
