console-dev.de

Home of VsTortoise, VisualHAM, N3D and HEL Library

In my previous post, “struct/class/union: member alignment”, I said the compiler usually generates more, and slower, program code for packed structs than for unpacked ones. In this post, I’ll show what code is actually generated.

Remember I said the ARM7TDMI CPU does not support accessing mis-aligned addresses? To work around this, the compiler generates code that uses byte instructions, since those can access addresses with any alignment.
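To make “mis-aligned” concrete: an address is N-byte aligned when it is a multiple of N. A word load (ldr) expects a 4-byte aligned address, while a byte load (ldrb) works at any address. The following C++ sketch checks this with a hypothetical IsAligned helper; the addresses are made up for illustration.

```cpp
#include <cstdint>

// Returns true if 'address' is a multiple of 'alignment'.
// Assumes 'alignment' is a power of two (1, 2, 4, ...), so an aligned
// address has all of its low bits below 'alignment' cleared.
constexpr bool IsAligned(std::uintptr_t address, std::uintptr_t alignment)
{
  return (address & (alignment - 1)) == 0;
}

// Hypothetical example: a struct starting at a word-aligned address whose
// 32-bit member sits at byte offset 3, as it does in a packed struct.
constexpr std::uintptr_t kStructAddress = 0x1000;
constexpr std::uintptr_t kMemberAddress = kStructAddress + 3;

static_assert(IsAligned(kMemberAddress, 1), "a byte access (ldrb) is always aligned");
static_assert(!IsAligned(kMemberAddress, 4), "a word access (ldr) at offset 3 is mis-aligned");
```

Since the check is constexpr, the static_asserts are evaluated at compile time; no code is generated for them.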

To load a 32-bit word from a packed struct, the compiler generates code that loads four individual bytes, then shifts and bitwise ORs them together to form the final 32-bit word:

```cpp
struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

unsigned int GetMyInt(PackedStruct *p)
{
  return p->myInt;
}
```

The above C++ code, compiled with devkitARM release 25 at optimization level -O4, produces the following 32-bit ARM assembly:

```asm
; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns

GetMyInt:
 ldrb  r3, [r0, #3]         ; r3 = ((unsigned char*)r0)[3]
 ldrb  r2, [r0, #4]         ; r2 = ((unsigned char*)r0)[4]
 ldrb  r1, [r0, #5]         ; r1 = ((unsigned char*)r0)[5]
 orr   r3, r3, r2, asl #8   ; r3 = r3 | (r2 << 8)
 ldrb  r0, [r0, #6]         ; r0 = ((unsigned char*)r0)[6]
 orr   r3, r3, r1, asl #16  ; r3 = r3 | (r1 << 16)
 orr   r0, r3, r0, asl #24  ; r0 = r3 | (r0 << 24)
 bx    lr                   ; return
```
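Written back as C++, the ldrb/orr sequence corresponds roughly to the sketch below. LoadLE32 and GetMyIntBytewise are hypothetical names for illustration, not something the compiler emits. As the assembly shows, the byte at the lowest address ends up in the least significant bits, so the word is assembled in little-endian order. The struct is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The packed struct from above (GCC/Clang attribute syntax).
struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

// Rebuilds a little-endian 32-bit word from four separately loaded bytes,
// like the ldrb/orr sequence: b[0] is the least significant byte.
inline std::uint32_t LoadLE32(const unsigned char* b)
{
  return  static_cast<std::uint32_t>(b[0])
       | (static_cast<std::uint32_t>(b[1]) << 8)
       | (static_cast<std::uint32_t>(b[2]) << 16)
       | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Hand-written equivalent of GetMyInt for the packed struct. myInt starts
// at offset 3, so its bytes live at offsets 3, 4, 5 and 6 of the struct,
// exactly the offsets used by the four ldrb instructions.
inline std::uint32_t GetMyIntBytewise(const PackedStruct* p)
{
  assert(p != nullptr);
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(p);
  return LoadLE32(bytes + offsetof(PackedStruct, myInt));
}
```

Reading through unsigned char pointers is always allowed in C++, which is why this approach works at any alignment, just like the byte instructions.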

On the other hand, if you use an unpacked struct (simply remove the packed attribute from PackedStruct and recompile), the function compiles to a single load instruction:

```asm
; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns
GetMyInt:
 ldr  r0, [r0, #4] ; r0 = *(unsigned int*)&((unsigned char*)r0)[4]
 bx   lr ; return
```

This is why I said the compiler generates more, and less efficient, code for packed structs! If you’re a careful reader, you will have noticed that in the unpacked struct, myInt is located at offset four rather than three. If you don’t know why, read my previous post, “struct/class/union: member alignment”.
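The offset difference can be checked at compile time with offsetof and sizeof. This sketch assumes a 2-byte unsigned short and a 4-byte unsigned int, each aligned to its own size, as on ARM; UnpackedStruct is a hypothetical name for the struct without the packed attribute.

```cpp
#include <cstddef>

// Same members, once packed and once with the compiler's default layout.
struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

struct UnpackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

// Packed: no padding, myInt directly follows the 1-byte and 2-byte members.
static_assert(offsetof(PackedStruct, myInt) == 3, "packed: myInt at offset 3");
static_assert(sizeof(PackedStruct) == 7, "packed: 1 + 2 + 4 bytes");

// Unpacked: one padding byte after myByte puts myShort at offset 2,
// so myInt lands on the 4-byte boundary at offset 4.
static_assert(offsetof(UnpackedStruct, myShort) == 2, "unpacked: myShort at offset 2");
static_assert(offsetof(UnpackedStruct, myInt) == 4, "unpacked: myInt at offset 4");
static_assert(sizeof(UnpackedStruct) == 8, "unpacked: 1 + 1 (padding) + 2 + 4 bytes");
```

If any of these assumptions does not hold on a given target, the build fails at the corresponding static_assert instead of silently producing a different layout.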

If you want to look at the generated code in your favourite text editor, add --save-temps to CFLAGS in your makefile:

```make
CFLAGS += --save-temps
```

This instructs the devkitPro tool-chain to keep the temporary intermediate files in your build directory. I found the .ii files (preprocessed C++ source) and .s files (generated assembly) very interesting!

