In my previous post, “struct/class/union: member alignment”, I said the compiler usually generates more, and slower, code for packed structs than for unpacked ones. In this post, I’ll show what code actually gets generated.
Remember I said the ARM7TDMI CPU does not support accessing misaligned addresses? To make packed structs work anyway, the compiler generates code that uses byte-sized load and store instructions, since those can access any address regardless of alignment.
To load a 32-bit word from a packed struct, the compiler generates code that loads four separate bytes, then shifts them and bitwise ORs them together to build the final 32-bit word:
    struct __attribute__((packed)) PackedStruct
    {
      unsigned char myByte;
      unsigned short myShort;
      unsigned int myInt;
    };

    unsigned int GetMyInt(PackedStruct *p)
    {
      return p->myInt;
    }
The above C++ code was compiled with devkitARM release 25 using optimization level -O4 and turns into the following 32-bit ARM assembly:
    ; Incoming parameter "p" is stored in register r0
    ; Return value is also stored in r0 when function returns

    GetMyInt:
      ldrb r3, [r0, #3]        ; r3 = ((unsigned char*)r0)[3]
      ldrb r2, [r0, #4]        ; r2 = ((unsigned char*)r0)[4]
      ldrb r1, [r0, #5]        ; r1 = ((unsigned char*)r0)[5]
      orr  r3, r3, r2, asl #8  ; r3 = r3 | (r2 << 8)
      ldrb r0, [r0, #6]        ; r0 = ((unsigned char*)r0)[6]
      orr  r3, r3, r1, asl #16 ; r3 = r3 | (r1 << 16)
      orr  r0, r3, r0, asl #24 ; r0 = r3 | (r0 << 24)
      bx   lr                  ; return
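If the assembly is hard to follow, here is a minimal C++ sketch of what the ldrb/orr sequence computes. The helper names LoadU32Bytewise and GetMyIntBytewise are made up for this illustration (they are not part of devkitARM or the compiler), and the sketch assumes the little-endian byte order that the listing above implies.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: builds a 32-bit value from
// four single-byte reads, lowest address first (little-endian), which is
// what the four ldrb instructions and three orr instructions do.
constexpr std::uint32_t LoadU32Bytewise(const unsigned char *bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Hypothetical equivalent of GetMyInt for the packed layout: myInt starts
// at byte offset 3, so the four bytes at offsets 3..6 are combined.
constexpr std::uint32_t GetMyIntBytewise(const unsigned char *structBytes)
{
    return LoadU32Bytewise(structBytes + 3);
}
```

Byte reads never need any particular alignment, which is why this form is safe for a field that starts at an odd address.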
With an unpacked struct, on the other hand, things look very different. Just remove the packed attribute from PackedStruct and recompile, and the same function turns into a single load instruction:
    ; Incoming parameter "p" is stored in register r0
    ; Return value is also stored in r0 when function returns

    GetMyInt:
      ldr r0, [r0, #4] ; r0 = *(unsigned int*)&((unsigned char*)r0)[4]
      bx  lr           ; return
This is why I said the compiler generates more, and less efficient, code for packed structs! If you’re a careful reader, you will have noticed that in the unpacked version myInt is located at member offset four rather than three. If you don’t know why, read my previous post “struct/class/union: member alignment”.
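If you want to check the two offsets yourself, a small compile-time sketch like the following should do. The name UnpackedStruct is made up here for the variant without the attribute, and the expected numbers assume a target where short is 2 bytes with 2-byte alignment and int is 4 bytes with 4-byte alignment (true for ARM, but not guaranteed by the C++ standard).

```cpp
#include <cstddef> // offsetof

// Same members as in the post. __attribute__((packed)) is a GCC/Clang
// extension that removes the padding between members.
struct __attribute__((packed)) PackedStruct
{
    unsigned char  myByte;
    unsigned short myShort;
    unsigned int   myInt;
};

// Hypothetical name for the same struct without the packed attribute.
struct UnpackedStruct
{
    unsigned char  myByte;
    unsigned short myShort;
    unsigned int   myInt;
};

// Packed: myByte at 0, myShort at 1..2, myInt at 3..6.
static_assert(offsetof(PackedStruct, myInt) == 3, "packed myInt offset");
static_assert(sizeof(PackedStruct) == 7, "packed size");

// Unpacked: one padding byte after myByte keeps myShort 2-byte aligned,
// which also puts myInt on a 4-byte boundary at offset 4.
static_assert(offsetof(UnpackedStruct, myInt) == 4, "unpacked myInt offset");
static_assert(sizeof(UnpackedStruct) == 8, "unpacked size");
```

If one of the assumptions does not hold for your target, the build fails at the matching static_assert instead of silently using a different layout.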
If you want to look at the generated code in your favourite text editor, you need to add --save-temps to CFLAGS in your makefile:
CFLAGS += --save-temps
This instructs the devkitPro toolchain to keep its temporary intermediate files in your build directory. I found the .ii (preprocessed source) and .s (generated assembly) files very interesting!
