console-dev.de

Home of VsTortoise, VisualHAM, N3D and HEL Library

Archive for March, 2009

I just stumbled upon an interesting article about potential fields in RTS games, which was a very nice read.

Potential fields have some similarities with influence maps. Influence maps are often used to decide whether an area in the game world is controlled by your own or by enemy units, or whether it is currently not under the control of any forces (no man’s land).

If you have a few minutes to spare, be sure to read “Using Potential Fields in a Real-time Strategy Game Scenario (Tutorial)”.

generated code for packed structs

Posted by Peter Schraut under Programming

In my previous post, struct/class/union: member alignment, I said the compiler usually generates more and slower program code for packed than for unpacked structs. In this post, I’ll show what code is being generated.

Remember I said the ARM7TDMI CPU does not support accessing mis-aligned addresses? To access members of a packed struct anyway, the compiler generates code that works with byte instructions, since those can access addresses with any alignment.

In order to load a 32-bit word from a packed struct, the compiler generates code that actually loads four bytes, then shifts and bitwise ORs them together to create the final 32-bit word:

struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};
 
unsigned int GetMyInt(PackedStruct *p)
{
  return p->myInt;
}

The above C++ code was compiled with devkitARM release 25 using optimization level -O4 and turns into the following 32-bit ARM assembly:

; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns
 
GetMyInt:
 ldrb  r3, [r0, #3]         ; r3 = ((unsigned char*)r0)[3]
 ldrb  r2, [r0, #4]         ; r2 = ((unsigned char*)r0)[4]
 ldrb  r1, [r0, #5]         ; r1 = ((unsigned char*)r0)[5]
 orr   r3, r3, r2, asl #8   ; r3 = r3 | (r2 << 8)
 ldrb  r0, [r0, #6]         ; r0 = ((unsigned char*)r0)[6]
 orr   r3, r3, r1, asl #16  ; r3 = r3 | (r1 << 16)
 orr   r0, r3, r0, asl #24  ; r0 = r3 | (r0 << 24)
 bx    lr                   ; return
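Written back as C++, the eight instructions above do roughly the following. This is only a sketch to make the assembly easier to read: load_u32_bytewise and get_my_int_bytewise are names I made up for this example, and the byte order assumes a little-endian target, which is what the shifts in the listing imply.

```cpp
#include <cassert>
#include <cstdint>

// Assembles a little-endian 32-bit value from four single-byte loads,
// so the address does not need to be 4-byte aligned.
// (Hypothetical helper, not something the toolchain provides.)
inline std::uint32_t load_u32_bytewise(const unsigned char* p)
{
  assert(p != nullptr);
  return  static_cast<std::uint32_t>(p[0])          // ldrb
       | (static_cast<std::uint32_t>(p[1]) << 8)    // ldrb + orr ..., asl #8
       | (static_cast<std::uint32_t>(p[2]) << 16)   // ldrb + orr ..., asl #16
       | (static_cast<std::uint32_t>(p[3]) << 24);  // ldrb + orr ..., asl #24
}

// Mirrors GetMyInt for the packed struct: myInt starts at byte offset 3.
inline std::uint32_t get_my_int_bytewise(const unsigned char* packedStruct)
{
  return load_u32_bytewise(packedStruct + 3);
}
```

Passing the address of a PackedStruct (cast to const unsigned char*) to get_my_int_bytewise should give the same value as p->myInt on such a target.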

With an unpacked struct, on the other hand, the picture changes. Just remove the packed attribute from PackedStruct and recompile, and the code turns into a single load instruction:

; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns
GetMyInt:
 ldr  r0, [r0, #4] ; r0 = *(unsigned int*)&((unsigned char*)r0)[4]
 bx   lr ; return

This is why I said the compiler generates more and less efficient code for packed structs! If you’re a careful reader, you should have noticed that myInt is located at member offset four rather than three. If you don’t know why, read my previous post “struct/class/union: member alignment”.
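If you want the compiler itself to confirm those two offsets, a few static_asserts will do. This is a minimal sketch, assuming GCC or Clang (for the packed attribute). PackedLayout, UnpackedLayout and padding_before_my_int are names made up for this example, and fixed-width types stand in for unsigned char/short/int so the numbers do not depend on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same member order as PackedStruct, with the packed attribute.
struct __attribute__((packed)) PackedLayout
{
  std::uint8_t  myByte;
  std::uint16_t myShort;
  std::uint32_t myInt;
};

// Same members again, but with default alignment.
struct UnpackedLayout
{
  std::uint8_t  myByte;
  std::uint16_t myShort;
  std::uint32_t myInt;
};

// Packed: members follow each other without gaps.
static_assert(offsetof(PackedLayout, myInt) == 3, "myInt directly follows myShort");
static_assert(sizeof(PackedLayout) == 7, "1 + 2 + 4 bytes");

// Unpacked: each member is aligned on its own size.
static_assert(offsetof(UnpackedLayout, myShort) == 2, "one padding byte after myByte");
static_assert(offsetof(UnpackedLayout, myInt) == 4, "myInt sits on a 4-byte boundary");
static_assert(sizeof(UnpackedLayout) == 8, "7 member bytes + 1 padding byte");

// How far myInt moves once the packed attribute is removed.
inline std::size_t padding_before_my_int()
{
  const std::size_t padding =
      offsetof(UnpackedLayout, myInt) - offsetof(PackedLayout, myInt);
  assert(padding < alignof(std::uint32_t));
  return padding;
}
```

If one of the assumptions does not hold for your compiler or target, the build fails right at the static_assert, which is usually the earliest place to find out.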

If you want to look at the generated code in your favourite text editor, you need to add -save-temps to CFLAGS in your makefile:

CFLAGS += -save-temps

This will instruct the devkitPro tool-chain to store temporary intermediate files in your build directory. I found the .ii and .s files very interesting!

struct/class/union: member alignment

Posted by Peter Schraut under Programming

Earlier today, while reading my daily forum threads, I came across a rather quirky C/C++ struct that reminded me of some of my own evil doings when I worked on HEL Library, where I came up with the following struct for the new sprite system introduced in HEL 2 Library:

struct StupidMemberAlignment
{
  unsigned short Attr[3];
  unsigned short Flags;
  unsigned char Next;
  unsigned char Prev;
  const void* pSrc;
};

After I pushed out the first release candidate of HEL 2, Jasper Vijn was kind enough to send me an email to let me know I could reduce the size of the struct by 4 bytes, simply by reordering its members.

Now you might think: 4 bytes, hello?! Please keep in mind that HEL Library was built for the Game Boy Advance, and 128 of those structs were instantiated at all times. Once I reordered the struct members, I had 512 bytes more memory available. That is roughly 1.5% of the 32 KB internal work RAM!

Jasper suggested changing the struct to the following:

struct BetterMemberAlignment
{
  unsigned short Attr[3];
  unsigned char Next;
  unsigned char Prev;
  unsigned long Flags;
  const void* pSrc;
};

Why should the struct get smaller, when I reorder its members and even change Flags from short to long, which is two bytes larger?

It happens because the compiler inserts padding bytes when one of the members is not aligned on the member’s type size. In other words, when your struct contains a type larger than one byte, make sure the member is aligned on its type size, otherwise the compiler will do this for you automatically (unless you turned off alignment or specified the packed attribute).

Let’s take a look at the StupidMemberAlignment struct again. This time, I added the byte offset at which you might expect each member to be located:

// PLEASE NOTE THE BYTE OFFSETS ARE WRONG!
// I EXPLAIN WHY IN THE TEXT BELOW!
struct StupidMemberAlignment
{
  unsigned short Attr[3]; // byte offset 0
  unsigned short Flags; // byte offset 6
  unsigned char Next; // byte offset 8
  unsigned char Prev; // byte offset 9
  const void* pSrc; // byte offset 10
};

Summing up all members, the struct should be 14 bytes, but sizeof(StupidMemberAlignment) tells me it’s 16 bytes!

The reason is quite simple: pSrc is 32 bits wide, but it would not be aligned on a 32-bit boundary! The compiler silently inserts padding bytes to ensure correct alignment, because the ARM7TDMI target CPU doesn’t support accessing mis-aligned addresses.

Instead, the compiler generated the equivalent of:

struct StupidMemberAlignmentPadded
{
  unsigned short Attr[3]; // byte offset 0
  unsigned short Flags; // byte offset 6
  unsigned char Next; // byte offset 8
  unsigned char Prev; // byte offset 9
  unsigned char _padding0; // byte offset 10
  unsigned char _padding1; // byte offset 11
  const void* pSrc; // byte offset 12
};
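You don’t have to work the padding out by hand, since offsetof and sizeof report what the compiler actually did. Below is a minimal sketch. LayoutProbe, kMemberBytes and padding_bytes are names made up for this example, and I replaced the pointer with a std::uint32_t as a stand-in for a 32-bit pointer, so the numbers match the 32-bit ARM target even when you compile on a 64-bit PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for StupidMemberAlignment. std::uint32_t replaces the
// const void* member (assumption: pointers are 32 bits on the target).
struct LayoutProbe
{
  std::uint16_t Attr[3];
  std::uint16_t Flags;
  std::uint8_t  Next;
  std::uint8_t  Prev;
  std::uint32_t pSrc;
};

// Sum of the member sizes, i.e. the size without any padding.
constexpr std::size_t kMemberBytes =
    sizeof(std::uint16_t) * 3 + sizeof(std::uint16_t) +
    sizeof(std::uint8_t) + sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Number of padding bytes the compiler added to the struct.
inline std::size_t padding_bytes()
{
  assert(sizeof(LayoutProbe) >= kMemberBytes);
  return sizeof(LayoutProbe) - kMemberBytes;
}

// The real offsets, checked at compile time.
static_assert(offsetof(LayoutProbe, Flags) == 6, "Flags follows Attr[3]");
static_assert(offsetof(LayoutProbe, Next) == 8, "Next follows Flags");
static_assert(offsetof(LayoutProbe, Prev) == 9, "Prev follows Next");
static_assert(offsetof(LayoutProbe, pSrc) == 12, "pSrc is pushed to a 4-byte boundary");
static_assert(sizeof(LayoutProbe) == 16, "14 member bytes + 2 padding bytes");
```

Checks like these are cheap to keep next to any struct whose size matters, because an accidental layout change then breaks the build instead of silently eating memory.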

I could have specified attributes to pack the struct as well, but the compiler usually generates more and even slower code wherever objects or members of packed structs are accessed, so this was not an option for me on a 16 MHz target device. Reordering the members was the proper solution in my opinion.

Keep that in mind next time you design structs that have to be efficient!

console-dev.de update

Posted by Peter Schraut under Uncategorized

I finally managed to migrate console-dev.de to Web 2.0! I’m quite surprised how comfortable it is to use a content management system. I used to update console-dev.de with notepad + xhtml so far :-)

I imported the entire console-dev.de news as well as the regular pages into the new system. I really hope I didn’t screw up anything, which is very unlikely though. Please let me know if you find something odd and I’ll fix it asap.

Comments etc. are enabled for everyone, you just need to verify you’re human using the reCAPTCHA system (hint: enable JavaScript to post).

I noticed in several game programming related forums that most users specify speed in “pixels per frame” in their 2D games. They just add a magic number to the object position every frame and the object moves.

The problem with this approach (besides movement chaos at different frame rates) is that it is hard to understand, and it is almost impossible to find reasonable speed values without trial and error, since this unit system does not work well with our brains. In the following text, I present an approach to the ambiguous unit system problem that works well for me.

Let us take the 2D game again, where the player character moves 1.5 pixels per frame on the x-axis. To understand how fast this is, we need to translate “per frame” to something meaningful, something we use every day in real life.

Vehicles specify speed in “distance per hour”, and we are used to that. When someone tells me his car does 200 mph, I go wow; 0.000926 miles per frame (at 60 fps) does not tell me anything.

I find pixels per second (px/s) to be a good unit for 2D games. A gaming device typically runs at 60 frames per second (fps). When we use px/s, 1.5 px per frame becomes 1.5 px * 60 fps = 90 px/s.
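Both conversions are a one-liner each. Here is a small sketch. The helper names per_frame_to_per_second and per_hour_to_per_frame are made up for this example, and the frame rate is passed in explicitly because 60 fps is only an assumption about the device.

```cpp
#include <cassert>

// Converts a per-frame speed (e.g. px per frame) to a per-second speed.
inline double per_frame_to_per_second(double unitsPerFrame, double fps)
{
  assert(fps > 0.0);
  return unitsPerFrame * fps;
}

// Converts a per-hour speed (e.g. miles per hour) to a per-frame speed.
// One hour has 3600 seconds and every second has fps frames.
inline double per_hour_to_per_frame(double unitsPerHour, double fps)
{
  assert(fps > 0.0);
  return unitsPerHour / 3600.0 / fps;
}
```

per_frame_to_per_second(1.5, 60.0) gives the 90 px/s from above, while per_hour_to_per_frame(200.0, 60.0) gives a bit less than 0.001 miles per frame, which shows nicely how unintuitive “per frame” values get.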

When you specify speed as “pixels per second” in your 2D game, it becomes easier to find reasonable values. In my opinion it also makes the code more meaningful and understandable to non-programmers, which matters a lot when (character) properties are specified outside the code editor, by a level designer for example.

Fortunately, it is very simple to do that! We just need to specify speed in pixels per second and multiply it by the timestep before we add it to the object position. The timestep describes how many seconds a frame lasts. At 60 frames per second, one frame lasts 1/60 = 0.016667 seconds.

Here comes a pseudo snippet that moves an object 90px/s, when executed at 60fps:

speed = 90.0;           // 90 pixels per second.
timestep = 1.0 / 60.0;  // duration of one game frame in seconds
                        // (floating point, integer 1/60 would be 0).

// computes how many pixels to move in this (time)step
distance = speed * timestep;

// moves object by distance
object_x = object_x + distance;
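The same idea as compilable C++ could look like this. It is a minimal sketch with a fixed timestep derived from the frame rate. MovingObject, step and simulate_one_second are names made up for this example and not part of any library.

```cpp
#include <cassert>

// Minimal object that moves along the x-axis.
struct MovingObject
{
  double x;      // position in pixels
  double speed;  // speed in pixels per second
};

// Advances the object by one frame that lasts `timestep` seconds.
inline void step(MovingObject& object, double timestep)
{
  assert(timestep >= 0.0);
  object.x += object.speed * timestep;
}

// Simulates one second of game time at the given frame rate and
// returns how many pixels the object has travelled.
inline double simulate_one_second(double speed, int fps)
{
  assert(fps > 0);
  MovingObject object{0.0, speed};
  const double timestep = 1.0 / fps;
  for (int frame = 0; frame < fps; ++frame)
    step(object, timestep);
  return object.x;
}
```

With a speed of 90 px/s, simulate_one_second ends up at roughly 90 pixels for 60 fps as well as for 30 fps (give or take floating point rounding). Only the size of the individual steps changes, which is exactly why the speed value stays meaningful at different frame rates.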

You can, of course, use whatever units you want. Pixels per second is just my personal preference when I move objects around in 2D space. What matters is that you use meaningful units. If your team is fine with miles per frame, go for it.

For further reading and more advanced information on the timestep topic, please head over to “Fix Your Timestep!”.

I want to let you know about the blog of Shawn Hargreaves, who works at Microsoft on the XNA Game Studio. He blogs lots of interesting material about XNA and game development in general.

I especially like his posts about MotoGP. He often describes problems they came across during development and then explains how they solved them. Be sure to visit his blog!
