console-dev.de

Home of VsTortoise, VisualHAM, N3D and HEL Library

Archive for the ‘Programming’ Category

In my previous post, sine approximation with fixed point math, I wrote that the output precision of f32sin gets worse as the incoming radians parameter grows, and I guessed this was due to the use of fixed point.

Well, Jasper “cearn” Vijn suggested using a higher resolution fixed point format for 1.0f/(2*pi), and this does indeed solve the problem! But it does not end here!

He also created an awesome article about the ins and outs of fixed point sine approximation and presents a couple of highly precise and optimized routines.

Head over to Jasper Vijn’s document Another fast fixed-point sine approximation now!

I’ll post a corrected version of the f32sin routine from my previous post for completeness, but I highly encourage you to visit Jasper’s site and get one of his!

// Gets sin of specified radians.
// Input and output format is fixed point sign.19.12
int32 f32sin(int32 radians)
{
  const int32 PI2_PRECISION_BITS = 20;
  const int32 PI2_RECIPROCAL     = (1.0f / (2*M_PI)) * (1<<PI2_PRECISION_BITS);
 
  // Divide incoming radians by 2*PI, so it is a "normalized"
  // angle between -1 and +1, where 1 equals 2*PI (360 degree).
  int32 value = (radians * PI2_RECIPROCAL) >> PI2_PRECISION_BITS;
 
  // Now that we have our normalized angle,
  // we just discard all numbers which are not in the range -1..+1.
  // Since this is fixed point, a simple AND operation can be used.
  value &= floattof32(1)-1;
 
  // Always wrap normalized angle to -0.5(-PI)..+0.5(+PI)
  if(value > floattof32(0.5f))
    value -= floattof32(1); // subtract 2*PI in normalized form
  else
    if(value < floattof32(-0.5f))
      value += floattof32(1); // add 2*PI in normalized form
 
  // Convert normalized angle (-0.5..+0.5) back to radians (-PI..+PI)
  value = f32mul(value, floattof32(2*M_PI));
 
  // Approximate sin value
  const int32 B = floattof32(1.2732395447351f);
  const int32 C = floattof32(-0.405284734569f);
  return f32mul(B, value) + f32mul(f32mul(C, value), f32abs(value));
}
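To see why the wider reciprocal matters, here is a small host-side sketch that converts radians to a normalized angle with a reciprocal of 2*PI stored at a selectable precision. It is only an illustration, not part of the routine above: the helper name normalize_turns and the 64-bit intermediate product are my assumptions.

```c
#include <stdint.h>

/* Converts radians in sign.19.12 fixed point to "turns" (1.0 == 2*PI),
   also in sign.19.12, using a reciprocal of 2*PI that is stored with
   `bits` fractional bits. Hypothetical helper for illustration only. */
static int32_t normalize_turns(int32_t radians_fx, int bits)
{
    const double  inv_two_pi = 0.15915494309189535; /* 1 / (2*PI) */
    const int64_t recip = (int64_t)(inv_two_pi * (double)((int64_t)1 << bits));

    /* 64-bit product, so the intermediate result cannot overflow */
    return (int32_t)(((int64_t)radians_fx * recip) >> bits);
}
```

For radians = 1000 (4096000 in fixed point) the exact result would be about 651898.6. With 12 bits the reciprocal truncates to 651 and the function returns 651000, which is roughly 0.22 of a full rotation off. With 20 bits it returns 651898.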

When I was writing the Nintendo DS 4k intro, I obviously needed sin and cos functions to set up rotation matrices. Since using the standard C library math module would have blown the 4kb limit to mars, I was looking for an alternative that does not use much memory.

I was searching for an approximation method using Taylor series and came across the following sites:

The only task that remained was porting it from float to fixed point, which is rather trivial, yay!

Most amazing for me is how precise the approximation actually is. Here is a screenshot of the original sinf routine (green) and the approximated one (red) using the source code below:

[Screenshot: Sine approximation]


Well, when the incoming radians parameter gets bigger (1000 and above), precision is not so good anymore, but I guess this is rather a problem of fixed point and of the way I implemented it.

The routine from polygon labs also can’t really handle radians greater than 2*PI, which I tried to fix by normalizing the input to an angle between -1..+1 and then using a bit-AND to discard all numbers outside this range.

I wanted to keep the radians format for the incoming parameter, because it’s how the standard sinf routine works. Using another range, like 0..4096 to represent a full rotation would probably solve that precision problem with large radians because this would remove the need for the first two code lines, but I wanted 2*PI. ;-)
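For comparison, this is roughly what the 0..4096 variant could look like. It is a sketch under my own assumptions, not code from the intro: the name sin_turns is made up, and the parabola is the same one as in f32sin, just rewritten for turns. Substituting x = 2*PI*t into B*x + C*x*|x| gives 8*t - 16*t*|t|, so no PI constants are left at all.

```c
#include <stdint.h>

/* Sine of a binary angle, where 4096 units equal one full rotation.
   Returns sign.19.12 fixed point (4096 == 1.0).
   Approximation: sin(2*PI*t) ~= 8*t - 16*t*|t| for t in -0.5..+0.5 */
static int32_t sin_turns(int32_t angle)
{
    /* wrap into 0..4095, then move to -2048..2047 (-0.5..+0.5 turns) */
    int32_t t = angle & 4095;
    if (t >= 2048)
        t -= 4096;

    int32_t abs_t = t >= 0 ? t : -t;

    /* 16*t*|t| is at most 16*2048*2048, which fits into 32 bits */
    return 8 * t - (16 * t * abs_t) / 4096;
}
```

A quarter rotation (1024) returns 4096, which is 1.0, and three quarters (3072) returns -4096. Because the wrap is a plain AND on the incoming angle, large angles lose no precision.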

Here comes the source code, to be compiled with libnds and devkitARM:

// Gets sin of specified radians.
// Input and output format is fixed point sign.19.12
int32 f32sin(int32 radians)
{
  // Offset to slightly restore precision when radians is large.
  const int32 offset= f32mul(radians, floattof32(0.00147f));
 
  // Divide incoming radians by 2*PI, so it is a "normalized"
  // angle between -1 and +1, where 1 equals 2*PI (360 degree).
  int32 value = f32mul(radians + offset, floattof32(1.0f / (2*M_PI)));
 
  // Now that we have our normalized angle,
  // we just discard all numbers which are not in the range -1..+1.
  // Since this is fixed point, a simple AND operation can be used.
  value &= floattof32(1)-1;
 
  // Always wrap normalized angle to -0.5(-PI)..+0.5(+PI)
  if(value > floattof32(0.5f))
    value -= floattof32(1); // subtract 2*PI in normalized form
  else
  if(value < floattof32(-0.5f))
    value += floattof32(1); // add 2*PI in normalized form
 
  // Convert normalized angle (-0.5..+0.5) back to radians (-PI..+PI)
  value = f32mul(value, floattof32(2*M_PI));
 
  // Approximate sin value
  const int32 B = floattof32(1.2732395447351f);
  const int32 C = floattof32(-0.405284734569f);
  return f32mul(B, value) + f32mul(f32mul(C, value), f32abs(value));
}
 
inline int32 f32abs(int32 a)
{
  return a >= 0 ? a : -a;
}
 
// multiplies two fixed point numbers.
// since the result is shifted rather than the two
// operands, make sure a and b are small.
inline int32 f32mul(int32 a, int32 b)
{
  return (a*b)>>12;
}

Let me know when you have fixed the precision problem with large radian values :-)

Single language vs. multi-language

Posted by Peter Schraut under GBA, NDS, Programming

Introduction

During my daily forum visits, I came across a thread about how to create a game in more than one language.

This reminded me of the first multi-language game I worked on. We ran into several issues back then and I want to share my experience with you, so you don’t make the same mistakes.

When you create a multi-language game, several things change that would be static otherwise. Here is a list of a few things to consider:

  • You can’t hard-code every text, because you’re usually not the person who translates text.
  • You can’t assume texts have same lengths in different languages.
  • Prepare your GUI to word-/line-break and squeeze text.
  • Different languages might imply different character ranges (char, wchar, utf-8, choose which one to support before you start)
  • Different languages might imply different font definitions
  • Different languages might imply different graphics: title logo, textures with baked-in text, etc
  • Different languages might imply different sounds: voice output
  • Being able to change language in the options menu, in case it is required.
  • Being able to change language at runtime during development.
  • Testing time increases with more languages
  • You might need to communicate with people who don’t speak your language (bug reports etc)
  • Several persons involved in development have more work to do: programmer, translator, quality assurance, (artist, sound artist maybe)

Localization from a programmer’s perspective

If text is available in more than one language, it must not be hard-coded in the game code.

Functionality to retrieve texts, depending on the current language setting is essential. It can be something simple like a function that expects an identifier that represents the text in question and returns its text or a database query.

We decided to edit texts in Microsoft Excel, because it’s standard in many offices and we didn’t want to force translators to use some unknown custom in-house tool that they might not understand and that we would have to support.

The spreadsheet structure looked like:

ID             | EN                | DE          | DESC
---------------+-------------------+-------------+---------------
helloworld     | HELLO WORLD       | HALLO WELT  | Displayed in the welcome screen
goodday        | GOOD DAY          | GUTEN TAG   | Displayed when the player confirms the welcome screen
...

ID represents the text identifier that is used to query the translated text.

EN and DE are the English and German translations.

DESC is an optional field that describes the purpose of this text. It can be quite challenging to find a good translation when you don’t know the context of the text, that’s what DESC should solve.

Localization, 1st iteration

Our first language system iteration worked like this:

  • Create texts in Microsoft Excel and save as XML
  • Custom tool to convert XML to our text format
  • Custom tool outputs .cpp file with texts as array
  • Custom tool outputs .h file with identifier #define’s that index into the text arrays

In order to get the translated text, we had a function that expected one of the generated identifiers and returned the corresponding text. The text system looked like this:

language.h file:

#ifndef __language_h__
#define __language_h__
 
#define TID_helloworld  0
#define TID_goodday     1
#define TID__MAX        2
 
#define LANG_EN  0
#define LANG_DE  1
 
const char* GetText(unsigned int textId);
#endif // __language_h__

language.cpp file:

#include "language.h"
#include <assert.h>
 
// initial language is english
int CurrentLanguage=LANG_EN;
 
// german texts
const char* const TEXT_DE[]=
{
  "HALLO WELT",
  "GUTEN TAG"
};
 
// english texts
const char* const TEXT_EN[]=
{
  "HELLO WORLD",
  "GOOD DAY"
};
 
// get text of the specified text identifier
const char* GetText(unsigned int textId)
{
  assert(textId < TID__MAX); // invalid text id
 
  switch(CurrentLanguage)
  {
    case LANG_DE:
      return TEXT_DE[textId];
 
    default:
      break;
  }
 
  return TEXT_EN[textId];
}

Once we worked with it for a while, we realised updating the language file is a time killer.

Every time we modified text and exported, the header file changed, so all source files that include language.h were recompiled due to header dependencies. This was a huge problem for us: nobody wanted to wait several minutes for a recompile, only because someone fixed a typo in the translation or added a new text.

Localization, 2nd iteration

What we learned from the 1st approach is no matter if we change, add or even remove text, it must not have a significant influence on compile times.

Rather than using generated identifiers, we used crc32’s of string literals to identify texts. This completely removed the header dependency / recompile problem! Our text system now worked like:

  • Create texts in Microsoft Excel and save as XML
  • Custom tool to convert XML to our text format, verify that all text identifiers generate unique checksums
  • Custom tool outputs .cpp file with texts as array and checksum lookup table

In order to get a translated text, GetText() now expects a string literal as id, generates a crc32 of the incoming id, performs a binary search on the checksum table and then uses the lookup position to index into the language array.

This even allowed us to return the text id when the translated text was not found, so the tester could add the text id of the missing translation to the bug report. It also allowed us to switch at runtime to displaying the id rather than the text (“uh, what text id is displayed here?” belongs to the past).

The new text system was a bit slower than the 1st iteration, but it had no significant influence on the overall runtime performance.

We worried more about being able to have typos in text id’s, as those are strings and located in game code now, but this was never really a problem I think.

We added text id’s to the Excel file first, then always copy/pasted text id’s from Excel to game code. However, we couldn’t be 100% certain that all possible missing texts were hunted down by the QA team. We still needed a verification system that does this automatically and 100% reliably, but more on this later.

Due to the additional checksum lookup table and the string literal id’s in the code itself, the game also requires more memory.

Most development systems feature some kind of debug memory (eg no$gba has an option to emulate 8MB rather than 4MB main memory), so at least during development this is not a problem. More on this later, again.

language.h looked like this:

#pragma once
 
enum Languages
{
  LANG_EN,
  LANG_DE,
};
 
const char* GetText(const char* textId);

We changed from #defines to enum’s too, because those are much more debug friendly.

language.cpp looked like this:

#include "language.h"
 
// initial language is english
Languages CurrentLanguage=LANG_EN;
 
// crc32 checksums / hashes of text id's
// in sorted ascending order
const unsigned int TEXT_IDs[]=
{
  12345, // text id: helloworld
  23456, // text id: goodday
};
 
// german texts
const char* const TEXT_DE[]=
{
  "HALLO WELT",
  "GUTEN TAG"
};
 
// english texts
const char* const TEXT_EN[]=
{
  "HELLO WORLD",
  "GOOD DAY"
};
 
// performs a binary search on the TEXT_IDs array
// and returns the index where the hash is located, or
// -1 when it could not be found.
int FindTextIndex(unsigned int hash)
{
  int left  = 0;
  int right = (sizeof(TEXT_IDs) / sizeof(TEXT_IDs[0])) - 1;
 
  while (left <= right)
  {
    int index = (left + right) / 2;
    if (hash == TEXT_IDs[index])
      return index; // hash found, leave!
 
    if (hash > TEXT_IDs[index])
      left = index + 1;
    else
      right = index - 1;
  }
 
  return -1; // hash not found
}
 
// get text of the specified text identifier,
// returns the textId if text could not be found
const char* GetText(const char* textId)
{
  // generate checksum / hash of incoming text id
  unsigned int hash = CalcCRC32(textId);
 
  // search for the checksum / hash in our TEXT_IDs array
  int index= FindTextIndex(hash);
  if(index == -1)
  {
    // text not found, return the id instead!
    return textId;
  }
 
  switch(CurrentLanguage)
  {
    case LANG_DE:
      return TEXT_DE[index];
 
    default:
      break;
  }
 
  return TEXT_EN[index];
}
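CalcCRC32 is not shown in the listing above. As a rough idea of what it could look like, here is a minimal bitwise CRC-32. This is my assumption of a suitable implementation (standard reflected polynomial 0xEDB88320), not necessarily the one our tool and game used. Whatever you pick, the text tool and the game must use the exact same function.

```c
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) of a
   null-terminated string. Slow but tiny; a table-driven
   version is faster if lookups ever show up in a profile. */
uint32_t CalcCRC32(const char* text)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (; *text != '\0'; ++text)
    {
        crc ^= (uint8_t)*text;

        /* process one bit at a time, lowest bit first */
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }

    return crc ^ 0xFFFFFFFFu;
}
```

The usual sanity check for this CRC variant is the string "123456789", which must produce 0xCBF43926.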

Localization, 3rd iteration

The 2nd iteration is not bad, but as the project came along, new requirements did pop up.

We not only needed to display texts in different languages, we also had to display different title logos. At first we just hacked the game code to load different resources depending on the language setting.

But we should have known beforehand that this would not fulfil the artist’s vision. So before we cluttered the whole game code, we decided this should be handled automatically, without any action from a programmer, and it was easier than we thought.

We already used some sort of file archive, where all game content is stored, to load files from. Think of it as a zip archive. In order to load language dependent resources, all we had to do was to support more than one file archive and prioritize them. When the game requests a file, the file archives are searched by priority.

We added an additional file archive with German content and gave it a high priority. When the game requested “title.bmp”, the German archive was searched first. If the resource could not be found, the next archive was consulted. This allowed us to add language dependent resources without any programmer work!
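The lookup by priority can be sketched in a few lines. The Archive struct and FindArchive below are hypothetical stand-ins I made up for this post; a real archive would of course use its own directory structure rather than a plain list of names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a file archive. */
typedef struct Archive
{
    const char*        name;
    const char* const* files;     /* file names stored in the archive */
    size_t             fileCount;
} Archive;

/* Searches the archives in priority order (index 0 first) and
   returns the first archive that contains fileName, or NULL. */
const Archive* FindArchive(const Archive* archives, size_t archiveCount,
                           const char* fileName)
{
    for (size_t a = 0; a < archiveCount; ++a)
        for (size_t f = 0; f < archives[a].fileCount; ++f)
            if (strcmp(archives[a].files[f], fileName) == 0)
                return &archives[a];

    return NULL; /* not found in any archive */
}
```

With the German archive at index 0 and the base archive at index 1, a request for “title.bmp” resolves to the German archive, while any file that only exists in the base archive falls through to it.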

Localization, 4th iteration

Having all languages in main memory is quite a waste, at least on systems that don’t feature hundreds of megabytes. In my experience, non-text-heavy games contain about 700 texts, whereas edutainment games can contain thousands.

If every text were 50 chars long and 700 texts are available, that’s 50*700 = 35000 chars, which in ASCII is about 34kb for one language! 5 languages sum up to 34kb*5 = 170kb.

This is more than 4% of the Nintendo DS main memory, only for text! Spending that much precious memory on text is not really an option if you could use the roughly 136kb wasted on the four inactive languages for a larger level, more sounds or more textures instead.

On memory limited systems it makes sense to have one language in memory only, namely the current language. However, this comes with several things to consider:

  • Memory consumption is different for different languages.
  • GetText() returns different pointers for different languages.

Different memory consumption is a huge problem. It’s unacceptable if some levels don’t load when German is activated, because the German texts consume 2kb more memory than the English ones. It also makes it impossible to replace entire language files on-the-fly, for example when changing the language setting in the options, without some delete/new mechanism involved.

Furthermore, it’s quirky when GetText() returns a different pointer for every language, because GUI widgets can no longer store pointers to texts: they would point to arbitrary memory once the language setting changes.

The secret is always try to keep memory consumption as static as possible! Our custom text tool compared texts of every language and padded shorter texts of different languages with zero-bytes to consume as much space as the longest text, for example:

You won.000000000
Du hast gewonnen.
 
Hello World
Hallo Welt0

Where “0” represents the padding 0x00 byte.

This makes sure that:

  • every language file has the exact same size.
  • offsets to texts inside the language file are always the same.

This approach makes it possible to allocate memory for the language file once and then keep working with that buffer, because the size never changes between language files of the same category. You can load other language files into this buffer and the text system still works.

When text is located in a language file, we also no longer have the const char* overhead from our text arrays, just make sure to null-terminate every text!
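The padding step of the text tool could look something like this. It is a sketch with made-up helper names (SlotSize, WritePadded), not our actual tool code, and it assumes every slot includes one extra byte for the null terminator.

```c
#include <stddef.h>
#include <string.h>

/* Size of one text slot: the longest translation of this text
   plus one byte for the null terminator. `translations` holds
   the same text in every language. Hypothetical helper. */
size_t SlotSize(const char* const* translations, size_t languageCount)
{
    size_t longest = 0;
    for (size_t i = 0; i < languageCount; ++i)
    {
        size_t length = strlen(translations[i]);
        if (length > longest)
            longest = length;
    }
    return longest + 1;
}

/* Writes text into a slot of slotSize bytes and fills the rest
   with 0x00, so every language file ends up with the same layout.
   Text that is too long for the slot is cut off. */
void WritePadded(char* slot, size_t slotSize, const char* text)
{
    size_t length = strlen(text);
    if (length >= slotSize)
        length = slotSize - 1;

    memset(slot, 0, slotSize);
    memcpy(slot, text, length);
}
```

For “You won.” and “Du hast gewonnen.” the slot is 18 bytes: 17 for the longer German text plus the terminator. Since the slot size only depends on the text id, the offset of every text is identical in all language files.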

Localization, 5th iteration

The additional memory footprint introduced with the 2nd iteration bothered us. We wanted to spend that memory on textures rather than text, and the solution is very simple again.

We supported a hybrid system of the 1st and 2nd iteration. The 2nd iteration was perfect for development purposes, as it does not require much recompile, but comes with higher memory footprint.

The 1st iteration on the other hand is horrible during development, because of the recompile times, but does not require any additional data (crc table) and is lightning-fast.

Instead of using string literals for the text identifier directly, we wrapped them in a TXT macro. The debug build stringified the incoming parameter, whereas the release build concatenated it to create an identifier:

#if _DEBUG
  #define TXT(id) #id
#else
  #define TXT(id) TID_##id
#endif

It was used like this:

const char* text = GetText(TXT(helloworld));

The debug build replaced it with:

const char* text = GetText("helloworld");

and the release build with:

const char* text = GetText(TID_helloworld);

where TID_helloworld is the generated #define identifier of our custom text tool, as shown in 1st iteration.

We used the 2nd approach for debug builds and the 1st approach for release builds. When you switch between debug and release builds, you need to do a recompile anyway, so using the 1st does not hurt.

And at this point I can also resolve the “more on it later..” note from the 2nd iteration paragraph.

Because we used the 1st approach in release builds, we could catch all invalid text identifiers at compile time and had no memory overhead anymore, yay! Supporting both systems is also not really problematic in my opinion, since they’re pretty similar and not complicated anyway.

Conclusion

Creating a multi-language game comes with a couple of new tasks, don’t underestimate it! :-)

#include “data.c” revised

Posted by Peter Schraut under NDS, Programming

Introduction

In the palib homebrew community (Nintendo DS homebrew development), #include “data.c” is a daily occurrence. Several people advise against this approach, because it’s “bad practice”.

In this article, I’m not so much trying to convince you about the evilness of including data, but I’m telling you when this approach is appropriate, when it isn’t and why. The article assumes you use devkitARM to build your .nds files.

History

I guess the palib author adopted this approach from the early HAM days. Around 2001-2005, HAM was a very popular free software development solution for the Nintendo Game Boy Advance, including almost everything you would need to create games for this lovely device.

The example projects in HAM used to integrate resources (graphics, sounds, etc…) by converting resources to C source code arrays. The generated files were #included in the example source file (main.c).

The intention behind this approach makes sense for the Game Boy Advance. GBA games are shipped on Game Paks that consist of Read-Only-Memory (ROM) containing the program code (.text) and data (.rodata). When you declare data as const, it is placed in ROM rather than in RAM. In order to load graphics to video ram, you could transfer image data directly from ROM to VRAM, without the need to have it in main memory.

What’s the deal with that?

The Nintendo DS on the other hand does not feature this ROM section in the traditional way anymore, everything (code, data) is located in the 4MB main memory region. When you #include data in your program code, memory for it is always allocated and present in main memory, even when you don’t need it!

Let’s take a simple example to clarify: Imagine you create a platform game with several levels. Each level features a different set of graphics (sunny, snowy, rainy). If you #include the graphics in program code, memory for every single graphic of every level is allocated in main memory, no matter which level is being played. So you would have the graphics of level 1 (sunny) in main memory, even when you play level 2 (snowy). It might work for the first 1 1/2 graphic sets, but eventually exceeds the main memory capacity of 4MB.

As you already know, program code is located in main memory too, so you can’t even count on the full 4MB for resources. In my experience, real-world projects use about 400-600 KB for program code, leaving you with only about 3.5MB for internal game management and resources.

#include or not #include, that’s the question

When you aim for a game of decent quality, you can’t #include data as this would blow up memory; you have to load resources from the file system. To be more precise, you load only those resources into memory that you actually need in the particular level (file i/o handling in nds homebrew).

However, sometimes it’s necessary to have access to resources all the time, for example to display on-screen debug text. For such “low-level-system”, you want to have the resource in memory, because what happens when loading of the debug-font resource would fail?! You couldn’t even display an error at this time, because the font is missing! For this special case, I suggest you either convert the resource in question to an object file (.o) and link this one, or convert to assembler code and assemble as well as link, but don’t #include.

If you generate C code, rather than assembler, the C compiler must parse the code, then generate the corresponding assembler code, that is being transformed to object code afterwards. When you don’t like waiting for your compiler to complete, choose .o or .s as target!

Why did I just say “don’t #include”?

When you create a separate source file that contains data and store it in your “source” directory, it’s compiled only when the target object file (.o) is missing or older than the source file. As for the debug font resource, this would happen once or at least very rarely. After the source file has been compiled to an object file, it only needs to be passed to the linker when you build the project. This removes the need to compile the file again every time.

When you #include data in a file you are currently working on, the compiler must parse the #included data every time you change the code and recompile. Performance-wise it isn’t much of a problem anymore, since today’s computers have several GB of memory, multi-core processors and relatively fast hard drives, but it slows down the compile process anyway, which adds up as your project grows.

I think the error that occurs most often when dealing with #include data is multiple definitions of the same symbol. This occurs when you include a file in more than one source file, there are dozens of these posts on the palib forum!
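The usual way around the multiple definition error is to keep only declarations in a header and the definition in exactly one source file. The following sketch shows both parts in one listing; the file names, the symbol names and the four data bytes are placeholders I made up.

```c
/* debugfont.h (hypothetical): declarations only, safe to
   #include from any number of source files. */
extern const unsigned char DebugFontData[];
extern const unsigned int  DebugFontSize;

/* debugfont.c (hypothetical): the one and only definition.
   The bytes are placeholder values, not a real font. */
const unsigned char DebugFontData[] = { 0x00, 0x7e, 0x42, 0x7e };
const unsigned int  DebugFontSize   = sizeof(DebugFontData);
```

Every source file that needs the font includes debugfont.h, and the linker finds the single definition in debugfont.o.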

How to embed data

You want to embed “debugfont.bmp” in your game, so it’s always in memory. How do you do that? It’s simple! Use your favorite graphics converter to output a “binary” version of the converted data, let’s call it “output.bin”. In order to embed output.bin, all you have to do is move it to the “build” directory of your project. The example makefiles of libnds/devkitARM include a target that takes every .bin file in the “build” directory and converts, compiles and links it.

If you want to do it by hand, use devkitARM/bin/bin2s.exe to convert the .bin file to assembler code and move the output file to your “source” directory. Next time you build the project, the file gets compiled and linked.

Conclusion

  • You can’t complete a game project that contains more than about 3.5MB of resources by #include’ing data.
  • Sometimes you want to embed data, eg the graphic for debug text. In that case, compile and link the file.
  • #include data is error-prone, multiple definition of…
  • #include data slows down compile process which can become significant as the project grows.


Introduction

iDeaS is a Nintendo DS software emulator available for Linux and Microsoft Windows. Since version 1.0.2.8 (21 Dec 2008) iDeaS features program breakpoints and user messages that can be sent to the debug console:

# Added program breakpoint (SWI #0xFDFDFD).
# Added output on console for user’s messages (SWI #0xFCFCFC).

My alarm bells started ringing the moment I saw the changelog. Both features have been implemented using software interrupts that do not exist on the target hardware.

What does it mean? It means those software interrupt ID’s make your NDS application incompatible with the NDS hardware. Whenever you accidentally print an iDeaS debug message, the application will crash on actual hardware.

no$gba implemented the breakpoint and debug message facility quite excellently. In no$gba a breakpoint is just a mov r11, r11 instruction (a nop), which the hardware can execute without any problem. The no$gba debug message system is a bit more complicated. It uses a combination of harmless instructions to detect that you want to print text. However, all instructions used for that feature work on hardware as well. This is what we want, this is how it should be implemented.

The current scenario

Shortly after the release I contacted the iDeaS author and told him the current approach isn’t top notch and should be changed so that it won’t break program execution on actual hardware. Unfortunately he didn’t see the advantage, since you can:

  • Remove all prints before you test on hardware

Obviously, this cannot be the way to handle it. Manually removing all prints would be enormously time consuming and error-prone.

  • Use some sort of #ifdef blocks to automatically remove all prints

This might sound like a solid idea at first glance, but thinking about it a minute longer proves it isn’t. The problem is simple: it’s too time consuming. When you use the pre-processor to remove all print calls, you have to rebuild the entire project every time you change the corresponding pre-processor switch. Depending on how many files the project has, a rebuild can take a significant amount of time, which is no option for many people.

  • Wrap print calls with some enabled/disabled mechanism

A function that wraps print calls with a surrounding if block, removes the need to rebuild the project. Just disable debug output at program initialization and there you go. However, I tend to forget things that are of no importance for me and this is something I would forget many times!

How to solve the puzzle

We developers want a debug text system that will not break program execution, no matter if the application runs in an emulator or on actual hardware. All those previous points do make some sense, but don’t solve the problem as a whole. What we have to create is a system that:

  • Is able to print text to the debug console
  • Can be switched on/off at runtime
  • Automatically detects when it runs on hardware and discards all print calls in this case
  • Removes all print calls when building a “release version” (pre-processor)

Everything on the list should be quite clear, except for the hardware detection. When I had the requirements list done and was wondering how to implement it, I remembered this post at gbadev.org. They used a hardware feature that no emulator seems to emulate correctly, instruction fetching:

detectGBA: ;returns 0 if emulated
mov r0,#0
str r0,_0
_0: mov r0,#1
mov pc,lr

The str r0, _0 instruction overwrites the following instruction, where r0 is set to 1. However, on real hardware the instruction would be fetched already and the change has no effect. Basically the memory at this address is being overwritten, but the instruction pipeline fetched the instruction before, so the original instruction is used. When you call the function more than once, it won’t work correctly anymore.

So I slightly changed the code to restore the original instruction to be able to call the function more than once. I also made it arm and thumb compatible. Here is the entire source code to print text to iDeaS debug console, that detects real hardware and discards all prints at runtime. You can also use a pre-processor #define to remove all print calls when you build a “release version”.

ideas.h

#ifndef __ideas_h__
#define __ideas_h__
 
#if __cplusplus
  extern "C" {
#endif
 
#if !_RELEASE
 
int IdeasEnableDebugOutput(int enable);
int IdeasOutputDebugString(const char* format, ...);
 
#else
 
// in release mode use empty functions, so the compiler
// optimizes any call to them away.
static inline int IdeasEnableDebugOutput(int enable) { (void)enable; return 0; }
static inline int IdeasOutputDebugString(const char* format, ...) { (void)format; return 0; }
 
#endif // _RELEASE
 
int IdeasIsEmulator();
 
#if __cplusplus
  } // extern "C"
#endif
 
#endif // __ideas_h__

ideas.c

#include "ideas.h"
#include <stdio.h>
#include <stdarg.h>
 
#if !_RELEASE
 
static int IdeasDebugOutputEnabled=1;
 
// Gets if the program runs in an emulator.
__attribute__ ((noinline))
int IdeasIsEmulator()
{
// The idea behind the code is to overwrite
// the "mov r0, #0" instruction with "mov r0, r0" (NOP).
// On real hardware, the instruction would have been fetched,
// so the overwrite has no effect for the first time executed.
// In order to be able to call the function more than once,
// the original instruction is being restored.
// see http://forum.gbadev.org/viewtopic.php?t=910
#ifdef __thumb__
  int mov_r0_r0 = 0x1c00;   // mov r0, r0
  int mov_r0_0  = 0x2000;   // mov r0, #0
 
  asm volatile (
    "mov  r0, %1     \n\t"  // r0 = mov_r0_r0
    "mov  r2, %2     \n\t"  // r2 = mov_r0_0
    "mov  r1, pc     \n\t"  // r1 = program counter
    "strh r0, [r1]   \n\t"  // Overwrites following instruction with mov_r0_r0
    "mov  r0, #0     \n\t"  // r0 = 0
    "strh r2, [r1]   \n\t"  // Restore previous instruction
    : "=r"(mov_r0_r0)       // output registers
    : "r"(mov_r0_r0), "r"(mov_r0_0)    // input registers
    : "%r1","%r2"           // clobbered registers
    );
 
#else
  int mov_r0_r0 = 0xe1a00000; // mov r0, r0
  int mov_r0_0  = 0xe3a00000; // mov r0, #0
 
  asm volatile (
    "mov  r0, %1     \n\t"  // r0 = mov_r0_r0
    "mov  r2, %2     \n\t"  // r2 = mov_r0_0
    "mov  r1, pc     \n\t"  // r1 = program counter
    "str  r0, [r1]   \n\t"  // Overwrites following instruction with mov_r0_r0
    "mov  r0, #0     \n\t"  // r0 = 0
    "str  r2, [r1]   \n\t"  // Restore previous instruction
    : "=r"(mov_r0_r0)       // output registers
    : "r"(mov_r0_r0), "r"(mov_r0_0)    // input registers
    : "%r1","%r2"           // clobbered registers
    );
#endif
 
  return mov_r0_r0 != 0;
}
 
// This function must be noinline, because
// iDeaS expects the text to output in register r0.
// If this code is inlined somewhere, it's not guaranteed
// that text is located in r0 anymore, thus will not work.
static __attribute__ ((noinline))
void IdeasOutputDebugStringInternal(const char* text)
{
#ifdef __thumb__
  asm volatile ("swi #0xfc");
#else
  asm volatile ("swi #0xfc000");
#endif
}
 
 
// Prints formatted output to the iDeaS debug console
// Returns false when text has not been printed, true otherwise.
int IdeasOutputDebugString(const char* format, ...)
{
  va_list args;
  char    buffer[128]; // increase to support more characters
 
  if(!IdeasDebugOutputEnabled || !IdeasIsEmulator())
    return 0;
 
  va_start(args, format); // va_start takes the last named parameter
  vsnprintf(buffer, sizeof(buffer), format, args);
  va_end(args);
 
  IdeasOutputDebugStringInternal(buffer);
  return 1;
}
 
// Enables or disables debug output.
// Returns the previous enabled state.
int IdeasEnableDebugOutput(int enable)
{
  int old = IdeasDebugOutputEnabled;
  IdeasDebugOutputEnabled = enable;
  return old;
}
 
#endif // !_RELEASE

Download the files here.

file i/o handling in nds homebrew

Posted by Peter Schraut under NDS, Programming

One problem in many nds homebrew games is improper handling of missing assets and improper handling when user content could not be created.

A commercial game that is shipped on its own cartridge can rely on all game assets being available, because there is no way to delete files from it. In this case, you don’t need to check whether fopen or fread succeeded, because they always do.

With nds homebrew it’s a whole different story. The user of your game has to copy the game assets and the .nds executable itself to his/her flashcard. Anything can go wrong during this process. Besides missing files, there could also be a flashcard compatibility problem, such as an invalid DLDI driver. In these cases, many games just die silently, rather than displaying an error message to let the user know what happened.

You often find file loading code such as (don’t copy!):

uint8_t* LoadFile(const char* filename)
{
  FILE* file = fopen(filename, "rb");
  fseek(file, 0, SEEK_END);
  size_t size = ftell(file);
  rewind(file);
  uint8_t* buffer = (uint8_t*)malloc(size);
  fread(buffer, size, sizeof(uint8_t), file);
  fclose(file);
  return buffer;
}

All those functions have return values, which must not be ignored. If any function call fails here, it’s very unlikely the program will continue to operate correctly.
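
For comparison, here is a minimal sketch of a loader that checks every return value. LoadFileChecked and its outSize parameter are hypothetical names chosen for this illustration; how the caller reports the failure to the user is left open.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Sketch of a checked loader. Returns a malloc'ed buffer (release it with
// free) and stores its size in *outSize, or returns NULL if any step fails.
std::uint8_t* LoadFileChecked(const char* filename, std::size_t* outSize)
{
  FILE* file = std::fopen(filename, "rb");
  if(file == NULL)
    return NULL;                      // file missing or not accessible

  std::uint8_t* buffer = NULL;
  long size = -1;

  if(std::fseek(file, 0, SEEK_END) == 0)
    size = std::ftell(file);          // ftell returns -1 on failure

  if(size >= 0 && std::fseek(file, 0, SEEK_SET) == 0)
  {
    // malloc(0) may return NULL, so always request at least one byte
    buffer = (std::uint8_t*)std::malloc(size > 0 ? (std::size_t)size : 1);
    if(buffer != NULL &&
       std::fread(buffer, 1, (std::size_t)size, file) != (std::size_t)size)
    {
      std::free(buffer);              // short read: treat as failure
      buffer = NULL;
    }
  }

  std::fclose(file);
  if(buffer != NULL)
    *outSize = (std::size_t)size;
  return buffer;
}
```

A NULL result is the caller's cue to display an error message instead of continuing with missing data.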

We have to find a better way to ensure loading either works “in all cases” or does not work at all. The very first thing that comes to mind is to use some sort of file archive. If you bundle all assets in one archive, there is a good chance everything is available, because individual files cannot be deleted from it. At application startup, test if the archive is available, open it and read a test file to confirm the system works. If it does not, display a human-readable error message; numerical error codes don’t count.

Another critical part is deployment. If there is more than one file to deploy, there is also more than one chance to fail. So it would be good if the user needs to deploy only one file for the entire game, rather than the .nds file and the archive separately. This can be done by appending the archive to the .nds file!

I’ve created my own tool set for all this, which I won’t share, but there is also a free public solution called Embedded File System Library. I’ve never used EFS, but it should do exactly what we need if we trust its documentation. If we use this approach, an archive appended to the .nds file, we can be pretty sure assets are available as long as the appended archive is available. Only two problems are left:

  • Incompatible DLDI driver.

To detect an incompatible DLDI driver, check the return value of fatInitDefault. This function is part of libfat. If it returns false, display an appropriate message and make the user aware it could be due to an incompatible DLDI driver. This gives him/her a chance to solve the problem rather than just being frustrated, deleting your game and sending you hate-mails.

  • User aborted the copy process before the whole file was written to the flashcard.

Whether the whole file was written to the flashcard can be checked by appending a magic value to the end of the archive. At application startup, seek to this position, read the value and compare it with what it should be. If it’s different, display a warning that the attached data seems to be corrupted. You could let the user continue with the game; he/she will then know what could have caused the problem if the game eventually crashes.

What I do to ensure file i/o works

Create an archive of all assets, append it to the .nds file as well as a magic value.

The magic value can be a string like “I FEEL GOOD” at the end of the file. At application startup, initialize libfat and respond to its return value. Display an error message when fatInitDefault failed.

Open the application file to which the archive is appended. Check if this operation succeeded and display an error message if anything went wrong.

Seek to and read the “I FEEL GOOD” magic value. If it’s different, display a “data is corrupted” warning. Open and read a file from the archive to verify the system works, display an error if it fails.

From now on assume the file i/o system will work and continue with further initialization.

If all checks take longer than a few milliseconds, display a “Please wait, initializing file system…” message first.
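
The first step above, appending the magic value, happens on the development machine as part of the build. A minimal sketch of such a step could look like this. AppendMagicValue is a hypothetical helper, not part of any existing tool, and it assumes the archive has already been appended to the file.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical build-step helper: appends the magic value to the end of a
// file (for example the .nds file with the archive already appended).
// Note that mode "ab" creates the file if it does not exist yet.
// Returns true if every byte was written and the file closed cleanly.
bool AppendMagicValue(const char* filename, const char* magic)
{
  FILE* file = std::fopen(filename, "ab");    // "ab": all writes go to the end
  if(file == NULL)
    return false;

  const std::size_t length = std::strlen(magic);
  const bool written = std::fwrite(magic, 1, length, file) == length;
  const bool closed  = std::fclose(file) == 0; // flush errors surface here
  return written && closed;
}
```

Because the magic value is the very last thing written, a copy that was aborted early will not contain it.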

Source code snippet to verify that “I FEEL GOOD” is located at the end of the program file:

#include <stdio.h>
#include <string.h>

// Opens the file specified by filename and checks whether
// "I FEEL GOOD" is located at the end of the file.
// Returns true on success, false otherwise. If the return value
// is false, further error information is stored in errorString.
bool VerifyProgramFile(const char* filename, char* errorString)
{
  const char* const MAGICVALUE="I FEEL GOOD";
 
  // open file
  FILE *file = fopen(filename, "rb");
  if(file == NULL)
  {
    sprintf(errorString, "Cannot open file '%s'.", filename);
    return false;
  }
 
  // seek to magic value offset
  // (this also fails if the file is shorter than the magic value)
  long fileSize = -1;
  if(fseek(file, 0, SEEK_END) == 0)
    fileSize = ftell(file);
  if(fileSize < (long)strlen(MAGICVALUE) ||
     fseek(file, fileSize - (long)strlen(MAGICVALUE), SEEK_SET) != 0)
  {
    // could not seek to magic
    sprintf(errorString, "Data in file '%s' seems to be corrupted.", filename);
    fclose(file);
    return false;
  }
 
  // read magic value
  char magicValue[64];
  if(fread(magicValue, sizeof(char), strlen(MAGICVALUE), file) != strlen(MAGICVALUE))
  {
    // could not read magic
    sprintf(errorString, "Data in file '%s' seems to be corrupted.", filename);
    fclose(file);
    return false;
  }
 
  // compare magic value
  if(memcmp(magicValue, MAGICVALUE, strlen(MAGICVALUE)) != 0)
  {
    // magic is different
    sprintf(errorString, "Data in file '%s' seems to be corrupted.", filename);
    fclose(file);
    return false;
  }
 
  // all tests successfully passed
  fclose(file);
  return true;
}

Source code snippet of application startup code:

int main()
{
  if(!fatInitDefault())
  {
    DisplayMessage("libfat file system initialization failed. Did you apply the correct DLDI driver?");
    Halt();
  }
 
  char errorString[1024];
  if(!VerifyProgramFile(pathToFileArchive, errorString))
  {
    DisplayMessage(errorString);
    WaitForUserConfirmation();
  }
 
  // open archive and test if a file
  // can be read from the archive. if all tests
  // succeed, continue with application initialization...
 
  return 0;
}

Nintendo DS 4k Intro

Posted by Peter Schraut under NDS, Programming

Over Easter I followed the breakpoint demo party via live stream and was especially interested in the 4k intro competition. That motivated me to try whether it is possible to create an application that draws a simple quad in 4096 bytes on the Nintendo DS, using the devkitARM tool-chain.

However, even compiling a .c file containing only the application entry point, without any external libraries, already creates a .nds file that is 54848 bytes:

int main(void)
{
  return 0;
}

After fiddling around for “some” time, I had a .nds file that shows not just a simple quad, but hundreds of lit, textured cubes forming a tunnel that the camera flies through, with a fullscreen distortion effect, in less than 4096 bytes.

Once we had released it at pouet.net, I figured from the comments that I could have shrunk the file size by a further 480 bytes just by compressing it, I feel so dumb! ;)

Obviously it cannot compete with the breakpoint 4k entries, but I felt good that I was able to do it anyway, especially because it seems to be the world’s first 4k intro for the Nintendo DS! Unfortunately, it features no music.

You can download the binary at: http://pouet.net/prod.php?which=53081

Continuing my journey to show what 32bit arm instructions devkitARM generates for particular C++ source code snippets, I thought it’s a good idea to compare classes and structs.

It’s not about C++ classes and C structs, but C++ classes and C++ structs!

I keep reading in several Nintendo DS homebrew related forums that using classes rather than structs in C++ has a dramatic impact on performance, and people recommend not to do this.

Is there really a performance problem? Let’s find out!

The following C++ code was compiled with devkitARM r25 using optimization level -O1, no runtime type information and no exceptions.

What is the difference between a class and struct in C++?

Actually, the only difference is the default access of members:

  • Default access of members in a class is private.
  • Default access of members in a structure or union is public.
  • Default access of a base class is private for classes and public for structures. Unions cannot have base classes.

When you don’t specify any access specifier (public, protected, private), members of a class have private access by default, while members of a struct have public access.

Both classes and structs can have member functions, virtual member functions, special member functions (ctor, dtor, …), operators, inheritance and so on.
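
The default access rule can be illustrated with a small sketch. PointStruct, PointClass and ReadBoth are hypothetical names used only for this example.

```cpp
struct PointStruct
{
  int x;          // public by default
};

class PointClass
{
  int x;          // private by default
public:
  PointClass() : x(0) {}
  void SetX(int value) { x = value; }
  int  GetX() const    { return x; }
};

int ReadBoth()
{
  PointStruct s;
  s.x = 2;        // fine: the member is public

  PointClass c;
  // c.x = 2;     // would not compile: 'x' is private in PointClass
  c.SetX(2);      // access goes through the public member functions
  return s.x + c.GetX();
}
```

Both types hold a single int, so the objects have the same size; only the access rules checked by the compiler differ.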

Member variable access of a class vs struct

Let’s take a look at what 32bit arm instructions are generated for the following C++ code. Since classes and structs are technically the same, the compiler should output identical code:

struct MyStruct
{
  int x;
};
 
void SetMyStruct(MyStruct *p)
{
  p->x = 2;
}
 
class MyClass
{
public:
  int x;
};
 
void SetMyClass(MyClass *p)
{
  p->x = 2;
}

Instructions for SetMyStruct:

; the parameter "p" is stored in r0
_Z11SetMyStructP8MyStruct:
  mov  r3, #2        ; r3 = 2
  str  r3, [r0, #0]  ; *(int*)&((char*)r0)[0] = r3
  bx   lr            ; return

Instructions for SetMyClass:

; the parameter "p" is stored in r0
_Z10SetMyClassP7MyClass:
  mov  r3, #2        ; r3 = 2
  str  r3, [r0, #0]  ; *(int*)&((char*)r0)[0] = r3
  bx   lr            ; return

The function names went through name mangling, that’s why they look so weird. Name mangling is a technique used to solve various problems caused by the need to resolve unique names for programming entities.

Alright, accessing a member variable of a class or struct produces identical code.

non-virtual member function vs function

Let’s take a look at the generated code for member functions of a class vs passing a struct to a function.

struct MyStruct
{
  int x;
};
 
void InitMyStruct(MyStruct *p)
{
  p->x = 2;
}
 
class MyClass
{
public:
  void Init();
 
  int x;
};
 
void MyClass::Init()
{
  this->x = 2;
}

Instructions for InitMyStruct:

; the parameter "p" is stored in r0
_Z12InitMyStructP8MyStruct:
  mov  r3, #2        ; r3 = 2
  str  r3, [r0, #0]  ; *(int*)&((char*)r0)[0] = r3
  bx   lr            ; return

Instructions for MyClass::Init:

; "this" is stored in r0
_ZN7MyClass4InitEv:
  mov  r3, #2        ; r3 = 2
  str  r3, [r0, #0]  ; *(int*)&((char*)r0)[0] = r3
  bx   lr            ; return

The member function code is identical to the code of InitMyStruct! Now you might ask “Why is it identical, and why does MyClass::Init also have a parameter?”

Because the compiler adds a hidden parameter to every non-static member function. This parameter is what you know as the this keyword. It is passed as the first parameter and is thus located in register r0 for the arm instruction set.

The same applies to InitMyStruct, it has one parameter that expects a pointer (to a MyStruct object), so it’s the same.
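
The hidden parameter can be made visible in plain C++ as well. The following sketch reuses MyStruct and MyClass from above and calls the member function through std::mem_fn, which takes the object pointer as an ordinary first argument. CallBothWithExplicitObject is a hypothetical name for this illustration.

```cpp
#include <functional>

struct MyStruct { int x; };
void InitMyStruct(MyStruct* p) { p->x = 2; }

class MyClass
{
public:
  void Init() { this->x = 2; }
  int x;
};

// Calls both functions with the object pointer as an explicit first argument.
int CallBothWithExplicitObject()
{
  MyStruct s; s.x = 0;
  MyClass  c; c.x = 0;

  InitMyStruct(&s);                  // pointer passed explicitly
  std::mem_fn(&MyClass::Init)(&c);   // "this" passed explicitly

  return s.x + c.x;
}
```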

We also see the member function code is not attached to the object as many people claim! Non-virtual member functions are resolved statically. That is, the member function is selected statically (at compile-time) based on the type of the pointer (or reference) to the object.
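
Static resolution can be observed without looking at assembler. In this sketch, with the hypothetical classes Base and Derived, the function that gets called depends only on the type of the pointer, not on the object it points to.

```cpp
class Base
{
public:
  int Id() const { return 1; }   // non-virtual
};

class Derived : public Base
{
public:
  int Id() const { return 2; }   // hides Base::Id, does not override it
};

int IdThroughBasePointer()
{
  Derived d;
  const Base* p = &d;            // static type Base, actual object Derived
  return p->Id();                // selected at compile time: Base::Id
}

int IdThroughDerivedPointer()
{
  Derived d;
  const Derived* p = &d;
  return p->Id();                // selected at compile time: Derived::Id
}
```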

How a non-virtual member function gets called

Until now, we only analysed what code is generated for the particular functions, but we don’t know how they get called. Let’s take a look at main, where both functions get called.

int main(void)
{
  MyStruct myStruct;
  InitMyStruct(&myStruct);
 
  MyClass myClass;
  myClass.Init();
 
  return 0;
}

Instructions generated for main:

main:
  ; store link-register to stack
  str  lr, [sp, #-4]!
 
  ; move stack pointer by another 12 bytes
  ; (the link-register store above already moved it by 4)
  ; 4 bytes for storage of MyStruct object
  ; 4 bytes for storage of MyClass object
  ; 4 bytes left unused, presumably stack alignment padding
  sub  sp, sp, #12
 
  ; set r0 to start of MyStruct object
  add  r0, sp, #4
 
  ; call InitMyStruct
  ; "myStruct" pointer is stored r0
  bl   _Z12InitMyStructP8MyStruct
 
  ; set r0 to start of MyClass object
  mov  r0, sp
 
  ; call member function Init of MyClass
  ; "myClass" pointer (which becomes "this") is stored in r0
  bl   _ZN7MyClass4InitEv

The code that calls InitMyStruct and myClass.Init is the same. There is no difference, no difference, no …!

How a virtual member function gets called

I’ll show you what’s going on with virtual member function in an upcoming article, stay tuned for that!

Conclusion

  • C++ classes and C++ structs are technically the same.
  • There is no performance impact when calling a non-virtual member function over a function with one parameter.

Follow up material

I recommend reading the topics about classes and inheritance at C++ FAQ Lite in case you want to know more details about classes.

I just stumbled upon an interesting article about potential fields in RTS games, which was a very nice read.

Potential Fields have some similarities with influence maps. Influence maps are often used to decide whether an area in the game world is controlled by own or enemy units, or if it is an area currently not under control by any forces (no man’s land).

If you have a few minutes to spare, don’t miss Using Potential Fields in a Real-time Strategy Game Scenario (Tutorial).

generated code for packed structs

Posted by Peter Schraut under Programming

In my previous post, struct/class/union: member alignment, I said the compiler usually generates more and slower program code for packed than for unpacked structs. In this post, I’ll show what code is being generated.

Remember I said the ARM7TDMI CPU does not support accessing misaligned addresses? To make access to misaligned members work anyway, the compiler generates code that uses byte instructions, since those can access addresses with any alignment.

In order to load a 32bit word from a packed struct, the compiler generates code that actually loads four bytes, then shift and bitwise OR them together to create the final 32bit word:

struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};
 
unsigned int GetMyInt(PackedStruct *p)
{
  return p->myInt;
}

The above C++ code is compiled with devkitARM release 25 using optimization level -O4 and transforms to the following 32bit arm assembler:

; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns
 
GetMyInt:
 ldrb  r3, [r0, #3]         ; r3 = ((unsigned char*)r0)[3]
 ldrb  r2, [r0, #4]         ; r2 = ((unsigned char*)r0)[4]
 ldrb  r1, [r0, #5]         ; r1 = ((unsigned char*)r0)[5]
 orr   r3, r3, r2, asl #8   ; r3 = r3 | (r2 << 8)
 ldrb  r0, [r0, #6]         ; r0 = ((unsigned char*)r0)[6]
 orr   r3, r3, r1, asl #16  ; r3 = r3 | (r1 << 16)
 orr   r0, r3, r0, asl #24  ; r0 = r3 | (r0 << 24)
 bx    lr                   ; return
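
Written out in C++, the byte-wise load in the listing above corresponds roughly to the following sketch. LoadU32Unaligned is a hypothetical helper name, and the code assumes unsigned int is 32 bits wide, as it is on the ARM7TDMI.

```cpp
// Hand-written equivalent of the byte-wise load: reads four bytes starting
// at 'bytes' and combines them into one little-endian 32bit value.
// Works for any address, aligned or not.
unsigned int LoadU32Unaligned(const unsigned char* bytes)
{
  return  (unsigned int)bytes[0]
       | ((unsigned int)bytes[1] << 8)
       | ((unsigned int)bytes[2] << 16)
       | ((unsigned int)bytes[3] << 24);
}
```

For the packed struct, the equivalent call would pass the address of the struct plus three bytes, which is where myInt starts.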

On the other hand, when you use an unpacked struct, you only need to remove the packed attribute at PackedStruct and recompile, the code transforms to a single load instruction:

; Incoming parameter "p" is stored in register r0
; Return value is also stored in r0 when function returns
GetMyInt:
 ldr  r0, [r0, #4] ; r0 = *(unsigned int*)&((unsigned char*)r0)[4]
 bx   lr ; return

For this reason, I said the compiler generates more and less efficient code for packed structs! If you’re a careful reader, you should have noticed myInt is located at member offset four rather than three. If you don’t know why, read my previous post “struct/class/union: member alignment“.
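
The two offsets can also be checked directly with offsetof. This is a sketch that assumes a GCC-compatible compiler (for the packed attribute), a 2-byte short and a 4-byte int. UnpackedStruct and the two functions are hypothetical names for this illustration.

```cpp
#include <cstddef>

struct __attribute__((packed)) PackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

struct UnpackedStruct
{
  unsigned char  myByte;
  unsigned short myShort;
  unsigned int   myInt;
};

// Offset of myInt without padding: 1 (myByte) + 2 (myShort)
std::size_t PackedIntOffset()   { return offsetof(PackedStruct, myInt); }

// Offset of myInt with natural alignment: one padding byte after myByte
std::size_t UnpackedIntOffset() { return offsetof(UnpackedStruct, myInt); }
```

Under these assumptions the packed struct is 7 bytes in total, while the unpacked one is 8 bytes.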

If you want to look at the generated code in your favourite text editor, you need to add -save-temps to CFLAGS in your makefile:

CFLAGS += -save-temps

This will instruct the devkitPro tool-chain to store temporary intermediate files in your build directory. I found the .ii and .s files very interesting!
