Segmentation fault on an ARM device
#1
Hello.

I have compiled D1X-Rebirth v0.58.1 on an MK808, an Ubuntu-compatible ARM device not too different to a Raspberry Pi.

Whenever I try to run Descent though, I encounter a segmentation fault once the title screen appears. This is what strace produces. On some occasions, the game exits with the strange message *** glibc detected *** ./d1x-rebirth: free(): invalid pointer: 0x0093b800 ***.

Since my device is compatible with OpenGLES, I compiled D1X using scons opengl=0 opengles=1. I have dumped the output of the compilation process here.

I am rather new to this, so I am not quite sure what is wrong. I would expect that my system is compatible, as other games using SDL and OpenGLES (etc.) manage to run, and moreover D1X runs successfully on similar systems. Am I missing certain libraries, or should I tweak the environment variables when installing? Might there alternatively be a problem with my hogfile? If it helps, here is a list of libraries installed on my system.

Any help would be greatly appreciated.
#2
Hello fish and Welcome,

I'm sorry that I cannot offer direct help, since right now I have access to neither ARM nor OpenGL ES-capable hardware. But could you add a GDB backtrace? That would show me exactly where the program crashes.

If you don't know how to do that, here's a little tutorial:
- Check that gdb is installed (it usually has its own package on most Linux distributions)
- Compile Rebirth with the debug flag (i.e. "scons opengl=0 opengles=1 debug=1")
- Launch d1x-rebirth with gdb: "gdb ./d1x-rebirth". Now you enter the gdb shell.
- In gdb shell launch d1x-rebirth itself (in window mode so you won't get stuck on the crash) via "run -window"
- Run the program until the point where it crashes.
- As soon as the crash occurs, you get thrown back into the gdb shell. There, type "bt", which should print the call chain from the program start to the function where it crashed. Copy this output and please paste it here (using the code or quote function).

This should help me to identify the problem and possibly I can give you tips for a solution/workaround.

Thank you very much. 
The greatest pleasure in life is to do what people say you cannot do.
Uhm... Honey, there's a head in the toilet!
#3
Thank you most kindly for your quick response, zico.

Running gdb produced the following message at the point of the program's termination:

Code:
Program received signal SIGBUS, Bus error.
get4fix (fixp=0xe66fa6 <Robot_info+146>) at main/bmread.c:855
855        for (i=0; i<NDL; i++) {

Entering bt at this point produces only the following lines:
Code:
#0  get4fix (fixp=0xe66fa6 <Robot_info+146>) at main/bmread.c:855
#1  0x0002c5ae in bm_read_robot_ai (skip=0) at main/bmread.c:951
#2  0x0002a1ce in gamedata_read_tbl (pc_shareware=0) at main/bmread.c:503
#3  0x0002849c in gamedata_init () at main/bm.c:129
#4  0x000680dc in main (argc=1, argv=0xbefff2e4) at main/inferno.c:389
#4
Thank you, this already helps very much. Unfortunately, I cannot yet point you to a direct workaround. It might be a simple word alignment issue. I'll check this out as soon as I can and get back to you once I have found something.

Thanks again for the helpful backtrace. It really helps to find this lil bugger here. :)
#5
Line 951 is get4fix(robptr->field_of_view);.  For me, using a newer codebase:
Code:
(gdb) p &((robot_info*)0)->field_of_view
$3 = (int (*)[5]) 0x92
fish: please run that gdb command on your binary.  If the output number is not a multiple of four, an alignment fault is very likely.
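For anyone following along, the gdb expression above computes the byte offset of field_of_view inside robot_info. A minimal C sketch of the same check with offsetof follows. The struct demo_robot_info and both helper functions are hypothetical stand-ins, not the real Rebirth definitions; the layout is packed and padded with placeholder bytes only so that the field lands at offset 0x92, as in the output above. It also assumes fix is a 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t fix; /* assumption: fix is a 32-bit fixed-point value */

/* Hypothetical stand-in for robot_info. Packing removes all padding,
 * which is what lets a 4-byte field start on a 2-byte boundary. */
#pragma pack(push, 1)
typedef struct demo_robot_info {
    uint8_t leading_fields[146]; /* placeholder for the preceding fields */
    fix     field_of_view[5];    /* 5 entries, as in the gdb output */
} demo_robot_info;
#pragma pack(pop)

/* Returns 1 when a field at this offset is 4-byte aligned, else 0. */
static int offset_is_word_aligned(size_t offset)
{
    return (offset % 4u) == 0u;
}

/* Byte offset of field_of_view within the stand-in struct. */
static size_t demo_fov_offset(void)
{
    return offsetof(demo_robot_info, field_of_view);
}
```

Here demo_fov_offset() yields 146 (0x92), and offset_is_word_aligned(146) is 0, which is the situation that can fault on a strict-alignment CPU.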
#6
Kp, although I do not quite understand your suggestion, entering p &((robot_info*)0)->field_of_view before (or after) running D1X in gdb produces this:
Code:
$1 = (fix (*)[5]) 0x92

Was this the right command to enter? I do apologise if it was not.
#7
That is exactly what I wanted and, combined with the SIGBUS message when you answered zico, confirms my suspicion.  ARM processors require word-sized accesses to occur on word-aligned boundaries.  The game code does not do this, hence an alignment violation that results in a SIGBUS.  x86 compatible processors do not require this, so Descent shipped with numerous places that perform unaligned accesses, primarily because it reduced space consumption.  There is code in Rebirth to try to fix up unaligned accesses, so I thought someone had made it run on an alignment-mandatory architecture.  Your experience contradicts this.  You might be able to make the game run if you can configure the system to fix alignment faults in software, but there could be a performance cost to it.  If you can't do this or the performance cost is too high, then code changes will be necessary to make the game run.

Fixing these is not hard, but is inconvenient without access to a system that faults on unaligned access, since the only ways to catch it are tedious data structure analysis or giving it to someone who does have a strict alignment chip.
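As a rough illustration of what such a fix can look like, here is a minimal sketch of reading and writing a 32-bit value at an address that may not be word-aligned. The function names are hypothetical and this is not the code used in Rebirth; it also assumes fix is a 32-bit integer. The idea is to go through memcpy, which makes no alignment assumption, instead of dereferencing a fix pointer directly.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t fix; /* assumption: fix is a 32-bit fixed-point value */

/* Read a fix from an address that may not be 4-byte aligned.
 * memcpy carries no alignment requirement, so the compiler emits
 * loads that are safe on strict-alignment targets. */
static fix read_fix_unaligned(const void *src)
{
    fix value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Store a fix to an address that may not be 4-byte aligned. */
static void write_fix_unaligned(void *dst, fix value)
{
    memcpy(dst, &value, sizeof value);
}
```

A caller would replace a direct store such as *fixp = value with write_fix_unaligned(fixp, value). On x86, compilers typically reduce the memcpy to a single move, so the cost there should be negligible.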
#8
Thank you for your insights. It is interesting that these alignment issues can arise on one type of ARM SoC, whilst not occurring on other SoCs featuring the same processor. Presumably there are more intricacies to the operations of these little chips than one might presume.
#9
It might be possible to have the kernel compensate for the fault or to set some control register to enable hardware compensation.  Both of these likely incur a performance hit, so it would be better to fix the Descent code.  I was already working on a large set of changes that would have the side effect of fixing this, but have not yet fixed the site that fails for you.  Also, my work is in the active development branch, which may be less stable than the branch you are trying to play.
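For the kernel route, ARM Linux kernels usually expose the alignment trap behaviour through /proc/cpu/alignment, where bit 0 enables a warning, bit 1 asks the kernel to fix up the access, and bit 2 delivers SIGBUS. The sketch below writes a mode to that file. The helper name and enum names are made up for illustration, and whether the MK808 kernel provides this file at all is an assumption that needs checking on the device.

```c
#include <stdio.h>

/* Mode bits as described in the ARM Linux alignment-trap documentation. */
enum {
    ALIGN_WARN   = 1 << 0, /* log each unaligned user-space access */
    ALIGN_FIXUP  = 1 << 1, /* let the kernel emulate the access */
    ALIGN_SIGBUS = 1 << 2  /* deliver SIGBUS to the process */
};

/* Write the requested mode to the given file. Returns 0 on success and
 * -1 on failure, for example when the file does not exist (non-ARM
 * kernel) or the caller lacks the privileges to write it. */
static int set_alignment_mode(const char *path, int mode)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Run as root, set_alignment_mode("/proc/cpu/alignment", ALIGN_WARN | ALIGN_FIXUP) would request fix-ups with logging, so the log shows how often the game triggers them.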
#10
(02-16-2014, 02:10 PM) fish wrote: Thank you for your insights. It is interesting that these alignment issues can arise on one type of ARM SoC, whilst not occurring on other SoCs featuring the same processor. Presumably there are more intricacies to the operations of these little chips than one might presume.

It does not have to be a hardware issue. It might just be random chance: the alignment of certain things can end up different with every compiler version or target backend. I did not experience any SIGBUS errors due to misaligned accesses in the Raspberry Pi port since I set the WORDS_NEED_ALIGNMENT preprocessor macro (which enables some code paths to deal with such alignment issues), but I am skeptical whether this deals with all possible alignment issues (or is even actively maintained, since there seemed to be no configuration left which enabled it).

However, you don't seem to have set this macro yet, so you should try and see if it helps. Look for WORDS_NEED_ALIGNMENT in the SConstruct file to see how I have set this for the RPi builds.
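To give an idea of how a macro like this can switch code paths at compile time, here is a small sketch. It only illustrates the pattern; the helper load_i32 is hypothetical and the actual code guarded by WORDS_NEED_ALIGNMENT in Rebirth may look quite different.

```c
#include <stdint.h>
#include <string.h>

/* The macro would normally come from the build system, for example as
 * -DWORDS_NEED_ALIGNMENT on the compiler command line. */
#ifdef WORDS_NEED_ALIGNMENT
/* Strict-alignment targets: copy through memcpy, which is safe for
 * any source address. */
static int32_t load_i32(const unsigned char *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
#else
/* Targets that tolerate unaligned access: plain load. This path is
 * only safe when p really points at a suitably aligned int32_t, or
 * when the hardware and compiler forgive the misalignment. */
static int32_t load_i32(const unsigned char *p)
{
    return *(const int32_t *)p;
}
#endif
```

Any access that bypasses such a helper and dereferences a misaligned pointer directly is not covered by the macro, which is one way a SIGBUS could still appear with it set.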
"Perfection is attained not when there is nothing more to add, but when there is nothing more to remove." -- Antoine de Saint Exupéry