Wednesday, November 26, 2008

Frame pointers and Function Call Tracing


Function call tracing is really helpful when tracking down a problem, specially on an embedded system. On the ARM architecture data abort events and such provide useful data regarding where the problem was found like the instruction where that happened and the values each register had on that moment. However that usually isn't enough, specially when functions like memcpy() that are usually called from many places in our program. Even worse: imagine running a RTOS. If we knew where memcpy() was called it would be different and probably a lot easier to debug and trace down. Yesterday I was talking with David and he mentioned the function __builtin_return_address that comes with GCC. He uses it together with C++ (i386) to detect memory leaks, however it's not available for the Arm architecture.

GCC and frame pointers

GCC implements a nice concept called Frame Pointer. One of the CPU's registers is reserved and used as a frame pointer. Each time a function starts its execution the frame pointer is set to point exactly after the return address the caller has placed on the stack. Apart from being useful to the compiler to refer to function arguments in an easier way it can also be used to deduce who called us, and who called the one who called us, and so. A nice explanation can be found here.

However GCC turns on the -fomit-frame-pointer flag for some optimization levels so if we need the frame pointer we need to force it by adding -fno-omit-frame-pointer.
GCC provides a function called __builtin_return_address() with which you can trace up the function calls, but it's not available for ARM. However I found a nice piece of code in the linux kernel for the ARM architecture here. I just stripped this part (remember it's published under the GPL license by Russell King):



.align 0
.type arm_return_addr %function
.global arm_return_addr

arm_return_addr:
mov ip, r0
mov r0, fp
3:
cmp r0, #0
beq 1f //@ frame list hit end, bail
cmp ip, #0
beq 2f //@ reached desired frame
ldr r0, [r0, #-12] // else continue, get next fp
sub ip, ip, #1
b 3b
2:
ldr r0, [r0, #-4] //@ get target return address
1:
mov pc, lr //get back to callee

The 'magic' values are -4 and -12 which indicate the relative position of the previous function's link register (return address) and frame pointer. This values come from analysing the push instruction which is called in every function entry that pushes, apart from other registers, {pc, lr, ip, fp} in that precise order in the stack.
The C prototype for arm_return_addr would be:



void * arm_return_addr( unsigned int num );

Where num is the number of frames to search back. Take care and remember that if you reach the last frame it's value will be 0 and you should stop there.
What I did is to show the call trace once I get data abort or prefetch abort exception so I know what caused it and it's easier to track back.

Give me something I can understand: decrypting addresses

With the function discussed above we can get the return addresses but where to go from there? You can generate a listing with arm-elf-objdump and find the address, but there is a nice tool called arm-elf-addr2line which will do it for you, thanks to David for pointing this out. Just do something like:

    arm-elf-addr2line --exe=yourelf.elf

And it will output something like:

    main.c:90

Which means you can find it in main.c, line 90. Quite nice, isn't it?

Disadvantages

There is a stack penalty when using frame pointers. In one product I'm developing I saw between 20 to 30 words stack penalty when using frame pointers compared to the same program compiled without frame pointers.  That means about 20*4 = 80 bytes of stack. Since that was a RTOSsed product and has more than 10 tasks running simultaneously that number multiplies and yields an important increase in total stack usage. That can be a problem if RAM is not enough, not mentioning that a tight-tuned program will probably crash for the first time it's compiled to work with frame pointers because of stack overflows.

To see what causes this behaviour let's look at a simple C function:



int sumNumbers( int a, int b ) {
return a + b;
}

When compiled with -fomit-frame-pointer we get:



00007f6c <sumNumbers>:
7f6c: e0810000 add r0, r1, r0
7f70: e12fff1e bx lr

But if we force frame pointers with -fno-omit-frame-pointer we obtain:



0000818c <sumNumbers>:
818c: e1a0c00d mov ip, sp
8190: e92dd800 push {fp, ip, lr, pc}
8194: e0810000 add r0, r1, r0
8198: e24cb004 sub fp, ip, #4 ; 0x4
819c: e89da800 ldm sp, {fp, sp, pc}

By using frame pointers GCC is obliged to push the registers fp, ip, lr and pc and set up the frame pointer, that means bigger code and higher stack usage, at least for small or medium sized functions. Now you may notice why the GCC documentations says "-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging."
Using frame pointers or not depends on whether you give priority to debugging or small code footprint (and smaller ram/stack footprint too).

It's important to remember that if the program is compiled without frame pointers then the arm_return_addr function must not be called since the frame pointer register will contain other information, likely not related to anthing to do with frame pointers.

Alternative methods

There is another method we can use with GCC. It involves using the -finstrument-functions compiler flag. That will force GCC to call user-defined functions when entering and exiting functions, so an array can be kept on RAM with the call tree. However that could slow down the whole program excessively. On the other hand care must be taken with multithreaded designs. For more information here is the GCC documentation.

4 comments:

  1. Excelent post! You made me realize that I have to add -fno-omit-frame-pointer in some places :)

    ReplyDelete
  2. __builtin_return_address works fine on arm.
    I tested it on an arm11mpcore with gcc 4.3.3

    ReplyDelete