C Plus / Add: c++

Showing posts with label c++. Show all posts

Monday, May 4, 2009

Compiling and using GDB for arm-linux

Some days ago I had to compile gdb manually in order to debug an arm-linux app. It's quite trivial but it's so useful that I thought it would be a nice idea to post the instructions here. Below there are some explanations about debugging remotely with KDevelop.

This was done with gdb-6.8, you can grab it here. It is assumed that arm-linux tools are available (PATH correctly set).

Compiling the GDB client

Decompress gdb-6.8 and compile it by issuing:


tar xvf gdb-6.8.tar.gz
cd gdb-6.8
./configure --build=x86 --host=x86 --target=arm-linux
make

After compiling just copy gdb/gdb to arm-linux-gdb where the arm-linux toolchain binaries reside (this is purely for organization and proper naming). You can now remove the gdb-6.8 directory.

NOTE: if your host computer architecture isn't intel-based just replace x86 by the correct value for your platform.

Compiling the GDB server (ARM)

Decompress gdb-6.8 and compile it by issuing:


tar xvf gdb-6.8.tar.gz
cd gdb-6.8
./configure --host=arm-linux
make

After compiling just copy gdb/gdbserver/gdbserver to your arm-linux filesystem so that you can execute it from a remote shell (gdbserver will be an arm-elf binary). You can now remove the gdb-6.8 directory.

Testing connections

First the server should be started in the arm processor by issuing something like:


gdbserver host:1234 _executable_

Where _executable_ is the application that is going to be debugged and host:1234 tells gdbserver to listen for connections to port 1234.

To run gdb just type this in a PC shell:


arm-linux-gdb --se=_executable_

After that you'll get the gdb prompt. To connect to the target type:


target remote target_ip:1234

To resume execution type 'continue'. You can get an overview on gdb usage here.

Debugging with KDevelop

KDevelop can be used to watch variables and debug the remote target (place breakpoints, stop execution, etc). Just go to Project -> Project Options and make sure you have something like this:

monitor-debug.gdb is a file that contains the following


target remote target_ip:1234

After this you will be ready to remotely debug any arm-linux application.

Friday, January 16, 2009

Dirty little bugs - a pin/macro approach

Program testing can be used to show the presence of bugs, but never to show their absence!

Edsger W. Dijkstra

Measuring and observing usually (if not always) change what one is measuring. I've never seen Schrödinger's cat, at least not when leading with embedded systems nor in my garden, but the former implication causes many problems when trying to debug time-critical functions/operations in embedded software and hardware.

Usually a printf-like UART output works but that would slow down or even hang code which can't wait for the UART to stream its data out. Even though with a substantial buffer it will come a time where it will get stuck.

There is a nice, well known techinque, that uses output pins to reflect some state, such as which function or interrupt handler the microcontroller is doing at the time. Since setting the output value requires only a few instructions, this approach suits better the time-critical scenarios. This technique is helpful in many ways: one may measure from interrupt latency and cpu usage to interrupt or thread deadlock.

That's how the following code came up to mind, here is a simple header file which is based on the macros I posted some time ago here.


#ifndef _dbgpin_h_
#define _dbgpin_h_

#include "portutil.h"

/* Enables or disables debugging */
#if    DBGPIN_ENABLED 1

/* Debug pins */
#define DBGPIN_01    0,2
#define DBGPIN_02    0,3
//---- etc

/* Initialization */
#define DBGPIN_INIT( pin )    do { FPIN_AS_OUTPUT(pin); FPIN_CLR(pin);  } while(0);

/**
* Beware of returns, exclude them!
*/
#if    DBGPIN_ENABLED
#define DBGPIN_BLOCK( pin, block ) \
           FPIN_SET_( pin ); \
           do { \
               block \
           } while(0); \
           FPIN_CLR_( pin );
#else
#define    DBGPIN_BLOCK( pin, block ) block
#endif

#endif /* _dbgpin_h_ */

Here is an example:


DBGPIN_BLOCK( DBGPIN_01,
a = b;
callSomething();
);

There is a small drawback: the do-while structure won't let any variable declared inside 'block' to be visible outside. That can be solved by removing the do-while, if you need it that way just go ahead and delete those two lines.

Also notice that a return in 'block' won't allow the macro to clear the corresponding pin, so applying the macro to a function with many return points is messy. The alternative is to change the original function's name and call it through a sort of wrapper which does it inside a DBGPIN_BLOCK macro. It's not pure beauty but it works nice, and once you've done the debugging it can be disabled by changing DBGPIN_ENABLED to 0.

Tuesday, December 2, 2008

Compile-time assertions

The new C++0x standard contains a neat feature called static_assert which does a compile-time assertion. It's really useful in cases we can't use the #if keyword (preprocessor) to throw a compile-time error. This happens when the preprocessor finds a C construct or expression it can't resolve such as sizeof, even if it is known at compile-time (actually it's only known to the compiler).

After searching the web I found this webpage that contains a nice trick to do this assertions in C. However, the macros defined there do not provide a description about the assertion, which might be useful, so here is my modified version:


/**
* STATIC ASSERT
* -- based on
*    http://www.pixelbeat.org/programming/gcc/static_assert.html
*/
#define ASSERT_CONCAT_(a, b, c) a##b##c
#define ASSERT_CONCAT(a, b, c) ASSERT_CONCAT_(a, b, c)
#define STATIC_ASSERT(e,descr) enum { ASSERT_CONCAT(descr,_Line_,__LINE__) = 1/(!!(e)) }

As an example, this would require the size of the struct myStruct to be less than 20 in order to compile (this is only an example, remember 'magic numbers' should be avoided when programming):


STATIC_ASSERT( sizeof( struct myStruct ) < 20, MyStructIsTooLarge );

If there is an assertion error at compile time this error will be thrown:


file.c:40: error: division by zero
file.c:40: error: enumerator value for 'MyStructIsTooLarge_Line_40' is not an integer constant

On the other side, if C++ is being used and there is access to a (at least partial) C++0x compiler you can use static_assert() in a similar way:


static_assert( sizeof( struct myStruct ) < 20, "My Struct Is Too Large" );

Currently GCC 4.3 supports static_asserts as mentioned here.

Wednesday, November 26, 2008

Frame pointers and Function Call Tracing

Function call tracing is really helpful when tracking down a problem, specially on an embedded system. On the ARM architecture data abort events and such provide useful data regarding where the problem was found like the instruction where that happened and the values each register had on that moment. However that usually isn't enough, specially when functions like memcpy() that are usually called from many places in our program. Even worse: imagine running a RTOS. If we knew where memcpy() was called it would be different and probably a lot easier to debug and trace down. Yesterday I was talking with David and he mentioned the function __builtin_return_address that comes with GCC. He uses it together with C++ (i386) to detect memory leaks, however it's not available for the Arm architecture.

GCC and frame pointers

GCC implements a nice concept called Frame Pointer. One of the CPU's registers is reserved and used as a frame pointer. Each time a function starts its execution the frame pointer is set to point exactly after the return address the caller has placed on the stack. Apart from being useful to the compiler to refer to function arguments in an easier way it can also be used to deduce who called us, and who called the one who called us, and so. A nice explanation can be found here.

However GCC turns on the -fomit-frame-pointer flag for some optimization levels so if we need the frame pointer we need to force it by adding -fno-omit-frame-pointer.
GCC provides a function called __builtin_return_address() with which you can trace up the function calls, but it's not available for ARM. However I found a nice piece of code in the linux kernel for the ARM architecture here. I just stripped this part (remember it's published under the GPL license by Russell King):


 .align 0
 .type arm_return_addr %function
 .global arm_return_addr

arm_return_addr:
 mov ip, r0
 mov r0, fp
3:
 cmp r0, #0
 beq 1f                    //@ frame list hit end, bail
 cmp ip, #0
 beq 2f                    //@ reached desired frame
 ldr r0, [r0, #-12]        // else continue, get next fp
 sub ip, ip, #1
 b 3b
2:
 ldr r0, [r0, #-4]         //@ get target return address
1:
 mov pc, lr                //get back to callee

The 'magic' values are -4 and -12 which indicate the relative position of the previous function's link register (return address) and frame pointer. This values come from analysing the push instruction which is called in every function entry that pushes, apart from other registers, {pc, lr, ip, fp} in that precise order in the stack.
The C prototype for arm_return_addr would be:


void * arm_return_addr( unsigned int num );

Where num is the number of frames to search back. Take care and remember that if you reach the last frame it's value will be 0 and you should stop there.
What I did is to show the call trace once I get data abort or prefetch abort exception so I know what caused it and it's easier to track back.

Give me something I can understand: decrypting addresses

With the function discussed above we can get the return addresses but where to go from there? You can generate a listing with arm-elf-objdump and find the address, but there is a nice tool called arm-elf-addr2line which will do it for you, thanks to David for pointing this out. Just do something like:

arm-elf-addr2line --exe=yourelf.elf

And it will output something like:

main.c:90

Which means you can find it in main.c, line 90. Quite nice, isn't it?

Disadvantages

There is a stack penalty when using frame pointers. In one product I'm developing I saw between 20 to 30 words stack penalty when using frame pointers compared to the same program compiled without frame pointers. That means about 20*4 = 80 bytes of stack. Since that was a RTOSsed product and has more than 10 tasks running simultaneously that number multiplies and yields an important increase in total stack usage. That can be a problem if RAM is not enough, not mentioning that a tight-tuned program will probably crash for the first time it's compiled to work with frame pointers because of stack overflows.

To see what causes this behaviour let's look at a simple C function:


int sumNumbers( int a, int b ) {
return a + b;
}

When compiled with -fomit-frame-pointer we get:


00007f6c <sumNumbers>:
   7f6c: e0810000  add r0, r1, r0
   7f70: e12fff1e  bx lr

But if we force frame pointers with -fno-omit-frame-pointer we obtain:


0000818c <sumNumbers>:
   818c: e1a0c00d  mov ip, sp
   8190: e92dd800  push {fp, ip, lr, pc}
   8194: e0810000  add r0, r1, r0
   8198: e24cb004  sub fp, ip, #4 ; 0x4
   819c: e89da800  ldm sp, {fp, sp, pc}

By using frame pointers GCC is obliged to push the registers fp, ip, lr and pc and set up the frame pointer, that means bigger code and higher stack usage, at least for small or medium sized functions. Now you may notice why the GCC documentations says "-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging."
Using frame pointers or not depends on whether you give priority to debugging or small code footprint (and smaller ram/stack footprint too).

It's important to remember that if the program is compiled without frame pointers then the arm_return_addr function must not be called since the frame pointer register will contain other information, likely not related to anthing to do with frame pointers.

Alternative methods

There is another method we can use with GCC. It involves using the -finstrument-functions compiler flag. That will force GCC to call user-defined functions when entering and exiting functions, so an array can be kept on RAM with the call tree. However that could slow down the whole program excessively. On the other hand care must be taken with multithreaded designs. For more information here is the GCC documentation.

Friday, November 21, 2008

To volatile or not to volatile - Part 2

Hamlet did know about volatile. He just ignored it.

Recalling Hamlet's action on volatileness and the previous post we can 'trick' the compiler when working with ISRs and no nested interrupts enabled.
We can't undeclare a variable or remove the volatile specifier once it has been written before on the same file. However we can write our ISR code in a different source file and declare the variables as global, volatile in the file that uses it outside the ISR and non volatile inside the file containing the ISR.
Here is the resulting code:


// ----- FILE: non_isr.c ------- //
volatile unsigned int var1,var2;

/** Just to use the variables as volatile
 * If Enter_Critical() and Exit_Critical() are
 * global functions then the volatile specifier above
 * could be avoided too
 */
unsigned int getIntSum(void)
{
   unsigned int temp;
   Enter_Critical();
      temp = var1 - var2;
   Exit_Critical();

   return temp;
}

//----------------------------------

// ----- FILE: isr.c ------- //
unsigned int var1,var2;

void  ISR_Handler(void)
{
   // do something with var1 and var2
   // nested interrupts not enabled,
   // thus it's secure to use them as normal variables
}

A big disadvantage when doing this is that variables will be global and we can't declare them static since they have to be shared between several source files.
As said in the previous post, it's only a matter on what you need in that specific situation. In some cases the ISR will result in less execution time but in some others it might be the same or only a small performance boost.

Thursday, November 20, 2008

To volatile or not to volatile

Hamlet never had to worry about volatile specifiers, only existence.

UPDATE:There is a continuation on this topic here

The volatile keyword is something every developer has to deal with when working with interrupts or multiple threads. It's nature is quite simple: when present in a variable declaration it says to the compiler that the current executing code is not the only one which may change its value, so the compiler can't rely on a cached value (on a register for instance) or make any other assumption based on past values. On one hand this means that we should use this keyword if we expect certain variables to behave that way. On the other side it also means that much more code and memory access will be done than when using a normal (non-volatile) variable.

When first using volatile there is a myth about declaring everything shared between interrupts/threads and the main execution path or thread as volatile. Of course it will work but it can lead to slower and larger code and it may not be necessary, specially when coding inside individual functions with critical sections or mutexes/semaphores.

Here is a piece of code, supposed to be a dumb interrupt handler:


extern volatile unsigned int  x,y;
void __attribute__((interrupt ("IRQ"))) ISR_test(void)
{
if( x > 0 ) //optimized to ==0 since x is unsigned
{
  x = x + y;
}
else
  x = 2*y + x;
}

And the corresponding assembly listing for ARM7, compiling with gcc 4.2.2 optimization level 1:


2d7c:  push {r1, r2, r3}
2d80:  ldr r1, [pc, #68]  // r1 = &x;
2d84:  ldr r3, [r1]       // r3 = x; READ X
2d88:  cmp r3, #0         // r3 == 0 ?
2d8c:  beq 2da8           // decision
2d90:  ldr r2, [r1]       // r2 = x; READ X
2d94:  ldr r3, [pc, #52]  // r2 = &y
2d98:  ldr r3, [r3]       // r3 = y; READ Y
2d9c:  add r3, r3, r2     // r3 = r3 + r2;
2da0:  str r3, [r1]       // x = r3
2da4:  b 2dc4           // ready to return
2da8:  ldr r3, [pc, #32]  // r3 = &y
2dac:  ldr r3, [r3]       // r3 = y READ Y
2db0:  ldr r1, [pc, #20]  // r1 = &x
2db4:  ldr r2, [r1]       // r2 = x READ X
2db8:  lsl r3, r3, #1     // r3 = 2*r3
2dbc:  add r3, r3, r2     // r3 = r3 + r2
2dc0:  str r3, [r1]       // x = r3
2dc4:  pop {r1, r2, r3}      
2dc8:  subs pc, lr, #4     // 0x4
2dcc:  .word 0x40004900     // &x
2dd0:  .word 0x40004904     // &y

There are three instructions to read x's value and two to read y's. GCC did as we told, do not make any assumptions on the values. However we may know certain constraints which can make the use of the volatile keyword redundant, like knowing that nested interrupts are not enabled. That way nothing will interrupt our ISR routine. The same conclusions get to mind if we use semaphores, mutexes or critical sections whenever a thread tries to access the variables. There are some exceptions I will comment near the ending. If we remove the volatile specifiers from both x and y we get:


2d7c:   push {r1, r2, r3}
2d80:   ldr r1, [pc, #48]  // r1 = &x
2d84:   ldr r2, [r1]       // r2 = x
2d88:   cmp r2, #0         // r2 == 0?
2d8c:   ldrne r3, [pc, #40]  // only if neq
2d90:   ldrne r3, [r3]       // only if neq
2d94:   addne r3, r3, r2     // only if neq
2d98:   strne r3, [r1]       // only if neq
2d9c:   ldreq r3, [pc, #24]  // only if eq
2da0:   ldreq r3, [r3]       // only if eq
2da4:   lsleq r3, r3, #1     // only if eq
2da8:   ldreq r2, [pc, #8]   // only if eq
2dac:   streq r3, [r2]       // only if eq
2db0:   pop {r1, r2, r3}
2db4:   subs pc, lr, #4
2db8:   .word 0x40004900
2dbc:   .word 0x40004904

Looks like GCC is doing some black magic! The conditional store, add and load instructions help the compiler to avoid branches. The total number of instructions was reduced from 20 to 15. This is a simple example but on a complex one register popping/pushing due to variable volatileness can slow down things even more. As said before, we may need the volatile specifier on certain situations, but if we know when to avoid it then final code can be much more concise.
Recalling the mutexes here is another example:


extern void TakeMutex(void);
extern void GiveMutex(void);

int x,y;

int getSum( void ) {
int temp;
/* This should be in the critical section,
 * however it's here to show what happens */
if ( x < 0 )
  return 0;
if ( y < 0 )
  return 0;

TakeMutex();

  temp = x + y;

GiveMutex();

return temp;
}

Note that line 10 and line 12 have atomic reads since this is a 32-bit architecture and int is 32-bit for this compiler. Either one or both of them are unlikely to be true for 8-bit and 16-bit processors, so it's not a good practice if we're looking for portability. Assembly output results in:


2e14: push {r4, r5, lr}
2e18: ldr r5, [pc, #64]
2e1c: ldr r3, [r5]
2e20: cmp r3, #0 ; 0x0
2e24: blt 2e50
2e28: ldr r4, [pc, #52]
2e2c: ldr r3, [r4]
2e30: cmp r3, #0 ; 0x0
2e34: blt 2e50
2e38: bl 2d7c (takemutex)
2e3c: ldr r2, [r4]
2e40: ldr r3, [r5]
2e44: add r4, r2, r3
2e48: bl 2d80 (givemutex)
2e4c: b 2e54
2e50: mov r4, #0 ; 0x0
2e54: mov r0, r4
2e58: pop {r4, r5, lr}
2e5c: bx lr
2e60: .word 0x40004900
2e64: .word 0x40004904

Since TakeMutex() and GiveMutex() are proper functions (defined somewhere else) GCC doesn't know what they will or won't do to x and y, so the code will read them again after the function calls. The only values that were cached are the variable's addresses, which of course won't change.
However, if TakeMutex() and GiveMutex() are macros we may get into trouble:


#define TakeMutex() asm volatile("nop")
#define GiveMutex() asm volatile("nop")

int getSum( void ) {
int temp;
/* This should be in the critical section,
 * however it's here to show what happens */
if ( x < 0 )
  return 0;
if ( y < 0 )
  return 0;

TakeMutex();

  temp = x + y;

GiveMutex();

return temp;

I accept that a nop won't protect anything, it's just a snippet. Here is the resulting assembly code:


2dcc:  ldr r3, [pc, #48]
2dd0:  ldr r2, [r3]
2dd4:  cmp r2, #0
2dd8:  blt 2dfc
2ddc:  ldr r3, [pc, #36]
2de0:  ldr r0, [r3]
2de4:  cmp r0, #0 ; 0x0
2de8:  blt 2dfc
2dec:  nop
2df0:  add r0, r0, r2
2df4:  nop
2df8:  bx lr
2dfc:  mov r0, #0 ; 0x0
2e00:  bx lr
2e04:  .word 0x40004900
2e08:  .word 0x40004904

Values were cached so it will be a mess, a difficult error to track too. We can avoid this by declaring the variables as volatile.

We can say that we can avoid all this problems by declaring everything 'suspicious' as volatile, but if we were to optimize code and make it really tight while still programming in C then the non-volatile approach is valid too, given that we know what we are doing and the assumptions on the variables' types and values.

UPDATE:There is a continuation on this topic here

Wednesday, November 19, 2008

Writing portable Embedded code - Pin portability

EDIT: I've posted a C++ alternative using templates here.

Portability can be hard on embedded systems. The fact that we can code in C most of the time doesn't mean code is portable. Even worse, it means that you may have to rewrite many functions/macros in order to get the 'same' code to work on another platform/chip.

A nice approach is to write a simple yet powerful HAL (Hardware Abstraction Layer), sometimes called a driver. There are many protocols and peripherals whose functions are nearly the same between different vendors and chips, such as I2C, SPI, UART,MCI controllers. For SPI we would code at least three functions: SPI_init(), SPI_tx() and SPI_rx(). Actually SPI_tx() and SPI_rx() might be the same due to SPI's full duplex capability. There might be other functions or macros to control the chip select line. This approach works nice for standard peripherals, but we may need to access port pins individually to perform some task or communicate to a device by bit-banging.

Most LCD character displays work the same way and use the same protocol and control lines. A 'universal' C module for character LCD control sounds great, but pin compatibility should be addressed first, it's not attractive if we need to change tens of lines to port it.

We'll first define some useful string-concatenating macros:


#define    _CAT3(a,b,c)  a## b ##c

#define    CAT3(a,b,c)   _CAT3(a,b,c)

#define    _CAT2(a,b)    a## b

#define    CAT2(a,b)    _CAT2(a,b)

The macros are called twice to ensure tokens are preprocessed as we want them. Kernighan and Ritchie's wonderful C book has good information about how that works. These macros can be defined in a global header file so they can be included whenever they're needed.

Basic operations on port pins include setting a pin as output or input, clearing a bit, setting a bit and reading it's value when configured as input. Given that I defined another header file which looks like this, specially made for the Philips LPC23xx family (ARM7):


/* Set bit */
#define FPIN_SET(port,bit)  CAT3(FIO,port,SET) = (1<<(bit))
#define FPIN_SET_(port_bit) FPIN_SET(port_bit)


/* Clear bit */
#define FPIN_CLR(port,bit)  CAT3(FIO,port,CLR) = (1<<(bit))
#define FPIN_CLR_(port_bit) FPIN_CLR(port_bit)


/* Set as input */
#define FPIN_AS_INPUT(port,bit)  CAT3(FIO,port,DIR) &=~(1<<(bit))
#define FPIN_AS_INPUT_(port_bit) FPIN_AS_INPUT(port_bit)

/* Set as output */
#define FPIN_AS_OUTPUT(port,bit)  CAT3(FIO,port,DIR) |= (1<<(bit))
#define FPIN_AS_OUTPUT_(port_bit) FPIN_AS_OUTPUT(port_bit)


/* when used as input */
#define FPIN_ISHIGH(port,bit)  ( CAT3(FIO,port,PIN) &  (1<<(bit)) )
#define FPIN_ISHIGH_(port_bit) FPIN_ISHIGH(port_bit)

/* returns !=0 if pin is LOW */
#define FPIN_ISLOW(port,bit)  (!( CAT3(FIO,port,PIN)& (1<<(bit)) ))
#define FPIN_ISLOW_(port_bit) FPIN_ISLOW(port_bit)

Done this we can set bit 2.1 by ussuing FPIN_SET(2,1), or clear it by doing FPIN_CLR(2,1). The functions ending with an underscore are meant to be used when pin position is given as a #define macro, such as:


#define LEDA 2,1
#define LEDB 2,1

FPIN_AS_OUTPUT_( LEDA );
FPIN_SET_( LEDA );
FPIN_CLR_( LEDB );

I agree this may sound complicated, but by defining all these functions it's possible to manipulate all port pins easily and in a portable way. If we want to change the pin or port LEDA is using we only need to change it once, the macros will take care of it.

If we were to do the same on an AVR it's a question of changing the macros as shown below. Don't forget ports are named with letters (A,B,C,D...) rather than numbers.


/* Set bit */
#define FPIN_SET(port,bit)  CAT2(PORT,port) |= (1<<(bit))
#define FPIN_SET_(port_bit) FPIN_SET(port_bit)


/* Clear bit */
#define FPIN_CLR(port,bit)  CAT2(PORT,port) &=~(1<<(bit))
#define FPIN_CLR_(port_bit) FPIN_CLR(port_bit)


/* Set as input */
#define FPIN_AS_INPUT(port,bit)  CAT2(DDR,port) &= ~(1<<(bit))
#define FPIN_AS_INPUT_(port_bit) FPIN_AS_INPUT (port_bit)

/* Set as output */
#define FPIN_AS_OUTPUT(port,bit)  CAT2(DDR,port) |= (1<<(bit))
#define FPIN_AS_OUTPUT_(port_bit) FPIN_AS_OUTPUT(port_bit)


/* when used as input */
#define FPIN_ISHIGH(port,bit)  CAT2(PIN,port) & (1<<(bit)))
#define FPIN_ISHIGH_(port_bit) PIN_ISHIGH(port_bit)


/* returns !=0 if LOW */
#define FPIN_ISLOW(port,bit)  (!( CAT2(PIN,port) & (1<<(bit))) )
#define FPIN_ISLOW_(port_bit) FPIN_ISLOW(port_bit)

Now the LCD routines are really portable. Minor changes might be needed if there are other pin registers to modify, but the basic pin functionality is covered by the macros defined above.

Tuesday, November 18, 2008

FreeRTOS - Coding _real_ real-time interrupts

Ever since I discovered the RTOS world I've been amazed how fast, simple and beautiful coding can become.

Using an RTOS for devices with hard real-time constraints can get quite difficult, specially if the RTOS nature is not known by the developer, particularly when dealing with real time interrupts where latency plays an important role and has to be kept as low as possible.

Successful semaphore and queue synchronization (among other RTOS facilities) require interrupts to be disabled during some processing. Interrupt latency will vary depending on how often those resources are checked. If we want to make sure certain interrupts are executed as fastest as posible we must provide them a way to be processed, independently of how the RTOS and our application is behaving.

Things get worse when a TCP/IP stack gets to interact with the RTOS' tasks. As an example, lwIP is able to work with an RTOS like FreeRTOS. To do so we need to let lwIP define critical sections, if we don't we risk system stability.
Other data processing tasks or code might need critical sections too. If that involves disabling interrupts then interrupt latency will increase. If that happens we may lose the first two letters from 'RTOS'.

A trick to overcome this is to use interrupt priorities (if available) so that critical-time interrupts are placed on a priority group, while normal (ie: RTOS) interrupts are on another priority group. As an example, the ARM7 LPC2xxx family from Philips has an interrupt priority mask register where individual interrupt priority groups can be masked or unmasked.

There is an importan issue to consider: don't use RTOS queues/semaphores/etc from interrupts which are not disabled by the RTOS (critical-timing interrupts). There won't be any atomic protections nor critical sections for them.

This might look as if we would be loosing the great advantages of using an RTOS, but usually that can be solved by implementing manual queues if they're needed. Also, if realtime is such an issue on those interrupts, context-switching it's very likely to be a problem too. That means the action is to be taken directly from the interrupt, if possible.

If the interrupt belongs to an encoder the only action to take is to increment or decrement a variable. If you need to provide audio data to a DAC or another peripheral you just send data from an array (usually a double-buffered one) to the corresponding register.

The only change to be made to FreeRTOS is inside the port's code, in particular the macros entitled to disable and re-enable interrupts in portmacro.h. If thumb mode is needed portISR.c needs to be changed too.


#define portDISABLE_INTERRUPTS()    do{ VICSWPrioMask    = (1<<1); } while(0)
#define portENABLE_INTERRUPTS()        do{  VICSWPrioMask    = 0xFFFF; } while(0)

It's important to remember that we don't need to save nor restore any context information from our critical interupt handlers, since they won't interact with the RTOS (at least not directly). This ISR is coded as any ISR without RTOS. That makes it faster than the ISRs that need queue or semaphore management. Of course that real critical sections are needed when sharing information with our ISR, but that can be done by disabling all the interrupts, just like portDISABLE_INTERRUPTS() did before we changed it.

UPDATE: Later I found that portDISABLE_INTERRUPTS() and portENABLE_INTERRUPTS() are not the only macros used for interrupt enabling/disabling. There are other functions named vPortEnterCritical() and vPortExitCritical() which are extensibly used through the FreeRTOS code, so those should be changed too. However, I haven't tried this yet.