Thursday, November 20, 2008

To volatile or not to volatile

Hamlet never had to worry about volatile specifiers, only existence.

UPDATE:There is a continuation on this topic here

The volatile keyword is something every developer has to deal with when working with interrupts or multiple threads. It's nature is quite simple: when present in a variable declaration it says to the compiler that the current executing code is not the only one which may change its value, so the compiler can't rely on a cached value (on a register for instance) or make any other assumption based on past values. On one hand this means that we should use this keyword if we expect certain variables to behave that way. On the other side it also means that much more code and memory access will be done than when using a normal (non-volatile) variable.

When first using volatile there is a myth about declaring everything shared between interrupts/threads and the main execution path or thread as volatile. Of course it will work but it can lead to slower and larger code and it may not be necessary, specially when coding inside individual functions with critical sections or mutexes/semaphores.

Here is a piece of code, supposed to be a dumb interrupt handler:



extern volatile unsigned int x,y;
void __attribute__((interrupt ("IRQ"))) ISR_test(void)
{
if( x > 0 ) //optimized to ==0 since x is unsigned
{
x = x + y;
}
else
x = 2*y + x;
}


And the corresponding assembly listing for ARM7, compiling with gcc 4.2.2 optimization level 1:

2d7c: push {r1, r2, r3}
2d80: ldr r1, [pc, #68] // r1 = &x;
2d84: ldr r3, [r1] // r3 = x; READ X
2d88: cmp r3, #0 // r3 == 0 ?
2d8c: beq 2da8 // decision
2d90: ldr r2, [r1] // r2 = x; READ X
2d94: ldr r3, [pc, #52] // r2 = &y
2d98: ldr r3, [r3] // r3 = y; READ Y
2d9c: add r3, r3, r2 // r3 = r3 + r2;
2da0: str r3, [r1] // x = r3
2da4: b 2dc4 // ready to return
2da8: ldr r3, [pc, #32] // r3 = &y
2dac: ldr r3, [r3] // r3 = y READ Y
2db0: ldr r1, [pc, #20] // r1 = &x
2db4: ldr r2, [r1] // r2 = x READ X
2db8: lsl r3, r3, #1 // r3 = 2*r3
2dbc: add r3, r3, r2 // r3 = r3 + r2
2dc0: str r3, [r1] // x = r3
2dc4: pop {r1, r2, r3}
2dc8: subs pc, lr, #4 // 0x4
2dcc: .word 0x40004900 // &x
2dd0: .word 0x40004904 // &y

There are three instructions to read x's value and two to read y's. GCC did as we told, do not make any assumptions on the values. However we may know certain constraints which can make the use of the volatile keyword redundant, like knowing that nested interrupts are not enabled. That way nothing will interrupt our ISR routine. The same conclusions get to mind if we use semaphores, mutexes or critical sections whenever a thread tries to access the variables. There are some exceptions I will comment near the ending. If we remove the volatile specifiers from both x and y we get:



2d7c: push {r1, r2, r3}
2d80: ldr r1, [pc, #48] // r1 = &x
2d84: ldr r2, [r1] // r2 = x
2d88: cmp r2, #0 // r2 == 0?
2d8c: ldrne r3, [pc, #40] // only if neq
2d90: ldrne r3, [r3] // only if neq
2d94: addne r3, r3, r2 // only if neq
2d98: strne r3, [r1] // only if neq
2d9c: ldreq r3, [pc, #24] // only if eq
2da0: ldreq r3, [r3] // only if eq
2da4: lsleq r3, r3, #1 // only if eq
2da8: ldreq r2, [pc, #8] // only if eq
2dac: streq r3, [r2] // only if eq
2db0: pop {r1, r2, r3}
2db4: subs pc, lr, #4
2db8: .word 0x40004900
2dbc: .word 0x40004904

Looks like GCC is doing some black magic! The conditional store, add and load instructions help the compiler to avoid branches. The total number of instructions was reduced from 20 to 15. This is a simple example but on a complex one register popping/pushing due to variable volatileness can slow down things even more. As said before, we may need the volatile specifier on certain situations, but if we know when to avoid it then final code can be much more concise.
Recalling the mutexes here is another example:



extern void TakeMutex(void);
extern void GiveMutex(void);

int x,y;

int getSum( void ) {
int temp;
/* This should be in the critical section,
* however it's here to show what happens */
if ( x < 0 )
return 0;
if ( y < 0 )
return 0;

TakeMutex();

temp = x + y;

GiveMutex();

return temp;
}


Note that line 10 and line 12 have atomic reads since this is a 32-bit architecture and int is 32-bit for this compiler. Either one or both of them are unlikely to be true for 8-bit and 16-bit processors, so it's not a good practice if we're looking for portability. Assembly output results in:




2e14: push {r4, r5, lr}
2e18: ldr r5, [pc, #64]
2e1c: ldr r3, [r5]
2e20: cmp r3, #0 ; 0x0
2e24: blt 2e50
2e28: ldr r4, [pc, #52]
2e2c: ldr r3, [r4]
2e30: cmp r3, #0 ; 0x0
2e34: blt 2e50
2e38: bl 2d7c (takemutex)
2e3c: ldr r2, [r4]
2e40: ldr r3, [r5]
2e44: add r4, r2, r3
2e48: bl 2d80 (givemutex)
2e4c: b 2e54
2e50: mov r4, #0 ; 0x0
2e54: mov r0, r4
2e58: pop {r4, r5, lr}
2e5c: bx lr
2e60: .word 0x40004900
2e64: .word 0x40004904

Since TakeMutex() and GiveMutex() are proper functions (defined somewhere else) GCC doesn't know what they will or won't do to x and y, so the code will read them again after the function calls. The only values that were cached are the variable's addresses, which of course won't change.
However, if TakeMutex() and GiveMutex() are macros we may get into trouble:



#define TakeMutex() asm volatile("nop")
#define GiveMutex() asm volatile("nop")

int getSum( void ) {
int temp;
/* This should be in the critical section,
* however it's here to show what happens */
if ( x < 0 )
return 0;
if ( y < 0 )
return 0;

TakeMutex();

temp = x + y;

GiveMutex();

return temp;


I accept that a nop won't protect anything, it's just a snippet. Here is the resulting assembly code:



2dcc: ldr r3, [pc, #48]
2dd0: ldr r2, [r3]
2dd4: cmp r2, #0
2dd8: blt 2dfc
2ddc: ldr r3, [pc, #36]
2de0: ldr r0, [r3]
2de4: cmp r0, #0 ; 0x0
2de8: blt 2dfc
2dec: nop
2df0: add r0, r0, r2
2df4: nop
2df8: bx lr
2dfc: mov r0, #0 ; 0x0
2e00: bx lr
2e04: .word 0x40004900
2e08: .word 0x40004904

Values were cached so it will be a mess, a difficult error to track too. We can avoid this by declaring the variables as volatile.


We can say that we can avoid all this problems by declaring everything 'suspicious' as volatile, but if we were to optimize code and make it really tight while still programming in C then the non-volatile approach is valid too, given that we know what we are doing and the assumptions on the variables' types and values.


UPDATE:There is a continuation on this topic here

No comments:

Post a Comment