Endian solution for C
1 November 2011, by Berwyn
Are numbers stored big-end or little-end first? C programmers have to byte-swap manually to deal with this. Portable code becomes icky. The full ickiness is illustrated in Intel’s excellent Endianness White Paper.
But in C there is no reason the compiler can’t do the hard work and make programs both portable and pretty. Here we present a quick hack, and also a solution requiring a minor change to the C compiler. First the quick hack.
Quick Hack
Suppose we’re dealing with a USB protocol setup packet:
struct setup {
char bmRequestType, bRequest;
short wValue, wIndex, wLength;
};
Since USB is little-endian, the short integers will work as-is on an X86 machine, but if you have the job of porting to a big-endian ARM, you’ll need to byte-swap each of these values every time they’re accessed. This could be a lot of code rework.
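To make the rework concrete, here is a minimal sketch of the manual byte-swapping the port would need. The helper names swap16, read_le16 and swap_demo are made up for illustration, and the fixed-width types from <stdint.h> stand in for short and char.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (hypothetical helper). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a little-endian 16-bit field from raw packet bytes.
 * This gives the same answer on any host because it never
 * reinterprets memory as a native integer. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Example values only: 0x0100 stored low byte first, as USB sends it. */
void swap_demo(void)
{
    const uint8_t wire[2] = { 0x00, 0x01 };

    assert(read_le16(wire) == 0x0100);
    assert(swap16(0x0100) == 0x0001);
}
```

Every access to wValue, wIndex or wLength would need a call like one of these, which is exactly the clutter the rest of this post tries to avoid.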
One quick-n-dirty way to accomplish this is to simply store the USB packet in reverse order as it comes in your door (or write a reversal function). Then define your struct in reverse order:
struct setup {
short wLength, wIndex, wValue;
char bRequest, bmRequestType;
};
Note that this will only work if you don’t have strings in your struct, as C’s string library functions don’t expect strings to be in reverse order!
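A rough sketch of the reversal idea follows. The function names and the packet bytes are illustrative assumptions, and the hack also assumes the compiler puts no padding inside the struct. To keep the sketch runnable on any host, it decodes the reversed buffer with an explicit big-endian read (read_be16) rather than overlaying the reversed struct, which would only give the right answer on a big-endian machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse a buffer in place (hypothetical helper). */
void reverse_bytes(uint8_t *buf, size_t len)
{
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        uint8_t tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}

/* Read a big-endian 16-bit value: what a big-endian CPU sees natively. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

void reverse_demo(void)
{
    /* Example 8-byte setup packet in wire order: bmRequestType, bRequest,
     * then wValue=0x0100, wIndex=0x0000, wLength=0x0012, low byte first. */
    uint8_t pkt[8] = { 0x80, 0x06, 0x00, 0x01, 0x00, 0x00, 0x12, 0x00 };

    reverse_bytes(pkt, sizeof pkt);

    /* Offsets now follow the reversed struct:
     * wLength, wIndex, wValue, bRequest, bmRequestType. */
    assert(read_be16(pkt + 0) == 0x0012); /* wLength */
    assert(read_be16(pkt + 2) == 0x0000); /* wIndex */
    assert(read_be16(pkt + 4) == 0x0100); /* wValue */
    assert(pkt[6] == 0x06);               /* bRequest */
    assert(pkt[7] == 0x80);               /* bmRequestType */
}
```

Reversing the whole packet turns each little-endian field into a big-endian one and flips the field order, which is why the struct has to be declared back to front.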
A Real Solution
This solution has the C compiler dealing with the whole endian issue for you – making your program totally portable with zero effort. It would require the addition of a single type modifier to the C compiler (C compilers already have various type modifiers). To solve endian portability, this addition would be well worth it.
This concept is similar to the ‘signed’ and ‘unsigned’ access type modifiers or the GNU C compiler’s packed attribute which helps to access a protocol’s data by letting you prevent padding between structure elements.
Our example above would become:
struct setup {
char bmRequestType, bRequest;
little_endian short wValue, wIndex, wLength;
};
All that is needed is to be able to specify the endian nature in a type modifier: little_endian or big_endian. In the above, an X86 compiler would know to ignore the modifier since it’s already little-endian. But a big-endian ARM compiler would know to byte-swap for you upon reading or writing.
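Since little_endian is not a real C keyword, the closest thing available today is to emulate it by hand. The sketch below is one possible emulation, not what a compiler would literally generate: le16_t, le16_get and le16_set are invented names, and the field values are examples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "little_endian short": the bytes stay in
 * wire order and only the accessor functions interpret them. */
typedef struct { uint8_t b[2]; } le16_t;

uint16_t le16_get(le16_t v)
{
    return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

le16_t le16_set(uint16_t x)
{
    le16_t v;
    v.b[0] = (uint8_t)(x & 0xFF); /* low byte first */
    v.b[1] = (uint8_t)(x >> 8);
    return v;
}

struct setup {
    uint8_t bmRequestType, bRequest;
    le16_t  wValue, wIndex, wLength;
};

void le16_demo(void)
{
    struct setup s;

    s.bmRequestType = 0x80;
    s.bRequest      = 0x06;
    s.wValue        = le16_set(0x0100);
    s.wIndex        = le16_set(0);
    s.wLength       = le16_set(18);

    assert(le16_get(s.wValue) == 0x0100);
    assert(le16_get(s.wLength) == 18);
    /* The stored bytes are little-endian regardless of the host. */
    assert(s.wLength.b[0] == 18 && s.wLength.b[1] == 0);
}
```

With the proposed modifier, the le16_get and le16_set calls would disappear from the source and the compiler would insert the equivalent work only on hosts that need it.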
The same would work for pointers:
little_endian long *x, *y;
Whenever x or y is dereferenced, the bytes are swapped by the compiler if the machine isn’t already little-endian. You can even cast a standard long pointer to a little_endian long pointer to force a compiler byte-swap upon access.
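As a rough idea of what a dereference of such a pointer might expand to, here is a hand-written load and store. This is a sketch under assumptions: load_le32 and store_le32 are invented names, and a 32-bit uint32_t is used in place of long, whose size varies between platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Roughly what reading "*x" could mean for a little-endian 32-bit
 * pointer: assemble the value from bytes, lowest byte first. */
uint32_t load_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Roughly what writing "*x = v" could mean: split the value into
 * bytes and store the lowest byte first. */
void store_le32(void *p, uint32_t v)
{
    uint8_t *b = p;

    b[0] = (uint8_t)(v);
    b[1] = (uint8_t)(v >> 8);
    b[2] = (uint8_t)(v >> 16);
    b[3] = (uint8_t)(v >> 24);
}

void pointer_demo(void)
{
    uint8_t raw[4];

    store_le32(raw, 0x11223344u);
    assert(raw[0] == 0x44 && raw[3] == 0x11); /* low byte first */
    assert(load_le32(raw) == 0x11223344u);
}
```

Working byte by byte like this also sidesteps alignment problems, at the cost of leaving it to the optimizer to turn the shifts back into a single load on little-endian hosts.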
Internally, the compiler would probably implement just a single byte-swap type modifier which it would apply to all non-native accesses. But for portability and clarity, this should be spelled out as little_ or big_endian in the source.
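The "single byte-swap applied to non-native accesses" idea can be sketched in plain C as below. A real compiler would decide at compile time; this sketch probes the host at run time so that it behaves the same everywhere. All names (bswap32, host_is_little_endian, le32_convert, be32_convert) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The one primitive: reverse the four bytes of a 32-bit value. */
uint32_t bswap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Run-time probe: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;

    return *(const unsigned char *)&probe == 1;
}

/* Convert between host order and the named order. The same function
 * works in both directions because a swap is its own inverse. */
uint32_t le32_convert(uint32_t v)
{
    return host_is_little_endian() ? v : bswap32(v);
}

uint32_t be32_convert(uint32_t v)
{
    return host_is_little_endian() ? bswap32(v) : v;
}

void convert_demo(void)
{
    unsigned char bytes[4];
    uint32_t w = le32_convert(0x11223344u);

    /* Whatever the host, the stored bytes come out low byte first. */
    memcpy(bytes, &w, sizeof w);
    assert(bytes[0] == 0x44 && bytes[3] == 0x11);
}
```

Exactly one of le32_convert and be32_convert is a no-op on any given host, which matches the idea that the modifier matching the machine is simply ignored.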
It should also be noted that this same solution solves the endian problem for bit fields and bit masks (White Paper p12).
I am not a C compiler guru, so I’m not going to risk adding this change to my compiler. But I put it “out there” for comment and, hopefully, uptake. I’m guessing this should go into GCC first, and then others will gradually follow suit.



