Portability Tips

The C programming language has enjoyed remarkable success as a "portable assembly language", although of course one should not take that phrase too literally. But C programs are not automatically portable. The programmer who wishes his or her programs to be portable must take a modicum of time and trouble to make them so. As Kernighan and Ritchie point out in the introduction to The C Programming Language, "With a little care it is easy to write portable programs that can be run without change on a variety of hardware. The standard makes portability issues explicit, and prescribes a set of constants that characterize the machine on which the program is run."

Character Sets

Not all machines use EBCDIC! Some computers use an alternative character set known as ASCII, and others use Unicode or other "wide" character sets. Nor are these the only possibilities that C acknowledges. The C Standard does allow us to make a few basic assumptions about character representations, though.

Firstly, we can be absolutely sure that the following characters are available to us in the source and execution character sets (although not necessarily in the order that is given here):


  A  B  C  D  E  F  G  H  I  J  K  L  M
  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
  a  b  c  d  e  f  g  h  i  j  k  l  m
  n  o  p  q  r  s  t  u  v  w  x  y  z
  0  1  2  3  4  5  6  7  8  9
  !  "  #  %  &  '  (  )  *  +  ,  -  .  /  :
  ;  <  =  >  ?  [  \  ]  ^  _  {  |  }  ~

as well as "the space character, and control characters representing horizontal tab, vertical tab, and form feed". In the execution character set, we are also guaranteed characters to represent "alert, backspace, carriage return, and new line".

The Standard guarantees that "the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous", which gives us a useful way of converting any character into a number, provided we know that that character represents a decimal digit:


  n = ch - '0';

and of course a way to reverse the process (provided we know that n is in the range 0 to 9):


  ch = n + '0';
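To put those two conversions in context, here is a minimal sketch of a digit-string parser that relies only on the guarantee quoted above. The names digit_value and parse_digits are my own inventions for illustration, and the parser makes no attempt to detect overflow (unsigned arithmetic simply wraps around).

```c
/* Return the numeric value of a decimal digit character,
 * or -1 if ch is not a decimal digit.
 * Relies only on '0'..'9' being contiguous and ascending. */
int digit_value(int ch)
{
  if(ch >= '0' && ch <= '9')
  {
    return ch - '0';
  }
  return -1;
}

/* Convert the leading decimal digits of s to a number.
 * Stops at the first non-digit; no overflow detection. */
unsigned long parse_digits(const char *s)
{
  unsigned long n = 0;
  while(*s >= '0' && *s <= '9')
  {
    n = n * 10 + (unsigned long)(*s - '0');
    ++s;
  }
  return n;
}
```

Note that the range test ch >= '0' && ch <= '9' is itself portable, precisely because the digits are guaranteed to be contiguous.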

We are further assured that the values ("coding points") of all of the above characters are positive. This in itself turns out to be highly useful information!

Whilst it is true that the digits are guaranteed to have coding points that are sequential and in a sane order, the same is not true, alas, for the letters of the alphabet. We should take care not to assume such an ordering if we wish our code to be portable to character sets in which the ordering does not exist.
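One way to sidestep the problem is to look the letter up in a string literal rather than doing arithmetic on its value. The following is only a sketch; letter_index is a hypothetical name chosen for this example.

```c
#include <string.h>

/* Return the 0-based position of ch in the lower-case alphabet,
 * or -1 if ch is not one of the 26 lower-case letters.
 * Makes no assumption that letter values are contiguous. */
int letter_index(int ch)
{
  static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
  const char *p;

  if(ch == '\0') /* strchr would otherwise match the terminator */
  {
    return -1;
  }
  p = strchr(alphabet, ch);
  return (p != NULL) ? (int)(p - alphabet) : -1;
}
```

The tempting alternative, ch - 'a', gives the right answer on ASCII systems but not on character sets where the letters are not contiguous.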

Bits and Bytes

C guarantees that a char is exactly one byte in size. It does not guarantee that a byte is exactly eight bits in size, though. Fortunately for our sanity, however, we are assured that there are precisely CHAR_BIT bits in a byte, and that CHAR_BIT must be at least 8 (although it might be higher). CHAR_BIT is defined in <limits.h>.

Incidentally, there really are systems with bytes that are wider than 8 bits. For example, typical modern digital signal processor (DSP) boards have 16- or even 32-bit bytes. Such boards are often used in devices such as set-top boxes. C is commonly used in such environments.
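In practice this means writing CHAR_BIT wherever you might otherwise have written 8. Here is a small sketch of the idea; the macro BITS_IN and the function count_set_bits are hypothetical names of my own, not anything the Standard provides.

```c
#include <limits.h>

/* Number of bits in the object representation of a type.
 * (This may include padding bits, if the type has any.) */
#define BITS_IN(type) (sizeof(type) * CHAR_BIT)

/* Count the bits that are set in one byte, without assuming
 * that a byte has exactly eight bits. */
int count_set_bits(unsigned char byte)
{
  int count = 0;
  int i;
  for(i = 0; i < CHAR_BIT; i++)
  {
    if(byte & (1u << i))
    {
      ++count;
    }
  }
  return count;
}
```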

Guaranteed Minimum Data Type Sizes and Ranges

Don't assume that int is a 32-bit type! The following table gives minimum data type sizes (in bits) and guaranteed value ranges. I give the sizes in bits because the minimum size in bytes varies, depending on the size of a byte. In general, if a type requires at least N bits, it will need at least (N + CHAR_BIT - 1) / CHAR_BIT bytes (with any remainder being ignored). Some implementations could conceivably require more bytes than this expression would imply. In any event, sizeof(type) gives the right answer!

  Type                Bits  Low                   High                  Notes
  char                8     0                     127                   See Note.
  unsigned char       8     0                     255
  signed char         8     -127                  127
  unsigned short      16    0                     65535
  signed short        16    -32767                32767
  unsigned int        16    0                     65535
  signed int          16    -32767                32767
  unsigned long       32    0                     4294967295
  signed long         32    -2147483647           2147483647
  unsigned long long  64    0                     18446744073709551615  C99 only
  signed long long    64    -9223372036854775807  9223372036854775807   C99 only

Note: if char is signed by default, it has the same range guarantees as a signed char. If char is unsigned by default, it has the same range guarantees as an unsigned char. Truly portable code will not assume either option; rather, it will work on the assumption that either option might be in force (one of them must be, of course).
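As a sketch of what "working on either assumption" can look like, the following code gives the same result whichever way plain char goes, by converting each character to unsigned char before using its value. The function names char_is_signed and byte_sum are hypothetical, chosen for this example only.

```c
#include <limits.h>

/* Report which option is in force here: 1 if plain char is signed,
 * 0 if it is unsigned. Useful for diagnostics, not for logic! */
int char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* Add up the values of the characters in a string. The cast to
 * unsigned char makes the result independent of the signedness
 * of plain char. */
unsigned long byte_sum(const char *s)
{
  unsigned long sum = 0;
  while(*s != '\0')
  {
    sum += (unsigned char)*s;
    ++s;
  }
  return sum;
}
```

Without the cast, a character with its top bit set would contribute a negative value on some implementations and a positive one on others.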

In particular, we should not assume that sizeof(int) is 4. On MS-DOS, it's typically 2. On a Cray, it's 8 (as far as I know). On some DSPs, it's 1 (which implies, of course, that CHAR_BIT is >= 16 on those DSPs).
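Rather than guessing at sizes, we can ask <limits.h>. The sketch below shows the byte-count expression from earlier in this section as a macro, together with a range check that uses INT_MIN and INT_MAX instead of a hard-coded assumption about int. BYTES_FOR_BITS and fits_in_int are names I made up for the purpose.

```c
#include <limits.h>

/* Minimum number of bytes needed to hold n bits. */
#define BYTES_FOR_BITS(n) (((n) + CHAR_BIT - 1) / CHAR_BIT)

/* Return 1 if value can be stored in an int on this
 * implementation, 0 otherwise. */
int fits_in_int(long value)
{
  return value >= INT_MIN && value <= INT_MAX;
}
```

Any value between -32767 and 32767 will pass the check on every conforming implementation; anything outside that range might or might not, which is exactly why the check is worth making.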

Implementation Namespace

The Standard reserves rather more identifiers to the implementation than most people realise. Whilst specifying the exact rules here would be possible, it would be terribly tedious (and anyway, that's what the Standard is for!).

Instead, I'm going to give you some rules of thumb that should keep you out of naming trouble. These guidelines are more restrictive than the Standard requires, because they are designed to be easy to remember.

Avoid using any identifier that starts with:

  1. An underscore character
  2. The letters str
  3. The letters mem
  4. The letters is
  5. The letters to
  6. The letter E

Life would be much simpler if all of C's reserved identifiers started with an underscore but, alas, they don't. Hence the other rules. Now, I know that it's sometimes tempting to use a leading underscore, but_ I_ think_ it_ may_ be_ possible_ to_ come_ up_ with_ an_ alternative_.

In fact, the easiest way to skip around the implementation's namespace is to define your own. Alas, C doesn't have a C++-style namespace feature; consequently, we can't define our own namespaces quite as elegantly as we might wish. Nevertheless, there is a crude, but effective, way to fence off a namespace: the use of a prefix.

For my CLINT library (the library With No Name), I used the prefix wnn_ for all functions and publicly visible objects (e.g. parameters, which could be seen in the headers, so they're kinda visible), and WNN_ for all macros, types, and type synonyms.

Whilst it's true that someone using the CLINT library might also use some other library that uses the same prefixes, it's not terribly likely; and even if it does happen, the fix is relatively obvious (a massive, automated, search and replace operation, to substitute a fresh prefix).

(Incidentally, CLINT is no longer available. The design sucked.)

Just to warn you off them, I also use psl_/PSL_ (for my "Portable"(ish) Sockets Library), and rjh_/RJH_, for stuff that doesn't belong in any of the above. I am also tempted by pgl_/PGL_ ("Portable"(ish) Graphics Library), but I must admit that I haven't actually written one yet. If I do, though, that's the prefix I'll use for it.
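To make the prefix convention concrete, here is a sketch of how a small library's interface might look. The library, the prefix abc_/ABC_, and everything declared here are hypothetical; they merely follow the scheme described above (lower-case prefix for functions and parameters, upper-case prefix for macros and types).

```c
/* abc.h - interface of a hypothetical library */
#ifndef H_ABC
#define H_ABC 1

#define ABC_MAX_NAME 32        /* macros take the upper-case prefix */

typedef struct ABC_Point       /* so do types and type synonyms */
{
  int x;
  int y;
} ABC_Point;

/* functions and their parameters take the lower-case prefix */
int abc_PointEqual(const ABC_Point *abc_a, const ABC_Point *abc_b);

#endif

/* abc.c - implementation */
int abc_PointEqual(const ABC_Point *abc_a, const ABC_Point *abc_b)
{
  return abc_a->x == abc_b->x && abc_a->y == abc_b->y;
}
```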

Linkers

Um, I'm not sure how you're going to take this. The original C Standard recognises the existence of linkers that require external identifiers (basically, functions and file scope objects with external linkage) to be unique in the first six characters! And even linkers that are case-insensitive. Frankly, rather than make your code almost unreadable in a bid to satisfy these Draconian limitations, I have a much better idea -- get a better linker. But if you can't do that, then you are going to be very much more restricted than most people when it comes to naming functions. (On this site, I have more or less ignored such foolishness.) The C99 standard requires linkers to be a bit less restrictive.

Abstracting Non-Portable Behaviour

When you have to use non-portable constructs, it can be helpful to encapsulate them into a library, so that your application code can be written completely portably. All you have to do, for each new platform, is to rewrite the library.

Let's take as a very simple example the problem of finding the length of a file. This isn't actually as simple as it sounds, because the concept of "file length" is more complex than most people give it credit for, but we'll assume for the purposes of this example that we simply mean the number that DIR (MS-DOS, and Windows console) or ls -al (Linux) will give you for a given file.

We will further assume that we can identify the file via a filename, and our file-length-getting function will accept this filename as a parameter (rather than a pointer to a FILE structure, as this lets us off the particularly nasty hook of giving the false impression that we could report the length of, say, stdout).

Let's look at how we might get the file length in a Windows console program:


FILE *fp = fopen(filename, "rb");
if(fp != NULL)
{
  length = _filelength(_fileno(fp));
  fclose(fp);
}

How about Linux?


struct stat MyStat = {0};
if(0 == stat(filename, &MyStat))
{
  length = MyStat.st_size;
}

(We'll ignore the Windows API version, since it won't add anything significant to the discussion.)

Now, we don't really want to pepper our code with N different versions of the above code, for N different compilers, do we? But the solution is very simple. In fact, there are at least two, and I'll describe them both here. Both have one thing in common, though, so I'll deal with that first.

The principle that is common to both is this: we define a new function (which I'll call apc_GetFileLength, with apc_ standing for "Abstracted Platform Code"). This function is placed into a source file, which it may share with other functions that abstract other non-portable features. The interface (in this case, just a function prototype) is placed into a header, of course. Let's assume that our header is called "apc.h", and naturally our source will be "apc.c".

In each case, "apc.h" will look a bit like this:


#ifndef H_APC
#define H_APC 1
long apc_GetFileLength(const char *apc_Filename);
#endif

(If you're quick, you'll have spotted that I have assumed the length of the file can be represented in a signed long int, which may not actually be true. I will leave that problem as an exercise for the graduate student.)

Conditional Preprocessing

This is my least favourite option. It goes like this:


/*
 * apc.c - various abstractions of platform-specific code
 *
 * apc_GetFileLength() - get the length of a file (-1 on error)
 */

#include "apc.h"

#ifdef _WIN32
#include <stdio.h>
#include <io.h>

long apc_GetFileLength(const char *apc_Filename)
{
  long length = -1;
  FILE *fp = fopen(apc_Filename, "rb");
  if(fp != NULL)
  {
    length = _filelength(_fileno(fp));
    fclose(fp);
  }
  return length;
}
#else
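/* Test a macro that the compiler predefines. __GLIBC__ is only defined
 * once a glibc header has been included, and none has been at this point. */
#ifdef __linux__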
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

long apc_GetFileLength(const char *apc_Filename)
{
  long length = -1;
  struct stat MyStat = {0};
  if(0 == stat(apc_Filename, &MyStat))
  {
    length = MyStat.st_size;
  }
  return length;
}
#else
/* unsupported platform - default to the long, painful way */
#include <stdio.h>
#include <limits.h>
long apc_GetFileLength(const char *apc_Filename)
{
  long length = 0;
  FILE *fp = fopen(apc_Filename, "rb");
  if(fp != NULL)
  {
    while(length < LONG_MAX && getc(fp) != EOF)
    {
      ++length;
    }
    if(!feof(fp)) /* whoa, too big to measure! */
    {
      length = -1;
    }
    fclose(fp);
  }
  else
  {
    length = -1;
  }
  return length;
}
#endif
#endif

Separate Source Files For Each Platform

Personally, I think that littering source code with preprocessor directives is ugly, and hard to maintain. I much prefer separating out the code that belongs to, say, the Linux platform, and putting that code into the Linux version of the apc.c file. Then I separate out the code that is intended to work on the Windows platform, and put that code in the Windows version of the apc.c file. And so on.

Doing things this way involves a little more care in file management, but (at least in my opinion!) gives much more readable code.

Using the Abstraction Layer

Once you have your header and your source file(s), how do you use them? Well, it's pretty simple. Compile the source file into an object file (which you will probably wish to add to a library at some point), and then link the object file (or library) to whichever programs you wish.

All you have to do in your program source is #include "apc.h" at the top somewhere, and call apc_GetFileLength(), passing it the name of the file whose length you wish to know. From now on, it really doesn't matter which platform you are working on, as long as your library is available on that platform. And if you aren't able to write a platform-specific version of the function for a given platform, well, you can always use the grindingly slow default version presented above. The important point is that your application logic doesn't change, so porting becomes easier. And that's the name of the game.
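Here is a sketch of what such application code might look like. In a real program the prototype would come from "apc.h" and the definition from whichever apc.c suits the platform; a compact copy of the slow standard-C version is included here only so that the example stands alone. The function report_length is hypothetical.

```c
#include <stdio.h>
#include <limits.h>

/* Stand-in for the library: normally declared in "apc.h" and
 * defined in the platform's apc.c. Returns -1 on error. */
long apc_GetFileLength(const char *apc_Filename)
{
  long length = -1;
  FILE *fp = fopen(apc_Filename, "rb");
  if(fp != NULL)
  {
    length = 0;
    while(length < LONG_MAX && getc(fp) != EOF)
    {
      ++length;
    }
    if(!feof(fp)) /* read error, or too big to measure */
    {
      length = -1;
    }
    fclose(fp);
  }
  return length;
}

/* Application code: no platform-specific calls, only the
 * abstraction. Returns 0 on success, -1 if the length could
 * not be determined. */
int report_length(const char *filename, FILE *out)
{
  long length = apc_GetFileLength(filename);
  if(length < 0)
  {
    fprintf(out, "%s: cannot determine length\n", filename);
    return -1;
  }
  fprintf(out, "%s: %ld bytes\n", filename, length);
  return 0;
}
```

Nothing in report_length needs to change when the program moves to a new platform; only the library behind apc_GetFileLength does.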


I'm bound to think of some more stuff to do with portability -- in due course.