Structure Padding and Alignment
The order in which you place a structure’s member’s effects, its size and memory layout

The essence of why this is the case, relates to the concept of memory alignment and the compiler's mission to make memory access as efficient as possible. The way in which the compiler chooses to make memory access more efficient, greatly depends on the architecture that's being used.

Let's take a look at a couple examples:

// I might assume that the total size is 8 bytes, but that will be false.
struct example {
  char a;  // 1 byte
  short b; // 2 bytes
  char c;  // 1 byte
  int d;   // 4 bytes
};

// Outputs: The size of structure example is: 12 bytes.
printf("The size of structure example is: %lu bytes.\n", sizeof(struct example));

Based on the output, you can see that this structure has an additional 4 bytes. That should be unexpected, if you're unfamiliar with memory alignment. Now let's make a small adjustment by moving member b below member c and recheck the output.

struct example {
  char a; // 1 byte
  char c; // 1 byte
  short b; // 2 bytes
  int d; // 4 bytes
};

// Outputs: The size of structure example is: 8 bytes.
printf("The size of structure example is: %lu bytes.\n", sizeof(struct example));

The structure's size is now 4 bytes less. As you can see, the way in which you order a structures members can have a dramatic effect on memory usage, especially if your dealing with thousands of instances.

Truthfully speaking, it's not all that important for you to understand exactly how the compiler is going to align your structure in memory, especially as a beginner. As you can always just perform manual checks to find the most optimal ordering for your use case. Also, the alignment of a structure's member's depends heavily on the platform. So manually optimizing a structure, won't generalize to all platforms.

Knowing exactly how the compiler chooses to align a structure members can be difficult to determine without a deep understanding of the architecture your using. But I think the following quote from The Lost Art of Structure Packing, can help you build an intuition for most cases.

Each type except char has an alignment requirement; chars can start on any byte address, but 2-byte shorts must start on an even address, 4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8. The jargon for this is that basic C types on a vanilla ISA (Instruction Set Architectures) are self-aligned. Pointers, whether 32-bit (4-byte) or 64-bit (8-byte) are self-aligned too. – Eric S. Raymond

To better understand the above statement, let's look at the memory layout of the above two examples, and break down the memory alignment checks.

// 12 byte struct layout.
struct example {
  char a; // 1 byte
  short b; // 2 bytes
  char c; // 1 byte
  int d; // 4 bytes
};

// Addresses +0 +1 +2 +3
// -------------------------------------------
// 7156 0x41 0x7f 0x64 0x00 A?d?
// 7160 0x41 0xe0 0xff 0xff A???
// 7164 0x64 0x00 0x00 0x00 d???

struct example test = {'A', 100, 'A', 100};
// 0x41, [0x64,0x00], 0x41, [0x64,0x00,0x00,0x00]

So basically the 4-bytes that were skipped in favor of memory alignment is referred to as padding.

// 8 byte struct layout.
struct example {
  char a; // 1 byte
  char c; // 1 byte
  short b; // 2 bytes
  int d; // 4 bytes
};

// Addresses +0 +1 +2 +3
// --------------------------------------------
// 7156 0x41 0x41 0x64 0x00 AAd?
// 7160 0x64 0x00 0x00 0x00 d???

struct example test = {'A', 'A', 100, 100};
// 0x41, 0x41, [0x64,0x00],[0x64,0x00,0x00,0x00]

The theory of memory alignment in most ISA can also be put this way:

Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N ! 0)=. For example, reading 4 bytes of data from address 0x10004 is fine, but reading 4 bytes of data from address 0x10005 would be an unaligned memory access.

Before closing this topic, it's also interesting to think of an array of structures. In C, array's items are placed side by side, allowing us to access items by size offsets, as an array can only contain items of the same type.

Looking at the previous memory layouts, it's clear that both our structures can be placed side by side and all their individual members will still be aligned correctly.

Author: Vernon Grant (info@vernon-grant.com)

Updated on: 2023-09-09 Sat 07:34

Emacs 29.1 (Org mode 9.6.6)