Aliasing and effective type
Remarks#
Violations of aliasing rules and of violating the effective type of an object are two different things and should not be confounded.
-
Aliasing is the property of two pointers
a
andb
that refer to the same object, that is thata == b
. -
The effective type of a data object is used by C to determine which operations can be done on that object. In particular the effective type is used to determine if two pointers can alias each other.
Aliasing can be a problem for optimization, because changing the object through one pointer, a
say, can change the object that is visible through the other pointer, b
. If your C compiler would have to assume that pointers could always alias each other, regardless of their type and provenance, many optimization opportunities would be lost, and many programs would run slower.
C’s strict aliasing rules refers to cases in the compiler may assume which objects do (or do not) alias each other. There are two rules of thumb that you always should have in mind for data pointers.
Unless said otherwise, two pointers with the same base type may alias.
Two pointers with different base type cannot alias, unless at least one of the two types is a character type.
Here base type means that we put aside type qualifications such as const
, e.g. If a
is double*
and b
is const double*
, the compiler must generally assume that a change of *a
may change *b
.
Violating the second rule can have catastrophic results. Here violating the strict aliasing rule means that you present two pointers a
and b
of different type to the compiler which in reality point to the same object. The compiler then may always assume that the two point to different objects, and will not update its idea of *b
if you changed the object through *a
.
If you do so the behavior of your program becomes undefined. Therefore, C puts quite severe restrictions on pointer conversions in order to help you to avoid such situation to occur accidentally.
Unless the source or target type is
void
, all pointer conversions between pointers with different base type must be explicit.
Or in other words, they need a cast, unless you do a conversion that just adds a qualifier such as const
to the target type.
Avoiding pointer conversions in general and casts in particular protects you from aliasing problems. Unless you really need them, and these cases are very special, you should avoid them as you can.
Character types cannot be accessed through non-character types.
If an object is defined with static, thread, or automatic storage duration, and it has a character type, either: char
, unsigned char
, or signed char
, it may not be accessed by a non-character type. In the below example a char
array is reinterpreted as the type int
, and the behavior is undefined on every dereference of the int
pointer b
.
int main( void )
{
char a[100];
int* b = ( int* )&a;
*b = 1;
static char c[100];
b = ( int* )&c;
*b = 2;
_Thread_local char d[100];
b = ( int* )&d;
*b = 3;
}
This is undefined because it violates the “effective type” rule, no data object that has an effective type may be accessed through another type that is not a character type. Since the other type here is int
, this is not allowed.
Even if alignment and pointer sizes would be known to fit, this would not exempt from this rule, behavior would still be undefined.
This means in particular that there is no way in standard C to reserve
a buffer object of character type that can be used through pointers
with different types, as you would use a buffer that was received by
malloc
or similar function.
A correct way to achieve the same goal as in the above example would
be to use a union
.
typedef union bufType bufType;
union bufType {
char c[sizeof(int[25])];
int i[25];
};
int main( void )
{
bufType a = { .c = { 0 } }; // reserve a buffer and initialize
int* b = a.i; // no cast necessary
*b = 1;
static bufType a = { .c = { 0 } };
int* b = a.i;
*b = 2;
_Thread_local bufType a = { .c = { 0 } };
int* b = a.i;
*b = 3;
}
Here, the union
ensures that the compiler knows from the start that the buffer could be
accessed through different views. This also has the advantage that now the buffer has a “view” a.i
that already is of type int
and no pointer conversion is needed.
Effective type
The effective type of a data object is the last type information that was associated with it, if any.
// a normal variable, effective type uint32_t, and this type never changes
uint32_t a = 0.0;
// effective type of *pa is uint32_t, too, simply
// because *pa is the object a
uint32_t* pa = &a;
// the object pointed to by q has no effective type, yet
void* q = malloc(sizeof uint32_t);
// the object pointed to by q still has no effective type,
// because nobody has written to it
uint32_t* qb = q;
// *qb now has effective type uint32_t because a uint32_t value was written
*qb = 37;
// the object pointed to by r has no effective type, yet, although
// it is initialized
void* r = calloc(1, sizeof uint32_t);
// the object pointed to by r still has no effective type,
// because nobody has written to or read from it
uint32_t* rc = r;
// *rc now has effective type uint32_t because a value is read
// from it with that type. The read operation is valid because we used calloc.
// Now the object pointed to by r (which is the same as *rc) has
// gained an effective type, although we didn't change its value.
uint32_t c = *rc;
// the object pointed to by s has no effective type, yet.
void* s = malloc(sizeof uint32_t);
// the object pointed to by s now has effective type uint32_t
// because an uint32_t value is copied into it.
memcpy(s, r, sizeof uint32_t);
Observe that for the latter, it was not necessary that we even have an
uint32_t*
pointer to that object. The fact that we have copied another
uint32_t
object is sufficient.
Violating the strict aliasing rules
In the following code let us assume for simplicity that float
and uint32_t
have the same size.
void fun(uint32_t* u, float* f) {
float a = *f
*u = 22;
float b = *f;
print("%g should equal %g\n", a, b);
}
u
and f
have different base type, and thus the compiler can assume that they point to different objects. There is no possibility that *f
could have changed between the two initializations of a
and b
, and so the compiler may optimize the code to something equivalent to
void fun(uint32_t* u, float* f) {
float a = *f
*u = 22;
print("%g should equal %g\n", a, a);
}
That is, the second load operation of *f
can be optimized out completely.
If we call this function “normally”
float fval = 4;
uint32_t uval = 77;
fun(&uval, &fval);
all goes well and something like
4 should equal 4
is printed. But if we cheat and pass the same pointer, after converting it,
float fval = 4;
uint32_t* up = (uint32_t*)&fval;
fun(up, &fval);
we violate the strict aliasing rule. Then the behavior becomes undefined. The output could be as above, if the compiler had optimized the second access, or something completely different, and so your program ends up in a completely unreliable state.
restrict qualification
If we have two pointer arguments of the same type, the compiler can’t make any assumption and will always have to assume that the change to *e
may change *f
:
void fun(float* e, float* f) {
float a = *f
*e = 22;
float b = *f;
print("is %g equal to %g?\n", a, b);
}
float fval = 4;
float eval = 77;
fun(&eval, &fval);
all goes well and something like
is 4 equal to 4?
is printed. If we pass the same pointer, the program will still do the right thing and print
is 4 equal to 22?
This can turn out to be inefficient, if we know by some outside information that e
and f
will never point to the same data object. We can reflect that knowledge by adding restrict
qualifiers to the pointer parameters:
void fan(float*restrict e, float*restrict f) {
float a = *f
*e = 22;
float b = *f;
print("is %g equal to %g?\n", a, b);
}
Then the compiler may always suppose that e
and f
point to different objects.
Changing bytes
Once an object has an effective type, you should not attempt to modify it through a pointer of another type, unless that other type is a character type, char
, signed char
or unsigned char
.
#include <inttypes.h>
#include <stdio.h>
int main(void) {
uint32_t a = 57;
// conversion from incompatible types needs a cast !
unsigned char* ap = (unsigned char*)&a;
for (size_t i = 0; i < sizeof a; ++i) {
/* set each byte of a to 42 */
ap[i] = 42;
}
printf("a now has value %" PRIu32 "\n", a);
}
This is a valid program that prints
a now has value 707406378
This works because:
- The access is made to the individual bytes seen with type
unsigned char
so each modification is well defined. - The two views to the object, through
a
and through*ap
, alias, but sinceap
is a pointer to a character type, the strict aliasing rule does not apply. Thus the compiler has to assume that the value ofa
may have been changed in thefor
loop. The modified value ofa
must be constructed from the bytes that have been changed. - The type of
a
,uint32_t
has no padding bits. All its bits of the representation count for the value, here707406378
, and there can be no trap representation.