Ansi- and Wide-character functions

Introduction

The Windows API documentation for functions taking one or more string as argument will usually look like this:

BOOL WINAPI CopyFile(
  _In_ LPCTSTR lpExistingFileName,
  _In_ LPCTSTR lpNewFileName,
  _In_ BOOL    bFailIfExists
);

The datatype for the two string parameters is made of several parts:

LP = Long pointer
C = const
T = TCHAR
STR = string

Now what does TCHAR mean? This depends on platform chosen for the compilation of program.

CopyFile itself is just a macro, defined something like this:

#ifdef UNICODE
#define CopyFile CopyFileW
#else
#define CopyFile CopyFileA
#endif

So there are actually two CopyFile functions and depending on compiler flags, the CopyFile macro will resolve to one or the other.

There core token, TCHAR is defined as:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

So again, depending on the compile flags, TCHAR is a “narrow” or a “wide” (2 bytes) character.

So when UNICODE is defined, CopyFile is defined to be CopyFileW, which will use 2-byte character arrays as their parameter, which are expected to be UTF-16 encoded.

If UNICODE isn’t defined, CopyFile is defined to be CopyFileA which uses single-byte character arrays which are expected to be encoded in the default ANSI encoding of the current user.

There are two similar macros: UNICODE makes the Windows APIs expect wide strings and _UNICODE (with a leading underscore) which enables similar features in the C runtime library.

These defines allow us to write code that compiles in both ANSI and in Unicode-mode.

It is important to know that the ANSI encoding may be a single-byte encoding (i.e. latin-1) a multi-byte encoding (i.e. shift jis), although utf-8 is, unfortunately, not well supported.

This means that neither the ANSI, nor the Wide-character variant of these functions can be assumed to work with fixed width encodings.