Character Encoding

3ds Max 2013 and higher support multilingual characters in text files.

While the majority of the changes are transparent to the user, there are several aspects of this new functionality that affect MAXScript files or are exposed to MAXScript for advanced user control.

The following topic discusses the various encoding standards and provides links to the affected features.

Character Encoding Overview

In the early days of computing, text files were encoded using the ASCII (American Standard Code for Information Interchange) specification. ASCII originally used 7-bit encoding and represented only the English alphabet, the digits from 0 to 9, punctuation characters and a set of control codes.

Later, ASCII was extended to 8 bits, with international and special characters added in the upper half of the table (codes above 127). Only 128 additional characters could be active at a time, so it was not possible, for example, to support both Cyrillic and German characters at the same time. In Microsoft Windows, these 8-bit code pages are known as ANSI encodings; Windows-1252 is the Western European one.

To solve this problem and give each character its own unique code, the Unicode standard was introduced in the early 1990s. Several encodings implement the Unicode standard, including UTF-8, UTF-16 and UTF-32/UCS-4. They all go beyond 8-bit encoding and support almost every language on the planet. UTF-8 is the dominant Unicode encoding, especially on the World Wide Web.

UTF-8

UTF-8 stands for "UCS Transformation Format - 8-bit", where UCS stands for "Universal Character Set". UCS is defined by the international standard ISO/IEC 10646.

UTF-8 is a variable-width encoding: it uses one to four 8-bit bytes (called octets in the Unicode Standard) to represent each of the 1,112,064 valid code points of the Unicode Character Set.

Characters with lower numerical values, which are used more often in practice, are encoded using fewer bytes, somewhat improving efficiency.

In particular, the first 128 characters of the Unicode Character Set correspond one-to-one to the ASCII character set, and UTF-8 also encodes them with one byte. This makes ASCII text valid UTF-8-encoded Unicode text as well.

A side effect of this is that some languages require more bytes to represent their characters. For example, Cyrillic and Greek characters mostly require two bytes, and characters of other languages can take three or even four bytes, producing larger text files.
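
As a rough illustration of the byte counts described above, the following sketch asks the .NET UTF-8 encoder how many bytes individual characters need. It assumes 3ds Max 2013 or higher (so that MAXScript strings are Unicode), a script file saved in a Unicode encoding, and access to the .NET System.Text.Encoding class through dotNetClass. The sample characters are arbitrary choices, not part of the original topic.

```maxscript
-- Sketch only: query the .NET UTF-8 encoder through the dotNet bridge.
utf8 = (dotNetClass "System.Text.Encoding").UTF8

-- One Latin, one German, one Cyrillic and one currency character.
for ch in #("A", "ä", "Б", "€") do
    format "% -> % byte(s) in UTF-8\n" ch (utf8.GetByteCount ch)
```

In UTF-8, "A" needs one byte, "ä" and "Б" need two bytes each, and "€" needs three.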

UTF-16

UTF-16 can encode the same set of characters as UTF-8, but it uses one or two 16-bit code units instead of one to four 8-bit bytes to encode each character of the Unicode Character Set.
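
To make the difference between the two encodings more tangible, the sketch below compares the storage size of the same string in UTF-8 and UTF-16. It is an illustration under assumptions: it relies on the .NET System.Text.Encoding class (where Encoding.Unicode is the .NET name for little-endian UTF-16), on 3ds Max 2013 or higher, and on sample strings chosen here for demonstration.

```maxscript
-- Sketch only: compare UTF-8 and UTF-16 sizes via .NET encoders.
enc = dotNetClass "System.Text.Encoding"

for s in #("English text", "Български текст") do
(
    -- s.count is the number of characters in the MAXScript string.
    format "\"%\": % characters, % UTF-8 bytes, % UTF-16 bytes\n" \
        s s.count (enc.UTF8.GetByteCount s) (enc.Unicode.GetByteCount s)
)
```

Plain English text doubles in size in UTF-16, while mostly Cyrillic text takes roughly the same space in both encodings.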

Since the Windows API supports only ANSI and UTF-16, 3ds Max 2013 uses UTF-16 internally for Unicode representation, including text streams inside the scene file, INI file saving and loading, and so on.

BOM (Byte-Order Mark)

Some Windows programs add the bytes 0xEF, 0xBB, 0xBF to the beginning of a UTF-8 file. This is the UTF-8 encoding of the Unicode Byte-Order Mark and is commonly referred to as the UTF-8 BOM, even though it is not relevant to the actual byte order. Most modern text editors will strip these bytes, but some older programs might not.

If compatibility with other programs is not important, the BOM bytes could be used to identify the file as UTF-8 encoding as opposed to ASCII encoding. For example, saving an .MS file with BOM will allow 3ds Max 2012 and earlier to open the script without losing international characters.

In general, if compatibility with older software is not an issue, it is recommended not to use UTF-8 BOM and leave the determination of the encoding to the program (by checking if the file is valid UTF-8).
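
A minimal sketch of how a script could check for the three BOM bytes, using the binary file functions fopen, readByte and fclose. The function name hasUtf8Bom and the file path in the usage line are hypothetical. Note that this only detects the BOM; it does not validate that the rest of the file is well-formed UTF-8.

```maxscript
-- Sketch only: returns true if the file starts with 0xEF 0xBB 0xBF.
-- hasUtf8Bom is a hypothetical helper name, not a built-in function.
fn hasUtf8Bom theFileName =
(
    local result = false
    local bs = fopen theFileName "rb"
    if bs != undefined do
    (
        -- readByte returns undefined past the end of the file,
        -- so files shorter than three bytes simply yield false.
        local b1 = readByte bs #unsigned
        local b2 = readByte bs #unsigned
        local b3 = readByte bs #unsigned
        fclose bs
        result = (b1 == 0xEF and b2 == 0xBB and b3 == 0xBF)
    )
    result
)

-- Usage with a hypothetical path:
hasUtf8Bom @"C:\temp\myscript.ms"
```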

Character Encoding Defaults in 3ds Max

The default Character Encoding of non-scene files is controlled by the Customize > Preferences > File tab > File String Data Handling group of controls.

Override language data specified in scene file

When unchecked, the language data stored in the .MAX scene file will be respected.

When checked, the language data will be overridden.

Save strings in legacy non-scene files using UTF8

When unchecked, text files will be saved as ASCII for compatibility.

When checked, text files will be saved using UTF-8 with BOM encoding.

Character Encoding and MAXScript

MAXScript Editor and Script Files

The most obvious area affected by the ability of 3ds Max to write text files using different character encoding is the MAXScript Tabbed Editor (also known as MAXScript Pro Editor) introduced in 3ds Max 2008.

3ds Max 2013 added the ability to control the encoding of the saved script file via a drop-down list in the Save File... dialog of the MAXScript Editor.

The default setting of the drop-down list for new files is "Default Code Page".

The setting of the drop-down list for existing files will depend on the encoding of the file, so resaving will preserve the encoding unless changed explicitly.

The default behavior of 3ds Max when saving as "Default Code Page" is to NOT use UTF-8 with BOM. This can be changed in the Preferences dialog (see previous section) to enforce saving as UTF-8 with BOM.

In versions prior to 3ds Max 2013, the MAXScript Editor saved script files implicitly as UTF-8 with BOM.

The Editor in previous versions of 3ds Max will also load UTF-8 with BOM files correctly, but does not support the other flavors, including Unicode, Unicode BigEndian and UTF-8 without BOM.

Thus, script files saved from previous versions of 3ds Max will load correctly in 3ds Max 2013 and higher, but files saved from 3ds Max 2013 or higher will only load international characters correctly if saved as UTF-8 with BOM. Otherwise, international characters will be replaced with a ? symbol.

For Example

The test script contains three string definitions in three languages: English, German and Bulgarian.

After entering the text in the MAXScript Editor of 3ds Max 2013 or higher, the script was saved as Default Code Page, Unicode, Unicode BigEndian, UTF-8 BOM and UTF-8.

After saving the text using all supported encoding standards, the resulting script files were opened again in the MAXScript Editor of 3ds Max 2013.

All files except the Default Code Page one reproduce the original multilingual content correctly.

Now let's take a look at the same files loaded in the MAXScript Editor of 3ds Max 2012.

Only the UTF-8 BOM encoding is interpreted correctly. The UTF-8 file without BOM retains the German characters, but completely loses the Bulgarian text. The Unicode encoding is not recognized at all.

Note that a UTF-8 BOM encoded file re-saved from 3ds Max 2012 opens correctly in 3ds Max 2013.

Encoding and User Interface Text

Creating a test rollout with three edit text controls and putting each of the three strings in a text field displays the characters correctly in 3ds Max 2013, but not in 3ds Max 2012.

In other words, although the MAXScript Editor in previous versions of 3ds Max is able to read UTF-8 BOM encoded files, 3ds Max itself is not able to display the international characters correctly.

Starting with 3ds Max 2013, the complete application is Unicode-aware and international characters will be displayed correctly.

FileStream Text Files Input and Output

The saving and loading of text files using the FileStream methods will support the current character encoding settings.

For example, if the "Save strings in legacy non-scene files using UTF8" option in the Preferences dialog is unchecked, the following script will produce a text file with missing international characters:

EXAMPLE:

   -- Three test strings: English, German and Bulgarian
   theString1 = "This is some English text."
   theString2 = "Das wäre auf Deutsch, so daß man die Unterschiede sehen könnte..."
   theString3 = "Най-сетне повод да вмъкна малко Български в документацията!"

   -- Create the file in the temp folder using the current encoding settings
   theFileName = (GetDir #temp + "\\testencoding.txt")
   theFile = createFile theFileName
   format "%\n" theString1 to:theFile
   format "%\n" theString2 to:theFile
   format "%\n" theString3 to:theFile
   close theFile
   -- Open the resulting text file in the MAXScript Editor
   edit theFileName

Switching the Preferences to "Save strings in legacy non-scene files using UTF8" will save the file as UTF-8 and the Bulgarian text will be saved and loaded correctly.
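
One way to check the effect of the preference without inspecting the file by eye is to read the lines back and compare them with the originals. The following is a sketch under assumptions: the file name roundtrip_test.txt and the shortened string list are chosen here for illustration, and the result depends on the current Preferences settings.

```maxscript
-- Sketch only: write strings with the current encoding settings,
-- then read them back and report whether each line survived.
theStrings = #("This is some English text.", "Най-сетне повод да вмъкна малко Български в документацията!")
theFileName = (GetDir #temp + "\\roundtrip_test.txt")

theFile = createFile theFileName
for s in theStrings do format "%\n" s to:theFile
close theFile

theFile = openFile theFileName mode:"r"
for s in theStrings do
(
    theLine = readLine theFile
    -- A line that differs from the original lost characters on save or load.
    format "% : %\n" (if theLine == s then "OK" else "CHANGED") theLine
)
close theFile
```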

INI Files Saving and Loading

The setIniSetting() method uses its own default settings, independent of the Preferences settings of 3ds Max.

By default, setIniSetting() uses UTF-16 encoding.

The setIniForceUTF16Default() method changes this default temporarily (valid only for the current 3ds Max session).

The user can specify the encoding explicitly using the optional keyword argument forceUTF16:

For details, please see the setINISetting() documentation.
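
A minimal sketch of the explicit form, assuming the usual argument order of file name, section, key and value, followed by the forceUTF16 keyword mentioned above. The file name, section and key are hypothetical; the exact behavior for existing files should be checked against the setINISetting() documentation.

```maxscript
-- Sketch only: hypothetical INI file, section and key names.
theIniFile = (GetDir #temp + "\\testencoding.ini")

-- Explicitly request UTF-16 for this call, regardless of the session default.
setINISetting theIniFile "Strings" "Bulgarian" "Български" forceUTF16:true

-- Read the value back to confirm that the characters survived.
print (getINISetting theIniFile "Strings" "Bulgarian")
```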

MemStreamMgr Parsing

The [MemStreamMgr](../../3ds-Max-Objects-and-Interfaces/Interfaces/Core-Interfaces/Core-Interfaces-Documentation/M/Interface-MemStream.html).openFile() method now provides two optional keyword arguments to control the encoding and code page.