Essential Tools Module User's Guide

4.9 Multibyte Strings

Class RWCString provides limited support for multibyte strings, sometimes used in representing various alphabets (see Section 8.2). Because a multibyte character can consist of two or more bytes, the length of a string in bytes may be greater than or equal to the number of actual characters in the string.

If the RWCString contains multibyte characters, you should use member function mbLength() to return the number of characters. On the other hand, if you know that the RWCString does not contain any multibyte characters, then the results of length() and mbLength() will be the same, and you may want to use length() because it is much faster. Here's an example using a multibyte string stored in a variable named Sun:

The string in Sun is the name of the day Sunday in Kanji, using the EUC (Extended UNIX Code) multibyte code set. With the EUC, a single character may be 1 to 4 bytes long. In this example, the string Sun consists of 6 bytes, but only 3 characters.
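The byte and character counts described above can be checked with a small sketch. It does not use RWCString: std::string holds the bytes, and eucJpCharCount() is a hypothetical helper that assumes well-formed EUC-JP input (the Japanese variant of EUC, where a character is at most 3 bytes). The octal byte values are likewise an assumption about how the three Kanji are encoded. With an RWCString holding the same bytes in a suitable locale, length() and mbLength() would be expected to report the same two numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of RWCString: counts the characters in a
// byte string that is assumed to be well-formed EUC-JP.
std::size_t eucJpCharCount(const std::string& bytes)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t width = 1;                      // ASCII: one byte
        if (lead == 0x8F) {
            width = 3;                              // SS3 prefix plus two bytes
        } else if (lead == 0x8E) {
            width = 2;                              // SS2 prefix plus one byte
        } else if (lead >= 0xA1 && lead <= 0xFE) {
            width = 2;                              // two-byte Kanji/kana
        }
        i += width;
        ++count;
    }
    return count;
}

// Assumed EUC-JP bytes for the three-character Kanji name of Sunday.
const std::string Sun("\306\374\315\313\306\374");

// Compares the byte length with the character count.
void checkSun()
{
    assert(Sun.length() == 6);          // length in bytes
    assert(eucJpCharCount(Sun) == 3);   // length in characters
}
```

Calling checkSun() from main() exercises both assertions. The hand-written width table is only there to keep the sketch independent of the system locale.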

In general, the second or later byte of a multibyte character may be null. This means the length in bytes of a character string may or may not match the length given by strlen(). Internally, RWCString makes no assumptions about embedded nulls. (However, system functions that transform multibyte strings may make such assumptions, and RWCString simply calls those functions to perform such transformations.) Because it makes no such assumptions itself, RWCString can be used safely with character sets that use null bytes. You should also keep in mind that while RWCString::data() always returns a null-terminated string, there may be earlier nulls in the string. All of these effects are summarized in the following program:
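A minimal sketch of such a program follows. std::string stands in for RWCString so that the example compiles without the library. This is an assumption: it relies on both classes offering a const char* constructor, a const char* plus length constructor, length(), and a null-terminated data(). The helper cLength() is hypothetical, and the numbered comments mark the three constructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string is a stand-in for RWCString (assumption, see the text above).
const std::string a("abc");             // 1
const std::string b("abc\0def");        // 2: copying stops at the first null
const std::string c("abc\0def", 7);     // 3: copies 7 bytes, null included

// Hypothetical helper: length as seen by C functions that stop at a null.
std::size_t cLength(const std::string& s)
{
    return std::strlen(s.data());
}

// Checks the byte length against the strlen() length for each string.
void checkLengths()
{
    assert(a.length() == 3);
    assert(cLength(a) == 3);

    assert(b.length() == 3);
    assert(cLength(b) == 3);

    assert(c.length() == 7);    // embedded null is counted
    assert(cLength(c) == 3);    // strlen() stops at the embedded null
}
```

Calling checkLengths() from main() runs all six checks. Only the third string shows a difference between the two lengths, because it is the only one built with an explicit byte count.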

You will notice that two different constructors are used above. The constructor in lines 1 and 2 takes a single argument of const char*, a null-terminated string. Because it takes a single argument, it may be used in type conversion (ARM 12.3.1). The length of the result is determined the usual way, by the number of bytes before the null. The constructor in line 3 takes a const char* and a run length. The constructor will copy this many bytes, including any embedded nulls.

The length of an RWCString in bytes is always given by RWCString::length(). Because the string may include embedded nulls, this length may not match the result given by strlen().

Remember that indexing and other operators -- basically, all functions using an argument of type size_t -- work in bytes. Hence, these operators will not work as expected for RWCStrings containing multibyte characters: a byte offset may fall in the middle of a character.
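The difference between a byte offset and a character position can be illustrated with a short sketch. It again uses std::string rather than RWCString and assumes well-formed EUC-JP input. eucJpWidth() and eucJpCharAt() are hypothetical helpers, not library functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte width of the EUC-JP character whose first
// byte is `lead` (input assumed well-formed).
std::size_t eucJpWidth(unsigned char lead)
{
    if (lead == 0x8F) return 3;
    if (lead == 0x8E || (lead >= 0xA1 && lead <= 0xFE)) return 2;
    return 1;
}

// Hypothetical helper: returns the bytes of character number n (0-based),
// or an empty string if n is past the last character.
std::string eucJpCharAt(const std::string& bytes, std::size_t n)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::size_t w = eucJpWidth(static_cast<unsigned char>(bytes[i]));
        if (n == 0) return bytes.substr(i, w);
        i += w;     // skip the whole character, not a single byte
        --n;
    }
    return std::string();
}
```

For the six-byte string "\306\374\315\313\306\374", byte index 1 refers to the second byte of the first character, while eucJpCharAt(s, 1) returns the two bytes "\315\313" that make up the second character.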




©2004 Copyright Quovadx, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Quovadx, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.