Sunday, 8 August 2010

Strings and Qt

One thing which comes up quite often when I'm talking to developers new to Qt is the topic of strings, more specifically, character encoding: how to do it right, what options are available, and best practices.

Having written this a few times now, I thought that perhaps it was about time I wrote it up in a more permanent location (here) in the hopes that people will stumble across it, magically become enlightened, and end world hunger. ;)

Qt (and C++) have a number of different string types.

QString

Qt has a string type in QtCore called QString. Internally, QString stores its data as UTF-16, and it *does* have knowledge of character encoding.

Services across a network (like web services) often want data in UTF-8. To convert a QString to UTF-8, use QString::toUtf8(). To convert UTF-8 back into a QString (e.g. when parsing input from a web service), see QString::fromUtf8().

std::string

C++ also has std::string (although you won't find a lot of this in Qt applications). Simply put, it's a wrapper around a C string providing convenience operations and nicer syntax. It still lacks (fairly essential) things like any knowledge of character encoding.

You probably want to avoid using this in an internationalized application or one requiring interaction with network services unless you find your own solution for encoding issues.
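To see why this matters, here is a small sketch using only the standard library. The function utf8CodePoints is a hypothetical helper, and it assumes the string holds valid UTF-8; std::string itself only ever knows about bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a string assumed to hold
// valid UTF-8, by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t utf8CodePoints(const std::string &bytes)
{
    std::size_t count = 0;
    for (unsigned char c : bytes) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string holding the UTF-8 bytes of "café", size() reports the number of bytes, while the helper reports the number of code points. Those two numbers are different, and std::string will not tell you so.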

QString has QString::toStdString() and QString::fromStdString() methods if you must use std::string for whatever reason.

C strings (char*)

Finally, you have C strings (char*), which don't have any idea what encoding is: they are just a bunch of bytes.

Generally speaking, they're Latin-1 encoded (Latin-1 being a superset of ASCII). To put them into a QString, use QString::fromLatin1(). If they aren't Latin-1, see QTextCodec::setCodecForCStrings().

The QLatin1String class is also helpful. In particular, it will allow you to compile when using QT_NO_CAST_FROM_ASCII (which itself is helpful for making sure you explicitly give encodings for all of your strings).
