MyBB Community Forums

Full Version: [HOWTO] UTF-8 Supplementary / 4-Byte Characters with MySQL
Unicode/UTF-8 includes a number of rarer characters outside the Basic Multilingual Plane (such as Egyptian hieroglyphs, musical notation, emoji, some CJK ideographs, ...) that take 4 bytes in UTF-8. MyBB does not currently support them (as of MyBB 1.6.11), and MySQL supports them only from version 5.5.3 onward, and only when the utf8mb4 charset is used instead of utf8.

Please see these links for further background information:
http://en.wikipedia.org/wiki/Plane_%28Un...gual_Plane
http://www.i18nguy.com/unicode/supplementary-test.html
http://dev.mysql.com/doc/refman/5.5/en/c...f8mb4.html
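
To make the "4-byte" part concrete, here is a minimal sketch that measures the encoded length of two characters and detects supplementary characters in a string. The helper name contains_4byte_utf8() is made up for this example and is not part of MyBB; it only assumes PHP's PCRE extension with UTF-8 support.

```php
<?php
// Hypothetical helper (not part of MyBB): returns true if $text contains
// at least one supplementary character, i.e. one that needs 4 bytes in UTF-8.
function contains_4byte_utf8($text)
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // U+10000..U+10FFFF are exactly the code points encoded with 4 bytes.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$bmp  = "\xE2\x82\xAC";      // U+20AC EURO SIGN, inside the Basic Multilingual Plane
$supp = "\xF0\x9D\x84\x9E";  // U+1D11E MUSICAL SYMBOL G CLEF, supplementary

echo strlen($bmp), "\n";                                // 3
echo strlen($supp), "\n";                               // 4
echo contains_4byte_utf8($bmp) ? 'yes' : 'no', "\n";    // no
echo contains_4byte_utf8($supp) ? 'yes' : 'no', "\n";   // yes
```

A check like this could be run over sample posts to see whether a board actually receives such characters before going through the changes below.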

If you need/want support for these characters, a few code changes are necessary, as listed below.

For supporting the utf8mb4 charset, in inc/db_mysqli.php:
'utf8' => 'UTF-8 Unicode',
/* + */ 'utf8mb4' => 'UTF-8 Unicode (4-Byte)',
'utf8' => 'utf8_general_ci',
/* + */ 'utf8mb4' => 'utf8mb4_general_ci',

For changing an existing database to utf8mb4, in phpMyAdmin:
ALTER TABLE `mybb_adminlog` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE `mybb_adminoptions` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE `mybb_adminsessions` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
[... and so on and so forth for all tables including threads posts etc. ...]
ALTER TABLE `mybb_warninglevels` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE `mybb_warnings` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
ALTER TABLE `mybb_warningtypes` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
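
Typing one statement per table gets tedious, so the list above can also be generated. The following is only a sketch: build_utf8mb4_conversion_sql() is a hypothetical helper, and the table names are just the ones shown above. On a real board the full list would have to come from the database itself (for example from SHOW TABLES).

```php
<?php
// Hypothetical helper (not part of MyBB): builds one ALTER TABLE statement
// per table name, in the same form as the statements listed above.
function build_utf8mb4_conversion_sql(array $tables, $collation = 'utf8mb4_general_ci')
{
    $statements = array();
    foreach ($tables as $table) {
        // Double any backtick inside the name so the quoted identifier stays valid.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {$collation};";
    }
    return $statements;
}

// Example input only; a real board has many more tables than these three.
$tables = array('mybb_adminlog', 'mybb_adminoptions', 'mybb_adminsessions');
$sql = build_utf8mb4_conversion_sql($tables);
echo implode("\n", $sql), "\n";
```

The output can be pasted into phpMyAdmin as before. Take a backup first, since CONVERT TO rewrites every text column of each table.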

For actually making MyBB use it, in inc/config.php:
/* - /* $config['database']['encoding'] = 'utf8';
/* + */ $config['database']['encoding'] = 'utf8mb4';

With those changes, you should be able to use all valid UTF-8 characters in MyBB.

However, MyBB 1.6.11 silently replaces such characters with question marks, because MySQL does not handle them well unless it is using utf8mb4. For that reason, you also have to disable the utf8_handle_4byte_string() function.

In inc/functions.php:
function utf8_handle_4byte_string($input, $return=true)
{
    global $config;

/* - /* if($config['database']['type'] != 'mysql' && $config['database']['type'] != 'mysqli')
/* + */ if(1)
    {
        if($return == true)
        {
            return $input;
        }
        return true;
    }

With this, your board should be ready to use 4-byte UTF-8 characters.
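
If you are unsure whether a board still applies the question mark replacement, it helps to know what that substitution looks like from the outside. The sketch below is NOT MyBB's actual utf8_handle_4byte_string() code, just an illustration of the effect described above, written with a made-up function name.

```php
<?php
// Illustration only: this is not MyBB's implementation. It mimics the
// described effect of swapping every 4-byte character for a question mark.
function replace_4byte_with_question_mark($input)
{
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '?', $input);
    // preg_replace() returns null if $input is not valid UTF-8;
    // in that case hand the input back unchanged.
    return $result === null ? $input : $result;
}

$post = "Clef: \xF0\x9D\x84\x9E, euro: \xE2\x82\xAC";
$replaced = replace_4byte_with_question_mark($post);
echo $replaced, "\n";
```

Here the G clef (4 bytes) becomes a question mark while the euro sign (3 bytes) is left alone. If test posts on your board come back looking like that, the replacement is still active.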

If you want to make sure that only valid UTF-8 strings are stored in your database, you can add a check in the $db->escape_string() function. This function is supposed to make strings safe for queries, which means that almost any string that goes into the database passes through it first, making it the ideal place in the code for such a check. However, please note that it would break any query that stores binary or otherwise invalid UTF-8 data on purpose. By default, MyBB itself has no such queries.

In inc/db_mysqli.php (dangerous!):
	function escape_string($string)
	{
/* + */ if(!mb_check_encoding($string, 'UTF-8'))
/* + */ {
/* + */     $this->error("[SQL] Bad character encoding. Invalid input.");
/* + */     die("Bad character encoding. Invalid input.");
/* + */ }
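
Before wiring a check like this into escape_string(), it may be worth seeing which inputs it would reject. A small standalone sketch follows. The wrapper is_valid_utf8() is a made-up name, and the preg_match() fallback for installations without the mbstring extension is an addition for this example only.

```php
<?php
// Hypothetical wrapper (not part of MyBB) around the same check as above.
function is_valid_utf8($string)
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($string, 'UTF-8');
    }
    // Fallback without mbstring: matching with /u fails on invalid UTF-8.
    return preg_match('//u', $string) === 1;
}

$samples = array(
    'ascii'     => 'hello',
    '4-byte'    => "\xF0\x9D\x84\x9E",  // complete U+1D11E
    'truncated' => "\xF0\x9D\x84",      // same character with its last byte cut off
    'bad lead'  => "\xC3\x28",          // 2-byte lead followed by a non-continuation byte
);
foreach ($samples as $label => $bytes) {
    echo $label, ': ', is_valid_utf8($bytes) ? 'valid' : 'invalid', "\n";
}
```

The first two samples pass and the last two are rejected, which is the kind of input that would trigger the error and die() in the patched escape_string().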

Regards
frostschutz
what
I think frostschutz is breaking stuff again...
--- edited into first post ---
This is a warning message from Mars!!!!! :D :D :D
English please :P
--- edited into first post ---
(2012-12-17, 12:04 AM) frostschutz Wrote: Which database is mybb.com using?
MySQL. Most likely using latin1 encoding, which "allows" pretty much any character sequence.
(2013-10-12, 03:53 PM) [unreadable username] Wrote: MySQL. Most likely using latin1 encoding

Yes, most likely ;) http://community.mybb.com/thread-131122-...#pid950460
What is this?