MyBB Community Forums

Full Version: Upgrade from 1.6.9 to 1.8.21 -- utf8 vs latin1
I recently upgraded a 1.6.9 board. The database was saved with mysqldump, then loaded into a test database with these settings:

server connection collation is utf8mb4_unicode_ci
default database collation is latin1_swedish_ci
table collation, as shown in the database table list, is utf8_general_ci
mybb_posts table fields collation is latin1_swedish_ci for fields subject, username, message, editreason
config.php database encoding is utf8
languages/english.php charset is UTF-8

The conversion was successful except for some odd characters displayed in posts. The forum holds posts in several European languages.

Using phpMyAdmin, I experimented with changing the collation of the message field in the mybb_posts table. phpMyAdmin shows a warning before changing a collation to utf8, with a link to a fuller explanation.

It seems I have Problem 2 as detailed in this link https://github.com/phpmyadmin/phpmyadmin...rbled_data

So, does anyone have a suggestion for processing the database? It's a large board with 367K posts. I'd like to avoid repeating the database conversion from the beginning, although starting again is possible.

The attached image is a screenshot from the SQL data backup. Is this UTF-8 data labeled as latin1, or vice versa?

[attachment=42340]
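
One quick way to test the "UTF-8 labeled as latin1" theory on a sample string is to reverse the mislabeling and see whether readable text comes out. This is only a sketch: the function name repair_mojibake is made up for illustration, and Python's "latin-1" codec is assumed to be close enough to MySQL's latin1 (which is really closer to cp1252) for a spot check.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as latin1'.

    Returns the text unchanged if it does not look like that kind of damage.
    """
    try:
        # Turn the garbled characters back into the original bytes,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind (or not representable in latin-1).
        return text


garbled = "Ã¼ber"  # how "über" appears when its UTF-8 bytes are read as latin1
print(repair_mojibake(garbled))  # readable again if the theory holds
print(repair_mojibake("über"))   # already-correct text passes through unchanged
```

If the sample comes back readable, the data is most likely UTF-8 stored under a latin1 label. If it passes through unchanged, the damage is probably something else.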
Solved, by editing a line in inc/config.php from
$config['database']['encoding'] = 'utf8';
to
$config['database']['encoding'] = 'latin1';

A new installation defaults to utf8, and it is best to start out that way, but the charset and collation settings on the server also come into play. All the parts (server, database, tables, fields and config.php) need to agree to work well.

In this case, it was a multilingual board upgraded from 1.6.9, and old posts were encoded in latin1. I installed the forum on a local test bed, and although I had the opportunity to run the UTF8 conversion from the AdminCP, it failed on 6 tables because of unrecognized character combinations. The affected tables were left with certain fields stuck at part 1 of the conversion (TEXT fields changed to BLOBs), unable to proceed to part 2 (changing them back to TEXT after converting the character set).
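
For anyone stuck at the same point, a rough way to locate the values that block the conversion is to test each one for valid UTF-8. This is a sketch under assumptions: it takes "unrecognized character combinations" to mean byte sequences that are not valid UTF-8, and the rows list is hypothetical sample data standing in for (pid, raw bytes) pairs fetched from the BLOB column.

```python
def find_invalid_utf8(rows):
    """Return (row_id, byte_offset) for every value that is not valid UTF-8."""
    bad = []
    for row_id, raw in rows:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte.
            bad.append((row_id, err.start))
    return bad


# Hypothetical sample data; real rows would come from the database.
sample = [
    (1, "Grüße".encode("utf-8")),    # valid UTF-8
    (2, "Grüße".encode("latin-1")),  # latin1 bytes, not valid UTF-8
    (3, b"plain ASCII"),             # ASCII is valid in both encodings
]
print(find_invalid_utf8(sample))
```

Rows reported here would need to be fixed or re-encoded by hand before the field can be changed back to TEXT with a utf8 character set.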

Changing the database encoding to latin1 made old posts read correctly again, but it's not the best answer, as it may introduce migration problems in the future.
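
One plausible explanation for why the latin1 setting helps can be modeled in a few lines. This is a simplified sketch, not a description of MySQL internals: it assumes the fields hold UTF-8 bytes under a latin1 label, that the server transcodes from the field charset to the connection charset, and that the page is served as UTF-8. The function as_displayed is invented for this illustration.

```python
def as_displayed(stored: bytes, column_charset: str, connection_charset: str,
                 page_charset: str = "utf-8") -> str:
    """Model what a browser shows for bytes stored in a text field."""
    # The server interprets the stored bytes using the field's declared
    # charset, then re-encodes them in the connection charset.
    sent = stored.decode(column_charset).encode(connection_charset)
    # The browser decodes whatever arrives using the page charset.
    return sent.decode(page_charset, errors="replace")


stored = "é".encode("utf-8")  # UTF-8 bytes sitting in a field labeled latin1

# encoding = 'utf8': the server "converts" latin1 to utf8 and garbles the text.
print(as_displayed(stored, "latin-1", "utf-8"))

# encoding = 'latin1': no conversion happens, so the original bytes arrive intact.
print(as_displayed(stored, "latin-1", "latin-1"))
```

Under these assumptions the latin1 setting works only because it stops the server from transcoding. The stored data is still mislabeled, which is why it can cause trouble in a later migration.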

One thing I learned the hard way, which folks with more experience may already know: if you make a database backup to restore later, restore it with the same method that was used to create it. A SQL backup created by mysqldump is not the same as an export created with phpMyAdmin, so don't mix the two.

Some useful resources found while investigating:
http://string-functions.com/encodedecode.aspx
https://www.whitesmith.co/blog/latin1-to-utf8/
https://codex.wordpress.org/Converting_D...acter_Sets

All the conversion tools I found were useful, but none worked all the way through without introducing some flaw, though the more likely cause is my own imperfect understanding of SQL, PHP, character sets and collations.

Thanks to Robert, who allowed me to work with his live forum data on the toughest problem I've encountered yet on this MyBB journey.
Sorry to reply to this old topic. That's a perfect solution except for one thing: all the posts now contain wrong HTML codes, and instead of a new line the posts show "<br/>". Can this be fixed too?
Sounds like this problem. Update templates, especially the ones here.
https://community.mybb.com/thread-223638.html

Oops. I see you are migrating from other software.
I assume your MyBB is a new installation.
Still, check that thread.
phpBB has those kinds of tags. I'm not sure if the Merge System would take good care of them.

To answer kunst79's question, there's no better way to deal with that unless a script is used to remove those tags on already converted data.
I don't recognize those codes as valid HTML.
You might need to create some MyCode expressions to achieve the same effects, or to ignore them when parsing the post.

This regex will remove <r> and </r>, leaving behind what's between.
<\/?r>
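
As a rough sketch of the kind of cleanup script mentioned above, here is how that pattern could be applied in Python, together with a second pattern that turns "<br/>" back into a line break. The function name clean_post is made up for illustration, and it assumes <r> and <br/> are the only leftover tags and that no real newline already follows each <br/>. The pattern uses a plain r with no quantifier, so only <r> and </r> match.

```python
import re

# Matches only the wrapper tags <r> and </r>.
R_TAG = re.compile(r"</?r>")
# Matches <br>, <br/> and <br /> (an assumption about the variants present).
BR_TAG = re.compile(r"<br\s*/?>")


def clean_post(message: str) -> str:
    """Strip <r> wrappers and turn <br/> tags into newlines."""
    message = R_TAG.sub("", message)
    return BR_TAG.sub("\n", message)


print(clean_post("<r>First line<br/>Second line</r>"))
```

Try it on a copy of a few posts first, since any other tags in the converted data will be left as they are.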

Here is a very useful tool for creating and evaluating regex.
https://regex101.com/