MyBB Community Forums

Full Version: UTF8 Conversion Problems
I think I may finally have figured this one out. Maybe. I hope so anyways.

Right now things are written like this:
$insert_data['message'] = encode_to_utf8($this->bbcode_parser->convert(utf8_unhtmlentities($data['body'])), "personal_messages", "privatemessages");

This seems odd to me. If the function is called utf8_unhtmlentities(), I would expect the data passed to it to already be UTF-8. However, we don't encode to UTF-8 until after both utf8_unhtmlentities() and bbcode_parser->convert() have already run on the data.

So, I'm thinking it should be switched to this:
$insert_data['message'] = $this->bbcode_parser->convert(utf8_unhtmlentities(encode_to_utf8($data['body'], "personal_messages", "privatemessages")));

Since Ryan isn't around anymore, I can't easily ask him why he originally wrote it the way he did, but if anyone out there who was having trouble with text conversion creating weird symbols wants to test this for me, I would appreciate it.
Can you explain what that is and how to do it?
Sorry, I never posted back, but I've tested this in multiple situations and found that I was incorrect. It needs to stay the way it is.

I have a feeling that the problem we're seeing is actually related to the database details page. There is a checkbox there for UTF-8 conversion, but right above it is a dropdown that asks what the current table encoding is. I'm betting people aren't setting this dropdown correctly.
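To illustrate what a wrong "current table encoding" choice can do, here is a minimal sketch. It is not the Merge System's own code: latin1_to_utf8() is a hypothetical stand-in for a conversion step that assumes its input is ISO-8859-1. If the text is already UTF-8, every multi-byte character gets encoded a second time, which is one common way "weird symbols" appear.

```php
<?php
// Hypothetical stand-in for a "convert to UTF-8" step that ASSUMES its
// input is ISO-8859-1 (Latin-1). Not the Merge System's actual code.
function latin1_to_utf8($bytes)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            $out .= $bytes[$i];
        } else {
            // Latin-1 bytes 0x80-0xFF map to U+0080-U+00FF,
            // which take two bytes in UTF-8.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$latin1 = "v\xE0o";      // "vào" stored as Latin-1
$utf8   = "v\xC3\xA0o";  // "vào" stored as UTF-8

// Source encoding chosen correctly: the result is valid UTF-8.
$good = latin1_to_utf8($latin1);

// Source encoding chosen wrongly: the UTF-8 bytes are encoded again.
$double = latin1_to_utf8($utf8);

echo bin2hex($good), "\n";    // 76c3a06f
echo bin2hex($double), "\n";  // 76c383c2a06f
```

The second result is the double-encoded form: the single character "à" has become two separate characters.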
Thank you. I'll try this since my host does not support mbstring at the moment.
I am getting this warning when using the Google SEO plugin:
Quote:Your host does not seem to support mbstring. This may cause problems with UTF-8.
I was searching for ways to write my own custom PHP functions to work around this.
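For anyone in the same position, this is a minimal sketch of how a script can check for mbstring and fall back to a hand-written function. The names utf8_strlen_fallback() and utf8_strlen_any() are hypothetical, the fallback assumes the input is already valid UTF-8, and it will not make a plugin's own mbstring warning go away.

```php
<?php
// Count UTF-8 characters without mbstring by counting every byte that is
// NOT a continuation byte (continuation bytes have the form 10xxxxxx).
// Assumes $s is valid UTF-8. Hypothetical helper, for illustration only.
function utf8_strlen_fallback($s)
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

// Use mbstring when the host provides it, otherwise the fallback above.
function utf8_strlen_any($s)
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($s, 'UTF-8');
    }
    return utf8_strlen_fallback($s);
}

// "vào" is 4 bytes in UTF-8 but 3 characters.
echo strlen("v\xC3\xA0o"), " bytes, ", utf8_strlen_any("v\xC3\xA0o"), " characters\n";
```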
Dear Development and support team,

I converted my forum from SMF 2.0 to MyBB 1.6.5 with the Merge System a few days ago, and then ran into this UTF-8 problem. Some posts lost part of their content. I searched for a solution but could not find one.
My old SMF forum was encoded in UTF-8, and I chose UTF-8 everywhere in the new MyBB and Merge System configuration as well. My forum language is Vietnamese.

How can this issue be solved? Is the mbstring library involved?
Sincerely.

Edit 1: I noticed that every affected message is cut off at the position of the &nbsp; character.
Please try converting again with the "Convert to UTF-8" option turned off.
I tried both cases:
- Convert to UTF-8 turned OFF: the remaining part of a post is lost (when the post contains &nbsp;).
- Convert to UTF-8 turned ON: some posts show strange characters (e.g. Mỗi năm 2 lần, và o tháng 6 và tháng ....)
Both problems occur in some posts, not all.
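Since only some posts are affected, it may help to find out which stored messages contain bytes that are not valid UTF-8. The following is a minimal sketch that works without mbstring. is_valid_utf8() is a hypothetical helper name and the $posts array is made-up sample data, not a MyBB structure.

```php
<?php
// Returns true when $s is well-formed UTF-8. With the /u modifier PCRE
// validates the subject string, and preg_match() returns false (not 0)
// when the subject contains invalid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Made-up sample data: post id => message bytes.
$posts = array(
    1 => "plain ASCII text",
    2 => "a\xC2\xA0b",   // non-breaking space encoded as UTF-8
    3 => "a\xA0b",       // lone 0xA0 byte (Latin-1 non-breaking space)
);

$suspect = array();
foreach ($posts as $id => $message) {
    if (!is_valid_utf8($message)) {
        $suspect[] = $id;
    }
}

// $suspect now lists the ids whose bytes are not valid UTF-8.
print_r($suspect);
```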
Please post the following in Private Inquiries:
A new MyBB Admin login for me to use
Links to your SMF and MyBB forums
FTP, phpMyAdmin and cPanel access (again, create new accounts for me to use; they will be deleted when we're done).

Once I get these I'll take a look and see about making some code adjustments. It would greatly help if we can figure this one out, especially that non-breaking space (&nbsp;) issue!
It's hard to give you access to my forum because of our server policy.
But I have looked through the Merge System code, and I found that the error is caused by the function utf8_unhtmlentities (in resource/function.php).

function utf8_unhtmlentities($string)
{
	// Replace numeric entities
	$string = preg_replace('~&#x([0-9a-f]+);~ei', 'unichr(hexdec("\\1"))', $string);
	$string = preg_replace('~&#([0-9]+);~e', 'unichr("\\1")', $string);
	
	// Replace literal entities
	$trans_tbl = get_html_translation_table(HTML_ENTITIES);
	$trans_tbl = array_flip($trans_tbl);
	
	return strtr($string, $trans_tbl);
}
I edited the code to bypass this function when merging. Everything was fine, and all the HTML code remained in the posts.

Please take a deeper look at this function, and test the preg_replace parameters and the HTML_ENTITIES table, to see if you can find the problem.
I'll wait for any new ideas.
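One thing that may be worth checking, offered as an assumption and not as a confirmed diagnosis: on PHP versions before 5.4, get_html_translation_table(HTML_ENTITIES) defaults to ISO-8859-1, so &nbsp; would be replaced by the single byte 0xA0, which is not valid UTF-8 on its own. The sketch below is a variant of the function quoted above that asks for the UTF-8 table explicitly (the third parameter exists since PHP 5.3.4) and uses preg_replace_callback() instead of the /e modifier, which was removed in PHP 7. codepoint_to_utf8() is a hypothetical stand-in for the Merge System's unichr(), whose actual implementation is not shown in this thread.

```php
<?php
// Hypothetical stand-in for unichr(): encode one Unicode code point as UTF-8.
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Sketch only: same steps as the quoted function, with UTF-8 output.
function utf8_unhtmlentities_sketch($string)
{
    // Replace hexadecimal numeric entities such as &#xE0;
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return codepoint_to_utf8((int) hexdec($m[1]));
    }, $string);

    // Replace decimal numeric entities such as &#224;
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return codepoint_to_utf8((int) $m[1]);
    }, $string);

    // Replace named entities, requesting UTF-8 replacement characters.
    $trans_tbl = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'UTF-8');
    $trans_tbl = array_flip($trans_tbl);

    return strtr($string, $trans_tbl);
}

// &nbsp; becomes the two-byte UTF-8 sequence C2 A0 instead of a lone A0.
echo bin2hex(utf8_unhtmlentities_sketch("a&nbsp;b")), "\n";  // 61c2a062
```

If the lone 0xA0 byte is indeed the cause, it would also fit the symptom of messages being cut off at &nbsp;, but that would need to be tested against an affected post.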
And there are no obvious problems with the function, which is the problem. The trans table flip is fine. So whatever is wrong has to be in the regex. But I'll be darned if I can see what it is. I'll look again after I've had a little more sleep... I only got up to drive the gf's daughter to school... I need more than 2 hours. So... bbl.