MyBB Community Forums

Full Version: UTF8 Conversion Problems
Pages: 1 2
Oh, are you kidding? gf's daughter?? Big Grin

Again, I'm no PHP expert, so I had to look up preg_replace: http://php.net/manual/en/function.html-e...decode.php

There is a note there that says:
You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string, that's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.
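
The behaviour described in that note can be checked with a few lines. This is only a minimal sketch: it passes 'ISO-8859-1' explicitly, because PHP 5.4 and later default to UTF-8 (where the decoded entity is the two bytes 0xC2 0xA0 rather than the single byte 0xA0).

```php
<?php
// Decode explicitly as ISO-8859-1 so the result is the single byte 0xA0,
// matching the manual note quoted above.
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');

var_dump(bin2hex($decoded));      // hex value of the decoded byte
var_dump(trim($decoded) === '');  // is the string empty after a plain trim()?

// Listing 0xA0 in trim()'s character list makes it strippable as well.
var_dump(trim($decoded, " \t\n\r\0\x0B\xA0") === '');
```

So a post that looks like it ends in a space may really end in byte 160, which plain trim() and comparisons against ' ' will not catch.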

I'm reading the example right above this note. I think it should be helpful for finding the problem. Smile
Hmm. Thanks. I'll keep looking into it. Smile
Hi,

Although I drank too much this weekend, I tried replacing the UTF-8 conversion function, and I have successfully converted my forum without errors or stripped posts. Lightbulb

The function utf8_unhtmlentities has been changed, and some supporting code was added for it. If possible, please include it in the Merge System. Smile

/**
 * Support function for the Unicode conversion.
 * Converts a character code to its UTF-8 byte sequence.
 */

function chr_utf8($code) 
    { 
        if ($code < 0) return false; 
        elseif ($code < 128) return chr($code); 
        elseif ($code < 160) // Windows-1252 codes 128-159: remap to their Unicode code points; unassigned ones ("not affected") become 160 (nbsp)
        { 
            if ($code==128) $code=8364; 
            elseif ($code==129) $code=160; // not affected 
            elseif ($code==130) $code=8218; 
            elseif ($code==131) $code=402; 
            elseif ($code==132) $code=8222; 
            elseif ($code==133) $code=8230; 
            elseif ($code==134) $code=8224; 
            elseif ($code==135) $code=8225; 
            elseif ($code==136) $code=710; 
            elseif ($code==137) $code=8240; 
            elseif ($code==138) $code=352; 
            elseif ($code==139) $code=8249; 
            elseif ($code==140) $code=338; 
            elseif ($code==141) $code=160; // not affected 
            elseif ($code==142) $code=381; 
            elseif ($code==143) $code=160; // not affected 
            elseif ($code==144) $code=160; // not affected 
            elseif ($code==145) $code=8216; 
            elseif ($code==146) $code=8217; 
            elseif ($code==147) $code=8220; 
            elseif ($code==148) $code=8221; 
            elseif ($code==149) $code=8226; 
            elseif ($code==150) $code=8211; 
            elseif ($code==151) $code=8212; 
            elseif ($code==152) $code=732; 
            elseif ($code==153) $code=8482; 
            elseif ($code==154) $code=353; 
            elseif ($code==155) $code=8250; 
            elseif ($code==156) $code=339; 
            elseif ($code==157) $code=160; // not affected 
            elseif ($code==158) $code=382; 
            elseif ($code==159) $code=376; 
        } 
        if ($code < 2048) return chr(192 | ($code >> 6)) . chr(128 | ($code & 63)); 
        elseif ($code < 65536) return chr(224 | ($code >> 12)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63)); 
        else return chr(240 | ($code >> 18)) . chr(128 | (($code >> 12) & 63)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63)); 
    } 

    // Callback for preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $str); 
    function html_entity_replace($matches) 
    { 
        if ($matches[2]) 
        { 
            return chr_utf8(hexdec($matches[3])); 
        } elseif ($matches[1]) 
        { 
            return chr_utf8($matches[3]); 
        } 
        switch ($matches[3]) 
        { 
            case "nbsp": return chr_utf8(160); 
            case "iexcl": return chr_utf8(161); 
            case "cent": return chr_utf8(162); 
            case "pound": return chr_utf8(163); 
            case "curren": return chr_utf8(164); 
            case "yen": return chr_utf8(165); 
            //... etc with all named HTML entities 
        } 
        return $matches[0]; // unknown entity: keep the original text (returning false would delete the whole match)
    } 
	
// End of html to UTF-8 convert


/**
 * Converts any HTML entities back to their original characters.
 * Edited for the UTF-8 issue!
 *
 * @param string The string to un-htmlentitize.
 * @return string The un-htmlentitied string.
 */
function utf8_unhtmlentities($string)
{
	$string = preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $string);  // use callback above
    return $string;
}

So: Alcohol is useful. ^_^
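
For anyone who wants to check the shift-and-mask arithmetic in chr_utf8() by hand, here is a minimal sketch. encode_code_point() is a hypothetical name used only for this illustration; it is not part of the Merge System, and it leaves out the 128-159 remapping.

```php
<?php
// Hypothetical helper: encodes one Unicode code point as UTF-8 using the
// same shifts and masks as chr_utf8(), written with hex constants.
function encode_code_point(int $code): string
{
    if ($code < 0x80) {
        return chr($code);                              // 1 byte: plain ASCII
    }
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6))                 // 2 bytes: 110xxxxx
            . chr(0x80 | ($code & 0x3F));               //          10xxxxxx
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))                // 3 bytes: 1110xxxx
            . chr(0x80 | (($code >> 6) & 0x3F))
            . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))                    // 4 bytes: 11110xxx
        . chr(0x80 | (($code >> 12) & 0x3F))
        . chr(0x80 | (($code >> 6) & 0x3F))
        . chr(0x80 | ($code & 0x3F));
}

echo bin2hex(encode_code_point(160)), "\n";   // c2a0   (non-breaking space)
echo bin2hex(encode_code_point(8364)), "\n";  // e282ac (euro sign)
```

Code 8364 is what chr_utf8() substitutes for Windows code 128, so a stray byte 128 in an old post comes out as a proper euro sign.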
This doesn't prove true.

Quick test script with the function copied from the Merge System:
$work = 'This is a&nbsp;non-breaking space.';

$out = utf8_unhtmlentities($work);

echo "Output: ".$out;

function utf8_unhtmlentities($string)
{
    // Replace numeric entities
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'unichr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'unichr("\\1")', $string);
    echo "String 1: ".$string."<br />";
    // Replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    
    return strtr($string, $trans_tbl);
}

And it returns the expected:
String 1: This is a&nbsp;non-breaking space.<br />Output: This is a non-breaking space.

So it obviously isn't true that &nbsp; is breaking this function. It has to be something else in the formatting of these posts; however, unless you can copy & paste me an example post where this happens, I can't debug any further.

EDIT: And when I say copy & paste, I mean the raw data from the table, not what you see from the board.
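
A note for readers on newer PHP: the /e modifier used in the test script was deprecated in PHP 5.5 and removed in PHP 7.0, so the script will not run there unchanged. Below is a minimal sketch of an equivalent test that relies on html_entity_decode() with an explicit UTF-8 charset instead. decode_entities_utf8() is a hypothetical name, not the Merge System function, and the sketch makes no attempt at the 128-159 remapping discussed earlier in the thread.

```php
<?php
// Hypothetical stand-in for utf8_unhtmlentities() on PHP 5.4+:
// decodes named, decimal and hexadecimal entities straight to UTF-8.
function decode_entities_utf8(string $string): string
{
    return html_entity_decode($string, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$work = 'This is a&nbsp;non-breaking space.';
$out  = decode_entities_utf8($work);

echo "Output: " . $out . "\n";
echo "Hex of character 10: " . bin2hex(substr($out, 9, 2)) . "\n"; // bytes of the decoded entity
```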
(2012-01-17, 05:23 AM)Dylan M. Wrote: So it obviously isn't true that &nbsp; is breaking this function. It has to be something else in the formatting of these posts; however, unless you can copy & paste me an example post where this happens, I can't debug any further.

EDIT: And when I say copy & paste, I mean the raw data from the table, not what you see from the board.

My full post (sample raw data):
Quote:¿Haz intentado hacer una transferencia de dominio, para sacar algún dominio de nic.mx a otro registrar?<br /><br />Si no, pues este será el calvario que te espera para lograrlo.&nbsp; :merecargo:<br /><br />Hace unos días un cliente mio me solicito le ayudara a mover sus dominios de un registrador a otro.

Apparently the post is incomplete from the point marked in red:
Quote:¿Haz intentado hacer una transferencia de dominio, para sacar algún dominio de nic.mx a otro registrar?<br /><br />Si no, pues este será el calvario que te espera para lograrlo.&nbsp; :merecargo:<br /><br />Hace unos días un cliente mio me solicito le ayudara a mover sus dominios de un registrador a otro.

My problem is exactly the same as hungld86's.

I use SMF 1.1.12
My version is MyBB 1.6.6
MyBB Merge System - Version: 1.6.3

And my language is Spanish.
(In this post I explain my problem: http://community.mybb.com/thread-97195-p...#pid836270 )

Thank you in advance.
While searching the forum, I found this old post: http://community.mybb.com/thread-75231-p...#pid550961

I checked the file "/merge/boards/smf/bbcode_parser.php" in Merge 1.6.3 and, based on Dylan M.'s code, made my own modification.

Code:
function convert($message)
	{
		$message = str_ireplace(array('[right]', '[/right]', '[left]', '[/left]', '[center]', '[/center]', "<br />", '[ftp', '[/ftp]', '<!-- m', '<!-- s', '-->', '&amp;', '&nbsp;', '&quot;', '&lt;', '&gt;'), array('[align=right]', '[/align]', '[align=left]', '[/align]', '[align=center]', '[/align]', "\n", '[url', '[/url]', '', '', '', '&', ' ', '"', '<', '>'), $message);
		$message = preg_replace("#\[size=([0-9\+\-]+?)p[tx]\](.*?)\[/size\]#si", "[size=$1]$2[/size]", $message);
		$message = preg_replace("#\[li\](.*?)\[/li\]#si", "[*]$1", $message);
		$message = preg_replace("#\[img width=([0-9\+\-]+?) height=([0-9\+\-]+?)\]#si", "[img=$1x$2]", $message);
		$message = preg_replace("#\[quote(.*?)\](.*?)\[\/quote\]#esi", "\$this->mycode_parse_post_quotes('$2', '$1')", $message);
		return $message;
	}

I only changed the first line, adding these entries to the two arrays:
'&amp;', '&nbsp;', '&quot;', '&lt;', '&gt;'
'&', ' ', '"', '<', '>'

I did this to try to fix the incomplete posts, but it did not work. Sad
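
A side note on decoding several entities in one str_ireplace() call (not a diagnosis of the truncated posts): the search/replace pairs are applied one after another, so the output of an early pair is scanned again by the later ones. The sketch below, with made-up sample text, shows why '&amp;' is usually best placed last.

```php
<?php
// Sample text in which the author literally wrote "&lt;br /&gt;",
// so the stored form has the ampersands escaped once more.
$raw = 'Use &amp;lt;br /&amp;gt; for a line break';

// '&amp;' first: "&amp;lt;" becomes "&lt;", which the next pair turns into "<".
$ampFirst = str_ireplace(array('&amp;', '&lt;', '&gt;'), array('&', '<', '>'), $raw);

// '&amp;' last: every entity is decoded exactly once.
$ampLast = str_ireplace(array('&lt;', '&gt;', '&amp;'), array('<', '>', '&'), $raw);

echo $ampFirst, "\n"; // Use <br /> for a line break
echo $ampLast, "\n";  // Use &lt;br /&gt; for a line break
```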
I found another error:

This is the quote in SMF:
[quote author=Nachillo link=topic=533.msg6874#msg6874 date=1298265645]

This is the quote (in the exact post) in MyBB after migration:
[quote='Nachillo link' pid='468' dateline='1298265645']

Maybe the correct quote (based on the sample) would be:
[quote='Nachillo' pid='6874' dateline='1298265645']

It is an error in the preg_match in the function mycode_parse_post_quotes() in the file "/merge/boards/smf/bbcode_parser.php".

In addition, the migration doesn't work for nested quotes: it only converts the first quote in the post.
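
To illustrate the conversion suggested above, here is a minimal sketch that rewrites only the opening SMF quote tags, which also lets nested quotes through. convert_smf_quote_tags() is a hypothetical helper, not the Merge System's mycode_parse_post_quotes(), and it assumes the SMF message ID can be reused as the pid, as in the sample; a real import would presumably have to map it to the new MyBB post ID.

```php
<?php
// Hypothetical helper: turns
//   [quote author=NAME link=topic=T.msgM#msgM date=D]
// into
//   [quote='NAME' pid='M' dateline='D']
// Closing [/quote] tags are left untouched, so nesting is preserved.
function convert_smf_quote_tags(string $message): string
{
    return preg_replace_callback(
        '~\[quote author=([^\]]+?) link=topic=\d+\.msg(\d+)#msg\d+ date=(\d+)\]~i',
        function (array $m): string {
            // $m[1] = author, $m[2] = SMF message ID, $m[3] = Unix timestamp
            return "[quote='{$m[1]}' pid='{$m[2]}' dateline='{$m[3]}']";
        },
        $message
    );
}

echo convert_smf_quote_tags(
    '[quote author=Nachillo link=topic=533.msg6874#msg6874 date=1298265645]text[/quote]'
), "\n";
// [quote='Nachillo' pid='6874' dateline='1298265645']text[/quote]
```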
Thanks for the quote parser error. Please post it on http://dev.mybb.com/projects/mybb-import/issues
OK, could you show me how to solve this problem?

At first I chose "Yes" for this option:


" Automatically convert messages to UTF8?:
Turn this off if the conversion creates
weird characters in your forum's messages. "

But then the forum looks as shown in attachment 1.

Next I chose "No", but when I started the second step, merging the users, I got the error shown in attachment 2:

MyBB has experienced an internal SQL error and cannot continue.

SQL Error:
1062 - Duplicate entry 'ÑÇÖí' for key 'username'
Query:
INSERT INTO mybb_users (`usergroup`,`additionalgroups`,`displaygroup`,`import_usergroup`,`import_additionalgroups`,`import_displaygroup`,`import_uid`,`username`,`password`,`salt`,`loginkey`,`email`,`regdate`,`lastactive`,`lastvisit`,`website`,`showsigs`,`signature`,`showavatars`,`timezone`,`avatardimensions`,`avatartype`,`avatar`,`lastpost`,`icq`,`aim`,`yahoo`,`msn`,`hideemail`,`allownotices`,`regip`,`lastip`,`longregip`,`longlastip`,`language`,`passwordconvert`,`passwordconverttype`,`postnum`,`invisible`,`birthday`,`birthdayprivacy`,`subscriptionmethod`,`receivepms`,`receivefrombuddy`,`pmnotice`,`pmnotify`,`showquickreply`,`ppp`,`tpp`,`daysprune`,`timeformat`,`dst`,`buddylist`,`ignorelist`,`style`,`away`,`awaydate`,`returndate`,`referrer`,`referrals`,`reputation`,`timeonline`,`showcodebuttons`,`totalpms`,`unreadpms`,`pmfolders`,`notepad`,`threadmode`,`showredirect`,`dateformat`,`dstcorrection`,`warningpoints`,`moderateposts`,`moderationtime`,`suspendposting`,`suspensiontime`,`suspendsignature`,`suspendsigtime`,`coppauser`,`classicpostbit`,`loginattempts`,`failedlogin`,`usernotes`,`passwordconvertsalt`) VALUES ('2','','2','','','0','661','ÑÇÖí','','','0','[email protected]','1297528129','1297528996','1297528129','','1','','1','','','','','0','','','','','1','1','218.111.7.156','','0','0','','95d33d3003bb8d9db6a7908fab0ffa9a','vb3','0','0','','all','2','1','0','1','1','1','0','0','0','0','0','','','0','0','0','0','0','0','0','0','1','0','0','1**Inbox$%%$2**Sent Items$%%$3**Drafts$%%$4**Trash Can','','','1','0','1','0','0','0','0','0','0','0','0','0','0','0','','z$,')


By the way, I'm merging from vBulletin 3.8 to MyBB.

I hope you can help me with this problem.
This may help (code 128): http://community.mybb.com/thread-118617.html
"vBulletin 3.8" to "MyBB" is working perfectly now!