MyBB Community Forums

Full Version: [F] Umlauts in links [C-Imad Jomaa]
If a link contains an umlaut it's broken in the post.

Example: http://www.kachelöfen.de/
Is this even a problem in MyBB? Do domain name servers even allow umlauts in URLs?
Okay so in inc/class_parser.php

$message = preg_replace("#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i", "$1[url]$2://$3[/url]", $message);

adding the "u" modifier (for unicode) fixes the problem, but then we'll have regression issues with PCRE/PHP versions that don't support unicode.
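To make the difference concrete, here is a minimal sketch that runs the pattern above with and without the "u" modifier. The helper name auto_url_demo is made up for illustration and is not part of MyBB. It assumes the file is saved as UTF-8 and that PHP's PCRE build treats \w as Unicode-aware when "u" is set, which should hold for current PHP versions but may not for older builds.

```php
<?php
// Hypothetical helper for illustration only; not MyBB code.
// $modifiers is appended after the closing delimiter, e.g. 'i' or 'iu'.
function auto_url_demo($message, $modifiers)
{
	$pattern = '#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#'.$modifiers;

	// The leading space gives the first character class something to match
	// when the URL sits at the very start of the message.
	return preg_replace($pattern, '$1[url]$2://$3[/url]', ' '.$message);
}

$post = 'Example: http://www.kachelöfen.de/';

// Without "u" the host match typically stops at the first umlaut byte.
$without_u = auto_url_demo($post, 'i');

// With "u" the whole host name should end up inside the [url] tag.
$with_u = auto_url_demo($post, 'iu');

echo $without_u, "\n";
echo $with_u, "\n";
```

What the run without "u" prints can depend on the server locale, so only the "u" result should be relied on.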
(2008-12-10, 10:37 PM)Ryan Gordon Wrote: [ -> ]Is this even a problem in MyBB? Do domain name servers even allow umlauts in URLs?

Yes, they do. I'm not sure how long umlaut domains have been allowed, but I'm sure it's been a couple of years at least.
(2008-12-10, 10:37 PM)Ryan Gordon Wrote: [ -> ]Do domain name servers even allow umlauts in URLs?
Yes they do. See: http://www.denic.de/en/domains/idns/index.html
(2008-12-10, 11:05 PM)Ryan Gordon Wrote: [ -> ]Okay so in inc/class_parser.php

$message = preg_replace("#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i", "$1[url]$2://$3[/url]", $message);

adding the "u" modifier (for unicode) fixes the problem, but then we'll have regression issues with PCRE/PHP versions that don't support unicode.

We could expand '\w' into an explicit character class that includes all the additional special characters, but that will probably be slower...

Any other ideas?

BTW, which versions of PCRE/PHP don't support Unicode? Should MyBB still support them?

(The 'u' modifier is a very convenient solution, not only here but also in other places in the MyBB code.)
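For comparison, one way to widen '\w' without the 'u' modifier is to add the byte range \x80-\xff to each character class, since every byte of a multi-byte UTF-8 character falls in that range. This is only a sketch of that idea under my own assumptions, not code from MyBB: the function name auto_url_bytes is invented, and plain substr() stands in for MyBB's my_substr().

```php
<?php
// Hypothetical variant for illustration only; not MyBB code.
function auto_url_bytes($message)
{
	// \x80-\xff matches any single non-ASCII byte, so multi-byte UTF-8
	// characters pass through byte by byte without needing "u".
	$chr = '\w\x80-\xff';
	$pattern = '#([\>\s\(\)])(https?|ftp|news)://(['.$chr.'\-]+\.(['.$chr.'\-]+\.)*['.$chr.']+(:[0-9]+)?(/[^\"\s<\[]*)?)#i';

	$message = preg_replace($pattern, '$1[url]$2://$3[/url]', ' '.$message);

	// Drop the helper space that was prepended above.
	return substr($message, 1);
}

echo auto_url_bytes('Example: http://www.kachelöfen.de/'), "\n";
```

The trade-off is that this accepts any non-ASCII byte, including non-letter characters and invalid sequences, so it is looser than a real Unicode-aware '\w'.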
(2009-02-14, 06:01 PM)dvb Wrote: [ -> ]
(2008-12-10, 11:05 PM)Ryan Gordon Wrote: [ -> ]Okay so in inc/class_parser.php

$message = preg_replace("#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i", "$1[url]$2://$3[/url]", $message);

adding the "u" modifier (for unicode) fixes the problem, but then we'll have regression issues with PCRE/PHP versions that don't support unicode.

We could expand '\w' into an explicit character class that includes all the additional special characters, but that will probably be slower...

Any other ideas?

BTW, which versions of PCRE/PHP don't support Unicode? Should MyBB still support them?

(The 'u' modifier is a very convenient solution, not only here but also in other places in the MyBB code.)

No specific version; it only fails if UTF-8 support isn't compiled into PCRE. Personally, I think it should be a requirement, but we can't really enforce it, and we don't know how many hosts support it, so we can't judge the impact on our users.
Okay, can you try this? In inc/class_parser.php find:

function mycode_auto_url($message)
{
	$message = " ".$message;
	$message = preg_replace("#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i", "$1[url]$2://$3[/url]", $message);
	$message = preg_replace("#([\>\s\(\)])(www|ftp)\.(([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i", "$1[url]$2.$3[/url]", $message);
	$message = my_substr($message, 1);
		
	return $message;
}

Replace with:

function mycode_auto_url($message)
{
	static $utf8_pcre_supported;
	if(!isset($utf8_pcre_supported))
	{
		$utf8_pcre_supported = @preg_match('#^.#u', 'a');
	}
		
	if($utf8_pcre_supported)
	{
		$utf8_regex_chr = "u";
	}
	else
	{
		$utf8_regex_chr = "";
	}
		
	$message = " ".$message;
	$message = preg_replace("#([\>\s\(\)])(https?|ftp|news){1}://([\w\-]+\.([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i".$utf8_regex_chr, "$1[url]$2://$3[/url]", $message);
	$message = preg_replace("#([\>\s\(\)])(www|ftp)\.(([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^\"\s<\[]*)?)#i".$utf8_regex_chr, "$1[url]$2.$3[/url]", $message);
	$message = my_substr($message, 1);
		
	return $message;
}
I tested several things and it seems to be fine now. :)
Sounds good. We might extend that to all preg_replace expressions in 1.6/2.0 then. Might use a my_preg_replace wrapper for ease.
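A rough sketch of what such a wrapper could look like follows. Only the name my_preg_replace comes from this thread; the body is an assumption and not actual MyBB code. It reuses the detection trick from the fix above, and it assumes every pattern passed in ends with its closing delimiter plus optional modifiers and does not already carry "u".

```php
<?php
// Sketch only: the name is from the thread, the implementation is assumed.
function my_preg_replace($pattern, $replacement, $subject)
{
	static $utf8_pcre_supported;
	if(!isset($utf8_pcre_supported))
	{
		// preg_match() returns 1 when the "u" modifier compiles and matches.
		$utf8_pcre_supported = (@preg_match('#^.#u', 'a') === 1);
	}

	if($utf8_pcre_supported)
	{
		$result = @preg_replace($pattern.'u', $replacement, $subject);

		// preg_replace() returns NULL on error, for example when the
		// subject is not valid UTF-8; fall through to the byte-wise run.
		if($result !== null)
		{
			return $result;
		}
	}

	return preg_replace($pattern, $replacement, $subject);
}

echo my_preg_replace('#(\w+)#', '[$1]', 'öfen'), "\n";
```

The fallback matters because with "u" set, a post containing invalid UTF-8 would otherwise come back as NULL and the whole message would be lost.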