MyBB Community Forums

Full Version: HTML allowed - some weird replacements
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
<b>Test</b>
<script>alert('Test');</script>
<style>body { background-color: black; }</style>
<meta>Test</meta>
/me asks
/slap Euan

was my post. The opening script tag is replaced, the closing one apparently not (at least it isn't displayed). The style part disappears completely and the meta tag acts like the script one.

What should be done (looking at the code): script and style should be escaped and are disallowed, meta is run through htmlspecialchars and the rest shouldn't be changed.
Hence why we should use HTMLPurifier for 2.0, all this is done for us Toungue

Also, does 1.8 use Regex to remove dodgy HTML? If so, it is probably better to use something like DOMDocument.
The regex used:
            while(preg_match("#<s(cript|tyle)(.*)>(.*)</s(cript|tyle)(.*)>#is", $message))
            {
                $message = preg_replace("#<s(cript|tyle)(.*)>(.*)</s(cript|tyle)(.*)>#is", "&lt;s$1$2&gt;$3&lt;/s$4$5&gt;", $message);
            }
            $find = array('<?php', '<!--', '-->', '?>', "<br />\n", "<br>\n");
            $replace = array('&lt;?php', '&lt;!--', '--&gt;', '?&gt;', "\n", "\n");
            $message = str_replace($find, $replace, $message);

             $message = preg_replace_callback("#<((m[^a])|(b[^diloru>])|(s[^aemptu>]))(\s*[^>]*)>#si", create_function(
                '$matches',
                'return htmlspecialchars_uni($matches[0]);'
            ), $message);

HTMLPurifier simply replaces script, style and meta with <br> Toungue
Quote:
create_function(
                '$matches',
                'return htmlspecialchars_uni($matches[0]);'
            )

I all of a sudden have the horrible sound of alarm bells going off in my head...

That is a nasty bit of code. Compared to a DOMDocument based solution:

$doc = new DOMDocument();
$doc->loadHTML($message);

$script_tags = $doc->getElementsByTagName('script');
$length = $script_tags->length;

for ($i = 0; $i < $length; $i++) {
  $script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
}

$message = $doc->saveHTML();

Only issue is that you then need to have DOMDocument for it to work.

Also see: http://stackoverflow.com/a/1732454/951577
https://gist.github.com/alexpearce/42e3f06274d5df814cf1 <- Specially crafted script to break certain RegEx matches (including, looking at it, the RegEx we're using...)
The current way of doing it is just bad:
- you can still add inline style and mess up the whole page by using CSS attributes like position, width, etc.
- not every inline JS HTML attribute gets broken - the new HTML5 ones can still be used for instance
- and those that get broken have opening <strong> inserted in the middle of a message, which also breaks HTML validation
I'll push this. Would be nice to get this fixed in 1.8.6
Hi,

Thank you for your report. We have pushed this issue to our Github repository for further analysis where you can track our commits and progress with fixing this bug. Discussions regarding this bug may also take place there too.

Follow this link to visit the issue on Github: https://github.com/mybb/mybb/issues/2068

Thanks for contributing to MyBB!

Regards,
The MyBB Group