MyBB Community Forums

Full Version: UTF-8
This is really annoying me. What is UTF-8 used for (Google can't explain enough) and how is it used in PHP and HTML applications? Does it even need to be used?

Help please :(
UTF-8 is a character encoding: a set of rules for turning text into bytes and back. Declaring it tells the browser how to decode the bytes of a page so that the characters display properly. There are other encodings, but UTF-8 stands out because it can encode any Unicode character.

It matters most on non-English websites because of characters like ã, á, ó, ç, ä. If the page is saved in one encoding and the browser decodes it as another, those characters come out garbled. Even if you don't strictly need it, using UTF-8 is considered good practice and is highly recommended.
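To see what "not displayed properly" looks like, here is a minimal Python sketch (the sample word is just an illustration, and Python stands in for whatever is reading the page). It encodes some text as UTF-8 and then decodes the same bytes once correctly and once as ISO-8859-1, which is roughly what happens when a browser guesses the wrong encoding:

```python
# Text as it would be saved to disk in a UTF-8 page.
text = "a\u00e7\u00e3o"  # "ação"
utf8_bytes = text.encode("utf-8")

# Decoding with the right encoding round-trips cleanly.
correct = utf8_bytes.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 (Latin-1) turns each
# two-byte character into two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

print(correct)
print(garbled)
```

The bytes never changed, only the rule used to read them, which is why declaring the encoding matters.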

In HTML5, to set the encoding as UTF-8, you would simply declare the following in the <head> section:

<meta charset="utf-8">
(2012-01-02, 12:08 PM) Fábio Maia wrote: [...] In HTML5, to set the encoding as UTF-8, you would simply declare the following in the <head> section:

<meta charset="utf-8">

Thank you :)
UTF-8 is a text encoding that covers far more characters than standard ASCII while staying backwards compatible with it. It is usually more compact than UTF-16/32 for pages that use the Latin alphabet, since A-Z/a-z/0-9 and many common symbols take only 1 byte each. For languages like Japanese or Chinese, UTF-16 can be smaller for the raw text, since most of those characters take 3 bytes in UTF-8 but 2 in UTF-16. Russian comes out about the same either way, at 2 bytes per Cyrillic letter in both. No character takes more than 4 bytes in UTF-8.
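A quick way to check the size trade-off for your own text is to encode it both ways and count the bytes. This is only a sketch: the sample strings and the byte_sizes helper are made up for illustration, and "utf-16-le" is used so that no byte-order mark gets counted.

```python
samples = {
    "English": "hello",
    "Russian": "привет",
    "Japanese": "こんにちは",
}

def byte_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

sizes = {name: byte_sizes(text) for name, text in samples.items()}
for name, (utf8_size, utf16_size) in sizes.items():
    print(f"{name}: UTF-8 = {utf8_size} bytes, UTF-16 = {utf16_size} bytes")
```

Keep in mind that a real web page also contains HTML markup, which is ASCII and so takes 1 byte per character in UTF-8.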

Before Unicode came along, there were dozens of 8-bit extensions of ASCII (code pages) for different regions with different needs, and if you wrote text in one and tried to display it in another you'd be lucky if it were readable. That's not counting the charsets for Japanese and similar languages, which were developed separately and were not compatible with any of those. Unicode, and the UTF-x encodings built on it, aimed to fix the guess-and-hope by including all of the characters used in human writing. As computers became better, about the only reason for not using UTF-8/16/32 was laziness on the developer's part.
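The guess-and-hope problem is easy to reproduce. In this small Python sketch (the byte value is an arbitrary pick for illustration), a single byte is read under three legacy charsets and comes out as three different characters:

```python
raw = b"\xe4"  # one byte, as stored in a file with no encoding label

# The same byte means a different character in each legacy charset.
as_western = raw.decode("latin-1")   # ISO-8859-1, Western European
as_cyrillic = raw.decode("cp1251")   # Windows-1251, Cyrillic
as_koi8 = raw.decode("koi8_r")       # KOI8-R, another Cyrillic charset

print(as_western, as_cyrillic, as_koi8)
```

Nothing in the byte itself says which reading is the right one, so the reader has to be told or has to guess.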

In this day and age, UTF-8 or UTF-16 should always be preferred to iso-somethingorother-outdated. UTF-32 is not all that useful for storing or transmitting text: it spends 4 bytes on every character even though most characters in common use fit in 16 bits, so it ends up more bloated than it needs to be. It's more for internal processing, where fixed-width code points are convenient, and some software decodes UTF-8/16 to UTF-32 internally to work with it.
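For a rough picture of the fixed-width versus variable-width difference, this Python sketch counts the bytes one character takes in each encoding. The four sample characters and the byte_widths helper are just for illustration, and the "-le" variants are used so that no byte-order mark gets counted.

```python
def byte_widths(ch):
    """Return the bytes one character takes in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

# Latin letter, accented letter, CJK ideograph, emoji outside the 16-bit range.
chars = ("A", "\u00e9", "\u65e5", "\U0001F600")

widths = {ch: byte_widths(ch) for ch in chars}
for ch, (u8, u16, u32) in widths.items():
    print(f"U+{ord(ch):04X}: UTF-8 = {u8}, UTF-16 = {u16}, UTF-32 = {u32} bytes")
```

UTF-32 is the only one of the three where every character has the same width, which is what makes it handy for processing and wasteful for storage.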