Unicode coming to PHP 6

The move from PHP 5 to PHP 6 will be a painful one. But once it’s done, I hope that it will be easier to handle safe web development for a global, multi-language internet.

After all these years, we still have major problems with character set encodings and with the security vulnerabilities caused by handling them improperly. Many still think that addslashes is an effective method to avoid database injection. Chris Shiflett has put the addslashes vs. mysql_real_escape_string debate to rest. Thankfully, magic quotes, which apply addslashes-style escaping automatically, are slated to go away in PHP 6.
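To see why byte-wise escaping falls short, here is a rough simulation of the multi-byte problem. It is written in Python rather than PHP so it runs without a database. The addslashes function below is my own imitation of PHP's behavior, and the GBK connection character set is an assumption chosen for illustration.

```python
def addslashes(data: bytes) -> bytes:
    # Imitation of PHP's addslashes(): works byte by byte and knows
    # nothing about the character set the database will use.
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"          # NUL becomes backslash + "0"
        elif b in (0x27, 0x22, 0x5C):
            out.append(0x5C)       # prefix ', " and \ with a backslash
            out.append(b)
        else:
            out.append(b)
    return bytes(out)

# Attacker-supplied bytes: 0xBF followed by a single quote (0x27).
payload = b"\xbf\x27 OR 1=1"
escaped = addslashes(payload)      # 0xBF 0x5C 0x27 ...

# A GBK reader treats 0xBF 0x5C as ONE two-byte character, so the
# backslash is swallowed and the quote is left unescaped.
as_seen_by_gbk = escaped.decode("gbk")
print(escaped)
print(as_seen_by_gbk[1:])
```

The escaping function did exactly what it was asked to do, yet the quote still gets through, because the escaping and the database disagree about where characters begin and end. An escaping routine that knows the connection's character set does not have this blind spot.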

To this day, I regularly log into my.yahoo.com and see hex data mangled in the headline!

KataUnix points out that you should set these variables in your MySQL ini file:

default-character-set=utf8
default-collation=utf8_general_ci

In case you aren’t sure how your installation is set up, run this command:

show variables;

and make sure the character set and collation values in the output match the settings above.
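If you would rather not eyeball a long list, the comparison is easy to script. This is only a sketch: it assumes you have already loaded the output into a dictionary by whatever client you use, that character_set_server and collation_server are the variables you care about, and the sample values are made up.

```python
# Expected values; the keys are names reported by `show variables;`.
EXPECTED = {
    "character_set_server": "utf8",
    "collation_server": "utf8_general_ci",
}

def find_mismatches(variables: dict) -> dict:
    # Return {name: (expected, actual)} for every setting that differs.
    # A variable missing from the input shows up with actual = None.
    return {
        name: (want, variables.get(name))
        for name, want in EXPECTED.items()
        if variables.get(name) != want
    }

# Hypothetical values, as they might be copied from the command's output.
sample = {
    "character_set_server": "latin1",
    "collation_server": "utf8_general_ci",
}
print(find_mismatches(sample))
```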

The tough part is that even if you get the character sets configured correctly, the tools you use to view the output may still fall short.

Oh, and don’t forget to have your web server send out the correct content type.
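For illustration, here is roughly what "the correct content type" means, sketched as a tiny Python WSGI application so it can be run without a web server. The app function and the sample text are my own inventions. In PHP the same effect comes from calling header('Content-Type: text/html; charset=utf-8') before any output.

```python
def app(environ, start_response):
    # Encode the page as UTF-8 and say so in the header, so the browser
    # does not have to guess the character set.
    body = "<p>Grüße, 世界</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),   # length in bytes, not characters
    ]
    start_response("200 OK", headers)
    return [body]

# Call the app directly with a stand-in for the server's callback.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

chunks = app({}, fake_start_response)
print(captured["headers"]["Content-Type"])
```

Note that the Content-Length counts bytes after encoding. That is exactly the kind of detail that goes wrong when characters and bytes are treated as the same thing.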

3 responses to “Unicode coming to PHP 6”

  1. Justin Swanhart

    I think that using UTF-8 across the board in the database is a bad idea. Indexes on UTF-8 columns and in-memory string columns consume three bytes for every character, irrespective of the actual storage requirement for that character. You’ll use up your key buffer or buffer pool faster, you’ll make larger temporary tables, and in general you’ll consume 3x the memory for sorting and grouping operations on the UTF-8 columns.

    Use Unicode sparingly where required and leave all other fields as latin1 to save space and improve performance.

  2. Matthew Montgomery

    mysql> show variables like 'character_set%';

    You’re looking for character_set_server to be utf8

    mysql> show variables like 'collation%';

    collation_server should be utf8_general_ci

  3. @Justin: That’s true. It all depends on your particular situation. Which is why I’ll bet when all is said and done PHP 6 will be another painful transition.

    One thing I’ve never seen discussed is if this problem is mitigated by packing keys on a MyISAM table (where doing so is desirable).

    @Matthew: Thanks for the improved commands to filter out unnecessary values.
