Oct 23 2008

Using UTF8 In Your CakePHP App

Published by at 9:31 am under CakePHP, CakePHP tips

Your correspondent has spent the last month heavily developing a side project in CakePHP, an MVC, RoR-inspired framework for building web applications in the dreaded language PHP.

Overall, it’s a great framework. There are some sore points (the documentation is solid but has lots of gaping holes), but your correspondent is confident that any PHP programmer who uses CakePHP for the first time will quickly land in the I’m-never-doing-PHP-again-unless-I-have-some-kind-of-framework camp. Seriously, never again.

Still, there were some things your correspondent wished he had known when he started. This is the first in a series of posts on CakePHP tips for newbies.

CakePHP Tip: Start in UTF-8, everywhere

Character encoding sucks, but you kinda have to know about it. To save yourself grief, just start on day one in UTF-8 and there won’t be any encoding pain down the line.

First, add this to your layouts so your pages send an HTTP header that screams to the browser “I’m fucking UTF-8, OK?” Put this line at the top of your layout (before any other output):

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Just in case the first message fell on deaf browser ears, add this inside your HEAD tags as a backup:

<?php echo $html->charset('utf-8'); ?>

Then, make sure you set your MySQL collation to utf8_general_ci before you start building your database. If you started your database with some kind of latin1 collation, sucks to be you: you will have to update the collation manually on every table and on every column that is TEXT, VARCHAR or CHAR.
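For the sucks-to-be-you case, MySQL can do most of the per-column work for you: ALTER TABLE ... CONVERT TO CHARACTER SET changes a table’s default character set and converts its CHAR, VARCHAR and TEXT columns in one statement. Here is a minimal sketch that only generates those statements, assuming MySQL’s utf8 character set and that you supply your own list of table names; the helper names and the posts/users tables are made up for illustration.

```php
<?php
// Hypothetical helpers: build the SQL for moving an existing MySQL database
// to utf8. Nothing is executed here; review the statements and run them
// yourself (after taking a backup) in the mysql client or phpMyAdmin.

function quoteIdentifier($name)
{
    // Backtick-quote a MySQL identifier, doubling any embedded backticks.
    return '`' . str_replace('`', '``', $name) . '`';
}

function utf8ConversionSql($database, array $tables, $collation = 'utf8_general_ci')
{
    // Tables created later will inherit this database default.
    $sql = array(sprintf(
        'ALTER DATABASE %s CHARACTER SET utf8 COLLATE %s;',
        quoteIdentifier($database),
        $collation
    ));

    // CONVERT TO changes the table default *and* every CHAR, VARCHAR
    // and TEXT column, so you don't have to alter columns one by one.
    foreach ($tables as $table) {
        $sql[] = sprintf(
            'ALTER TABLE %s CONVERT TO CHARACTER SET utf8 COLLATE %s;',
            quoteIdentifier($table),
            $collation
        );
    }

    return $sql;
}

// Example run with made-up table names.
foreach (utf8ConversionSql('dbname', array('posts', 'users')) as $statement) {
    echo $statement, "\n";
}
```

Generating the SQL instead of running it keeps the sketch free of any database connection, so you can eyeball every statement before touching real data.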

Lastly, there is one more little gem: set the encoding in your CakePHP MySQL settings to utf8, so that when Cake talks to MySQL it uses UTF-8. In your config/database.php file, add this line as the last property of each connection:

'encoding' => 'utf8'

So each database connection looks something like this:

class DATABASE_CONFIG {
	var $default = array(
		'driver' => 'mysql',
		'persistent' => false,
		'host' => 'localhost',
		'login' => 'username',
		'password' => 'password',
		'database' => 'dbname',
		'prefix' => '',
		'encoding' => 'utf8' // make Cake's connection to MySQL use UTF-8
	);
}
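Why does one config line matter so much? If Cake’s connection isn’t told to use utf8 (and falls back to something like latin1) while your pages and tables are UTF-8, each multibyte character gets read as several latin1 characters. The sketch below fakes that misreading in plain PHP so you can recognise the symptom; it assumes the mbstring extension is installed, and misreadAsLatin1() is a made-up name for illustration.

```php
<?php
// Hypothetical helper: imitates UTF-8 bytes being read as latin1
// (ISO-8859-1) and then re-encoded as UTF-8, which produces the classic
// "Ã¼ instead of ü" garbage. Requires the mbstring extension.

function misreadAsLatin1($utf8Text)
{
    // Every byte is treated as one latin1 character, so the two bytes
    // of "ü" (0xC3 0xBC) come out as two characters: "Ã" and "¼".
    return mb_convert_encoding($utf8Text, 'UTF-8', 'ISO-8859-1');
}

$original = "München";
$garbled  = misreadAsLatin1($original);

echo $original, "\n"; // München
echo $garbled, "\n";  // MÃ¼nchen

// Undoing the exact same mistake recovers the original bytes.
echo mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8'), "\n"; // München
```

If you see strings like MÃ¼nchen in your pages or in phpMyAdmin, some link in the chain is still reading UTF-8 as latin1.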


40 Responses to “Using UTF8 In Your CakePHP App”

  1. stockholm on 15 Dec 2008 at 7:24 am

    You... really saved my life. Damn, I’m from Germany and we’ve got those nasty little umlauts. It took around 2 f*cking hours to find out that umlauts aren’t supported the standard way in CakePHP... I had just started hating the whole damn framework, but you, guy, you saved my life.

    Just a few keywords for anyone searching for the same solution I was, until I found yours:

    CakePHP, Umlauts, Umlaute, Forms, Message, FormHelper

    Thank you so much,

    stockholm

  2. Justin on 15 Dec 2008 at 8:23 am

    @stockholm – Glad I could help! Keep the faith, cake rocks.

  3. George on 07 Jan 2009 at 11:05 am

    Thank you very much. I absolutely love how you expressed your frustration :) It is like you were there when I was talking to myself the last couple of hours. I like your writing style as well. Keep it up.

  4. Vlad on 27 Apr 2009 at 3:45 pm

    You are really great...
    Helped me a lot with special Romanian characters...

  5. Dru on 28 Apr 2009 at 7:18 pm

    You rock! I’m coming from RoR and I’m required to use PHP for this project, so CakePHP it is. You have saved me big time. Thank You! Oh, and if you didn’t get the first one, Thank You!

  6. Justin on 28 Apr 2009 at 8:31 pm

    @Dru – Happy to help.

  7. Sebastien on 28 May 2009 at 5:56 am

    I keep getting those “diamond question marks” on letters like é É À Ç... grrrrr

  8. Sebastien on 28 May 2009 at 6:00 am

    OK... I found what I did wrong: it was my default.po (language file), which was encoded in ISO-8859-1... hehe... thanks a lot, buddy

  9. Boomer on 09 Jul 2009 at 1:50 pm

    Nice! I had some issues with UTF-8, but now they’re gone. BTW, do you know if there’s another special procedure for Oracle? I’m starting some projects that require it, and the databases are already created, so there’s no option there…

    Thanks and keep it up!

  10. Justin on 09 Jul 2009 at 2:03 pm

    @Boomer – Not really familiar with Oracle. I might try posting to StackOverflow.com.

  11. Julien on 03 Aug 2009 at 6:29 am

    Thanks a lot for this post!

  12. Neha on 29 Aug 2009 at 12:25 pm

    Thanks a lot. Really helpful post.
    'encoding' => 'utf8' was something I couldn’t find in the documentation.

  13. Tom on 03 Sep 2009 at 11:48 am

    Brilliant! Thank you very, very much! I really wonder why those fundamental things are not in the Cake docs.

  14. Logic Lab Posting 1 « LogicBomb Media on 03 Nov 2009 at 10:31 am

    [...] The short explanation: Set everything to UTF-8 at the beginning of your project. The long explanation (which is just a rehash of this awesome post at missingfeatures.com): [...]

  15. steve on 05 Nov 2009 at 9:05 pm

    Fantastic! Your last “little gem” of adding the UTF encoding to the database configuration is the last little piece I’ve been looking for, for a LONG TIME NOW!! I had resorted to using
    Configure::write('App.encoding', 'ISO-8859-1')
    in core.php, but now I’m all UTF8, all of the time!

    Thanks for the tip!

  16. mishu on 26 Jan 2010 at 6:20 am

    Hi man,
    I have a problem displaying some Hungarian chars, although the document is declared UTF-8 and I added the encoding line in database.php. The problem is that when I use “&#336 ;” (with no space), which is Ő, the O with two dots above, in default.po, the output is the code itself, not the character. Any ideas how this can be sorted? I’d really appreciate an email answer since you have no “get comments by email” checkbox. Thanks mate.

  17. Justin on 26 Jan 2010 at 9:16 am

    @mishu – I emailed you this, but for completeness, here is one idea to try:

    &#336; has to reach the source of the output HTML file either as an unescaped entity or as the character Ő itself for the browser to display it as that character. Try passing your output through html_entity_decode – http://www.php.net/manual/en/function.html-entity-decode.php.

  18. Mailson Lira on 20 Feb 2010 at 8:59 am

    I cannot believe that I spent the last 3 hours trying to solve this.
    Just one line, and problem solved!

    Thank you!

  19. igri on 13 Mar 2010 at 5:20 am

    If you need to convert your tables to UTF-8, you can try this code:

    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

  20. Justin on 13 Mar 2010 at 6:03 am

    Thanks, good tip. Does that also update the charset on all the text columns?

  21. Keber on 09 Jun 2010 at 11:10 am

    Many thanks, I was having trouble with a basic example in a view:

    <?php echo $html->link($post['Post']['title'], "/posts/view/".$post['Post']['id']); ?>

    The line above didn’t show anything. After a while I got rid of the $html->link part, saw those “?” icons, and figured out that it was an encoding problem; after adding the encoding to the db connection it just rocks.

    Thanks!!

  22. Carlos on 21 Jun 2010 at 3:06 pm

    Thanks!!

  23. ori on 02 Oct 2010 at 3:41 pm

    hey, thanks for that.
    The encoding in database.php did the trick for me.

    Before this change, data from the website looked like garbage in phpMyAdmin and vice versa: data inserted manually via SQL looked like garbage on the website.
    It’s weird that CakePHP ran “SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME = 'utf8_general_ci';” but then somehow used the wrong encoding...

  24. Cod on 29 Oct 2010 at 4:17 am

    Dude, I love you! Really! ;-)

  25. Angel on 18 Nov 2010 at 9:52 pm

    Friend, you just saved me. I’m from Mexico, and I use characters that aren’t valid, like accents and the question marks that open and close sentences. Unfortunately CakePHP was inserting the wrong characters into the database, and it was a headache; but the solution you propose solved my problem. Thank you very much.

    Use Google Translator to translate my message :p spanish>english

  26. Raffaele on 23 Nov 2010 at 12:32 am

    Really, this should be added to the CakePHP Cookbook.
    Great post

  27. blueduck on 07 Dec 2010 at 2:05 am

    Read this post, loved it, and then had to figure out a few additional things:

    1. I found that my use of htmlentities was throwing off the display of characters. You need to specify the charset, something like this:
    htmlentities($s, ENT_COMPAT, 'UTF-8')

    2. I was kind of pulling my hair out, because I had gotten things running pretty smoothly, but then I noticed that my logs and database were still storing funny characters. I did the encoding trick in the CakePHP database config and checked my Postgres client encoding. Nothing worked. Then (laugh if you want, but hopefully it will help someone) I realized that it was simply that my SSH shell wouldn’t display the characters correctly.

    3. Also, I’m using Postgres and my database config is 'encoding' => 'utf-8' with a dash, and everything seems to be running great.

    Peace,
    The Duck

  28. lyba on 14 Jan 2011 at 3:54 am

    I had additional problems with bake generating non-UTF-8 files.

    The simplest way I found to overcome this is to modify the template files, e.g. cake\console\templates\default\classes\model.ctp, to include a UTF-8 character somewhere, e.g.: //'message' => 'Your custom message here ł', (notice the last non-ASCII character at the end of the line). Converting and saving the file as UTF-8 then makes sure the template file is UTF-8, and model files are generated as UTF-8. Do the same for the other templates.

  29. Justin on 14 Jan 2011 at 8:58 am

    In Drupal land this is called hacking core. I’m not certain, but it feels like this is a core hack, no?

  30. [...] The short explanation: Set everything to UTF-8 at the beginning of your project. The long explanation (which is just a rehash of this awesome post at missingfeatures.com): [...]

  31. Travis Berry on 15 Feb 2011 at 7:30 pm

    @Justin – Drupal is a CMS, whereas CakePHP is a framework. You can modify many more files before it becomes much of an issue. Granted, it’s not the best idea to modify the Cake core files, but these are just configuration and view files. Nothing bad about adding these.

  32. Bruno Oliveira on 21 Mar 2011 at 6:20 pm

    Great article!

    I’ve just typed the 'encoding' => 'utf8' line into my database.php file and it solved the problem, thank you man!

  33. Justin on 21 Mar 2011 at 6:21 pm

    No problem, happy to help! ;>

  34. Bruno Oliveira on 21 Mar 2011 at 6:25 pm

    I always create my databases using the utf8_unicode_ci collation; it handles many more characters correctly and it’s safer to use than utf8_general_ci.

    http://www.davidtan.org/differences-between-utf8_unicode_ci-and-utf8_general_ci/

  35. Will Oram on 05 May 2011 at 9:46 pm

    This really helped me end a day-long frustrating search for answers. And the answer was only one line of code!

  36. AC on 28 May 2011 at 6:59 am

    Great stuff, one quick Google search and my problem was solved!

  37. Steve Comrie on 05 Jul 2011 at 3:01 pm

    I wish I’d found this article 2 hours ago. Thanks for posting it.

  38. David Tan on 26 Jul 2011 at 9:35 am

    Thanks, adding that ‘encoding’ line did the magic. cake rocks!

  39. TulipVorlax on 22 Sep 2011 at 8:09 pm

    Thanks.
    I changed the collation in the DB of a starting project and I didn’t realise I had to do it on the columns too. At least there weren’t many of them. I was wondering why Cake kept showing the old collation at the bottom of the default view. Changing the encoding in the DB config file did nothing because the columns were still in the old collation.

  40. Gabriel Spiteri on 15 Dec 2011 at 3:12 am

    I just wanted to thank you; this was a lifesaver.
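Two tips from the comments above (passing the charset to htmlentities, and decoding numeric entities such as &#336; with html_entity_decode) boil down to the same rule: tell PHP’s entity functions that your strings are UTF-8. PHP versions before 5.4 defaulted these functions to ISO-8859-1, which is how multibyte characters get mangled. A minimal sketch, with example strings chosen for illustration:

```php
<?php
// Entity functions need to know the input is UTF-8; always pass the
// charset explicitly instead of relying on the PHP version's default.

$text = "é";

// Correct charset: the two UTF-8 bytes of "é" are read as one character.
echo htmlentities($text, ENT_COMPAT, 'UTF-8'), "\n";      // &eacute;

// Wrong charset: each byte is encoded as its own latin1 character.
echo htmlentities($text, ENT_COMPAT, 'ISO-8859-1'), "\n"; // &Atilde;&copy;

// Turning a numeric entity (say, from a .po file) into the real character.
echo html_entity_decode('&#336;', ENT_QUOTES, 'UTF-8'), "\n"; // Ő
```

Decoding to the real character and serving the page as UTF-8 avoids the question of whether some later escaping step turns &#336; into visible text.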
