Oct 23 2008
Using UTF8 In Your CakePHP App
Your correspondent has spent the last month heavily developing a side project in CakePHP, an MVC, RoR-inspired framework for building web applications in the dreaded language PHP.
Overall, it’s a great framework. There are some sore points (the documentation is solid but has some gaping holes), but your correspondent is confident that any PHP programmer who uses CakePHP for the first time will quickly be in the I’m-never-doing-PHP-again-unless-I-have-some-kind-of-framework camp. Seriously, never again.
Still, there were some things your correspondent wished he knew when he started. This will be the first in a series of posts on CakePHP tips for newbies.
CakePHP Tip: Start in UTF, everywhere
Character encoding sucks but you kinda have to know about it. To save yourself pain, just start on day one in UTF8 and there won’t be any encoding pain down the line.
First, have your pages send an HTTP header that screams to the browser I’m fucking UTF, OK?. Put this line at the top of your layout (before any other output):
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
Just in case the first message fell on deaf browser ears, add this inside your HEAD tags as a backup:
<?php echo $html->charset('utf-8'); ?>
Then, make sure you set your MySQL collation to utf8_general_ci before you start building your database. If you started your database in some kind of latin collation, sucks to be you. You will have to update the collation manually in every table and every column that is text, varchar or char.
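If the database already exists in a latin collation, a small script can at least generate the statements for you. What follows is a minimal sketch, not a tested migration: it assumes MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET statement, which changes the table default along with its char, varchar and text columns. The function name buildConvertStatements and the table names posts and comments are made up for illustration. Back up the database before running anything it prints.

```php
<?php
// Sketch: build one ALTER TABLE statement per table so the conversion
// does not have to be typed by hand. Function and table names are hypothetical.

function buildConvertStatements(array $tables, $charset = 'utf8', $collation = 'utf8_general_ci')
{
    $statements = array();
    foreach ($tables as $table) {
        // Only allow plain identifiers, since the name is interpolated into SQL.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
            throw new InvalidArgumentException("Unsafe table name: $table");
        }
        $statements[] = sprintf(
            'ALTER TABLE `%s` CONVERT TO CHARACTER SET %s COLLATE %s;',
            $table,
            $charset,
            $collation
        );
    }
    return $statements;
}

// Print the statements so they can be reviewed before being run against MySQL.
$statements = buildConvertStatements(array('posts', 'comments'));
echo implode("\n", $statements), "\n";
```

Printing the statements instead of executing them is deliberate: you get to read them over (and skip any table you want left alone) before anything touches the data.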
Lastly, there is one more little gem. You need to set the encoding in your CakePHP MySQL settings to utf8 to ensure that when Cake talks to MySQL, it’s using UTF. In your config/database.php file, add this line as the last property for each connection:
'encoding' => 'utf8'
So each database connection looks something like this:
var $default = array(
    'driver' => 'mysql',
    'persistent' => false,
    'host' => 'localhost',
    'login' => 'username',
    'password' => 'password',
    'database' => 'dbname',
    'prefix' => '',
    'encoding' => 'utf8'
);
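To see why the connection encoding matters, here is a rough sketch of what happens when one side of a conversation treats UTF-8 bytes as latin text. It is plain PHP, not CakePHP code, and it assumes the mbstring extension is available; the variable names are just for illustration.

```php
<?php
// Sketch of an encoding mismatch (requires the mbstring extension).

// The two UTF-8 bytes for the umlaut "ü".
$original = "\xC3\xBC";

// Pretend one side believes those bytes are ISO-8859-1 and converts them
// to UTF-8. Each byte is treated as its own character.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n";          // prints "Ã¼"
echo bin2hex($garbled), "\n"; // prints "c383c2bc"
```

One character has turned into two, which is the classic garbage you see in place of umlauts and accents when the pieces of your stack disagree about the encoding.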
You... really saved my life. Damn, I’m from Germany and we have those nasty little umlauts. It took around 2 f*cking hours to find out that umlauts aren’t supported the standard way in CakePHP… I had just started hating the whole damn framework, but you, guy, you saved my life.
Just a few keywords for everyone who is searching for the same solution I was, until I found yours:
CakePHP, Umlauts, Umlaute, Forms, Message, FormHelper
Thank you so much,
stockholm
@stockholm – Glad I could help! Keep the faith, cake rocks.
Thank you very much. I absolutely love how you expressed your frustration
It is like you were there when I was talking to myself the last couple of hours. I like your writing style as well. Keep it up.
You are really great….
Helped me a lot with special romanian characters…….
You rock! I’m coming from RoR’s and I am required to use PHP for this project so CakePHP it is. You have saved me big time. Thank You! Oh, and if you didn’t get the first one, Thank You!
@Dru – Happy to help.
I keep getting those “diamond question marks” on letters like… é É À Ç … grrrrr
Ok… I found what I did wrong… it was my default.po (language file), which was encoded in ISO-8859-1… hehe… thanks a lot buddy
Nice! I had some issues with UTF-8, but now they’re gone. BTW, do you know if there’s another special procedure for Oracle? I’m starting some projects that require it, and the databases are already created, so there’s no option there…
Thanks and keep it up!
@Boomer – Not really familiar with Oracle. I might try posting to StackOverflow.com.
Thanks a lot for this post!
Thanks a lot. Really helpful post.
'encoding' => 'utf8' was something I couldn’t find in the documentation.
Brilliant! Thank you very, very much! I really wonder why those fundamental things are not in the Cake docs.
[...] The short explanation: Set everything to UTF-8 at the beginning of your project. The long explanation (which is just a rehash of this awesome post at missingfeatures.com): [...]
Fantastic! Your last “little gem” with adding the utf encoding to the database configuration is the last little piece that I’ve been looking for — for a LONG TIME NOW!! I had resorted to using
Configure::write('App.encoding', 'ISO-8859-1')
in core.php, but now I’m all UTF8, all of the time!
Thanks for the tip!
Hi man,
I have a problem displaying some Hungarian chars although the document is declared UTF 8 and in database.php I added the encoding line. The problem is that when I use the HTML entity for Ő (the O with two points above) in default.po, the output is the actual code, not the character. Any ideas how this can be sorted? I’d really appreciate an email answer since you got no “get comments on email” checkbox. Thanks mate.
@mishu – I emailed you this, but for completeness, here is one idea to try:
The entity for Ő has to appear unescaped in the source code of the output HTML file for the browser to read it as an encoded character. Try passing your output through html_entity_decode – http://www.php.net/manual/en/function.html-entity-decode.php.
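A minimal sketch of that idea, assuming the entity in question is the decimal numeric reference for Ő (U+0150, which is 336 in decimal). The variable names are illustrative only.

```php
<?php
// Sketch: turn a numeric HTML entity back into the real UTF-8 character.

// Assumed input: the decimal numeric entity for Ő (U+0150).
$escaped = '&#336;';

// Pass 'UTF-8' explicitly so the result is encoded the same way as the page.
$decoded = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";          // prints "Ő"
echo bin2hex($decoded), "\n"; // prints "c590"
```

Once the page, the database and the language files are all UTF-8, it may be simpler to put the real character in default.po and skip the entity entirely.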
I cannot believe that I spent the last 3 hours trying to solve this.
Just one line, and problem solved!
Thank you!
If you need to convert your tables to UTF-8, you may try this code:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Thanks, good tip. Does that also update the charset on all the text columns?