Wednesday 24 February 2010

Encoding Issues

A technical blog post for a change:

I've been having a really frustrating problem with my development environment recently, one that I've run into before but never properly understood or definitively resolved: character encoding in source files.

Thanks to this blog post I think I've finally got a handle on what is going on and why. I had assumed it was something odd in my environment, but I now think it's down to developers on Windows not saving their files in the correct encoding, UTF-8, which is what is causing the problems for me.
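
To make that concrete, here is a minimal Java sketch of what I believe is happening. It assumes the files were saved as ISO-8859-1 (Windows-1252 uses the same byte for this particular character), which I haven't confirmed for the real files; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decode raw bytes the way a UTF-8-expecting tool would.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The text "cafe" with an acute accent on the e, written with a
        // Unicode escape so this source file itself stays pure ASCII.
        String original = "caf\u00e9";

        // Assumption: the editor saved the file as ISO-8859-1, so the
        // accented character becomes the single byte 0xE9.
        byte[] savedAsLatin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Reading those bytes back as UTF-8: 0xE9 on its own is not a valid
        // UTF-8 sequence, so Java substitutes the replacement character U+FFFD.
        String misread = decodeAsUtf8(savedAsLatin1);

        System.out.println(misread.equals(original));       // prints false
        System.out.println(misread.indexOf('\uFFFD') >= 0); // prints true
    }
}
```

Plain ASCII characters are identical in both encodings, which would explain why most of a file looks fine and only the odd accented or special character breaks.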

The ultimate solution seems to be to fix these files by converting them to UTF-8 and re-committing. However, Maven appears to be using ASCII encoding by default, so I'm not entirely sure that converting to UTF-8 will fix things. Unless, of course, it is expecting UTF-8 and converting to ASCII to compile, in which case the error message I'm getting is misleading. Will experiment later.
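
For the conversion step, something along these lines might do. It is a rough sketch, not something I've run against the real project: `Utf8Converter` and its methods are hypothetical names, and ISO-8859-1 as the source encoding is an assumption that would need checking per file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8Converter {

    // Decode the bytes using the charset they were written in,
    // then re-encode the resulting text in the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Rewrite a file in place as UTF-8, given a guess at its current encoding.
    static void convertFile(Path file, Charset assumedSource) {
        try {
            byte[] original = Files.readAllBytes(file);
            Files.write(file, transcode(original, assumedSource, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Converter <file>");
            return;
        }
        // Assumption: the file is currently ISO-8859-1.
        convertFile(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
    }
}
```

One caveat: if the guess at the source encoding is wrong, or the file is already UTF-8, this will silently mangle the non-ASCII characters, so it is only worth running on files that are under version control and can be reverted.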

1 comment:

Jenny said...

Feel your pain. For me it was RSS feeds that contain accented characters and say they're UTF-8 but turn out to be in different ISO 8859 character sets, arbitrarily selected by whatever authoring tool the content creator happened to be using. Bye-bye, a week of my time!

Not what I expected to be talking about in this sphere :)