
Charset again


1) Arie Molendijk Group: Guests
IP: 84.86.--.--
Hello Angus,
I'm posting my message here, since apparently it didn't come through in the old thread.
Here's my question.

I downloaded your HttpRequest script and put an 'é' in the text of both htmlhttprequest.html and advantages.html. In both cases, the French character does not display correctly: I get '?' in FF and another weird sign in IE. Both files contain '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />'. I tried other charsets (uniformly), but nothing works for all browsers and all files at the same time. Do you know what the problem is? (This is something you often see with Ajax requests.)

Thanks,
Arie.

2) sdthomas Group: Members
Posts: 6 Joined: 17 Jan 2008 Location: UK IP: 81.151.--.--
A quick fix:

Find this line in the script:
xmlhttp.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');

and change it to:
xmlhttp.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded;charset=UTF-8');

It would be worth creating an 'encoding' variable in the script, though, and allowing it to be set via a function call so you can easily modify it.
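
The 'encoding' variable idea could look roughly like the sketch below. The names `requestCharset`, `setRequestCharset`, `buildContentType` and `applyContentType` are made up for illustration and are not part of the HTMLHttpRequest script; the only real API used is `setRequestHeader`.

```javascript
// Hypothetical setting; the original script hard-codes the header value.
let requestCharset = 'UTF-8';

// Change the charset that gets appended to the POST Content-Type header.
function setRequestCharset(charset) {
  requestCharset = charset;
}

// Build the header value from the current setting.
function buildContentType() {
  return 'application/x-www-form-urlencoded;charset=' + requestCharset;
}

// Apply the header to any object exposing setRequestHeader,
// such as an XMLHttpRequest that has already been opened.
function applyContentType(xmlhttp) {
  xmlhttp.setRequestHeader('Content-Type', buildContentType());
}
```

In the script, the hard-coded `setRequestHeader` line would then become `applyContentType(xmlhttp);`, with `setRequestCharset('ISO-8859-1')` called beforehand if a different charset is wanted.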


3) Angus Turnbull Group: Moderators
Posts: 4042 Joined: 7 Dec 2003 Location: New Zealand IP: 203.173.--.--
I tried Arie's initial experiment with Firefox 2 under Ubuntu Linux 7.10 (GNOME desktop). I used the built-in gEdit text editor and forced the file to save in UTF-8 format (which is actually the default).

It worked perfectly for both initial and loaded content. UTF-8 is a complex encoding, and editors without good support will probably just paste in é as a "high Latin" character, which will of course display as a box, as it breaks UTF-8 rather thoroughly (long story). Unicode, and in particular UTF-8, really is the solution to charset problems; you just need a good text editor that allows you to explicitly save in that format. Simple text editors usually just won't work, as they'll save in ANSI/ASCII text.
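
A small sketch of why a "high Latin" byte breaks UTF-8, assuming an environment that provides the standard `TextEncoder` and `TextDecoder` globals (current browsers and Node.js). The helper name `decodeAsUtf8` is made up for illustration.

```javascript
// '\u00E9' is é. Saved as UTF-8 it becomes two bytes: 0xC3 0xA9.
const utf8Bytes = Array.from(new TextEncoder().encode('\u00E9'));

// Saved by an editor using Latin-1 / Windows-1252 it is the single byte 0xE9.
const latin1Bytes = [0xE9];

// Decode a byte array as UTF-8. Invalid sequences are replaced with
// U+FFFD, the "replacement character" that shows up as a box or '?'.
function decodeAsUtf8(bytes) {
  return new TextDecoder('utf-8').decode(new Uint8Array(bytes));
}

console.log(decodeAsUtf8(utf8Bytes) === '\u00E9');   // valid UTF-8 round-trips
console.log(decodeAsUtf8(latin1Bytes) === '\uFFFD'); // lone 0xE9 is not valid UTF-8
```

So a file that declares charset=utf-8 but was saved as single-byte text will show a replacement glyph wherever the é should be.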

For Windows you might want to try one of the Scintilla based editors like Notepad++:
http://notepad-plus.sourceforge.net

sdthomas:  I've never really tried that, but most browsers should POST UTF-8 by default, particularly if that's the document's current charset! In any case that won't affect the initial load of HTMLHttpRequest itself, only loaded documents, and even then will probably only affect the POSTed content not the content returned by the server (which of course sets a Content-type of its own in its HTTP headers and/or the document <head>).

-  Angus.

4) Arie Molendijk Group: Guests
IP: 84.86.--.--
The sdthomas-solution didn't work for me, so I'm going to try Notepad++.
Thanks,
Arie.

5) sdthomas Group: Members
Posts: 6 Joined: 17 Jan 2008 Location: UK IP: 81.151.--.--
Ah, I didn't realise you were trying to display ANSI/ASCII text from Notepad.

If you are only working with Western European languages, then you can keep your ANSI/ASCII text as is and use ISO-8859-1 (aka Latin-1) encoding, but be sure to keep all charset declarations the same, i.e. in your web pages, server, database, etc.

ANSI/ASCII text is single-byte, i.e. each character is stored in 1 byte.

UTF-8 Unicode is multibyte: it stores each character in up to 4 bytes and is used to broadly support multibyte charsets like those used for Chinese, Russian, etc.
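
To illustrate the "up to 4 bytes" point, here is a short sketch that counts how many bytes UTF-8 needs for a few sample characters. It assumes the standard `TextEncoder` global is available; `utf8ByteLength` is a made-up helper name.

```javascript
// Number of bytes UTF-8 uses to store the given string.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength('e'));          // 1: plain ASCII letter
console.log(utf8ByteLength('\u00E9'));     // 2: é (Latin-1 range)
console.log(utf8ByteLength('\u042F'));     // 2: Я (Cyrillic)
console.log(utf8ByteLength('\u4E2D'));     // 3: 中 (Chinese)
console.log(utf8ByteLength('\u{1F600}'));  // 4: emoji outside the basic plane
```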

6) Angus Turnbull Group: Moderators
Posts: 4042 Joined: 7 Dec 2003 Location: New Zealand IP: 203.173.--.--
Yeah, that's the other option: save as plain 8-bit (ANSI) text and set both documents to an 8-bit charset, like sdthomas suggests. He's right; the main thing is that all charsets must be the same between loaded and parent documents, and your files must be valid for that charset.
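
A rough way to check the "same charset" rule is to compare the charsets declared in the two documents' meta tags. The sketch below uses a simple regular expression rather than a real HTML parser, and the function names `declaredCharset` and `sameCharset` are made up for illustration. It only compares declarations; it cannot tell whether the file was actually saved in that encoding.

```javascript
// Extract the charset declared in a <meta> tag, lowercased, or null if none.
// Handles both the http-equiv form and the shorter <meta charset="..."> form.
function declaredCharset(html) {
  const match = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(html);
  return match ? match[1].toLowerCase() : null;
}

// True only if both documents declare a charset and the declarations agree.
function sameCharset(parentHtml, loadedHtml) {
  const a = declaredCharset(parentHtml);
  const b = declaredCharset(loadedHtml);
  return a !== null && a === b;
}

const parentDoc = '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />';
const loadedDoc = '<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />';
console.log(sameCharset(parentDoc, parentDoc)); // true
console.log(sameCharset(parentDoc, loadedDoc)); // false
```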

- Angus.

7) Arie Molendijk Group: Guests
IP: 84.86.--.--
Thanks. That works.
Arie.
