Charset problem reading filenames

Everything related to PHP can be discussed here.

Post by olbion » 25 January 2010 14:47

I have a problem that I think is charset-related. I have written a small app that does the following:
1. It downloads a web page using 'wget' and saves it as a file X.
2. When I access a web page, a PHP script looks for the file X (using scandir and looping over the entries for a match), then reads and outputs its contents.
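
For reference, the lookup in step 2 can be sketched roughly as follows. This is a minimal sketch, not the actual script: the function name readMatchingFile and its parameters are made up for illustration, and it assumes the comparison is a plain byte-for-byte match on each directory entry.

```php
<?php
// Hypothetical helper: returns the contents of the file named $wanted in $dir,
// or null when no directory entry matches byte-for-byte.
function readMatchingFile(string $dir, string $wanted): ?string
{
    $entries = scandir($dir);
    if ($entries === false) {
        return null; // directory could not be read
    }
    foreach ($entries as $entry) {
        // Strict comparison: the names must consist of exactly the same bytes.
        if ($entry === $wanted) {
            $content = file_get_contents($dir . DIRECTORY_SEPARATOR . $entry);
            return $content === false ? null : $content;
        }
    }
    return null; // no entry matched
}
```

Because the match is done on raw bytes, two names that look the same on screen but are stored as different byte sequences would not match here.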

Simple enough. This works fine for "normal" characters. But whenever the name contains special characters such as åäö, the script decides that the file X (e.g. "läte") does not exist. After a lot of trial and error I can conclude that:
1. The script reports that the two strings are not equal (tested with ===, ==, strcmp and strcasecmp) even though they are output identically (i.e. the script states that "läte" does not equal "läte").
2. Using mb_detect_encoding(), PHP reports both strings as UTF-8. (If the file name does not contain åäö, it reports ASCII.)
3. If I encode the strings with htmlentities(), however, the output differs:
String from PHP: läte
String from file system: la�te
4. The lengths also differ: the string from the file system is consistently one character longer.
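
One way to make the difference in points 3 and 4 visible is to dump both strings at the byte level. The sketch below uses two assumed example values, not the real strings from the script: a precomposed "ä" (U+00E4) and a decomposed "a" followed by a combining diaeresis (U+0308). That pairing is only one possibility that would be consistent with identical output and a length difference of one; it is not confirmed here. The helper name describe is made up, and the "\u{...}" escapes need PHP 7 or later.

```php
<?php
// Assumed example values: the same visible text in two Unicode forms.
$fromPhp  = "l\u{00E4}te";   // precomposed: U+00E4
$fromDisk = "la\u{0308}te";  // decomposed: "a" followed by U+0308

// Hypothetical helper: summarizes a string at the byte level.
function describe(string $s): array
{
    return [
        'hex'   => bin2hex($s),                 // raw bytes as hexadecimal
        'bytes' => strlen($s),                  // length in bytes
        'chars' => preg_match_all('/./su', $s), // number of UTF-8 code points
    ];
}

print_r(describe($fromPhp));
print_r(describe($fromDisk));
var_dump($fromPhp === $fromDisk);
```

Running the same describe() call on the real string from scandir and on the string specified in the script would show whether they actually differ in this way or in some other way.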

I have tested this on both Mac OS and Ubuntu with the same result. Both systems are set up with UTF-8, as confirmed both by running "locale" and by the headers received from Apache.

Any ideas what's going on? How can I match a multibyte file system string with the same multibyte string specified in PHP?
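
If the mismatch does turn out to be a Unicode normalization difference (an assumption, not something established above), one possible approach is to normalize both strings to the same form before comparing. The sketch below uses the Normalizer class from PHP's intl extension. The function name sameFileName is made up for illustration, and without intl the sketch simply falls back to a plain byte comparison.

```php
<?php
// Hypothetical helper: compares two UTF-8 names after normalizing both to NFC.
// Assumes the intl extension is available; otherwise compares raw bytes.
function sameFileName(string $a, string $b): bool
{
    if (!class_exists('Normalizer')) {
        return $a === $b; // intl not loaded: no normalization possible
    }
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);
    if ($na === false || $nb === false) {
        return $a === $b; // input could not be normalized (e.g. invalid UTF-8)
    }
    return $na === $nb;
}

// Precomposed vs. decomposed spelling of the same visible name.
var_dump(sameFileName("l\u{00E4}te", "la\u{0308}te"));
```

In a scandir loop, this comparison would take the place of the === test on each entry.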

Thank you!
olbion
 

Return to PHP

Who is online

Users browsing this forum: No registered users and 4 guests

cron