Check Unicode on a web server with PHP

If your filenames contain characters with diacritics or any other Unicode characters beyond the strict Latin set (26 letters + 10 digits + some symbols and punctuation marks), the web server should be checked and, if needed, fixed.

Checks

First character check: basename() must keep a leading non-ASCII character.

    $filename = 'éfilé.jpg';
    if (basename($filename) !== $filename) {
        echo sprintf('An error occurs when testing function "basename(\'%s\')": %s', $filename, basename($filename));
    } else {
        echo 'Success!';
    }

On a server that is not ready for Unicode, this check prints:

An error occurs when testing function "basename('éfilé.jpg')": filé.jpg

Check of command-line escaping via the web server (comparison with a trivial function).

    // @see http://www.php.net/manual/function.escapeshellarg.php#111919
    function escapeshellarg_unicode($string)
    {
        return "'" . str_replace("'", "'\\''", $string) . "'";
    }
    $filename = "File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'.Test.png";
    if (escapeshellarg($filename) !== escapeshellarg_unicode($filename)) {
        // The message displays the expected result, computed by the trivial function.
        echo sprintf('An error occurs when testing function "escapeshellarg(\'%s\')": %s', $filename, escapeshellarg_unicode($filename));
    } else {
        echo 'Success!';
    }
    
An error occurs when testing function "escapeshellarg('File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'.Test.png')": 'File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'\''.Test.png'
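
Both functions tested above are sensitive to the locale under which PHP runs, so it can help to display that locale before changing anything. The following is only a minimal sketch: the helper looks_like_utf8_locale() is hypothetical, and its test on the name of the locale is a rough heuristic, not an official check.

```php
<?php
// Hypothetical helper: guesses from the locale name alone whether it uses UTF-8.
// This is only a heuristic on names such as "en_US.UTF-8" or "C.utf8".
function looks_like_utf8_locale(string $locale): bool
{
    return preg_match('/utf-?8/i', $locale) === 1;
}

// With "0" as second argument, setlocale() changes nothing and only
// returns the current setting.
$current = setlocale(LC_CTYPE, '0');
if ($current === false) {
    echo 'The current locale cannot be determined.';
} elseif (looks_like_utf8_locale($current)) {
    echo sprintf('The locale "%s" looks UTF-8 compliant.', $current);
} else {
    echo sprintf('The locale "%s" does not look UTF-8 compliant.', $current);
}
```

The script must be called through the web server, not from the command line, because the two environments may use different locales.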

Fix for Apache (here for Debian and derivative distributions)

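If one of the checks above fails, your server is not fully compatible with Unicode, and the following fix (or another one) is required.

Two solutions are possible: either replace the locale "C" with a UTF-8 compliant one in the line "export LANG=C", or uncomment the line ". /etc/default/locale" to use the system default locale. Both require editing the file /etc/apache2/envvars, which contains: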

        ## The locale used by some modules like mod_dav
        export LANG=C
        ## Uncomment the following line to use the system default locale instead:
        #. /etc/default/locale
    

Don't forget to restart the server between two tests.

        sudo systemctl restart apache2
    

In fact, the default locale of Apache is "C" for historical and geographic reasons (USA based), so it should be changed to any UTF-8 compliant locale, for example "en_US.UTF-8", the default locale of Debian. Apache does not apply the system locale by default, which is why the fix above is needed.

Ideally, the default locale of Apache would be the generic "C.UTF-8", but this is not possible, because American users would not understand why they would lose their "en_US.UTF-8".
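
When the file /etc/apache2/envvars cannot be edited, for example on shared hosting, another fix is to set the locale from PHP itself before handling filenames. The sketch below is an illustration under assumptions, not the fix described above: it supposes that at least one of the locales "C.UTF-8" or "en_US.UTF-8" is installed on the server, and the list of candidates is only an example.

```php
<?php
// Candidate locales are an assumption: adapt them to those installed
// on the server (listed by the command "locale -a").
$candidates = ['C.UTF-8', 'en_US.UTF-8'];

// setlocale() tries each candidate in order and returns the first one
// that can be set, or false if none is available.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    echo 'No UTF-8 locale is available: fix the server configuration.';
} else {
    // Run the first check again under the new locale.
    $filename = 'éfilé.jpg';
    echo basename($filename) === $filename
        ? sprintf('Success with locale "%s"!', $applied)
        : sprintf('basename() still fails with locale "%s".', $applied);
}
```

The locale is set for the whole process, not per thread, so on a threaded server it is safer to fix the configuration of Apache instead.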