First attempt: checking with the PHP function mb_detect_encoding
It determines which encoding the given parameter uses and returns the encoding name.
string mb_detect_encoding ( string $str [, mixed $encoding_list= mb_detect_order() [, bool $strict= false ]] )
Example
Typical usage looks roughly like this.
/* Detect character encoding with current detect_order */
echo mb_detect_encoding($str);
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
echo mb_detect_encoding($str, "auto");
/* Specify encoding_list character encoding by comma separated list */
echo mb_detect_encoding($str, "JIS, eucjp-win, sjis-win");
/* Use array to specify encoding_list */
$ary[] = "ASCII";
$ary[] = "JIS";
$ary[] = "EUC-JP";
echo mb_detect_encoding($str, $ary);
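The signature above also has a $strict flag that none of the manual examples use. A minimal sketch of what it changes, assuming the mbstring extension is loaded; the sample byte strings are illustrative and not from the original post:

```php
<?php
// Bytes of "한글" encoded as EUC-KR; they do not form valid UTF-8.
$euckr = "\xC7\xD1\xB1\xDB";

// With $strict = true, a string that is invalid in every listed encoding yields false.
$strictResult = mb_detect_encoding($euckr, 'UTF-8', true);
var_dump($strictResult); // bool(false)

// Plain ASCII is valid UTF-8, so the strict check passes.
$asciiResult = mb_detect_encoding('hello', 'UTF-8', true);
var_dump($asciiResult); // string(5) "UTF-8"
```

Without strict mode the function returns the closest match from the list, which is why a wrong answer can come back instead of false.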
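However, I found that it does not handle Korean reliably. When I typed Korean directly into a URL parameter, input whose first consonant fell in the range ㅈ~ㅎ was detected as UTF-8.
So a separate function has to be written, or some other workaround found.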
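One possible workaround, sketched under the assumption that the parameter is either UTF-8 or EUC-KR: validate against each encoding explicitly with mb_check_encoding instead of asking mb_detect_encoding to guess. The function name detect_korean_encoding is hypothetical and not part of PHP or of the original post.

```php
<?php
// Hypothetical helper (an assumption, not from the original post):
// decide whether a request parameter is UTF-8 or EUC-KR.
function detect_korean_encoding($str) {
    // Check UTF-8 first: EUC-KR byte pairs are rarely also well-formed UTF-8.
    if (mb_check_encoding($str, 'UTF-8')) {
        return 'UTF-8';
    }
    // Assumption: the only legacy encoding we expect from clients is EUC-KR.
    if (mb_check_encoding($str, 'EUC-KR')) {
        return 'EUC-KR';
    }
    return null; // neither encoding fits
}

// "한글" as UTF-8 bytes and as EUC-KR bytes.
echo detect_korean_encoding("\xED\x95\x9C\xEA\xB8\x80"), "\n";
echo detect_korean_encoding("\xC7\xD1\xB1\xDB"), "\n";
```

Checking UTF-8 first matters because pure ASCII is valid in both encodings, and treating it as UTF-8 is harmless.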
1. The detect_encoding function
Another light way to detect character encoding:
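function detect_encoding($string) {
    // Candidate encodings, tried in order; the first one that round-trips wins.
    static $list = array('utf-8', 'windows-1251');
    foreach ($list as $item) {
        // iconv() raises a notice and returns false on invalid input, so suppress it.
        $sample = @iconv($item, $item, $string);
        // A direct strict comparison replaces the md5() round trip.
        if ($sample === $string)
            return $item;
    }
    return null;
}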
2. Function to detect UTF-8, when mb_detect_encoding is not available it may be useful.
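function is_utf8($str) {
    $c=0; $b=0;
    $bits=0;
    $len=strlen($str);
    for($i=0; $i<$len; $i++){
        $c=ord($str[$i]);
        // 0x00-0x7F is ASCII; 0x80 (128) and above must start a multi-byte sequence.
        // Note: 5- and 6-byte forms are still accepted here, although RFC 3629 no longer allows them.
        if($c >= 128){
            if(($c >= 254)) return false;
            elseif($c >= 252) $bits=6;
            elseif($c >= 248) $bits=5;
            elseif($c >= 240) $bits=4;
            elseif($c >= 224) $bits=3;
            elseif($c >= 192) $bits=2;
            else return false; // stray continuation byte (0x80-0xBF)
            if(($i+$bits) > $len) return false; // sequence runs past the end
            while($bits > 1){
                $i++;
                $b=ord($str[$i]);
                if($b < 128 || $b > 191) return false;
                $bits--;
            }
        }
    }
    return true;
}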
3. Convert to UTF-8 if $str is not already 'UTF-8'
/*
 * QQ: 290359552
 * Convert to UTF-8 if $str is not already 'UTF-8'.
 * Note: anything not detected as UTF-8 is converted as if it were GBK.
 */
function convToUtf8($str){
    if( mb_detect_encoding($str,"UTF-8, ISO-8859-1, GBK")!="UTF-8" ){
        return iconv("gbk","utf-8",$str);
    }else{
        return $str;
    }
}
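The function above always converts from GBK, whatever was actually detected. A hedged variant that converts from the detected encoding might look like the sketch below; to_utf8 is a hypothetical name, and the default candidate list (UTF-8, EUC-KR) is an assumption chosen for the Korean input discussed in this post.

```php
<?php
// Hypothetical helper, not the original author's code: convert $str to UTF-8
// from whichever candidate encoding it is valid in.
function to_utf8($str, array $candidates = array('UTF-8', 'EUC-KR')) {
    // Strict mode returns false when the string fits none of the candidates.
    $from = mb_detect_encoding($str, $candidates, true);
    if ($from === false) {
        return null; // unknown encoding: let the caller decide what to do
    }
    if ($from === 'UTF-8') {
        return $str; // already UTF-8, nothing to convert
    }
    return mb_convert_encoding($str, 'UTF-8', $from);
}

// EUC-KR bytes for "한글" are converted to their UTF-8 form.
echo bin2hex(to_utf8("\xC7\xD1\xB1\xDB")), "\n";
```

Returning null for undetectable input keeps the caller from silently storing text converted from the wrong encoding.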
4. from PHPDIG
function isUTF8($str) {
    // Valid UTF-8 survives a round trip through UTF-32 unchanged.
    if ($str === mb_convert_encoding(mb_convert_encoding($str, "UTF-32", "UTF-8"), "UTF-8", "UTF-32")) {
        return true;
    } else {
        return false;
    }
}
5. Much simpler UTF-8-ness checker using a regular expression created by the W3C:
// Returns true if $string is valid UTF-8 and false otherwise.
function is_utf8($string) {
    // From http://w3.org/International/questions/qa-forms-utf-8.html
    // The pattern uses the x flag, so each "#" comment must end at a line break.
    // preg_match() returns 1, 0 or false, hence the comparison with 1.
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string) === 1;
} // function is_utf8
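A shorter variant of the same idea, offered as a sketch and not as part of the W3C note: with the u modifier, PCRE validates the subject string as UTF-8 before matching, and preg_match returns false when that check fails. The name is_utf8_pcre is hypothetical, and the sketch assumes PCRE was built with UTF-8 support, as it is in standard PHP builds.

```php
<?php
// Hypothetical helper name; relies on PCRE validating the subject when /u is set.
function is_utf8_pcre($string) {
    // The empty pattern matches any valid subject, so the result
    // reflects only whether the UTF-8 check passed.
    return preg_match('//u', $string) === 1;
}

var_dump(is_utf8_pcre("\xED\x95\x9C")); // "한" in UTF-8
var_dump(is_utf8_pcre("\xC7\xD1"));     // "한" in EUC-KR
```

Unlike the W3C pattern, this check does not reject ASCII control characters, so the two are not interchangeable for form validation.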
6. Sometimes mb_detect_encoding is not what you need. When using pdflib, for example, you want to VERIFY the correctness of UTF-8. mb_detect_encoding reports some ISO-8859-1 encoded text as UTF-8. To verify UTF-8, use the following:
//
// utf8 encoding validation developed based on Wikipedia entry at:
// http://en.wikipedia.org/wiki/UTF-8
//
// Implemented as a recursive descent parser based on a simple state machine
// copyright 2005 Maarten Meijer
//
// This cries out for a C-implementation to be included in PHP core
//
function valid_1byte($char) {
    if(!is_int($char)) return false;
    return ($char & 0x80) == 0x00;
}
function valid_2byte($char) {
    if(!is_int($char)) return false;
    return ($char & 0xE0) == 0xC0;
}
function valid_3byte($char) {
    if(!is_int($char)) return false;
    return ($char & 0xF0) == 0xE0;
}
function valid_4byte($char) {
    if(!is_int($char)) return false;
    return ($char & 0xF8) == 0xF0;
}
function valid_nextbyte($char) {
    if(!is_int($char)) return false;
    return ($char & 0xC0) == 0x80;
}
function valid_utf8($string) {
    $len = strlen($string);
    $i = 0;
    while( $i < $len ) {
        $char = ord(substr($string, $i++, 1));
        if(valid_1byte($char)) { // continue
            continue;
        } else if(valid_2byte($char)) { // check 1 byte
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
        } else if(valid_3byte($char)) { // check 2 bytes
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
        } else if(valid_4byte($char)) { // check 3 bytes
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
            if(!valid_nextbyte(ord(substr($string, $i++, 1))))
                return false;
        } else {
            // Stray continuation byte or invalid lead byte (0xF8 and above);
            // without this branch such bytes were silently accepted.
            return false;
        }
        // goto next char
    }
    return true; // done
}
For a drawing of the state machine, see: http://www.xs4all.nl/~mjmeijer/unicode.png and http://www.xs4all.nl/~mjmeijer/unicode2.png