php之txt整本上传小说到mysql

php之txt整本上传小说到mysql
php之txt整本上传小说到mysql

最近客户网站需要开发使用txt上传章节

一共二种模式:正则模式、自动识别

1、检测内容编码

    function detect_encoding($file) {
        $list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
        $str = file_get_contents($file);
        foreach ($list as $item) {
            $tmp = mb_convert_encoding($str, $item, $item);
            if (md5($tmp) == md5($str)) {
                return $item;
            }
        }
        return null;
    }

2、非UTF-8编码,转码

    function characet($data){
        if( !empty($data) ){
        $fileType = mb_detect_encoding($data , array('UTF-8','GBK','LATIN1','BIG5'));
            if( $fileType != 'UTF-8'){
                 $data = mb_convert_encoding($data ,'utf-8' , $fileType);
            }
        }
        return $data;
    }

接下来区分正则模式和自动识别,正则模式就是指章节名前面带符号,比如@@、##

3、自动识别章节名

function readList($content) {
    $list = array();
    if (preg_match_all('/\n.{0,30}(第(?:一|二|三|四|五|六|七|八|九|十|百|千|万|两|廿|卅|卌|零|〇|1|2|3|4|5|6|7|8|9|0|[0-9])+(?:章|节|回).{0,80})\n/', $content, $match, PREG_OFFSET_CAPTURE) || preg_match_all('/\n((?:d|第|弟|递|滴|低|地|底|帝|的)*(?:一|二|三|四|五|六|七|八|九|十|百|千|万|两|廿|卅|卌|零|〇|1|2|3|4|5|6|7|8|9|0|[0-9])+(?:部|卷|集|篇|册|章|节|回).{0,80})\n/', $content, $match, PREG_OFFSET_CAPTURE)) {

        if ($match[0][0][1] > 1000) {
            array_unshift($match[0], array('作品相关', -8));
            array_unshift($match[1], array('作品相关', -8));
        }
        $c = 0;
        foreach ($match[1] as $key => $val) {
            $start = $match[0][$key][1] + strlen($match[0][$key][0]);
            $end = isset($match[0][$key + 1]) ? $match[0][$key + 1][1] : strlen($content);
            $chapter = splitword(substr($content, $start, $end - $start), 0, conf("splitword"));
            if (is_array($chapter)) {
                foreach ($chapter as $k => $v) {
                    $list[$c] = array('0' => trim($val[0])." (".($k+1).")", '1' => iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n    ", $v)));
                    $c++;
                }
            } else {
                if (empty($chapter)) continue;
                $list[$c] = array('0' => trim($val[0]), '1' =>iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n    ", $chapter)));
                $c++;
            }
        }
    } else {
        $chapter = splitword($content, 0, conf("splitword"));
        if (!empty($chapter)) {
            $chapter = is_array($chapter) ? $chapter : array($chapter);
            foreach ($chapter as $key => $val) {
                $list[] = array('0' => "第 ".($key+1)." 章节", '1' =>  str_replace("\n", "\r\n\r\n    ", $val));
            }
        }
    }
    return $list;
}

4、正则模式

function ExpChapter($content,$expStr){
    preg_match_all("/".$expStr."([\s\S]*?(?:(?=".$expStr.")|$))/i",$content,$matche);
    $matches[] = $matche[1];
    $c = 0;
    foreach ($matches[0] as $k=>$v){
        $contentArr = explode("\r",$v);
        $title = $contentArr[0];
        $content = str_replace($contentArr[0],'', $v);
        if (empty($content)) continue;
        $list[$c] = array('0' => trim($title), '1' =>iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n    ", $content)));
        $c++;
    }
    return $list;
}

 

 

发表回复

您的电子邮箱地址不会被公开。