php之txt整本上传小说到mysql

最近客户网站需要开发使用txt上传章节
一共二种模式:正则模式、自动识别
1、检测内容编码
function detect_encoding($file) {
$list = array('GBK', 'UTF-8', 'UTF-16LE', 'UTF-16BE', 'ISO-8859-1');
$str = file_get_contents($file);
foreach ($list as $item) {
$tmp = mb_convert_encoding($str, $item, $item);
if (md5($tmp) == md5($str)) {
return $item;
}
}
return null;
}
2、非UTF-8编码,转码
function characet($data){
if( !empty($data) ){
$fileType = mb_detect_encoding($data , array('UTF-8','GBK','LATIN1','BIG5'));
if( $fileType != 'UTF-8'){
$data = mb_convert_encoding($data ,'utf-8' , $fileType);
}
}
return $data;
}
接下来区分正则模式和自动识别,正则模式就是指章节名前面带符号,比如@@、##
3、自动识别章节名
function readList($content) {
$list = array();
if (preg_match_all('/\n.{0,30}(第(?:一|二|三|四|五|六|七|八|九|十|百|千|万|两|廿|卅|卌|零|〇|1|2|3|4|5|6|7|8|9|0|[0-9])+(?:章|节|回).{0,80})\n/', $content, $match, PREG_OFFSET_CAPTURE) || preg_match_all('/\n((?:d|第|弟|递|滴|低|地|底|帝|的)*(?:一|二|三|四|五|六|七|八|九|十|百|千|万|两|廿|卅|卌|零|〇|1|2|3|4|5|6|7|8|9|0|[0-9])+(?:部|卷|集|篇|册|章|节|回).{0,80})\n/', $content, $match, PREG_OFFSET_CAPTURE)) {
if ($match[0][0][1] > 1000) {
array_unshift($match[0], array('作品相关', -8));
array_unshift($match[1], array('作品相关', -8));
}
$c = 0;
foreach ($match[1] as $key => $val) {
$start = $match[0][$key][1] + strlen($match[0][$key][0]);
$end = isset($match[0][$key + 1]) ? $match[0][$key + 1][1] : strlen($content);
$chapter = splitword(substr($content, $start, $end - $start), 0, conf("splitword"));
if (is_array($chapter)) {
foreach ($chapter as $k => $v) {
$list[$c] = array('0' => trim($val[0])." (".($k+1).")", '1' => iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n ", $v)));
$c++;
}
} else {
if (empty($chapter)) continue;
$list[$c] = array('0' => trim($val[0]), '1' =>iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n ", $chapter)));
$c++;
}
}
} else {
$chapter = splitword($content, 0, conf("splitword"));
if (!empty($chapter)) {
$chapter = is_array($chapter) ? $chapter : array($chapter);
foreach ($chapter as $key => $val) {
$list[] = array('0' => "第 ".($key+1)." 章节", '1' => str_replace("\n", "\r\n\r\n ", $val));
}
}
}
return $list;
}
4、正则模式
function ExpChapter($content,$expStr){
preg_match_all("/".$expStr."([\s\S]*?(?:(?=".$expStr.")|$))/i",$content,$matche);
$matches[] = $matche[1];
$c = 0;
foreach ($matches[0] as $k=>$v){
$contentArr = explode("\r",$v);
$title = $contentArr[0];
$content = str_replace($contentArr[0],'', $v);
if (empty($content)) continue;
$list[$c] = array('0' => trim($title), '1' =>iconv('UTF-8', 'UTF-8//IGNORE', str_replace("\n", "\r\n\r\n ", $content)));
$c++;
}
return $list;
}
声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。