A convolutional neural network based Chinese text detection algorithm via text structure modeling

Abstract

Text detection in natural scene environments plays an important role in many computer vision applications. While existing text detection methods focus on English characters, there are strong application demands for text detection in other languages, such as Chinese. As Chinese characters are much more complex than English characters, innovative and more efficient text detection techniques are required for Chinese texts. In this paper, we present a novel text detection algorithm for Chinese characters based on a specifically designed convolutional neural network (CNN). The CNN model contains a text structure component detector layer, a spatial pyramid layer and a multi-input-layer deep belief network (DBN). The CNN is pretrained via a convolutional sparse auto-encoder (CSAE) in an unsupervised way, which is specifically designed for extracting complex features from Chinese characters. In particular, the text structure component detectors enhance the accuracy and uniqueness of feature descriptors by extracting multiple text structure components in various ways. The spatial pyramid layer is then introduced to enhance the scale invariance of the CNN model for detecting texts at multiple scales. Finally, the multi-input-layer DBN is used as the fully connected layers in the CNN model to ensure that features from multiple scales are comparable. A multilingual text detection dataset, in which texts in Chinese, English and digits are labeled separately, is set up to evaluate the proposed text detection algorithm. The proposed algorithm shows a significant 10% performance improvement over the baseline CNN algorithms. In addition, the proposed algorithm is evaluated on a public multilingual image benchmark and achieves state-of-the-art results for text detection across multiple languages. Furthermore, a simplified version of the proposed algorithm with only general components is compared to existing general text detection algorithms on the ICDAR 2011 and 2013 datasets, showing detection performance comparable to the existing algorithms.
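The role of the spatial pyramid layer described in the abstract can be illustrated with a generic spatial pyramid pooling sketch. This is not the authors' layer: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4) and the use of max pooling are all assumptions chosen for illustration. The sketch only shows how feature maps of different sizes can be reduced to a vector of the same length, which is what makes features from multiple scales comparable for the fully connected layers.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map into a fixed-length vector.

    Hypothetical helper for illustration only. For each level n, the map
    is split into an n x n grid of bins and the maximum of each bin is
    kept, so the output length is sum(n * n for n in levels) regardless
    of the input height and width.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Integer floor/ceil bin edges: bins cover the whole map
            # and are never empty, even when the map is smaller than n.
            r0 = (i * height) // n
            r1 = max(r0 + 1, ((i + 1) * height + n - 1) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = max(c0 + 1, ((j + 1) * width + n - 1) // n)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)
                    )
                )
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[r * 6 + c for c in range(6)] for r in range(4)]
large = [[(r * 13 + c) % 7 for c in range(13)] for r in range(9)]
small_vec = spatial_pyramid_pool(small)
large_vec = spatial_pyramid_pool(large)
```

With the assumed levels, both `small_vec` and `large_vec` have 1 + 4 + 16 = 21 entries, and the first entry of each is the global maximum of its feature map.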

Publication DOI: https://doi.org/10.1109/TMM.2016.2625259
Divisions: College of Engineering & Physical Sciences > Adaptive communications networks research group
Additional Information: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Chinese text detection, unsupervised learning, text structure detector, convolutional neural network, Signal Processing, Media Technology, Computer Science Applications, Electrical and Electronic Engineering
Last Modified: 09 Dec 2024 08:16
Date Deposited: 17 Jan 2017 13:20
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2017-03
Published Online Date: 2016-11-03
Accepted Date: 2016-10-24
Submitted Date: 2015-11-27
Authors: Ren, Xiaohang
Zhou, Yi
He, Jianhua (ORCID Profile 0000-0002-5738-8507)
Chen, Kai
Yang, Xiaokang
Sun, Jun

Version: Accepted Version