End to End Text Detection and Recognition of Natural Images using Dictionary

Document Type : Original Article

Authors

1 Islamic Azad University Semnan Branch

2 Young Researchers and Elite Club, Semnan Branch, Islamic Azad University, Semnan, Iran

3 Department of Electronic Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran

Abstract

In recent years, text detection and recognition in natural images have been extensively studied. In this study, a robust multi-oriented scene text localization system based on a convolutional neural network (CNN) is proposed to achieve high efficiency in text detection. The proposed method comprises three layers: feature extraction, feature merging, and output. An improved ReLU layer (i.ReLU) is introduced in the feature extraction layer. An improved inception layer (i.inception) is also provided to detect texts with valuable information. An extra layer is used to improve feature extraction, which enables the proposed structure to detect multi-oriented text, including curved and vertical text. We also propose a pipeline framework for character recognition. This framework consists of two parallel pipelines that are processed at the same time and can recognize 62 characters. The first pipeline handles cropped words and the second handles text angles. We then build a dictionary and use it to correct possible errors in the recognized words. Experiments on the ICDAR 2013, ICDAR 2015 and ICDAR 2019 datasets demonstrate the superiority of the proposed architecture over previous work.
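The dictionary-based correction step can be illustrated with a minimal sketch. The abstract does not state how recognized words are matched against the dictionary, so the code below assumes a nearest-neighbour lookup by Levenshtein (edit) distance. The names `ALPHABET`, `edit_distance`, `correct_word` and the `max_distance` threshold are hypothetical, as is the reading of the 62 characters as 10 digits plus 26 lowercase and 26 uppercase letters.

```python
import string
from typing import Sequence

# Assumed 62-class alphabet: 10 digits + 26 lowercase + 26 uppercase letters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def correct_word(word: str, dictionary: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest dictionary entry, or the word unchanged if none is close.

    The threshold max_distance is an illustrative assumption, not a value
    taken from the paper.
    """
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda entry: edit_distance(word, entry), default=word)
    return best if edit_distance(word, best) <= max_distance else word
```

For instance, `correct_word("EX1T", ["EXIT", "ENTER"])` returns `"EXIT"`, since a single substitution separates the two strings, while a word far from every dictionary entry is returned unchanged.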
