End-to-End Text Detection and Recognition of Natural Images Using a Dictionary

Article type: Research article

Authors

1 Department of Electrical Engineering, Islamic Azad University, Semnan Branch

2 Young Researchers and Elite Club, Semnan Branch, Islamic Azad University, Semnan, Iran

3 Department of Electrical Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran.

Abstract

In recent years, text detection and recognition in natural images have been studied extensively. In this research, a robust multi-oriented scene-text localization system based on a convolutional neural network (CNN) is presented to achieve high efficiency in text detection. The proposed method consists of three layers: feature extraction, feature merging, and output. In the feature-extraction layer, an improved ReLU layer (i.ReLU) is introduced, and an improved inception layer (i.inception) is presented in order to detect texts of various sizes. An additional layer is then used to improve feature extraction, which enables the proposed structure to detect multi-oriented, and even curved and vertical, texts. We have also proposed a pipeline framework for character recognition. The proposed framework consists of two parallel pipelines that are processed simultaneously: the first pipeline consists of the cropped words and the second of the text angles. A dictionary is then used to correct likely errors in the recognized words. Experiments on the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets show the clear superiority of the proposed system over previous works.

Keywords


Article title [English]

End-to-End Text Detection and Recognition of Natural Images Using a Dictionary

Authors [English]

  • Fatemeh Naiemi 1
  • Vahid Ghods 2
  • Hassan Khalesi 3
1 Department of Electrical Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran
2 Young Researchers and Elite Club, Semnan Branch, Islamic Azad University, Semnan, Iran
3 Department of Electronic Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran
Abstract [English]

In recent years, text detection and recognition in natural images have been studied extensively. In this study, a robust multi-oriented scene-text localization system based on a convolutional neural network (CNN) is proposed to achieve high efficiency in text detection. The proposed method includes three layers: feature extraction, feature merging, and output. An improved ReLU layer (i.ReLU) is introduced in the feature-extraction layer, and an improved inception layer (i.inception) is provided to detect texts of various sizes. An extra layer is used to further improve feature extraction, which enables the proposed structure to detect multi-oriented, and even curved and vertical, texts. We have also proposed a pipeline framework for character recognition. The proposed framework consists of two parallel pipelines that are processed simultaneously and can recognize 62 characters: the first pipeline consists of the cropped words and the second of the text angles. A dictionary is then formed and used to correct likely errors in the recognized words. Experiments on the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets demonstrate the clear superiority of the proposed structure over previous works.
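The abstract describes i.inception only as a multi-scale feature-extraction block and does not give its internal layout. As a rough illustration of the general multi-branch idea it refers to (parallel convolutions with different kernel sizes whose outputs are concatenated before feature merging), the following is a minimal PyTorch sketch; the class name MultiScaleBlock, the number of branches, and the kernel sizes are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Hypothetical inception-style block: parallel 1x1, 3x3 and 5x5
    convolutions capture text features at several scales, and their
    responses are concatenated along the channel axis."""

    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 1), nn.ReLU())
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 3, padding=1), nn.ReLU())
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, branch_ch, 5, padding=2), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch preserves the spatial size; concatenation yields 3 * branch_ch channels.
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)

# Example: a 32-channel feature map becomes a 48-channel multi-scale map.
features = torch.randn(1, 32, 64, 64)
merged = MultiScaleBlock(32, 16)(features)   # shape: (1, 48, 64, 64)
```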
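The abstract likewise does not specify how the dictionary corrects recognized words. One plausible, minimal sketch is string-similarity matching against a lexicon using Python's standard difflib; the function name correct_word, the similarity cutoff, and the toy lexicon below are assumptions for illustration only.

```python
from difflib import get_close_matches

def correct_word(word: str, lexicon: list, cutoff: float = 0.75) -> str:
    """Return the closest lexicon entry if one is similar enough to the
    recognized word; otherwise keep the raw prediction unchanged."""
    matches = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

lexicon = ["coffee", "shop", "open", "exit"]      # toy lexicon
print(correct_word("c0ffee", lexicon))            # -> "coffee"
print(correct_word("xyz", lexicon))               # -> "xyz" (no close match, kept as-is)
```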

Keywords [English]

  • Scene text localization
  • Text image detection
  • Multi-oriented
  • Convolutional neural network
  • Text recognition
  • End-to-end recognition
  • Dictionary