Automatic Image Annotation Using Bag of Words

Document Type : Original Article


Faculty of Electrical and Computer Engineering, Semnan University


With the growing use of images in everyday applications, especially on the internet, many researchers have recently become interested in web and image understanding. Automatic image annotation is the task of attaching words, keywords, or comments to an image. The inputs to an image annotation system are features extracted from the image. This paper presents a new algorithm for automatic image annotation based on bag of words (BOW) and the SIFT descriptor. Because SIFT features are high-dimensional, we apply the PCA-SIFT dimensionality reduction technique and the K-Means algorithm to achieve satisfactory efficiency. Experimental results on images from the Corel5k dataset show that the proposed method performs better in terms of precision and time.
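
The pipeline outlined in the abstract (local descriptors, dimensionality reduction, K-Means vocabulary, bag-of-words histogram) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random 128-dimensional vectors in place of real SIFT descriptors, applies plain PCA to those vectors as a stand-in for PCA-SIFT, and uses illustrative values for the reduced dimension (36) and vocabulary size (8). All function names are hypothetical.

```python
import numpy as np

def fit_pca(descriptors, n_components):
    """Learn a PCA projection from stacked descriptors (one per row)."""
    mean = descriptors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(descriptors, mean, components):
    """Map descriptors into the reduced PCA space."""
    return (descriptors - mean) @ components.T

def assign(points, centers):
    """Return the index of the nearest center for each point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Basic Lloyd's K-Means; the centers act as the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, mean, components, centers):
    """Normalized visual-word histogram for one image."""
    words = assign(project(descriptors, mean, components), centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Assumption: synthetic data, 10 "images" with 50 descriptors of 128 dims each.
rng = np.random.default_rng(0)
train_images = [rng.random((50, 128)) for _ in range(10)]
all_desc = np.vstack(train_images)

mean, comps = fit_pca(all_desc, n_components=36)       # 36 is illustrative
centers = kmeans(project(all_desc, mean, comps), k=8)  # 8 visual words
hist = bow_histogram(train_images[0], mean, comps, centers)
```

Each image is reduced to a fixed-length histogram that a classifier or nearest-neighbor model could then map to keywords. The abstract does not say which annotation model is used on top of these histograms, so that step is left out here.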

