Attention-Based Two-Stream Network for Multi-View Action Recognition Using Skeletal Data

Document Type : Original Article

Authors

1 Educational Department of Software Computer Engineering, Technical Faculty, Shahrekord University, Shahrekord, Iran

2 Faculty of Computer Engineering, University of Shahrekord, Shahrekord, Iran

Abstract

With the advancement of technology and the growing use of intelligent machines, human action recognition has become an important topic in machine vision. In recent years, thanks to the compact and informative nature of skeletal data, skeleton-based action recognition methods built on graph convolutional networks (GCNs) have achieved strong performance. Previous methods, however, use one-dimensional local convolutions to model temporal relationships between adjacent frames and therefore neglect correlations between non-adjacent frames. Moreover, human movements involve many variations that exhibit strong dependencies between the movements of different joints. Recognizing an action therefore requires a comprehensive analysis of the correlations between joints in both the spatial and temporal domains. In this paper, we propose the MV AT-AR network, which uses the attention mechanism to learn correlations between joints at different times. The proposed architecture has two input streams and uses the skeletal complement graph to capture different characteristics of the human skeleton, which enables highly accurate action recognition. Evaluation on the NTU RGB+D dataset shows that the proposed network achieves an accuracy of 96.7%.
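To make the two ideas in the abstract concrete, the following is a minimal sketch and not the MV AT-AR implementation. It assumes plain scaled dot-product self-attention with no learned projections, and it assumes that "skeletal complement graph" means the graph connecting exactly those joint pairs that are not linked by a bone. The function names `temporal_self_attention` and `complement_adjacency`, the toy features, and the 4-joint chain skeleton are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention over time for a single joint.

    frames: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights), where weights[t][s] is how strongly
    frame t attends to frame s.
    Assumption: queries, keys and values are the raw features
    (a real model would apply learned linear projections first).
    """
    d = len(frames[0])
    outputs, weights = [], []
    for q in frames:
        # Every frame is compared with every other frame, adjacent or not.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[s] * frames[s][i] for s in range(len(frames))) for i in range(d)]
        )
    return outputs, weights


def complement_adjacency(num_joints, bones):
    """Adjacency matrix of the complement graph (assumed definition):
    1 where two distinct joints are NOT linked by a bone, else 0."""
    linked = {frozenset(b) for b in bones}
    return [
        [0 if i == j or frozenset((i, j)) in linked else 1 for j in range(num_joints)]
        for i in range(num_joints)
    ]


# Toy input: one joint over 4 frames; frames 0 and 3 share the same feature.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = temporal_self_attention(frames)
print("attention from frame 0:", [round(w, 3) for w in weights[0]])

# Hypothetical 4-joint chain skeleton 0-1-2-3.
comp = complement_adjacency(4, [(0, 1), (1, 2), (2, 3)])
print("complement adjacency:", comp)
```

In this toy input, frame 0 attends more strongly to the non-adjacent frame 3 than to its neighbour frame 1, which is the kind of long-range temporal correlation a local one-dimensional convolution with a small kernel cannot capture. The complement adjacency links joints such as 0 and 3 that have no physical connection in the chain.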

Keywords

Main Subjects