In previous studies on group activity recognition, various models have been proposed to relate the appearance and behavior of individuals across space and time, but the video clips they use as input for training and inference are generally short. In this study, we propose using longer videos as input for group activity recognition and discuss the effect of embedding temporal information in these videos.