Human action recognition remains a challenging problem. Several action representation approaches have been proposed to improve recognition performance, and local space-time features have recently become a popular representation for human actions in video sequences. Many space-time detectors and descriptors have been proposed, but they have been evaluated on different datasets under different experimental conditions, which makes direct comparison difficult. In this paper, the performance of the Cuboid detector is evaluated with four space-time description methods: Gradient, HOG, HOF, and HOG-HOF. All descriptors are tested on two datasets (KTH and Weizmann) using the bag-of-words model and a Support Vector Machine classifier.