Human actions contain relevant information regarding the cultural content of video archives for research, educational or entertainment purposes. The aim of this "MEXaction2" dataset is to support the development and evaluation of methods for 'spotting' instances of short actions in a relatively large video database. For each action class, such a method should detect instances of this class in the video database and output the temporal boundaries of these detections, with an associated 'confidence' score. This task can also be seen as 'action retrieval': the 'query' is an action class and the results are instances of the class, ordered by decreasing 'confidence' score.
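For concreteness, here is a minimal sketch (in Python) of what one ranked detection could look like; the `Detection` class and its field names are illustrative assumptions, not a prescribed output format.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    video_id: str   # clip in which the action instance was spotted (hypothetical field)
    action: str     # queried class, e.g. "BullChargeCape" or "HorseRiding"
    start: float    # temporal boundaries of the detection, in seconds
    end: float
    score: float    # 'confidence' score used to rank the results

# The 'action retrieval' view: rank all detections for a class by decreasing score.
results = [
    Detection("clip_001", "HorseRiding", 12.4, 17.9, 0.91),
    Detection("clip_007", "HorseRiding", 3.0, 6.2, 0.55),
]
results.sort(key=lambda d: d.score, reverse=True)
```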
The dataset contains videos from three sources (each detailed below):
- video clips from the INA archive;
- Mexican TV videos downloaded from YouTube channels;
- Horse Riding clips from the UCF101 dataset.
Two action classes are annotated: BullChargeCape and HorseRiding.
In total, 1,975 action instances are annotated: 1,324 for BullChargeCape and 651 for HorseRiding. Their distribution over the training, validation and test sets is given in the following table:
| Action | Training instances | Validation instances | Test instances |
|---|---|---|---|
| BullChargeCape | 917 | 217 | 190 |
| HorseRiding | 419 | 93 | 139 |
Besides the total amount of annotated video being relatively large compared to other existing datasets, this dataset is also interesting because it raises several difficulties:
The INA part of the dataset can be made available via https://dataset.ina.fr/corpus/index.action?request_locale=en. It is intended for finalizing, experimenting with and evaluating search and analysis tools for multimedia content, strictly as part of a scientific research activity.
Mexican TV videos, catalogued by the INAH, INALI, Fonoteca Nacional (Mexico) and others, were downloaded from the YouTube channels of several Mexican organizations and converted to MP4 format. From each video, only the clips containing training examples of the actions of interest were retained. This dataset is intended for scientific research only. The set of clips can be downloaded here.
The Horse Riding clips from the UCF101 dataset are added as training instances for the HorseRiding class. Please comply with the terms of use of the UCF101 dataset.
Action detection and localization is evaluated here as a retrieval problem: the system must produce a list of detections (temporal boundaries) with positive scores. Sorting these results by decreasing score yields precision/recall curves, from which the Average Precision (AP) is computed to characterize the detection performance.
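As an illustration, here is a minimal sketch (in Python) of how AP can be computed from such a ranked list. It is not the official evaluation script: it assumes each detection has already been labeled as a true or false positive by the matching criterion defined next, and it credits one GT instance per detection, whereas under that criterion a single window may detect several GT annotations.

```python
def average_precision(scored_detections, n_ground_truth):
    """scored_detections: iterable of (score, is_true_positive) pairs."""
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            ap += true_positives / rank   # precision at this recall point
    return ap / n_ground_truth if n_ground_truth else 0.0

# Two correct detections ranked first, one false alarm, two GT instances:
print(average_precision([(0.9, True), (0.3, False), (0.7, True)], 2))  # -> 1.0
```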
Definition of positive and negative detections: in the Mex dataset the actions are rare but often occur in clusters in which they are separated by relatively few frames (actions appear successively in a chain). Thus, a single detection window is likely to cover several ground truth (GT) annotations. We adopt an evaluation criterion that takes this into account. GT annotations $B_i$ are marked as detected if at least 50% of each of them lies inside the detection window $A$:

$$\frac{|A \cap B_i|}{|B_i|} > 0.5 \qquad \text{(overlap Detection)}$$

and the detection $A$ is counted as positive if the detected annotations jointly cover at least 20% of its time span:

$$\sum_i \frac{|A \cap B_i|}{|A|} > 0.2 \qquad \text{(overlap GT)}$$
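The following Python sketch implements this matching criterion on (start, end) windows; the function and threshold names are ours, not taken from the official evaluation code.

```python
def overlap(a, b):
    """Length of the temporal intersection |a ∩ b| of two (start, end) windows."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def match(detection, ground_truths, det_thresh=0.5, gt_thresh=0.2):
    """Return the GT annotations detected by the window `detection`.

    A GT annotation B_i counts as detected if |A ∩ B_i| / |B_i| > det_thresh,
    and the detection A is positive only if the detected annotations jointly
    cover enough of the window: sum_i |A ∩ B_i| / |A| > gt_thresh.
    """
    window_length = detection[1] - detection[0]
    detected = [b for b in ground_truths
                if overlap(detection, b) / (b[1] - b[0]) > det_thresh]
    covered = sum(overlap(detection, b) for b in detected)
    return detected if covered / window_length > gt_thresh else []

# One window covering two chained instances, plus one unrelated annotation:
print(match((10.0, 30.0), [(12.0, 15.0), (20.0, 24.0), (50.0, 55.0)]))
# -> [(12.0, 15.0), (20.0, 24.0)]
```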
To support comparisons, we provide below the results obtained by an action localization system based on dense trajectories and motion boundary descriptors [1]:
| Action | Average AP |
|---|---|
| BullChargeCape | 0.5026 |
| HorseRiding | 0.4076 |
Contact: if needed, you can contact us at andrei dot stoian at gmail dot com and michel dot crucianu at cnam dot fr. Please note that we can only provide limited support.
If you refer to this dataset, please call it "MEXaction2" and cite this web page: http://mexculture.cnam.fr/xwiki/bin/view/Datasets/Mex+action+dataset
The INA part of the dataset, called "MEXaction", with a more challenging training/validation/test partitioning, was employed in the following publications:
Stoian, A., Ferecatu, M., Benois-Pineau, J., Crucianu, M. Fast action localization in large scale video archives. IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), DOI 10.1109/TCSVT.2015.2475835.
Stoian, A., Ferecatu, M., Benois-Pineau, J., Crucianu, M. Scalable action localization with kernel-space hashing. IEEE International Conference on Image Processing (ICIP), Québec, Canada, Sept. 27-30, 2015.
[1] H. Wang, A. Kläser, C. Schmid, and C.L. Liu. Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1): 60-79, May 2013.