{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Video Classification with a CNN-RNN Architecture" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Original Author:** Sayak Paul \n", "**Date created:** 2021/05/28 \n", "**Last modified:** 2021/06/05 \n", "**Description:** Training a video classifier with transfer learning and a recurrent model on the UCF101 dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example demonstrates video classification, an important use-case with applications in recommendations, security, and so on. We will be using the UCF101 dataset to build our video classifier. The dataset consists of videos categorized into different actions, like cricket shot, punching, biking, etc. This dataset is commonly used to build action recognizers, which are an application of video classification.\n", "\n", "A video consists of an ordered sequence of frames. Each frame contains spatial information, and the sequence of those frames contains temporal information. To model both of these aspects, we use a hybrid architecture that consists of convolutions (for spatial processing) as well as recurrent layers (for temporal processing). Specifically, we'll use a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) consisting of GRU layers. This kind of hybrid architecture is popularly known as a CNN-RNN." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from tensorflow_docs.vis import embed\n", "from tensorflow import keras\n", "from imutils import paths\n", "\n", "import matplotlib.pyplot as plt\n", "import tensorflow as tf\n", "import pandas as pd\n", "import numpy as np\n", "import imageio\n", "import cv2\n", "import os" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "IMG_SIZE = 224\n", "BATCH_SIZE = 64\n", "EPOCHS = 12\n", "\n", "MAX_SEQ_LENGTH = 20\n", "NUM_FEATURES = 2048" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Collection \n", "\n", "To keep the runtime of this example relatively short, we use a subsampled version of the original UCF101 dataset. You can refer to this notebook to see how the subsampling was done." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "!wget -q https://git.io/JGc31 -O ucf101_top5.tar.gz\n", "!tar xf ucf101_top5.tar.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Preparation\n", "\n", "*P.S. If you want to skip the data preparation step, I have already run it and saved the results to `.npy` files to make training faster.*" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total videos for training: 594\n", "Total videos for testing: 224\n" ] }, { "data": { "text/html": [ "
\n", " | video_name | tag |\n",
"---|---|---|\n",
"495 | v_TennisSwing_g10_c06.avi | TennisSwing |\n",
"160 | v_PlayingCello_g14_c02.avi | PlayingCello |\n",
"455 | v_ShavingBeard_g22_c06.avi | ShavingBeard |\n",
"532 | v_TennisSwing_g16_c01.avi | TennisSwing |\n",
"332 | v_Punch_g22_c02.avi | Punch |\n",
"341 | v_Punch_g23_c04.avi | Punch |\n",
"280 | v_Punch_g14_c02.avi | Punch |\n",
"11 | v_CricketShot_g09_c05.avi | CricketShot |\n",
"486 | v_TennisSwing_g09_c04.avi | TennisSwing |\n",
"445 | v_ShavingBeard_g21_c03.avi | ShavingBeard |\n", "