Deep Learning to Clone Driving Behavior

In this project we will train a deep neural network to drive a car like us!

The goal is to drive a car autonomously in a simulator using a deep neural network trained on human driving behavior. Udacity has provided the simulator and a Python script (drive.py) that connects our DNN to the simulator. The simulator has two modes. In "training mode" on track-1, the car can be controlled through a keyboard to generate training data. Training data consist of images captured by three cameras mounted on the car, along with the corresponding steering angles. A model trained on this data can then predict steering angles while driving on the unseen track-2 in the simulator.
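To make the hand-off concrete, here is a minimal sketch (not Udacity's actual drive.py) of the prediction step such a script performs for every camera frame the simulator sends; predict_steering is an illustrative name, and crop_resize_image is the preprocessing function we define later in this notebook:

import base64
from io import BytesIO

import numpy as np
from PIL import Image

def predict_steering(model, base64_image):
    # decode the base64-encoded camera frame sent by the simulator
    img = Image.open(BytesIO(base64.b64decode(base64_image)))
    img = np.asarray(img, dtype=np.float32)
    # apply the same crop/resize preprocessing used during training
    img = crop_resize_image(img)
    # the model expects a batch dimension: (1, rows, cols, channels)
    return float(model.predict(img[None, :, :, :], batch_size=1))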

Prologue

I'm a student of the November 2016 cohort. My guess is that most of the submissions for this project coming in these days are from the October cohort, not November. Being a student of the later cohort gave me two advantages. First, Udacity has published a helpful guide with some hints and advice here. Second, based on feedback from the earlier cohort, Udacity has provided sample data for track 1 here.

Although I drove the car around on track-1 to get a feel for the problem faced by October students who did not have a joystick or a steering wheel, I knew I'd end up using the sample data provided by Udacity. This alone must have saved me about a week's worth of time. Thank you Dhruv and team! One problem solved for me.

I also gathered from the above wiki guide that a lot of students are struggling because they don't have adequate GPU hardware. Udacity has provided a $50 credit for AWS, which is very generous. Thank you David and team! I too don't have a powerful machine to iterate on my ideas. But having already spent an "obscene amount" on AWS for past experiments, I refused to spend a dime on AWS for this project. This created a new challenge for me.

Student comments on the same wiki indicated that there are two kinds of students in the class. A few students are either geniuses or have prior background in the field. Most students are new to the subject and are working really hard to learn it.

Problem Statement

So I took it upon myself to find a solution that is extremely lightweight. It should not require a powerful GPU machine or an Amazon EC2 instance. It should run easily on my 2012 MBP with 8 GB of memory. Yet it should be fast: each epoch should take no more than 2 minutes. Also, for the design and model architecture, I refused to entertain any idea or concept that was not taught in the class. I do assume, though, that you have at least taken the "Intro to Machine Learning" class. It should be a required prerequisite, IMHO.

Let's begin!

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from tqdm import tqdm_notebook
import time
import shutil
import os
import random
import cv2
import math
import json

import keras
from keras.preprocessing.image import *
from keras.models import Sequential, Model
from keras.layers import Convolution2D, Flatten, MaxPooling2D, Lambda, ELU
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adam
from keras.callbacks import Callback
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l2

from IPython.display import display # Allows the use of display() for DataFrames

# Visualizations will be shown in the notebook.
%matplotlib inline
Using TensorFlow backend.

Loading Dataset

We are using only the sample data for track 1 provided by Udacity. Nothing more. It can be downloaded from here.

The dataset is described by the 'driving_log.csv' file, which has pointers to the camera image files on disk and the corresponding steering angles.

Let us load the dataset and analyze its contents.

In [2]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

columns = ['center', 'left', 'right', 'steering_angle', 'throttle', 'brake', 'speed']
data = pd.read_csv('driving_log.csv', names=columns)

print("Dataset Columns:", columns, "\n")
print("Shape of the dataset:", data.shape, "\n")
print(data.describe(), "\n")

print("Data loaded...")
Dataset Columns: ['center', 'left', 'right', 'steering_angle', 'throttle', 'brake', 'speed'] 

Shape of the dataset: (8036, 7) 

       steering_angle     throttle        brake        speed
count     8036.000000  8036.000000  8036.000000  8036.000000
mean         0.004070     0.869660     0.001970    28.169839
std          0.128840     0.301326     0.036565     6.149327
min         -0.942695     0.000000     0.000000     0.502490
25%          0.000000     0.985533     0.000000    30.183093
50%          0.000000     0.985533     0.000000    30.186400
75%          0.000000     0.985533     0.000000    30.186640
max          1.000000     0.985533     1.000000    30.709360 

Data loaded...

Exploring Dataset

Let us analyze the loaded dataset.

We are particularly interested in the steering angle attribute. Let us plot the histogram of steering angles.

In [3]:
binwidth = 0.025

# histogram before image augmentation
plt.hist(data.steering_angle,
         bins=np.arange(min(data.steering_angle), max(data.steering_angle) + binwidth, binwidth))
plt.title('Number of images per steering angle')
plt.xlabel('Steering Angle')
plt.ylabel('# Frames')
plt.show()

Oh boy! We have a very biased dataset on our hands here. We have tons of samples for steering angle 0.0 compared to all other steering angles combined. This is quite understandable, as steering angle 0.0 corresponds to the car going straight, and during "training mode" on track-1 most of the road is straight with occasional curves and turns. There aren't enough samples in the training dataset for non-zero steering angles. If we train our car with this dataset, it will surely learn to drive straight and never (or only with a struggle) make turns.

To fix this dataset bias problem, Machine Learning concepts teach us to try the following:

  • Get More Data
  • Invent More Data

Get More Data would put us back at square one, requiring us to own a joystick and spend hours collecting and cleaning data. So we will avoid this approach.

Invent More Data means: if your data are vectors of numbers, create randomly modified versions of existing vectors. If your data are images, create randomly modified versions of existing images. If your data are text, you get the idea... This is often called data augmentation or data generation.
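As a toy illustration of inventing data from numeric vectors (the values below are made up), we can perturb existing samples with small random noise:

# a tiny, made-up dataset of feature vectors
original = np.array([[0.1, 0.5, 0.9],
                     [0.3, 0.2, 0.7]])

# "invent" new samples by adding small random perturbations
invented = original + np.random.normal(loc=0.0, scale=0.01, size=original.shape)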

Augmentation refers to the process of generating new training data from a smaller dataset such that the new dataset represents the real-world data one may see in practice. A typical convolutional neural network can have up to a million parameters, and tuning these parameters requires millions of training instances of uncorrelated data. For our car example, this would require driving the car (or generating images) under different weather, lighting, traffic, and road conditions. As you will soon see, the easiest way to achieve this is image augmentation.

Data Partitioning

Let us shuffle and split our dataset into two parts: training data and validation data. Since we need lots of training data, we will set aside 90% of the data for training and the remaining 10% for validation.

We are not setting aside any test data here because the real test of the model is actually driving on the track.

In [4]:
# Get randomized datasets for training and validation

# shuffle data
data = data.reindex(np.random.permutation(data.index))

num_train = int(len(data) * 0.9)

X_train = data.iloc[:num_train]
X_validation = data.iloc[num_train:]

print("X_train has {} elements.".format(len(X_train)))
print("X_valid has {} elements.".format(len(X_validation)))
X_train has 7232 elements.
X_valid has 804 elements.

Configurable Variables

These are all the configurable variables in our program:

In [5]:
# image augmentation variables
CAMERA_OFFSET = 0.25
CHANNEL_SHIFT_RANGE = 0.2
WIDTH_SHIFT_RANGE = 100
HEIGHT_SHIFT_RANGE = 40

# processed image variables
PROCESSED_IMG_COLS = 64
PROCESSED_IMG_ROWS = 64
PROCESSED_IMG_CHANNELS = 3

# model training variables
NB_EPOCH = 8
BATCH_SIZE = 256

  • CAMERA_OFFSET: Our car has three cameras: center, left, and right. We will utilize images from all three cameras to generate additional training data that simulates recovery. We will add a small angle of 0.25 to the left camera's steering angle and subtract 0.25 from the right camera's. The main idea is that the car must steer right to bring the left camera's view back to center, and steer left for the right camera's view (see the sketch after this list).
  • CHANNEL_SHIFT_RANGE: One of the tricks in augmenting an image is to shift its color channels by a small fraction. We will randomly shift input image channels in the range [-0.2, 0.2].
  • WIDTH_SHIFT_RANGE: Shifting the input image horizontally by a small fraction is another image-augmentation technique. As implemented below, images are shifted horizontally by up to half this value in either direction, i.e. in the range [-50, 50] pixels.
  • HEIGHT_SHIFT_RANGE: Similarly, the input image can be shifted vertically, in the range [-20, 20] pixels.
  • PROCESSED_IMG_COLS: The final step in our image-augmentation process is cropping and resizing. We have chosen to train our model on 64x64-pixel images, so the width of the processed image will be 64px.
  • PROCESSED_IMG_ROWS: The height of the processed image will be 64px.
  • PROCESSED_IMG_CHANNELS: We have chosen to work with color images, so the processed image will have 3 channels.
  • NB_EPOCH: Our DNN will be trained for 8 epochs.
  • BATCH_SIZE: To keep the memory footprint of the program low, we will augment and process images in batches of 256.
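A minimal sketch of how the camera offset would be applied when sampling a training example; the function name and the random camera choice here are illustrative, and read_image is defined later in this notebook (the .strip() guards against possible leading spaces in the CSV's left/right paths):

def load_with_camera_offset(row):
    # pick one of the three cameras at random
    camera = np.random.choice(['center', 'left', 'right'])
    steering = row['steering_angle']
    if camera == 'left':
        # the left camera must steer right to get back to center
        steering += CAMERA_OFFSET
    elif camera == 'right':
        # the right camera must steer left to get back to center
        steering -= CAMERA_OFFSET
    img = read_image(row[camera].strip())
    return img, steering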

Image Augmentation Functions

As noted earlier, we will generate new training data from a smaller dataset for under-represented steering angles using a technique called Image Augmentation.

We will use the following augmentation functions in our image-processing pipeline (height/width shifting and cropping/resizing are each combined into a single function):

  • horizontal_flip
  • channel_shift
  • height_width_shift
  • brightness_shift
  • crop_resize_image

In [6]:
# flip images horizontally
def horizontal_flip(img, steering_angle):
    flipped_image = cv2.flip(img, 1)
    steering_angle = -1 * steering_angle
    return flipped_image, steering_angle

This function flips the input image horizontally using OpenCV APIs. In doing so, it reverses the sign of the steering angle to reflect the transformation.

In [7]:
# shift range for each channels
def channel_shift(img, channel_shift_range=CHANNEL_SHIFT_RANGE):
    img_channel_index = 2 # tf indexing
    channel_shifted_image = random_channel_shift(img, channel_shift_range, img_channel_index)
    return channel_shifted_image

One of the tricks in augmenting an image is to shift its color channels by a small fraction. This function randomly shifts the input image's channels in the range [-0.2, 0.2] using the Keras image-processing API (random_channel_shift). The steering angle does not change under this transformation.

In [8]:
# shift height/width of the image by a small fraction
def height_width_shift(img, steering_angle):
    rows, cols, channels = img.shape
    
    # Translation
    tx = WIDTH_SHIFT_RANGE * np.random.uniform() - WIDTH_SHIFT_RANGE / 2
    ty = HEIGHT_SHIFT_RANGE * np.random.uniform() - HEIGHT_SHIFT_RANGE / 2
    steering_angle = steering_angle + tx / WIDTH_SHIFT_RANGE * 2 * .2
    
    transform_matrix = np.float32([[1, 0, tx],
                                   [0, 1, ty]])
    
    translated_image = cv2.warpAffine(img, transform_matrix, (cols, rows))
    return translated_image, steering_angle

Shifting the input image horizontally and vertically by a small fraction is another image-augmentation technique. This function randomly shifts images both vertically and horizontally in the given ranges using OpenCV APIs. A new steering angle is also calculated to match the transformation: the adjustment is tx / WIDTH_SHIFT_RANGE * 2 * 0.2, so a maximal horizontal shift of tx = +50 px (half of WIDTH_SHIFT_RANGE) adds 0.2 to the steering angle, while a vertical shift leaves it unchanged.

In [9]:
def brightness_shift(img, bright_value=None):
    img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    
    if bright_value:
        img[:,:,2] += bright_value
    else:
        random_bright = .25 + np.random.uniform()
        img[:,:,2] = img[:,:,2] * random_bright
    
    # keep the V channel within the valid [0, 255] range
    img[:,:,2] = np.clip(img[:,:,2], 0, 255)
    img = cv2.cvtColor(img, cv2.COLOR_HSV2RGB)
    return img

This function randomly changes the brightness (or darkness) of the input image by scaling its V channel in HSV space using OpenCV APIs. The steering angle does not change under this transformation.

In [10]:
# crop the top 1/5 of the image to remove the horizon and the bottom 25 pixels to remove the car’s hood
def crop_resize_image(img):
    shape = img.shape
    img = img[math.floor(shape[0]/5):shape[0]-25, 0:shape[1]]
    img = cv2.resize(img, (PROCESSED_IMG_COLS, PROCESSED_IMG_ROWS), interpolation=cv2.INTER_AREA)    
    return img

This function crops the top 1/5 of the image to remove the horizon and the bottom 25 pixels to remove the car's hood, then resizes the result to 64x64 pixels. The steering angle does not change under this transformation.

In [11]:
def apply_random_transformation(img, steering_angle):
    
    transformed_image, steering_angle = height_width_shift(img, steering_angle)
    transformed_image = brightness_shift(transformed_image)
    # transformed_image = channel_shift(transformed_image) # increasing train time. not much benefit. commented
    
    if np.random.random() < 0.5:
        transformed_image, steering_angle = horizontal_flip(transformed_image, steering_angle)
            
    transformed_image = crop_resize_image(transformed_image)
    
    return transformed_image, steering_angle

This is the wrapper function that applies the previous transformations to augment an image (channel_shift is commented out because it increased training time for little benefit). This function is used directly in the program.

Showcase: Image Augmentation

Let us now see the image transformation functions in action. Here we take an image and apply each transformation to it. The output of each transformation is plotted below.

At the bottom of each transformed image, we have noted the name of the transformation function and corresponding steering angle change (if any).

In [12]:
def read_image(fn):
    img = load_img(fn)
    img = img_to_array(img) 
    return img

test_fn = "IMG/center_2016_12_01_13_32_43_457.jpg"
steering_angle = 0.0617599

test_image = read_image(test_fn)

plt.subplots(figsize=(5, 18))

# original image
plt.subplot(611)
plt.xlabel("Original Test Image, Steering angle: " + str(steering_angle))
plt.imshow(array_to_img(test_image))

# horizontal flip augmentation
flipped_image, new_steering_angle = horizontal_flip(test_image, steering_angle)
plt.subplot(612)
plt.xlabel("Horizontally Flipped, New steering angle: " + str(new_steering_angle))
plt.imshow(array_to_img(flipped_image))

# channel shift augmentation
channel_shifted_image = channel_shift(test_image, 255)
plt.subplot(613)
plt.xlabel("Random Channel Shifted, Steering angle: " + str(steering_angle))
plt.imshow(array_to_img(channel_shifted_image))

# width shift augmentation
width_shifted_image, new_steering_angle = height_width_shift(test_image, steering_angle)
new_steering_angle = "{:.7f}".format(new_steering_angle)
plt.subplot(614)
plt.xlabel("Random HT and WD Shifted, New steering angle: " + str(new_steering_angle))
plt.imshow(array_to_img(width_shifted_image))

# brightened image
brightened_image = brightness_shift(test_image, 255)
plt.subplot(615)
plt.xlabel("Brightened, Steering angle: " + str(steering_angle))
plt.imshow(array_to_img(brightened_image))

# crop augmentation
cropped_image = crop_resize_image(test_image)
plt.subplot(616)
plt.xlabel("Cropped and Resized, Steering angle: " + str(steering_angle))
_ = plt.imshow(array_to_img(cropped_image))
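
The showcase above exercises each transformation individually; the combined apply_random_transformation pipeline can be sanity-checked the same way. A minimal sketch, reusing test_image and steering_angle from the cell above (the printed values will vary from run to run):

# apply the full random pipeline a few times to the same frame;
# each call should yield a different 64x64 image and an adjusted angle
for i in range(3):
    aug_image, aug_angle = apply_random_transformation(test_image.copy(), steering_angle)
    print(aug_image.shape, "{:.4f}".format(aug_angle))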