Object Detection

Basic TensorFlow SSD / RCNN Webcam Object Detection.

In the following video I’ll show you how you can easily use a pre-trained model to detect objects in your webcam video.


1) Clone the GermanEngineering/ObjectDetection Git repository

   git clone https://github.com/GermanEngineering/ObjectDetection.git

2) Install dependencies

  • If you can use a GPU
       pip install -r requirementsGpu.txt
  • If you need to run on CPU
       pip install -r requirements.txt

3) Run program by executing


Depending on your specific use case, you can pick another pre-trained model with a different speed and accuracy from the following selection:
The accuracy of a model is usually described by its mAP (mean Average Precision) value on a scale from 0 to 100, where higher numbers denote higher accuracy.
The speed of a model is usually given in ms for a computation on a specific system setup. You will therefore probably not get the same times on your machine, but you can use them as a relative metric to compare models.
To use a different model, simply assign the name of the model as a string to the “MODEL_NAME” variable.
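
As a rough sketch, switching models could look like the following. Only MODEL_NAME is named in this post; the example model name, the download URL and the other variable names are assumptions modeled on the TensorFlow object detection tutorial, so check the repository script for the names it actually uses.

```python
# Hypothetical sketch: only MODEL_NAME comes from the text above.
# The model name below is an example entry from the detection model zoo (assumed).
MODEL_NAME = "ssd_mobilenet_v1_coco_2017_11_17"

# Assumed download location and archive layout of model zoo files.
DOWNLOAD_BASE = "http://download.tensorflow.org/models/object_detection/"
MODEL_FILE = MODEL_NAME + ".tar.gz"
PATH_TO_FROZEN_GRAPH = MODEL_NAME + "/frozen_inference_graph.pb"

# Show which archive would be fetched for the selected model.
print(DOWNLOAD_BASE + MODEL_FILE)
```

Changing the string is the only edit needed to try another model, as long as the rest of the script derives its paths from MODEL_NAME in this way.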

If you are looking for significantly faster detection, you should take a look at YOLO detectors.
In general, you have the following options if you want to apply object detection:

1. Faster RCNN

Region Convolutional Neural Network
– Generates region proposals of where the objects in the image are probably located by grouping pixels with similar color, intensity, … (Faster RCNN replaces this hand-crafted grouping step with a learned region proposal network.)
– Region proposals are provided as input to the Convolutional Neural Network, which outputs a feature map denoting the positions of specific characteristics in the image.
– The last layer(s) of the network are used for classification by mapping the detected features to a class.
–> Comparatively slow, but with high accuracy.

2. SSD

Single Shot Multi Box Detector
– Similar to RCNNs, but object localization and classification are done in one forward pass of the network.
–> Comparatively higher speed than RCNNs while maintaining good accuracy.


3. YOLO

You Only Look Once
(currently not available in the TensorFlow detection model zoo)
– The image is split into a grid and multiple bounding boxes are created within each cell.
– The network outputs the probability values for each bounding box.
– All bounding boxes with a class probability above a certain threshold are used to classify and locate the objects in the image.
–> Significantly faster, but with lower accuracy, especially for small objects.
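
To make the grid and threshold steps a bit more concrete, here is a minimal toy sketch in plain Python. It is not the YOLO implementation: the grid size, the threshold, the function names and the prediction values are all made up for illustration.

```python
# Toy illustration with made-up numbers; not the actual YOLO network.
GRID_SIZE = 3     # assumed 3x3 grid to keep the example readable
THRESHOLD = 0.5   # assumed class probability threshold

def cell_for_point(x, y, width, height, grid_size=GRID_SIZE):
    """Return (row, col) of the grid cell that contains pixel (x, y)."""
    col = min(int(x * grid_size / width), grid_size - 1)
    row = min(int(y * grid_size / height), grid_size - 1)
    return row, col

def filter_boxes(boxes, threshold=THRESHOLD):
    """Keep only the boxes whose class probability exceeds the threshold."""
    return [b for b in boxes if b["score"] > threshold]

# Hypothetical network output: one dict per predicted bounding box,
# with the box given as (x_min, y_min, x_max, y_max) in pixels.
predictions = [
    {"label": "person", "score": 0.91, "box": (40, 60, 200, 300)},
    {"label": "dog", "score": 0.32, "box": (220, 180, 310, 290)},
    {"label": "cup", "score": 0.67, "box": (350, 400, 390, 450)},
]

kept = filter_boxes(predictions)
print([b["label"] for b in kept])          # labels that pass the threshold
print(cell_for_point(120, 180, 480, 480))  # grid cell of a box center
```

Raising the threshold removes more low-confidence boxes at the cost of missing some objects, which is one reason small objects are easy to lose.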

Main sources:

Tensorflow on GitHub
Harrison Kinsley

