Basic Tensorflow SSD / RCNN Webcam Object Detection.
In the following video I’ll show you how you can easily use a pre-trained model to detect objects in your webcam video.
- Python 3.6.8
- Optionally (if you want to use the NVIDIA GPU of your computer):
Latest GPU driver
1) Clone GermanEngineering/ObjectDetection Git repository
git clone https://github.com/GermanEngineering/ObjectDetection.git
2) Install dependencies
- If you can use a GPU
pip install -r requirementsGpu.txt
- If you need to run on CPU
pip install -r requirements.txt
3) Run program by executing
Depending on your specific usecase, you can pick another pre trained model with different speed and accuracy from the following selection:
The accuracy of the model is ususally described by the mAP (mean Average Prescision) value on a range from 0 to 100 where higher numbers denote a higher accuracy of the model.
The speed of the model is usually given in ms for a computation with a specific system setup. Hence, you will probably not be able to get the same times, but can use it as a relative metric.
To use a different model, simply assign the name of the model as a string to the “MODEL_NAME” variable.
If you are looking for significantly faster detection speed you should take a look into YOLO detectors.
In general you are having the following options if you want to apply object detection:
1. Faster RCNN
Region Convolutional Neural Network
– Generates region proposals of where the objects in the image are probably located by grouping pixels with similar color, intensity, …
– Region proposals are provided as input to the Convolutional Neural Network which outputs a feature map denoting the positions of specific characteristics in the image.
– The last layer(s) of the network are used for classification by mapping the detected features to a class.
–> Comparably slow speed with high accuracies.
Single Shot Multi Box Detector
– Similar to RCNNs, but object localization and classification are done in one forward pass of the network.
–> Comparably higher speeds than RCNNs while maintaining good accuracies.
You Only Look Once
(currently not available in the Tensorflow detection model zoo)
– Image is split into grid and multiple bounding boxes are created within each cell.
– Network outputs the probability values for each bounding box.
– All bounding boxes having a class probability above a certain threashold are used to classify and locate the object in the image.
–> Significantly faster but lower accuracies especially for small objects.
Tensorflow on GitHub