Real-time object detection

This script detects which objects the camera is watching, for example a person, a chair, a dog, etc.

Enter the Python virtual environment:

# workon cv

Now we need to run the script that detects and tracks the objects in the video stream. For that we execute:

# python realDetectionl.py --prototxt MobileNetSSD_deploy.prototxt.txt --model MobileNetSSD_deploy.caffemodel

As we can see, we need to give two parameters: the Caffe 'deploy' prototxt file and the pre-trained Caffe model.

Explaining the code.

First we import the packages that we are going to use:

# import the necessary packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

Initialize the command-line arguments that we are going to use:

  • Caffe 'deploy' prototxt

  • Caffe pre-trained model

  • Minimum confidence threshold (optional, defaults to 0.2)

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
        help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
        help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
        help="minimum probability to filter weak detections")
args = vars(ap.parse_args())
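As a quick illustration of how these arguments are consumed, the same parser can be exercised with an explicit argument list instead of the real command line (the file names below are just the ones from the command shown earlier):

```python
import argparse

# Same parser as in the script.
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
        help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
        help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
        help="minimum probability to filter weak detections")

# Parse an explicit list instead of the real command line:
args = vars(ap.parse_args([
        "--prototxt", "MobileNetSSD_deploy.prototxt.txt",
        "--model", "MobileNetSSD_deploy.caffemodel"]))

# args["confidence"] falls back to the 0.2 default when -c is omitted.
```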

Define the list of object classes that our MobileNet SSD was trained to detect, and generate a random color for the rectangle drawn on the frame when each class of object is detected.

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
        "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
        "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
        "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
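Later in the script, the class index reported by the network is used to look up both the label and its color. A small sketch using the same CLASSES and COLORS definitions:

```python
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
        "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
        "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
        "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# The network reports a class index; the lists translate it to a
# human-readable label and a per-class BGR color.
idx = CLASSES.index("person")
label, color = CLASSES[idx], COLORS[idx]
```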

Load the serialized model and initialize the video stream.

print("Loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

print("Starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)
fps = FPS().start()

Inside the main loop, grab a frame, resize it to a width of 400 pixels, and then convert it to a blob.

*This width is chosen to keep processing fast.

        frame = vs.read()
        frame = imutils.resize(frame, width=400)

        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                0.007843, (300, 300), 127.5)
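The scale factor 0.007843 is 1/127.5 and the mean is 127.5, so the blob values end up roughly in [-1, 1]. Ignoring channel-swapping options, cv2.dnn.blobFromImage is approximately the following NumPy computation (a sketch with a dummy frame standing in for the camera image):

```python
import numpy as np

# Dummy 300x300 BGR frame standing in for the resized camera frame.
frame = np.full((300, 300, 3), 255, dtype=np.uint8)

scale, mean = 0.007843, 127.5
blob = (frame.astype(np.float32) - mean) * scale  # normalize to ~[-1, 1]
blob = blob.transpose(2, 0, 1)[np.newaxis, ...]   # HWC -> NCHW batch of 1
```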

Give the blob to the network so it can start predicting:

net.setInput(blob)
detections = net.forward()
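For this SSD model, forward() returns an array of shape (1, 1, N, 7), where each of the N rows holds [batch_id, class_id, confidence, startX, startY, endX, endY] with coordinates normalized to [0, 1]. A sketch of the indexing with a single hand-made detection:

```python
import numpy as np

# One fake detection: class 15 ("person") with 92% confidence.
detections = np.array(
        [[[[0., 15., 0.92, 0.25, 0.25, 0.75, 0.75]]]], dtype=np.float32)

for i in np.arange(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
```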

Loop over the detections and extract the confidence (i.e., probability) associated with each prediction:

for i in np.arange(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]

We keep only the detections whose confidence is greater than the minimum confidence threshold.

Then we get the index of the class label from the CLASSES list we defined, so we can start painting the boxes around the objects:

if confidence > args["confidence"]:
       idx = int(detections[0, 0, i, 1])
       box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
       (startX, startY, endX, endY) = box.astype("int")

       label = "{}: {:.2f}%".format(CLASSES[idx],confidence * 100)
       cv2.rectangle(frame, (startX, startY), (endX, endY),COLORS[idx], 2)
       y = startY - 15 if startY - 15 > 15 else startY + 15
       cv2.putText(frame, label, (startX, y),cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
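The bounding-box arithmetic above can be checked in isolation. With a hypothetical 400x300 frame and a detection whose normalized coordinates are (0.25, 0.25, 0.75, 0.75), the scaled pixel box comes out as follows:

```python
import numpy as np

(h, w) = (300, 400)  # frame height and width in pixels
det = np.array([0., 15., 0.92, 0.25, 0.25, 0.75, 0.75], dtype=np.float32)

box = det[3:7] * np.array([w, h, w, h])
(startX, startY, endX, endY) = box.astype("int")

# Label text is placed just above the box, or below the top edge of the
# box when the box is too close to the top of the frame:
y = startY - 15 if startY - 15 > 15 else startY + 15
```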

Finally, we show the frame to the user:

        cv2.imshow("Frame", frame)
