Python real-time image classification problems with Neural Networks

Question

I'm attempting to use caffe and python to do real-time image classification. I'm using OpenCV to stream from my webcam in one process and, in a separate process, using caffe to perform image classification on the frames pulled from the webcam. The classification result is then passed back to the main process to caption the webcam stream.

The problem is that even though I have an NVIDIA GPU and am performing the caffe predictions on the GPU, the main process gets slowed down. Normally, without any predictions, my webcam stream runs at 30 fps; with the predictions, it runs at best at 15 fps.

I've verified that caffe is indeed using the GPU when performing the predictions, and that neither the GPU nor its memory is maxing out. I've also verified that my CPU cores are not maxed out at any point during the program. I'm wondering if I'm doing something wrong or if there is no way to keep these two processes truly separate. Any advice is appreciated. Here is my code for reference:

import cv2
import caffe
import multiprocessing
import Queue

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted 
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            self.result_queue.put(text)

        return

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
frame_copy = frame.copy()
task_empty = True
while rval:
    if task_empty:
        tasks.put(frame_copy)
        task_empty = False
    if not results.empty():
        text = results.get()
        #Add text to frame
        cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy = frame.copy()
    #Getting keyboard input 
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

I'm pretty sure it is the caffe prediction slowing everything down, because when I comment out the prediction and pass dummy text back and forth between the processes, I get 30 fps again:

import cv2
import caffe
import multiprocessing
import Queue

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        #other initialization stuff

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted
        while True:
            image = self.task_queue.get()
            #crop image -- code omitted
            #text = net.predict(image)
            text = "dummy text"
            self.result_queue.put(text)

        return

tasks = multiprocessing.Queue()
results = multiprocessing.Queue()
consumer = Consumer(tasks,results)
consumer.start()

#Creating window and starting video capturer from camera
cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)
#Try to get the first frame
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False
frame_copy = frame.copy()
task_empty = True
while rval:
    if task_empty:
        tasks.put(frame_copy)
        task_empty = False
    if not results.empty():
        text = results.get()
        #Add text to frame
        cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        task_empty = True

    #Showing the frame with all the applied modifications
    cv2.imshow("preview", frame)

    #Getting next frame from camera
    rval, frame = vc.read()
    frame_copy = frame.copy()
    #Getting keyboard input 
    key = cv2.waitKey(1)
    #exit on ESC
    if key == 27:
        break

Tags: python | opencv | multiprocessing | gpgpu | pycaffe (asked 2016-09-16 03:09)

2 Answers

  1. 2016-09-16 09:09

    Update:

    Given that the CPU-to-GPU data transfer inside the subprocess contributes much of the overhead, that there seems to be no way to avoid it (would a CNN with a smaller input size help?), and that you want the main process to keep operating at full capacity while allowing some lag between the prediction and the webcam stream, another solution I can think of is to give the main process a higher priority and the subprocess a lower one using psutil, for example (on Windows):

    import cv2
    import caffe
    import multiprocessing
    import Queue 
    import psutil 
    
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    consumer = Consumer(tasks,results)
    consumer.start()
    
    #set processes' priority
    current_process = psutil.Process()
    current_process.nice(value=psutil.ABOVE_NORMAL_PRIORITY_CLASS)
    sub_process_list = current_process.children()
    sub_process_list[0].nice(value=psutil.NORMAL_PRIORITY_CLASS)
    #remaining code as before
    ...
    

    The valid values for the value argument are listed in the psutil documentation. On UNIX, the value usually goes from -20 to 20; the higher the value, the lower the priority of the process.
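
    For reference, here is a minimal cross-platform sketch of the same idea (the UNIX niceness value of -5 is just an illustrative choice, and negative values usually require root):

    import psutil

    current_process = psutil.Process()
    if psutil.WINDOWS:
        #Windows uses named priority classes
        current_process.nice(psutil.ABOVE_NORMAL_PRIORITY_CLASS)
    else:
        #On UNIX, a lower niceness means a higher priority
        current_process.nice(-5)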

    Admittedly, this is more of a workaround than a technical solution.


    Since your CPU cores and GPU are not getting maxed out, and there is no blocking between your two processes, and inspired by this post, I suspect that the bottleneck slowing down the main process is I/O, which specifically comes from:

    1. The frequent frame reads from the camera at rval, frame = vc.read() in the main process;
    2. The frequent CPU-to-GPU data transfers inside text = net.predict(image) in the subprocess (these mainly happen when feeding the input data into the CNN).

    You can verify this bottleneck by disabling the heavy I/O at the 2nd place above. That does not mean simply commenting out text = net.predict(image); instead, temporarily replace net.predict() with net.forward(blobs=None, start=1, end=LAYERS_NUM - 2), which skips feeding the image into the CNN. This avoids the CPU-to-GPU data transfer while keeping almost the same GPU workload, so you can see whether the main process gets close to 30 fps again; a sketch follows below.
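
    A minimal sketch of that verification inside Consumer.run(), assuming LAYERS_NUM holds the number of layers in your net (the exact start and end values are placeholders that depend on your model):

    def run(self):
        caffe.set_mode_gpu()
        caffe.set_device(0)
        #Load caffe net -- code omitted
        while True:
            image = self.task_queue.get()
            #Run only the inner layers: nothing is fed into the input
            #blob, so no CPU-to-GPU transfer happens, but the GPU still
            #does almost the same amount of work.
            net.forward(blobs=None, start=1, end=LAYERS_NUM - 2)
            self.result_queue.put("dummy text")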

    If it is, then you can consider using a CNN model with a smaller input size to reduce the CPU-to-GPU data transfer.
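
    As a hedged sketch, the input blob can be shrunk in pycaffe like this (the blob name 'data' and the dimensions are assumptions about your model, which must tolerate the smaller input):

    #Reshape the input blob, e.g. from 224x224 down to 112x112
    net.blobs['data'].reshape(1, 3, 112, 112)
    net.reshape()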

    Hope this helps.

  2. 2016-09-21 08:09

    One thing that might be happening in your code is that it works in GPU mode for the first call, while later calls run the classification in CPU mode, which is the default. In older versions of caffe, setting GPU mode once was enough; in newer versions, the mode needs to be set every time. You can try the following change:

    def run(self):
        #Load caffe net -- code omitted
        while True:
            #Set GPU mode on every iteration, in case caffe
            #falls back to the default CPU mode between calls
            caffe.set_mode_gpu()
            caffe.set_device(0)
            image = self.task_queue.get()
            #crop image -- code omitted
            text = net.predict(image)
            self.result_queue.put(text)

        return
    

    Also, have a look at the GPU timings while the consumer process is running. You can use the following command for NVIDIA GPUs:

    nvidia-smi
    

    The command above shows the GPU utilization at runtime.

    If that does not solve it, another option is to move the OpenCV frame-grabbing code into its own thread. Since it is tied to I/O and device access, it may benefit from running on a thread separate from the GUI/main thread. That thread would push frames into a queue, and the current consumer would predict from them. In that case, handle the queue carefully with proper synchronization; a sketch follows below.
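
    A minimal sketch of such a grabber thread (names like frame_queue and grab_frames are illustrative, not from the original code):

    import threading
    import Queue
    import cv2

    #Bounded queue so the grabber never piles up stale frames
    frame_queue = Queue.Queue(maxsize=1)

    def grab_frames(vc):
        #Runs in a background thread: read frames and hand them over
        while True:
            rval, frame = vc.read()
            if not rval:
                break
            try:
                #Keep only the newest frame; drop it if the consumer is busy
                frame_queue.put(frame, block=False)
            except Queue.Full:
                pass

    vc = cv2.VideoCapture(0)
    grabber = threading.Thread(target=grab_frames, args=(vc,))
    grabber.daemon = True
    grabber.start()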
