Distributed Video Processing Using Deep Learning on Networked Devices

The widespread adoption of camera-equipped mobile devices has fueled the creation and distribution of videos. Videos are a rich source of information and can be exploited for on-demand information retrieval. Deep learning with Convolutional Neural Networks (CNNs) is the state-of-the-art computer vision technique for this kind of retrieval. However, CNN-based video processing is computationally expensive, so processing all videos at a centralized entity is infeasible or too costly for the large video collections common in the big data era. The aim of this research is therefore to provide distributed video processing platforms for on-demand information retrieval using deep learning on big video data, with a focus on designing and implementing systems for different application scenarios.

NetVision

First, we consider on-demand information retrieval from videos stored across a wireless network, where network devices may have different computation capabilities; for example, computers with a much more powerful GPU can significantly accelerate video processing using deep learning. An example application scenario is a wireless surveillance system, where camera devices and more computationally capable computers are deployed to record, store, and process videos.

We have designed and built NetVision, a distributed video processing system using deep learning in a wireless network. NetVision takes information queries, filters stored videos based on metadata, performs CNN-based video processing, and balances the computing workload among network devices through computation offloading (i.e., videos can be uploaded to more computationally capable devices). We designed an algorithm that determines a near-optimal video allocation to minimize the overall video processing time, considering the limitations of wireless channels. We have implemented and evaluated NetVision on a small testbed and built an emulation environment of both real and virtual devices for large-scale experiments.
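To make the allocation problem concrete, the following is a minimal sketch of a greedy heuristic, not NetVision's actual algorithm. It assumes each device has a known processing speed, that offloaded videos are sent one at a time over a single shared wireless channel, and that the goal is to minimize the time at which the last video finishes. The names `Device`, `Video`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    speed: float  # seconds of footage processed per wall-clock second (assumed known)


@dataclass
class Video:
    name: str
    source: str      # name of the device that stores the video
    duration: float  # seconds of footage
    size_mb: float   # megabytes to transmit if the video is offloaded


def allocate(videos, devices, bandwidth_mbps):
    """Greedily assign each video to the device that would finish it earliest.

    Transfers share one wireless channel, so they are serialized.
    Returns (assignment dict, estimated overall completion time in seconds).
    """
    busy_until = {d.name: 0.0 for d in devices}
    channel_free = 0.0
    assignment = {}
    # Longest videos first, a common heuristic for minimizing the makespan.
    for v in sorted(videos, key=lambda v: v.duration, reverse=True):
        best = None
        for d in devices:
            if d.name == v.source:
                # Processing in place needs no transfer.
                arrival, new_channel_free = 0.0, channel_free
            else:
                transfer = v.size_mb * 8 / bandwidth_mbps
                arrival = channel_free + transfer
                new_channel_free = arrival
            finish = max(busy_until[d.name], arrival) + v.duration / d.speed
            if best is None or finish < best[0]:
                best = (finish, d.name, new_channel_free)
        finish, chosen, channel_free = best
        busy_until[chosen] = finish
        assignment[v.name] = chosen
    return assignment, max(busy_until.values())
```

With a fast link, this sketch moves videos from a slow camera to a GPU machine; with a very slow link, the transfer time dominates and the videos stay where they are stored. That is the trade-off the wireless channel imposes.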

CrowdVision

Next, we consider an application scenario involving large-scale video collections. For example, municipal agencies may ask the public to help identify terrorists or criminals by scanning their videos for footage that may have captured suspects, as the FBI did after the Boston Marathon bombing. We propose a smarter and more automated way to handle this using deep learning. Rather than having a centralized entity process everything, we also let individuals process some of the data, which we call crowdprocessing.

We designed and built CrowdVision, a system that enables smartphones to crowdprocess videos using the deep learning framework Caffe in a distributed and energy-efficient way, leveraging cloud offloading under different network connections. CrowdVision is built as a computing platform to quickly and efficiently retrieve information from big video data. It breaks down the workflow of video processing using Caffe, characterizes the computing of CNNs, considers the data and energy usage constraints imposed by users, and determines whether, and at which step, to offload computation so as to optimize performance. We have implemented CrowdVision as an Android app and evaluated it on an off-the-shelf smartphone to confirm the performance gains for video crowdprocessing.
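The offloading decision can be illustrated with a small sketch. This is an assumption-laden simplification, not CrowdVision's actual decision procedure: the pipeline is modeled as a sequence of steps with pre-measured costs, the result download is ignored, and the best cut point is found by enumeration. The names `Step` and `choose_offload_point`, and all cost parameters, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    local_s: float   # time to run this step on the phone, seconds
    cloud_s: float   # time to run this step in the cloud, seconds
    local_j: float   # energy to run this step on the phone, joules
    input_mb: float  # data to upload if offloading starts at this step


def choose_offload_point(steps, uplink_mbps, tx_j_per_mb, data_cap_mb, energy_cap_j):
    """Pick the fastest cut point that respects the user's data and energy caps.

    Returns (k, total_seconds): steps[:k] run locally, steps[k:] run in the cloud.
    k == len(steps) means fully local. Returns None if no option fits the caps.
    """
    best = None
    for k in range(len(steps) + 1):
        local_t = sum(s.local_s for s in steps[:k])
        energy = sum(s.local_j for s in steps[:k])
        if k < len(steps):
            # Offloading from step k onward: upload that step's input once.
            upload_mb = steps[k].input_mb
            total = local_t + upload_mb * 8 / uplink_mbps + sum(s.cloud_s for s in steps[k:])
            energy += upload_mb * tx_j_per_mb
        else:
            upload_mb = 0.0
            total = local_t
        if upload_mb > data_cap_mb or energy > energy_cap_j:
            continue  # violates a user-imposed constraint
        if best is None or total < best[1]:
            best = (k, total)
    return best
```

For instance, with an illustrative two-step pipeline (frame extraction, then CNN inference), extracting frames on the phone and uploading only the smaller intermediate data can beat both uploading the raw video and running everything locally. Tightening the data cap can force fully local processing instead.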


Publications

In Proceedings of IEEE International Conference on Network Protocols (ICNP), November 8-11, 2016.
(Acceptance Rate: 20% = 46/229)
In Proceedings of ACM International Conference on Multimedia (MM), October 23-27, 2017.
(Acceptance Rate: 28% = 191/675)
In Proceedings of IEEE International Conference on Computer Communications (INFOCOM), April 15-19, 2018.
(Acceptance Rate: 19% = 309/1606)
IEEE Transactions on Mobile Computing, vol. 18, no. 7, pp. 1513-1526, 2019.
IEEE/ACM Transactions on Networking, vol. 28, no. 1, pp. 196-209, 2020.
IEEE Transactions on Mobile Computing, vol. 20, no. 2, pp. 352-365, 2021.
IEEE Transactions on Mobile Computing, vol. 20, no. 4, pp. 1574-1589, 2021.