A Xi’an Jiaotong-Liverpool University student’s research could improve technology used in the film and television industry, among other applications.

Mingjie Sun, a PhD student in XJTLU’s School of Advanced Technology, has designed a method to “train” a computer program to quickly and accurately track an object in a video and separate it from the background.

You may have watched a video where viewers’ real-time comments fly across the screen like bullets. The comments are supposed to appear behind any people or objects in the video. However, sometimes the comments accidentally appear over those images. Sun’s research attempts to solve this type of problem.

Comments appear to fly across the screen behind the actors

Sun, pictured below, and other researchers created an algorithm to train a computer program to quickly and accurately track an object in a video, a task known as video object tracking (VOT), and to separate it from the background, a task known as video object segmentation (VOS).


To improve the running speed and accuracy of VOT and VOS, Sun’s team is focussing on how “target templates” can be used in a new way to teach a computer program to identify an object.

“When a computer is tasked with recognising and tracking a target object in each frame of the video, it needs a reference object to compare to what it is trying to track. This reference object is called the target template,” Sun said.

“For example, it can be a photo of the target object taken at other places.”

Because the object may change location or appearance as the frames of a video progress, the target template must also be updated for the computer to continue to accurately recognise the object, Sun explained.

“Traditionally, the target template is updated in a rough way, without taking the correctness or quality of the predicted result into consideration,” Sun said.

“Therefore, the target template can be replaced by an incorrect result, causing the tracker to drift to the wrong target.

“After training, the computer program can learn and upgrade actively. Just like a ‘smart switch’, it can independently decide whether to update the target template based on whether the newly predicted result is of sufficiently high quality.”
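The difference between always updating the template and updating it selectively can be illustrated with a minimal Python sketch. It is not the team's implementation: the `Prediction` structure, the quality scores and the fixed `threshold` are hypothetical stand-ins, and in Sun's method the decision is learned rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One frame's tracking result (hypothetical structure for illustration)."""
    patch: str      # stand-in for the cropped image of the predicted object
    quality: float  # assumed quality score between 0 and 1


def naive_update(template: str, pred: Prediction) -> str:
    """Always replace the template with the newest prediction."""
    return pred.patch


def gated_update(template: str, pred: Prediction, threshold: float = 0.7) -> str:
    """Replace the template only if the prediction's quality is high enough.

    The fixed threshold is an assumption standing in for a learned decision.
    """
    return pred.patch if pred.quality >= threshold else template


def run(update, initial_template: str, predictions) -> str:
    """Apply an update rule frame by frame and return the final template."""
    template = initial_template
    for pred in predictions:
        template = update(template, pred)
    return template


# Made-up example: the third prediction is wrong (it landed on the background).
predictions = [
    Prediction("actor_frame_1", 0.9),
    Prediction("actor_frame_2", 0.85),
    Prediction("background_frame_3", 0.2),
]

print(run(naive_update, "actor_frame_0", predictions))  # background_frame_3
print(run(gated_update, "actor_frame_0", predictions))  # actor_frame_2
```

In this toy run, the always-update rule ends up with a background patch as its template, which is the drift Sun describes, while the gated rule keeps the last good view of the actor.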

To train the computer program to decide whether to update the target template, Sun uses a technique called “reinforcement learning,” a type of artificial intelligence that learns from its experience of trial and error, success and failure.
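As a rough illustration of learning from trial and error, the Python sketch below uses a tabular, bandit-style value update, one of the simplest forms of reinforcement learning. The states, actions and reward numbers are invented for this example and are not taken from Sun's paper, which may use a very different formulation.

```python
from collections import defaultdict

# Hypothetical logged trials: (state, action, reward).
# Rewards are made-up numbers: a positive reward means tracking went well afterwards.
experiences = [
    ("high_quality", "update", 1.0),
    ("high_quality", "keep", 0.2),
    ("low_quality", "update", -1.0),
    ("low_quality", "keep", 0.5),
] * 5


def learn_values(experiences, learning_rate=0.5):
    """Nudge each (state, action) value estimate toward the rewards observed."""
    values = defaultdict(float)
    for state, action, reward in experiences:
        key = (state, action)
        values[key] += learning_rate * (reward - values[key])
    return values


def best_action(values, state, actions=("update", "keep")):
    """Pick the action with the highest learned value in the given state."""
    return max(actions, key=lambda action: values[(state, action)])


values = learn_values(experiences)
print(best_action(values, "high_quality"))  # update
print(best_action(values, "low_quality"))   # keep
```

After seeing the outcomes of its earlier choices, the learner prefers to update the template when the prediction is of high quality and to keep the old one otherwise.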

Tests of the team’s method show that it improves both running speed and accuracy, tracking and segmenting video objects more efficiently than methods from other research groups.

Dr Jimin Xiao, Sun’s advisor, noted that reinforcement learning is a cutting-edge technology in AI: AlphaGo, the AI program that defeated a world champion Go player, is a well-known application.

Sun’s research paper was recently selected for publication at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Professor Eng Gee Lim, dean of the School of Advanced Technology and director of the AI Research Centre at XJTLU, commended Sun’s achievements under Dr Xiao.

“Publishing in a top-level computer vision conference as a first-year PhD student demonstrates the high quality of research conducted in the University,” he said.

By Huatian Jin, edited by Tamara Kaup

For more information, please contact University Marketing and Communications at news@xjtlu.edu.cn.