Survey of visual object tracking algorithms based on deep learning (2024)

Agravante D J, De Magistris G, Munawar A, Vinayavekhin P and Tachibana R. 2018. Deep learning with predictive control for human motion tracking[EB/OL].2018-08-07[2019-07-01]. https://arxiv.org/pdf/1808.02200.pdf

Al-Shedivat M, Bansal T, Burda Y, Sutskever I, Mordatch I and Abbeel P. 2018. Continuous adaptation via meta-learning in nonstationary and competitive environments[EB/OL]. 2018-02-23[2019-07-01]. https://arxiv.org/pdf/1710.03641.pdf

Arjovsky M, Chintala S and Bottou L. 2017. Wasserstein GAN[DB/OL].[2019-07-02].https://arxiv.org/pdf/1701.07875.pdf

Avidan S. 2007. Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2): 261-271 [DOI:10.1109/TPAMI.2007.35]

Bertinetto L, Henriques J F, Valmadre J, Torr P and Vedaldi A. 2016a. Learning feed-forward one-shot learners//Proceedings of International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 523-531

Bertinetto L, Valmadre J, Henriques J F, Vedaldi A and Torr P H S. 2016b. Fully-convolutional siamese networks for object tracking//Proceedings of the European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 850-865[DOI:10.1007/978-3-319-48881-3_56]

Bhat G, Johnander J, Danelljan M, Khan F S, Felsberg M. 2018. Unveiling the power of deep tracking//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 493-509 [DOI:10.1007/978-3-030-01216-8_30]

Bolme D, Beveridge J R, Draper B A and Lui Y M. 2010. Visual object tracking using adaptive correlation filters//Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA: IEEE, 2544-2550[DOI:10.1109/cvpr.2010.5539960]

Bonin-Font F, Ortiz A, Oliver G. 2008. Visual navigation for mobile robots:a survey. Journal of Intelligent and Robotic Systems, 53(3): 263-296 [DOI:10.1007/s10846-008-9235-4]

Bosch A, Zisserman A and Munoz X. 2007. Image classification using random forests and ferns//Proceedings of 2007 IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil: IEEE, 1-8[DOI:10.1109/ICCV.2007.4409066]

Chen B, Wang D, Li P X, Wang S and Lu H. 2018. Real-time 'actor-critic' tracking//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 328-345[DOI:10.1007/978-3-030-01234-2_20]

Chi Z Z, Li H Y, Lu H C, Yang M-H. 2017. Dual deep network for visual tracking. IEEE Transactions on Image Processing, 26(4): 2005-2015 [DOI:10.1109/TIP.2017.2669880]

Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation//Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha, Qatar: EMNLP: 1724-1734

Choi J, Chang H J, Fischer T, Yun S, Lee K, Jeong J, Demiris Y and Choi J Y. 2018. Context-aware deep feature compression for high-speed visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 479-488[DOI:10.1109/CVPR.2018.00057]

Choi J, Kwon J and Lee K M. 2017. Deep meta learning for real-time visual tracking based on target-specific feature space[DB/OL] [2019-07-02].https://arxiv.org/pdf/1712.09153.pdf

Collins R T, Lipton A J and Kanade T. 2000. A system for video surveillance and monitoring[R]. VSAM Final Report, Pittsburgh: Carnegie Mellon University, 329-337

Collins R T, Liu Y X. 2003. On-line selection of discriminative tracking features//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE: 346-352 [DOI:10.1109/iccv.2003.1238365]

Collins R, Zhou X H and Teh S K. 2005. An open source tracking testbed and evaluation website//Proceedings of 2005 IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Breckenridge, Colorado: IEEE, #35

Comaniciu D, Meer P. 2002. Mean shift:a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5): 603-619 [DOI:10.1109/34.1000236]

Cruz-Mota J, Bogdanova I, Paquier B, Bierlaire M, Thiran J P. 2012. Scale invariant feature transform on the sphere:theory and applications. International Journal of Computer Vision, 98(2): 217-241 [DOI:10.1007/s11263-011-0505-4]

Danelljan M, Bhat G, Khan F S, Felsberg M. 2017. Eco:efficient convolution operators for tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE: 6931-6939 [DOI:10.1109/CVPR.2017.733]

Danelljan M, Häger G, Khan F S, Felsberg M. 2014. Accurate scale estimation for robust visual tracking//Proceedings of the British Machine Vision Conference. Nottingham, UK: BMVA Press [DOI:10.5244/C.28.65]

Danelljan M, Häger G, Khan F S and Felsberg M. 2015a. Learning spatially regularized correlation filters for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 4310-4318[DOI:10.1109/ICCV.2015.490]

Danelljan M, Häger G, Shahbaz Khan F and Felsberg M. 2015b. Convolutional features for correlation filter based visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE, 621-629[DOI:10.1109/ICCVW.2015.84]

Danelljan M, Robinson A, Khan F S and Felsberg M. 2016. Beyond correlation filters: learning continuous convolution operators for visual tracking//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 472-488[DOI:10.1007/978-3-319-46454-1_29]

Dong X P, Shen J B, Wang W G, Liu Y, Shao L and Porikli F. 2018. Hyperparameter optimization for tracking with continuous deep Q-learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 518-527[DOI:10.1109/CVPR.2018.00061]

Fan H and Ling H B. 2017. SANet: structure-aware network for visual tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA: IEEE, 2217-2224[DOI:10.1109/CVPRW.2017.275]

Fisher R B. 2004. The PETS04 surveillance ground-truth data sets//Proceedings of 2004 IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Prague, Czech Republic: IEEE, 1-5

Galoogahi H K, Fagg A and Lucey S. 2017. Learning background-aware correlation filters for visual tracking//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 1144-1152[DOI:10.1109/ICCV.2017.129]

Grabner H, Leistner C and Bischof H. 2008. Semi-supervised on-line boosting for robust tracking//Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer, 234-247[DOI:10.1007/978-3-540-88682-2_19]

Guan H, Xue X Y, An Z Y. 2016. Advances on application of deep learning for video object tracking. Acta Automatica Sinica, 42(6): 834-847 (管皓, 薛向阳, 安志勇. 2016. 深度学习在视频目标跟踪中的应用进展与展望. 自动化学报, 42(6): 834-847) [DOI:10.16383/j.aas.2016.c150705]

Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V and Courville A C. 2017. Improved training of Wasserstein GANs//Proceedings of Advances in Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 5767-5777

Gundogdu E, Alatan A A. 2018. Good features to correlate for visual tracking. IEEE Transactions on Image Processing, 27(5): 2526-2540 [DOI:10.1109/TIP.2018.2806280]

Guo Q, Feng W, Zhou C, Huang R, Wan L and Wang S. 2017. Learning dynamic Siamese network for visual object tracking//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 1781-1789[DOI:10.1109/ICCV.2017.196]

Han B, Sim J and Adam H. 2017. BranchOut: regularization for online ensemble tracking with convolutional neural networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 521-530[DOI:10.1109/CVPR.2017.63]

Hare S, Golodetz S, Saffari A, Vineet V, Cheng M M, Hicks S L, Torr P H S. 2016. Struck:structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(10): 2096-2109 [DOI:10.1109/TPAMI.2015.2509974]

Haritaoglu I, Harwood D, Davis L S. 2000. W⁴:real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8): 809-830 [DOI:10.1109/34.868683]

He A F, Luo C, Tian X M, Zeng W. 2018. A twofold siamese network for real-time object tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 4834-4843 [DOI:10.1109/CVPR.2018.00508]

He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 770-778[DOI:10.1109/CVPR.2016.90]

Held D, Thrun S and Savarese S. 2016. Learning to track at 100 FPS with deep regression networks//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 749-765[DOI:10.1007/978-3-319-46448-0_45]

Henriques J F, Caseiro R, Martins P and Batista J. 2012. Exploiting the circulant structure of tracking-by-detection with kernels//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer, 702-715[DOI:10.1007/978-3-642-33765-9_50]

Henriques J F, Caseiro R, Martins P, Batista J. 2015. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3): 583-596 [DOI:10.1109/tpami.2014.2345390]

Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Horgan D, Quan J, Sendonaris A and Dulac-Arnold G. 2018. Deep Q-learning from demonstrations//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, LA, USA: AAAI

Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation, 9(8): 1735-1780 [DOI:10.1162/neco.1997.9.8.1735]

Horn B K P, Schunck B G. 1981. Determining optical flow. Artificial Intelligence, 17(1-3): 185-203 [DOI:10.1016/0004-3702(81)90024-2]

Hu W M, Xie D, Fu Z Y, Zeng W, Maybank S. 2007. Semantic-based surveillance video retrieval. IEEE Transactions on Image Processing, 16(4): 1168-1181 [DOI:10.1109/TIP.2006.891352]

Huang C, Lucey S and Ramanan D. 2017a. Learning policies for adaptive tracking with deep feature cascades//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 105-114[DOI:10.1109/ICCV.2017.21]

Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017b. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2261-2269[DOI:10.1109/CVPR.2017.243]

Huang K Q, Chen X T, Kang Y F, Tan T N. 2015. Intelligent visual surveillance:a review. Chinese Journal of Computers, 38(6): 1093-1118 (黄凯奇, 陈晓棠, 康运锋, 谭铁牛. 2015. 智能视频监控技术综述. 计算机学报, 38(6): 1093-1118) [DOI:10.11897/SP.J.1016.2015.01093]

Isard M, Blake A. 1998. CONDENSATION-conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1): 5-28 [DOI:10.1023/A:1008078328650]

Jepson A D, Fleet D J, El-Maraghi T F. 2003. Robust online appearance models for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10): 1296-1311 [DOI:10.1109/TPAMI.2003.1233903]

Kalal Z, Mikolajczyk K, Matas J. 2012. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1409-1422 [DOI:10.1109/TPAMI.2011.239]

Kingma D P and Welling M. 2013. Auto-encoding variational bayes[DB/OL] [2019-07-02].https://arxiv.org/pdf/1312.6114.pdf

Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G and Pflugfelder R. 2015a. The visual object tracking VOT2015 challenge results//Proceedings of 2015 IEEE International Conference on Computer Vision Workshop. Santiago, Chile: IEEE, 564-586[DOI:10.1109/ICCVW.2015.79]

Kristan M, Matas J, Leonardis A, Vojíř T, Pflugfelder R, Fernández G, Nebehay G, Porikli F, Čehovin L. 2016. A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11): 2137-2155 [DOI:10.1109/TPAMI.2016.2516982]

Kristan M, Pflugfelder R, Leonardis A, Matas J, Čehovin L, Nebehay G, Vojíř T, Fernández G, Lukežič A, Dimitriev A, Petrosino A, Saffari A, Li B, Han B, Heng C, Garcia C, Pangeršič D, Häger G, Khan F S, Oven F, Possegger H, Bischof H, Nam H, Zhu J, Li J, Choi J Y, Choi J-W, Henriques J F, van de Weijer J, Batista J, Lebeda K, Öfjäll K, Yi K M, Qin L, Wen L, Maresca M E, Danelljan M, Felsberg M, Cheng M-M, Torr P, Huang Q, Bowden R, Hare S, Lim S Y, Hong S, Liao S, Hadfield S, Li S Z, Duffner S, Golodetz S, Mauthner T, Vineet V, Lin W, Li Y, Qi Y, Lei Z and Niu Z H. 2015b. The visual object tracking VOT2014 challenge results//Proceedings of 2014 European Conference on Computer Vision. Zurich, Switzerland: Springer, 191-217[DOI:10.1007/978-3-319-16181-5_14]

Kristan M, Pflugfelder R, Leonardis A, Matas J, Porikli F, Cehovin L, Nebehay G, Fernandez G, Vojir T, Gatt A, Khajenezhad A, Salahledin A, Soltani-Farani A, Zarezade A, Petrosino A, Milton A, Bozorgtabar B, Li B, Chan C S, Heng C, Ward D, Kearney D, Monekosso D, Karaimer H C, Rabiee H R, Zhu J, Gao J, Xiao J, Zhang J, Xing J, Huang K, Lebeda K, Cao L, Maresca M E, Lim M K, Helw M E, Felsberg M, Remagnino P, Bowden R, Goecke R, Stolkin R, Lim S Y, Maher S, Poullot S, Wong S, Satoh S, Chen W, Hu W, Zhang X, Li Y and Niu Z. 2013. The visual object tracking VOT2013 challenge results//Proceedings of 2013 IEEE International Conference on Computer Vision Workshops. Sydney, NSW, Australia: IEEE, 98-111[DOI:10.1109/ICCVW.2013.20]

Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc, 1097-1105

Kumarawadu S, Watanabe K, Kiguchi K and Izumi K. 2002. Adaptive output tracking of partly known robotic systems using softmax function networks//Proceedings of 2002 International Joint Conference on Neural Networks. Honolulu, HI, USA: IEEE, 483-488[DOI:10.1109/IJCNN.2002.1005520]

Li A N, Lin M, Wu Y, Yang M, Yan S. 2016. NUS-PRO:a new visual tracking challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 335-349 [DOI:10.1109/TPAMI.2015.2417577]

Li B, Yan J J, Wu W, Zhu Z and Hu X. 2018a. High performance visual tracking with siamese region proposal network//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 8971-8980[DOI:10.1109/CVPR.2018.00935]

Li F, Tian C, Zuo W M, Zhang L and Yang M H. 2018b. Learning spatial-temporal regularized correlation filters for visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4904-4913[DOI:10.1109/CVPR.2018.00515]

Li P X, Wang D, Wang L J, Lu H. 2018c. Deep visual tracking:Review and experimental comparison. Pattern Recognition, 76: 323-338 [DOI:10.1016/j.patcog.2017.11.007]

Li B, Yan J J, Wu W, Zhu Z and Hu X L. 2018d. High performance visual tracking with Siamese region proposal network//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT: IEEE, 8971-8980[DOI:10.1109/cvpr.2018.00935]

Li T H S, Chang S J. 2005. Autonomous fuzzy parking control of a car-like mobile robot. IEEE Transactions on Systems, Man, and Cybernetics-Part A:Systems and Humans, 33(4): 451-465 [DOI:10.1109/TSMCA.2003.811766]

Li Y and Zhu J K. 2015. A scale adaptive kernel correlation filter tracker with feature integration//Proceedings of 2014 European Conference on Computer Vision. Zurich, Switzerland: Springer, 254-265[DOI:10.1007/978-3-319-16181-5_18]

Liang P P, Blasch E, Ling H B. 2015. Encoding color information for visual tracking:algorithms and benchmark. IEEE Transactions on Image Processing, 24(12): 5630-5644 [DOI:10.1109/TIP.2015.2482905]

Lin Y M, Shen J, Cheng S Y and Pantic M. Mobile face tracking: a survey and benchmark[DB/OL].[2019-07-02].i

Lowe D G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110 [DOI:10.1023/B:VISI.0000029664.99615.94]

Lu H C, Fang G L, Wang C, Chen Y W. 2010. A novel method for gaze tracking by local pattern model and support vector regressor. Signal Processing, 90(4): 1290-1299 [DOI:10.1016/j.sigpro.2009.10.014]

Lu H C, Li P X, Wang D. 2018. Visual object tracking:a survey. Pattern Recognition and Artificial Intelligence, 31(1): 61-76 (卢湖川, 李佩霞, 王栋. 2018. 目标跟踪算法综述. 模式识别与人工智能, 31(1): 61-76) [DOI:10.16451/j.cnki.issn1003-6059.201801006]

Lu X K, Ma C, Ni B B, Yang X, Reid I and Yang M-H. 2018. Deep regression tracking with shrinkage loss//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 369-386[DOI:10.1007/978-3-030-01264-9_22]

Lukežič A, Vojíř T, Zajc L C, Matas J and Kristan M. 2017. Discriminative correlation filter with channel and spatial reliability//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 4847-4856[DOI:10.1109/CVPR.2017.515]

Ma C, Huang J B, Yang X K and Yang M H. 2015. Hierarchical convolutional features for visual tracking//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 3074-3082[DOI:10.1109/ICCV.2015.352]

Matas J, Chum O, Urban M, Pajdla T. 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10): 761-767 [DOI:10.1016/j.imavis.2004.02.006]

Meshgi K, Oba S and Ishii S. 2017. Efficient diverse ensemble for discriminative co-tracking[DB/OL].[2019-07-02]. https://arxiv.org/pdf/1711.06564.pdf

Mita T, Kaneko T, Hori O. 2005. Joint haar-like features for face detection//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing: IEEE: 1619-1626 [DOI:10.1109/ICCV.2005.129]

Mueller M, Smith N and Ghanem B. 2016. A benchmark and simulator for UAV tracking//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 445-461[DOI:10.1007/978-3-319-46448-0_27]

Müller M, Bibi A, Giancola S, Al-Subaihi S and Ghanem B. 2018. TrackingNet: a large-scale dataset and benchmark for object tracking in the wild//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 310-327[DOI:10.1007/978-3-030-01246-5_19]

Mei X, Ling H. 2011. Robust visual tracking and vehicle classification via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11): 2259-2272 [DOI:10.1109/TPAMI.2011.66]

Nam H, Baek M and Han B. 2016. Modeling and propagating CNNs in a tree structure for visual tracking[DB/OL].[2019-07-02].https://arxiv.org/pdf/1608.07242.pdf

Nam H and Han B. 2016. Learning multi-domain convolutional neural networks for visual tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 4293-4302[DOI:10.1109/CVPR.2016.465]

Osogami T and Otsuka M. 2015. Seven neurons memorizing sequences of alphabetical images via spike-timing dependent plasticity. Scientific Reports, 5: #14149[DOI:10.1038/srep14149]

Park E and Berg A C. 2018. Meta-tracker: fast and robust online adaptation for visual object trackers//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 587-604[DOI:10.1007/978-3-030-01219-9_35]

Qi Y K, Zhang S P, Qin L, Yao H, Huang Q, Lim J and Yang M H. 2016. Hedged deep tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 4303-4311[DOI:10.1109/CVPR.2016.466]

Radford A, Metz L and Chintala S. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks[DB/OL].[2019-07-02].https://arxiv.org/pdf/1511.06434.pdf

Ren L L, Yuan X, Lu J W, Yang M and Zhou J. 2018. Deep reinforcement learning with iterative shift for visual tracking//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 697-713[DOI:10.1007/978-3-030-01240-3_42]

Ross D A, Lim J, Lin R S, Yang M H. 2008. Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1-3): 125-141 [DOI:10.1007/s11263-007-0075-7]

Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 815-823[DOI:10.1109/CVPR.2015.7298682]

Shi J B and Tomasi C. 1994. Good features to track//Proceedings of 1994 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 593-600[DOI:10.1109/CVPR.1994.323794]

Shi X J, Chen Z R, Wang H, Yeung D Y, Wong W K and Woo W-C. 2015. Convolutional LSTM network: a machine learning approach for precipitation nowcasting//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 802-810

Shu C F, Hampapur A, Lu M, Brown L, Connell J, Senior A and Tian Y. 2005. IBM smart surveillance system (S3): an open and extensible framework for event based surveillance//Proceedings of 2005 IEEE Conference on Advanced Video and Signal Based Surveillance. Como, Italy: IEEE, 318-323[DOI:10.1109/AVSS.2005.1577288]

Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of International Conference on Learning Representations. San Diego, CA: ICLR

Smeulders A W M, Chu D M, Cucchiara R, Calderara S, Dehghan A, Shah M. 2014. Visual tracking:an experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7): 1442-1468 [DOI:10.1109/TPAMI.2013.230]

Song Y B, Ma C, Gong L J, Zhang J, Lau R W and Yang M H. 2017. CREST: convolutional residual learning for visual tracking//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2574-2583[DOI:10.1109/ICCV.2017.279]

Song Y B, Ma C, Wu X H, Gong L, Bao L, Zuo W, Shen C, Lau R and Yang M H. 2018. VITAL: visual tracking via adversarial learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 8990-8999[DOI:10.1109/CVPR.2018.00937]

Sun C, Wang D, Lu H C and Yang M-H. 2018. Correlation tracking via joint discrimination and reliability learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 489-497[DOI:10.1109/CVPR.2018.00058]

Sun Y, Wang X G and Tang X O. 2015. Deeply learned face representations are sparse, selective, and robust//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2892-2900[DOI:10.1109/CVPR.2015.7298907]

Supančič III J and Ramanan D. 2017. Tracking as online decision-making: Learning a policy from streaming videos with reinforcement learning//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 322-331[DOI:10.1109/ICCV.2017.43]

Suykens J A K, Vandewalle J. 1999. Least squares support vector machine classifiers. Neural Processing Letters, 9(3): 293-300 [DOI:10.1023/A:1018628609742]

Svetnik V, Liaw A, Tong C, Culberson J C, Sheridan R P, Feuston B P. 2003. Random forest:a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences, 43(6): 1947-1958 [DOI:10.1021/ci034160g]

Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 1-9[DOI:10.1109/CVPR.2015.7298594]

Tao R, Gavves E and Smeulders A W M. 2016. Siamese instance search for tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 1420-1429[DOI:10.1109/CVPR.2016.158]

Valmadre J, Bertinetto L, Henriques J F, Tao R, Vedaldi A, Smeulders A, Torr P and Gavves E. 2018. Long-term tracking in the wild: a benchmark//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 692-707[DOI:10.1007/978-3-030-01219-9_41]

Valmadre J, Bertinetto L, Henriques J, Vedaldi A and Torr P H. 2017. End-to-end representation learning for correlation filter based tracking//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 5000-5008[DOI:10.1109/CVPR.2017.531]

Van De Weijer J, Schmid C, Verbeek J, Larlus D. 2009. Learning color names for real-world applications. IEEE Transactions on Image Processing, 18(7): 1512-1523 [DOI:10.1109/TIP.2009.2019809]

Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P A. 2010. Stacked denoising autoencoders:Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11: 3371-3408

Viola P and Jones M. 2001. Fast and robust classification using asymmetric adaboost and a detector cascade//Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Vancouver, British Columbia, Canada: MIT Press, 1311-1318

Wang L J, Ouyang W L, Wang X G and Lu H. 2016. STCT: sequentially training convolutional networks for visual tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 1373-1381[DOI:10.1109/CVPR.2016.153]

Wang N Y and Yeung D Y. 2013. Learning a deep compact image representation for visual tracking//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM, 809-817

Wang N Y, Shi J P, Yeung D Y and Jia J. 2015. Understanding and diagnosing visual tracking systems//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 3101-3109[DOI:10.1109/ICCV.2015.355]

Wang N, Zhou W G, Tian Q, Hong R, Wang M and Li H. 2018a. Multi-cue correlation filters for robust visual tracking//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4844-4853[DOI:10.1109/CVPR.2018.00509]

Wang Q, Zhang M D, Xing J L, Gao J, Hu W and Maybank S. 2018b. Do not lose the details: reinforced representation learning for high performance visual tracking//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: IJCAI, 985-997[DOI:10.24963/ijcai.2018/137]

Wang X, Li C L, Luo B and Tang J. 2018c. SINT++: robust visual tracking via adversarial positive instance generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 4864-4873[DOI:10.1109/CVPR.2018.00511]

Wu Y, Lim J, Yang M H. 2015. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1834-1848 [DOI:10.1109/TPAMI.2014.2388226]

Wu Y, Lim J and Yang M H. 2013. Online object tracking: a benchmark//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2411-2418[DOI:10.1109/CVPR.2013.312]

Yang T Y and Chan A B. 2018. Learning dynamic memory networks for object tracking//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 153-169[DOI:10.1007/978-3-030-01240-3_10]

Yun S, Choi J, Yoo Y, Yun K and Choi J Y. 2017. Action-decision networks for visual tracking with deep reinforcement learning//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 1349-1358[DOI:10.1109/CVPR.2017.148]

Zeiler M D and Fergus R. 2014. Visualizing and understanding convolutional networks//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 818-833[DOI:10.1007/978-3-319-10590-1_53]

Zhang K H, Liu Q S, Wu Y, Yang M H. 2016. Robust visual tracking via convolutional networks without training. IEEE Transactions on Image Processing, 25(4): 1779-1792 [DOI:10.1109/TIP.2016.2531283]

Zhao F, Wang J Q, Wu Y, Tang M. 2019. Adversarial deep tracking. IEEE Transactions on Circuits and Systems for Video Technology, 29(7): 1998-2011 [DOI:10.1109/TCSVT.2018.2856540]

Zhu Z, Wang Q, Li B, Wu W, Yan J and Hu W. 2018a. Distractor-aware Siamese networks for visual object tracking//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 103-119[DOI:10.1007/978-3-030-01240-3_7]

Zhu Z, Wu W, Zou W and Yan J. 2018b. End-to-end flow correlation tracking with spatial-temporal attention//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 548-557[DOI:10.1109/CVPR.2018.00064]


References