Title: Shorter-is-better: Venue category estimation from micro-video
Authors: Zhang, Jianglong; Nie, Liqiang; Wang, Xiang; He, Xiangnan; Huang, Xianglin; Chua, Tat-Seng
Author affiliations: [Zhang, Jianglong; Huang, Xianglin] Faculty of Science and Technology, Communication University of China, Beijing, China; [Nie, Liqiang] Department of ...
Conference name: 24th ACM Multimedia Conference, MM 2016
Conference date: 15 October 2016 through 19 October 2016
Source: MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
Publication year: 2016
Pages: 1415-1424
DOI: 10.1145/2964284.2964307
Keywords: Micro-video analysis; Multi-modal multi-task learning; Venue category estimation
Abstract: According to our statistics on over 2 million micro-videos, only 1.22% of them are associated with venue information, which greatly hinders location-oriented applications and personalized services. To alleviate this problem, we aim to label the bite-sized video clips with venue categories. It is, however, nontrivial for three reasons: 1) no available benchmark dataset; 2) insufficient information, low quality, and information loss; and 3) complex relatedness among venue categories. Towards this end, we propose a scheme comprising two components. In particular, we first crawl a representative set of micro-videos from Vine and extract a rich set of features from textual, visual and acoustic modalities. We then, in the second component, build a tree-guided multi-task multi-modal learning model to estimate the venue category for each unseen micro-video. This model is able to jointly learn a common space from multiple modalities and leverage the predefined Foursquare hierarchical structure to regularize the relatedness among venue categories. Extensive experiments have validated our model. As a side research contribution, we have released our data, code and involved parameters. © 2016 ACM.
Indexed in: EI; SCOPUS
Scopus citation count: 12
Resource type: Conference paper; Journal article
Original link: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84994607834&doi=10.1145%2f2964284.2964307&partnerID=40&md5=8c4f68447857650b1d5b37d6be16b40b