Title: Multimodal Dialog System: Generating Responses via Adaptive Decoders
Authors: Nie, Liqiang; Wang, Wenjie; Hong, Richang; Wang, Meng; Tian, Qi
Author affiliations: [Nie, Liqiang; Wang, Wenjie] Shandong Univ, Jinan, Peoples R China.; [Hong, Richang; Wang, Meng] Hefei Univ Technol, Hefei, Peoples R China.; [Tia
Conference: 27th ACM International Conference on Multimedia (MM)
Conference dates: Oct 21-25, 2019
Source: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19)
Keywords: Multimodal Dialog Systems; Multiform Knowledge-aware Decoder; Adaptive Decoders
Abstract: Building on textual dialog systems, multimodal dialog systems have recently attracted increasing attention, especially in the retail domain. Despite their commercial value, multimodal dialog systems still face the following challenges: 1) automatically generating the right responses in the appropriate medium form; 2) jointly considering the visual cues and the side information when selecting product images; and 3) guiding response generation with multi-faceted and heterogeneous knowledge. To address these issues, we present a Multimodal diAloG system with adaptIve deCoders, MAGIC for short. In particular, MAGIC first determines the response type and the corresponding medium form by understanding the intention of the given multimodal context. It then employs adaptive decoders to generate the desired responses: a simple recurrent neural network (RNN) generates general responses; a knowledge-aware RNN decoder encodes the multiform domain knowledge to enrich the response; and the multimodal response decoder incorporates an image recommendation model that jointly considers the textual attributes and the visual images via a neural model optimized with a max-margin loss. We evaluate MAGIC against existing methods on a benchmark dataset. Experimental results demonstrate that MAGIC outperforms the existing methods and achieves state-of-the-art performance.
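The abstract says the image recommendation model is optimized with a max-margin loss but does not give its exact form. The following is a minimal sketch of a generic hinge-style ranking loss, not MAGIC's actual model. It assumes a hypothetical `score` function that mixes the dialog context's similarity to an image's visual features with its similarity to the product's textual-attribute features through a weight `alpha`. The names `CandidateImage`, `score`, `alpha`, and `margin`, along with the plain-list feature vectors, are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CandidateImage:
    # Hypothetical feature vectors; the real encoders used by MAGIC are not specified here.
    visual: List[float]       # visual feature vector of the product image
    attributes: List[float]   # embedding of the product's textual attributes


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def score(context: Sequence[float], image: CandidateImage, alpha: float = 0.5) -> float:
    # Assumed relevance score: a convex mix of context-visual and
    # context-attribute similarity (alpha weights the visual part).
    return alpha * dot(context, image.visual) + (1.0 - alpha) * dot(context, image.attributes)


def max_margin_loss(
    context: Sequence[float],
    positive: CandidateImage,
    negatives: Sequence[CandidateImage],
    margin: float = 1.0,
    alpha: float = 0.5,
) -> float:
    # Hinge ranking loss: every negative whose score comes within `margin`
    # of the positive image's score contributes a penalty.
    pos = score(context, positive, alpha)
    return sum(max(0.0, margin - pos + score(context, neg, alpha)) for neg in negatives)


if __name__ == "__main__":
    ctx = [1.0, 0.0]
    pos = CandidateImage(visual=[1.0, 0.0], attributes=[1.0, 0.0])
    far_neg = CandidateImage(visual=[0.0, 1.0], attributes=[0.0, 1.0])
    close_neg = CandidateImage(visual=[1.0, 0.0], attributes=[0.0, 1.0])
    print(max_margin_loss(ctx, pos, [far_neg, close_neg]))
```

In this toy setting, the negative that shares the positive's visual features but differs in attributes still lands within the margin and incurs a loss of 0.5. A dissimilar negative incurs none. Scoring both modalities together is what lets the attributes separate images that look alike.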