Extractive summarization of documents with images based on multi-modal RNN

Abstract

The rapid growth of multi-modal documents containing images on the Internet creates strong demand for multi-modal summarization. The challenge is to devise a computing method that can process text and images uniformly. Deep learning provides basic models for meeting this challenge. This paper treats extractive multi-modal summarization as a classification problem and proposes a sentence–image classification method based on a multi-modal RNN model. Our method encodes words and sentences with hierarchical RNN models and encodes the ordered image set with a CNN model and an RNN model. It then calculates the selection probability of sentences and the sentence–image alignment probability through a logistic classifier that takes text coverage, text redundancy, image set coverage, and image set redundancy as features. Two methods are proposed to compute the image set redundancy feature by combining the importance scores of sentences and the hidden sentence–image alignment. Experiments on the extended DailyMail corpora, constructed by collecting images and captions from the Web, show that our method outperforms 11 baseline text summarization methods and that adopting the two image-related features in the classification method can improve text summarization. Our method is able to mine the hidden sentence–image alignments and to create informative, well-aligned multi-modal summaries.
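
As a rough illustration of the classification step described in the abstract, the following minimal sketch scores one sentence with a logistic function over the four named features. It is not the paper's implementation: in the paper the features are computed from RNN and CNN encodings, whereas here they are hypothetical scalar values, and the function name, weights, and sign choices (coverage positive, redundancy negative) are assumptions made only for this example.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def selection_probability(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Logistic score for one sentence.

    `features` is assumed to hold, in order: text coverage, text redundancy,
    image set coverage, image set redundancy. The scalar form is a
    simplification chosen for illustration.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


# Hypothetical feature values and weights (not taken from the paper).
features = [0.8, 0.3, 0.6, 0.2]
weights = [1.0, -1.0, 1.0, -1.0]  # assumed: reward coverage, penalize redundancy

p = selection_probability(features, weights)
print(f"selection probability: {p:.3f}")
```

Under these assumed weights, a sentence with high coverage and low redundancy receives a probability above 0.5; a trained model would learn the weights from data rather than fix them by hand.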

Publication DOI: https://doi.org/10.1016/j.future.2019.04.045
Divisions: College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
College of Engineering & Physical Sciences
Funding Information: The research was sponsored by the National Natural Science Foundation of China (No. 61806101, No. 61876048, No. 61602256, No. 61876091), and the Open Foundation of Key Laboratory of Intelligent Information Processing, ICT, CAS, China (IIP2019-2). Professor Hai Zhuge is the corresponding author of this paper. Jingqiang Chen is a Lecturer in the School of Computer Science, Nanjing University of Posts and Telecommunications, China. He received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2014. His current research interests include Semantic Link Network and text summarization. Hai Zhuge is a Distinguished Scientist of the ACM and a Fellow of the British Computer Society. He has made a systematic contribution to semantics and knowledge modeling through lasting fundamental research on the Semantic Link Network and the Resource Space Model based on multi-dimensional methodology. He is leading research toward Cyber–Physical Society through methodological, theoretical and technical innovation. He gave 17 keynotes at international conferences and invited lectures in universities of many countries as a Distinguished Speaker of the ACM. As a chair in computer science, he leads the International Research Network on Cyber–Physical–Social Intelligence consisting of Aston University, Guangzhou University, KLIIP at Institute of Computing Technology in Chinese Academy of Sciences, and University of Chinese Academy of Sciences. He was a Distinguished Visiting Fellow of the Royal Academy of Engineering. He is the author of four monographs: Cyber–Physical–Social Intelligence on Human–Machine–Nature Symbiosis (Springer, 2019), Multi-Dimensional Summarization in Cyber–Physical Society (Morgan Kaufmann, 2016), The Knowledge Grid: Toward Cyber–Physical Society (World Scientific, 2012, 2nd edition), and The Web Resource Space Model (Springer, 2008).
He was an associate editor of Future Generation Computer Systems. He is serving as an associate editor of IEEE Intelligent Systems. Homepage: http://www.knowledgegrid.net/ h.zhuge .
Additional Information: © 2019, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License: http://creativecommons.org/licenses/by-nc-nd/4.0/. Funding: National Natural Science Foundation of China (No. 61806101, No. 61876048, No. 61602256, No. 61876091), and the Open Foundation of Key Laboratory of Intelligent Information Processing, ICT, CAS, China (IIP2019-2).
Uncontrolled Keywords: Document summarization, Extractive summarization, Multi-modal summarization, RNN, Summarization, Software, Hardware and Architecture, Computer Networks and Communications
Publication ISSN: 1872-7115
Last Modified: 05 Feb 2026 08:04
Date Deposited: 28 May 2019 09:01
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
https://www.sci ... 6876?via%3Dihub (Publisher URL)
PURE Output Type: Article
Published Date: 2019-10-01
Published Online Date: 2019-04-25
Accepted Date: 2019-04-19
Authors: Chen, Jingqiang
Zhuge, Hai (ORCID Profile 0000-0001-8250-6408)
