CLApr 25, 2020
Combining Word Embeddings and N-grams for Unsupervised Document SummarizationZhuolin Jiang, Manaj Srivastava, Sanjay Krishna et al.
Graph-based extractive document summarization relies on the quality of the sentence similarity graph. Bag-of-words or tf-idf based sentence similarity uses exact word matching, but fails to measure the semantic similarity between individual words or to consider the semantic structure of sentences. In order to improve the similarity measure between sentences, we employ off-the-shelf deep embedding features and tf-idf features, and introduce a new text similarity metric. An improved sentence similarity graph is built and used in a submodular objective function for extractive summarization, which consists of a weighted coverage term and a diversity term. A Transformer based compression model is developed for sentence compression to aid in document summarization. Our summarization approach is extractive and unsupervised. Experiments demonstrate that our approach can outperform the tf-idf based approach and achieve state-of-the-art performance on the DUC04 dataset, and comparable performance to the fully supervised learning methods on the CNN/DM and NYT datasets.
SRApr 8, 2019
Desaturating EUV observations of solar flaring stormsSabrina Guastavino, Michele Piana, Anna Maria Massone et al.
Image saturation has been an issue for several instruments in solar astronomy, mainly at EUV wavelengths. However, with the launch of the Atmospheric Imaging Assembly (AIA) as part of the payload of the Solar Dynamic Observatory (SDO) image saturation has become a big data issue, involving around 10^$ frames of the impressive dataset this beautiful telescope has been providing every year since February 2010. This paper introduces a novel desaturation method, which is able to recover the signal in the saturated region of any AIA image by exploiting no other information but the one contained in the image itself. This peculiar methodological property, jointly with the unprecedented statistical reliability of the desaturated images, could make this algorithm the perfect tool for the realization of a reconstruction pipeline for AIA data, able to work properly even in the case of long-lasting, very energetic flaring events.
CLJun 1, 2015
Statistical Machine Translation Features with Multitask Tensor NetworksHendra Setiawan, Zhongqiang Huang, Jacob Devlin et al.
We present a three-pronged approach to improving Statistical Machine Translation (SMT), building on recent success in the application of neural networks to SMT. First, we propose new features based on neural networks to model various non-local translation phenomena. Second, we augment the architecture of the neural network with tensor layers that capture important higher-order interaction among the network units. Third, we apply multitask learning to estimate the neural network parameters jointly. Each of our proposed methods results in significant improvements that are complementary. The overall improvement is +2.7 and +1.8 BLEU points for Arabic-English and Chinese-English translation over a state-of-the-art system that already includes neural network features.