LGNov 27, 2021
Factorized Fourier Neural OperatorsAlasdair Tran, Alexander Mathews, Lexing Xie et al.
We propose the Factorized Fourier Neural Operator (F-FNO), a learning-based approach for simulating partial differential equations (PDEs). Starting from a recently proposed Fourier representation of flow fields, the F-FNO bridges the performance gap between pure machine learning approaches to that of the best numerical or hybrid solvers. This is achieved with new representations - separable spectral layers and improved residual connections - and a combination of training strategies such as the Markov assumption, Gaussian noise, and cosine learning rate decay. On several challenging benchmark PDEs on regular grids, structured meshes, and point clouds, the F-FNO can scale to deeper networks and outperform both the FNO and the geo-FNO, reducing the error by 83% on the Navier-Stokes problem, 31% on the elasticity problem, 57% on the airfoil flow problem, and 60% on the plastic forging problem. Compared to the state-of-the-art pseudo-spectral method, the F-FNO can take a step size that is an order of magnitude larger in time and achieve an order of magnitude speedup to produce the same solution quality.
SIFeb 15, 2021
Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time SeriesAlasdair Tran, Alexander Mathews, Cheng Soon Ong et al.
We propose a new model for networks of time series that influence each other. Graph structures among time series are found in diverse domains, such as web traffic influenced by hyperlinks, product sales influenced by recommendation, or urban transport volume influenced by road networks and weather. There has been recent progress in graph modeling and in time series forecasting, respectively, but an expressive and scalable approach for a network of series does not yet exist. We introduce Radflow, a novel model that embodies three key ideas: a recurrent neural network to obtain node embeddings that depend on time, the aggregation of the flow of influence from neighboring nodes with multi-head attention, and the multi-layer decomposition of time series. Radflow naturally takes into account dynamic networks where nodes and edges change over time, and it can be used for prediction and data imputation tasks. On real-world datasets ranging from a few hundred to a few hundred thousand nodes, we observe that Radflow variants are the best performing model across a wide range of settings. The recurrent component in Radflow also outperforms N-BEATS, the state-of-the-art time series model. We show that Radflow can learn different trends and seasonal patterns, that it is robust to missing nodes and edges, and that correlated temporal patterns among network neighbors reflect influence strength. We curate WikiTraffic, the largest dynamic network of time series with 366K nodes and 22M time-dependent links spanning five years. This dataset provides an open benchmark for developing models in this area, with applications that include optimizing resources for the web. More broadly, Radflow has the potential to improve forecasts in correlated time series networks such as the stock market, and impute missing measurements in geographically dispersed networks of natural phenomena.
SIFeb 3, 2021
AttentionFlow: Visualising Influence in Networks of Time SeriesMinjeong Shin, Alasdair Tran, Siqi Wu et al.
The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.
LGJul 28, 2020
UNIPoint: Universally Approximating Point Processes IntensitiesAlexander Soen, Alexander Mathews, Daniel Grixti-Cheng et al.
Point processes are a useful mathematical tool for describing events over time, and so there are many recent approaches for representing and learning them. One notable open question is how to precisely describe the flexibility of point process models and whether there exists a general model that can represent all point processes. Our work bridges this gap. Focusing on the widely used event intensity function representation of point processes, we provide a proof that a class of learnable functions can universally approximate any valid intensity function. The proof connects the well known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer functions, the formulation of the parameters of a piece-wise continuous functions as a dynamic system, and a recurrent neural network implementation for capturing the dynamics. Using these insights, we design and implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis function upon each event. Evaluations on synthetic and real world datasets show that this simpler representation performs better than Hawkes process variants and more complex neural network-based approaches. We expect this result will provide a practical basis for selecting and tuning models, as well as furthering theoretical work on representational complexity and learnability.
CVApr 17, 2020
Transform and Tell: Entity-Aware News Image CaptioningAlasdair Tran, Alexander Mathews, Lexing Xie
We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically rich captions that include uncommon words. We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism. We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts. On the GoodNews dataset, our model outperforms the previous state of the art by a factor of four in CIDEr score (13 to 54). This performance gain comes from a unique combination of language models, word representation, image embeddings, face embeddings, object embeddings, and improvements in neural network design. We also introduce the NYTimes800k dataset which is 70% larger than GoodNews, has higher article quality, and includes the locations of images within articles as an additional contextual cue.
IRDec 6, 2018
Comparative Document Summarisation via ClassificationUmanga Bista, Alexander Mathews, Minjeong Shin et al.
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 14 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct view points.
CVMay 18, 2018
SemStyle: Learning to Generate Stylised Image Captions using Unaligned TextAlexander Mathews, Lexing Xie, Xuming He
Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work provides possibilities to learn richer image descriptions from the plethora of linguistic data available on the web.
CLMay 15, 2018
Simplifying Sentences with Sequence to Sequence ModelsAlexander Mathews, Lexing Xie, Xuming He
We simplify sentences with an attentive neural network sequence to sequence model, dubbed S4. The model includes a novel word-copy mechanism and loss function to exploit linguistic similarities between the original and simplified sentences. It also jointly uses pre-trained and fine-tuned word embeddings to capture the semantics of complex sentences and to mitigate the effects of limited data. When trained and evaluated on pairs of sentences from thousands of news articles, we observe a 8.8 point improvement in BLEU score over a sequence to sequence baseline; however, learning word substitutions remains difficult. Such sequence to sequence models are promising for other text generation tasks such as style transfer.
CVOct 6, 2015
SentiCap: Generating Image Descriptions with SentimentsAlexander Mathews, Lexing Xie, Xuming He
The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from the current systems. One such style is descriptions with emotions, which is commonplace in everyday communication, and influences decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment.