Matthew Dixon

LG
h-index38
8papers
2,458citations
Novelty39%
AI Score32

8 Papers

CLApr 22, 2024Code
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin, Jyoti Aneja, Hany Awadalla et al. · microsoft-research, stanford

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with a 7B, 14B models trained for 4.8T tokens, called phi-3-small, phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.

NAApr 25, 2010
Error Control of Iterative Linear Solvers for Integrated Groundwater Models

Matthew Dixon, Zhaojun Bai, Charles Brush et al.

An open problem that arises when using modern iterative linear solvers, such as the preconditioned conjugate gradient (PCG) method or Generalized Minimum RESidual method (GMRES) is how to choose the residual tolerance in the linear solver to be consistent with the tolerance on the solution error. This problem is especially acute for integrated groundwater models which are implicitly coupled to another model, such as surface water models, and resolve both multiple scales of flow and temporal interaction terms, giving rise to linear systems with variable scaling. This article uses the theory of 'forward error bound estimation' to show how rescaling the linear system affects the correspondence between the residual error in the preconditioned linear system and the solution error. Using examples of linear systems from models developed using the USGS GSFLOW package and the California State Department of Water Resources' Integrated Water Flow Model (IWFM), we observe that this error bound guides the choice of a practical measure for controlling the error in rescaled linear systems. It is found that forward error can be controlled in preconditioned GMRES by rescaling the linear system and normalizing the stopping tolerance. We implemented a preconditioned GMRES algorithm and benchmarked it against the Successive-Over-Relaxation (SOR) method. Improved error control reduces redundant iterations in the GMRES algorithm and results in overall simulation speedups as large as 7.7x. This research is expected to broadly impact groundwater modelers through the demonstration of a practical approach for setting the residual tolerance in line with the solution error tolerance.

NAJan 28, 2010
Conservative Properties of the Variational Free-Lagrange Method for Shallow Water

Matthew Dixon, Todd Ringler

The variational free-Lagrange (VFL) method for shallow water is a free-Lagrange method with the additional property that it preserves the variational structure of shallow water. The VFL method was first derived in this context by \cite{AUG84} who discretized Hamilton's action principle with a free-Lagrange data structure. The purpose of this article is to assess the long-time conservation properties of the VFL method for regularized shallow water which are useful for climate simulation. Long-time regularized shallow water simulations show that the VFL method exhibits no secular drift in the (i) energy error through the application of symplectic integrators; and (ii) the potential vorticity error through the construction of discrete curl, divergence and gradient operators which satisfy semi-discrete divergence and potential vorticity conservation laws. These diagnostic semi-discrete equations augment the description of the VFL method by characterizing the evolution of its respective irrotational and solenoidal components in the Lagrangian frame. Like the continuum equations, the former exhibits a $\text{div}^2\mathbf{U}$ term which indicates that the flow has a very strong tendency towards a purely rotational state. Numerical results show (i) the preservation of shape and strength of an initially radially symmetric vortex pair in purely rotational regularized shallow water and (ii) how the Voronoi diagram retains the history of the flow field and (iii) that energy is conserved to $\mathcal{O}(Δ^2)$ and potential vorticity error to within 5% with no secular growth over a 50 year period.

CRJul 1, 2024
On the Abuse and Detection of Polyglot Files

Luke Koch, Sean Oesch, Amul Chaulagain et al.

A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding $30$ polyglot samples and $15$ attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on adversary techniques. We then trained a machine learning detection solution, PolyConv, using this data set. PolyConv achieves a precision-recall area-under-curve score of $0.999$ with an F1 score of $99.20$% for polyglot detection and $99.47$% for file-format identification, significantly outperforming all other tools tested. We developed a content disarmament and reconstruction tool, ImSan, that successfully sanitized $100$% of the tested image-based polyglots, which were the most common type found via the survey. Our work provides concrete tools and suggestions to enable defenders to better defend themselves against polyglot files, as well as directions for future work to create more robust file specifications and methods of disarmament.

LGMar 29, 2016Code
Classification-based Financial Markets Prediction using Deep Neural Networks

Matthew Dixon, Diego Klabjan, Jin Hoon Bang

Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties including robustness to overfitting. However their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different Commodity and FX future mid-prices at 5-minute intervals. All results in this paper are generated using a C++ implementation on the Intel Xeon Phi co-processor which is 11.4x faster than the serial version and a Python strategy backtesting environment both of which are available as open source code written by the authors.

LGNov 18, 2021
MS-nowcasting: Operational Precipitation Nowcasting with Convolutional LSTMs at Microsoft Weather

Sylwester Klocek, Haiyu Dong, Matthew Dixon et al.

We present the encoder-forecaster convolutional long short-term memory (LSTM) deep-learning model that powers Microsoft Weather's operational precipitation nowcasting product. This model takes as input a sequence of weather radar mosaics and deterministically predicts future radar reflectivity at lead times up to 6 hours. By stacking a large input receptive field along the feature dimension and conditioning the model's forecaster with predictions from the physics-based High Resolution Rapid Refresh (HRRR) model, we are able to outperform optical flow and HRRR baselines by 20-25% on multiple metrics averaged over all lead times.

PMFeb 25, 2020
G-Learner and GIRL: Goal Based Wealth Management with Reinforcement Learning

Matthew Dixon, Igor Halperin

We present a reinforcement learning approach to goal based wealth management problems such as optimization of retirement plans or target dated funds. In such problems, an investor seeks to achieve a financial goal by making periodic investments in the portfolio while being employed, and periodically draws from the account when in retirement, in addition to the ability to re-balance the portfolio by selling and buying different assets (e.g. stocks). Instead of relying on a utility of consumption, we present G-Learner: a reinforcement learning algorithm that operates with explicitly defined one-step rewards, does not assume a data generation process, and is suitable for noisy data. Our approach is based on G-learning - a probabilistic extension of the Q-learning method of reinforcement learning. In this paper, we demonstrate how G-learning, when applied to a quadratic reward and Gaussian reference policy, gives an entropy-regulated Linear Quadratic Regulator (LQR). This critical insight provides a novel and computationally tractable tool for wealth management tasks which scales to high dimensional portfolios. In addition to the solution of the direct problem of G-learning, we also present a new algorithm, GIRL, that extends our goal-based G-learning approach to the setting of Inverse Reinforcement Learning (IRL) where rewards collected by the agent are not observed, and should instead be inferred. We demonstrate that GIRL can successfully learn the reward parameters of a G-Learner agent and thus imitate its behavior. Finally, we discuss potential applications of the G-Learner and GIRL algorithms for wealth management and robo-advising.

CONov 27, 2017
OSTSC: Over Sampling for Time Series Classification in R

Matthew Dixon, Diego Klabjan, Lan Wei

The OSTSC package is a powerful oversampling approach for classifying univariant, but multinomial time series data in R. This article provides a brief overview of the oversampling methodology implemented by the package. A tutorial of the OSTSC package is provided. We begin by providing three test cases for the user to quickly validate the functionality in the package. To demonstrate the performance impact of OSTSC, we then provide two medium size imbalanced time series datasets. Each example applies a TensorFlow implementation of a Long Short-Term Memory (LSTM) classifier - a type of a Recurrent Neural Network (RNN) classifier - to imbalanced time series. The classifier performance is compared with and without oversampling. Finally, larger versions of these two datasets are evaluated to demonstrate the scalability of the package. The examples demonstrate that the OSTSC package improves the performance of RNN classifiers applied to highly imbalanced time series data. In particular, OSTSC is observed to increase the AUC of LSTM from 0.543 to 0.784 on a high frequency trading dataset consisting of 30,000 time series observations.