Eric Chung

NA
h-index: 57
Papers: 27
Citations: 1,080
Novelty: 46%
AI Score: 55

27 Papers

CL · Apr 4, 2025 · Code
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Aaron Blakeman, Aarti Basant, Abhinav Khattar et al. · nvidia

As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. We are releasing Nemotron-H base model checkpoints with support in Hugging Face and NeMo.

95.3 · LG · Apr 14 · Code
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye et al. · amazon-science, cmu

We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.

CL · Aug 20, 2025
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Aarti Basant, Abhijit Khairnar, Abhijit Paithankar et al. · nvidia

We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.

NA · Apr 28, 2016
Adaptive multiscale model reduction with Generalized Multiscale Finite Element Methods

Eric Chung, Yalchin Efendiev, Thomas Y. Hou

In this paper, we discuss a general multiscale model reduction framework based on multiscale finite element methods. We give a brief overview of related multiscale methods. Due to page limitations, the overview focuses on a few related methods and is not intended to be comprehensive. We present a general adaptive multiscale model reduction framework, the Generalized Multiscale Finite Element Method. Besides the method's basic outline, we discuss some important ingredients needed for the method's success. We also discuss several applications. The proposed method allows performing local model reduction in the presence of high contrast and no scale separation.

CL · Sep 29, 2025
Pretraining Large Language Models with NVFP4

Felix Abecassis, Anjulie Agrusa, Dong Ahn et al. · nvidia

Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set size, and training set quality, as shown by extensive research and experimentation across the industry. Training a frontier model today requires on the order of tens to hundreds of yottaflops, which is a massive investment of time, compute, and energy. Improving pretraining efficiency is therefore essential to enable the next generation of even more capable LLMs. While 8-bit floating point (FP8) training is now widely adopted, transitioning to even narrower precision, such as 4-bit floating point (FP4), could unlock additional improvements in computational speed and resource utilization. However, quantization at this level poses challenges to training stability, convergence, and implementation, notably for large-scale models trained on long token horizons. In this study, we introduce a novel approach for stable and accurate training of large language models (LLMs) using the NVFP4 format. Our method integrates Random Hadamard transforms (RHT) to bound block-level outliers, employs a two-dimensional quantization scheme for consistent representations across both the forward and backward passes, utilizes stochastic rounding for unbiased gradient estimation, and incorporates selective high-precision layers. We validate our approach by training a 12-billion-parameter model on 10 trillion tokens -- the longest publicly documented training run in 4-bit precision to date. Our results show that the model trained with our NVFP4-based pretraining technique achieves training loss and downstream task accuracies comparable to an FP8 baseline. These findings highlight that NVFP4, when combined with our training approach, represents a major step forward in narrow-precision LLM training algorithms.
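The abstract above credits stochastic rounding with giving unbiased gradient estimates. The following is a minimal sketch of that one ingredient only, assuming a uniform grid of spacing `step` (the real NVFP4 grid is not uniform, and the function names here are hypothetical, not taken from the paper's code). It rounds up with probability equal to the fractional distance to the next grid point, so the expected rounded value equals the input.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step` (assumed uniform grid).

    Rounds up with probability equal to the fractional position of x
    between its two neighbouring grid points, so E[result] == x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    round_up = rng.random() < frac
    return (lower + (1 if round_up else 0)) * step


def mean_rounded(x: float, step: float, n: int, seed: int = 0) -> float:
    """Average of n independent stochastic roundings of x."""
    rng = random.Random(seed)
    return sum(stochastic_round(x, step, rng) for _ in range(n)) / n
```

For comparison, round-to-nearest maps 0.3 to 0.0 on a unit grid every time, which introduces a systematic bias; averaging many calls to `stochastic_round(0.3, 1.0, rng)` instead converges toward 0.3.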

LG · Feb 16, 2023
With Shared Microexponents, A Little Shifting Goes a Long Way

Bita Rouhani, Ritchie Zhao, Venmugil Elango et al.

This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.

NA · May 16, 2017
Constraint Energy Minimizing Generalized Multiscale Finite Element Method in the Mixed Formulation

Eric Chung, Yalchin Efendiev, Wing Tat Leung

This paper presents a novel mass-conservative mixed multiscale method for solving flow equations in heterogeneous porous media. The media properties (the permeability) contain multiple scales and high contrast. The proposed method solves the flow equation in a mixed formulation on a coarse grid by constructing multiscale basis functions. The resulting velocity field is mass conservative on the fine grid. Our main goal is to obtain first-order convergence in terms of the mesh size which is independent of local contrast. This is achieved, first, by constructing some auxiliary spaces, which contain global information that can not be localized, in general. This is built on our previous work on the Generalized Multiscale Finite Element Method (GMsFEM). In the auxiliary space, multiscale basis functions corresponding to small (contrast-dependent) eigenvalues are selected. These basis functions represent the high-conductivity channels (which connect the boundaries of a coarse block). Next, we solve local problems to construct multiscale basis functions for the velocity field. These local problems are formulated in the oversampled domain taking into account some constraints with respect to auxiliary spaces. The latter allows fast spatial decay of local solutions and, thus, allows taking smaller oversampled regions. The number of basis functions depends on small eigenvalues of the local spectral problems. Moreover, multiscale pressure basis functions are needed in constructing the velocity space. Our multiscale spaces have a minimal dimension, which is needed to avoid contrast-dependence in the convergence. The method's convergence requires an oversampling of several layers. We present an analysis of our approach. Our numerical results confirm that the convergence rate is first order with respect to the mesh size and independent of the contrast.

CL · Dec 23, 2025
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Aaron Blakeman, Aaron Grattafiori, Aarti Basant et al. · nvidia

We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. It achieves up to 3.3x higher inference throughput than similarly-sized open models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.

CL · Dec 24, 2025
NVIDIA Nemotron 3: Efficient and Open Intelligence

Aaron Blakeman, Aaron Grattafiori, Aarti Basant et al. · nvidia

We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning, which enables reasoning and multi-step tool use, and they support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.

NA · Jun 24, 2016
BDDC and FETI-DP algorithms with adaptive coarse spaces for three-dimensional elliptic problems with oscillatory and high contrast coefficients

Hyea Hyun Kim, Eric Chung, Junxian Wang

BDDC and FETI-DP algorithms are developed for three-dimensional elliptic problems with adaptively enriched coarse components. It is known that these enriched components are necessary in the development of robust preconditioners. To form the adaptive coarse components, carefully designed generalized eigenvalue problems are introduced for each face and edge, and the coarse components are formed by using eigenvectors with their corresponding eigenvalues larger than a given tolerance $λ_{TOL}$. Upper bounds for condition numbers of the preconditioned systems are shown to be $C λ_{TOL}$, with the constant $C$ depending only on the maximum number of edges and faces per subdomain, and the maximum number of subdomains sharing an edge. Numerical results are presented to test the robustness of the proposed approach.

NA · Oct 24, 2018
Edge Multiscale Methods for elliptic problems with heterogeneous coefficients

Shubin Fu, Eric Chung, Guanglian Li

In this paper, we propose two new types of edge multiscale methods motivated by \cite{GL18} to solve Partial Differential Equations (PDEs) with high-contrast heterogeneous coefficients: Edge spectral multiscale Finite Element method (ESMsFEM) and Wavelet-based edge multiscale Finite Element method (WEMsFEM). Their convergence rates for elliptic problems with high-contrast heterogeneous coefficients are demonstrated in terms of the coarse mesh size $H$, the number of spectral basis functions and the level of the wavelet space $\ell$, which are verified by extensive numerical tests.

LG · Oct 16, 2023
Microscaling Data Formats for Deep Learning

Bita Darvish Rouhani, Ritchie Zhao, Ankit More et al.

Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
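The core idea in the abstract above, a per-block scaling factor combined with narrow element types, can be sketched as follows. This is an illustrative simplification, not the MX specification: it assumes signed 8-bit integer elements and a shared power-of-two scale stored as an exponent, and the function names are hypothetical.

```python
import math


def quantize_block(values, elem_bits=8):
    """Quantize one block to signed integers sharing one power-of-two scale.

    Returns (exp, q) where each original value is approximated by
    q[i] * 2**exp. The element width is an assumption of this sketch.
    """
    qmax = 2 ** (elem_bits - 1) - 1  # 127 for 8-bit elements
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent such that amax / 2**exp still fits in [-qmax, qmax].
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    # Clamp guards against floating-point edge cases in the exponent choice.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exp, q


def dequantize_block(exp, q):
    """Reconstruct approximate values from the shared exponent and elements."""
    scale = 2.0 ** exp
    return [x * scale for x in q]
```

For the block `[0.5, -1.0, 0.25, 0.75]` the shared exponent is -6 (scale 1/64) and the elements are `[32, -64, 16, 48]`, which dequantize back to the inputs exactly. Storing one exponent per block rather than per element is what keeps the per-element cost narrow.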

NA · Dec 22, 2018
Computational multiscale methods for linear heterogeneous poroelasticity

Robert Altmann, Eric Chung, Roland Maier et al.

We consider a strongly heterogeneous medium saturated by an incompressible viscous fluid as it appears in geomechanical modeling. This poroelasticity problem suffers from rapidly oscillating material parameters, which calls for a thorough numerical treatment. In this paper, we propose a method based on the local orthogonal decomposition technique and motivated by a similar approach used for linear thermoelasticity. Therein, local corrector problems are constructed in line with the static equations, whereas we propose to consider the full system. This allows us to benefit from the given saddle point structure and results in two decoupled corrector problems for the displacement and the pressure. We prove the optimal first-order convergence of this method and verify the result by numerical experiments.

NA · Jun 13, 2018
A Constraint energy minimizing generalized multiscale finite element method for parabolic equations

Mengnan Li, Eric Chung, Lijian Jiang

In this paper, we present a Constraint Energy Minimizing Generalized Multiscale Finite Element Method (CEM-GMsFEM) for parabolic equations with multiscale coefficients, arising from applications in porous media. We present the construction of CEM-GMsFEM and rigorously analyze its convergence for the parabolic equations. The convergence rate is characterized by the coarse grid size and the eigenvalue decay of local spectral problems, but is independent of the scale length and contrast of the media. The analysis shows that the method has a first-order convergence rate with respect to the coarse grid size in the energy norm and a second-order convergence rate with respect to the coarse grid size in the $L^2$ norm under some appropriate assumptions. For the temporal discretization, finite difference techniques are used and the convergence analysis of the fully discrete scheme is given. Moreover, an a posteriori error estimator is derived and analyzed. A few numerical results for porous media applications are presented to confirm the theoretical findings and demonstrate the performance of the approach.

NA · Mar 25, 2019
A local-global multiscale mortar mixed finite element method for multiphase transport in heterogeneous media

Shubin Fu, Eric Chung

In this paper, we propose a local-global multiscale mortar mixed finite element method (MMMFEM) for multiphase transport in heterogeneous media. We consider the two-phase flow system. The pressure equation is solved via the multiscale mortar mixed finite element method, which yields a mass-conservative velocity field; we then use an explicit finite volume method to solve the saturation equation. We use polynomials and a multiscale basis to form the coarse mortar space. The multiscale basis is the restriction of the global pressure field obtained at the previous time step to the coarse interface. We solve the pressure equation on the fine grid to initialize the simulation. Numerical experiments on some benchmark 2D and 3D heterogeneous models are provided to validate the performance of our method.

NA · Jun 20, 2018
A mass conservative scheme for fluid-structure interaction problems by the staggered discontinuous Galerkin method

Siu Wun Cheung, Eric Chung, Hyea Hyun Kim

In this paper, we develop a new mass conservative numerical scheme for the simulations of a class of fluid-structure interaction problems. We will use the immersed boundary method to model the fluid-structure interaction, while the fluid flow is governed by the incompressible Navier-Stokes equations. The immersed boundary method is proven to be a successful scheme to model fluid-structure interactions. To ensure mass conservation, we will use the staggered discontinuous Galerkin method to discretize the incompressible Navier-Stokes equations. The staggered discontinuous Galerkin method is able to preserve the skew-symmetry of the convection term. In addition, by using a local postprocessing technique, the weakly divergence free velocity can be used to compute a new postprocessed velocity, which is exactly divergence free and has a superconvergence property. This strongly divergence free velocity field is the key to the mass conservation. Furthermore, energy stability is improved by the skew-symmetric discretization of the convection term. We will present several numerical results to show the performance of the method.

CL · May 2, 2025 · Code
Llama-Nemotron: Efficient Reasoning Models

Akhiad Bercovich, Itay Levy, Izik Golan et al. · nvidia

We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

96.2 · LG · Mar 19
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Edward Lin, Sahil Modi, Siva Kumar Sastry Hari et al.

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward and backward workloads across BF16, FP8, and NVFP4, including kernels whose best performance is expected to rely on Blackwell-specific capabilities. Unlike prior benchmarks that evaluate kernels primarily relative to software implementations, SOL-ExecBench measures performance against analytically derived Speed-of-Light (SOL) bounds computed by SOLAR, our pipeline for deriving hardware-grounded SOL bounds, yielding a fixed target for hardware-efficient optimization. We report a SOL Score that quantifies how much of the gap between a release-defined scoring baseline and the hardware SOL bound a candidate kernel closes. To support robust evaluation of agentic optimizers, we additionally provide a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static analysis based checks against common reward-hacking strategies. SOL-ExecBench reframes GPU kernel benchmarking from beating a mutable software baseline to closing the remaining gap to hardware Speed-of-Light.

NA · Feb 2, 2019
Parametric FEM for Shape Optimization applied to Golgi Stack

Xinshi Chen, Eric Chung

This thesis applies shape optimization to the morphological evolution of the Golgi stack. The Golgi stack, an organelle in biological cells, consists of multiple layers of cisternae. Inspired by the Helfrich Model \cite{Helfrich}, which is a model for vesicles typically applied to biological cells, a new model specially designed for the Golgi stack is developed and then implemented using FEM in this thesis. In the Golgi model, each cisterna of the Golgi stack is viewed as a closed vesicle without topological changes, and our model is adaptable to both the single-vesicle case and the multiple-vesicle case. The main idea of the mathematical model is to minimize the elastic energy (bending energy) of the vesicles, subject to constraints that reflect the biological properties of the Golgi stack. With these constraints attached, the model can be extended to an obstacle-type problem. Hence, the thesis shows not only simulations of the Golgi stack but also some interesting examples without biological meaning. Because multiple cisternae are considered as a whole, the model also handles multiple objects. A set of numerical examples is compared with the observed shape of the Golgi stack, so that we can offer some possible explanations for the morphology of trans-Golgi cisternae.

LG · Jan 27
Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery

Meng Xin, Sweta Priyadarshi, Jingyu Xin et al.

This technical report presents quantization-aware distillation (QAD) and our best practices for recovering accuracy of NVFP4-quantized large language models (LLMs) and vision-language models (VLMs). QAD distills a full-precision teacher model into a quantized student model using a KL divergence loss. While applying distillation to quantized models is not a new idea, we observe key advantages of QAD for today's LLMs: 1. It shows remarkable effectiveness and stability for models trained through multi-stage post-training pipelines, including supervised fine-tuning (SFT), reinforcement learning (RL), and model merging, where traditional quantization-aware training (QAT) suffers from engineering complexity and training instability; 2. It is robust to data quality and coverage, enabling accuracy recovery without full training data. We evaluate QAD across multiple post-trained models including AceReason Nemotron, Nemotron 3 Nano, Nemotron Nano V2, Nemotron Nano V2 VL (VLM), and Llama Nemotron Super v1, showing consistent recovery to near-BF16 accuracy.
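The abstract above says QAD trains the quantized student against the full-precision teacher with a KL divergence loss. A minimal per-token sketch of such a loss is given below. It assumes the forward direction KL(teacher || student) and no temperature scaling; the abstract does not state either choice, and the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one token position.

    The direction is an assumption of this sketch, not a detail
    confirmed by the report's abstract.
    """
    log_p = log_softmax(teacher_logits)  # teacher distribution
    log_q = log_softmax(student_logits)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise. Because it depends only on the teacher's outputs rather than on ground-truth labels, it can in principle be computed on any input text, which is consistent with the abstract's claim of robustness to data coverage.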

LG · Nov 17, 2020
Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale Inversion Problem

Eric Chung, Yalchin Efendiev, Wing Tat Leung et al.

In this work, we propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate multi-level Markov chain Monte Carlo (MCMC) sampling algorithms. The policies (actors) of the agents are used to generate the proposal in the MCMC steps, and the critic, which is centralized, is in charge of estimating the long-term reward. We verify our proposed algorithm by solving an inverse problem with multiple scales. Solving this problem with traditional MCMC sampling poses several difficulties. First, the computation of the posterior distribution involves evaluating the forward solver, which is very time-consuming for a heterogeneous problem. We hence propose to use the multi-level algorithm. More precisely, we use the generalized multiscale finite element method (GMsFEM) as the forward solver in evaluating a posterior distribution in the multi-level rejection procedure. Second, it is hard to find a proposal function that generates meaningful samples. To solve this issue, we learn an RL policy as the proposal generator. Our experiments show that the proposed method significantly improves the sampling process.

LG · Mar 29, 2019
MLSys: The New Frontier of Machine Learning Systems

Alexander Ratner, Dan Alistarh, Gustavo Alonso et al.

Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

NA · Apr 27, 2019
Staggered discontinuous Galerkin methods for the Helmholtz equations with large wave number

Lina Zhao, Eun-Jae Park, Eric Chung

In this paper we investigate a staggered discontinuous Galerkin method for the Helmholtz equation with large wave number on general quadrilateral and polygonal meshes. The method is highly flexible, allowing rough grids such as trapezoidal grids and highly distorted grids, and at the same time is numerical flux free. Furthermore, it allows hanging nodes, which can be simply treated as additional vertices. By exploiting a modified duality argument, the stability and convergence can be proved under the condition that $κh$ is sufficiently small, where $κ$ is the wave number and $h$ is the mesh size. Error estimates for both the scalar and vector variables in the $L^2$ norm are established. Several numerical experiments are presented to verify our theoretical results and to demonstrate the capability of our method for capturing singular solutions.

NA · Apr 15, 2019
Generalized multiscale finite element method for the steady state linear Boltzmann equation

Eric Chung, Yalchin Efendiev, Yanbo Li et al.

The Boltzmann equation, as a model equation in statistical mechanics, is used to describe the statistical behavior of a large number of particles driven by the same physical laws. Depending on the media and the particles to be modeled, the equation has slightly different forms. In this article, we investigate a model Boltzmann equation with highly oscillatory media in the small Knudsen number regime, and study the numerical behavior of the Generalized Multi-scale Finite Element Method (GMsFEM) in the fluid regime when high oscillation is present in the media. The GMsFEM is a general approach to numerically treat equations with multi-scale structures. The method is divided into offline and online steps. In the offline step, basis functions are prepared from a snapshot space via a well-designed generalized eigenvalue problem (GEP); in the online step, these basis functions are combined through a DG formulation to assemble a solution that incorporates specific boundary and source information. We prove the well-posedness of the method on the Boltzmann equation, and show that the GEP formulation provides a set of optimal basis functions that achieve spectral convergence. Such convergence is independent of the oscillation in the media and of the smallness of the Knudsen number, making it one of the few methods that simultaneously achieve numerical homogenization and asymptotic preserving properties across all scales of oscillations and the Knudsen number.

NA · May 25, 2017
On overlapping domain decomposition methods for high-contrast multiscale problems

Juan Galvis, Eric Chung, Yalchin Efendiev et al.

We review some important ideas in the design and analysis of robust overlapping domain decomposition algorithms for high-contrast multiscale problems and propose a domain decomposition method with better performance in terms of the number of iterations. The main novelty of our approaches is the construction of coarse spaces, which are computed using spectral information of local bilinear forms. We present several approaches to incorporate the spectral information into the coarse problem in order to obtain minimal coarse space dimension. We show that using these coarse spaces, we can obtain a domain decomposition preconditioner with the condition number independent of contrast and small scales. To further reduce the number of iterations until convergence, we combine these minimal-dimensional coarse spaces with large-overlap local problems that take advantage of the possibility of localizing global fields orthogonal to the coarse space. We obtain a condition number close to 1 for the new method. We discuss possible drawbacks and further extensions.

CR · May 27, 2015
DiscoverFriends: Secure Social Network Communication in Mobile Ad Hoc Networks

Joshua Joy, Eric Chung, Zengwen Yuan et al.

This paper presents a secure communication application called DiscoverFriends. Its purpose is to securely communicate to a group of online friends while bypassing their respective social networking servers under a mobile ad hoc network environment. DiscoverFriends leverages Bloom filters and a hybrid encryption technique with a self-organized public-key management scheme to securely identify friends and provide authentication. Additionally, DiscoverFriends enables anonymous location check-ins by utilizing a new cryptographic primitive called Function Secret Sharing. Finally, to the best of our knowledge, DiscoverFriends implements and evaluates the first Android multi-hop WiFi direct protocol using IPv6.
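The abstract above names Bloom filters as one of the building blocks for identifying friends. As a reminder of how that data structure behaves, here is a minimal sketch of a Bloom filter with salted SHA-256 hashes. The sizes, the hashing scheme, and the class name are assumptions of this sketch; it does not reproduce how DiscoverFriends combines the filter with encryption or key management.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: no false negatives, possible false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

A device could, for example, add identifiers of its friends to the filter and share only the bit array; a peer can then test its own identifier with `might_contain` without seeing the full list. An item that was added always tests positive, while an item that was never added usually, but not always, tests negative.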

NA · Aug 3, 2015
Sparse Generalized Multiscale Finite Element Methods and their applications

Eric Chung, Yalchin Efendiev, Wing Tat Leung et al.

In a number of previous papers, local (coarse grid) multiscale model reduction techniques are developed using a Generalized Multiscale Finite Element Method. In these approaches, multiscale basis functions are constructed using local snapshot spaces, where a snapshot space is a large space that represents the solution behavior in a coarse block. In a number of applications (e.g., those discussed in the paper), one may have a sparsity in the snapshot space for an appropriate choice of a snapshot space. More precisely, the solution may only involve a portion of the snapshot space. In this case, one can use sparsity techniques to identify multiscale basis functions. In this paper, we consider two such sparse local multiscale model reduction approaches. In the first approach (which is used for parameter-dependent multiscale PDEs), we use local minimization techniques, such as sparse POD, to identify multiscale basis functions, which are sparse in the snapshot space. These minimization techniques use $l_1$ minimization to find local multiscale basis functions, which are further used for finding the solution. In the second approach (which is used for the Helmholtz equation), we directly apply $l_1$ minimization techniques to solve the underlying PDEs. This approach is more expensive as it involves a large snapshot space; however, in this example, we can not identify a local minimization principle, such as local generalized SVD.