Marco Pistoia

QUANT-PH
h-index68
22papers
489citations
Novelty52%
AI Score50

22 Papers

QUANT-PHMar 29, 2023
Quantum Deep Hedging

El Amine Cherrat, Snehal Raj, Iordanis Kerenidis et al.

Quantum machine learning has the potential for a transformative impact across industry sectors and in particular in finance. In our work we look at the problem of hedging where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to $16$ qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.

LGOct 18, 2022
MaSS: Multi-attribute Selective Suppression

Chun-Fu Chen, Shaohan Hu, Zhonghao Shi et al.

The recent rapid advances in machine learning technologies largely depend on the vast richness of data available today, in terms of both the quantity and the rich content contained within. For example, biometric data such as images and voices could reveal people's attributes like age, gender, sentiment, and origin, whereas location/motion data could be used to infer people's activity levels, transportation modes, and life habits. Along with the new services and applications enabled by such technological advances, various governmental policies are put in place to regulate such data usage and protect people's privacy and rights. As a result, data owners often opt for simple data obfuscation (e.g., blur people's faces in images) or withholding data altogether, which leads to severe data quality degradation and greatly limits the data's potential utility. Aiming for a sophisticated mechanism which gives data owners fine-grained control while retaining the maximal degree of data utility, we propose Multi-attribute Selective Suppression, or MaSS, a general framework for performing precisely targeted data surgery to simultaneously suppress any selected set of attributes while preserving the rest for downstream machine learning tasks. MaSS learns a data modifier through adversarial games between two sets of networks, where one is aimed at suppressing selected attributes, and the other ensures the retention of the rest of the attributes via general contrastive loss as well as explicit classification metrics. We carried out an extensive evaluation of our proposed method using multiple datasets from different domains including facial images, voice audio, and video clips, and obtained promising results in MaSS' generalizability and capability of suppressing targeted attributes without negatively affecting the data's usability in other downstream ML tasks.

QUANT-PHNov 29, 2022
Numerical evidence against advantage with quantum fidelity kernels on classical data

Lucas Slattery, Ruslan Shaydulin, Shouvanik Chakrabarti et al.

Quantum machine learning techniques are commonly considered one of the most promising candidates for demonstrating practical quantum advantage. In particular, quantum kernel methods have been demonstrated to be able to learn certain classically intractable functions efficiently if the kernel is well-aligned with the target function. In the more general case, quantum kernels are known to suffer from exponential "flattening" of the spectrum as the number of qubits grows, preventing generalization and necessitating the control of the inductive bias by hyperparameters. We show that the general-purpose hyperparameter tuning techniques proposed to improve the generalization of quantum kernels lead to the kernel becoming well-approximated by a classical kernel, removing the possibility of quantum advantage. We provide extensive numerical evidence for this phenomenon utilizing multiple previously studied quantum feature maps and both synthetic and real data. Our results show that unless novel techniques are developed to control the inductive bias of quantum kernels, they are unlikely to provide a quantum advantage on classical data.

36.7QUANT-PHMay 4
Iterative Interpolation Schedules for Quantum Approximate Optimization Algorithm

Anuj Apte, Shree Hari Sureshbabu, Ruslan Shaydulin et al.

Quantum Approximate Optimization Algorithm (QAOA) is a promising quantum heuristic with empirical evidence of speedup over classical state-of-the-art for some problems. QAOA uses a parameterized circuit with $p$ layers, where higher $p$ yields better solutions, but requires optimizing $2p$ independent parameters, which is challenging at large $p$. We present an iterative interpolation method that exploits the smoothness of optimal parameter schedules by expressing them in a basis of orthogonal functions, generalizing the work of Zhou et al. By optimizing a small number of basis coefficients and iteratively increasing both circuit depth and coefficient count until convergence, our method constructs high-quality schedules for large $p$. We provide theoretical justification using Jackson's theorem and Lipschitz continuity to bound the required number of basis coefficients for a given accuracy. Our approach achieves better performance with fewer optimization steps than existing methods across three benchmark problems: the Sherrington-Kirkpatrick (SK) model, portfolio optimization, and Low Autocorrelation Binary Sequences (LABS). For the largest LABS instance, we achieve near-optimal merit factors with schedules exceeding 1000 layers, an order of magnitude beyond previous methods. Additionally, we observe that a mild growth in QAOA depth suffices to solve the SK model exactly, a result of independent theoretical interest.

93.9QUANT-PHApr 29
Iceberg Beyond the Tip: Co-Compilation of a Quantum Error Detection Code and a Quantum Algorithm

Yuwei Jin, Zichang He, Tianyi Hao et al.

The rapid progress in quantum hardware is expected to make them viable tools for the study of quantum algorithms in the near term. The timeline to useful algorithmic experimentation can be accelerated by techniques that use many noisy shots to produce an accurate estimate of the observable of interest. One such technique is to encode the quantum circuit using an error detection code and discard the samples for which an error has been detected. An underexplored property of error-detecting codes is the flexibility in the circuit encoding and fault-tolerant gadgets, which enables their co-optimization with the algorthmic circuit. However, standard circuit optimization tools cannot be used to exploit this flexibility as optimization must preserve the fault-tolerance of the gadget. In this work, we focus on the $[[k+2, k, 2]]$ Iceberg quantum error detection code, which is tailored to trapped-ion quantum processors. We design new flexible fault-tolerant gadgets for the Iceberg code, which we then co-optimize with the algorithmic circuit for the quantum approximate optimization algorithm (QAOA) using tree search. By co-optimizing the QAOA circuit and the Iceberg gadgets, we achieve an improvement in QAOA success probability from $44\%$ to $65\%$ and an increase in post-selection rate from $4\%$ to $33\%$ at 22 algorithmic qubits, utilizing 330 algorithmic two-qubit gates and 744 physical two-qubit gates on the Quantinuum H2-1 quantum computer, compared to the previous state-of-the-art hardware demonstration. Furthermore, we demonstrate better-than-unencoded performance for up to 34 algorithmic qubits, employing 510 algorithmic two-qubit gates and 1140 physical two-qubit gates.

QUANT-PHOct 19, 2023
Blind quantum machine learning with quantum bipartite correlator

Changhao Li, Boning Li, Omar Amer et al.

Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.

QUANT-PHSep 22, 2023
Expressive variational quantum circuits provide inherent privacy in federated learning

Niraj Kumar, Jamie Heredge, Changhao Li et al.

Federated learning has emerged as a viable distributed solution to train machine learning models without the actual need to share data with the central aggregator. However, standard neural network-based federated learning models have been shown to be susceptible to data leakage from the gradients shared with the server. In this work, we introduce federated learning with variational quantum circuit model built using expressive encoding maps coupled with overparameterized ansätze. We show that expressive maps lead to inherent privacy against gradient inversion attacks, while overparameterization ensures model trainability. Our privacy framework centers on the complexity of solving the system of high-degree multivariate Chebyshev polynomials generated by the gradients of quantum circuit. We present compelling arguments highlighting the inherent difficulty in solving these equations, both in exact and approximate scenarios. Additionally, we delve into machine learning-based attack strategies and establish a direct connection between overparameterization in the original federated learning model and underparameterization in the attack model. Furthermore, we provide numerical scaling arguments showcasing that underparameterization of the expressive map in the attack model leads to the loss landscape being swamped with exponentially many spurious local minima points, thus making it extremely hard to realize a successful attack. This provides a strong claim, for the first time, that the nature of quantum machine learning models inherently helps prevent data leakage in federated learning.

QUANT-PHSep 18, 2023
Des-q: a quantum algorithm to provably speedup retraining of decision trees

Niraj Kumar, Romina Yalovetzky, Changhao Li et al.

Decision trees are widely adopted machine learning models due to their simplicity and explainability. However, as training data size grows, standard methods become increasingly slow, scaling polynomially with the number of training examples. In this work, we introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks. Assuming the data stream produces small, periodic increments of new training examples, Des-q significantly reduces the tree retraining time. Des-q achieves a logarithmic complexity in the combined total number of old and new examples, even accounting for the time needed to load the new samples into quantum-accessible memory. Our approach to grow the tree from any given node involves performing piecewise linear splits to generate multiple hyperplanes, thus partitioning the input feature space into distinct regions. To determine the suitable anchor points for these splits, we develop an efficient quantum-supervised clustering method, building upon the q-means algorithm introduced by Kerenidis et al. We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets and observe that our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.

LGMay 23, 2024Code
MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective

Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu et al.

The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people's private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been proposed towards data privacy protection. However, they bear various limitations such as leading to degraded data availability and utility, or relying on heuristics and lacking solid theoretical bases. To overcome these limitations, we propose a formal information-theoretic definition for this utility-preserving privacy protection problem, and design a data-driven learnable data transformation framework that is capable of selectively suppressing sensitive attributes from target datasets while preserving the other useful attributes, regardless of whether or not they are known in advance or explicitly annotated for preservation. We provide rigorous theoretical analyses on the operational bounds for our framework, and carry out comprehensive experimental evaluations using datasets of a variety of modalities, including facial images, voice audio clips, and human activity motion sensor signals. Results demonstrate the effectiveness and generalizability of our method under various configurations on a multitude of tasks. Our code is available at https://github.com/jpmorganchase/MaSS.

CVDec 2, 2019Code
More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

Quanfu Fan, Chun-Fu Chen, Hilde Kuehne et al.

Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets. This results in a need for large GPU clusters to train and evaluate such architectures. To address this problem, we present a lightweight and memory-friendly architecture for action recognition that performs on par with or better than current architectures by using only a fraction of resources. The proposed architecture is based on a combination of a deep subnet operating on low-resolution frames with a compact subnet operating on high-resolution frames, allowing for high efficiency and accuracy at the same time. We demonstrate that our approach achieves a reduction by $3\sim4$ times in FLOPs and $\sim2$ times in memory usage compared to the baseline. This enables training deeper models with more input frames under the same computational budget. To further obviate the need for large-scale 3D convolutions, a temporal aggregation module is proposed to model temporal dependencies in a video at very small additional computational costs. Our models achieve strong performance on several action recognition benchmarks including Kinetics, Something-Something and Moments-in-time. The code and models are available at https://github.com/IBM/bLVNet-TAM.

QUANT-PHDec 7, 2023
Privacy-preserving quantum federated learning via gradient hiding

Changhao Li, Niraj Kumar, Zhixin Song et al.

Distributed quantum computing, particularly distributed quantum machine learning, has gained substantial prominence for its capacity to harness the collective power of distributed quantum resources, transcending the limitations of individual quantum nodes. Meanwhile, the critical concern of privacy within distributed computing protocols remains a significant challenge, particularly in standard classical federated learning (FL) scenarios where data of participating clients is susceptible to leakage via gradient inversion attacks by the server. This paper presents innovative quantum protocols with quantum communication designed to address the FL problem, strengthen privacy measures, and optimize communication efficiency. In contrast to previous works that leverage expressive variational quantum circuits or differential privacy techniques, we consider gradient information concealment using quantum states and propose two distinct FL protocols, one based on private inner-product estimation and the other on incremental learning. These protocols offer substantial advancements in privacy preservation with low communication resources, forging a path toward efficient quantum communication-assisted FL protocols and contributing to the development of secure distributed quantum machine learning, thus addressing critical privacy concerns in the quantum computing era.

QUANT-PHMay 14, 2024
Prospects of Privacy Advantage in Quantum Machine Learning

Jamie Heredge, Niraj Kumar, Dylan Herman et al.

Ensuring data privacy in machine learning models is critical, particularly in distributed settings where model gradients are typically shared among multiple parties to allow collaborative learning. Motivated by the increasing success of recovering input data from the gradients of classical models, this study addresses a central question: How hard is it to recover the input data from the gradients of quantum machine learning models? Focusing on variational quantum circuits (VQC) as learning models, we uncover the crucial role played by the dynamical Lie algebra (DLA) of the VQC ansatz in determining privacy vulnerabilities. While the DLA has previously been linked to the classical simulatability and trainability of VQC models, this work, for the first time, establishes its connection to the privacy of VQC models. In particular, we show that properties conducive to the trainability of VQCs, such as a polynomial-sized DLA, also facilitate the extraction of detailed snapshots of the input. We term this a weak privacy breach, as the snapshots enable training VQC models for distinct learning tasks without direct access to the original input. Further, we investigate the conditions for a strong privacy breach where the original input data can be recovered from these snapshots by classical or quantum-assisted polynomial time methods. We establish conditions on the encoding map such as classical simulatability, overlap with DLA basis, and its Fourier frequency characteristics that enable such a privacy breach of VQC models. Our findings thus play a crucial role in detailing the prospects of quantum privacy advantage by guiding the requirements for designing quantum machine learning models that balance trainability with robust privacy protection.

LGMay 23, 2024
Model-Agnostic Utility-Preserving Biometric Information Anonymization

Chun-Fu Chen, Bill Moriarty, Shaohan Hu et al.

The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophisticated analytics. While providing better user experiences and deeper business insights, the use of biometrics has raised serious privacy concerns due to their intrinsic sensitive nature and the accompanying high risk of leaking sensitive information such as identity or medical conditions. In this paper, we propose a novel modality-agnostic data transformation framework that is capable of anonymizing biometric data by suppressing its sensitive attributes and retaining features relevant to downstream machine learning-based analyses that are of research and business values. We carried out a thorough experimental evaluation using publicly available facial, voice, and motion datasets. Results show that our proposed framework can achieve a \highlight{high suppression level for sensitive information}, while at the same time retain underlying data utility such that subsequent analyses on the anonymized biometric data could still be carried out to yield satisfactory accuracy.

QUANT-PHApr 29, 2025
Provably faster randomized and quantum algorithms for $k$-means clustering via uniform sampling

Tyler Chen, Archan Ray, Akshay Seshadri et al.

The $k$-means algorithm (Lloyd's algorithm) is a widely used method for clustering unlabeled data. A key bottleneck of the $k$-means algorithm is that each iteration requires time linear in the number of data points, which can be expensive in big data applications. This was improved in recent works proposing quantum and quantum-inspired classical algorithms to approximate the $k$-means algorithm locally, in time depending only logarithmically on the number of data points (along with data dependent parameters) [q-means: A quantum algorithm for unsupervised machine learning, Kerenidis, Landman, Luongo, and Prakash, NeurIPS 2019; Do you know what $q$-means?, Cornelissen, Doriguello, Luongo, Tang, QTML 2025]. In this work, we describe a simple randomized mini-batch $k$-means algorithm and a quantum algorithm inspired by the classical algorithm. We demonstrate that the worst case guarantees of these algorithms can significantly improve upon the bounds for algorithms in prior work. Our improvements are due to a careful use of uniform sampling, which preserves certain symmetries of the $k$-means problem that are not preserved in previous algorithms that use data norm-based sampling.

LGApr 17, 2025
A Numerical Gradient Inversion Attack in Variational Quantum Neural-Networks

Georgios Papadopoulos, Shaltiel Eloul, Yash Satsangi et al.

The loss landscape of Variational Quantum Neural Networks (VQNNs) is characterized by local minima that grow exponentially with increasing qubits. Because of this, it is more challenging to recover information from model gradients during training compared to classical Neural Networks (NNs). In this paper we present a numerical scheme that successfully reconstructs input training, real-world, practical data from trainable VQNNs' gradients. Our scheme is based on gradient inversion that works by combining gradients estimation with the finite difference method and adaptive low-pass filtering. The scheme is further optimized with Kalman filter to obtain efficient convergence. Our experiments show that our algorithm can invert even batch-trained data, given the VQNN model is sufficiently over-parameterized.

LGJun 10, 2025
MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning

Javier Lopez-Piqueres, Pranav Deshpande, Archan Ray et al.

We present MetaTT, a Tensor Train (TT) adapter framework for fine-tuning of pre-trained transformers. MetaTT enables flexible and parameter-efficient model adaptation by using a single shared TT to factorize transformer sub-modules. This factorization indexes key structural dimensions, including layer and matrix type, and can optionally incorporate heads and tasks. This design allows MetaTT's parameter count to scale with the sum, rather than the product, of the modes, resulting in a substantially more compact adapter. Our benchmarks compare MetaTT with LoRA along with recent state-of-the-art matrix and tensor decomposition based fine-tuning methods. We observe that when tested on single-task standard language modeling benchmarks, MetaTT achieves competitive parameter efficiency to accuracy tradeoff. We further demonstrate that MetaTT performs competitively when compared to state-of-the-art methods on multi-task learning. Finally, we leverage the TT-ansatz to design a rank adaptive optimizer inspired by the DMRG method from many-body physics. Our results demonstrate that integrating this approach with AdamW enhances optimization performance for a specified target rank.

LGJun 5, 2025
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

Tyler Chen, Akshay Seshadri, Mattia J. Villani et al.

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring an exponential (in the feature dimension) number of model evaluations. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with and without replacement sampling strategies. We then prove strong non-asymptotic theoretical guarantees that apply to all estimators from our framework. This provides, to the best of our knowledge, the first theoretical guarantees for KernelSHAP and sheds further light on tradeoffs between existing estimators. Through comprehensive benchmarking on small and medium dimensional datasets for Decision-Tree models, we validate our approach against exact Shapley values, consistently achieving low mean squared error with modest sample sizes. Furthermore, we make specific implementation improvements to enable scalability of our methods to high-dimensional datasets. Our methods, tested on datasets such MNIST and CIFAR10, provide consistently better results compared to the KernelSHAP library.

QUANT-PHJun 17, 2024
QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest

Romina Yalovetzky, Niraj Kumar, Changhao Li et al.

Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models require to account for new training data to maintain model accuracy. This is particularly important in applications where data is periodically and sequentially generated over time in data streams, such as auto-driving systems, and credit card payments. In this setting, performing periodic model retraining with the old and new data accumulated is beneficial as it fully captures possible drifts in the data distribution over time. However, this is unpractical with state-of-the-art classical algorithms for RF as they scale linearly with the accumulated number of samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single tree construction and retraining proposed by Kumar et al. by expanding to multi-class classification, as the original proposal was limited to binary classes, and introducing an exact classical method to replace an underlying quantum subroutine incurring a finite error, while maintaining the same poly-logarithmic dependence. Finally, we showcase that QC-Forest achieves competitive accuracy in comparison to state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up the model retrain.

QUANT-PHFeb 15, 2022
Paving the Way towards 800 Gbps Quantum-Secured Optical Channel Deployment in Mission-Critical Environments

Marco Pistoia, Omar Amer, Monik R. Behera et al.

This article describes experimental research studies conducted towards understanding the implementation aspects of high-capacity quantum-secured optical channels in mission-critical metro-scale operational environments using Quantum Key Distribution (QKD) technology. To the best of our knowledge, this is the first time that an 800 Gbps quantum-secured optical channel -- along with several other Dense Wavelength Division Multiplexed (DWDM) channels on the C-band and multiplexed with the QKD channel on the O-band -- was established at distances up to 100 km, with secret key-rates relevant for practical industry use cases. In addition, during the course of these trials, transporting a blockchain application over this established channel was utilized as a demonstration of securing a financial transaction in transit over a quantum-secured optical channel. The findings of this research pave the way towards the deployment of QKD-secured optical channels in high-capacity, metro-scale, mission-critical operational environments, such as Inter-Data Center Interconnects.

QUANT-PHSep 9, 2021
Quantum Machine Learning for Finance

Marco Pistoia, Syed Farhan Ahmad, Akshay Ajagekar et al.

Quantum computers are expected to surpass the computational capabilities of classical computers during this decade, and achieve disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from Quantum Computing not only in the medium and long terms, but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with particular focus to those use cases that can be solved via Machine Learning.

QUANT-PHOct 21, 2019
A Domain-agnostic, Noise-resistant, Hardware-efficient Evolutionary Variational Quantum Eigensolver

Arthur G. Rattew, Shaohan Hu, Marco Pistoia et al.

Variational quantum algorithms have shown promise in numerous fields due to their versatility in solving problems of scientific and commercial interest. However, leading algorithms for Hamiltonian simulation, such as the Variational Quantum Eigensolver (VQE), use fixed preconstructed ansatzes, limiting their general applicability and accuracy. Thus, variational forms---the quantum circuits that implement ansatzes ---are either crafted heuristically or by encoding domain-specific knowledge. In this paper, we present an Evolutionary Variational Quantum Eigensolver (EVQE), a novel variational algorithm that uses evolutionary programming techniques to minimize the expectation value of a given Hamiltonian by dynamically generating and optimizing an ansatz. The algorithm is equally applicable to optimization problems in all domains, obtaining accurate energy evaluations with hardware-efficient ansatzes. In molecular simulations, the variational forms generated by EVQE are up to $18.6\times$ shallower and use up to $12\times$ fewer CX gates than those obtained by VQE with a unitary coupled cluster ansatz. EVQE demonstrates significant noise-resistance properties, obtaining results in noisy simulation with at least $3.6\times$ less error than VQE using any tested ansatz configuration. We successfully evaluated EVQE on a real 5-qubit IBMQ quantum computer. The experimental results, which we obtained both via simulation and on real quantum hardware, demonstrate the effectiveness of EVQE for general-purpose optimization on the quantum computers of the present and near future.

CVAug 7, 2018
Efficient Fusion of Sparse and Complementary Convolutions

Chun-Fu Chen, Quanfu Fan, Marco Pistoia et al.

We propose a new method to create compact convolutional neural networks (CNNs) by exploiting sparse convolutions. Different from previous works that learn sparsity in models, we directly employ hand-crafted kernels with regular sparse patterns, which result in the computational gain in practice without sophisticated and dedicated software or hardware. The core of our approach is an efficient network module that linearly combines sparse kernels to yield feature representations as strong as those from regular kernels. We integrate this module into various network architectures and demonstrate its effectiveness on three vision tasks, object classification, localization and detection. For object classification and localization, our approach achieves comparable or better performance than several baselines and related works while providing lower computational costs with fewer parameters (on average, a $2-4\times$ reduction of convolutional parameters and computation). For object detection, our approach leads to a VGG-16-based Faster RCNN detector that is 12.4$\times$ smaller and about 3$\times$ faster than the baseline.