Ming Yan

h-index: 24

25 papers

1,318 citations

Novelty: 53%

AI Score: 47

Ranked #33,525 of 194,257 authors (top 17%); #7,909 in LG (top 20%)

25 Papers

5.2 OC May 16, 2017

Fast L1-L2 minimization via a proximal operator

Yifei Lou, Ming Yan

This paper aims to develop new and fast algorithms for recovering a sparse vector from a small number of measurements, which is a fundamental problem in the field of compressive sensing (CS). Currently, CS favors incoherent systems, in which any two measurements are as little correlated as possible. In reality, however, many problems are coherent, and conventional methods such as $L_1$ minimization do not work well. Recently, the difference of the $L_1$ and $L_2$ norms, denoted as $L_1$-$L_2$, has been shown to outperform the classic $L_1$ method, but it is computationally expensive. We derive an analytical solution for the proximal operator of the $L_1$-$L_2$ metric, which makes fast $L_1$ solvers such as forward-backward splitting (FBS) and the alternating direction method of multipliers (ADMM) applicable to $L_1$-$L_2$. We describe in detail how to incorporate the proximal operator into FBS and ADMM and show that the resulting algorithms are convergent under mild conditions. In the numerical experiments, both algorithms are shown to be much more efficient than the original implementation of $L_1$-$L_2$ based on a difference-of-convex approach.
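As a rough illustration of how a proximal operator plugs into forward-backward splitting, the sketch below runs FBS on the classic $L_1$ problem, using soft-thresholding as the proximal step. This is the $L_1$ baseline only, not the paper's method: the analytical $L_1$-$L_2$ operator is not reproduced here, and the function names, step-size rule, and iteration count are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fbs_l1(A, b, lam, iters=500):
    """Forward-backward splitting for 0.5*||Ax - b||^2 + lam*||x||_1.

    Assumed setup for illustration; an L1-L2 variant would swap
    soft_threshold for the corresponding L1-L2 proximal operator.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                         # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)  # backward (proximal) step
    return x
```

With an identity measurement matrix the iteration reduces to a single soft-thresholding of `b`, which is a convenient sanity check.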

6.5 CV Sep 16, 2024

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

Qi Qian, Haiyang Xu, Ming Yan et al.

Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the accumulated error leads to undesirable performance. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default settings can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.

2.4 AI Feb 6

An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization

Jin Wang, Hui Ma, Fei Xing et al.

Federated learning enables collaborative model training across distributed clients while preserving data privacy. However, in practical deployments, device heterogeneity and non-independent and identically distributed (non-IID) data often lead to highly unstable and biased gradient updates. When differential privacy is enforced, conventional fixed gradient clipping and Gaussian noise injection may further amplify gradient perturbations, resulting in training oscillation and degraded model performance. To address these challenges, we propose an adaptive differentially private federated learning framework that explicitly targets model efficiency under heterogeneous and privacy-constrained settings. On the client side, a lightweight local compressed module is introduced to regularize intermediate representations and constrain gradient variability, thereby mitigating noise amplification during local optimization. On the server side, an adaptive gradient clipping strategy dynamically adjusts clipping thresholds based on historical update statistics to avoid over-clipping and noise domination. Furthermore, a constraint-aware aggregation mechanism is designed to suppress unreliable or noise-dominated client updates and stabilize global optimization. Extensive experiments on CIFAR-10 and SVHN demonstrate improved convergence stability and classification accuracy.

27.8 LG Jul 5, 2021 Code

Elastic Graph Neural Networks

Xiaorui Liu, Wei Jin, Yao Ma et al.

While many existing graph neural networks (GNNs) have been proven to perform $\ell_2$-based graph smoothing that enforces smoothness globally, in this work we aim to further enhance the local smoothness adaptivity of GNNs via $\ell_1$-based graph smoothing. As a result, we introduce a family of GNNs (Elastic GNNs) based on $\ell_1$ and $\ell_2$-based graph smoothing. In particular, we propose a novel and general message passing scheme for GNNs. This message passing algorithm is not only friendly to back-propagation training but also achieves the desired smoothing properties with a theoretical convergence guarantee. Experiments on semi-supervised learning tasks demonstrate that the proposed Elastic GNNs obtain better adaptivity on benchmark datasets and are significantly more robust to graph adversarial attacks. The implementation of Elastic GNNs is available at https://github.com/lxiaorui/ElasticGNN.

14.4 LG Jun 24, 2025

Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective

Chenhao Si, Ming Yan

Physics-informed neural networks (PINNs) are extensively employed to solve partial differential equations (PDEs) by ensuring that the outputs and gradients of deep learning models adhere to the governing equations. However, constrained by computational limitations, PINNs are typically optimized using a finite set of points, which poses significant challenges in guaranteeing their convergence and accuracy. In this study, we propose a new weighting scheme that adaptively shifts the loss weights from isolated points to their continuous neighborhood regions. The empirical results show that our weighting scheme reduces the relative $L^2$ errors.

9.2 LG Aug 10, 2021

Decentralized Composite Optimization with Compression

Yao Li, Xiaorui Liu, Jiliang Tang et al.

Decentralized optimization and communication compression have shown great potential in accelerating distributed machine learning by mitigating the communication bottleneck in practice. While existing decentralized algorithms with communication compression mostly focus on problems with only smooth components, we study the decentralized stochastic composite optimization problem with a potentially non-smooth component. A Proximal gradient LinEAr convergent Decentralized algorithm with compression, Prox-LEAD, is proposed with rigorous theoretical analyses in the general stochastic setting and the finite-sum setting. Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free. The superiority of the proposed algorithm is demonstrated through comparison with state-of-the-art algorithms in terms of convergence complexities and numerical experiments. Our algorithmic framework also sheds light on compressed communication for other primal-dual algorithms by reducing the impact of inexact iterations, which might be of independent interest.

4.0 OC Jul 26, 2021

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs

Zhuoqing Song, Lei Shi, Shi Pu et al.

We consider the decentralized optimization problem, where a network of $n$ agents aims to collaboratively minimize the average of their individual smooth and convex objective functions through peer-to-peer communication in a directed graph. To tackle this problem, we propose two accelerated gradient tracking methods, namely APD and APD-SC, for non-strongly convex and strongly convex objective functions, respectively. We show that APD and APD-SC converge at the rates $O\left(\frac{1}{k^2}\right)$ and $O\left(\left(1 - C\sqrt{\frac{\mu}{L}}\right)^k\right)$, respectively, up to constant factors depending only on the mixing matrix. APD and APD-SC are the first decentralized methods over unbalanced directed graphs that achieve the same provable acceleration as centralized methods. Numerical experiments demonstrate the effectiveness of both methods.

13.6 OC Jun 14, 2021

Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks

Zhuoqing Song, Lei Shi, Shi Pu et al.

In this paper, we propose two communication-efficient decentralized optimization algorithms over a general directed multi-agent network. The first algorithm, termed Compressed Push-Pull (CPP), combines the gradient tracking Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves a linear convergence rate for strongly convex and smooth objective functions. The second algorithm is a broadcast-like version of CPP (B-CPP), and it also achieves a linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduces communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
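To make "unbiased compression operator" concrete, here is a minimal sketch of one well-known example, random-k sparsification: keep k randomly chosen coordinates and rescale them by d/k so that the expectation of the output equals the input. The abstract does not say which operators the paper uses, so the choice of operator and the name `rand_k` are assumptions for illustration.

```python
import numpy as np

def rand_k(x, k, rng):
    """Random-k sparsification (illustrative, not taken from the paper).

    Each coordinate is kept with probability k/d and scaled by d/k,
    so the compressed vector equals x in expectation.
    """
    d = x.size
    out = np.zeros(d, dtype=float)
    idx = rng.choice(d, size=k, replace=False)  # coordinates to transmit
    out[idx] = x[idx] * (d / k)                 # rescale to remove the bias
    return out

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0, 4.0])
c = rand_k(x, 2, rng)  # only 2 of the 4 entries are nonzero
```

Only the k selected values and their indices need to be communicated, which is where the saving comes from; the price is extra variance that the algorithm has to absorb.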

2.3 NA Aug 18, 2020 Code

Fast algorithms for robust principal component analysis with an upper bound on the rank

Ningyu Sha, Lei Shi, Ming Yan

Robust principal component analysis (RPCA) decomposes a data matrix into a low-rank part and a sparse part. There are mainly two types of algorithms for RPCA. The first type applies regularization terms to the singular values of a matrix to obtain a low-rank matrix. However, calculating singular values can be very expensive for large matrices. The second type replaces the low-rank matrix with the product of two small matrices. These algorithms are faster than the first type because no singular value decomposition (SVD) is required. However, the rank of the low-rank matrix must be specified, and an accurate rank estimate is needed to obtain a reasonable solution. In this paper, we propose algorithms that combine both types. Our proposed algorithms require only an upper bound on the rank and SVDs of small matrices. First, they are faster than the first type because the cost of SVD on small matrices is negligible. Second, they are more robust than the second type because an upper bound on the rank, instead of the exact rank, is required. Furthermore, we apply the Gauss-Newton method to increase the speed of our algorithms. Numerical experiments show the better performance of our proposed algorithms.

15.3 LG Jul 1, 2020

Linear Convergent Decentralized Optimization with Compression

Xiaorui Liu, Yao Li, Rongrong Wang et al.

Communication compression has become a key strategy to speed up distributed optimization. However, existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms. They are unsatisfactory in terms of convergence rate, stability, and the capability to handle heterogeneous data. Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD. Our theory describes the coupled dynamics of the inexact primal and dual update as well as compression error, and we provide the first consensus error bound in such settings without assuming bounded gradients. Experiments on convex problems validate our theoretical analysis, and empirical study on deep neural nets shows that LEAD is applicable to non-convex problems.

1.8 LG Nov 10, 2019

Manifold Denoising by Nonlinear Robust Principal Component Analysis

He Lyu, Ningyu Sha, Shuyang Qin et al.

This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low dimensional manifold. Is it possible to separate them by using similar ideas as RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.

18.4 LG Oct 16, 2019

A Double Residual Compression Algorithm for Efficient Distributed Learning

Xiaorui Liu, Yao Li, Jiliang Tang et al.

Large-scale machine learning models are often trained by parallel stochastic gradient descent algorithms. However, the communication cost of gradient aggregation and model synchronization between the master and worker nodes becomes the major obstacle for efficient learning as the number of workers and the dimension of the model increase. In this paper, we propose DORE, a DOuble REsidual compression stochastic gradient descent algorithm, to reduce over $95\%$ of the overall communication such that the obstacle can be immensely mitigated. Our theoretical analyses demonstrate that the proposed strategy has superior convergence properties for both strongly convex and nonconvex objective functions. The experimental results validate that DORE achieves the best communication efficiency while maintaining similar model accuracy and convergence speed in comparison with state-of-the-art baselines.

7.6 OC Jun 17, 2019

On linear convergence of two decentralized algorithms

Yao Li, Ming Yan

Decentralized algorithms solve multi-agent problems over a connected network, where the information can only be exchanged with the accessible neighbors. Though there exist several decentralized optimization algorithms, there are still gaps in convergence conditions and rates between decentralized and centralized algorithms. In this paper, we fill some gaps by considering two decentralized algorithms: EXTRA and NIDS. They both converge linearly with strongly convex objective functions. We will answer two questions regarding them. What are the optimal upper bounds for their stepsizes? Do decentralized algorithms require more properties on the functions for linear convergence than centralized ones? More specifically, we relax the required conditions for linear convergence of both algorithms. For EXTRA, we show that the stepsize is comparable to that of centralized algorithms. For NIDS, the upper bound of the stepsize is shown to be exactly the same as the centralized ones. In addition, we relax the requirement for the objective functions and the mixing matrices. We provide the linear convergence results for both algorithms under the weakest conditions.
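For readers unfamiliar with EXTRA, the following is a minimal sketch of its commonly cited recursion, $x^{k+2} = (I+W)x^{k+1} - \tilde{W}x^k - \alpha(\nabla f(x^{k+1}) - \nabla f(x^k))$ with $\tilde{W} = (I+W)/2$. The abstract does not state the iteration, so the recursion, the four-agent ring network, the scalar quadratic objectives $f_i(x) = \frac{1}{2}(x - b_i)^2$, and the stepsize are all assumptions made for this example.

```python
import numpy as np

def extra(W, b, alpha, iters=200):
    """EXTRA on f_i(x) = 0.5*(x - b_i)^2, one scalar variable per agent.

    W is a symmetric doubly stochastic mixing matrix (assumed setup).
    Returns the agents' local copies, which should agree on mean(b).
    """
    n = b.size
    I = np.eye(n)
    W_tilde = (I + W) / 2.0
    grad = lambda x: x - b                  # stacked local gradients
    x_prev = np.zeros(n)
    x = W @ x_prev - alpha * grad(x_prev)   # first step: plain decentralized gradient step
    for _ in range(iters):
        x_next = (I + W) @ x - W_tilde @ x_prev - alpha * (grad(x) - grad(x_prev))
        x_prev, x = x, x_next
    return x

# Ring of 4 agents: weight 1/2 on self, 1/4 on each of the two neighbors.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
b = np.array([1.0, 2.0, 3.0, 6.0])
x = extra(W, b, alpha=0.5)
```

Each agent only mixes with its neighbors through `W`, yet every local copy converges to the minimizer of the sum of the local objectives, which for these quadratics is the mean of `b`.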

29.2 DC Mar 19, 2018

D$^2$: Decentralized Training over Decentralized Data

Hanlin Tang, Xiangru Lian, Ming Yan et al.

When training a machine learning model using multiple workers, each of which collects data from its own data sources, it would be most useful if the data collected by different workers could be unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too different. In this paper, we ask the question: Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers? We present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, "decentralized" data). The core of D$^2$ is a variance reduction extension of the standard D-PSGD algorithm, which improves the convergence rate from $O\left(\frac{\sigma}{\sqrt{nT}} + \frac{(n\zeta^2)^{1/3}}{T^{2/3}}\right)$ to $O\left(\frac{\sigma}{\sqrt{nT}}\right)$, where $\zeta^{2}$ denotes the variance among data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluated D$^2$ on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.

11.3 IR Aug 15, 2017

Ensemble Methods for Personalized E-Commerce Search Challenge at CIKM Cup 2016

Chen Wu, Ming Yan, Luo Si

Personalized search has been a hot research topic for many years and has been widely used in e-commerce. This paper describes our solution to the personalized e-commerce search challenge at CIKM Cup 2016. The goal of this competition is to predict search relevance and re-rank the result items in the SERP according to personalized search, browsing, and purchasing preferences. Based on a detailed analysis of the provided data, we extract three different types of features, i.e., statistic features, query-item features, and session features. Different models are used on these features, including logistic regression, gradient boosted decision trees, rank SVM, and a novel deep match model. With the blending of multiple models, a stacking ensemble model is built to integrate the output of individual models and produce a more accurate prediction result. Based on these efforts, our solution won first place in the competition on all the evaluation metrics.

4.1 ML Jul 18, 2017

Exploring Outliers in Crowdsourced Ranking for QoE

Qianqian Xu, Ming Yan, Chendi Huang et al.

Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years. In this paper, we propose some simple and fast algorithms for outlier detection and robust QoE evaluation based on the nonconvex optimization principle. Several iterative procedures are designed with or without knowing the number of outliers in samples. Theoretical analysis is given to show that such procedures can reach statistically good estimates under mild conditions. Finally, experimental results with simulated and real-world crowdsourcing datasets show that the proposed algorithms achieve performance similar to the Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up, without or with prior knowledge of the sparsity size of outliers, respectively. Therefore, the proposed methodology provides a set of helpful tools for robust QoE evaluation with crowdsourcing data.

5.7 LG Jun 4, 2017

Nonconvex penalties with analytical solutions for one-bit compressive sensing

Xiaolin Huang, Ming Yan

One-bit measurements widely exist in the real world, and they can be used to recover sparse signals. This task is known as the problem of learning halfspaces in learning theory and one-bit compressive sensing (1bit-CS) in signal processing. In this paper, we propose novel algorithms based on both convex and nonconvex sparsity-inducing penalties for robust 1bit-CS. We provide a sufficient condition to verify whether a solution is globally optimal or not. Then we show that the globally optimal solution for positive homogeneous penalties can be obtained in two steps: a proximal operator and a normalization step. For several nonconvex penalties, including minimax concave penalty (MCP), $\ell_0$ norm, and sorted $\ell_1$ penalty, we provide fast algorithms for finding the analytical solutions by solving the dual problem. Specifically, our algorithm is more than $200$ times faster than the existing algorithm for MCP. Its efficiency is comparable to the algorithm for the $\ell_1$ penalty in time, while its performance is much better. Among these penalties, the sorted $\ell_1$ penalty is most robust to noise in different settings.
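The two-step structure mentioned above (a proximal operator followed by a normalization step) can be sketched for the simplest case, the $\ell_1$ penalty. The abstract does not spell out the objective, so this example assumes the linear-loss model $\max_x \frac{1}{m} y^\top A x - \lambda\|x\|_1$ subject to $\|x\|_2 \le 1$, for which soft-thresholding followed by normalization is a widely used closed form. The function names, problem sizes, and value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def one_bit_l1_recover(A, y, lam):
    """Two-step recovery from sign measurements y = sign(A x) (assumed model)."""
    m = A.shape[0]
    z = soft_threshold(A.T @ y / m, lam)    # step 1: proximal operator
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z      # step 2: normalization to the unit sphere

rng = np.random.default_rng(0)
n, m = 50, 2000
x_true = np.zeros(n)
x_true[[3, 17, 41]] = 1.0 / np.sqrt(3.0)    # unit-norm, 3-sparse signal
A = rng.standard_normal((m, n))
y = np.sign(A @ x_true)                     # only the signs are observed
x_hat = one_bit_l1_recover(A, y, lam=0.1)
```

Because sign measurements lose all scale information, the estimate is only meaningful up to normalization, which is why the second step projects onto the unit sphere.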

31.2 OC Apr 25, 2017

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

Zhi Li, Wei Shi, Ming Yan

This paper proposes a novel proximal-gradient algorithm for a decentralized optimization problem with a composite objective containing smooth and non-smooth terms. Specifically, the smooth and non-smooth terms are dealt with by gradient and proximal updates, respectively. The proposed algorithm is closely related to a previous algorithm, PG-EXTRA (Shi et al., 2015), but has a few advantages. First of all, agents use uncoordinated step-sizes, and the stable upper bounds on step-sizes are independent of network topologies. The step-sizes depend on local objective functions, and they can be as large as those of gradient descent. Secondly, for the special case without non-smooth terms, linear convergence can be achieved under the strong convexity assumption. The dependence of the convergence rate on the objective functions and on the network is separated, and the convergence rate of the new algorithm is as good as one of the two convergence rates that match the typical rates for general gradient descent and consensus averaging. We provide numerical experiments to demonstrate the efficacy of the introduced algorithm and validate our theoretical discoveries.

1.7 CV Jan 3, 2017

Mixed one-bit compressive sensing with applications to overexposure correction for CT reconstruction

Xiaolin Huang, Yan Xia, Lei Shi et al.

When a measurement falls outside the quantization or measurable range, it becomes saturated and cannot be used in classical reconstruction methods. For example, in C-arm angiography systems, which provide projection radiography, fluoroscopy, and digital subtraction angiography and are widely used for medical diagnoses and interventions, the limited dynamic range of C-arm flat detectors leads to overexposure in some projections during an acquisition, such as when imaging relatively thin body parts (e.g., the knee). Aiming at overexposure correction for computed tomography (CT) reconstruction, in this paper we propose mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements. This method is inspired by recent progress on one-bit compressive sensing, which deals with only sign observations. Its successful applications imply that information carried by saturated measurements is useful for improving recovery quality. For the proposed M1bit-CS model, an alternating direction method of multipliers is developed and an iterative saturation detection scheme is established. Then we evaluate M1bit-CS on one-dimensional signal recovery tasks. In some experiments, the performance of the proposed algorithms on mixed measurements is almost the same as recovery from unsaturated ones with the same number of measurements. Finally, we apply the proposed method to overexposure correction for CT reconstruction on a phantom and a simulated clinical image. The results are promising, as the typical streaking artifacts and capping artifacts introduced by saturated projection data are effectively reduced, yielding significant error reduction compared with existing algorithms based on extrapolation.

19.4 OC Nov 29, 2016 Code

A new primal-dual algorithm for minimizing the sum of three functions with a linear operator

Ming Yan

In this paper, we propose a new primal-dual algorithm for minimizing $f(x) + g(x) + h(Ax)$, where $f$, $g$, and $h$ are proper lower semi-continuous convex functions, $f$ is differentiable with a Lipschitz continuous gradient, and $A$ is a bounded linear operator. The proposed algorithm has some famous primal-dual algorithms for minimizing the sum of two functions as special cases. For example, it reduces to the Chambolle-Pock algorithm when $f = 0$ and to the proximal alternating predictor-corrector when $g = 0$. For the general convex case, we prove the convergence of this new algorithm in terms of the distance to a fixed point by showing that the iteration is a nonexpansive operator. In addition, we prove the $O(1/k)$ ergodic convergence rate in the primal-dual gap. With additional assumptions, we derive the linear convergence rate in terms of the distance to the fixed point. Compared to other primal-dual algorithms for solving the same problem, this algorithm extends the range of acceptable parameters to ensure its convergence and has a smaller per-iteration cost. The numerical experiments show the efficiency of this algorithm.
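As a small illustration of the $f = 0$ special case, the sketch below runs the standard Chambolle-Pock iteration on a toy instance of $g(x) + h(Ax)$. It does not implement the paper's three-function algorithm. The problem choice is an assumption: $g(x) = \frac{1}{2}\|x - b\|^2$, $h = \lambda\|\cdot\|_1$, and $A = D$ a forward-difference matrix (1D total-variation denoising), with step sizes picked so that $\tau\sigma\|D\|^2 < 1$.

```python
import numpy as np

def chambolle_pock_tv(b, lam, tau=0.4, sigma=0.4, iters=2000):
    """Chambolle-Pock for 0.5*||x - b||^2 + lam*||D x||_1 (illustrative setup).

    ||D||^2 <= 4 for forward differences, so tau*sigma*||D||^2 <= 0.64 < 1.
    """
    n = b.size
    D = np.diff(np.eye(n), axis=0)   # (n-1) x n forward-difference matrix
    x = b.astype(float)
    x_bar = x.copy()
    y = np.zeros(n - 1)              # dual variable
    for _ in range(iters):
        # Dual step: the prox of sigma*h^* is the projection onto [-lam, lam].
        y = np.clip(y + sigma * (D @ x_bar), -lam, lam)
        # Primal step: prox of tau*g with g = 0.5*||x - b||^2.
        x_new = (x - tau * (D.T @ y) + tau * b) / (1.0 + tau)
        # Extrapolation step.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

b = np.array([0.0, 1.0, 2.0, 3.0])
x_flat = chambolle_pock_tv(b, lam=5.0)   # strong penalty: output is nearly constant
```

For a sufficiently large `lam` the total-variation term forces a constant signal, and the data term then selects the mean of `b`, which gives an easy check on the implementation.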

11.3 LG Sep 30, 2016 Code

Asynchronous Multi-Task Learning

Inci M. Baytas, Ming Yan, Anil K. Jain et al.

Many real-world machine learning applications involve several learning tasks which are inter-related. For example, in the healthcare domain, we need to learn a predictive model of a certain disease for many hospitals. The models for each hospital may be different because of the inherent differences in the distributions of the patient populations. However, the models are also closely related because the learning tasks model the same disease. By simultaneously learning all the tasks, the multi-task learning (MTL) paradigm performs inductive knowledge transfer among tasks to improve the generalization performance. When datasets for the learning tasks are stored at different locations, it may not always be feasible to transfer the data to provide a data-centralized computing environment due to various practical issues such as high data volume and privacy. In this paper, we propose a principled MTL framework for distributed and asynchronous optimization to address the aforementioned challenges. In our framework, the gradient update does not wait for the gradient information to be collected from all the tasks. Therefore, the proposed method is very efficient when the communication delay is too high for some task nodes. We show that many regularized MTL formulations can benefit from this framework, including the low-rank MTL for shared subspace learning. Empirical studies on both synthetic and real-world datasets demonstrate the efficiency and effectiveness of the proposed framework.

3.3 IT May 14, 2015

Pinball Loss Minimization for One-bit Compressive Sensing: Convex Models and Algorithms

Xiaolin Huang, Lei Shi, Ming Yan et al.

The one-bit quantization is implemented by one single comparator that operates at low power and a high rate. Hence one-bit compressive sensing (1bit-CS) becomes attractive in signal processing. When measurements are corrupted by noise during signal acquisition and transmission, 1bit-CS is usually modeled as minimizing a loss function with a sparsity constraint. The one-sided $\ell_1$ loss and the linear loss are two popular loss functions for 1bit-CS. To improve the decoding performance on noisy data, we consider the pinball loss, which provides a bridge between the one-sided $\ell_1$ loss and the linear loss. Using the pinball loss, two convex models, an elastic-net pinball model and its modification with the $\ell_1$-norm constraint, are proposed. To efficiently solve them, the corresponding dual coordinate ascent algorithms are designed and their convergence is proved. The numerical experiments confirm the effectiveness of the proposed algorithms and the performance of the pinball loss minimization for 1bit-CS.

9.6 OC Apr 9, 2015

A Multiphase Image Segmentation Based on Fuzzy Membership Functions and L1-norm Fidelity

Fang Li, Stanley Osher, Jing Qin et al.

In this paper, we propose a variational multiphase image segmentation model based on fuzzy membership functions and L1-norm fidelity. Then we apply the alternating direction method of multipliers to solve an equivalent problem. All the subproblems can be solved efficiently. Specifically, we propose a fast method to calculate the fuzzy median. Experimental results and comparisons show that the L1-norm based method is more robust to outliers such as impulse noise and keeps better contrast than its L2-norm counterpart. Theoretically, we prove the existence of the minimizer and analyze the convergence of the algorithm.

2.3 MM Jul 29, 2014

Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Qianqian Xu, Ming Yan, Yuan Yao

Outlier detection is an integral part of robust evaluation for crowdsourceable Quality of Experience (QoE) and has attracted much attention in recent years. In QoE for multimedia, outliers happen because of different test conditions, human errors, abnormal variations in context, etc. In this paper, we propose a simple yet effective algorithm for outlier detection and robust QoE evaluation named iterative Least Trimmed Squares (iLTS). The algorithm assigns binary weights to samples, i.e., 0 or 1 indicating whether a sample is an outlier; the least squares solutions on the outlier-trimmed subset then give robust ranking scores. An iterative optimization alternates between updating the weights and the ranking scores, and it converges to a local optimizer in finite steps. In our test setting, iLTS is up to 190 times faster than LASSO-based methods with comparable performance. Moreover, a variant of this method is adaptive in outlier detection: it automatically determines whether a data sample is an outlier without a priori knowledge about the number of outliers. The effectiveness and efficiency of iLTS are demonstrated on both simulated examples and real-world applications. A Matlab package is provided to researchers exploiting crowdsourcing paired comparison data for robust ranking.
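The alternation between binary weights and least squares fits can be sketched on an ordinary linear regression. This is a generic iterative trimmed least squares loop written for illustration, not the authors' Matlab package and not the paired-comparison ranking model; it assumes the number of outliers is known, and the function name and toy data are hypothetical.

```python
import numpy as np

def iterative_trimmed_ls(X, y, n_outliers, max_iter=50):
    """Alternate between a least squares fit and re-flagging outliers.

    Generic sketch: the n_outliers samples with the largest residuals
    get weight 0, all others weight 1, until the flags stop changing.
    """
    n = y.size
    keep = np.ones(n, dtype=bool)                     # binary weights: True = inlier
    for _ in range(max_iter):
        coef = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        resid = np.abs(y - X @ coef)
        new_keep = np.ones(n, dtype=bool)
        new_keep[np.argsort(resid)[n - n_outliers:]] = False  # trim largest residuals
        if np.array_equal(new_keep, keep):            # weights unchanged: converged
            break
        keep = new_keep
    return coef, ~keep

t = np.arange(10, dtype=float)
X = np.column_stack([np.ones(10), t])                 # intercept and slope columns
y = 1.0 + 2.0 * t
y[3] += 50.0                                          # two gross outliers
y[7] -= 40.0
coef, outliers = iterative_trimmed_ls(X, y, n_outliers=2)
```

On this toy data the first full fit already ranks the two corrupted samples as having the largest residuals, so the second fit uses only clean points and recovers the intercept and slope.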

1.2 DG Jan 8, 2014

The Continuity of Images by Transmission Imaging Revisited

Zhitao Fan, Feng Guan, Chunlin Wu et al.

Transmission imaging, an important imaging technique widely used in astronomy, medical diagnosis, and biological science, has been shown in [49] to be quite different from the reflection imaging used in our everyday life. Understanding the structures of images (the prior information) is important for designing, testing, and choosing image processing methods, and good image processing methods are helpful for further uses of the image data, e.g., increasing the accuracy of the object reconstruction methods in transmission imaging applications. In reflection imaging, the images are usually modeled as discontinuous functions and even piecewise constant functions. In transmission imaging, it was shown very recently in [49] that almost all images are continuous functions. However, the author in [49] considered only the case of parallel beam geometry and used overly strong assumptions in the proof, which exclude some common cases such as cylindrical objects. In this paper, we consider more general beam geometries and simplify the assumptions by using totally different techniques. In particular, we prove that almost all images in transmission imaging with both parallel and divergent beam geometries (the two most typical beam geometries) are continuous functions, under much weaker assumptions than those in [49], which admit almost all practical cases. In addition, taking our analysis into account, we compare two image processing methods for removing Poisson noise (the most significant noise in transmission imaging). Numerical experiments are provided to demonstrate our analysis.