Tuyen Trung Truong

OC
h-index32
15papers
60citations
Novelty40%
AI Score23

15 Papers

IVApr 13, 2022
CapillaryX: A Software Design Pattern for Analyzing Medical Images in Real-time using Deep Learning

Maged Abdalla Helmy Abdou, Paulo Ferreira, Eric Jul et al.

Recent advances in digital imaging, e.g., increased number of pixels captured, have meant that the volume of data to be processed and analyzed from these images has also increased. Deep learning algorithms are state-of-the-art for analyzing such images, given their high accuracy when trained with a large data volume of data. Nevertheless, such analysis requires considerable computational power, making such algorithms time- and resource-demanding. Such high demands can be met by using third-party cloud service providers. However, analyzing medical images using such services raises several legal and privacy challenges and does not necessarily provide real-time results. This paper provides a computing architecture that locally and in parallel can analyze medical images in real-time using deep learning thus avoiding the legal and privacy challenges stemming from uploading data to a third-party cloud provider. To make local image processing efficient on modern multi-core processors, we utilize parallel execution to offset the resource-intensive demands of deep neural networks. We focus on a specific medical-industrial case study, namely the quantifying of blood vessels in microcirculation images for which we have developed a working system. It is currently used in an industrial, clinical research setting as part of an e-health application. Our results show that our system is approximately 78% faster than its serial system counterpart and 12% faster than a master-slave parallel system architecture.

OCSep 20, 2023
Creating walls to avoid unwanted points in root finding and optimization

Tuyen Trung Truong

In root finding and optimization, there are many cases where there is a closed set $A$ one likes that the sequence constructed by one's favourite method will not converge to A (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for 1 root $z^*$ (a fact which one may not know before hand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $z^*$ in the next runs of one's algorithm. Assume that one already has a method IM for optimization (and root finding) for non-constrained optimization. We provide a simple modification IM1 of the method to treat the situation discussed in the previous paragraph. If the method IM has strong theoretical guarantees, then so is IM1. As applications, we prove two theoretical applications: one concerns finding roots of a meromorphic function in an open subset of a Riemann surface, and the other concerns finding local minima of a function in an open subset of a Euclidean space inside it the function has at most countably many critical points. Along the way, we compare with main existing relevant methods in the current literature. We provide several examples in various different settings to illustrate the usefulness of the new approach.

OCJan 2, 2024
Backtracking New Q-Newton's method, Newton's flow, Voronoi's diagram and Stochastic root finding

John Erik Fornaess, Mi Hu, Tuyen Trung Truong et al.

A new variant of Newton's method - named Backtracking New Q-Newton's method (BNQN) - which has strong theoretical guarantee, is easy to implement, and has good experimental performance, was recently introduced by the third author. Experiments performed previously showed some remarkable properties of the basins of attractions for finding roots of polynomials and meromorphic functions, with BNQN. In general, they look more smooth than that of Newton's method. In this paper, we continue to experimentally explore in depth this remarkable phenomenon, and connect BNQN to Newton's flow and Voronoi's diagram. This link poses a couple of challenging puzzles to be explained. Experiments also indicate that BNQN is more robust against random perturbations than Newton's method and Random Relaxed Newton's method.

OCSep 23, 2021
Generalisations and improvements of New Q-Newton's method Backtracking

Tuyen Trung Truong

In this paper, we propose a general framework for the algorithm New Q-Newton's method Backtracking, developed in the author's previous work. For a symmetric, square real matrix $A$, we define $minsp(A):=\min _{||e||=1} ||Ae||$. Given a $C^2$ cost function $f:\mathbb{R}^m\rightarrow \mathbb{R}$ and a real number $0<τ$, as well as $m+1$ fixed real numbers $δ_0,\ldots ,δ_m$, we define for each $x\in \mathbb{R}^m$ with $\nabla f(x)\not= 0$ the following quantities: $κ:=\min _{i\not= j}|δ_i-δ_j|$; $A(x):=\nabla ^2f(x)+δ||\nabla f(x)||^τId$, where $δ$ is the first element in the sequence $\{δ_0,\ldots ,δ_m\}$ for which $minsp(A(x))\geq κ||\nabla f(x)||^τ$; $e_1(x),\ldots ,e_m(x)$ are an orthonormal basis of $\mathbb{R}^m$, chosen appropriately; $w(x)=$ the step direction, given by the formula: $$w(x)=\sum _{i=1}^m\frac{<\nabla f(x),e_i(x)>}{||A(x)e_i(x)||}e_i(x);$$ (we can also normalise by $w(x)/\max \{1,||w(x)||\}$ when needed) $γ(x)>0$ learning rate chosen by Backtracking line search so that Armijo's condition is satisfied: $$f(x-γ(x)w(x))-f(x)\leq -\frac{1}{3}γ(x)<\nabla f(x),w(x)>.$$ The update rule for our algorithm is $x\mapsto H(x)=x-γ(x)w(x)$. In New Q-Newton's method Backtracking, the choices are $τ=1+α>1$ and $e_1(x),\ldots ,e_m(x)$'s are eigenvectors of $\nabla ^2f(x)$. In this paper, we allow more flexibility and generality, for example $τ$ can be chosen to be $<1$ or $e_1(x),\ldots ,e_m(x)$'s are not necessarily eigenvectors of $\nabla ^2f(x)$. New Q-Newton's method Backtracking (as well as Backtracking gradient descent) is a special case, and some versions have flavours of quasi-Newton's methods. Several versions allow good theoretical guarantees. An application to solving systems of polynomial equations is given.

OCAug 23, 2021
New Q-Newton's method meets Backtracking line search: good convergence guarantee, saddle points avoidance, quadratic rate of convergence, and easy implementation

Tuyen Trung Truong

In a recent joint work, the author has developed a modification of Newton's method, named New Q-Newton's method, which can avoid saddle points and has quadratic rate of convergence. While good theoretical convergence guarantee has not been established for this method, experiments on small scale problems show that the method works very competitively against other well known modifications of Newton's method such as Adaptive Cubic Regularization and BFGS, as well as first order methods such as Unbounded Two-way Backtracking Gradient Descent. In this paper, we resolve the convergence guarantee issue by proposing a modification of New Q-Newton's method, named New Q-Newton's method Backtracking, which incorporates a more sophisticated use of hyperparameters and a Backtracking line search. This new method has very good theoretical guarantees, which for a {\bf Morse function} yields the following (which is unknown for New Q-Newton's method): {\bf Theorem.} Let $f:\mathbb{R}^m\rightarrow \mathbb{R}$ be a Morse function, that is all its critical points have invertible Hessian. Then for a sequence $\{x_n\}$ constructed by New Q-Newton's method Backtracking from a random initial point $x_0$, we have the following two alternatives: i) $\lim_{n\rightarrow\infty}||x_n||=\infty$, or ii) $\{x_n\}$ converges to a point $x_{\infty}$ which is a {\bf local minimum} of $f$, and the rate of convergence is {\bf quadratic}. Moreover, if $f$ has compact sublevels, then only case ii) happens. As far as we know, for Morse functions, this is the best theoretical guarantee for iterative optimization algorithms so far in the literature. We have tested in experiments on small scale, with some further simplified versions of New Q-Newton's method Backtracking, and found that the new method significantly improve New Q-Newton's method.

IVApr 23, 2021
CapillaryNet: An Automated System to Quantify Skin Capillary Density and Red Blood Cell Velocity from Handheld Vital Microscopy

Maged Helmy, Anastasiya Dykyy, Tuyen Trung Truong et al.

Capillaries are the smallest vessels in the body responsible for delivering oxygen and nutrients to surrounding cells. Various life-threatening diseases are known to alter the density of healthy capillaries and the flow velocity of erythrocytes within the capillaries. In previous studies, capillary density and flow velocity were manually assessed by trained specialists. However, manual analysis of a standard 20-second microvascular video requires 20 minutes on average and necessitates extensive training. Thus, manual analysis has been reported to hinder the application of microvascular microscopy in a clinical environment. To address this problem, this paper presents a fully automated state-of-the-art system to quantify skin nutritive capillary density and red blood cell velocity captured by handheld-based microscopy videos. The proposed method combines the speed of traditional computer vision algorithms with the accuracy of convolutional neural networks to enable clinical capillary analysis. The results show that the proposed system fully automates capillary detection with an accuracy exceeding that of trained analysts and measures several novel microvascular parameters that had eluded quantification thus far, namely, capillary hematocrit and intracapillary flow velocity heterogeneity. The proposed end-to-end system, named CapillaryNet, can detect capillaries at $\sim$0.9 seconds per frame with $\sim$93\% accuracy. The system is currently being used as a clinical research product in a larger e-health application to analyse capillary data captured from patients suffering from COVID-19, pancreatitis, and acute heart diseases. CapillaryNet narrows the gap between the analysis of microcirculation images in a clinical environment and state-of-the-art systems.

OCAug 25, 2020
Unconstrained optimisation on Riemannian manifolds

Tuyen Trung Truong

In this paper, we give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method to the Riemannian setting.Here are some easy to state consequences of results in this paper, where X is a general Riemannian manifold of finite dimension and $f:X\rightarrow \mathbb{R}$ a $C^2$ function which is Morse (that is, all its critical points are non-degenerate). {\bf Theorem.} For random choices of the hyperparameters in the Riemanian Local Backtracking Gradient Descent algorithm and for random choices of the initial point $x_0$, the sequence $\{x_n\}$ constructed by the algorithm either (i) converges to a local minimum of $f$ or (ii) eventually leaves every compact subsets of $X$ (in other words, diverges to infinity on $X$). If $f$ has compact sublevels, then only the former alternative happens. The convergence rate is the same as in the classical paper by Armijo. {\bf Theorem.} Assume that $f$ is $C^3$. For random choices of the hyperparametes in the Riemannian New Q-Newton's method, if the sequence constructed by the algorithm converges, then the limit is a critical point of $f$. We have a local Stable-Center manifold theorem, near saddle points of $f$, for the dynamical system associated to the algorithm. If the limit point is a non-degenerate minimum point, then the rate of convergence is quadratic. If moreover $X$ is an open subset of a Lie group and the initial point $x_0$ is chosen randomly, then we can globally avoid saddle points. As an application, we propose a general method using Riemannian Backtracking GD to find minimum of a function on a bounded ball in a Euclidean space, and do explicit calculations for calculating the smallest eigenvalue of a symmetric square matrix.

OCJul 7, 2020
Asymptotic behaviour of learning rates in Armijo's condition

Tuyen Trung Truong, Tuan Hang Nguyen

Fix a constant $0<α<1$. For a $C^1$ function $f:\mathbb{R}^k\rightarrow \mathbb{R}$, a point $x$ and a positive number $δ>0$, we say that Armijo's condition is satisfied if $f(x-δ\nabla f(x))-f(x)\leq -αδ||\nabla f(x)||^2$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence $\{x_n\}$ defined by $x_{n+1}=x_n-δ_n\nabla f(x_n)$, for positive numbers $δ_n$ for which Armijo's condition is satisfied. We show that if $\{x_n\}$ converges to a non-degenerate critical point, then $\{δ_n\}$ must be bounded. Moreover this boundedness can be quantified in terms of the norms of the Hessian $\nabla ^2f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of usual Backtracking GD. On the other hand, in case of convergence to a degenerate critical point the behaviours can be very much different. We run some experiments to illustrate that both scenrios can really happen. In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadelta's paper). The main point is that since learning rate in Backtracking GD is bound by Armijo's condition, it is not unitless.

OCJun 2, 2020
A fast and simple modification of Newton's method helping to avoid saddle points

Tuyen Trung Truong, Tat Dat To, Tuan Hang Nguyen et al.

We propose in this paper New Q-Newton's method. The update rule is very simple conceptually, for example $x_{n+1}=x_n-w_n$ where $w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$, with $A_n=\nabla ^2f(x_n)+δ_n||\nabla f(x_n)||^2.Id$ and $v_n=A_n^{-1}.\nabla f(x_n)$. Here $δ_n$ is an appropriate real number so that $A_n$ is invertible, and $pr_{A_n,\pm}$ are projections to the vector subspaces generated by eigenvectors of positive (correspondingly negative) eigenvalues of $A_n$. The main result of this paper roughly says that if $f$ is $C^3$ (can be unbounded from below) and a sequence $\{x_n\}$, constructed by the New Q-Newton's method from a random initial point $x_0$, {\bf converges}, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method. The first author has recently been successful incorporating Backtracking line search to New Q-Newton's method, thus resolving the convergence guarantee issue observed for some (non-smooth) cost functions. An application to quickly finding zeros of a univariate meromorphic function will be discussed. Various experiments are performed, against well known algorithms such as BFGS and Adaptive Cubic Regularization are presented.

OCMar 11, 2020
Coordinate-wise Armijo's condition: General case

Tuyen Trung Truong

Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<α<1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $δ>0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-δ\partial _xf,y-δ\partial _yf)-f(x,y)\leq -αδ(||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*} In one previous paper, we proposed the following {\bf coordinate-wise} Armijo's condition. Fix again $0<α<1$. A pair of positive numbers $δ_1,δ_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-δ_1\partial _xf(x,y), y-δ_2\partial _y f(x,y))]-[f(x,y)]\leq -α(δ_1||\partial _xf(x,y)||^2+δ_2||\partial _yf(x,y)||^2). \end{eqnarray*} Previously we applied this condition for functions of the form $f(x,y)=f(x)+g(y)$, and proved various convergent results for them. For a general function, it is crucial - for being able to do real computations - to have a systematic algorithm for obtaining $δ_1$ and $δ_2$ satisfying the coordinate-wise version of Armijo's condition, much like Backtracking for the usual Armijo's condition. In this paper we propose such an algorithm, and prove according convergent results. We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ (given by Asl and Overton in connection to Wolfe's method), $f(x,y)=x^3 sin (1/x) + y^3 sin(1/y)$ and Rosenbrock's function.

OCJan 16, 2020
Some convergent results for Backtracking Gradient Descent method on Banach spaces

Tuyen Trung Truong

Our main result concerns the following condition: {\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly converges to $x$ and $\lim _{n\rightarrow\infty}||\nabla f(x_n)||=0$, then $\nabla f(x)=0$. We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space. {\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$, then $\sup _{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $α,β,δ_0$, see later for details) the sequence $x_{n+1}=x_n-δ(x_n)\nabla f(x_n)$. Then we have: 1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a critical point of $f$. 2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim _{n\rightarrow\infty}||x_{n+1}-x_n||=0$. 3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If $\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected. 4) Assume that $X$ is separable. Then for generic choices of $α,β,δ_0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges - in the {\bf weak} topology, then the limit point cannot be a saddle point.

OCJan 7, 2020
Backtracking Gradient Descent allowing unbounded learning rates

Tuyen Trung Truong

In unconstrained optimisation on an Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-δ_n \nabla f(x_n)$ it usually is required that the learning rates $δ_n$'s are bounded: $δ_n\leq δ$ for some positive $δ$. Under this assumption, if the sequence $x_n$ converges to a critical point $z$, then with large values of $n$ the update will be small because $||x_{n+1}-x_n||\lesssim ||\nabla f(x_n)||$. This may also force the sequence to converge to a bad minimum. If we can allow, at least theoretically, that the learning rates $δ_n$'s are not bounded, then we may have better convergence to better minima. A previous joint paper by the author showed convergence for the usual version of Backtracking GD under very general assumptions on the cost function $f$. In this paper, we allow the learning rates $δ_n$ to be unbounded, in the sense that there is a function $h:(0,\infty)\rightarrow (0,\infty )$ such that $\lim _{t\rightarrow 0}th(t)=0$ and $δ_n\lesssim \max \{h(x_n),δ\}$ satisfies Armijo's condition for all $n$, and prove convergence under the same assumptions as in the mentioned paper. It will be shown that this growth rate of $h$ is best possible if one wants convergence of the sequence $\{x_n\}$. A specific way for choosing $δ_n$ in a discrete way connects to Two-way Backtracking GD defined in the mentioned paper. We provide some results which either improve or are implicitly contained in those in the mentioned paper and another recent paper on avoidance of saddle points.

OCNov 18, 2019
Coordinate-wise Armijo's condition

Tuyen Trung Truong

Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<α<1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $δ>0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-δ\partial _xf,y-δ\partial _yf)-f(x,y)\leq -αδ(||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*} When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the following {\bf coordinate-wise} Armijo's condition. Fix again $0<α<1$. A pair of positive numbers $δ_1,δ_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f_1(x-δ_1\nabla f_1(x))+f_2(y-δ_2\nabla f_2(y))]-[f_1(x)+f_2(y)]\leq -α(δ_1||\nabla f_1(x)||^2+δ_2||\nabla f_2(y)||^2). \end{eqnarray*} We then extend results in our recent previous results, on Backtracking Gradient Descent and some variants, to this setting. We show by an example the advantage of using coordinate-wise Armijo's condition over the usual Armijo's condition.

OCNov 11, 2019
Convergence to minima for the continuous version of Backtracking Gradient Descent

Tuyen Trung Truong

The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $δ_0>0$ and $0<α<1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,δ_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following property: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x)))-f(x)\leq -αh(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $ \lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Some other results are proven.

OCAug 15, 2018
Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning

Tuyen Trung Truong, Tuan Hang Nguyen

While Standard gradient descent is one very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as Deep Neural Networks. In this paper, we prove that its backtracking variant behaves very nicely, in particular convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows. Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\lim _{n\rightarrow\infty}||z_{n+1}-z_n||=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ converges to a critical point of $f$. (3) More generally, assume that all connected components of the set of critical points of $f$ are compact. Then either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ is bounded. Moreover, in the latter case the set of cluster points of $\{z_n\}$ is connected. Some generalised versions of this result, including an inexact version, are included. Another result in this paper concerns the problem of saddle points. We then present a heuristic argument to explain why Standard gradient descent method works so well, and modifications of the backtracking versions of GD, MMT and NAG. Experiments with datasets CIFAR10 and CIFAR100 on various popular architectures verify the heuristic argument also for the mini-batch practice and show that our new algorithms, while automatically fine tuning learning rates, perform better than current state-of-the-art methods such as MMT, NAG, Adagrad, Adadelta, RMSProp, Adam and Adamax.