LG | Apr 24, 2023 | Code
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls
Harsh Vardhan, David Hyde, Umesh Timalsina et al.
Physics simulations like computational fluid dynamics (CFD) are a computational bottleneck in computer-aided design (CAD) optimization processes. To overcome this bottleneck, one requires either an optimization framework that is highly sample-efficient, or a fast data-driven proxy (surrogate model) for long-running simulations. Both approaches have benefits and limitations. Bayesian optimization is often used for sample efficiency, but it solves one specific problem and struggles with transferability; alternatively, surrogate models can offer fast and often more generalizable solutions for CFD problems, but gathering data for and training such models can be computationally demanding. In this work, we leverage recent advances in optimization and artificial intelligence (AI) to explore both of these potential approaches, in the context of designing an optimal unmanned underwater vehicle (UUV) hull. Our study finds that the Bayesian Optimization-Lower Confidence Bound (BO-LCB) algorithm is the most sample-efficient optimization framework and has the best convergence behavior of those considered. Subsequently, we show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. Combining these results, we demonstrate a two-orders-of-magnitude speedup (with comparable accuracy) for the design optimization process when the surrogate model is used. To our knowledge, this is the first study applying Bayesian optimization and DNN-based surrogate modeling to the problem of UUV design optimization, and we share our developments as open-source software.
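As a rough illustration of the two quantities named in this abstract, the sketch below shows a lower-confidence-bound acquisition rule and the MAPE metric. It is not the authors' implementation: the function names, the exploration weight `kappa`, and the posterior means and standard deviations are all hypothetical, and in a real Bayesian optimization loop the posterior would come from a fitted probabilistic model rather than hard-coded lists.

```python
def lcb_pick(means, stds, kappa=2.0):
    """Return the index of the candidate with the lowest LCB score.

    For minimization, LCB(x) = mean(x) - kappa * std(x): a low predicted
    value or a high uncertainty both make a candidate attractive.
    """
    scores = [m - kappa * s for m, s in zip(means, stds)]
    return min(range(len(scores)), key=scores.__getitem__)


def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    terms = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical posterior over three candidate designs (drag, arbitrary units).
means = [1.0, 0.8, 1.2]
stds = [0.1, 0.05, 0.5]
best = lcb_pick(means, stds)  # the uncertain third candidate wins here

# Hypothetical reference values versus surrogate predictions.
error_pct = mape([100.0, 200.0], [98.0, 204.0])
```

The choice of `kappa` trades exploration against exploitation: with `kappa=0` the rule simply picks the lowest predicted mean.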
LG | May 28
Improving Selective Classification with Pairwise Queries for Binary Classification
Harsh Vardhan, Sunav Choudhary, Natwar Modani et al.
In selective classification, a model predicts the labels of data samples where it is confident, and abstains from predicting labels for samples on which it is not confident. The rejected samples are often labeled by an expert, which is expensive. The budget for the expert is best utilized when the model has low error on non-rejected samples. However, the estimate of a model's confidence might be inconsistent with the model's predictions, which can lead to high error on non-rejected points. Such situations can readily occur in in-context binary classification by LLMs. To remedy this, we propose making additional pairwise queries to the same model. These pairwise queries can detect high-error samples and be incorporated into selective classification techniques to reduce the error on non-rejected samples. Theoretically, we establish the conditions under which a simple algorithm using pairwise queries outperforms an inconsistent confidence estimate. We support this insight through extensive experiments on one synthetic and four in-context learning-based real binary classification datasets. In all these cases, we show that our algorithms, using pairwise queries, obtain a better accuracy-cost tradeoff than using only the raw confidence estimates, for instance, the LLM's next-token logits.
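The baseline setting described above, where a classifier abstains below a confidence threshold, can be sketched as follows. This shows only the plain confidence-based rule, not the pairwise-query algorithm the abstract proposes; the threshold `tau`, the function names, and the probabilities are illustrative assumptions.

```python
def selective_predict(probs, tau):
    """Return a 0/1 label per sample, or None where the model abstains.

    `probs` holds the estimated probability of class 1; confidence is the
    probability assigned to the predicted class.
    """
    preds = []
    for p in probs:
        confidence = max(p, 1.0 - p)
        label = 1 if p >= 0.5 else 0
        preds.append(label if confidence >= tau else None)
    return preds


def coverage_and_error(preds, labels):
    """Fraction of samples kept, and error rate on the kept samples."""
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(kept) / len(labels)
    error = sum(p != y for p, y in kept) / len(kept) if kept else 0.0
    return coverage, error


# Hypothetical model outputs and ground-truth labels.
probs = [0.95, 0.55, 0.10, 0.48]
labels = [1, 0, 0, 1]
preds = selective_predict(probs, tau=0.8)
coverage, error = coverage_and_error(preds, labels)
```

The abstained samples (the `None` entries) are the ones that would be sent to the expert, so raising `tau` lowers coverage and raises labeling cost.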
ML | Jun 3
Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks
Harsh Vardhan, Hossein Taheri, Arya Mazumdar
A common heuristic used to explain the generalization of first-order gradient methods on non-convex neural networks is that "flat interpolators generalize well" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017), where flatness can be measured by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) showed that any interpolator can be made sharper or flatter by exploiting symmetries of the network that change flatness while keeping the population and empirical losses unchanged. This result makes the earlier heuristic statement vacuous. In this paper, we show that for learning an unknown multi-index model with $2$-layer non-convex homogeneous neural networks, there is a connection between flatness and generalization, despite the existence of symmetries. This connection pertains to the "flattest" interpolators, i.e., the interpolators that have orderwise minimum flatness among all interpolators. First, we show that there exists a natural class of non-generalizing interpolators whose flatness cannot be made closer to the flattest possible, even using symmetries. Second, we show that for data generated by a sum of single-index models, if the approximation error and label noise are low, any flattest interpolator achieves small population loss, i.e., the flattest interpolators always generalize. This establishes a direct link between flatness and generalization which applies to a large class of activations and realistic data distributions.
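The rescaling symmetry behind the Dinh et al. observation can be written out for a simple case. The derivation below is an illustration only: it assumes a two-layer network with a positively 1-homogeneous activation (ReLU is one example), assumes the Hessian exists at the point considered and that both block traces are positive, and uses notation ($a_j$, $w_j$, $\alpha$, $H_{aa}$, $H_{ww}$) chosen here rather than taken from the paper.

```latex
% Two-layer network with a positively 1-homogeneous activation.
f_\theta(x) = \sum_{j=1}^{m} a_j\,\sigma(w_j^\top x),
\qquad \sigma(\alpha z) = \alpha\,\sigma(z) \ \text{ for } \alpha > 0 .

% Rescaling symmetry: the function, and hence every loss, is unchanged.
\theta_\alpha : \ (a_j, w_j) \mapsto (a_j/\alpha,\ \alpha w_j)
\quad\Longrightarrow\quad f_{\theta_\alpha} = f_\theta .

% The diagonal Hessian blocks rescale, so the trace (flatness) does change.
\operatorname{tr} H(\theta_\alpha)
  = \alpha^{2}\,\operatorname{tr} H_{aa}(\theta)
  + \alpha^{-2}\,\operatorname{tr} H_{ww}(\theta),
\qquad
\min_{\alpha > 0} \operatorname{tr} H(\theta_\alpha)
  = 2\sqrt{\operatorname{tr} H_{aa}(\theta)\,\operatorname{tr} H_{ww}(\theta)} .
```

Sending $\alpha \to 0$ or $\alpha \to \infty$ makes the trace arbitrarily large without changing the loss, whereas the minimum over $\alpha$ is finite. That asymmetry is why a statement about the flattest interpolators can remain meaningful despite the symmetry.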
LG | Jun 24, 2022 | Code
Deep Active Learning for Regression Using $ε$-weighted Hybrid Query Strategy
Harsh Vardhan, Janos Sztipanovits
Designing an inexpensive approximate surrogate model that captures the salient features of an expensive high-fidelity behavior is a prevalent approach in design optimization. Recently, Deep Learning (DL) models have been used as promising surrogate computational models for engineering problems. However, the main challenge in creating a DL-based surrogate is to simulate/label a large number of design points, which is time-consuming for computationally costly and/or high-dimensional engineering problems. In the present work, we propose a novel sampling technique by combining the active learning (AL) method with DL. We call this method the $ε$-weighted hybrid query strategy ($ε$-HQS), which focuses on the evaluation of the surrogate at each learning iteration and provides an estimate of the failure probability of the surrogate in the design space. By reusing already collected training and test data, the learned failure probability guides the next iteration's sampling process toward regions with a high probability of failure. In the empirical evaluation, the surrogate achieved better accuracy than with other sample-selection methods. We empirically evaluated this method in two different engineering design domains: finite-element-based static stress analysis of a submarine pressure vessel (a computationally costly process) and submarine propeller design (a high-dimensional problem). https://github.com/vardhah/epsilon_weighted_Hybrid_Query_Strategy
LG | Jan 17, 2023
Async-HFL: Efficient and Robust Asynchronous Federated Learning in Hierarchical IoT Networks
Xiaofan Yu, Ludmila Cherkasova, Harsh Vardhan et al.
Federated Learning (FL) has gained increasing interest in recent years as a distributed on-device learning paradigm. However, multiple challenges remain to be addressed for deploying FL in real-world Internet-of-Things (IoT) networks with hierarchies. Although existing works have proposed various approaches to account for data heterogeneity, system heterogeneity, unexpected stragglers and scalability, none of them provides a systematic solution to address all of these challenges in a hierarchical and unreliable IoT network. In this paper, we propose an asynchronous and hierarchical framework (Async-HFL) for performing FL in a common three-tier IoT network architecture. In response to the widely varying delays, Async-HFL employs asynchronous aggregations at both the gateway and the cloud levels, thus avoiding long waiting times. To fully unleash the potential of Async-HFL in convergence speed under system heterogeneities and stragglers, we design device selection at the gateway level and device-gateway association at the cloud level. Device selection chooses edge devices to trigger local training in real time, while device-gateway association determines the network topology periodically after several cloud epochs, both satisfying bandwidth limitations. We evaluate Async-HFL's convergence speedup using large-scale simulations based on ns-3 and a network topology from NYCMesh. Our results show that Async-HFL converges 1.08-1.31x faster in wall-clock time and saves up to 21.6% total communication cost compared to state-of-the-art asynchronous FL algorithms (with client selection). We further validate Async-HFL on a physical deployment and observe robust convergence under unexpected stragglers.
LG | Jun 11, 2022
Rare event failure test case generation in Learning-Enabled-Controllers
Harsh Vardhan, Janos Sztipanovits
Machine learning models are widely applied to real-world problems, which increases the importance of correct behaviour in these trained models. Finding a good test case that can reveal a potential failure in these trained systems can help to retrain the models and increase their correctness. For a well-trained model, the occurrence of a failure is rare. Consequently, searching for these rare scenarios by evaluating each sample in the input search space, or by randomized search, would be costly and sometimes intractable due to the large search space, limited computational resources, and available time. In this paper, we address the challenge of finding these failure scenarios faster than traditional randomized search. The central idea of our approach is to separate the input data space into a region of high failure probability and a region of low/minimal failure probability, based on observations from training data, data drawn from real-world statistics, and knowledge from a domain expert. Using this information, we can design a generative model from which we can generate scenarios that have a high likelihood of revealing a potential failure. We evaluated this approach on two different experimental scenarios and were able to speed up the discovery of such failures a thousand-fold compared with traditional randomized search.
RO | Feb 28, 2023
Constrained Bayesian Optimization for Automatic Underwater Vehicle Hull Design
Harsh Vardhan, Peter Volgyesi, Will Hedgecock et al.
Automatic underwater vehicle hull design optimization is a complex engineering process for generating a UUV hull with optimized properties for a given requirement. First, it requires integrating computationally complex engineering simulation tools. Second, it requires integrating a sample-efficient optimization framework with that toolchain. To this end, we integrated the CAD tool FreeCAD with the CFD tool OpenFOAM for automatic design evaluation. For optimization, we chose Bayesian optimization (BO), which is a well-known technique developed for optimizing time-consuming, expensive engineering simulations and has proven to be very sample-efficient in a variety of problems, including hyper-parameter tuning and experimental design. During the optimization process, infeasible designs are handled as constraints integrated into the optimization. By integrating a domain-specific toolchain with AI-based optimization, we executed automatic design optimization of underwater vehicle hulls. For empirical evaluation, we took two different use cases of real-world underwater vehicle design to validate the execution of our tool.
IM | May 2, 2022
ASTROMER: A transformer-based embedding for the representation of light curves
C. Donoso-Oliva, I. Becker, P. Protopapas et al.
Taking inspiration from natural language embeddings, we present ASTROMER, a transformer-based model to create representations of light curves. ASTROMER was pre-trained in a self-supervised manner, requiring no human-labeled data. We used millions of R-band light sequences to adjust the ASTROMER weights. The learned representation can be easily adapted to other surveys by re-training ASTROMER on new sources. The power of ASTROMER consists of using the representation to extract light curve embeddings that can enhance the training of other models, such as classifiers or regressors. As an example, we used ASTROMER embeddings to train two neural-based classifiers that use labeled variable stars from MACHO, OGLE-III, and ATLAS. In all experiments, ASTROMER-based classifiers outperformed a baseline recurrent neural network trained on light curves directly when limited labeled data was available. Furthermore, using ASTROMER embeddings decreases computational resources needed while achieving state-of-the-art results. Finally, we provide a Python library that includes all the functionalities employed in this work. The library, main code, and pre-trained weights are available at https://github.com/astromer-science
LG | Nov 16, 2022
Data efficient surrogate modeling for engineering design: Ensemble-free batch mode deep active learning for regression
Sarthak Kapoor, Harsh Vardhan, Umesh Timalsina et al.
High fidelity design evaluation processes such as Computational Fluid Dynamics and Finite Element Analysis are often replaced with data driven surrogates to reduce computational cost in engineering design optimization. However, building accurate surrogate models still requires a large number of expensive simulations. To address this challenge, we introduce epsilon HQS, a scalable active learning strategy that leverages a student teacher framework to train deep neural networks efficiently. Unlike Bayesian AL methods, which are computationally demanding with DNNs, epsilon HQS selectively queries informative samples to reduce labeling cost. Applied to CFD, FEA, and propeller design tasks, our method achieves higher accuracy under fixed labeling cost budgets.
CE | Feb 18, 2023
Search for universal minimum drag resistance underwater vehicle hull using CFD
Harsh Vardhan, Janos Sztipanovits
In Autonomous Underwater Vehicle (AUV) design, hull resistance is an important factor in determining the power requirements and range of the vehicle, and consequently affects the battery size, weight, and volume requirements of the design. In this paper, we leverage an AI-based optimization algorithm along with Computational Fluid Dynamics (CFD) simulation to study the optimal hull design that minimizes the resistance. By running the CFD-based optimization at different operating velocities and turbulence intensities, we search for a universal design that provides the least resistance, or near-optimal resistance, across all operating conditions (operating velocity) and environmental conditions (turbulence intensity). Early results demonstrated that the optimal design found at low velocity and low turbulence performs very poorly at high velocity and high turbulence. However, a design that is optimal at high velocity and high turbulence performs near-optimally across many of the considered velocity and turbulence conditions.
LG | Jun 18, 2022
Reduced Robust Random Cut Forest for Out-Of-Distribution detection in machine learning models
Harsh Vardhan, Janos Sztipanovits
Most machine learning-based regressors extract information from data collected via past observations of limited length to make predictions in the future. Consequently, when the input to these trained models has significantly different statistical properties from the data used for training, there is no guarantee of accurate prediction. Using these models on out-of-distribution input data may therefore result in a predicted outcome completely different from the desired one, which is not only erroneous but can also be hazardous in some cases. Successful deployment of these machine learning models in any system requires a detection system that can distinguish between out-of-distribution and in-distribution data (i.e., data similar to the training data). In this paper, we introduce a novel approach for this detection process using a Reduced Robust Random Cut Forest (RRRCF) data structure, which can be used on both small and large data sets. Similar to the Robust Random Cut Forest (RRCF), RRRCF is a structured but reduced representation of the training data subspace in the form of cut trees. Empirical results of this method on both low- and high-dimensional data showed that inference about data being in or out of the training distribution can be made efficiently, and that the model is easy to train with no difficult hyper-parameter tuning. The paper discusses two different use cases for testing and validating results.
LG | Feb 28, 2023
Fusion of ML with numerical simulation for optimized propeller design
Harsh Vardhan, Peter Volgyesi, Janos Sztipanovits
In computer-aided engineering design, the goal of a designer is to find an optimal design for a given requirement using a numerical simulator in the loop with an optimization method. A good design optimization process is one that reduces the time from inception to design. In this work, we consider a class of design problems that is computationally cheap to evaluate but has a high-dimensional design space. In such cases, traditional surrogate-based optimization does not offer any benefits. We propose an alternative way to use an ML model as a surrogate for the design process: it formulates the search problem as an inverse problem and can save time by finding the optimal design, or at least a good initial seed design for optimization. By using this trained surrogate model with a traditional optimization method, we can get the best of both worlds. We call this Surrogate Assisted Optimization (SAO), a hybrid approach that mixes an ML surrogate with a traditional optimization method. Empirical evaluations on propeller design problems show that a more efficient design can be found in fewer evaluations using SAO.
LG | Jun 6, 2022
Deep Learning-based Finite Element Analysis (FEA) surrogate for sub-sea pressure vessel
Harsh Vardhan, Janos Sztipanovits
During the design process of an autonomous underwater vehicle (AUV), the pressure vessel has a critical role. The pressure vessel contains dry electronics, power sources, and other sensors that cannot be flooded. A traditional approach to pressure vessel design involves running multiple Finite Element Analysis (FEA) based simulations and optimizing the design to find the most suitable one that meets the requirement. Running these FEA simulations is computationally very costly for any optimization process, and it becomes difficult to run even hundreds of evaluations. In such a case, a better approach is surrogate design, with the goal of replacing FEA-based prediction with a learning-based regressor. Once the surrogate is trained for a class of problems, the learned response surface can be used to analyze the stress effect without running the FEA for that class of problems. The challenge of creating a surrogate for a class of problems is data generation. Since the process is computationally costly, it is not possible to densely sample the design space, and learning the response surface on a sparse data set becomes difficult. During experimentation, we observed that a Deep Learning-based surrogate outperforms other regression models on such sparse data. In the present work, we use a Deep Learning-based model to replace the costly finite element analysis-based simulation process. With the surrogate, prediction on other designs is much faster than direct Finite Element Analysis. We also compared our DL-based surrogate with other classical Machine Learning (ML) based regression models (random forest and gradient boosting regressors). We observed that on sparser data, the DL-based surrogate performs much better than other regression models.
CE | Jun 24, 2024 | Code
Anvil: An integration of artificial intelligence, sampling techniques, and a combined CAD-CFD tool
Harsh Vardhan, Umesh Timalsina, Michael Sandborn et al.
In this work, we introduce an open-source integrated CAD-CFD tool, Anvil, which combines FreeCAD for CAD modeling and OpenFOAM for CFD analysis, along with an AI-based optimization method (Bayesian optimization) and other sampling algorithms. Anvil serves as a scientific machine learning tool for shape optimization in three modes: data generation, CFD evaluation, and shape optimization. In data generation mode, it automatically runs CFD evaluations and generates data for training a surrogate model. In optimization mode, it searches for the optimal design under given requirements and optimization metrics. In CFD mode, a single CAD file can be evaluated with a single OpenFOAM run. To use Anvil, experimenters provide a JSON configuration file and a parametric CAD seed design. Anvil can be used to study solid-fluid dynamics for any subsonic flow conditions and has been demonstrated in various simulation and optimization use cases. The open-source code for the tool, installation process, artifacts (such as CAD seed designs and example STL models), experimentation results, and detailed documentation can be found at https://github.com/symbench/Anvil.
CL | Dec 3, 2025
Fine-grained Narrative Classification in Biased News Articles
Zeba Afroz, Harsh Vardhan, Pawan Bhakuni et al.
Narratives are the cognitive and emotional scaffolds of propaganda. They organize isolated persuasive techniques into coherent stories that justify actions, attribute blame, and evoke identification with ideological camps. In this paper, we propose a novel fine-grained narrative classification in biased news articles. We also explore article-bias classification as the precursor task to narrative classification and fine-grained persuasive technique identification. We develop INDI-PROP, the first ideologically grounded fine-grained narrative dataset with multi-level annotation for analyzing propaganda in Indian news media. Our dataset INDI-PROP comprises 1,266 articles focusing on two polarizing socio-political events in recent times: CAA and the Farmers' protest. Each article is annotated at three hierarchical levels: (i) ideological article-bias (pro-government, pro-opposition, neutral), (ii) event-specific fine-grained narrative frames anchored in ideological polarity and communicative intent, and (iii) persuasive techniques. We propose FANTA and TPTC, two GPT-4o-mini guided multi-hop prompt-based reasoning frameworks for the bias, narrative, and persuasive technique classification. FANTA leverages multi-layered communicative phenomena by integrating information extraction and contextual framing for hierarchical reasoning. On the other hand, TPTC adopts systematic decomposition of persuasive cues via a two-stage approach. Our evaluation suggests substantial improvement over underlying baselines in each case.
HC | Jul 5, 2025
Generative AI for CAD Automation: Leveraging Large Language Models for 3D Modelling
Sumit Kumar, Sarthak Kapoor, Harsh Vardhan et al.
Large Language Models (LLMs) are revolutionizing industries by enhancing efficiency, scalability, and innovation. This paper investigates the potential of LLMs in automating Computer-Aided Design (CAD) workflows, by integrating FreeCAD with LLM as CAD design tool. Traditional CAD processes are often complex and require specialized sketching skills, posing challenges for rapid prototyping and generative design. We propose a framework where LLMs generate initial CAD scripts from natural language descriptions, which are then executed and refined iteratively based on error feedback. Through a series of experiments with increasing complexity, we assess the effectiveness of this approach. Our findings reveal that LLMs perform well for simple to moderately complex designs but struggle with highly constrained models, necessitating multiple refinements. The study highlights the need for improved memory retrieval, adaptive prompt engineering, and hybrid AI techniques to enhance script robustness. Future directions include integrating cloud-based execution and exploring advanced LLM capabilities to further streamline CAD automation. This work underscores the transformative potential of LLMs in design workflows while identifying critical areas for future development.
LG | Apr 2, 2025
Client Selection in Federated Learning with Data Heterogeneity and Network Latencies
Harsh Vardhan, Xiaofan Yu, Tajana Rosing et al.
Federated learning (FL) is a distributed machine learning paradigm where multiple clients conduct local training based on their private data, then the updated models are sent to a central server for global aggregation. The practical convergence of FL is challenged by multiple factors, with the primary hurdle being the heterogeneity among clients. This heterogeneity manifests as data heterogeneity concerning local data distribution and latency heterogeneity during model transmission to the server. While prior research has introduced various efficient client selection methods to alleviate the negative impact of either of these heterogeneities individually, no efficient method handles real-world settings where both exist simultaneously. In this paper, we propose two novel theoretically optimal client selection schemes that can handle both these heterogeneities. Our methods involve solving simple optimization problems every round, obtained by minimizing the theoretical runtime to convergence. Empirical evaluations on 9 datasets with non-iid data distributions, 2 practical delay distributions, and non-convex neural network models demonstrate that our algorithms are competitive with the best existing baselines in the worst case and up to 20 times better in the best case.
ML | Jan 26
Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget
Harsh Vardhan, Arya Mazumdar
Distributed high-dimensional mean estimation is a common aggregation routine in distributed optimization methods. Most of these applications call for a communication-constrained setting where the vectors whose mean is to be estimated have to be compressed before sharing. One could independently encode and decode these vectors to achieve compression, but that overlooks the fact that they are often close to each other. To exploit these similarities, Suresh et al. (2022), Jhunjhunwala et al. (2021), and Jiang et al. (2023) recently proposed multiple correlation-aware compression schemes. However, in most cases, the correlations have to be known for these schemes to work. Moreover, theoretical analysis of the graceful degradation of these correlation-aware compression schemes with increasing dissimilarity is limited to only the $\ell_2$-error in the literature. In this paper, we propose four different collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. Our schemes are all simple to implement and computationally efficient, while resulting in big savings in communication. The analysis of our proposed schemes shows how the $\ell_2$, $\ell_\infty$ and cosine estimation errors vary with the degree of similarity among vectors.
ML | May 23, 2025
LocalKMeans: Convergence of Lloyd's Algorithm with Distributed Local Iterations
Harsh Vardhan, Heng Zhu, Avishek Ghosh et al.
In this paper, we analyze the classical $K$-means alternating-minimization algorithm, also known as Lloyd's algorithm (Lloyd, 1956), for a mixture of Gaussians in a data-distributed setting that incorporates local iteration steps. Assuming unlabeled data distributed across multiple machines, we propose an algorithm, LocalKMeans, that performs Lloyd's algorithm in parallel on the machines by running its iterations on local data, synchronizing only every $L$ such local steps. We characterize the cost of these local iterations against the non-distributed setting, and show that the price paid for the local steps is a higher required signal-to-noise ratio. While local iterations were theoretically studied in the past for gradient-based learning methods, the analysis of unsupervised learning methods is more involved than that of an iterative gradient-based algorithm, owing to the presence of latent variables, e.g. cluster identities. To obtain our results, we adapt a virtual iterate method to work with a non-convex, non-smooth objective function, in conjunction with a tight statistical analysis of Lloyd steps.
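The local-iteration pattern described above can be sketched on one-dimensional data. This is a toy under stated assumptions, not the LocalKMeans algorithm as analyzed in the paper: the synchronization rule (a plain average of each center across machines), the empty-cluster handling, the function names, and the data are all choices made here for illustration.

```python
def lloyd_step(points, centers):
    """One Lloyd iteration on 1-D data: assign to nearest center, then re-average."""
    groups = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
        groups[nearest].append(x)
    # An empty cluster keeps its previous center.
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]


def local_kmeans(machines, centers, local_steps, rounds):
    """Run `local_steps` Lloyd steps per machine, then synchronize; repeat."""
    for _ in range(rounds):
        local_results = []
        for points in machines:
            local = list(centers)
            for _ in range(local_steps):
                local = lloyd_step(points, local)
            local_results.append(local)
        # Synchronization (assumed rule): average each center across machines.
        centers = [sum(c) / len(local_results) for c in zip(*local_results)]
    return centers


# Two hypothetical machines, each holding points from clusters near 0 and 10.
machines = [[-0.5, 0.5, 9.5, 10.5], [-1.0, 1.0, 9.0, 11.0]]
centers = local_kmeans(machines, [1.0, 9.0], local_steps=3, rounds=2)
```

With well-separated clusters, as in this toy data, every machine recovers the same centers locally, so averaging loses nothing. Poorly separated clusters are where local steps become costly, which is consistent with the abstract's point about a higher required signal-to-noise ratio.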
ML | Apr 29, 2025
Learning and Generalization with Mixture Data
Harsh Vardhan, Avishek Ghosh, Arya Mazumdar
In many, if not most, machine learning applications the training data is naturally heterogeneous (e.g. federated learning, adversarial attacks and domain adaptation in neural net training). Data heterogeneity is identified as one of the major challenges in modern large-scale learning. A classical way to represent heterogeneous data is via a mixture model. In this paper, we study generalization performance and statistical rates when data is sampled from a mixture distribution. We first characterize the heterogeneity of the mixture in terms of the pairwise total variation distance of the sub-population distributions. Thereafter, as a central theme of this paper, we characterize the range where the mixture may be treated as a single (homogeneous) distribution for learning. In particular, we study the generalization performance under the classical PAC framework and the statistical error rates for parametric (linear regression, mixture of hyperplanes) as well as non-parametric (Lipschitz, convex and Hölder-smooth) regression problems. To do this, we obtain Rademacher complexity and (local) Gaussian complexity bounds with mixture data, and apply them to get the generalization and convergence rates respectively. We observe that as the (regression) function classes get more complex, the requirement on the pairwise total variation distance becomes more stringent, which matches our intuition. We also do a finer analysis for the case of mixed linear regression and provide a tight bound on the generalization error in terms of heterogeneity.
LG | Dec 10, 2024
Distributed Gradient Descent with Many Local Steps in Overparameterized Models
Heng Zhu, Harsh Vardhan, Arya Mazumdar
In distributed training of machine learning models, gradient descent with local iterative steps is a very popular method, variants of which are commonly known as Local-SGD or Federated Averaging (FedAvg). In this method, gradient steps based on local datasets are taken independently in distributed compute nodes to update the local models, which are then aggregated intermittently. Although the existing convergence analysis suggests that with heterogeneous data, FedAvg encounters quick performance degradation as the number of local steps increases, it is shown to work quite well in practice, especially in the distributed training of large language models. In this work we try to explain this good performance from the viewpoint of implicit bias in Local Gradient Descent (Local-GD) with a large number of local steps. In the overparameterized regime, gradient descent at each compute node leads the model in a specific direction locally. We characterize the dynamics of the aggregated global model and compare it to the centralized model trained with all of the data in one place. In particular, we analyze the implicit bias of gradient descent on linear models, for both regression and classification tasks. Our analysis shows that the aggregated global model converges exactly to the centralized model for regression tasks, and converges (in direction) to the same feasible set as the centralized model for classification tasks. We further propose a Modified Local-GD with a refined aggregation and theoretically show that it converges to the centralized model in direction for linear classification. We empirically verified our theoretical findings in linear models and also conducted experiments on distributed fine-tuning of pretrained neural networks to further apply our theory.
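A toy version of Local-GD with many local steps, for linear regression, might look like the sketch below. It is an illustration under assumptions rather than the paper's setup: each of two hypothetical nodes holds a single data point with two parameters (so each node is overparameterized locally), aggregation is a plain average, and the step size, step counts, and function names are chosen here.

```python
def local_gd(w, x, y, lr, steps):
    """Run `steps` gradient steps on 0.5 * (x . w - y)**2 for one data point."""
    w = list(w)
    for _ in range(steps):
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


def run_rounds(nodes, dim, lr=0.1, local_steps=200, rounds=40):
    """Alternate many local steps on each node with averaging of the local models."""
    w = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_gd(w, x, y, lr, local_steps) for x, y in nodes]
        w = [sum(ws) / len(local_models) for ws in zip(*local_models)]
    return w


# Hypothetical nodes: (feature vector, target). Pooled together, the two
# points are interpolated only by w = [1, 2].
nodes = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
w_global = run_rounds(nodes, dim=2)
```

In this toy, each node's many local steps drive its model onto its own interpolation constraint, and repeated averaging pulls the global model to the point that satisfies both, which is also what training on the pooled data would give. This matches the regression claim in the abstract only for this hand-picked example and is not a substitute for the paper's analysis.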
LG | Feb 18, 2022
Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan, Sebastian U. Stich
Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning. While such complex problems can often be successfully optimized in practice by using stochastic gradient descent (SGD), theoretical analysis cannot adequately explain this success. In particular, the standard analyses do not show global convergence of SGD on non-convex functions, and instead show convergence to stationary points (which can also be local minima or saddle points). We identify a broad class of nonconvex functions for which we can show that perturbed SGD (gradient descent perturbed by stochastic noise -- covering SGD as a special case) converges to a global minimum (or a neighborhood thereof), in contrast to gradient descent without noise that can get stuck in local minima far from a global solution. For example, on non-convex functions that are relatively close to a convex-like (strongly convex or PL) function we show that SGD can converge linearly to a global optimum.
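The smoothing intuition can be illustrated on a one-dimensional function that is a convex bowl plus a ripple. The sketch below is a deterministic stand-in, not the perturbed SGD analyzed in the paper: instead of random noise it averages the gradient over a fixed grid of perturbations spanning one ripple period, and the test function, constants, and function names are all assumptions made here.

```python
import math

A, OMEGA = 1.0, 10.0  # hypothetical ripple amplitude and frequency


def grad(x):
    """Gradient of f(x) = 0.5 * x**2 + A * (1 - cos(OMEGA * x))."""
    return x + A * OMEGA * math.sin(OMEGA * x)


def smoothed_grad(x, n=20):
    """Average the gradient over n perturbations spanning one ripple period."""
    period = 2.0 * math.pi / OMEGA
    # Symmetric, evenly spaced offsets: their mean is zero and the
    # sinusoidal ripple term averages out over a full period.
    offsets = [period * ((k + 0.5) / n - 0.5) for k in range(n)]
    return sum(grad(x + u) for u in offsets) / n


def descend(gradient, x0, lr=0.005, steps=2000):
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


x_plain = descend(grad, 3.0)            # stays in a local minimum near the start
x_smooth = descend(smoothed_grad, 3.0)  # approaches the global minimum at 0
```

Averaging over perturbations replaces the rippled objective with its convex envelope-like smoothed version, so descent is no longer trapped. Random perturbations play the analogous role in expectation, which is the connection to stochastic gradients that the abstract draws.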