Phillipe R. Sampaio

OC
h-index4
3papers
9citations
Novelty53%
AI Score41

3 Papers

OCDec 29, 2019Code
DEFT-FUNNEL: an open-source global optimization solver for constrained grey-box and black-box problems

Phillipe R. Sampaio

The fast-growing need for grey-box and black-box optimization methods for constrained global optimization problems in fields such as medicine, chemistry, engineering and artificial intelligence, has contributed for the design of new efficient algorithms for finding the best possible solution. In this work, we present DEFT-FUNNEL, an open-source global optimization algorithm for general constrained grey-box and black-box problems that belongs to the class of trust-region sequential quadratic optimization algorithms. It extends the previous works by Sampaio and Toint (2015, 2016) to a global optimization solver that is able to exploit information from closed-form functions. Polynomial interpolation models are used as surrogates for the black-box functions and a clustering-based multistart strategy is applied for searching for the global minima. Numerical experiments show that DEFT-FUNNEL compares favorably with other state-of-the-art methods on two sets of benchmark problems: one set containing problems where every function is a black box and another set with problems where some of the functions and their derivatives are known to the solver. The code as well as the test sets used for experiments are available at the Github repository http://github.com/phrsampaio/deft-funnel.

OCSep 16, 2025
Complexity Bounds for Smooth Convex Multiobjective Optimization

Phillipe R. Sampaio

We study the oracle complexity of finding $\varepsilon$-Pareto stationary points in smooth multiobjective optimization with $m$ objectives. The progress metric is the Pareto stationarity gap $\mathcal{G}(x)$ (the norm of an optimal convex combination of gradients). Our contributions are fourfold. (i) For strongly convex objectives, any span first-order method (iterates lie in the span of past gradients) exhibits linear convergence no faster than $\exp(-Θ(T/\sqrtκ))$ after $T$ oracle calls, where $κ$ is the condition number, implying $Θ(\sqrtκ\log(1/\varepsilon))$ iterations; this matches classical accelerated upper bounds. (ii) For convex problems and oblivious one-step methods (a fixed scalarization with pre-scheduled step sizes), we prove a lower bound of order $1/T$ on the best gradient norm among the first $T$ iterates. (iii) Although accelerated gradient descent is outside this restricted class, it is an oblivious span method and attains the same $1/T$ upper rate on a fixed scalarization. (iv) For convex problems and general span methods with adaptive scalarizations, we establish a universal lower bound of order $1/T^{2}$ on the gradient norm of the final iterate after $T$ steps, highlighting a gap between known upper bounds and worst-case guarantees. All bounds hold on non-degenerate instances with distinct objectives and non-singleton Pareto fronts; rates are stated up to universal constants and natural problem scaling.

CLJun 13, 2025
Unsupervised Document and Template Clustering using Multimodal Embeddings

Phillipe R. Sampaio, Helene Maxcici

We study unsupervised clustering of documents at both the category and template levels using frozen multimodal encoders and classical clustering algorithms. We systematize a model-agnostic pipeline that (i) projects heterogeneous last-layer states from text-layout-vision encoders into token-type-aware document vectors and (ii) performs clustering with centroid- or density-based methods, including an HDBSCAN + $k$-NN assignment to eliminate unlabeled points. We evaluate eight encoders (text-only, layout-aware, vision-only, and vision-language) with $k$-Means, DBSCAN, HDBSCAN + $k$-NN, and BIRCH on five corpora spanning clean synthetic invoices, their heavily degraded print-and-scan counterparts, scanned receipts, and real identity and certificate documents. The study reveals modality-specific failure modes and a robustness-accuracy trade-off, with vision features nearly solving template discovery on clean pages while text dominates under covariate shift, and fused encoders offering the best balance. We detail a reproducible, oracle-free tuning protocol and the curated evaluation settings to guide future work on unsupervised document organization.