LG · Feb 21
L2G-Net: Local to Global Spectral Graph Neural Networks via Cauchy Factorizations
Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega
Despite their theoretical advantages, spectral methods based on the graph Fourier transform (GFT) are seldom used in graph neural networks (GNNs) due to the cost of computing the eigenbasis and the lack of vertex-domain locality in spectral representations. As a result, most GNNs rely on local approximations such as polynomial Laplacian filters or message passing, which limit their ability to model long-range dependencies. In this paper, we introduce a novel factorization of the GFT into operators acting on subgraphs, which are then combined via a sequence of Cauchy matrices. We use this factorization to propose a new class of spectral GNNs, which we term L2G-Net (Local-to-Global Net). Unlike existing spectral methods, which are either fully global (when they use the GFT) or local (when they use polynomial filters), L2G-Net operates by processing the spectral representations of subgraphs and then combining them via structured matrices. Our algorithm avoids full eigendecompositions, exploiting graph topology to construct the factorization with quadratic complexity in the number of nodes, scaled by the subgraph interface size. Experiments on benchmarks stressing non-local dependencies show that L2G-Net outperforms existing spectral techniques and is competitive with the state-of-the-art with orders of magnitude fewer learnable parameters.
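The contrast the abstract draws between fully global GFT filters and local polynomial filters can be illustrated with a minimal NumPy sketch. It does not reproduce the Cauchy factorization or L2G-Net itself; the path graph, the helper names (`path_laplacian`, `gft_filter`, `poly_filter`), and the filter responses are assumptions chosen only for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph (toy example)."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def gft_filter(L, x, h):
    """Global spectral filter U diag(h(lam)) U^T x; needs the full eigenbasis."""
    lam, U = np.linalg.eigh(L)       # the costly step the abstract refers to
    return U @ (h(lam) * (U.T @ x))  # GFT, scale each frequency, inverse GFT

def poly_filter(L, x, coeffs):
    """Polynomial filter sum_k c_k L^k x; a degree-K filter only reaches K hops."""
    y = np.zeros_like(x)
    p = x.copy()
    for c in coeffs:
        y = y + c * p
        p = L @ p
    return y

n = 8
L = path_laplacian(n)
delta = np.zeros(n)
delta[0] = 1.0                                            # impulse at node 0

y_poly = poly_filter(L, delta, [1.0, -0.5, 0.1])          # degree 2: 2-hop support
y_heat = gft_filter(L, delta, lambda lam: np.exp(-lam))   # non-polynomial response
```

In this toy setting the degree-2 polynomial response is exactly zero beyond two hops from the impulse, while the heat-kernel response computed through the GFT is nonzero at every node, which is the long-range behavior that local approximations give up.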
IV · Apr 3, 2025
Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization
Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega
Many images and videos are primarily processed by computer vision algorithms, involving only occasional human inspection. When this content requires compression before processing, e.g., in distributed applications, coding methods must optimize for both visual quality and downstream task performance. We first show theoretically that an approach to reduce the effect of compression for a given task loss is to perform rate-distortion optimization (RDO) using the distance between features, obtained from the original and the decoded images, as a distortion metric. However, directly optimizing such a rate-distortion objective is computationally impractical because it requires iteratively encoding and decoding the entire image, plus evaluating the features, for each possible coding configuration. We address this problem by simplifying the RDO formulation to make the distortion term computable using block-based encoders. We first apply a Taylor expansion to the feature extractor, recasting the feature distance as a quadratic metric involving the Jacobian matrix of the neural network. Then, we replace the linearized metric with a block-wise approximation, which we call input-dependent squared error (IDSE). To make the metric computable, we approximate IDSE using sketches of the Jacobian. The resulting loss can be evaluated block-wise in the transform domain and combined with the sum of squared errors (SSE) to address both visual quality and computer vision performance. Simulations with AVC and HEVC across multiple feature extractors and downstream networks show up to 17% bit-rate savings for the same task accuracy compared to RDO based on SSE, with no decoder complexity overhead and a small (7.86%) encoder complexity increase.
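The linearization and sketching steps described above can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' IDSE implementation: the feature extractor is a hypothetical single-layer `tanh` map with an analytic Jacobian, the sketch is a Gaussian random projection, and the block-wise and transform-domain parts of the method are omitted. It checks that the feature distance ||f(x + e) - f(x)||² is close to the quadratic form ||J e||² for a small error e, and that a sketched Jacobian S J gives a rough estimate of the same quantity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 32, 1024, 256   # input dim, feature dim, sketch size (all hypothetical)
W = rng.standard_normal((m, n)) / np.sqrt(n)   # toy feature-extractor weights

def features(x):
    """Toy feature extractor f(x) = tanh(W x), standing in for a neural network."""
    return np.tanh(W @ x)

def jacobian(x):
    """Analytic Jacobian of f at x: diag(1 - tanh^2(W x)) W, shape (m, n)."""
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

def feature_distance(x, x_hat):
    """Exact squared feature distance ||f(x) - f(x_hat)||^2."""
    d = features(x) - features(x_hat)
    return float(d @ d)

def linearized_distance(J, e):
    """First-order Taylor approximation: ||J e||^2 = e^T J^T J e."""
    Je = J @ e
    return float(Je @ Je)

def sketch_jacobian(J, k, rng):
    """Gaussian sketch S J with E[S^T S] = I, reducing m rows to k rows."""
    S = rng.standard_normal((k, J.shape[0])) / np.sqrt(k)
    return S @ J

x = rng.standard_normal(n)          # stands in for the original signal
e = 1e-4 * rng.standard_normal(n)   # stands in for a small coding error
J = jacobian(x)
SJ = sketch_jacobian(J, k, rng)

d_true = feature_distance(x, x + e)     # requires re-evaluating the features
d_lin = linearized_distance(J, e)       # quadratic metric with the full Jacobian
d_sketch = linearized_distance(SJ, e)   # same metric with only k sketched rows
```

Once `SJ` is computed for a given input, evaluating the metric for a candidate error `e` only needs a small matrix-vector product, with no further calls to the feature extractor. The accuracy of the sketched estimate depends on the sketch size `k`, which here is an arbitrary choice.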