Houman Safaai

LG
4papers
1citation
Novelty55%
AI Score47

4 Papers

CVMay 8
Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information

Houman Safaai, Andrew T. Landau, Celia C. Beron et al.

Channel importance in vision networks is usually summarized by a single score. That summary hides two different questions: how much a channel is related to the task, and whether its function can be supplied by same-layer peers when the channel is removed. We call the second property local replaceability. We introduce a two-axis view that separates these questions. The local axis measures input capture and peer overlap, while the target axis measures task information and target-excess information. Across ResNet-18, VGG-16, and MobileNetV2 trained on CIFAR-100, the two axes are weakly aligned, induce different channel groupings, and separate rapidly during training despite being strongly coupled at random initialization. A Gaussian linear analysis accounts for how this separation can arise through residualized gradient directions, and lesion plus peer-replacement experiments show that peer support refines removability beyond input capture and task relevance alone. Under the fixed FLOPs-matched pruning protocol, local-axis metrics are more reliable predictors of removability than target-axis metrics across the three CIFAR-100 backbones, with the same direction preserved in stress tests on CIFAR-10, Tiny-ImageNet, ImageNet-100, and a ConvNeXt-T/ImageNet-100 pilot. These findings identify an axis-level distinction rather than a universal ranking of pruning scores: local replaceability is a more reliable guide to removability than target relevance, while norm-based baselines remain competitive in architectures such as VGG-16. Relevance-based scores ask what a channel says about the task; pruning asks whether the network still needs that channel when its peers remain available.

MLMay 4
Dynamic Vine Copulas: Detecting and Quantifying Time-Varying Higher-Order Interactions

Houman Safaai, Alessandro Marin Vargas

Time varying dependence is often modeled through dynamic correlations or Gaussian graphical models, yet many multivariate systems change through tail behavior, asymmetry, or conditional structure while correlations change little. We introduce Dynamic Vine Copulas (DVC), a temporal vine copula framework for estimating and diagnosing sequence wide non-Gaussian dependence. DVC keeps a chosen vine factorization fixed for comparability, can use C-, D-, or R-vines, and couples pair copula states across time through smooth parameter trajectories or temporally regularized family switching paths. Its central diagnostic contrasts held-out scores from a full vine and its matched 1-truncated counterpart, separating flexible first-tree pairwise evidence from higher-tree conditional evidence. At the population level, under a correct fixed vine and simplifying assumption, this contrast is the higher-tree term of a vine total correlation decomposition; in finite samples, it is a predictive diagnostic. Across controlled benchmarks, DVC detects Student-t tail degree changes, Clayton-to-Gumbel switches, and recurrent conditional interaction episodes that Gaussian dynamic baselines miss or conflate. The higher-tree score stays near zero in pairwise only regimes but rises selectively during conditional interaction regimes. On Allen Visual Behavior Neuropixels data, DVC identifies a reproducible time indexed higher-tree signal that is positive across held-out splits and disappears under a decorrelated null, indicating simultaneous cross-area dependence. Together, these results show that DVC is both a flexible temporal copula model and an interpretable diagnostic for whether time varying dependence changes are pairwise or conditional.

LGApr 22
Amortized Vine Copulas for High-Dimensional Density and Information Estimation

Houman Safaai

Modeling high-dimensional dependencies while keeping likelihoods tractable remains challenging. Classical vine-copula pipelines are interpretable but can be expensive, while many neural estimators are flexible but less structured. In this work, we propose Vine Denoising Copula (VDC), an amortized vine-copula pipeline that trains a single bivariate denoising model and reuses it across all vine edges. For each edge, given pseudo-observations, the model predicts a density grid. We then apply an IPFP/Sinkhorn projection that enforces non-negativity, unit mass, and uniform marginals. This keeps the exact vine likelihood and preserves the usual copula interpretation while replacing repeated per-edge optimization with GPU inference. Across synthetic and real-data benchmarks, VDC delivers strong bivariate density accuracy, competitive MI/TC estimation, and substantial speedups for high-dimensional vine fitting. In practice, these gains make explicit information estimation and dependence decomposition feasible at scales where repeated vine fitting would otherwise be costly, although conditional downstream inference remains mixed.

LGApr 26
Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers

Audrey Cherilyn, Houman Safaai

We study the organization of channel-level importance in transformer feed-forward networks (FFNs). Using a Fisher-style loss proxy (LP) based on activation-gradient second moments, we show that loss sensitivity is concentrated in a small set of channels within each layer. In Llama-3.1-8B, the top 1% of channels per layer accounts for a median of 58.7% of LP mass, with a range of 33.0% to 86.1%. We call these loss-critical channels supernodes. Although FFN layers also contain strong activation outliers, LP-defined supernodes overlap only weakly with activation-defined outliers and are not explained by activation power or weight norms alone. Around this core, we find a weaker but consistent halo structure: some non-supernode channels share the supernodes' write support and show stronger redundancy with the protected core. We use one-shot structured FFN pruning as a diagnostic test of this organization. At 50% FFN sparsity, baselines that prune many supernodes degrade sharply, whereas our SCAR variants explicitly protect the supernode core; the strongest variant, SCAR-Prot, reaches perplexity 54.8 compared with 989.2 for Wanda-channel. The LP-concentration pattern appears across Mistral-7B, Llama-2-7B, and Qwen2-7B, remains visible in targeted Llama-3.1-70B experiments, and increases during OLMo-2-7B pretraining. These results suggest that LLM FFNs develop a small learned core of loss-critical channels, and that preserving this core is important for reliable structured pruning.