LG, Nov 26, 2022
Why Neural Networks Work
Sayandev Mukherjee, Bernardo A. Huberman
We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the analysis of a single pair of operations, namely a random projection into a higher-dimensional space than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify, following the terminology of Dasgupta. We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly-initialized untrained neural networks, the efficacy of Dropout in training, and, most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural network models by Belkin et al.
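As a rough illustration of the expand-and-sparsify pair described in this abstract, the sketch below projects an input vector into a higher-dimensional space with a random matrix and then keeps only the largest activations. The function name `expand_and_sparsify`, the Gaussian projection weights, and the top-k (winner-take-all) sparsification rule are assumptions chosen for illustration; the abstract does not specify these details.

```python
import random


def expand_and_sparsify(x, out_dim, k, seed=0):
    """Randomly project x into out_dim dimensions, then keep the top-k units.

    Hypothetical helper: Gaussian weights and top-k selection are
    illustrative assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    # Expand: random projection matrix of shape (out_dim, len(x)).
    weights = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(out_dim)]
    expanded = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    # Sparsify: mark the k largest activations with 1, everything else with 0.
    order = sorted(range(out_dim), key=lambda i: expanded[i], reverse=True)
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(out_dim)]


# A 3-dimensional input becomes a 20-dimensional binary code with 4 active units.
code = expand_and_sparsify([0.2, -1.0, 0.5], out_dim=20, k=4)
```

With a fixed seed the same input always maps to the same sparse code, which makes the sketch easy to experiment with when varying `out_dim` and `k`.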
QUANT-PH, Apr 25
Online Voting using Point to MultiPoint Quantum Key Distribution
Bernardo A. Huberman, Jing Wang
We propose using Point-to-Multipoint quantum key distribution (QKD) via time division multiplexing (TDM) and wavelength division multiplexing (WDM) in passive optical networks (PON) to improve the security of online voting systems.
HC, Feb 8, 2024
Randomness Is All You Need: Semantic Traversal of Problem-Solution Spaces with Large Language Models
Thomas Sandholm, Sayandev Mukherjee, Bernardo A. Huberman
We present a novel approach to exploring innovation problem and solution domains using LLM fine-tuning with a custom idea database. By semantically traversing the bi-directional problem and solution tree at different temperature levels we achieve high diversity in solution edit distance while still remaining close to the original problem statement semantically. In addition to finding a variety of solutions to a given problem, this method can also be used to refine and clarify the original problem statement. As further validation of the approach, we implemented a proof-of-concept Slack bot to serve as an innovation assistant.
ML, Oct 13, 2021
Reinforcement Learning for Standards Design
Shahrukh Khan Kasi, Sayandev Mukherjee, Lin Cheng et al.
Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby streamline both the standards design process and the ease of extending the standard to support new modulation schemes applicable to new higher-level applications and services. Our scheme involves machine learning, whereby a constructor entity submits proposals to an evaluator entity, which returns a score for the proposal. The constructor employs reinforcement learning to iterate on its submitted proposals until a score is achieved that was previously agreed upon by both constructor and evaluator to be indicative of satisfying the required design criteria (including performance metrics for transmissions over the interface).
DC, Aug 12, 2021
SAFE: Secure Aggregation with Failover and Encryption
Thomas Sandholm, Sayandev Mukherjee, Bernardo A. Huberman
We propose and experimentally evaluate a novel secure aggregation algorithm targeted at cross-organizational federated learning applications with a fixed set of participating learners. Our solution organizes learners in a chain and encrypts all traffic to reduce the controller of the aggregation to a mere message broker. We show that our algorithm scales better and is less resource demanding than existing solutions, while being easy to implement on constrained platforms. With 36 nodes our method outperforms state-of-the-art secure aggregation by 70x, and 56x with and without failover, respectively.
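To make the idea of aggregating along a chain of learners more concrete, here is a minimal single-process sketch of a generic masked chain sum: the first learner adds a random mask, each learner adds its own value to the running total, and the mask is removed at the end so that no intermediate total reveals an individual contribution. This is a textbook secure-sum pattern used as an assumption, not the SAFE algorithm itself; it omits the encryption, message broker, and failover described in the abstract, and the names `chain_secure_sum` and `MODULUS` are hypothetical.

```python
import secrets

# Hypothetical modulus; it must exceed the largest possible true sum.
MODULUS = 2**61 - 1


def chain_secure_sum(values, modulus=MODULUS):
    """Sum non-negative integers along a chain while hiding partial sums.

    Illustrative assumption only: a generic masked chain sum, simulated in
    one process, without the encryption or failover of the SAFE paper.
    """
    # The initiating learner draws a secret random mask.
    mask = secrets.randbelow(modulus)
    running = mask
    # Each learner in the chain adds its private value to the masked total.
    for value in values:
        running = (running + value) % modulus
    # The initiator removes its mask to recover the true aggregate.
    return (running - mask) % modulus


total = chain_secure_sum([3, 5, 7])
```

In a real deployment each hop would be a network message between separate parties; the loop here only stands in for that hand-off.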
CR, Oct 7, 2020
Privacy and Data Balkanization: Circumventing the Barriers
Bernardo A. Huberman, Tad Hogg
The rapid growth in digital data forms the basis for a wide range of new services and research, e.g., large-scale medical studies. At the same time, increasingly restrictive privacy concerns and laws are leading to significant overhead in arranging for sharing or combining different data sets to obtain these benefits. For new applications, where the benefit of combined data is not yet clear, this overhead can inhibit organizations from even trying to determine whether they can mutually benefit from sharing their data. In this paper, we discuss techniques to overcome this difficulty by employing private information transfer to determine whether there is a benefit from sharing data, and whether there is room to negotiate acceptable prices. These techniques involve cryptographic protocols. While currently considered secure, these protocols are potentially vulnerable to the development of quantum technology, particularly for ensuring privacy over significant periods of time into the future. To mitigate this concern, we describe how developments in practical quantum technology can improve the security of these protocols.
CY, Nov 12, 2014
Deciding what to display: maximizing the information value of social media
Sandra Servia-Rodríguez, Bernardo A. Huberman, Sitaram Asur
In information-rich environments, the competition for users' attention leads to a flood of content from which people often find it hard to sort out the most relevant and useful pieces. Using Twitter as a case study, we applied an attention economy solution to generate the most informative tweets for its users. By considering the novelty and popularity of tweets as objective measures of their relevance and utility, we used the Huberman-Wu algorithm to automatically select the ones that will receive the most attention in the next time interval. Their predicted popularity was confirmed by using Twitter data collected for a period of 2 months.
CY, Nov 5, 2013
Semantic Stability in Social Tagging Streams
Claudia Wagner, Philipp Singer, Markus Strohmaier et al.
One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions such as: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems? (iii) Ultimately, what are the factors that can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone.