Sam Green

h-index 6

5 papers

27 citations

Novelty 28%

AI Score 20

Ranked #184,677 of 194,257 authors (top 95%); #39,241 in LG (top 98%)

5 Papers

1.2 · CP · Dec 13, 2022 · Code

Multi-Agent Dynamic Pricing in a Blockchain Protocol Using Gaussian Bandits

Alexis Asseman, Tomasz Kornuta, Anirudh Patel et al.

The Graph Protocol indexes historical blockchain transaction data and makes it available for querying. As the protocol is decentralized, there are many independent Indexers that index and compete with each other for serving queries to the Consumers. One dimension along which Indexers compete is pricing. In this paper, we propose a bandit-based algorithm for maximization of Indexers' revenue via Consumer budget discovery. We present the design and the considerations we had to make for a dynamic pricing algorithm being used by multiple agents simultaneously. We discuss the results achieved by our dynamic pricing bandits both in simulation and deployed into production on one of the Indexers operating on Ethereum. We have open-sourced both the simulation framework and tools we created, which other Indexers have since started to adapt into their own workflows.
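The bandit idea in the abstract above can be illustrated with a minimal single-agent sketch. It is not the authors' open-sourced implementation: it assumes a Gaussian pricing policy updated with a plain policy-gradient rule, a single simulated Consumer whose hidden budget is a fixed number, and hypothetical names (`revenue`, `grad_log_prob`, `train`, `MAX_PRICE`) and hyperparameters chosen only for illustration.

```python
import math
import random

MAX_PRICE = 10.0  # hypothetical upper bound on any quoted price


def revenue(price, budget):
    """Simulated Consumer: pays the quoted price only if it fits the hidden budget."""
    return price if 0.0 <= price <= budget else 0.0


def grad_log_prob(price, mu, sigma):
    """Gradients of log N(price; mu, sigma^2) with respect to mu and log(sigma)."""
    z = (price - mu) / sigma
    return z / sigma, z * z - 1.0


def train(budget=1.0, steps=5000, lr=0.002, seed=0):
    """Policy-gradient bandit: sample a price, observe revenue, shift the Gaussian."""
    rng = random.Random(seed)
    mu, log_sigma, baseline = 0.2, math.log(0.2), 0.0
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        price = rng.gauss(mu, sigma)
        reward = revenue(price, budget)
        g_mu, g_log_sigma = grad_log_prob(price, mu, sigma)
        advantage = reward - baseline  # running-mean baseline reduces variance
        # Clamp both parameters so the sketch stays numerically stable.
        mu = min(max(mu + lr * advantage * g_mu, 0.0), MAX_PRICE)
        log_sigma = min(max(log_sigma + lr * advantage * g_log_sigma,
                            math.log(0.05)), 0.0)
        baseline += 0.05 * (reward - baseline)
    return mu, math.exp(log_sigma)


if __name__ == "__main__":
    learned_mu, learned_sigma = train()
    print(f"mean price {learned_mu:.3f}, std {learned_sigma:.3f}")
```

The multi-agent setting the paper studies would run several such learners against shared Consumers; that interaction is deliberately left out of this sketch.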

5.8 · HC · Sep 25, 2020

Investigation of the Effect of Fear and Stress on Password Choice (Extended Version)

Tom Fordyce, Sam Green, Thomas Groß

Background. The current cognitive state, such as cognitive effort and depletion, incidental affect or stress, may unconsciously impact the strength of a chosen password. Aim. We investigate the effect of incidental fear and stress on the measured strength of a chosen password. Method. We conducted two experiments with within-subject designs, measuring the Zxcvbn log10 number of guesses as the strength of chosen passwords as the dependent variable. In both experiments, participants were signed up to a site holding their personal data and, for the second run a day later, asked under a security incident pretext to change their password. (a) Fear. N_F = 34 participants were exposed to standardized fear and happiness stimulus videos in random order. (b) Stress. N_S = 50 participants were either exposed to a battery of standard stress tasks or left in a control condition in random order. The Zxcvbn password strength was compared across conditions. Results. We did not observe a statistically significant difference in mean Zxcvbn password strengths on fear (Hedges' g_av = -0.11, 95% CI [-0.45, 0.23]) or stress (versus the control group, Hedges' g_av = 0.01, 95% CI [-0.31, 0.33]). However, we found a statistically significant cross-over interaction of stress and TLX mental demand. Conclusions. Although we observed negligible main effect size estimates for incidental fear and stress, we offer evidence of an interaction between stress and cognitive effort that warrants further investigation.
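For readers unfamiliar with the effect size reported above, the following is a minimal sketch of how a Hedges' g_av could be computed for a within-subject design. It assumes the common definition (mean difference divided by the average of the two condition standard deviations) and the approximate small-sample correction 1 - 3/(4·df - 1) with df = n - 1; the paper may use a different variant. The function name `hedges_g_av` is hypothetical, and the numbers in the usage line are made up for illustration, not taken from the study.

```python
from statistics import mean, stdev


def hedges_g_av(cond_a, cond_b):
    """Bias-corrected standardized mean difference for paired measurements.

    cond_a and cond_b hold one score per participant, in the same order.
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 3:
        raise ValueError("need paired samples with at least 3 participants")
    n = len(cond_a)
    # Cohen's d_av: mean difference over the average of the two SDs.
    d_av = (mean(cond_a) - mean(cond_b)) / ((stdev(cond_a) + stdev(cond_b)) / 2)
    # Approximate small-sample correction (assumed df = n - 1).
    correction = 1 - 3 / (4 * (n - 1) - 1)
    return d_av * correction


if __name__ == "__main__":
    # Made-up log10 guess counts for five hypothetical participants.
    print(hedges_g_av([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```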

1.8 · LG · Nov 8, 2019

RAPDARTS: Resource-Aware Progressive Differentiable Architecture Search

Sam Green, Craig M. Vineyard, Ryan Helinski et al.

Early neural network architectures were designed by so-called "grad student descent". Since then, the field of Neural Architecture Search (NAS) has developed with the goal of algorithmically designing architectures tailored for a dataset of interest. Recently, gradient-based NAS approaches have been created to rapidly perform the search. Gradient-based approaches impose more structure on the search, compared to alternative NAS methods, enabling faster search phase optimization. In the real world, neural architecture performance is measured by more than just high accuracy. There is an increasing need for efficient neural architectures, where resources such as model size or latency must also be considered. Gradient-based NAS is also suitable for such multi-objective optimization. In this work we extend a popular gradient-based NAS method to support one or more resource costs. We then perform in-depth analysis on the discovery of architectures satisfying single-resource constraints for classification of CIFAR-10.
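One generic way to fold a resource cost into a gradient-based search objective is sketched below. This is an assumption for illustration, not the RAPDARTS formulation: it relaxes the choice of operation on each edge with a softmax over architecture weights, takes the expected cost under that softmax, and adds it to the task loss with a weight. All names (`expected_cost`, `resource_aware_loss`, `weight`) and cost values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(alphas, op_costs):
    """Expected resource cost of the relaxed architecture.

    alphas[e][k] is the architecture weight of candidate op k on edge e;
    op_costs[e][k] is that op's cost (for example, parameter count).
    """
    total = 0.0
    for edge_alphas, edge_costs in zip(alphas, op_costs):
        probs = softmax(edge_alphas)
        total += sum(p * c for p, c in zip(probs, edge_costs))
    return total


def resource_aware_loss(task_loss, alphas, op_costs, weight):
    """Task loss plus a weighted penalty on the expected resource cost."""
    return task_loss + weight * expected_cost(alphas, op_costs)


if __name__ == "__main__":
    alphas = [[0.0, 0.0]]    # one edge, two candidate ops, equally weighted
    op_costs = [[1.0, 3.0]]  # hypothetical per-op costs
    print(resource_aware_loss(0.5, alphas, op_costs, weight=0.1))
```

Because the penalty is differentiable in the architecture weights, it can be minimized with the same gradient steps as the task loss, which is the property the abstract relies on.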

6.6 · LG · Jan 23, 2019

Distillation Strategies for Proximal Policy Optimization

Sam Green, Craig M. Vineyard, Çetin Kaya Koç

Vision-based deep reinforcement learning (RL) typically obtains a performance benefit by using high-capacity and relatively large convolutional neural networks (CNN). However, a large network leads to higher inference costs (power, latency, silicon area, MAC count). Many inference optimizations have been developed for CNNs. Some optimization techniques offer theoretical efficiency, such as sparsity, but designing actual hardware to support them is difficult. On the other hand, distillation is a simple general-purpose optimization technique which is broadly applicable for transferring knowledge from a trained, high-capacity teacher network to an untrained, low-capacity student network. DQN distillation extended the original distillation idea to transfer information stored in a high-performance, high-capacity teacher Q-function trained via the Deep Q-Learning (DQN) algorithm. Our work adapts the DQN distillation work to the actor-critic Proximal Policy Optimization (PPO) algorithm. PPO is simple to implement and has much higher performance than the seminal DQN algorithm. We show that a distilled PPO student can attain far higher performance compared to a DQN teacher. We also show that a low-capacity distilled student is generally able to outperform a low-capacity agent that directly trains in the environment. Finally, we show that distillation, followed by "fine-tuning" in the environment, enables the distilled PPO student to achieve parity with teacher performance. In general, the lessons learned in this work should transfer to other modern actor-critic RL algorithms.
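A distillation objective for a policy network can be sketched as the KL divergence from the teacher's action distribution to the student's, averaged over states. This is one common choice and is shown only as an assumption; the paper compares several strategies and may weight or define the loss differently. The function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert action logits to a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete action distributions."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )


def distillation_loss(teacher_logits_batch, student_logits_batch):
    """Mean KL divergence over a batch of states."""
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits_batch, student_logits_batch)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0]]  # hypothetical teacher logits for one state
    student = [[1.0, 1.0, 0.0]]   # hypothetical student logits
    print(distillation_loss(teacher, student))
```

In a training loop the student would minimize this loss on states visited by the teacher (or by the student itself), after which it could be fine-tuned in the environment as the abstract describes.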

2.9 · LG · Sep 14, 2018

Visual Diagnostics for Deep Reinforcement Learning Policy Development

Jieliang Luo, Sam Green, Peter Feghali et al.

Modern vision-based reinforcement learning techniques often use convolutional neural networks (CNN) as universal function approximators to choose which action to take for a given visual input. Until recently, CNNs have been treated like black-box functions, but this mindset is especially dangerous when used for control in safety-critical settings. In this paper, we present our extensions of CNN visualization algorithms to the domain of vision-based reinforcement learning. We use a simulated drone environment as an example scenario. These visualization algorithms are an important tool for behavior introspection and provide insight into the qualities and flaws of trained policies when interacting with the physical world. A video may be seen at https://sites.google.com/view/drlvisual .