Alex 'Sandy' Pentland

LG
h-index: 17
9 papers
396 citations
Novelty: 41%
AI Score: 29

9 Papers

CL · Oct 22, 2023
The Law and NLP: Bridging Disciplinary Disconnects

Robert Mahari, Dominik Stammbach, Elliott Ash et al.

Legal practice is intrinsically rooted in the fabric of language, yet legal practitioners and scholars have been slow to adopt tools from natural language processing (NLP). At the same time, the legal system is experiencing an access to justice crisis, which could be partially alleviated with NLP. In this position paper, we argue that the slow uptake of NLP in legal practice is exacerbated by a disconnect between the needs of the legal community and the focus of NLP researchers. In a review of recent trends in the legal NLP literature, we find limited overlap between the legal NLP community and legal academia. Our interpretation is that some of the most popular legal NLP tasks fail to address the needs of legal practitioners. We discuss examples of legal NLP tasks that promise to bridge disciplinary disconnects and highlight interesting areas for legal NLP research that remain underexplored.

HC · Oct 5, 2017 · Code
Open Badges: A Low-Cost Toolkit for Measuring Team Communication and Dynamics

Oren Lederman, Dan Calacci, Angus MacMullen et al.

We present Open Badges, an open-source framework and toolkit for measuring and shaping face-to-face social interactions using either custom hardware devices or smart phones, and real-time web-based visualizations. Open Badges is a modular system that allows researchers to monitor and collect interaction data from people engaged in real-life social settings. In this paper we describe the technical aspects of the Open Badges project and the motivation for its creation.

CL · Feb 26, 2024
Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling

Hang Jiang, Xiajie Zhang, Robert Mahari et al. · allen-ai

Making legal knowledge accessible to non-experts is crucial for enhancing general legal literacy and encouraging civic participation in democracy. However, legal documents are often challenging to understand for people without legal backgrounds. In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts through storytelling, an effective pedagogical tool in conveying complex and abstract concepts. We also introduce a new dataset LegalStories, which consists of 294 complex legal doctrines, each accompanied by a story and a set of multiple-choice questions generated by LLMs. To construct the dataset, we experiment with various LLMs to generate legal stories explaining these concepts. Furthermore, we use an expert-in-the-loop approach to iteratively design multiple-choice questions. Then, we evaluate the effectiveness of storytelling with LLMs through randomized controlled trials (RCTs) with legal novices on 10 samples from the dataset. We find that LLM-generated stories enhance comprehension of legal concepts and interest in law among non-native speakers compared to only definitions. Moreover, stories consistently help participants relate legal concepts to their lives. Finally, we find that learning with stories shows a higher retention rate for non-native speakers in the follow-up assessment. Our work has strong implications for using LLMs in promoting teaching and learning in the legal field and beyond.

LG · Feb 5, 2024
Verifiable evaluations of machine learning models using zkSNARKs

Tobin South, Alexander Camuto, Shrey Jain et al. · mit

In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results (whether over task accuracy, bias evaluations, or safety checks) are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. We present a flexible proving system that enables verifiable attestations to be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.

LG · Jun 16, 2020
A Study of Compositional Generalization in Neural Models

Tim Klinger, Dhaval Adjodah, Vincent Marois et al.

Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and relational task structure on which to systematically evaluate them. In this paper, we introduce an environment called ConceptWorld, which enables the generation of images from compositional and relational concepts, defined using a logical domain specific language. We use it to generate images for a variety of compositional structures: 2x2 squares, pentominoes, sequences, scenes involving these objects, and other more complex concepts. We perform experiments to test the ability of standard neural architectures to generalize on relations with compositional arguments as the compositional depth of those arguments increases and under substitution. We compare standard neural networks such as MLP, CNN and ResNet, as well as state-of-the-art relational networks including WReN and PrediNet in a multi-class image classification setting. For simple problems, all models generalize well to close concepts but struggle with longer compositional chains. For more complex tests involving substitutivity, all models struggle, even with short chains. In highlighting these difficulties and providing an environment for further experimentation, we hope to encourage the development of models which are able to generalize effectively in compositional, relational domains.

CR · Mar 31, 2020
Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy

Alex Berke, Michiel Bakker, Praneeth Vepakomma et al.

Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority. There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.
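
The transformation step in this abstract can be illustrated with a minimal sketch. It assumes that location points are snapped to a coarse grid cell and time bucket and then hashed, so that two histories can be compared without exchanging raw coordinates. The cell size, bucket length, and function names are hypothetical and not taken from the paper, and the plain set intersection at the end is only a stand-in: the proposal calls for a private set intersection protocol with a semi-trusted authority, which this sketch does not implement.

```python
import hashlib
import math

CELL_DEG = 0.001   # assumed grid resolution in degrees (hypothetical)
BUCKET_S = 300     # assumed time bucket in seconds (hypothetical)


def point_token(lat: float, lon: float, ts: int) -> str:
    """Snap a GPS point to a grid cell and time bucket, then hash the result."""
    cell = (
        math.floor(lat / CELL_DEG),
        math.floor(lon / CELL_DEG),
        ts // BUCKET_S,
    )
    return hashlib.sha256(repr(cell).encode()).hexdigest()


def history_tokens(history) -> set:
    """Turn a list of (lat, lon, unix_ts) points into a set of hashed tokens."""
    return {point_token(lat, lon, ts) for lat, lon, ts in history}


# Two made-up histories that share one grid cell and time bucket.
user_history = [
    (42.3601, -71.0942, 1585650000),
    (42.3736, -71.1097, 1585653600),
]
carrier_history = [
    (42.3604, -71.0945, 1585650120),
    (40.7128, -74.0060, 1585650000),
]

# Stand-in for the PSI step: a real protocol would reveal only the overlap,
# whereas a plain intersection requires one side to see the other's tokens.
overlap = history_tokens(user_history) & history_tokens(carrier_history)
print(len(overlap))
```

Coarser cells and longer buckets make matches more likely but less precise, so the two constants trade sensitivity against false alarms.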

RO · Apr 19, 2019
Secure and secret cooperation in robotic swarms

Eduardo Castelló Ferrer, Thomas Hardjono, Alex 'Sandy' Pentland et al.

The importance of swarm robotics systems in both academic research and real-world applications is steadily increasing. However, to reach widespread adoption, new models that ensure the secure cooperation of large groups of robots need to be developed. This work introduces a novel method to encapsulate cooperative robotic missions in an authenticated data structure known as a Merkle tree. With this method, operators can provide the "blueprint" of the swarm's mission without disclosing its raw data. In other words, data verification can be separated from data itself. We propose a system where robots in a swarm, to cooperate towards mission completion, have to "prove" their integrity to their peers by exchanging cryptographic proofs. We show the implications of this approach for two different swarm robotics missions: foraging and maze formation. In both missions, swarm robots were able to cooperate and carry out sequential operations without having explicit knowledge about the mission's high-level objectives. The results presented in this work demonstrate the feasibility of using Merkle trees as a cooperation mechanism for swarm robotics systems in both simulation and real-robot experiments, which has implications for future decentralized robotics applications where security plays a crucial role such as environmental monitoring, infrastructure surveillance, and disaster management.
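
The idea of separating data verification from the data itself can be sketched with a generic Merkle tree. The following is a minimal illustration and not the authors' protocol: the mission steps, the SHA-256 hash, and the rule of duplicating the last node on odd-sized levels are all assumptions made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, from the hashed leaves up to the single root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Pad odd-sized levels by duplicating the last node (an assumption).
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # Record whether the sibling sits to the left of the current node.
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the root from a leaf and its proof, then compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical mission steps; only the root needs to be shared with the swarm.
steps = [b"1: leave nest", b"2: find food", b"3: return to nest"]
levels = build_levels(steps)
root = levels[-1][0]

proof = make_proof(levels, 1)
print(verify(steps[1], proof, root))      # True
print(verify(b"2: forged", proof, root))  # False
```

A verifier holding only `root` can check a claimed step against its proof without seeing any other step, which is the property the abstract describes as providing the blueprint without the raw data.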

HC · Jul 6, 2016
Breakout: An Open Measurement and Intervention Tool for Distributed Peer Learning Groups

Dan Calacci, Oren Lederman, David Shrier et al.

We present Breakout, a group interaction platform for online courses that enables the creation and measurement of face-to-face peer learning groups in online settings. Breakout is designed to help students easily engage in synchronous, video breakout session based peer learning in settings that otherwise force students to rely on asynchronous text-based communication. The platform also offers data collection and intervention tools for studying the communication patterns inherent in online learning environments. The goals of the system are twofold: to enhance student engagement in online learning settings and to create a platform for research into the relationship between distributed group interaction patterns and learning outcomes.

LG · Nov 20, 2015
Modeling the Temporal Nature of Human Behavior for Demographics Prediction

Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland et al.

Mobile phone metadata is increasingly used for humanitarian purposes in developing countries as traditional data is scarce. Basic demographic information is however often absent from mobile phone datasets, limiting the operational impact of the datasets. For these reasons, there has been a growing interest in predicting demographic information from mobile phone metadata. Previous work focused on creating increasingly advanced features to be modeled with standard machine learning algorithms. We here instead model the raw mobile phone metadata directly using deep learning, exploiting the temporal nature of the patterns in the data. From high-level assumptions we design a data representation and convolutional network architecture for modeling patterns within a week. We then examine three strategies for aggregating patterns across weeks and show that our method reaches state-of-the-art accuracy on both age and gender prediction using only the temporal modality in mobile metadata. We finally validate our method on low activity users and evaluate the modeling assumptions.
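
One way to picture a data representation for patterns within a week is a day-by-hour grid of event counts per week. The sketch below is an assumption made for illustration and not the representation used in the paper: it bins Unix timestamps (in UTC) into one 7x24 matrix per ISO week, and the function name and example events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def weekly_matrices(timestamps):
    """Bin Unix timestamps into one 7x24 count matrix per ISO week.

    Rows are weekdays (0 = Monday), columns are hours of the day (UTC).
    """
    weeks = defaultdict(lambda: [[0] * 24 for _ in range(7)])
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        iso = t.isocalendar()
        weeks[(iso[0], iso[1])][t.weekday()][t.hour] += 1
    return dict(weeks)


def utc_ts(year, month, day, hour, minute=0):
    """Helper that builds a Unix timestamp from a UTC wall-clock time."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc).timestamp()


# Made-up events: two on a Monday morning, one on the following Tuesday night.
events = [
    utc_ts(2015, 11, 16, 9, 5),
    utc_ts(2015, 11, 16, 9, 40),
    utc_ts(2015, 11, 17, 22, 15),
]
matrices = weekly_matrices(events)
for week, matrix in matrices.items():
    print(week, sum(map(sum, matrix)))
```

Each weekly matrix could then serve as one input channel to a convolutional network, with further channels for other event types, and the per-week outputs aggregated afterwards.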