Jean-Philippe Bernardy

CL
h-index: 16
9 papers
614 citations
Novelty: 32%
AI Score: 42

9 Papers

CL · May 7, 2022
UniMorph 4.0: Universal Morphology

Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa et al. · eth-zurich, microsoft-research

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
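As a rough illustration of what a type-level entry in such a resource looks like, the sketch below parses the tab-separated triple layout commonly used for UniMorph data (lemma, inflected form, semicolon-separated feature bundle). The sample rows and the names `Entry`, `parse_line`, and `SAMPLE` are hypothetical and chosen only for illustration; the hierarchical features introduced in this release are not modelled here.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    """One inflection-table cell: lemma, surface form, feature bundle."""
    lemma: str
    form: str
    features: Tuple[str, ...]


def parse_line(line: str) -> Entry:
    # Assumed layout: lemma <TAB> form <TAB> FEAT1;FEAT2;...
    lemma, form, bundle = line.rstrip("\n").split("\t")
    return Entry(lemma, form, tuple(bundle.split(";")))


# Hypothetical sample rows, not taken from the released data.
SAMPLE = "run\tran\tV;PST\nrun\trunning\tV;V.PTCP;PRS\n"

entries = [parse_line(row) for row in SAMPLE.splitlines() if row.strip()]

for entry in entries:
    print(entry.lemma, entry.form, entry.features)
```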

CL · Aug 11, 2022
Assessing the Unitary RNN as an End-to-End Compositional Model of Syntax

Jean-Philippe Bernardy, Shalom Lappin

We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long-distance agreement, and mildly context-sensitive cross-serial dependencies. This work extends recent experiments on deeply nested context-free long-distance dependencies, with similar results. URNs differ from LSTMs in that they avoid non-linear activation functions, and they apply matrix multiplication to word embeddings encoded as unitary matrices. This permits them to retain all information in the processing of an input string over arbitrary distances. It also causes them to satisfy strict compositionality. URNs constitute a significant advance in the search for explainable models in deep learning applied to NLP.
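The information-retention claim can be illustrated with a toy sketch, which is not the authors' model: if each word is a unitary matrix and the state is updated by plain matrix multiplication, the state norm never changes and the whole computation can be undone by applying the conjugate transposes in reverse order. The 2x2 parameterisation, the `EMBEDDINGS` table, and its angle values are assumptions made up for this example.

```python
import cmath
import math


def unitary(theta, phi):
    """A 2x2 unitary matrix from two angles (toy stand-in for a learned embedding)."""
    c, s = math.cos(theta), math.sin(theta)
    e = cmath.exp(1j * phi)
    return [[complex(c), -s * e.conjugate()], [s * e, complex(c)]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dagger(m):
    """Conjugate transpose; for a unitary matrix this is its inverse."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


# Hypothetical word embeddings; the angles are arbitrary.
EMBEDDINGS = {
    "the": unitary(0.3, 0.1),
    "dogs": unitary(1.1, 0.7),
    "bark": unitary(2.0, -0.4),
}


def encode(words, state):
    # No non-linearity: the update is only a matrix-vector product per word.
    for w in words:
        state = matvec(EMBEDDINGS[w], state)
    return state


def decode(words, state):
    # Undo the words last-to-first with the inverse (conjugate transpose) matrices.
    for w in reversed(words):
        state = matvec(dagger(EMBEDDINGS[w]), state)
    return state


sentence = ["the", "dogs", "bark"]
start = [1 + 0j, 0j]
final = encode(sentence, start)
recovered = decode(sentence, final)
print(norm(final))
print(recovered)
```

Because every step is invertible and norm-preserving, nothing about earlier words is lost in the final state, however long the input.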

50.8 · PL · Apr 23
Linear Constraints

Arnaud Spiwack, Csongor Kiss, Jean-Philippe Bernardy et al.

Linear constraints are the linear counterpart of Haskell's class constraints. Linearly typed parameters allow the programmer to control resources such as file handles and manually managed memory as linear arguments. Indeed, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. Linear constraints address this shortcoming: a linear constraint acts as an implicit linear argument that can be filled in automatically by the compiler. We present this new feature as a qualified type system, together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell. This paper is a revised and extended version of a previous paper by the same authors (arXiv:2103.06127). The formal system and the constraint solver have been significantly simplified and numerous additional applications are described.

LG · Dec 26, 2023 · Code
Algebraic Positional Encodings

Konstantinos Kogkalidis, Jean-Philippe Bernardy, Vikas Garg

We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds its desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyper-parameter optimizations or "task search" of any kind. Code is available at https://github.com/konstantinosKokos/ape.
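For the simplest case, sequences, one way to read "interpretation as orthogonal operators" is to send position n to the n-th power of a single orthogonal matrix W. The sketch below is a minimal illustration under that assumption, not the code from the linked repository: W is a fixed 2x2 rotation with an arbitrary angle, and the helper names are hypothetical. It checks that the product of one position's transpose with another position depends only on their offset.

```python
import math


def rotation(angle):
    """A 2x2 rotation, the smallest non-trivial orthogonal matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def power(w, n):
    """W^n for n >= 0 by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, w)
    return result


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


W = rotation(0.37)  # arbitrary angle, assumed for illustration


def position(n):
    # Position n of a sequence is interpreted as the operator W^n.
    return power(W, n)


def relative(m, n):
    # (W^m)^T W^n equals W^(n - m) because W is orthogonal.
    return matmul(transpose(position(m)), position(n))


print(close(relative(2, 5), relative(10, 13)))
print(close(relative(2, 5), position(3)))
```

The structure of the domain (here, addition of offsets) is carried over to matrix multiplication, which is the sense in which the algebraic characteristics are preserved.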

LG · Feb 3, 2024
Learning Structure-Aware Representations of Dependent Types

Konstantinos Kogkalidis, Orestis Melkonian, Jean-Philippe Bernardy

Agda is a dependently-typed programming language and a proof assistant, pivotal in proof formalization and programming language theory. This paper extends the Agda ecosystem into machine learning territory, and, vice versa, makes Agda-related resources available to machine learning practitioners. We introduce and release a novel dataset of Agda program-proofs that is elaborate and extensive enough to support various machine learning applications -- the first of its kind. Leveraging the dataset's ultra-high resolution, which details proof states at the sub-type level, we propose a novel neural architecture targeted at faithfully representing dependently-typed programs on the basis of structural rather than nominal principles. We instantiate and evaluate our architecture in a premise selection setup, where it achieves promising initial results, surpassing strong baselines.

CL · Dec 19, 2020
FraCaS: Temporal Analysis

Jean-Philippe Bernardy, Stergios Chatzikyriakidis

In this paper, we propose an implementation of temporal semantics which is suitable for inference problems. This implementation translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. We support several phenomena, including temporal references, temporal adverbs, aspectual classes and progressives. We apply these semantics to the complete FraCaS test suite. We obtain an accuracy of 81 percent overall and 73 percent for problems explicitly marked as related to temporal reference.

PL · May 10, 2020
Dynamic IFC Theorems for Free!

Maximilian Algehed, Jean-Philippe Bernardy, Catalin Hritcu

We show that noninterference and transparency, the key soundness theorems for dynamic IFC libraries, can be obtained "for free", as direct consequences of the more general parametricity theorem of type abstraction. This allows us to give very short soundness proofs for dynamic IFC libraries such as faceted values and LIO. Our proofs stay short even when fully mechanized for Agda implementations of the libraries in terms of type abstraction.

CL · Dec 14, 2018
A corpus of precise natural textual entailment problems

Jean-Philippe Bernardy, Stergios Chatzikyriakidis

In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: (1) it is precise (does not leave out implicit hypotheses); (2) it is based on "real-world" texts (i.e. most of the premises were written for purposes other than testing textual entailment); (3) it contains 150 problems. The corpus was constructed by taking problems from the Real Text Entailment and discovering missing hypotheses using a crowd of experts. We believe that this corpus constitutes a first step towards wide-coverage testing of precise natural-language inference systems.

CL · Jun 13, 2017
Modelling prosodic structure using Artificial Neural Networks

Jean-Philippe Bernardy, Charalambos Themistocleous

The ability to accurately perceive whether a speaker is asking a question or making a statement is crucial for any successful interaction. However, learning and classifying tonal patterns has been a challenging task for automatic speech recognition and for models of tonal representation, as tonal contours are characterized by significant variation. This paper provides a classification model of Cypriot Greek questions and statements. We evaluate two state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network and a convolutional network (ConvNet). The ConvNet outperforms the LSTM in the classification task, reaching 95% classification accuracy.