NE · Apr 20, 2022
Noise mitigation strategies in physical feedforward neural networks
Nadezhda Semenova, Daniel Brunner
Physical neural networks are promising candidates for next generation artificial intelligence hardware. In such architectures, neurons and connections are physically realized and do not leverage digital concepts with their practically infinite signal-to-noise ratio to encode, transduce and transform information. They are therefore prone to noise with a variety of statistical and architectural properties, and effective strategies leveraging network-inherent assets to mitigate noise in a hardware-efficient manner are important in the pursuit of next generation neural network hardware. Based on analytical derivations, we here introduce and analyse a variety of noise-mitigation approaches. We analytically show that intra-layer connections in which the connection matrix's squared mean exceeds the mean of its square fully suppress uncorrelated noise. We go further and develop two synergistic strategies for noise that is uncorrelated and correlated across populations of neurons. First, we introduce the concept of ghost neurons, where each group of neurons perturbed by correlated noise has a negative connection to a single neuron that receives no input information. Secondly, we show that pooling of neuron populations is an efficient approach to suppress uncorrelated noise. As such, we developed a general noise mitigation strategy leveraging the statistical properties of the different noise terms most relevant in analogue hardware. Finally, we demonstrate the effectiveness of this combined approach for a trained neural network classifying the MNIST handwritten digits, for which we achieve a 4-fold improvement of the output signal-to-noise ratio and increase the classification accuracy almost to the level of the noise-free network.
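The two strategies named in the abstract above, pooling and ghost neurons, can be illustrated with a toy Monte Carlo sketch. It is not the authors' model: the function name `simulate`, the purely additive Gaussian noise, the noise amplitudes, and the unit-weight subtraction of the ghost neuron are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(n_pool=16, trials=4000, sigma_u=0.2, sigma_c=0.5, seed=0):
    """Toy model (assumed, not from the paper): every neuron carries the same
    signal, a noise term shared by the whole population (correlated), and its
    own independent noise term (uncorrelated).

    Returns the output noise standard deviation for
    (single neuron, pooled population, pooled population minus ghost neuron).
    """
    rng = random.Random(seed)
    signal = 1.0
    raw, pooled, ghosted = [], [], []
    for _ in range(trials):
        common = rng.gauss(0.0, sigma_c)  # correlated noise, shared by all neurons
        neurons = [signal + common + rng.gauss(0.0, sigma_u) for _ in range(n_pool)]
        # Ghost neuron: receives no input signal, but sees the same correlated
        # noise plus its own uncorrelated noise.
        ghost = common + rng.gauss(0.0, sigma_u)
        mean_out = sum(neurons) / n_pool  # pooling averages uncorrelated noise
        raw.append(neurons[0])
        pooled.append(mean_out)
        ghosted.append(mean_out - ghost)  # negative connection to the ghost neuron
    return tuple(statistics.pstdev(x) for x in (raw, pooled, ghosted))


if __name__ == "__main__":
    for name, std in zip(("single", "pooled", "pooled + ghost"), simulate()):
        print(f"{name:>15}: noise std = {std:.3f}")
```

In this sketch pooling alone barely helps because the shared noise term survives averaging, while subtracting the ghost neuron removes it at the cost of adding the ghost's own uncorrelated noise. This is why the example assumes the uncorrelated amplitude is smaller than the correlated one.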
CL · Oct 7, 2023
Automatic Anonymization of Swiss Federal Supreme Court Rulings
Joel Niklaus, Robin Mamié, Matthias Stürmer et al.
Releasing court decisions to the public relies on proper anonymization to protect all involved parties, where necessary. The Swiss Federal Supreme Court relies on an existing system that combines different traditional computational methods with human experts. In this work, we enhance the existing anonymization software using a large dataset annotated with entities to be anonymized. We compared BERT-based models with models pre-trained on in-domain data. Our results show that using in-domain data to pre-train the models further improves the F1-score by more than 5% compared to existing models. Our work demonstrates that combining existing anonymization methods, such as regular expressions, with machine learning can further reduce manual labor and enhance automatic suggestions.
CL · May 19, 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan, Jingwei Ni, Jakob Merane et al. · eth-zurich
Long-form legal reasoning remains a key challenge for large language models (LLMs) in spite of recent advances in test-time scaling. To address this, we introduce LEXam, a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 4,886 law exam questions in English and German, including 2,841 long-form, open-ended questions and 2,045 multiple-choice questions. Besides reference answers, the open questions are also accompanied by explicit guidance outlining the expected legal reasoning approach such as issue spotting, rule recall, or rule application. Our evaluation shows that both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they notably struggle with open questions that require structured, multi-step legal reasoning. Moreover, our results underscore the effectiveness of the dataset in differentiating between models with varying capabilities. Deploying an ensemble LLM-as-a-Judge paradigm with rigorous human expert validation, we demonstrate how model-generated reasoning steps can be evaluated consistently and accurately, closely aligning with human expert assessments. Our evaluation setup provides a scalable method to assess legal reasoning quality beyond simple accuracy metrics. We have open-sourced our code on https://github.com/LEXam-Benchmark/LEXam and released our data on https://huggingface.co/datasets/LEXam-Benchmark/LEXam. Project page: https://lexam-benchmark.github.io.
CL · Mar 3, 2025
SwiLTra-Bench: The Swiss Legal Translation Benchmark
Joel Niklaus, Jakob Merane, Luka Nenadic et al.
In Switzerland, legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.
LG · Nov 7, 2024
Impact of white noise in artificial neural networks trained for classification: performance and noise mitigation strategies
Nadezhda Semenova, Daniel Brunner
In recent years, the hardware implementation of neural networks, leveraging physical coupling and analog neurons, has substantially increased in relevance. Such nonlinear and complex physical networks provide significant advantages in speed and energy efficiency, but are potentially susceptible to internal noise when compared to digital emulations of such networks. In this work, we consider how additive and multiplicative Gaussian white noise on the neuronal level can affect the accuracy of a network that is applied to specific tasks and includes a softmax function in the readout layer. We adapt several noise reduction techniques to the essential setting of classification tasks, which represent a large fraction of neural network computing. We find that these adjusted concepts are highly effective in mitigating the detrimental impact of noise.
LG · Mar 21, 2025
Model-free front-to-end training of a large high performance laser neural network
Anas Skalli, Satoshi Sunada, Mirko Goldmann et al.
Artificial neural networks (ANNs) have become ubiquitous and revolutionized many applications ranging from computer vision to medical diagnoses. However, they offer a fundamentally connectionist and distributed approach to computing, in stark contrast to classical computers that use the von Neumann architecture. This distinction has sparked renewed interest in developing unconventional hardware to support more efficient implementations of ANNs, rather than merely emulating them on traditional systems. Photonics stands out as a particularly promising platform, providing scalability, high speed, energy efficiency, and the ability for parallel information processing. However, fully realized autonomous optical neural networks (ONNs) with in-situ learning capabilities are still rare. In this work, we demonstrate a fully autonomous and parallel ONN based on a multimode vertical cavity surface emitting laser (VCSEL) and built from off-the-shelf components. Our ONN is highly efficient and is scalable both in network size and inference bandwidth towards the GHz range. High performance hardware-compatible optimization algorithms are necessary in order to minimize reliance on external von Neumann computers and to fully exploit the potential of ONNs. As such, we present and extensively study several algorithms which are broadly compatible with a wide range of systems. We then apply these algorithms to optimize our ONN, and benchmark them using the MNIST dataset. We show that our ONN can achieve high accuracy and convergence efficiency, even under limited hardware resources. We also compare these algorithms in terms of scaling and of optimization efficiency as measured by convergence time, which is crucial when working with limited external resources. Our work provides some guidance for the design of future ONNs as well as a simple and flexible way to train them.
OPTICS · Mar 5, 2025
Limits of nonlinear and dispersive fiber propagation for an optical fiber-based extreme learning machine
Andrei V. Ermolaev, Mathilde Hary, Lev Leybov et al.
We report a generalized nonlinear Schrödinger equation simulation model of an extreme learning machine (ELM) based on optical fiber propagation. Using the MNIST handwritten digit dataset as a benchmark, we study how accuracy depends on propagation dynamics, as well as parameters governing spectral encoding, readout, and noise. For this dataset and with quantum noise limited input, test accuracies of over 91% and 93% are found for propagation in the anomalous and normal dispersion regimes, respectively. Our results also suggest that quantum noise on the input pulses introduces an intrinsic penalty to ELM performance.
APP-PH · Jun 5, 2024
Training of Physical Neural Networks
Ali Momeni, Babak Rahmani, Benjamin Scellier et al.
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.
ML · May 13, 2023
Convergence and scaling of Boolean-weight optimization for hardware reservoirs
Louis Andreoli, Stéphane Chrétien, Xavier Porte et al.
Hardware implementations of neural networks are an essential step to implement next generation efficient and powerful artificial intelligence solutions. Besides the realization of a parallel, efficient and scalable hardware architecture, the optimization of the system's extremely large parameter space with sampling-efficient approaches is essential. Here, we analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random recurrently connected neural network, a reservoir. We demonstrate that the convergence is exponential and scales linearly with the network's number of neurons. Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment. Our work therefore provides a solid foundation for such optimization in hardware networks, and identifies future directions that are promising for optimizing convergence speed during learning by leveraging measures of a neural network's amplitude statistics and the weight update rule.
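As a rough illustration of coordinate descent on a Boolean readout layer, the sketch below flips one readout weight at a time and keeps the flip only if the mean squared error decreases. The function name `boolean_coordinate_descent`, the {0, 1} weight alphabet, the random visiting order, and the MSE cost are assumptions for this example. The abstract does not specify these details, and this is not the authors' implementation.

```python
import random


def boolean_coordinate_descent(states, target, epochs=20, seed=0):
    """Greedy single-bit flips of a Boolean readout vector w in {0, 1}^N.

    states: list of T samples, each a list of N node values (assumed layout).
    target: list of T desired outputs.
    Returns (weights, error history with one entry per epoch plus the start).
    """
    rng = random.Random(seed)
    n = len(states[0])
    w = [rng.randint(0, 1) for _ in range(n)]
    out = [sum(wi * xi for wi, xi in zip(w, x)) for x in states]

    def mse(outputs):
        return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(target)

    err = mse(out)
    history = [err]
    for _ in range(epochs):
        for k in rng.sample(range(n), n):  # visit each weight once, random order
            delta = 1 - 2 * w[k]  # +1 when flipping 0 -> 1, -1 when flipping 1 -> 0
            trial = [o + delta * x[k] for o, x in zip(out, states)]
            trial_err = mse(trial)
            if trial_err < err:  # keep the flip only if the error drops
                w[k] ^= 1
                out, err = trial, trial_err
        history.append(err)
    return w, history


if __name__ == "__main__":
    rng = random.Random(1)
    n_nodes, n_samples = 30, 200
    states = [[rng.random() for _ in range(n_nodes)] for _ in range(n_samples)]
    hidden = [rng.randint(0, 1) for _ in range(n_nodes)]  # synthetic teacher weights
    target = [sum(h * x for h, x in zip(hidden, s)) for s in states]
    weights, history = boolean_coordinate_descent(states, target)
    print("error per epoch:", [round(e, 4) for e in history[:5]])
```

Because a flip is accepted only when it lowers the error, the error history is non-increasing by construction. The sketch makes no claim about the exponential convergence or linear scaling derived in the paper.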
NE · Mar 12, 2021
Understanding and mitigating noise in trained deep neural networks
Nadezhda Semenova, Laurent Larger, Daniel Brunner
Deep neural networks unlocked a vast range of new applications by solving tasks, many of which were previously deemed reserved to higher human intelligence. One of the developments enabling this success was a boost in computing power provided by special purpose hardware, such as graphic or tensor processing units. However, these do not leverage fundamental features of neural networks like parallelism and analog state variables. Instead, they emulate neural networks relying on binary computing, which results in unsustainable energy consumption and comparatively low speed. Fully parallel and analogue hardware promises to overcome these challenges, yet the impact of analogue neuron noise and its propagation, i.e. accumulation, threatens to render such approaches inept. Here, we determine for the first time the propagation of noise in deep neural networks comprising noisy nonlinear neurons in trained fully connected layers. We study additive and multiplicative as well as correlated and uncorrelated noise, and develop analytical methods that predict the noise level in any layer of symmetric deep neural networks or deep neural networks trained with back propagation. We find that noise accumulation is generally bounded, and adding additional network layers does not worsen the signal-to-noise ratio beyond a limit. Most importantly, noise accumulation can be suppressed entirely when neuron activation functions have a slope smaller than unity. We therefore developed the framework for noise in fully connected deep neural networks implemented in analog systems, and identify criteria allowing engineers to design noise-resilient novel neural network hardware.
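The role of the activation slope in the abstract above can be made concrete with a deliberately simplified recursion. The sketch assumes a chain of single neurons, an activation linearised to a constant slope, and independent additive noise of equal variance in every layer. None of these simplifications come from the paper, and the function name `noise_variance_per_layer` is hypothetical. Under these assumptions the noise variance obeys v[n+1] = slope^2 * v[n] + sigma^2.

```python
def noise_variance_per_layer(slope, sigma, n_layers):
    """Noise variance after each layer of a toy one-neuron-per-layer chain.

    Assumed model: each layer scales incoming noise by the activation slope
    and then adds fresh, independent noise with standard deviation sigma.
    """
    variances = []
    var = 0.0
    for _ in range(n_layers):
        var = slope ** 2 * var + sigma ** 2
        variances.append(var)
    return variances


if __name__ == "__main__":
    for slope in (0.5, 1.0):
        v = noise_variance_per_layer(slope, sigma=0.1, n_layers=50)
        print(f"slope={slope}: variance after 1, 10, 50 layers =",
              [round(v[i], 5) for i in (0, 9, 49)])
```

For a slope below one the variance saturates at sigma^2 / (1 - slope^2) in this toy model, whereas for unit slope it grows linearly with depth. This is only a caricature of the analysis in the paper, which treats trained fully connected layers and several noise types.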
NE · Dec 21, 2020
A complete, parallel and autonomous photonic neural network in a semiconductor multimode laser
Xavier Porte, Anas Skalli, Nasibeh Haghighi et al.
Neural networks are one of the disruptive computing concepts of our time. However, they differ from classical, algorithmic computing in a number of fundamental aspects. These differences result in equally fundamental, severe and relevant challenges for neural network computing using current computing substrates. Neural networks call for parallelism across the entire processor and for a co-location of memory and arithmetic, i.e. beyond von Neumann architectures. Parallelism in particular made photonics a highly promising platform, yet until now scalable and integratable concepts are scarce. Here, we demonstrate for the first time how a fully parallel and fully implemented photonic neural network can be realized using spatially distributed modes of an efficient and fast semiconductor laser. Importantly, all neural network connections are realized in hardware, and our processor produces results without pre- or post-processing. 130+ nodes are implemented in a large-area vertical cavity surface emitting laser; input and output weights are realized via the complex transmission matrix of a multimode fiber and a digital micro-mirror array, respectively. We train the readout weights to perform 2-bit header recognition, a 2-bit XOR and 2-bit digital analog conversion, and obtain < 0.9 × 10^-3 and 2.9 × 10^-2 error rates for digit recognition and XOR, respectively. Finally, the digital analog conversion can be realized with a standard deviation of only 5.4 × 10^-2. Our system is scalable to much larger sizes and to bandwidths in excess of 20 GHz.
NE · Apr 6, 2020
Human action recognition with a large-scale brain-inspired photonic computer
Piotr Antonik, Nicolas Marsal, Daniel Brunner et al.
The recognition of human actions in video streams is a challenging task in computer vision, with cardinal applications in e.g. brain-computer interfaces and surveillance. Deep learning has shown remarkable results recently, but can be hard to use in practice, as its training requires large datasets and special purpose, energy-consuming hardware. In this work, we propose a scalable photonic neuro-inspired architecture based on the reservoir computing paradigm, capable of recognising video-based human actions with state-of-the-art accuracy. Our experimental optical setup comprises off-the-shelf components, and implements a large parallel recurrent neural network that is easy to train and can be scaled up to hundreds of thousands of nodes. This work paves the way towards simply reconfigurable and energy-efficient photonic information processing systems for real-time video processing.
NE · Apr 6, 2020
Bayesian optimisation of large-scale photonic reservoir computers
Piotr Antonik, Nicolas Marsal, Daniel Brunner et al.
Introduction. Reservoir computing is a growing paradigm for simplified training of recurrent neural networks, with a high potential for hardware implementations. Numerous experiments in optics and electronics yield comparable performance to digital state-of-the-art algorithms. Many of the most recent works in the field focus on large-scale photonic systems, with tens of thousands of physical nodes and arbitrary interconnections. While this trend significantly expands the potential applications of photonic reservoir computing, it also complicates the optimisation of the high number of hyper-parameters of the system. Methods. In this work, we propose the use of Bayesian optimisation for efficient exploration of the hyper-parameter space in a minimum number of iterations. Results. We test this approach on a previously reported large-scale experimental system, compare it to the commonly used grid search, and report notable improvements in performance and the number of experimental iterations required to optimise the hyper-parameters. Conclusion. Bayesian optimisation thus has the potential to become the standard method for tuning the hyper-parameters in photonic reservoir computing.
NE · Mar 27, 2020
Boolean learning under noise-perturbations in hardware neural networks
Louis Andreoli, Xavier Porte, Stéphane Chrétien et al.
A high efficiency hardware integration of neural networks benefits from realizing nonlinearity, network connectivity and learning fully in a physical substrate. Multiple systems have recently implemented some or all of these operations, yet the focus was placed on addressing technological challenges. Fundamental questions regarding learning in hardware neural networks remain largely unexplored. Noise in particular is unavoidable in such architectures, and here we investigate its interaction with a learning algorithm using an opto-electronic recurrent neural network. We find that noise strongly modifies the system's path during convergence, and surprisingly fully decorrelates the final readout weight matrices. This highlights the importance of understanding architecture, noise and learning algorithm as interacting players, and therefore identifies the need for mathematical tools for noisy, analogue system optimization.
ET · Dec 17, 2019
Three dimensional waveguide-interconnects for scalable integration of photonic neural networks
Johnny Moughames, Xavier Porte, Michael Thiel et al.
Photonic waveguides are prime candidates for integrated and parallel photonic interconnects. Such interconnects correspond to large-scale vector matrix products, which are at the heart of neural network computation. However, parallel interconnect circuits realized in two dimensions, for example by lithography, are strongly limited in size due to disadvantageous scaling. We use three dimensional (3D) printed photonic waveguides to overcome this limitation. 3D optical-couplers with fractal topology efficiently connect large numbers of input and output channels, and we show that the substrate's footprint area scales linearly. Going beyond simple couplers, we introduce functional circuits for discrete spatial filters identical to those used in deep convolutional neural networks.
NE · Jul 23, 2019
Reservoir-size dependent learning in analogue neural networks
Xavier Porte, Louis Andreoli, Maxime Jacquot et al.
The implementation of artificial neural networks in hardware substrates is a major interdisciplinary enterprise. Well suited candidates for physical implementations must combine nonlinear neurons with dedicated and efficient hardware solutions for both connectivity and training. Reservoir computing addresses the problems related to network connectivity and training in an elegant and efficient way. However, important questions regarding the impact of reservoir size and learning routines on the convergence speed during learning remain unaddressed. Here, we study in detail the learning process of a recently demonstrated photonic neural network based on a reservoir. We use a greedy algorithm to train our neural network for the task of chaotic signal prediction and analyze the learning-error landscape. Our results unveil fundamental properties of the system's optimization hyperspace. In particular, we determine the convergence speed of learning as a function of reservoir size and find exceptional, close to linear scaling. This linear dependence, together with our parallel diffractive coupling, represents optimal scaling conditions for our photonic neural network scheme.
ET · Jul 21, 2019
Fundamental aspects of noise in analog-hardware neural networks
Nadezhda Semenova, Xavier Porte, Louis Andreoli et al.
We study and analyze the fundamental aspects of noise propagation in recurrent as well as deep, multi-layer networks. The main focus of our study is neural networks in analogue hardware, yet the methodology provides insight for networks in general. The system under study consists of noisy linear nodes, and we investigate the signal-to-noise ratio at the network's outputs, which is the upper limit to such a system's computing accuracy. We consider additive and multiplicative noise which can be purely local as well as correlated across populations of neurons. This covers the chief internal perturbations of hardware networks. Noise amplitudes were obtained from a physically implemented recurrent neural network and therefore correspond to a real-world system. Analytic solutions agree exceptionally well with numerical data, enabling clear identification of the most critical components and aspects for noise management. Focusing on linear nodes isolates the impact of network connections and allows us to derive strategies for mitigating noise. Our work is the starting point in addressing this aspect of analogue neural networks, and our results identify notoriously sensitive points while simultaneously highlighting the robustness of such computational systems.
ET · May 4, 2018
Efficient Design of Hardware-Enabled Reservoir Computing in FPGAs
Bogdan Penkovsky, Laurent Larger, Daniel Brunner
In this work, we propose a new approach towards the efficient optimization and implementation of reservoir computing hardware that reduces the required domain expert knowledge and optimization effort. First, we adapt the reservoir input mask to the structure of the data via linear autoencoders. We therefore incorporate the advantages of dimensionality reduction and dimensionality expansion achieved by conventional algorithmically efficient linear algebra procedures of principal component analysis. Second, we employ evolutionary-inspired genetic algorithm techniques, resulting in a highly efficient optimization of reservoir dynamics with a dramatically reduced number of evaluations compared to exhaustive search. We illustrate the method on the so-called single-node reservoir computing architecture, especially suitable for implementation in ultrahigh-speed hardware. The combination of both methods and the resulting reduction of time required for performance optimization of a hardware system establish a strategy towards machine learning hardware capable of self-adaptation to optimally solve specific problems. We confirm the validity of those principles by building reservoir computing hardware based on a field-programmable gate array.
NE · Nov 14, 2017
Reinforcement Learning in a large scale photonic Recurrent Neural Network
Julian Bueno, Sheler Maktoobi, Luc Froehly et al.
Photonic Neural Network implementations have been gaining considerable attention as a potentially disruptive future technology. Demonstrating learning in large scale neural networks is essential to establish photonic machine learning substrates as viable information processing systems. The realization of photonic Neural Networks with numerous nonlinear nodes in fully parallel and efficient learning hardware has been lacking so far. We demonstrate a network of up to 2500 diffractively coupled photonic nodes, forming a large scale Recurrent Neural Network. Using a Digital Micro Mirror Device, we realize reinforcement learning. Our scheme is fully parallel, and the passive weights maximize energy efficiency and bandwidth. The computational output efficiently converges and we achieve very good performance.