AI · Nov 10, 2022
Power Grid Congestion Management via Topology Optimization with AlphaZero
Matthias Dorfer, Anton R. Fuxjäger, Kristian Kozak et al.
The energy sector is facing rapid changes in the transition towards clean renewable sources. However, the growing share of volatile, fluctuating renewable generation such as wind or solar energy has already led to an increase in power grid congestion and network security concerns. Grid operators mitigate these by modifying either generation or demand (redispatching, curtailment, flexible loads). Unfortunately, redispatching of fossil generators leads to excessive grid operation costs and higher emissions, which is in direct opposition to the decarbonization of the energy sector. In this paper, we propose an AlphaZero-based grid topology optimization agent as a non-costly, carbon-free congestion management alternative. Our experimental evaluation confirms the potential of topology optimization for power grid operation, achieves a reduction of the average amount of required redispatching by 60%, and shows the interoperability with traditional congestion management methods. Our approach also ranked 1st in the WCCI 2022 Learning to Run a Power Network (L2RPN) competition. Based on our findings, we identify and discuss open research problems as well as technical challenges for a productive system on a real power grid.
AI · Feb 15, 2023
Reinforcement Learning Based Power Grid Day-Ahead Planning and AI-Assisted Control
Anton R. Fuxjäger, Kristian Kozak, Matthias Dorfer et al.
The ongoing transition to renewable energy is increasing the share of fluctuating power sources like wind and solar, raising power grid volatility and making grid operation increasingly complex and costly. In our prior work, we have introduced a congestion management approach consisting of a redispatching optimizer combined with a machine learning-based topology optimization agent. Compared to a typical redispatching-only agent, it was able to keep a simulated grid in operation longer while at the same time reducing operational cost. Our approach also ranked 1st in the L2RPN 2022 competition initiated by RTE, Europe's largest grid operator. The aim of this paper is to bring this promising technology closer to the real world of power grid operation. We deploy RL-based agents in two settings resembling established workflows, AI-assisted day-ahead planning and realtime control, in an attempt to show the benefits and caveats of this new technology. We then analyse congestion, redispatching and switching profiles, and present an elementary sensitivity analysis that provides a glimpse of operational robustness. While there is still a long way to a real control room, we believe that this paper and the associated prototypes help to narrow the gap and pave the way for a safe deployment of RL agents in tomorrow's power grids.
LG · Sep 29, 2020 · Code
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu et al.
Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at https://github.com/ml-jku/align-rudder. YouTube: https://youtu.be/HO-_8ZUl-UY
IR · May 19, 2017 · Code
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Matthias Dorfer, Jan Schlüter, Andreu Vall et al.
Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, where items from either modality are retrieved using nearest-neighbor search. In this work, we introduce a neural network layer based on Canonical Correlation Analysis (CCA) that learns better embedding spaces by analytically computing projections that maximize correlation. In contrast to previous approaches, the CCA Layer (CCAL) allows us to combine existing objectives for embedding space learning, such as pairwise ranking losses, with the optimal projections of CCA. We show the effectiveness of our approach for cross-modality retrieval on three different scenarios (text-to-image, audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a multi-view network using freely learned projections optimized by a pairwise ranking loss, especially when little training data is available (the code for all three methods is released at: https://github.com/CPJKU/cca_layer).
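The pairwise ranking loss mentioned in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the scoring function and a hinge margin of 0.5; both are illustrative choices, not values taken from the paper, and the CCA layer itself is not modeled here. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(
    query: Sequence[float],
    match: Sequence[float],
    negatives: Sequence[Sequence[float]],
    margin: float = 0.5,  # assumed value, for illustration only
) -> float:
    """Hinge loss that is zero once the matching item outscores every
    non-matching item by at least `margin`."""
    s_pos = cosine(query, match)
    return sum(
        max(0.0, margin - s_pos + cosine(query, neg)) for neg in negatives
    )


# Toy embeddings from two modalities that already live in a joint space.
text_query = [1.0, 0.0]
matching_image = [1.0, 0.0]
other_images = [[0.0, 1.0], [1.0, 0.0]]  # second negative collides with the match
loss = pairwise_ranking_loss(text_query, matching_image, other_images)
```

Only the second negative contributes to the loss here, because it is as similar to the query as the true match and therefore violates the margin.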
LG · Feb 13, 2025
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
Benedikt Alkin, Maurits Bleeker, Richard Kurle et al.
Recent advances in neural surrogate modeling offer the potential for transformative innovations in applications such as automotive aerodynamics. Yet, industrial-scale problems often involve volumetric meshes with cell counts reaching 100 million, presenting major scalability challenges. Complex geometries further complicate modeling through intricate surface-volume interactions, while quantities such as vorticity are highly nonlinear and must satisfy strict divergence-free constraints. To address these requirements, we introduce AB-UPT as a novel modeling scheme for building neural surrogates for CFD simulations. AB-UPT is designed to: (i) decouple geometry encoding and prediction tasks via multi-branch operators; (ii) enable scalability to high-resolution outputs via neural simulation in a low-dimensional latent space, coupled with anchored neural field decoders to predict high-fidelity outputs; (iii) enforce physics consistency by a divergence-free formulation. We show that AB-UPT yields state-of-the-art predictive accuracy of surface and volume fields on automotive CFD simulations ranging from 33 thousand up to 150 million mesh cells. Furthermore, our anchored neural field architecture enables the enforcement of hard physical constraints on the physics predictions without degradation in performance, exemplified by modeling divergence-free vorticity fields. Notably, the proposed models can be trained on a single GPU in less than a day and predict industry-standard surface and volume fields within seconds. Additionally, we show that the flexible design of our method enables neural simulation from a CAD geometry alone, thereby eliminating the need for costly CFD meshing procedures for inference.
LG · Jul 3, 2019
The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification
Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer et al.
Convolutional Neural Networks (CNNs) have had great success in many machine vision as well as machine audition tasks. Many image recognition network architectures have consequently been adapted for audio processing tasks. However, despite some successes, the performance of many of these did not translate from the image to the audio domain. For example, very deep architectures such as ResNet and DenseNet, which significantly outperform VGG in image recognition, do not perform better in audio processing tasks such as Acoustic Scene Classification (ASC). In this paper, we investigate the reasons why such powerful architectures perform worse in ASC compared to simpler models (e.g., VGG). To this end, we analyse the receptive field (RF) of these CNNs and demonstrate the importance of the RF to the generalization capability of the models. Using our receptive field analysis, we adapt both ResNet and DenseNet, achieving state-of-the-art performance and eventually outperforming the VGG-based models. We introduce systematic ways of adapting the RF in CNNs, and present results on three data sets that show how changing the RF over the time and frequency dimensions affects a model's performance. Our experimental results show that very small or very large RFs can cause performance degradation, but deep models can be made to generalize well by carefully choosing an appropriate RF size within a certain range.
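To make the notion of receptive field size concrete, the sketch below computes it along one axis for a stack of convolution or pooling layers using the standard recurrence (kernel size minus one, scaled by the accumulated stride). Dilation and padding are ignored, and the layer stacks are hypothetical examples, not the architectures studied in the paper.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field along one axis for layers given as (kernel, stride)."""
    rf = 1    # one output unit initially covers a single input sample
    jump = 1  # spacing, in input samples, between neighbouring outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks: swapping 3-wide kernels for 5-wide ones grows the RF.
small_stack = [(3, 1), (3, 1), (2, 2), (3, 1)]
wide_stack = [(5, 1), (5, 1), (2, 2), (5, 1)]

print(receptive_field(small_stack))  # 10
print(receptive_field(wide_stack))   # 18
```

Running the same function separately with the time-axis and frequency-axis kernel sizes gives the two RF extents whose effect the abstract describes.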
IR · Jun 26, 2019
Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval
Stefan Balke, Matthias Dorfer, Luis Carvalho et al.
Connecting large libraries of digitized audio recordings to their corresponding sheet music images has long been a motivation for researchers to develop new cross-modal retrieval systems. In recent years, retrieval systems based on embedding space learning with deep neural networks got a step closer to fulfilling this vision. However, global and local tempo deviations in the music recordings still require careful tuning of the amount of temporal context given to the system. In this paper, we address this problem by introducing an additional soft-attention mechanism on the audio input. Quantitative and qualitative results on synthesized piano data indicate that this attention increases the robustness of the retrieval system by focusing on different parts of the input representation based on the tempo of the audio. Encouraged by these results, we argue for the potential of attention models as a very general tool for many MIR tasks.
IR · Feb 12, 2019
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
Meinard Müller, Andreas Arzt, Stefan Balke et al.
There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore large music collections in a convenient way. More precisely, there is a need for cross-modal retrieval algorithms that, given a query in one modality (e.g., a short audio excerpt), find corresponding information and entities in other modalities (e.g., the name of the piece and the sheet music). This goes beyond exact audio identification and subsequent retrieval of metainformation as performed by commercial applications like Shazam [1].
SD · Sep 15, 2018
Attention as a Perspective for Learning Tempo-invariant Audio Queries
Matthias Dorfer, Jan Hajič, Gerhard Widmer
Current models for audio-sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant with respect to efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance, and exhibits intuitively appealing behavior.
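As a rough illustration of soft attention over an audio excerpt, the sketch below turns per-frame relevance scores into softmax weights and returns the weighted sum of the frame features. In the paper the scores would come from a learned network; here they are plain inputs, and all names and values are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames: List[List[float]], scores: List[float]) -> List[float]:
    """Weighted sum of frame feature vectors under softmax attention."""
    weights = softmax(scores)
    dim = len(frames[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]


# Two toy frames with two features each.
frames = [[0.0, 2.0], [2.0, 4.0]]
uniform = attend(frames, [0.0, 0.0])    # equal scores: plain average
focused = attend(frames, [100.0, 0.0])  # first frame dominates
```

With equal scores the result is the frame average; a strongly peaked score vector makes the summary collapse onto a single frame, which is how such a mechanism can narrow or widen the effective temporal context.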
AI · Jul 17, 2018
Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game
Matthias Dorfer, Florian Henkel, Gerhard Widmer
Score following is the process of tracking a musical performance (audio) with respect to a known symbolic representation (a score). We start this paper by formulating score following as a multimodal Markov Decision Process, the mathematical foundation for sequential decision making. Given this formal definition, we address the score following task with state-of-the-art deep reinforcement learning (RL) algorithms such as synchronous advantage actor critic (A2C). In particular, we design multimodal RL agents that simultaneously learn to listen to music, read the scores from images of sheet music, and follow the audio along in the sheet, in an end-to-end fashion. All this behavior is learned entirely from scratch, based on a weak and potentially delayed reward signal that indicates to the agent how close it is to the correct position in the score. Besides discussing the theoretical advantages of this learning paradigm, we show in experiments that it is in fact superior compared to previously proposed methods for score following in raw sheet music images.
IR · May 24, 2018
A Hybrid Approach to Music Playlist Continuation Based on Playlist-Song Membership
Andreu Vall, Matthias Dorfer, Markus Schedl et al.
Automated music playlist continuation is a common task of music recommender systems, which generally consists of providing a fitting extension to a given playlist. Collaborative filtering models, which extract abstract patterns from curated music playlists, tend to provide better playlist continuations than content-based approaches. However, pure collaborative filtering models have at least one of the following limitations: (1) they can only extend playlists profiled at training time; (2) they misrepresent songs that occur in very few playlists. We introduce a novel hybrid playlist continuation model based on what we name "playlist-song membership", that is, whether a given playlist and a given song fit together. The proposed model regards any playlist-song pair exclusively in terms of feature vectors. In light of this information, and after having been trained on a collection of labeled playlist-song pairs, the proposed model decides whether a playlist-song pair fits together or not. Experimental results on two datasets of curated music playlists show that the proposed playlist continuation model is comparable to a state-of-the-art collaborative filtering model in the ideal situation of extending playlists profiled at training time and where songs occurred frequently in training playlists. In contrast to the collaborative filtering model, and as a result of its general understanding of the playlist-song pairs in terms of feature vectors, the proposed model is additionally able to (1) extend non-profiled playlists and (2) recommend songs that occurred seldom or never in training playlists.
LG · Nov 10, 2017
Deep Within-Class Covariance Analysis for Robust Audio Representation Learning
Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer
Convolutional Neural Networks (CNNs) can learn effective features, but have been shown to suffer from a performance drop when the distribution of the data changes from training to test data. In this paper we analyze the internal representations of CNNs and observe that the representations of unseen data in each class spread more (with higher variance) in the embedding space of the CNN compared to representations of the training data. More importantly, this difference is more extreme if the unseen data comes from a shifted distribution. Based on this observation, we objectively evaluate the degree of the representations' variance in each class via eigenvalue decomposition on the within-class covariance of the internal representations of CNNs and observe the same behaviour. This can be problematic as larger variances might lead to misclassification if the sample crosses the decision boundary of its class. We apply nearest neighbor classification on the representations and empirically show that the embeddings with high variance actually have significantly worse KNN classification performance, although this could not be foreseen from their end-to-end classification results. To tackle this problem, we propose Deep Within-Class Covariance Analysis (DWCCA), a deep neural network layer that significantly reduces the within-class covariance of a DNN's representation, improving performance on unseen test data from a shifted distribution. We empirically evaluate DWCCA on two datasets for Acoustic Scene Classification (DCASE2016 and DCASE2017). We demonstrate that not only does DWCCA significantly improve the network's internal representation, it also increases the end-to-end classification accuracy, especially when the test set exhibits a distribution shift. By adding DWCCA to a VGG network, we achieve around 6 percentage points improvement in the case of a distribution mismatch.
MM · Aug 7, 2017
Aktuelle Entwicklungen in der Automatischen Musikverfolgung
Andreas Arzt, Matthias Dorfer
In this paper we present current trends in real-time music tracking (a.k.a. score following). Casually speaking, these algorithms "listen" to a live performance of music, compare the audio signal to an abstract representation of the score, and "read" along in the sheet music. In this way at any given time the exact position of the musician(s) in the sheet music is computed. Here, we focus on the aspects of flexibility and usability of these algorithms. This comprises work on automatic identification and flexible tracking of the piece being played as well as current approaches based on Deep Learning. The latter enables direct learning of correspondences between complex audio data and images of the sheet music, avoiding the complicated and time-consuming definition of a mid-level representation. ----- This work deals with current developments in automatic music tracking by computer. These are algorithms that "listen" to a musical performance, compare the recorded audio signal with an (abstract) representation of the score and, so to speak, read along in it. The algorithm thus knows the position of the musicians in the score at every point in time. Besides giving a general overview, this work focuses on the flexibility and easier usability of these algorithms. It describes which steps have been taken (and are currently being taken) to make automatic music tracking more accessible. This includes work on the automatic identification of played pieces and their flexible tracking, as well as current approaches using Deep Learning, which make it possible to connect image and sound directly, without detours via abstract intermediate representations that can only be created with great time and effort.
IR · Jul 31, 2017
Learning Audio - Sheet Music Correspondences for Score Identification and Offline Alignment
Matthias Dorfer, Andreas Arzt, Gerhard Widmer
This work addresses the problem of matching short excerpts of audio with their respective counterparts in sheet music images. We show how to employ neural network-based cross-modality embedding spaces for solving the following two sheet music-related tasks: retrieving the correct piece of sheet music from a database when given a music audio excerpt as a search query; and aligning an audio recording of a piece with the corresponding images of sheet music. We demonstrate the feasibility of this in experiments on classical piano music by five different composers (Bach, Haydn, Mozart, Beethoven and Chopin), and additionally provide a discussion on why we expect multi-modal neural networks to be a fruitful paradigm for dealing with sheet music and audio at the same time.
SD · Jun 20, 2017
A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification
Hamid Eghbal-zadeh, Bernhard Lehner, Matthias Dorfer et al.
In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied to ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach ranked 1st among 49 submissions, substantially improving the previous state of the art.
IR · May 23, 2017
Music Playlist Continuation by Learning from Hand-Curated Examples and Song Features: Alleviating the Cold-Start Problem for Rare and Out-of-Set Songs
Andreu Vall, Hamid Eghbal-zadeh, Matthias Dorfer et al.
Automated music playlist generation is a specific form of music recommendation. Generally stated, the user receives a set of song suggestions defining a coherent listening session. We hypothesize that the best way to convey such playlist coherence to new recommendations is by learning it from actual curated examples, in contrast to imposing ad hoc constraints. Collaborative filtering methods can be used to capture underlying patterns in hand-curated playlists. However, the scarcity of thoroughly curated playlists and the bias towards popular songs result in the vast majority of songs occurring in very few playlists and thus being poorly recommended. To overcome this issue, we propose an alternative model based on a song-to-playlist classifier, which learns the underlying structure from actual playlists while leveraging song features derived from audio, social tags and independent listening logs. Experiments on two datasets of hand-curated playlists show competitive performance compared to collaborative filtering when sufficient training data is available and more robust performance when recommending rare and out-of-set songs. For example, both approaches achieve a recall@100 of roughly 35% for songs occurring in 5 or more training playlists, whereas the proposed model achieves a recall@100 of roughly 15% for songs occurring in 4 or fewer training playlists, compared to the 3% achieved by collaborative filtering.
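For readers unfamiliar with the recall@100 figures quoted above, the sketch below shows the usual definition of recall@k: the fraction of held-out relevant items that appear among the top k recommendations. The paper's exact evaluation protocol may differ; the song identifiers and the function name are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items found in the first k ranked items."""
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    hits = sum(1 for song in ranked[:k] if song in relevant)
    return hits / len(relevant)


# Toy continuation: the model ranks four candidate songs for one playlist,
# and three songs were withheld from that playlist as ground truth.
ranking = ["song_a", "song_b", "song_c", "song_d"]
withheld = {"song_b", "song_d", "song_z"}

recall_top2 = recall_at_k(ranking, withheld, k=2)  # only song_b is found
recall_top4 = recall_at_k(ranking, withheld, k=4)  # song_b and song_d are found
```

A song such as `song_z`, which the model never ranks highly, lowers recall no matter how large the rest of the list is; this is the situation the abstract describes for rare and out-of-set songs.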
SD · Dec 15, 2016
On the Potential of Simple Framewise Approaches to Piano Transcription
Rainer Kelz, Matthias Dorfer, Filip Korzeniowski et al.
In an attempt to explore the limitations of simple approaches to the task of piano transcription (as usually defined in MIR), we conduct an in-depth analysis of neural network-based framewise transcription. We systematically compare different popular input representations for transcription systems to determine the ones most suitable for use with neural networks. Exploiting recent advances in training techniques and new regularizers, and taking into account hyper-parameter tuning, we show that it is possible, by simple bottom-up framewise processing, to obtain a piano transcriber that outperforms the current published state of the art on the publicly available MAPS dataset, without any complex post-processing steps. Thus, we propose this simple approach as a new baseline for this dataset, for future transcription research to build on and improve.
SD · Dec 15, 2016
Live Score Following on Sheet Music Images
Matthias Dorfer, Andreas Arzt, Sebastian Böck et al.
In this demo we show a novel approach to score following. Instead of relying on some symbolic representation, we are using a multi-modal convolutional neural network to match the incoming audio stream directly to sheet music images. This approach is in an early stage and should be seen as proof of concept. Nonetheless, the audience will have the opportunity to test our implementation themselves via 3 simple piano pieces.
SD · Dec 15, 2016
Towards End-to-End Audio-Sheet-Music Retrieval
Matthias Dorfer, Andreas Arzt, Gerhard Widmer
This paper demonstrates the feasibility of learning to retrieve short snippets of sheet music (images) when given a short query excerpt of music (audio), and vice versa, without any symbolic representation of music or scores. This would be highly useful in many content-based musical retrieval scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA) and learns correlated latent spaces allowing for cross-modality retrieval in both directions. Initial experiments with relatively simple monophonic music show promising results.
LG · Dec 15, 2016
Towards Score Following in Sheet Music Images
Matthias Dorfer, Andreas Arzt, Gerhard Widmer
This paper addresses the matching of short music audio snippets to the corresponding pixel location in images of sheet music. A system is presented that simultaneously learns to read notes, listen to music and match the currently played music to its corresponding notes in the sheet. It consists of an end-to-end multi-modal convolutional neural network that takes as input images of sheet music and spectrograms of the respective audio snippets. It learns to predict, for a given unseen audio snippet (covering approximately one bar of music), the corresponding position in the respective score line. Our results suggest that with the use of (deep) neural networks, which have proven to be powerful image processing models, working with sheet music becomes feasible and a promising future research direction.
LG · Nov 15, 2015
Deep Linear Discriminant Analysis
Matthias Dorfer, Rainer Kelz, Gerhard Widmer
We introduce Deep Linear Discriminant Analysis (DeepLDA), which learns linearly separable latent representations in an end-to-end fashion. Classic LDA extracts features which preserve class separability and is used for dimensionality reduction for many classification problems. The central idea of this paper is to put LDA on top of a deep neural network. This can be seen as a non-linear extension of classic LDA. Instead of maximizing the likelihood of target labels for individual samples, we propose an objective function that pushes the network to produce feature distributions which (a) have low variance within the same class and (b) have high variance between different classes. Our objective is derived from the general LDA eigenvalue problem and still allows training with stochastic gradient descent and back-propagation. For evaluation we test our approach on three different benchmark datasets (MNIST, CIFAR-10 and STL-10). DeepLDA produces competitive results on MNIST and CIFAR-10 and outperforms a network trained with categorical cross entropy (same architecture) on a supervised setting of STL-10.
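The "low variance within a class, high variance between classes" criterion can be illustrated in one dimension, where the LDA eigenvalue problem reduces to a single ratio of between-class to within-class scatter. The sketch below is this simplified 1-D Fisher ratio only; it is not the DeepLDA objective from the paper, and the function name and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fisher_ratio(features: Sequence[float], labels: Sequence[int]) -> float:
    """Between-class scatter divided by within-class scatter (1-D case)."""
    by_class: Dict[int, List[float]] = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    n = len(features)
    overall_mean = sum(features) / n
    between = 0.0
    within = 0.0
    for xs in by_class.values():
        class_mean = sum(xs) / len(xs)
        between += len(xs) * (class_mean - overall_mean) ** 2
        within += sum((x - class_mean) ** 2 for x in xs)
    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio is undefined")
    return (between / n) / (within / n)


labels = [0, 0, 1, 1]
separated = fisher_ratio([0.0, 1.0, 10.0, 11.0], labels)   # tight, distant classes
overlapping = fisher_ratio([0.0, 10.0, 1.0, 11.0], labels)  # spread, close classes
```

A representation that scores high on this ratio is one in which a linear boundary separates the classes easily, which is the property the objective described above rewards.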