DL · Oct 24, 2022
Predicting the Citation Count and CiteScore of Journals One Year in Advance
William Croft, Jörg-Rüdiger Sack
Predicting the future performance of academic journals can benefit a variety of stakeholders, including editorial staff, publishers, indexing services, researchers, university administrators and granting agencies. Using historical data on journal performance, this can be framed as a machine learning regression problem. In this work, we study two such regression tasks: 1) prediction of the number of citations a journal will receive during the next calendar year, and 2) prediction of the Elsevier CiteScore a journal will be assigned for the next calendar year. To address these tasks, we first create a dataset of historical bibliometric data for journals indexed in Scopus. We propose the use of neural network models trained on our dataset to predict the future performance of journals. To this end, we perform feature selection and model configuration for a Multi-Layer Perceptron and a Long Short-Term Memory network. Through experimental comparisons to heuristic prediction baselines and classical machine learning models, we demonstrate that our proposed models achieve superior performance in predicting future citation counts and CiteScore values.
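To make the regression framing concrete, here is a minimal sketch of what a heuristic baseline for one-year-ahead prediction might look like. The abstract does not say which heuristics were used, so `persistence_forecast` and `last_difference_forecast` are hypothetical choices, and the citation history is made-up toy data rather than anything from the paper's dataset.

```python
def persistence_forecast(history):
    """Assumed baseline: next year's value equals the most recent value."""
    return history[-1]


def last_difference_forecast(history):
    """Assumed baseline: extend the most recent year-over-year change."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy yearly citation counts for one hypothetical journal (illustrative only).
toy_history = [100, 120, 150]
naive = persistence_forecast(toy_history)
trend = last_difference_forecast(toy_history)
```

A learned model such as an MLP or LSTM would take the same history (plus other bibliometric features) as input, and would be judged by whether its error is lower than that of simple rules like these.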
CR · Feb 19, 2021
Obfuscation of Images via Differential Privacy: From Facial Images to General Images
William Croft, Jörg-Rüdiger Sack, Wei Shi
Due to the pervasiveness of image capturing devices in everyday life, images of individuals are routinely captured. Although this has enabled many benefits, it also infringes on personal privacy. A promising direction in research on obfuscation of facial images has been the k-same family of methods, which employ the concept of k-anonymity from database privacy. However, a number of deficiencies of k-anonymity carry over to the k-same methods, detracting from their usefulness in practice. In this paper, we first outline several of these deficiencies and discuss their implications in the context of facial obfuscation. We then develop a framework through which we obtain a formal differentially private guarantee for the obfuscation of facial images in generative machine learning models. Our approach provides a provable privacy guarantee that is not susceptible to the outlined deficiencies of k-same obfuscation and produces photo-realistic obfuscated output. In addition, we demonstrate through experimental comparisons that our approach can achieve comparable utility to k-same obfuscation in terms of preservation of useful features in the images. Furthermore, we propose a method to achieve differential privacy for any image (i.e., without restriction to facial images) through the direct modification of pixel intensities. Although the addition of noise to pixel intensities does not provide the high visual quality obtained via generative machine learning models, it offers greater versatility by eliminating the need for a trained model. We demonstrate that our proposed use of the exponential mechanism in this context is able to provide superior visual quality to pixel-space obfuscation using the Laplace mechanism.
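As a rough illustration of pixel-space obfuscation with the exponential mechanism, the sketch below applies the textbook form of the mechanism to a single 8-bit intensity. The abstract does not give the paper's utility function, neighbouring relation or sensitivity, so the utility `-|value - r|`, the sensitivity of 255 and the function name `exp_mechanism_pixel` are all assumptions made for this example.

```python
import math
import random


def exp_mechanism_pixel(value, epsilon, levels=256, rng=random):
    """Sample an obfuscated intensity with the generic exponential mechanism.

    Assumptions (not taken from the paper): utility u(value, r) = -|value - r|,
    and any two intensities count as neighbours, so the sensitivity of u
    is levels - 1.
    """
    sensitivity = levels - 1
    # Weight of each candidate output r is exp(epsilon * u / (2 * sensitivity)).
    weights = [
        math.exp(-epsilon * abs(value - r) / (2.0 * sensitivity))
        for r in range(levels)
    ]
    return rng.choices(range(levels), weights=weights, k=1)[0]


noisy = exp_mechanism_pixel(100, epsilon=1.0, rng=random.Random(0))
```

Unlike additive Laplace noise, every sampled output is a valid intensity by construction, so no clipping step is needed. Under these assumptions a small `epsilon` yields nearly uniform output, and how the privacy budget is shared across all pixels of an image is a separate question this sketch does not address.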
DB · Nov 1, 2019
Differential Privacy Via a Truncated and Normalized Laplace Mechanism
William Lee Croft, Jörg-Rüdiger Sack, Wei Shi
When querying databases containing sensitive information, the privacy of the individuals whose records are stored in the database has to be guaranteed. Such guarantees are provided by differentially private mechanisms that add controlled noise to the query responses. However, most such mechanisms do not take into consideration the valid range of the query being posed. Thus, noisy responses that fall outside of this range may be produced. To rectify this and therefore improve the utility of the mechanism, the commonly used Laplace distribution can be truncated to the valid range of the query and then normalized. However, such a data-dependent normalization leaks additional information about the true query response, thereby violating the differential privacy guarantee. Here, we propose a new method which preserves the differential privacy guarantee through a careful determination of an appropriate scaling parameter for the Laplace distribution. We also generalize the privacy guarantee in the context of the Laplace distribution to account for data-dependent normalization factors and study this guarantee for different classes of range constraint configurations. We provide derivations of the optimal scaling parameter (i.e., the minimal value that preserves differential privacy) for each class or provide an approximation thereof. As a consequence of this work, one can use the Laplace distribution to answer queries in a range-adherent and differentially private manner.
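The truncation-and-normalization step can be sketched as inverse-CDF sampling from a Laplace distribution restricted to the valid range. The function names below are hypothetical, and the scale `b` is treated as an input: as the abstract notes, the usual choice of scale does not by itself preserve differential privacy once the distribution is normalized, and the paper's derivation of a sufficient scale is not reproduced here.

```python
import math
import random


def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution with location mu and scale b."""
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def laplace_inv_cdf(p, mu, b):
    """Inverse CDF (quantile function) of the same distribution."""
    if p < 0.5:
        return mu + b * math.log(2.0 * p)
    return mu - b * math.log(2.0 * (1.0 - p))


def truncated_laplace_sample(true_value, b, lo, hi, rng=random):
    """Draw a noisy response from a Laplace centred at true_value,
    truncated to [lo, hi] and renormalized.

    The scale b must be chosen by the caller; this sketch makes no
    privacy claim for any particular value of b.
    """
    p_lo = laplace_cdf(lo, true_value, b)
    p_hi = laplace_cdf(hi, true_value, b)
    # Sampling p uniformly between the two CDF values is equivalent to
    # dividing the truncated density by its mass on [lo, hi].
    p = rng.uniform(p_lo, p_hi)
    # Keep p strictly inside (0, 1) so the logarithms stay finite.
    p = min(max(p, 1e-300), 1.0 - 2.0 ** -53)
    x = laplace_inv_cdf(p, true_value, b)
    return min(max(x, lo), hi)  # guard against floating-point drift


sample = truncated_laplace_sample(9.0, b=2.0, lo=0.0, hi=10.0)
```

The normalizing mass `p_hi - p_lo` depends on `true_value`, which is the data-dependent factor the abstract identifies as the source of the extra leakage.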