Gábor Vattay

CL
4 papers
63 citations
Novelty 38%
AI Score 36

4 Papers

QUANT-PH · Mar 12
Transition from Statistical to Hardware-Limited Scaling in Photonic Quantum State Reconstruction

Attila Baumann, Zsolt Kis, János Koltai et al.

The theoretical efficiency of classical shadow tomography is predicated on a perfect Haar-random unitary ensemble, yet this mathematical ideal remains physically unattainable in near-term hardware. Here, we report the experimental discovery of a fundamental accuracy bound on integrated photonic processors: a ``Hardware Horizon'' where the reconstruction error undergoes a sharp phase transition. While the error initially obeys the predicted statistical scaling $\mathcal{O}(M^{-1/2})$, it abruptly saturates at a floor determined by the spectral distortions of the realized unitary group. By deriving a phenomenological error model, we decouple the competing mechanisms of static coherent spectral distortion and dynamic decoherence, demonstrating that this intrinsic noise floor imposes a hard bound that statistical accumulation cannot overcome. These findings establish that the utility of shadow tomography on NISQ (noisy intermediate-scale quantum) hardware is defined by a specific scaling law involving hardware parameters, necessitating active compensation strategies to bridge the gap between theoretical purity and the noisy reality of integrated photonics.
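The saturation described in this abstract can be illustrated with a toy model. The sketch below is not the authors' phenomenological error model: the function names, the prefactor `a`, and the constant `floor` are hypothetical, chosen only to show how an $\mathcal{O}(M^{-1/2})$ statistical error stops improving once it reaches a hardware-determined floor.

```python
import numpy as np

def toy_error(m, a, floor):
    """Toy reconstruction error: statistical a / sqrt(M) decay, capped below by a floor.

    Assumption: `a` (statistical prefactor) and `floor` (hardware-limited error)
    are illustrative parameters, not quantities taken from the paper.
    """
    m = np.asarray(m, dtype=float)
    return np.maximum(a / np.sqrt(m), floor)

def crossover_samples(a, floor):
    """Number of samples M* at which a / sqrt(M) equals the floor in this toy model."""
    return (a / floor) ** 2

# Below M*, more samples reduce the error; above it, the error stays at the floor.
samples = np.array([1e2, 1e4, 1e6])
errors = toy_error(samples, a=1.0, floor=0.01)
m_star = crossover_samples(a=1.0, floor=0.01)
```

In this toy picture, collecting more than `m_star` samples brings no further gain, which is the qualitative behavior the abstract calls the hardware horizon.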

SOC-PH · Mar 11, 2019
Scaling in Words on Twitter

Eszter Bokányi, Dániel Kondor, Gábor Vattay

Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the same scaling relationship as the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters for Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.
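Urban scaling is commonly written as $Y \approx Y_0 N^{\beta}$, where $N$ is the city population and $\beta > 1$ indicates superlinear growth. A minimal sketch of estimating $\beta$ by least squares in log-log space follows. The data are synthetic and the function name `fit_urban_scaling` is hypothetical; the paper's own fitting procedure may differ.

```python
import numpy as np

def fit_urban_scaling(population, volume):
    """Fit volume ~ y0 * population**beta by ordinary least squares on logarithms."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(volume, dtype=float))
    # polyfit returns the slope first, then the intercept, for a degree-1 fit
    beta, log_y0 = np.polyfit(log_n, log_y, 1)
    return beta, np.exp(log_y0)

# Synthetic cities (assumed values, not Twitter data): exponent 1.1 with log-normal noise.
rng = np.random.default_rng(0)
population = np.logspace(4, 7, 50)
volume = 2.0 * population**1.1 * np.exp(rng.normal(0.0, 0.05, size=50))

beta, y0 = fit_urban_scaling(population, volume)
```

A fitted `beta` above 1 corresponds to the superlinear regime, below 1 to the sublinear regime, and close to 1 to linear scaling.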

AP · Dec 20, 2016
A Bayesian Approach to Identify Bitcoin Users

Péter L. Juhász, József Stéger, Dániel Kondor et al.

Bitcoin is a digital currency and electronic payment system operating over a peer-to-peer network on the Internet. One of its most important properties is the high level of anonymity it provides for its users. The users are identified by their Bitcoin addresses, which are random strings in the public records of transactions, the blockchain. When a user initiates a Bitcoin transaction, the Bitcoin client program relays messages to other clients through the Bitcoin network. Monitoring the propagation of these messages and analyzing them carefully reveal hidden relations. In this paper, we develop a mathematical model using a probabilistic approach to link Bitcoin addresses and transactions to the originator IP address. To utilize our model, we carried out experiments by installing more than a hundred modified Bitcoin clients distributed in the network to observe as many messages as possible. During a two-month observation period we were able to identify several thousand Bitcoin clients and bind their transactions to geographical locations.

CL · Nov 5, 2013
Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

Dániel Kondor, István Csabai, László Dobos et al.

Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset which consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.
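Robust PCA by principal component pursuit splits a data matrix $M$ into a low-rank part $L$ and a sparse outlier part $S$ by minimizing $\|L\|_* + \lambda \|S\|_1$ subject to $L + S = M$. The sketch below uses a standard augmented-Lagrangian iteration on a small synthetic matrix. It is not the authors' implementation: the function names, the default parameters, and the test matrix (standing in for a location-by-word frequency table) are assumptions.

```python
import numpy as np

def shrink(x, tau):
    """Elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Soft-threshold the singular values of x."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt

def robust_pca(m, max_iter=1000, tol=1e-7):
    """Split m into low-rank + sparse parts by principal component pursuit."""
    lam = 1.0 / np.sqrt(max(m.shape))        # common default weight on the sparse term
    mu = m.size / (4.0 * np.abs(m).sum())    # common default penalty parameter
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = shrink(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low, sparse

# Synthetic stand-in: a rank-2 matrix with about 5% of entries grossly corrupted.
rng = np.random.default_rng(1)
true_low = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
outliers = np.where(rng.random((60, 60)) < 0.05,
                    rng.uniform(-10.0, 10.0, (60, 60)), 0.0)
low, sparse = robust_pca(true_low + outliers)
rel_error = np.linalg.norm(low - true_low) / np.linalg.norm(true_low)
```

In the setting of the abstract, the sparse part would absorb outliers and highly localized topics, while the leading singular vectors of the low-rank part would carry the smoothly varying regional features.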