László Dobos

SR
h-index: 17
3 papers
17 citations
Novelty: 32%
AI Score: 36

3 Papers

SR · Feb 16
Generalization from Low- to Moderate-Resolution Spectra with Neural Networks for Stellar Parameter Estimation: A Case Study with DESI

Xiaosheng Zhao, Yuan-Sen Ting, Rosemary F. G. Wyse et al.

Cross-survey generalization is a critical challenge in stellar spectral analysis, particularly in cases such as transferring from low- to moderate-resolution surveys. We investigate this problem using pre-trained models, focusing on simple neural networks such as multilayer perceptrons (MLPs), with a case study transferring from LAMOST low-resolution spectra (LRS) to DESI medium-resolution spectra (MRS). Specifically, we pre-train MLPs on either LRS or their embeddings and fine-tune them for application to DESI stellar spectra. We compare MLPs trained directly on spectra with those trained on embeddings derived from transformer-based models (self-supervised foundation models pre-trained for multiple downstream tasks). We also evaluate different fine-tuning strategies, including residual-head adapters, LoRA, and full fine-tuning. We find that MLPs pre-trained on LAMOST LRS achieve strong performance, even without fine-tuning, and that modest fine-tuning with DESI spectra further improves the results. For iron abundance, embeddings from a transformer-based model yield advantages in the metal-rich ([Fe/H] > -1.0) regime, but underperform in the metal-poor regime compared to MLPs trained directly on LRS. We also show that the optimal fine-tuning strategy depends on the specific stellar parameter under consideration. These results highlight that simple pre-trained MLPs can provide competitive cross-survey generalization, while the role of spectral foundation models for cross-survey stellar parameter estimation requires further exploration.
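The abstract lists LoRA among the fine-tuning strategies compared. The NumPy sketch below is only a generic illustration of that idea, not the authors' code. It assumes a single frozen linear layer standing in for part of a pre-trained MLP and adds a trainable low-rank update `B @ A`. Because `B` is zero-initialized, the adapted layer starts out identical to the pre-trained one. Only `A` and `B` are then fitted by gradient descent on mean-squared error. The class `LoRALinear`, the helper `lora_step`, the `rank`/`alpha` values and the synthetic "target survey" data are all hypothetical.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W x + b plus a trainable low-rank update scale * B A x.

    Hypothetical illustration of LoRA; W and b stand in for pre-trained weights.
    """

    def __init__(self, W, b, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W  # frozen pre-trained weights, shape (d_out, d_in)
        self.b = b  # frozen bias, shape (d_out,)
        # Trainable factors: A is random, B starts at zero so the update is initially 0
        self.A = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        """x has shape (batch, d_in); returns shape (batch, d_out)."""
        return x @ self.W.T + self.b + self.scale * (x @ self.A.T) @ self.B.T


def lora_step(layer, x, y, lr=0.05):
    """One gradient-descent step on mean-squared error, updating only A and B.

    Returns the loss measured before the update.
    """
    h = x @ layer.A.T                                   # (batch, rank)
    err = layer.forward(x) - y                          # (batch, d_out)
    grad_pred = 2.0 * err / err.size                    # d(MSE)/d(prediction)
    grad_B = layer.scale * grad_pred.T @ h              # (d_out, rank)
    grad_A = (layer.scale * grad_pred @ layer.B).T @ x  # (rank, d_in), uses B before update
    layer.B -= lr * grad_B
    layer.A -= lr * grad_A
    return float(np.mean(err ** 2))


# Synthetic demo: the hypothetical "target survey" mapping differs from W0 by a rank-2 shift
rng = np.random.default_rng(1)
d_in, d_out, n = 16, 4, 64
W0 = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # stands in for pre-trained weights
b0 = np.zeros(d_out)
delta = 0.5 * rng.normal(size=(d_out, 2)) @ rng.normal(size=(2, d_in)) / np.sqrt(d_in)
x = rng.normal(size=(n, d_in))
y = x @ (W0 + delta).T + b0

layer = LoRALinear(W0.copy(), b0.copy(), rank=4, alpha=8.0)
assert np.allclose(layer.forward(x), x @ W0.T + b0)  # zero-initialized B: no change yet
losses = [lora_step(layer, x, y) for _ in range(300)]
print(f"MSE before: {losses[0]:.4f}, after 300 steps: {losses[-1]:.4f}")
```

Because only `A` (rank × d_in) and `B` (d_out × rank) are updated, the number of trainable parameters grows with the chosen rank rather than with the size of the full weight matrix. Full fine-tuning would instead update `W` directly.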

SE · Nov 21, 2019 · Code
Kooplex: collaborative data analytics portal for advancing sciences

Dávid Visontai, József Stéger, János Márk Szalai-Gindl et al.

Research collaborations are continuously emerging, catalyzed by online platforms where people can share their code, calculations, data and results. These virtual research platforms are innovative, community-oriented, flexible and secure, as modern scientific practice requires. A wide range of open-source and commercial solutions are available in this field, each emphasizing different aspects of such a platform. In this paper we present our open-source, modular platform, KOOPLEX, which combines the key concepts of dynamic collaboration, customizable research environments, data sharing, access to datahubs, reproducible research and reporting. It is easy to deploy and can scale to serve more users or to access large computational resources.

CL · Nov 5, 2013
Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

Dániel Kondor, István Csabai, László Dobos et al.

Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues typical of user-posted texts, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by users of the Twitter microblogging service. Using a dataset of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the principal component pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.
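Robust PCA in its principal component pursuit form splits a data matrix M into a low-rank part L and a sparse part S by minimizing ‖L‖_* + λ‖S‖_1 subject to L + S = M. The sketch below solves that problem with a standard alternating-directions (augmented Lagrangian) scheme built from singular-value thresholding and entrywise soft thresholding. It is a generic illustration, not the authors' pipeline. The function names, the default λ = 1/√max(m, n), the fixed step parameter μ and the synthetic matrix (standing in for a hypothetical region × word-frequency table) are all assumptions.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular-value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def robust_pca(M, lam=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: min ||L||_* + lam * ||S||_1 subject to L + S = M.

    Solved with an alternating-directions augmented Lagrangian scheme (fixed mu).
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # common default choice (assumption)
    mu = m * n / (4.0 * np.abs(M).sum())    # common heuristic step parameter (assumption)
    norm_M = np.linalg.norm(M)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multipliers for L + S = M
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S


# Synthetic demo: a hypothetical rank-2 matrix (e.g. regions x word frequencies)
# corrupted by roughly 5% large sparse outliers
rng = np.random.default_rng(0)
m, n = 50, 40
L0 = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n))
S0 = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05
S0[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
M = L0 + S0

L, S = robust_pca(M)
print("relative error in L:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```

A larger λ penalizes the sparse part more heavily, so fewer entries are treated as outliers; a smaller λ lets S absorb more of the data.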