Spase Petkoski

CE
h-index22
3papers
8citations
Novelty37%
AI Score42

3 Papers

90.5CLMay 20
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Seungone Kim, Dongkeun Yoon, Kiril Gashteovski et al.

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of AI reviewers have focused on whether their verdicts match human verdicts (e.g., score alignment, acceptance prediction), which is insufficient to characterize their capabilities and limits. In this paper, we close this gap through a large-scale expert annotation study, in which 45 domain scientists in Physical, Biological, and Health Sciences spent 469 hours rating 2,960 individual criticisms (each targeting one specific aspect of a paper) from human-written and AI-generated reviews of 82 Nature-family papers on correctness, significance, and sufficiency of evidence. On a composite of all three dimensions, a reviewing agent powered by GPT-5.2 scores above each paper's top-rated human reviewer (60.0% vs. 48.2%, p = 0.009), while all three AI reviewers (including Gemini 3.0 Pro and Claude Opus 4.5) exceed the lowest-rated human across every dimension. AI reviewers' accurate criticisms are also more often rated significant and well-evidenced, and surface a distinct 26% of issues no human raises. However, AI reviewers overlap far more than humans do (21% vs. 3% for cross-reviewer pairs), and exhibit 16 recurring weaknesses humans do not share, such as limited subfield knowledge, lack of long context management over multiple files, and overly critical stance on minor issues. Overall, our results position current AI reviewers as complements to, not substitutes for, human reviewers.

CEFeb 11, 2021Code
Brain Modelling as a Service: The Virtual Brain on EBRAINS

Michael Schirner, Lia Domide, Dionysios Perdikis et al.

The Virtual Brain (TVB) is now available as open-source cloud ecosystem on EBRAINS, a shared digital research platform for brain science. It offers services for constructing, simulating and analysing brain network models (BNMs) including the TVB network simulator; magnetic resonance imaging (MRI) processing pipelines to extract structural and functional connectomes; multiscale co-simulation of spiking and large-scale networks; a domain specific language for automatic high-performance code generation from user-specified models; simulation-ready BNMs of patients and healthy volunteers; Bayesian inference of epilepsy spread; data and code for mouse brain simulation; and extensive educational material. TVB cloud services facilitate reproducible online collaboration and discovery of data assets, models, and software embedded in scalable and secure workflows, a precondition for research on large cohort data sets, better generalizability and clinical translation.

LGOct 1, 2025
Dynamical system reconstruction from partial observations using stochastic dynamics

Viktor Sip, Martin Breyton, Spase Petkoski et al.

Learning stochastic models of dynamical systems underlying observed data is of interest in many scientific fields. Here we propose a novel method for this task, based on the framework of variational autoencoders for dynamical systems. The method estimates from the data both the system state trajectories and noise time series. This approach allows to perform multi-step system evolution and supports a teacher forcing strategy, alleviating limitations of autoencoder-based approaches for stochastic systems. We demonstrate the performance of the proposed approach on six test problems, covering simulated and experimental data. We further show the effects of the teacher forcing interval on the nature of the internal dynamics, and compare it to the deterministic models with equivalent architecture.