Seyyed Ehsan Mahmoudi

SIFeb 21

UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals

Pedram Riyazimehr, Seyyed Ehsan Mahmoudi

We present UniRank, a multi-agent LLM pipeline that estimates university positions across global ranking systems using only publicly available bibliometric data from OpenAlex and Semantic Scholar. The system employs a three-stage architecture: (a) zero-shot estimation from anonymized institutional metrics, (b) per-system tool-augmented calibration against real ranked universities, and (c) final synthesis. Critically, institutions are anonymized -- names, countries, DOIs, paper titles, and collaboration countries are all redacted -- and their actual ranks are hidden from the calibration tools during evaluation, preventing LLM memorization from confounding results. On the Times Higher Education (THE) World University Rankings ($n=352$), the system achieves MAE = 251.5 rank positions, Median AE = 131.5, PNMAE = 12.03%, Spearman $ρ= 0.769$, Kendall $τ= 0.591$, hit rate @50 = 20.7%, hit rate @100 = 39.8%, and a Memorization Index of exactly zero (no exact-match zero-width predictions among all 352 universities). The systematic positive-signed error (+190.1 positions, indicating the system consistently predicts worse ranks than actual) and monotonic performance degradation from elite tier (MAE = 60.5, hit@100 = 90.5%) to tail tier (MAE = 328.2, hit@100 = 20.8%) provide strong evidence that the pipeline performs genuine analytical reasoning rather than recalling memorized rankings. A live demo is available at https://unirank.scinito.ai .

CLJun 29, 2021

SAT Based Analogy Evaluation Framework for Persian Word Embeddings

Seyyed Ehsan Mahmoudi, Mehrnoush Shamsfard

In recent years there has been a special interest in word embeddings as a new approach to convert words to vectors. It has been a focal point to understand how much of the semantics of the the words has been transferred into embedding vectors. This is important as the embedding is going to be used as the basis for downstream NLP applications and it will be costly to evaluate the application end-to-end in order to identify quality of the used embedding model. Generally the word embeddings are evaluated through a number of tests, including analogy test. In this paper we propose a test framework for Persian embedding models. Persian is a low resource language and there is no rich semantic benchmark to evaluate word embedding models for this language. In this paper we introduce an evaluation framework including a hand crafted Persian SAT based analogy dataset, a colliquial test set (specific to Persian) and a benchmark to study the impact of various parameters on the semantic evaluation task.

Seyyed Ehsan Mahmoudi

2 Papers