CY Jul 26, 2024
TAGIFY: LLM-powered Tagging Interface for Improved Data Findability on OGD portals
Kevin Kliimask, Anastasija Nikiforova
Efforts to promote Open Government Data (OGD) have gained significant traction across various governmental tiers since the mid-2000s. As more datasets are published on OGD portals, finding specific data becomes harder, leading to information overload. Complete and accurate documentation of datasets, including the association of proper tags with datasets, is key to improving dataset findability and accessibility. An analysis of the Estonian Open Data Portal revealed that 11% of datasets had no associated tags, while 26% had only one tag assigned. This underscores challenges in data findability and accessibility even within a portal that, according to the recent Open Data Maturity Report, is considered a trend-setter. The aim of this study is to propose an automated approach to tagging datasets to improve data findability on OGD portals. This paper presents Tagify, a prototype tagging interface that employs large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to automate dataset tagging. It generates tags for datasets in English and Estonian, thereby augmenting metadata preparation by data publishers and improving data findability on OGD portals for data users. The developed solution was evaluated by users, and their feedback was collected to define an agenda for future prototype improvements.
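The tag-coverage figures quoted in this abstract (share of datasets with no tags, share with exactly one) are simple proportions over a portal's metadata. The following is a minimal sketch of how such figures could be computed, not the authors' method: the record layout (a `tags` list per dataset), the function name `tag_coverage`, and the sample records are all hypothetical.

```python
from collections import Counter


def tag_coverage(datasets):
    """Return the share of datasets with zero tags and with exactly one tag.

    `datasets` is assumed to be a list of dicts with an optional "tags" list;
    this layout is hypothetical, not the Estonian portal's actual schema.
    """
    total = len(datasets)
    if total == 0:
        return {"no_tags": 0.0, "one_tag": 0.0}
    # Count how many datasets have 0, 1, 2, ... tags.
    counts = Counter(len(d.get("tags", [])) for d in datasets)
    return {"no_tags": counts[0] / total, "one_tag": counts[1] / total}


# Hypothetical sample records, for illustration only.
sample = [
    {"title": "Road traffic counts", "tags": []},
    {"title": "Air quality", "tags": ["environment"]},
    {"title": "School locations", "tags": ["education", "geodata"]},
    {"title": "Budget 2023", "tags": ["finance"]},
]

coverage = tag_coverage(sample)
print(coverage)  # {'no_tags': 0.25, 'one_tag': 0.5}
```

Datasets in the `no_tags` and `one_tag` groups would be the natural first candidates for automated tag suggestions.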
CY Jul 29, 2024
To accept or not to accept? An IRT-TOE Framework to Understand Educators' Resistance to Generative AI in Higher Education
Jan-Erik Kalmus, Anastasija Nikiforova
Since the public release of Chat Generative Pre-Trained Transformer (ChatGPT), extensive discourse has emerged concerning the potential advantages and challenges of integrating Generative Artificial Intelligence (GenAI) into education. In the realm of information systems, research on technology adoption is crucial for understanding the diverse factors influencing the uptake of specific technologies. Theoretical frameworks, refined and validated over decades, serve as guiding tools to elucidate the individual and organizational dynamics, obstacles, and perceptions surrounding technology adoption. However, while several models have been proposed, they often prioritize the factors that facilitate acceptance over those that impede it, and they typically focus on the student perspective, leaving a gap in empirical evidence regarding educators' viewpoints. Given the pivotal role educators play in higher education, this study aims to develop a theoretical model to empirically predict the barriers preventing educators from adopting GenAI in their classrooms. Acknowledging the lack of theoretical models tailored to identifying such barriers, our approach is grounded in the Innovation Resistance Theory (IRT) framework and augmented with constructs from the Technology-Organization-Environment (TOE) framework. This model is transformed into a measurement instrument employing a quantitative approach, complemented by a qualitative approach to enrich the analysis and uncover concerns related to GenAI adoption in the higher education domain.
CY May 22, 2024
From the evolution of public data ecosystems to the evolving horizons of the forward-looking intelligent public data ecosystem empowered by emerging technologies
Anastasija Nikiforova, Martin Lnenicka, Petar Milić et al.
Public data ecosystems (PDEs) represent complex socio-technical systems crucial for optimizing data use in the public sector and outside it. Recognizing their multifaceted nature, previous research proposed a six-generation Evolutionary Model of Public Data Ecosystems (EMPDE). Designed as a result of a systematic literature review on the topic spanning three decades, this model, while theoretically robust, necessitates empirical validation to enhance its practical applicability. This study addresses this gap by validating the theoretical model through a real-life examination in five European countries: Latvia, Serbia, Czech Republic, Spain, and Poland. This empirical validation provides insights into PDE dynamics and variations in implementation across contexts. It focuses particularly on the sixth, forward-looking generation of PDEs, named "Intelligent Public Data Generation", which represents a paradigm shift driven by emerging technologies such as cloud computing, Artificial Intelligence, Natural Language Processing tools, Generative AI, and Large Language Models (LLMs), with the potential to contribute to both automation and augmentation of business processes within these ecosystems. By transcending their traditional status as a mere component and evolving into both an actor and a stakeholder simultaneously, these technologies catalyze innovation and progress, enhancing PDE management strategies to align with societal, regulatory, and technical imperatives in the digital era.
CY Sep 29, 2025
Responsible AI Adoption in the Public Sector: A Data-Centric Taxonomy of AI Adoption Challenges
Anastasija Nikiforova, Martin Lnenicka, Ulf Melin et al.
Despite the transformative potential of Artificial Intelligence (AI) for public sector services, decision-making, and administrative efficiency, adoption remains uneven due to complex technical, organizational, and institutional challenges. Responsible AI frameworks emphasize fairness, accountability, and transparency, aligning with principles of trustworthy AI and fair AI, yet remain largely aspirational, overlooking technical and institutional realities, especially foundational data and governance. This study addresses this gap by developing a taxonomy of data-related challenges to responsible AI adoption in government. Based on a systematic review of 43 studies and 21 expert evaluations, the taxonomy identifies 13 key challenges across technological, organizational, and environmental dimensions, including poor data quality, limited AI-ready infrastructure, weak governance, misalignment in human-AI decision-making, and economic and environmental sustainability concerns. Annotated with institutional pressures, the taxonomy serves as a diagnostic tool to surface 'symptoms' of high-risk AI deployment and guides policymakers in building the institutional and data governance conditions necessary for responsible AI adoption.
DB Jun 16, 2024
From Data Quality for AI to AI for Data Quality: A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses
Heidi Carolina Tamm, Anastasija Nikiforova
While high data quality (DQ) is critical for analytics, compliance, and AI performance, data quality management (DQM) remains a complex, resource-intensive, and often manual process. This study investigates the extent to which existing tools support AI-augmented DQM in data warehouse environments. To this end, we conduct a systematic review of 151 DQ tools to evaluate their automation capabilities, particularly in detecting and recommending DQ rules in data warehouses, a key component of modern data ecosystems. After a multi-phase screening process based on functionality, trialability, regulatory compliance (e.g., GDPR), and architectural compatibility with data warehouses, only 10 tools met the criteria for AI-augmented DQM. The analysis reveals that most tools emphasize data cleansing and preparation for AI, rather than leveraging AI to improve DQ itself. Although metadata- and ML-based rule detection techniques are present, features such as SQL-based rule specification, reconciliation logic, and explainability of AI-driven recommendations remain scarce. This study offers practical guidance for tool selection and outlines critical design requirements for next-generation AI-driven DQ solutions, advocating a paradigm shift from "data quality for AI" to "AI for data quality management".
DB Jul 9, 2020
Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia
Anastasija Nikiforova
Open data is now entering the mainstream: it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure that the data is trustworthy and error-free, as quality problems can lead to huge losses. The research discusses how (open) data quality could be assessed. It also covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step guide to analyzing open data sets and summarizes the results. It also shows that data quality may differ depending on the data supplier (centralized and decentralized data releases) and that, unfortunately, even a trustworthy data supplier cannot guarantee the absence of data quality problems. It also highlights common data quality problems detected not only in Latvian open data but also in the open data of three European countries.
SE Jul 9, 2020
Application of LEAN Principles to Improve Business Processes: a Case Study in Latvian IT Company
Anastasija Nikiforova, Zane Bicevska
The research deals with the application of LEAN principles to the business processes of a typical IT company. The paper discusses LEAN principles, highlighting the advantages and shortcomings of their application. The authors suggest using LEAN principles as a tool to identify improvement potential in an IT company's business processes and workflow efficiency. In a case study, the implementation of LEAN principles is exemplified in the business processes of a particular Latvian IT company. The obtained results and conclusions can be used for meaningful and successful application of LEAN principles and methods in projects of other IT companies.