MAJul 12, 2023
Self-Adaptive Large Language Model (LLM)-Based Multiagent SystemsNathalia Nascimento, Paulo Alencar, Donald Cowan
In autonomic computing, self-adaptation has been proposed as a fundamental paradigm to manage the complexity of multiagent systems (MASs). This achieved by extending a system with support to monitor and adapt itself to achieve specific concerns of interest. Communication in these systems is key given that in scenarios involving agent interaction, it enhances cooperation and reduces coordination challenges by enabling direct, clear information exchange. However, improving the expressiveness of the interaction communication with MASs is not without challenges. In this sense, the interplay between self-adaptive systems and effective communication is crucial for future MAS advancements. In this paper, we propose the integration of large language models (LLMs) such as GPT-based technologies into multiagent systems. We anchor our methodology on the MAPE-K model, which is renowned for its robust support in monitoring, analyzing, planning, and executing system adaptations in response to dynamic environments. We also present a practical illustration of the proposed approach, in which we implement and assess a basic MAS-based application. The approach significantly advances the state-of-the-art of self-adaptive systems by proposing a new paradigm for MAS self-adaptation of autonomous systems based on LLM capabilities.
MAAug 21, 2023
GPT-in-the-Loop: Adaptive Decision-Making for Multiagent SystemsNathalia Nascimento, Paulo Alencar, Donald Cowan
This paper introduces the "GPT-in-the-loop" approach, a novel method combining the advanced reasoning capabilities of Large Language Models (LLMs) like Generative Pre-trained Transformers (GPT) with multiagent (MAS) systems. Venturing beyond traditional adaptive approaches that generally require long training processes, our framework employs GPT-4 for enhanced problem-solving and explanation skills. Our experimental backdrop is the smart streetlight Internet of Things (IoT) application. Here, agents use sensors, actuators, and neural networks to create an energy-efficient lighting system. By integrating GPT-4, these agents achieve superior decision-making and adaptability without the need for extensive training. We compare this approach with both traditional neuroevolutionary methods and solutions provided by software engineers, underlining the potential of GPT-driven multiagent systems in IoT. Structurally, the paper outlines the incorporation of GPT into the agent-driven Framework for the Internet of Things (FIoT), introduces our proposed GPT-in-the-loop approach, presents comparative results in the IoT context, and concludes with insights and future directions.
AINov 20, 2023
GPT in Data Science: A Practical Exploration of Model SelectionNathalia Nascimento, Cristina Tavares, Paulo Alencar et al.
There is an increasing interest in leveraging Large Language Models (LLMs) for managing structured data and enhancing data science processes. Despite the potential benefits, this integration poses significant questions regarding their reliability and decision-making methodologies. It highlights the importance of various factors in the model selection process, including the nature of the data, problem type, performance metrics, computational resources, interpretability vs accuracy, assumptions about data, and ethical considerations. Our objective is to elucidate and express the factors and assumptions guiding GPT-4's model selection recommendations. We employ a variability model to depict these factors and use toy datasets to evaluate both the model and the implementation of the identified heuristics. By contrasting these outcomes with heuristics from other platforms, our aim is to determine the effectiveness and distinctiveness of GPT-4's methodology. This research is committed to advancing our comprehension of AI decision-making processes, especially in the realm of model selection within data science. Our efforts are directed towards creating AI systems that are more transparent and comprehensible, contributing to a more responsible and efficient practice in data science.
CLNov 28, 2023
Comparing Generative Chatbots Based on Process RequirementsLuis Fernando Lins, Nathalia Nascimento, Paulo Alencar et al.
Business processes are commonly represented by modelling languages, such as Event-driven Process Chain (EPC), Yet Another Workflow Language (YAWL), and the most popular standard notation for modelling business processes, the Business Process Model and Notation (BPMN). Most recently, chatbots, programs that allow users to interact with a machine using natural language, have been increasingly used for business process execution support. A recent category of chatbots worth mentioning is generative-based chatbots, powered by Large Language Models (LLMs) such as OpenAI's Generative Pre-Trained Transformer (GPT) model and Google's Pathways Language Model (PaLM), which are trained on billions of parameters and support conversational intelligence. However, it is not clear whether generative-based chatbots are able to understand and meet the requirements of constructs such as those provided by BPMN for process execution support. This paper presents a case study to compare the performance of prominent generative models, GPT and PaLM, in the context of process execution support. The research sheds light into the challenging problem of using conversational approaches supported by generative chatbots as a means to understand process-aware modelling notations and support users to execute their tasks.
LGNov 23, 2023
Extending Variability-Aware Model Selection with Bias Detection in Machine Learning ProjectsCristina Tavares, Nathalia Nascimento, Paulo Alencar et al.
Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood and explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a specific case study to illustrate our approach based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non ad hoc, adaptive, and explainable process.
AIMay 13
An Agentic LLM-Based Framework for Population-Scale Mental Health ScreeningGiuliano Lorenzoni, Paulo Alencar, Donald Cowan
Mental health disorders affect millions worldwide, and healthcare systems are increasingly overwhelmed by the volume of clinical data generated from electronic records, telemedicine platforms, and population-level screening programs. At the same time, the emergence of novel AI-based approaches in healthcare calls for intelligent frameworks capable of processing domain-specific unstructured clinical information while adapting to patient-specific needs. This paper proposes an agentic framework for building robust LLM-based pipelines, where each stage is encapsulated as a LangChain agent governed by explicit policies and proxy-guided evaluation. Stages are incrementally locked once validated, ensuring that later adaptations cannot overwrite configurations without demonstrated improvement. The proposed framework evolves from feature-level exploration, through proxy-based tuning and freeze/rollback mechanisms, to full orchestration by an Orchestrator Agent that coordinates preprocessing, retrieval, selection, diversity, threshold optimization, and decoding. A proof-of-concept in transcript-based depression detection demonstrates that the framework converges to stable configurations, such as cosine similarity, dynamic Top-k, and threshold 0.75, while controlling evaluation costs and avoiding regressions. These results highlight the potential of agentic AI to enable population-level mental health screening over large clinical datasets, addressing critical challenges in trustworthiness, reproducibility, and adaptability required in healthcare environments.
AIMay 12
LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM AgentsGiuliano Lorenzoni, Paulo Alencar, Donald Cowan
We propose a personal-LLM exchange (LLM-X), a scalable negotiation-oriented environment that enables direct, structured communication across populations of personal agents (LLMs), each representing an individual user. Unlike existing tool-centric protocols that focus on agent-API interaction, LLM-X introduces a message bus and routing substrate for LLM-to-LLM coordination with guarantees around schema validity and policy enforcement. We contribute: (1) an architecture for LLM-X comprising federated gateways, topic-based routing, and policy enforcement; (2) a typed message protocol supporting capability negotiation and contract-net-style coordination; and (3) the first empirical evaluation of LLM-based multi-agent negotiation at scale. Experiments span 5, 9, and 12 agents, under distinct negotiation policies (Low, Medium, High), and across both short-run (minutes) and long-run (2h, 12h) load conditions. Results highlight clear policy-performance trade-offs: stricter policies improve robustness and fairness but increase latencies and message volume. Extended runs confirm that LLM-X remains stable under sustained load, with bounded latency drift.
AIMay 11
CPEMH: An Agentic Framework for Prompt-Driven Behavior Evaluation and Assurance in Foundation-Model Systems for Mental Health ScreeningGiuliano Lorenzoni, Ivens Portugal, Paulo Alencar et al.
This paper presents CPEMH, an agentic framework designed to evaluate prompt-driven behavior in foundation-model systems operating on transcript-based datasets for mental-health screening. CPEMH serves as an engineering methodology for behavioral assurance in large-scale language systems, introducing an orchestrated architecture that autonomously performs the design, evaluation, and selection of prompt strategies, enabling systematic control of behavioral variability across contexts. Its modular agentic design, combining orchestrator, inference, and evaluation agents, ensures traceability, reproducibility, and robustness throughout the prompting lifecycle. A case study on automated depression screening from interview transcripts demonstrates the framework's capacity to stabilize and audit foundation-model behavior in conversational and clinically sensitive domains. Lessons learned emphasize the role of modular orchestration in behavioral assurance, the prioritization of stability over architectural complexity, and the integration of F1, bias, and robustness as core acceptance criteria.
CLApr 3, 2024
Assessing ML Classification Algorithms and NLP Techniques for Depression Detection: An Experimental Case StudyGiuliano Lorenzoni, Cristina Tavares, Nathalia Nascimento et al.
Depression has affected millions of people worldwide and has become one of the most common mental disorders. Early mental disorder detection can reduce costs for public health agencies and prevent other major comorbidities. Additionally, the shortage of specialized personnel is very concerning since Depression diagnosis is highly dependent on expert professionals and is time-consuming. Recent research has evidenced that machine learning (ML) and Natural Language Processing (NLP) tools and techniques have significantly bene ted the diagnosis of depression. However, there are still several challenges in the assessment of depression detection approaches in which other conditions such as post-traumatic stress disorder (PTSD) are present. These challenges include assessing alternatives in terms of data cleaning and pre-processing techniques, feature selection, and appropriate ML classification algorithms. This paper tackels such an assessment based on a case study that compares different ML classifiers, specifically in terms of data cleaning and pre-processing, feature selection, parameter setting, and model choices. The case study is based on the Distress Analysis Interview Corpus - Wizard-of-Oz (DAIC-WOZ) dataset, which is designed to support the diagnosis of mental disorders such as depression, anxiety, and PTSD. Besides the assessment of alternative techniques, we were able to build models with accuracy levels around 84% with Random Forest and XGBoost models, which is significantly higher than the results from the comparable literature which presented the level of accuracy of 72% from the SVM model.
CLDec 31, 2024
GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot StudyGiuliano Lorenzoni, Pedro Elkind Velmovitsky, Paulo Alencar et al.
Depression has impacted millions of people worldwide and has become one of the most prevalent mental disorders. Early mental disorder detection can lead to cost savings for public health agencies and avoid the onset of other major comorbidities. Additionally, the shortage of specialized personnel is a critical issue because clinical depression diagnosis is highly dependent on expert professionals and is time consuming. In this study, we explore the use of GPT-4 for clinical depression assessment based on transcript analysis. We examine the model's ability to classify patient interviews into binary categories: depressed and not depressed. A comparative analysis is conducted considering prompt complexity (e.g., using both simple and complex prompts) as well as varied temperature settings to assess the impact of prompt complexity and randomness on the model's performance. Results indicate that GPT-4 exhibits considerable variability in accuracy and F1-Score across configurations, with optimal performance observed at lower temperature values (0.0-0.2) for complex prompts. However, beyond a certain threshold (temperature >= 0.3), the relationship between randomness and performance becomes unpredictable, diminishing the gains from prompt complexity. These findings suggest that, while GPT-4 shows promise for clinical assessment, the configuration of the prompts and model parameters requires careful calibration to ensure consistent results. This preliminary study contributes to understanding the dynamics between prompt engineering and large language models, offering insights for future development of AI-powered tools in clinical settings.
CLDec 31, 2024
Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERTGiuliano Lorenzoni, Ivens Portugal, Paulo Alencar et al.
This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics due to hyperparameter configurations, showing trade-offs among performance metrics. For example, a higher learning rate reduces loss in relative analysis (p=0.027) but challenges accuracy improvements. Meanwhile, batch size significantly impacts accuracy and F1-score in absolute regression (p=0.028 and p=0.005) but has limited influence on loss optimization (p=0.170). The interaction between epochs and batch size maximizes F1-score (p=0.001), underscoring the importance of hyperparameter interplay. These findings highlight the need for fine-tuning strategies addressing non-linear hyperparameter interactions to balance performance across metrics. Such variability and metric trade-offs are relevant for tasks beyond text classification, including NLP and computer vision. This analysis informs fine-tuning strategies for large language models and promotes adaptive designs for broader model applicability.
SEMay 19, 2023
Comparing Software Developers with ChatGPT: An Empirical InvestigationNathalia Nascimento, Paulo Alencar, Donald Cowan
The advent of automation in particular Software Engineering (SE) tasks has transitioned from theory to reality. Numerous scholarly articles have documented the successful application of Artificial Intelligence to address issues in areas such as project management, modeling, testing, and development. A recent innovation is the introduction of ChatGPT, an ML-infused chatbot, touted as a resource proficient in generating programming codes and formulating software testing strategies for developers and testers respectively. Although there is speculation that AI-based computation can increase productivity and even substitute software engineers in software development, there is currently a lack of empirical evidence to verify this. Moreover, despite the primary focus on enhancing the accuracy of AI systems, non-functional requirements including energy efficiency, vulnerability, fairness (i.e., human bias), and safety frequently receive insufficient attention. This paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration, enhancing the reliability of AI-based methods, and understanding task suitability for humans or AI. Furthermore, it facilitates the effective implementation of cooperative work structures and human-in-the-loop processes. This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics. The empirical study includes a case of assessing ChatGPT-generated code versus code produced by developers and uploaded in Leetcode.
AIFeb 15, 2021
A Reference Model for IoT Embodied Agents Controlled by Neural NetworksNathalia Nascimento, Paulo Alencar, Donald Cowan et al.
Embodied agents is a term used to denote intelligent agents, which are a component of devices belonging to the Internet of Things (IoT) domain. Each agent is provided with sensors and actuators to interact with the environment, and with a 'controller' that usually contains an artificial neural network (ANN). In previous publications, we introduced three software approaches to design, implement and test IoT embodied agents. In this paper, we propose a reference model based on statecharts that offers abstractions tailored to the development of IoT applications. The model represents embodied agents that are controlled by neural networks. Our model includes the ANN training process, represented as a reconfiguration step such as changing agent features or neural net connections. Our contributions include the identification of the main characteristics of IoT embodied agents, a reference model specification based on statecharts, and an illustrative application of the model to support autonomous street lights. The proposal aims to support the design and implementation of IoT applications by providing high-level design abstractions and models, thus enabling the designer to have a uniform approach to conceiving, designing and explaining such applications.
SEFeb 15, 2021
Machine Learning Model Development from a Software Engineering Perspective: A Systematic Literature ReviewGiuliano Lorenzoni, Paulo Alencar, Nathalia Nascimento et al.
Data scientists often develop machine learning models to solve a variety of problems in the industry and academy but not without facing several challenges in terms of Model Development. The problems regarding Machine Learning Development involves the fact that such professionals do not realize that they usually perform ad-hoc practices that could be improved by the adoption of activities presented in the Software Engineering Development Lifecycle. Of course, since machine learning systems are different from traditional Software systems, some differences in their respective development processes are to be expected. In this context, this paper is an effort to investigate the challenges and practices that emerge during the development of ML models from the software engineering perspective by focusing on understanding how software developers could benefit from applying or adapting the traditional software engineering process to the Machine Learning workflow.
SEFeb 10, 2021
A Cognitive and Machine Learning-Based Software Development Paradigm Supported by ContextGlaucia Melo, Paulo Alencar, Donald Cowan
Advances in the use of cognitive and machine learning (ML) enabled systems fuel the quest for novel approaches and tools to support software developers in executing their tasks. First, as software development is a complex and dynamic activity, these tasks are highly dependent on the characteristics of the software project and its context, and developers need comprehensive support in terms of information and guidance based on the task context. Second, there is a lack of methods based on conversational-guided agents that consider cognitive aspects such as paying attention and remembering. Third, there is also a lack of techniques that make use of historical implicit or tacit data to infer new knowledge about the project tasks such as related tasks, task experts, relevant information needed for task completion and warnings, and navigation aspects of the process such as what tasks to perform next and optimal task sequencing. Based on these challenges, this paper introduces a novel paradigm for human-machine software support based on context, cognitive assistance, and machine learning, and briefly describes ongoing research activities to realize this paradigm. The research takes advantage of the synergy among emergent methods provided in context-aware software processes, cognitive computing such as chatbots, and machine learning such as recommendation systems. These novel paradigms have the potential to transform the way software development currently occurs by allowing developers to receive valuable information and guidance in real-time while they are participating in projects.
SEDec 25, 2018
The Next Generation of Metadata-Oriented Testing of Research SoftwareDoug Mulholland, Paulo Alencar, Donald Cowan
Research software refers to software development tools that accelerate discovery and simplifies access to digital infrastructures. However, although research software platforms can be built increasingly more innovative and powerful than ever before, with increasing complexity there is a greater risk of failure if unplanned for and untested program scenarios arise. As systems age and are changed by different programmers the risk of a change impacting the overall system increases. In contrast, systems that are built with less emphasis on program code and more emphasis on data that describes the application can be more readily changed and maintained by individuals who are less technically skilled but are often more familiar with the application domain. Such systems can also be tested using automatically generated advanced testing regimes.
SEDec 25, 2018
A Variability-Aware Design Approach to the Data Analysis Modeling ProcessMaria Cristina Vale Tavares, Paulo Alencar, Donald Cowan
The massive amount of current data has led to many different forms of data analysis processes that aim to explore this data to uncover valuable insights. Methodologies to guide the development of big data science projects, including CRISP-DM and SEMMA, have been widely used in industry and academia. The data analysis modeling phase, which involves decisions on the most appropriate models to adopt, is at the core of these projects. However, from a software engineering perspective, the design and automation of activities performed in this phase are challenging. In this paper, we propose an approach to the data analysis modeling process which involves (i) the assessment of the variability inherent in the CRISP-DM data analysis modeling phase and the provision of feature models that represent this variability; (ii) the definition of a framework structural design that captures the identified variability; and (iii) evaluation of the developed framework design in terms of the possibilities for process automation. The proposed approach advances the state of the art by offering a variability-aware design solution that can enhance system flexibility, potentially leading to novel software frameworks which can significantly improve the level of automation in data analysis modeling process.
SEDec 25, 2018
The iEnvironment Platform: Developing an Open Science Software Platform for Integrated Environmental Monitoring and Modeling of Surface WaterPaulo Alencar, Donald Cowan, Doug Mulholland
This paper describes the development of iEnvironment, an open science software platform that supports monitoring and modeling of aspects of surface water. The platform supports science and engineering research, especially in the context of the creation, sharing, analysis and maintenance of big and open data. In this era of big data, iEnvironment facilitates access to open data resources and research collaboration among science and research disciplines supported by computer scientists and software developers.
AIDec 14, 2018
An IoT Analytics Embodied Agent Model based on Context-Aware Machine LearningNathalia Nascimento, Paulo Alencar, Carlos Lucena et al.
Agent-based Internet of Things (IoT) applications have recently emerged as applications that can involve sensors, wireless devices, machines and software that can exchange data and be accessed remotely. Such applications have been proposed in several domains including health care, smart cities and agriculture. However, despite their increased adoption, deploying these applications in specific settings has been very challenging because of the complex static and dynamic variability of the physical devices such as sensors and actuators, the software application behavior and the environment in which the application is embedded. In this paper, we propose a modeling approach for IoT analytics based on learning embodied agents (i.e. situated agents). The approach involves: (i) a variability model of IoT embodied agents; (ii) feedback evaluative machine learning; and (iii) reconfiguration of a group of agents in accordance with environmental context. The proposed approach advances the state of the art in that it facilitates the development of Agent-based IoT applications by explicitly capturing their complex and dynamic variabilities and supporting their self-configuration based on an context-aware and machine learning-based approach.
MAFeb 12, 2018
Machine Learning-based Variability Handling in IoT AgentsNathalia Nascimento, Paulo Alencar, Carlos Lucena et al.
Agent-based IoT applications have recently been proposed in several domains, such as health care, smart cities and agriculture. Deploying these applications in specific settings has been very challenging for many reasons including the complex static and dynamic variability of the physical devices such as sensors and actuators, the software application behavior and the environment in which the application is embedded. In this paper, we propose a self-configurable IoT agent approach based on feedback-evaluative machine-learning. The approach involves: i) a variability model of IoT agents; ii) generation of sets of customized agents; iii) feedback evaluative machine learning; iv) modeling and composition of a group of IoT agents; and v) a feature-selection method based on manual and automatic feedback.
SEFeb 4, 2018
Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse TasksNathalia Nascimento, Carlos Lucena, Paulo Alencar et al.
Several papers have recently contained reports on applying machine learning (ML) to the automation of software engineering (SE) tasks, such as project management, modeling and development. However, there appear to be no approaches comparing how software engineers fare against machine-learning algorithms as applied to specific software development tasks. Such a comparison is essential to gain insight into which tasks are better performed by humans and which by machine learning and how cooperative work or human-in-the-loop processes can be implemented more effectively. In this paper, we present an empirical study that compares how software engineers and machine-learning algorithms perform and reuse tasks. The empirical study involves the synthesis of the control structure of an autonomous streetlight application. Our approach consists of four steps. First, we solved the problem using machine learning to determine specific performance and reuse tasks. Second, we asked software engineers with different domain knowledge levels to provide a solution to the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks based on criteria such as energy consumption and safety. Finally, we analyzed the results to understand which tasks are better performed by either humans or algorithms so that they can work together more effectively. Such an understanding and the resulting human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental not only to provide a basis for cooperative work in support of software engineering, but also, in other areas.
SEFeb 24, 2016
A Survey on Domain-Specific Languages for Machine Learning in Big DataIvens Portugal, Paulo Alencar, Donald Cowan
The amount of data generated in the modern society is increasing rapidly. New problems and novel approaches of data capture, storage, analysis and visualization are responsible for the emergence of the Big Data research field. Machine Learning algorithms can be used in Big Data to make better and more accurate inferences. However, because of the challenges Big Data imposes, these algorithms need to be adapted and optimized to specific applications. One important decision made by software engineers is the choice of the language that is used in the implementation of these algorithms. Therefore, this literature survey identifies and describes domain-specific languages and frameworks used for Machine Learning in Big Data. By doing this, software engineers can then make more informed choices and beginners have an overview of the main languages used in this domain.
SENov 17, 2015
The Use of Machine Learning Algorithms in Recommender Systems: A Systematic ReviewIvens Portugal, Paulo Alencar, Donald Cowan
Recommender systems use algorithms to provide users with product or service recommendations. Recently, these systems have been using machine learning algorithms from the field of artificial intelligence. However, choosing a suitable machine learning algorithm for a recommender system is difficult because of the number of algorithms described in the literature. Researchers and practitioners developing recommender systems are left with little information about the current approaches in algorithm usage. Moreover, the development of a recommender system using a machine learning algorithm often has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This paper presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for software engineering research. The study concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that requirement and design phases of recommender system development appear to offer opportunities for further research.
SENov 17, 2015
Requirements Engineering for General Recommender SystemsIvens Portugal, Paulo Alencar, Donald Cowan
In requirements engineering for recommender systems, software engineers must identify the data that drives the recommendations. This is a labor-intensive task, which is error-prone and expensive. One possible solution to this problem is the adoption of automatic recommender system development approach based on a general recommender framework. One step towards the creation of such a framework is to determine the type of data used in recommender systems. In this paper, a systematic review has been conducted to identify the type of user and recommendation data items needed by a general recommender system. A user and item model is proposed, and some considerations about algorithm specific parameters are explained. A further goal is to study the impact of the fields of big data and Internet of things on the development of recommender systems.