Dennis Zyska

26.5 · CL · Feb 24, 2023 · Code

CARE: Collaborative AI-Assisted Reading Environment

Dennis Zyska, Nils Dycke, Jan Buchmann et al.

Recent years have seen impressive progress in AI-assisted writing, yet the developments in AI-assisted reading are lacking. We propose inline commentary as a natural vehicle for AI-based reading assistance, and present CARE: the first open integrated platform for the study of inline commentary and reading. CARE facilitates data collection for inline commentaries in a commonplace collaborative reading environment, and provides a framework for enhancing reading with NLP-based assistance, such as text classification, generation or question answering. The extensible behavioral logging allows unique insights into the reading and commenting behavior, and flexible configuration makes the platform easy to deploy in new scenarios. To evaluate CARE in action, we apply the platform in a user study dedicated to scholarly peer review. CARE facilitates the data collection and study of inline commentary in NLP, extrinsic evaluation of NLP assistance, and application prototyping. We invite the community to explore and build upon the open source implementation of CARE.

8.3 · CL · Apr 14

Exposía: Teaching and Assessment of Academic Writing Skills for Research Project Proposals and Peer Feedback

Dennis Zyska, Alla Rozovskaya, Ilia Kuznetsov et al.

We present Exposía, the first public dataset that connects writing and feedback in higher education, enabling research on educationally grounded computational approaches to teaching and evaluating academic writing. Exposía includes student research project proposals and peer and instructor feedback consisting of comments and free-text reviews. The dataset was collected in the "Introduction to Scientific Work" course of the Computer Science. Exposía reflects the multi-stage nature of the academic writing process that includes drafting, receiving feedback, and revising the writing based on the feedback received. Both the project proposals and peer feedback are accompanied by human assessment scores based on a fine-grained, pedagogically-grounded schema for writing and feedback assessment that we develop. We use Exposía to benchmark state-of-the-art large language models (LLMs) on two tasks: automated scoring of (1) the proposals and (2) the student reviews. We find that the two tasks benefit from different LLMs. Furthermore, closed-source models consistently outperform open-weight models, motivating further research on improving the performance of open-weight models preferred in classroom settings. Finally, we establish that a prompting strategy that scores multiple aspects of the writing together is the most effective, an important finding for classroom deployment.

2 Papers