Warren Li

4.9CLNov 12, 2025Code

Order Matters: Rethinking Prompt Construction in In-Context Learning

Warren Li, Yiqian Wang, Zihan Wang et al.

In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work reasonably and intuitively assumes that which examples are chosen has a far greater effect on performance than how those examples are ordered, leading to a focus on example selection. We revisit this assumption and conduct a systematic comparison between the effect of selection and ordering. Through controlled experiments on both classification and generation tasks, using multiple open-source model families (0.5B to 27B parameters) and GPT-5, we find that the variance in performance due to different example orderings is comparable to that from using entirely different example sets. Furthermore, we show that strong orderings can be identified using only a development set, achieving performance close to an oracle that selects the best ordering based on test labels. Our findings highlight the equal and intertwined importance of example selection and ordering in prompt design, calling for a reexamination of the assumptions held in ICL.

8.6HCApr 21

Hint-Writing with Deferred AI Assistance: Fostering Critical Engagement in Data Science Education

Anjali Singh, Christopher Brooks, Warren Li et al.

Generating hints for incorrect code is a cognitively demanding task that fosters learning and metacognitive development. This study investigates three designs for personalized, scalable, and reflective hint-writing activities within a data science course: (i) writing a hint independently, (ii) writing a hint with on-demand AI assistance, and (iii) deferred AI assistance, in which students first write a hint independently and then revise it with the help of an AI-generated one. We examine how AI support can scaffold the learning process without diminishing students' productive cognitive effort. Through a randomized controlled experiment with graduate-level students (N=97), we found that deferring AI assistance leads to the highest-quality hints. Further, this design helps students identify a wide range of mistakes they otherwise struggle to identify without any AI assistance. Students valued these activities as opportunities to practice debugging and critically engage with AI outputs--skills that are now critical for learners to acquire as programming becomes increasingly automated and the use of AI for learning grows. Our findings also highlight key considerations for designing student-AI collaborative learning experiences to sustain student engagement, maintain appropriate cognitive load, and mitigate negative effects of AI, such as introducing redundancies and extraneous information into student work.

Warren Li

2 Papers