Song Yao

5.1 EM Jul 5

When the Scaffold Stays On: AI, Practice Style, and Screening in Elite Skill Formation

Song Yao

Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. Whether this exchange erodes frontier skill depends on the mode of use: substitute-users let AI stand in for deliberate practice and fail to develop skill, while complement-users use it to accelerate skill development. For institutions that train and certify talent, the design question is not whether to allow AI but how to govern the mode of its use. We ask whether AI-prohibited evaluation gates can separate the two modes. In elite competitive programming, the International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under in-person proctoring and admit entrants through qualification rounds, whereas Codeforces (CF) practice is unproctored and open to all. From CF submission histories we build an AI-prompt signature (more first-attempt acceptances, fewer attempts, and fewer debugging retries) that is consistent with AI-assisted practice. CF practice has shifted toward this signature across entry cohorts spanning two AI rollouts. In CF contests, a stronger signature predicts smaller rating gains for users with no ICPC-IOI affiliation, but not for those who qualified. Inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants. The same signature carries opposite signs across the two environments, which is exactly the pattern a type-separating gate predicts. The message is constructive: AI-style practice is compatible with frontier skill; the erosion risk is tied to the substitute mode; and that mode is separable by gates that are standard at credential boundaries, from medical and legal boards to professional certification.
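As a rough illustration of how a signature of this kind could be computed from a submission history, here is a minimal sketch. It is not the paper's definition: the function name `practice_signature`, the `(problem_id, verdict)` input layout, and the choice to summarize only solved problems are all assumptions made for this example, and the verdict string `"OK"` is assumed to mark an accepted submission.

```python
from collections import defaultdict


def practice_signature(submissions):
    """Summarize a chronological list of (problem_id, verdict) pairs.

    Hypothetical helper: the abstract names the ingredients (first-attempt
    acceptances, attempts, debugging retries) but not an exact formula.
    """
    attempts = defaultdict(int)  # attempts made so far, per problem
    solved_on = {}               # problem_id -> attempt number of first acceptance

    for problem_id, verdict in submissions:
        if problem_id in solved_on:
            continue  # ignore resubmissions after the first acceptance
        attempts[problem_id] += 1
        if verdict == "OK":  # assumed label for an accepted submission
            solved_on[problem_id] = attempts[problem_id]

    n_solved = len(solved_on)
    if n_solved == 0:
        return None  # no solved problems, so no signature to report

    first_try = sum(1 for k in solved_on.values() if k == 1)
    mean_attempts = sum(solved_on.values()) / n_solved
    return {
        "first_attempt_rate": first_try / n_solved,
        "mean_attempts": mean_attempts,
        "mean_retries": mean_attempts - 1,  # failed tries before acceptance
    }


history = [
    ("A", "WRONG_ANSWER"),
    ("A", "OK"),
    ("B", "OK"),
    ("C", "WRONG_ANSWER"),  # never solved, so excluded from the summary
]
signature = practice_signature(history)
```

Under these assumptions, a higher `first_attempt_rate` together with lower `mean_attempts` and `mean_retries` would correspond to a stronger signature in the sense the abstract describes.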

27.7 CL Dec 1, 2016

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

Song Han, Junlong Kang, Huizi Mao et al.

Long Short-Term Memory (LSTM) is widely used in speech recognition. To achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation-intensive and memory-intensive. Deploying such bulky models results in high power consumption and a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy. The pruned model is friendly to parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model across the processing elements (PEs) for parallelism and schedules the complicated LSTM data flow. Finally, we design a hardware architecture, named Efficient Speech Recognition Engine (ESE), that works directly on the compressed model. Implemented on a Xilinx XCKU060 FPGA running at 200 MHz, ESE has a performance of 282 GOPS working directly on the compressed LSTM network, corresponding to 2.52 TOPS on the uncompressed one, and processes a full LSTM for speech recognition with a power dissipation of 41 Watts. Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations, respectively. It achieves 40x and 11.5x higher energy efficiency compared with the CPU and GPU, respectively.
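The idea behind load-balance-aware pruning can be sketched in a few lines. This is a simplified illustration rather than the ESE implementation: it assumes rows are assigned to PEs round-robin and that each PE's share is pruned by magnitude to the same sparsity, so every PE ends up with the same number of nonzero weights. The names `load_balanced_prune` and `nonzeros_per_pe` are hypothetical.

```python
def load_balanced_prune(matrix, num_pes, sparsity):
    """Zero the smallest-magnitude weights separately within each PE's rows.

    Assumption: row r belongs to PE (r % num_pes). Pruning each PE's share
    to the same sparsity keeps the per-PE workload equal.
    """
    pruned = [row[:] for row in matrix]  # copy so the input is left intact
    for pe in range(num_pes):
        coords = [
            (r, c)
            for r in range(pe, len(matrix), num_pes)
            for c in range(len(matrix[r]))
        ]
        n_prune = int(len(coords) * sparsity)
        # Smallest magnitudes first; those are the weights to remove.
        coords.sort(key=lambda rc: abs(matrix[rc[0]][rc[1]]))
        for r, c in coords[:n_prune]:
            pruned[r][c] = 0.0
    return pruned


def nonzeros_per_pe(matrix, num_pes):
    """Count the nonzero weights each PE would have to process."""
    return [
        sum(1 for r in range(pe, len(matrix), num_pes) for v in matrix[r] if v != 0)
        for pe in range(num_pes)
    ]


weights = [
    [1.0, -2.0, 3.0, -4.0],
    [5.0, -6.0, 7.0, -8.0],
    [0.1, 0.2, 0.3, 0.4],
    [9.0, 10.0, 11.0, 12.0],
]
balanced = load_balanced_prune(weights, num_pes=2, sparsity=0.5)
workload = nonzeros_per_pe(balanced, num_pes=2)
```

Pruning the whole matrix against a single global threshold would instead remove weights mostly from the rows with small values, leaving some PEs with more work than others; pruning per PE avoids that imbalance at the cost of a slightly different set of retained weights.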
