CV AI Nov 23, 2022

Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap

Mengmi Zhang, Elisa Pavarino, Xiao Liu, Giorgia Dellaferrera, Ankur Sikarwar, Caishun Chen, Marcelo Armendariz, Noga Mudrik, Prachi Agrawal, Spandan Madan, Mranmay Shetty, Andrei Barbu

arXiv:2211.13087v32.61 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses the problem of assessing how human-like AI is as it enters daily-life applications. It provides large-scale datasets and metrics as benchmarks, though its contribution is an incremental refinement of existing evaluation methods.

The study systematically benchmarked AI's ability to imitate humans across language and vision tasks, finding that current AIs are approaching the ability to convincingly impersonate humans and deceive human judges, with AI judges outperforming humans in distinguishing AI from human responses.

As AI becomes increasingly embedded in daily life, ascertaining whether an agent is human is critical. We systematically benchmark AI's ability to imitate humans in three language tasks (image captioning, word association, conversation) and three vision tasks (color estimation, object detection, attention prediction), collecting data from 636 humans and 37 AI agents. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Current AIs are approaching the ability to convincingly impersonate humans and deceive human judges in both language and vision. Even simple AI judges outperformed humans in distinguishing AI from human responses. Imitation ability showed minimal correlation with conventional AI performance metrics, suggesting that passing as human is an important independent evaluation criterion. The large-scale Turing datasets and metrics introduced here offer valuable benchmarks for assessing human-likeness in AI and highlight the importance of rigorous, quantitative imitation tests for AI development.
