AIJul 1, 2024Code
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model AgentsTianqi Xu, Linyao Chen, Dai-Jie Wu et al.
The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 120 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 38.01%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.
SPAug 26, 2023
A Two-Dimensional Deep Network for RF-based Drone Detection and Identification Towards Secure Coverage ExtensionZixiao Zhao, Qinghe Du, Xiang Yao et al.
As drones become increasingly prevalent in human life, they also raises security concerns such as unauthorized access and control, as well as collisions and interference with manned aircraft. Therefore, ensuring the ability to accurately detect and identify between different drones holds significant implications for coverage extension. Assisted by machine learning, radio frequency (RF) detection can recognize the type and flight mode of drones based on the sampled drone signals. In this paper, we first utilize Short-Time Fourier. Transform (STFT) to extract two-dimensional features from the raw signals, which contain both time-domain and frequency-domain information. Then, we employ a Convolutional Neural Network (CNN) built with ResNet structure to achieve multi-class classifications. Our experimental results show that the proposed ResNet-STFT can achieve higher accuracy and faster convergence on the extended dataset. Additionally, it exhibits balanced performance compared to other baselines on the raw dataset.
GNMar 28, 2020
Knowledge synthesis from 100 million biomedical documents augments the deep expression profiling of coronavirus receptorsAJ Venkatakrishnan, Arjun Puranik, Akash Anand et al.
The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission. Despite the recent renaissance in unsupervised neural networks for decoding unstructured natural languages, a platform for the real-time synthesis of the exponentially growing biomedical literature and its comprehensive triangulation with deep omic insights is not available. Here, we present the nferX platform for dynamic inference from over 45 quadrillion possible conceptual associations extracted from unstructured biomedical text, and their triangulation with Single Cell RNA-sequencing based insights from over 25 tissues. Using this platform, we identify intersections between the pathologic manifestations of COVID-19 and the comprehensive expression profile of the SARS-CoV-2 receptor ACE2. We find that tongue keratinocytes and olfactory epithelial cells are likely under-appreciated targets of SARS-CoV-2 infection, correlating with reported loss of sense of taste and smell as early indicators of COVID-19 infection, including in otherwise asymptomatic patients. Airway club cells, ciliated cells and type II pneumocytes in the lung, and enterocytes of the gut also express ACE2. This study demonstrates how a holistic data science platform can leverage unprecedented quantities of structured and unstructured publicly available data to accelerate the generation of impactful biological insights and hypotheses.