Matthew S. Brown

CV
h-index: 2
4 papers
8 citations
Novelty: 49%
AI Score: 47

4 Papers

AI | Dec 2, 2022 | Code
SimpleMind adds thinking to deep neural networks

Youngwon Choi, M. Wasil Wahi-Anwar, Matthew S. Brown

Deep neural networks (DNNs) detect patterns in data and have shown versatility and strong performance in many computer vision applications. However, DNNs alone are susceptible to obvious mistakes that violate simple, common-sense concepts and are limited in their ability to use explicit knowledge to guide their search and decision-making. While overall DNN performance metrics may be good, these obvious errors, coupled with a lack of explainability, have prevented widespread adoption for crucial tasks such as medical image analysis. The purpose of this paper is to introduce SimpleMind, an open-source software framework for Cognitive AI focused on medical image understanding. It allows creation of a knowledge base that describes expected characteristics and relationships between image objects in an intuitive human-readable form. The SimpleMind framework brings thinking to DNNs by: (1) providing methods for reasoning with the knowledge base about image content, such as spatial inferencing and conditional reasoning to check DNN outputs; (2) applying process knowledge, in the form of general-purpose software agents that are chained together to accomplish image preprocessing, DNN prediction, and result post-processing; and (3) performing automatic co-optimization of all knowledge base parameters to adapt agents to specific problems. SimpleMind enables reasoning on multiple detected objects to ensure consistency, providing cross-checking between DNN outputs. This machine reasoning improves the reliability and trustworthiness of DNNs through an interpretable model and explainable decisions. Example applications are provided that demonstrate how SimpleMind supports and improves deep neural networks by embedding them within a Cognitive AI framework.
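
The kind of spatial check on DNN outputs described in this abstract can be illustrated with a minimal Python sketch. It does not use SimpleMind's actual knowledge base syntax or API, which the abstract does not show. The rule ("a candidate object's centroid should lie horizontally between two reference objects") and the helper names `column_mask`, `centroid`, and `is_between_horizontally` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def column_mask(cols: Set[int], height: int = 3, width: int = 7) -> Mask:
    """Build a toy mask whose foreground fills the given columns (hypothetical helper)."""
    return [[1 if c in cols else 0 for c in range(width)] for _ in range(height)]


def centroid(mask: Mask) -> Tuple[float, float]:
    """Return the (row, col) centroid of the foreground pixels."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("mask has no foreground pixels")
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def is_between_horizontally(candidate: Mask, ref_a: Mask, ref_b: Mask) -> bool:
    """Assumed rule: the candidate centroid lies strictly between the two reference centroids."""
    lo, hi = sorted((centroid(ref_a)[1], centroid(ref_b)[1]))
    return lo < centroid(candidate)[1] < hi


# Toy stand-ins for DNN output masks (not real data).
lung_a = column_mask({0, 1})
lung_b = column_mask({5, 6})
plausible_heart = column_mask({3})
implausible_heart = column_mask({6})

print(is_between_horizontally(plausible_heart, lung_a, lung_b))
print(is_between_horizontally(implausible_heart, lung_a, lung_b))
```

A candidate that fails such a rule could be rejected or flagged for review. This is one simple way a knowledge-based check can catch an obvious error that a DNN alone would let through.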

RO | Nov 3, 2025 | Code
TRACE: Textual Reasoning for Affordance Coordinate Extraction

Sangyun Park, Jin Kim, Yuchen Cui et al.

Vision-Language Models (VLMs) struggle to translate high-level instructions into the precise spatial affordances required for robotic manipulation. While visual Chain-of-Thought (CoT) methods exist, they are often computationally intensive. In this work, we introduce TRACE (Textual Reasoning for Affordance Coordinate Extraction), a novel methodology that integrates a textual Chain of Reasoning (CoR) into the affordance prediction process. We use this methodology to build the TRACE dataset, a large-scale collection generated by an autonomous pipeline that pairs instructions with explicit textual rationales. By fine-tuning a VLM on this data, our model learns to externalize its spatial reasoning before acting. Our experiments show that our TRACE-tuned model achieves state-of-the-art performance, reaching 48.1% accuracy on the primary Where2Place (W2P) benchmark (a 9.6% relative improvement) and 55.0% on the more challenging W2P(h) subset. Crucially, an ablation study demonstrates that performance scales directly with the amount of reasoning data used, confirming the CoR's effectiveness. Furthermore, analysis of the model's attention maps reveals an interpretable reasoning process where focus shifts dynamically across reasoning steps. This work shows that training VLMs to generate a textual CoR is an effective and robust strategy for enhancing the precision, reliability, and interpretability of VLM-based robot control. Our dataset and code are available at https://github.com/jink-ucla/TRACE
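
The abstract reports 48.1% accuracy as a 9.6% relative improvement but does not state the baseline. The sketch below backs out an implied baseline, assuming "relative improvement" means (new - old) / old. The resulting figure is an inference under that assumption, not a number reported by the authors. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(new_score: float, relative_gain: float) -> float:
    """Solve new = old * (1 + gain) for old; assumes gain is (new - old) / old."""
    return new_score / (1.0 + relative_gain)


new_accuracy = 48.1    # percent, from the abstract
relative_gain = 0.096  # 9.6% relative improvement, from the abstract

baseline = implied_baseline(new_accuracy, relative_gain)
absolute_gain = new_accuracy - baseline

print(f"implied baseline: {baseline:.1f}%")
print(f"absolute gain: {absolute_gain:.1f} percentage points")
```

Under this reading, the relative gain corresponds to roughly four percentage points of absolute accuracy. Relative and absolute improvements are easy to conflate, so the distinction is worth keeping in mind when comparing results.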

CV | Jun 11, 2025 | Code
Autonomous Computer Vision Development with Agentic AI

Jin Kim, Muhammad Wahi-Anwar, Sangyun Park et al.

Agentic Artificial Intelligence (AI) systems leveraging Large Language Models (LLMs) exhibit significant potential for complex reasoning, planning, and tool utilization. We demonstrate that a specialized computer vision system can be built autonomously from a natural language prompt using Agentic AI methods. This involved extending SimpleMind (SM), an open-source Cognitive AI environment with configurable tools for medical image analysis, with an LLM-based agent, implemented using OpenManus, to automate the planning (tool configuration) for a particular computer vision task. We provide a proof-of-concept demonstration that an agentic system can interpret a computer vision task prompt and plan a corresponding SimpleMind workflow by decomposing the task and configuring appropriate tools. From the user input prompt, "provide sm (SimpleMind) config for lungs, heart, and ribs segmentation for cxr (chest x-ray)", the agent LLM was able to generate the plan (a tool configuration file in YAML format) and execute the SM-Learn (training) and SM-Think (inference) scripts autonomously. The computer vision agent automatically configured, trained, and tested itself on 50 chest x-ray images, achieving mean Dice scores of 0.96, 0.82, and 0.83 for lungs, heart, and ribs, respectively. This work shows the potential for autonomous planning and tool configuration, work that has traditionally been performed by a data scientist in the development of computer vision applications.
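
For readers unfamiliar with the metric reported above, here is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. It is not the evaluation code from the paper. The function name `dice_score` and the toy masks are illustrative assumptions.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Convention assumed here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy masks (not real chest x-ray data): 2 overlapping pixels, 3 foreground pixels each.
predicted = [1, 1, 1, 0, 0, 0]
reference = [1, 1, 0, 1, 0, 0]

print(f"Dice: {dice_score(predicted, reference):.3f}")
```

A score of 1.0 means perfect overlap and 0.0 means none. A per-structure mean, as reported in the abstract, would average this value over the test images for each structure.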

CV | Jan 25, 2022 | Code
ADAPT: An Open-Source sUAS Payload for Real-Time Disaster Prediction and Response with AI

Daniel Davila, Joseph VanPelt, Alexander Lynch et al.

Small unmanned aircraft systems (sUAS) are becoming prominent components of many humanitarian assistance and disaster response (HADR) operations. Pairing sUAS with onboard artificial intelligence (AI) substantially extends their utility in covering larger areas with fewer support personnel. A variety of missions, such as search and rescue, assessing structural damage, and monitoring forest fires, floods, and chemical spills, can be supported simply by deploying the appropriate AI models. However, adoption by resource-constrained groups, such as local municipalities, regulatory agencies, and researchers, has been hampered by the lack of a cost-effective, readily accessible baseline platform that can be adapted to their unique missions. To fill this gap, we have developed the free and open-source ADAPT multi-mission payload for deploying real-time AI and computer vision onboard a sUAS during local and beyond-line-of-sight missions. We have emphasized a modular design with low-cost, readily available components, open-source software, and thorough documentation (https://kitware.github.io/adapt/). The system integrates an inertial navigation system, high-resolution color camera, computer, and wireless downlink to process imagery and broadcast georegistered analytics back to a ground station. Our goal is to make it easy for the HADR community to build their own copies of the ADAPT payload and leverage the thousands of hours of engineering we have devoted to developing and testing. In this paper, we detail the development and testing of the ADAPT payload. We demonstrate the example mission of real-time, in-flight ice segmentation to monitor river ice state and provide timely predictions of catastrophic flooding events. We deploy a novel active learning workflow to annotate river ice imagery, train a real-time deep neural network for ice segmentation, and demonstrate operation in the field.