CV · Nov 17, 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu, Christopher Clark, Aniruddha Kembhavi · allen-ai
Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. Key to our approach is exploiting the joint embedding space of contrastively trained vision and language encoders. In practice, there can be systematic differences between embedding spaces for different modalities in contrastive models, and we analyze how these differences affect our approach and study strategies to mitigate this concern. We produce models using only text training data on four representative tasks: image captioning, visual entailment, visual question answering and visual news captioning, and evaluate them on standard benchmarks using images. We find these models perform close to models trained on images, while surpassing prior work for captioning and visual entailment in this text-only setting by over 9 points, and outperforming all prior work on visual news by over 30 points. We also showcase a variety of stylistic image captioning models that are trained using no image data and no human-curated language data, but instead using readily-available text data from books, the web, or language models.
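The train-on-text, test-on-images idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the random vectors stand in for the outputs of a contrastively trained text/image encoder pair, the 512-dimensional size is arbitrary, and Gaussian noise is used as one plausible way to make a downstream model tolerant of the gap between modalities. The abstract itself does not name the specific mitigation strategies, so this should not be read as the authors' method.

```python
import numpy as np

def normalize(v):
    """Scale each embedding to unit length, as contrastive encoders typically do."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def training_input(text_emb, noise_std, rng):
    """Text embedding fed to the task model during training.

    Adding Gaussian noise is an ASSUMED mitigation for the systematic
    text/image embedding difference; it is not taken from the abstract.
    """
    noisy = text_emb + rng.normal(0.0, noise_std, size=text_emb.shape)
    return normalize(noisy)

def inference_input(image_emb):
    """At test time the image embedding replaces the text embedding unchanged."""
    return normalize(image_emb)

rng = np.random.default_rng(0)

# Hypothetical stand-ins for encoder outputs (batch of 4, dimension 512).
text_emb = normalize(rng.normal(size=(4, 512)))
# Simulate an image embedding that is near its paired text but shifted.
image_emb = normalize(text_emb + 0.05 * rng.normal(size=(4, 512)))

train_x = training_input(text_emb, noise_std=0.05, rng=rng)
test_x = inference_input(image_emb)

# Both inputs live on the same unit sphere, so one task model can consume either.
print(train_x.shape, test_x.shape)
```

The point of the sketch is only the interface: the task model never sees which modality produced its input vector, which is what makes the swap at evaluation time possible.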
CL · Oct 10, 2023
LLMs as Potential Brainstorming Partners for Math and Science Problems
Sophia Gu
With the recent rise of widely successful deep learning models, there is emerging interest among professionals in various math and science communities to see and evaluate the state-of-the-art models' abilities to collaborate on finding or solving problems that often require creativity and thus brainstorming. While a significant chasm still exists between current human-machine intellectual collaborations and the resolution of complex math and science problems, such as the six unsolved Millennium Prize Problems, our initial investigation into this matter reveals a promising step towards bridging the divide. This is due to the recent advancements in Large Language Models (LLMs). More specifically, we conduct comprehensive case studies to explore both the capabilities and limitations of the current state-of-the-art LLM, notably GPT-4, in collective brainstorming with humans.
MF · Jan 9, 2021
Deep Reinforcement Learning with Function Properties in Mean Reversion Strategies
Sophia Gu
Over the past decades, researchers have been pushing the limits of Deep Reinforcement Learning (DRL). Although DRL has attracted substantial interest from practitioners, many are held back by having to search through a plethora of seemingly similar methodologies, while others are still building RL agents from scratch based on classical theories. To address these gaps in adopting the latest DRL methods, I am particularly interested in testing whether any of the recent technology developed by the leaders in the field can be readily applied to a class of optimal trading problems. Unsurprisingly, many prominent breakthroughs in DRL are investigated and tested on strategic games: from AlphaGo to AlphaStar and, at about the same time, OpenAI Five. Thus, in this writing, I want to show precisely how to use a DRL library that was initially built for games in a fundamental trading problem: mean reversion. By introducing a framework that incorporates economically motivated function properties, I also demonstrate, through the library, a highly performant and convergent DRL solution to decision-making problems in finance in general.
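For readers unfamiliar with mean reversion, the following is a minimal sketch of a mean-reverting price path. It assumes an Ornstein-Uhlenbeck process discretized with the Euler-Maruyama scheme, a common textbook model; the abstract does not say which price model or DRL library the paper uses, and the function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_ou(x0, mu, theta, sigma, dt, n_steps, rng):
    """Simulate dX = theta * (mu - X) dt + sigma dW with Euler-Maruyama steps.

    ASSUMED model for illustration only:
    mu is the long-run level, theta the speed of reversion, sigma the volatility.
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        drift = theta * (mu - x[t]) * dt              # pull back toward mu
        shock = sigma * np.sqrt(dt) * rng.normal()    # random disturbance
        x[t + 1] = x[t] + drift + shock
    return x

rng = np.random.default_rng(0)
path = simulate_ou(x0=2.0, mu=1.0, theta=0.5, sigma=0.1, dt=0.1, n_steps=200, rng=rng)

# The path starts above mu and drifts back toward it, fluctuating around it afterwards.
print(path[0], path[-1])
```

A trading agent in such an environment would observe the current deviation from the long-run level and choose a position; the drift term is what makes deviations, on average, temporary.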