Vedanshu

CV Jul 25, 2024

Enhancing Model Performance: Another Approach to Vision-Language Instruction Tuning

Vedanshu, MM Tripathi, Bhavnesh Jaint

The integration of large language models (LLMs) with vision-language (VL) tasks has been a transformative development in artificial intelligence, highlighting the potential of LLMs as versatile general-purpose chatbots. The current trend in this evolution is to integrate vision and language so that models can operate in more diverse, real-world contexts. We present a novel approach, termed Bottleneck Adapter, designed to enhance the multimodal capabilities of these models by enabling joint optimization of the entire multimodal LLM framework through a process known as Multimodal Model Tuning (MMT). Our approach uses lightweight adapters to connect the image encoder and the LLM, without the need for large, complex neural networks. Unlike conventional modular training schemes, it adopts an end-to-end optimization regime which, combined with the adapters, allows joint optimization with a significantly smaller parameter set. Our method achieves 90.12% accuracy, outperforming both human-level performance (88.4%) and LaVIN-7B (89.41%).
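To make the idea of a lightweight adapter concrete, here is a minimal sketch in plain Python. The abstract does not describe the adapter's internals, so this assumes the common bottleneck design: a down-projection to a small dimension, a nonlinearity, an up-projection, and a residual connection. The class name, the ReLU activation, the zero-initialized up-projection and the sizes used are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def adapter_param_count(d_model, d_bottleneck):
    """Parameters in a down-projection plus an up-projection, biases included."""
    down = d_model * d_bottleneck + d_bottleneck
    up = d_bottleneck * d_model + d_model
    return down + up


def dense_param_count(d_model):
    """Parameters in a full d_model x d_model dense layer with bias, for comparison."""
    return d_model * d_model + d_model


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Hypothetical adapter: x + W_up @ relu(W_down @ x + b_down) + b_up."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        self.W_down = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        self.b_down = [0.0] * d_bottleneck
        # Assumption: the up-projection starts at zero, so the adapter is an
        # identity map before training and does not disturb the frozen model.
        self.W_up = [[0.0] * d_bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, x):
        pre = [a + b for a, b in zip(matvec(self.W_down, x), self.b_down)]
        h = [max(0.0, a) for a in pre]  # ReLU
        delta = [a + b for a, b in zip(matvec(self.W_up, h), self.b_up)]
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


adapter = BottleneckAdapter(d_model=6, d_bottleneck=2)
features = [0.5, -1.0, 2.0, 0.0, 3.0, -0.25]
output = adapter(features)  # equals `features` until W_up is trained
```

With a hypothetical hidden size of 4096 and a bottleneck of 8, `adapter_param_count` gives 69,640 parameters against 16,781,312 for a full dense layer of the same width, which illustrates why tuning through adapters needs a much smaller parameter set.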

LG Dec 9, 2018

Zero Initialization of modified Gated Recurrent Encoder-Decoder Network for Short Term Load Forecasting

Vedanshu, M M Tripathi

A single-layer feedforward neural network (FNN) is often used as the last layer in models such as seq2seq or a simple RNN. The role of this layer is to transform the output to the required dimensions. For initializing its weights and biases, there is no specific technique known to speed up learning. One could rely on deep-network initialization techniques such as Xavier or He initialization, but these fail to show much improvement in learning speed or accuracy. In this paper we propose Zero Initialization (ZI) for the weights of a single-layer network. We first test this technique on a simple RNN and compare the results against Xavier, He and Identity initialization. As a final test we implement it on a seq2seq network. ZI was found to considerably reduce the number of epochs used and to improve accuracy. The developed model has been applied to short-term load forecasting using load data from the Australian Energy Market. The model is able to forecast the day-ahead load accurately, with an error of 0.94%.
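As a rough illustration of zero initialization on a single output layer, the sketch below trains a plain linear layer whose weights and biases all start at zero. For a single layer, zero weights do not cause the symmetry problem seen in hidden layers, because each weight's gradient is the output error times its own input. The toy target, learning rate and epoch count are assumptions chosen for the example; this is not the paper's RNN or seq2seq model.

```python
def zero_init(n_out, n_in):
    """Return an all-zero weight matrix and bias vector for one linear layer."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out


def forward(W, b, x):
    """Linear layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def train(W, b, data, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on squared error, updating in place."""
    for _ in range(epochs):
        for x, y in data:
            pred = forward(W, b, x)
            for i in range(len(W)):
                err = pred[i] - y[i]
                for j in range(len(x)):
                    # Gradient of 0.5 * err**2 with respect to W[i][j]
                    W[i][j] -= lr * err * x[j]
                b[i] -= lr * err
    return W, b


# Hypothetical toy target: y = 2*x0 - x1 + 0.5
grid = (0.0, 0.5, 1.0)
data = [([x0, x1], [2 * x0 - x1 + 0.5]) for x0 in grid for x1 in grid]

W, b = zero_init(n_out=1, n_in=2)
W, b = train(W, b, data)
# After training, W[0] is close to [2, -1] and b[0] is close to 0.5.
```

The same pattern would apply to the final projection of a recurrent model: only the output layer is zero-initialized, while the recurrent weights keep whatever scheme the rest of the network uses.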

2 Papers