Comparing a composite model versus chained models to locate a nearest visual object
This work addresses model selection for autonomous vehicles to optimize cell station connectivity, but it is incremental as it compares existing architectures without introducing new methods.
The study compared two neural network architectures for locating the nearest visual object from geographic images and text, finding that both achieved similar performance with RMSEs of 0.055 and 0.056, but the chained model trained 12 times faster while the composite model reduced data labeling effort.
Extracting information from geographic images and text is crucial for autonomous vehicles to determine in advance the best cell stations to connect to along their future path. Multiple artificial neural network models can address this challenge; however, there is no definitive guidance on selecting an appropriate model for such use cases. We therefore experimented with two architectures for this task: a first architecture with chained models, where each model in the chain addresses one sub-task of the task; and a second architecture with a single composite model that addresses the whole task. Our results showed that the two architectures achieved comparable performance, with root mean square errors (RMSEs) of 0.055 and 0.056. The findings further revealed that, when the task can be decomposed into sub-tasks, the chained architecture trains twelve times faster than the composite model. Nevertheless, the composite model significantly alleviates the burden of data labeling.
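To make the structural difference between the two architectures concrete, the following is a minimal sketch in plain Python, not the networks used in the study. The functions `locate_objects`, `pick_nearest`, and `composite_model`, as well as the `scene` dictionary layout, are hypothetical stand-ins for trained models and their inputs; only the way the pieces are wired together and the RMSE metric reflect the text above.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def chain(*stages: Callable) -> Callable:
    """Compose sub-task models so each stage consumes the previous stage's output."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


def _distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical sub-task 1: produce candidate object positions from the input.
# A real stage would be a trained model with its own intermediate labels.
def locate_objects(scene: Dict) -> Dict:
    return {"origin": scene["origin"], "objects": list(scene["objects"])}


# Hypothetical sub-task 2: choose the candidate closest to the origin.
def pick_nearest(located: Dict) -> Point:
    return min(located["objects"], key=lambda p: _distance(p, located["origin"]))


# Chained architecture: two separately defined stages wired in sequence.
chained_model = chain(locate_objects, pick_nearest)


# Composite architecture: one end-to-end mapping from input to answer,
# so only the final output would need a label.
def composite_model(scene: Dict) -> Point:
    return min(scene["objects"], key=lambda p: _distance(p, scene["origin"]))


scene = {"origin": (0.0, 0.0), "objects": [(3.0, 4.0), (1.0, 1.0), (-2.0, 0.0)]}
chained_answer = chained_model(scene)
composite_answer = composite_model(scene)
```

In this sketch the chained variant exposes an intermediate result (the located objects) that would need its own labels during training, whereas the composite variant only needs the final answer labeled, which mirrors the labeling trade-off described above.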