Visual Learning of Arithmetic Operations
This work addresses end-to-end visual learning of cognitive tasks. Its contribution is incremental: it builds on existing neural network methods, and its scope is limited.
The paper tackles the problem of teaching neural networks to perform arithmetic operations directly from pictures of numbers, without explicit numerical concepts. It finds that addition and subtraction can be learned with a small model, that multiplication was not learnable with the same architecture, and that addition with Roman numerals became learnable only once the task was decomposed into sub-tasks.
A simple Neural Network model is presented for end-to-end visual learning of arithmetic operations from pictures of numbers. The input consists of two pictures, each showing a 7-digit number. The output, also a picture, shows the result of an arithmetic operation (e.g., addition or subtraction) on the two input numbers. The concepts of a number, or of an operator, are not explicitly introduced. The network nonetheless learns addition and subtraction, which indicates that addition is a simple cognitive task that can be learned visually using a very small number of neurons. Other operations, e.g., multiplication, were not learnable using this architecture. Some tasks were not learnable end-to-end (e.g., addition with Roman numerals), but were easily learnable once broken into two separate sub-tasks: a perceptual \textit{Character Recognition} sub-task and a cognitive \textit{Arithmetic} sub-task. This indicates that while some tasks may be easily learnable end-to-end, others may need to be broken into sub-tasks.
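To make the input/output format concrete, the following is a minimal sketch of how one training triplet (two input pictures and one target picture) could be generated for addition. It is not the authors' data pipeline. The 3x5 bitmap font, the image size, the names `render_number`, `make_example`, and `sample_operands`, and the operand cap that keeps the sum within 7 digits are all assumptions made for illustration.

```python
import random

# Hypothetical 3x5 bitmap font for the digits 0-9 (an assumption of this sketch).
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
NUM_DIGITS = 7   # each picture shows a 7-digit number, as in the abstract
GLYPH_HEIGHT = 5
# Assumed cap on each operand so that the sum still fits in 7 digits.
MAX_OPERAND = (10 ** NUM_DIGITS) // 2 - 1


def render_number(n):
    """Render n as a binary image (list of pixel rows), zero-padded to 7 digits."""
    if not 0 <= n < 10 ** NUM_DIGITS:
        raise ValueError(f"{n} does not fit in {NUM_DIGITS} digits")
    text = str(n).zfill(NUM_DIGITS)
    rows = []
    for r in range(GLYPH_HEIGHT):
        # Join the r-th row of every glyph, with one blank column between digits.
        row = "0".join(FONT[ch][r] for ch in text)
        rows.append([int(px) for px in row])
    return rows


def make_example(a, b):
    """Return (picture of a, picture of b, picture of a + b)."""
    return render_number(a), render_number(b), render_number(a + b)


def sample_operands(rng):
    """Draw two operands uniformly from [0, MAX_OPERAND]."""
    return rng.randint(0, MAX_OPERAND), rng.randint(0, MAX_OPERAND)


if __name__ == "__main__":
    a, b = sample_operands(random.Random(0))
    for image in make_example(a, b):
        for row in image:
            print("".join("#" if px else "." for px in row))
        print()
```

A network trained on such triplets only ever sees pixels, so any notion of digits, carries, or the operator has to emerge from the image-to-image mapping, which is the sense in which the abstract says these concepts are not explicitly introduced.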