Juyang Weng

h-index: 3

7 papers

43 citations

Novelty: 36%

AI Score: 22

Ranked #187,857 of 205,806 authors (top 91%); #40,272 in LG (top 95%)

7 Papers

LG · Aug 23, 2022

Why Deep Learning's Performance Data Are Misleading

Juyang Weng

This is a theoretical paper, written as a companion to the keynote talk at the same conference, AIEE 2023. In contrast to conscious learning, many AI projects have employed so-called "deep learning", and many of them appear to give impressive performance. This paper explains that such performance data are deceptively inflated by two misconducts: "data deletion" and "test on training set". It clarifies what "data deletion" and "test on training set" mean in deep learning and why they are misconducts. A simple classification method, called Nearest Neighbor With Threshold (NNWT), is defined. A theorem establishes that the NNWT method reaches zero error on any validation set and any test set using the two misconducts, as long as the test set is in the possession of the author and both the amount of storage space and the training time are finite but unbounded, as with many deep learning methods. However, many deep learning methods, like the NNWT method, are not generalizable, since they have never been tested on a true test set. Why? The so-called "test set" was used in the Post-Selection step of the training stage. The evidence that these misconducts actually took place in many deep learning projects is beyond the scope of this paper.
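The abstract does not spell out how NNWT works, so the following is only a minimal Python sketch under stated assumptions: every training sample is stored, a query takes the label of its nearest stored sample, and a query farther than a hypothetical `threshold` from every stored sample is rejected and counted as an error. The class `NNWT`, the helper `error_rate`, and the toy data are illustrative, not the paper's definitions. The sketch shows only the mechanism behind "test on training set": once the evaluation data are absorbed into storage, every query matches itself at distance 0.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)


class NNWT:
    """Hypothetical Nearest Neighbor With Threshold classifier (illustrative sketch only)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.memory: List[Sample] = []

    def fit(self, samples: List[Sample]) -> None:
        # "Training" only stores samples; storage is finite but unbounded.
        self.memory.extend(samples)

    def predict(self, x: Sequence[float]) -> Optional[int]:
        best_label: Optional[int] = None
        best_dist = math.inf
        for stored_x, label in self.memory:
            d = math.dist(x, stored_x)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_label, best_dist = label, d
        # Farther than the threshold from everything stored: reject (None).
        return best_label if best_dist <= self.threshold else None


def error_rate(model: NNWT, samples: List[Sample]) -> float:
    # A rejection (None) never equals a label, so it counts as an error.
    wrong = sum(1 for x, y in samples if model.predict(x) != y)
    return wrong / len(samples)


train: List[Sample] = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
# Held-out points whose labels disagree with their nearest training neighbor.
held_out: List[Sample] = [((0.2, 0.1), 1), ((0.8, 0.9), 0)]

model = NNWT(threshold=0.5)
model.fit(train)
print(error_rate(model, held_out))  # 1.0: honest evaluation on unseen data
model.fit(held_out)                 # "test on training set": held-out data absorbed
print(error_rate(model, held_out))  # 0.0
```

The second number says nothing about generalization: it is zero by construction, because the evaluation set is now part of what the classifier memorized.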

LG · Nov 23, 2022

On "Deep Learning" Misconduct

Juyang Weng

This is a theoretical paper, written as a companion to the plenary talk at the same conference, ISAIC 2022. In contrast to conscious learning (Weng, 2022b; Weng, 2022c), the subject of the author's plenary talk, which develops a single network for a lifetime (many tasks), "Deep Learning" trains multiple networks for each task. Although "Deep Learning" may use different learning modes, including supervised, reinforcement, and adversarial modes, almost all "Deep Learning" projects apparently suffer from the same misconduct, called "data deletion" and "test on training data". This paper establishes a theorem that a simple method called Pure-Guess Nearest Neighbor (PGNN) reaches any required error on the validation and test data sets, including zero-error requirements, through the same misconduct, as long as the test data set is in the possession of the authors and both the amount of storage space and the training time are finite but unbounded. The misconduct violates two well-known protocols: transparency and cross-validation. The nature of the misconduct is fatal, because in the absence of any disjoint test, "Deep Learning" is clearly not generalizable.
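As a rough illustration of how selection alone can meet an error requirement, here is a Python sketch that keeps only the pure-guess and selection steps; the nearest-neighbor storage part of PGNN is omitted, and this is not the paper's construction. Assumptions: each hypothetical "network" is identified by a random seed and assigns a fixed pseudo-random label to each input; "data deletion" is modeled as keeping the network with the lowest validation error and discarding the rest. The names `guess` and `error_rate`, the two-class setting, and the data sizes are illustrative.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[int, int]  # (input id, label)
NUM_CLASSES = 2


def guess(seed: int, x: int) -> int:
    """Label that hypothetical network `seed` assigns to input `x`: a fixed pure guess."""
    return random.Random(seed * 1_000_003 + x).randrange(NUM_CLASSES)


def error_rate(seed: int, samples: List[Sample]) -> float:
    wrong = sum(1 for x, y in samples if guess(seed, x) != y)
    return wrong / len(samples)


data_rng = random.Random(0)
validation: List[Sample] = [(x, data_rng.randrange(NUM_CLASSES)) for x in range(20)]
test_set: List[Sample] = [(x, data_rng.randrange(NUM_CLASSES)) for x in range(20, 220)]

# "Train" many networks, then Post-Select the one with the lowest validation error
# and silently discard the others ("data deletion").
val_errors: Dict[int, float] = {s: error_rate(s, validation) for s in range(2000)}
best = min(val_errors, key=val_errors.get)

print("reported validation error:", val_errors[best])
print("error on a disjoint test set:", error_rate(best, test_set))
```

With more seeds (finite but unbounded training time), the minimum validation error keeps dropping, while the selected guesser's error on data it was not selected on stays near chance (0.5 for two classes). Running the same selection on a test set in the authors' possession would drive the reported "test" error down in the same way.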

AI · Aug 4, 2022

Developmental Network Two, Its Optimality, and Emergent Turing Machines

Juyang Weng, Zejia Zheng, Xiang Wu

Strong AI requires the learning engine to be task-nonspecific and to automatically construct a dynamic hierarchy of internal features. By hierarchy, we mean, e.g., that short road edges and short bush edges amount to intermediate features of landmarks, whereas intermediate features from tree shadows are distractors that must be disregarded by the high-level landmark concept. By dynamic, we mean that the automatic selection of features, while disregarding distractors, is not static but based on dynamic statistics (e.g., because of the instability of shadows in the context of landmarks). By internal features, we mean that they are not only sensory but also motor, so that context from motor (state) integrates with sensory inputs to form a context-based logic machine. We explain why strong AI is necessary for any practical AI system that works reliably in the real world. We then present a new generation of Developmental Networks, DN-2. Among its many novelties beyond DN-1, the most important is that the inhibition area of each internal neuron is neuron-specific and dynamic. This enables DN-2 to automatically construct an internal hierarchy that is fluid, whose number of areas is not static as in DN-1. To make optimal use of the limited resources available, we establish that DN-2 is optimal in terms of maximum likelihood, under the conditions of limited learning experience and limited resources. We also present how DN-2 can learn an emergent Universal Turing Machine (UTM). Combining this with the optimality result, we present the optimal UTM. Experiments on real-world vision-based navigation, maze planning, and audition used DN-2. They showed that DN-2 is general-purpose, handling both natural and synthetic inputs. The automatically constructed internal representations focus on important features while being invariant to distractors and other irrelevant context-concepts.

LG · Feb 13, 2024

Misconduct in Post-Selections and Deep Learning

Juyang Weng

This is a theoretical paper on "Deep Learning" misconduct in particular and Post-Selection in general. As far as the author knows, the first peer-reviewed papers on Deep Learning misconduct are [32], [37], [36]. Regardless of learning mode, e.g., supervised, reinforcement, adversarial, or evolutionary, almost all machine learning methods (except for a few methods that train a single system) are rooted in the same misconduct, cheating and hiding: (1) cheating in the absence of a test and (2) hiding bad-looking data. It was reasoned in [32], [37], [36] that authors must report at least the average error of all trained networks, good and bad, on the validation set (called general cross-validation in this paper). Better still, they should also report the errors at five percentage positions in the ranked list of errors. The new analysis here shows that the hidden culprit is Post-Selection. This also holds for Post-Selection on hand-tuned or searched hyperparameters, because such hyperparameters are random, depending on random observation data. Does cross-validation on data splits rescue Post-Selections from Misconducts (1) and (2)? The new result here says no. Specifically, this paper reveals that using cross-validation for data splits is insufficient to exonerate Post-Selections in machine learning. In general, Post-Selections of statistical learners based on their errors on the validation set are statistically invalid.
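A minimal Python sketch of the reporting the abstract asks for: summarize the validation errors of all trained networks, good and bad, instead of only the best one. The choice of 0%, 25%, 50%, 75%, and 100% as the five positions and the simple index rule are assumptions, and the function name `report_all_networks` is hypothetical; the error values are made up for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def report_all_networks(val_errors: Sequence[float]) -> Dict[str, float]:
    """Summarize validation errors of ALL trained networks, good and bad."""
    ranked = sorted(val_errors)
    n = len(ranked)
    summary = {"mean": mean(ranked)}
    # Assumed five positions in the ranked list: 0%, 25%, 50%, 75%, 100%,
    # read at index round(p * (n - 1)) without interpolation.
    for pct in (0, 25, 50, 75, 100):
        summary[f"p{pct}"] = ranked[round(pct / 100 * (n - 1))]
    return summary


# Made-up validation errors of five networks trained with different random seeds.
errors = [0.30, 0.10, 0.25, 0.40, 0.15]
print(report_all_networks(errors))
```

Post-Selection would report only `p0` (0.10 here); the mean (about 0.24) and the higher positions show how the typical and worst networks fared.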

LG · Jun 19, 2021

Post-Selections in AI and How to Avoid Them

Juyang Weng

Neural-network-based Artificial Intelligence (AI) has reported increasing scales in experiments. However, this paper raises a rarely reported stage in such experiments, called Post-Selection, to alert the reader to several possible protocol flaws that may result in misleading results. All AI methods fall into two broad schools, connectionist and symbolic. Post-Selections fall into two kinds, Post-Selection Using Validation Sets (PSUVS) and Post-Selection Using Test Sets (PSUTS). Each kind has two types of post-selectors, machines and humans. The connectionist school has received criticism for its "black box" and now for Post-Selection; but the seemingly "clean" symbolic school seems more brittle because of its human PSUTS. This paper first presents a controversial view: all static "big data" are non-scalable. We then analyze why error backprop from randomly initialized weights suffers from severe local minima, why PSUVS lacks cross-validation, why PSUTS violates well-established protocols, and why every paper involved should transparently report the Post-Selection stage. To avoid future pitfalls in AI competitions, this paper proposes a new AI metric, called developmental errors for all networks trained, under Three Learning Conditions: (1) an incremental learning architecture (due to a "big data" flaw), (2) a training experience, and (3) a limited amount of computational resources. Developmental Networks avoid Post-Selections because they automatically discover context rules on the fly by generating emergent Turing machines (not black boxes) that are optimal in the sense of maximum likelihood across their lifetime, conditioned on the Three Learning Conditions.

NC · Jun 30, 2020

Conscious Intelligence Requires Lifelong Autonomous Programming For General Purposes

Juyang Weng

Universal Turing Machines [29, 10, 18] are well known in computer science, but they concern manual programming for general purposes. Although human children perform conscious learning (i.e., learning while being conscious) from infancy [24, 23, 14, 4], it was unknown whether Universal Turing Machines can not only facilitate our understanding of Autonomous Programming For General Purposes (APFGP) by machines but also enable early-age conscious learning. This work reports a new kind of AI: conscious learning AI from a machine's "baby" time. Instead of arguing what static tasks a conscious machine should be able to do during its "adulthood", this work suggests that APFGP is a computationally clearer and necessary criterion for judging whether a machine is capable of conscious learning, so that it can autonomously acquire skills along its "career path". The results here report new concepts and experimental studies for early vision, audition, natural language understanding, and emotion, with conscious learning capabilities that are absent from traditional AI systems.

AI · Oct 12, 2018

A Model for Auto-Programming for General Purposes

Juyang Weng

The Universal Turing Machine (TM) is a model for von Neumann computers, i.e., general-purpose computers. A human brain can automatically learn a universal TM inside the skull, so that the person acts as a general-purpose computer and writes a computer program for any practical purpose. It is unknown whether a machine can accomplish the same. This theoretical work shows how the Developmental Network (DN) can accomplish this. Unlike a traditional TM, the TM learned by a DN is a super TM: Grounded, Emergent, Natural, Incremental, Skulled, Attentive, Motivated, and Abstractive (GENISAMA). A DN is free of any central controller (e.g., Master Map, convolution, or error back-propagation). It learns from a teacher TM one transition observation at a time, immediately and error-free, until all its neurons have been initialized by early observed teacher transitions. From that point on, the DN is no longer error-free but is always optimal at every time instance in the sense of maximum likelihood, conditioned on its limited computational resources and learning experience. This letter also extends the Church-Turing thesis to automatic programming for general purposes and sketches a proof.