Teng Gao

3 papers

7 citations

Novelty: 42%

AI Score: 36

Ranked #119,365 of 205,806 authors (top 58%); #941 in SD (top 51%)

3 Papers

75.8 | NI | May 13

Intelligence Delivery Network: Toward an Internet Architecture for the AI Age

Hanling Wang, Qing Li, Dan Zhao et al.

The rapid emergence of AI-powered applications is reshaping the role of the Internet. Users increasingly rely on the network to obtain intelligence services derived from large foundation models, rather than merely to reach remote endpoints or retrieve specific content. Today's dominant deployment paradigm for AI services remains cloud-centric, where user requests are transmitted to remote data centers for centralized inference. Although operationally convenient, this paradigm suffers from latency and jitter, heavy wide-area traffic, limited utilization of distributed heterogeneous compute resources, and growing privacy and governance concerns. In this paper, we propose the Intelligence Delivery Network (IDN), an Internet architecture that treats AI capabilities as deliverable network services. The key idea is to position, select, reuse, and verify intelligence across cloud, regional, edge, and local environments according to demand locality, resource availability, and policy constraints. We present the system assumptions of IDN, define its core architectural mechanisms, and discuss how capability abstraction, compute resource integration, demand-driven deployment, service routing, state-aware caching, and trust management can jointly support distributed AI services. We believe that IDN provides a practical path toward an Internet architecture for the AI age, making AI capabilities more accessible, efficient, trustworthy, and responsive to diverse application needs.
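The selection idea in this abstract (choosing where to serve an intelligence request according to demand locality, resource availability, and policy constraints) can be illustrated with a toy rule. This is a minimal sketch under stated assumptions, not the IDN mechanism. The `Site` fields, the `select_site` function, the `min_capacity` threshold, and the "reuse first, then lowest latency" ordering are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of one place that could serve a request."""
    name: str             # illustrative tier label: "cloud", "regional", "edge", "local"
    rtt_ms: float         # network round-trip time to the user (stands in for demand locality)
    free_capacity: float  # fraction of compute currently free, 0.0 to 1.0
    has_model: bool       # capability already deployed or cached here (enables reuse)
    policy_ok: bool       # serving here satisfies policy, e.g. data-residency rules


def select_site(sites, min_capacity=0.1):
    """Return the preferred eligible site, or None if no site qualifies.

    Hypothetical rule: drop sites that violate policy or lack free capacity,
    then prefer sites that already hold the capability, then the lowest RTT.
    """
    candidates = [s for s in sites if s.policy_ok and s.free_capacity >= min_capacity]
    if not candidates:
        return None
    # False sorts before True, so sites with has_model=True come first.
    return min(candidates, key=lambda s: (not s.has_model, s.rtt_ms))


sites = [
    Site("cloud", rtt_ms=80.0, free_capacity=0.9, has_model=True, policy_ok=True),
    Site("regional", rtt_ms=25.0, free_capacity=0.5, has_model=True, policy_ok=True),
    Site("edge", rtt_ms=5.0, free_capacity=0.05, has_model=True, policy_ok=True),
    Site("local", rtt_ms=1.0, free_capacity=0.6, has_model=False, policy_ok=False),
]
chosen = select_site(sites)
print(chosen.name)
```

With these sample values, the edge site is skipped for lack of free capacity and the local site fails the policy check, so the regional site is chosen. A real architecture would weigh many more signals; the filter-then-rank structure is simply an easy way to keep hard constraints (policy, capacity) separate from soft preferences (reuse, latency).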

SD | Nov 2, 2021

CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement

Qing Pan, Teng Gao, Jian Zhou et al.

Compared with air-conducted speech, bone-conducted speech has the unique advantage of shielding against background noise. Enhancing bone-conducted speech helps improve its quality and intelligibility. In this paper, a novel CycleGAN with dual adversarial loss (CycleGAN-DAL) is proposed for bone-conducted speech enhancement. The proposed method uses an adversarial loss and a cycle-consistent loss simultaneously to learn the forward and cyclic mappings, where the adversarial loss is replaced with a classification adversarial loss and a defect adversarial loss to consolidate the forward mapping. Unlike conventional baseline methods, it can learn the feature mapping between bone-conducted speech and target speech without additional air-conducted speech assistance. Moreover, the proposed method avoids the over-smoothing problem that commonly occurs in conventional statistical models. Experimental results show that the proposed method outperforms baseline methods such as CycleGAN, GMM, and BLSTM.

Keywords: Bone-conducted speech enhancement, dual adversarial loss, Parallel CycleGAN, high-frequency speech reconstruction
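For reference, the original CycleGAN objective that this work builds on combines two adversarial terms with a cycle-consistency term. The block below shows only that standard baseline formulation. The abstract does not give the exact form of the classification and defect adversarial losses that replace the adversarial term in CycleGAN-DAL, so they are not reconstructed here. Labeling $X$ as bone-conducted speech features and $Y$ as target speech features is an assumption for illustration, with forward mapping $G: X \to Y$, cyclic mapping $F: Y \to X$, and discriminators $D_X$, $D_Y$.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) &= \mathbb{E}_{y}\big[\log D_Y(y)\big] + \mathbb{E}_{x}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{GAN}}(F, D_X) &= \mathbb{E}_{x}\big[\log D_X(x)\big] + \mathbb{E}_{y}\big[\log\big(1 - D_X(F(y))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X) + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

The generators $G$ and $F$ minimize $\mathcal{L}$ while the discriminators maximize it, and $\lambda$ weights the cycle-consistency term. In CycleGAN-DAL, per the abstract, the adversarial term on the forward mapping is what gets replaced by the two new adversarial losses.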

SD | Nov 2, 2021

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Teng Gao, Jian Zhou, Huabin Wang et al.

Whispered speech is a special mode of phonation produced without vocal-cord vibration. As a result, whispered speech contains no fundamental frequency, and its energy is about 20 dB lower than that of normal speech. Converting whispered speech into normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model for whisper to normal speech conversion (AGAN-W2SC) is proposed, incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from whispered speech. In addition, the proposed model is more amenable to practical applications because it does not require aligned speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC obtains improved speech quality and intelligibility compared with dynamic-time-warping-based methods.
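To make the "about 20 dB lower" figure concrete, the short sketch below converts a decibel difference into energy (power) and amplitude ratios using the standard definitions, 10 dB per factor of 10 in power and 20 dB per factor of 10 in amplitude. The function names are illustrative helpers, not part of AGAN-W2SC.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a power (energy) ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a level difference in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


# The abstract describes whispered speech as roughly 20 dB below normal speech.
power_ratio = db_to_power_ratio(-20)          # about 1/100 of the energy
amplitude_ratio = db_to_amplitude_ratio(-20)  # about 1/10 of the amplitude
print(power_ratio, amplitude_ratio)
```

So a 20 dB gap means whispered speech carries roughly one hundredth of the energy of normal speech, or about one tenth of its amplitude.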