Jie Jiang

h-index: 23

4 papers

81 citations

Novelty: 44%

AI Score: 27

Ranked #155,023 of 194,257 authors (top 80%); #26,992 in CL (top 88%)

4 Papers

1.5 | CV | Nov 29, 2023 | Code

LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

Wenhao Zhong, Jie Jiang

Image matching, which finds robust and accurate correspondences across images, is a challenging task under extreme conditions. Capturing local and global features simultaneously is an important way to mitigate this issue, but recent transformer-based decoders remain limited because CNN-based encoders extract only local features and the transformers lack locality. Inspired by the locality and implicit positional encoding of convolutions, a novel convolutional transformer is proposed to capture both local contexts and global structures more fully for detector-free matching. Firstly, a universal FPN-like framework captures global structures in both the self-encoder and the cross-decoder with transformers, and supplements them with local contexts and implicit positional encoding through convolutions. Secondly, a novel convolutional transformer module explores multi-scale long-range dependencies with a novel multi-scale attention and further aggregates local information inside those dependencies to enhance locality. Finally, a novel regression-based sub-pixel refinement module exploits the whole fine-grained window features for fine-level positional deviation regression. The proposed method achieves superior performance on a wide range of benchmarks. The code will be available at https://github.com/zwh0527/LGFCTR.

3.5 | CL | May 10, 2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Longteng Guo, Jing Liu, Xinxin Zhu et al.

Most image captioning models are autoregressive, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use a word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider sentence-level consistency, resulting in inferior generation quality for these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.

5.6 | HC | Aug 9, 2019

MakeSense: An IoT Testbed for Social Research of Indoor Activities

Jie Jiang, Riccardo Pozza, Nigel Gilbert et al.

There has been increasing interest in deploying IoT devices to study human behaviour in locations such as homes and offices. Such devices can be deployed in a laboratory or 'in the wild' in natural environments. The latter allows one to collect behavioural data that is not contaminated by the artificiality of a laboratory experiment. Using IoT devices in ordinary environments also brings the benefits of reduced cost, as compared with lab experiments, and less disturbance to the participants' daily routines, which in turn helps with recruiting them into the research. However, in this case, it is essential to have an IoT infrastructure that can be easily and swiftly installed and from which real-time data can be securely and straightforwardly collected. In this paper, we present MakeSense, an IoT testbed that enables real-world experimentation for large-scale social research on indoor activities through real-time monitoring and/or situation-aware applications. The testbed features quick setup, flexibility in deployment, the integration of a range of IoT devices, resilience, and scalability. We also present two case studies to demonstrate the use of the testbed, one in homes and one in offices.

1.2 | MM | Apr 21, 2017

FISF: Better User Experience using Smaller Bandwidth for Panoramic Virtual Reality Video

Lun Wang, Damai Dai, Jie Jiang et al.

Panoramic video is widely used to build virtual reality (VR) and is expected to be one of the next-generation killer apps. Transmitting panoramic VR videos is a challenging task because of two problems: 1) panoramic VR videos are typically much larger than normal videos, but they need to be transmitted with limited bandwidth in mobile networks; 2) high-resolution and fluent views should be provided to guarantee a superior user experience and avoid side effects such as dizziness and nausea. To address these two problems, we propose a novel interactive streaming technology, namely the Focus-based Interactive Streaming Framework (FISF). FISF consists of three parts: 1) we use the classic clustering algorithm DBSCAN to analyze real user data for Video Focus Detection (VFD); 2) we propose a Focus-based Interactive Streaming Technology (FIST), including a static version and a dynamic version; 3) we propose two optimization methods: focus merging and a prefetch strategy. Experimental results show that FISF significantly outperforms the state of the art. The paper was submitted to Sigcomm 2017, VR/AR Network, on 31 Mar 2017 at 10:44:04am EDT.