CV · Aug 18, 2023
Improving Buoy Detection with Deep Transfer Learning for Mussel Farm Automation
Carl McMillan, Junhong Zhao, Bing Xue et al.
The aquaculture sector in New Zealand is expanding rapidly, with a particular emphasis on mussel exports. As the demands of mussel farming operations continue to evolve, integrating artificial intelligence and computer vision techniques, such as intelligent object detection, is emerging as an effective way to improve operational efficiency. This study advances buoy detection by leveraging deep learning for intelligent mussel farm monitoring and management. The primary objective is to improve the accuracy and robustness of buoy detection across a range of real-world scenarios. A diverse dataset is captured at mussel farms and labeled for training; it includes imagery from cameras mounted on both floating platforms and moving vessels, covering various lighting and weather conditions. To build an effective buoy detection model from a limited amount of labeled data, we employ transfer learning, adapting a pre-trained object detection model into a specialized buoy detection model. We explore different pre-trained models, including YOLO and its variants, alongside data diversity, to investigate their effects on model performance. Our investigation shows that deep learning significantly improves buoy detection performance and generalizes better across diverse weather conditions, highlighting the practical effectiveness of our approach.
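Detection accuracy for YOLO-style models is commonly scored by matching predicted boxes to labeled boxes with intersection-over-union (IoU). The sketch below is a minimal illustration of that matching step, assuming boxes given as `(x1, y1, x2, y2)` pixel coordinates and a 0.5 match threshold; both are illustrative choices, not details taken from the paper, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width and height clamp to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: count a predicted buoy box as a hit if it overlaps the label enough."""
    return iou(pred_box, gt_box) >= threshold
```

For example, two 10 × 10 boxes offset by 5 pixels in each direction overlap in a 5 × 5 region, giving an IoU of 25 / 175, which falls below the 0.5 threshold.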
CV · Jan 5, 2023
Expressive Speech-driven Facial Animation with controllable emotions
Yutong Chen, Junhong Zhao, Wei-Qiang Zhang
Generating highly realistic facial animation is in high demand, but it remains a challenging task. Existing approaches to speech-driven facial animation can produce satisfactory mouth movement and lip synchronization, but they are weak at dramatic emotional expression and offer limited flexibility in emotion control. This paper presents a novel deep learning-based approach for generating expressive facial animation from speech that can exhibit a wide spectrum of facial expressions with controllable emotion type and intensity. We propose an emotion controller module that learns the relationship between emotion variations (e.g., type and intensity) and the corresponding facial expression parameters. It enables emotion-controllable facial animation in which the target expression can be adjusted continuously as desired. Qualitative and quantitative evaluations show that the animation generated by our method is rich in emotional facial expressiveness while retaining accurate lip movement, outperforming other state-of-the-art methods.
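The abstract describes a learned emotion controller that maps emotion type and intensity to facial expression parameters. As a much simpler, hypothetical illustration of what continuously adjustable intensity means at the parameter level, the sketch below linearly interpolates between a neutral and a target expression vector. The function name and the blendshape-style parameter vector are assumptions; this is not the paper's learned module.

```python
def blend_expression(neutral, target, intensity):
    """Hypothetical linear blend of facial expression parameters.

    neutral, target: equal-length lists of expression parameters
    (e.g. blendshape weights); intensity: float in [0, 1].
    """
    if len(neutral) != len(target):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    # intensity = 0 returns the neutral face; intensity = 1 returns the full target emotion.
    return [n + intensity * (t - n) for n, t in zip(neutral, target)]
```

A learned controller can capture non-linear relationships that this fixed interpolation cannot; the blend only shows the interface of one emotion dial driving many expression parameters at once.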
CL · Mar 1, 2022
BERT-LID: Leveraging BERT to Improve Spoken Language Identification
Yuting Nie, Junhong Zhao, Wei-Qiang Zhang et al.
Language identification is the task of automatically determining the language spoken in a speech segment. It has a profound impact on the multilingual interoperability of intelligent speech systems. Although language identification achieves high accuracy on medium or long utterances (>3 s), performance on short utterances (≤1 s) is still far from satisfactory. We propose a BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short speech segments. We extend the original BERT model to take as input the phonetic posteriorgrams (PPG) derived from a front-end phone recognizer, and then place the best-performing deep classifier on top of it for language identification. Our BERT-LID model improves the baseline accuracy by about 6.5% on long-segment identification and 19.9% on short-segment identification, demonstrating its effectiveness for language identification.
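A phonetic posteriorgram is a frames × phones matrix in which each row is a probability distribution over phone classes for one frame, typically obtained by applying a softmax to the phone recognizer's per-frame scores. The sketch below assumes the recognizer emits raw per-frame logits as plain lists; the function names and the phone inventory size are hypothetical, and the actual front end used in BERT-LID may differ.

```python
import math


def frame_softmax(logits):
    """Convert one frame of phone-recognizer scores into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posteriorgram(frame_logits):
    """Build a PPG: one row per frame, one column per phone class; each row sums to 1."""
    return [frame_softmax(frame) for frame in frame_logits]
```

The resulting sequence of per-frame distributions is the kind of input the abstract describes feeding into the extended BERT model.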
CV · Mar 22, 2022
Deep Portrait Delighting
Joshua Weir, Junhong Zhao, Andrew Chalmers et al.
We present a deep neural network for removing undesirable shading features from an unconstrained portrait image, recovering the underlying texture. Our training scheme incorporates three regularization strategies: masked loss, to emphasize high-frequency shading features; soft-shadow loss, which improves sensitivity to subtle changes in lighting; and shading-offset estimation, to supervise separation of shading and texture. Our method demonstrates improved delighting quality and generalization compared with state-of-the-art methods. We further demonstrate how our delighting method can enhance the performance of light-sensitive computer vision tasks such as face relighting and semantic parsing, allowing them to handle extreme lighting conditions.
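The abstract does not give the exact form of the masked loss. One plausible reading, sketched below purely as an assumption, is a per-pixel L1 error in which pixels flagged by a mask (for example, sharp shading boundaries) receive a larger weight. The function name, the binary mask, and the `mask_weight` parameter are all hypothetical.

```python
def masked_l1_loss(pred, target, mask, mask_weight=2.0):
    """Hypothetical weighted mean absolute error over flattened pixel values.

    pred, target: lists of pixel intensities; mask: list of 0/1 flags marking
    pixels to emphasize. Pixels with mask=1 count mask_weight times as much.
    """
    if not pred or not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must be non-empty and equal length")
    weights = [mask_weight if m else 1.0 for m in mask]
    weighted = sum(w * abs(p - t) for w, p, t in zip(weights, pred, target))
    # Normalize by the total weight so the loss stays on the scale of plain MAE.
    return weighted / sum(weights)
```

With an all-zero mask this reduces to ordinary mean absolute error; raising `mask_weight` shifts the training signal toward the flagged regions.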
CV · Mar 24, 2024
Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields
Junhong Zhao, Wei Ying, Yaoqiang Pan et al.
Accurate collection of plant phenotypic data is critical to optimising sustainable farming practices in precision agriculture. Traditional phenotyping in controlled laboratory environments, while valuable, falls short of explaining plant growth under real-world conditions. Emerging sensor and digital technologies offer a promising approach for direct phenotyping of plants in farm environments. This study investigates a learning-based phenotyping method using Neural Radiance Fields (NeRF) to achieve accurate in-situ phenotyping of pepper plants in greenhouse environments. To quantitatively evaluate the method, traditional point cloud registration on 3D scanning data is implemented for comparison. Experimental results show that NeRF achieves competitive accuracy compared with 3D scanning methods: the mean distance error between the scanner-based method and the NeRF-based method is 0.865 mm. This study shows that the learning-based NeRF method achieves accuracy similar to that of 3D scanning-based methods, with improved scalability and robustness.
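A mean distance error between two reconstructions is often computed cloud-to-cloud: for each point in one cloud, take the distance to its nearest point in the other, then average. The sketch below assumes that definition, clouds already registered into a common frame, and coordinates in millimetres; the paper's exact metric (for example, one-directional versus symmetric) may differ, and the function name is hypothetical.

```python
import math


def mean_nn_distance(source, reference):
    """Mean distance from each source point to its nearest reference point.

    Points are (x, y, z) tuples in the same units (e.g. mm); the clouds are
    assumed to be registered already. Brute force O(N*M); a k-d tree would
    be preferable for full-size scans.
    """
    if not source or not reference:
        raise ValueError("both point clouds must be non-empty")
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in reference)
    return total / len(source)
```

Note that this measure is not symmetric: swapping `source` and `reference` can change the result, so a comparison should state which direction (or the average of both) is reported.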