CVFeb 28, 2023
One-Shot Video InpaintingSangjin Lee, Suhwan Cho, Sangyoun Lee
Recently, removing objects from videos and filling in the erased regions using deep video inpainting (VI) algorithms has attracted considerable attention. Usually, a video sequence and object segmentation masks for all frames are required as the input for this task. However, in real-world applications, providing segmentation masks for all frames is quite difficult and inefficient. Therefore, we deal with VI in a one-shot manner, which only takes the initial frame's object mask as its input. Although we can achieve that using naive combinations of video object segmentation (VOS) and VI methods, they are sub-optimal and generally cause critical errors. To address that, we propose a unified pipeline for one-shot video inpainting (OSVI). By jointly learning mask prediction and video completion in an end-to-end manner, the results can be optimal for the entire task instead of each separate module. Additionally, unlike the two stage methods that use the predicted masks as ground truth cues, our method is more reliable because the predicted masks can be used as the network's internal guidance. On the synthesized datasets for OSVI, our proposed method outperforms all others both quantitatively and qualitatively.
IVSep 19, 2024
Multi-Scale Feature Prediction with Auxiliary-Info for Neural Image CompressionChajin Shin, Sangjin Lee, Sangyoun Lee
Recently, significant improvements in rate-distortion performance of image compression have been achieved with deep-learning techniques. A key factor in this success is the use of additional bits to predict an approximation of the latent vector, which is the output of the encoder, through another neural network. Then, only the difference between the prediction and the latent vector is coded into the bitstream, along with its estimated probability distribution. We introduce a new predictive structure consisting of the auxiliary coarse network and the main network, inspired by neural video compression. The auxiliary coarse network encodes the auxiliary information and predicts the approximation of the original image as multi-scale features. The main network encodes the residual between the predicted feature from the auxiliary coarse network and the feature of the original image. To further leverage our new structure, we propose Auxiliary info-guided Feature Prediction (AFP) module that uses global correlation to predict more accurate predicted features. Moreover, we present Context Junction module that refines the auxiliary feature from AFP module and produces the residuals between the refined features and the original image features. Finally, we introduce Auxiliary info-guided Parameter Estimation (APE) module, which predicts the approximation of the latent vector and estimates the probability distribution of these residuals. We demonstrate the effectiveness of the proposed modules by various ablation studies. Under extensive experiments, our model outperforms other neural image compression models and achieves a 19.49\% higher rate-distortion performance than VVC on Tecnick dataset.
ROApr 30, 2024
STT: Stateful Tracking with Transformers for Autonomous DrivingLonglong Jing, Ruichi Yu, Xu Chen et al.
Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
CVFeb 15, 2022
Exploring Discontinuity for Video Frame InterpolationSangjin Lee, Hyeongmin Lee, Chajin Shin et al.
Video frame interpolation (VFI) is the task that synthesizes the intermediate frame given two consecutive frames. Most of the previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames. These studies have been conducted on natural videos containing only continuous motions. However, many practical videos contain various unnatural objects with discontinuous motions such as logos, user interfaces and subtitles. We propose three techniques to make the existing deep learning-based VFI architectures robust to these elements. First is a novel data augmentation strategy called figure-text mixing (FTM) which can make the models learn discontinuous motions during training stage without any extra dataset. Second, we propose a simple but effective module that predicts a map called discontinuity map (D-map), which densely distinguishes between areas of continuous and discontinuous motions. Lastly, we propose loss functions to give supervisions of the discontinuous motion areas which can be applied along with FTM and D-map. We additionally collect a special test benchmark called Graphical Discontinuous Motion (GDM) dataset consisting of some mobile games and chatting videos. Applied to the various state-of-the-art VFI networks, our method significantly improves the interpolation qualities on the videos from not only GDM dataset, but also the existing benchmarks containing only continuous motions such as Vimeo90K, UCF101, and DAVIS.
CVFeb 28, 2021
Predicting post-operative right ventricular failure using video-based deep learningRohan Shad, Nicolas Quach, Robyn Fong et al.
Non-invasive and cost effective in nature, the echocardiogram allows for a comprehensive assessment of the cardiac musculature and valves. Despite progressive improvements over the decades, the rich temporally resolved data in echocardiography videos remain underutilized. Human reads of echocardiograms reduce the complex patterns of cardiac wall motion, to a small list of measurements of heart function. Furthermore, all modern echocardiography artificial intelligence (AI) systems are similarly limited by design - automating measurements of the same reductionist metrics rather than utilizing the wealth of data embedded within each echo study. This underutilization is most evident in situations where clinical decision making is guided by subjective assessments of disease acuity, and tools that predict disease onset within clinically actionable timeframes are unavailable. Predicting the likelihood of developing post-operative right ventricular failure (RV failure) in the setting of mechanical circulatory support is one such clinical example. To address this, we developed a novel video AI system trained to predict post-operative right ventricular failure (RV failure), using the full spatiotemporal density of information from pre-operative echocardiography scans. We achieve an AUC of 0.729, specificity of 52% at 80% sensitivity and 46% sensitivity at 80% specificity. Furthermore, we show that our ML system significantly outperforms a team of human experts tasked with predicting RV failure on independent clinical evaluation. Finally, the methods we describe are generalizable to any cardiac clinical decision support application where treatment or patient selection is guided by qualitative echocardiography assessments.
CVFeb 2, 2021
Test-Time Adaptation for Out-of-distributed Image InpaintingChajin Shin, Taeoh Kim, Sangjin Lee et al.
Deep learning-based image inpainting algorithms have shown great performance via powerful learned prior from the numerous external natural images. However, they show unpleasant results on the test image whose distribution is far from the that of training images because their models are biased toward the training images. In this paper, we propose a simple image inpainting algorithm with test-time adaptation named AdaFill. Given a single out-of-distributed test image, our goal is to complete hole region more naturally than the pre-trained inpainting models. To achieve this goal, we treat remained valid regions of the test image as another training cues because natural images have strong internal similarities. From this test-time adaptation, our network can exploit externally learned image priors from the pre-trained features as well as the internal prior of the test image explicitly. Experimental results show that AdaFill outperforms other models on the various out-of-distribution test images. Furthermore, the model named ZeroFill, that are not pre-trained also sometimes outperforms the pre-trained models.
CROct 26, 2020
5W1H-based Expression for the Effective Sharing of Information in Digital Forensic InvestigationsJaehyeok Han, Jieon Kim, Sangjin Lee
Digital forensic investigation is used in various areas related to digital devices including the cyber crime. This is an investigative process using many techniques, which have implemented as tools. The types of files covered by the digital forensic investigation are wide and varied, however, there is no way to express the results into a standardized format. The standardization are different by types of device, file system, or application. Different outputs make it time-consuming and difficult to share information and to implement integration. In addition, it could weaken cyber security. Thus, it is important to define normalization and to present data in the same format. In this paper, a 5W1H-based expression for information sharing for effective digital forensic investigation is proposed to analyze digital forensic information using six questions--what, who, where, when, why and how. Based on the 5W1H-based expression, digital information from different types of files is converted and represented in the same format of outputs. As the 5W1H is the basic writing principle, application of the 5W1H-based expression on the case studies shows that this expression enhances clarity and correctness for information sharing. Furthermore, in the case of security incidents, this expression has an advantage in being compatible with STIX.
CVMay 27, 2020
Extrapolative-Interpolative Cycle-Consistency Learning for Video Frame ExtrapolationSangjin Lee, Hyeongmin Lee, Taeoh Kim et al.
Video frame extrapolation is a task to predict future frames when the past frames are given. Unlike previous studies that usually have been focused on the design of modules or construction of networks, we propose a novel Extrapolative-Interpolative Cycle (EIC) loss using pre-trained frame interpolation module to improve extrapolation performance. Cycle-consistency loss has been used for stable prediction between two function spaces in many visual tasks. We formulate this cycle-consistency using two mapping functions; frame extrapolation and interpolation. Since it is easier to predict intermediate frames than to predict future frames in terms of the object occlusion and motion uncertainty, interpolation module can give guidance signal effectively for training the extrapolation function. EIC loss can be applied to any existing extrapolation algorithms and guarantee consistent prediction in the short future as well as long future frames. Experimental results show that simply adding EIC loss to the existing baseline increases extrapolation performance on both UCF101 and KITTI datasets.
MMMar 9, 2020
Forensic Analysis of Residual Information in Adobe PDF FilesHyunji Chung, Jungheum Park, Sangjin Lee
In recent years, as electronic files include personal records and business activities, these files can be used as important evidences in a digital forensic investigation process. In general, the data that can be verified using its own application programs is largely used in the investigation of document files. However, in the case of the PDF file that has been largely used at the present time, certain data, which include the data before some modifications, exist in electronic document files unintentionally. Because such residual information may present the writing process of a file, it can be usefully used in a forensic viewpoint. This paper introduces why the residual information is stored inside the PDF file and explains a way to extract the information. In addition, we demonstrate the attributes of PDF files can be used to hide data.
CRFeb 28, 2020
Forensic analysis of the Windows telemetry for diagnosticsJaehyeok Han, Jungheum Park, Hyunji Chung et al.
Telemetry is the automated sensing and collection of data from a remote device. It is often used to provide better services for users. Microsoft uses telemetry to periodically collect information about Windows systems and to help improve user experience and fix potential issues. Windows telemetry service functions by creating RBS files on the local system to reliably transfer and manage the telemetry data, and these files can provide useful information in a digital forensic investigation. Combined with the information derived from traditional Windows forensics, investigators can have greater confidence in the evidence derived from various artifacts. It is possible to acquire information that can be confirmed only for live systems, such as the computer hardware serial number, the connection records for external storage devices, and traces of executed processes. This information is included in the RBS files that are created for use in Windows telemetry. In this paper, we introduced how to acquire RBS files telemetry and analyzed the data structure of these RBS files, which are able to determine the types of information that Windows OS have been collected. We also discussed the reliability and the novelty by comparing the conventional artifacts with the RBS files, which could be useful in digital forensic investigation.
CRAug 22, 2017
Digital Forensic Investigation of Cloud Storage ServicesHyunji Chung, Jungheum Park, Sangjin Lee et al.
The demand for cloud computing is increasing because of the popularity of digital devices and the wide use of the Internet. Among cloud computing services, most consumers use cloud storage services that provide mass storage. This is because these services give them various additional functions as well as storage. It is easy to access cloud storage services using smartphones. With increasing utilization, it is possible for malicious users to abuse cloud storage services. Therefore, a study on digital forensic investigation of cloud storage services is necessary. This paper proposes new procedure for investigating and analyzing the artifacts of all accessible devices, such as Windows, Mac, iPhone, and Android smartphone.
CRJul 27, 2017
Digital Forensic Approaches for Amazon Alexa EcosystemHyunji Chung, Jungheum Park, Sangjin Lee
Internet of Things devices such as the Amazon Echo are undoubtedly great sources of potential digital evidence due to their ubiquitous use and their always on mode of operation, constituting a human life black box. The Amazon Echo in particular plays a centric role for the cloud based intelligent virtual assistant Alexa developed by Amazon Lab126. The Alexa enabled wireless smart speaker is the gateway for all voice commands submitted to Alexa. Moreover, the IVA interacts with a plethora of compatible IoT devices and third party applications that leverage cloud resources. Understanding the complex cloud ecosystem that allows ubiquitous use of Alexa is paramount on supporting digital investigations when need raises. This paper discusses methods for digital forensics pertaining to the IVA Alexa ecosystem. The primary contribution of this paper consists of a new efficient approach of combining cloud native forensics with client side forensics, to support practical digital investigations. Based on a deep understanding of the targeted ecosystem, we propose a proof of concept tool, CIFT, that supports identification, acquisition and analysis of both native artifacts from the cloud and client centric artifacts from local devices.