IVMar 20, 2022
Multi-Modal Learning Using Physicians Diagnostics for Optical Coherence Tomography ClassificationY. Logan, K. Kokilepersaud, G. Kwon et al.
In this paper, we propose a framework that incorporates experts diagnostics and insights into the analysis of Optical Coherence Tomography (OCT) using multi-modal learning. To demonstrate the effectiveness of this approach, we create a medical diagnostic attribute dataset to improve disease classification using OCT. Although there have been successful attempts to deploy machine learning for disease classification in OCT, such methodologies lack the experts insights. We argue that injecting ophthalmological assessments as another supervision in a learning framework is of great importance for the machine learning process to perform accurate and interpretable classification. We demonstrate the proposed framework through comprehensive experiments that compare the effectiveness of combining diagnostic attribute features with latent visual representations and show that they surpass the state-of-the-art approach. Finally, we analyze the proposed dual-stream architecture and provide an insight that determine the components that contribute most to classification performance.
CVJul 15, 2025
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video GenerationX. Feng, H. Yu, M. Wu et al.
With the rapid development of foundation video generation technologies, long video generation models have exhibited promising research potential thanks to expanded content creation space. Recent studies reveal that the goal of long video generation tasks is not only to extend video duration but also to accurately express richer narrative content within longer videos. However, due to the lack of evaluation benchmarks specifically designed for long video generation models, the current assessment of these models primarily relies on benchmarks with simple narrative prompts (e.g., VBench). To the best of our knowledge, our proposed NarrLV is the first benchmark to comprehensively evaluate the Narrative expression capabilities of Long Video generation models. Inspired by film narrative theory, (i) we first introduce the basic narrative unit maintaining continuous visual presentation in videos as Temporal Narrative Atom (TNA), and use its count to quantitatively measure narrative richness. Guided by three key film narrative elements influencing TNA changes, we construct an automatic prompt generation pipeline capable of producing evaluation prompts with a flexibly expandable number of TNAs. (ii) Then, based on the three progressive levels of narrative content expression, we design an effective evaluation metric using the MLLM-based question generation and answering framework. (iii) Finally, we conduct extensive evaluations on existing long video generation models and the foundation generation models. Experimental results demonstrate that our metric aligns closely with human judgments. The derived evaluation outcomes reveal the detailed capability boundaries of current video generation models in narrative content expression.
LGMar 2, 2021
Financial Crime & Fraud Detection Using Graph Computing: Application Considerations & OutlookE. Kurshan, H. Shen, H. Yu
In recent years, the unprecedented growth in digital payments fueled consequential changes in fraud and financial crimes. In this new landscape, traditional fraud detection approaches such as rule-based engines have largely become ineffective. AI and machine learning solutions using graph computing principles have gained significant interest. Graph neural networks and emerging adaptive solutions provide compelling opportunities for the future of fraud and financial crime detection. However, implementing the graph-based solutions in financial transaction processing systems has brought numerous obstacles and application considerations to light. In this paper, we overview the latest trends in the financial crimes landscape and discuss the implementation difficulties current and emerging graph solutions face. We argue that the application demands and implementation challenges provide key insights in developing effective solutions.
ROMar 5, 2020
Autonomous Driving at Intersections: A Critical-Turning-Point Approach for Left TurnsK. Shu, H. Yu, X. Chen et al.
Left-turn planning is one of the formidable challenges for autonomous vehicles, especially at unsignalized intersections due to the unknown intentions of oncoming vehicles. This paper addresses the challenge by proposing a critical turning point (CTP) based hierarchical planning approach. This includes a high-level candidate path generator and a low-level partially observable Markov decision process (POMDP) based planner. The proposed (CTP) concept, inspired by human-driving behaviors at intersections, aims to increase the computational efficiency of the low-level planner and to enable human-friendly autonomous driving. The POMDP based low-level planner takes unknown intentions of oncoming vehicles into considerations to perform less conservative yet safe actions. With proper integration, the proposed hierarchical approach is capable of achieving safe planning results with high commute efficiency at unsignalized intersections in real time.