Claire Tang

CR
3papers
147citations
Novelty37%
AI Score37

3 Papers

CVSep 28, 2024
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery

Andy V. Huynh, Lauren E. Gillespie, Jael Lopez-Saucedo et al.

Multimodal image-text contrastive learning has shown that joint representations can be learned across modalities. Here, we show how leveraging multiple views of image data with contrastive learning can improve downstream fine-grained classification performance for species recognition, even when one view is absent. We propose ContRastive Image-remote Sensing Pre-training (CRISP)$\unicode{x2014}$a new pre-training task for ground-level and aerial image representation learning of the natural world$\unicode{x2014}$and introduce Nature Multi-View (NMV), a dataset of natural world imagery including $>3$ million ground-level and aerial image pairs for over 6,000 plant taxa across the ecologically diverse state of California. The NMV dataset and accompanying material are available at hf.co/datasets/andyvhuynh/NatureMultiView.

7.4CRApr 10
S3CDM: A secret-sharing-scheme-based cyberattack detection model and its simulation implementation

Chi Sing Chum, Jia Lu, Claire Tang et al.

We design and develop a secret-sharing-scheme-based cyberattack detection model(S3CDM)that can detect unauthorized or illegal activities (especially insider attacks) and protect sensitive information within complex network infrastructures of large organizations. The model splits a secret among a group of legitimate participants or components for authentication, integration and detection of unauthorized activities. Traditional Shamir's polynomial interpolation based and our own hash function based schemes are utilized in the model, they both are practical and efficient to make sure the communications between different components are secure and any unauthorized activities can be detected. The model offers a flexible multi-factor authentication method to enhance the overall system security. Probability analysis [3] shows that multiple component model is more resistant against cyberattacks than the single component one. To demonstrate the feasibility, we implement the S3CDM in three parts on Google Cloud Platform, i.e., the front end UI (User Interface) running on an HTTP server, the back end individual services written in Python, and a PostgreSQL database. Docker is used to manage the start and stop of individual services and their URLs. We demonstrate how to use the UI with a use case of simulation of broken path in details.

ROMar 14, 2024
BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

Chengshu Li, Ruohan Zhang, Josiah Wong et al.

We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with rich physical and semantic properties. The second is OMNIGIBSON, a novel simulation environment that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids. Our experiments indicate that the activities in BEHAVIOR-1K are long-horizon and dependent on complex manipulation skills, both of which remain a challenge for even state-of-the-art robot learning solutions. To calibrate the simulation-to-reality gap of BEHAVIOR-1K, we provide an initial study on transferring solutions learned with a mobile manipulator in a simulated apartment to its real-world counterpart. We hope that BEHAVIOR-1K's human-grounded nature, diversity, and realism make it valuable for embodied AI and robot learning research. Project website: https://behavior.stanford.edu.