Jian Ren

h-index32

15papers

885citations

Novelty47%

AI Score40

Ranked #73,106 of 194,257 authors (top 38%)#1,747 in CR (top 26%)

15 Papers

21.2CVMar 27, 2024

TextCraftor: Your Text Encoder Can be Image Quality Controller

Yanyu Li, Xian Liu, Anil Kag et al.

Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs with carefully crafted prompts are required to achieve satisfactory results. To mitigate these limitations, numerous studies have endeavored to fine-tune the pre-trained diffusion models, i.e., UNet, utilizing various technologies. Yet, amidst these efforts, a pivotal question of text-to-image diffusion model training has remained largely unexplored: Is it possible and feasible to fine-tune the text encoder to improve the performance of text-to-image diffusion models? Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments. Interestingly, our technique also empowers controllable image generation through the interpolation of different text encoders fine-tuned with various rewards. We also demonstrate that TextCraftor is orthogonal to UNet finetuning, and can be combined to further improve generative quality.

9.6CVNov 7, 2024

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation

Anil Kag, Huseyin Coskun, Jierun Chen et al.

Neural network architecture design requires making many crucial decisions. The common desiderata is that similar decisions, with little modifications, can be reused in a variety of tasks and applications. To satisfy that, architectures must provide promising latency and performance trade-offs, support a variety of tasks, scale efficiently with respect to the amounts of data and compute, leverage available data from other tasks, and efficiently support various hardware. To this end, we introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks. We revisit the key design principles of hybrid architectures and propose a simple and effective \emph{asymmetric} architecture, where the distribution of convolutional and transformer blocks is \emph{asymmetric}, containing more convolutional blocks in the earlier stages, followed by more transformer blocks in later stages. AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation, and features a superior trade-off between performance and latency. We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance compared to the most recent public and commercial models. Notably, even without any computation optimization for transformer blocks, our models still yield faster inference speed than existing works featuring efficient attention mechanisms, highlighting the advantages and the value of our approach.

6.3IRSep 9, 2025

FLeW: Facet-Level and Adaptive Weighted Representation Learning of Scientific Documents

Zheng Dou, Deqing Wang, Fuzhen Zhuang et al.

Scientific document representation learning provides powerful embeddings for various tasks, while current methods face challenges across three approaches. 1) Contrastive training with citation-structural signals underutilizes citation information and still generates single-vector representations. 2) Fine-grained representation learning, which generates multiple vectors at the sentence or aspect level, requires costly integration and lacks domain generalization. 3) Task-aware learning depends on manually predefined task categorization, overlooking nuanced task distinctions and requiring extra training data for task-specific modules. To address these problems, we propose a new method that unifies the three approaches for better representations, namely FLeW. Specifically, we introduce a novel triplet sampling method that leverages citation intent and frequency to enhance citation-structural signals for training. Citation intents (background, method, result), aligned with the general structure of scientific writing, facilitate a domain-generalized facet partition for fine-grained representation learning. Then, we adopt a simple weight search to adaptively integrate three facet-level embeddings into a task-specific document embedding without task-aware fine-tuning. Experiments show the applicability and robustness of FLeW across multiple scientific tasks and fields, compared to prior models.

11.1CVJun 14, 2021

Flow Guided Transformable Bottleneck Networks for Motion Retargeting

Jian Ren, Menglei Chai, Oliver J. Woodford et al.

Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.

35.5CVApr 22, 2021Code

Motion Representations for Articulated Animation

Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren et al.

We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to the previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, shape, and pose. The regions correspond to semantically relevant and distinct object parts, that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks. We present a challenging new benchmark with high-resolution videos and show that the improvement is particularly pronounced when articulated objects are considered, reaching 96.6% user preference vs. the state of the art.

8.3CRApr 25, 2019

Bitcoin and Blockchain: Security and Privacy

Ehab Zaghloul, Tongtong Li, Matt Mutka et al.

A cryptocurrency is a decentralized digital currency that is designed for secure and private asset transfer and storage. As a currency, it should be difficult to counterfeit and double-spend. In this paper, we review and analyze the major security and privacy issues of Bitcoin. In particular, we focus on its underlying foundation, blockchain technology. First, we present a comprehensive background of Bitcoin and the preliminary on security. Second, the major security threats and countermeasures of Bitcoin are investigated. We analyze the risk of double-spending attacks, evaluate the probability of success in performing the attacks and derive the profitability for the attacker to perform such attacks. Third, we analyze the underlying Bitcoin peer-to-peer network security risks and Bitcoin storage security. We compare three types of Bitcoin wallets in terms of security, type of services and their trade-offs. Finally, we discuss the security and privacy features of alternative cryptocurrencies and present an overview of emerging technologies today. Our results can help Bitcoin users to determine a trade-off between the risk of double-spending attempts and the transaction time delay or confidence before accepting transactions. These results can also assist miners to develop suitable strategies to get involved in the mining process and maximize their profits.

2.7CRApr 25, 2019

$d$-MABE: Distributed Multilevel Attribute-Based EMR Management and Applications

Ehab Zaghloul, Tongtong Li, Matt Mutka et al.

Current systems used by medical institutions for the management and transfer of Electronic Medical Records (EMR) can be vulnerable to security and privacy threats. In addition, these centralized systems often lack interoperability and give patients limited or no access to their own EMRs. In this paper, we propose a novel distributed data sharing scheme that applies the security benefits of blockchain to handle these concerns. With blockchain, we incorporate smart contracts and a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve privacy of patient records, we implement our smart contracts as a method to allow patients to verify attributes prior to granting access rights. Our proposed scheme also facilitates selective sharing of medical records among staff members that belong to different levels of a hierarchical institution. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.

2.3CRSep 12, 2018

Security and Privacy Enhancement for Outsourced Biometric Identification

Kai Zhou, Jian Ren

A lot of research has been focused on secure outsourcing of biometric identification in the context of cloud computing. In such schemes, both the encrypted biometric database and the identification process are outsourced to the cloud. The ultimate goal is to protect the security and privacy of the biometric database and the query templates. Security analysis shows that previous schemes suffer from the enrolment attack and unnecessarily expose more information than needed. In this paper, we propose a new secure outsourcing scheme aims at enhancing the security from these two aspects. First, besides all the attacks discussed in previous schemes, our proposed scheme is also secure against the enrolment attack. Second, we model the identification process as a fixed radius similarity query problem instead of the kNN search problem. Such a modelling is able to reduce the exposed information thus enhancing the privacy of the biometric database. Our comprehensive security and complexity analysis show that our scheme is able to enhance the security and privacy of the biometric database and query templates while maintaining the same computational savings from outsourcing.

17.7CVJun 4, 2018

Adversarial Domain Adaptation for Classification of Prostate Histopathology Whole-Slide Images

Jian Ren, Ilker Hacihaliloglu, Eric A. Singer et al.

Automatic and accurate Gleason grading of histopathology tissue slides is crucial for prostate cancer diagnosis, treatment, and prognosis. Usually, histopathology tissue slides from different institutions show heterogeneous appearances because of different tissue preparation and staining procedures, thus the predictable model learned from one domain may not be applicable to a new domain directly. Here we propose to adopt unsupervised domain adaptation to transfer the discriminative knowledge obtained from the source domain to the target domain without requiring labeling of images at the target domain. The adaptation is achieved through adversarial training to find an invariant feature space along with the proposed Siamese architecture on the target domain to add a regularization that is appropriate for the whole-slide images. We validate the method on two prostate cancer datasets and obtain significant classification improvement of Gleason scores as compared with the baseline models.

5.8CRJan 8, 2018

P-MOD: Secure Privilege-Based Multilevel Organizational Data-Sharing in Cloud Computing

Ehab Zaghloul, Kai Zhou, Jian Ren

Cloud computing has changed the way enterprises store, access and share data. Data is constantly being uploaded to the cloud and shared within an organization built on a hierarchy of many different individuals that are given certain data access privileges. With more data storage needs turning over to the cloud, finding a secure and efficient data access structure has become a major research issue. With different access privileges, individuals with more privileges (at higher levels of the hierarchy) are granted access to more sensitive data than those with fewer privileges (at lower levels of the hierarchy). In this paper, a Privilege-based Multilevel Organizational Data-sharing scheme~(P-MOD) is proposed that incorporates a privilege-based access structure into an attribute-based encryption mechanism to handle these concerns. Each level of the privilege-based access structure is affiliated with an access policy that is uniquely defined by specific attributes. Data is then encrypted under each access policy at every level to grant access to specific data users based on their data access privileges. An individual ranked at a certain level can decrypt the ciphertext (at that specific level) if and only if that individual owns a correct set of attributes that can satisfy the access policy of that level. The user may also decrypt the ciphertexts at the lower levels with respect to the user's level. Security analysis shows that P-MOD is secure against adaptively chosen plaintext attack assuming the DBDH assumption holds.The comprehensive performance analysis demonstrates that P-MOD is more efficient in computational complexity and storage space than the existing schemes in secure data sharing within an organization.

6.3CRNov 14, 2017

PassBio: Privacy-Preserving User-Centric Biometric Authentication

Kai Zhou, Jian Ren

The proliferation of online biometric authentication has necessitated security requirements of biometric templates. The existing secure biometric authentication schemes feature a server-centric model, where a service provider maintains a biometric database and is fully responsible for the security of the templates. The end-users have to fully trust the server in storing, processing and managing their private templates. As a result, the end-users' templates could be compromised by outside attackers or even the service provider itself. In this paper, we propose a user-centric biometric authentication scheme (PassBio) that enables end-users to encrypt their own templates with our proposed light-weighted encryption scheme. During authentication, all the templates remain encrypted such that the server will never see them directly. However, the server is able to determine whether the distance of two encrypted templates is within a pre-defined threshold. Our security analysis shows that no critical information of the templates can be revealed under both passive and active attacks. PassBio follows a "compute-then-compare" computational model over encrypted data. More specifically, our proposed Threshold Predicate Encryption (TPE) scheme can encrypt two vectors x and y in such a manner that the inner product of x and y can be evaluated and compared to a pre-defined threshold. TPE guarantees that only the comparison result is revealed and no key information about x and y can be learned. Furthermore, we show that TPE can be utilized as a flexible building block to evaluate different distance metrics such as Hamming distance and Euclidean distance over encrypted data. Such a compute-then-compare computational model, enabled by TPE, can be widely applied in many interesting applications such as searching over encrypted data while ensuring data security and privacy.

11.9CRFeb 26, 2016

ExpSOS: Secure and Verifiable Outsourcing of Exponentiation Operations for Mobile Cloud Computing

Kai Zhou, M. H. Afifi, Jian Ren

Discrete exponential operation, such as modular exponentiation and scalar multiplication on elliptic curves, is a basic operation of many public-key cryptosystems. However, the exponential operations are considered prohibitively expensive for resource-constrained mobile devices. In this paper, we address the problem of secure outsourcing of exponentiation operations to one single untrusted server. Our proposed scheme (ExpSOS) only requires very limited number of modular multiplications at local mobile environment thus it can achieve impressive computational gain. ExpSOS also provides a secure verification scheme with probability approximately 1 to ensure that the mobile end-users can always receive valid results. The comprehensive analysis as well as the simulation results in real mobile device demonstrates that our proposed ExpSOS can significantly improve the existing schemes in efficiency, security and result verifiability. We apply ExpSOS to securely outsource several cryptographic protocols to show that ExpSOS is widely applicable to many cryptographic computations.

1.2ITNov 7, 2015

Optimal Construction of Regenerating Code through Rate-matching in Hostile Networks

Jian Li, Tongtong Li, Jian Ren

Regenerating code is a class of code very suitable for distributed storage systems, which can maintain optimal bandwidth and storage space. Two types of important regenerating code have been constructed: the minimum storage regeneration (MSR) code and the minimum bandwidth regeneration (MBR) code. However, in hostile networks where adversaries can compromise storage nodes, the storage capacity of the network can be significantly affected. In this paper, we propose two optimal constructions of regenerating codes through rate-matching that can combat against this kind of adversaries in hostile networks: 2-layer rate-matched regenerating code and $m$-layer rate-matched regenerating code. For the 2-layer code, we can achieve the optimal storage efficiency for given system requirements. Our comprehensive analysis shows that our code can detect and correct malicious nodes with higher storage efficiency compared to the universally resilient regenerating code which is a straightforward extension of regenerating code with error detection and correction capability. Then we propose the $m$-layer code by extending the 2-layer code and achieve the optimal error correction efficiency by matching the code rate of each layer's regenerating code. We also demonstrate that the optimized parameter can achieve the maximum storage capacity under the same constraint. Compared to the universally resilient regenerating code, our code can achieve much higher error correction efficiency.

3.2CRNov 7, 2015

CASO: Cost-Aware Secure Outsourcing of General Computational Problems

Kai Zhou, Jian Ren

Computation outsourcing is an integral part of cloud computing. It enables end-users to outsource their computational tasks to the cloud and utilize the shared cloud resources in a pay-per-use manner. However, once the tasks are outsourced, the end-users will lose control of their data, which may result in severe security issues especially when the data is sensitive. To address this problem, secure outsourcing mechanisms have been proposed to ensure security of the end-users' outsourced data. In this paper, we investigate outsourcing of general computational problems which constitute the mathematical basics for problems emerged from various fields such as engineering and finance. To be specific, we propose affine mapping based schemes for the problem transformation and outsourcing so that the cloud is unable to learn any key information from the transformed problem. Meanwhile, the overhead for the transformation is limited to an acceptable level compared to the computational savings introduced by the outsourcing itself. Furthermore, we develop cost-aware schemes to balance the trade-offs between end-users' various security demands and computational overhead. We also propose a verification scheme to ensure that the end-users will always receive a valid solution from the cloud. Our extensive complexity and security analysis show that our proposed Cost-Aware Secure Outsourcing (CASO) scheme is both practical and effective.

5.7CROct 5, 2015

Beyond the MDS Bound in Distributed Cloud Storage

Jian Li, Tongtong Li, Jian Ren

Distributed storage plays a crucial role in the current cloud computing framework. After the theoretical bound for distributed storage was derived by the pioneer work of the regenerating code, Reed-Solomon code based regenerating codes were developed. The RS code based minimum storage regeneration code (RS-MSR) and the minimum bandwidth regeneration code (RS-MBR) can achieve theoretical bounds on the MSR point and the MBR point respectively in code regeneration. They can also maintain the MDS property in code reconstruction. However, in the hostile network where the storage nodes can be compromised and the packets can be tampered with, the storage capacity of the network can be significantly affected. In this paper, we propose a Hermitian code based minimum storage regenerating (H-MSR) code and a minimum bandwidth regenerating (H-MBR) code. We first prove that our proposed Hermitian code based regenerating codes can achieve the theoretical bounds for MSR point and MBR point respectively. We then propose data regeneration and reconstruction algorithms for the H-MSR code and the H-MBR code in both error-free network and hostile network. Theoretical evaluation shows that our proposed schemes can detect the erroneous decodings and correct more errors in hostile network than the RS-MSR code and the RS-MBR code with the same code rate. Our analysis also demonstrates that the proposed H-MSR and H-MBR codes have lower computational complexity than the RS-MSR/RS-MBR codes in both code regeneration and code reconstruction.