SE · Mar 19, 2025
LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts
Pankaj Thorat, Adnan Qidwai, Adrija Dhar et al.
Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of code datasets for Large Language Models (code-LLMs), where data quality directly influences tasks such as code generation and summarization. Characterizing code datasets in terms of programming language concepts enables better insights and targeted data curation. Our proposed methodology decomposes code data profiling into two phases: (1) an offline phase where LLMs are leveraged to derive and learn rules for extracting syntactic and semantic concepts across various programming languages, including previously unseen or low-resource languages, and (2) an online deterministic phase applying these derived rules for efficient real-time analysis. This hybrid approach is customizable, extensible to new syntactic and semantic constructs, and scalable to multiple languages. Experimentally, our LLM-aided method achieves a mean accuracy of 90.33% for syntactic extraction rules and semantic classification accuracies averaging 80% and 77% across languages and semantic concepts, respectively.
CR · Nov 30, 2021
Privacy-Preserving Decentralized Exchange Marketplaces
Kavya Govindarajan, Dhinakaran Vinayagamurthy, Praveen Jayachandran et al.
Decentralized exchange markets leveraging blockchain have been proposed recently to provide open and equal access to traders, improve transparency, and reduce the systemic risk of centralized exchanges. However, they compromise the privacy of traders with respect to their asset ownership, account balances, order details, and identity. In this paper, we present Rialto, a fully decentralized privacy-preserving exchange marketplace with support for matching trade orders, on-chain settlement, and market price discovery. Rialto provides confidentiality of order rates and account balances and unlinkability between traders and their trade orders, while retaining the desirable properties of a traditional marketplace, such as front-running resilience and market fairness. We define formal security notions and present a security analysis of the marketplace. We perform a detailed evaluation of our solution and demonstrate that it scales well and is suitable for a large class of goods and financial instruments traded in modern exchange markets.
CR · Sep 23, 2020
Reliable, Fair and Decentralized Marketplace for Content Sharing Using Blockchain
Prabal Banerjee, Chander Govindarajan, Praveen Jayachandran et al.
Content sharing platforms such as YouTube and Vimeo have promoted pay-per-view models for artists to monetize their content. Yet artists remain at the mercy of centralized platforms that control content listing and advertisement, with little transparency and fairness in terms of number of views or revenue. On the other hand, consumers are distanced from the publishers and cannot authenticate the originality of the content. In this paper, we develop a reliable and fair platform for content sharing without a central facilitator. The platform is built as a decentralized data storage layer to store and share content in a fault-tolerant manner, where the peers also participate in a blockchain network. The blockchain is used to manage content listings and as an auditable and fair marketplace transaction processor that automatically pays out the content creators and the storage facilitators using smart contracts. We demonstrate the system with the blockchain layer built on Hyperledger Fabric and the data layer built on Tahoe-LAFS, and show that our design is practical and scalable with low overheads.