Mital Kinderkhedia

3 Papers

SE · Jan 9, 2022 · Code
A Benchmark of JSON-compatible Binary Serialization Specifications

Juan Cruz Viotti, Mital Kinderkhedia

We present a comprehensive benchmark of JSON-compatible binary serialization specifications using the SchemaStore open-source test suite, a collection of over 400 JSON documents that match their respective schemas and are representative of their use across industries. We benchmark a set of schema-driven (ASN.1, Apache Avro, Microsoft Bond, Cap'n Proto, FlatBuffers, Protocol Buffers, and Apache Thrift) and schema-less (BSON, CBOR, FlexBuffers, MessagePack, Smile, and UBJSON) JSON-compatible binary serialization specifications. Existing literature on benchmarking JSON-compatible binary serialization specifications shows extensive gaps in specification coverage, reproducibility and representativity, the role of data compression in binary serialization, and the choice and use of obsolete versions of binary serialization specifications. We introduce a tiered taxonomy for JSON documents consisting of 36 categories classified as Tier 1, Tier 2 and Tier 3 as a common basis to classify JSON documents based on their size, type of content, characteristics of their structure and redundancy criteria. We built and published a free-to-use online tool that automatically categorizes JSON documents according to our taxonomy and generates related summary statistics. In the interest of fairness and transparency, we adhere to reproducible software development standards and publicly host the benchmark software and results on GitHub.

DB · Jan 6, 2022
A Survey of JSON-compatible Binary Serialization Specifications

Juan Cruz Viotti, Mital Kinderkhedia

In this paper, we present recent advances that highlight the characteristics of JSON-compatible binary serialization specifications. We motivate the discussion by covering the history and evolution of binary serialization specifications from the 1960s to the 2000s and onwards. We analyze the use cases of the most popular serialization specifications across industries. Drawing on the schema-driven (ASN.1, Apache Avro, Microsoft Bond, Cap'n Proto, FlatBuffers, Protocol Buffers, and Apache Thrift) and schema-less (BSON, CBOR, FlexBuffers, MessagePack, Smile, and UBJSON) JSON-compatible binary serialization specifications, we compare and contrast their inner workings. We explore a set of non-standardized binary integer encoding techniques (ZigZag integer encoding and Little Endian Base 128 variable-length integer encoding) that are essential to understanding the various JSON-compatible binary serialization specifications. We systematically discuss the history, the characteristics, and the serialization processes of the selected schema-driven and schema-less binary serialization specifications, and we identify the challenges associated with schema evolution in the context of binary serialization. Through a reflective exercise, we explain our observations of the selected JSON-compatible binary serialization specifications. This paper aims to guide the reader in making informed decisions on the choice between schema-driven and schema-less JSON-compatible binary serialization specifications.
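
The two integer encodings named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the ZigZag variant assumes 64-bit signed inputs, and the LEB128 functions cover only the unsigned form.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Interleave signed values so small magnitudes map to small unsigned ints.

    Assumes n fits in a signed integer of the given width (an assumption of
    this sketch, not a statement about any particular specification).
    """
    return (n << 1) ^ (n >> (bits - 1))


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode for a non-negative input."""
    return (z >> 1) ^ -(z & 1)


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128.

    Each byte carries 7 payload bits, least significant group first; the high
    bit is set on every byte except the last.
    """
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, continuation bit clear
            return bytes(out)


def uleb128_decode(data: bytes) -> int:
    """Decode a single unsigned LEB128 integer from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated LEB128 sequence")
```

Chaining the two, as in `uleb128_encode(zigzag_encode(-3))`, keeps small negative numbers short, since ZigZag first maps them to small unsigned values.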

LG · Jun 7, 2019
Learning Representations of Graph Data -- A Survey

Mital Kinderkhedia

Deep neural networks have shown tremendous success in object recognition, image classification and natural language processing. However, designing optimal neural network architectures that can learn and output arbitrary graphs is an ongoing research problem. The objective of this survey is to summarize and discuss the latest advances in methods for learning representations of graph data. We start by identifying commonly used types of graph data and reviewing the basics of graph theory. This is followed by a discussion of the relationships between graph kernel methods and neural networks. Next, we identify the major approaches used for learning representations of graph data, namely kernel approaches, convolutional approaches, graph neural network approaches, graph embedding approaches and probabilistic approaches. A variety of methods under each approach are discussed, and the survey concludes with a brief discussion of the future of learning representations of graph data.