Saikat Guha

CV
9 papers
209 citations
Novelty 41%
AI Score 42

9 Papers

CV | Jun 21, 2022
Towards Optimizing OCR for Accessibility

Peya Mowar, Tanuja Ganu, Saikat Guha

Visual cues such as structure, emphasis, and icons play an important role in efficient information foraging by sighted individuals and make for a pleasurable reading experience. Blind, low-vision, and other print-disabled individuals miss out on these cues because current OCR and text-to-speech software ignores them, resulting in a tedious reading experience. We identify four semantic goals for an enjoyable listening experience, and identify syntactic visual cues that help make progress towards these goals. Empirically, we find that preserving even one or two visual cues in aural form significantly enhances the experience of listening to print content.

QUANT-PH | Nov 13, 2024
Multiplexed bi-layered realization of fault-tolerant quantum computation over optically networked trapped-ion modules

Nitish K. Chandra, Saikat Guha, Kaushik P. Seshadreesan

We study an architecture for fault-tolerant measurement-based quantum computation (FT-MBQC) over optically networked trapped-ion modules. The architecture is implemented with a finite number of modules and ions per module, and leverages photonic interactions for generating remote entanglement between modules and local Coulomb interactions for intra-modular entangling gates. We focus on generating the topologically protected Raussendorf-Harrington-Goyal (RHG) lattice cluster state, which is known to be robust against lattice bond failures and qubit noise, with the modules acting as lattice sites. To ensure that the remote entanglement generation rates surpass the bond-failure tolerance threshold of the RHG lattice, we employ spatial and temporal multiplexing. For realistic system timing parameters, we estimate the code cycle time of the RHG lattice and the ion resources required in a bi-layered implementation, where the number of modules matches the number of sites in two lattice layers, and qubits are reinitialized after measurement. For large distances between modules, we incorporate quantum repeaters between sites and analyze the benefits in terms of cumulative resource requirements. Finally, we derive and analyze a qubit noise-tolerance threshold inequality for the RHG lattice generation in the proposed architecture that accounts for noise from various sources. This includes the depolarizing noise arising from the photonically mediated remote entanglement generation between modules due to finite optical detection efficiency, limited visibility, and the presence of dark clicks, in addition to the noise from imperfect gates and measurements, and memory decoherence with time. Our work thus underscores the hardware and channel threshold requirements to realize distributed FT-MBQC in a leading qubit platform today: trapped ions.
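
As a rough illustration of why multiplexing helps, the sketch below counts how many parallel or repeated entanglement attempts are needed before the chance of at least one success reaches a target level. It assumes every attempt succeeds independently with the same probability, which is a simplification and not the paper's detailed model; the function name `attempts_needed` and the numbers in the usage line are hypothetical.

```python
def attempts_needed(p_success, target):
    """Smallest k with 1 - (1 - p_success)**k >= target.

    Assumption: attempts are independent and identically distributed.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    k = 0
    p_all_fail = 1.0  # probability that all k attempts so far have failed
    while 1.0 - p_all_fail < target:
        k += 1
        p_all_fail *= 1.0 - p_success
    return k


# Hypothetical numbers: 10% per-attempt success, 90% target bond success.
print(attempts_needed(0.1, 0.9))
```

The count can be split between spatial multiplexing (parallel channels) and temporal multiplexing (repeated rounds) in whatever ratio the hardware allows; the sketch does not model that trade-off.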

LG | Oct 31, 2022
Towards Zero-Shot and Few-Shot Table Question Answering using GPT-3

Pragya Srivastava, Tanuja Ganu, Saikat Guha

We present very early results on using GPT-3 to perform question answering on tabular data. We find that stock pre-trained GPT-3 can learn the table structure zero-shot from a serialized JSON array-of-arrays representation, and can answer lookup queries and simple comparison questions in natural language without any fine-tuning. We further find that simple prompt engineering to include few-shot static Q&A examples significantly improves accuracy. Lastly, we find that intermixing passage text improves accuracy even further on heterogeneous data. We apply our approach to a novel dataset of simple tables in newspaper infographics with promising results. Overall, we find much cause for optimism in this basic approach.
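
A minimal sketch of how such a prompt could be assembled, assuming a plain "Table / Q / A" text layout. The layout, the helper name `build_table_prompt`, and the sample table are illustrative assumptions; the abstract specifies only the JSON array-of-arrays serialization, the static few-shot Q&A examples, and the optional passage text.

```python
import json


def build_table_prompt(table, examples, question, passage=None):
    """Build a few-shot prompt around a JSON array-of-arrays table.

    table:    list of rows, first row holding the column headers
    examples: list of (question, answer) pairs used as static few-shot demos
    passage:  optional surrounding text to intermix with the table
    """
    parts = []
    if passage:
        parts.append("Passage: " + passage)
    parts.append("Table: " + json.dumps(table))
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    # Leave the final answer open for the language model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


# Made-up data, for illustration only.
table = [["City", "Rainfall (mm)"], ["Pune", 722], ["Mumbai", 2422]]
examples = [("What is the rainfall in Pune?", "722")]
prompt = build_table_prompt(table, examples, "Which city has more rainfall?")
print(prompt)
```

Passing an empty `examples` list gives the zero-shot variant of the same prompt.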

CV | Nov 16, 2022
ChartParser: Automatic Chart Parsing for Print-Impaired

Anukriti Kumar, Tanuja Ganu, Saikat Guha

Infographics are often an integral component of scientific documents for reporting qualitative or quantitative findings, as they make the underlying complex information much simpler to comprehend. However, their interpretation continues to be a challenge for blind, low-vision, and other print-impaired (BLV) individuals. In this paper, we propose ChartParser, a fully automated pipeline that leverages deep learning, OCR, and image processing techniques to extract all figures from a research paper, classify them into various chart categories (bar chart, line chart, etc.), and obtain relevant information from them. We focus specifically on bar charts (including horizontal, vertical, stacked horizontal, and stacked vertical charts), which already pose several interesting challenges. Finally, we present the retrieved content in a tabular format that is screen-reader friendly and accessible to BLV users. We present a thorough evaluation of our approach by applying our pipeline to sample real-world annotated bar charts from research papers.

11.8 | QUANT-PH | Apr 28
Quantum-enhanced Network Tomography

Yufei Zheng, Zihao Gong, Saikat Guha et al.

Network tomography refers to inferring internal network states from end-to-end probes. Quantum probes, implemented by sending blocks of $n$ coherent-state pulses augmented with continuous-variable (CV) squeezing ($n=1$) or weak temporal-mode entanglement ($n>1$) over a lossy channel to a receiver with homodyne detection capabilities, are known to carry information about the channel transmissivity. Assuming a subset of nodes in an optical network is capable of sending and receiving such probes through intermediate nodes with all-optical switching capabilities, we leverage these quantum probes to estimate link transmissivities. To determine how to route the probes in a network, we propose a probe construction algorithm that guarantees link identifiability while maximizing the number of information-orthogonal sets of transmissivities. A set of probes induces a Fisher information matrix (FIM). We then derive two metrics, the determinant of the FIM and the trace of its inverse, to evaluate the performance of the probes. In particular, our results can be used to characterize the quantum improvement in estimating link transmissivities in a general optical network.
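
The two metrics named above can be computed for any small FIM with a few lines of linear algebra. The sketch below is only an illustration of those two quantities, not the paper's derivation: the helper name `det_and_trace_inv` and the example matrix are hypothetical. By the Cramér-Rao bound, the trace of the inverse FIM lower-bounds the summed variance of unbiased estimators, so smaller is better, while a larger determinant is better.

```python
def det_and_trace_inv(fim):
    """Return (det F, trace of F^-1) via Gauss-Jordan elimination."""
    n = len(fim)
    # Augment F with the identity so the right half becomes F^-1.
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(fim)]
    det = 1.0
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular FIM: parameters not jointly identifiable")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    trace_inv = sum(a[i][n + i] for i in range(n))
    return det, trace_inv


# Made-up 2x2 FIM for two link transmissivities.
det, tr_inv = det_and_trace_inv([[2.0, 1.0], [1.0, 2.0]])
print(det, tr_inv)
```

A singular matrix raises an error here because the inverse does not exist, which corresponds to some combination of link parameters that the probe set cannot estimate.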

CV | Jun 21, 2022
Broken News: Making Newspapers Accessible to Print-Impaired

Vishal Agarwal, Tanuja Ganu, Saikat Guha

Accessing daily news content remains a big challenge for people with print impairments, including blind and low-vision individuals, due to the opacity of printed content and hindrances in online sources. In this paper, we present our approach for digitizing print newspapers into an accessible file format such as HTML. We use an ensemble of instance segmentation and detection frameworks for newspaper layout analysis, and then OCR to recognize text elements such as headlines and article text. Additionally, we propose the EdgeMask loss function for the Mask-RCNN framework to improve segmentation mask boundaries and hence the accuracy of the downstream OCR task. Empirically, we show that our proposed loss function reduces the Word Error Rate (WER) of news article text by 32.5%.
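
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The sketch below implements that standard definition; the function name and sample strings are illustrative, and the paper's exact tokenization and normalization are not given in the abstract, so whitespace splitting is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))
```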

CV | Jun 21, 2022
Document Navigability: A Need for Print-Impaired

Anukriti Kumar, Tanuja Ganu, Saikat Guha

Printed documents continue to be a challenge for blind, low-vision, and other print-disabled (BLV) individuals. In this paper, we focus on the specific problem of (in-)accessibility of internal references to citations, footnotes, figures, tables, and equations. While sighted users can flip to the referenced content and flip back in seconds, the linear audio narration that BLV individuals rely on makes following these references extremely hard. We propose a vision-based technique to locate the referenced content and extract the metadata needed to (in subsequent work) inline a content summary into the audio narration. We apply our technique to citations in scientific documents and find it works well on both born-digital and scanned documents.

66.4 | QUANT-PH | Mar 17
Boosted linear-optical measurements on single-rail qubits with unentangled ancillas

Aqil Sajjad, Isack Padilla, Saikat Guha

Any quantum state of the radiation field, sliced into small non-overlapping space-time bins, is a collection of single-rail qubits, each spanning the vacuum and single-photon Fock states of a mode. Quantum logic on these qubits would enable arbitrary measurements on information-bearing light, but is hard due to the lack of strong nonlinearities. With unentangled ancilla single-rail qubits, an $8$-port interferometer, and photon detection, we show that any single-rail qubit measurement in the $XY$ Bloch plane is realizable with success probability $147/256$, which beats the prior-known $1/2$ limit.

IT | Sep 20, 2017
Covert Wireless Communication with Artificial Noise Generation

Ramin Soltani, Dennis Goeckel, Don Towsley et al.

Covert communication conceals the transmission of the message from an attentive adversary. Recent work on the limits of covert communication in additive white Gaussian noise (AWGN) channels has demonstrated that a covert transmitter (Alice) can reliably transmit a maximum of $\mathcal{O}\left(\sqrt{n}\right)$ bits to a covert receiver (Bob) without being detected by an adversary (Warden Willie) in $n$ channel uses. This paper focuses on the scenario where other friendly nodes distributed according to a two-dimensional Poisson point process with density $m$ are present in the environment. We propose a strategy where the friendly node closest to the adversary, without close coordination with Alice, produces artificial noise. We show that this method allows Alice to reliably and covertly send $\mathcal{O}\left(\min\left\{n, m^{\gamma/2}\sqrt{n}\right\}\right)$ bits to Bob in $n$ channel uses, where $\gamma$ is the path-loss exponent. Moreover, we also consider a setting where there are $N_{\mathrm{w}}$ collaborating adversaries uniformly and randomly located in the environment and show that in $n$ channel uses, Alice can reliably and covertly send $\mathcal{O}\left(\min\left\{n,\frac{m^{\gamma/2} \sqrt{n}}{N_{\mathrm{w}}^{\gamma}}\right\}\right)$ bits to Bob when $\gamma>2$, and $\mathcal{O}\left(\min\left\{n,\frac{m \sqrt{n}}{N_{\mathrm{w}}^{2}\log^2 {N_{\mathrm{w}}}}\right\}\right)$ when $\gamma = 2$. Conversely, we demonstrate that no higher covert throughput is possible for $\gamma>2$.
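
To get a feel for how these scaling laws behave, the sketch below evaluates the expressions inside the $\mathcal{O}(\cdot)$ terms with all hidden constants set to 1. That choice, the natural logarithm for the $\gamma = 2$ case, the function name `covert_bits_scale`, and the sample parameters are assumptions made for illustration; the output indicates growth rates only, not actual bit counts.

```python
import math


def covert_bits_scale(n, m, gamma, n_w=None):
    """Evaluate the covert-throughput scaling expressions with constants set to 1.

    n:     number of channel uses
    m:     density of friendly nodes
    gamma: path-loss exponent
    n_w:   number of collaborating adversaries (None = single adversary)
    """
    sqrt_n = math.sqrt(n)
    if n_w is None:
        return min(n, m ** (gamma / 2) * sqrt_n)
    if gamma > 2:
        return min(n, m ** (gamma / 2) * sqrt_n / n_w ** gamma)
    if gamma == 2:
        if n_w < 2:
            raise ValueError("the log term needs n_w >= 2")
        return min(n, m * sqrt_n / (n_w ** 2 * math.log(n_w) ** 2))
    raise ValueError("multi-adversary scaling is stated only for gamma >= 2")


# Illustrative parameters only.
print(covert_bits_scale(10_000, 4, 2))          # jamming lifts sqrt(n) by a factor m
print(covert_bits_scale(100, 100, 4))           # dense jammers: capped at n
print(covert_bits_scale(10_000, 16, 4, n_w=2))  # more adversaries shrink the gain
```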