Abhinav Chalise

2.2 IV Jul 2

Data-driven Video Codec with Implicit Neural Representations

Nishan Khanal, Saugat Neupane, Abhinav Chalise et al.

A conventional codec stores a video as compressed pixel data. We instead store the video, together with its audio track, as the weights of a single sinusoidal representation network (SIREN) that maps space-time coordinates to RGB values and audio amplitudes. The network uses separate audio and video initialization layers, a stack of shared fully connected hidden layers, and three output branches: one for video and two Siamese audio branches whose disagreement is used to estimate and subtract residual noise. The overfitted teacher network is then compressed by response-based knowledge distillation into a smaller student, followed by 16-bit symmetric weight quantization and lossless LZMA2 (xz) encoding. On a 6.08 MiB test video, the quantized student reaches a video PSNR of 28.72 dB with SSIM of 0.75, and an audio PSNR of 24.18 dB with a log spectral distance of 10.69 dB, while the pipeline shrinks the representation from 9.05 MiB to 2.33 MiB, an overall compression ratio of 2.61. A bit-width sweep from 1-bit to 32-bit quantization shows that reconstruction quality saturates at 16 bits. We compare against H.264, HEVC, and MP3, report where the approach falls short of them, and describe a browser-based prototype that trains, transfers, and decodes these models over WebRTC.
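The last two compression stages named in the abstract, 16-bit symmetric weight quantization followed by lossless xz encoding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single per-tensor scale, the byte layout, and the random stand-in weights are all assumptions made for illustration. It relies only on Python's standard library, whose `lzma.FORMAT_XZ` container uses the LZMA2 filter by default.

```python
import lzma
import random
import struct


def quantize_symmetric(weights, bits=16):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Symmetric: zero maps to zero, no offset (zero point) is stored.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def pack_xz(q, scale):
    """Serialize the scale (float64) and int16 weights, then xz-compress.

    Hypothetical layout; it assumes the values fit in 16 bits.
    """
    raw = struct.pack("<d", scale) + struct.pack(f"<{len(q)}h", *q)
    return lzma.compress(raw, format=lzma.FORMAT_XZ)


def unpack_xz(blob):
    """Invert pack_xz: decompress, then read the scale and the int16 values."""
    raw = lzma.decompress(blob, format=lzma.FORMAT_XZ)
    (scale,) = struct.unpack_from("<d", raw, 0)
    n = (len(raw) - 8) // 2
    q = list(struct.unpack_from(f"<{n}h", raw, 8))
    return q, scale


random.seed(0)
# Stand-in for network weights; real SIREN weights would come from training.
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]

q, scale = quantize_symmetric(weights)
blob = pack_xz(q, scale)
q2, scale2 = unpack_xz(blob)
restored = dequantize(q2, scale2)

# Rounding to the nearest level bounds the per-weight error by half a step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The xz stage is lossless, so the only reconstruction error comes from rounding in the quantizer, at most half a quantization step per weight. The sizes reported in the abstract are consistent with the stated ratio when the 6.08 MiB source is divided by the 2.33 MiB result: 6.08 / 2.33 ≈ 2.61.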

3.2 RO Aug 17, 2025

Mechanical Automation with Vision: A Design for Rubik's Cube Solver

Abhinav Chalise, Nimesh Gopal Pradhan, Nishan Khanal et al.

The core mechanical system is built around three stepper motors for physical manipulation, a microcontroller for hardware control, and a camera with a YOLO detection model for real-time cube state detection. A significant software component is a user-friendly graphical user interface (GUI) built in Unity. The initial cube state, detected by a real-time YOLOv8 model (Precision 0.98443, Recall 0.98419, Box Loss 0.42051, Class Loss 0.2611), is virtualized in the GUI. The system computes the solution with Kociemba's algorithm, and physical manipulation with a single degree of freedom is carried out by the combined action of the stepper motors on the cube, achieving an average solving time of about 2.2 minutes.


2 Papers