MM Oct 8, 2016
Perceptually-Driven Video Coding with the Daala Video Codec
Yushin Cho, Thomas J. Daede, Nathan E. Egge et al.
The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, including which ones worked and which did not. We evaluate which tools are easy to integrate into a more traditional codec design, and show results in the context of the codec being developed by the Alliance for Open Media.
MM Aug 5, 2016
Daala: Building A Next-Generation Video Codec From Unconventional Technology
Jean-Marc Valin, Timothy B. Terriberry, Nathan E. Egge et al.
Daala is a new royalty-free video codec that attempts to compete with state-of-the-art royalty-bearing codecs. To do so, it must achieve good compression while avoiding all of their patented techniques. We use technology that is as different as possible from traditional approaches to achieve this. This paper describes the technology behind Daala and discusses where it fits in the newly created AV1 codec from the Alliance for Open Media. We show that Daala is approaching the performance level of more mature, state-of-the-art video codecs and can contribute to improving AV1.
MM May 16, 2016
Daala: A Perceptually-Driven Still Picture Codec
Jean-Marc Valin, Nathan E. Egge, Thomas Daede et al.
Daala is a new royalty-free video codec based on perceptually-driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala could be the basis of an excellent, royalty-free image format.
SD Mar 6, 2016
Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model To Speex
Jean-Marc Valin, Christopher Montgomery
One key aspect of the CELP algorithm is that it shapes the coding noise using a simple, yet effective, weighting filter. In this paper, we improve the noise shaping of CELP using a more modern psychoacoustic model. This has the significant advantage of improving the quality of an existing codec without the need to change the bit-stream. More specifically, we improve the Speex CELP codec by using the psychoacoustic model used in the Vorbis audio codec. The results show a significant increase in quality, especially at high bit-rates, where the improvement is equivalent to a 20% reduction in bit-rate. The technique itself is not specific to Speex and could be applied to other CELP codecs.
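For context, the "simple, yet effective, weighting filter" that CELP codecs conventionally use is commonly written as W(z) = A(z/γ1) / A(z/γ2), where A(z) is the LPC analysis filter. The following is a minimal Python sketch of that conventional filter only, not of the Vorbis-based model the paper proposes. The function names and the default values γ1 = 0.9 and γ2 = 0.6 are illustrative assumptions, and the LPC coefficients are assumed to be given with a leading 1.

```python
def bandwidth_expand(lpc, gamma):
    """Return the coefficients of A(z/gamma): a_k is scaled by gamma**k."""
    return [a * gamma ** k for k, a in enumerate(lpc)]


def perceptual_weighting(signal, lpc, gamma1=0.9, gamma2=0.6):
    """Filter `signal` through W(z) = A(z/gamma1) / A(z/gamma2).

    `lpc` holds [1, a_1, ..., a_p]; the gamma defaults are assumed
    illustrative values, not taken from the paper.
    """
    num = bandwidth_expand(lpc, gamma1)  # FIR (zeros) part
    den = bandwidth_expand(lpc, gamma2)  # IIR (poles) part, den[0] == 1
    out = []
    for n in range(len(signal)):
        acc = sum(num[k] * signal[n - k]
                  for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * out[n - k]
                   for k in range(1, len(den)) if n - k >= 0)
        out.append(acc)
    return out
```

Feeding a unit impulse through `perceptual_weighting` returns the first samples of the filter's impulse response, which is a quick way to inspect how strongly the noise is shaped for a given pair of γ values.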
SD Mar 6, 2016
Low-Complexity Iterative Sinusoidal Parameter Estimation
Jean-Marc Valin, Daniel V. Smith, Christopher Montgomery et al.
Sinusoidal parameter estimation is a computationally intensive task, which can pose problems for real-time implementations. In this paper, we propose a low-complexity iterative method for estimating sinusoidal parameters that is based on the linearisation of the model around an initial frequency estimate. We show that for N sinusoids in a frame of length L, the proposed method has a complexity of O(LN), which is significantly less than the matching pursuits method. Furthermore, the proposed method is shown to be more accurate than the matching pursuits and time-frequency reassignment methods in our experiments.
SD Feb 17, 2016
A High-Quality Speech and Audio Codec With Less Than 10 ms Delay
Jean-Marc Valin, Timothy B. Terriberry, Christopher Montgomery et al.
With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both of these requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantisation in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kbit/s and 64 kbit/s outperforms both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one fourth of the algorithmic delay of these codecs.
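To illustrate the gain-shape idea mentioned in this abstract, here is a minimal Python sketch: a vector is split into a scalar gain (its norm) and a unit-norm shape, the gain is quantised on a logarithmic scale, and the shape is approximated by a vector of signed integer pulses. The function names, the 1.5 dB gain step, and the greedy pulse search are assumptions made for illustration. They are not the codec's actual codebook, search, or bit allocation.

```python
import math


def split_gain_shape(x):
    """Split x into (gain, shape) with gain = ||x|| and ||shape|| = 1."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]


def quantise_gain(gain, step_db=1.5):
    """Uniform quantisation of the gain in dB (step size is an assumption)."""
    level_db = 20.0 * math.log10(max(gain, 1e-9))
    index = round(level_db / step_db)
    return index, 10.0 ** (index * step_db / 20.0)


def quantise_shape(shape, pulses):
    """Greedily place `pulses` unit pulses to maximise the normalised
    correlation with `shape`; returns (integer code, unit-norm decode)."""
    if pulses < 1:
        raise ValueError("need at least one pulse")
    mag = [abs(s) for s in shape]
    y = [0] * len(shape)
    xy = 0.0  # correlation between |shape| and y
    yy = 0.0  # energy of y
    for _ in range(pulses):
        best_score, best_i = -1.0, 0
        for i, m in enumerate(mag):
            # Score after adding one more pulse at position i.
            score = (xy + m) ** 2 / (yy + 2 * y[i] + 1)
            if score > best_score:
                best_score, best_i = score, i
        xy += mag[best_i]
        yy += 2 * y[best_i] + 1
        y[best_i] += 1
    code = [yi if s >= 0 else -yi for yi, s in zip(y, shape)]
    norm = math.sqrt(sum(c * c for c in code))
    return code, [c / norm for c in code]
```

A decoder under these assumptions would rebuild the vector as the dequantised gain times the unit-norm decoded shape. Separating the two means the energy of a band is preserved regardless of how coarsely the shape is coded.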
SD Feb 17, 2016
An Iterative Linearised Solution to the Sinusoidal Parameter Estimation Problem
Jean-Marc Valin, Daniel V. Smith, Christopher Montgomery et al.
Signal processing applications use sinusoidal modelling for speech synthesis, speech coding, and audio coding. Estimation of the model parameters involves non-linear optimisation methods, which can be very costly for real-time applications. We propose a low-complexity iterative method that starts from initial frequency estimates and converges rapidly. We show that for N sinusoids in a frame of length L, the proposed method has a complexity of O(LN), which is significantly less than the matching pursuits method. Furthermore, the proposed method is shown to be more accurate than the matching pursuits and time-frequency reassignment methods in our experiments.
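The linearisation idea can be sketched for a single sinusoid as follows. Around a frequency estimate ω, the model A cos((ω + Δω)n + φ) is approximately a cos(ωn) + b sin(ωn) + c n cos(ωn) + d n sin(ωn), with c = Δω b and d = -Δω a, so a linear least-squares fit yields a frequency correction Δω = (cb - da) / (a² + b²). The Python below is a hedged illustration of that principle only. It handles one sinusoid and solves the 4x4 normal equations directly, so it does not reproduce the O(LN) iterative solver the paper describes, and all function names are hypothetical.

```python
import math


def solve_linear(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def refine_sinusoid(x, omega, iterations=8):
    """Refine (frequency, amplitude, phase) of one sinusoid in frame x.

    `omega` is the initial frequency estimate in radians per sample and is
    assumed to be close to the true value; x is assumed to be non-zero.
    """
    for _ in range(iterations):
        basis = []
        for n in range(len(x)):
            cs, sn = math.cos(omega * n), math.sin(omega * n)
            basis.append((cs, sn, n * cs, n * sn))
        # Normal equations of the linearised least-squares problem.
        gram = [[sum(row[i] * row[j] for row in basis) for j in range(4)]
                for i in range(4)]
        rhs = [sum(row[i] * xn for row, xn in zip(basis, x))
               for i in range(4)]
        a, b, c, d = solve_linear(gram, rhs)
        omega += (c * b - d * a) / (a * a + b * b)  # frequency correction
    return omega, math.hypot(a, b), math.atan2(-b, a)
```

Starting from an estimate taken, for example, from an FFT peak, a few iterations are typically enough on a clean signal because each pass re-linearises around the improved frequency.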