Wednesday, February 16, 2022

Google Translate

I observed an interesting behavior while playing with Google Translate today:


I'm not sure whether it's the result of some incorrectly tagged training samples, or whether the underlying neural translation model has accumulated a small sense of humor. :) If we can somehow shape the personality of a large neural network through its training dataset, can we say that the trained model is conscious, maybe to the slightest extent?

Wednesday, December 30, 2015

ExoPlayer Architecture

ExoPlayer is an open source media player from Google. It provides an example implementation of DASH and Smooth Streaming playback with Common Encryption, so that 3rd-party applications can extend it to build rich media experiences that aren't directly available from the built-in MediaPlayer.

[1] https://google.github.io/ExoPlayer/guide.html
[2] https://github.com/google/ExoPlayer
[3] http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=65274
[4] https://www.iso.org/obp/ui/#iso:std:iso-iec:23001:-7:ed-1:v1:en

Tuesday, June 3, 2014

Introduction to the audio pipeline in Chromium OS

1. System architecture

Chromium OS is an open source project, based on the Linux kernel, with the Chrome browser as the main application interface.
The communication mechanism between Chromium OS and its browser is currently D-Bus.

2. Chrome OS audio server (CRAS)

2.1 Introduction

  • CRAS sits between Chrome and ALSA 
    • Output: Chrome connects to the audio server and sends audio data to it; the server then renders the data to the selected ALSA device through alsa-lib. 
    • Input: the server gets audio samples from ALSA and forwards them to Chrome. 
  • Three main requirements: 
    • Low latency: a 20 ms latency target, waking up every 10 ms 
    • Low CPU usage: less than two percent of a Cr-48's CPU while playing back audio 
    • Dynamic audio routing
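
To make those latency numbers concrete, here is a back-of-the-envelope sketch (my own illustration, not CRAS code) converting the 20 ms budget and the 10 ms wakeup interval into frame counts, assuming a 48 kHz output rate:

```cpp
// Convert a time budget in milliseconds into a frame count at a given
// sample rate. The 48 kHz rate below is an assumption for illustration.
constexpr int frames_for_ms(int sample_rate_hz, int ms) {
    return sample_rate_hz * ms / 1000;
}

// 20 ms latency budget -> 960 frames of buffer at 48 kHz.
constexpr int kLatencyFrames = frames_for_ms(48000, 20);

// Waking every 10 ms means each wakeup must produce ~480 frames,
// i.e. the server always stays half a buffer ahead of the hardware.
constexpr int kFramesPerWakeup = frames_for_ms(48000, 10);
```

So missing a single 10 ms wakeup eats half of the latency budget, which is why the server's wakeup path has to stay cheap.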

2.2 Code structure

  • server - the source for the sound server 
  • libcras - client library for interacting with cras 
  • common - files common to both the server and library 
  • tests - tests for cras and libcras
  • source code location: chromiumos/src/third_party/adhd/cras

2.3 Block diagram

2.4 Run-time behavior

2.4.1 Initialization

2.4.2 Audio IO thread

The audio IO thread is responsible for reading data from each stream, mixing the streams, and sending the result to the output device. Note: digital signal processing (DSP) is applied post-mix, i.e. as global effects.
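
As a toy illustration of that post-mix model (my own C++ sketch, not CRAS source), the streams are summed into a wide accumulator, the global effect is applied to the mix, and the result is saturated back to 16-bit samples:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Mix N interleaved int16 streams of equal length, apply a global
// post-mix gain (standing in for the DSP stage), and saturate.
std::vector<int16_t> mix_and_process(
        const std::vector<std::vector<int16_t>>& streams, float gain) {
    size_t frames = streams.empty() ? 0 : streams[0].size();
    std::vector<int16_t> out(frames);
    for (size_t i = 0; i < frames; ++i) {
        int32_t acc = 0;
        for (const auto& s : streams) acc += s[i];   // mix all streams
        acc = static_cast<int32_t>(acc * gain);      // post-mix DSP: global effect
        out[i] = static_cast<int16_t>(std::clamp(acc, -32768, 32767));
    }
    return out;
}
```

Applying the effect after mixing means it runs once per output frame, regardless of how many streams are active.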

2.4.3 DSP thread

A dedicated thread handles DSP-related requests.

2.4.4 Server main thread

2.4.5 Audio routing

2.4.5.1 Jack plugged in

2.4.5.2 Switch to Bluetooth

3. Chromium Media Pipeline

3.1 Introduction

The pipeline is pull-based, and consists of a data source, demuxers, decoders, and renderers.
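
A minimal sketch of the pull-based idea (my own illustration; the names below are hypothetical, not Chromium's classes): each stage asks its upstream neighbor for data only when the sink needs more, so the renderer drives the whole chain.

```cpp
#include <optional>
#include <vector>

// Upstream end: hands out "packets" until the stream is exhausted.
struct Source {
    std::vector<int> packets{1, 2, 3};
    size_t pos = 0;
    std::optional<int> read() {
        if (pos >= packets.size()) return std::nullopt;  // end of stream
        return packets[pos++];
    }
};

// Middle stage: pulls a packet on demand and "decodes" it
// (a trivial transform stands in for real decoding).
struct Decoder {
    Source* upstream;
    std::optional<int> decode() {
        auto pkt = upstream->read();
        if (!pkt) return std::nullopt;
        return *pkt * 10;
    }
};

// Sink: pulls decoded frames until end of stream; nothing upstream
// runs unless the renderer asks for data.
struct Renderer {
    Decoder* upstream;
    std::vector<int> rendered;
    void render_all() {
        while (auto frame = upstream->decode())
            rendered.push_back(*frame);
    }
};
```

The payoff of the pull model is natural flow control: a slow or paused sink simply stops asking, and no stage buffers ahead unboundedly.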

3.2 Pipeline integration into WebKit

3.3 Pipeline state machine

3.4 Pipeline decomposition

3.4.1 Class diagram

3.4.2 Audio Sink to CRAS

3.4.3 Example of initiating a new playback stream

3.5 Multi-channel processing

Currently, the IO devices enumerated by the CRAS server only support up to stereo output. If a client requests to add a stream whose audio format isn't compatible with the iodev's, the CRAS client will perform format conversion automatically.

4. Interactions between UI and CRAS

Because all applications run in browser tabs, Chromium currently defines an extension API for audio that controls CRAS parameters, such as setting the volume.


Thursday, October 24, 2013

DASH tools from GPAC


1. Content creation
  • DashCast
    $./DashCast -conf dash.conf -av Fantastic_Four_Trailer.mp4 -seg-dur 2000 -frag-dur 200
  • MP4Box
    $./MP4Box -dash 2000 -frag 200 -rap -frag-rap Fantastic_Four_Trailer.mp4

2. Playback

Sunday, September 15, 2013

Open source media frameworks


  • VLC LGPL, C
    git clone git://git.videolan.org/vlc.git
  • MPlayer GPLv2, C
    svn checkout svn://svn.mplayerhq.hu/mplayer/trunk mplayer
  • GStreamer LGPL, C
    git clone git://anongit.freedesktop.org/gstreamer/
  • GPAC LGPL, C
    svn co svn://svn.code.sf.net/p/gpac/code/trunk/gpac gpac
  • XBMC GPL, C++
    git clone git://github.com/xbmc/xbmc.git
  • Xine GPLv2, C
    http://sourceforge.net/projects/xine/
  • FFmpeg GPL/LGPL, C
    git clone git://source.ffmpeg.org/ffmpeg.git ffmpeg

Tuesday, May 21, 2013

Next phase of my life

My daughter was born a week ago. If you know me or my wife, please share our happiness! :-D

Saturday, May 11, 2013

Casual thoughts on Stagefright

For the past 4+ years, I've mainly focused on the Android multimedia area in my daily work. While maintaining Stagefright is much easier than maintaining OpenCORE, there are still places that I think could be improved in the future, although I will no longer have the privilege of working closely on it after moving to my next occupation. :)

Main issue
One of the biggest pains while working with Stagefright is the various ANRs and tombstones related to mediaserver during stability tests or user trials.
On one hand, Stagefright isn't mature and robust enough, and some issues are inherent to its structure, which makes them very difficult to fix thoroughly. For example, you can easily reproduce an ANR by seeking and toggling play/pause quickly during HTTP streaming: the read() call on network data blocks with no way to cancel it gracefully, and the existing buffering mechanism doesn't help because it can't predict which position will be read after a seek. Another example is the potential deadlock between mediaserver and mediaplayer, caused by the subtle difference between calling mediaplayer from another application and from within mediaserver itself (e.g. from CameraService), since the binder transaction is skipped for intra-process calls.
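
One common way to make such a blocking read cancellable (my own sketch, not Stagefright code) is to wait on a condition variable guarded by a cancel flag instead of blocking inside an uninterruptible call, so a seek or teardown can wake the reader immediately:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// A toy data queue whose blocking read() can be aborted from another
// thread. A real network source would fill the queue from a socket.
struct CancellableQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> data;
    bool cancelled = false;

    // Blocks until data arrives or cancel() is called; a cancelled
    // read returns nullopt so the caller can unwind cleanly.
    std::optional<int> read() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return cancelled || !data.empty(); });
        if (cancelled) return std::nullopt;
        int v = data.front();
        data.pop();
        return v;
    }

    void cancel() {
        { std::lock_guard<std::mutex> lk(m); cancelled = true; }
        cv.notify_all();  // wake any reader stuck in wait()
    }

    void push(int v) {
        { std::lock_guard<std::mutex> lk(m); data.push(v); }
        cv.notify_one();
    }
};
```

With this shape, a seek handler can cancel the in-flight read and restart it at the new position instead of waiting out a stalled network call.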
On the other hand, even if you find the root cause and a potential solution, it's difficult to validate the fix, because the problem is highly affected by network conditions, race conditions among threads, etc. Although we can simulate network traffic with bandwidth-control tools like netem, it still takes fine tuning and many tries unless we hack the code to force it into the problematic state. For lock-related issues, I used to write simple CppUTest cases that run in many loops with different sleep times between function calls, and then wish myself luck.

As an advocate of TDD and test automation, I think it would be ideal to have the native media engine code written with testability in mind, for example by incorporating gtest or another C++ test framework, so we can easily add unit test cases for newly reported issues. It would also help facilitate the up-merge process of each release. Based on my experience, CTS and mediaframeworktest are far from sufficient to catch most of the issues we encountered before shipping a device.
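
The "loops with different sleep times" trick above can itself be packaged as a repeatable unit test. Here is a sketch with plain asserts so it stays self-contained (in a real tree it would be a gtest or CppUTest case); ToyPlayer is a hypothetical stand-in for the component under test:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical component under test: just a play/pause flag.
struct ToyPlayer {
    std::atomic<bool> playing{false};
    void play()  { playing = true; }
    void pause() { playing = false; }
};

// Hammer play() and pause() from two threads, shifting the relative
// timing each iteration to explore different interleavings. The
// invariant checked is deliberately weak: the object stays usable
// and ends in a defined state (a deadlock would hang the test).
bool stress_play_pause(int iterations) {
    ToyPlayer p;
    for (int i = 0; i < iterations; ++i) {
        std::thread a([&] {
            std::this_thread::sleep_for(std::chrono::microseconds(i % 7));
            p.play();
        });
        std::thread b([&] {
            std::this_thread::sleep_for(std::chrono::microseconds(i % 5));
            p.pause();
        });
        a.join();
        b.join();
    }
    p.pause();
    return p.playing == false;
}
```

Runs like this are not a proof of correctness, but once a reported race is captured as a looped test, a fix can at least be validated against the same schedule-shaking that exposed it.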

In addition, with more and more features being added to AOSP, it would help to unify the different pieces of the multimedia module into a more flexible architecture: a generic pipeline for playback, recording, transcoding, etc., as well as an elegant way to support customized audio features like LPA, tunnel mode, and 5.1 channels.

I guess this sounds greedy, given that most of the code was written by only a couple of Google engineers in a short time, with more and more functionality added in each release. Keep moving, Stagefright!