AUDIO Extension for the X Window System
The implementation actually provides two distinct server extensions: TIME and AUDIO. The TIME extension provides server-side clocks and schedulers; clients can use it to hand the server X requests that are not executed immediately but deferred to a later point in time. The AUDIO extension builds on top of this and provides audio services through server-side sample buffers, PCM devices and PCM contexts.
TIME extension

The TIME extension introduces the concepts required for temporal processing of requests. Clients can create server-side Clocks and Schedulers.
Note that the time concept is independent of audio: clients can in fact submit graphics operations through Schedulers and use Clocks to realize server-timed animations. In general, scheduled requests are executed by a real-time thread within the X server; an accounting mechanism ensures that processing time is allotted equally to all clients and protects the system from overload [1]. This also gives client applications a controlled way to have certain critical operations executed in real time (e.g. precisely timed image blits to the screen).

AUDIO extension

The AUDIO extension introduces several server-side objects: SampleBuffers, PCMDevices and PCMContexts.

Sample buffers

SampleBuffers serve as server-side stores of audio sample data (in a sense they mirror Pixmaps). Each sample buffer stores a sequence of scalar sample values. The buffers realize a "sliding window" concept to support streaming audio:
Each buffer stores samples for a consecutive range of index values; at any time, access to any index is permitted, but writes to indices outside the "active window" will be discarded and reads will implicitly yield zeroes. The active window may be shifted at any time, causing some samples to "drop out" (and their value to be lost), while new samples are "shifted in" (and their values are initialized to zero; they may subsequently be overwritten).
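The window semantics described above can be illustrated with a small model. This is only a sketch in Python, not the extension's API: the class name `SlidingSampleBuffer`, its methods, and the ring-storage layout are hypothetical, and it assumes the window is only ever shifted forward.

```python
class SlidingSampleBuffer:
    """Toy model of a sample buffer with a sliding active window.

    All names here are hypothetical; this is not the extension's API.
    """

    def __init__(self, size, start=0):
        self.size = size            # number of samples in the active window
        self.start = start          # index of the first sample in the window
        self.data = [0.0] * size    # ring storage: index i lives in slot i % size

    def _in_window(self, index):
        return self.start <= index < self.start + self.size

    def write(self, index, value):
        # Writes outside the active window are silently discarded.
        if self._in_window(index):
            self.data[index % self.size] = value

    def read(self, index):
        # Reads outside the active window implicitly yield zero.
        if self._in_window(index):
            return self.data[index % self.size]
        return 0.0

    def shift(self, count):
        # Assumption: the window only moves forward.
        if count < 0:
            raise ValueError("this sketch only supports forward shifts")
        # Samples that drop out free the slots that the shifted-in samples
        # will occupy; zeroing them initializes the new samples to zero.
        for index in range(self.start, self.start + min(count, self.size)):
            self.data[index % self.size] = 0.0
        self.start += count


buf = SlidingSampleBuffer(size=4)
for i in range(6):
    buf.write(i, float(i + 1))      # indices 4 and 5 are outside the window

before = [buf.read(i) for i in range(6)]
buf.shift(2)                        # indices 0 and 1 drop out, 4 and 5 shift in
after = [buf.read(i) for i in range(6)]
```

In this model `before` shows the two discarded writes as zeroes, and `after` shows that the dropped samples are gone while the shifted-in samples start out as zero until the client overwrites them.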
This concept allows random access to the sample data and requires only "weak" coupling between readers and writers: the X client simply has to make sure that all accesses fall into the "active window". Depending on the application scenario, this can be realized through large buffers and infrequent window shifts or through small buffers and frequent window shifts. Applications may transfer sample data to and from the sample buffers as required for playback or capture; the extension allows sample buffers to be placed in shared memory to minimize communication overhead.

The extension provides several operations that can be performed on the contents of sample buffers, such as multiplication/scaling, accumulation, convolution, clamping and simple synthesis. These operations generally combine values from one or two source sample buffers and modify the contents of one target sample buffer; refer to the API Documentation for an overview of the available operations. In a sense, these operations mirror the compositing capabilities that the RENDER extension provides for images.

PCM devices and contexts

Capture and playback devices are represented as server-side PCMDevices; clients can list and query the available devices and instantiate PCMContexts to perform playback and capture. Applications must assign one SampleBuffer per channel to a PCMContext; samples will be read from or written to the client-specified buffers. A PCMContext contains all information that determines how the samples in the buffers are to be interpreted, such as the sample rate or the mapping to channels [2]. Once a context is activated, the PCM device associated with it starts reading from or writing to the specified buffers. The X client then has to shift the "active window" of the buffers and write data to or read data from the sample buffers accordingly.
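To make the client's part in playback more concrete, the following Python simulation sketches one possible loop for a single (mono) channel. Everything in it is an assumption made for illustration: `WindowBuffer`, `wanted_sample`, the window and period sizes, and the lock-step alternation between "device" and "client" are hypothetical and do not come from the extension, where the device would run asynchronously inside the server.

```python
def wanted_sample(index):
    """Audio the hypothetical client wants to play: a simple sawtooth."""
    return float(index % 50) / 50.0


class WindowBuffer:
    """Minimal stand-in for one SampleBuffer with a sliding active window."""

    def __init__(self, size):
        self.size = size
        self.start = 0
        self.data = [0.0] * size

    def _active(self, index):
        return self.start <= index < self.start + self.size

    def write(self, index, value):
        if self._active(index):              # writes outside the window are dropped
            self.data[index % self.size] = value

    def read(self, index):
        # Reads outside the window yield zero.
        return self.data[index % self.size] if self._active(index) else 0.0

    def shift(self, count):
        # Zero the slots that the shifted-in samples will occupy.
        for index in range(self.start, self.start + min(count, self.size)):
            self.data[index % self.size] = 0.0
        self.start += count


WINDOW = 256   # samples in the active window (assumed value)
PERIOD = 64    # samples the simulated device consumes per step (assumed value)

buf = WindowBuffer(WINDOW)
for i in range(WINDOW):                      # pre-fill before "activating" playback
    buf.write(i, wanted_sample(i))

played = []
position = 0                                 # read position of the simulated device
for _ in range(10):
    # Device side: consume one period from the buffer.
    played.extend(buf.read(position + k) for k in range(PERIOD))
    position += PERIOD
    # Client side: drop the consumed samples and fill the freed space.
    old_end = buf.start + buf.size
    buf.shift(PERIOD)
    for i in range(old_end, old_end + PERIOD):
        buf.write(i, wanted_sample(i))

# Count samples the device read that differ from what the client intended.
mismatches = sum(1 for i, s in enumerate(played) if s != wanted_sample(i))
```

As long as the client shifts and refills at least as fast as the device consumes, `mismatches` stays at zero; a larger `WINDOW` relative to `PERIOD` gives the client more slack, which is the large-buffer versus small-buffer tradeoff mentioned earlier.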
PCMContexts also serve as Clock objects (see above); this means that applications can bind schedulers to PCMContexts and have X requests executed in sync with audio playback or capture.

Audio compositing

The AUDIO extension provides hooks for an "audio manager" that, for example, determines how multiple concurrently playing audio streams from different applications are mixed; the model loosely corresponds to the COMPOSITE/DAMAGE model of image compositing already present in the X server. The audio manager itself maintains a PCM context on a "physical" PCM device together with the associated "master" sample buffers. It then creates "virtual" PCM devices; the manager is informed whenever a client creates and activates a PCM context on such a virtual device. Upon activation, the audio manager is responsible for taking actions that cause the contents of the "slave" sample buffers to be mixed into the "master" sample buffers; for this purpose it may use the compositing operations described above [3]. The TIME extension may be used as well to allow low-latency mixing.

Footnotes

[1] Usually it does not hurt if the true cost is grossly over-estimated (as CPUs are typically fast enough to make the computation cost of audio processing nearly negligible). However, the operations performed in the real-time thread on behalf of the client are quite simple (multiplication, accumulation, convolution); during my experiments I have found that a trivial estimator that always assumes worst-case behaviour (dataset too large to fit in L2 cache) does not over-estimate the common case (dataset already present in L1 cache) by more than a factor of 2. More experimentation may be needed to determine whether this approach is feasible.

[2] Yes, this means that SampleBuffers do not have an associated sample rate. As a consequence, the X server never performs any implicit sample rate conversion; clients have to explicitly issue requests that result in the desired conversion.
This is by design because conversion involves considerable quality/speed tradeoffs that are policy decisions and thus IMHO do not belong in the X server. See XaudioConvolute and http://ccrma.stanford.edu/~jos/resample/ for how the extension supports resampling.

[3] Among other things, the audio compositing manager could be configured to mute or attenuate background clients, or even to apply playful effects such as panning audio according to the position of the application's window. It may also transparently redirect clients to different audio devices (e.g. switch to headphones without the application noticing).
Helge Bahmann <hcb@chaoticmind.net>