Introduction
The basic visual (and UI) building block in Wayland (the protocol) is a wl_surface. Basically everything on screen is represented as wl_surfaces in the protocol: mouse cursors, windows, icons, etc. A surface gets its content and size by attaching a wl_buffer to it, which is a handle to a pixel container. A surface has many attributes, like the input region: the region of the surface where it can receive input events. Input events, e.g. pointer motion, that happen on the surface but outside of the input region get directed to what is below the surface. The input region can be empty, but it cannot extend beyond the surface dimensions.
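The input region rules can be sketched as a tiny hit-test model in C. Everything here (struct surface_model, accepts_input()) is hypothetical and only mimics the semantics described above. In a real client you build a region with wl_compositor.create_region and wl_region.add and hand it to wl_surface.set_input_region, and the hit testing itself happens in the compositor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types for illustration; not libwayland structures. */
struct rect { int x, y, width, height; };

struct surface_model {
    int width, height;        /* surface size, taken from the attached buffer */
    bool infinite;            /* like the initial region, or set_input_region(NULL) */
    const struct rect *rects; /* input region as a union of rectangles */
    size_t n_rects;           /* 0 with infinite == false means an empty region */
};

static bool rect_contains(const struct rect *r, int x, int y)
{
    return x >= r->x && x < r->x + r->width &&
           y >= r->y && y < r->y + r->height;
}

/* Returns true if an event at surface-local (x, y) is delivered to this
 * surface; false means it falls through to whatever is below. */
bool accepts_input(const struct surface_model *s, int x, int y)
{
    /* The effective input region is always clipped to the surface extents. */
    if (x < 0 || y < 0 || x >= s->width || y >= s->height)
        return false;
    if (s->infinite)
        return true;
    for (size_t i = 0; i < s->n_rects; i++)
        if (rect_contains(&s->rects[i], x, y))
            return true;
    return false;
}
```

Passing NULL to wl_surface.set_input_region corresponds to the infinite case in this model; an empty region is a region object with no rectangles added.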
It so happens that cursor, shell surface (window), and drag icon are also surface roles. Under a desktop shell, a surface cannot become visible (mapped) unless it has a role and fulfills the requirements of that particular role. For example, a client can set a cursor surface only when it has the pointer focus. Without a role, the compositor would not know what to do with a surface. Roles are exclusive: a surface can have only one role at a time. How a role is assigned depends on the protocol for the particular role; there is no generic set_role interface.
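Role exclusivity can be sketched as a hypothetical model in C. In the real protocols a role is given by a role-specific request, such as wl_pointer.set_cursor for cursors, wl_data_device.start_drag for drag icons, or wl_subcompositor.get_subsurface for sub-surfaces; the enum and assign_role() below are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical role enum; real roles come from role-specific requests. */
enum surface_role {
    ROLE_NONE,
    ROLE_CURSOR,
    ROLE_SHELL_SURFACE,
    ROLE_DRAG_ICON,
    ROLE_SUBSURFACE,
};

struct surface_model {
    enum surface_role role;
};

/* Models exclusivity: a surface without a role (or already holding this
 * same role) accepts it; a conflicting role is refused. A real compositor
 * would answer a conflicting role with a protocol error, not a return value. */
bool assign_role(struct surface_model *s, enum surface_role role)
{
    if (s->role != ROLE_NONE && s->role != role)
        return false;
    s->role = role;
    return true;
}

/* A surface without a role has nothing telling the compositor how to show it. */
bool has_role(const struct surface_model *s)
{
    return s->role != ROLE_NONE;
}
```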
A window is a wl_surface with a suitable shell role; there is no separate object type "window" in the protocol. A window being a single wl_surface means that its contents must come from a single wl_buffer at a time. For most applications that is just fine, but there are a few exceptions where it makes things less than optimal when you want to take full advantage of hardware acceleration features.
The problem
Let us consider a video player in a window. Window decorations and GUI elements are usually rendered in an RGB color format on the CPU. Video usually decodes into some YUV color format. To create one complete wl_buffer for the window, the application must merge these: convert the video into RGB and combine it with the GUI elements. And it has to do that for every single video frame, whether the GUI elements change or not. This causes several performance penalties. If your graphics card is capable of showing YUV-formatted content directly in an overlay, you cannot take advantage of that. If you have video decoding hardware, you probably have to access and copy the produced YUV images with the CPU while doing a color conversion. Getting CPU access to a hardware-rendered buffer may be expensive to begin with, and then color conversion means you are doing a copy. When you finally have that wl_buffer finished and send it to the compositor, the compositor will likely just have to upload it to the GPU again, making another expensive copy. All this hassle and pain is just to get the GUI elements and the video drawn into the same wl_buffer.
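To show the per-frame work the client is stuck with, here is a minimal CPU sketch of converting one video frame to RGB and blending the GUI over it. It assumes full-range BT.601 YUV with one chroma sample per pixel (real decoders usually produce subsampled formats such as 4:2:0) and a non-premultiplied RGBA GUI layer; the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Full-range BT.601 YUV -> RGB with coefficients scaled by 1024. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    int d = u - 128, e = v - 128;
    rgb[0] = clamp_u8(y + (1436 * e) / 1024);            /* 1.402 */
    rgb[1] = clamp_u8(y - (352 * d + 731 * e) / 1024);   /* 0.344, 0.714 */
    rgb[2] = clamp_u8(y + (1815 * d) / 1024);            /* 1.772 */
}

/* The work repeated for every video frame: convert each pixel of a planar
 * YUV 4:4:4 frame and alpha-blend the RGBA GUI layer over it, producing
 * one packed RGB buffer for the single window surface. */
void compose_frame(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                   const uint8_t *gui_rgba, uint8_t *out_rgb, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
        uint8_t video[3];
        unsigned a = gui_rgba[4 * i + 3];

        yuv_to_rgb(y[i], u[i], v[i], video);
        for (int c = 0; c < 3; c++)
            out_rgb[3 * i + c] = (uint8_t)((gui_rgba[4 * i + c] * a +
                                            video[c] * (255 - a)) / 255);
    }
}
```

Every pixel of every video frame goes through this loop, even when the GUI layer has not changed.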
Another example is an OpenGL window, or an OpenGL canvas in a window. You definitely do not want to make the GL rendered buffer CPU-accessible, as that can be very expensive. The obvious workaround is to upload your other GUI elements into textures, and combine them with the GL canvas in GL. That could be fairly performant, but it is also very painful to achieve, especially if your toolkit has not been designed to work like that.
A more complex example is a Web browser, where you can have any number of video and GL widgets around the page.
Enter sub-surfaces
Sub-surface is a wl_surface role, which means the surface is an integral sub-part of a window. A sub-surface must always have a parent surface, and the parent surface can have any role. Therefore a window can be constructed from any number of wl_surface objects by choosing one of them to be the main surface, which gets a role from the shell, and making the others sub-surfaces. Nesting is also allowed, so you can have sub-sub-surfaces, etc.
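The tree structure can be modeled with plain parent pointers. This hypothetical sketch finds the main surface of any (sub-)surface and refuses to create a cycle, since a surface cannot be its own parent or an ancestor of its parent; real compositors reject such requests with a protocol error. The types are not libwayland's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a surface in a sub-surface tree. */
struct node {
    struct node *parent; /* NULL for the main surface */
};

/* Walk parent links up to the main surface, which carries the shell role. */
struct node *main_surface(struct node *n)
{
    while (n->parent)
        n = n->parent;
    return n;
}

/* Make 'surface' a sub-surface of 'parent'. Refused if the surface already
 * is a sub-surface, or if 'parent' is the surface itself or one of its
 * descendants (that would turn the tree into a cycle). */
bool make_subsurface(struct node *surface, struct node *parent)
{
    if (surface->parent)
        return false;
    for (struct node *p = parent; p; p = p->parent)
        if (p == surface)
            return false;
    surface->parent = parent;
    return true;
}
```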
The tree of sub-surfaces starting from the main surface defines a window. The application sets the sub-surface's position on the parent surface, and the compositor will keep the sub-surface glued to the parent. The compositor does not clip sub-surfaces to the parent surface. This means you could implement decorations as four surfaces around the content surface, and compared to one big surface for the decorations, you avoid wasting memory on the part that will always be behind the content surface. (This approach may have a visual downside, though.) It also means that for window management purposes, the size of the window comes from the union of the whole (sub-)surface tree.
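The following sketch computes the window extent as the union of the whole surface tree, with each sub-surface placed at its parent's position plus its own offset, as set with wl_subsurface.set_position. The types are hypothetical. The test scenario assumes an 800x600 content surface as the main surface, with a 30-pixel title bar and 10-pixel borders as four decoration sub-surfaces.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of a surface and its sub-surfaces. */
struct surf {
    int x, y;           /* offset on the parent (ignored for the main surface) */
    int width, height;  /* from the attached buffer */
    struct surf **children;
    size_t n_children;
};

struct box { int x1, y1, x2, y2; };

/* Grow *b to cover surface s placed at (ox, oy), then recurse into its
 * children. Sub-surfaces are not clipped to their parent, so every
 * surface in the tree contributes to the window extent. */
static void extend(const struct surf *s, int ox, int oy, struct box *b)
{
    if (ox < b->x1) b->x1 = ox;
    if (oy < b->y1) b->y1 = oy;
    if (ox + s->width > b->x2) b->x2 = ox + s->width;
    if (oy + s->height > b->y2) b->y2 = oy + s->height;
    for (size_t i = 0; i < s->n_children; i++) {
        const struct surf *c = s->children[i];
        extend(c, ox + c->x, oy + c->y, b);
    }
}

/* Window extent in main-surface coordinates. */
struct box window_extent(const struct surf *main_surface)
{
    struct box b = { INT_MAX, INT_MAX, INT_MIN, INT_MIN };
    extend(main_surface, 0, 0, &b);
    return b;
}
```

In that scenario the four decoration surfaces hold 44,800 pixels, while a single decoration surface covering the whole 820x640 window would hold 524,800 pixels, of which 480,000 would always sit behind the content.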
In the windowed video player example, the video can be put on a wl_surface of its own, and the decorations into another. If there are subtitles on top of the video, that could be a third wl_surface. If the compositor accepts the YUV color format the video decoder produces, you can decode straight into a wl_buffer's storage, and attach that wl_buffer to the wl_surface. No more copying or color conversions in the application. When the compositor gets the YUV buffer, it could use GLSL shaders to convert it into RGBA while it composites, or put the buffer into a hardware overlay directly. In the overlay case, the data produced by the (hardware) video decoder gets scanned out on the graphics chip zero-copy! After decoding, the data is not copied or converted even once, which is the optimal path. Of course, in practice there are many implementation details to get right before reaching the optimal path.
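The client-side decision in this example boils down to a format check. A sketch, assuming the compositor advertises the formats it accepts (for shared memory buffers, wl_shm sends them as wl_shm.format events) and that the decoder outputs NV12; enum video_path and choose_video_path() are hypothetical names. Apart from ARGB8888 (0) and XRGB8888 (1), wl_shm format codes are DRM fourcc values, so NV12 is built with the same packing below.

```c
#include <stddef.h>
#include <stdint.h>

/* DRM-style fourcc packing, also used by wl_shm for most formats. */
#define FOURCC(a, b, c, d)                                      \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                     \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define FMT_NV12 FOURCC('N', 'V', '1', '2')

enum video_path {
    PATH_DIRECT,  /* decode straight into buffers for the video sub-surface */
    PATH_CONVERT, /* fall back to converting to RGB in the client */
};

/* Pick a path given the formats the compositor advertised and the
 * format the decoder produces. */
enum video_path choose_video_path(const uint32_t *advertised, size_t n,
                                  uint32_t decoder_format)
{
    for (size_t i = 0; i < n; i++)
        if (advertised[i] == decoder_format)
            return PATH_DIRECT;
    return PATH_CONVERT;
}
```

Whether the compositor then uses a shader or an overlay for a directly attached buffer is its own decision; the client only has to avoid converting.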
Atomicity
Updates to one wl_surface are made atomic with the commit request. A tree of sub-surfaces needs to be updated atomically, too. This is especially important when resizing a window.
A sub-surface's commit request behaves specially when the sub-surface is in synchronized mode. A commit on the sub-surface's wl_surface does not immediately apply the pending surface state; instead, the pending state is cached. The cache is just another copy of the surface state, in addition to the pending and current sets of state. The cached state gets applied when the parent wl_surface gets new state applied. (Note: not directly on the parent surface's commit, but when the parent gets new state applied.) Relying on the cache mechanism, an application can submit new state for the whole tree of surfaces, and then apply it all with a single request: commit on the main surface.
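The caching rules can be captured in a small simulation. This is a simplified, hypothetical model, not libwayland or compositor code: surface state is reduced to a single integer, so merging pending state into the cache simply replaces it. It also models the rule from the wl_subsurface specification that a sub-surface with a synchronized ancestor sub-surface behaves as synchronized even when it is itself set to desynchronized mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; "state" is one int (think: a buffer id). */
struct surf {
    struct surf *parent;       /* NULL for the main surface */
    struct surf *children[4];
    size_t n_children;
    bool sync;                 /* synchronized mode, ignored on the main surface */
    int pending, cached, current;
    bool has_cached;
};

/* A sub-surface behaves as synchronized if it, or any ancestor that is
 * itself a sub-surface, is in synchronized mode. */
static bool effectively_sync(const struct surf *s)
{
    for (; s && s->parent; s = s->parent)
        if (s->sync)
            return true;
    return false;
}

/* Applying new state to a surface also applies the cached state of its
 * synchronized children, which in turn applies their children's. */
static void apply(struct surf *s, int state)
{
    s->current = state;
    for (size_t i = 0; i < s->n_children; i++) {
        struct surf *c = s->children[i];
        if (c->has_cached && effectively_sync(c)) {
            c->has_cached = false;
            apply(c, c->cached);
        }
    }
}

void commit(struct surf *s)
{
    if (effectively_sync(s)) {
        s->cached = s->pending;   /* newer state replaces the cached state */
        s->has_cached = true;
    } else {
        s->has_cached = false;    /* pending state supersedes any cache */
        apply(s, s->pending);
    }
}
```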
Input handling considerations
When a window has sub-surfaces completely overlapping with its main surface, it is often easiest to set the input region of all sub-surfaces to empty. This will cause all input events to be reported on the main surface, and in main surface coordinates. Otherwise the input events on a sub-surface are reported in the sub-surface's coordinates.
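If sub-surfaces keep a non-empty input region, a toolkit that wants everything in main surface coordinates has to translate them itself. A sketch, assuming the toolkit remembers the offset it gave each sub-surface with wl_subsurface.set_position; the types are hypothetical, and the coordinates are doubles as you would get from wl_fixed_to_double().

```c
#include <stddef.h>

/* Hypothetical record of a surface's place in the tree. */
struct surf {
    const struct surf *parent; /* NULL for the main surface */
    int x, y;                  /* position relative to the parent */
};

/* Convert surface-local coordinates to main-surface coordinates by adding
 * up the sub-surface offsets along the path to the main surface. */
void to_main_coords(const struct surf *s, double *x, double *y)
{
    for (; s->parent; s = s->parent) {
        *x += s->x;
        *y += s->y;
    }
}
```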
Independent application sub-modules
A use case that strongly affected the design of the sub-surface protocol was application plugin-level embedding. An application creates a wl_surface, turns it into a sub-surface, and gives control of that wl_surface to a sub-module or a plugin.
Let us say the plugin is a video sink running in its own thread, and the host application is a Web browser. The browser initializes the video sink and gives it the wl_surface to play on. The video sink decodes the video and pushes frames to the wl_surface. To avoid waking up the browser for every video frame and requiring it to commit on its main surface to let each video frame become visible, the browser can set the sub-surface to desynchronized mode. In desynchronized mode, commits on the sub-surface apply the pending state directly, just like without the sub-surface role. The video sink can run on its own. The browser is still able to control the sub-surface's position on the main surface, glitch-free.
However, resizing gets more complicated, which was also a cause for some criticism. When the browser decides it needs to resize the sub-surface the video sink is using, it temporarily sets the sub-surface to synchronized mode, which means the video on screen stops updating, as all surface state updates now go into the cache. Then the browser signals the new size to the video sink, and the sink acknowledges when it has committed the first buffer with the new size. In the meantime, the browser has repainted its other window parts as needed, and then commits on its main surface. This produces an atomic window update on screen. Finally the browser sets the sub-surface back to desynchronized (free-running) mode. If everything happens quickly, the result is a glitch-free resize without missing a frame. If things take time, the user still sees a window resize without any flicker, but the video content may freeze for a moment.
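The resize handshake can be traced with a hypothetical model of one window: sink_commit() stands for the video sink committing on the sub-surface, browser_commit() for the browser committing on the main surface, and sizes stand for whole surface states. The messaging between browser and sink, which is application specific, is left out.

```c
#include <stdbool.h>

/* Hypothetical model of what is on screen for one window. */
struct window {
    int main_size, video_size; /* currently shown */
    int video_cached;          /* video state waiting for the parent */
    bool video_has_cached;
    bool video_sync;           /* set_sync (true) / set_desync (false) */
};

/* Video sink commits a frame of the given size on its sub-surface. */
void sink_commit(struct window *w, int size)
{
    if (w->video_sync) {
        w->video_cached = size;     /* held back until the parent applies */
        w->video_has_cached = true;
    } else {
        w->video_size = size;       /* free-running: shown right away */
    }
}

/* Browser commits on the main surface; cached state of the synchronized
 * sub-surface is applied in the same update. */
void browser_commit(struct window *w, int main_size)
{
    w->main_size = main_size;
    if (w->video_sync && w->video_has_cached) {
        w->video_size = w->video_cached;
        w->video_has_cached = false;
    }
}
```

The sequence is: set synchronized mode, let the sink commit at the new size (cached, so the screen does not change), commit the main surface (both sizes change in one update), and return to desynchronized mode.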
Multiple input handlers
It is possible that sub-modules want to handle input on their wl_surfaces, which happen to be sub-surfaces. Sub-modules may even create new wl_surfaces, regardless of whether they will be part of the sub-surface tree of a window or not. In such cases, there are a couple of catches.
The first catch is that when input focus moves to a sub-surface, the input events are given in that surface's coordinates, as mentioned before.
The bigger catch is how input actually targets surfaces in the client side code. Actual input events for keyboards and pointer devices do not carry the target wl_surface as a parameter. The targeted surface is given by enter events, wl_pointer.enter(surface) for instance. In C code, it means a callback with the following signature gets called:
void pointer_enter(void *data, struct wl_pointer *wl_pointer, uint32_t serial, struct wl_surface *surface, wl_fixed_t surface_x, wl_fixed_t surface_y);
You get a struct wl_surface * saying which surface the following pointer events will target. I assume that toolkits will call wl_surface_get_user_data(surface) to get a pointer to their internal structure, and then continue with that.
What if the wl_surface was not created by the toolkit to begin with? What if the surface was created by a sub-module, or a sub-module unexpectedly set a non-empty input region on a sub-surface? Then get_user_data will give you a pointer that points to something other than what you thought, and the application will likely crash.
When a toolkit gets an enter event for a surface it does not know about, it must not try to use the user_data pointer. I see two obvious ways to detect such surfaces: maintain a hash table of known wl_surface pointers, or use a magic value in the beginning of the struct used as user_data. Neither is nice, but I do not see a way around it, and this is not limited to sub-surfaces or sub-sub-surfaces. Enter events may refer to any wl_surface objects created through the Wayland connection.
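The magic value approach could look like this sketch. TOOLKIT_SURFACE_MAGIC, struct toolkit_surface and toolkit_surface_from_user_data() are hypothetical; in a pointer enter handler you would pass it the result of wl_surface_get_user_data(surface) and ignore the event when it returns NULL. It shows why the approach is not nice: it still reads memory through a pointer the toolkit does not own, so it is only safe if every user_data in the process is NULL or points to at least four readable bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary constant chosen for this sketch ("TKSF"). */
#define TOOLKIT_SURFACE_MAGIC 0x544b5346u

/* Hypothetical toolkit-private data stored as wl_surface user_data. */
struct toolkit_surface {
    uint32_t magic; /* must be the first member */
    int id;
};

/* Return the toolkit struct only if user_data starts with our magic value.
 * memcpy avoids reading a foreign object through a mismatched type. */
struct toolkit_surface *toolkit_surface_from_user_data(void *user_data)
{
    uint32_t magic;

    if (user_data == NULL)
        return NULL;
    memcpy(&magic, user_data, sizeof magic);
    return magic == TOOLKIT_SURFACE_MAGIC ? user_data : NULL;
}
```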
Therefore I would propose the following:
- Always be prepared to receive an unknown wl_surface on enter and similar events.
- When writing sub-modules and plugin interfaces, specify whether input is allowed, and whose responsibility it is to set the input region to empty.
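The other approach, a table of known wl_surface pointers, can be sketched as a small open-addressing hash set. Everything here is hypothetical: a fixed capacity, no removal (a real toolkit must remove a surface before destroying it, or a recycled address could be mistaken for a known surface), and struct wl_surface used only as an opaque pointer so the sketch does not depend on wayland-client.h. The enter helper follows the first proposal: unknown surfaces are ignored instead of having their user_data dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct wl_surface; /* opaque here; never dereferenced */

#define KNOWN_CAP 64 /* must be a power of two */

struct known_surfaces {
    const struct wl_surface *slots[KNOWN_CAP];
    size_t count;
};

static size_t slot_for(const struct wl_surface *s)
{
    uintptr_t h = (uintptr_t)s;
    h ^= h >> 4; /* mix in bits above the alignment zeros */
    return (size_t)(h * 2654435761u) & (KNOWN_CAP - 1);
}

/* Register a surface the toolkit created. Keeps one slot free so that
 * lookups always terminate. */
bool known_add(struct known_surfaces *k, const struct wl_surface *s)
{
    size_t i = slot_for(s);

    while (k->slots[i] != NULL && k->slots[i] != s)
        i = (i + 1) & (KNOWN_CAP - 1);
    if (k->slots[i] == s)
        return true;
    if (k->count >= KNOWN_CAP - 1)
        return false;
    k->slots[i] = s;
    k->count++;
    return true;
}

bool known_contains(const struct known_surfaces *k, const struct wl_surface *s)
{
    for (size_t i = slot_for(s); k->slots[i] != NULL; i = (i + 1) & (KNOWN_CAP - 1))
        if (k->slots[i] == s)
            return true;
    return false;
}

/* Call from the wl_pointer.enter handler: only surfaces the toolkit
 * registered become the focus; anything else clears it. */
bool toolkit_pointer_enter(const struct known_surfaces *k,
                           const struct wl_surface *surface,
                           const struct wl_surface **focus)
{
    if (!known_contains(k, surface)) {
        *focus = NULL;
        return false;
    }
    *focus = surface;
    return true;
}
```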
Out of scope
When I started designing the sub-surface protocol, a huge question was what to leave out of it. The following are not provided by sub-surfaces:
- Embedding content from other Wayland clients. The sub-surface extension does not implement any "foreign surface" interfaces, or anything like what X allows by just taking the Window XID and passing it to another client to use. The current consensus seems to be that this should be solved by implementing a mini-compositor in the hosting application.
- Clipping or scaling. The buffer you attach to a sub-surface will decide the size of the sub-surface. There is another extension coming for clipping and scaling.
- Any kind of message passing between application components. That is better solved in application-specific ways.
Summary
Sub-surfaces are intended for special cases, where you need to build a window from several buffers that are composited together, to make efficient use of the hardware resources. They are not meant for widgets in general, nor for pushing parts of application rendering to the compositor. Sub-surfaces are also not meant for things that are not integral parts of a window, like tooltips, menus, or drop-down boxes. These "transient" surface types should be offered by the shell protocol.
Thanks to Collabora, reviewers on wayland-devel@ and in IRC, my work colleagues, and everyone who has helped me with this. Special thanks to Giulio Camuffo for testing the decorations in 4 sub-surfaces use case. I hope I didn't forget anyone.