I was replying to an email, and got side-tracked into writing some Wayland anti-FUD. There are lots of myths about Wayland out there, so I thought to better make it into a blog post.
This post is about the very small overhead of a Wayland (system) compositor, and why Wayland over network will be much better than X-over-ssh.
I predict that on desktops and other systems that may have accounts for more than one person, there will actually be two Wayland compositors stacked. There is a system compositor at the bottom, handling fast user switching, replacing VT switching, etc., and then a session compositor that actually provides the desktop environment. This is not my idea, it has been written in the Wayland FAQ under "Is Wayland replacing the X server?" for a long time.
My point is: Wayland compositors will not make 3D games suck because of compositing. While explaining why, I also continue to explaining why network transparency will not suck either. Now, do not mix up these things, I am not claiming that remoting 3D games over network will magically become feasible.
The overhead of adding a system compositor in the Wayland stack will be very small. A system compositor normally does not do any real work for compositing, it only takes the buffer handle from a client compositor, and flips it onto the screen. No rendering and no image copying involved in the system compositor.
It is the same with a full-screen game vs. any Wayland compositor: the compositor will not do any real work. A game renders its image into a buffer, passes the buffer handle to the compositor, and the compositor tells the hardware to scan out the buffer. No extra copying, no extra rendering.
The overhead that will appear with adding a system compositor, is relaying input events and buffer flips. The amount of data is small, and at least buffer flips will happen at most once per vertical refresh per monitor. There is also the idea of relaying input events only once per frame. This means that CPU process context switches will increase only by few per frame, when adding another compositor in the stack. Ideally the increase is 2 per frame: a switch to system compositor, system compositor handles input and output, and a switch back.
The overhead can be this small, because the protocol has been designed to avoid round-trips. A round-trip means that one process is waiting for another to reply before it can continue. The protocol also favors batching: accumulate a bunch of data, and then send as a batch. Both of these principles minimize the number CPU process context switches.
Because of these design principles, no Wayland developer is worried about the performance of a possible network transparency layer. Minimizing CPU context switches translates directly to minimizing the effect of network latency. Some believe, that even a simple Wayland network transport which practically just relays the Wayland protocol messages as is, and adds transferring of buffer data, will clearly outperform the traditional X-over-ssh.
Now, if you still claim that X-over-ssh would be better, you a) underestimate the effect of latency, and b) forget that modern applications do not send small rendering commands through the X protocol like 10-20 years ago. Modern applications render their content client-side, and send images to the X server. Wayland simply makes images the only way to send content to the server, allowing to drop the whole rendering machinery from the server and avoiding a huge amount of protocol.