11 May 2012

Wayland anti-FUD

I was replying to an email and got side-tracked into writing some Wayland anti-FUD. There are lots of myths about Wayland out there, so I thought it better to make it into a blog post.

This post is about the very small overhead of a Wayland (system) compositor, and why Wayland over network will be much better than X-over-ssh.

I predict that on desktops and other systems that may have accounts for more than one person, there will actually be two Wayland compositors stacked. There is a system compositor at the bottom, handling fast user switching, replacing VT switching, etc., and then a session compositor that actually provides the desktop environment. This is not my idea; it has been in the Wayland FAQ under "Is Wayland replacing the X server?" for a long time.

My point is: Wayland compositors will not make 3D games suck because of compositing. While explaining why, I also go on to explain why network transparency will not suck either. Now, do not mix these things up: I am not claiming that remoting 3D games over a network will magically become feasible.

The overhead of adding a system compositor to the Wayland stack will be very small. A system compositor normally does not do any real compositing work: it only takes the buffer handle from a client compositor and flips it onto the screen. No rendering and no image copying is involved in the system compositor.

It is the same with a full-screen game vs. any Wayland compositor: the compositor will not do any real work. A game renders its image into a buffer, passes the buffer handle to the compositor, and the compositor tells the hardware to scan out the buffer. No extra copying, no extra rendering.
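The pass-through path described above can be illustrated with a toy model. This is only a sketch: the classes `Buffer`, `Display`, `SystemCompositor` and `SessionCompositor` and their methods are hypothetical names invented for illustration, not Wayland or libwayland API, and a Python object reference stands in for a real buffer handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    """Stand-in for a client-rendered buffer; `pixels` is the image data."""
    pixels: bytearray


class Display:
    """Hypothetical scan-out hardware: it only remembers which buffer to show."""

    def __init__(self) -> None:
        self.scanout: Optional[Buffer] = None

    def flip(self, buf: Buffer) -> None:
        # Store the handle only; no pixel data is copied.
        self.scanout = buf


class SystemCompositor:
    """Bottom of the stack: forwards the handle to the hardware."""

    def __init__(self, display: Display) -> None:
        self.display = display

    def present(self, buf: Buffer) -> None:
        # No rendering and no copying in the system compositor.
        self.display.flip(buf)


class SessionCompositor:
    """Desktop environment compositor, a client of the system compositor."""

    def __init__(self, parent: SystemCompositor) -> None:
        self.parent = parent

    def present_fullscreen(self, buf: Buffer) -> None:
        # A full-screen client needs no compositing, so its buffer passes through.
        self.parent.present(buf)


display = Display()
session = SessionCompositor(SystemCompositor(display))
frame = Buffer(bytearray(1920 * 1080 * 4))  # the game's rendered frame
session.present_fullscreen(frame)
print(display.scanout is frame)  # True
```

In this model the object that ends up on the display is the very buffer the game rendered into, which is the "no extra copying, no extra rendering" point.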

The overhead that does appear when adding a system compositor is the relaying of input events and buffer flips. The amount of data is small, and at least buffer flips will happen at most once per vertical refresh per monitor. There is also the idea of relaying input events only once per frame. This means that CPU process context switches will increase by only a few per frame when another compositor is added to the stack. Ideally the increase is 2 per frame: a switch to the system compositor, which handles input and output, and a switch back.
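As a rough worked example of that ideal case, the extra context switches per second follow directly from the refresh rate. The function name and the 60 Hz figure below are assumptions chosen for illustration.

```python
def extra_context_switches_per_second(refresh_hz: float, switches_per_frame: int = 2) -> float:
    """Ideal case from the text: one switch into the system compositor
    and one switch back, once per displayed frame."""
    return refresh_hz * switches_per_frame


print(extra_context_switches_per_second(60))  # 120
```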

The overhead can be this small because the protocol has been designed to avoid round-trips. A round-trip means that one process is waiting for another to reply before it can continue. The protocol also favors batching: accumulate a bunch of data, and then send it as a batch. Both of these principles minimize the number of CPU process context switches.
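The effect of batching can be sketched by counting how often the receiving process has to be woken up. This is a simplified model, not how any compositor is implemented: `count_deliveries` is a hypothetical helper, the input source is an assumed device reporting 1000 times per second, and each delivery is taken as a proxy for one wake-up of the receiver.

```python
def count_deliveries(event_times_ms, frame_period_ms=None):
    """Count receiver wake-ups for a list of event timestamps in milliseconds.

    frame_period_ms=None: every event is delivered on its own.
    Otherwise: events are grouped by frame, one delivery per non-empty frame.
    """
    if frame_period_ms is None:
        return len(event_times_ms)
    frames = {int(t // frame_period_ms) for t in event_times_ms}
    return len(frames)


# One second of events from an assumed 1000 Hz device: t = 0, 1, ..., 999 ms.
events = [float(i) for i in range(1000)]

print(count_deliveries(events))             # 1000
print(count_deliveries(events, 1000 / 60))  # 60
```

Under these assumptions, batching per 60 Hz frame cuts the wake-ups from one per event to one per frame, at the cost of delaying each event until the end of its frame.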

Because of these design principles, no Wayland developer is worried about the performance of a possible network transparency layer. Minimizing CPU context switches translates directly to minimizing the effect of network latency. Some believe that even a simple Wayland network transport, which practically just relays the Wayland protocol messages as is and adds the transfer of buffer data, will clearly outperform traditional X-over-ssh.
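To see why round-trips matter so much over a network, consider a deliberately crude latency model. It ignores bandwidth, the cost of transferring buffer data, and processing time; the function names and the 50 ms round-trip time are assumptions picked for illustration, not measurements.

```python
def blocking_time_ms(n_requests: int, rtt_ms: float) -> float:
    """Each request waits for its reply before the next one is sent,
    so every request costs one full round-trip."""
    return n_requests * rtt_ms


def pipelined_time_ms(n_requests: int, rtt_ms: float) -> float:
    """All requests are sent as one batch and no reply is awaited,
    so they arrive after a single one-way trip."""
    return rtt_ms / 2 if n_requests else 0.0


print(blocking_time_ms(20, 50.0))   # 1000.0
print(pipelined_time_ms(20, 50.0))  # 25.0
```

In this model the blocking cost grows with the number of requests, while the batched cost stays at one one-way trip, which is the sense in which avoiding round-trips hides network latency.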

Now, if you still claim that X-over-ssh would be better, you a) underestimate the effect of latency, and b) forget that modern applications do not send small rendering commands through the X protocol like they did 10-20 years ago. Modern applications render their content client-side and send images to the X server. Wayland simply makes images the only way to send content to the server, which allows the whole rendering machinery to be dropped from the server and avoids a huge amount of protocol.


Mathias said...

Is support for network transparency actually planned for Wayland, or are you just speaking theoretically?

pq said...

Mathias, I have no doubt it will be done, but all main developers have more important issues to solve and implement first.

pq said...

datenwolf, I got your private commentary, but I could not find any way to get back to you. If you want a personal reply, throw me an email.

Scot said...

I think the FUD around the network layer has almost nothing to do with concerns over its efficiency. The FUD mostly comes from concern over whether or not it will ever actually exist. X-over-ssh works now and works well, and when bandwidth is an issue, NoMachine's NX protocol works well as a supplement (unless you're running Gnome 3/Unity).

Having no doubt that it will be done won't allay FUD the way a running "Wayland Network Protocol" project would.

aaron said...

Do you mean to say Wayland will indeed have its own remoting with security, or will it be something that can be tunneled through SSH as well? My concern would be security, not performance.

pq said...

Scot, yeah, I agree. But getting the core protocol up to 1.0 is the priority right now.

Aaron, it depends. The first attempts will likely be tunneled through ssh. Whether it would be useful enough to write a new secure protocol, I don't know. Maybe some later network transports will, while others won't and will use ssh or openssl pipes or something. Anyway, whatever is done will not affect the Wayland protocol.

You can be sure that security is on the developers' minds all the time. Also, I would not say that Wayland (protocol or libwayland) will have remoting. It can be implemented in compositors (or another shared library), and clients could connect to an automatically launched local proxy-server, but it won't have anything to do with core Wayland.

Unknown said...

I use X-over-SSH frequently, running such programs as OpenOffice and Thunar on my home server. Are these sorts of programs already sending images instead of draw calls? I would not have guessed that.

A reasonable alternative to X-over-SSH might be, say, GTK-over-SSH or Qt-over-SSH. Is this possibility being pursued?

solenskiner said...

>There is also the idea of relaying input events only once per frame.

Once per application frame, or screen frame?

Gamers often disable vertical sync to increase their fps. This also has the effect that the frame they see reflects more recent input data, with less input lag. If Wayland only relays input once per screen refresh, it will turn expensive 1000 Hz mice into 60 Hz mice and increase input lag by up to 16 ms. This is a big problem.

pq said...

Unknown: GTK, Qt and EFL use images; OpenOffice I don't know about. I've never heard of any toolkit-over-ssh. Note that latency is what hurts remote X, so if you're on a low-latency network, it might not be so bad.

solenskiner, yeah, I bet someone will start yelling about that. There are also lots of myths around the issue, and not being a hardcore FPS gamer myself, I have a hard time believing some of that stuff. Anyway, I have wondered about input latency myself; I guess we will have to see. Changing when input is sent will not break the protocol, so we can adjust that later if needed. Also, I haven't seen what the latest plan for "raw input events" (e.g. unaccelerated motion, most likely wanted by games etc.) is; maybe those could be sent without delay when subscribed.

Ole Laursen said...

Regarding latency, there is a related issue when you connect a MIDI keyboard to a software sound synthesizer (also called virtual instruments).

There's some latency from USB, from the sound synthesis itself, and from the sound card and driver, maybe on the order of 3-25 ms. Now, I don't know if these events would go via Wayland, but if they do, 16 ms extra from the windowing system is a big problem. I saw a study somewhere, can't recall where, but I think it was at around 40 ms latency that two people playing serious music together start having trouble coordinating.

Anyway, I'm a bit curious you're aiming for 1.0 of the protocol without actual usage feedback in some areas, but I guess the spec is flexible enough you can dump some things and move on if it doesn't pan out?

pq said...

Ole, audio won't go through Wayland, and I don't see any benefit in passing MIDI keyboard events through Wayland, either, only downsides like you pointed out.

IIRC the plan is to have 0.9x releases for feedback when the protocol looks good to developers, and then later make 1.0 to set the core protocol in stone.

beroal said...

I would second what solenskiner and Ole Laursen said. It is even more severe on cheap hardware. For a frame rate of 60 Hz, the frame period is 17e-3 second. A shooter with a delay of 100e-3 second is useless, so every saving counts. A solution can be as follows. A Wayland server can accumulate events for a smaller period, for example, 1e-4 second. I see no need to tie this period to frames.
