There are many misconceptions about Wayland, and I want to try to correct one. Let's start with the statement:
There is no object in the Wayland protocol that corresponds to Window in X.
Surprised? We need to take a step back to explain what that really means, and I will do it with the help of an example of a complex application: Firefox.
Take a Firefox instance: one window with several tabs open. The active tab is playing a Flash video.
Notice what I called a "window". It is the window from the end user's perspective; it is "the Firefox" on screen. If there were a text terminal window open, too, that would be another "window". Let's call "the Firefox" an application window.
In X (I imagine, but I do not know if this is really the case) the application window would be an instance of the Xlib data type Window. This Window contains lots of child Windows. For example, every button could be a separate Window. Every scrollable text box could be a separate Window that is only partially visible. Every tab is probably a Window, too. The hierarchy, the relationships, and the event masks are all sent to the X server, so that the X server can dispatch input events to the right Windows and render all those Windows on screen.
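To make the idea of a server-side window hierarchy concrete, here is a minimal sketch in C. It is a toy model, not Xlib: the type toy_window and the function toy_pick are hypothetical names invented for this illustration, and a real X server does far more (event masks, grabs, stacking order and so on). It only shows how a server that knows the whole tree could find the innermost window under the pointer.

```c
#include <stddef.h>

/* Toy model, NOT Xlib: a window the server knows about,
 * positioned relative to its parent window. */
struct toy_window {
    const char *name;
    int x, y, width, height;           /* geometry relative to the parent */
    const struct toy_window *children; /* array; later entries are on top */
    size_t n_children;
};

/* Return the deepest window that contains the point (px, py), where the
 * point is given in the coordinate space of w's parent. Returns NULL if
 * the point lies outside w. */
const struct toy_window *toy_pick(const struct toy_window *w, int px, int py)
{
    int lx = px - w->x; /* translate into w's own coordinate space */
    int ly = py - w->y;

    if (lx < 0 || ly < 0 || lx >= w->width || ly >= w->height)
        return NULL;

    /* Check the topmost child first. */
    for (size_t i = w->n_children; i-- > 0; ) {
        const struct toy_window *hit = toy_pick(&w->children[i], lx, ly);
        if (hit)
            return hit;
    }
    return w; /* no child claimed the point, so it belongs to w itself */
}
```

With an application window at (100, 100) that contains a tab at (0, 30), which in turn contains a button at (10, 10), a pointer position of (115, 145) resolves to the button. The point is that this lookup happens in the server, because the server was told about every one of those windows.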
Wayland is nothing like that.
Notice how I said the X server renders stuff? A Wayland compositor does not render that stuff. Wayland is not a rendering API.
In Wayland, the whole application window is a wl_surface. Period. There are no child surfaces (except perhaps menus, but those are an exception, because they can extend beyond the application window). All rendering must happen client-side. The application will acquire a new wl_buffer, render everything into it, including even the video image and the window decorations, and then do a single attach request to get the updated application window contents on screen, in one go, without flicker. Yes, that really must be done for every video frame played (though you can cache the non-changing graphics client-side to avoid full redraws).
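The client-side flow described above can be sketched as a toy model in C. This is not the libwayland API: toy_buffer, toy_surface, toy_fill_rect, toy_draw_frame and toy_attach are all hypothetical names made up for this illustration, and the tiny 8x4 buffer stands in for a real wl_buffer. The sketch only shows the shape of the work: the client paints decorations and video into one buffer, then hands the finished buffer over in a single step.

```c
#include <stdint.h>

enum { BUF_W = 8, BUF_H = 4 }; /* tiny size, for illustration only */

/* Toy stand-in for a wl_buffer: plain pixels owned by the client. */
struct toy_buffer {
    uint32_t pixels[BUF_W * BUF_H];
};

/* Toy stand-in for a wl_surface: it only remembers the attached buffer. */
struct toy_surface {
    const struct toy_buffer *current;
    unsigned attach_count;
};

/* Fill a rectangle, clipped to the buffer bounds. */
static void toy_fill_rect(struct toy_buffer *buf, int x, int y,
                          int w, int h, uint32_t color)
{
    for (int row = y; row < y + h; row++) {
        for (int col = x; col < x + w; col++) {
            if (row >= 0 && row < BUF_H && col >= 0 && col < BUF_W)
                buf->pixels[row * BUF_W + col] = color;
        }
    }
}

/* The client renders everything itself: first the decorations and UI
 * chrome, then the current video frame on top, all into one buffer. */
void toy_draw_frame(struct toy_buffer *buf, uint32_t chrome_color,
                    uint32_t video_color)
{
    toy_fill_rect(buf, 0, 0, BUF_W, BUF_H, chrome_color);
    toy_fill_rect(buf, 1, 1, BUF_W - 2, BUF_H - 2, video_color);
}

/* One step puts the complete frame on the surface. */
void toy_attach(struct toy_surface *surface, const struct toy_buffer *buf)
{
    surface->current = buf;
    surface->attach_count++;
}
```

For each video frame the client would call toy_draw_frame and then toy_attach exactly once. The compositor in this model never sees buttons, tabs or a video widget, only finished pixels.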
The simple statement that wl_surface corresponds to a window is roughly correct when one means an application window. There is nothing like an Xlib Window, unless you invent one in your toolkit, and then it will be specific to your toolkit and never transmitted over the Wayland protocol.