The Home of Web Automation

How Selenium Works - High Level Architecture

Selenium WebDriver undoubtedly remains the king/queen of web automation and testing. Tooling, frameworks, a community, job postings and millions of people interacting with it: a whole ecosystem rests on the shoulders of a project that started back in 2004. That shows that, powered by the effort of all the people maintaining and contributing to it, it is that good at what it aims to do.

Even with an introduction this lean, and regardless of whether you currently work with Selenium, it stands to reason that taking a peek at how the architecture of this tool turned out will only be to your benefit. Enjoy the view.

The High Level

First off, most people who use Selenium directly, meaning through a client library, are writing scripts in their favorite language that describe specific steps and interactions they want to simulate on a real browser instance. Those could range from navigating to pages and clicking elements to submitting forms and scraping content without permission.

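import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver(); // Start a new Chrome instance

driver.get("https://www.selenium.dev/projects/"); // Open this URL in the browser instance

WebElement searchElement = driver.findElement(By.name("search")); // Find the hipster search of this website
searchElement.sendKeys("firefox"); // Fill the input with the search term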

The above is just a simple example of Java code that QA engineers are used to writing as part of their day-to-day responsibilities. You run this script (with the rest of the boilerplate) and you get a Chrome window on the Selenium website with the term "firefox" in the search field. Straightforward. The same API is also exposed through bindings for other languages, such as JavaScript, Python and Ruby.

For some people, though, the path these commands take to reach the browser, and how the parts along that path work together, is a bit of a blurry picture.

We can start clearing the blur with the illustration below:

[Illustration: Selenium architecture]

Some of the components might seem familiar, others not so much. We are going to break them down one by one to connect the dots and fill any gaps in your understanding of Selenium WebDriver.

The Components

  1) Client Libraries

    Client libraries, otherwise known as language bindings, are libraries that allow developers to use the high-level Selenium API in their language of choice. Without going into all the details, what "client libraries" usually provide is a set of wrappers over some transport mechanism that communicates with a fixed set of endpoints. Think of them as a thin layer that, most of the time, translates method calls in your language into HTTP requests.

    This seems to be the case at least with the JavaScript wrapper of WebDriver.

  2) JsonWire Protocol

    As mentioned in (1), client libraries use a specific HTTP API, kinda REST-ish as David Burns says, to communicate their desired commands to the browser driver, which we will speak about in a moment.

    To prevent chaos, pain and the inevitable deviations between client library authors and maintainers, a system needs to define a standardized way to communicate with the outside world. For WebDriver, the way to communicate as uniformly as possible with browser drivers was to implement the JsonWire protocol and, as you can imagine by now, it is JSON over HTTP. The specification for these API endpoints can be found here.

    To see how straightforward it can be, here is a small example:

    /*
     * Send the command to the session with id=sessionId
     * to initiate a navigation to the "url" body parameter
     */
    POST /session/:sessionId/url
    {
      "url": "https://example.com"
    }

    If you visited the JsonWire specification page on GitHub, you must have noticed the OBSOLETE warning, but do not falter. It does not mean that the project is abandoned or that the information there is irrelevant. On the contrary, it has taken a big step forward. Widely used as it came to be, with a huge ecosystem of tech and people behind it, the protocol evolved into the W3C WebDriver specification.

    That makes it an official protocol that user agents aim to implement and conform to, so that programs can remotely instruct the behaviour of web browsers. A huge success and recognition for the WebDriver project, in my opinion.

  3) Browser Driver

    The browser driver can be considered the "backend" of the WebDriver architecture. It acts as the end host of the JsonWire protocol, accepting commands in the form we showed earlier and responding in kind with values, errors, stats or anything else the protocol defines. On one side, it works exactly like a regular standalone HTTP server for the client libraries. On the other side, it uses a browser-specific protocol to send commands to, and receive responses from, the actual browser instance as it instruments its behaviour.

    As you might expect, or have noticed while trying to use Selenium, there are specific drivers for different browsers that implement the meaty part: actually sending the commands received from the client libraries to the browser instance. For example, there is GeckoDriver, which supports Firefox, ChromeDriver, which supports Chrome/Chromium, OperaDriver for Opera, and more.

    Each of these browser drivers uses its own way and implementation to communicate with the actual browser instance. Some examples:

    - ChromeDriver uses the DevTools Protocol.
    - GeckoDriver uses the Marionette remote protocol.
    - OperaDriver (for Opera >= 26) is based on ChromeDriver with some adaptations.

    As you can understand, some of the drivers may not be open sourced by the companies maintaining them, but I am confident you can find more information if you look through the pages and documentation for each one.

  4) Browser

    The actual browser instance, which receives the commands from its respective browser driver and finally carries out the web automation task we planned all along.
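To tie the components together, here is a minimal sketch of what a client library might do under the hood for a single command: turn a method call into the JSON-over-HTTP request shown in the protocol example. Everything named here is an assumption made for illustration. The class `TinyClientSketch` and its helpers are hypothetical and not part of any Selenium binding, `http://localhost:9515` assumes a browser driver listening locally on that port, and `abc123` is a placeholder session id. The request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration; not part of any Selenium client library.
final class TinyClientSketch {

    // Builds the JSON body for the navigation command.
    // Naive string concatenation: assumes pageUrl contains no characters
    // that would need JSON escaping.
    public static String navigateBody(String pageUrl) {
        return "{\"url\": \"" + pageUrl + "\"}";
    }

    // Builds (but does not send) the request for POST /session/:sessionId/url.
    public static HttpRequest navigateTo(String driverUrl, String sessionId, String pageUrl) {
        URI endpoint = URI.create(driverUrl + "/session/" + sessionId + "/url");
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(navigateBody(pageUrl)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed driver address and placeholder session id.
        HttpRequest request = navigateTo("http://localhost:9515", "abc123", "https://example.com");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(navigateBody("https://example.com"));
    }
}
```

Running it prints `POST http://localhost:9515/session/abc123/url` followed by `{"url": "https://example.com"}`. A real binding does much more (session creation, error handling, response parsing), but the core idea is the same: a call like `driver.get(...)` ends up as an HTTP request to the browser driver. Actually sending the request could be done with `java.net.http.HttpClient`, provided a driver is running at the assumed address and the session id is real.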

Closing

That wraps up our close look at the high-level components that make up the 'Selenium Architecture' as I see it. I hope you enjoyed it as much as I did while looking into how all the actors come together and how much work and diligence goes into the development and maintenance of each project.
