Testing ehrQL autocomplete | Bennett Institute for Applied Data Science

This blog post is primarily aimed at a technical audience such as software developers.

When writing code, autocomplete (aka IntelliSense) is invaluable. It helps you write valid code, it allows you to easily discover the available methods and properties for an object, and it provides guidance on which parameters you can pass to a method. Most software developers would be lost without it.

OpenSAFELY researchers write code to answer their research questions. One part of that code is ehrQL, our query language for electronic health record data (see this blog and these docs for more info). We recently made major improvements to the autocomplete features for ehrQL.

This is all well and good, but here at the Bennett Institute we like testing our code. We have a large suite of comprehensive tests - 3600 and counting - and changes to the ehrQL codebase cannot be merged until all the tests pass. It is therefore very helpful to have tests that confirm the expected autocomplete behaviour for various ehrQL elements. They give us confidence that, as we change the code in future, autocomplete will still work as expected, rather than us inadvertently breaking it and waiting for an end user to discover the problem.

But can’t you just use type hints?

ehrQL is built with Python, and although Python is a dynamically typed language, it has support for type hints, which improve the ability of autocomplete engines to correctly guess what types things are. Once the autocomplete engine knows the type of a thing, it knows all the possible attributes for that thing. If type hints alone were enough, testing this behaviour would be less useful - we’d effectively be testing the autocomplete engine itself, rather than the expected behaviour of autocomplete in ehrQL. However, many parts of ehrQL are written (with good reason) in such a way that things like column types are only determined at run time, so we need a way to check that the types at run time are the same as the types the autocomplete engine infers.

As a more concrete example, we have a class in ehrQL called DatePatientSeries. It represents a column of dates with one row per person, and has various methods and properties, such as .year which converts all the dates to just the year (years are integers, so this becomes an IntPatientSeries), and is_after(another_date) which returns a column of Trues and Falses (a BoolPatientSeries). We use the DatePatientSeries type for the column date_of_birth on the patients table. If the autocomplete engine knows that date_of_birth is a DatePatientSeries, then it knows that if you type patients.date_of_birth. that things like year and is_after() are valid completions. However, because the type is only determined when the code is executed, we need a way to determine if a particular ehrQL object is considered the correct type by the autocomplete engine, without executing the code.

Fortunately, the bit of your favourite integrated development environment (IDE) that provides the autocomplete is typically a standalone piece of software, called a “language server”, that communicates with your IDE via a standard protocol. Whenever you are writing code in an editor, the IDE sends the contents of your file and the cursor position to the language server via JSON-RPC, and receives back the information to display to the user - whether that is the list of available methods and properties, the type of the currently hovered item, or the signature of the currently typed method. This means that we can spin up a language server, pass it chunks of ehrQL code, and confirm the expected types are returned. In VSCode, Python autocomplete is provided by the Pylance extension, which is built on the open-source pyright language server, and because VSCode is our recommended IDE for researchers in OpenSAFELY, we use pyright for our tests.
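To make the JSON-RPC exchange more concrete, here is a minimal sketch of how a hover request could be built and framed before being written to a language server's standard input. The function names, the file URI and the cursor position are illustrative assumptions, not code from ehrQL; the message shape (a textDocument/hover request preceded by a Content-Length header) follows the Language Server Protocol.

```python
import json


def hover_request(request_id, uri, line, character):
    """Build a JSON-RPC request asking for the hover info at a position.

    Positions in the Language Server Protocol are zero-based.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/hover",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


def frame_message(payload):
    """Prefix the JSON body with the Content-Length header the protocol requires."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def read_message(data):
    """Decode one framed message (the inverse of frame_message)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for header_line in header.split(b"\r\n"):
        name, _, value = header_line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.decode("ascii").strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Hypothetical example: hover over `date_of_birth` in `patients.date_of_birth`
# on the first line of an (assumed) file.
request = hover_request(1, "file:///tmp/example.py", line=0, character=10)
framed = frame_message(request)
```

In a real session the client would first send an initialize request and a textDocument/didOpen notification containing the file's text; the hover response then carries the inferred type as text for the test to inspect.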

So finally, here is an example of how we test some of the autocomplete features:

We maintain a file containing all statements where we have implemented autocomplete.

Each statement in the file is followed by a structured comment showing the expected type for that statement, e.g.:

patients.date_of_birth  ## type:DatePatientSeries
patients.date_of_birth.year  ## type:IntPatientSeries
patients.age_on("2025-01-01")  ## type:IntPatientSeries

We then have a Python test that, for each line in the file:
- Parses the structured comment to get the expected type
- Passes the code to the language server to get the actual autocomplete type
- Checks that they’re the same
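The three steps above can be sketched roughly as follows. This is an illustration rather than the actual ehrQL test: the regular expression, the function names and the get_inferred_type callable (which stands in for the round trip to the language server) are all assumptions.

```python
import re

# Matches "<code>  ## type:<TypeName>", tolerating optional spaces around "##".
TYPE_COMMENT = re.compile(r"^(?P<code>.*?)\s*##\s*type:\s*(?P<type>\w+)\s*$")


def parse_expectation(line):
    """Return (code, expected_type), or None if the line has no type comment."""
    match = TYPE_COMMENT.match(line)
    if match is None:
        return None
    return match["code"], match["type"]


def check_lines(lines, get_inferred_type):
    """Compare the expected type on each line with the type reported for it.

    get_inferred_type is a hypothetical callable that would ask the language
    server for the type of a piece of code. Returns a list of
    (code, expected, actual) tuples for every mismatch.
    """
    failures = []
    for line in lines:
        parsed = parse_expectation(line)
        if parsed is None:
            continue  # blank lines and ordinary comments carry no expectation
        code, expected = parsed
        actual = get_inferred_type(code)
        if actual != expected:
            failures.append((code, expected, actual))
    return failures
```

A test would then assert that check_lines returns an empty list, so that every mismatch is reported with the offending statement alongside the expected and actual types.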

This works for existing features, but what if we add something new? For that we use a pattern that’s common in many of the tests in our test suite:

- Confirm that all the things we currently know about are well tested.
- Ensure that if new things are added, a test will fail until those new things are either tested, or explicitly ignored in some way.

To finish the autocomplete tests, we parse various files within our codebase to produce a comprehensive list of all table columns, properties and methods. We can then check that each of these is either included in the test file, or is in a list of ignored items. Therefore, if someone adds anything new to ehrQL, our tests will fail until that thing is added to our autocomplete tests, or explicitly ignored. This gives us the confidence that autocomplete works as we expect, and that we’re unlikely to break it. There is also the added benefit that the list of ignored things keeps track of the few remaining bits and pieces of ehrQL where we haven’t implemented autocomplete, and so we know what to work on if we want to make it even better.
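As a rough sketch of this "tested or explicitly ignored" pattern, the check boils down to a set difference. Note that the example below discovers members by introspecting a class with the standard inspect module, whereas the approach described above parses files in the codebase; the toy class and all names here are hypothetical.

```python
import inspect


def public_members(cls):
    """Return the qualified names of a class's public attributes and methods."""
    return {
        f"{cls.__name__}.{name}"
        for name, _ in inspect.getmembers(cls)
        if not name.startswith("_")
    }


def untested_members(classes, tested, ignored):
    """List every discovered member that is neither tested nor explicitly ignored."""
    discovered = set()
    for cls in classes:
        discovered |= public_members(cls)
    return sorted(discovered - set(tested) - set(ignored))


class ToyDateSeries:
    """Toy stand-in for a series class; not the real ehrQL implementation."""

    @property
    def year(self):
        ...

    def is_after(self, other):
        ...

    def newly_added_method(self):
        ...


tested = {"ToyDateSeries.year", "ToyDateSeries.is_after"}
missing = untested_members([ToyDateSeries], tested, ignored=set())
```

Here missing contains only ToyDateSeries.newly_added_method, so a test asserting that the list is empty would fail until that method is added to the test file or to the ignore list.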