Should Have Used Ada (SHUA) #2 - Polkit gets "confused"

(Part 1)

Richard Wai

Recently, a serious vulnerability appeared in the freedesktop's "polkit" D-Bus service, which is used extensively by other services to authenticate requests received over D-Bus. A common example of such a service is the controversial but all-powerful systemd, via the associated systemctl cli.

On the surface, the vulnerability bares the hallmarks of a classic integer overflow, however further analysis reveals something more interesting: an active limitation enforced by glib (as opposed to a simple C type-cast truncation/overflow one might expect). The end result is that on systems with 32-bit unsigned uid values (which is most of them), a regular user with a sufficiently high UID can steamroll right through polkit, causing it to erroneously authenticate the user as root, without even considering an authentication challenge. Any number of privileged services may rely on polkit for authentication. In the case of systemd, the magic user is able to arbitrarily execute any systemctl command with total impunity.

This vulnerability, its misleading symptoms, and the band-aid patch that barely addresses it, are all excellent lessons in the importance of maintainability and readability in any large software project. While Ada has so far found its prestige in safety-critical software, it has always been designed as a general-purpose language. Yet, its unique focus on safety, readability, maintainability, and disciplined encapsulation offers an underappreciated benefit to mainstream software, and especially to open source projects. If polkit was originally implemented in Ada, this vulnerability would have been excluded by virtue of Ada's unique type system alone. Indeed, this kind of vulnerability is exactly the kind of frequent software failure that the Ada designers meticulously seek to eliminate through language design, rather than reliance on the programmer getting it right. But that is not the only benefit - keeping the codebase up-to-date with changing environments, such as the representation of a unix user ID, or changes to the predefined D-Bus services, becomes trivial and safe in Ada. Perhaps the most significant benefit, however, is the dramatic improvement to readability and encapsulation. When open source projects want to encourage contributors, what better way is there than to present a read-friendly codebase that is also well encapsulated, so that new contributors can make focused changes with confidence that they won't be destabilizing the larger project.

It is telling that this bug was not found through anything like code review or auditing, or even testing - it was found through a runtime anomaly reported by a user. As our analysis below should make clear, the polkit codebase is a very common but frustrating example of unwieldy design, write-only coding, and a maintenance-ignorant approach to development. Worse still, the continual choice of grotesquely unsafe languages like C and C++ helps neither maintainability, nor long-term stability.

Unfortunately, Ada was not selected for the implementation of polkit. Instead, C is the language of choice (because of course). Mixing C with a large project mostly maintained by volunteers unsurprisingly leads to a difficult to maintain mountain of interdependent modules, which very few maintainers (if any) will understand in its entirety. C does not provide any good way to protect dependent modules from changes in any one module's implementation. Ada, on the other hand, was specifically designed to facilitate large-scale, distributed development through language-enforced discipline (packages) and intuitive encapsulation (private types). C, meanwhile, allows anyone to step on everyone's toes on a whim, and often without reasonable blame. It is no surprise that critical fixes materialize as overly conservative band-aids, in lieu of broad-based breaking changes that would otherwise be needed to "get it right" (which is generally trivial in Ada). And let's be honest: no one is going to prioritize fixing patches to get them right if doing so only threatens the stability of the codebase. Thus the band-aid will likely remain a band-aid until the next band-aid further obscures the ever festering wad of those piled beneath it.

I'm not confused; you're confused!

The patch for this vulnerability, though quick and immediately effective, is unabashedly a classic band-aid, according to the author:

When a user or group above INT32_MAX is created, the numeric uid or gid wraps around to negative when the value is assigned to gint, and polkit gets confused. Let's refuse such uids and gids.

This patch just refuses to initialize uid and gid values to negative. A nicer fix is to change the underlying type to e.g. gint64 to allow the full range of values in uid_t and gid_t to be represented. But this cannot be done without breaking the API, so likely new functions will have to be added (a polkit_unix_user_new variant that takes a gint64, and the same for _group_new, _set_uid, _get_uid, _set_gid, _get_gid, etc.). This will require a bigger patch.

This is a dangerous pattern that is apt to repeat itself. When a codebase is difficult to understand and suffers from decaying readability as a consequence of a constant stream of band-aid patches, vulnerabilities only become more likely, and increasingly hard to identify. As more such vulnerabilities are suddenly discovered after unknown periods as possible zero-day exploits, there is an immense time pressure to close the hole as quickly as possible. Typically this means another indirect patch, which makes the codebase even less readable, and more complex. Ultimately the result is a runaway process where unreadable code becomes more unreadable as more band-aid fixes are continually rushed in, often obfuscating the underlying cause, which is less and less likely to be understood by the author of the next patch. To add insult to injury, the codebase becomes increasingly unstable, as continued indirect patching leads to ever more complicated (fragile) interactions. The more complex the interactions are between components of the codebase, the more prone the project becomes to unexpected results, incorrect assumptions, and hidden dangers for future maintainers.

Polkit generally, and this patch particularly, are great examples of this process in action. The patch does nothing to enable the correct behavior (if 4000000000 is a valid uid for the system, it should be for polkit). Instead the patch simply denies any values outside of polkit's native (but totally inappropriate) signed representation (a "gint") of a uid value.

The author proposes that a "better approach" might be to change the internal uid representation to a 64-bit signed number, but that this "cannot be done without breaking the API". This is interesting because it is also a band-aid, and an ugly one at that - the value coming in is an unsigned 32-bit value, so by type-casting it to a 64-bit signed, we avoid an integer overflow. But what would happen if Linux started to adopt 64-bit (unsigned) uids? Unlikely, but not unthinkable.

Walking through walls

Here is an original posting of the issue and how to reproduce it. We see a user set up with an id of 4000000000, which is a run-of-the-mill unprivileged user. This user then asks systemd, via systemctl, to stop the ssh server.

$ id
uid=4000000000(someuser) gid=100(users) groups=100(users)
$ systemctl stop sshd.service
(pkttyagent:3342): GLib-GObject-WARNING **: 13:28:53.802: value "-294967296" of type 'gint' is invalid or out of range for property 'uid' of type 'gint' **
ERROR:pkttyagent.c:156:main: assertion failed: (polkit_unix_process_get_uid (POLKIT_UNIX_PROCESS (subject)) >= 0)
$ systemctl is-active sshd.service
inactive

Looking at the output of a vulnerable polkit's pkttyagent, one may be quick to suspect either pkttyagent itself, or perhaps some kind of glitch in the interaction between polkitd and pkttyagent. But in fact, authentication as root is inevitable from the very moment systemd asks polkit to authenticate the "subject" (systemctl running under the magic user). Once this process starts, it cannot be stopped. Even if the agent is terminated, or there is no agent at all, polkit will have already decided that the user is root before it even looks for one.

Agent Smith has Déjà vu

When systemctl is invoked, and asked to execute a privileged command (stop in this case), it defers to a configurable subsystem for authentication. In most mainstream Linux distributions, systemd (and hence systemctl) is configured at build-time to handle this through polkit, and thus systemctl invokes a suitable polkit "authentication agent" (pkttyagent in this case), as a "fallback" agent in case an existing authentication agent is not already registered for the subject's D-Bus session. If an agent is already registered for the subject's session, that will be used instead. You can see this in action by attempting to invoke a privileged systemctl command from a terminal window in a graphical environment. Doing so typically invokes a pop-up window for authentication, outside the terminal. If no other authentication agent is registered for the session, pkttyagent will be ready and waiting for the job over the terminal itself.

Regardless of any existing authentication agents that may already exist on the session, systemctl always forks and execs pkttyagent, giving us a somewhat misleading sense that systemctl or systemd ignored the death of pkttyagent. In fact, in normal cases, pkttyagent dying should not be a problem. If polkit decides that the an authentication challenge is needed, and finds that there is no authentication agent on the subject's end, authentication will simply fail.

If we modify pkttyagent to kill itself immediately, and then invoke systemctl stop with a more typical unprivileged user (say with a uid in the 2,000 - 3,000 range) over ssh (forcing pkttyagent to be the only agent on the session), we get this:

$ systemctl stop sshd.service
Failed to stop sshd.service: Interactive authentication required.
See system logs and 'systemctl status sshd.service' for details.
$ systemctl is-active sshd.service
active

Polkit rightly notices that there is no authentication agent available, and declines to authenticate the subject.

Yet the exact same modified pkttyagent, again over ssh, but with our magic user (uid = 4000000000), looks like this:

$ systemctl is-active sshd.service
active
$ systemctl stop sshd.service
$ systemctl is-active sshd.service
inactive

Not a peep. Polkit let us right through. It didn't even care that there was no agent at all. It didn't seem to think the usual authentication challenge was necessary..

Interesting.

Questionable authority

At the heart of polkitd authentication lies a number of pluggable "backend authorities". The backend authorities are modules that allow logic of arbitrary complexity to decide if any given "subject" (e.g. systemctl) is authenticated for interaction with the "caller" (e.g. systemd), at request of the caller, over D-Bus. For most off-the-shelf deployments (such as most Linux distributions), the generic "interactive authority" module, included with polkit, takes charge of managing authentication. The job of the interactive authority is to decide if the subject is: already authenticated; authenticated by default (root); or, due for an authentication challenge. If the interactive authority decides a challenge is required, it attempts to collect an interactive challenge response (typically a password) from an available authentication agent (such as pkttyagent) attached to the subject's D-Bus session.

When polkitd is asked to authenticate a subject on behalf of a caller, it invokes the relevant "backend authority" via its check_authorization interface method (polkit_backend_ interactive_authority_ check_authorization in the case of the interactive authority). This method receives a handle for the subject and the caller, and is responsible for carrying out the actual authorization process.

One of the first actions taken by polkit_backend_ interactive_authority_ check_authorization is to collect the local user information for both the caller, and perhaps more problematically, the subject, via a call to polkit_backend_session_monitor_ get_user_for_subject:

user_of_subject = polkit_backend_session_monitor_ get_user_for_subject
                     (priv->session_monitor,
                      subject, &user_of_subject_matches,
                      &error);

polkit_backend_session_monitor_ get_user_for_subject's job, as the name implies, is to determine the user behind the subject's session (i.e. the user making the systemctl request). For our magic user, this call will ultimately misidentify the subject user as root.

Unfortunately, this is a red herring. It turns-out that this initial call to polkit_backend_session_monitor_ get_user_for_subject is not actually used to authenticate subject, it is only there for a for some sanity checks (which pass). The real action begins at the later call to the aptly named check_authorization_sync:

result = check_authorization_sync (authority,
                                   caller,
                                   subject,
                                   action_id,
                                   details,
                                   flags,
                                   &implicit_authorization,
                                   FALSE, /* checking_imply */
                                   &error);

Down the rabbit hole

Check_authorization_sync's job is to make the determination ("synchronously") on the current authorization state of the subject: already authorized; not authorized (denied); or, in need of a challenge (potentially authorized). It ends up making its own call to polkit_backend_session_monitor_ get_user_for_subject, to exactly the same erroneous determination (user of subject is root):

/* every subject has a user; this is supplied by the client, so we rely
 * on the caller to validate its acceptability. */
user_of_subject = polkit_backend_session_monitor_ get_user_for_subject
                     (priv->session_monitor,
                      subject, NULL,
                      error);

But why? Why does polkit_backend_session_monitor_ get_user_for_subject say the magic user is root, when it definitely isn't? The function's job is to return a "PolkitIdentity" glib object that is derived from a few possible sources depending on what form the "subject" takes. The subject will be one of: a unix process; a D-Bus name; or, a D-Bus session. In the case of systemctl, polkit_backend_session_monitor_ get_user_for_subject determines the subject to be a D-Bus name, i.e. POLKIT_IS_SYSTEM_BUS_NAME will evaluate TRUE for subject. This makes sense, as we want to know the user at the other end of the request, which arrived over D-Bus.

else if (POLKIT_IS_SYSTEM_BUS_NAME (subject))
{
ret = (PolkitIdentity*)polkit_system_bus_name_get_user_sync (POLKIT_SYSTEM_BUS_NAME (subject), NULL, error);
matches = TRUE;
}

The buck then gets passed down into polkit_system_bus_name_ get_user_sync, which is given a D-Bus bus name, and is trusted to return the user "identity" of the process who registered that bus name. Note that we have gone down a level of abstraction from PolkitIdentity to a bona-fide PolkitUnixUser. Looks like we’re getting warmer.

PolkitUnixUser *
polkit_system_bus_name_get_user_sync (PolkitSystemBusName  *system_bus_name,
                                      GCancellable         *cancellable,
                                      GError              **error)
{
PolkitUnixUser *ret = NULL;
guint32 uid;

g_return_val_if_fail (POLKIT_IS_SYSTEM_BUS_NAME (system_bus_name), NULL);
g_return_val_if_fail (cancellable == NULL || G_IS_CANCELLABLE (cancellable), NULL);
g_return_val_if_fail (error == NULL || *error == NULL, NULL);

if (!polkit_system_bus_name_get_creds_sync (system_bus_name, &uid, NULL,
                                            cancellable, error))
goto out;

ret = (PolkitUnixUser*)polkit_unix_user_new (uid);

 out:
return ret;
}

This is starting to get interesting. The function sets up a "guint32" value for uid, which is intriguing because the uid values elsewhere in polkit are always regular (signed) gint values. Though obviously problematic, there seems to be a clear motive for this. To obtain the uid associated with the bus name, polkit_system_bus_name_ get_creds_sync makes a call to the D-Bus interface org.freedesktop. DBus.GetConnectionUnixUser, which according to the D-Bus specification, returns the uid as a "UINT32". Which is not surprising, since this is often the actual type of uids at the system level, and also because negative uid values have never been legal in unix.

Unfortunately, as technically correct as a guint32 might be for the uid, it gets passed directly to polkit_unix_user_new, which if we have a quick look at the declaration, we find takes an argument of type gint!

/**
 * polkit_unix_user_new:
 * @uid: A UNIX user id.
 *
 * Creates a new #PolkitUnixUser object for @uid.
 * Returns: (transfer full): A #PolkitUnixUser object. Free with g_object_unref().
 */
PolkitIdentity *
polkit_unix_user_new (gint uid)

We don't even have a type-cast from guint32 to gint, not that it matters (besides supressing warnings). Obviously mismatched parameter types should be illegal (like they are in Ada), but they aren't in C. Besides, a C type-cast merely transfers bits, and totally disregards the actual value. Contemporarily, int will typically be 32-bit, so a type-cast would be a direct bit-for-bit copy anyways. With or without a explicit type-cast, the value going in can get mangled. For our magic user, that is exactly what happens.

Had this been written in Ada, it would either be illegal at compile time, or a run-time check would be inserted. More on that later.

Do you know who this is?

It was a long, convoluted trace, but we have finally arrived at ground zero. Our large guint32 uid is passed into polkit_unix_user_new, which expects a signed value that is almost always going to be the same bit-width. A classic integer overflow situation ensues, and we enter into the function with a negative uid. Don't worry, the hopelessly over-engineered bolt-on object-oriented C wonder library glib has us covered (see part 2 for more on this dumpster fire of a library).

Inside the polkit_unix_user_new function we find an incredibly simple dispatching operation:

PolkitIdentity *
polkit_unix_user_new (gint uid)
{
return POLKIT_IDENTITY (g_object_new (POLKIT_TYPE_UNIX_USER,
                                      "uid", uid,
                                      NULL));
}

This operation calls into glib to set-up a shiny new POLKIT_TYPE_UNIX_USER data structure (excuse me, "object"), to be initialized with the (negative) gint uid value that just came in. The glib designers were creative enough to build-in an entirely separate type system with their bolt-on object orientation. This type system allows these object oriented "types" (like POLKIT_TYPE_UNIX_USER) to be assigned any number of "properties" (values), which can only be assigned and retrieved through calls to user-defined "methods". Pretty typical object oriented paradigm, but why not just use C++?

The above call to g_object_new first allocates the object, and then runs the appropriate constructor - in this case"polkit_unix_user_class_init", which looks like this:

static void
polkit_unix_user_class_init (PolkitUnixUserClass *klass)
{
GObjectClass *gobject_class = G_OBJECT_CLASS (klass);

gobject_class->finalize = polkit_unix_user_finalize;
gobject_class->get_property = polkit_unix_user_get_property;
gobject_class->set_property = polkit_unix_user_set_property;

/**
 * PolkitUnixUser:uid:
 *
 * The UNIX user id.
 */
g_object_class_install_property (gobject_class,
                                 PROP_UID,
                                 g_param_spec_int ("uid",
                                                   "User ID",
                                                   "The UNIX user ID",
                                                   0,
                                                   G_MAXINT,
                                                   0,
                                                   G_PARAM_CONSTRUCT |
                                                   G_PARAM_READWRITE |
                                                   G_PARAM_STATIC_NAME |
                                                   G_PARAM_STATIC_BLURB |
                                                   G_PARAM_STATIC_NICK));
}

This constructor is actually responsible for registering the properties of this type, namely the uid property. The call to g_object_class_install_property defines the name and quasi type of this property. The name is PROP_UID, and the type is defined as a special integer parameter specification, supplied via a call to g_param_spec_int, which maps to a gint value (the actual type), with a specified range (thus a quasi-type). The glib documentation tell us what the above parameters are.

It would be nice to have parameter naming for this call to g_param_spec_int, as would be possible in Ada. Parameter naming is a great example of read-friendly code. While it adds some verbosity, no one can argue against its clear benefits to the uninitiated (or rusty) reader. It's hard to appreciate parameter naming as the writer, but the benefits are eminently obvious to maintainers, old hands or new hands. Software spends more time being maintained than written - so this approach makes a lot of sense.

Let's envision how a call to g_param_spec_int might look like in Ada, so that we can take a clear look at the different values chosen for the "PROP_UID" property of the PolkitUnixUser "type":

g_param_spec_int (name          => "uid",
                  nick          => "User ID",
                  blurb         => "The UNIX user ID",
                  minimum       => 0,
                  maximum       => G_MAXINT,
                  default_value => 0,
                  flags         => (G_PARAM_CONSTRUCT,
                                    G_PARAM_READWRITE,
                                    G_PARAM_STATIC_NAME,
                                    G_PARAM_STATIC_BLURB,
                                    G_PARAM_STATIC_NICK));

Those maximum and minimum parameters are key. It turns out that setting the "uid" property on a PolkitUnixUser will be silently limited to be in that range. If the uid property is set with a negative value (as it is in the case of our magic user), the glib library confidently and silently says "nope, too low", intervenes, and sets the uid to zero instead - root!

After running the constructor, the initial call to g_object_new then sets tries to set uid property to the given value, will then get limited if needed. Of course, our magic user gets "limited" to 0, and thus the "PolkitIdentity" returned will cause Polkit_Identity_Is_Root to evaluate to true. This PolkitIdentity object is dutifully passed all the way back to the backend authority, or more importantly to check_authorization_sync, which tries to determine if an authentication challenge is required for the subject. It finds that decision to be an easy one:

/* special case: uid 0, root, is _always_ authorized for anything */
if (identity_is_root_user (user_of_subject))
{
result = polkit_authorization_result_new (TRUE, FALSE, NULL);
goto out;
}

Special case indeed!

Again, it would be nice to have Ada parameter naming to make better sense of what the "authorization result" looks like. In Ada, it could look like this:

if Identity_Is_Root_User (User_Of_Subject) then
Result := Polkit_Authorization_Result_New (Is_Authorized => True,
                                           Is_Challenge  => False,
                                           PolkitDetails => null);
goto Cleanup_And_Exit;  -- out is a reserved word in Ada
end if;

So we see here that if check_authorization_sync determines the user to be root (uid 0), then it returns an instruction saying - yes this user is authorized, and you don't need to do an authentication challenge (no need to ask for a password!)

So back in polkit_backend_ interactive_authority_ check_authorization, we have the final check before approving the action - to see if check_authorization_sync determined that we need to do a challenge (and invoke the subject's authentication agent):

/* Caller is up for a challenge! With light sabers! Use an authentication agent if one exists... */
if (polkit_authorization_result_get_is_challenge (result) &&
    (flags & POLKIT_CHECK_AUTHORIZATION_FLAGS_ALLOW_USER_INTERACTION))
{
AuthenticationAgent *agent;

agent = get_authentication_agent_for_subject (interactive_authority, subject);

But of course since Is_Challenge is false (for the result of check_authorization_sync), no challenge happens, and the authentication result (Is_Authorized = True, Is_Challenege = False) is passed back up until eventually polkit tells systemd that the subject is authorized.

The end result is that no authentication agent is invoked at all, and the magic user is now fully authenticated and authorized (as root). This is why, even when pkttyagent terminates due to the failed assertions, it does absolutely nothing to prevent the erroneous authentication as root. Polkit doesn't care if there is an agent available at all.

Real talk.

There is no denying that these kinds of bugs are common. Thery are extremely hard to find with testing or by code review, and are probably more numerous that the software community would like to admit.

Of course, like most vulnerabilities of this magnitude, there is a confluence of factors involved. There are clearly issues with readability, and the ability to make changes properly and safely (more on that in part 2). Many of these issues stem from fundamental flaws of the chosen language. C is universally recognised to be unsafe, but worse still it is not designed in any sense to facilitate maintenance, large teams, or any kind of significant abstraction. Furthermore, many other popular languages consistently fail to recognise that humans make mistakes frequently. Many languages seem to take the position that the programmer "should know what they are doing", and "if they make this mistake, it is their fault".

Ada is unique because it was founded on the recognition that programming is a human activity. Ada is designed to facilitate the strengths of human creativity, through intuitive and powerful systems of abstraction, encapsulation, information hiding, and data modeling. Ada is designed to deal with the things that humans are so bad at doing, such as accidentally leaving out symbols (like C's '==' vs '='), spotting subtle differences (**ptr vs *ptr), or assuming that an incoming value is always going to be correct and valid (the root cause of most buffer overflows). Us humans are not likely to get better at these things, and worse still, we tend to overestimate our own skills and abilities to avoid these kinds of errors.

The genius of the Ada type system

The Ada type system is all about actual values, not representations. The Ada type system is designed to model data, rather than modeling storage in memory. The compiler handles the machine representation of types automatically, and the language rules ensure that values never magically change. This is obviously rational, but Ada is definitely in the minority for taking this approach. Most popular languages (including Rust), present types directly as machine types. Unless you are deliberately trying to manipulate flags in a register (Ada is much better than C at that anyways), it makes little sense to be trying to decide if a type should be 16-bit or 32-bit, signed or unsigned. In Ada, the programmer is encouraged to focus on what values are actually being stored (not how), and what those values mean in relation to others (5 degrees vs. 5 km/h).

Let's look at this polkit authentication disaster from an Ada perspective. We can take two approaches: one is where gint and guint32 are two distinct types; the other is where gint and guint32 are subtypes representing different constraints of the same type.

The first approach is the most rational. It doesn't really make sense to make gint and guint32 subtypes of the same type, since they are conceptually distinct (machine types with different representations). In Ada, we'd probably expect to have a uid type, with perhapse some constrained subtypes like "non_root_uid", with a range of 1 .. uid'Last. This would allow programming by contract where specific subprograms could exclude root uids, for example. However, in this example we are trying to make a more direct comparison between Ada and the typical C-like "everything is a machine type" world. We will revisit the powerful applications of Ada data modeling concepts in part two. The point here is to transliterate the C code directly to Ada, and show how Ada would still catch the bug anyways.

Since we know that gint is the same as C's standard int type , let's just define it as such:

with Interfaces.C;

subtype gint    is Interfaces.C.int;
type    guint32 is mod 2**32;

Here we've declared guint32 as a modular type, which has a range of 0 .. (2**32 - 1) - i.e. an unsigned 32-bit number. We could also have declared it as an integer with a finite range (type guint32 is range 0 .. 2**32 - 1). However, a modular type is a more "apples-to-apples" analogue to an unsigned integer in C. Using a finite range has benefits, as it excludes wrap-around overflow conditions.

Let's assume we try to do the same thing as in the actual polkit code, where the "actual parameter" (what is passed in by the caller) for uid is a guint32, while the "formal parameter" (what is expected by callee) is a gint:

function Polkit_Unix_User_New (uid: gint) return Polkit_Unix_User;

function Polkit_System_Bus_Name_Get_User_Sync (System_Bus_Name: in Polkit_System_Bus_Name;
                                               Cancellable: in out G_cancellable)
                                             return Polkit_Unix_User
is
uid: guint32;
begin
Polkit_System_Bus_Name_Get_Creds_Sync (…, uid, …);

-- …

return Polkit_Unix_User_New (uid);

exception
when others =>
return Null_Unix_User;
end Polkit_System_Bus_Name_Get_User_Sync;

This would be rejected at compile time. In Ada it is simply illegal to provide an actual parameter that is a different type from the formal (expected) parameter. gint and guint32 are two distinct types, and thus a call to Polkit_Unix_User_New with a guint32 value for the uid parameter is rejected at compile-time.

But what about a type-cast? Actual type-casts in Ada are deliberately cumbersome, and almost never used. They require a very inconvenient and conspicuous instantiation of a generic function "Ada.Unchecked_Conversion", and are intended for very specialised, rare, low-level operations. Ada does this because type-casts are incredibly dangerous, and should only ever be used where their behavior (bit-for-bit copying) is specifically need - which is very rare. By taking this approach, Ada also makes it easy to find any such instances of Unchecked_Conversion, and audit their use. Meanwhile in C, type-casts a way of life, and auditing their use is much less practical. They are also often responsible for causing very bad things. In this vulnerability, we even have a type-cast by default, which is extra bad.

Instead of normalizing type-casts like C does, Ada has the concept of type conversions instead. Type conversions are a foundational concept of Ada's exceptional type safety. In Ada, the binary representation is typically the responsibility of the compiler, while the language focuses on actual values. When assigning a value from one object to another, Ada always ensures that the actual value of the source is assigned, not the representation. Type conversions are about the conversion of values, and are totally agnostic to the underlying binary representation. This distinction is important, and is actually more intuitive than the behaviour of a type-cast. That's surely one reason type-casts cause so many problems.

Ada type conversions have an entire section dedicated to them in the Ada standard. Value conversion rules are precisely prescribed. In fact, the section on type conversions has no "Implementation Permissions" subclause, implementation-defined behaviour, representational dependencies, or any undefined behavior at all. That's what safety looks like.

We could attempt to make the above assignment legal by using a type conversion to convert (the value of) uid to the expected gint type at the point of call, like this:

return Polkit_Unix_User_New (gint (uid));

That would make the call legal in this case, but that is not the end of the story. Unlike C, the Ada compiler will not just assume that the programmer knows what they are doing. There are a significant set of rules that must be satisfied, and not all type conversions are legal. Though in this case we have a fairly vanilla conversion between two integer types, the Ada compiler will still look at this and ask itself: "are there any possible values of guint32 that would not be in the range of gint?" In this case answer will be "yes"!

If we consider a typical definition of C.int (a 32-bit representation) the actual Ada type definition of Interfaces.C.int might look like this:

type int is range -2_147_483_648 .. 2_147_483_647
  with Size => 32;

Our own definition of guint32 means that it has an effective range of 0 .. 4_294_967_295 (2^32 - 1). Alas, the compiler knows statically that there exists valid guint32 values which are not valid gint values (any value of uid > 2_147_483_647 - i.e. gint'Last). The Ada standard mandates that a run-time check must be inserted to check the value of uid at the point of the type conversion. If the compiler knew statically that there would never be a value of guint32 that would be invalid for gint, it can optimize away the check. If it knew that there would never be a valid value, it will probably complain and tell us to expect an exception at run-time, or reject the compilation entirely. In this case, if at run time, uid is greater than 2_147_483_647, a Constraint_Error exception is raised at the attempt to convert the value. In the case of this vulnerability, that is exactly what would have happened. Our magic uid of 4_000_000_000 is a valid value of guint32, but it is out of range the range of valid values for gint. Even if the underlying representation is valid for both, Ada always treats values in absolute terms - a positive value could never equate (or be converted to) a negative value.

Take a moment to consider how common these kinds of bugs are: overflows, improper type-casting, and positive numbers being erroneously interpreted as signed negative values. These kinds of bugs often come with serious consequences, as in this case. Such errors are basically impossible in Ada, unless you are very determined, or are too quick to disable runtime checks. While disabling runtime checks might make sense in some particularly constrained embedded systems, it would be silly to turn them off in something mainstream like polkit.

To be sure

How about the second option, where both guint32 and gint are subtypes of the same type? This approach is awkward, but it is somewhat more similar to the actual C code, and it again demonstrates that Ada would still catch the bug regardless. To start, we'll need a parent type that can handle the ranges of both subtypes. That basically means the parent type must be at least 64-bits wide, and signed (the patch author agrees):

type Signed_64 is range -2**63+1 .. +2**63-1;

subtype gint    is Signed_64 range -2**31+1 .. +2**31-1;
subtype guint32 is Signed_64 range 0 .. 2**32-1;

Notice how guint32 includes positive values that are greater than the maximum positive values of gint. Of course we know this, since an unsigned value can make use of the most significant bit to get a additional magnitude of values.

Now if we retry the same code as our first try above (without a type conversion):

return Polkit_Unix_User_New (uid);

This would be legal, and is more technically similar to what is happening in the actual C code. However, the Ada compiler will still check that the subtype for uid (guint32) is a subtype of the type as gint (which it is), and that the ranges of guint32 overlap that of gint (they do). Additionally, If the compiler was able to determine that the subtype of guint32 was completely within the range of gint, it could (and probably would) elect to omit a run-time check. However, in this case, there is a large range of values of guint32 that are not valid for gint. One of those values happens to be 4000000000 - the uid of our magic user. Knowing this, the Ada compiler is required to insert a run-time check here to ensure that the actual value of uid at the point of call is a valid value for the subtype of the formal parameter (a gint). And once again, our magic user would fail that check, and a Constraint_Error would be raised at the point of call to Polkit_Unix_User_New, and everything would end up the same as in the first approach.

What could have been

As we saw, if polkit was written in Ada, our magic user would have inevitably triggered a Constraint_Error exception. If polkit used SPARK Ada, the tools would have attempted (and failed) to actually prove that uid would never be outside the range of gint at this point (either with a type conversion, or between two subtypes of the same type). In either case, Polkit_Unix_User_New would never execute with a value that was different than the value passed to it, and would almost certainly result in the authentication simply failing for the magic user (that would be the rational response to an unhandled exception flying out of the authorization process). Better yet, if using SPARK, the issue would have been discovered during development!

Unfortunately, polkit was not written in Ada, and there is a lot more to say about this story. If we reflect on the original patch, we recall the author lamenting about not being able to make a more proper change out of fear of "breaking the API". Ada has something to say about this situation as well. In part two, we'll focus on how Ada, as a language choice, could have done a much better a job at implementing polkit in the first place. Perhaps more importantly, Ada's superiority in the realm of readability and encapsulation naturally leads to excellent long-term maintainability and stability. That last point is especially relevant to open source software.