Algorithmic bias: important topic, problematic term

Recently, I engaged in a discussion within the Expert Group on Data Ethics on the pros and cons of the term “algorithmic bias”, which describes the fact that certain groups of people might be discriminated against by an automated decision-making system, and how to prevent this. While research in this area is very important and rightly at the forefront of current discussions in data science, artificial intelligence and digital ethics (see e.g. here, here or here), I think the term itself might do more harm than good in the public discussion.

Here’s my line of thought:

First, the (German) definition of algorithm in computer science and beyond is very broad, referring to any unambiguous sequence of instructions to solve a given problem; it can be implemented as a computer program that transforms some input into corresponding output. Since this encompasses any automatic computer program, we are formally safe to call any biased decision by a computer program “algorithmic bias”.
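To illustrate this broad sense of the word, here is a minimal sketch in Python (the example is mine, purely for illustration): Euclid’s method for the greatest common divisor, an unambiguous sequence of instructions that turns an input into a corresponding output.

```python
# A classic example of an algorithm in the broad sense:
# an unambiguous sequence of instructions mapping input to output.
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm for the greatest common divisor."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(48, 36))  # -> 12
```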

But maybe we can be more precise to prevent unnecessary harm; let me explain.

Second, in computer science practice, we usually constrain the use of the word “algorithm” to only mean those programs that solve “a class of problems” instead of every customized piece of code (see also English Wikipedia, first sentence). Those algorithms go by certain well-established names, e.g. the Quicksort algorithm to sort lists of items; the Gauss-Newton algorithm to solve non-linear least squares problems; the Backpropagation algorithm to train neural nets; and so forth. This fits well with the terminology used in the machine learning community: if I solve a given problem (e.g., face detection) by tweaking the parameters of such a well-known algorithm, I publish about the model that I built and its properties; if I fundamentally change the method by which I built the model, I publish a new machine learning algorithm (which is often considered a more fundamental research result).
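To make the algorithm/model distinction concrete, here is a minimal, purely illustrative sketch (using scikit-learn; the datasets are synthetic and the choice of logistic regression is mine): one and the same learning algorithm, applied to two different datasets, yields two different models.

```python
# Sketch: the same learning algorithm produces different models
# depending on the data it is trained on.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Two different (synthetic) datasets for two different tasks
X_a, y_a = make_classification(n_samples=500, n_features=5, random_state=1)
X_b, y_b = make_classification(n_samples=500, n_features=5, random_state=2)

# One and the same algorithm (logistic regression) ...
model_a = LogisticRegression().fit(X_a, y_a)
model_b = LogisticRegression().fit(X_b, y_b)

# ... yields two distinct models, i.e. two distinct sets of learned parameters.
print(model_a.coef_)
print(model_b.coef_)
```

The algorithm is the general recipe; the model is what that recipe produces for a particular dataset, and it is the model that makes the specific decisions.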

Ok, but what does terminology in computer science have to do with data ethics?

Third, I suppose that the “general public” (that reads newspapers, votes, etc.) has a similar understanding: for the non-expert, our digital world and economy run on algorithms - fundamental building blocks of the digital age, much like Diesel engines in the mechanical/automotive world. They do not regard every piece of code ever written as a separate algorithm (as would be warranted by point [1] above); rather, they see algorithms as the general principles of computing (the view presented in [2] above) that produce specific results (that this happens by means of an intermediate model is probably not widely understood). Consider now the amount of fear, uncertainty and doubt we induce when we speak of “algorithmic bias”: to those holding the view in [2/3] above, it conveys the message that the pillars of the digital world are fundamentally flawed and unable to handle anything fairly - much like the FUD that recently befell owners of Diesel cars (the engine is beyond repair - what can we do but look for something else?).

Ok, if this misunderstanding is a real possibility, what can we do about it?

Fourth, we can avoid this problem entirely if we are more precise and move from the general (and, as discussed, possibly misleading) term “algorithmic bias” to more specific terms such as “selection bias” (for the bias introduced when we humans select the wrong algorithm/model for a given task, or the wrong data for a dataset) or “dataset bias” (for the case where an in principle neutral algorithm, like any machine learning method, produces a model that picks up the biases present in the data), etc.
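As a purely illustrative sketch of the second case (synthetic data, all names and numbers made up): a perfectly generic learning algorithm, trained on labels that encode a historical disadvantage, will reproduce that disadvantage in its predictions - the bias lives in the dataset, not in the algorithm.

```python
# Sketch of "dataset bias": a neutral learning algorithm trained on
# biased historical labels reproduces that bias in its predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)   # sensitive attribute (0 or 1)
skill = rng.normal(size=n)      # the feature that *should* matter

# Biased historical labels: group 1 was systematically disadvantaged.
y = (skill + np.where(group == 1, -1.0, 0.0)
     + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, y)  # the algorithm itself is "neutral"

# The learned model grants positive outcomes less often to group 1,
# simply because the training data encoded that disadvantage.
pred = model.predict(X)
print("positive rate, group 0:", pred[group == 0].mean())
print("positive rate, group 1:", pred[group == 1].mean())
```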

Because in the end, all the bias introduced into computer programs comes from a human source (which can be mitigated), and it does not help to open up a new virtual, blurry front line against algorithms/machines - one that arouses emotions but does not bring us closer to a solution (of our human problems).

Written on October 18, 2018 (last modified: October 18, 2018)