> As Google and other tech giants adopted the technology, no one quite realized ...

tchalla · on March 17, 2021

Algorithmic bias is not ONLY a data problem; it’s also a model problem [0]. The biases of model developers are encoded via learning rates, network hyperparameters, and objective functions.

[0] https://twitter.com/sarahookr/status/1361373527861915648?s=2...

visarga · on March 17, 2021

Model bias is not a huge issue. Maybe something about class imbalance or regularization. The huge issue is deployment - what is the model used for? How is it affecting people in reality? What metric is it optimizing?

Compared with all of these, the degree of L1 regularization and the class weights are minor things. Most models will perform similarly given the same data. It's mostly the data that makes the difference.

yorwba · on March 17, 2021

There's an interplay between the two insofar as a model built to handle a specific dataset will involve design decisions informed by the data. E.g. you might pick a certain level of L1 regularization because it maximizes performance on the data you have, which can lead to bias against data you don't have.

But if you take "model" to mean the pure mathematical description without parameters or hyperparameters that need to be determined by experimentation, then I agree that optimizing the model on a dataset will not lead to bias against specific groups of humans unless the data used contains such a bias.

baylearn · on March 17, 2021

Citations needed to back your claim. The research literature seems to support the opposite.

tchalla · on March 17, 2021

Is that your personal opinion? Research seems to disagree. Citations in the previous link.

blueblisters · on March 17, 2021

Hyperparameters play a significant role in bias when you're dealing with imbalanced classes, or long tail samples.

But this ties back to the original data problem, right? If you don't have enough training samples for (known or unknown) unknowns, your model is likely to be biased against them.
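A minimal sketch of how the two interact, using made-up integer scores and a hypothetical 95/5 class split: the decision threshold is the "hyperparameter", and tuning it for overall accuracy (a common default) gives up most of the rare class, while tuning it for the mean of per-class recall does not. Neither the data nor the tuning criteria come from the thread; they are assumptions chosen to make the effect visible.

```python
# Toy data (hypothetical): (score, label) pairs, predicted positive if score >= threshold.
majority = [(score, 0) for score in range(95)]             # 95 negatives, scores 0..94
minority = [(60, 1), (70, 1), (80, 1), (90, 1), (97, 1)]   # 5 positives
data = majority + minority


def accuracy(samples, threshold):
    """Fraction of samples classified correctly at this threshold."""
    correct = sum((score >= threshold) == bool(label) for score, label in samples)
    return correct / len(samples)


def recall(samples, threshold, target):
    """Accuracy restricted to the samples whose true label is `target`."""
    subset = [(score, label) for score, label in samples if label == target]
    return accuracy(subset, threshold)


def balanced(samples, threshold):
    """Mean of the two per-class recalls."""
    return (recall(samples, threshold, 0) + recall(samples, threshold, 1)) / 2


candidates = range(0, 102)

# Criterion 1: maximize overall accuracy (dominated by the majority class).
best_overall = max(candidates, key=lambda t: accuracy(data, t))

# Criterion 2: maximize balanced accuracy (each class counts equally).
best_balanced = max(candidates, key=lambda t: balanced(data, t))

for name, t in [("overall", best_overall), ("balanced", best_balanced)]:
    print(name, t, accuracy(data, t), recall(data, t, 1))
```

Same data, same "model"; only the tuning criterion changes, and with it how many of the rare-class samples get caught.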

blackbear_ · on March 17, 2021

While this is true, "learning the biases of the researchers who built it" is a very misleading way of putting it, because it is still very unclear if and how certain design decisions impact the bias of the resulting model.

Given that reducing bias while not giving up other desirable properties is a young and open research direction, researchers in general should not be faulted for using the current (imperfect) state of the art or for working on something that is not (yet) focused on bias.

ageek123 · on March 17, 2021

https://doxa.substack.com/p/googles-colosseum is a pretty good treatment of this topic.

dr_dshiv · on March 17, 2021

Yes, the optimization goal (the objective function) is a major factor in the function of algorithmic systems. I'm not sure bias is the best word to use here, however.

It is a known challenge to align the designed purpose of an algorithm with actual optimization metrics. For instance, recommendation systems may have the purpose of improving user experience, but if time-on-site metrics are used as the optimization function, there can be unexpected results.
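As a toy sketch of that mismatch (the catalog, the numbers, and the `recommend` helper are all hypothetical, not taken from any real system): the same selection code returns different items depending on whether it optimizes the time-on-site proxy or the satisfaction it was meant to serve.

```python
# Hypothetical catalog: (title, expected_minutes_on_site, satisfaction in 0..1)
catalog = [
    ("short helpful answer", 2, 0.9),
    ("in-depth tutorial", 15, 0.8),
    ("endless autoplay feed", 40, 0.3),
]


def recommend(items, objective):
    """Return the title of the item that maximizes the given objective."""
    return max(items, key=objective)[0]


# Proxy metric: time on site.
by_time = recommend(catalog, lambda item: item[1])

# Intended purpose: user satisfaction.
by_satisfaction = recommend(catalog, lambda item: item[2])

print(by_time)
print(by_satisfaction)
```

Nothing in the selection code is wrong in either case; the divergence comes entirely from which objective was plugged in.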

cbsmith · on March 17, 2021

Yeah, but it is indirectly the biases of the researchers. A researcher is more likely to notice and correct for training data problems that conflict with their biases.

inductive_magic · on March 17, 2021

This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.

Come to think of it, isn’t that an interesting avenue for GAN-esque methods to detect the relation of patterns falling into these categories of biases? Or is that a recursive problem? If not, put me in the paper :-)

adjkant · on March 17, 2021

I can't tell if watching the switch from they to he pronouns in a post about bias awareness is making me side more with or against your point :)

inductive_magic · on March 17, 2021

Let's just say I made the conscious decision to go with the masculine form :-)

cbsmith · on March 18, 2021

> This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.

That's not strictly true. In a lot of cases you start out oblivious to biases in the data, and then when you evaluate the model you notice problems.

But your point about obliviousness to bias is exactly what I'm speaking to. One might be oblivious to bias that aligns with one's own biases, but notice bias that conflicts with them.

visarga · on March 17, 2021

> A researcher is more likely to notice and correct for training data problems that conflict with their biases.

That's not a systematic way of tackling bias. I would rather we invested more in creating good benchmarks and norms.

cbsmith · on March 18, 2021

I would very much agree it is not a systematic way of tackling bias. That's why it creates bias. ;-)

kortilla · on March 18, 2021

People doing ML “research” are not the ones applying it to specific data sets day to day. “I pointed a neural net at our sales data” is not research in the normal sense.

cbsmith · on March 27, 2021

Yeah, I look at the semantic problem with calling it "ML research" and just throw up my arms. These discussions aren't generally driven by people who care about semantics.

astrange · on March 17, 2021

That's not the only mechanism. Even if you want to stick to technical issues, it can be biased because you didn't train it long enough or the model is too small. And of course the entire question depends on what the model is being used for.

d110af5ccf · on March 17, 2021

Everything you say is true, but in context the important point is that the sort of biases ML models pick up are overwhelmingly related to training insufficiencies and are often incredibly difficult to spot unless you already know they exist. For a practical example, see the recent Twitter image cropping oddities (https://twitter.com/bascule/status/1307440596668182528).

The idea (as quoted) that models are routinely picking up biases directly from researchers is complete nonsense.

astrange · on March 17, 2021

Right, that's some kind of human-interest journalistic fudging and it's not true. But bias/surprising wrong answers in ML is obviously a real problem and fixing the data is not always the right answer. You might not be able to tell what's wrong with the data, or where you could get any more of it, and you might be reusing a model for a new problem and not have the capability to retrain it.

visarga · on March 17, 2021

We should only use models where they work well. Like in architecture, we should only build what will be safe for use.

mrtnmcc · on March 17, 2021

Researchers choose the training data.

inductive_magic · on March 17, 2021

>implying that humans are aware of every pattern (“bias”) in the data, which they are not, which is the reason we use these algos in the first place

audible sigh

psychiatrist24 · on March 17, 2021

It is also exactly the job of DNNs to pick up biases. That is literally what they are designed to do.

visarga · on March 17, 2021

I would argue that DNNs would not work as well if they weren't picking up biases. Sometimes we need to learn the biases in order to better detect them.

Even humans need to know about swear words in order to consciously avoid using them, or need to learn about reproduction in order to avoid teenage pregnancies. Not knowing does not make us or the AI better.

For example, what GPT-3 needs is a "conscience", a separate model monitoring and rejecting harmful outputs. If I am not mistaken the demo is already displaying warnings when it goes off into weird places.
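A rough sketch of what such a separate monitor could look like structurally. Everything here is a stand-in: `toy_generate` is a lookup table rather than a language model, `toy_is_harmful` is a keyword check rather than a trained classifier, and none of it reflects how the GPT-3 demo actually works.

```python
from typing import Callable

REFUSAL = "[withheld by monitor]"


def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     is_harmful: Callable[[str], bool]) -> str:
    """Run the generator, then let an independent monitor veto the output."""
    text = generate(prompt)
    return REFUSAL if is_harmful(text) else text


# Stand-ins (assumptions): a real system would call a model and a trained
# classifier here, not a dictionary and a blocklist.
CANNED = {"greet": "hello there", "rant": "you are all idiots"}
BLOCKLIST = {"idiots"}


def toy_generate(prompt: str) -> str:
    return CANNED.get(prompt, "")


def toy_is_harmful(text: str) -> bool:
    return any(word in BLOCKLIST for word in text.lower().split())


print(guarded_generate("greet", toy_generate, toy_is_harmful))
print(guarded_generate("rant", toy_generate, toy_is_harmful))
```

The generator still "knows" the blocked words; the monitor just decides what gets shown, which is the separation being argued for.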

TheAdamAndChe · on March 17, 2021

I personally don't want infrastructure or tools to dictate the bounds of my morality or the Overton window of the whole society.