Appendix B: Algorithmic/Machine-Learning Bias
Machine Learning Bias
There are several types of machine-learning bias. A few of the more common types include:
Association Bias
This bias misleads the learning process when the training data encodes a spurious association, often one reinforced by cultural biases at the time the data is collected. For example, many women who play video games online opt for account names that suggest they are male in order to avoid harassment. An algorithm trained on a dataset keyed to account monikers will therefore suggest that fewer women play video games online, a result that in turn appears to confirm the cultural bias.
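The back-of-the-envelope sketch below illustrates the moniker example; every number in it is hypothetical and chosen only to show how a naming convention in the data can distort what a model "sees."

```python
# Hypothetical numbers only: if a share of women use male-presenting account
# names, a dataset keyed on monikers under-counts women, and any model trained
# on that dataset inherits the skew.
actual_players = {"women": 450_000, "men": 550_000}
share_of_women_using_male_names = 0.40   # assumed for illustration

observed_women = actual_players["women"] * (1 - share_of_women_using_male_names)
observed_men = actual_players["men"] + actual_players["women"] * share_of_women_using_male_names
total_observed = observed_women + observed_men

print(f"actual share of women:   {actual_players['women'] / sum(actual_players.values()):.0%}")
print(f"observed share of women: {observed_women / total_observed:.0%}")
```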
Emergent Bias
This bias results from applying the algorithm to new data or contexts. If the training data reflects hypothetical datasets rather than real-world contexts, the model overlooks key features that would otherwise rule out spurious or inappropriate correlations. The bias can be further exacerbated by feedback loops in which new data appears to confirm the correlation. For example, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm, employed by several states, used arrest data to predict the likelihood that individuals would commit crimes, effectively adopting racial profiling and racial bias that the feedback loop then accelerated (Angwin et al., 2016). COMPAS modeled the bias already known, and not desired, in arrest statistics (Carson, 2021).
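The following minimal simulation sketches how such a feedback loop can perpetuate an initial skew. It is not the COMPAS model; the group names, rates, and scrutiny mechanism are all assumptions made for illustration.

```python
# Hypothetical feedback loop: scrutiny is allocated in proportion to past
# recorded arrests, and new recorded arrests depend on scrutiny, so an initial
# skew in the data is reproduced year after year even though both groups
# offend at the same true rate.
import random

random.seed(0)

TRUE_OFFENSE_RATE = {"group_a": 0.10, "group_b": 0.10}  # identical by design
arrests = {"group_a": 60, "group_b": 40}                # biased starting data

for year in range(1, 6):
    total = sum(arrests.values())
    for group, rate in TRUE_OFFENSE_RATE.items():
        scrutiny = round(1000 * arrests[group] / total)             # model's allocation
        new = sum(random.random() < rate for _ in range(scrutiny))  # arrests recorded
        arrests[group] += new
    share_a = arrests["group_a"] / sum(arrests.values())
    print(f"year {year}: recorded arrests {arrests}, group_a share {share_a:.0%}")
```

Even though the true offense rates are equal, the recorded data keeps "confirming" that group_a is riskier, because the data itself is a product of where the model directed attention.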
Exclusion Bias
This occurs when modelers fail to recognize the importance of a feature or data point that would otherwise influence the training. Excluding that data significantly alters the training and produces the bias.
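A minimal sketch on synthetic data follows: the feature names and coefficients are invented, and the point is only that dropping a feature judged "unimportant" can sharply degrade what the model learns.

```python
# Synthetic illustration of exclusion bias: x2 drives the outcome, but the
# modeler excludes it; comparing the fit with and without x2 shows how much
# the exclusion distorts training.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                               # the excluded feature
y = 0.5 * x1 + 2.0 * x2 + rng.normal(scale=0.1, size=n)

full = LinearRegression().fit(np.column_stack([x1, x2]), y)
reduced = LinearRegression().fit(x1.reshape(-1, 1), y)   # x2 left out

print("R^2 with x2 included:", round(full.score(np.column_stack([x1, x2]), y), 3))
print("R^2 with x2 excluded:", round(reduced.score(x1.reshape(-1, 1), y), 3))
```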
Language Bias
Statistical sampling frequently draws on English-language sources. This can skew the training of large language models toward ignoring non-English language groups and misrepresenting topics more familiar to those cultures (Luo et al., 2023).
Marginalized Bias
Traditionally under-represented or marginalized groups are not adequately included in training sets, which creates a direct bias. The resulting bias can fall along lines of race, gender, sexual orientation, political affiliation, and disability. For example, when datasets over-represented Caucasians, darker-skinned pedestrians were less safe around self-driving cars trained on those datasets (Wilson et al., 2019).
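The sketch below uses synthetic data with assumed group sizes and patterns to show the mechanism: when one group is a small slice of the training set, a pooled model tracks the majority group's pattern, and a simple per-group audit exposes the accuracy gap.

```python
# Synthetic illustration of under-representation: the minority group's
# feature-label pattern differs, but it is only 5% of the training data,
# so the pooled model serves it far less accurately.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_group(n, weight):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + weight * X[:, 1] > 0).astype(int)   # group-specific pattern
    return X, y

X_major, y_major = make_group(9_500, weight=1.0)    # 95% of the training data
X_minor, y_minor = make_group(500, weight=-1.0)     # 5% of the training data

model = LogisticRegression().fit(
    np.vstack([X_major, X_minor]), np.concatenate([y_major, y_minor])
)

print("accuracy on majority group:", round(model.score(X_major, y_major), 2))
print("accuracy on minority group:", round(model.score(X_minor, y_minor), 2))
```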
Measurement Bias
Underlying problems with the accuracy of the data, and with how it was measured, corrupt the training. Simply put, invalid measurements or data-collection methods create measurement bias and biased output. For example, hospitals might identify high-risk patients based on their previous use of the health care system without accounting for the variability of health care access across populations (Cook, n.d.).
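The rough sketch below mirrors the hospital example with entirely assumed rates: true need is identical across two populations, but the measured proxy (past health-care use) is suppressed where access is poorer, so any model trained on the proxy under-estimates that group's risk.

```python
# Hypothetical proxy-label example: identical true need, unequal access,
# and therefore unequal measured usage.
import random
import statistics

random.seed(2)

def simulate(access_factor, n=5_000):
    need = [random.gauss(50, 10) for _ in range(n)]                 # true need
    usage = [x * access_factor + random.gauss(0, 5) for x in need]  # measured proxy
    return statistics.mean(need), statistics.mean(usage)

for label, access in [("good access", 1.0), ("poor access", 0.6)]:
    true_need, measured_usage = simulate(access)
    print(f"{label}: true need ~ {true_need:.1f}, proxy label ~ {measured_usage:.1f}")
```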
Recall Bias
This arises when, just before training, labels are inconsistently applied to the data based on the annotators' subjective judgments, inflating the false-positive rate. Because the labeled data no longer reflects the true values precisely, the resulting estimates carry bias or added variance.
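A quick sketch with assumed error rates shows how inconsistent labeling shifts the positive rate in the data before training even begins.

```python
# Hypothetical labelling noise: annotators mislabel some cases in each
# direction, so the labelled positive rate drifts from the true rate.
import random

random.seed(3)

true_rate = 0.20          # true share of positive cases
false_positive = 0.15     # negative case wrongly labelled positive
false_negative = 0.10     # positive case wrongly labelled negative

labels = []
for _ in range(20_000):
    is_positive = random.random() < true_rate
    if is_positive:
        labels.append(random.random() >= false_negative)
    else:
        labels.append(random.random() < false_positive)

print("true positive rate:    ", true_rate)
print("labelled positive rate:", round(sum(labels) / len(labels), 3))
```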
Sample Bias
This occurs when the data used for training is too small, or not representative enough, to train the algorithm accurately. An example would be training a neural network to recognize college faculty using only photos of elderly white men who are faculty.
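The short sketch below isolates the sample-size half of the problem on synthetic data (the features and coefficients are made up): a model fit to a handful of examples latches onto the quirks of that tiny sample, and its accuracy on the wider population suffers.

```python
# Synthetic illustration of an under-sized training sample.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
true_weights = np.array([1.0, -1.0, 0.5, 0.0, 0.0])

def make_data(n):
    X = rng.normal(size=(n, 5))
    y = (X @ true_weights + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_test, y_test = make_data(20_000)

for n_train in (20, 200, 20_000):
    X_train, y_train = make_data(n_train)
    model = LogisticRegression().fit(X_train, y_train)
    print(f"trained on {n_train:>6} examples -> test accuracy {model.score(X_test, y_test):.2f}")
```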
Examples of Algorithmic Bias
When not accounted for, algorithmic bias can create great harm. Often the victims of algorithmic bias are the same people already harmed by societal prejudice. A few notable cases include:
Amazon Automated Hiring
In 2014 Amazon began engineering an algorithm to automate hiring; by 2018 the algorithm had demonstrated a marginalized bias (sex bias). Amazon’s software engineers skewed disproportionately male, and when the existing pool of employees was used as training data, the algorithm downgraded resumes that referenced ‘women’s’ activities and graduates of two all-women’s colleges (Dastin, 2018). Amazon tried to correct these issues, but the bias persisted, and the company discontinued the program in 2018 (Goodman, 2018).
Medical Industry
In 2007 the Vaginal Birth after Cesarean Delivery (VBAC) algorithm was designed to help providers assess the safety of vaginal birth for their patients and to identify women at risk of uterine rupture. While the training data focused on factors such as the woman’s age, the algorithm predicted that Black and Hispanic women were less likely than non-Hispanic white women to have a successful vaginal birth after a C-section. Effectively, it identified symptoms of inequality rooted in institutional racism and treated them as medical factors to guide prognosis. In 2017, research revealed that error and showed that it caused doctors to perform more C-sections on Black and Hispanic women than on non-Hispanic white women (Vas et al., 2019). While few large-scale deployments of AI currently exist in medicine, this issue demonstrates just one of the concerns raised by its application (Panch et al., 2019). Moreover, most positive results from AI systems using patient data are fraught with issues (Fraser et al., 2018). Regardless, we can expect a growing reliance on AI in healthcare (Mittermaier et al., 2023).
Mortgage Lending
Credit scores have a long history of intentional and legalized discrimination and still display racial bias (Wu, 2024); AI bias amplified these effects when an algorithm was applied to mortgage lending. Compared to similar white applicants, lenders were 40% more likely to turn down Latino applicants, 50% more likely to turn down Asian/Pacific Islander applicants, and 70% more likely to turn down Native American applicants, even when all candidates met the same criteria (Martinez and Kirchner, 2021). Further, Black mortgage applicants were 80% more likely to be denied (Hale, 2021). By applying seemingly impartial algorithms to large data sets, AI bias has contributed to institutional racism that will likely generate still more homeownership data that can be used to perpetuate the bias.