iDEM Seminar on Natural Language Processing and its biases

21/01/2025 Carla Martinez

On January 14th, as part of iDEM’s Ethics Seminars Series, members of the iDEM project attended a seminar led by Horacio Saggion, professor at Universitat Pompeu Fabra and coordinator of iDEM. This seminar covered natural language processing and its biases. 

What are biases in Natural Language Processing (NLP)? 

Natural Language Processing is a branch of artificial intelligence (AI) that enables computers to comprehend human language, whether written or spoken. This seminar discussed the dangers and issues human biases pose when they are transferred into natural language processing models. Language is not arbitrary: patterns, historical influences and cultural elements have shaped how it evolves and how words are used. Because NLP models learn from these statistical patterns to make predictions, they also absorb the biases embedded in them.

Human bias combined with word embedding, a method that turns words into numerical representations so computers can capture their meaning and how they relate to each other, can result in gender, race or class bias. Type-level embeddings map each word to a numerical vector that represents its meaning. Similar words – like «polite» and «friendly» – sit close together, and even opposite words – like «cold» and «hot» – can end up near each other because they appear in similar contexts. Closeness is measured by the distance between the vectors. This closeness is affected by human biases found in semantics, which make their way into the language statistics that models are trained on and are then transferred to the AI. This can result in the association of names (woman or man, European-American or African-American) with specific professions or personality traits.
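To make the idea of «closeness» concrete, here is a minimal sketch using made-up three-dimensional vectors and the standard cosine similarity measure. The values and the cosine_similarity helper are invented for illustration; real embeddings such as word2vec or GloVe have hundreds of dimensions learned from large text corpora.

```python
# Minimal sketch: measuring how close two word embeddings are.
# The vectors below are made up for illustration, not real embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for vectors pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (illustrative values only).
embeddings = {
    "polite":   np.array([0.9, 0.1, 0.2]),
    "friendly": np.array([0.8, 0.2, 0.1]),
    "hot":      np.array([0.1, 0.9, 0.4]),
    "cold":     np.array([0.2, 0.8, 0.5]),  # antonyms share contexts, so they can sit close
}

print(cosine_similarity(embeddings["polite"], embeddings["friendly"]))  # high
print(cosine_similarity(embeddings["polite"], embeddings["hot"]))       # lower
```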

How does this affect us?

Bias can interfere with crucial matters such as loan allocation or job searches. AI systems built by banks can read names or postcodes on applications as markers of a particular social class; if the system infers a lower socioeconomic status, it might recommend higher interest rates. Similarly, the statistics in the data fed to résumé-filtering AI can associate professions with racial stereotypes – because some profiles are statistically more prevalent than others in certain job positions – leading the system to favour some CVs over others depending on the race or ethnicity it infers from the applicant's name.
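The loan example can be sketched with synthetic data: a model that never sees a protected attribute directly can still reproduce historical bias through a proxy feature such as a postcode. Everything below (feature names, numbers, thresholds) is invented for illustration and only assumes NumPy and scikit-learn.

```python
# Hypothetical sketch: a proxy feature (postcode group) carrying
# socioeconomic bias into a loan-approval model. All data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# 0 = affluent area, 1 = lower-income area (toy encoding).
postcode_group = rng.integers(0, 2, size=n)

# Income correlates with postcode group in this synthetic data.
income = rng.normal(loc=np.where(postcode_group == 0, 60, 35), scale=10, size=n)

# Historical approvals partly depended on postcode, encoding past bias.
historical_approval = (
    income + np.where(postcode_group == 0, 10, -10) + rng.normal(0, 5, size=n)
) > 45

# A model trained on these features reproduces the postcode effect,
# even for two applicants with exactly the same income.
X = np.column_stack([income, postcode_group])
model = LogisticRegression(max_iter=1000).fit(X, historical_approval)

same_income = 45.0
print(model.predict_proba([[same_income, 0]])[0, 1])  # affluent postcode: higher approval odds
print(model.predict_proba([[same_income, 1]])[0, 1])  # lower-income postcode: lower approval odds
```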

In the same way, gender bias in AI can also reinforce stereotypes among humans. People associate words with attributes: socially, we treat certain qualities, professions or objects as feminine and others as masculine, and words typically associated with women are linked more to the arts than to science. These associations are carried over into NLP models and the tools built on them. For example, an AI system might assume the gender of an author from the topic of a book or academic paper. When translating a sentence from English into a language with grammatical gender, an NLP system can use this bias to guess the gender of professionals, rendering, for example, «the doctor» (not gendered) as «el doctor» (masculine) in Spanish, and «the nurse» (not gendered) as «la enfermera» (feminine). It can also associate certain races or ethnicities with more positive or negative personality traits, promoting racist stereotypes in society.
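The gendered-translation example is related to the «gender direction» analysis popularised by Bolukbasi et al. (2016): occupation vectors can be projected onto a he–she axis to see which way they lean. The sketch below uses invented vectors purely to show the arithmetic; it is not the authors' implementation or real embedding data.

```python
# Sketch of projecting occupation vectors onto a "he - she" gender axis.
# The vectors are invented for illustration, not real embeddings.
import numpy as np

vectors = {
    "he":         np.array([ 1.0, 0.2, 0.1]),
    "she":        np.array([-1.0, 0.2, 0.1]),
    "doctor":     np.array([ 0.4, 0.8, 0.3]),
    "nurse":      np.array([-0.5, 0.7, 0.3]),
    "programmer": np.array([ 0.6, 0.5, 0.2]),
}

# Unit vector pointing from "she" towards "he".
gender_direction = vectors["he"] - vectors["she"]
gender_direction /= np.linalg.norm(gender_direction)

for word in ("doctor", "nurse", "programmer"):
    score = float(np.dot(vectors[word], gender_direction))
    lean = "masculine" if score > 0 else "feminine"
    print(f"{word}: {score:+.2f} ({lean} lean in this toy data)")
```

A biased machine-translation system effectively makes this kind of leaning explicit when it is forced to pick a grammatical gender.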

It is vital to remember, whenever we use AI, that it is not unbiased, and to read its results critically. We must also keep our own input as unbiased as possible so that the output is more accurate.


If you are interested in this topic and would like some further reading, these were the papers the seminar referenced:

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science (New York, N.Y.), 356(6334), 183–186. https://doi.org/10.1126/science.aal4230 

Kurita, K., Vyas, N., Pareek, A., Black, A. W., & Tsvetkov, Y. (2019). Measuring bias in contextualized word representations. arXiv preprint arXiv:1906.07337. https://arxiv.org/pdf/1906.07337 

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29. https://arxiv.org/pdf/1607.06520

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press. https://www.degruyter.com/document/doi/10.18574/nyu/9781479833641.001.0001/html


For more information, you can also contact our project coordinator at horacio.saggion@upf.edu