# Towards Understanding and Mitigating Unintended Biases in Language Model-driven Conversational Recommendation

Conversational Recommendation Systems, BERT, Contextual Language Models, Bias and Discrimination.
[ pdf ] [ code ]

## Abstract

Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LM) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs.

We study a recently introduced LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias --- i.e., due to language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations --- manifests in significantly shifted price and category distributions of restaurant recommendations. For example, offhand mention of names associated with the black community significantly lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments.

While these results raise red flags regarding a range of previously undocumented unintended biases that can occur in LM-driven CRSs, there is fortunately a silver lining: we show that training side masking and test side neutralization of non-preferential entities nullifies the observed biases without significantly impacting recommendation performance.

## Template-based Analysis

We define unintended bias in language-based recommendation as

#### A systematic shift in recommendations corresponding to non-preferentially related changes in the input (e.g., a mention of a friend's name).

To evaluate unintended bias, we make use of a template-based analysis of the bias types outlined in Table 1.

We perform the bias analysis with the following setup:

• Natural conversational template sentences are created for each targeted concept (e.g., race).
• Conversational templates are generated at inference time and fed into LMRec; the top 20 recommended items are retrieved for each input.
• Attributes of the recommended items are recorded, including price levels, categories, and item names; from these we compute various statistical aggregations, such as the bias scoring methods covered next.

The complete list of input test phrases is presented below in Table 2. The complete list of substitution words for each bias type is presented in Table 3 and Table 4.
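To make the pipeline above concrete, the following minimal Python sketch shows how the templated queries could be filled in and their results tallied. The `recommend_top_k` stub, the template wording, and the example names are illustrative stand-ins, not the authors' code or the full word lists from Tables 2-4.

```python
from collections import Counter

# Hypothetical stand-in for querying the trained LMRec backbone.
def recommend_top_k(query: str, k: int = 20) -> list[dict]:
    """Return the top-k recommended restaurants for a conversational query.
    Each item is assumed to carry 'name', 'price_level', and 'categories'."""
    raise NotImplementedError("Replace with a call to the trained LMRec model.")

# Illustrative template and substitution words (see Tables 2 and 3 for the real lists).
templates = [
    "Can you recommend a place for dinner? I'm meeting {name} later.",
]
name_groups = {
    "white": ["Emily", "Greg"],
    "black": ["Lakisha", "Jamal"],
}

def run_probe(k: int = 20) -> dict[str, Counter]:
    """Fill each template with each substitution word, query the model,
    and tally the price levels of the returned items per group label."""
    price_counts: dict[str, Counter] = {label: Counter() for label in name_groups}
    for template in templates:
        for label, names in name_groups.items():
            for name in names:
                query = template.format(name=name)
                for item in recommend_top_k(query, k):
                    price_counts[label][item["price_level"]] += 1
    return price_counts
```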

## Bias Scoring Methods

We now define the following scoring functions for measuring recommendation bias in this work:

### Price Percentage Score

We measure the percentage of recommendations at each price level $m \in \{\$, \$\$, \$\$\$, \$\$\$\$\}$ for different bias sources (e.g., race, gender, etc.). Given the restaurant recommendation list $\mathcal{I}_{m}$ containing the recommended items at price level $m$, we calculate the probability that an item in $\mathcal{I}_m$ was recommended to a user with mentioned name label $l=white$ vs. $l=black$:
$$P(l = l_i|m = m_j) = \frac{\vert \mathcal{I}_{l=l_i, m=m_j} \vert}{\vert \mathcal{I}_{m=m_j} \vert}.$$
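Given per-label tallies of recommended items at each price level (e.g., as collected by the sketch above), this score can be computed directly. The sketch below is a minimal illustration, and the example output values in the comment are purely hypothetical, not results from the paper.

```python
from collections import Counter

def price_percentage_score(price_counts: dict[str, Counter], price_level: str) -> dict[str, float]:
    """Compute P(l = l_i | m = m_j): among all items recommended at a given
    price level, the fraction attributable to each name label."""
    total_at_level = sum(counts[price_level] for counts in price_counts.values())
    if total_at_level == 0:
        return {label: 0.0 for label in price_counts}
    return {label: counts[price_level] / total_at_level
            for label, counts in price_counts.items()}

# Hypothetical usage (numbers are illustrative only):
# price_percentage_score(price_counts, "$$$$")  ->  {"white": 0.7, "black": 0.3}
```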

#### Effects in different datasets.

Certain cities (e.g., Toronto, Austin, and Orlando) exhibit different behaviour from the rest at the \$\$\$\$ price level. This indicates that unintended bias in the recommendation results is affected by the training review dataset, leading to variation across cities.

## Unintended Gender Bias

### More nightlife-related recommendations for males.

Among the sensitive items, we see a significant shift of nightlife-related activities (predominantly alcohol-serving venues) toward the male side of the first relationship mentioned, consistent with our other results.

## Unintended Location Bias

An unintentional mention of a location may implicitly reveal information about the user's employment, social status, or religion. An example of such a phrase is "Can you pick a place to go after I leave the [LOCATION]?". The placeholder could be "construction site", indicating that the user may be a construction worker. Similarly, religious information is implicitly conveyed by mentioning locations such as "synagogues", "churches", and "mosques". As discussed in our work, it is undesirable for a conversational recommender system to exhibit price discrimination based on the locations users mention. In this section, we therefore study whether LMRec exhibits such behaviour.

We construct a set of testing sentences based on a pre-defined collection of templates. Each testing phrase includes a placeholder [LOCATION], which implicitly provides potential employment, social status, or religious information. We measure the differences in the average price level of the top 20 recommended restaurants across the substitution words; the average is computed over all cities and all templates.
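As a rough illustration, the average price level per substitution word could be computed as below. The template, the location list, and the numeric price mapping are assumptions made for this sketch, and the `recommend` callable stands in for a query to the trained LMRec model.

```python
import statistics
from typing import Callable

# Map the symbolic price levels to integers so they can be averaged.
PRICE_TO_INT = {"$": 1, "$$": 2, "$$$": 3, "$$$$": 4}

# Illustrative template and [LOCATION] substitution words (subset of Tables 2 and 4).
LOCATION_TEMPLATE = "Can you pick a place to go after I leave the {location}?"
LOCATIONS = ["fashion studio", "law office", "synagogue", "convenience store", "mosque"]

def average_price_levels(recommend: Callable[[str], list[dict]]) -> dict[str, float]:
    """For each [LOCATION] substitution, fill the template, fetch the top-20
    recommendations via the supplied `recommend` callable (e.g., a wrapper
    around LMRec), and average their numeric price levels."""
    averages = {}
    for location in LOCATIONS:
        query = LOCATION_TEMPLATE.format(location=location)
        items = recommend(query)
        averages[location] = statistics.mean(PRICE_TO_INT[i["price_level"]] for i in items)
    return averages
```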

### Relationship between occupation and price level.

In brief, we see in the figure above (presented with 90% confidence intervals) that professional establishments (e.g., "fashion studio" or "law office") and religious venues such as "synagogue" have a higher average price than "convenience store" and "mosque", indicating possible socioeconomic biases based on location and religion. When occupation-related locations are substituted into the recommendation request queries, a person heading to a fashion studio receives higher-priced recommendations than one heading to a convenience store. The results also appear to imply that people who visit fashion studios or can afford a psychiatrist also go to expensive restaurants. While fashion-related occupations are less tied to socioeconomic status, occupations such as lawyers and psychologists fall into the highest occupational scale defined by Hollingshead. We hypothesize that people associated with lawyers, psychiatrists, or psychologists (i.e., both the service providers and their customers) are considered to have higher SES, while the majority of people at places such as universities may be students with lower SES, leading to the price associations observed in the figure above.

From the perspective of religious information inferred from the mention of locations, the average price level of restaurant recommendations associated with Jewish people is the highest among the three religion-related labels we tested. This is consistent with the analysis by Pearson et al. that Jewish Americans are more likely to have a higher income distribution than other white and black populations. This common stereotype may lead to unfairness in which the recommender consistently recommends cheaper restaurants to people of religions other than Judaism, particularly Muslims, who receive the lowest average recommendation price among the three religions.

## Limitations

We now proceed to outline some limitations of our analysis that might be explored in future work:

• Choice of model: As discussed in the Template-based Analysis section, the recommendation results for this work are based purely on the context of language requests at test time and are not personalized to individual users. Therefore, future work can investigate the existence of unintended biases in a personalized version of LMRec although this extension of LMRec would be a novel contribution itself.
• Application of test-side neutralization: As described in Train-side Masking & Test-side Neutralization, test-side neutralization is a post-processing bias mitigation method that masks out text revealing sensitive information in the input queries (a minimal illustrative sketch appears after this list). However, the biases present in the model or in the recommendation results are not removed by this methodology. To this end, we note that some information in the training data may contribute to biases yet cannot easily be masked (e.g., sensitive attributes linked to food and cuisine types), so train-time masking cannot be applied to every possible contributing factor. Hence, future work could investigate novel methods capable of removing or mitigating biases from the trained embeddings arising through both direct and indirect associations of language with sensitive attributes.
• Harmfulness of certain observed unintended biases: It is well-noted in the literature that biases in recommender systems may be very harmful to specific user populations (Deldjoo et al., 2022; Hildebrandt et al.; Geyik et al., 2022; Edizel et al.; Dash et al., 2022). However, whether recommending desserts to women and pubs to men is harmful remains an open question from an ethical perspective. While we wanted to highlight these notable user-item associations observed in our analysis, it is beyond the scope of this work to attempt to resolve such ethical questions. Nonetheless, we remark that some unintended bias may be allowable: it may be deemed innocuous in a given application setting (e.g., recommending desserts to women), and, for practical purposes, bias cannot always be completely detected and removed from the training text or request queries. Overall, though, investigating these ethical questions is an important problem for future research.
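For concreteness, the following is a minimal sketch of what the test-side neutralization step could look like, assuming a simple entity list built from the substitution-word tables and a generic `[MASK]` placeholder; the paper's actual masking vocabulary and replacement procedure may differ.

```python
import re

# Hypothetical lists of non-preferential entities to neutralize; in practice the
# masking vocabulary would be derived from the substitution-word tables (Tables 3-4).
SENSITIVE_NAMES = {"Lakisha", "Jamal", "Emily", "Greg"}
SENSITIVE_LOCATIONS = {"synagogue", "mosque", "church", "law office", "convenience store"}

def neutralize(query: str, mask_token: str = "[MASK]") -> str:
    """Test-side neutralization sketch: mask non-preferential entities
    (names, locations) in the input query before passing it to the model."""
    for entity in SENSITIVE_NAMES | SENSITIVE_LOCATIONS:
        query = re.sub(rf"\b{re.escape(entity)}\b", mask_token, query, flags=re.IGNORECASE)
    return query

# neutralize("Can you pick a place to go after I leave the mosque?")
# -> "Can you pick a place to go after I leave the [MASK]?"
```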

## Conclusion

Given the potential that pretrained LMs offer for CRSs, we have presented the first quantitative and qualitative analysis to identify and measure unintended biases in LMRec. We observed that the model exhibits various unintended biases without involving any preferential statements or recorded preferential history of the user, but simply due to an offhand mention of a name or relationship that in principle should not change the recommendations. Fortunately, we have shown that training side masking and test side neutralization of non-preferential entities nullifies the observed biases without significantly impacting recommendation performance. Overall, our work has aimed to identify and raise a red flag regarding unintended biases in LM-driven CRSs, and we consider this study a first step towards understanding and mitigating unintended biases in future LM-driven CRSs that have the potential to impact millions of users.

## Citation

Cited as:

    @article{shen2023towards,
      title   = "Towards understanding and mitigating unintended biases in language model-driven conversational recommendation",
      author  = "Shen, Tianshu and Li, Jiaru and Bouadjenek, Mohamed Reda and Mai, Zheda and Sanner, Scott",
      journal = "Information Processing & Management",
      year    = "2023"
    }