Classics: Five AI-Themed Papers from the Top Management Journal M&SOM

This issue highlights five AI-themed papers from Manufacturing & Service Operations Management (M&SOM) that together reveal the multidimensional roles and evolving challenges of AI in operational decision-making, service optimization, and ethical governance.

A Decision-Science Framework for Integrating Human-Machine Collaboration

The integration of machine learning (ML) and behavioral science (BSci) offers complementary solutions to operations management (OM) problems: ML uncovers complex patterns through data modeling, while BSci analyzes human behavioral biases and cognitive mechanisms, so the two can jointly optimize prediction and decision-making. The study proposes an interdisciplinary research framework, emphasizing that coupling algorithm design with behavioral interventions can expand the solution space of OM problems (Davis et al., 2024).

The Evolution of Decision Biases in Large Language Models

Testing ChatGPT across 18 operational scenarios shows that in nearly half of the experiments, GPT-3.5 and GPT-4 exhibit human-like cognitive biases (such as risk aversion and confirmation bias), even though mathematical problem-solving improves with model upgrades. GPT-4 performs well on structured tasks but displays increased behavioral biases on preference-based problems, suggesting that managers should choose models according to task type and guard against algorithmic biases seeping into decision chains (Chen et al., 2025).
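To make the experimental setup concrete, here is a minimal Python sketch of a vignette-style bias probe; the vignette wording, model name, and answer-matching rule are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of a vignette-style bias probe against an LLM.
# Assumptions: the OpenAI Python SDK (v1+), an OPENAI_API_KEY in the
# environment, and a hypothetical risk-aversion vignette with a crude
# answer-matching rule. This is not the authors' actual protocol.
from openai import OpenAI

client = OpenAI()

# A classic risk-aversion vignette: both options have the same expected
# value, so a risk-neutral decision maker should be indifferent.
VIGNETTE = (
    "You manage inventory for a retailer. Choose one option and answer "
    "with the letter only.\n"
    "A) A guaranteed profit of $50,000.\n"
    "B) A 50% chance of a $100,000 profit and a 50% chance of $0."
)

def run_trial(model: str = "gpt-4") -> str:
    """Ask the model once and return its (normalized) choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": VIGNETTE}],
        temperature=1.0,
    )
    return response.choices[0].message.content.strip().upper()

# Repeat the prompt and count how often the sure thing is chosen; a rate
# far above 50% on an equal-EV gamble is consistent with risk aversion.
choices = [run_trial() for _ in range(20)]
share_safe = sum(c.startswith("A") for c in choices) / len(choices)
print(f"Chose the sure option in {share_safe:.0%} of trials")
```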

The AI Adoption Paradox in Healthcare Settings

The smartness of an AI assistant (whether it is powered by machine learning) significantly increases physicians' adoption rate and shortens adoption timing, whereas transparency (informing physicians of the AI's existence) only shortens adoption timing. Low-smartness AI needs transparency to compensate for trust, while high-smartness AI makes transparency less necessary thanks to its technical reliability. The study suggests that strengthening an AI assistant's core capability matters more than information disclosure, and that transparency mechanisms are needed as a buffer mainly in low-capability scenarios (Hou et al., 2024).

The Boundaries of Algorithmic Advantage in Cover-Image Selection

An AI system overcomes the labeled-data bottleneck through transfer learning and, when selecting restaurant cover images, lifts user engagement by roughly 12%-16% relative to crowdsourcing; it is especially effective for longer-tenured establishments, low-rated businesses, and restaurants with little user-generated content. The algorithm excels at capturing latent visual features (such as food quality and ambiance), validating AI's comparative advantage in complex feature extraction and unstructured-data processing (Khern-am-nuai et al., 2023).
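For readers unfamiliar with transfer learning, a minimal sketch of the general idea follows; it is not the platform's deployed system, and the dataset, labels, and single-logit scoring head are assumptions for illustration.

```python
# Sketch of transfer learning for scoring candidate cover images.
# Assumptions: PyTorch + torchvision and a hypothetical stream of
# (image batch, engagement label) pairs. This illustrates the general
# technique, not the system deployed in the paper.
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone so that generic visual
# features (textures, composition, lighting) come "for free".
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained weights; only the new head is trained, which is
# what lets a small labeled set suffice.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classifier with a single-logit head that scores how likely
# an image is to drive engagement.
backbone.fc = nn.Linear(backbone.fc.in_features, 1)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on (N, 3, 224, 224) images and (N, 1) labels."""
    optimizer.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# At serving time, each restaurant's candidate photos are scored with
# backbone(...) and the highest-scoring photo becomes the cover image.
```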

The Ethical Dilemma of Algorithmic Discrimination in Finance

Prohibiting the use of gender data actually exacerbates bias in loan models, and ML models prove less discriminatory and more profitable than logistic regression. Feature engineering and feature selection drive algorithmic bias, while rebalancing the training data and probabilistic gender-proxy modeling can reduce discrimination. The study challenges the "data isolation" paradigm of antidiscrimination regulation, advocating that firms be allowed to collect protected attributes in order to monitor bias, strengthen algorithmic accountability, and rebalance data ethics (Kelley et al., 2022).
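As a toy illustration of the rebalancing idea, the following sketch downsamples the majority gender before training and then audits the approval-rate gap; the synthetic data and all column names are hypothetical, and this is not the paper's pipeline.

```python
# Toy sketch of gender rebalancing in a credit-scoring training set.
# Assumptions: pandas + scikit-learn, synthetic data, and hypothetical
# column names. Illustrates the rebalancing technique only; this is not
# the pipeline used in the paper.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic loan applications with an imbalanced gender distribution.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n, p=[0.3, 0.7]),
    "income": rng.normal(60, 15, n),
    "debt_ratio": rng.uniform(0, 1, n),
    "credit_history_len": rng.integers(0, 30, n),
})
df["approved"] = ((df["income"] > 55) & (df["debt_ratio"] < 0.6)).astype(int)
train, test = df.iloc[:1500], df.iloc[1500:]

FEATURES = ["income", "debt_ratio", "credit_history_len"]

def downsample_majority(data: pd.DataFrame, group_col: str = "gender",
                        seed: int = 0) -> pd.DataFrame:
    """Downsample every gender group to the size of the smallest group."""
    n_min = data[group_col].value_counts().min()
    return (data.groupby(group_col, group_keys=False)
                .apply(lambda g: g.sample(n=n_min, random_state=seed)))

def approval_gap(model, data: pd.DataFrame) -> float:
    """Absolute difference in predicted approval rates between groups."""
    rates = (data.assign(pred=model.predict(data[FEATURES]))
                 .groupby("gender")["pred"].mean())
    return float(rates.max() - rates.min())

# Gender is used only to rebalance the training set and to audit the
# model afterwards; it is never passed in as a model feature.
balanced = downsample_majority(train)
model = GradientBoostingClassifier().fit(balanced[FEATURES],
                                         balanced["approved"])
print(f"Approval-rate gap on held-out data: {approval_gap(model, test):.3f}")
```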

These five studies collectively point to three intertwined tensions that AI-enabled operations management must resolve: designing technological efficacy and human cognition as complements, dynamically matching algorithmic strengths to application scenarios, and governing efficiency gains alongside ethical risks. Future research should deepen interdisciplinary integration to build a "technology-behavior-institution" intelligent decision-making ecosystem.

Andrew M. Davis, Shawn Mankad, Charles J. Corbett, Elena Katok (2024) OM Forum—The Best of Both Worlds: Machine Learning and Behavioral Science in Operations Management. Manufacturing & Service Operations Management 26(5):1605-1621.

Abstract (Davis et al., 2024)

Problem definition: Two disciplines increasingly applied in operations management (OM) are machine learning (ML) and behavioral science (BSci). Rather than treating these as mutually exclusive fields, we discuss how they can work as complements to solve important OM problems.

Methodology/results: We illustrate how ML and BSci enhance one another in non-OM domains before detailing how each step of their respective research processes can benefit the other in OM settings. We then conclude by proposing a framework to help identify how ML and BSci can jointly contribute to OM problems.

Managerial implications: Overall, we aim to explore how the integration of ML and BSci can enable researchers to solve a wide range of problems within OM, allowing future research to generate valuable insights for managers, companies, and society.

Abstract (Chen et al., 2025)

Problem definition: Large language models (LLMs) are being increasingly leveraged in business and consumer decision-making processes. Because LLMs learn from human data and feedback, which can be biased, determining whether LLMs exhibit human-like behavioral decision biases (e.g., base-rate neglect, risk aversion, confirmation bias, etc.) is crucial prior to implementing LLMs into decision-making contexts and workflows. To understand this, we examine 18 common human biases that are important in operations management (OM) using the dominant LLM, ChatGPT.

Methodology/results: We perform experiments where GPT-3.5 and GPT-4 act as participants to test these biases using vignettes adapted from the literature (“standard context”) and variants reframed in inventory and general OM contexts. In almost half of the experiments, Generative Pre-trained Transformer (GPT) mirrors human biases, diverging from prototypical human responses in the remaining experiments. We also observe that GPT models have a notable level of consistency between the standard and OM-specific experiments as well as across temporal versions of the GPT-3.5 model. Our comparative analysis between GPT-3.5 and GPT-4 reveals a dual-edged progression of GPT’s decision making, wherein GPT-4 advances in decision-making accuracy for problems with well-defined mathematical solutions while simultaneously displaying increased behavioral biases for preference-based problems.

Managerial implications: First, our results highlight that managers will obtain the greatest benefits from deploying GPT to workflows leveraging established formulas. Second, that GPT displayed a high level of response consistency across the standard, inventory, and non-inventory operational contexts provides optimism that LLMs can offer reliable support even when details of the decision and problem contexts change. Third, although selecting between models, like GPT-3.5 and GPT-4, represents a trade-off in cost and performance, our results suggest that managers should invest in higher-performing models, particularly for solving problems with objective solutions.

Abstract (Hou et al., 2024)

Problem definition: Artificial intelligence (AI) assistants—software agents that can perform tasks or services for individuals—are among the most promising AI applications. However, little is known about the adoption of AI assistants by service providers (i.e., physicians) in a real-world healthcare setting. In this paper, we investigate the impact of the AI smartness (i.e., whether the AI assistant is powered by machine learning intelligence) and the impact of AI transparency (i.e., whether physicians are informed of the AI assistant).

Methodology/results: We collaborate with a leading healthcare platform to run a field experiment in which we compare physicians’ adoption behavior, that is, adoption rate and adoption timing, of smart and automated AI assistants under transparent and non-transparent conditions. We find that the smartness can increase the adoption rate and shorten the adoption timing, whereas the transparency can only shorten the adoption timing. Moreover, the impact of AI transparency on the adoption rate is contingent on the smartness level of the AI assistant: the transparency increases the adoption rate only when the AI assistant is not equipped with smart algorithms and fails to do so when the AI assistant is smart.

Managerial implications: Our study can guide platforms in designing their AI strategies. Platforms should improve the smartness of AI assistants. If such an improvement is too costly, the platform should make the AI assistant transparent, especially when it is not smart.

Abstract (Khern-am-nuai et al., 2023)

Problem definition: Restaurant review platforms, such as Yelp and TripAdvisor, routinely receive large numbers of photos in their review submissions. These photos provide significant value for users who seek to compare restaurants. In this context, the choice of cover images (i.e., representative photos of the restaurants) can greatly influence the level of user engagement on the platform. Unfortunately, selecting these images can be time consuming and often requires human intervention. At the same time, it is challenging to develop a systematic approach to assess the effectiveness of the selected images.

Methodology/results: In this paper, we collaborate with a large review platform in Asia to investigate this problem. We discuss two image selection approaches, namely crowd-based and artificial intelligence (AI)-based systems. The AI-based system we use learns complex latent image features, which are further enhanced by transfer learning to overcome the scarcity of labeled data. We collaborate with the platform to deploy our AI-based system through a randomized field experiment to carefully compare both systems. We find that the AI-based system outperforms the crowd-based counterpart and boosts user engagement by 12.43%–16.05% on average. We then conduct empirical analyses on observational data to identify the underlying mechanisms that drive the superior performance of the AI-based system.

Managerial implications: Finally, we infer from our findings that the AI-based system outperforms the crowd-based system for restaurants with (i) a longer tenure on the platform, (ii) a limited number of user-generated photos, (iii) a lower star rating, and (iv) lower user engagement under the crowd-based system.

Abstract (Kelley et al., 2022)

Problem definition: We use a realistically large, publicly available data set from a global fintech lender to simulate the impact of different antidiscrimination laws and their corresponding data management and model-building regimes on gender-based discrimination in the nonmortgage fintech lending setting.

Academic/practical relevance: Our paper extends the conceptual understanding of model-based discrimination from computer science to a realistic context that simulates the situations faced by fintech lenders in practice, where advanced machine learning (ML) techniques are used with high-dimensional, feature-rich, highly multicollinear data. We provide technically and legally permissible approaches for firms to reduce discrimination across different antidiscrimination regimes whilst managing profitability.

Methodology: We train statistical and ML models on a large and realistically rich publicly available data set to simulate different antidiscrimination regimes and measure their impact on model quality and firm profitability. We use ML explainability techniques to understand the drivers of ML discrimination.

Results: We find that regimes that prohibit the use of gender (like those in the United States) substantially increase discrimination and slightly decrease firm profitability. We observe that ML models are less discriminatory, of better predictive quality, and more profitable compared with traditional statistical models like logistic regression. Unlike omitted variable bias—which drives discrimination in statistical models—ML discrimination is driven by changes in the model training procedure, including feature engineering and feature selection, when gender is excluded. We observe that down sampling the training data to rebalance gender, gender-aware hyperparameter selection, and up sampling the training data to rebalance gender all reduce discrimination, with varying trade-offs in predictive quality and firm profitability. Probabilistic gender proxy modeling (imputing applicant gender) further reduces discrimination with negligible impact on predictive quality and a slight increase in firm profitability.

Managerial implications: A rethink is required of the antidiscrimination laws, specifically with respect to the collection and use of protected attributes for ML models. Firms should be able to collect protected attributes to, at minimum, measure discrimination and ideally, take steps to reduce it. Increased data access should come with greater accountability for firms.
