Chapter 14: Challenges and design principles for the evaluation of productive AI systems in the public sector
While research on the development and adoption of AI systems is growing, organisations will harness the benefits of AI systems and avoid their harms only if those systems maintain high performance after they are developed and adopted. A key activity in this regard is the evaluation of productive AI systems. In this Action Design Research study, we built, implemented, and evaluated an infrastructure for evaluating productive AI systems at the Danish Business Authority and examined the challenges that such an infrastructure needs to address. We found that the key challenges revolve around tedious work, resource availability, maintaining an overview, ensuring sufficient priority, and timing evaluations. We propose that these challenges can be addressed by a digitised evaluation infrastructure that automatically stops systems that have not been evaluated, by aligning evaluation timing with patterns of change in the real world, by making evaluation work meaningful, and by leveraging synergies between evaluation and other activities. Our study provides unique insights into the challenges of ongoing AI system evaluation in organisational realities, into emergent solution strategies, and into their theoretical foundations.
