The agency has launched an R&D program aimed at exploring the effectiveness of algorithmic decision-making in mission-critical defence operations.
The Defense Advanced Research Projects Agency (DARPA) has unveiled the ‘In the Moment’ (ITM) program, which seeks to quantify the alignment of algorithms with ‘trusted human decision-makers’ in complex scenarios.
The program involves the evaluation and development of trusted algorithmic decision-makers for mission-critical Department of Defense (DoD) operations.
“ITM is different from typical AI development approaches that require human agreement on the right outcomes,” Matt Turek, ITM program manager, said.
“The lack of a right answer in difficult scenarios prevents us from using conventional AI evaluation techniques, which implicitly require human agreement to create ground-truth data.”
The ITM endeavour is contrasted with self-driving vehicle technology, in which algorithms can learn right and wrong driving responses from unchanging traffic signs and rules of the road.
In that context, risk values can be hard-coded into the simulation environment used to train self-driving car algorithms.
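By way of illustration, hard-coded risk values might look something like the following Python sketch; the event names, penalty weights, and reward shape are all invented for this example, not drawn from any real training system.

```python
# Hypothetical, hard-coded risk values a driving simulator might
# penalise during training; every name and number is illustrative.
RISK_PENALTIES = {
    "ran_red_light": 100.0,
    "crossed_centre_line": 50.0,
    "exceeded_speed_limit": 20.0,
}

def step_reward(progress: float, events: list[str]) -> float:
    """Reward forward progress, subtract fixed penalties for rule violations."""
    return progress - sum(RISK_PENALTIES.get(e, 0.0) for e in events)

print(step_reward(progress=1.0, events=["exceeded_speed_limit"]))  # -19.0
```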
However, according to Turek, this approach would not work for DoD operations, given that combat situations evolve rapidly.
“The DoD needs rigorous, quantifiable, and scalable approaches to evaluating and building algorithmic systems for difficult decision-making where objective ground truth is unavailable,” he added.
“Difficult decisions are those where trusted decision-makers disagree, no right answer exists, and uncertainty, time-pressure, and conflicting values create significant decision-making challenges.”
ITM is expected to draw on the field of medical imaging analysis, where techniques have been developed for evaluating systems even when skilled experts disagree on ground truth.
Rather than relying on a single ground truth, an algorithmically drawn boundary is compared against the distribution of boundaries drawn by human experts over many trials; if the algorithm’s boundary falls within that distribution, its performance is deemed comparable to a human’s.
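As a rough illustration of that comparison, the sketch below summarises each drawn boundary by its enclosed area and checks whether the algorithm’s value falls inside the central mass of the expert distribution; the area-only summary and the quantile thresholds are simplifying assumptions, not ITM’s actual method.

```python
import numpy as np

def within_expert_distribution(algo_area: float,
                               expert_areas: np.ndarray,
                               lo_q: float = 0.05,
                               hi_q: float = 0.95) -> bool:
    """Reduce each boundary to its enclosed area (a deliberate
    simplification) and test whether the algorithm's value lands
    inside the central 90% of the expert distribution."""
    lo, hi = np.quantile(expert_areas, [lo_q, hi_q])
    return lo <= algo_area <= hi

# Areas (in pixels) of boundaries drawn by ten experts; values invented.
experts = np.array([118, 124, 131, 127, 122, 135, 129, 120, 126, 133])
print(within_expert_distribution(128.0, experts))  # True: comparable to experts
```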
“Building on the medical imaging insight, ITM will develop a quantitative framework to evaluate decision-making by algorithms in very difficult domains,” Turek said.
“We will create realistic, challenging decision-making scenarios that elicit responses from trusted humans to capture a distribution of key decision-maker attributes.
“Then we’ll subject a decision-making algorithm to the same challenging scenarios and map its responses into the reference distribution to compare it to the trusted human decision-makers.”
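One hedged reading of “mapping responses into the reference distribution” is a distance test like the following, where each decision-maker is reduced to a vector of attribute scores; the two attributes and the Mahalanobis-distance criterion are assumptions made purely for illustration.

```python
import numpy as np

def in_reference_distribution(algo: np.ndarray, humans: np.ndarray) -> bool:
    """Check whether an algorithm's attribute vector sits inside the
    spread of trusted-human vectors, using Mahalanobis distance."""
    mu = humans.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(humans, rowvar=False))

    def dist(x: np.ndarray) -> float:
        diff = x - mu
        return float(np.sqrt(diff @ cov_inv @ diff))

    # Comparable if no further from the centre than the most extreme human.
    return dist(algo) <= max(dist(h) for h in humans)

# Each row: one trusted human's scores on two hypothetical attributes
# (say, risk tolerance and urgency weighting); all figures invented.
humans = np.array([[0.70, 0.40], [0.60, 0.50], [0.80, 0.45], [0.65, 0.35]])
print(in_reference_distribution(np.array([0.68, 0.42]), humans))  # True
```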
The ITM program is set to involve four technical areas:
- Developing decision-maker characterisation techniques that identify and quantify key decision-maker attributes in difficult domains;
- Creating a quantitative alignment score between a human decision-maker and an algorithm in ways that are predictive of end-user trust (a toy sketch of such a score follows this list);
- Designing and executing the program evaluation; and
- Policy and practice integration — providing legal, moral, and ethical expertise to the program; supporting the development of future DoD policy and concepts of operations (CONOPS); overseeing development of an ethical operations process (DevEthOps); and conducting outreach events to the broader policy community.
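The alignment score named in the second technical area is itself a research goal, so no published formula exists; as a stand-in, the toy version below simply measures how often an algorithm’s decisions match a given human’s across a set of scenarios.

```python
from typing import Sequence

def alignment_score(human_choices: Sequence[str],
                    algo_choices: Sequence[str]) -> float:
    """Toy alignment score: the fraction of scenarios in which the
    algorithm's decision matches the human's. A stand-in that only
    illustrates the shape of such a metric."""
    assert len(human_choices) == len(algo_choices)
    matches = sum(h == a for h, a in zip(human_choices, algo_choices))
    return matches / len(human_choices)

# Decisions across four hypothetical triage scenarios.
print(alignment_score(["evacuate", "treat", "treat", "wait"],
                      ["evacuate", "treat", "wait", "wait"]))  # 0.75
```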
The program is scheduled to run over three and a half years and across two phases, with the potential for a third phase devoted to maturing the technology with a transition partner.
The first phase is expected to focus on small-unit triage as the decision-making scenario, while the second is expected to increase decision-making complexity by focusing on mass-casualty events.
Multiple human and algorithmic decision-makers are set to be presented with scenarios from the medical triage (Phase 1) or mass-casualty (Phase 2) domains.
The algorithmic decision-makers will include an aligned decision-maker with knowledge of key human decision-making attributes and a baseline decision-maker with no knowledge of those attributes.
A human triage professional is also set to be included as an experimental control.
“We’re going to collect the decisions, the responses from each of those decision-makers, and present those in a blinded fashion to multiple triage professionals,” Turek said.
“Those triage professionals won’t know whether the response comes from an aligned algorithm or a baseline algorithm or from a human.
“And the question that we might pose to those triage professionals is which decision-maker would they delegate to, providing us a measure of their willingness to trust those particular decision-makers.”
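Unblinding those judgements afterwards would yield a per-source delegation rate along the following lines; the record format and figures are invented purely to show the shape of the measurement.

```python
from collections import Counter

# Each record: (blinded response id, true source, whether the triage
# professional chose to delegate to this decision-maker). Data invented.
judgements = [
    ("r1", "aligned_algorithm", True),
    ("r2", "baseline_algorithm", False),
    ("r3", "human_control", True),
    ("r4", "aligned_algorithm", True),
    ("r5", "baseline_algorithm", True),
]

def delegation_rates(records):
    """Unblind after the fact and compute, per source, the fraction of
    judgements in which professionals were willing to delegate."""
    totals, yes = Counter(), Counter()
    for _, source, delegated in records:
        totals[source] += 1
        yes[source] += delegated
    return {s: yes[s] / totals[s] for s in totals}

print(delegation_rates(judgements))
# {'aligned_algorithm': 1.0, 'baseline_algorithm': 0.5, 'human_control': 1.0}
```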
[Related: DARPA to develop ‘neural tool’ to combat veteran suicide]