Collection, Transport, and Presentation: The Three Wicked Problems Inhibiting Data-Driven Decision-Making in the Army

Military leaders have sought hard data to drive their decisions for decades, perhaps most famously beginning with Secretary of Defense Robert McNamara’s so-called “Whiz Kids” in the 1960s. As retrospective analysis of those decisions made clear, however, data alone does not good decisions make. Errors in collection, transmission, and presentation decimated the efficacy of this initiative.1 The Vietnam War is a cautionary tale in data-driven decision-making gone wrong, an important reminder that modernity’s insatiable need for more data is no more a silver bullet today than it was sixty years ago.

This paper leaves aside important questions regarding the efficacy of data-driven decision-making, especially in a world where data overload is not a danger but rather a given. It also ignores the impact of biases that can make efforts to achieve “data-driven decision-making” little more than a quest for decision-driven data. Humans have a well-studied tendency to search for data that confirms their opinions rather than data that challenges them.2 Instead, this paper examines the technical challenges facing the Joint Force in making data-driven decisions. Where appropriate, this paper discusses these challenges through the lens of defensive cyberspace operations, a common mission at all echelons from tactical units to strategic headquarters. These challenges fall into three categories: collection, transport, and presentation. Subsequent papers should carefully examine the efficacy of this approach, which necessitates centralization through complex, brittle systems of systems and runs counter to the decentralized mandates of mission command. Here, too, we must ask not only whether we can, but also whether we should. This paper deals only with the former.

Collection

In the context of data-driven decision-making, “collection” refers to the acquisition of data.3 Sensors capture an abstraction of reality, at a given point in time, and turn it into data; this data may then be transported and later presented elsewhere. Collection, as the first challenge for data-driven decision-making, aligns directly with the first technical requirement for analysis: correct and complete data. The first and most insidious challenges to data-driven decision-making manifest at this stage.

Proprietary vendor equipment presents one of the greatest challenges to collection. Without organic sensors, collection is often impossible; when those organic sensors do exist, they often provide little more than diagnostic data, which is appropriate for technicians troubleshooting the equipment but not for informing the decisions military leaders make regarding the employment of their forces.

Even information systems suffer from collection challenges. Despite extensive built-in sensor mechanisms, lack of the right data captured in the right way continues to inhibit defensive cyberspace operations. Even in the cyber domain, where collection as code enables the tuning of sensors with the push of a button, sufficient collection remains a key challenge confounding operations. This should serve as a cautionary tale when dealing with collection in the land, sea, air, and space domains, where sensors are governed by physics, materiel, and acquisitions—and code, too.

Transport

While collection certainly faces challenges, it is firmly grounded in the present; transport, on the other hand, is reliant on technology, systems, and programs dating back decades. “Transport,” in the context of data-driven decision-making, refers to “processing” as defined in ADP 2-0 as well as the transmission of that data between systems.5 Transport aligns directly with the second technical requirement for analysis: correct and complete data in a suitable platform.

Transport is the most significant yet, paradoxically, the least studied challenge facing data-driven decision-making. This challenge is primarily the result of stasis in transmission mediums, particularly within the Department of Defense. While data volume has grown exponentially in the Internet age, and will continue to do so at an ever-increasing pace, transmission mediums have progressed at a comparatively glacial pace. This impacts tactical units the most: their options for data transmission have failed to keep pace not only with the increase in data but also with industry. While technologies like 5G and systems like Starlink offer a nearly incomprehensible increase in capability to units accustomed to tactical connections measured in mere kilobits per second under the best of circumstances, they face a long and fraught road to general availability. As capabilities for collection continue to increase, and researchers develop new and interesting ways to present that data, transport will remain the most impactful factor in enabling data-driven decision-making.
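To make the scale of the transport gap concrete, the following is a minimal sketch of the arithmetic involved. The payload size and both link rates are hypothetical figures chosen for illustration, not measurements of any fielded system, and the calculation ignores protocol overhead, contention, and retransmission.

```python
def transfer_seconds(payload_bytes: int, link_bits_per_second: float) -> float:
    """Ideal time to move a payload over a link, ignoring overhead and loss."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return payload_bytes * 8 / link_bits_per_second


# Hypothetical figures for illustration only.
PAYLOAD_BYTES = 1_000_000_000  # one (decimal) gigabyte of collected data
LINKS = {
    "tactical link, 64 kbit/s": 64_000,
    "broadband link, 100 Mbit/s": 100_000_000,
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        hours = transfer_seconds(PAYLOAD_BYTES, rate) / 3600
        print(f"{name}: {hours:.2f} hours")
```

Under these assumptions, the same gigabyte that crosses the faster link in 80 seconds needs roughly 35 hours on the slower one, which is why additional collection does little good if the link cannot carry it.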

Presentation

“Presentation,” in the context of data-driven decision-making, refers not just to the literal visual presentation of data, but also to the system through which data is made available to its users. Presentation aligns directly with the third and final requirement for analysis: correct and complete data, in a suitable platform, with the requisite analytics to answer the decision-makers’ information requirements.

Presentation poses the least significant challenge to data-driven decision-making yet, paradoxically, often receives the most attention of the three. Here, the means (presentation of data in order to enable informed decision-making) have instead become the end itself: presentation of data in order to present data.

Dependencies, not a checklist

These three challenges (collection, transport, and presentation) cannot be considered in isolation. To enable true data-driven decision-making, all three must be considered together.

Without collection and transport, even the most intuitive presentation system will fail to enable data-driven decision-making. It would have no data to present. Similarly, without collection, even perfect transport and presentation will serve no real purpose. Those systems would have no data to transport or present. This dependency relationship works both ways. Without appropriate presentation, even exquisite collection and transport would fail. Of course, exquisite collection and perfect presentation mean naught without transport to connect them.
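One way to picture this dependency is as a bottleneck: the data that actually reaches a decision-maker is bounded by the weakest of the three stages. The sketch below assumes, purely for illustration, that each stage can be summarized by a single throughput figure in kilobits per second; the `Pipeline` class and its field names are hypothetical and greatly simplify real systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical model: one throughput figure (kbit/s) per stage."""

    collected_kbps: float  # data the sensors produce
    transport_kbps: float  # data the links can carry
    presented_kbps: float  # data the presentation layer can ingest

    def delivered_kbps(self) -> float:
        """Data reaching the decision-maker is capped by the weakest stage."""
        return min(self.collected_kbps, self.transport_kbps, self.presented_kbps)


if __name__ == "__main__":
    # Exquisite collection and presentation, but no transport at all.
    print(Pipeline(500.0, 0.0, 500.0).delivered_kbps())
    # Improving collection alone does not raise what is delivered.
    print(Pipeline(5000.0, 64.0, 1000.0).delivered_kbps())
```

In this simplified model, investment in any single stage beyond the capacity of the other two yields no additional data for the decision-maker.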

This should not be construed to justify boutique, end-to-end solutions. Incompatible systems already abound. Rather, this should highlight the importance of understanding the ability of new systems to integrate with pre-existing systems for collection, transport, and presentation as well as the suitability of those pre-existing systems to support the new one. Each new solution need not add a new row to the table below; instead, designers must ensure that their new solution C can integrate with a sufficient transport medium 1 to feed a suitable presentation tool Y, all in service of enabling the ultimate goal: data-driven decision-making.

Collection, transport, and presentation capability matrix

At the other end of the spectrum, however, concepts like joint all domain command and control, or JADC2, attempt to address these three challenges together but do so in far too general a manner to succeed. As David Deptula recently explained in Making Joint All Domain Command and Control a Reality, “But while [the Department of Defense’s JADC2] definition captures what JADC2 aims to achieve, it says little about how to achieve it. As a result, joint all domain command and control has partially stalled due to a cloudy department-wide vision that every service views slightly differently. To make this concept a reality, the Pentagon needs a straightforward, clear, and understandable description of what its vision entails.” Opposite cumbersome specificity lies unhelpful generality, a danger leaders must take care to avoid in addressing data-driven decision-making.

The present approach to enabling data-driven decision-making continues to err on the side of specificity by addressing these columns individually. In defensive cyberspace operations, for example, efforts to address collection gaps across the enterprise seldom take into account the impact of legacy wide area network uplinks incapable of handling both increased collection and regular business usage. Meanwhile, separate teams work on presentation systems such as the Army’s Big Data Platform, Gabriel Nimbus, whose efficacy as an enabler of defensive cyberspace operations remains questionable for lack of suitable collection and transport, among other reasons. Here, too—as well as in the broader context of data-driven decision-making—collection, transport, and presentation must be considered together.

Applicability

This paper examined the technical challenges facing the Joint Force in making data-driven decisions. The oft-discussed “sensor-to-shooter” loop is a subset of this problem that focuses specifically on shrinking the delay between a sensor capturing data and a shooter prosecuting a target. This paper’s framework of collection, transport, and presentation applies equally well to this specific instance of the more general problem facing the Joint Force.

Data-driven decision-making and sensor-to-shooter relationship

Similarly, this framework captures the challenges facing defensive cyberspace operations well. Correct and complete collection remains elusive across the enterprise. Transport for such volumes of data remains infeasible under many circumstances. Precious few systems have succeeded in presenting that data in a coherent manner. This paper’s framework applies not only at the macro level, but also at the micro level.

Safeguards

Any discussion of data-driven decision-making must include a proviso that a human must make the eventual decision itself. A particular course of action may make sense based on the data, but it may be incompatible with rules of engagement or the law of armed conflict. Humans must ultimately retain responsibility for making the best decision possible given sound judgement and the information at hand. The Army’s Field Artillery branch has an old saying, “No unobserved rounds.” Similar advice should apply here: “No unsupervised decisions.” This is a crucial moral and ethical consideration that should remain at the front of any discussion of data-driven decision-making, especially when most (if not all) of that process becomes opaque as is the case with systems based on machine learning or artificial intelligence.

Humans should not only serve as a final check in an otherwise automated decision-making process, but also implement safeguards along the way. Each step in any multi-step process introduces another opportunity for error; when each step relies on the integrity of the previous one, those errors may cascade. At the collection, transport, and presentation stages, operator error, malicious actors, and simple entropy threaten the entire system. While several techniques for mitigating errors at each stage exist, all involve trade-offs.
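The cascading effect described above can be illustrated with a short calculation. This sketch assumes, as a simplification not made in the text, that each stage succeeds or fails independently with a known probability; the function name and the 95 percent figures are hypothetical.

```python
from math import prod
from typing import Iterable


def end_to_end_reliability(stage_reliabilities: Iterable[float]) -> float:
    """Probability that every stage succeeds, assuming independent stages."""
    values = list(stage_reliabilities)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"reliability must be within [0, 1], got {p}")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical: collection, transport, and presentation each 95% reliable.
    stages = [0.95, 0.95, 0.95]
    print(f"end-to-end reliability: {end_to_end_reliability(stages):.3f}")
```

Three stages that are each 95 percent reliable yield an end-to-end figure of roughly 86 percent under this assumption, and the figure only falls as more dependent steps are added.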

Consensus between multiple sensors in a distributed, austere environment, for example, is unlikely, especially given the likelihood of malign interference. Relying on agreement amongst a majority of sensors may also prove problematic if an adversary manages to interfere with 51% of them, thereby causing the system to present an inaccurate consensus. For obvious reasons, decision-makers should also not rely on that other 49%, for while the minority may represent a more accurate picture of reality in this specific situation, it may not in others. Conceptually, it is important to remember that different perspectives lead to different conclusions, but all perspectives are equally valid perceptions of reality. Sensors capture an abstraction of reality, at a given point in time, and turn it into data; whether or not that abstraction is an accurate representation of reality, however, remains a question for decision-makers to answer. This question alone will require earnest analysis and careful deliberation.
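The 51% scenario can be sketched with a simple majority vote. This is an illustrative model only: the `majority_report` function, the string readings, and the sensor counts are hypothetical, and real sensor fusion is considerably more involved.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_report(readings: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of sensors, else None."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


if __name__ == "__main__":
    truth = "clear"

    # All 100 hypothetical sensors report honestly.
    print(majority_report([truth] * 100))

    # An adversary controls 51 of the 100 sensors and reports "hostile".
    print(majority_report(["hostile"] * 51 + [truth] * 49))

    # An even split produces no consensus at all.
    print(majority_report(["hostile"] * 50 + [truth] * 50))
```

The vote itself behaves exactly as designed in every case; it simply has no way to tell an honest majority from a compromised one, which is why the judgment about whether the consensus reflects reality stays with a human.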

Conclusion

Unfortunately, this is a far simpler problem to explain than to solve. A “wicked problem” is a problem for which no perfect solution exists, only progressively better ones. The Joint Force faces not one but three wicked, interrelated problems in collection, transport, and presentation on its road to data-driven decision-making. This paper has no solutions to offer, only a framework for examining the technical challenges facing the Joint Force. As Charles Kettering once said, though, “A problem well-stated is a problem half solved.” In that sense, this paper is at least a step in the right direction.

Contributors

Thanks to the following people for providing feedback during the creation of this paper.

Their input was considered, but this paper is not necessarily an accurate reflection of their opinions.

Note: An edited version of this article appeared on West Point’s Modern War Institute, published June 27th, 2023.

1. “The Problem of Metrics: Assessing Progress and Effectiveness in the Vietnam War,” by Gregory Daddis, via “Data are the Map for Warfare: Developing a Data-Literate Army,” by LTC Timothy Sikora.

2. For example, in “Confirmation bias in information search, interpretation, and memory recall: evidence from reasoning about four controversial topics,” researchers Dasa Vedejova and Vladimira Cavojova examined how confirmation bias impacts not just the interpretation of new evidence, but also evidence search and recall of previously discovered evidence.

3. Notably, this diverges from the definition of “collection” in ADP 2-0: Intelligence, which defines it as “the acquisition of information and the provisioning of this information to processing elements.” ADP 2-0 considers collection and processing a single function, but for the purposes of this paper, “processing” activities as defined in ADP 2-0 are considered part of transport.

4. This is a key question in the field of signal detection theory which is, unfortunately, beyond the scope of this paper.

5. This includes “provisioning of [the collected] information to processing elements” and activities such as “data conversion and correlation, document and media translation, and signal decryption,” from ADP 2-0.