
The Healthcare Data Conundrum (Part I)

“It’s the economy, stupid.” 

James Carville, Bill Clinton’s strategist for the 1992 presidential campaign

This battle mantra was one of the three phrases that the Democrats used continually during Bill Clinton’s presidential campaign to defeat then-President George H. W. Bush.

Change one word and it becomes “It’s the data, stupid,” an apt descriptor of our current situation with AI in clinical medicine and healthcare. The COVID-19 pandemic has exposed many deficiencies in both healthcare data and IT infrastructure, and these deficiencies have weakened the value proposition for AI in terms of its impact on clinical outcomes.

First, a brief primer on healthcare data.

Healthcare data is either structured (database data such as names, ICD-10 codes, prescriptions, lab values, etc. in patient forms, insurance data, and billing data) or unstructured (data that cannot be displayed in rows and columns, such as clinicians’ free-text notes, image files, etc.), with most (60-80% or more) of healthcare data in the latter category.

Recent advances in deep learning and natural language processing have made it possible to transform unstructured healthcare data into a structured format and thus elevate the contribution of unstructured data to insights. Some data falls between the two categories: an email or a web page may be considered semi-structured, as it carries some internal metadata structure. Metadata is data about data. For a relational database, for example, the metadata includes the tables, columns, data types, table relationships, etc.
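To make the idea of metadata concrete, here is a minimal sketch that reads the metadata of a small relational database using Python’s built-in sqlite3 module. The two tables (patient and lab_result), their columns, and the helper describe_schema are hypothetical and invented purely for illustration; they do not come from any real clinical system.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical two-table database, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE lab_result ("
    "result_id INTEGER PRIMARY KEY, "
    "patient_id INTEGER REFERENCES patient(patient_id), "
    "icd10_code TEXT, "
    "value REAL)"
)

# Tables, columns, and data types: metadata, not patient data.
schema = describe_schema(conn)
for table, columns in schema.items():
    print(table, columns)

# Table relationships are metadata too.
# PRAGMA foreign_key_list rows are (id, seq, table, from, to, ...)
foreign_keys = conn.execute("PRAGMA foreign_key_list(lab_result)").fetchall()
for fk in foreign_keys:
    print(f"lab_result.{fk[3]} -> {fk[2]}.{fk[4]}")
```

Note that the database contains no rows at all, yet its metadata is fully populated: the structure exists independently of the data it describes.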

Healthcare data comes with a myriad of nuances: heterogeneity, accuracy, completeness, location, definition, format, etc. In addition, several essential issues can easily thwart the momentum of any healthcare project. First, data access remains one of the biggest challenges, as most of the work and time in healthcare data science is spent on data access (including curation). The increasing volume of publicly available healthcare data has not necessarily solved this conundrum, as the quality of the data in these databases has been inconsistent.

Second, data sharing remains difficult; the pandemic did demonstrate, however, at least an increased willingness among some institutions to share certain types of data. Third, data privacy and security continue to be an understandable obstacle. The adoption of data privacy measures such as the EU’s GDPR (General Data Protection Regulation), California’s CCPA (California Consumer Privacy Act), Brazil’s LGPD (Lei Geral de Proteção de Dados), and South Africa’s POPIA (Protection of Personal Information Act) has accelerated in just the past few years.

There is also discussion of the concept of a data trust: an entity in which a few trustees look after the data with the permission of the people the data belongs to. Lastly, data accuracy remains a major challenge. There is a propensity to assume that electronic record data is accurate because it is in print, when in fact it can be full of errors propagated by the many “copy and paste” functions. The accuracy of publicly available data has also been questioned.

The other two top phrases of the successful Bill Clinton campaign? “Change vs. more of the same” and “Don’t forget health care.” Both are particularly relevant, and well worth acting on, for the data and AI dyad in healthcare.

Next week, I will elaborate on the potential solutions to the issues in healthcare data.
