Using Big Data & Analytics to Predict Hospital Re-Admissions

Working Paper: 56
Authors: Meghan Nagpal, Reza Samavi
Publish Date: May 2016


“Big data” is a term often used to describe very large data sets, ranging from terabytes to exabytes, that are often complex (Chen, Chiang, & Storey, 2012). SAS, an American developer of advanced analytical software, defines big data as “large volumes of data, both structured and unstructured, that inundates business on a day-to-day basis”. Industry analyst Doug Laney suggested that a big data system should carry the attributes of volume, velocity, and variety (SAS, n.d.). In addition to those three attributes, SAS also considers variability and complexity. Table 1 describes each attribute as defined by SAS:

Table 1. Attributes of Big Data as defined by SAS

Attribute | Definition
Volume | Large amounts of data stored on an electronic system.
Velocity | Data is streamed in a fast and timely manner.
Variety | Data comes in all formats, structured and unstructured. Structured data could consist of numeric and/or binary values in a traditional database. Unstructured data could be any other form of input, such as a text document or an audio file.
Variability | Data can flow at inconsistent rates, with periodic peaks in data flow.
Complexity | Data can come from a variety of sources, and the complexity stems from matching, cleansing, and transforming that data.
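To make the “complexity” attribute more concrete, the following is a minimal sketch of matching, cleansing, and transforming records from two sources. The sources, field names (patient_id, pid, admit_date, test_date), and helper functions are hypothetical illustrations, not part of any system described by SAS; it also assumes the second source writes dates day-first (DD/MM/YYYY).

```python
from datetime import datetime

# Hypothetical records from two sources with inconsistent ID and date formats.
admissions = [
    {"patient_id": " p001 ", "admit_date": "2016-03-01"},
    {"patient_id": "P002", "admit_date": "2016-03-04"},
]
lab_results = [
    {"pid": "p001", "test_date": "01/03/2016", "result": "normal"},
    {"pid": "P003", "test_date": "05/03/2016", "result": "abnormal"},
]

def clean_id(raw: str) -> str:
    """Cleansing: trim whitespace and normalize case so IDs from both sources compare equal."""
    return raw.strip().upper()

def parse_date(raw: str) -> datetime:
    """Transforming: accept ISO (YYYY-MM-DD) or, by assumption, day-first (DD/MM/YYYY) dates."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

def match_records(admissions, lab_results):
    """Matching: attach lab results to admissions on the cleaned patient ID."""
    labs_by_id = {}
    for lab in lab_results:
        labs_by_id.setdefault(clean_id(lab["pid"]), []).append(
            {"test_date": parse_date(lab["test_date"]), "result": lab["result"]}
        )
    merged = []
    for adm in admissions:
        pid = clean_id(adm["patient_id"])
        merged.append({
            "patient_id": pid,
            "admit_date": parse_date(adm["admit_date"]),
            "labs": labs_by_id.get(pid, []),  # no matching lab records -> empty list
        })
    return merged

for row in match_records(admissions, lab_results):
    print(row["patient_id"], row["admit_date"].date(), len(row["labs"]))
```

Even in this toy case, the two sources disagree on ID formatting and date conventions, so the records cannot be joined until they are cleansed and transformed; at big data volumes these steps must be automated and applied consistently.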

SAS stresses that what matters is not the amount of data but what organizations do with it. This is where analytics enters the equation. As defined by IBM (Cortada, Gordon, & Lenihan, 2012), “analytics is the systematic use of data and related business insights developed through applied analytical disciplines to drive fact-based decision making for planning, management, measurement, and learning. Analytics may be descriptive, predictive, or prescriptive.”

We are currently living in an era in which data is readily available to businesses and services in almost every industry. The Data-Information-Knowledge-Wisdom (DIKW) Pyramid has historically described the process by which data is transformed into wisdom, which in turn informs decisions made for the well-being of a population[1]. However, in this era of Big Data & Analytics, there has been a shift toward making decisions through predictive modelling on data rather than through hypotheses derived from knowledge (Batra, 2014). As the IBM definition quoted above states (Cortada, Gordon, & Lenihan, 2012), analytics may be descriptive, predictive, or prescriptive. This translates into a three-phase approach to using analytics, in which we answer the following questions:

  1. What can analytics tell us about our current state? (descriptive)
  2. What can we predict about the future from these analytics? (predictive)
  3. What decisions can be made based on the predictions from these analytics? (prescriptive)
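The three phases above can be sketched on a toy re-admission example. This is a minimal illustration under stated assumptions, not the method developed in this paper: the discharge history is invented, “re-admitted within 30 days” is an assumed outcome window, the number of prior admissions is an assumed risk factor, and the 0.5 risk threshold is hypothetical. The predictive step uses simple per-group historical frequencies rather than a trained statistical model.

```python
from collections import defaultdict

# Hypothetical discharge history: (prior admissions in past year, re-admitted within 30 days?)
history = [
    (0, False), (0, False), (0, True), (0, False),
    (1, False), (1, True), (1, False),
    (2, True), (2, True), (2, False),
]

# Phase 1, descriptive: what is our current state?
def readmission_rate(records):
    """Share of discharges that were followed by a re-admission."""
    return sum(readmitted for _, readmitted in records) / len(records)

# Phase 2, predictive: estimate each patient's risk from historical frequencies.
def risk_by_prior_admissions(records):
    """Empirical re-admission probability for each count of prior admissions."""
    counts = defaultdict(lambda: [0, 0])  # prior count -> [re-admitted, total]
    for prior, readmitted in records:
        counts[prior][0] += readmitted
        counts[prior][1] += 1
    return {prior: r / n for prior, (r, n) in counts.items()}

# Phase 3, prescriptive: decide which upcoming discharges need an adjusted plan.
def flag_for_enhanced_discharge(patients, risk_table, fallback, threshold=0.5):
    """Return IDs whose estimated risk meets the (hypothetical) threshold.

    Patients whose prior-admission count never appears in the history
    fall back to the supplied default risk.
    """
    return [pid for pid, prior in patients
            if risk_table.get(prior, fallback) >= threshold]

rate = readmission_rate(history)
risk = risk_by_prior_admissions(history)
upcoming = [("A", 0), ("B", 2), ("C", 1)]  # hypothetical (patient ID, prior admissions)
print(f"30-day re-admission rate: {rate:.0%}")
print(flag_for_enhanced_discharge(upcoming, risk, fallback=rate))
```

With this invented history, the overall rate is 40% and only patient B (two prior admissions, estimated risk of about 0.67) is flagged. Using the overall rate as the fallback for unseen groups is a simplifying choice; a real system would need a model that generalizes to such patients.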

With the many technologies now able to harness large volumes of data, the big data & analytics revolution stands to grow in almost every industry (SAS, n.d.). Industries that are key beneficiaries of big data & analytics include banking, education, government, manufacturing, retail, media, small & midsize business, and health care. Each of these industries benefits from being able to analyze current trends, predict future outcomes based on those trends, and make decisions based on those predictions.

This paper examines the role of big data in health care and, in particular, the technical & business requirements for building predictive models that identify which patients are at risk of hospital re-admission. Predicting which patients are at risk of re-admission allows healthcare providers to adjust discharge plans to minimize that risk.

This paper will first examine the role of big data & analytics in healthcare, give a brief history of big data and analytics, identify risk factors for hospital re-admission based on a literature review, and identify big data techniques and architectures required for such an analytical system from research publications. The benefit of creating an analytical system to predict which patients are most likely to be re-admitted to hospital following discharge is that healthcare providers can personalize care plans. Such a personalized approach can mitigate the risk of re-admission and potentially save costs for the healthcare system.

[1] This is a hierarchy of how data is translated into wisdom. More information can be found on Wikipedia (
