Healthcare Analytics: Exploring Data on Providers and the Industry

Data analytics is a hot topic in the healthcare industry. However, people throw the term around with little understanding of what it really means. As with “Big Data” and “Affordable Healthcare”, it is hard to know what someone means when the term covers so many definitions that differ widely from one another.

For me, what differentiates analytics from other types of data reporting and analysis is how the topic or subject of the analysis is determined. In conventional data analysis, a report is developed to collect and format data in a manner pre-determined to address a specific issue or purpose. For example, you may develop a report that shows the balance of your accounts receivable and sorts those dollars into aging categories. This report gives you a current picture of how much money you are owed and by which health plans or patients. It allows you to focus on the money that is in danger of becoming uncollectable and to monitor your collection performance over time. Providers are familiar with similar reports, such as census reports that show admission and discharge activity and case mix reports that show the clinical conditions associated with your patients.
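To make the aging report concrete, here is a minimal sketch of how open balances might be sorted into aging categories by payer. The sample balances, the bucket boundaries and the function names are all my own assumptions for illustration; a real report would read from your billing system and use whatever categories your organization has adopted.

```python
from collections import defaultdict
from datetime import date

# Hypothetical open balances: (payer, date billed, amount still owed).
open_claims = [
    ("Medicare", date(2016, 5, 20), 1200.00),
    ("Medicare", date(2016, 2, 1), 800.00),
    ("Plan A", date(2016, 4, 10), 450.00),
    ("Self-pay", date(2015, 12, 15), 300.00),
]

# Illustrative aging categories: (label, upper bound in days, inclusive).
BUCKETS = [("0-30", 30), ("31-60", 60), ("61-90", 90), ("91-120", 120)]


def bucket_for(age_days):
    """Return the label of the aging category for a balance of this age."""
    for label, upper in BUCKETS:
        if age_days <= upper:
            return label
    return "120+"


def aging_report(claims, as_of):
    """Total the open balances by payer and aging category."""
    report = defaultdict(lambda: defaultdict(float))
    for payer, billed, balance in claims:
        age_days = (as_of - billed).days
        report[payer][bucket_for(age_days)] += balance
    return {payer: dict(buckets) for payer, buckets in report.items()}


report = aging_report(open_claims, as_of=date(2016, 6, 1))
for payer, buckets in sorted(report.items()):
    print(payer, buckets)
```

The question the report answers is fixed before any data is pulled, which is the point of the contrast with analytics.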


These reports are normally run alongside other management activities, where staff act on the data or use it to adjust the distribution of resources. They become part of the operational management plan for the healthcare organization.

In analytics, you build a pool of data without a specific purpose in mind. At this stage, the objective is to get clean, reliable and timely information from whatever sources are available. These can be the clinical systems under the control of the healthcare provider, which supply detailed information about encounters with its own patients, or industry data collected outside of those systems. Both sources are essential to building a database that reflects the performance of the provider and an environment for comparing or benchmarking this data against the rest of the industry.

Once you have access to this data, instead of developing a report to provide you with specific information, you examine the data itself for clues and relationships, and then let those discoveries guide further exploration. True analytics is more like science than accounting.

One method of data analysis is to develop a hypothesis and then use the data to prove or disprove it. This is done by converting the hypothesis into an actionable query of the data. For example, let’s say that Medicare is introducing a new value-based payment model for the services your facility provides. Your question might be “How will this new payment model affect my Medicare revenue?” You might guess that your revenue will decrease, since that is the usual result of these changes, but this is not always the case.

Since this new payment model is projected to occur in the future, and the future has not happened yet, you are limited to predicting the future based on the past. Let’s say that the data at your disposal includes all of the claims you have submitted to Medicare and what they paid you. Let’s further assume that the services you billed in these claims in the past are a valid reflection of the services you will continue to provide in the future. Let’s also assume that the data in the claims includes everything you need to calculate the payment under the new payment model.

If these assumptions are true, you should be able to apply the new payment model to this historical claim data and compare what was actually paid to the predicted payments. The difference between the two gives a fairly accurate estimate of how the new payment model will affect your business in the future.
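As a rough illustration of that comparison, the sketch below reprices a handful of historical claims under a stand-in payment model and measures the difference. Everything in it is hypothetical: the claim records, the service codes, the per-unit rates in `NEW_RATES` and the simple units-times-rate formula. An actual value-based model would have its own rules, and the claims would need to carry every field those rules require.

```python
# Hypothetical historical claims: service code, units billed, amount Medicare paid.
claims = [
    {"service": "A", "units": 10, "paid": 1500.00},
    {"service": "B", "units": 4, "paid": 800.00},
    {"service": "A", "units": 6, "paid": 900.00},
]

# Made-up per-unit rates standing in for the new payment model.
NEW_RATES = {"A": 140.00, "B": 210.00}


def reprice(claim, rates):
    """Predicted payment for one historical claim under the stand-in model."""
    return claim["units"] * rates[claim["service"]]


def revenue_impact(claims, rates):
    """Compare what was actually paid with the repriced total."""
    actual = sum(c["paid"] for c in claims)
    predicted = sum(reprice(c, rates) for c in claims)
    change = predicted - actual
    return {
        "actual": actual,
        "predicted": predicted,
        "change": change,
        "pct_change": 100.0 * change / actual,
    }


impact = revenue_impact(claims, NEW_RATES)
print(impact)
```

With these invented numbers the predicted revenue comes out 3.75% lower than what was paid. The estimate is only as good as the three assumptions above about the historical claims.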

Another process popular in analytics is to simply explore the data without any pre-conceived issues or questions. In this approach, you are looking for relationships that may not be evident on the surface. For example, in February of 2016 I published a blog article on research done by CMS using claim data. The research documented a dramatic increase in voluntary discharges by hospices from July through October compared with the rest of the year. The researchers concluded that this increase did not reflect a rise in the underlying reasons for these discharges, which are unlikely to vary with the months of the year, but rather actions taken by hospices ahead of the upcoming hospice cap calculation. However, the data shows an increase for all hospices, not just those over the cap.
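The seasonal pattern itself is simple to measure once discharges are tagged by month and type. The sketch below is only an illustration with invented records. It is not the method CMS used, and treating the rate as voluntary discharges divided by all discharges is my own simplifying assumption.

```python
# Hypothetical discharge records: (month number, discharge type).
discharges = [
    (1, "death"), (1, "death"), (1, "death"), (1, "voluntary"),
    (4, "death"), (4, "death"), (4, "death"), (4, "voluntary"),
    (8, "death"), (8, "voluntary"), (8, "death"), (8, "voluntary"),
    (10, "voluntary"), (10, "death"), (10, "voluntary"), (10, "death"),
]

WINDOW_MONTHS = {7, 8, 9, 10}  # July through October


def voluntary_rate(records):
    """Share of the given discharges that were voluntary."""
    if not records:
        return 0.0
    voluntary = sum(1 for _, kind in records if kind == "voluntary")
    return voluntary / len(records)


def seasonal_comparison(records):
    """Return (rate in July-October, rate in the rest of the year)."""
    in_window = [r for r in records if r[0] in WINDOW_MONTHS]
    rest = [r for r in records if r[0] not in WINDOW_MONTHS]
    return voluntary_rate(in_window), voluntary_rate(rest)


window_rate, rest_rate = seasonal_comparison(discharges)
print(window_rate, rest_rate)
```

A gap between the two rates shows that something changes in those months. It says nothing yet about why.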

Read more – Seasonal Discharge Rates Article

One of the dangers of this approach to analytics is drawing inaccurate conclusions from the data. In this scenario, we can accurately measure the increase in voluntary discharges by month of the year through claim data. Assuming the data was analyzed correctly, there can be no doubt that the increases occurred. The question is why?

We see this subject dealt with in climate change. There is little argument that the temperature of the planet is increasing. Although there appears to be a relationship and circumstantial evidence that this temperature increase is related to the increase of man-made carbon emissions, there is no way to establish an absolute connection without controversy.

I always like to present examples from baseball since this industry is one of the most progressive when it comes to the use of analytics. Every year, innovators figure out new ways to use existing data to increase the chances of producing or preventing runs and, therefore, of winning more games.

In the 1990s, the industry noticed a significant increase in home runs produced. This fact was clear and pronounced in the data collected from games played. Many theories abounded relating this increase to a particular cause. One popular theory at the time was that MLB wanted to see more home runs and scoring in general so they had begun to manufacture baseballs that were more tightly wound with string at the core so that they would come off the bat with a higher velocity, producing more home runs.

Now, we have more data on this phenomenon and we know that the increase in home runs had a direct correlation with the level of performance enhancing drugs (PEDs) used by players to increase their strength. This is further supported by a decrease in home runs after MLB began random testing and major penalties for PEDs in 2003.

Back to the voluntary discharge example in hospice. What is the true reason for this increase? Some additional clues might lead to a different explanation or better support for the existing one. Is this trend true of all hospices, or just of some whose numbers pull up the average for everyone? If the latter is true, what are some of the differences between the hospices that experience these increases and those that do not? Are there differences between the characteristics of patients discharged voluntarily earlier in the year, when the rates are low, and those discharged just before the cap year end? Do the latter tend to have longer lengths of stay, increasing their impact on the cap? The answers to these questions might support the existing theory, disprove it, or lead us to a new one. In any case, the danger of analytics is settling on conclusions that either are not supported by the facts or seem to be supported while further research that could strengthen or weaken them is left undone. In analytics, it is important to keep digging, even when you think you know the answers.
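The first of those questions, whether the trend holds for all hospices or only some, can be approached by computing the seasonal change one hospice at a time. The sketch below uses invented counts, an invented `over_cap` flag and an arbitrary one-point threshold, so treat it as a starting point for the kind of digging described above rather than a finished method.

```python
# Hypothetical per-hospice counts as (voluntary discharges, all discharges),
# split into the July-October window and the rest of the year.
hospices = {
    "H1": {"over_cap": True, "window": (30, 100), "rest": (20, 200)},
    "H2": {"over_cap": False, "window": (12, 100), "rest": (20, 200)},
    "H3": {"over_cap": False, "window": (10, 100), "rest": (20, 200)},
}


def rate(voluntary, total):
    """Voluntary share of discharges, or 0.0 when there were none."""
    return voluntary / total if total else 0.0


def seasonal_lift(h):
    """July-October rate minus rest-of-year rate for one hospice."""
    return rate(*h["window"]) - rate(*h["rest"])


def hospices_with_increase(data, min_lift=0.01):
    """Names of hospices whose rate rises by more than min_lift."""
    return sorted(name for name, h in data.items() if seasonal_lift(h) > min_lift)


def mean_lift_by_cap_status(data):
    """Average seasonal lift for hospices over the cap and for those under it."""
    groups = {True: [], False: []}
    for h in data.values():
        groups[h["over_cap"]].append(seasonal_lift(h))
    return {flag: sum(v) / len(v) for flag, v in groups.items() if v}


increased = hospices_with_increase(hospices)
by_cap = mean_lift_by_cap_status(hospices)
print(increased, by_cap)
```

If the lift shows up mainly in hospices over the cap, that supports the existing theory. If it is spread evenly across both groups, the explanation probably lies elsewhere.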

In future articles, I intend to explore some of these issues. We have obtained claim data from CMS as part of a data use agreement. Part of this agreement is to provide public reporting from this data for the industry. This data includes all financial and clinical information from 100% of the claims processed by Medicare. Although the data is always about six months behind, it provides a clear and accurate representation of healthcare services and payments under the Medicare program. Claim data is recognized as the best quality source of this information and tends to be more accurate than the same data collected through clinical systems specific to providers and their EMR vendors. Because claim data is tied to payments and is subject to review and audits, it tends to receive a higher degree of validation than EMR data, which is mostly used internally and is not cross-checked outside of the provider organization. It is also formatted into a uniform data layout with codes that conform to mandatory definitions. Unlike claim data, EMR data does not conform to any required national standard.

Through this data and the analytics process, I plan to present my own theories on these issues along with the numbers to back them up. I welcome any suggestions you may have on topics of interest that might be derived from this data source.

By Kalon Mitchell – President, MEDTranDirect