NHANES Part 1: Supplements
The National Health and Nutrition Examination Survey (NHANES) is a yearly survey performed by the Centers for Disease Control and Prevention (CDC). It is a comprehensive survey of more than 10,000 participants that examines health, nutrition, demographics, socioeconomic status, and much more. As such, it is a massive database, with 6 different tables and hundreds of columns that need to be sorted through to get answers to specific questions.
This is Part 1 of my NHANES research, and it deals with Supplements. Specifically, I was wondering the following:
Do people who take supplements receive healthcare less often than people who don’t?
To answer this, I had to filter through several of the tables and narrow in on what I thought would be an accurate representation of the differences between people who take supplements and people who don’t. The full Python analysis can be found here.
Cleaning up the Data
Because the database is so large, I only brought in what I needed to answer the supplement question: demographics, the health questionnaire, and two of the dietary supplement tables, specifically the tables for “Past 30 Days” and “Total Supplements”.
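As a rough illustration of pulling only those tables together, here is a minimal pandas sketch. The tiny DataFrames stand in for the real files, and the column names (SEQN as the participant ID, plus RIDAGEYR, HUQ051, DSD010, and DSDCOUNT) are my assumptions about which NHANES variables are involved, so check them against the CDC codebook for your survey cycle.

```python
import pandas as pd

# Tiny stand-ins for the NHANES tables (made-up values, not real survey data).
# Column names are assumptions; confirm them in the CDC documentation.
demographics = pd.DataFrame({"SEQN": [1, 2, 3, 4], "RIDAGEYR": [34, 61, 8, 45]})
questionnaire = pd.DataFrame({"SEQN": [1, 2, 3, 4], "HUQ051": [2, 5, 1, 99]})
supplements_total = pd.DataFrame(
    {"SEQN": [1, 2, 4], "DSD010": [1, 1, 2], "DSDCOUNT": [3, 7, 0]}
)

# Inner joins keep only participants who appear in every table.
merged = (
    demographics
    .merge(questionnaire, on="SEQN", how="inner")
    .merge(supplements_total, on="SEQN", how="inner")
)
print(merged)
```

An inner join is only one option here. A left join from demographics would keep everyone and leave gaps where a participant is missing from a table.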
The database columns are also coded with identifier sequences that aren’t readily recognizable, so I had to use the NHANES documentation from the CDC’s website to identify which columns would be particularly useful. On top of that, I had to determine which survey answers were necessary based upon the data collection methodology and the parameters of the particular questions asked of the participants.
With all that in mind, let’s begin with data cleanup:
As you can see, the columns have unique identifiers when you bring them in, and the CDC documentation is necessary
Renaming the columns used greatly helps with identifying parameters for data analysis later
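A renaming step along these lines might look like the sketch below. Both the NHANES codes on the left and the friendly names on the right are hypothetical choices for illustration, not necessarily the ones used in the original analysis.

```python
import pandas as pd

# Made-up rows with assumed NHANES variable codes as column names.
raw = pd.DataFrame({
    "SEQN": [1, 2],
    "RIDAGEYR": [34, 61],
    "HUQ051": [2, 5],
    "DSD010": [1, 2],
    "DSDCOUNT": [3, 0],
})

# Map each cryptic code to a readable name (hypothetical names).
readable_names = {
    "SEQN": "participant_id",
    "RIDAGEYR": "age",
    "HUQ051": "healthcare_visits",
    "DSD010": "takes_supplements",
    "DSDCOUNT": "supplement_count",
}
df = raw.rename(columns=readable_names)
print(df.columns.tolist())
```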
I needed to remove any rows where the answer was coded as missing (99) or the participant refused to answer (77). I also needed to change the “2” that designates no supplements taken to 0 so the yes/no column would work in the correlation test.
See? Much better.
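The cleanup described above can be sketched in a few lines of pandas. The data and column names here are invented for illustration. One assumption worth flagging: the 77/99 filter is applied only to the questionnaire column, because a value like 77 in an age column would be a real age rather than a refusal.

```python
import pandas as pd

# Made-up data using hypothetical readable column names.
df = pd.DataFrame({
    "healthcare_visits": [2, 77, 3, 99, 1, 8],
    "takes_supplements": [1, 1, 2, 2, 2, 1],
    "supplement_count": [3, 1, 0, 0, 0, 12],
})

# Drop rows where the healthcare answer is a non-response code.
nonresponse_codes = [77, 99]
answered = ~df["healthcare_visits"].isin(nonresponse_codes)
clean = df[answered].copy()

# Recode "no supplements" from 2 to 0 so the column is a 0/1 flag.
clean["takes_supplements"] = clean["takes_supplements"].replace({2: 0})
print(clean)
```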
Correlations
After getting everything cleaned up, I decided to run a correlation heatmap on the dataset to see which variables could be related in some way (e.g., does higher supplement intake correlate with lower healthcare needs?).
Well, it doesn’t look like anything jumps out as related. But, just to be sure, let’s look at those numbers for what we want to find out.
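For reference, the numbers behind a heatmap like this come from a correlation matrix. Here is a minimal sketch with made-up data and hypothetical column names. It uses pandas’ default Pearson correlation, which is an assumption on my part about the method.

```python
import pandas as pd

# Made-up cleaned data with hypothetical column names.
clean = pd.DataFrame({
    "age": [25, 40, 33, 58, 71, 46],
    "healthcare_visits": [1, 3, 2, 2, 5, 3],
    "takes_supplements": [0, 1, 0, 1, 1, 0],
    "supplement_count": [0, 2, 0, 1, 4, 0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = clean.corr()

# Pull out just the column we care about, minus its self-correlation.
supplement_corr = (
    corr["supplement_count"]
    .drop("supplement_count")
    .sort_values(ascending=False)
)
print(supplement_corr.round(2))

# A heatmap could then be drawn with, e.g., seaborn.heatmap(corr, annot=True).
```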
So, it doesn’t look like anything is even remotely correlated with how many supplements someone takes. Now, let’s get to the question we were trying to answer: does taking supplements decrease how often you receive healthcare?
To do this, we’re going to need to transform the dataset even more, specifically by getting the averages of both groups and plotting them side by side.
Answering the Question
Time to get our averages. First, I created a boxplot and hex bin plot that will show me where most of the data lies. A scatter plot would do the trick, except that all the datapoints fall on integers (no decimals) and overlap each other because there’s no variation.
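To see why a plain scatter plot hides information here, it helps to count how many people land on each exact point. The sketch below uses invented data and hypothetical column names.

```python
import pandas as pd

# Made-up integer-coded data: several people share the same (x, y) point.
clean = pd.DataFrame({
    "supplement_count": [0, 0, 0, 1, 1, 2, 2, 2],
    "healthcare_visits": [2, 2, 3, 2, 2, 3, 3, 3],
})

# Count how many participants sit on each distinct coordinate.
pair_counts = (
    clean.groupby(["supplement_count", "healthcare_visits"])
    .size()
    .reset_index(name="n_people")
)
print(pair_counts)

# A scatter plot would draw 4 dots for these 8 people. A hex bin plot,
# e.g. clean.plot.hexbin(x="supplement_count", y="healthcare_visits", gridsize=10),
# colors each cell by how many points fall inside it instead.
```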
This boxplot shows us that people received healthcare anywhere between 0 and 15 times a year, with 16 or more times a year showing up as an outlier (7 = 13-15 times a year and 8 = 16 or more, according to the NHANES coding and documentation).
So, we can see that there’s a lot of scatter throughout the spectrum, with most of the data situated around people receiving healthcare 2-5 times a year (2 = 2-3 times a year, 3 = 4-5 times a year). Let’s group it up by the number of supplements taken and see what those numbers look like.
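Grouping by the number of supplements could look something like this sketch, again with invented data and hypothetical column names. It also reports how many people are in each group, since an average built on one or two people can produce a misleading spike.

```python
import pandas as pd

# Made-up data; the lone person taking 18 supplements mimics an outlier.
clean = pd.DataFrame({
    "supplement_count": [0, 0, 1, 1, 1, 2, 2, 18],
    "healthcare_visits": [2, 3, 2, 4, 3, 3, 5, 8],
})

# Average healthcare code and group size for each supplement count.
by_count = clean.groupby("supplement_count")["healthcare_visits"].agg(
    mean_visits="mean",
    n_people="count",
)
print(by_count)
```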
According to this bar graph, the number of times someone receives healthcare doesn’t seem to correlate well with how many supplements they take. There is a large spike around 18 supplements taken, possibly because of an outlier. Next, we’ll need to see how the numbers look for people either taking or not taking supplements. Averaging the times healthcare was received separately for people who did and didn’t take supplements gives us a stark yes/no, A/B, black/white comparison of whether supplement use goes along with a difference in the amount of healthcare received.
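The yes/no comparison boils down to one group-by on the 0/1 supplement flag. Here is a minimal sketch with made-up numbers and hypothetical column names. The values are not the real NHANES results.

```python
import pandas as pd

# Made-up data: 0 = no supplements, 1 = takes supplements.
clean = pd.DataFrame({
    "takes_supplements": [0, 0, 0, 1, 1, 1, 1],
    "healthcare_visits": [2, 3, 1, 4, 3, 5, 4],
})

# Mean healthcare code for each group, then the gap between them.
group_means = clean.groupby("takes_supplements")["healthcare_visits"].mean()
difference = group_means.loc[1] - group_means.loc[0]

print(group_means)
print("difference on the coded scale:", difference)
```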
So, it would seem that taking supplements doesn’t mean a person receives less healthcare in a year. In fact, they receive more! People who don’t take any supplements averaged 2.61 on the NHANES scale, which falls between code 2 (2-3 times a year) and code 3 (4-5 times a year). People who did take supplements averaged 3.94, which sits at the top of code 3 (4-5 times a year) and is nearly into the next bracket up. Keep in mind that the scale is a set of brackets rather than a visit count, so these averages translate only roughly into visits. That doesn’t seem like a very large difference, but most “healthy” individuals only receive healthcare during routine checkups, so 2 or 3 extra times a year would be a lot for one of those individuals.
However…
There were definitely some outliers in the data that could have skewed the statistics slightly. Therefore, let’s try all of those comparisons again, but with a much more robust dataset.
Robust Measurements
In order to get a more robust dataset and reduce the influence of outliers, I removed the top and bottom 10% from each supplement category independently. If I had instead trimmed the bottom 10% of the dataset as a whole, the people who took no supplements whatsoever would have been removed completely in the process. So, let’s look at the trimming process and then run those tests again.
Here we can see that there are only 221 entries for the “No Supplements” group, meaning that trimming the dataset as a whole would have completely eliminated the group. The way I performed it, each group is trimmed independently, which removes outliers while making the averages within each group more robust. Let’s take a look at how this process affected our data:
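One way to trim each group independently is sketched below. This is my own interpretation of “top and bottom 10% per group” (sort within the group and drop a fixed share of rows from each end), and the helper `trim_group`, the data, and the column names are all hypothetical.

```python
import pandas as pd

def trim_group(group: pd.DataFrame, column: str, fraction: float = 0.10) -> pd.DataFrame:
    """Drop the lowest and highest `fraction` of rows in one group, ranked by `column`."""
    k = int(len(group) * fraction)  # rows to cut from each end
    ordered = group.sort_values(column)
    return ordered.iloc[k: len(ordered) - k]

# Made-up data: 10 people without supplements, 20 with.
clean = pd.DataFrame({
    "takes_supplements": [0] * 10 + [1] * 20,
    "healthcare_visits": [0, 1, 2, 2, 2, 3, 3, 3, 4, 8]
    + [0, 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 8],
})

# Trim each group on its own, then stitch the pieces back together.
trimmed = pd.concat(
    [trim_group(group, "healthcare_visits")
     for _, group in clean.groupby("takes_supplements")]
)

print(clean.groupby("takes_supplements")["healthcare_visits"].mean())
print(trimmed.groupby("takes_supplements")["healthcare_visits"].mean())
```

Because each group loses the same share of its own rows, the smaller group shrinks but never disappears, which is the point of trimming per group rather than across the whole dataset.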
Before Trimming
After Trimming
The measurements really don’t look that different, but they are now less influenced by outliers, so the averages should be more robust than before. Let’s take a quick look at the specific numbers for the averages by number of supplements taken and then the A/B comparison of no supplements vs. supplements.
So, it would seem that removing outliers brought the averages closer together! According to our data and calculations, the two groups differ by only 0.5 on the NHANES scale, which would be roughly equivalent to 1 time a year or less. That suggests taking supplements doesn’t go along with needing less healthcare, and this data gives no sign that supplements keep you from getting sick.
ANSWER TO THE QUESTION:
No, people who take supplements do not receive healthcare less often than people who don’t (in fact, it’s slightly more).
There are some caveats to this study though:
This dataset is made up of random people from throughout the United States. When people are asked how many times they’ve received healthcare, every visit counts the same regardless of the reason. So, someone with a chronic illness that requires a lot of medical care is given the same weight as someone who’s considered “healthy”. With this many people (10,000 or more), such outliers should mostly be balanced out by the rest of the population, but we still have to take it into consideration.
Secondly, children and people who refused to answer or didn’t answer were removed from the dataset. Whether children would affect this study or not is not clear, and further research would need to be done. The people who refused to or just didn’t answer didn’t add anything to the dataset for what we were looking for, so they were simply removed.