Wednesday, October 26, 2016

Roller Derby Injury Survey Results

For those that were wondering, that's my own x-ray in the background!

This is part of my #1YearOfViz series! Check out the archive here:

This is the page to display the results of the ongoing Roller Derby Injury Survey. If you are an injured (particularly already recovered) skater I urge you to fill it out to make the data as accurate as possible. One of the nice features of the newest version of Tableau Public that I've been using to display my data is that it can now query Google Sheets nightly for updated data. So what you're seeing on this page is LIVE data (as of midnight Eastern Standard Time on the day of viewing).

Let me first address the big issue everyone has with the survey, which is that only a single injury can be entered at a time. This was purposeful, as different injuries can have different recovery times/rates/etc. and I didn't want to muddy the data with a variety of null results from the approximately 70% who had NOT had a previous injury. If you have filled the survey out for one injury but have had others, I urge you to PLEASE fill it out again with details about your other injuries! The data is largely parsed by injury type (like time to recovery etc.) and to make this data as accurate as possible we need as many data points as possible. I know it takes time to do but I hope you feel it's time well spent!

Below is the summary of responses thus far. If you feel that your region (country/state) is not represented enough in the data, please spread the survey around to get others to fill it out and help make the data as robust as possible. I'd also like to take this opportunity to thank not only the Gimp Crew forum but also Roller Derby Athletics for helping to get this survey out to as many people as possible in the derby community! Your help is amazeballs! =)

I'll admit going into this data with some preconceived notions I wanted to examine. The first among those was WHEN people get injured. My overarching feeling has always been that newer skaters are more prone to injury. This turns out to be the case, with almost 1/3 of all injuries happening within the first year of skating.

It seems (with the current dataset) that the longer a skater skates, the less likely an injury is to occur. I may (once there is enough data to warrant this) add in a filter for injury type, as I would guess that knee-related injuries become more common the longer a person has skated, while things like ankle injuries remain a "bad fall" type of injury that happens more frequently to newer skaters.

UPDATED 10-26-2016: I realized I had forgotten to put in the following exported image showing how the age ranges above skew about 5+ years over what you see in the derby community as a whole:

The other thing I was curious about was the frequency of types of injuries and where those tended to take place. As anyone who's been involved in the sport for any length of time can tell you ankle injuries are by far the most prevalent in the data. That said however I feel that there is a LARGE area of unreported injuries with head impact/concussion injuries that is not accurately represented in the data. From personal experience having a teammate that I worked with for years who had to retire due to multiple concussion injuries in a single year (5+) I feel this is probably a part of the sport most people don't see as a "serious" injury so people don't get added to the Gimp Crew forum after concussive injuries. For every ankle break I've seen I've seen at least 3-5 concussions of various levels. At the time of this writing however concussions represent under 10% of the data... just saying that's probably not exactly accurate. While doing a search for some images to represent concussive injuries I came across this AWESOME article about the practices of concussion testing in other pro sports and the policy changes over the last 20 years or so in mandatory testing, etc.

Probably the largest surprise that I found was something I'd guessed about but wasn't sure of the "level" of. One of the reasons that I collected height and weight was to facilitate calculation of BMI to see if BMIs changed considerably after injury. Personally speaking, I lost a bit of weight after my injury (leg muscle mass) so I was curious how normal that was. It turns out that weight change is fairly rare, with less than 50% (at the time of writing) experiencing any weight change at all. Weight change, though, is relative (to height, obviously) so I wanted to use the CDC guidelines on BMI health. I'll be the first one to admit that BMI is a ROUGH measurement. I work out, I lift heavy, and have what I would generally consider to be a muscular body (#humblebrag)... that said, according to the CDC I'm labelled as "Overweight" based on my height/weight/BMI combo.
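For anyone curious how the BMI labels get assigned, here's a minimal sketch of the math. It uses the standard imperial formula (703 x pounds / inches squared) and the CDC's adult cut-offs. The function names and the example skater are made up for illustration; this is not the actual calculation inside my Tableau workbook.

```python
def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches (703 is the standard imperial conversion factor)."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    return 703.0 * weight_lb / (height_in ** 2)


def cdc_category(bmi: float) -> str:
    """Adult BMI category using the CDC cut-offs (18.5, 25, 30)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Healthy weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


# Hypothetical skater: 70 inches tall, 190 lb before injury, 180 lb after.
before = bmi_from_imperial(190, 70)
after = bmi_from_imperial(180, 70)
print(round(before, 1), cdc_category(before))
print(round(after, 1), cdc_category(after))
```

Notice that this made-up skater drops ten pounds and still lands in the same category, which is part of why I looked at both raw weight change and the BMI label.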

Here's the last assumption that people have when thinking about derby injuries... that because people are most typically injured in their first year of skating, most injured skaters aren't in the best shape to begin with. I'm NOT SAYING that ALL injured skaters are out of shape, I'm just putting the data out that (at the time of writing) ~40% of injured skaters have BMIs that would label them as obese. Even if you don't agree with the CDC's definitions, the fact of the matter is that it appears that skaters with some extra weight tend to have a higher rate of injury. Check the chart below for the current numbers on weight change and BMI:

Another common question people ask is whether surgery sped up or delayed recovery. After taking a look at the data I can tell you that in almost every injury case where a surgery was reported, recovery time was longer. In many cases though, in the long term, those skaters tended to feel more "normal" sooner. I don't think this has to do specifically with HAVING surgery, but more so that the people medical professionals suggest have surgery are simply more severely injured. For instance, if you look at Compound vs Non-Compound breaks you'll find that in EVERY SINGLE INSTANCE compound breaks required surgery and took significantly longer to heal.

Fortunately for me my health insurance at the time of my accident was pretty amazing, and my co-pay and everything for my surgery was seemingly very little. I was curious how many people opted for things like USARS/WFTDA insurance. As it turns out, just over 70% have an additional insurance aside from (or instead of) their personal insurance. This number bumps up ~1-3% after an injury... so it doesn't really look like prior injuries are a driving factor in whether people decide to get additional insurance. I can create a dashboard for this data but honestly it was just kinda boring. =P

As someone who always wanted to jam I really had hoped that I would be able to come back a jammer... but at the time of writing only about 1/3 of people who used to jam return to jamming post injury whereas 59% of blockers return to blocking. You can check the chart below and see which people stay in which positions etc.

Where do we go from here?! Well, the first thing to mention is that this data can be cut/sorted a zillion ways from Sunday. Have a question that this data can answer? Ask! Hit me up on Twitter @wjking0 or via the Roller Derby Gimp Crew (which is an AWESOME resource for ANY injured skater with tons of support!) or in the comments below this blog. Since this data is live there are plenty of ways to continue to slice and dice the data in a real-time fashion!

P.S. I have a few more charts I'll likely add on as future blog posts as I get them all polished up (with regards to things like injury/pain data and a few other things). I'll put a link down here when another post goes up about the data!

This is how I feel after looking at injury rates all week...

Wednesday, October 19, 2016

Car, Cycling and Pedestrian Collision Data - Part 1

This is part of my #1YearOfViz series! Check out the archive here:

I knew I would have to do this eventually: I finally came upon a data set so large that I have to split it into multiple posts. I was originally turned on to this data through some work I had done with the University of Kentucky Police Department. I had done some work on their crime log, and one of the captains thought I would be interested in traffic collision data (Captain Matlock, who's super rad and a data nerd himself!). I said, "Oh, you mean traffic accidents?" He replied, "There are no accidents." When he gave me a sample of their data I realized I could deduce several trends in it, particularly in regard to pedestrian and cycling collisions as they occurred on campus.

Being an avid cyclist myself, I saw the potential for this data to really help and inform other cyclists and the people working on the planning of the University of Kentucky's roads and pathways. One of my last days as an employee of the University of Kentucky was spent with the cycling committee, briefing them on this data. They then informed me that it looked to have come from a larger data set on a state police website. When I pulled up the site I was giddy with excitement at the fact that there were so many data fields and so much historical data, way more than I had originally been given by the UK police department.

Unfortunately, the yearly downloads from that state police website were not very functional. They were .DAT files, but neither of the data definitions listed on the website allowed me to properly parse those 2 GB yearly files into anything usable. I then decided that I needed to just scrape Fayette County as a proof of concept; however, even that had to be done in six-month intervals, going back approximately six years.
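If you want to try a similar scrape yourself, the first step is just generating the six-month windows to request. Here's a rough sketch of that piece only. The year range is my assumption for "approximately the last six years", the function name is hypothetical, and the actual request to the site is left out since that depends entirely on the site's search form.

```python
from datetime import date


def six_month_windows(start_year: int, end_year: int):
    """Return (start, end) date pairs covering each half of every year, inclusive."""
    windows = []
    for year in range(start_year, end_year + 1):
        windows.append((date(year, 1, 1), date(year, 6, 30)))
        windows.append((date(year, 7, 1), date(year, 12, 31)))
    return windows


# Assumed range: six calendar years ending in 2016.
windows = six_month_windows(2011, 2016)
for start, end in windows:
    print(start.isoformat(), "to", end.isoformat())
```

Each pair would then be fed to whatever query the site accepts, with the results appended into one big table.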

Let's get into the data!

First I would like to mention that all points on the dashboards listed below are clickable, so if you click a roadway name it will reshape all of the shown data to reflect that roadway until you click off of it, clearing that selection. The same goes for things like day of the week or hour of collision: any of those will reshape all the other data on the charts. This goes for all four of the dashboards I have posted below.

This first dashboard highlights the locations of collisions by particular roadway, with day of the week and hour of day frequency shown at the bottom. The map to the right of the roadway names is colored by the number of people injured in each collision, so the concentration of redness at different locations can be a good indicator of the most injurious areas of a particular roadway. Take a look around at the data, and remember you can use the controls in the upper left of the map to zoom in on a particular area of interest after you have set your filters or selected your items on the other charts.

This second dashboard is just a comparison of the percentage change of pedestrian and bicycle collisions by month and year. Taken as a whole, pedestrian collisions have risen slightly over the last six years while bicycle collisions have fallen slightly. The thing to remember is that the percentage differences in these changes are less than 1%, so not terribly significant. If we limit the collisions to the last three years we see those trends are reversed. In the last three years, pedestrian collisions have gone down 0.4% while bicycle collisions have gone up 0.2%. I didn't have any particularly significant dates to slide the slider to in order to examine a particular change in Lexington policy or anything, but I left the date slider on the right in case anyone wanted to check something out.

This third dashboard is a slightly more simplified version of the first, except that I wanted to look at injury rates in particular. You'll notice the red coloration is the percentage of people injured on that given day or time. Again, the date sliders are on the right, as well as selections for the day and hour of collisions, though those can be selected by clicking directly on the dots as well. Bicycle and pedestrian collisions as well as, in this visualization, fatalities are included.

This last dashboard is purely for looking at every aspect of the timing of cycling collisions, including year, month, day of week, and hour, with both injury totals and total number of collisions also listed in the chart.

I will be linking the second or third or possibly even fourth part of this viz down below as I complete them but they will all hopefully be part of my one year of viz challenge that I've made for myself. As always, if you have questions or concerns you can leave them in the comments below this blog or hit me up on Twitter at @wjking0.

Thursday, October 13, 2016

National Parks Tourism and Money Comparison

This is part of my #1YearOfViz series! Check out the archive here:

I originally found the subject of today's blog in a Reddit forum that I am an admin of called r/datasets/. I'm a pretty big fan of our national parks and have several faves including Monument, Yellowstone, etc. This year marks the 100th year that the United States National Park Service has been in existence, so I thought this would be a great time to make a viz about our nation's national parks!

Some of the things that I discovered in the data are as follows:

I started noticing a trend that over the last couple of years national parks have been visited more than at any time in the last 20 or so years! I've been searching for a reason for this uptick in people visiting national parks but have yet to discern one. One suggestion is that the "Every Kid in a Park" initiative is responsible for the growth over the last couple of years. This seems unlikely, however, because after further research I found that the Every Kid in a Park initiative began on September 1, 2015. You'll see in the chart below that the last two years in particular, since 2013, have seen some of the heaviest growth in recent recorded history.

Unfortunately, not all of the data that I'm showing in these charts was available from a single data source. I had to scrape the national parks website located here, as well as the national parks statistical site located here. I was also unable to easily join the data sets, as the names of the national parks on the website do not match the names of the national parks on the report that they issue on their statistical page. Additionally, the main national parks website does not have any year-by-year breakdown of monies coming into a state per park (just a state total).
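For what it's worth, one way someone could attempt that join is to normalize the names and then fuzzy-match them with Python's built-in difflib. This is only a sketch of the idea, not what I actually did. The park name strings below are illustrative stand-ins rather than the exact text from either site, and the 0.8 cutoff is an arbitrary assumption you would want to tune.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, drop the 'National Park' / 'NP' suffix, and strip punctuation."""
    name = name.lower()
    name = re.sub(r"\b(national park|np)\b", "", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def match_names(site_names, report_names, cutoff=0.8):
    """Map each website name to its closest report name, or None if nothing is close."""
    lookup = {normalize(n): n for n in report_names}
    matches = {}
    for name in site_names:
        hits = difflib.get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
        matches[name] = lookup[hits[0]] if hits else None
    return matches


# Illustrative names only (assumed spellings, not scraped values).
site = [
    "Yellowstone National Park",
    "Great Smoky Mountains National Park",
    "Hawai'i Volcanoes National Park",
]
report = ["Yellowstone NP", "Great Smoky Mountains NP", "Hawaii Volcanoes NP"]

matches = match_names(site, report)
for site_name, report_name in matches.items():
    print(site_name, "->", report_name)
```

Anything that comes back as None (or looks wrong) would still need to be matched by hand.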

That said, we can still do some calculations with the total amount of money that has come in and the number of parks located in the state to get a rough value of return per park over the last 20 years. As you can imagine, none of this requires mind-blowing mathematics or calculations. After I began examining the data I noticed a trend in which coastal states tended to have a higher return value per national park than non-coastal states. I decided to do a grouping to see if that assumption was correct and it turns out that it is! Check out the Story below and click through the stages I described above to see for yourself!

This makes sense if you think about it: most people (that I know anyway) don't vacation by going inland, but a lot of people who are in land-locked states, I believe, tend to go towards the coast for vacation purposes. If you're curious about how the NPS calculates the amount of money coming in, feel free to check out their write-up on these numbers here (PDF). What this appears to show, from the outside, is that coastal national parks tend to pull in about half a million dollars more a year in revenue than non-coastal national parks. We can figure this out by assuming that the total on the national parks website is from the last 20 years of data they have collected.
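The per-park math described above boils down to: state total, divided by number of parks, divided by 20 years, then averaged within the coastal and non-coastal groups. Here's a small sketch of that under the same 20-year assumption. The states and dollar figures are completely made up so the arithmetic is easy to follow; they are not the NPS numbers.

```python
from statistics import mean

YEARS = 20  # assumption from the post: state totals cover the last 20 years

# Hypothetical rows: (state, is_coastal, total dollars, number of parks).
states = [
    ("State A", True, 400_000_000, 4),
    ("State B", True, 150_000_000, 2),
    ("State C", False, 240_000_000, 3),
    ("State D", False, 100_000_000, 2),
]


def annual_return_per_park(total_dollars: float, park_count: int, years: int = YEARS) -> float:
    """Rough yearly return per park: state total / parks / years."""
    return total_dollars / park_count / years


def group_average(rows, coastal: bool) -> float:
    """Average the per-park yearly return across states in one group."""
    values = [
        annual_return_per_park(total, parks)
        for _, is_coastal, total, parks in rows
        if is_coastal == coastal
    ]
    return mean(values)


coastal_avg = group_average(states, coastal=True)
inland_avg = group_average(states, coastal=False)
print("coastal:", coastal_avg)
print("non-coastal:", inland_avg)
print("difference:", coastal_avg - inland_avg)
```

Averaging per state (rather than per park across the whole group) is a design choice here; weighting by park count would give a slightly different answer.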

Interestingly enough, the state with the highest return per national park is actually North Carolina!

As always, if you have any questions or concerns you can leave a comment below or hit me up on Twitter at @wjking0.
Enjoy those waves gang!

Wednesday, October 5, 2016

Where (NOT) to Eat in Lexington, KY - UPDATED LIVE DATA!

Lexington, KY Skyline
This is part of my #1YearOfViz series! Check out the archive here:

I posted the original version of this several years ago as one of the very first geo-located dataviz that I'd created. With the new changes in Tableau Public I have finally found a way to get the live-updated data from the Lexington Health Department. If you'd like to see the raw Google Sheet that I'm pulling this data from I'll make it available here.

I didn't do too much as far as changing this data from its original form, except making the data a live-updating format and putting some additional filters and analysis on top of what I'd done previously.

First off I'd like to announce that I've developed what I think is a good mobile version which you can pull up on your phone if you'd like to bookmark to be able to quickly/easily check food scores/violations for a place. Click on the image below to be linked out directly to the dash!

If you'd like to see the full dash and analysis list click below to open up the rest of the blog post!