I knew I would have to do this eventually: I finally came upon a data set so large that I have to split it into multiple posts. I was originally turned on to this data through some work I had done with the University of Kentucky Police Department. I had worked on their crime log, and one of the captains (Captain Matlock, who's super rad and a data nerd himself!) thought I would be interested in traffic collision data. I said, "Oh, you mean traffic accidents?" He replied, "There are no accidents." When he gave me a sample of their data, I realized I could deduce several trends from it, particularly with regard to pedestrian and cycling collisions on campus.
Being an avid cyclist myself, I saw the potential for this data to really help and inform other cyclists and the people who plan the University of Kentucky's roads and pathways. One of my last days as an employee of the University of Kentucky was spent briefing the cycling committee on this data. They then informed me that it looked to have come from a larger data set at crashinformationKY.org. When I pulled up the site I was giddy with excitement at how many data fields and how much historical data there were, way more than I had originally been given by the UK police department.
Unfortunately, the yearly downloads from that state police website were not very functional. They were .DAT files, but neither of the data definitions listed on the website allowed me to properly parse those 2 GB yearly files into anything usable. I then decided to just scrape Fayette County as a proof of concept. Even that had to be done in six-month intervals, going back over approximately the last six years.
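For anyone who wants to try the same thing, here is a minimal sketch of how the six-month intervals could be generated before running each query. The start and end years are placeholders rather than the exact range I pulled, and the function name is made up for this example. It does not include the scraping itself.

```python
from datetime import date

def six_month_windows(start_year, end_year):
    """Yield (first_day, last_day) pairs for each half of every year, inclusive."""
    for year in range(start_year, end_year + 1):
        yield date(year, 1, 1), date(year, 6, 30)    # January through June
        yield date(year, 7, 1), date(year, 12, 31)   # July through December

# Placeholder years (an assumption): six years gives twelve six-month windows.
windows = list(six_month_windows(2010, 2015))

for first_day, last_day in windows:
    print(first_day.isoformat(), "to", last_day.isoformat())
```

Each pair can then be used as the date range for one query, and the results stacked together afterwards.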
Let's get into the data!
First, I would like to mention that all points on the dashboards below are clickable. If you click a roadway name, all of the shown data will reshape to reflect that roadway until you click off of it, which clears the selection. The same goes for things like day of the week or hour of collision: selecting any of those will reshape all the other data on the charts. This goes for all four of the dashboards I have posted below.
This first dashboard shows the locations of collisions by roadway, with day-of-week and hour-of-day frequency shown at the bottom. The map to the right of the roadway names is colored by the number of people injured in each collision, so the concentration of red at different locations can be a good indicator of the most injurious areas of a particular roadway. Take a look around at the data, and remember you can use the controls in the upper left of the map to zoom in on a particular area of interest after you have set your filters or selected your items on the other charts.
This second dashboard is just a comparison of the percentage change of pedestrian and bicycle collisions by month and year. Taken as a whole, pedestrian collisions have risen slightly over the last six years while bicycle collisions have fallen slightly. The thing to remember is that these changes are less than 1%, so not terribly significant. If we limit the collisions to the last three years, those trends are reversed: pedestrian collisions have gone down 0.4% while bicycle collisions have gone up 0.2%. I didn't have any particularly significant dates in mind, such as a change in Lexington policy, to set the slider to. But I left the date slider on the right in case anyone wanted to check something out.
This third dashboard is a slightly simplified version of the first, except that I wanted to look at injury rates in particular. You'll notice the red coloration is the percentage of people injured on that given day or time. Again, the date sliders are on the right, as well as selections for the day and hour of collisions, though those can also be selected by clicking directly on the dots. Bicycle and pedestrian collisions are included and, in this visualization, so are fatalities.
This last dashboard is purely for looking at every aspect of the timing of cycling collisions, including year, month, day of week, and hour, with both injury totals and the total number of collisions also listed in the chart.
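If you would rather rebuild this kind of timing breakdown outside of the dashboard, a rough sketch looks like the one below. The records are made-up sample rows, not real collisions, and the two-field layout (timestamp, number injured) is an assumption about how you might shape your own extract rather than the layout of the state data.

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up sample rows: (collision timestamp, number of people injured).
records = [
    ("2014-05-06 17:30", 1),
    ("2014-05-06 17:45", 0),
    ("2015-09-12 08:10", 2),
]

def summarize(rows, key):
    """Count collisions and sum injuries for each bucket returned by key()."""
    totals = defaultdict(lambda: {"collisions": 0, "injured": 0})
    for stamp, injured in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        bucket = totals[key(when)]
        bucket["collisions"] += 1
        bucket["injured"] += injured
    return dict(totals)

by_year = summarize(records, lambda d: d.year)
by_month = summarize(records, lambda d: d.month)
by_weekday = summarize(records, lambda d: DAY_NAMES[d.weekday()])
by_hour = summarize(records, lambda d: d.hour)

print(by_weekday)
print(by_hour)
```

The same function handles every time grain, so adding another view (say, year and month together) only takes a different key function.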
I will be linking the second, third, and possibly even fourth parts of this viz down below as I complete them, and they will all hopefully be part of the one-year viz challenge that I've made for myself. As always, if you have questions or concerns you can leave them in the comments below this blog or hit me up on Twitter at @wjking0.