This is going to be a SUPER quick post as this is a SUPER small (but useful!) viz that I whipped up. I literally have about 6-7 vizzes that I could put out but I'm working on an actual big story at the moment that I'm hoping to get picked up by some news organizations so I want to really give it the TLC it deserves. So this week instead of a deep dive you get a shallow wade into a more useful than data-filled viz!
I was poking around looking for some things to work with as far as GeoJSON data (which the new Tableau 10.2 supports!) and I came across the "New Mapping" group out of UK. I poked around a little on their github page and found the GeoJSON for all the Lexington Traffic Cameras! I thought, "Wow, this is neat!" and started building my viz around it... then I thought, "STOP!"...
Where did this data come from? Was it being used anywhere else!? Then I found it... Lexington Fayette Urban County Government had already built a site for this!
Then I realized their site doesn't reformat to mobile and while it does provide live video streams (for about 5 seconds before auto-closing) it required a click on each camera to show the data. This seemed like an unnecessary step so I made the dash below so that if you hover over a point the still image camera data will show immediately (and will refresh upon scrolling over another and back over again). Additionally I created a mobile-specific version formatted for phones! It isn't much but sometimes just improving a UI can mean a huge difference in the utilization of a tool!
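For anyone curious what working with the camera GeoJSON looks like outside of Tableau, here is a minimal sketch in Python. The sample data, the property names (`name`, `image_url`) and the URLs are all made up for illustration; the real file's fields may well be different.

```python
import json

# Hypothetical sample: the real camera file's property names and values may differ.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-84.5, 38.05]},
     "properties": {"name": "Camera A", "image_url": "https://example.com/camera-a.jpg"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-84.25, 38.0]},
     "properties": {"name": "Camera B", "image_url": "https://example.com/camera-b.jpg"}},
    {"type": "Feature",
     "geometry": {"type": "LineString", "coordinates": [[-84.5, 38.0], [-84.4, 38.1]]},
     "properties": {"name": "Not a camera"}}
  ]
}
"""


def camera_points(geojson_text):
    """Return (name, latitude, longitude, image_url) for every Point feature."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip lines and polygons, we only want camera locations
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = geometry["coordinates"][:2]
        props = feature.get("properties") or {}
        rows.append((props.get("name"), lat, lon, props.get("image_url")))
    return rows


for row in camera_points(SAMPLE):
    print(row)
```

Each tuple is roughly what a hover tooltip needs: a label, a spot on the map, and the still image to show.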
The question I always ask myself when re-doing someone else's work
Click the image below for the mobile version or continue to scroll down for the desktop interface:
As always if you have any questions hit me up in the comments below or on Twitter @wjking0.
Me when I hit Import.io's scraping limit and get banned (again)
I wanted to start this post by talking about the problems I've recently encountered using Import.io. Multiple times now I have run into their scraping limit for "free" users and have been temporarily banned from using their services. One time I accidentally ran a cloud-based scrape as a test, but the scrape continued after I closed it, so I ended up running a query of a thousand instead of the 20 to 30 I wanted to run. Then this month I ran a scrape of over 10,000 (their new monthly limit on local client scraping). I was originally told (via their Facebook users group) that the Legacy client would be allowed infinite scraping as long as it was done locally. This was apparently not the case.
I started out looking for a new scraping client and checked out several options. But almost all web scraping services required monthly subscription fees or had no local client to use at a cheaper or free rate. That's when/how I discovered Octoparse!
/\ Me hugging Octoparse!
It is kind of magic! It took me over a week to really learn how to use it, but it has been extraordinarily worth it! Octoparse has many more tools than Import.io. Octoparse allows unlimited queries from the local client and you only have to pay when you're using their cloud-based services, which is the way I suggested Import.io price their systems when support contacted me about my ban from their services.
Hey Octoparse, I just met you,
and this is crazy,
comment my blog,
and sponsor me maybe!?
I only wish I had known about Octoparse earlier so that I could have saved myself around 12 hours' worth of work when I did the West Virginia State salary scrape a while back! What you will be looking at in this visualization is the first scrape that I have completed using Octoparse. The data came out incredibly clean and simple; my only complaint about the Octoparse export is that the CSV is not directly readable by Excel when opened. It's really a minor complaint next to the awesome flexibility of the product though! Let's get into the data!
I've settled in on my designs for salary-based dashboards with only a single year of data. I decided not to fix it since it's not broke and replicated the same types of dashboards I've done in the UK Salary Viz here and a little bit of the work I did in the WV State Salary Viz mentioned previously. The "Dots Dash" as I call it is really just a fun visual representation of all the people/years/money that goes into something like public education in one single county.
This next one is just Salary Over Time and Number of People Over Time, so basically how many people are making approximately how much, how quickly you see raises given, etc. If you'll notice at the side, this viz starts out with a filter of "Instructor" on it to show specifically teachers' salaries over time, as all teachers (I think) have 'instructor' as part of their titles. You can set this wildcard filter to whatever you'd like (ex. 'bus driver') to see how your own or a friend's particular job prospects will look over time.
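If you want to picture what that wildcard filter is doing, here is a rough sketch in Python. The rows below are made up, the field names (`title`, `years`, `salary`) are my own placeholders, and I'm assuming the filter behaves like a case-insensitive "contains" match.

```python
from statistics import median

# Hypothetical rows: not real salary data, just enough to show the idea.
STAFF = [
    {"title": "Instructor - Elementary", "years": 1, "salary": 40000},
    {"title": "Instructor - Elementary", "years": 1, "salary": 42000},
    {"title": "Instructor - High School", "years": 5, "salary": 48000},
    {"title": "Bus Driver", "years": 5, "salary": 30000},
    {"title": "Principal", "years": 12, "salary": 95000},
]


def filter_by_title(rows, pattern):
    """Keep rows whose title contains the pattern, ignoring case."""
    needle = pattern.lower()
    return [row for row in rows if needle in row["title"].lower()]


def median_salary_by_years(rows):
    """Median salary for each years-of-service value, sorted by years."""
    by_years = {}
    for row in rows:
        by_years.setdefault(row["years"], []).append(row["salary"])
    return {years: median(salaries) for years, salaries in sorted(by_years.items())}


instructors = filter_by_title(STAFF, "instructor")
print(median_salary_by_years(instructors))
```

Swap "instructor" for "bus driver" and you get the same breakdown for a different job.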
For the next story dashboard I really wanted to look at how locations/grade-types pay different teachers. Do art teachers make more at Liberty than at Bryan Station? How about music teachers at Elementary schools vs High Schools? Step through the story with the top tabs and you can filter on the right and compare median salaries by location. I'd like to ultimately turn this into part of what I'll use for a future dashboard I'm going to work on that will compare test scores to teacher salaries for particular places... but this will have to do for this week! =D The last little section was just because I was curious how much principals make in general and I was surprised (and glad) to see they make good money.
This last dash is just the "big list" that a lot of people like to see... if you CLICK on a location or a job title the data to the right (medians/averages of salaries and years worked) will reformat to that highlighted selection. If you click on a job it will not be the medians/averages for that particular school (as each school doesn't have enough non-teaching staff to make that functional) so it reformats to show EVERYONE who shares that job title. You can also filter this list by name if you're looking for someone in particular's salary.
Finally, as the son of a public school teacher let me say to all of you out there doing the work every day...
As always hit me up on twitter @wjking0 or in the comments below for questions/concerns!
I knew I would have to do this eventually: I finally came upon a data set so large that I have to split it into multiple posts. I was originally turned on to this data through some work I had done with the University of Kentucky Police Department. I had worked on their crime log, and one of the captains (Captain Matlock, who's super rad and a data nerd himself!) thought I would be interested in traffic collision data. I said, "Oh, you mean traffic accidents?" He replied, "There are no accidents." When he gave me a sample of their data I realized I could deduce several trends in it, particularly in regard to pedestrian and cycling accidents as they occurred on campus.
Being an avid cyclist myself, I saw the potential for this data to really help and inform other cyclists and the people planning the University of Kentucky's roads and pathways. One of my last days as an employee of the University of Kentucky was spent briefing the cycling committee on this data. They then informed me that it looked to have come from a larger data set from crashinformationKY.org. When I pulled up the site I was giddy with excitement at the fact that there were so many data fields and so much historical data, way more than I had originally been given by the UK police department.
Unfortunately, the yearly downloads from that state police website were not very functional. They were .DAT files, but neither of the data definitions listed on the website allowed me to properly parse those 2 GB yearly files into anything usable. I then decided that I needed to just scrape Fayette County as a proof of concept; however, even that had to be done in six-month intervals going back approximately six years.
Let's get into the data!
First I would like to mention that all points on the dashboards below are clickable. If you click a roadway name, all of the data shown will reshape to reflect that roadway until you click off of it, clearing the selection. The same goes for things like the day of the week or hour of a collision: selecting any of those will reshape all the other data on the charts. This goes for all four of the dashboards I have posted below.
This first dashboard shows the locations of collisions by roadway, with day-of-week and hour-of-day frequency shown at the bottom. The map to the right of the roadway names is colored by the number of people injured in particular accidents, so the concentration of red at different locations can be a good indicator of where the most injurious areas of a particular roadway are. Take a look around at the data, and remember you can use the controls in the upper left of the map to zoom in on a particular area of interest after you have set your filters or selected items on the other charts.
This second dashboard is just a comparison of the percentage change of pedestrian and bicycle collisions by month and year. Taken as a whole, pedestrian collisions have risen slightly over the last six years while bicycle collisions have fallen slightly. The thing to remember is that the percentage differences in these changes are less than 1%, so not terribly significant. If we limit the collisions to the last three years we see those trends are reversed: pedestrian collisions have gone down 0.4% while bicycle collisions have gone up 0.2%. I didn't have any particularly significant dates to slide the slider to in order to examine a particular change in Lexington policy or anything, but I left the date slider on the right in case anyone wanted to check something out.
This third dashboard is a slightly simplified version of the first, except that I wanted to look at injury rates in particular. You'll notice the red coloration is the percentage of people injured on that given day or time. Again, the date sliders are on the right, as well as selections for the day and hour of collisions, though those can be selected by clicking directly on the dots as well. Bicycle and pedestrian collisions are included, as are fatalities in this visualization.
This last dashboard is purely for looking at every aspect of the timing of cycling collisions, including by year, month, day of week, and hour, with both injury totals and total number of collisions also listed in the chart.
I will be linking the second or third or possibly even fourth part of this viz down below as I complete them but they will all hopefully be part of my one year of viz challenge that I've made for myself. As always, if you have questions or concerns you can leave them in the comments below this blog or hit me up on Twitter at @wjking0.
I posted the original version of this back several years ago as one of my very first geo-located dataviz that I'd created. With the new changes in Tableau Public I have finally found a way to get the live-updated data from the Lexington Health Department. If you'd like to see the raw Google Sheet that I'm pulling this data from I'll make it available here.
I didn't do too much as far as changing this data from its original form except making the data a live-updating format and putting some additional filters and analysis on top of what I'd done previously.
First off I'd like to announce that I've developed what I think is a good mobile version which you can pull up on your phone if you'd like to bookmark to be able to quickly/easily check food scores/violations for a place. Click on the image below to be linked out directly to the dash!
If you'd like to see the full dash and analysis list click below to open up the rest of the blog post!
As a non-native Kentuckian I wasn't sure what WIC usage looked like in this state. My assumption was generally that WIC was something you'd see more of in large developed cities. It turns out I was wrong.
Total Population Numbers came from the 2000 and 2010 census.
For the calculations I applied the WIC numbers to the total population and not to subgroups for women or children under 18. Usage percentages for those subgroups may be higher, but I don't have WIC info on the numbers of mothers vs children utilizing services, so I didn't want to further muddy the numbers.
Also, for these calculations I applied the 2000 census amounts to the 2000 WIC numbers, and for the 2006-2013 WIC numbers I used the closer 2010 census numbers, as population estimates for most regions were fairly stable over that time period.
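To make the math concrete, here is a quick sketch of the calculation described above, written in Python. The county name, participant count and populations are invented purely for illustration, and the function names are my own.

```python
def census_year_for(wic_year):
    """Pick which census to divide by: 2000 for 2000, 2010 for 2006-2013."""
    if wic_year == 2000:
        return 2000
    if 2006 <= wic_year <= 2013:
        return 2010
    raise ValueError(f"No census year chosen for WIC year {wic_year}")


def wic_usage_percent(participants, total_population):
    """WIC participants as a percentage of the TOTAL county population."""
    return 100.0 * participants / total_population


# Hypothetical county: these numbers are made up, not real census or WIC data.
population_by_census = {2000: 24000, 2010: 25000}
participants_2009 = 500

census_year = census_year_for(2009)
percent = wic_usage_percent(participants_2009, population_by_census[census_year])
print(f"2009 usage (vs {census_year} census): {percent:.1f}%")
```

Because the denominator is everyone in the county, the percentage among just women and children would come out higher.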
As you can see, the large urban areas of Lexington and Louisville (Fayette and Jefferson Counties respectively) have fairly low WIC usage (<2%), while areas in eastern KY in particular have fairly high and consistent usage. I haven't done cost analysis yet but once the USDA fixes their website and I can get some more in-depth numbers I should have some more data to play with.
As usual hit me up at @wjking0 if you have any questions or concerns or just want to talk about public data!
EDIT: I've added the second dashboard/story as there was a request to look at the comparison of WIC % to Median Household Income so I crunched that out real quick:
I'll fully admit that this Viz is a rush-job. I wanted to go ahead and throw it together quickly and I'll update it later as I crunch more data into the map. The last Gallery Hop of the season is tonight in Lex so here's a little map to see what's going on (that I may even get a chance to update once or twice before tonight). I'd also like to include non-official locations so if you know of any please hit me up and let me know about them!
When I'm not making #DataViz out of #WeirdData I work as a Sys Admin for the University of Kentucky. One of the neatest and most data-rich departments I get to support is our student-run radio station WRFL 88.1 FM. There are over 300,000 MP3 files stored on their server from their over 30,000 CDs and well over 30,000 vinyl records. At this point a good chunk of the CDs have been ripped and provide us with a rich dataset. It allows us to ask all sorts of neat questions of a unique set of data:
Want to know the average length of a Rock song? 3 minutes, 23 seconds
How many songs on the server have the word "Kentucky" in the title? 57
How many songs does Johnny Cash have on the server? 916
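Questions like the ones above boil down to a filter plus a count or an average. Here is a small sketch in Python using a handful of made-up rows; the field names (`title`, `artist`, `genre`, `seconds`) and the track lengths are placeholders, not the actual server metadata.

```python
from statistics import mean

# Hypothetical rows: lengths are invented, not pulled from the station's server.
TRACKS = [
    {"title": "Blue Moon of Kentucky", "artist": "Bill Monroe", "genre": "Bluegrass", "seconds": 185},
    {"title": "Kentucky Rain", "artist": "Elvis Presley", "genre": "Rock", "seconds": 200},
    {"title": "Folsom Prison Blues", "artist": "Johnny Cash", "genre": "Country", "seconds": 170},
    {"title": "Ring of Fire", "artist": "Johnny Cash", "genre": "Country", "seconds": 156},
    {"title": "Hound Dog", "artist": "Elvis Presley", "genre": "Rock", "seconds": 136},
]


def average_length(tracks, genre):
    """Average track length in seconds for one genre, or None if there are no tracks."""
    lengths = [t["seconds"] for t in tracks if t["genre"] == genre]
    return mean(lengths) if lengths else None


def format_mmss(seconds):
    """Turn a number of seconds into a minutes:seconds string."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def count_title_contains(tracks, word):
    """How many titles contain the word, ignoring case."""
    return sum(1 for t in tracks if word.lower() in t["title"].lower())


def count_by_artist(tracks, artist):
    """How many tracks are credited to exactly this artist."""
    return sum(1 for t in tracks if t["artist"] == artist)


print(format_mmss(average_length(TRACKS, "Rock")))
print(count_title_contains(TRACKS, "Kentucky"))
print(count_by_artist(TRACKS, "Johnny Cash"))
```

On messy user-uploaded tags, the exact-match artist and genre checks would likely miss variants, which is part of why the cleanup mentioned below matters.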
All that being said, I know that this data (because radio station employees can upload their own data to it) is not the most "clean" set of data. Do I believe there was a song from 1675 on the server? Probably not. Eventually also I'll clean up the "Genre" category a lot more over time. I expect I'll be updating this particular dataset somewhere in the 4-6 month range.
ALSO! I was going to throw in a logo and decided instead to throw a live-stream player built into the DataViz which refreshes when the track playing changes allowing people to search out that particular artist/album to see what else is there.
Let me know what you think and suggestions for how to improve the viz at @wjking0.
I got this data from the Lexington, KY Health Department and while I hoped to have this constantly update-able I don't know that is going to be the case. I had to do quite a bit of data-teasing before I could actually do some work (particularly with location data). Additionally I had to create a file that contained all the Health Code violations and definitions so I knew which were "Critical" violations and what each code means. So this may be a one-off data viz.
I originally wanted to look at whether the zip code (and thus socio-economic status) of an area had anything to do with food quality, but I quickly noticed another trend: "Marts" and grocery stores tended to fall towards the end of the spectrum. Also interestingly, every "school" is listed as well... use the search function to search for schools or other clusters of dining places to see if you notice any trends, and you can shoot me a message @wjking0!
Search or click on the map and drag to select multiple locations and scroll down for more info about your selection!
After attending #Data14, the Tableau Conference, it came to my attention after a session from Jewel and Crew that I should start sharing out all the #WeirdData that I enjoy viz'ing. I hemmed and hawed about what kind of viz to do, so I figured I would start with one from Kentucky (my current home). Here in Kentucky we love three things above all else... Basketball, Bourbon, and Horse Racing.
The Kentucky Derby is one of the oldest continually running horse races in the world. The data set there is pretty expansive but I was amazed at how hard it was to find. The data I ended up using I scraped from a couple of wikis, and I have some other data that I may enhance this with later (such as Purse collected). For now though enjoy playing with my first ever blog post and my first ever Tableau Public Viz. Comments/suggestions welcome! @wjking0