Thursday, December 29, 2016

Fayette Co. (Kentucky) Public Schools (FCPS) Salaries 2015-2016


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

Me when I hit Import.io's scraping limit and get banned (again)
I wanted to start this post by talking about the problems I've recently encountered while using Import.io. Multiple times now I have run into their scraping limit for "free" users and have been temporarily banned from using their services. One time I accidentally ran a cloud-based scrape as a test, but the scrape continued after I closed it, so I ended up running a query of a thousand instead of the 20 to 30 I wanted to run. Then this month I ran a scrape of over 10,000 (their new monthly limit on local client scraping). I was originally told (via their Facebook users group) that the Legacy client would be allowed infinite scraping as long as it was done locally. This was apparently not the case.


I started out looking for a new scraping client, checking out several pages for clients to use. But almost all web scraping services required monthly subscription fees, or had no local client to use for a cheaper or free rate. That's when/how I discovered Octoparse!
/\ Me hugging Octoparse!

It is kind of magic! It took me over a week to really learn how to use it, but this has been extraordinarily worth it! The main tools in Octoparse do much more than the tools in Import.io. Octoparse allows unlimited queries from the local client and you only have to pay when you're using their cloud-based services, which is the way I suggested Import.io price their systems when support contacted me about my ban from their services.

Hey Octoparse, I just met you,
and this is crazy,
comment my blog,
and sponsor me maybe!?
I only wish I had known about Octoparse earlier so that I could have saved myself around 12 hours' worth of work when I did the West Virginia State salary scrape a while back! What you will be looking at in this visualization is the first scrape that I have completed using Octoparse. The data came out incredibly clean and simple; my only complaint about the Octoparse export is that the CSV export is not directly readable by Excel when opened. It's really a minor complaint next to the awesome flexibility of the product though! Let's get into the data!
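
For anyone who hits the same CSV-in-Excel snag, here is a minimal sketch of one possible workaround. It ASSUMES the problem is the common one where a UTF-8 CSV has no byte-order mark, so Excel guesses the wrong encoding; I haven't confirmed that's what Octoparse's export does. The function name and file names are made up for illustration.

```python
import csv
from pathlib import Path


def resave_for_excel(src: Path, dst: Path) -> int:
    """Re-save a CSV with a UTF-8 byte-order mark; return the number of rows written.

    Assumption: the source file is UTF-8 (with or without a BOM).
    """
    # Reading with "utf-8-sig" strips a BOM if one is already there.
    with src.open(newline="", encoding="utf-8-sig") as fin:
        rows = list(csv.reader(fin))
    # Writing with "utf-8-sig" puts a BOM at the start of the file,
    # which Excel uses as a hint that the file is UTF-8.
    with dst.open("w", newline="", encoding="utf-8-sig") as fout:
        csv.writer(fout).writerows(rows)
    return len(rows)
```

You'd call it with something like resave_for_excel(Path("scrape.csv"), Path("scrape_excel.csv")) and open the second file in Excel. If the real issue is the delimiter or something else, this won't help.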


I've settled in on my designs for salary-based dashboards with only a single year of data. I decided not to fix it since it's not broke and replicated the same types of dashboards I've done in the UK Salary Viz here and a little bit of the work I did in the WV State Salary Viz mentioned previously. The "Dots Dash" as I call it is really just a fun visual representation of all the people/years/money that goes into something like public education in one single county.






This next one is just Salary Over Time and Number of People Over Time, so basically how many people are making approximately how much, how quickly you see raises given, etc. If you'll notice, at the side this viz starts out with a filter of "Instructor" on it to show specifically teachers' salaries over time, as all teachers (I think) have 'instructor' as part of their titles. You can set this wildcard filter to whatever you'd like (ex. 'bus driver') to see how your or a friend's particular job future will look over time.



For the next story dashboard I really wanted to look at how locations/grade-types pay different teachers. Do art teachers make more at Liberty than at Bryan Station? How about music teachers at Elementary schools vs High Schools? Step through the story with the top tabs and you can filter on the right and compare median salaries by location. I'd like to ultimately turn this into part of what I'll use for a future dashboard I'm going to work on that will compare test scores to teacher salaries for particular places... but this will have to do for this week! =D The last little section was just because I was curious how much principals make in general and I was surprised (and glad) to see they make good money.
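
If you'd rather poke at this kind of comparison outside of Tableau, here's a rough sketch of the same idea: keep the rows whose title contains some text (similar in spirit to the wildcard filter), then take the median salary per location. The rows, field names, and school names below are made up for illustration and are NOT real FCPS records.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows for illustration only; not real salary data.
rows = [
    {"title": "Art Instructor", "location": "School A", "salary": 48000},
    {"title": "Art Instructor", "location": "School A", "salary": 52000},
    {"title": "Art Instructor", "location": "School B", "salary": 55000},
    {"title": "Music Instructor", "location": "School B", "salary": 51000},
    {"title": "Bus Driver", "location": "Transportation", "salary": 30000},
]


def median_salary_by_location(rows, title_contains):
    """Median salary per location for titles containing the text (case-insensitive)."""
    needle = title_contains.lower()
    by_location = defaultdict(list)
    for row in rows:
        if needle in row["title"].lower():
            by_location[row["location"]].append(row["salary"])
    return {loc: median(vals) for loc, vals in by_location.items()}


print(median_salary_by_location(rows, "art"))
```

A plain "contains" match is blunt (searching "art" would also catch a title like "Department Head"), so a real version would probably want tighter matching.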



This last dash is just the "big list" that a lot of people like to see... if you CLICK on a location or a job title the data to the right (medians/averages of salaries and years worked) will reformat to that highlighted selection. If you click on a job it will not be the medians/averages for that particular school (as each school doesn't have enough non-teaching staff to make that functional) so it reformats to show EVERYONE who shares that job title. You can also filter this list by name if you're looking for someone in particular's salary.




Finally, as the son of a public school teacher let me say to all of you out there doing the work every day...

As always hit me up on twitter @wjking0 or in the comments below for questions/concerns!

Friday, December 23, 2016

A Year on Google's Project Fi


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html
Most of you that know me know that I drink deep of the Google Kool-Aid... I've been a Nexus 10 owner for years, I've beta tested apps from the Googs... you name it. That's why, about a year ago when I was thinking about switching my cell phone service and realized that they had a no-interest payment plan on their latest phone, I figured I would give the service a shot and see if it was everything it was cracked up to be. This is not as much a DataViz post as it is a Quantified Self post about what I learned while changing cell phone providers.

Also I just SUPER wanted a new phone. Again, if you've known me for a few years, I carried around a Samsung Galaxy S3 with a screen that was more often cracked than normal (thanks alcohol!). Anyway the Nexus 6P was about the sexiest phone I've ever laid eyes on and I figured if Google held up their promises of using multiple networks to boost speeds it might be pretty amazing. I ordered my phone and was going to wait until Jan 1st to turn it on... it arrived and I had it in my hands for a cool 2-3 days (using wifi only) when my Galaxy S3 gave up the ghost and had a major problem with its motherboard. To this day I think it was just jealous of the new phone. =D

What I hadn't really thought too much on was exactly WHEN and HOW I used my cell service. I kept thinking "I shouldn't really use much 'real' data because I'm always at places with wifi"...like my apt, my office, etc. This is where I was WRONG. After getting the phone that first day I was really wanting to run speed tests all the time and see exactly what this combined network signal would mean as far as speeds on the phone... but to test speeds do you know what you need? Large data files to transfer. I burned through almost 1/2 a gig in a few hours... I'd only allotted myself 2Gb a month (though it's not a problem if you use more, it just adds to your bill). You see I was coming from an UNLIMITED Sprint plan that I'd had forever and it was pretty rad. Anyway... I've logged my wifi connections via IFTTT for years so I figured I would give it a solid year to look at the differences. Let's get into the data:



Let's just take stock of the positives and negatives:
Super fast = Super Pricey!

  • Positives
    • It's AH-MAZING-LY fast! (see screenshot to the right!)
    • Reception is better in most previously "dead" zones
    • The build-quality of the Nexus/Pixel line of phones is impeccable
    • Initial cost of entry is very low ($20/month)
    • Integration with Google Services (like Google Voice/Hangouts) is GREAT
  • Cons
    • Actual phone call quality (particularly on Wifi) is kinda janky
    • $10/Gig of data is TOO DAMNED HIGH
      • Ex. I spent $5 in a few hours just running speed tests around town the first day or so.
      • I could burn through 1/2 Gb of data A DAY walking to work watching YouTube which, if I continued doing, would have cost me approximately $100/month in data
    • Really paying per gig is almost impossibly hard when you're used to unlimited data
    • Have I mentioned that fast network speeds really only matter when you feel that you're not paying for every Mb that flies to your phone at Mach 6!?
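
To make the math in the cons list concrete, here's a quick sketch of the arithmetic using the prices from this post ($20/month to get in the door, $10 per gig). The 20 commute days per month is my ASSUMPTION for roughly how many days someone walks to work, and taxes/fees are ignored.

```python
BASE_FEE = 20.00      # dollars per month (from the post)
PRICE_PER_GB = 10.00  # dollars per GB of data (from the post)


def monthly_bill(gb_used: float) -> float:
    """Base fee plus pay-per-gig data, ignoring taxes and fees."""
    return BASE_FEE + PRICE_PER_GB * gb_used


def commute_data_cost(gb_per_day: float, commute_days: int) -> float:
    """Data-only cost of a daily habit over a month."""
    return PRICE_PER_GB * gb_per_day * commute_days


# Half a gig of speed tests in an afternoon.
print(PRICE_PER_GB * 0.5)
# Half a gig of YouTube per walk to work, assuming 20 commute days a month.
print(commute_data_cost(0.5, 20))
# The 2 GB I had originally budgeted for myself.
print(monthly_bill(2))
```

Half a gig at $10/gig is the $5 from the speed test day, and half a gig over 20 commute days is where the roughly $100/month comes from.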


How I feel after trashing a Google Service
What does all this mean? Well... have I had good network speeds? Yes. Has my call quality been good? MOSTLY (drops sometimes, particularly in Wifi calling). Have I had to SUBSTANTIALLY alter the way I think about my phone being online? HELLS to the YES. That to me is the big flaw in Project Fi... The fast access just means that ultimately you're going to pay them more because you're going to pull down larger data and more HD video, etc. If they said something like "OK, all Google-related services are going to be FREE to access..." I could subsist on YouTube and Play movies/music etc while walking around town. Now I see the draw to places like T-Mobile who are bundling things like Netflix and Hulu in as "Unlimited" as far as data usage goes. Don't even get me started on things like image-heavy Instagram and other services that are no longer text based but image/video based only... ugh.

Bottom Line (literally)... Can I recommend Project Fi as a service to most people? Yes, but only if you're not someone who likes to constantly have your phone out. If you're a super nerd like myself and live on the Interwebs... you're going to hate Fi.

As always hit me up on twitter @wjking0 with any comments or questions!

Thursday, December 15, 2016

Kentucky State Childcare Map

Yes this is actually my niece! =D
This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

Me as a babysitter... also an amazing movie.

I've been pretty busy recently... went to California, hung out in the desert, Los Angeles and everywhere in between!

This is a little something I've worked on previously. It isn't much but it's just to get the data out there!

You can find information on what the star ratings mean (4 being the max btw!) here: http://chfs.ky.gov/dcbs/dcc/stars/starsproviderinfo.htm

The data I scraped came from their search tool, located here: https://prdweb.chfs.ky.gov/KICCSPublic/ProviderSearchPublic.aspx

Let's hop right into the map!



As always hit me up on Twitter @wjking0 with any questions!

So true Millhouse, so true.

Thursday, November 24, 2016

West Virginia State Salaries 2007-2015

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html



After seeing some work that I did with the University of Kentucky Salaries Viz, my mom (who is a teacher at a local college) commented that she thought I should do the same thing with her college. I started looking and found that, unlike Kentucky, the salary records for ALL WV state employees were available all the way back to 2007!!! Yay historical data!

Me when I looked at the site and saw how much historical data there was!
Unfortunately unlike the other salary data I normally have I didn't have access to job titles so there's no way to really know if someone changed positions or anything... it's just name, department, and total compensation for that person per year. Of course with that, particularly given the amount of time... you can do neat things like figure out raise percentages over multiple years! Unfortunately the way the page is laid out that I extracted the data from you cannot look at an individual's salaries over time... so I fixed that with the viz! Below you can type in a name, or a department, and the viz will filter to show that person's salary/raises over time.
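
For the curious, the raise percentage is just the year-over-year change in total compensation. Here's a minimal sketch of that calculation for one person; the numbers are made up and the function name is mine. It ASSUMES a person can be tracked by name alone, which means two employees who share a name would get mixed together.

```python
def yearly_raises(compensation_by_year):
    """Return {year: percent change from the previous recorded year}."""
    years = sorted(compensation_by_year)
    raises = {}
    for prev, curr in zip(years, years[1:]):
        before = compensation_by_year[prev]
        after = compensation_by_year[curr]
        if before:  # skip years where the earlier value is zero
            raises[curr] = round((after - before) / before * 100, 2)
    return raises


# Hypothetical compensation for one person, not real WV data.
example = {2013: 40000, 2014: 41000, 2015: 42230}
print(yearly_raises(example))
```

Note that if someone is missing a year, the change is measured against their last recorded year, so a gap will look like one big raise.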

Additionally you can click on a particular department or name to have the data re-form to show just that particular set of data. I.e., you can click on the Division of Corrections, then click on Adkins, Lisa to re-form the data specifically to show that user. Anyway I'm going to work on some other ways to present this data but in the meantime play around with the dashboard here:



Ultimately you have to remember that, even though it's Thanksgiving... you can't eat money. No matter what Ralph Wiggum tells you:

As always thanks for reading and if you enjoyed this visualization (viz) please share it out on your social networks. If you'd like access to the raw data that I scraped (by hand) from their site then you can download the raw data by clicking here.

Sunday, November 20, 2016

The Inherent Racism of Election Years 2000-2015

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

Better late than never right!? That's my thoughts on this week's post! I had a lot of life-related things happening recently and I haven't had the time to focus on data collection, ETL, etc as I'd hoped. Part of that was some personal things in my life and then I decided in the middle of the week to come to my hometown of Charleston, WV to spend some time with my dad post hip-replacement surgery.
Telling my dad not to do anything is IMPOSSIBLE.

I have something on-deck for next week that should be pretty big if I can find the time to get it done so that'll hopefully make up for the lackluster few weeks of Viz!

So while looking around on my laptop I realized that I had WAY fewer datasets on here than I anticipated. I've been poking around a little more on Data.World lately and I saw a crosspost between there and /r/datasets about FBI statistics on hate crimes from 2000-2015 and thought with the given political climate this might make for a good viz. Sadly the 2016 data isn't in yet... it's probably gonna be a mess.

If you'd like to read up on what constitutes a "Hate Crime" according to the FBI they have a really great site located here.

You can see I just did the one Tableau Story for this viz as I think it's pretty logical to "step" through and isn't terribly interactive (sorry!). Check it out below:



I can't really think of too many things in the country that occur pretty cyclically every 4 years that could contribute to these pretty significant increases in trends. Of course the largest hate-crime related trend since it has been tracked starting in 2000 is the change in Anti-Islamic hate crimes that happened after 2001 (likely a result of 9/11/2001).

As always hit me up on twitter @wjking0 or via the comments below to talk about dataviz!

Friday, November 11, 2016

Video Game Music Viz - Part 1

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

First let me apologize... for those readers not in the United States this was election week so everything's been a little.... let's use the word hectic around these parts. Secondly, I'm sick (see photo below) and was hoping Thursday/Friday I could really crunch through some data. That said... I didn't START crunching through the data until about 1AM Friday morning (you should be reading this on Friday hopefully).
Me sick in my comfy hoodie vizzing at 3AM.
I know what you're thinking....
Gotham... great show AMIRIGHT!?
The upshot of all this is to say that, due to various reasons, today's viz is going to be a little... thin. I still wanted to scrape and ETL some data for my readers but it's not nearly the level I'd LIKE to do with this data. If I can convince the awesome folks at Import.io to give me their top-tier plan for free I would be vizzing the shiz outta all kinds of stuff... but my feeble internet connection and their lack of support for their legacy application don't lend themselves to me doing 50,000+ queries anytime soon! That said Import.io's product is my ABSOLUTE FAVORITE for data extraction! *bats eyes, looks for endorsement deal*

Today's data is my initial scrape from the Video Game Music Database, which really is an exhaustive list of titles... of course most of them are in Japanese so I don't necessarily know all the titles the music is referring to, but it's impressive that their community has built something so rich! They even have a dedicated stats page that you can poke around in located here.

I found this through a podcast I listen to pretty regularly called the Legacy Music Hour featuring 8-bit and 16-bit era games.

This really isn't so much a comprehensive look at the data as it is a quick viz so I can stick to my schedule... that said....

I just did two dashboards... one which simply augments and simplifies their searching process to give you all results that you can scroll through and load by album title. It's nothing fancy but you can use it to total up things to compare how many game soundtracks Sonic has had to Mario, etc.


This next one shows the trends in the video game music industry over the last several years. Unfortunately game sales data is hard to come by (if anyone knows a source please let me know). The height was in the late 2000s, around 2009 or so, with a dip after that. I kinda wonder if, like with gaming, VGM reached a saturation point where people had more than they could reasonably listen to or enjoy?

I also filtered the early days of VGM and limited it to 1983+ (which you can edit with the filter-slider) because I felt that really the explosion of game music came when the Famicom hit Japan in 1983. You can see this reflected


I promise you all next week I'll unveil something worthwhile when I'm feeling less like poop for a zillion reasons! In the meantime ... at least I'm keeping my schedule of 1 Viz per Week! I do have the scrape started for the deep dive into this data and I'll likely plow through that and get it published in a week or two from now (probably 2 as I don't like to put two similar topics published back-to-back). Hit me up on Twitter @wjking0 or leave a comment below and tell me what you thought!

Me with this week's viz.




Friday, November 4, 2016

$25,000 Dollar Prop, Mascots with Guns, and other fun things in a Halloween Express Scrape!

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I'll provide visualizations however some of my initial findings are as follows: 
  • Some assumptions you would make are accurate, such as plus size costumes tending to be more expensive by a considerable amount (28.60% more expensive on average).
  • Interestingly enough plus size costumes tend to be cheaper when they are classified as "sexy" (18.18% cheaper!).
  • There are also just about equal percentages of sexy plus-sized costumes as sexy non-plus size costumes (12.12% Plus vs 16.07% Non-Plus).

Thanks to my friend Barbie I thought it would be a good idea to look at Children vs Adult costumes to see which had more. I would have assumed more children's costumes exist than adult costumes, but as it turns out there are considerably more adult costumes than children's costumes! The assumption could be that people tend to make children's costumes, or that children's costumes simply require make-up and accessories (which do make up more than 30% of the total items in the Halloween Express store).

Also the assumption that women's costumes cost more than men's costumes is incorrect: men's costumes cost more by approximately 20%, and there are approximately 20% fewer men's costumes than women's costumes. This may be a result of pricing vs demand. I'm just guessing that possibly women's costumes are sold more frequently so can ultimately be priced lower.

I would say I'm sorry for all the Mean Girls gifs... but nah.

Since the gender of a costume isn't specifically stated in every case, I wanted to give you a quick note on how I defined gender in this data with a little formula. If any of the categories, the item title, or the item subtitle contain the word "women" or "girl" then I defined it as 'Female', and if they contain "boy" or "men" then I classified it as a 'Male' costume.
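
The rule above can be sketched in a few lines. This is my reconstruction for illustration, not the exact formula from the workbook: the function name, the "Unspecified" fallback, and the order of the checks are all ASSUMPTIONS. The order matters because "women" contains "men", so the female keywords have to be tested first.

```python
def classify_costume(categories, title, subtitle=""):
    """Label a costume 'Female', 'Male', or 'Unspecified' from its text fields."""
    text = " ".join([*categories, title, subtitle]).lower()
    # Check the female keywords first: "women" contains "men", so testing
    # "men" first would mislabel every women's costume as Male.
    if "women" in text or "girl" in text:
        return "Female"
    if "boy" in text or "men" in text:
        return "Male"
    # Assumed fallback label; the post doesn't say what unmatched items get.
    return "Unspecified"


print(classify_costume(["Adult"], "Women's Pirate Costume"))
print(classify_costume([], "Men's Vampire Cape"))
print(classify_costume(["Props"], "Fog Machine"))
```

Plain substring matching is crude (a title containing "ornament" would count as Male because of the "men" inside it), so treat the resulting gender split as approximate.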

Anyway here's the data that I just mentioned! It's not really meant to be "played with" but don't worry the next viz below this factual story viz will be more interactive!



Now as promised I wanted to put some individual links to things in here that I just found horrifying:
  • Basically the ENTIRE "Mascots" Category but most specifically this gem. He's a "Patriot" mascot... with a shotgun. Not sure what Mascot carries heat.... but ya never know.
  • The $25,000 PROP.... which really just scratches the surface of expensive things for sale at Halloween Express. You can play around below with the full dataset and set it to $1,000+ on the filter and you can see HOW many crazy items there are in there!


I've been a little distracted lately so I promise a better, deeper dive into some data next week! As always hit me up on Twitter @wjking0 or in the comments below if you have questions/comments/concerns!
I'm OUT!

Wednesday, October 26, 2016

Roller Derby Injury Survey Results


For those that were wondering, that's my own x-ray in the background!



This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html


This is the page to display the results of the ongoing Roller Derby Injury Survey. If you are an injured (particularly already recovered) skater I urge you to fill it out to make the data as accurate as possible. One of the nice features of the newest version of Tableau Public that I've been using to display my data is that it can now query Google Sheets nightly for updated data. So what you're seeing on this page is LIVE data (as of midnight Eastern Standard Time on the day of viewing).

Let me first address the big issue everyone has with the survey, which is that only a single injury can be entered at a time. This was purposeful as different injuries can have different recovery times/rates/etc and I didn't want to muddy the data with a variety of null results from the approximately 70% or so who had NOT had a previous injury. If you have filled the survey out for one injury but have had others I urge you to PLEASE fill it out again with details about your other injuries! The data is largely parsed by injury type (like time to recovery etc) and to make this data as accurate as possible we need as many data-points as possible. I know it takes time to do but I hope you feel it's time well spent!

Below is the summary of responses thus-far. If you feel that your region (country/state) is not represented enough in the data please spread the survey around to get others to fill it out and help get the data as robust as possible. I'd also like to take this opportunity to thank not only the Gimp Crew forum but also Roller Derby Athletics for helping to get this survey out to as many people as possible in the derby community! Your help is amazeballs! =)






I'll admit to going into this data with some preconceived notions I wanted to examine. The first among those was WHEN people get injured. The overarching feeling I've always had is that newer skaters are more prone to injury. This turns out to be the case, with almost 1/3 of all injuries happening within the first year of skating.

It seems (with the current dataset) that the longer a skater skates, the less likely an injury is to occur. I may (once there is enough data to warrant this) add in a filter for injury type, as I would guess that knee-related injuries become more common the longer a person has skated, with things like ankle injuries still being a "bad fall" type injury that would happen more frequently to newer skaters.









UPDATED 10-26-2016: I realized I had forgotten to put in the following exported image showing how the age ranges above skew about 5+ years over what you see in the derby community as a whole:


The other thing I was curious about was the frequency of types of injuries and where those tended to take place. As anyone who's been involved in the sport for any length of time can tell you ankle injuries are by far the most prevalent in the data. That said however I feel that there is a LARGE area of unreported injuries with head impact/concussion injuries that is not accurately represented in the data. From personal experience having a teammate that I worked with for years who had to retire due to multiple concussion injuries in a single year (5+) I feel this is probably a part of the sport most people don't see as a "serious" injury so people don't get added to the Gimp Crew forum after concussive injuries. For every ankle break I've seen I've seen at least 3-5 concussions of various levels. At the time of this writing however concussions represent under 10% of the data... just saying that's probably not exactly accurate. While doing a search for some images to represent concussive injuries I came across this AWESOME article about the practices of concussion testing in other pro sports and the policy changes over the last 20 years or so in mandatory testing, etc.



Probably the largest surprise that I found was something I'd guessed about but wasn't sure of the "level" of. One of the reasons that I collected height and weight was to facilitate calculation of BMI to see if BMIs considerably changed after injury. Personally speaking I lost a bit of weight after my injury (leg muscle mass) so I was curious how normal that was. It turns out that weight change is fairly rare with less than 50% (at the time of writing) experiencing any weight change at all. Weight change though is relative (to height obviously) so I wanted to use the CDC guidelines on BMI health. I'll be the first one to admit that BMI is a ROUGH measurement: I work out, I lift heavy, and have what I would generally consider to be a muscular body (#humblebrag)... that said, according to the CDC I'm labelled as "Overweight" based on my height/weight/BMI combo.
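
In case you want to check your own numbers, here's a minimal sketch of the standard BMI formula in pounds and inches, along with the CDC's adult category cutoffs. It ASSUMES height and weight are in inches and pounds (the survey's actual units may differ), and the example person is made up.

```python
def bmi(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches: 703 * lb / in^2."""
    return 703 * weight_lb / height_in ** 2


def cdc_category(bmi_value: float) -> str:
    """CDC adult BMI categories."""
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25:
        return "Healthy weight"
    if bmi_value < 30:
        return "Overweight"
    return "Obesity"


# Hypothetical skater: 200 lb and 5'10" (70 inches).
value = bmi(200, 70)
print(round(value, 1), cdc_category(value))
```

Running the same two functions on pre-injury and post-injury weights would show whether someone moved between categories, which is a blunt measure for the same reasons BMI itself is.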

Here's the last assumption that people have when thinking about derby injuries... that due to people being injured most typically in their first year of skating most injured skaters aren't in the best shape to begin with. I'm NOT SAYING that ALL injured skaters are out of shape, I'm just putting the data out that (at the time of writing) ~40% of injured skaters have BMI's that would label them as obese. Even if you don't agree with the CDC's definitions, the fact of the matter is that it appears that skaters with some extra weight tend to have a higher rate of injury. Check the chart below for the current numbers on weight change and BMI:



Other common questions that people ask are if surgery sped up or delayed recovery rates. After taking a look at the data I can tell you that in almost every injury case where a surgery was reported, recovery time was longer. In many cases though, in the long term, they tended to feel more "normal" sooner. I don't think this has to do specifically with HAVING surgery but more so the people that medical professionals suggest have surgery are just simply more severely injured. For instance if you look at Compound vs Non-Compound breaks you'll find that in EVERY SINGLE INSTANCE compound breaks required surgery and took significantly longer to heal.



Fortunately for me my health insurance at the time of my accident was pretty amazing and my co-pay and everything for my surgery was seemingly very little. I was curious how many people opted for things like USARS/WFTDA insurance. As it turns out over ~70% have an additional insurance aside from (or instead of) their personal insurance. This number bumps up ~1-3% after an injury... so it doesn't really look like prior injuries are a driving factor in if people decide to get additional insurance. I can create a dashboard for this data but honestly it was just kinda boring. =P

As someone who always wanted to jam I really had hoped that I would be able to come back a jammer... but at the time of writing only about 1/3 of people who used to jam return to jamming post injury whereas 59% of blockers return to blocking. You can check the chart below and see which people stay in which positions etc.



Where do we go from here?! Well the first thing to mention is that this data can be cut/sorted a zillion ways from Sunday. Have a question that this data can answer? Ask! Hit me up at twitter @wjking0 or via the Roller Derby Gimp Crew (which is an AWESOME resource for ANY injured skater with tons of support!) or in the comments below this blog. Since this data is live there are plenty of ways to continue to slice and dice the data in a real-time fashion!

P.S. I have a few more charts I'll likely add on as future blog posts as I get them all polished up (with regards to things like injury/pain data and a few other things). I'll put a link down here when another post goes up about the data!

This is how I feel after looking at injury rates all week...