Monday, November 11, 2019

AI Test Page

This is a test for TensorFlow AI


Teachable Machine Image Model

Wednesday, May 15, 2019

The Most Popular Cartoons of All Time




I frequently grab a delicious breakfast at one of Lexington's many bakeries, and about a month or so ago I was at Magee's Bakery. On Saturday mornings (my "pastry day") they play old-school Saturday morning cartoons like Mask, GI Joe, etc. I got to thinking about cartoons and how today's compare with the ones I watched growing up... and how some long-running shows (like The Simpsons) fare over time.

I did some cursory searches and found IMDB had a list of shows (Top 250 Animated Series by Avg Rating). I thought that was neat, but I wanted to look at the shows over time, so I designed a scrape to extract the ratings data for every episode of every season. The data included things like the number of people who rated each episode, the original air date, etc.

I then started vizzing to get my answers!

Things to Note:
  • Most shows don't go past 4 seasons (by season 5 up to 73% have been cancelled)
  • 1/3 of cartoon series don't make it past the first season
  • Median Star Rating for an episode is 7.6
  • Rick and Morty has an ENORMOUSLY passionate fan-base (6,742 votes per episode on average), a votes-per-episode level that is 4X the next highest show (The Simpsons at 1,636 avg ratings per episode)
  • Star ratings (overall) decline by season without any real upswing until a show hits its 15th season (which only a very small percentage do)
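
For anyone who wants to reproduce numbers like the ones above from their own scrape, here is a minimal sketch. The sample data is made up for illustration (it is NOT the IMDB data), the function name is hypothetical, and it assumes "cancelled by season N" means the show never reached season N.

```python
from statistics import median

def share_ended_before(seasons_per_show: dict[str, int], season: int) -> float:
    """Fraction of shows whose final season came before `season`."""
    ended = sum(1 for total in seasons_per_show.values() if total < season)
    return ended / len(seasons_per_show)

# Hypothetical sample: show name -> number of seasons it ran.
sample_seasons = {"Show A": 1, "Show B": 3, "Show C": 7, "Show D": 1}
# Hypothetical episode star ratings.
sample_ratings = [7.1, 7.6, 8.2]

one_and_done = share_ended_before(sample_seasons, 2)  # never reached season 2
gone_by_five = share_ended_before(sample_seasons, 5)  # never reached season 5
median_rating = median(sample_ratings)
```

Swapping in the real show-to-season counts and episode ratings would give the survival and median figures directly.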

Here's a quick viz of top cartoons and their season breakdown. Mouse over a show to get a breakdown of its seasons' ratings in a little box-plot to the right.





But what if you REALLY want to know the "BEST" rated cartoon episode ever...? The problem becomes one of rose-colored glasses. I'm personally a fan of Batman The Animated Series... sadly, of ALL the Batman cartoons, Batman TAS ranks the LOWEST!



I then started thinking about which decade had the greatest cartoons ever, so I built the next viz to help me encompass everything a little more completely (1980 has the highest rating of any year BTW)... If you click on a year the graphs will re-form. Follow the DIRECTIONS IN RED for help navigating the entire history of cartoons, and use your own criteria to figure out for yourself what is the Best Cartoon of All Time!





What do you think? One Punch Man? South Park? The Simpsons?


As always hit me up on Twitter @wjking0 with any questions!

Monday, November 5, 2018

Jammer Lap Points in Roller Derby



Let’s talk about Jammer Lap Points (hereafter referred to as “JLP”)… I’m a math guy, a stats guy really… it’s in my job title, “Lead Data Analyst”. I’ve analyzed everything from roller derby, esports, and corporate travel to education. Roller Derby though, while more driven as a fan-base, is largely ignorant of its own stats. So let's look at them!


How I feel when people say it won't really change the game...
Facts:

  • Total Bout time 90 minutes
    • Most jams possible (shortest jams)
      • Theoretical, 0 seconds lead with call, 0 points earned: 180
      • Estimated, 0 points to lead, 10 second lap all opposing skaters: 135
    • Fewest jams possible (full 2 minute jams, 30 seconds between): 36
  • Looking at 2017 data as a complete year set we have the following bout/score info:
    • 5,307 bouts played
      • 2,245 of which were in tournament play
    • 689 were within 18 points (just guessing that a team gets lead about ½ of the time) or about 13% of the bouts
    • 244 in Tournament Play (or approximately 10.9% of the tournament bouts)
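
The arithmetic behind those jam counts and percentages is easy to check. This is just a sketch using the figures from the list above (90 minute bout, 30 seconds between jams); the function and variable names are my own.

```python
BOUT_SECONDS = 90 * 60   # total bout time from the list above
GAP_SECONDS = 30         # time between jams

def jam_count(jam_seconds: int) -> int:
    """How many jams fit in a bout if every jam lasts jam_seconds."""
    return BOUT_SECONDS // (jam_seconds + GAP_SECONDS)

theoretical = jam_count(0)    # jam called off instantly -> 180
estimated = jam_count(10)     # 10 second jams -> 135
full_length = jam_count(120)  # full 2 minute jams -> 36

# Share of 2017 bouts decided by the 18 point band, all play vs. tournament play.
close_share = 689 / 5307
close_tournament_share = 244 / 2245
```

Changing GAP_SECONDS or the jam length is a quick way to test other "what if" pacing assumptions.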

Assumptions:

  • If a team won by less than 18 points (which is adjustable in the viz) I gave the win to the other team, basically flip-flopping every game decided by under 18 points.
  • Unfortunately, I don’t have access to the row-level data from each game’s scorekeepers; if I did, I could change every pass >= 5 to 4 and we could get a DEFINITIVE list of bouts that would have been decided by the JLP change.
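
The flip-flop assumption can be sketched in a few lines. This is an illustration only, not the calculation inside the Tableau viz: the Bout class, field names, and the strict "less than" comparison are assumptions, and ties are not handled specially since derby bouts don't end tied.

```python
from dataclasses import dataclass

@dataclass
class Bout:
    home: str
    away: str
    home_score: int
    away_score: int

def adjusted_winner(bout: Bout, threshold: int = 18) -> str:
    """Return the winner after flipping any bout decided by fewer than `threshold` points."""
    margin = abs(bout.home_score - bout.away_score)
    actual = bout.home if bout.home_score > bout.away_score else bout.away
    other = bout.away if actual == bout.home else bout.home
    # Close games go to the other team; everything else keeps its real result.
    return other if margin < threshold else actual

close_game = Bout("Team A", "Team B", 150, 140)   # hypothetical scores
blowout = Bout("Team A", "Team B", 200, 100)      # hypothetical scores
```

Here adjusted_winner(close_game) gives the win to Team B, while adjusted_winner(blowout) leaves it with Team A. The threshold parameter plays the same role as the adjustable value in the viz.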


First, I wanted to focus on the meta game, commonly referred to in esports as “The Meta.” When we think about most sports, the rules don’t change terribly frequently. Soccer, American Football, Tennis, etc. are typically very settled sports; that is to say, they don’t have rules that fluctuate very much. The changes are minor and typically have nothing to do with scoring. Scoring and defensive changes would reasonably cause the largest change to the meta, with everything else done in sports in service of either offense (scoring), defense (preventing score), or penalizing bad behavior (personal/team penalties). Let’s get into it!


The chart below shows a scatter plot of all games over the last 3 years, with tight games (decided by 18 points or less) represented in the red band. You can filter by tournament type or by whether it was in tournament play at all. Honestly, JLPs are only affecting about 1 in 10 games. With that said, however, there is a trend in more recent years of it affecting Tournament games more and more… this makes sense as tournaments become better at seeding and finding similarly skilled opponents to play each other.


If you're viewing this on a phone, click here for a better formatted version of this viz optimized for mobile: https://public.tableau.com/views/RollerDerbyJLPViz/FTSMeta?:embed=y&:display_count=yes&publish=yes





While it doesn't seem to affect MANY games... think of every "barn burner" of a game you've ever seen. Like... Jacksonville in the tournament this year. Jammer Lap Points MADE A DIFFERENCE in that game. There's no denying that.


JLP was a huge part of the comeback against Montreal

I built this next viz to show individual teams the difference it could potentially make in their record over the years. Put in your own team name at the top. Again, if you're on mobile you can go to a mobile version by clicking here: https://public.tableau.com/views/RollerDerbyJLPViz/IndividualTeam?:embed=y&:display_count=yes



Ultimately, what I've found is that I lack the access to the row-level stats to truly do correct analysis, so this is more of a "close games" analysis. But if you know that you're typically a "5 and done" team, you can guesstimate how many JLPs your team might average in a bout and put those numbers in to see for yourself!

I think the argument about a person or the public not understanding the rules of a game means that you're catering the game to the enjoyment of the fans and the skillfulness of the athletes/refs. I know that this rules change is going to happen, but it's worth noting a few things: the highest point jam record will never be touched, a 20 point jam suddenly becomes a big deal, and where previously a 40 point differential was the tipping point to statistically winning a bout, that will now fall to around 32 points.

With all that said, there are plenty of stats nerds in derby that would run these numbers and I think there should be open debate about rules changes that affect the meta game so seriously.
How I feel about the audience not getting the JLP rules...
As always hit me up on Twitter @wjking0 if you have any questions/concerns!

Addendum: Some people (Thanks Skate Beckett!) pointed me towards WFTDA's stats page to get more accurate data... it took a TON of cleaning (which is why this is being published much later than I initially intended). With that said, I did the following in the same format... but the WFTDA stats data is super limited, only going up to the beginning of 2018 and primarily only including tournament info. I actually reformatted the data at the jam level and reduced anything over 5 points in a pass to 4 points. Ultimately what that led to was the following viz: https://public.tableau.com/views/RollerDerbyJLPViz/WFTDAMeta?:embed=y&:display_count=yes





This also resulted in a closer average game point differential, going from 63.1 to 53.8. So even though I feel deep in my soul that this is one of those things we should fight, it ultimately should lead to some closer bouts in the long run! So yay!?
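
The jam-level adjustment from the addendum can be sketched like this. It is a rough illustration with made-up pass data (not the WFTDA data), the function name and input shape are assumptions, and it simply caps each scoring pass at 4 points before re-totaling.

```python
from collections import defaultdict

def team_totals(jams, cap=None):
    """Total points per team; if `cap` is given, each pass is limited to that many points."""
    totals = defaultdict(int)
    for team, passes in jams:
        for points in passes:
            totals[team] += points if cap is None else min(points, cap)
    return dict(totals)

# Hypothetical bout: each entry is (team, [points scored on each pass of one jam]).
sample_jams = [
    ("Home", [5, 5, 3]),
    ("Away", [4]),
    ("Home", [5]),
    ("Away", [5, 5]),
]

before = team_totals(sample_jams)          # scoring with 5 point passes allowed
after = team_totals(sample_jams, cap=4)    # same bout with passes capped at 4
diff_before = abs(before["Home"] - before["Away"])
diff_after = abs(after["Home"] - after["Away"])
```

Averaging the before and after differentials across every bout in a dataset is how you would get a comparison like the 63.1 vs. 53.8 figure.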


Thursday, August 9, 2018

Creepypasta - The Popularity of SCP Stories




I had originally planned to do this visualization during the month of October, as I felt it would be a little more fitting at that point. However, the overwhelming votes for this viz over the other one I had planned to do at this time led to Creepypasta winning out! I decided to scrape the SCP Wiki as I felt it was one of the best curated sources of Creepypasta around.

Unfortunately Tableau Public no longer lets you embed http data sources (rather than https) and SCP is purely http only. =( As you can see below it looks SOOOOO nice in Tableau Desktop!


Why will you let this work in Desktop but not Public Tableau!!?!?!



Some things to note:
Below is the breakdown of time/date/days of posts of SCP articles:


Here is the breakdown of the "popularity" of SCP CreepyPasta, you can arrange it just about any way you'd like coloring or sizing it by different aspects or choosing a date range and the scatter plot above will reshape in that way. For the shape metrics I decided to go with object class (which took QUITE a bit of cleaning) or if an object was marked as "Safe".

Click on individual entries to be taken to the page about that particular SCP!



As always hit me up on Twitter @wjking0 or leave a comment below or on other social media with comments/questions.

P.S. Sorry this one took so long and was kinda lackluster, I was hoping to find a work-around for displaying webpages inline with Tableau Public but sadly couldn't quite make it happen. =/

Thursday, February 22, 2018

UK Crimes Minutia

Hi new readers! For those of you returning I'm sorry about my absence from blogging for a while! It's been a busy year or two! I lost my job at the University of Kentucky due to a departmental merger... floated around on unemployment for a while applying to jobs and finally ended up with a job at a startup in Southern California where I worked tons and increased my Tableau skills even more than before.

This is the longer/more-boring version of the shortened article here: http://bourbonandbrains.blogspot.com/2018/02/university-of-kentucky-crimes-mapped.html

One thing kept nagging at me though and that was a project I started years ago while I was still working at UK. Here's the quick story... I worked in the Division of Student Affairs for the last 15 or so years of my employment with the University. Years and YEARS ago I thought that the way we reported crimes on campus was ridiculous. Due to both federal and state laws (known as Clery/Minger laws) the University has to report all crimes and acts of arson on campus. That is currently done in the following format:


Now, I don't know about you, but while that fulfills the "letter of the law" I don't feel it fulfills the "spirit" of the law. Those laws were passed so people could educate themselves on crime and trends to determine their (or their child's) safety on a college campus. I went to my boss at the time telling her I'd like to get a group together to map out crime on campus to help places like our Violence Intervention Center focus their efforts and, overall, to make campus a safer place. I was told, in no uncertain terms, that I was not to do that. The fear was that it would "reflect poorly on our Greek community." (Which, once I looked at the data, it doesn't actually!) This, of course, pissed me off.

I didn't have the tools to do it on my own... fast forward a few years and I became one of the Tableau super-users on campus, and I realized that I now had all the tools (data scraping) and knowledge of data visualization (Tableau) to do this whole project myself. I went to my boss at the time, a different VP who was more forward thinking (Dr. Robert Mock), and let him know that I had gathered this data, analyzed it, and was going to publish it. I let him know (as with now) that I did it all on my own time with my own resources from publicly available data, and that I wasn't so much asking permission as I was just letting him know I might be kicking a hornet's nest.

He told me to contact the UK Police and let them know what I'd done before I published it... so I did. I wanted to show the police what I'd done and give them a chance to weigh in. I met MULTIPLE times with the then Lt. Barefoot (I believe Captain Barefoot now) and when I showed him the data he wasn't at all surprised. He helped clear up some of my understanding of the data and was SUPER HELPFUL. I can't speak highly enough of the University of Kentucky Police Department. Seriously, they're great and very forward thinking.

One of the first things they asked me was if I could run this against their live database, which I was THRILLED to hear. I worked on a version to be published publicly, running against the database they were using to display the Crime Log page. Unfortunately this project got bogged down in minutiae, primarily regarding updating map coordinates. I heard whispers up until about a year ago that someone was still going to publish/maintain it, but without me driving the issue it ultimately never happened... UNTIL NOW!

This data is hand-entered into a system, so you can imagine that with 16,000+ rows there are a lot of fat-finger errors. Additionally, the text parsing to clean the addresses was a NIGHTMARE. Basically what I did was find every way to say "RD" and every permutation of every street ending you can imagine, and did a huge if/then to combine all those. I took away the punctuation from everything and ended up hand-geocoding most of it with the AH-MAZE-ZING service Geocodio.
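
To give a feel for that kind of address cleanup, here is a minimal sketch. It swaps the huge if/then for a lookup table, the suffix list is a tiny made-up sample rather than the full set of variants, and the function name is hypothetical. It only normalizes the final word of the address.

```python
import re

# Illustrative subset of street-ending variants mapped to one spelling each.
SUFFIX_MAP = {
    "RD": "RD", "ROAD": "RD",
    "ST": "ST", "STR": "ST", "STREET": "ST",
    "AVE": "AVE", "AV": "AVE", "AVENUE": "AVE",
    "DR": "DR", "DRIVE": "DR",
}

def normalize_address(raw: str) -> str:
    """Uppercase, strip punctuation, and standardize the trailing street ending."""
    tokens = re.sub(r"[^\w\s]", "", raw.upper()).split()
    if tokens:
        tokens[-1] = SUFFIX_MAP.get(tokens[-1], tokens[-1])
    return " ".join(tokens)

cleaned = normalize_address("123 Rose Street.")   # hypothetical address
```

Once every variant collapses to one spelling, matching addresses line up and are much easier to geocode in bulk.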

After that I tried to go in and manually include building names where the addresses matched. I considered doing this programmatically, but honestly it would have taken about the same amount of time either way. Whenever I'm faced with a big problem like that I always like to refer to this XKCD comic:


Anyway... those are the gory details of why I did all this and how I did it. Just as before, this data could be presented in this way live, but it has never been made a priority by the University. Given the nature of some of the things that have happened recently, maybe things like Terroristic Threatening should get a little more of a spotlight shone on them.

As always if you have any questions/comments/concerns hit me up on twitter @wjking0

University of Kentucky Crimes Mapped



Sorry it's been a while since I posted! I promise I'm going to get on a better public-release schedule.

That said... this data, while nicely formatted, has a TON of fat-finger errors in it... it's hand-entry of over 16,000 things from a state-wide police system to their front-end web interface. If you'd like to hear about it and all the minutiae that led up to this and why it's important to me, click here.

Me scrubbing data only to find more data that needs scrubbed.
With recent news of the mishandling of a poor young woman's case as detailed by the Kentucky Kernel I decided now was a good time to talk about the public nature of crime data... Let's get into it!



One thing to note about the above viz is that CSA cases do NOT require a police officer to be involved. CSA stands for "Campus Security Authority", so a CSA case can be something like someone spitting on a nurse (a frequent occurrence unfortunately) or drinking in a dorm room etc. To get an idea of the "real" police workload, change the filter for "Case Number CSA" to False. Now you'll be looking at only the crimes where an actual officer was involved.
Aside from Eastern State Hospital (which is largely a mental health facility) what is one of the main drivers of crime on campus? Well... turns out that's UK Football. When you look at crimes by individual dates over the years there are some pretty obvious spikes. When I checked the dates, yep... all home UK football games.



What's that? You want to know the longest list of charges? Well that would belong to case 20143565 which has the following laundry-list of offenses:
"FAILURE TO ILLUMINATE HEAD LAMPS/V, DISREGARDING TRAFFIC CONT DEV-TRAFFIC LIGHT/V, OPER MTR VEHICLE U/INFLU ALC/DRUGS/ETC. .08(AGG CIRCUM) 1ST OFF/M, FLEEING OR EVADING POLICE, 1ST DEGREE (MOTOR VEHICLE)/F, FLEEING OR EVADING POLICE, 1ST DEGREE (ON FOOT)/F, CARRYING A CONCEALED WEAPON/M, WANTON ENDANGERMENT-1ST DEGREE-POLICE OFFICER/F, POSS OF MARIJUANA/M, TRAFFICKING IN SYNTHETIC CANNABINOID AGONISTS OR PIPERAZINES/M, POSS CONT SUB 1ST DEG 1ST OFF (COCAINE)/F, PROMOTING CONTRABAND-1ST DEGREE/F, WANTON ENDANGERMENT-1ST DEGREE/F, POSS OF OPEN ALC BEVERAL CONT IN MOTOR VEH PROHIBITED/V."


Want to dig a little deeper into specific crimes? Check out the viz below!



Realistically though, the University of Kentucky Police are really SUPER AWESOME and nice people whom I've met with personally several times. The next chart highlights close rates of cases, and the ones you'd expect to not get closed (theft, burglary, etc) are the types of things you see most unsolved. Click around and see what you're curious about. When I was going through the data, one of the most concerning things to me was to look at the "Unfounded" category and see that Sex Offenses ranks highest (when including CSA cases). That doesn't seem like a thing people would exaggerate, and I trust the UK Police to have done their due diligence, but I also am concerned about the culture we live in and how that affects things like this in aggregate.


I'd also like to share this cleaned version of data that I am posting out on Google Drive for download as well as one of my new fave repositories at Data.World. If you'd like to know more about what went into cleaning the data again go to the page here where I talk about some of the data cleansing that went on.

As always I hope you all found this informative and if you have questions please post a comment below or hit me up on twitter @wjking0!


Tuesday, May 2, 2017

Things I've Made Up - Pure Dataviz Imagination

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html
It's time to use our...
They're not fabricated datasets... they're "alternative facts"!
For this week's #1YearOfViz I've decided to look at something a little different. Instead of the socio-political or sales data I normally look at, I've decided to show you all some of the things I've done with data I've completely fabricated!

Since being terminated from the University of Kentucky due to a reduction in force I've applied to tons and tons of jobs (viz on that coming at a later date), and with several of those jobs I've found myself not as able to EXPLAIN what I wanted to do with the job as much as I wanted to SHOW the people in charge what I was capable of. I started taking several of the positions I've applied for and worked on designing datasets around those jobs to see what I could make out of their fake data. Sometimes there's an example, sometimes there's just a rough outline with a few numerical values thrown in for good measure...

Let's get started with a real dataset that I scraped from a local recruiter. This one came from TEKSystems which is an international company with offices right here in Lexington, KY. I had a meeting with one of their recruiters and I figured I'd better have something to show off my skills. So in about 24 hours I scraped and vizzed the following out of their job listings.

This first dashboard looked at posting trends of the TEK Systems employees to show them trends in their posting habits:



As you can see in the dash above, most jobs are either filled or withdrawn after 2-3 weeks... I'm guessing that's their posting window and then they re-list the jobs again to keep it fresh (it looks better that way!). This next dashboard was so I could see what the trends were as far as jobs I'm looking for (i.e. Tableau dataviz jobs). Where are they located? What are the titles and frequencies etc?



Fun right? Totally a more functional use of their site and their depth of data in my opinion!
Me with data and crappy site interfaces!
I recently had a great interview with Delta Private Jets where I was shown a spreadsheet and asked to answer a few questions. I fumbled around in Excel for a little bit to answer the bulk of the questions (yuck) but finally visualized the Tableau Desktop interface and walked the interviewer step-by-step through answering the questions in Tableau.

About a week after the interview I got to thinking that I'd prefer to actually show them the answers and drive them into some more questions with the data, so I set about re-creating some of the data I saw as best I could remember. I vaguely remembered some ranges and the number of rows in the dataset... then I found mockaroo.com, which has been pretty awesome to work with. The largest number of rows you can generate for free is 1,000, but you can do that as many times as you wish. I downloaded several sets and combined them manually.
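
Combining several 1,000-row downloads doesn't have to be manual. Here is a minimal sketch using only Python's standard csv module; it assumes every file shares the same header row, and the function name and file names are hypothetical.

```python
import csv

def combine_csv(paths, out_path):
    """Concatenate CSV files that share a header; returns the number of data rows written."""
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Something like combine_csv(["set1.csv", "set2.csv"], "combined.csv") would then produce one file ready to load into Tableau.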
Yes Dean, totally made up!
I designed the following Story Dashboard to answer those questions and pose some new ones. Check it out and let me know what you think!




Finally I bring you to the last dataset I've created recently. I read a post from the CEO of Import.io about their sales information and how they'd like to hire a data-wrangler... so I figured I would fake-the-funk with a dataset to show him exactly what I was capable of!

I may fake a dataset... I'll never fake the funk.

Here's the fake sales data from Import.io ... what's wild to think is that I used fake sales numbers, but even on a fairly conservative end over the last couple of years the company should be worth several million dollars (gross profit) based on its current pricing structure at a less than 50% adoption rate. What other nuggets of data could you get from all this fancy viz? I completed the data generation, data prep, and viz for this all in under 24 hours!


The dash above is meant for an initial overview of a lot of things where you can flip the parameters/colors around... the dash below allows for deep dives in with multiple simultaneous filtering options overlaying the data.




Have any of you readers ever gone out of your way to make up something to show off your talents rather than talking about them? Hit me up on Twitter or in the comments below to let me know! As always if you have any questions/comments/concerns hit me up on Twitter @wjking0 and we'll tweet it up together!

How I feel all these companies look at my resume after seeing my fake data post. =D