Thursday, February 22, 2018

UK Crimes Minutia

Hi new readers! For those of you returning, I'm sorry about my absence from blogging for a while! It's been a busy year or two! I lost my job at the University of Kentucky due to a departmental merger, floated around on unemployment for a while applying to jobs, and finally ended up with a job at a startup in Southern California where I worked tons and sharpened my Tableau skills even more than before.

This is the longer/more-boring version of the shortened article here:

One thing kept nagging at me though and that was a project I started years ago while I was still working at UK. Here's the quick story... I worked in the Division of Student Affairs for the last 15 or so years of my employment with the University. Years and YEARS ago I thought that the way we reported crimes on campus was ridiculous. Due to both federal and state laws (known as Clery/Minger laws) the University has to report all crimes and acts of arson on campus. That is currently done in the following format:

Now, I don't know about you, but while that fulfills the "letter of the law" I don't feel it fulfills the "spirit" of the law. Those laws were passed so people could educate themselves on crime and trends and judge their (or their child's) safety on a college campus. I went to my boss at the time telling her I'd like to get a group together to map out crime on campus to help places like our Violence Intervention Center focus their efforts and, overall, to make campus a safer place. I was told, in no uncertain terms, that I was not to do that. The fear was that it would "reflect poorly on our Greek community." (Which, once I looked at the data, it doesn't actually!) This, of course, pissed me off. I didn't have the tools to do it on my own at the time.

Fast forward a few years: I had become one of the Tableau super-users on campus and realized that I now had the tools (data scraping) and the data visualization knowledge (Tableau) to do this whole project myself. I went to my boss at the time, a different VP who was more forward-thinking (Dr. Robert Mock), and let him know that I had gathered this data, analyzed it, and was going to publish it. I let him know (then as now) that I did it all on my own time with my own resources from publicly available data, and that I wasn't so much asking permission as just letting him know I might be kicking a hornet's nest. He told me to contact the UK Police and let them know what I'd done before I published it... so I did. I wanted to show the police what I'd done and give them a chance to weigh in. I met MULTIPLE times with then-Lt. Barefoot (I believe Captain Barefoot now), and when I showed him the data he wasn't at all surprised. He helped clear up some of my understanding of the data and was SUPER HELPFUL. I can't speak highly enough of the University of Kentucky Police Department. Seriously, they're great and very forward-thinking.

One of the first things they asked me was if I could run this live against their database, which I was THRILLED to hear. I worked on a version to be published publicly, running against the same database they use to display the Crime Log page. Unfortunately this project got bogged down in minutiae, primarily around updating map coordinates. I heard whispers up until about a year ago that someone was still going to publish/maintain it, but without me driving the issue it ultimately never happened... UNTIL NOW!

This data is hand-entered into a system, so you can imagine that with 16,000+ rows there are a lot of fat-finger errors. Additionally, the text parsing to clean the addresses was a NIGHTMARE. Basically what I did was find every way to say "RD" and every permutation of every street ending you can imagine and wrote a huge if/then to combine all those. Then I stripped the punctuation from everything and ended up hand-geocoding most of it with the AH-MAZE-ING service Geocodio.
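For the curious, here's a rough Python sketch of the kind of street-suffix normalization I'm describing. The actual cleanup was a giant if/then calculation, and the file names, column names, and suffix list below are made up for illustration:

```python
import re
import pandas as pd

# A few of the many suffix variants I ran into; the real list was much, much longer.
SUFFIXES = {
    "RD": ["RD", "ROAD"],
    "ST": ["ST", "STR", "STREET"],
    "AVE": ["AVE", "AV", "AVENUE"],
    "DR": ["DR", "DRIVE"],
}

def clean_address(raw: str) -> str:
    """Upper-case, strip punctuation, and normalize the trailing street suffix."""
    addr = re.sub(r"[^\w\s]", " ", str(raw)).upper()   # drop punctuation
    addr = re.sub(r"\s+", " ", addr).strip()           # collapse whitespace
    tokens = addr.split(" ")
    for canon, variants in SUFFIXES.items():
        if tokens and tokens[-1] in variants:
            tokens[-1] = canon
            break
    return " ".join(tokens)

crimes = pd.read_csv("uk_crime_log.csv")                # hypothetical export
crimes["clean_address"] = crimes["ADDRESS"].apply(clean_address)
crimes.to_csv("crimes_for_geocodio.csv", index=False)   # then batch-geocode with Geocodio
```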

After that I tried to go in and manually include building names where the addresses matched. I considered doing this programmatically, but honestly it would have taken about the same amount of time either way. Whenever faced with a big problem like that I always like to refer to this XKCD comic:

Anyway... those are the gory details of why I did all this and how I did it. Just as before, this data could be presented this way live, but it has never been made a priority by the University. Given the nature of some of the things that have happened recently, maybe things like Terroristic Threatening should have a little more of a spotlight shone on them.

As always if you have any questions/comments/concerns hit me up on twitter @wjking0

University of Kentucky Crimes Mapped

Sorry it's been a while since I posted! I promise I'm going to get on a better public-release schedule.

That said... this data, while nicely formatted, has a TON of fat-finger errors in it... it's the hand-entry of over 16,000 records from a statewide police system into their front-end web interface. If you'd like to hear about all the minutiae that led up to this and why it's important to me, click here.

Me scrubbing data only to find more data that needs scrubbed.
With recent news of the mishandling of a poor young woman's case as detailed by the Kentucky Kernel I decided now was a good time to talk about the public nature of crime data... Let's get into it!

A couple of things to note about the above viz: CSA stands for "Campus Security Authority," and CSA cases do NOT require a police officer to be involved. A CSA case can be something like someone spitting on a nurse (a frequent occurrence, unfortunately) or drinking in a dorm room, etc. To get an idea of the "real" police workload, change the filter for "Case Number CSA" to False. Now you'll be looking at only the crimes where an actual officer was involved.
Aside from Eastern State Hospital (which is largely a mental health facility), what is one of the main drivers of crime on campus? Well... it turns out it's UK Football. When you look at crimes by individual dates over the years there are some pretty obvious spikes. When I checked the dates, yep... all home UK football games.
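If you want to sanity-check that kind of thing yourself, here's a rough sketch of the date cross-check, assuming you've hand-built a CSV of home game dates from the published football schedule (all the file and column names here are hypothetical):

```python
import pandas as pd

# Both files are hypothetical stand-ins for the cleaned crime log and a hand-built schedule.
crimes = pd.read_csv("uk_crime_log.csv", parse_dates=["REPORT_DATE"])
home_games = pd.read_csv("uk_home_football_games.csv", parse_dates=["game_date"])

# Count incidents per calendar day, then flag the days that were home games.
daily = crimes.groupby(crimes["REPORT_DATE"].dt.date).size().rename("incidents").reset_index()
daily = daily.rename(columns={"REPORT_DATE": "date"})
daily["home_game"] = daily["date"].isin(home_games["game_date"].dt.date)

# The biggest spike days should light up as home_game == True.
print(daily.sort_values("incidents", ascending=False).head(15))
```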

What's that? You want to know the longest list of charges? Well that would belong to case 20143565 which has the following laundry-list of offenses:

Want to dig a little deeper into specific crimes? Check out the viz below!

Realistically though, the University of Kentucky Police are really SUPER AWESOME and nice people whom I've met with personally several times. The next chart highlights close rates of cases, and the ones you'd expect not to get closed (theft, burglary, etc.) are indeed the types you see most often unsolved. Click around and see what you're curious about. When I was going through the data, one of the most concerning things was to look at the "Unfounded" category and see that Sex Offenses ranks highest (when including CSA cases). That doesn't seem like a thing people would exaggerate, and I trust the UK Police to have done their due diligence, but I'm also concerned about the culture we live in and how that affects things like this in aggregate.

I'd also like to share this cleaned version of the data, which I am posting on Google Drive for download as well as on one of my new fave repositories, Data.World. If you'd like to know more about what went into cleaning the data, go to the page here where I talk about some of the data cleansing that went on.

As always I hope you all found this informative and if you have questions please post a comment below or hit me up on twitter @wjking0!

Tuesday, May 2, 2017

Things I've Made Up - Pure Dataviz Imagination

This is part of my #1YearOfViz series! Check out the archive here:
It's time to use our...
They're not fabricated datasets... they're "alternative facts"!
This week's #1YearOfViz I've decided to look at something a little different. Instead of the normal socio-political or sales data I normally look at I've decided to show you all some of the things I've done with data I've completely fabricated!

Since being terminated from the University of Kentucky due to a reduction in force I've applied to tons and tons of jobs (viz on that coming at a later date), and with several of those jobs I've found myself less able to EXPLAIN what I wanted to do with the job than to SHOW the people in charge what I was capable of. I started taking several of the positions I've applied for and designing datasets around those jobs to see what I could make out of their fake data. Sometimes there's an example, sometimes there's just a rough outline with a few numerical values thrown in for good measure...

Let's get started with a real dataset that I scraped from a local recruiter. This one came from TEKSystems which is an international company with offices right here in Lexington, KY. I had a meeting with one of their recruiters and I figured I'd better have something to show off my skills. So in about 24 hours I scraped and vizzed the following out of their job listings.

This first dashboard looked at the posting habits of TEKSystems recruiters to show them the trends in their postings:

As you can see in the dash above, most jobs are either filled or withdrawn after 2-3 weeks... I'm guessing that's their posting window and then they re-list the jobs to keep them fresh (it looks better that way!). This next dashboard was so I could see the trends for the kinds of jobs I'm looking for (i.e., Tableau dataviz jobs). Where are they located? What are the titles and frequencies, etc.?

Fun right? Totally a more functional use of their site and their depth of data in my opinion!
Me with data and crappy site interfaces!
I recently had a great interview with Delta Private Jets where I was shown a spreadsheet and asked to answer a few questions about it. I fumbled around in Excel for a little bit to answer the bulk of the questions (yuck), but finally I just pictured the Tableau Desktop interface and walked the interviewer through answering the questions in Tableau step-by-step.

About a week after the interview I got to thinking that I'd prefer to actually show them the answers and drive them into some more questions with the data, so I set about re-creating some of the data I saw as best I could remember. I vaguely remembered some ranges and the number of rows in the dataset... then I found which has been pretty awesome to work with. Given that the largest number of rows you can generate for free is 1,000, you can do that as many times as you wish. I downloaded several sets and combined them manually.
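If you'd rather not stitch the 1,000-row downloads together by hand like I did, a little script like this does it for you (the file names are placeholders for however you saved the batches):

```python
import glob
import pandas as pd

# Each free export from the generator caps out at 1,000 rows, so stitch the batches together.
parts = [pd.read_csv(path) for path in sorted(glob.glob("generated_batch_*.csv"))]
flights = pd.concat(parts, ignore_index=True)

# Re-key the rows so IDs stay unique even if the generator repeated them across batches.
flights["row_id"] = range(1, len(flights) + 1)
flights.to_csv("fake_flight_data.csv", index=False)
```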
Yes Dean, totally made up!
I designed the following Story Dashboard to answer those questions and pose some new ones. Check it out and let me know what you think!

Finally I bring you to the last dataset I've created recently. I read a post from the CEO of about their sales information and how they'd like to hire a data-wrangler... so I figured I would fake-the-funk with a dataset to show him exactly what I was capable of!

I may fake a dataset... I'll never fake the funk.

Here's the fake sales data from ... what's wild is that even though I used fake sales numbers, even on the fairly conservative end, over the last couple of years the company should be worth several million dollars (gross profit) based on its current pricing structure at a less than 50% adoption rate. What other nuggets of data could you get from all this fancy viz? I completed the data generation, data prep, and viz for this all in under 24 hours!

The dash above is meant as an initial overview of a lot of things where you can flip the parameters/colors around... the dash below allows for deep dives with multiple simultaneous filtering options overlaying the data.

Have any of you readers ever gone out of your way to make something up to show off your talents rather than talking about them? Hit me up on Twitter or in the comments below to let me know! As always if you have any questions/comments/concerns hit me up on Twitter @wjking0 and we'll tweet it up together!

How I feel all these companies look at my resume after seeing my fake data post. =D

Monday, April 24, 2017

Lexington Tableau User Group Presentation #2 and Vizzes!

This is part of my #1YearOfViz series! Check out the archive here:

I'm not going to spend a whole lot of time writing up the viz today when I've got the presentation below and the slideshow with all the relevant links in it that you can click through yourselves!

We had a GREAT turnout for the Lexington TUG!

Here is the accompanying super-awesome slideshow!

Or click this link to open it in a new window:

Please watch the video for an explanation about these vizzes! Here is the Bluegrass Trust Plaques viz:

Here is the fun one. I've since taken the data and saved it up here to Data.World... which, if you haven't checked it out, is pretty amazing! I'm just now scratching the surface of all the options they have for datasets! Here is the National Parks Visitation Viz:

As usual if you have any questions feel free to hit me up on Twitter @wjking0!
Me by the end of the day!

Wednesday, April 12, 2017

30+ Years of Video Game Music

This is part of my #1YearOfViz series! Check out the archive here:
One of the first things I wanted to mention was that I had what I felt was a really excellent interview the other day with Delta Private Jets. It felt much more in my wheelhouse than the interview a couple of weeks ago with VetData. In all honesty, I think they were looking for more of a programmer than an analyst (and I heard they actually hired an ex-programmer for the position). Anyway... I kinda felt like I might not fit in with the Delta 'corporate people,' having worked in academia my entire career, but the people were pretty rad and the job sounds exactly like the type of cross-departmental data exploration that I love diving head-first into! Plus... I mean... things like flight benefits would be pretty rad...
How I felt going to interview with Delta Private Jets
Due to that (interview business) and my girlfriend's kids being on Spring Break I'm publishing last week's viz today and I'll publish another one later this week also!

It's time to get down!
Now into the real subject of today's post! Video Game Music! I started this little endeavor after listening for a couple of years now to the Legacy Music Hour Podcast. I can't recommend it highly enough if you're into that sort of thing... and if you're not, what are you reading this post for!?

I poked around at several music databases to find the site with some of the most comprehensive datasets. I ended up landing on, their site format and abundance of already available stats allowed me to really cross-check what I was getting from them!

The dataset seemed easy enough to scrape. I'd have liked to get the track data and lengths as well, but the formatting below the initial listing got a little too funky to reliably pull with Octoparse. Still, initial costs and years of release are pretty awesome, so let's work with those... except there was a minor (read: HUGE) problem with the cost data... it was in about two dozen different currencies (some of which no longer exist)!

I ended up parsing the currency type away from the number and doing the conversion manually to today's dollar values based on this site's conversion. I thought about it after I'd already written the following formula and realized that I could have done a quick scrape and join instead for the conversion values. Due to the complexity of the whole process I didn't convert past values to current values with adjustments for inflation, etc... I just felt it was a bit too much hacking for a few cents' to a dollar's difference on some things.
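For anyone curious, here's roughly what that parse-and-join approach could look like in Python rather than a calculated field. The column names and conversion rates below are placeholders for illustration, not the values I actually used:

```python
import re
import pandas as pd

albums = pd.read_csv("vgm_albums.csv")  # hypothetical scrape output with a raw "price" column

# Placeholder rates; the real values would come from a currency-conversion site or a scraped table.
usd_rates = {"USD": 1.00, "JPY": 0.0094, "EUR": 1.23, "GBP": 1.40}

def split_price(raw):
    """Split a price string like 'JPY 2800' into (currency code, numeric amount)."""
    match = re.match(r"([A-Z]{3})\s*([\d.,]+)", str(raw))
    if not match:
        return None, float("nan")
    code, amount = match.groups()
    return code, float(amount.replace(",", ""))

albums[["currency", "amount"]] = albums["price"].apply(lambda raw: pd.Series(split_price(raw)))
albums["price_usd"] = albums["amount"] * albums["currency"].map(usd_rates)
```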

Let's get into the data! This first dash is a generic look at the full release history of game soundtracks (on the top), followed by a view at the bottom, customizable by Console Type, showing how ratings for that system's game soundtracks changed over time.

This last one lets you explore some of the extremes of the data. The bottom half of this dash reshapes the top ("dots") half of the viz. You can choose which measure you'd like to use by year and click on a specific year-point on the line to reshape the "dots" at the top. You can then click on the dots to be taken to the specific page about that album! I think it's a fun, kind-of tiered way to get at both analysis and deep dives into the data through some segmentation!

As always if you have any questions hit me up on Twitter @wjking0! This whole viz has made me feel SUPER nostalgic... time to go play some old games!

Wednesday, March 29, 2017

Lexington KY Traffic Cameras (Live)

This is part of my #1YearOfViz series! Check out the archive here:

This is going to be a SUPER quick post as this is a SUPER small (but useful!) viz that I whipped up. I literally have about 6-7 vizzes that I could put out, but I'm working on an actual big story at the moment that I'm hoping to get picked up by some news organizations, so I want to really give it the TLC it deserves. So this week, instead of a deep dive, you get a shallow wade into a viz that's more useful than data-filled!

I was poking around looking for some things to work with as far as GeoJSON data goes (which the new Tableau 10.2 supports!) and I came across the "New Mapping" group out of UK. I poked around a little on their GitHub page and found the GeoJSON for all the Lexington traffic cameras! I thought, "Wow, this is neat!" and started building my viz around it... then I thought, "STOP!"...

Where did this data come from? Was it being used anywhere else!? Then I found it... Lexington Fayette Urban County Government had already built a site for this!

Then I realized their site doesn't reformat for mobile, and while it does provide live video streams (for about 5 seconds before auto-closing), it requires a click on each camera to show the data. That seemed like an unnecessary step, so I made the dash below where, if you hover over a point, the still camera image shows immediately (and refreshes when you scroll over another point and back again). Additionally I created a mobile-specific version formatted for phones! It isn't much, but sometimes just improving a UI can make a huge difference in the utilization of a tool!
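Tableau 10.2 can read the GeoJSON directly, but if you also want the camera coordinates and image links as a flat file, a quick conversion like this is one way to do it (the property names here are guesses, so check the actual GeoJSON attributes before relying on them):

```python
import csv
import json

# Property names below ("name", "image_url") are assumptions about the camera GeoJSON.
with open("lexington_traffic_cameras.geojson") as f:
    cameras = json.load(f)

with open("traffic_cameras.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["name", "longitude", "latitude", "image_url"])
    for feature in cameras["features"]:
        lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON points are [lon, lat]
        props = feature["properties"]
        writer.writerow([props.get("name"), lon, lat, props.get("image_url")])
```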
The question I always ask myself when re-doing someone else's work
Click the image below for the mobile version or continue to scroll down for the desktop interface:

As always if you have any questions hit me up in the comments below or on Twitter @wjking0.

Thursday, March 23, 2017

Reddit /r/Datasets Analysis

This is part of my #1YearOfViz series! Check out the archive here:
Let me start off by apologizing. I've been trying to work out some issues with Tableau Software getting Tableau Public working with the last several web pages I've tried to do a web-part embed with... and I've tried all the suggestions on the support forum. It works fine in the Tableau Desktop and Tableau Public apps, but once I upload it to Tableau Public it just doesn't show up. I originally thought it was the mix of http (this blog) and https (Tableau Public), but even when viewing it just on the main Tableau Public page it still shows up as a blank page. =/

Me shaking the 'Do Better' stick at Tableau Public

What this means for you is today's viz (in part) features new window pop-ups because the integration isn't working right with the Tableau web part.

Today's dataset is an analysis of all the links I could mine back through the history of a subreddit I am one of the admins of. If you're reading this blog and you're into dataviz and you haven't been to /r/Datasets yet then you really need to!

How being a subreddit mod really feels.

Below is the viz... you can change which dimensions you'd like to measure votes/comments by and if it has an associated link (such as profiles, domains, etc) you can click on the bar and a pop-up will come up with that data loaded in it.

The second part of this viz is just a little more in-depth breakdown for users from the subreddit who are checking it out and want to see how the different categories break down.

Ultimately here are some of the base numbers:

52.55% are "Requests"
26.3% are "datasets"
7.31% are "resources"
5.71% are "questions"

Keep in mind this data only represents the past 1,000 or so posts in /r/Datasets, really spanning only about 3 months' worth of time back from 3/21/2017 (the original scrape date). In the future I'll likely work on a more "live" version of this, probably utilizing some IFTTT recipes, but until then I hope you enjoyed this little glimpse into the weird world of datasets and the people who love them! <3
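I did the original pull with a scraper, but if you want to reproduce something similar, here's a minimal sketch using PRAW (the Python Reddit API wrapper). The credentials are placeholders, and the ~1,000-post cap is simply Reddit's listing limit:

```python
import praw
import pandas as pd

# Credentials are placeholders; register a "script" app at reddit.com/prefs/apps to get real ones.
reddit = praw.Reddit(
    client_id="YOUR_ID",
    client_secret="YOUR_SECRET",
    user_agent="r-datasets-viz by /u/wjking0",
)

rows = []
for post in reddit.subreddit("datasets").new(limit=1000):  # Reddit listings cap out near 1,000
    rows.append({
        "title": post.title,
        "flair": post.link_flair_text,
        "score": post.score,
        "comments": post.num_comments,
        "url": post.url,
        "created_utc": post.created_utc,
    })

pd.DataFrame(rows).to_csv("r_datasets_posts.csv", index=False)
```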

As always if you have any questions/comments/concerns hit me up on Twitter @wjking0 or in the comments below!

A meta image about a meta reddit dataset from where... ? You guessed it... reddit.