Monday, December 15, 2014

ThinkGeek Catalog Details Data Analysis (What's on SALE!?)

It's time once again for some holiday shopping! I'm assuming if you're reading this blog you're probably of the nerdy variety, so you'll probably appreciate the same kinds of things that I do in a consumer sense. So I got thinking about where the fun places to buy things are. I've recently fallen desperately in LOOOOOVE with a data-scraping tool. I'd been playing with a few little data scrapes, but I finally decided to put it through some real paces now that I understood better how to train it and analyze the data after-the-fact.

Ah-Ha ThinkGeek! I'll scrape their entire catalog to see what to get for people!


After a quick scrape (well, 4 hours later) we have a total of 3,930 products in our dataset, in the following categories:

  • Computer Stuff
  • Electronics
  • Electronics & Gadgets
  • Gadgets
  • Geek Kids
  • Geek Toys
  • Home & Office
  • Interests
  • T-Shirts & Apparel
  • Tools Outdoor & Survival

I decided to look at sales data two different ways. The first was sales percentage difference, that is to say the percentage off of the full price in a category (color depth denotes number of products in a category):
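For anyone who wants to try the percentage-off idea on their own scrape, here's a minimal sketch of the math. The row layout (category, full price, sale price), the function names, and the sample prices are all made up for illustration and are not the actual ThinkGeek data or the way the viz was built.

```python
from collections import defaultdict


def percent_off(full_price, sale_price):
    """Return the discount as a percentage of the full price."""
    if full_price <= 0:
        raise ValueError("full_price must be positive")
    return (full_price - sale_price) / full_price * 100.0


def mean_percent_off_by_category(rows):
    """Average the per-product discount within each category.

    rows is an iterable of (category, full_price, sale_price) tuples,
    which is an assumed layout, not the real scrape format.
    """
    discounts = defaultdict(list)
    for category, full_price, sale_price in rows:
        discounts[category].append(percent_off(full_price, sale_price))
    return {cat: sum(vals) / len(vals) for cat, vals in discounts.items()}


# Hypothetical sample rows, NOT real catalog prices.
sample_rows = [
    ("Gadgets", 20.00, 15.00),
    ("Gadgets", 40.00, 20.00),
    ("Geek Toys", 10.00, 10.00),
]

if __name__ == "__main__":
    for category, pct in mean_percent_off_by_category(sample_rows).items():
        print(f"{category}: {pct:.1f}% off on average")
```

Averaging per-product percentages treats a cheap item and an expensive item equally; summing prices first would weight the big-ticket items more, so pick whichever matches the question you're asking.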

Finally the big payoff! Scroll over a product to load the product image and click on a product to load the product webpage (be careful not to scroll across another product after selecting one or another product image will load on top and thwart your shopping efforts!):

Let me know what other retail sites you'd like to see analyzed and I'll see what I can do!

Lastly I figured I'd show the amount of stuff that was out of stock at ThinkGeek which I found to be exceedingly high as well (a full 46% of their total inventory!).

Thursday, December 4, 2014

Crosswalk Data - A Lesson in Finding Exactly What You're Looking For

I enjoy a nice Fall (or even early Winter) walk. I live in the beautiful and vibrant city of Lexington KY. It's a largely urban city so I spend a lot of time crossing crosswalks. Crosswalks have a numerical countdown... I love number problems!

Times obviously weren't a set number, so either they're set randomly by whoever is setting them up or there is a pattern. I figured it had something to do with the amount of traffic in an intersection (cars/hr or something equivalent). I started looking in the usual places for the stoplight/crosswalk light information I was looking for: /r/datasets, etc. It wasn't until I started glancing into city municipal data for various larger cities that I discovered that the math was already done for me!

T = d/1.065
T = Crosswalk time in seconds
d = Distance in meters

The 1.065 m/s (3.5 ft/s) comes from a study done in 1982 regarding mobility of pedestrians. Generally speaking the speed of the average pedestrian is around 1.22 m/s, but a slower walking speed is used to give more time to elderly walkers and pedestrians with mobility disabilities (who account for about 15% of the population). So now every time you cross a street you can think to yourself how long the crosswalk (and thusly stoplight) SHOULD be and be able to roughly calculate if that's accurate!
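If you'd rather let a computer do the sidewalk math, the formula above is a one-liner. This is just a sketch: the function names are mine, and the 15 m street width in the example is a made-up number, not a measured Lexington crosswalk.

```python
DESIGN_WALKING_SPEED_M_PER_S = 1.065  # the design speed quoted above
METERS_PER_FOOT = 0.3048


def feet_to_meters(feet):
    """Convert a distance in feet to meters."""
    return feet * METERS_PER_FOOT


def crosswalk_time_seconds(distance_m, speed_m_per_s=DESIGN_WALKING_SPEED_M_PER_S):
    """T = d / speed: seconds needed to cross distance_m at the given speed."""
    if distance_m < 0:
        raise ValueError("distance_m must not be negative")
    return distance_m / speed_m_per_s


if __name__ == "__main__":
    width_m = 15.0  # hypothetical street width
    design = crosswalk_time_seconds(width_m)
    average = crosswalk_time_seconds(width_m, speed_m_per_s=1.22)
    print(f"Design countdown: {design:.1f} s, average walker: {average:.1f} s")
```

Passing 1.22 as the speed shows how much extra time the slower design speed buys on the same crossing.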

Now given the dataset that I just got access to the other day (upcoming viz VERY soon, I promise) I'm now wondering if I could time it based on light changes to walk to work hitting every single crosswalk at the correct time based on the distance between lights, crosswalk distance, and light timing. It's moments like this that I think I'm steadily becoming this guy:

Like I said, new viz regarding stop light data is coming very soon... 

* Most of the municipal data for this post is pulled from this site:

Friday, November 21, 2014

Gallery Hop Locations - Lexington, KY

I'll fully admit that this Viz is a rush-job. I wanted to go ahead and throw it together quickly and I'll update it later as I crunch more data into the map. The last Gallery Hop of the season is tonight in Lex so here's a little map to see what's going on (that I may even get a chance to update once or twice before tonight). I'd also like to include non-official locations so if you know of any please hit me up and let me know about them!

Thursday, October 30, 2014

Roller Derby Flat Track Stats Data


As promised here's the data for ALL of Flat Track Stats info I've been working on. I've broken this up into a couple of Dashboards you can visit or you can go here to view the whole book as a collection.

Dashboard 1: The Whip It Effect

The first Dashboard shows League Type by Start Date (not filtering for leagues that no longer exist). Note the LARGE increase from 2009-2010. Whip It premiered September 13, 2009 and was obviously a large boon for the roller derby community, as numbers shot up FAR faster than in previous years or since.

The second part of the dash shows Bouts by Type over time. It shows just how many bouts there are and how exponentially the number grew in the first few years of the sport, to now having over 4500 bouts on average for the last 3 years! This is an area you can filter to see how many bouts there are comparatively between juniors, women, coed, and men's leagues by checking the boxes to the right.

Lastly you'll find the number of tournaments by type (most being labelled obviously as "Invitational") but it's interesting to see how many more Seeded Bracket tournaments have been held in the last few years.

Dashboard 2: Home Team Winners

For this dashboard I've filtered by default to the teams that will be competing this weekend in the WFTDA Championships. You can click "All" on the Filter at the right, click it again to basically clear "All" and then type in your team name if you're interested in how your team (or your league) is doing at home.

The left half of this dashboard shows score differentials over time for all bouts (above the median line and in green means they won, below the median and going towards red means a loss). The trend lines show how the selected teams have done over time at home and if they're doing better or worse (hint: most teams are doing better and better with home field advantage even when normalized for total score, which this is not).

The right half of this dashboard is all the games that the selected teams have had at home. Again red is a loss, green is a win with the point difference at the end of the bar. This is just to kind of see how a team has performed over time and gives you a quick glance at who they've played and how they've done against them.
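For the curious, here's a rough sketch of what "score differential" and "normalized for total score" could mean in code. The normalization shown (differential divided by total points in the bout) is my assumption for illustration; the dashboard itself uses the raw differential, and the function names and scores below are made up.

```python
def score_differential(home_score, away_score):
    """Raw point difference from the home team's side: positive is a home win."""
    return home_score - away_score


def normalized_differential(home_score, away_score):
    """Differential as a share of all points scored in the bout.

    This is one assumed way to normalize, so that a 50-point win in a
    low-scoring bout counts for more than in a high-scoring one.
    """
    total = home_score + away_score
    if total == 0:
        return 0.0
    return (home_score - away_score) / total


if __name__ == "__main__":
    # Hypothetical home bouts as (home score, away score), not real FTS results.
    for home, away in [(200, 150), (120, 180)]:
        diff = score_differential(home, away)
        outcome = "win" if diff > 0 else "loss" if diff < 0 else "tie"
        print(f"{home}-{away}: {diff:+d} ({outcome}), "
              f"normalized {normalized_differential(home, away):+.3f}")
```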

Dashboard 3: Where did all these derby people COME FROM!?

If you haven't watched the fancy video I made I'll go ahead and embed that again here:

The data itself looks like Dashboard 3, where you'll see the totals for different types and can click around to sort by type of league (again: coed, junior, men's, women's and all).

Bonus Viz: How Popular is Roller Derby Where You Live?

Finally I wanted to see, based on state population, where derby was "most popular" per capita. I took 2013 State Population Estimates and ran them against the states teams were located in. This data is based on all ACTIVE teams (teams with a "Disbanded Date != Null" were removed). Since I don't have the number of players per team I decided to multiply the numbers up a bit so we could see how many roller derby teams per 1 million people there were in each particular state. What I found was really interesting and will require more research, but it seems to point to the fact that larger states (regardless of population) tend to have more teams. The ultimate conclusion, though, if you want to know which state has the most derby per capita: Wyoming. No kidding gang!

The top number on each state is the number of teams (again only Active teams) that are in that state. The bottom number is the number of teams per 1 million people in that state's population. Play around, sort out just junior leagues or just look at your state in particular. All maps in these visualizations are zoomable so go nuts!
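The teams-per-million number described above boils down to a filter and a division. Here's a minimal sketch, assuming each team is a dict with "state" and "disbanded_date" keys; those field names, the state labels, and the populations are placeholders I made up, not the Flat Track Stats schema or real estimates.

```python
def active_teams_by_state(teams):
    """Count teams per state, keeping only teams with no disbanded date."""
    counts = {}
    for team in teams:
        if team["disbanded_date"] is None:  # still active
            counts[team["state"]] = counts.get(team["state"], 0) + 1
    return counts


def teams_per_million(team_count, population):
    """Scale a team count to teams per 1 million residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return team_count * 1_000_000 / population


# Hypothetical teams and populations, NOT real data.
sample_teams = [
    {"state": "State A", "disbanded_date": None},
    {"state": "State A", "disbanded_date": "2012-05-01"},
    {"state": "State B", "disbanded_date": None},
    {"state": "State B", "disbanded_date": None},
]
sample_population = {"State A": 500_000, "State B": 4_000_000}

if __name__ == "__main__":
    for state, count in active_teams_by_state(sample_teams).items():
        rate = teams_per_million(count, sample_population[state])
        print(f"{state}: {count} active teams, {rate:.1f} per million")
```

In the made-up sample the smaller state wins per capita with fewer teams, which is the same kind of effect that puts a low-population state at the top of the map.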

I hope you enjoyed these graphs and if you want to talk more about it hit me up via email here or at twitter @wjking0 to talk to me about more data nerd stuff I've been working on. Thanks again to Flat Track Stats for having an AWESOME data set to work with and I look forward to doing more in the future (penalty info anyone!?)! Much derby love! <3 -Jack Flash

Wednesday, October 29, 2014

Roller Derby Flat Track Stats (FTS) Data Preview Video!

Just to reiterate what I have down on my YouTube page, this video was created using access to the Flat Track Stats data. It's a little sample of something I'll release probably tomorrow afternoon that will be FULL of all kinds of data for you roller derby kids to play around with! Because Tableau doesn't do animation well, though, I figured I would go ahead and export this to throw a little sample video of the data on YouTube for you to enjoy! The leagues that "pop in" during this video are the ones with listed start dates on them. Some leagues (for instance Derby City Roller Girls, who are dear friends of mine) have no "start date" data, but they (and everyone else with location data) are represented in the last still image you'll see in the video. Post it out and let me know what you all think and watch tomorrow for the BIG data drop! Much Derby Love! <3 -Jack Flash

Tuesday, September 23, 2014

WRFL 88.1 FM Lexington, KY MP3 File Breakdown

WARNING! Ultra-WIDE #DataViz Ahead!!!

When I'm not making #DataViz out of #WeirdData I work as a Sys Admin for the University of Kentucky. One of the neatest and most data-rich departments I get to support is our student-run radio station WRFL 88.1 FM. There are over 300,000 MP3 files stored on their server from their over 30,000 CDs and well over 30,000 vinyl records. At this point a good chunk of the CDs have been ripped, which provides us with a rich dataset. It allows us to ask all sorts of neat questions of a unique set of data:

  • Want to know the average length of a Rock song? 3 minutes, 23 seconds
  • How many songs on the server have the word "Kentucky" in the title? 57
  • How many songs does Johnny Cash have on the server? 916
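Questions like the ones above are simple filters and counts once the MP3 tag data is in a table. Here's a small sketch of how they could be asked in code, assuming each track is a dict with "title", "artist", "genre" and "seconds" keys. That layout, the function names, and the sample tracks are invented for illustration and are not the station's actual library.

```python
def average_length_seconds(tracks, genre):
    """Mean track length in seconds for a genre, or None if no tracks match."""
    lengths = [t["seconds"] for t in tracks if t["genre"].lower() == genre.lower()]
    if not lengths:
        return None
    return sum(lengths) / len(lengths)


def count_titles_containing(tracks, word):
    """Number of tracks whose title contains word (case-insensitive)."""
    return sum(1 for t in tracks if word.lower() in t["title"].lower())


def count_by_artist(tracks, artist):
    """Number of tracks credited to artist (case-insensitive exact match)."""
    return sum(1 for t in tracks if t["artist"].lower() == artist.lower())


def format_length(seconds):
    """Render a length in seconds as 'M minutes, S seconds'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} minutes, {secs} seconds"


# Hypothetical tracks, NOT real entries from the server.
sample_tracks = [
    {"title": "Example Song One", "artist": "Artist A", "genre": "Rock", "seconds": 200},
    {"title": "Kentucky Example", "artist": "Artist A", "genre": "Rock", "seconds": 206},
    {"title": "Another Tune", "artist": "Artist B", "genre": "Jazz", "seconds": 340},
]

if __name__ == "__main__":
    print(format_length(average_length_seconds(sample_tracks, "Rock")))
    print(count_titles_containing(sample_tracks, "Kentucky"))
    print(count_by_artist(sample_tracks, "Artist A"))
```

Exact-match artist counting will miss spelling variants in user-uploaded tags, which is one more reason the "not the most clean" caveat matters for numbers like these.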

All that being said, I know that this data (because radio station employees can upload their own data to it) is not the most "clean" set of data. Do I believe there was a song from 1675 on the server? Probably not. Eventually also I'll clean up the "Genre" category a lot more over time. I expect I'll be updating this particular dataset somewhere in the 4-6 month range.

ALSO! I was going to throw in a logo and decided instead to build a live-stream player into the DataViz, which refreshes when the track playing changes, allowing people to search out that particular artist/album to see what else is there.

Let me know what you think and suggestions for how to improve the viz at @wjking0.

Friday, September 19, 2014

Where (Not) To Eat In Lexington, KY

I got this data from the Lexington, KY Health Department and while I hoped to have this constantly update-able I don't know that is going to be the case. I had to do quite a bit of data-teasing before I could actually do some work (particularly with location data). Additionally I had to create a file that contained all the Health Code violations and definitions so I knew which were "Critical" violations and what each code means. So this may be a one-off data viz.

I originally wanted to look at whether the zip-code (and thusly socio-economic status) of an area had anything to do with food quality, but I quickly realized another trend: "Marts" and grocery stores tended to fall towards the end of the spectrum. Also, interestingly, every "school" is listed as well... use the search function to search for schools or other clusters of dining places to see if you notice any trends and you can shoot me a message @wjking0!

Search or click on the map and drag to select multiple locations and scroll down for more info about your selection!

Wednesday, September 17, 2014

Kentucky Derby Winners Viz

After attending #Data14, the Tableau Conference, it came to my attention after a session from Jewel and Crew that I should start sharing out all the #WeirdData that I enjoy viz'ing. I hemmed and hawed about what kind of viz to do, so I figured I would start with one from Kentucky (my current home). Here in Kentucky we love three things above all else... Basketball, Bourbon, and Horse Racing.

The Kentucky Derby is one of the oldest continuously running horse races in the world. The data set there is pretty expansive but I was amazed at how hard it was to find. The data I ended up using I scraped from a couple of wikis, and I have some other data that I may enhance this with later (such as Purse collected). For now though enjoy playing with my first ever blog post and my first ever Tableau Public Viz. Comments/suggestions welcome! @wjking0