Showing posts with label Tableau. Show all posts

Tuesday, May 2, 2017

Things I've Made Up - Pure Dataviz Imagination

This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html
It's time to use our...
They're not fabricated datasets... they're "alternative facts"!
For this week's #1YearOfViz I've decided to do something a little different. Instead of the socio-political or sales data I normally look at, I'm going to show you some of the things I've done with data I completely fabricated!

Since being terminated from the University of Kentucky due to a reduction in force, I've applied to tons and tons of jobs (viz on that coming at a later date). For several of them I found I'd rather SHOW the people in charge what I was capable of than try to EXPLAIN what I wanted to do in the job. So I started taking several of the positions I've applied for and designing datasets around them to see what I could make out of their fake data. Sometimes there's an example to work from, sometimes there's just a rough outline with a few numerical values thrown in for good measure...

Let's get started with a real dataset that I scraped from a local recruiter. This one came from TEKsystems, an international company with offices right here in Lexington, KY. I had a meeting with one of their recruiters and figured I'd better have something to show off my skills, so in about 24 hours I scraped and vizzed the following out of their job listings.

This first dashboard looks at the posting trends of TEKsystems employees to show them patterns in their posting habits:



As you can see in the dash above, most jobs are either filled or withdrawn after 2-3 weeks... I'm guessing that's their posting window, after which they re-list the jobs to keep them fresh (it looks better that way!). This next dashboard was so I could see the trends for the jobs I'm looking for (i.e., Tableau dataviz jobs). Where are they located? What are the titles, how often do they come up, etc.?
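The 2-3 week pattern comes from comparing when each listing showed up and when it disappeared. Here's a minimal sketch of that step. It assumes each scraped listing has already been reduced to a pair of hypothetical `posted` and `closed` dates; these aren't field names from TEKsystems' site, and `closed` is `None` for listings that are still up.

```python
from collections import Counter
from datetime import date

def weeks_open_histogram(listings):
    """Count closed listings by whole weeks between posting and closing.

    `listings` is a list of (posted, closed) date pairs; listings that are
    still open (closed is None) are skipped.
    """
    counts = Counter()
    for posted, closed in listings:
        if closed is None:
            continue
        counts[(closed - posted).days // 7] += 1
    return dict(sorted(counts.items()))

# Hypothetical example data, not TEKsystems' actual listings.
sample = [
    (date(2017, 3, 1), date(2017, 3, 17)),  # 16 days -> week 2
    (date(2017, 3, 3), date(2017, 3, 24)),  # 21 days -> week 3
    (date(2017, 3, 6), None),               # still open, skipped
]
print(weeks_open_histogram(sample))  # {2: 1, 3: 1}
```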



Fun right? Totally a more functional use of their site and their depth of data in my opinion!
Me with data and crappy site interfaces!
I recently had a great interview with Delta Private Jets, where I was shown a spreadsheet and asked to answer a few questions about it. I fumbled around in Excel for a little bit to answer the bulk of the questions (yuck), but finally pictured the Tableau Desktop interface and walked the interviewer through how I'd answer each question in Tableau, step by step.

About a week after the interview I got to thinking that I'd rather actually show them the answers and lead them to some more questions with the data. So I set about re-creating some of the data I saw as best I could remember. I vaguely remembered some ranges and the number of rows in the dataset... then I found mockaroo.com, which has been pretty awesome to work with. The largest number of rows you can generate for free is 1,000, but you can do that as many times as you wish, so I downloaded several sets and combined them manually.
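I stitched my downloads together by hand. Here's a minimal sketch of the same step in Python, assuming every download shares the same header row. The function name is my own and the file names in the usage note are hypothetical.

```python
import csv

def combine_csvs(paths, out_path):
    """Stack several CSV files that share one header into a single file.

    The header is written once. A file whose header differs from the first
    raises ValueError instead of silently misaligning columns, and an empty
    file is skipped. Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to add
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, with `import glob`, the call `combine_csvs(sorted(glob.glob("MOCK_DATA*.csv")), "combined.csv")` would combine every download in the current folder. This assumes the files kept a name like Mockaroo's default `MOCK_DATA.csv`.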
Yes Dean, totally made up!
I designed the following Story Dashboard to answer those questions and pose some new ones. Check it out and let me know what you think!




Finally I bring you to the last dataset I've created recently. I read a post from the CEO of Import.io about their sales information and how they'd like to hire a data-wrangler... so I figured I would fake-the-funk with a dataset to show him exactly what I was capable of!

I may fake a dataset... I'll never fake the funk.

Here's the fake sales data from Import.io... what's wild is that even though I used fake sales numbers, a fairly conservative estimate says the company should have brought in several million dollars (gross profit) over the last couple of years. That's based on its current pricing structure at less than a 50% adoption rate. What other nuggets could you pull out of all this fancy viz? I completed the data generation, data prep, and viz for this all in under 24 hours!
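The back-of-the-envelope math behind an estimate like that is just potential customers times adoption rate times price times months. Here's a minimal sketch. The plan names, user counts, and prices are placeholders I made up for illustration, not Import.io's real pricing or the numbers in my fake dataset. It only computes top-line sales, so any costs would still need to come off to get to profit.

```python
def estimated_gross_sales(plans, adoption_rate, months):
    """Rough top-line sales: sum of potential_users * adoption_rate * monthly_price * months.

    `plans` maps a plan name to (potential_users, monthly_price).
    `adoption_rate` is the fraction of potential users who pay, from 0 to 1.
    No costs are subtracted.
    """
    if not 0 <= adoption_rate <= 1:
        raise ValueError("adoption_rate must be between 0 and 1")
    return sum(users * adoption_rate * price * months
               for users, price in plans.values())

# Placeholder plans: (potential users, price per month in dollars).
plans = {"basic": (1000, 50.0), "pro": (200, 250.0)}
print(estimated_gross_sales(plans, adoption_rate=0.4, months=24))
```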


The dash above is meant as an initial overview where you can flip the parameters and colors around. The dash below allows for deep dives, with multiple simultaneous filters layered over the data.




Have any of you readers ever gone out of your way to make something up to show off your talents rather than just talk about them? Let me know in the comments below! As always, if you have any questions/comments/concerns, hit me up on Twitter @wjking0 and we'll tweet it up together!

How I feel all these companies look at my resume after seeing my fake data post. =D

Monday, April 24, 2017

Lexington Tableau User Group Presentation #2 and Vizzes!


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I'm not going to spend a whole lot of time writing up the viz today. The presentation is below, along with the slideshow, which has all the relevant links for you to click through yourselves!

We had a GREAT turnout for the Lexington TUG!



Here is the accompanying super-awesome slideshow!


Or click this link to open it in a new window: https://goo.gl/lTi9YO

Please watch the video for an explanation about these vizzes! Here is the Bluegrass Trust Plaques viz:






Here is the fun one! I've since uploaded the data to Data.World, which, if you haven't checked it out, is pretty amazing! I'm just now scratching the surface of all the options they have for datasets. Here is the National Parks Visitation Viz:





As usual if you have any questions feel free to hit me up on Twitter @wjking0!
Me by the end of the day!

Thursday, January 5, 2017

Veterans Affairs Database of Military Graves


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I hope this week's viz doesn't come across as too morbid. You'll also find none of my typical gifs in this post, as I wanted to keep it as respectful as possible. I was looking through data.gov the other day for a Kentucky-centric piece of data to viz when I found the VA's Kentucky records of military burial sites. They seemed to be pulled from a larger dataset. Sure enough, I found the entire dataset, which includes full names, birth dates, death dates, wars fought in, branch, and occasionally rank of the individuals!

This is really neat for me, as I've had several family members serve, including both grandfathers, who fought in World War II. As a matter of fact, I recently got to go to LA to hang out with one of my best friends, and we got to tour my Grandfather Davis' old battleship, the USS Iowa! Here's a pic:
Me in front of my Grandpa's old ship the USS Iowa - Photo courtesy of Casey Miller

Having birth dates and death dates, I was hoping I could do some more in-depth calculations on age at time of death. The problem is that you can't tell from the data whether an individual died IN a particular conflict. You can, however, make some pretty strong inferences about how wars pull down the median age of people who pass away at different points. Let's go ahead and look at that dashboard now. You can click on a branch of military (the % of ALL records is on the right) and the ages/dates on the left will reformat (median ages of people who passed away in certain years). You can also enter a particular branch or rank on the left-hand side if you'd like to filter the whole dashboard that way.
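The median-age view boils down to three steps: compute each veteran's age at death, keep only a plausible age window, and take the median for each death year. Here's a minimal sketch. It assumes the VA rows have already been parsed into (birth_date, death_date) pairs, and the 16 and 62 defaults mirror the age range used for these dashboards. The function names and example rows are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def age_on(birth, death):
    """Whole years between birth and death dates."""
    years = death.year - birth.year
    # Subtract a year if the birthday hadn't come around yet in the death year.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years

def median_age_by_death_year(records, min_age=16, max_age=62):
    """Median age at death per death year, keeping ages in [min_age, max_age]."""
    ages = defaultdict(list)
    for birth, death in records:
        if birth is None or death is None:
            continue  # skip rows missing either date
        age = age_on(birth, death)
        if min_age <= age <= max_age:
            ages[death.year].append(age)
    return {year: median(vals) for year, vals in sorted(ages.items())}

# Hypothetical example rows, not real VA records.
records = [
    (date(1920, 5, 1), date(1944, 6, 6)),    # 24
    (date(1900, 1, 1), date(1944, 1, 1)),    # 44
    (date(1880, 1, 1), date(1944, 12, 31)),  # 64, outside the window
    (date(1950, 3, 3), date(2010, 3, 2)),    # 59, the day before a birthday
]
print(median_age_by_death_year(records))  # {1944: 34.0, 2010: 59}
```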


As you can see, periods of war, particularly the first and second World Wars, caused great dips in the median age of veterans who passed away in those years. These dips become much more pronounced if you limit the data to those who passed away at 35 and younger. The data as shown above represents ages 16-62 (62 being the mandatory retirement age for military personnel, barring special circumstances).

In this next dash, however, you can see some gaps in the VA's data, particularly between 2001 and 2005. There's only a very small smattering of people represented in those years compared to the rest. Again, you can reformat either side of the data by clicking or selecting parts of the opposite side. I'd suggest using a box select on the left side and selecting an individual war on the right. You can expand the "War" column on the right to include people who served in multiple wars. Generally speaking the wars are listed in chronological order, but occasionally not. Some people, for instance, had WWI listed AFTER WWII... but let me assure you I did a TON of data cleanup on this. I had to stop somewhere, and ordering all the dimensions was my stopping point!
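For anyone who wants to pick up where I stopped, putting each person's war list in chronological order is mostly a lookup against a fixed order. Here's a minimal sketch. The war labels and the comma separator below are hypothetical stand-ins, so check them against the actual spellings in the VA file before using this.

```python
# Hypothetical labels in chronological order; the VA file's spellings may differ.
WAR_ORDER = ["WORLD WAR I", "WORLD WAR II", "KOREA", "VIETNAM", "GULF WAR"]
RANK = {name: i for i, name in enumerate(WAR_ORDER)}

def order_wars(field, sep=","):
    """Return a war field with its entries in chronological order, joined by ', '.

    Entries are matched case-insensitively; unrecognized labels are kept
    and sorted alphabetically after the known ones.
    """
    wars = [w.strip() for w in field.split(sep) if w.strip()]
    ordered = sorted(wars, key=lambda w: (RANK.get(w.upper(), len(WAR_ORDER)), w))
    return ", ".join(ordered)

print(order_wars("WORLD WAR II, WORLD WAR I"))  # WORLD WAR I, WORLD WAR II
```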


Finally, this last one is a map of all the known military burial sites listed by the VA. I expected more than 183, but several of them have VERY large sections dedicated to soldiers. If you have a family member or friend buried in one of these locations, you can search for them. If you narrow it down to a single person per location, hover over it with your mouse or finger and the tooltip will show the full details of where the grave is. I hope that can be useful for some of you out there looking for your loved ones.


Lastly, let me say thank you to all those who have served... who lived and died for the freedom that lets people like me find public data and publish it.

As always hit me up in the comments below or on twitter @wjking0 if you have any questions or concerns.

Friday, December 23, 2016

A Year on Google's Project Fi


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html
Most of you who know me know that I drink deep of the Google Kool-Aid... I've been a Nexus 10 owner for years, I've beta tested apps from the Googs... you name it. That's why, about a year ago, when I was thinking about switching my cell phone service, I jumped at the fact that Google had a no-interest payment plan on their latest phone! I figured I would give the service a shot and see if it was everything it was cracked up to be. This is not so much a DataViz post as a Quantified Self post about what I learned while changing cell phone providers.

Also, I just SUPER wanted a new phone. Again, if you've known me for a few years, I carried around a Samsung Galaxy S3 with a screen that was more often cracked than not (thanks, alcohol!). Anyway, the Nexus 6P was about the sexiest phone I've ever laid eyes on, and I figured if Google held up their promise of using multiple networks to boost speeds it might be pretty amazing. I ordered my phone and was going to wait until Jan 1st to turn it on... it arrived and I had it in my hands for a cool 2-3 days (using wifi only) when my Galaxy S3 gave up the ghost with a major motherboard problem. To this day I think it was just jealous of the new phone. =D

What I hadn't really thought much about was exactly WHEN and HOW I used my cell service. I kept thinking "I shouldn't really use much 'real' data because I'm always at places with wifi"... like my apartment, my office, etc. This is where I was WRONG. That first day I really wanted to run speed tests all the time to see exactly what this combined network signal would mean for speeds on the phone... but to test speeds, do you know what you need? Large data files to transfer. I burned through almost half a gig in a few hours... and I'd only allotted myself 2 GB a month (though it's not a problem if you use more, it just adds to your bill). You see, I was coming from an UNLIMITED Sprint plan that I'd had forever, and it was pretty rad. Anyway... I've logged my wifi connections via IFTTT for years, so I figured I would give it a solid year to look at the differences. Let's get into the data:
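Since the whole comparison leans on that IFTTT log, here's a minimal sketch of turning connect/disconnect events into total time on wifi. The log format is an assumption. I'm treating each entry as a (timestamp, event) pair, sorted by time, where the event is the hypothetical string "connected" or "disconnected". Real IFTTT rows would need parsing into that shape first.

```python
from datetime import datetime, timedelta

def wifi_time(events):
    """Total time spent on wifi from chronologically sorted (timestamp, event) pairs.

    A repeated "connected" keeps the earliest start, a "disconnected" with no
    open session is ignored, and a session still open at the end is not counted.
    """
    total = timedelta()
    connected_at = None
    for ts, event in events:
        if event == "connected":
            if connected_at is None:
                connected_at = ts
        elif event == "disconnected" and connected_at is not None:
            total += ts - connected_at
            connected_at = None
    return total

# Hypothetical log for one day.
log = [
    (datetime(2016, 1, 4, 8, 0), "connected"),
    (datetime(2016, 1, 4, 8, 30), "disconnected"),
    (datetime(2016, 1, 4, 9, 0), "connected"),
    (datetime(2016, 1, 4, 17, 0), "disconnected"),
]
print(wifi_time(log))  # 8:30:00
```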



Let's just take stock of the positives and negatives:
Super fast = Super Pricey!

  • Positives
    • It's AH-MAZING-LY fast! (see screenshot to the right!)
    • Reception is better in most previously "dead" zones
    • The build-quality of the Nexus/Pixel line of phones is impeccable
    • Initial cost of entry is very low ($20/month)
    • Integration with Google Services (like Google Voice/Hangouts) is GREAT
  • Cons
    • Actual phone call quality (particularly on Wifi) is kinda janky
    • $10/Gig of data is TOO DAMNED HIGH
      • Ex. I spent $5 in a few hours just running speed tests around town the first day or so.
      • I could burn through 1/2 GB of data A DAY walking to work watching YouTube, which, if I kept it up, would have cost me approximately $100/month in data
    • Paying per gig is really hard to get used to when you're coming from unlimited data
    • Have I mentioned that fast network speeds really only matter when you don't feel like you're paying for every MB that flies to your phone at Mach 6!?
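The pricing math in that list is simple enough to sketch. This assumes the structure described above: a $20/month base plus $10 per GB, with the data charge prorated for partial gigabytes. It ignores taxes and fees, and the 20 walk-to-work days a month is my own round number.

```python
def fi_monthly_cost(gb_used, base=20.0, per_gb=10.0):
    """Approximate monthly bill: flat base fee plus a prorated per-GB data charge.

    Taxes and fees are ignored.
    """
    if gb_used < 0:
        raise ValueError("gb_used cannot be negative")
    return base + per_gb * gb_used

# Half a gig of YouTube per walk to work, about 20 workdays a month:
print(fi_monthly_cost(0.5 * 20))  # 120.0, i.e. $20 base + $100 of data
# The first-day speed tests: about half a gig on top of the base fee.
print(fi_monthly_cost(0.5) - 20.0)  # 5.0
```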


How I feel after trashing a Google Service
What does all this mean? Well... have I had good network speeds? Yes. Has my call quality been good? MOSTLY (calls drop sometimes, particularly with Wifi calling). Have I had to SUBSTANTIALLY alter the way I think about my phone being online? HELLS to the YES. That to me is the big flaw in Project Fi... The fast access just means that ultimately you're going to pay them more, because you're going to pull down larger data and more HD video, etc. If they said something like "OK, all Google-related services are going to be FREE to access...", I could subsist on YouTube and Play movies/music etc. while walking around town. Now I see the appeal of carriers like T-Mobile, who are bundling things like Netflix and Hulu in as "unlimited" as far as data usage goes. Don't even get me started on image-heavy Instagram and other services that are no longer text-based but image/video-based... ugh.

Bottom Line (literally)... Can I recommend Project Fi as a service to most people? Yes, but only if you're not someone who constantly has your phone out. If you're a super nerd like myself and live on the Interwebs... you're going to hate Fi.

As always hit me up on twitter @wjking0 with any comments or questions!

Wednesday, October 19, 2016

Car, Cycling and Pedestrian Collision Data - Part 1



This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I knew I would have to do this eventually: I've finally come upon a dataset so large that I have to split it into multiple posts. I was originally turned on to this data through some work I had done with the University of Kentucky Police Department on their crime log. One of the captains (Captain Matlock, who's super rad and a data nerd himself!) thought I would be interested in traffic collision data. I said, "Oh, you mean traffic accidents?" He replied, "There are no accidents." When he gave me a sample of their data, I realized I could deduce several trends in it, particularly regarding pedestrian and cycling accidents as they occurred on campus.


Being an avid cyclist myself, I saw the potential for this data to help inform other cyclists and the people planning the University of Kentucky's roads and pathways. I spent one of my last days as an employee of the University of Kentucky briefing the cycling committee on this data. They then told me that it appeared to come from a larger dataset at crashinformationKY.org. When I pulled up the site I was giddy with excitement: there are so many data fields and so much historical data, way more than I had originally been given by the UK Police Department.

Unfortunately, the yearly downloads from that state police website were not very functional. They were .DAT files, but neither of the data definitions listed on the website allowed me to properly parse those 2 GB yearly files into anything usable. So I decided to just scrape Fayette County as a proof of concept, and even that had to be done in six-month intervals going back approximately six years.
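Breaking a multi-year pull into six-month chunks is easy to get off by a day, so here's a minimal sketch of generating the windows. It only builds the date ranges. How each range gets submitted to the crash site's search isn't shown, and the 2010-2016 span in the example is just an illustration of "approximately six years."

```python
import calendar
from datetime import date

def add_months(d, n):
    """Shift a date by n months, clamping the day to the target month's length."""
    month_index = d.month - 1 + n
    year, month = d.year + month_index // 12, month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def six_month_windows(start, end):
    """Split [start, end) into back-to-back windows of at most six months.

    Each window is a (window_start, window_end) pair with window_end exclusive,
    so consecutive windows never overlap or leave a gap.
    """
    windows = []
    current = start
    while current < end:
        nxt = min(add_months(current, 6), end)
        windows.append((current, nxt))
        current = nxt
    return windows

windows = six_month_windows(date(2010, 7, 1), date(2016, 7, 1))
print(len(windows))  # 12
```

If the site expects inclusive end dates, subtract one day from each `window_end` before using it.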

Let's get into the data!

First, I'd like to mention that all points on the dashboards below are clickable. If you click a roadway name, all of the shown data will reshape to reflect that roadway until you click off of it, clearing the selection. The same goes for things like day of the week or hour of collision: selecting any of those reshapes all the other data on the charts. This applies to all four of the dashboards posted below.

This first dashboard shows the locations of collisions by roadway, with day-of-week and hour-of-day frequencies at the bottom. The map to the right of the roadway names is colored by the number of people injured in each collision. The concentration of red at different locations is a good indicator of the most injurious areas of a particular roadway. Take a look around at the data. Remember that after you have set your filters or selected items on the other charts, you can use the controls in the upper left of the map to zoom in on an area of interest.



This second dashboard compares the percentage change of pedestrian and bicycle collisions by month and year. Taken as a whole, pedestrian collisions have risen slightly over the last six years while bicycle collisions have fallen slightly. The thing to remember is that the differences are less than 1%, so not terribly significant. If we limit the collisions to the last three years, those trends are reversed: pedestrian collisions have gone down 0.4% while bicycle collisions have gone up 0.2%. I didn't have any particularly significant dates (a change in Lexington policy or anything) to set the slider to. Still, I left the date slider on the right in case anyone wants to check something out.
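The post doesn't spell out exactly how those percentages were calculated. Here's a minimal sketch of one plausible reading: the share of each year's collisions that involved a pedestrian (or a cyclist), and the change in that share between two years. That change is measured in percentage points rather than as a relative percent change. The tag names and toy data are hypothetical.

```python
from collections import Counter

def share_by_year(collisions, kind):
    """Percent of each year's collisions whose tags include `kind`.

    `collisions` is an iterable of (year, tags) pairs, e.g. (2014, {"pedestrian"}).
    """
    totals, matches = Counter(), Counter()
    for year, tags in collisions:
        totals[year] += 1
        if kind in tags:
            matches[year] += 1
    return {year: 100 * matches[year] / totals[year] for year in sorted(totals)}

def point_change(shares, first, last):
    """Change in share between two years, in percentage points."""
    return shares[last] - shares[first]

# Toy data: 1 of 4 collisions in 2010 involved a pedestrian, 2 of 4 in 2016.
toy = ([(2010, {"pedestrian"})] + [(2010, set())] * 3
       + [(2016, {"pedestrian"})] * 2 + [(2016, set())] * 2)
shares = share_by_year(toy, "pedestrian")
print(shares, point_change(shares, 2010, 2016))  # {2010: 25.0, 2016: 50.0} 25.0
```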



This third dashboard is a slightly simplified version of the first, except that I wanted to look at injury rates in particular. You'll notice the red coloration is the percentage of people injured on a given day or time. Again, the date sliders are on the right, along with selections for the day and hour of collision, though you can also select those by clicking directly on the dots. Bicycle and pedestrian collisions are included, and in this visualization so are fatalities.


This last dashboard is purely for looking at every aspect of the timing of cycling collisions, including year, month, day of week, and hour. Both injury totals and the total number of collisions are also listed in the chart.
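Every one of these timing views is the same group-by with a different key. Here's a minimal sketch, assuming each collision record is a dict with hypothetical `when`, `people`, and `injured` fields. It returns collision counts and injury totals. The percentage column follows the "percentage of people injured" coloring from the third dashboard, computed as injured people over people involved.

```python
from collections import defaultdict
from datetime import datetime

def timing_summary(collisions, key):
    """Group collisions by a time component.

    `key` maps a datetime to a group, e.g. lambda dt: dt.hour or
    lambda dt: dt.strftime("%A"). Returns
    {group: (collisions, injured, percent_of_people_injured)}.
    """
    groups = defaultdict(lambda: [0, 0, 0])  # collisions, people, injured
    for c in collisions:
        g = groups[key(c["when"])]
        g[0] += 1
        g[1] += c["people"]
        g[2] += c["injured"]
    return {
        k: (n, injured, 100 * injured / people if people else 0.0)
        for k, (n, people, injured) in sorted(groups.items())
    }

# Hypothetical records.
rows = [
    {"when": datetime(2016, 5, 2, 17, 30), "people": 2, "injured": 1},
    {"when": datetime(2016, 5, 3, 17, 5), "people": 2, "injured": 0},
    {"when": datetime(2016, 5, 3, 8, 0), "people": 1, "injured": 1},
]
print(timing_summary(rows, lambda dt: dt.hour))  # {8: (1, 1, 100.0), 17: (2, 1, 25.0)}
```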



I will link the second, third, or possibly even fourth part of this viz below as I complete them, and they will all hopefully be part of the #1YearOfViz challenge I've set for myself. As always, if you have questions or concerns you can leave them in the comments below or hit me up on Twitter at @wjking0.


Thursday, October 13, 2016

National Parks Tourism and Money Comparison


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I originally found the subject of today's blog in r/datasets/, a Reddit forum I'm an admin of. I'm a pretty big fan of our national parks and have several faves, including Monument, Yellowstone, etc. This year marks the 100th anniversary of the United States National Park Service, so I thought this would be a great time to make a viz about our nation's national parks!

Some of the things that I discovered in the data are as follows:


I started noticing a trend: over the last couple of years national parks have been visited more than at any time in the last 20 or so years! I've been searching for a reason for this uptick in visitors but have yet to find one. One suggestion is that the "Every Kid in a Park" initiative is responsible for the growth over the last couple of years. This seems unlikely, however: after further research I found the initiative began on September 1, 2015, too late to explain growth that was already under way. As you'll see in the chart below, the years since 2013 in particular have seen some of the heaviest growth in recent recorded history.










Unfortunately, not all of the data I'm showing in these charts was available from a single source. I had to scrape the national parks website located here, as well as the national parks statistical site located here. I was also unable to easily join the datasets, as the park names on the website don't match the names in the report on their statistical page. Additionally, the main national parks website doesn't break down the money coming into a state by year or by park (it only gives a state total).
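One common way around mismatched names is to normalize both sides into a simpler join key before matching. Here's a minimal sketch. The suffix list is illustrative, not an official NPS naming scheme. Any names that still don't match would need a hand-built lookup table.

```python
import re
import unicodedata

# Illustrative designation suffixes, longest first; extend as mismatches turn up.
SUFFIXES = ["national park and preserve", "national park", "np"]

def park_key(name):
    """Normalize a park name into a join key.

    Strips accents, lowercases, spells out '&', drops apostrophes and periods,
    turns other punctuation into spaces, collapses whitespace, then removes
    one trailing designation suffix.
    """
    key = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    key = key.lower().replace("&", " and ")
    key = re.sub(r"['.]", "", key)
    key = re.sub(r"[^a-z0-9 ]+", " ", key)
    key = " ".join(key.split())
    for suffix in SUFFIXES:
        if key.endswith(" " + suffix):
            return key[: -(len(suffix) + 1)]
    return key

print(park_key("Denali National Park & Preserve"), park_key("Denali NP"))  # denali denali
```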

That said, we can still take the total money that has come in and the number of parks in each state to get a rough return per park over the last 20 years. As you can imagine, none of this requires mind-blowing mathematics. After I began examining the data, I noticed that coastal states tended to have a higher return per national park than non-coastal states. I grouped the states to see if that assumption was correct, and it turns out that it is! Check out the Story below and click through the stages I described above to see for yourself!
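The per-park numbers are plain division, and the coastal split is just an average within each group. Here's a minimal sketch with placeholder figures (not the NPS data). The 20-year default reflects the assumption that the NPS total covers the last 20 years. This version averages each state's per-park value equally; weighting by number of parks instead would be an equally defensible choice.

```python
from statistics import mean

def per_park_annual(states, years=20):
    """Rough yearly return per park: total dollars / number of parks / years.

    `states` maps a state name to (total_dollars, park_count, is_coastal).
    States with no parks are skipped.
    """
    return {state: total / parks / years
            for state, (total, parks, _) in states.items() if parks}

def coastal_comparison(states, years=20):
    """Average per-park yearly return for coastal vs. non-coastal states."""
    groups = {True: [], False: []}
    for state, value in per_park_annual(states, years).items():
        groups[states[state][2]].append(value)
    return {"coastal": mean(groups[True]) if groups[True] else None,
            "non_coastal": mean(groups[False]) if groups[False] else None}

# Placeholder figures, not the NPS numbers.
states = {"A": (200_000_000, 2, True), "B": (80_000_000, 2, False)}
print(coastal_comparison(states))  # {'coastal': 5000000.0, 'non_coastal': 2000000.0}
```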



This makes sense if you think about it: most people (that I know, anyway) don't vacation by going inland, and I believe a lot of people in land-locked states tend to head to the coast for vacation. If you're curious about how the NPS calculates the amount of money coming in, feel free to check out their write-up on these numbers here (PDF). On the surface, it appears that coastal national parks tend to pull in about half a million dollars more a year in revenue than non-coastal national parks. We get that yearly figure by assuming the total on the national parks website covers the last 20 years of data they have collected.

Interestingly enough, the state with the highest return per national park is actually North Carolina!

As always, if you have any questions or concerns, you can leave a comment below or hit me up on Twitter at @wjking0.
Enjoy those waves gang!

Wednesday, October 5, 2016

Where (NOT) to Eat in Lexington, KY - UPDATED LIVE DATA!

Lexington, KY Skyline
This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

I posted the original version of this several years ago as one of the very first geo-located dataviz I'd created. With the new changes in Tableau Public, I have finally found a way to get live-updated data from the Lexington Health Department. If you'd like to see the raw Google Sheet that I'm pulling this data from, I've made it available here.

I didn't change much about this data from its original form. I just made it live-updating and put some additional filters and analysis on top of what I'd done previously.

First off, I'd like to announce that I've developed what I think is a good mobile version, which you can bookmark on your phone to quickly and easily check food scores/violations for a place. Click on the image below to be linked directly to the dash!



If you'd like to see the full dash and analysis list click below to open up the rest of the blog post!

Friday, September 23, 2016

Does Marijuana Legalization Affect Drug Deaths?



I recently saw a somewhat rhetorical question on Facebook:

So with all the heroin overdoses I sit here wondering what the overdose percentage is in the states where marijuana is medically approved or legal. Do they have the same trouble with heroin as the rest of the country?

I thought to myself... 'I bet I could legitimately answer that!' I started searching around and discovered a study done just a few months ago that looked at opioid usage in conjunction with state medicinal marijuana laws. The findings were inconclusive when looked at as a whole. But when the researcher looked at the 21-40 age group, there was a pretty significant decline in automobile fatalities compared to similar cases in areas where marijuana dispensaries (for medicinal purposes) were unavailable. A link to the full study can be found here.

That wasn't really getting at the core of what I think the person was asking, which I see as 'does recreational marijuana's legalization cause a decline in opioid, and particularly heroin, usage?'


Me looking for the right up-to-date data
I searched around pretty extensively for facts about heroin usage and drug deaths, but the most recent data published for almost everything was from 2014. Most states and municipalities didn't legalize recreational marijuana until 2015, with Colorado being the exception. Even then, finding drug-related fatalities proved difficult, and when I found drug-specific totals they were always at the national level. The upshot is that this search for data turned out to be WAY more difficult than I anticipated! The big problem was that arrest data and death data just weren't recent enough to compare multiple states.



Suddenly I found out that the CDC keeps records of "drug poisoning deaths" (overdoses). I found this article from Colorado Public Radio, which finally linked me to the data I needed! I started looking at the CDC blog... man, this graph style looks so famil-IT'S TABLEAU PUBLIC! Crap! I had already pulled down the raw data myself and started doing some work. It showed that Colorado's trends were indeed a little worse than the national average for age-adjusted drug death rates.
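Those CDC numbers are age-adjusted, which lets a state with an older or younger population be compared fairly against the national rate. NCHS does this by direct standardization to the 2000 U.S. standard population. Here's a minimal sketch of that arithmetic. It uses made-up age groups and weights rather than the real standard-population table.

```python
import math

def age_adjusted_rate(deaths, population, std_weights, per=100_000):
    """Directly age-standardized death rate per `per` people.

    deaths, population and std_weights are parallel lists, one entry per age
    group; std_weights are the standard population's shares and must sum to 1.
    """
    if not len(deaths) == len(population) == len(std_weights):
        raise ValueError("need one entry per age group in every list")
    if not math.isclose(sum(std_weights), 1.0):
        raise ValueError("std_weights must sum to 1")
    return per * sum(w * d / p for d, p, w in zip(deaths, population, std_weights))

# Made-up two-group example.
print(round(age_adjusted_rate([10, 30], [100_000, 50_000], [0.6, 0.4]), 2))  # 30.0
```

Weighting by the standard population instead of the state's own age mix is the design choice here. Plugging in the state's own population shares as weights would just give back the crude rate.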

That's when I noticed that the CDC and I had built almost the EXACT same dashboard! (Screenshots below):

http://blogs.cdc.gov/nchs-data-visualization/drug-poisoning-mortality/
The CDC's Dashboard they created, click image to go to blog post about it!
The Dashboard I designed before seeing theirs!

On the plus side it made me feel pretty good that I was making similar design choices as someone who's employed by the CDC to do this type of dataviz!

Now the question became 'How can I salvage this or make it better?!'
Thinking how I could improve this to salvage the weekly #1yearofviz challenge!
I know I'm replicating some effort here, but I think it's worth looking at the way I lay out the map of drug deaths over time, country-wide and state-wide. Particularly worth looking at is the last page of this Tableau Story, where you see the national averages slide from the left to the right side over time:

If you'd like to see how your state compares to the other states' averages for the same period, you can use this dashboard here:



Of course, most of this can be viewed in the CDC viz, and I didn't want to duplicate too much effort...


The CDC was focused on how drug deaths have been steadily increasing year to year, so I decided to change up the bottom graph to show relative change over time. By what PERCENTAGE were drug deaths going up year to year, and is Colorado any different in that regard? I then came up with the following dashboard:
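Switching from raw rates to year-over-year percent change is what makes a "smallest increase" jump out. Here's a minimal sketch with placeholder rates (not the CDC figures).

```python
def yoy_percent_change(rates):
    """Percent change from each year to the next year present in `rates`.

    `rates` maps year -> rate (e.g. age-adjusted deaths per 100,000);
    rates must be nonzero.
    """
    years = sorted(rates)
    return {cur: 100 * (rates[cur] - rates[prev]) / rates[prev]
            for prev, cur in zip(years, years[1:])}

def smallest_increase(changes):
    """Year with the smallest positive change, or None if nothing went up."""
    increases = {year: c for year, c in changes.items() if c > 0}
    return min(increases, key=increases.get) if increases else None

# Placeholder rates, not the CDC figures.
changes = yoy_percent_change({2011: 10.0, 2012: 12.0, 2013: 15.0, 2014: 15.3})
print(smallest_increase(changes))  # 2014
```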


Now, while this is just one year's worth of data, the smaller increase in drug-related deaths in Colorado in 2014 is SIGNIFICANT. It's officially the slowest increase since 1999 and WAY below the national average! This is key, as Colorado has been (as mentioned previously) among the top states for drug-related deaths per capita for the past few years. One would tend to think that upward trend would continue as it has nationally. But in 2014, while it DID INCREASE, it was the smallest increase in 16 years! Now, correlation doesn't equal causation, but this data can be revisited for other states that adopted recreational marijuana policies in 2015 once that data becomes more readily available! As for our earlier question of whether it reduced opioid/heroin deaths... that's hard to say. But since those are currently the most likely cause of death among illegal drugs, it's reasonable to assume those overdoses are reflected in this smaller increase.

Me at my friends' places after they go to Colorado for "hiking"


I hope you found this data interesting. If so, please comment/like/share it on social media. As always, if you'd like to say something, feel free to comment below or hit me up on Twitter @wjking0. If you have a question you'd like Viz'ed out as part of my #1YearOfViz, please hit me up and let me know! Thanks to James for the question this week, and I'm sorry more data wasn't available to give a more robust answer!