Friday, September 30, 2016

Phoning It In - Analyzing My Call History


This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html

Disclaimer: This viz only covers calls I've MADE, not calls I've RECEIVED. There isn't really any way for IFTTT to track incoming calls, and while Project Fi (my provider) does have a data-dump utility, it doesn't include contact names etc. Additionally, that dump only goes back to around Feb 2016, so the historical data isn't really there yet for me. Also this viz (thanks to the new Google Sheets connector in Tableau 10.0) will automagically update itself as time goes on, so the viz you're looking at now will be the freshest version anytime you look at it!

For the last few years I've been keeping some details of my usage of various things (calls, wifi, etc) that I do with my phone in order to work more on what a lot of data scientists call the "Quantified Self". A little better self-understanding never really hurt anyone, and understanding your own usage of things can be a good predictor of future needs as well as a basis for making behavioral changes.

I started logging all of my outgoing calls on April 24, 2014 and had a slight hiccup in data collection from 5/2/2015 to 12/18/2015 as I didn't know there was a problem with the IFTTT formula I was using and it stopped working until I checked on it. DOH!

Like the title suggests this was a pretty quick viz for me to throw together. Let's jump into the data! The first chart is just something I found interesting when the data is zoomed out to the topmost level. You think that you're making fewer phone calls and you're talking less, but according to my data (which again is largely incomplete for 2015) that's actually inaccurate. I'm making MORE calls in 2016 than in previous years!


The second viz is literally just a chart of all the breakdowns you can imagine for a phone call, Month of Year, Day of Month, Day of Week, and Time of Day.
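As a rough sketch of how those breakdowns can be derived from a list of call timestamps (the function name and the sample timestamps are my own illustration, not the Tableau setup behind the chart), each timestamp just gets bucketed four different ways:

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def call_breakdowns(timestamps):
    """Count calls by month, day of month, day of week, and hour of day."""
    timestamps = list(timestamps)  # accept any iterable, since we loop four times
    return {
        "month_of_year": Counter(t.month for t in timestamps),
        "day_of_month": Counter(t.day for t in timestamps),
        "day_of_week": Counter(DAY_NAMES[t.weekday()] for t in timestamps),
        "hour_of_day": Counter(t.hour for t in timestamps),
    }

# Hypothetical sample timestamps, purely for illustration.
sample_calls = [
    datetime(2016, 9, 30, 14, 5),
    datetime(2016, 9, 23, 9, 30),
    datetime(2016, 9, 14, 14, 45),
]
breakdown = call_breakdowns(sample_calls)
```

Each entry in `breakdown` is a `Counter`, so something like `breakdown["day_of_week"].most_common(1)` would give the busiest day of the week for calls.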


And of all the strange things I found when I was doing the write-up for this viz I came across this gem...

The last one is the one I like the best: it shows frequency of contacts. I decided the most fun calculation was to see how likely I was to call a given person on any given day. I counted how many total days I'd been gathering data, then divided the number of days on which I called each individual contact by that total to come up with a nice little percentage chance that you'll get a phone call from me!
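For the curious, here is a minimal sketch of that per-contact calculation, assuming the chance is the number of distinct days a contact was called divided by the total number of days logged. The function name, contact names, and dates are hypothetical, and the real viz does this inside Tableau rather than Python.

```python
from collections import defaultdict
from datetime import date

def call_chance_by_contact(calls, total_days):
    """Return {contact: fraction of logged days with at least one call}.

    calls is an iterable of (day, contact) pairs; several calls to the
    same contact on the same day only count as one day.
    """
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    days_called = defaultdict(set)
    for day, contact in calls:
        days_called[contact].add(day)
    return {contact: len(days) / total_days
            for contact, days in days_called.items()}

# Hypothetical call log covering four days of data collection.
sample_log = [
    (date(2016, 9, 1), "Mom"),
    (date(2016, 9, 1), "Mom"),          # same day, counted once
    (date(2016, 9, 2), "Mom"),
    (date(2016, 9, 3), "Pizza Place"),
]
chances = call_chance_by_contact(sample_log, total_days=4)
```

Multiplying each value by 100 gives the percentage shown in the chart.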


If you really want to talk to me though you'll have to reach out to me either in the comment section below or via twitter @wjking0 (Or Click the giant Pusheen kitty below!).


Friday, September 23, 2016

Does Marijuana Legalization Affect Drug Deaths?



I saw a question recently on Facebook that was asked somewhat rhetorically asking the following:

So with all the heroin overdoses I sit here wondering what the overdose percentage is in the states where marijuana is medically approved or legal. Do they have the same trouble with heroin as the rest of the country?

I thought to myself... 'I bet I could legitimately answer that!' I started searching around and discovered that there was a study done just a few months ago that looked at opioid usage in conjunction with state laws for medicinal marijuana. The findings were inconclusive when looked at as a whole but when the researcher looked at the 21-40 year old age group there was a pretty significant decline in automobile fatalities when compared to similar cases in areas where marijuana dispensaries (for medicinal purposes) were unavailable. Link to the full study can be found here.

That wasn't really getting at the core of what I think the person was asking which I see as 'does recreational marijuana's legalization cause a decline in opioid and particularly heroin usage?'


Me looking for the right up-to-date data
I searched around pretty extensively looking for facts about heroin usage and drug deaths but almost all data was, at the most recent, published for 2014. Most states and municipalities didn't legalize recreational marijuana until 2015 with Colorado being the exception. Even then finding drug related fatalities proved difficult and when I found drug-specific totals they were always at the national level. The upshot is that this search for data turned out to be WAY more difficult than I anticipated! The big problem was that arrest data or death data was just not as recent as I needed it to be to compare multiple states.



Suddenly I found out that the CDC keeps records of "drug poisoning deaths" (overdoses). I found this article from Colorado Public Radio which finally linked me to the data I needed! I started looking at the CDC blog... man this graph-style looks so famil-IT'S TABLEAU PUBLIC! Crap! I had already pulled down the raw data myself and started doing some work showing that the trends in Colorado were indeed a little worse than the national average of age-adjusted deaths by drugs.

That's when I noticed that the CDC and I had built almost the EXACT same dashboard! (Screenshots below):

http://blogs.cdc.gov/nchs-data-visualization/drug-poisoning-mortality/
The CDC's Dashboard they created, click image to go to blog post about it!
The Dashboard I designed before seeing theirs!

On the plus side it made me feel pretty good that I was making similar design choices as someone who's employed by the CDC to do this type of dataviz!

Now the thing has become 'How can I salvage this or make it better?!'
Thinking how I could improve this to salvage the weekly #1yearofviz challenge!
I know I'm replicating some effort here but I think it's worth looking at the way I lay out the map of drug deaths over time country-wide and state-wide. Particularly worth looking at is the last page of this Tableau Story where you see the national averages slide from the left to the right side over time:

If you'd like to see how your state compares with the national average and the states surrounding it over the same period, you can use this dashboard here:



Of course most of this can be viewed in the CDC viz and I didn't want to duplicate too much effort....


The CDC was focused on how drug deaths have been steadily increasing year-to-year so I decided to change up the bottom graph to show relative change over time... what PERCENTAGE were drug deaths going up year-to-year and is there any difference in Colorado in that regard? I then came up with the following dashboard:
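A minimal sketch of that relative-change calculation, assuming a simple mapping of year to death rate with no zero values. The function name and the sample rates are placeholders for illustration, not CDC figures.

```python
def year_over_year_pct_change(rates_by_year):
    """Return {year: percent change versus the previous year in the data}."""
    years = sorted(rates_by_year)
    changes = {}
    for previous, current in zip(years, years[1:]):
        old = rates_by_year[previous]
        new = rates_by_year[current]
        changes[current] = (new - old) / old * 100
    return changes

# Placeholder rates (deaths per 100,000), NOT real CDC values.
example_rates = {2012: 10.0, 2013: 12.0, 2014: 12.6}
pct_changes = year_over_year_pct_change(example_rates)
```

The first year has no entry because there is nothing earlier to compare it against. Plotting these values for one state next to the national series is what makes a slower-than-usual increase stand out.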


Now while this is just one year's worth of data, the smaller uptick in drug-related deaths in Colorado in 2014 is SIGNIFICANT. This is officially the slowest increase since 1999 and WAY below the national average! This is a key thing as Colorado has been (as mentioned previously) among the top states for drug related deaths per capita for the past few years. One would tend to think that upward trend would continue as it has nationally, but in 2014, while it DID INCREASE, it was the smallest increase in 16 years! Now correlation doesn't equal causation, but this data can be revisited later for other states that adopted recreational marijuana policies in 2015 when that data becomes more readily available! As for our earlier question of whether legalization reduced opioid/heroin deaths... that's hard to say, but as those are currently the most likely cause of death among illegal drugs, we can assume that those overdoses are reflected in this slower increase.

Me at my friend's places after they go to Colorado for "hiking"


I hope you found this data interesting. If so please comment/like/share it out on social media. As always if you'd like to say something feel free to comment below or to hit me up on twitter @wjking0. If you have a question you'd like Viz'ed out as part of my #1YearOfViz please hit me up and let me know! Thanks to James for the question this week and I'm sorry more data wasn't available to get a more robust answer!

Wednesday, September 14, 2016

Live Fast, Die Young... Celebrity Birthdays/Deathdays!



Given the rash of celebrity deaths in 2016 I got to thinking about the longevity of celebrity. Not the celebrity itself or the time someone is famous, but the longevity of actual celebrities' lives themselves.


Of course, the first trick is figuring out what constitutes a celebrity. I thought about looking at a list on Wikipedia of the top 100 rock stars or something like that, but none of that seemed like a reliable list with enough historical data to do a real analysis on. That's when I found www.famousbirthdays.com and realized that I could do a whole lot with the data set they were using!

When I realized they had WHAT people were "famous for" I could have just died... they even had what I will go on to classify as "new media" stars in it (people like YouTube stars, Instagram celebs, etc). This was pretty much my face when I started getting into the thick of it:

I started wondering not only if celebrities were dying younger today than they used to but also if I could figure out any trends based on "type" of celebrity.

Let's hop into the data!

First off let's look at the nature of the celebrities on FamousBirthdays.com. The big thing to remember is the LOWER the number the HIGHER the celebrity "value". I.e. "1" would be the top celebrity on the site (which is currently Justin Bieber by the way).


For all those people out there who think astrological signs play into personality I have this breakdown of astrological signs:

Cancers are the largest contingent of "famous" people at 8.9% of the celebrities on the site, even though they make up only 8.5% of the US population.

But you didn't come here for astrology....
Dave Coulier is almost as awesome as Bob Saget


First let's look at the age spread in the population of the data both currently and at time of death:

The median age at the time of death for this group is 51.5 years old. When we think of celebrities though we tend to think of the Kurt Cobains, the Chris Farleys, and other young talent killed because of choices or lifestyle. Given that roughly 52 is still far younger than the median death age of 78.6 here in the United States, we could chalk it up to all kinds of external factors like diet, lack of sleep, etc.

So finally here's the big chart, here's the proof in the puddin' that, YES, celebrity is killing people quicker now than ever before. You can see at the bottom of the following chart that the average age of celebrities dying over time has gotten younger and younger... younger now than I honestly thought would be reasonable. I believe the main cause of this extremely low age of death is that a LARGE chunk of this data set is taken up by "New Media" (as I've defined it) stars, and because these stars tend to be of a younger generation it's skewed more recent data towards younger trends in death. Obviously though that's not entirely the case, as if you mouse-over areas below such as "Musician" you'll notice that the trend towards younger deaths has been going for almost 100 years!


The average lifespan in the United States increases by approximately 1 year for every decade while the average lifespan of "celebrities" tends to drop by about 8 years for every decade!
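One way to put a number on a "years per decade" trend like this is an ordinary least-squares slope of age at death against year of death, scaled to ten years. This is only a sketch under that assumption: the post doesn't spell out how its per-decade figure was computed, and the data points below are invented for illustration.

```python
def age_trend_per_decade(points):
    """Least-squares slope of age at death vs. year of death, per 10 years.

    points is a list of (year_of_death, age_at_death) pairs.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_year = sum(year for year, _ in points) / n
    mean_age = sum(age for _, age in points) / n
    spread = sum((year - mean_year) ** 2 for year, _ in points)
    if spread == 0:
        raise ValueError("all points share the same year")
    covariance = sum((year - mean_year) * (age - mean_age)
                     for year, age in points)
    return covariance / spread * 10

# Invented example points, NOT values from the FamousBirthdays data.
example_points = [(1990, 70), (2000, 62), (2010, 54)]
trend = age_trend_per_decade(example_points)
```

A negative result means the age at death is falling over time, and a positive one means it is rising.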

Of course you may hope that your fave celebrity will live forever and you wouldn't be alone.

In the meantime I say that we cast whatever witchcraft we have to to keep Betty White around and telling jokes!


I hope you found this interesting. This data isn't quite as malleable as I hoped that it would be and it took a TON of data-shaping to get the large-scale celebrity groups created but I think the analysis has been worth it. If you have any questions or concerns you know you can always hit me up on twitter at @wjking0 or leave a comment below!

P.S. This is the second entry in my #1YearOfViz that I'm working on. You can check out the list of all published works here.

Wednesday, September 7, 2016

Churches versus Stoplights - "Small Towns" in KY





Growing up in a small town just outside of Charleston, WV and being part of the "Bible Belt", it was always a running joke in my hometown that there were (literally speaking) twice as many churches as there were stoplights in our small town. With 6 Churches and 3 Stoplights (giving the town the 3rd stoplight is pretty generous as it's on the very edge of town) the math was easy. I wondered though, could I do it on a larger scale? A scale of a larger city? Or a whole State?

The first trick would be to find the church data. I tried to think of religious databases, then I realized I was approaching it from the wrong direction. What is every church besides a religious organization...? A tax exempt organization! You know who likes taxes? The Federal Government! I knew that tax records are a matter of public record so a quick jaunt over to data.gov later and I'm swimming in tax exemption data!



I realized after I got this data that I'll do a future blog just about the non-profit data in the US, there is WAY more than I anticipated there being in that data. Luckily there is a tax exemption category for "churches" which includes churches, synagogues, mosques, and of course the Church of Scientology.

Next I wondered how in the world I was going to get the location of every single stoplight in KY. I hopped onto the Kentucky Transportation Cabinet and found their IT staff and shot one of them an email. BOOM! The county/latitude/longitude of EVERY SINGLE operating stoplight in the entire state! They were SUPER NICE about it too! I didn't even have to file an open records request! That is how you do public service ladies and gentlemen!

Thanks to the Kentucky Transportation Cabinet for making Kentucky a safer place to drive than this!


Interestingly one of the things I came to notice really quickly was that several entire COUNTIES within Kentucky contained not ONE single stoplight! This caused a "divide-by-zero" error in my calculations, which is why you'll see several that are "null" in the maps etc (which are by zip, not county). I figured out a different way to write my calculations to take into account the Null/Zero data for stoplights in certain zip codes.
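To show the problem in miniature, here is a hedged sketch of a churches-per-stoplight ratio that returns nothing instead of blowing up when a zip code has no stoplights. The function name is made up, and the actual calculation lives in Tableau, so treat this as an analogy rather than the formula I used.

```python
def churches_per_stoplight(churches, stoplights):
    """Return churches / stoplights, or None when the ratio is undefined."""
    if stoplights is None or stoplights == 0:
        return None  # no stoplights: the plain ratio cannot be computed
    return churches / stoplights

hometown_ratio = churches_per_stoplight(6, 3)   # the 6 churches, 3 stoplights case
no_lights_ratio = churches_per_stoplight(4, 0)  # a zip code with zero stoplights
```

Returning `None` keeps the zip code in the data set as a visible gap rather than crashing the whole calculation.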

Let's get into the data!

Where ARE all these things!? Check this out and click through the tabs to see the locations and densities of churches and stoplights throughout Kentucky!


Next is a little Tableau Story showing the True/False status of "Small Towns" by Zip. Red represents small towns and blue are "Big City" towns. The next tab contains a more granular breakdown of "levels" of smallness. Finally there is a chart showing "largest" to "smallest" using a difference over sum equation to normalize for the total number of churches/stoplights and keep it relative! Where does your hometown or birthplace fall!?
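The difference over sum idea can be sketched as (churches - stoplights) / (churches + stoplights), which always lands between -1 and 1 and only breaks when a zip code has neither churches nor stoplights. The function names and the "score above zero means small town" cutoff are my assumptions for illustration, not necessarily the exact rule behind the Story.

```python
def smallness_score(churches, stoplights):
    """Difference over sum: 1.0 is all churches, -1.0 is all stoplights."""
    total = churches + stoplights
    if total == 0:
        return None  # nothing to compare in this zip code
    return (churches - stoplights) / total

def is_small_town(churches, stoplights):
    """Assumed rule: more churches than stoplights means "small town"."""
    score = smallness_score(churches, stoplights)
    return None if score is None else score > 0

hometown_score = smallness_score(6, 3)   # 6 churches, 3 stoplights
no_lights_score = smallness_score(4, 0)  # zero stoplights is no longer a problem
```

Unlike a plain churches-per-stoplight ratio, this score is still defined when the stoplight count is zero, and it stays comparable between zip codes of very different sizes.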





Ultimately what it all boils down to is that there are 504 zip codes with relevant data in the state of Kentucky and of those 185 are "big towns" in Kentucky and 319 are "small towns".

I know this data may not seem like much but it's been a LONG time in preparation and presentation. As always if you have any questions or anything hit me up on twitter at @wjking0!

P.S. This is the first in what I'm going to call my #1YearOfViz where I'm going to try to do a visualization EVERY SINGLE WEEK. I can't promise I'll always publish on the same day of the week or time but right now I'm looking at either Mondays or Wednesdays as my "publish days". If anyone knows any newspaper contacts or data journalism contacts that are looking for fun data related news stories have them tune in and get in touch! Also if you have any suggestions or thoughts on what you'd like to see over the next 52 weeks of viz give me a shout or leave a comment below!


Friday, July 8, 2016

Police Shootings 2015-2016



I don't have the time or the drive to make this "pretty". It shouldn't be "pretty", it isn't. I wanted to get FACTS out to people because it ultimately isn't about 1 or 2 individuals getting killed out there, it's about the systemic racism of police forces in America. I pulled this data from the Washington Post's great viz about this which you can find here.

I'll post the viz's below and you can play around with the morbid data yourselves but I'd like to point out a few things that came to light for me:
  • UNARMED Black people in America are 2x more likely to be killed by a police officer than a White person (UNARMED Hispanic Americans are 1.5x more likely than White Americans to be killed)
    • If you are UNARMED, Black, and show signs of mental illness you are 4.6x more likely to be killed by a police officer than someone who is UNARMED, White, and showing signs of mental illness
  • UNARMED Black Americans are 3.39x more likely to be killed than Whites in a state of "Not Fleeing" and not "Attacking" (Hispanics are 2.18x more likely)
  • The approximate US Population is as follows from the 2010 US Census:
    • White American 223,553,265 72.4 %
    • Black American 38,929,319 12.6 %
    • Asian American 14,674,252 4.8 %
    • American Indian or Alaska Native 2,932,248 0.9 %
    • Native Hawaiian or other Pacific Islander 540,013 0.2 %
    • Some other race 19,107,368 6.2 %
    • Two or more races 9,009,073 2.9 %

That is to say there are literally more than 5x as many "White" people as "Black" people in America. So regardless of the individual stories, are you going to tell me why the number of UNARMED people getting killed in America is almost equal between these two racial groups? No really, I'll wait.
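The population comparison can be checked directly from the 2010 Census figures listed above. The sketch below also includes a generic per-capita rate ratio helper, which is my own illustration: it is not the calculation behind the multipliers in the bullets (those compare UNARMED vs ARMED shares within each group), and the count passed in is a placeholder, not a number from the Washington Post data.

```python
# 2010 US Census figures quoted in the list above.
WHITE_POPULATION = 223_553_265
BLACK_POPULATION = 38_929_319

population_ratio = WHITE_POPULATION / BLACK_POPULATION  # a bit over 5.7

def per_capita_rate_ratio(count_a, population_a, count_b, population_b):
    """How many times higher group A's per-person rate is than group B's."""
    return (count_a / population_a) / (count_b / population_b)

# Placeholder: the SAME count for both groups, to show that equal counts
# in groups of unequal size imply very unequal per-person rates.
same_count = 100
equal_count_ratio = per_capita_rate_ratio(
    same_count, BLACK_POPULATION, same_count, WHITE_POPULATION
)
```

With identical counts the rate ratio collapses to the population ratio, which is the arithmetic point of the paragraph above.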

All of this said, I believe that police officers risk their lives for us every single day. The large majority of those people are heroes in every sense of the word, but I feel that the systemic racism in America causes an "other" reaction when dealing with someone not of your own race and unfortunately in quick life-or-death moments those systemic feelings can cause people to make tragically wrong decisions. Are there racist cops out there? Certainly. Should we treat all cops like racists? Of course not. If you'd like to hear a great video that I feel encapsulates my feelings check out Trevor Noah's take on the whole thing:



Now let's get on to the Viz, like I said it isn't pretty or super-cleaned-up, it's just the facts. These percentages represent how many UNARMED people of that particular group were killed vs ARMED members of that group. When I say UNARMED I excluded ALL weapons, they had no guns, no knives, no shovels, nothing in their hands. I urge you to mess with the filters to find your own truths in this data.



This last one is just a map so you can see city-by-city and state-by-state:




As always if you have any questions about this dataviz or any other please feel free to hit me up on twitter @wjking0. Try to stay safe out there everyone and if you know someone who is struggling with all that's happened now, just offer them a shoulder to cry on and a hug. *hugs to you all*

Monday, June 27, 2016

Instagram In My Hood (1 Year of EVERYONE's Lexington, KY Instagram Posts)



PREFACE: This page contains LARGE-SCALE dataviz! It will NOT work on your phone! Walk or run to a desktop/laptop/tablet computer to view the dataviz properly formatted!

A little over a year ago I came across an amazing IFTTT (If This Then That) recipe for "Instagram in My Hood" and I thought "well I'm going to look at this just for the name..." and what I found was fantastic! It was an IFTTT recipe for cataloging ALL the geo-tagged Instagram posts within a region!

Unfortunately, Instagram's usage policy changed and now those location-based IFTTT recipes will no longer function due to changes in their API. BOOOO! =/

When I checked to see what all I'd gathered since turning it on I found it had run from March 26th, 2015 until June 1st, 2016 so a GOOD chunk of data! Of course I would have liked to run it for multiple years to see if trends change or if predictors held true but alas, that's not the world we live in. Instead I can show you when and where certain people talk about certain things in Lexington!

Let's talk for a second about what this data IS:

  • PUBLIC Instagram posts
  • GEO-LOCATED posts
  • Limited to WITHIN New Circle Road in Lexington, KY (this was approximately the limit on the area I could cover with the IFTTT Instagram API call).

What the data is NOT:
  • PRIVATE Instagram posts
  • NON GEO-LOCATED posts
Interestingly enough if you choose to geo-tag an Instagram post that makes it public regardless of settings (essentially because you are "tagging" a place). 

First let's look at posts over time and by hour-of-day and day of week. Please note that the days where there are only 6-10 posts are ones where Instagram and IFTTT had some technical glitches. Also notice that if you're interested in a particular hashtag or word you can search the text content of the Instagram posts to look for frequency with the search bar on the right of this viz screen.


You can see that (as you would expect) Friday, Saturday, Sunday are the largest post days-of-the-week but I thought Thursday (because of Thursday Night Live) might actually be the next highest day-of-week. Surprisingly the next highest is actually Tuesday for some reason! I haven't done a deeper dive into the data to figure out why yet. If anyone has any suggestions let me know!

I realized that I could figure out the average posts-per-day for a place, but some places had tons of posts per day (I'm looking at you Wild Fig! ;-) ). So I decided to scale the size of the dot on the following image to the number of posts per day and then use the count of distinct users to help bring a "pop" to the places that large numbers of different users are actually posting about/from.
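Roughly, the two numbers behind each dot can be sketched like this, assuming posts-per-day means total posts divided by the number of days in the collection window. The function name, usernames, the second place name, and the tiny sample are all hypothetical.

```python
from collections import defaultdict

def summarize_places(posts, total_days):
    """Return {place: {"posts_per_day": float, "distinct_users": int}}.

    posts is an iterable of (place, username) pairs.
    """
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    post_counts = defaultdict(int)
    users = defaultdict(set)
    for place, username in posts:
        post_counts[place] += 1
        users[place].add(username)
    return {
        place: {
            "posts_per_day": post_counts[place] / total_days,
            "distinct_users": len(users[place]),
        }
        for place in post_counts
    }

# Hypothetical two-day sample.
sample_posts = [
    ("Wild Fig", "user_a"),
    ("Wild Fig", "user_a"),
    ("Wild Fig", "user_b"),
    ("Some Brewery", "user_c"),
]
place_summary = summarize_places(sample_posts, total_days=2)
```

In the viz, `posts_per_day` would drive the dot size and `distinct_users` the color "pop", so a place with one very chatty account looks different from one that many people post from.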


The next thing I looked at is WHO is posting and where do particular users post from the most?

I know this is a little messy but given the number of users I wanted some color variation (highlighting didn't seem to work as well without it). You can enter a username or select from the list of names below sorted by most frequent posters. If you mouse-over the name or the bar representing their number of posts it will show you a highlight on the map of all the places that particular user posts from around Lexington.





Finally I replicated some of the functionality of Instagram's search by doing a text search as well as adding mapped locations where that thing is mentioned. Below you can see "Beer" as the search term and you'll notice it corresponds to bars but more specifically breweries in Lexington! Imagine if you could do this globally with Instagram and you could find the most talked about bars in a town!


As always if you have any questions about this dataviz or any other please feel free to hit me up on twitter @wjking0

And also this is totally how I feel when I spend the majority of my birthday writing up dataviz blog posts, playing video games, and eating donuts from the awesome North Lime Coffee and Donuts. =D

Thursday, June 2, 2016

Kentucky School Vaccination Rates (2015)



Let's talk about vaccines. First off, I'm NOT going to have a debate about how effective or dangerous vaccines are. They're both effective AND safe. I've crunched the numbers for the amount of things like mercury (Thimerosal actually) contained in vaccines and basically if you've eaten fish in the last year or two you've consumed more actual mercury than in all your childhood vaccines combined.

OK, now that we're done with that... let's talk about vaccination rates! Contrary to popular belief MOST of the world is vaccinated!
Rates of measles vaccination worldwide
Turns out that even in super-rural and third-world countries people will travel great distances to get their children vaccinated.

I stumbled across the Student Health Data provided by the Kentucky Department of Education and thought, "Man, I wonder how many kids in this state go unvaccinated?" The same data can also answer other questions, such as what the average BMI of school kids is by grade and county.

Turns out more kids are vaccinated than I expected when I started crunching through the data! Good job Bluegrass! To let you know how these numbers were calculated I used the enrollment number of each school and just did a little division with the other variables represented. No fancy-dancy math needed here! To be clear the data comes from the 2015 school year.

The classification for the numbers you're going to see here may need a little defining.


  • "Grade"
    • 0 = 5-6 years old 'Preschool' (pre-1st Grade)
    • 6 = 12-13 years old 'Middle School' age
  • Vaccinated Definitions
    • 'Vaccinated' = Fully Vaccinated and Up-To-Date on Boosters
    • 'Non-Vaccinated Missing' = No Vaccines and No Boosters
    • 'Non-Vaccinated Expired' = Previously Vaccinated but did not receive booster shots
    • 'Non-Vaccinated Religious' = Vaccinations not applied for "religious reasons"
    • 'Non-Vaccinated Medical' = Vaccinations should not be applied to these individuals likely because of immuno-compromising diseases or treatments (such as AIDS or chemotherapy)
    • 'Non-Vaccinated Provisional' = Vaccines may not be completely up-to-date and/or may be being delivered at a staggered rate for medical reasons but are planning to be delivered on a particular schedule.
  • "In/Out of Independent School"
    • In = Independent/Private School System
    • Out = Public School System
  • "District"
    • For most senses this is represented as the county in which the school resides but excludes Independent School systems
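The "little division" mentioned earlier amounts to each category count divided by the school's enrollment. Here is a minimal sketch using the category names defined above; the function name and the counts are made up for illustration and are not taken from the Kentucky Department of Education data.

```python
def category_percentages(enrollment, counts):
    """Return {category: percent of enrolled students in that category}."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return {category: count * 100 / enrollment
            for category, count in counts.items()}

# Made-up counts for a hypothetical school with 200 students.
example_counts = {
    "Vaccinated": 180,
    "Non-Vaccinated Expired": 12,
    "Non-Vaccinated Religious": 2,
    "Non-Vaccinated Medical": 1,
}
school_rates = category_percentages(200, example_counts)
unaccounted = 100 - sum(school_rates.values())
```

If the categories don't add up to 100%, the remainder (`unaccounted` here) is a quick way to spot schools where part of the student body is missing from the data.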

Let's jump right in to the data!





For those of you who would prefer a sort-able list to see where your county falls in the scheme of things you can also use the following Tableau Story to click through... also feel free to click around and sort any of these fields you would like to!




I know there's not a ton of interactivity on these Viz's nor a lot of differentiation but I wanted to just share that the percentage of immunizations in KY was surprising to me. The big thing is that there are gaps in childhood immunization which could easily be prevented. The prime thing is keeping children current on vaccines.

The trend in the data from my perspective is that in almost every category other than "Expired", non-vaccination rates go down between grades 0 and 6.

To summarize here are a few things I found interesting:

  • The majority of KY children who are susceptible to these types of infections are ones who have not received booster shots so they fall into the "Expired" vaccine category
    • The largest change in any group is in the "Expired" group
    • The share of students lacking updated booster shots increases by about 4.869% from grade 0 to grade 6
  • In virtually EVERY other category (save Provisional which increases by 0.010%) all other reasons for non-vaccination go down rather drastically between grades 0-6
  • Looking at the difference between Independent (private) and Public schools I saw very little difference on most issues and didn't feel it was relevant to look at it with this differentiation included. A few things worth noting:
    • Independent schools do start with a higher average of students with religious and provisional exceptions
    • By grade 6 Independents retain almost exactly the same % of vaccinated students
      • Expired %s go way up (3x approximately)
      • Missing and Religious %s go down
  • The data in some places is missing a fairly large number of students
  • Bell and Bath Counties are both VERY low as far as full vaccination rates (this could be because of missing data, which we have to count as a loss)
  • Breathitt County has 15.8% of their students non-vaccinated due to legitimate medical reasons




Places like Breathitt County are the reason that the idea of herd immunity is very important! Unfortunately the rest of their stats aren't looking very good either, the big problem is the total number of enrollment there is very low so the likelihood that those immuno-compromised students will interact with non-vaccinated students is very high. Finally I just wanted to share out with you this little gif explaining why herd immunity is important in protecting people:

As always for comments or questions comment below or hit me up at @wjking0 on Twitter!