This is part of my #1YearOfViz series! Check out the archive here: http://bourbonandbrains.blogspot.com/p/one-year-of-dataviz.html
WARNING! If you're offended by "bad" language, steer clear of this viz!
Originally, this week I was going to work on a viz about Girl Scout Cookies. I was hoping to find some sales numbers, but after some brief searches I realized that was a pretty fruitless endeavor; the closest I could come was looking at the Google Trends data for the different Girl Scout Cookie names. While interesting... it isn't exactly dataviz-worthy.
So, switching from the totally mundane and safe-for-work topic of Girl Scout Cookies (THIN MINTS FOREVER!), I flipped my mindset entirely and decided to look at Urban Dictionary. I was thinking back to an article I read about IBM's Watson being fed Urban Dictionary to help it learn slang; the data ended up having to be purged because the AI wouldn't stop swearing as part of its normal speech pattern.
I considered scraping it myself but wasn't sure how large a scrape that would be... when lo and behold, I found someone had already done the work for me! Huzzah!
I'm too lazy to scrape that much data!
- Merriam-Webster
  - Has ~225,000 definitions
    - Keeping the Urban Dictionary average, we can say Merriam-Webster has approximately 176,194 words
  - Estimates the number of English words to be around 1,000,000
    - This may be off by as much as 250,000
    - Includes names of chemicals and other "scientific entities"
  - Merriam-Webster's Unabridged International Dictionary contains 470,000 "entries" (I'm going to assume those are "words")
- Urban Dictionary
  - Avg 1.277 definitions per word
  - Avg word length 10.05 letters
  - Median word length 9 letters
  - Total number of definitions is 2,079,261
    - This contains phrases as well as words
  - 1,457,980 unique words/phrases
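The 176,194 figure comes from dividing Merriam-Webster's ~225,000 definitions by Urban Dictionary's average of 1.277 definitions per word. Here's a minimal sketch of that arithmetic, assuming (as the estimate does) that Merriam-Webster spreads its definitions across words at the same rate Urban Dictionary does. The function name is just something I made up for illustration.

```python
# Figures quoted in the list above.
MW_DEFINITIONS = 225_000   # approximate Merriam-Webster definition count
UD_DEFS_PER_WORD = 1.277   # Urban Dictionary average definitions per word

def estimate_word_count(definitions, defs_per_word):
    """Hypothetical helper: estimate distinct words as definitions / definitions-per-word."""
    return round(definitions / defs_per_word)

print(estimate_word_count(MW_DEFINITIONS, UD_DEFS_PER_WORD))  # 176194
```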
Before we actually get into the data, remember that I just manipulated the data into the viz and am not the author of any of it. If you're easily offended by slurs or bad words, now would be the time to check out another viz!
I limited the whole viz to the top 10,000 words/phrases by the summed difference between their upvotes and downvotes. Some words have multiple definitions, so the list looks slightly different if we use the Total instead of the Average up/down difference (with Total, "Sex" becomes the top word instead of the second). CLICKING ON ANY WORD will open a pop-up to that word's Urban Dictionary page, so you may need to allow pop-ups in your browser to go out to Urban Dictionary from within the viz!
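Why does Total vs. Average change the order? A word with several decent definitions can beat a word with one great definition on the total but lose on the average. Below is a rough sketch of both rankings, assuming one row per definition with upvote and downvote counts. The rows and function names are hypothetical toy data, not the real dataset, and this isn't necessarily how the viz tool computes it.

```python
from collections import defaultdict

# Hypothetical toy rows: (word, upvotes, downvotes), one row per definition.
# Made-up numbers for illustration only, NOT real Urban Dictionary data.
definitions = [
    ("alpha", 900, 100),  # one definition, net +800
    ("beta", 500, 50),    # beta has two definitions: net +450...
    ("beta", 450, 50),    # ...and net +400
]

def net_scores(rows):
    """Group each definition's (upvotes - downvotes) by word."""
    scores = defaultdict(list)
    for word, up, down in rows:
        scores[word].append(up - down)
    return scores

def top_words(rows, n, how="sum"):
    """Return the n words with the highest total ("sum") or mean ("average") net score."""
    scores = net_scores(rows)
    if how == "sum":
        key = lambda w: sum(scores[w])
    elif how == "average":
        key = lambda w: sum(scores[w]) / len(scores[w])
    else:
        raise ValueError("how must be 'sum' or 'average'")
    return sorted(scores, key=key, reverse=True)[:n]

print(top_words(definitions, 2, how="sum"))      # ['beta', 'alpha']
print(top_words(definitions, 2, how="average"))  # ['alpha', 'beta']
```

"beta" wins on total (850 vs. 800) but loses on average (425 vs. 800), which is the same kind of flip that moves "Sex" between first and second.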
Now, if we compare that to the trending words on Merriam-Webster, you may see a SLIGHT difference.
How I picture the people at Merriam-Webster right now.
This next viz is really just to let you play around and reformat the data however you'd like. You can change the X, Y, and Color axes to answer your own questions. I'm still limiting this to the top 10,000 words... ALL the words were just too many to really manipulate the data and click around to learn definitions!
Because Urban Dictionary uses a "defid" field that seems pretty sequential, I wondered what some of the first words were. Obviously many have been deleted, since only 37 of the first 100 defids are left. The first one still visible is ID #7, "Janky," which was posted December 9, 1999. The user Boomer is likely one of the first admins and has posted 19 items in total, most of them at the very beginning of the site.
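If you want to poke at the early defids yourself, here's a rough sketch of the idea: keep only rows with a defid from 1 to 100, sort them, and see what fraction survived. It assumes each row carries a numeric defid and a word; only defid 7 ("Janky") comes from the data, and the other rows are hypothetical placeholders.

```python
# Hypothetical rows of (defid, word). Only (7, "Janky") comes from the post;
# the rest are placeholders, not real Urban Dictionary entries.
rows = [
    (250, "placeholder_c"),
    (7, "Janky"),
    (42, "placeholder_a"),
    (88, "placeholder_b"),
]

def earliest_surviving(rows, first_n=100):
    """Return surviving definitions with defid in 1..first_n (sorted by defid)
    plus the fraction of those first IDs that still exist."""
    early = sorted((r for r in rows if 1 <= r[0] <= first_n), key=lambda r: r[0])
    return early, len(early) / first_n

early, survival = earliest_surviving(rows)
print(early[0])  # (7, 'Janky')
print(survival)  # 0.03
```

On the real dump, 37 surviving rows out of the first 100 defids would give a survival rate of 0.37.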
I hope everyone had as much fun kicking around in Urban Dictionary's data as I did! I know I learned some new swear words!
Who knew it was SO VERSATILE!
Of course, if you have any questions or concerns, please give me a shout in the comments below or via Twitter @wjking0! As usual, please share this if you found it fun or interesting!
How I felt after finishing this viz!