Me when I hit Import.io's scraping limit and get banned (again)
I wanted to start this post by talking about the problems I've recently encountered while using Import.io. Multiple times now I have run into their scraping limit for "free" users and have been temporarily banned from using their services. One time I accidentally ran a cloud-based scrape as a test, but the scrape continued after I closed it, so I ended up running about a thousand queries instead of the 20 to 30 I wanted to run. Then this month I ran a scrape of over 10,000 queries (their new monthly limit for scraping on local clients). I was originally told (via their Facebook users group) that the Legacy client would be allowed unlimited scraping as long as it was done locally. This was apparently not the case.
I started out looking for a new scraping client by checking out several pages that list clients to use. But almost all web scraping services require monthly subscription fees, or have no local client available at a cheaper or free rate. That's when (and how) I discovered Octoparse!
/\ Me hugging Octoparse!
It is kind of magic! It took me over a week to really learn how to use it, but it has been extraordinarily worth it! The main tools in Octoparse go well beyond the tools in Import.io. Octoparse allows unlimited queries from the local client, and you only have to pay when you're using their cloud-based services, which is the way I suggested Import.io price their systems when support contacted me about my ban from their services.
Hey Octoparse, I just met you, and this is crazy, comment my blog, and sponsor me maybe!?
I only wish I had known about Octoparse earlier so that I could have saved myself around 12 hours' worth of work when I did the West Virginia State salary scrape a while back! What you will be looking at in this visualization is the first scrape that I have completed using Octoparse. The data came out incredibly clean and simple; my only complaint about Octoparse's export is that the CSV file is not directly readable by Excel when opened. It's really a minor complaint next to the awesome flexibility of the product, though! Let's get into the data!
I've settled in on my designs for salary-based dashboards with only a single year of data. I decided not to fix it since it's not broke and replicated the same types of dashboards I've done in the UK Salary Viz here and a little bit of the work I did in the WV State Salary Viz mentioned previously. The "Dots Dash" as I call it is really just a fun visual representation of all the people/years/money that goes into something like public education in one single county.
This next one is just Salary Over Time and Number of People Over Time, so basically: how many people are making approximately how much, how quickly do you see raises given, etc. If you'll notice, at the side this viz starts out with a filter of "Instructor" on it to show specifically teachers' salaries over time, as all teachers (I think) have 'instructor' as part of their titles. You can set this wildcard filter to whatever you'd like (ex. 'bus driver') to see how your or a friend's particular job prospects will look over time.
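If you'd rather poke at scraped data outside the dashboard, the wildcard filter idea can be roughly sketched in Python. This is only a minimal sketch under my own assumptions: the rows, the field names (`title`, `years`, `salary`), and both helper functions are made up for illustration and are not the real scrape or how the viz itself works.

```python
from collections import defaultdict
from statistics import median

# Made-up illustrative rows; the field names are assumptions, not the real export.
rows = [
    {"title": "Elementary Instructor", "years": 1, "salary": 40000},
    {"title": "HS Instructor - Music", "years": 1, "salary": 42000},
    {"title": "Bus Driver", "years": 1, "salary": 20000},
    {"title": "Elementary Instructor", "years": 5, "salary": 46000},
    {"title": "Bus Driver", "years": 5, "salary": 23000},
]


def wildcard_filter(rows, text):
    """Keep rows whose job title contains `text`, ignoring case."""
    needle = text.lower()
    return [row for row in rows if needle in row["title"].lower()]


def median_salary_by_years(rows):
    """Group salaries by years worked and take the median of each group."""
    by_years = defaultdict(list)
    for row in rows:
        by_years[row["years"]].append(row["salary"])
    return {years: median(salaries) for years, salaries in sorted(by_years.items())}


instructors = wildcard_filter(rows, "instructor")
instructor_medians = median_salary_by_years(instructors)
driver_medians = median_salary_by_years(wildcard_filter(rows, "bus driver"))
```

Swapping the filter text for any other part of a job title gives the same salary-over-time breakdown for that job.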
For the next story dashboard, I really wanted to look at how locations/grade-types pay different teachers. Do art teachers make more at Liberty than at Brian Station? How about music teachers at Elementary schools vs High Schools? Step through the story with the top tabs, and you can filter on the right and compare median salaries by location. I'd like to ultimately turn this into part of what I'll use for a future dashboard I'm going to work on that will compare test scores to teacher salaries for particular places... but this will have to do for this week! =D The last little section was just because I was curious how much principals make in general, and I was surprised (and glad) to see they make good money.
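The "compare median salaries by location" idea can also be sketched in a few lines of Python. Again, this is just a rough illustration under my own assumptions: the placeholder schools ("School A", "School B"), the salaries, and the `median_salary_by_location` helper are all invented for the example and are not figures from the actual data.

```python
from collections import defaultdict
from statistics import median

# Placeholder rows with invented numbers; not real schools or real salaries.
staff = [
    {"location": "School A", "title": "Art Instructor", "salary": 45000},
    {"location": "School A", "title": "Art Instructor", "salary": 51000},
    {"location": "School A", "title": "Music Instructor", "salary": 48000},
    {"location": "School B", "title": "Art Instructor", "salary": 47000},
    {"location": "School B", "title": "Music Instructor", "salary": 52000},
]


def median_salary_by_location(rows, title_contains):
    """Median salary per location for titles containing `title_contains`."""
    needle = title_contains.lower()
    grouped = defaultdict(list)
    for row in rows:
        if needle in row["title"].lower():
            grouped[row["location"]].append(row["salary"])
    return {location: median(salaries) for location, salaries in grouped.items()}


art_medians = median_salary_by_location(staff, "art")
music_medians = median_salary_by_location(staff, "music")
```

I went with medians here (like in the dashboard) because a single very high or very low salary at a small school would skew an average much more.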
This last dash is just the "big list" that a lot of people like to see... if you CLICK on a location or a job title the data to the right (medians/averages of salaries and years worked) will reformat to that highlighted selection. If you click on a job it will not be the medians/averages for that particular school (as each school doesn't have enough non-teaching staff to make that functional) so it reformats to show EVERYONE who shares that job title. You can also filter this list by name if you're looking for someone in particular's salary.
Finally, as the son of a public school teacher let me say to all of you out there doing the work every day...
As always, hit me up on Twitter @wjking0 or in the comments below for questions/concerns!