How to: Using OpenRefine, Excel to clean messy, user-submitted data


Chris Essig

Note: This is cross-posted from Lee’s data journalism blog.

We at The Courier ran into an interesting problem last week that I’m sure other news organizations run into as well.

Every year, The Courier puts on a Best of the Best competition. We select a series of categories like food, dining, jewelry stores, etc., and ask our readers to pick the best business in town for each category.

The businesses with the most votes in each category are rewarded by being featured in the newspaper. They also get plaques they can hang at their business to show off their nifty award.

We have several forms that people use to submit their votes. They are allowed to type in their favorite businesses instead of picking from a list of choices.
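To see why typed-in votes are hard to count, here is a minimal Python sketch (not the OpenRefine or Excel workflow this post is about). It assumes the votes arrive as a plain list of strings and collapses spelling variants with a simplified "fingerprint" key: lowercase, strip ASCII punctuation, then sort and de-duplicate the words. The names `fingerprint`, `tally_votes` and the sample votes are hypothetical, made up for illustration.

```python
import string
from collections import Counter, defaultdict


def fingerprint(name: str) -> str:
    """Build a crude matching key so spelling variants land in one group."""
    cleaned = name.strip().lower()
    # Drop ASCII punctuation only; curly quotes would need extra handling.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate words so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def tally_votes(votes):
    """Count votes per group, labeling each group with its most common spelling."""
    groups = defaultdict(Counter)
    for raw in votes:
        key = fingerprint(raw)
        if key:  # skip blank submissions
            groups[key][raw.strip()] += 1

    totals = {}
    for variants in groups.values():
        label = variants.most_common(1)[0][0]
        totals[label] = sum(variants.values())
    return totals


# Hypothetical sample votes, not real Best of the Best data.
votes = [
    "Joe's Diner",
    "joes diner",
    "Joe's Diner",
    "JOE'S DINER ",
    "Main Street Cafe",
    "Main Street Cafe",
]

print(tally_votes(votes))
```

A key like this only catches differences in case, punctuation and word order. Misspellings such as "Jo's Diner" would still end up in their own group and need a fuzzier comparison or a manual pass.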

We store their votes in a database. We then spit…
