Wednesday, September 17, 2014

Python scripts to shorten column names, or to fetch Google Ngrams data

I've made a couple of new GitHub repos:

google_ngram_py, which lets you look up one- to five-word phrases in Google Ngram Viewer (which shows each phrase's frequency by year) from Python and returns the data as pandas DataFrames. For case-insensitive searches, the results are separated into parent and child (e.g. the parent is 'the (All)' and the children are 'the', 'The', 'THE').
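To make the parent/child layout concrete, here is a minimal sketch in plain Python. It is not the repo's code and skips both the network request and pandas. The function name `group_case_variants` and the sample frequencies are made up for illustration, and it assumes the '(All)' parent is simply the per-year sum of its case variants.

```python
from collections import defaultdict


def group_case_variants(series_by_phrase):
    """Group case variants of a phrase under a '<phrase> (All)' parent.

    series_by_phrase maps each phrase to a {year: frequency} dict.
    Assumption: the parent series is the per-year sum of its children.
    """
    groups = {}
    for phrase, series in series_by_phrase.items():
        label = f"{phrase.lower()} (All)"
        group = groups.setdefault(
            label, {"parent": defaultdict(float), "children": {}}
        )
        group["children"][phrase] = dict(series)
        for year, freq in series.items():
            group["parent"][year] += freq
    # Convert the defaultdicts to plain dicts before returning.
    return {
        label: {"parent": dict(g["parent"]), "children": g["children"]}
        for label, g in groups.items()
    }


# Made-up frequencies, not real Ngram data.
sample = {
    "the": {1900: 0.5, 1901: 0.25},
    "The": {1900: 0.125, 1901: 0.125},
    "THE": {1900: 0.0, 1901: 0.125},
}
grouped = group_case_variants(sample)
```

Here `grouped["the (All)"]["parent"]` holds the summed series, and `grouped["the (All)"]["children"]` holds one series per case variant.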

shorten_column_names, which finds the most common words in a list of phrases and abbreviates them. I used it to shorten World Bank column names, which sometimes run past 100 characters (e.g. population -> pop), but you could use it on any list of strings.
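The general idea can be sketched with the standard library alone. This is a rough illustration, not the repo's actual interface: the function names `most_common_words` and `shorten`, the sample column names, and the `tot` abbreviation are all assumptions made for the example.

```python
import re
from collections import Counter


def most_common_words(phrases, n=10):
    """Return the n most frequent words across all phrases, case-insensitively."""
    words = []
    for phrase in phrases:
        words.extend(re.findall(r"[A-Za-z]+", phrase.lower()))
    return Counter(words).most_common(n)


def shorten(phrases, abbreviations):
    """Replace whole words with their abbreviations.

    abbreviations maps lowercase words to replacements; matching ignores case.
    """
    if not abbreviations:
        return list(phrases)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in abbreviations) + r")\b",
        re.IGNORECASE,
    )
    return [
        pattern.sub(lambda m: abbreviations[m.group(0).lower()], phrase)
        for phrase in phrases
    ]


# Hypothetical column names in the style of World Bank indicators.
columns = [
    "Population, total",
    "Population, female (% of total)",
    "Urban population growth (annual %)",
]
common = most_common_words(columns, n=2)
shortened = shorten(columns, {"population": "pop", "total": "tot"})
```

The first function shows which words are worth abbreviating; you pick the abbreviations yourself and pass them to the second.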
