In the previous post we started our first Amazon EMR cluster with Hadoop, Hive and a couple of other tools installed. We saw even our first examples of the Hive query language to retrieve the available tables and databases.
In this post we’ll continue with the basics of Hive. We’ll create a database and a table, fill it with some raw data and execute a couple of aggregation scripts on the data. We’ll finally store the aggregated values in another database.
View original post 1,326 more words