Microsoft BigData

Microsoft BigData/HDInsight Datawarehouse

Introduction

Micorosft are developing a BigData Solution based on Hadoop. Currently is Hortonworkd HDP 1.1 used but this might change. What’s good is that HDP has good support for ODBC which makes it possible to connect to the data warehouse from a vareity of reporting solutions (foir instance Excel).

Installation

Steps:

  1. Launch a AWS EC2 instance, I’m using m1.xlarge for now
  2. Connect and install HDInsight using this http://www.microsoft.com/en-gb/download/details.aspx?id=35397
  3. Install Hortonworks ODBC driver (I’m using 64 bit) - http://hortonworks.com/thankyou-hdp12/#addon-table
  4. Check that hive is up and running. Open a PowerShell session
    • cd C:\Hadoop\hive-0.9.0\bin
    • ./hive -h localhost -p 10000
    • show tables- empty result but there should not be any errors
  5. Open the task manager, check that CPU and Memory is ok
  6. Open the ODBC manager
  7. Navigate to System DSN->select Sample Hortonworks…->Configure
  8. Now enter Host: localhost, Hive Server Type: Hive Server 1
  9. Hit test
  10. Go to the Task Manager and click on the CPU column

Java consumes 30% CPU and ODBC 15% and ODBC manager does not respond.

Test with MapR ODBC Connector

Download the MapR ODBC driver from here: http://www.mapr.com/doc/display/MapR/Hive+ODBC+Connector

Do the same test as above. Same result…