Big Data – Data Warehousing

There are many areas of the architecture to design in a big data project. Are you looking for a fast, reliable data warehouse that does not require the data massaging, aggregation and timing of a traditional data warehouse? This section covers the major elements we look at in an architecture, with a focus on data warehousing.

Data now arrives from more places than ever, and faster. New big data warehousing technology can run many queries in parallel, eliminating the need for pre-aggregation and cumbersome processing.

These big data warehouses are fast because their underlying approach and architecture differ in these ways:

  • Queries run in parallel
  • Memory is used efficiently
  • Data is distributed across disks
  • Only the requested columns are returned
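
To make the points above concrete, here is a minimal sketch in Python of how a parallel, columnar query might work in principle. It is an illustration only, not how any particular product is implemented: the SLICES sample data and the scan_slice and parallel_sum names are hypothetical, and threads stand in for the separate nodes a real warehouse would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: each "slice" stands in for one node's share of a
# table, stored column by column rather than row by row.
SLICES = [
    {"region": ["east", "west"], "sales": [100, 250], "notes": ["a", "b"]},
    {"region": ["east", "east"], "sales": [50, 75], "notes": ["c", "d"]},
    {"region": ["west", "north"], "sales": [300, 20], "notes": ["e", "f"]},
]


def scan_slice(slice_columns, group_col, value_col):
    """Aggregate one slice, touching only the two columns the query names."""
    partial = {}
    for key, value in zip(slice_columns[group_col], slice_columns[value_col]):
        partial[key] = partial.get(key, 0) + value
    return partial


def parallel_sum(slices, group_col, value_col):
    """Scan every slice in parallel, then merge the partial totals."""
    workers = max(1, len(slices))  # ThreadPoolExecutor needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda s: scan_slice(s, group_col, value_col), slices)
        )
    totals = {}
    for partial in partials:
        for key, value in partial.items():
            totals[key] = totals.get(key, 0) + value
    return totals


# Roughly equivalent to: SELECT region, SUM(sales) FROM t GROUP BY region
totals = parallel_sum(SLICES, "region", "sales")
print(totals)
```

The "notes" column is never read, which is the benefit of columnar storage, and the totals are computed from the raw rows at query time, so no pre-aggregated summary table is needed.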

There are some thoughts below on the pros and cons of these tools. Advanced inSight has experience with many of the products below with a concentration on Amazon Redshift. Let us help you make the decision.


Pros and Cons of Data Warehousing Tools

Amazon Redshift

  • Pros: Standard SQL database with MPP (massively parallel processing) features that allow it to scale. Supports SQL tools
  • Cons: Based on an older PostgreSQL version. Significant management required


  • Pros: MPP awareness built directly into the BLU columnar query engine. Supports SQL tools
  • Cons: Some confusion over the many IBM database offerings (Big SQL; Bluemix; BigInsights; Netezza)

HP Vertica

  • Pros: Good value for investment. Supports integration with Hadoop with Vertica for SQL
  • Cons: Market adoption is a question. The pool of developers working with it may be smaller than for competitors

Microsoft SQL Data Warehouse

  • Pros: Familiar T-SQL and Power BI for querying across relational data in your data warehouse
  • Cons: Some reported back-end infrastructure issues. Confusion over so many offerings

Google BigQuery

  • Pros: Eliminates SQL overhead. Good for custom implementations and teams who dislike SQL
  • Cons: Does not use standard SQL and does not support standard SQL tools