Big Data – Data Processing

There are many different areas of the architecture to design when looking at a big data project. As data is added to your Big Data repository, do you need to transform it or match it against other, disparate data sources? Can you handle the volume of data streaming into your Big Data framework, or can you focus mostly on processing the incoming data and choosing the right data store or warehouse? Here are the major elements we look at in an architecture, with a focus on Data Processing in this section.

Data now comes from more places than ever and needs to be connected to other data sets. This processing step is the most critical, so choosing the right tool is imperative. Some thoughts on the pros and cons of each option follow below. Advanced inSight has experience with many of the products below, including MapReduce, Hive on Tez, and Spark. Let us help you make the decision.
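To make "transform the data or match to other sources" concrete, here is a minimal sketch in plain Python. It is not tied to any of the products discussed on this page; the record layout, the field names (email, name, page) and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize_email(raw):
    """Transform step: trim whitespace and lowercase so keys compare equal."""
    return raw.strip().lower()


def match_records(crm_rows, web_rows):
    """Match step: inner-join two sources on the normalized email key."""
    # Index one source by the cleaned key so the join is a dictionary lookup.
    web_by_email = defaultdict(list)
    for row in web_rows:
        web_by_email[normalize_email(row["email"])].append(row)

    matched = []
    for row in crm_rows:
        key = normalize_email(row["email"])
        for web in web_by_email.get(key, []):
            matched.append({"email": key, "name": row["name"], "page": web["page"]})
    return matched


# Hypothetical sample data from two disparate sources.
crm = [
    {"email": " Ann@Example.com ", "name": "Ann"},
    {"email": "bob@example.com", "name": "Bob"},
]
web = [
    {"email": "ann@example.com", "page": "/pricing"},
    {"email": "carl@example.com", "page": "/docs"},
]

print(match_records(crm, web))
```

Only Ann appears in both sources once the keys are cleaned, so only her record is matched. At big data scale the same two steps apply, but the processing tool has to distribute the cleaning and the join across many machines.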

Pros and Cons of Data Processing Tools

MapReduce

Pros: handles any scale of data, reliable, lots of customization
Cons: hard to program against, slow
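
As a rough illustration of why MapReduce offers a lot of customization but is harder to program against, the sketch below imitates the map, shuffle and reduce phases for a word count in plain, single-process Python. It does not use the Hadoop API; the function names are hypothetical, and a real job would run these phases distributed across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["big data", "Big insight"]))
```

Even this simple count needs the developer to write separate map and reduce logic, which is where both the flexibility and the programming effort come from.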

Pig

Pros: scalable, reliable, some customization possible
Cons: still hard to program against, slow

Hive (on MapReduce)

Pros: scalable, reliable, easy SQL interface
Cons: slow (Hive on Tez is faster), little customization possible
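
To show what an "easy SQL interface" means in practice, the sketch below counts words with a single GROUP BY query. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it is not Hive, HiveQL differs from SQLite in its details, and the table name words and the helper word_counts_sql are hypothetical.

```python
import sqlite3


def word_counts_sql(words):
    """Count occurrences of each word with one declarative SQL query."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)", [(w,) for w in words]
        )
        # The query states what is wanted; the engine decides how to run it.
        query = "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY word"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(word_counts_sql(["big", "data", "big"]))
```

The trade-off matches the list above: the query is short and familiar to analysts, but there is little room to customize how the work is executed.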

Spark Core / Storm

Pros: lots of customization, in-memory processing
Cons: not reliable, hard to program against

Presto / Spark SQL

Pros: easy SQL interface, fast in-memory processing
Cons: not reliable (can run out of memory), little customization possible, best suited to smaller data sets
