Big Data refers to large sets of complex data, both structured and unstructured, which traditional processing techniques and/or algorithms are unable to operate on. It aims to reveal hidden patterns and has led to an evolution from a model-driven science paradigm into a data-driven science paradigm. According to a study by Boyd & Crawford, it “rests on the interplay of:
(1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.
(2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.” IBM scientists mention that Big Data has four dimensions: Volume, Velocity, Variety, and Veracity.
Equipping an enterprise with a Big Data driven e-commerce architecture helps it gain extensive insight into customer behaviour and industry trends, and supports more accurate decisions that improve just about every aspect of the business, from marketing and advertising to merchandising, operations, and even customer retention.
Big Data should not be looked at merely as a new ideology but rather as a new environment, one that requires a new understanding of data collection, a new vision for IT specialist skills, new approaches to security issues, and a new understanding of efficiency in any sphere. This environment, when analyzed and processed properly, enhances business opportunities; however, the risks involved should be taken into account when collecting, storing, and processing these large data sets.
First, before we can get anywhere, we will need to make clear what exactly “Big Data” is. The concept of Big Data has been around for more than 10 years, but there has never been a precise definition, and perhaps one is not required. Data engineers see Big Data from a technical and system perspective, whereas data analysts see Big Data from a product perspective. Big Data, however, cannot be summarized as a single technology or product; rather, it comprises a comprehensive, complex discipline surrounding data. We can look at Big Data from two aspects, as shown in Fig. 2.1: the data pipeline (the horizontal axis in the figure) and the technology stack (the vertical axis in the figure).
Fig. 2.1 Essence of Big Data
Social media has always focused on emphasizing, even sensationalizing, the “big” in Big Data. This does not really show what Big Data actually is. We would prefer just to say “data”, because the essence of Big Data lies in the concept and application of “data”.
In Duhigg’s new piece for the New York Times, a father finds himself in the uncomfortable position of having to apologize to a Target employee. Earlier, he had stormed into a store near Minneapolis and complained to the manager that his daughter was receiving coupons for cribs and baby clothes in the mail.
It turned out Target knew his daughter better than he did. She really was pregnant. It was a fact Target had obtained after carefully collecting information about her. The company, like many others, assigns each shopper a unique Guest ID. Every time you buy toilet paper with a credit card, visit its website, fill out a survey or, really, interact with the retailer in any way, Target assigns this information to that ID.
Major disruptions in your life (weddings, job changes and especially babies) are gold mines to retailers, who finally have a chance to break into your well-worn routines. That’s why Target hires statisticians like Pole to sift through data to try to identify when someone is approaching or undergoing one of these major life events.
Here's how Target knew the young woman was pregnant:
Pole, a data analyst for the American retailer Target Corporation, developed a pregnancy-prediction model; as the name indicates, customers are assigned a “pregnancy prediction” score. Furthermore, Target was able to gain insight into “how pregnant a woman is”, that is, how far along she is. Pole initially used Target's baby-shower registry to gain insight into women's shopping habits, discerning how those habits changed as the women approached their due dates, which women on the registry had willingly disclosed. Following the initial data collection phase, Pole and his team ran a series of tests, analyzed the data sets, and identified patterns that could be of use to the corporation.
Within the model, each customer is assigned a unique number, internally classified as a Guest ID number. Linked to this number are the shopper's purchased products, previous methods of payment (coupons, cash, credit cards, etc.), and virtual interactions (clicking links in e-mails sent by Target, customer service chat, etc.). By combining these data sets with demographic data that is available for purchase from information service providers such as Experian and the like, Target is able to direct its marketing strategy effectively. In this case, it can capture a pregnant shopper’s attention by sending a coupon via e-mail or post, choosing the channel by analyzing which methods were effective previously. Additionally, by obtaining shoppers' demographic data, Target is able to trigger a shopper’s habit by including other products that may not commonly be purchased at Target, such as milk, food, and toys. This gave Target a valuable advantage over competitors during the early years, when the model had not yet been revealed to the public.
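The idea of linking every interaction to a Guest ID and then deriving a score from it can be illustrated with a minimal sketch. This is not Target's actual model: the event format, the `GuestProfile` fields, the signal products, and the point values in `SIGNAL_POINTS` are all assumptions invented for illustration. The sketch only shows the general pattern of grouping events by ID and scoring each profile.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical signal products and point values, invented for illustration.
# They are NOT taken from Target's real model.
SIGNAL_POINTS = {
    "unscented lotion": 30,
    "vitamin supplements": 25,
    "cotton balls": 15,
}


@dataclass
class GuestProfile:
    """Data linked to one Guest ID (field names are assumptions)."""
    purchases: list = field(default_factory=list)
    payment_methods: set = field(default_factory=set)
    interactions: list = field(default_factory=list)


def build_profiles(events):
    """Group (guest_id, kind, value) events into one profile per Guest ID."""
    profiles = defaultdict(GuestProfile)
    for guest_id, kind, value in events:
        profile = profiles[guest_id]
        if kind == "purchase":
            profile.purchases.append(value)
        elif kind == "payment":
            profile.payment_methods.add(value)
        elif kind == "interaction":
            profile.interactions.append(value)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return dict(profiles)


def prediction_score(profile):
    """Sum the points of each distinct signal product bought, capped at 100."""
    bought = set(profile.purchases)
    points = sum(p for product, p in SIGNAL_POINTS.items() if product in bought)
    return min(points, 100)
```

A real system would learn such weights from historical data (for example, the registry data described above) rather than hard-coding them; the fixed table here only keeps the example short.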
It's tempting to imagine that predictions will just get better and better as our ability to gather data gets more and more powerful.