
2.2 Big Data Hastens the Age of Intelligence

2.2.1 From Machine Intelligence to Artificial Intelligence

What is Machine Intelligence? Machine Intelligence is advanced computing that enables a technology (a machine, device, or algorithm) to interact with its environment intelligently, meaning it can take actions to maximize its chance of successfully achieving its goals. The concept of Machine Intelligence highlights the intersection of Machine Learning and Artificial Intelligence, as well as the broad spectrum of opportunities and approaches in the field.

What is Artificial Intelligence? Artificial Intelligence is a discipline of computing science that allows a system to complete tasks we typically associate with cognitive functions, such as reasoning, strategizing, and problem-solving, without requiring an explicit solution for every variation. These algorithms, processes, and methodologies allow a computer system to perform tasks that would normally require advanced intellect.

2.2.2 Statistics and Big Data

What is statistics and what is Big Data?

Statistics is the science of collecting, analyzing, and understanding data, and of accounting for the relevant uncertainties. As such, it permeates the physical, natural, and social sciences, public health, medicine, business, and policy.

Big Data is the collection and analysis of data sets that are complex in terms of their volume and variety, and in some cases the velocity at which they are collected. Big Data is especially challenging because some data are not collected to address a specific scientific question.

How are Big Data problems being tackled?

Big Data problems usually require multidisciplinary teams by their very nature. At the very least, they typically require subject-area (domain) experts, computational experts, Machine Learning experts, and statisticians.

Why is it important for statistics to be one of the key disciplines for Big Data?

Statistics is fundamental to ensuring that meaningful, accurate information is extracted from Big Data. The following issues are crucial and are only exacerbated by Big Data:

· Data quality and missing data;

· Observational nature of the data, so that causal questions, such as the comparison of interventions, may be subject to confounding;

· Quantification of the uncertainty of predictions, forecasts, and models (see the sketch below).

The scientific discipline of statistics brings sophisticated techniques and models to bear on these issues.
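As a concrete illustration of the uncertainty point above, the bootstrap is one standard statistical technique for attaching error bars to an estimate: resample the data with replacement many times and look at the spread of the recomputed statistic. The following minimal Python sketch uses invented data; the sample size, the number of resamples, and the normal distribution are all arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=10.0, size=200)  # hypothetical measurements

# Resample with replacement and recompute the statistic each time.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5000)
])

# A 95% percentile confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")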

Statisticians help translate the scientific question into a statistical question, which includes carefully describing the data structure; the underlying system that generates the data (the model); and what we are trying to assess (the parameter or parameters we wish to estimate) or predict.

What does statistics bring to Big Data and where are the opportunities?

Big Data will often not be served well by “off the shelf” methods or black-box computational tools that work in low-dimensional and less complicated settings; it therefore requires tailored statistical methods.

Statisticians are skillful at assessing and correcting for bias; measuring uncertainty; designing studies and sampling strategies; assessing the quality of data; enumerating the limitations of studies; dealing with issues such as missing data and other sources of nonsampling error; developing models for the analysis of complex data structures; creating methods for causal inference and comparative effectiveness; eliminating redundant and uninformative variables; combining information from multiple sources; and determining effective data visualization techniques.
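To make one of these skills concrete, the short sketch below (with invented numbers) contrasts two elementary responses to missing data, listwise deletion and mean imputation. Both are naive: deletion throws information away, and mean imputation understates variance, which is why statisticians often turn to model-based approaches such as multiple imputation.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, 29, np.nan, 41, 38],
    "income": [52000, np.nan, 47000, np.nan, 61000],
})

dropped = df.dropna()           # listwise deletion: keeps only complete rows
imputed = df.fillna(df.mean())  # mean imputation: fills gaps, shrinks variance

print(dropped)
print(imputed)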

In Big Data, statistical science and the domain sciences are more intertwined than ever before, and statistical methodology is absolutely critical to making inferences.

Example: Google Flu Trends Prediction.

With Big Data comes big noise.

Google learned this lesson the hard way with its now-defunct Google Flu Trends.

(1) The online tracker, which used Internet search data to predict real-life flu outbreaks, emerged amid fanfare in 2008.

(2) At first, Google's tracker appeared to be pretty good, matching late-breaking CDC data somewhat closely.

(3) But two notable stumbles led to its ultimate downfall: an underestimate of the 2009 H1N1 swine flu outbreak and an alarming overestimate (almost double the real numbers) of the 2012-2013 flu season's cases.

(4) It then met a quiet death in August 2015 after repeatedly coughing up bad estimates.

With hubris firmly in check, a team of Harvard researchers (led by Kou) has come up with a way to tame the unruly data, combine it with other data sets, and continually calibrate it to track flu outbreaks with less error. Their new model (ARGO, an autoregression model based on Google data), published in the Proceedings of the National Academy of Sciences (2015), outperforms Google Flu Trends and other models with at least double the accuracy. If the model holds up in coming flu seasons, it could reinstate some optimism in using Big Data to monitor disease and herald a wave of more accurate second-generation models.
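The core idea can be sketched in a few lines: regress this week's flu activity on its own recent lags plus the current frequencies of flu-related search terms, with an L1 penalty to discard uninformative queries. The Python sketch below uses synthetic data and scikit-learn's Lasso; it illustrates the general approach rather than the published ARGO specification, and every number in it is an arbitrary choice.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
T, n_queries, n_lags = 300, 20, 3

queries = rng.random((T, n_queries))   # synthetic search-term frequencies
flu = np.zeros(T)
for t in range(1, T):                  # synthetic flu series with memory
    flu[t] = 0.6 * flu[t - 1] + queries[t, 0] + 0.1 * rng.standard_normal()

# Design matrix: autoregressive lags of the flu series plus current queries.
lags = np.column_stack([flu[n_lags - k: T - k] for k in range(1, n_lags + 1)])
X = np.hstack([lags, queries[n_lags:]])
y = flu[n_lags:]

# Fit on all but the last 50 weeks; L1 shrinkage zeroes out noise queries.
model = Lasso(alpha=0.01).fit(X[:-50], y[:-50])
print("held-out R^2:", model.score(X[-50:], y[-50:]))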

Big Data has a lot of potential. It is just a question of using the right analytics.

2.2.3 Big Data and Data Science

Data Science is not a new concept or term for statistical scientists; to a large extent it is simply the computerization of long-standing statistical methods to meet the needs of the present time. Big Data refers to large data sets. There is a great deal of informative value hidden in these data, so many companies want to access them to maximize their profits. It is a very innovative and competitive research area at present.

During the last few years, one of the most challenging problems the world has faced is that of Data Science. The Data Science problem means that data are growing at a much faster rate than computational solution techniques. This is partly a result of the fact that storage costs keep falling day by day, so keeping data safe and secure for further use becomes cheaper with time. Social activities, biological explorations, and scientific experiments, along with sensor devices, are great data contributors. Data Science is beneficial to society and business, but at the same time it brings challenges to the scientific communities. Existing traditional tools, Machine Learning algorithms, and techniques are not capable of handling, managing, and analyzing Big Data, although various scalable Machine Learning algorithms, techniques, and tools are becoming prevalent.

Big Data has given rise to Data Science. Data Science is rooted in the solid foundations of mathematics and statistics, computer science, and domain knowledge; not everything involving data or science is Data Science. The use cases for Data Science are compelling.

2.2.4 Big Data Miracle: from Quantitative to Qualitative Change

Digitization has accelerated in the postwar era. However, even as the exponential growth of processing capacity relative to cost predicted by Moore's Law has assumed an almost taken-for-granted status since its first articulation in 1965 (Moore 1965), something dramatic has changed in recent years. (According to expert opinion, Moore's Law is estimated to end sometime in the 2020s; computers are projected to reach their limits because transistors will be unable to operate within smaller circuits at increasingly higher temperatures.) We suggest that this “something” can be understood as a transition from quantitative improvements to qualitative changes.
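In rough quantitative terms, Moore's Law is often paraphrased as a doubling of transistor counts about every two years, so over the fifty years following its articulation:

N(t) \approx N_{0}\,2^{t/2}, \qquad \frac{N(2015)}{N(1965)} \approx 2^{50/2} = 2^{25} \approx 3.4\times 10^{7}.

It is this improvement of roughly seven orders of magnitude that set the stage for the qualitative changes discussed next.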

Consider the example of recorded digital music. The first major digital transition was from analog to digital formats. This was mainly a shift in representation, from the capture of physical markers (grooves in a record; magnetic distributions on a tape) to digital markers (ones and zeros on a CD), with implications for the fidelity and replicability of the information and the technical capabilities of industry participants.

The second transition was from physical formats to downloadable formats: the shift from CDs to MP3 files distributed not through physical means but through platforms like Napster and iTunes. This was essentially a shift in connectivity, which enabled music content to be accessed through a digital network, with implications for access (any song posted on the network was now available to all network members), governance (redefining the rules of behavior and legality), and form (the unbundling of albums into individual songs).

The third and current transition, exemplified by services like Spotify, entails a shift from requested content to suggested content. More than a change from downloads to streaming, this is primarily a shift in aggregation. By combining and analyzing the past content requests of numerous other users, as well as rating and other usage data regarding the focal user, it is now possible to proactively customize suggestions for a specific user and even to predict the likelihood that the user will follow the suggestion. This shift from responsive to predictive streaming changes the relationship between producers and consumers and impacts the very nature of consumer demand, choice, and preference. These qualitative shifts and interactions are embodied in the three core processes underlying digital transformation: representation, connectivity, and aggregation.
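The aggregation behind suggested content can be illustrated with the simplest form of collaborative filtering: score an unheard song for a user by averaging other users' play counts, weighted by how similar their listening histories are to the focal user's. The Python sketch below uses an invented play-count matrix; production recommenders are far more elaborate.

import numpy as np

# Rows are users, columns are songs; entries are play counts (invented data).
plays = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def predict(user, song):
    # Similarity-weighted average of the other users' plays of this song.
    others = [i for i in range(len(plays)) if i != user]
    sims = np.array([cosine(plays[user], plays[i]) for i in others])
    counts = plays[others, song]
    return sims @ counts / (sims.sum() + 1e-12)

print(predict(user=1, song=2))  # predicted interest of user 1 in song 2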

Digitization does not require us to abandon basic conceptualizations of the economic phenomena we are familiar with. Transaction costs and bounded rationality as conceptual building blocks, and resource and industry analysis as analytical tools, remain important guideposts on the journey. At the same time, it is critical to recognize the need for additional new tools and conceptualizations. In this spirit, we identify three foundational processes that, in our view, explain much of the variety of phenomena subsumed under the rubric of “digital transformation”. We propose that any example of contemporary strategic interest, whether it be Instagram's apparently unlimited appeal to teenagers, Tesla's efforts in autonomous driving, or the startling popularity of multiplayer online gaming as a spectator sport, can be usefully deconstructed into these core components: representation, connectivity, and aggregation.

Representation: Digital transformation begins with digitization. It is the digital representation of information that enables analysis and algorithmic manipulation. It has become a truism to state that data are the new oil, the key input to the engine of the information age. However, the explosion in the quantity of available data has been accompanied by a qualitative revolution in the representation of those data that underlies digital transformation. The resulting deluge of data would be more of a hindrance than a help if we could only deal with it using humans' bounded rationality; however, it is now possible to represent large volumes of data, and the actionable insights they contain, in the form of algorithms. Machine Learning is essentially a form of function approximation. Critically, there is limited need for human guidance in selecting the functional form, and the resulting function is not always easy for humans to interpret. This ability to represent data algorithmically rather than in a human-guided form (as in traditional descriptive statistics or statistical modeling for hypothesis testing) is qualitatively distinct in terms of what it implies, both for human bounded rationality and for the intriguing question of how to approach the potential for competence without comprehension.
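The function-approximation point can be made concrete: given only input-output pairs, a learner recovers an approximation to the generating function without being told its form. Below is a minimal sketch using scikit-learn's decision-tree regressor on synthetic data; the sine function stands in for any unknown relationship.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)  # form unknown to learner

# No functional form is specified; the tree approximates sin(x) from data alone.
model = DecisionTreeRegressor(max_depth=5).fit(X, y)
print(model.predict([[1.5]])[0], np.sin(1.5))  # learned estimate vs. truth

The fitted tree is a piecewise-constant function that no human wrote down, and its dozens of split thresholds resist easy interpretation, which is the sense in which such models exhibit competence without comprehension.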

Connectivity: Digitization creates new connections and enhances existing connections among objects, individuals, and organizations. From the one-to-one connectivity of email or text messaging to the many-to-many connectivity of social media, e-commerce platforms and sensor-embedded production lines today instantiate an enormous increase in the potential connections among economic actors and the inputs into economic decision making. The sheer size and density of the network of connections, as well as the range and number of new actors who are part of the network of connectivity, are the first major effects of digitization. Greater network density has generally followed Metcalfe's Law in yielding exponentially greater network value. The quantitative explosion of connected points has enabled the emergence of completely new business and organizational models, some of which have cannibalized their non-digital equivalents. However, the shift from connectivity-on-demand to connectivity-by-default has resulted in a qualitative change that goes beyond quantitative increases in network density. As products and services become more digitized, every product or service can be used to facilitate connections. This transition to always-on connectedness enables revolutions in search, monitoring, and control. For example, whereas the success of a search used to be assessed in terms of the accuracy and comprehensiveness of its results (whether in the search-engine battles between Google and Yahoo or the knowledge-management quest for information retrieval), search success is now assessed in terms of context-specific relevance: “is it right” versus “is it right for me, right here, right now.” Whether from the perspective of a consumer engaged in information search or a producer engaged in information targeting, the challenge has shifted from broadening the search space to ensure comprehensiveness to an ever greater urgency to winnow down information and choices into manageable sets. Thus, how firms allocate their attention has become a more important strategic decision than ever.
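Metcalfe's Law, mentioned above, ties a network's value to the number of possible pairwise connections among its n members:

V(n) \propto \binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2},

so doubling the membership roughly quadruples the potential value of the network.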

Aggregation: Finally, beyond the quantitative growth in data storage capacity and the reduction in storage costs lies a third qualitative shift: data aggregation. This qualitative shift arises from the ability to combine previously disjoint data (e.g., location, search queries, and social networks) to answer questions that were formerly impossible to address. For example, combining multiple types of data on individuals changes what we can say about their health risks or their financial soundness. Combining data related to human resources with traditional supply-chain data gives managers an unprecedented opportunity to understand their internal organization and its constituents. Enhancing such synergies explains the drive toward diversification and the blurring of boundaries at firms such as Oracle and SAP. While that is an energizing vision for many, it has a few dystopian shades as well. Governments can now hold more information regarding their citizens than they ever could in the past, raising the specter of Orwellian observation and control. Similar concerns could apply to the relationship between corporations and their employees. The new corollary to Star Trek's Borg mantra of “you will be assimilated” may be “your data will be aggregated.”
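The mechanics of aggregation are simple even though the consequences are profound: joining previously disjoint tables on a shared key supports inferences that neither table supports alone. A minimal Python sketch with invented records:

import pandas as pd

location = pd.DataFrame({"user": ["a", "b"], "city": ["Boston", "Austin"]})
searches = pd.DataFrame({"user": ["a", "b"], "query": ["flu symptoms", "mortgage rates"]})
social = pd.DataFrame({"user": ["a", "b"], "friends": [120, 45]})

# Joining on a shared key links data sets collected for unrelated purposes.
profile = location.merge(searches, on="user").merge(social, on="user")
print(profile)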

As digital transformation continues, the impact of the three processes that have witnessed qualitative changes (representation, connectivity, and aggregation) and their interactions will become more pronounced. These processes will continue to push firms in all industries to create and capture value differently, develop new business models and ecosystems, manage new forms of intellectual property, grow scale and scope differently, and create new opportunities and challenges for organization design and management practices. Digital transformation undoubtedly offers exciting times ahead for strategy researchers.
