2.1 Basic Theory of Big Data

Since the beginning of the 21st century, with the wide application of electronic information technologies such as computers and the Internet, scenery, news, and trending topics from around the world have been stored and packaged as data on Internet media, and these vast stores of data have become an important channel through which people obtain information. In 2008, the journal Nature ran a special issue with “Big Data” on its cover; by 2009, “big data” had become a buzzword and gradually entered public view.

2.1.1 Basic Concepts of Big Data

Data are facts or results of observation: logical records of objective things and the unprocessed raw material used to represent them. Data can take continuous values, such as sound and images, in which case they are called analog data, or discrete values, such as symbols and text, in which case they are called digital data. Anything that reflects information about nature or human society can be called data.

Big data, also known as mega data, is a collection of information with rich content, broad dimensions, and wide coverage. It is a new factor of production that is difficult to extract, store, share, and analyze with traditional data-processing tools. Working with big data is often likened to searching for valuables in a vast landfill: big data technology mainly deals with the many types of data that are generated dynamically and in real time.

The step from data to big data is not merely an accumulation of quantity but a qualitative leap. Big data allows once-isolated data of different origins and forms to be integrated and analyzed systematically, uncovering new knowledge that was hard to find in the era of small data and continually creating new value for human society.

2.1.2 Characteristics of Big Data

With the rise of digital technology, big data has gradually become one of the core factors of production. Compared with ordinary data, big data exhibits at least the following distinguishing characteristics.

2.1.2.1 Large Data Volume

Big data itself carries a huge amount of information. In the era of big data, data volume grows in a pattern reminiscent of Moore's Law: International Data Corporation (IDC) predicts that data will grow at roughly 50% per year, which implies that the total volume doubles approximately every two years.
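As a quick check on this arithmetic, the doubling time t implied by an annual growth rate r follows from (1 + r)^t = 2:

    (1+r)^{t} = 2 \quad\Longrightarrow\quad t = \frac{\ln 2}{\ln(1+r)}, \qquad t\Big|_{r=0.5} = \frac{\ln 2}{\ln 1.5} \approx 1.71 \text{ years}

So 50% annual growth in fact doubles the volume slightly faster than every two years; an exact two-year doubling corresponds to an annual rate of 2^{1/2} − 1 ≈ 41%.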

2.1.2.2 High Frequency

In the era of informatization and globalization, data-based exchange of information and resources has become cheaper, faster, and more frequent. Compared with traditional data, the speed at which big data is transmitted and processed has grown exponentially.

2.1.2.3 Low Value Density

Although big data has high overall value, its sheer volume inevitably brings problems such as redundancy, poor quality, and information overload. As a result, effective information accounts for a relatively small proportion of a big data set and the data are used less efficiently, which lowers the value density of the data.

2.1.2.4 Variety and Complexity

Data can be divided into structured, semi-structured, and unstructured data. Traditional data models have clear definitions, are easy to store, manage, and integrate, maintain consistency among data items, and lend themselves to analysis and deep mining; they generally consist of structured and semi-structured data. Big data, by contrast, comes from many sources and takes the form of documents, emails, social media posts, images, audio, and video. Its data models are personalized and case-specific, difficult to process and interpret with traditional methods, and the data volume is huge but disordered, showing unstructured characteristics.
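As a minimal illustration of the three categories (all records and field names below are invented), the same customer interaction might appear as a structured table row, a semi-structured document, or unstructured free text:

    # Hypothetical examples of the three data categories discussed above.

    # Structured: fixed schema, every record has the same typed fields.
    structured_row = ("C1024", "2024-05-01", 3, 59.90)  # (customer_id, date, items, total)

    # Semi-structured: self-describing tags, but fields may vary per record.
    semi_structured = {
        "customer_id": "C1024",
        "order": {"date": "2024-05-01", "items": 3},
        "tags": ["repeat-buyer"],  # optional field, absent in other records
    }

    # Unstructured: no predefined model; meaning must be extracted by analysis.
    unstructured = "Delivery was quick, but the box arrived slightly dented."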

2.1.2.5 Full Quantization of Data

The essence of big data is “deep learning”: establishing large-scale training data sets and using machine learning to train artificial intelligence systems, which in turn support simulated environments and expert systems for decision-making. Ultimately, big data uses quantitative analysis to make precise predictions about the future on the basis of the “known” and the “past”. This process of inferring the “unknown” from the “known” and predicting the “future” from the “past” is called data quantification, and it runs through the entire process of big data application.
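A minimal sketch of this known-to-unknown inference, fitting a simple trend to hypothetical past observations and extrapolating it (a stand-in for the far larger models that real systems train):

    import numpy as np

    # Hypothetical "past": yearly data volumes observed so far (illustrative only).
    years = np.array([2019, 2020, 2021, 2022, 2023])
    volume = np.array([41.0, 64.2, 79.0, 97.0, 120.0])  # e.g. zettabytes

    # Learn from the "known": fit a linear trend on log-volume,
    # since the growth process is multiplicative.
    slope, intercept = np.polyfit(years, np.log(volume), deg=1)

    # Predict the "future" from the "past": extrapolate to an unseen year.
    predicted_2025 = np.exp(slope * 2025 + intercept)
    print(f"Extrapolated 2025 volume: {predicted_2025:.1f} ZB")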

2.1.3 Development Status and Prospects of Big Data

In November 2018, Seagate and International Data Corporation (IDC) released a white paper titled “The Digitization of the World: From Edge to Core”. The white paper predicts that the total amount of global data will reach 175 ZB by 2025 (see Figure 2.1) and that 49% of the world's stored data will reside in public cloud environments.

From intelligent push to intelligent search, from intelligent assistants to smart homes, and from autonomous driving to humanoid robots, big data has greatly facilitated human production and life in this era of information sharing and the “interconnection of everything”. As intelligent transformation and information interconnection deepen across industries, both the growth rate and the accumulated volume of data will increase exponentially, and the rapid development of the “global data ecosystem” will drive the digital transformation of all humanity.

In the future, the application of big data will open up boundless prospects for human society. Consequently, big data governance will inevitably become a public concern. The technological revolution has driven explosive growth in information and data, and human society is undergoing a digital transformation. Using big data to drive social development, improve social governance, and strengthen service and supervision capabilities is an important trend in the future development of big data.

Figure 2.1 Annual Size of the Global Data Sphere

Source: Seagate and International Data Corporation (IDC), “Data Age 2025”. https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-chine-whitepaper.pdf

2.1.3.1 Big Data Practice Technology

In terms of application, big data practice can be roughly divided into three levels. Although the development of big data is still in its infancy, applications at all three levels are already widespread in practice.

The first level is descriptive analytics, which uses data access, data storage, data processing, and visualization technologies to locate target data precisely or extract relevant information from databases according to user needs. Building user profiles and tracking the sales of goods are typical applications at this level.
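A minimal descriptive-analytics sketch, assuming a small hypothetical sales table; note that it only summarizes what has already happened, with no prediction involved:

    import pandas as pd

    # Hypothetical raw sales records (the "database" in this sketch).
    sales = pd.DataFrame({
        "product": ["tea", "tea", "coffee", "coffee", "coffee"],
        "units":   [10, 4, 7, 2, 5],
        "price":   [3.0, 3.0, 4.5, 4.5, 4.5],
    })

    # Descriptive analytics: aggregate and describe, nothing more.
    sales["revenue"] = sales["units"] * sales["price"]
    summary = sales.groupby("product")["revenue"].agg(["sum", "mean", "count"])
    print(summary)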

The second level is predictive analytics, which builds mathematical models from large amounts of historical data to analyze the relationships among the data and, on that basis, forecast how things will develop. For example, David Rothschild, a researcher at Microsoft Research New York, built a prediction model by collecting and analyzing large amounts of public data, such as betting markets, the Hollywood Stock Exchange, and posts published by social media users, and used it to predict the winners of many Oscar awards. In 2014 and 2015, the model correctly predicted 21 of the 24 Oscar awards, an accuracy rate of 87.5%.
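A toy predictive-analytics sketch in the same spirit (every number and feature below is invented): fit a logistic regression on past award seasons, then score a new nominee.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical historical features for past nominees:
    # [betting-market implied win probability, share of positive social posts]
    X_past = np.array([
        [0.70, 0.60], [0.20, 0.30], [0.55, 0.65],
        [0.10, 0.20], [0.80, 0.75], [0.30, 0.40],
    ])
    won = np.array([1, 0, 1, 0, 1, 0])  # 1 = won the award

    # Learn the relationship between public signals and past outcomes.
    model = LogisticRegression().fit(X_past, won)

    # Predict a new, unseen nominee from the same signals.
    new_nominee = np.array([[0.65, 0.70]])
    print(f"Estimated win probability: {model.predict_proba(new_nominee)[0, 1]:.2f}")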

The third level is guiding analytics, which builds on the first two levels and uses technologies such as stream computing and graph databases to project multiple possible outcomes through data imaging and to optimize the final decision. For example, an autonomous vehicle can use real-time perception data from multiple sensors, combined with precise map positioning and road conditions, to predict the consequences of different driving paths and thereby drive intelligently.
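A minimal sketch of this guiding idea, with entirely invented candidate paths, predicted outcomes, and weights: score each option under a simple cost model and recommend the best one.

    # Hypothetical candidate paths with predicted travel time (min) and risk score.
    candidates = {
        "highway":   {"time": 18.0, "risk": 0.30},
        "arterial":  {"time": 22.0, "risk": 0.10},
        "back_road": {"time": 25.0, "risk": 0.05},
    }

    # Guiding analytics: predict the outcome of each option, then optimize.
    def cost(outcome, risk_weight=30.0):
        # Combine predicted time and risk into one decision score (assumed weights).
        return outcome["time"] + risk_weight * outcome["risk"]

    best = min(candidates, key=lambda name: cost(candidates[name]))
    print(f"Recommended path: {best}")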

In the future, as data application fields expand and data-sharing platforms improve, applications at these three levels will release even greater value. Achieving accurate tracking, accurate prediction, and automatic judgment on the basis of big data, and making production and life fully intelligent, is an important direction for the future development of big data.

2.1.3.2 Big Data Governance

Big data can help industries transform and upgrade intelligently and promote high-quality development across society, but it also raises governance problems. On the one hand, because big data is largely unstructured, information resources are scattered and disordered, so a shared, open, and unified database platform is needed to manage and plan data assets in an integrated way. On the other hand, digital platforms built on big data may lead to data leakage and privacy and security issues. How to protect personal privacy while keeping data interoperable is therefore the central question in big data governance. Only by properly balancing data sharing, data security, and privacy protection can the development of big data applications move forward.
