Future Integrated Information Technologies
Future integrated information technologies refer to new concepts, knowledge, and products proposed due to the development of new technologies in recent years.
These mainly include technologies such as Cyber-Physical Systems (CPS), Artificial Intelligence (AI), robotics, edge computing, digital twins, cloud computing, and big data.
11.1 Overview of Cyber-Physical Systems Technology
11.1.1 Concept of Cyber-Physical Systems
-
Origin of Cyber-Physical Systems
-
Cyber-Physical Systems (CPS) extend and expand control systems and embedded systems; their underlying theory and technology derive from the application and enhancement of embedded technology.
-
Essence and Definition of CPS
-
CPS is the result of the integration and development of different technologies across multiple fields and disciplines.
-
CPS integrates advanced perception, computation, communication, control, and other information technologies with automatic control technologies to construct complex systems where elements such as people, machines, objects, environments, and information interact, map, and collaborate efficiently in both physical and information spaces, achieving on-demand response, rapid iteration, and dynamic optimization of resource allocation and operation within the system.
-
The ultimate goal of intelligent systems built on a series of industrial and information technologies, including hardware, software, networks, and industrial clouds, is to achieve optimized resource allocation.
-
The essence of CPS is to construct a closed-loop empowerment system based on the automatic flow of data between information space and physical space, enabling state perception, real-time analysis, scientific decision-making, and precise execution, thereby addressing complexity and uncertainty issues in production and service processes, improving resource allocation efficiency, and achieving resource optimization.
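The closed loop described above (state perception, real-time analysis, scientific decision-making, precise execution) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Heater` class, the `run_closed_loop` function, the toy physics, and the proportional decision rule are all hypothetical and chosen for the example; they do not come from the source text.

```python
from dataclasses import dataclass


@dataclass
class Heater:
    """Hypothetical physical entity: a heater with a measurable temperature."""
    temperature: float = 20.0

    def apply_power(self, power: float) -> None:
        # Toy physics (assumption): each unit of power adds 0.5 degrees, then the
        # device loses 10% of its excess over a 20-degree ambient temperature.
        self.temperature += 0.5 * power
        self.temperature -= 0.1 * (self.temperature - 20.0)


def run_closed_loop(plant: Heater, target: float, steps: int, gain: float = 1.0) -> list[float]:
    """Run perceive -> analyze -> decide -> execute for a fixed number of steps."""
    history = []
    for _ in range(steps):
        measured = plant.temperature        # state perception
        error = target - measured           # real-time analysis
        power = max(0.0, gain * error)      # decision: proportional rule, never negative
        plant.apply_power(power)            # precise execution
        history.append(plant.temperature)   # data flows back into the information space
    return history
```

Calling `run_closed_loop(Heater(), target=60.0, steps=50)` drives the temperature from 20 toward the target. With these toy numbers a purely proportional rule settles at about 52.7 rather than exactly 60, which is one reason real controllers use richer decision logic.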
11.1.2 Implementation of CPS
1. CPS Architecture
-
Unit Level
-
Unit-level CPS is the smallest indivisible unit of CPS, essentially perceiving the state of physical entities and environments through software, performing computational analysis, and ultimately controlling physical entities, forming the most basic closed-loop of automatic data flow, integrating the physical and information worlds.
-
Unit-level CPS should have communication capabilities to interact with the outside world.
-
Unit-level CPS is the smallest unit with perceptive, computational, interactive, extensible, and self-decision-making capabilities; an intelligent component, an industrial robot, or a smart machine tool can all be a minimum unit of CPS.
-
System Level
-
Interconnection, edge gateways, and data interoperability mainly achieve heterogeneous integration of unit-level CPS.
-
Plug-and-play mainly implements component management in system-level CPS, including the identification, configuration, updating, and deletion of components (unit-level CPS).
-
Cooperative control refers to the linkage and cooperative control of multiple unit-level CPS.
-
Monitoring and diagnosis mainly involve real-time monitoring of the status of unit-level CPS and diagnosing whether they possess the required capabilities.
-
Multiple minimum units (unit-level CPS) achieve a broader and deeper automatic flow of data through industrial networks (such as industrial fieldbuses and industrial Ethernet), realizing interconnection, intercommunication, and interoperability among multiple unit-level CPS and further enhancing the breadth, depth, and precision of manufacturing resource optimization.
-
System-level CPS achieves self-organization, self-configuration, self-decision-making, and self-optimization of local manufacturing resources based on the state perception, information interaction, and real-time analysis of multiple unit-level CPS.
-
In addition to the functions of unit-level CPS, system-level CPS also mainly includes interconnection, plug-and-play, edge gateways, data interoperability, cooperative control, monitoring, and diagnosis.
-
SoS Level
-
Main functions include: data storage, data fusion, distributed computing, big data analysis, and data services, forming asset performance management and operational optimization services based on data services.
-
Multiple processes (system-level CPS) form a workshop-level CPS or a CPS for an entire factory.
-
The organic combination of multiple system-level CPS constitutes SoS-level CPS.
-
SoS-level CPS mainly realizes data aggregation, optimizing assets internally and forming operational optimization services externally.
-
SoS-level CPS can achieve interconnection, intercommunication, and interoperability across systems and platforms through big data platforms, enabling multi-source heterogeneous data to be integrated, exchanged, and shared in a closed-loop automatic flow, and achieving comprehensive perception, deep analysis, scientific decision-making, and precise execution on a global scale.
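The three-level architecture (unit level, system level, SoS level) can be pictured as nested containers. The sketch below is a hedged illustration only: the class names `UnitCPS`, `SystemCPS`, and `SoSCPS`, their fields, and the choice of a mean as the aggregation are assumptions made for the example. It models plug-and-play (register and delete components), monitoring and diagnosis, and SoS-level data aggregation.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class UnitCPS:
    """Smallest unit: holds one sensed value and a health flag."""
    unit_id: str
    reading: float = 0.0
    healthy: bool = True


@dataclass
class SystemCPS:
    """Groups unit-level CPS; models plug-and-play and monitoring."""
    name: str
    units: dict[str, UnitCPS] = field(default_factory=dict)

    def plug_in(self, unit: UnitCPS) -> None:
        # Plug-and-play: identify and register a component.
        self.units[unit.unit_id] = unit

    def unplug(self, unit_id: str) -> None:
        # Plug-and-play: delete a component if it is present.
        self.units.pop(unit_id, None)

    def diagnose(self) -> list[str]:
        # Monitoring and diagnosis: IDs of units that report a fault.
        return sorted(u.unit_id for u in self.units.values() if not u.healthy)

    def readings(self) -> list[float]:
        return [u.reading for u in self.units.values()]


@dataclass
class SoSCPS:
    """Combines several system-level CPS and aggregates their data."""
    systems: list[SystemCPS] = field(default_factory=list)

    def aggregate(self) -> dict[str, float]:
        # Data aggregation: mean reading per system; empty systems are skipped.
        return {s.name: mean(s.readings()) for s in self.systems if s.units}
```

For example, a `SystemCPS("line-1")` holding a healthy unit reading 10.0 and a faulty unit reading 20.0 reports the faulty unit from `diagnose()`, and an `SoSCPS` wrapping that system aggregates its readings to a mean of 15.0.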
2. CPS Technical System
The CPS technical system is mainly divided into overall CPS technology, supporting CPS technology, and core CPS technology.
-
Overall CPS technology mainly includes system architecture, heterogeneous system integration, security technology, and testing and verification technology, which are the top-level design technologies of CPS.
-
Supporting CPS technology mainly includes intelligent perception, embedded software, databases, human-computer interaction, middleware, SDN (Software Defined Networking), Internet of Things, big data, etc., which support CPS applications.
-
Core CPS technology mainly includes virtual-physical fusion control, intelligent equipment, MBD, digital twin technology, fieldbuses, industrial Ethernet, CAX, MES, ERP, PLM, CRM, SCM, etc., which are the foundational technologies of CPS.
The above technical system can be divided into four core technical elements: “one hardware” (perception and automatic control), “one software” (industrial software), “one network” (industrial network), and “one platform” (industrial cloud and intelligent service platform).
-
Perception and automatic control are the hardware support for CPS implementation.
-
It includes four levels of control: embedded control, virtual control, centralized control, and target control.
-
Embedded control mainly targets physical entities for control.
-
Virtual control refers to control calculations performed in the information space, mainly targeting information entities.
-
Centralized control refers to the integration and control of information entities in the information space, mainly through CPS bus.
-
Target control refers to determining whether production meets targets through real-time comparison during the measurement of actual production results or the collection of product data.
-
(1) Intelligent perception technology. The main intelligent perception technology used in CPS systems is sensor technology.
-
(2) Virtual-physical fusion control. CPS virtual-physical fusion control is a multi-layer “perception-analysis-decision-execution” cycle built on state perception. Perception is usually performed in real time, with results synchronized or fed back immediately to higher layers.
-
Industrial software solidifies the rules of CPS computation and data flow, being the core of CPS.
-
Industrial software refers to software specifically designed for the industrial field to improve the R&D, manufacturing, production, service, and management levels of industrial enterprises, as well as the usage value of industrial products.
-
Through application integration, industrial software enables mechanized, electrified, and automated production systems to possess digital, networked, and intelligent characteristics, thus providing a networked, collaborative, and open product design, manufacturing, and service environment throughout the product lifecycle.
-
Industrial networks are the network carriers for interconnection and data transmission.
-
The industrial network technology in CPS is based on a distributed new paradigm, allowing various intelligent devices to connect with each other to form a network service.
-
High-level CPS is formed by interconnecting and integrating lower-level CPS, allowing for flexible combinations.
-
Industrial cloud and intelligent service platforms are the foundation for data aggregation in CPS and support upper-layer solutions, providing resource management and capability services externally.
-
Industrial cloud and intelligent service platforms process data through edge computing, fog computing, big data analysis, etc., forming the capability to provide data services externally, and offering personalized and specialized intelligent services based on data services.
11.1.3 Construction and Application of Cyber-Physical Systems
1. Overview of CPS Application Scenarios
CPS application scenarios cover four aspects: intelligent design, intelligent production, intelligent services, and intelligent applications.
2. Typical CPS Application Scenarios
-
Intelligent Design
-
With the maturity of CPS, most of the work in product and process design, as well as factory design, can be simulated in virtual space, allowing for iteration and improvement.
-
At the same time, data-driven approaches can form optimized product design solutions, achieving high coordination between product design and product usage.
-
In terms of production line/factory design, comprehensive consideration of the integration between different devices and systems in the production line/factory is necessary. Based on the data collected under existing conditions of production line/factory construction, reasonable layouts for equipment, personnel, tools, materials, and workshop transportation can be analyzed and calculated, establishing a production simulation model for the production line/factory, and optimizing the design scheme based on simulation results.
-
Intelligent Production
-
Equipment management application scenarios
-
Production management application scenarios
-
Flexible manufacturing application scenarios.
-
CPS can break the information silos in the production process, achieving interconnection of equipment, monitoring the production process, reasonably managing and scheduling various production resources, optimizing production plans, and achieving collaboration between manufacturing and resources, upgrading from “manufacturing” to “intelligent manufacturing”.
-
Intelligent Services
-
Through CPS, a local and remote cloud service collaborative system can be formed, integrating individual and group interactions, as well as group and system collaborations, better serving production, addressing the increasing complexity of equipment operation and the growing difficulty of use, achieving collaborative optimization of intelligent equipment, and supporting the economic, safety, and efficiency goals of enterprise users.
-
Health management
-
Intelligent maintenance
-
Remote diagnostic capabilities
-
Collaborative optimization
-
Shared services
-
Intelligent Applications
-
Unmanned equipment
-
Industry chain interaction
-
Value chain win-win
3. CPS Construction Path
Under the data closed-loop of state perception, real-time analysis, scientific decision-making, and precise execution, CPS can achieve autonomous coordination, intelligent optimization, and continuous innovation in manufacturing, significantly enhancing all aspects of enterprise manufacturing.
The construction of CPS cannot be achieved overnight; it must be gradual and in-depth.
The construction path can be divided into several stages: CPS system design, unit-level CPS construction, system-level CPS construction, and SoS-level CPS construction stages.
Enterprises in the SoS-level CPS construction stage should focus more on data storage and distributed processing capabilities, as well as intelligent service capabilities, prioritizing the construction of big data platforms and intelligent service platforms.
11.2 Overview of Artificial Intelligence Technology
11.2.1 Concept of Artificial Intelligence
Artificial Intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and enhance human intelligence, perceive the environment, acquire knowledge, and use knowledge to achieve optimal results.
The goal of artificial intelligence is to understand the essence of intelligence and produce a new type of intelligent machine that can respond similarly to human intelligence. Research in this field includes robotics, natural language processing, computer vision, and expert systems.
-
Weak AI: Weak AI refers to intelligent machines that cannot truly perform reasoning and problem-solving. These machines appear intelligent on the surface but do not genuinely possess intelligence and will not have self-awareness.
-
Strong AI: Strong AI refers to intelligent machines that can truly think and are considered to have perception and self-awareness. Such machines can be divided into humanoid (machines whose thinking and reasoning are similar to human thinking) and non-humanoid (machines that produce perceptions and consciousness completely different from humans, using reasoning methods entirely different from humans).
11.2.2 Development History of Artificial Intelligence
11.2.3 Key Technologies of Artificial Intelligence
-
1. Natural Language Processing (NLP)
-
2. Computer Vision
-
3. Knowledge Graph
-
4. Human-Computer Interaction (HCI)
-
5. Virtual Reality or Augmented Reality (VR/AR)
-
6. Machine Learning
The mainstream machine learning technology is black-box technology, which makes it impossible to predict hidden crises. To address this issue, it is necessary to make machine learning interpretable and intervenable.
-
According to different learning modes, machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
-
Furthermore, according to different learning methods, machine learning can be divided into traditional machine learning and deep learning.
-
Common algorithms in machine learning also include transfer learning, active learning, and evolutionary learning.
-
Machine Learning (ML) is one of the core research areas of artificial intelligence, involving various fields such as statistics, system identification, approximation theory, neural networks, optimization theory, computer science, and brain science.
-
Specifically, machine learning is data-driven, seeking patterns from sample data and predicting future data based on the identified patterns.
-
Machine learning is widely applied in data mining, computer vision, natural language processing, biometric recognition, and other fields.
-
Classification of Machine Learning
-
Comprehensive Applications of Machine Learning
-
The Future of Machine Learning
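The idea that machine learning is data-driven, finding a pattern in sample data and then predicting new data from it, can be shown with the simplest supervised learner: a straight line fitted by ordinary least squares. This is a minimal sketch in plain Python, not a recommendation of any particular method; the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned pattern to an unseen input."""
    slope, intercept = model
    return slope * x + intercept
```

Training on labeled samples, for instance `fit_line([1, 2, 3, 4], [3, 5, 7, 9])`, recovers a slope of 2 and an intercept of 1, and `predict` then extrapolates to inputs that were not in the sample. The labels are what make this supervised learning; an unsupervised method would receive only the `xs`.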
11.3 Overview of Robotics Technology
11.3.1 Concept of Robots
The English term for robots is Robot.
11.3.2 Definition and Development History of Robots
Robots are flexible machines characterized by mobility, individuality, intelligence, versatility, semi-mechanical and semi-human characteristics, automation, and servitude.
They possess ten characteristics: automation, intelligence, individuality, semi-mechanical and semi-human characteristics, operability, versatility, informationality, flexibility, limitation, and mobility.
A machine can be called a robot if it meets the following three conditions:
-
(1) It has the three elements of brain, hand, and foot.
-
(2) It has non-contact sensors (to receive information from afar using eyes and ears) and contact sensors.
-
(3) It has sensors for balance and inherent awareness.
The development of robots can be simply divided into three stages:
-
(1) First-generation robots: Teaching and reproducing robots.
-
(2) Second-generation robots: Sensory robots.
-
(3) Third-generation robots: Intelligent robots.
Robotics includes both fundamental and applied research, with major research areas including:
-
Design of robotic hands;
-
Robot kinematics, dynamics, and control;
-
Trajectory design and path planning;
-
Sensors;
-
Robot vision;
-
Robot language;
-
Device and system structure;
-
Robot intelligence, etc.
11.3.3 Core Technologies of Robotics 4.0
Robotics 4.0 mainly includes the following core technologies: seamless collaborative computing across cloud-edge-end, continuous learning and collaborative learning, knowledge graphs, scene adaptability, and data security.
-
1. Seamless collaborative computing across cloud-edge-end
-
2. Continuous learning and collaborative learning
-
3. Knowledge graphs
-
4. Scene adaptability
-
5. Data security
11.3.4 Classification of Robots
Robots can be classified according to the required control methods:
-
Operational robots
-
Program robots
-
Teaching and reproducing robots
-
Intelligent robots
-
Comprehensive robots
Robots can also be classified by application field:
-
1. Industrial robots
-
2. Service robots
-
3. Robots for special fields
11.4 Overview of Edge Computing
Edge computing shifts data processing, application execution, and even some functional services from the network center to nodes at the network edge.
11.4.2 Definition of Edge Computing
1. Definition of Edge Computing by the Edge Computing Consortium (ECC)
The essence of edge computing is the extension and evolution of cloud computing beyond data centers, mainly including three types of deployment forms: cloud edge, edge cloud, and cloud-based gateways.
-
(1) Cloud Edge: Edge computing in the cloud edge form is an extension of cloud services on the edge side, logically still a cloud service, primarily relying on cloud services for capability provision or requiring close collaboration with cloud services.
-
(2) Edge Cloud: Edge computing in the edge cloud form builds small to medium-scale cloud service capabilities on the edge side. Edge service capabilities are mainly provided by edge clouds, while centralized DC-side cloud services mainly provide management and scheduling capabilities for edge clouds.
-
(3) Cloud-based Gateway: Edge computing in the cloud-based gateway form reconstructs existing embedded gateway systems with cloud-based technologies and capabilities, providing protocol/interface conversion, edge computing, etc., on the edge side, while controllers deployed on the cloud side provide resource scheduling, application management, and business orchestration capabilities for edge nodes.
2. Definition Concept by the OpenStack Community
Edge computing provides cloud services and IT environment services for application developers and service providers on the edge side of the network. The goal is to provide computing, storage, and network bandwidth close to data input or users.
3. Definition of Edge Computing by ISO/IEC JTC1/SC38
ETSI (European Telecommunications Standards Institute) defines edge computing as providing IT service environments and computing capabilities at the edge of the mobile network, emphasizing proximity to mobile users in order to reduce the latency of network operations and service delivery and to improve the user experience.
11.4.3 Characteristics of Edge Computing
Edge computing is a distributed open platform (architecture) that integrates network, computing, storage, and application core capabilities at the network edge, providing edge intelligent services nearby to meet key industry digitalization needs in agile connectivity, real-time business, data optimization, application intelligence, and security and privacy protection.
Edge computing has the following characteristics:
-
(1) Connectivity: Connectivity is the foundation of edge computing.
-
(2) Data First Entry: As a bridge from the physical world to the digital world, edge computing is the first entry point for data. It holds a large amount of real-time, complete data, which can be managed and turned into value across the entire data lifecycle, better supporting innovative applications such as predictive maintenance and asset efficiency and management. At the same time, as the first entry point for data, edge computing also faces challenges in the real-time performance, determinism, and diversity of data.
-
(3) Constraints: Edge computing products must adapt to relatively harsh working conditions and operating environments in industrial sites, such as electromagnetic protection, dust protection, explosion-proof, vibration resistance, and current/voltage fluctuation resistance.
-
(4) Distribution: Edge computing is inherently distributed in its actual deployment.
11.4.4 Edge-Cloud Collaboration
Edge computing and cloud computing each have their strengths; cloud computing excels in global, non-real-time, long-cycle big data processing and analysis, providing advantages in long-cycle maintenance and business decision support.
Edge computing is more suitable for processing and analyzing local, real-time, short-cycle data, better supporting real-time intelligent decision-making and execution for local businesses.
Edge computing and cloud computing are not in a substitution relationship but rather a complementary and collaborative relationship, with edge-cloud collaboration amplifying the application value of both.
Edge computing is not a single component or a single layer but involves an end-to-end open platform of EC-IaaS, EC-PaaS, and EC-SaaS.
The capabilities and connotations of edge-cloud collaboration involve comprehensive collaboration across all levels of IaaS, PaaS, and SaaS, mainly including six types of collaboration: resource collaboration, data collaboration, intelligent collaboration, application management collaboration, business management collaboration, and service collaboration.
-
(1) Resource Collaboration: Edge nodes provide basic infrastructure resources such as computing, storage, network, and virtualization, with local resource scheduling management capabilities, while also collaborating with the cloud to accept and execute cloud resource scheduling management strategies, including device management, resource management, and network connection management for edge nodes.
-
(2) Data Collaboration: Edge nodes are mainly responsible for collecting on-site/terminal data, performing preliminary processing and analysis according to rules or data models, and uploading processing results and related data to the cloud; the cloud provides massive data storage, analysis, and value mining.
-
(3) Intelligent Collaboration: Edge nodes execute inference according to AI models, achieving distributed intelligence; the cloud conducts centralized model training for AI and distributes the models to edge nodes.
-
(4) Application Management Collaboration: Edge nodes provide application deployment and runtime environments, managing and scheduling the lifecycle of multiple applications on the node; the cloud mainly provides application development and testing environments, as well as application lifecycle management capabilities.
-
(5) Business Management Collaboration: Edge nodes provide modular, microservice-based application/digital twin/network instances; the cloud mainly provides business orchestration capabilities for applications/digital twins/networks according to customer needs.
-
(6) Service Collaboration: Edge nodes implement part of the EC-SaaS services according to cloud strategies, achieving on-demand SaaS services for customers through collaboration between EC-SaaS and cloud SaaS; the cloud mainly provides service distribution strategies for SaaS services at the cloud and edge nodes, as well as the SaaS service capabilities undertaken by the cloud.
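Two of the collaboration types above, data collaboration and intelligent collaboration, can be sketched together: the cloud trains a model centrally and distributes it, while the edge node runs inference locally and uploads only a summary. The sketch is illustrative and rests on assumptions: `cloud_train`, `EdgeNode`, and the "mean plus k standard deviations" anomaly threshold are hypothetical stand-ins for a real training pipeline and AI model.

```python
from statistics import mean, pstdev
from typing import Optional


def cloud_train(history: list[float], k: float = 3.0) -> dict[str, float]:
    """Cloud side: centralized training on historical data.

    The "model" here is just an anomaly threshold (mean + k * std).
    """
    return {"threshold": mean(history) + k * pstdev(history)}


class EdgeNode:
    """Edge side: local inference plus preliminary data processing."""

    def __init__(self) -> None:
        self.model: Optional[dict[str, float]] = None

    def deploy(self, model: dict[str, float]) -> None:
        # Intelligent collaboration: accept the model distributed by the cloud.
        self.model = model

    def process(self, samples: list[float]) -> dict[str, float]:
        # Data collaboration: analyze near the source, upload only the result.
        if self.model is None:
            raise RuntimeError("no model deployed")
        if not samples:
            raise ValueError("no samples to process")
        anomalies = [s for s in samples if s > self.model["threshold"]]
        return {
            "count": len(samples),
            "mean": mean(samples),
            "anomalies": len(anomalies),
        }
```

The design point is the direction of traffic: the model travels from cloud to edge, and only the small summary dictionary travels back, which is what keeps local decisions fast and upstream bandwidth low.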
11.4.5 Security of Edge Computing
Edge security is an important guarantee for edge computing.
Edge security involves a security protection system that spans both cloud computing and edge computing, enhancing the ability of edge infrastructure, networks, applications, and data to identify and resist various security threats, building a secure and trustworthy environment for the development of edge computing, and accelerating and ensuring the development of the edge computing industry.
The value of edge security is reflected in the following aspects:
-
Providing a trustworthy infrastructure
-
Providing reliable security services for edge applications
-
Ensuring secure device access and protocol conversion
-
Providing secure and trustworthy networks and coverage
-
Providing an end-to-end comprehensive security operation protection system, including threat monitoring, situational awareness, security management orchestration, emergency response to security incidents, and flexible protection.
11.4.6 Application Scenarios of Edge Computing
-
1. Smart Parks
-
(1) Massive network connections and management:
-
(2) Real-time data collection and processing:
-
(3) Local business autonomy:
-
2. Android Cloud and Cloud Gaming
-
Cloudification of Android full-stack capabilities, matching gaming operating environments.
-
Rendering, compression, and transmission of cloud video, supporting good presentation on terminals.
-
End-to-end low-latency response, supporting gaming operation experience.
-
3. Video Surveillance
-
Video surveillance is evolving from “seeing” and “seeing clearly” to “understanding”.
-
(1) Edge node image recognition and video analysis support intelligent video surveillance at the edge.
-
(2) Edge node intelligent storage mechanisms can link video data storage strategies based on video analysis results, efficiently retaining valuable video data while improving storage space utilization at edge nodes.
-
(3) Edge-cloud collaboration, cloud-side AI model training, rapid deployment and inference at the edge, supporting multi-point deployment and multi-machine linkage for video surveillance.
-
4. Industrial Internet of Things
Edge computing can support solving the following common problems:
-
(1) Numerous field network protocols make interconnection difficult and have poor openness.
-
(2) Multi-source heterogeneous data lacks a unified format, hindering data exchange and interoperability.
-
(3) Product defects are difficult to detect in advance.
-
(4) Predictive maintenance lacks effective data support.
-
(5) Insufficient security measures for key process and production data.
In industrial IoT scenarios, the main functions of edge computing include:
-
(1) A unified industrial field network built on OPC UA over TSN, achieving data interconnection and interoperability.
-
(2) vPLC (virtual programmable logic controller) built on an edge computing virtualization platform, supporting flexibility in production processes and workflows.
-
(3) Image recognition and video analysis for detecting product quality defects.
-
(4) Edge computing security mechanisms and solutions tailored to manufacturing scenarios.
-
5. Cloud VR
VR (Virtual Reality) refers to the simulation or replication of real or virtual environments, achieving immersive experiences for users through deep perception and interaction.
The high bandwidth and low latency characteristics of Cloud VR services promote the transition of platforms from centralized services to edge distributed services.
Some services, such as rendering computation, transcoding, and caching acceleration, are offloaded to edge processing.
11.5 Overview of Digital Twin Technology
Digital twin technology is a bridge that establishes communication between the real world and the virtual world across levels and scales. It is one of the general-purpose technologies and core technology systems of the Fourth Industrial Revolution, supporting the Internet of Everything, forming the foundation for the digital economy, and serving as the information infrastructure for the future intelligent era. The next decade will usher in the “Digital Twin Era”.
Virtualization is the process of creating an information expression based on bits for physical products based on atoms.
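The notion of a bit-based expression of an atom-based product can be made concrete with a very small twin object. The sketch below is a hedged illustration: the `PumpTwin` class, its fields, the telemetry keys, and the linear heating rule are all hypothetical and stand in for real models and data feeds. It shows the two basic moves, mirroring the measured state and running a what-if simulation on the mirror without touching the physical device.

```python
from dataclasses import dataclass, field


@dataclass
class PumpTwin:
    """Hypothetical digital twin of a pump: a bit-based mirror of its state."""
    rpm: float = 0.0
    bearing_temp: float = 25.0
    log: list[dict] = field(default_factory=list)

    def sync(self, telemetry: dict) -> None:
        # Mirror the latest measurements received from the physical pump.
        self.rpm = telemetry["rpm"]
        self.bearing_temp = telemetry["bearing_temp"]
        self.log.append(dict(telemetry))

    def simulate_temp(self, rpm: float, minutes: int) -> float:
        # What-if simulation on the twin only. Toy model (assumption):
        # temperature rises 0.001 degrees per rpm per minute.
        return self.bearing_temp + 0.001 * rpm * minutes
```

After `twin.sync({"rpm": 1500.0, "bearing_temp": 40.0})`, calling `twin.simulate_temp(2000.0, 10)` estimates the bearing temperature at a higher speed while leaving the mirrored state unchanged.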
11.5.3 Key Technologies of Digital Twins
Modeling, simulation, and data fusion-based digital threads are the three core technologies of digital twins.
Systems engineering and MBSE that can oversee modeling, simulation, and digital threads become the top-level framework technology for digital twins.
The Internet of Things is the underlying companion technology for digital twins, while cloud computing, machine learning, big data, and blockchain serve as enabling technologies for digital twins.
-
Modeling
-
Simulation
-
Other Technologies
11.5.4 Applications of Digital Twins
Digital twins are mainly applied in manufacturing, industry, cities, and battlefields.
-
Manufacturing
Main functions include digital model design, simulation, and emulation.
-
Industry
Digital twins, supported by IT and DT enabling technologies such as cloud computing, big data, IoT, AI, and blockchain, combine with industry trends and demands for industrial upgrades to construct digital mirrors of physical entities. Through various integrated forms, they follow a quantitative approach.
-
Cities
To build a new type of smart city, it is essential to construct a digital twin of the city.
The core value of digital twins in urban construction and development lies in their ability to establish real-time connections between the real world and the digital world, thereby digitizing, modeling, and visualizing changes throughout the entire lifecycle of urban physical entities.
Digital twin cities exhibit characteristics such as real-time monitoring, integration of urban information, interactive information transmission, scientific decision-making for development, intelligent control and management, and convenient urban services.
-
Battlefields
11.6 Overview of Cloud Computing and Big Data Technologies
Big data and cloud computing have become two mainstream technologies in the IT field.
11.6.1 Overview of Cloud Computing Technology
“Cloud computing” is a term that describes both a system platform and a type of application.
Servers in a cloud computing platform can be physical or virtual servers.
The connotation of cloud computing includes two aspects: platform and application.
Based on a deep understanding of the definition of cloud computing, the industry and academia have summarized the service methods of cloud computing. It is currently widely recognized that cloud computing has three typical service methods from top to bottom: “Software as a Service (SaaS)”, “Platform as a Service (PaaS)”, and “Infrastructure as a Service (IaaS)”. The following will briefly discuss each.
-
Software as a Service (SaaS)
-
Platform as a Service (PaaS)
-
In the PaaS model, service providers offer distributed development environments and platforms as a service.
-
Infrastructure as a Service (IaaS)
-
Service providers integrate memory, I/O devices, storage, and computing capabilities into a virtual resource pool, providing customers with the necessary storage resources, virtualized servers, and other services.
After analyzing the three service methods, it can be seen that these three service models have the following characteristics:
-
(1) In terms of flexibility, SaaS → PaaS → IaaS increases in flexibility sequentially. This is because the resources that users can control become increasingly lower-level, with finer granularity, enhancing control and flexibility.
-
(2) In terms of convenience, IaaS → PaaS → SaaS increases in convenience sequentially. This is because IaaS only provides basic computing capabilities such as CPU and storage, requiring users to build application systems based on their needs, which involves a significant workload and is less convenient.
-
In contrast, under the SaaS model, service providers directly offer applications with basic functions to users, who can simply configure the applications according to their specific needs to bring them online, resulting in a smaller workload and better convenience.
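The trade-off between flexibility and convenience follows from how much of the stack the customer controls. The sketch below encodes that reasoning; the five-layer `STACK` and the split points in `FIRST_USER_LAYER` are simplifying assumptions for illustration (real offerings draw the boundaries differently), and the function names are hypothetical.

```python
# Illustrative stack, ordered from lowest to highest layer (an assumption).
STACK = ["hardware", "virtualization", "operating system", "runtime", "application"]

# Index in STACK of the first layer the customer manages; everything below
# that index is operated by the service provider.
FIRST_USER_LAYER = {"IaaS": 2, "PaaS": 4, "SaaS": 5}


def user_managed(model: str) -> list[str]:
    """Layers the customer controls (and must operate) under a service model."""
    return STACK[FIRST_USER_LAYER[model]:]


def rank_by_flexibility() -> list[str]:
    """Order models from least to most flexible.

    More user-controlled layers means more flexibility and, equally,
    more work for the user, so the reverse order ranks convenience.
    """
    return sorted(FIRST_USER_LAYER, key=lambda m: len(user_managed(m)))
```

Under these assumptions `rank_by_flexibility()` yields SaaS, PaaS, IaaS, and reading the same list backwards gives the convenience ordering, matching the two characteristics listed above.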
3. Deployment Models of Cloud Computing
-
1) Public Cloud
-
2) Community Cloud
-
3) Private Cloud
-
4) Hybrid Cloud
4. Development History of Cloud Computing
The core idea of virtualization is to use virtualization software to create one or more virtual machines on a physical machine, maximizing the utilization of computer resources through virtualization technology.
Distributed computing refers to a computing method where multiple hardware and software systems, concurrent processes, or multiple programs process tasks in a loosely coupled or centrally controlled manner.
Distributed programs typically emphasize the heterogeneity of the environment: that is, networks with different latencies and unpredictable failures between computers or within the network.
11.6.2 Overview of Big Data Technology
Big data spans three dimensions: volume, velocity, and variety.
Big data refers to datasets whose size or complexity cannot be captured, managed, and processed using existing common software tools within a reasonable cost and acceptable time frame.
Gartner focuses on three quantifiable metrics of big data: data volume, data variety, and processing speed, which mainly present the following three major challenges.
-
Challenge 1: The continuously growing volume of data.
-
Challenge 2: Multi-format data.
-
Challenge 3: Performance.
2. Research Content of Big Data
Research on big data will face five challenges:
-
Challenge 1: Data acquisition issues.
-
Challenge 2: Data structure issues.
-
Challenge 3: Data integration issues.
-
Challenge 4: Data analysis, organization, extraction, and modeling are functional challenges inherent to big data.
-
Challenge 5: How to present the results of data analysis and interact with non-technical domain experts.
To address the above challenges, the analysis steps for big data are outlined, roughly divided into five main stages: data acquisition/recording, information extraction/cleaning/annotation, data integration/aggregation/presentation, data analysis/modeling, and data interpretation.
-
Data acquisition and recording
-
Information extraction and cleaning
The extracted objects may include images, videos, and other complex structured data, and this process is often highly application-dependent.
-
Data integration, aggregation, and presentation
-
Query processing, data modeling, and analysis
Big data often contains a lot of noise, exhibiting dynamic, heterogeneous, correlated, and untrustworthy characteristics.
Data mining requires complete, cleaned, trustworthy, and efficiently accessible data, as well as declarative queries (e.g., SQL) and mining interfaces, along with scalable mining algorithms and big data computing environments.
A current issue in big data analysis is the lack of coordination between database systems and analysis tools, necessitating research and implementation of data analysis systems that organically integrate declarative query languages with data mining and statistical packages.
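The five analysis stages (acquisition and recording, extraction and cleaning, integration and aggregation, analysis and modeling, interpretation) can be lined up as a chain of small functions. This is a toy-scale sketch, not a big data system: the `RAW_LOG` records, the comma-separated format, and every function name are hypothetical, and a per-site mean stands in for real modeling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor log lines in the form "<site>,<temperature>".
RAW_LOG = [
    "plant-a,21.5",
    "plant-b,19.0",
    "plant-a,bad-value",
    "plant-a,22.5",
    "",
]


def acquire() -> list[str]:
    """Stage 1: data acquisition and recording."""
    return list(RAW_LOG)


def extract_and_clean(lines: list[str]) -> list[tuple[str, float]]:
    """Stage 2: parse records and drop malformed or noisy ones."""
    records = []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # wrong shape, e.g. an empty line
        try:
            records.append((parts[0], float(parts[1])))
        except ValueError:
            continue  # value is not a number
    return records


def integrate(records: list[tuple[str, float]]) -> dict[str, list[float]]:
    """Stage 3: integrate and aggregate values by site."""
    grouped = defaultdict(list)
    for site, value in records:
        grouped[site].append(value)
    return dict(grouped)


def analyze(grouped: dict[str, list[float]]) -> dict[str, float]:
    """Stage 4: analysis and modeling (here, just the mean per site)."""
    return {site: mean(values) for site, values in grouped.items()}


def interpret(stats: dict[str, float]) -> list[str]:
    """Stage 5: present results in a form a domain expert can read."""
    return [f"{site}: average {avg:.1f}" for site, avg in sorted(stats.items())]


def run_pipeline() -> list[str]:
    return interpret(analyze(integrate(extract_and_clean(acquire()))))
```

Keeping each stage as its own function mirrors the text's point that cleaning is application-dependent: only `extract_and_clean` needs to change when the record format does.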
3. Application Areas of Big Data
-
1) Applications in manufacturing
-
2) Applications in services
-
3) Applications in the transportation industry
-
4) Applications in healthcare
The above content is excerpted from Tsinghua University Press’s “Tutorial for Information System Architects” (Second Edition) for learning purposes.