Active Metadata: Records not only the structure and description of data but also perceives, captures, and updates changes in data in real time, supporting automatic lineage tracking, impact analysis, and semantic understanding. It acts as “the nervous system of data”, giving the weaving platform adaptive and intelligent capabilities.
Virtualization Federation: Logically integrates distributed, multi-source, heterogeneous data through a unified interface without physical migration. It acts as “an ad-hoc aggregator of data”, letting users query global data through a single view while preserving real-time access and flexibility.
Comparison of the data managed by the two capabilities in the data weaving architecture

| Dimension | Virtualization Federation Management | Active Metadata Management |
| --- | --- | --- |
| Main Focus | Actual data sources that can be queried through federation | Global metadata of data, processes, semantics, and governance |
| Structured Data | ✅ Database tables, views, data warehouses | ✅ Tables, fields, primary and foreign key relationships, metric systems |
| Semi-structured Data | ✅ JSON, XML, Parquet, ORC | ✅ Schema information, field mapping, parsing rules |
| Unstructured Data | ⚠️ Limited to some types (e.g., documents/logs), and requires transformation | ✅ Descriptive information of files, documents, images, videos, audio, etc. |
| Operational Process Data | ❌ Rarely involved | ✅ Lineage, data quality, scheduling, pipeline operational status |
| Business/Semantic Information | ❌ Not involved | ✅ Business terminology, domain knowledge graph, semantic bridging |
| Governance and Security Information | ❌ Not involved | ✅ Permissions, compliance, audit logs |
| User Interaction Data | ❌ Not involved | ✅ Query logs, user behavior, recommendation records |
| Core Positioning | Achieving “queryable and integrable” | Achieving “understandable, governable, and inferable” |
In the data weaving architecture, Active Metadata (AM) and Virtualization Federation (Data Federation) both involve unified representation and control of data, but they target different ranges of data types:
1. Data Types Managed by Virtualization Federation
Virtualization Federation mainly focuses on source data that can be directly queried and computed, such as:
Structured Data: Relational databases, data warehouses, distributed SQL engines (Trino, Presto, Hive, etc.).
Semi-structured Data: JSON, XML, Parquet, ORC, etc. (as long as they can be mapped into tables or key-value pairs).
Limited Unstructured Data: Documents or log files, where a connector supports them, although they usually must be transformed into a query-friendly structure before federation is effective.
📌 Summary: Virtualization Federation management targets “queryable federated data”.
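The idea of querying several sources through one interface, without copying them first, can be illustrated with a small sketch. It is not how any particular federation engine works: `FederatedView`, `sqlite_source`, `json_source`, and the sample `orders`/`customers` data are all hypothetical names invented for this example. Each source only exposes a scan function, and the join is evaluated at query time.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]
Scan = Callable[[], Iterator[Row]]


def sqlite_source(conn: sqlite3.Connection, table: str) -> Scan:
    """Wrap a relational table; rows are read only when a query runs."""
    def scan() -> Iterator[Row]:
        # The table name is trusted here; a real connector would validate it.
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [col[0] for col in cursor.description]
        for record in cursor:
            yield dict(zip(columns, record))
    return scan


def json_source(text: str) -> Scan:
    """Wrap a JSON array of objects so that it can be scanned like a table."""
    def scan() -> Iterator[Row]:
        yield from json.loads(text)
    return scan


class FederatedView:
    """Hypothetical unified interface over heterogeneous sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Scan] = {}

    def register(self, name: str, scan: Scan) -> None:
        self._sources[name] = scan

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Hash join computed at query time; neither source is migrated.
        index: Dict[object, List[Row]] = {}
        for row in self._sources[right]():
            index.setdefault(row[key], []).append(row)
        joined: List[Row] = []
        for left_row in self._sources[left]():
            for right_row in index.get(left_row[key], []):
                joined.append({**right_row, **left_row})
        return joined


# Source 1 (assumed): a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 99.5), (2, 11, 20.0)])

# Source 2 (assumed): a semi-structured JSON document.
customers_json = '[{"customer_id": 10, "name": "Ada"}, {"customer_id": 11, "name": "Lin"}]'

view = FederatedView()
view.register("orders", sqlite_source(conn, "orders"))
view.register("customers", json_source(customers_json))
result = view.join("orders", "customers", "customer_id")
```

The JSON source is usable only because it maps cleanly onto rows with named fields, which mirrors the condition stated above for semi-structured data.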
2. Data Types Managed by Active Metadata
Active Metadata has a broader scope: it manages all metadata about data, regardless of whether the data can be directly queried:
Data Resource Metadata:
Schema information for structured, semi-structured, and unstructured data.
Database tables, views, fields, primary and foreign key relationships.
Descriptions of files and object storage (videos, images, audio).
Operational Process Metadata:
Data lineage (source, processing, flow).
Data quality (validation rules, exception records).
Job scheduling, data flow tasks, data pipeline operational status.
Business and Semantic Metadata:
Business terminology, metric systems.
Domain knowledge graphs, semantic bridging models.
Tags, classifications, data asset catalogs.
Governance and Security Metadata:
Data permissions, access control.
Data compliance and audit information.
User Interaction Metadata:
Query logs, user profiles, usage behavior.
Recommendation results, AI interaction context.
📌 Summary: Active Metadata management targets “describable data + manageable data + inferable data”, with a scope significantly larger than that of virtualization federation.
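To make the "active" part concrete, the following sketch shows one possible way to combine data resource metadata (schemas) with operational metadata (lineage) so that a detected schema change triggers an impact analysis. The class `MetadataGraph`, its methods, and the asset names are assumptions made for illustration; they do not come from any specific metadata platform.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Asset:
    """Data resource metadata: an asset name and its recorded columns."""
    name: str
    columns: List[str]


class MetadataGraph:
    """Hypothetical active-metadata store with lineage and change events."""

    def __init__(self) -> None:
        self.assets: Dict[str, Asset] = {}
        self._downstream: Dict[str, Set[str]] = {}
        self.events: List[str] = []

    def register(self, asset: Asset, upstream: Iterable[str] = ()) -> None:
        self.assets[asset.name] = asset
        for source in upstream:
            # Lineage edge: source feeds asset.
            self._downstream.setdefault(source, set()).add(asset.name)

    def impact_of(self, name: str) -> List[str]:
        """Breadth-first walk over lineage edges to find affected assets."""
        seen: Set[str] = set()
        order: List[str] = []
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for child in sorted(self._downstream.get(current, ())):
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order

    def observe_schema(self, name: str, columns: List[str]) -> List[str]:
        """Compare an observed schema with the recorded one and react to drift."""
        asset = self.assets[name]
        if list(columns) == asset.columns:
            return []
        removed = sorted(set(asset.columns) - set(columns))
        asset.columns = list(columns)  # keep the metadata current
        self.events.append(f"{name}: removed columns {removed}")
        return self.impact_of(name)


graph = MetadataGraph()
graph.register(Asset("raw_orders", ["order_id", "amount", "coupon"]))
graph.register(Asset("clean_orders", ["order_id", "amount"]), upstream=["raw_orders"])
graph.register(Asset("revenue_report", ["total"]), upstream=["clean_orders"])

# A scan of the source reports that the "coupon" column has disappeared.
affected = graph.observe_schema("raw_orders", ["order_id", "amount"])
```

Here `affected` lists the downstream assets reached through lineage, which is the kind of result an impact analysis would hand to owners or to an automated pipeline check.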
3. Relationship Between the Two
Virtualization Federation: Solves “how to access and query different data sources in real-time”.
Active Metadata Management: Solves “how to understand, describe, govern, and drive the entire data lifecycle”.
👉 Virtualization Federation is “weaving data”, while Active Metadata is “weaving knowledge”.
👉 The former is limited to operational data sources, while the latter covers data + process + semantics + governance.
📌 Summary:
Virtualization Federation Management = “weaving scattered data into a network”; its core is real-time access + query.
Active Metadata Management = “adding maps and rules to this network”; its core is global understanding + governance + intelligent automation.
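One way to picture how the two layers cooperate is a catalog that maps business terms to physical sources (the "map") and checks permissions before delegating the read (the "rules"). The sketch below is only an illustration under assumed names: `CatalogEntry`, `GovernedFederation`, the `crm.customers` source, and the role names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]


@dataclass
class CatalogEntry:
    """Semantic and governance metadata for one business term."""
    business_term: str       # semantic metadata
    source: str              # which federated source holds the data
    allowed_roles: Set[str]  # governance metadata


class GovernedFederation:
    """Hypothetical federation layer steered by catalog metadata."""

    def __init__(self) -> None:
        self._catalog: Dict[str, CatalogEntry] = {}
        self._scanners: Dict[str, Callable[[], List[Row]]] = {}
        self.audit_log: List[Tuple[str, str, bool]] = []

    def register(self, entry: CatalogEntry, scanner: Callable[[], List[Row]]) -> None:
        self._catalog[entry.business_term] = entry
        self._scanners[entry.source] = scanner

    def query(self, business_term: str, role: str) -> List[Row]:
        entry = self._catalog[business_term]      # the "map": term to source
        allowed = role in entry.allowed_roles     # the "rules": access control
        self.audit_log.append((role, business_term, allowed))
        if not allowed:
            raise PermissionError(f"{role} may not read {business_term}")
        return list(self._scanners[entry.source]())  # data is read in place


fabric = GovernedFederation()
fabric.register(
    CatalogEntry("Customer", source="crm.customers", allowed_roles={"analyst"}),
    scanner=lambda: [{"id": 1, "name": "Ada"}],
)

rows = fabric.query("Customer", role="analyst")
try:
    fabric.query("Customer", role="intern")
    denied = False
except PermissionError:
    denied = True
```

The federation part stays thin, since it only reads the source. The decisions about what a term means, who may read it, and what gets audited all come from metadata.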