Demystifying Data Federation for SOA
Understanding Data Federation Patterns
There is broad agreement in the industry that data services are transforming enterprise data-centric architectures. Our analysis identifies three main patterns for exposing data as reusable services (Figure 2):
• Simple Data Access
• Data Hub
• Data Federation Services
Data federation, the third pattern, is the capability to aggregate information from multiple sources into a single view. It leaves data at the source and consolidates it virtually, much as an enterprise service bus (ESB) virtualizes messages, but for data. The result is a real-time, aggregated view that can be reused as a service.
Figure 2
Next, we look at each of these three patterns in detail and discuss the pros and cons of implementing it.
Simple Data Access
Build a data-source adapter into your SOA platform to access individual data fields or records, and let your process application assemble the data.
Advantage: Simple, easy to build, and low in up-front cost.
Disadvantage: Not highly reusable within or across large SOA implementations, and it leaves the manageability burden of mapping constantly changing data to business analysts and other data consumers.
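To make that division of labor concrete, here is a minimal sketch assuming two hypothetical adapters, `CrmAdapter` and `BillingAdapter`, each returning one record per call, with the process application doing the stitching. The class names, field names, and in-memory data are illustrative assumptions, not any SOA product's API.

```python
from dataclasses import dataclass


@dataclass
class CrmAdapter:
    """Hypothetical adapter over a CRM source; returns one record per call."""
    rows: dict

    def get_customer(self, customer_id: str) -> dict:
        return self.rows[customer_id]


@dataclass
class BillingAdapter:
    """Hypothetical adapter over a billing source; returns one record per call."""
    rows: dict

    def get_balance(self, customer_id: str) -> dict:
        return self.rows[customer_id]


def build_customer_summary(crm: CrmAdapter, billing: BillingAdapter, customer_id: str) -> dict:
    # The process application, not the data layer, owns this mapping.
    # If either source renames a field, every process like this one must change.
    customer = crm.get_customer(customer_id)
    balance = billing.get_balance(customer_id)
    return {
        "id": customer_id,
        "name": customer["full_name"],
        "balance": balance["amount_due"],
    }


crm = CrmAdapter({"c1": {"full_name": "Ada Lovelace"}})
billing = BillingAdapter({"c1": {"amount_due": 120.0}})
print(build_customer_summary(crm, billing, "c1"))
# {'id': 'c1', 'name': 'Ada Lovelace', 'balance': 120.0}
```

The code is short, which is the pattern's appeal, but the field mapping inside `build_customer_summary` has to be repeated in every process that needs a customer summary.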
Data Hubs
Use bulk data transformation logic (ETL/ELT) to extract, transform and load data into a consolidated data hub. This bulk technique can be service-driven, and the hub can in turn be exposed for data access using the simple data access approach.
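The following is a minimal ETL sketch of the hub pattern, using an in-memory SQLite database as a stand-in for the hub. The source rows, table name, and `total_by_customer` service function are hypothetical. It transforms before loading (ETL); an ELT variant would load the raw rows first and transform them inside the hub with SQL.

```python
import sqlite3

# Hypothetical source extracts; in practice these come from source systems in bulk.
ORDERS_SOURCE = [("o1", "c1", "19.99"), ("o2", "c2", "5.00"), ("o3", "c1", "7.01")]
CUSTOMERS_SOURCE = [("c1", " ada lovelace "), ("c2", "ALAN TURING")]


def extract():
    # Extract: pull raw rows from each source.
    return list(ORDERS_SOURCE), list(CUSTOMERS_SOURCE)


def transform(orders, customers):
    # Transform: normalize names, convert amounts to integer cents, join into one shape.
    names = {cid: name.strip().title() for cid, name in customers}
    return [
        (order_id, cid, names[cid], round(float(amount) * 100))
        for order_id, cid, amount in orders
    ]


def load(hub, rows):
    # Load: write the consolidated copy into the hub in one bulk statement.
    hub.execute(
        "CREATE TABLE IF NOT EXISTS order_facts "
        "(order_id TEXT PRIMARY KEY, customer_id TEXT, customer_name TEXT, amount_cents INTEGER)"
    )
    hub.executemany("INSERT OR REPLACE INTO order_facts VALUES (?, ?, ?, ?)", rows)
    hub.commit()


def total_by_customer(hub, customer_id):
    # A simple data-access service reads from the hub, not from the sources.
    return hub.execute(
        "SELECT customer_name, SUM(amount_cents) FROM order_facts "
        "WHERE customer_id = ? GROUP BY customer_name",
        (customer_id,),
    ).fetchone()


hub = sqlite3.connect(":memory:")
load(hub, transform(*extract()))
print(total_by_customer(hub, "c1"))  # ('Ada Lovelace', 2700)
```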
Advantage: Essential for data warehouse and business intelligence applications, and scales to multi-terabyte implementations. Data hubs are ideal for large result sets, or where consolidation makes data faster to access and easier to manage. They are also useful when the hub can be managed offline from its original sources, for example an analytics application that should work from a copy of a web storefront's data rather than the live system.
Disadvantage: Requires copies of data, which in turn require features like change data capture (CDC) to keep the copies in sync. The data hub approach might be overkill for smaller implementations, and some business cases restrict copying data at all.
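To show what "keeping data in sync" involves, here is a simplified sketch that detects changes by diffing two keyed snapshots of a source table and replays them onto the hub copy. This is snapshot comparison, a stand-in chosen for brevity; production CDC tools commonly read the source database's transaction log instead. All names and data are hypothetical.

```python
def capture_changes(previous: dict, current: dict) -> list:
    """Compare two keyed snapshots of a source table and return change events.

    Snapshot-diff change detection: a simple stand-in for log-based CDC.
    """
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous.keys() - current.keys():
        changes.append(("delete", key, None))
    return changes


def apply_changes(hub_copy: dict, changes: list) -> None:
    # Replay change events so the hub copy converges with the source.
    for op, key, row in changes:
        if op == "delete":
            hub_copy.pop(key, None)
        else:
            hub_copy[key] = row


source_before = {"c1": {"name": "Ada"}, "c2": {"name": "Alan"}}
source_after = {"c1": {"name": "Ada L."}, "c3": {"name": "Grace"}}

hub_copy = dict(source_before)
events = capture_changes(source_before, source_after)
apply_changes(hub_copy, events)
print(events)
print(hub_copy == source_after)  # True
```

Every copy in a hub needs some mechanism like this running continuously, which is part of the overhead the data federation pattern avoids.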
Data Federation
Aggregate data from multiple sources into a single view of the data, and expose that view as a service that your process application can reuse.
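As a contrast with the hub, here is a minimal sketch of a virtual view: it joins two live sources at request time and copies nothing. An in-memory SQLite database stands in for a relational source and a plain function stands in for a second, non-relational source. `FederatedCustomerView` and its fields are hypothetical; real federation servers express such views declaratively (for example in XQuery or SQL) rather than in application code.

```python
import sqlite3


class FederatedCustomerView:
    """Hypothetical virtual view joining two live sources at query time.

    Nothing is copied: each call reads the sources as they are right now.
    """

    def __init__(self, crm_db, billing_lookup):
        self.crm_db = crm_db                  # relational source (SQLite stand-in)
        self.billing_lookup = billing_lookup  # non-relational source (callable stand-in)

    def get(self, customer_id):
        row = self.crm_db.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        # The mapping lives here, in one reusable service, not in each process app.
        return {"id": row[0], "name": row[1], "balance": self.billing_lookup(customer_id)}


crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
crm_db.execute("INSERT INTO customers VALUES ('c1', 'Ada Lovelace')")
balances = {"c1": 120.0}

view = FederatedCustomerView(crm_db, lambda cid: balances.get(cid, 0.0))
print(view.get("c1"))  # balance 120.0

# Change a source: the next call reflects it with no copy or sync job.
balances["c1"] = 80.0
print(view.get("c1"))  # balance 80.0
```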
Advantage: Simple to build and reusable, with aggregation done in a single step. The data aggregation mapping is owned by a data services architect, and the data stays in place, with no copying or synchronization logic required.
Disadvantage: Performance must be closely monitored and optimized, because this approach adds an extra "hop" (a level of indirection) through a federation server: the aggregation logic runs in a middle-tier server instead of in the database. Performance dimensions to consider include both the latency of each query and the load imposed by large queries. The federation approach also introduces a new query paradigm (XQuery), which is well suited to XML data but might require new skills and training.
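One common way to contain the latency of the extra hop is to query the underlying sources concurrently, so response time tracks the slowest source rather than the sum of all of them. The sketch below simulates two sources with `time.sleep`; the source functions and delays are made-up assumptions. Real federation servers may also push filters down to the sources and cache results, which this sketch does not attempt.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_crm(customer_id):
    time.sleep(0.2)  # simulated network + query latency
    return {"name": "Ada Lovelace"}


def query_billing(customer_id):
    time.sleep(0.3)  # simulated network + query latency
    return {"balance": 120.0}


def federated_get(customer_id):
    # Fan out to every source at once, then merge the partial results in order.
    sources = [query_crm, query_billing]
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        partials = list(pool.map(lambda query: query(customer_id), sources))
    merged = {"id": customer_id}
    for part in partials:
        merged.update(part)
    return merged


start = time.perf_counter()
result = federated_get("c1")
elapsed = time.perf_counter() - start
# Elapsed time should be near the slowest source (about 0.3s), not the sum (about 0.5s).
print(result, f"{elapsed:.2f}s")
```

Concurrency only addresses the latency dimension; large queries still load the middle tier, which is why both dimensions need watching.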
Each approach has its own appropriate usage in a data-centric environment. Let’s now focus on data federation to explore its core functional requirements and what you should look for in evaluating a solution.
Article Type: Opinion/Editorial
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)










