Demystifying Data Federation for SOA


Simple Data Access

Build a data-source adapter into your SOA platform to access data fields and records individually, and let your process application assemble the data.
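To make the trade-off concrete, here is a minimal Python sketch of simple data access. It assumes two hypothetical sources (a CRM and a billing system), simulated with in-memory dictionaries. The names `crm_adapter`, `billing_adapter` and `process_step` are illustrative and are not part of any SOA product. Note where the field mapping ends up: inside the process application.

```python
# Hypothetical per-source data: each adapter reads from exactly one source.
CRM_RECORDS = {101: {"id": 101, "name": "Acme Corp"}}
BILLING_RECORDS = {101: {"customer_id": 101, "balance": 250.0}}


def crm_adapter(customer_id: int) -> dict:
    """Fetch one customer record from the (simulated) CRM source."""
    return CRM_RECORDS[customer_id]


def billing_adapter(customer_id: int) -> dict:
    """Fetch one account record from the (simulated) billing source."""
    return BILLING_RECORDS[customer_id]


def process_step(customer_id: int) -> dict:
    """Process-application code that stitches the individual records together.

    The field mapping lives here, so every other application that needs
    the same combined view has to repeat it.
    """
    customer = crm_adapter(customer_id)
    account = billing_adapter(customer_id)
    return {"name": customer["name"], "balance": account["balance"]}


print(process_step(101))  # {'name': 'Acme Corp', 'balance': 250.0}
```

Because the mapping lives in `process_step`, any change to either source's fields has to be tracked down in every process that repeats it, which is the reuse and manageability cost listed for this approach.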

Advantage: Simple, low up-front cost, and easy to build.

Disadvantage: Not highly reusable within or across large SOA implementations, and it leaves the manageability headache of mapping constantly changing data to business analysts and other consumers of the data.

Data Hubs

Leverage bulk data transformation logic (ETL/ELT) to extract, transform and load data into a consolidated data hub. This bulk data technique can be service-driven, and the hub can in turn be exposed for data access using the simple data access approach.
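A minimal sketch of the extract-transform-load cycle that feeds a hub, assuming two hypothetical operational sources simulated as in-memory SQLite databases (Python's standard `sqlite3` module). The table names, columns and the `run_etl` function are illustrative. A real hub would typically use a dedicated ETL/ELT tool and a warehouse-scale target.

```python
import sqlite3

# Two hypothetical operational sources, simulated as in-memory databases.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount_cents INTEGER)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(1, 101, 1999), (2, 101, 500), (3, 102, 750)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(101, "acme corp"), (102, "globex")])

# The consolidated hub holds its own copy of the (transformed) data.
hub = sqlite3.connect(":memory:")
hub.execute("CREATE TABLE customer_sales (customer_id INTEGER, name TEXT, total_dollars REAL)")


def run_etl() -> None:
    # Extract: bulk-read everything needed from each source.
    orders = orders_db.execute("SELECT customer_id, amount_cents FROM orders").fetchall()
    customers = dict(crm_db.execute("SELECT id, name FROM customers").fetchall())
    # Transform: total per customer, convert cents to dollars, normalize names.
    totals: dict = {}
    for customer_id, cents in orders:
        totals[customer_id] = totals.get(customer_id, 0) + cents
    rows = [(cid, customers[cid].title(), cents / 100) for cid, cents in sorted(totals.items())]
    # Load: replace the hub table contents in a single transaction.
    with hub:
        hub.execute("DELETE FROM customer_sales")
        hub.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)


run_etl()
print(hub.execute("SELECT * FROM customer_sales ORDER BY customer_id").fetchall())
# [(101, 'Acme Corp', 24.99), (102, 'Globex', 7.5)]
```

Each run rebuilds the hub table from a full extract, which is simple but grows expensive as volumes rise; that is one reason incremental techniques such as change data capture come into play.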

Advantage: Essential for data warehouse and business intelligence applications, and can scale to multi-terabyte implementations. Data hubs are ideal for large result sets, or where consolidation makes accessing and managing the data more efficient. They are also useful when the hub can be managed offline from the original sources, for example analytics applications that should work from a copy of a web storefront's data rather than the live system.

Disadvantage: Requires copies of data, which in turn require features like change data capture (CDC) to keep the copies in sync. The data hub approach might be overkill for smaller implementations, and in some business cases there may be restrictions on copying data.
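One common way to implement change data capture is log-based: the source records every write in an ordered change log, and the hub applies only the entries it has not seen yet. The sketch below simulates that idea in plain Python under that assumption. `SourceTable`, `HubCopy` and the log format are hypothetical; production CDC tools typically read the database's own transaction log instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    seq: int                      # 1-based position in the change log
    op: str                       # "upsert" or "delete"
    key: int
    value: Optional[dict] = None  # new row image for upserts


@dataclass
class SourceTable:
    """Simulated source table that appends every write to a change log."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def upsert(self, key: int, value: dict) -> None:
        self.rows[key] = value
        self.log.append(Change(len(self.log) + 1, "upsert", key, dict(value)))

    def delete(self, key: int) -> None:
        del self.rows[key]
        self.log.append(Change(len(self.log) + 1, "delete", key))


@dataclass
class HubCopy:
    """The hub's copy of the table, refreshed incrementally from the log."""
    rows: dict = field(default_factory=dict)
    last_seq: int = 0  # highest log position already applied

    def sync(self, source: SourceTable) -> int:
        """Apply only the changes made since the last sync; return how many."""
        pending = source.log[self.last_seq:]  # seq == index + 1, so this skips applied entries
        for change in pending:
            if change.op == "upsert":
                self.rows[change.key] = change.value
            else:
                self.rows.pop(change.key, None)
            self.last_seq = change.seq
        return len(pending)


source = SourceTable()
hub_copy = HubCopy()
source.upsert(1, {"name": "Acme"})
source.upsert(2, {"name": "Globex"})
print(hub_copy.sync(source))  # 2
source.upsert(1, {"name": "Acme Corp"})
source.delete(2)
print(hub_copy.sync(source))  # 2
print(hub_copy.rows)          # {1: {'name': 'Acme Corp'}}
```

Only the two most recent changes are shipped on the second sync, rather than a full re-extract; the price is that the source must capture and retain every change.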

Data Federation

Aggregate data from multiple sources into a single view of the data and expose that view as a service to be reused by your process application.
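The following sketch shows the federation idea in miniature: a logical "customer balance" view that queries two live sources on every call and joins the results in the middle tier, so nothing is copied. It assumes two hypothetical sources simulated with in-memory SQLite databases, and it uses Python and SQL in place of the XQuery a federation server would typically use. `customer_balance_view` is an illustrative name.

```python
import sqlite3

# Two hypothetical physical sources; the data stays where it is.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(101, "Acme Corp"), (102, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL, paid INTEGER)")
billing.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                    [(101, 120.0, 0), (101, 30.0, 1), (102, 75.0, 0)])


def customer_balance_view() -> list:
    """Logical 'customer balance' entity, assembled from both sources per call.

    Consumers see only this output shape, not which database holds which
    column; the join runs here, in the middle tier.
    """
    names = dict(crm.execute("SELECT id, name FROM customers"))
    balances = dict(billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices WHERE paid = 0 GROUP BY customer_id"))
    return [{"customer": names[cid], "open_balance": balances.get(cid, 0.0)}
            for cid in sorted(names)]


print(customer_balance_view())
# [{'customer': 'Acme Corp', 'open_balance': 120.0}, {'customer': 'Globex', 'open_balance': 75.0}]

# A change at the source is visible on the next call, with no sync step.
billing.execute("UPDATE invoices SET paid = 1 WHERE customer_id = 102")
print(customer_balance_view()[1])  # {'customer': 'Globex', 'open_balance': 0.0}
```

Because the view reads the sources at call time, it is always current; the cost is that every call pays for the extra hop to each source.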

Advantage: Simple to build and reusable; aggregation is done in one step; the data aggregation mapping is left to a data services architect; and the data stays in place, without requiring copying or synchronization logic.

Disadvantage: Performance characteristics must be closely watched and optimized, because this approach adds an extra ‘hop’, or level of indirection, through a federation server, and the aggregation logic runs in a middle-tier server instead of the database. Performance dimensions to consider include both the latency of individual query transactions and the load imposed by large queries. The federation approach also introduces a new query paradigm (XQuery) that is better suited to XML data but might require new skills and training.

Each approach has its own appropriate usage in a data-centric environment. Let’s now focus on data federation to explore its core functional requirements and what you should look for in evaluating a solution.



Functional Requirements for Data Federation

The basis for just about all service-oriented architectures is reusability. A high level of reuse can be achieved through a layer of data sources and business logic exposed as services, accessible through an implementation-independent service contract. A fundamental design goal of SOA is to provide a logical abstraction over these capabilities and to avoid exposing the physical topology of the underlying sources to consuming applications. This is also the design premise of data federation and the basis for its functional requirements (Figure 3). Specifically, a data federation solution should provide:


Figure 3


Data Source Abstraction Layer for your SOA – Map multiple physical data sources into a set of services defined by logical entities rather than physical location. Reuse data services to define new abstractions as required.
Federated, Optimized Queries – Queries produce execution plans that exploit the specific capabilities of the underlying databases and optimize distributed operations and nested data.
CRUD-style Data Updates – Data services support creating, reading, updating and deleting data in place at the original sources. Updates may be distributed across several sources and should offer some form of reliability guarantee, such as XA-compliant transactions, over the update cycle.
Rich Hierarchical Model – The model must represent relational data as well as payloads exchanged with SOAP or REST services, so that those payloads can be processed without format conversion (Figure 4).
Security – Security poses new challenges for a data services environment: services can be reused in unpredictable patterns, which requires flexibility while maintaining control of sensitive information. Data services require rich access control functionality, from policy-based authorization to fine-grained row- and column-level security, as well as identity propagation or credential mapping.
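Predicate pushdown is one example of the optimizations a federated query planner can apply. Instead of pulling every row into the federation tier and filtering there, the planner translates the filter into the source's own query language. The sketch below compares the two strategies against a single hypothetical source simulated with SQLite. `fetch_naive` and `fetch_pushdown` are illustrative names; a real federation engine would choose the plan automatically.

```python
import sqlite3

# A hypothetical source with 1,000 orders, 10% of them in the EU region.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU" if i % 10 == 0 else "US", float(i)) for i in range(1000)],
)


def fetch_naive(region: str):
    """No pushdown: ship every row to the federation tier, then filter there."""
    rows = source.execute("SELECT id, region, amount FROM orders ORDER BY id").fetchall()
    return [r for r in rows if r[1] == region], len(rows)


def fetch_pushdown(region: str):
    """Pushdown: the predicate runs inside the source; only matches are shipped."""
    rows = source.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    return rows, len(rows)


naive_rows, naive_shipped = fetch_naive("EU")
pushed_rows, pushed_shipped = fetch_pushdown("EU")
print(naive_shipped, pushed_shipped)  # 1000 100
```

Both strategies return the same answer, but the pushed-down plan moves a tenth of the rows across the extra hop, which is where much of federation's performance tuning pays off.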

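Distributed updates are commonly made reliable with two-phase commit, the protocol that XA transactions coordinate. The sketch below shows only the protocol's control flow under simplifying assumptions: the `Participant` class and `two_phase_commit` function are hypothetical, and the sketch omits the durable logging, timeouts and recovery that a real XA transaction manager provides.

```python
class Participant:
    """Hypothetical data source that stages writes until told to commit or roll back."""

    def __init__(self, name, data, reject_keys=()):
        self.name = name
        self.data = data                     # committed state
        self.reject_keys = set(reject_keys)  # simulated constraint violations
        self.staged = None                   # writes held between the two phases

    def prepare(self, writes):
        """Phase 1: validate and stage the writes, then vote yes (True) or no (False)."""
        if any(key in self.reject_keys for key in writes):
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        """Phase 2, success path: apply the staged writes."""
        self.data.update(self.staged)
        self.staged = None

    def rollback(self):
        """Phase 2, failure path: discard anything staged."""
        self.staged = None


def two_phase_commit(plan):
    """Coordinate one distributed update; plan maps each Participant to its writes."""
    prepared = []
    for participant, writes in plan.items():
        if not participant.prepare(writes):
            # Any "no" vote aborts the update on every participant already prepared.
            for p in prepared:
                p.rollback()
            return False
        prepared.append(participant)
    # Every participant voted yes, so it is now safe to commit everywhere.
    for participant in prepared:
        participant.commit()
    return True


crm = Participant("crm", {101: "Acme"})
billing = Participant("billing", {101: 0.0}, reject_keys={999})
print(two_phase_commit({crm: {101: "Acme Corp"}, billing: {101: 250.0}}))  # True
print(two_phase_commit({crm: {999: "New Co"}, billing: {999: 10.0}}))      # False
print(crm.data, billing.data)  # {101: 'Acme Corp'} {101: 250.0}
```

In the second call the CRM participant had already staged its write, but the billing veto rolls it back, so neither source ends up with a partial update.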


Figure 4
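The hierarchical model can be illustrated by nesting flat relational join results into one document per customer and then attaching a service payload as-is. This sketch assumes Python dictionaries and JSON stand in for the XML model a federation product would typically use; the sample rows, the REST payload and the `nest` and `attach_service_payload` functions are all hypothetical.

```python
import json

# Flat rows, as a relational source would return a customer/order join.
joined_rows = [
    (101, "Acme Corp", 1, 19.99),
    (101, "Acme Corp", 2, 5.00),
    (102, "Globex", 3, 7.50),
]

# A hypothetical REST response that is already hierarchical.
rest_payload = json.loads(
    '{"customer_id": 101, "shipping": {"city": "Springfield", "express": true}}'
)


def nest(rows):
    """Group flat join rows into one nested document per customer."""
    customers = {}
    for cust_id, name, order_id, amount in rows:
        doc = customers.setdefault(cust_id, {"id": cust_id, "name": name, "orders": []})
        doc["orders"].append({"order_id": order_id, "amount": amount})
    return customers


def attach_service_payload(customers, payload):
    """Attach the payload's subtree unchanged; it is never flattened into rows."""
    customers[payload["customer_id"]]["shipping"] = payload["shipping"]
    return customers


docs = attach_service_payload(nest(joined_rows), rest_payload)
print(json.dumps(docs[101], indent=2))
```

Relational data and the service payload end up in one tree, so a consumer can read `docs[101]["orders"]` and `docs[101]["shipping"]` the same way without caring which came from a table and which from a service.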

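Row- and column-level security can be sketched as a filter that the data service applies before returning results: rows outside the caller's scope are dropped, and columns the caller's roles do not cover are masked. The `Caller` type, the region-based row rule and `COLUMN_POLICY` below are hypothetical stand-ins for a policy engine; identity propagation and credential mapping are not shown.

```python
from dataclasses import dataclass

CUSTOMERS = [
    {"id": 1, "region": "EU", "name": "Acme GmbH", "credit_limit": 50000},
    {"id": 2, "region": "US", "name": "Globex Inc", "credit_limit": 75000},
    {"id": 3, "region": "EU", "name": "Initech SA", "credit_limit": 20000},
]

# Hypothetical column policy: column name -> role required to see its value.
COLUMN_POLICY = {"credit_limit": "finance"}


@dataclass(frozen=True)
class Caller:
    user: str
    roles: frozenset = frozenset()
    regions: frozenset = frozenset()  # row-level scope


def secured_query(caller: Caller) -> list:
    result = []
    for row in CUSTOMERS:
        # Row-level security: drop rows outside the caller's regions.
        if row["region"] not in caller.regions:
            continue
        visible = {}
        for column, value in row.items():
            required = COLUMN_POLICY.get(column)
            # Column-level security: mask values the caller's roles do not cover.
            visible[column] = value if required is None or required in caller.roles else None
        result.append(visible)
    return result


analyst = Caller("ana", roles=frozenset({"sales"}), regions=frozenset({"EU"}))
cfo = Caller("cfo", roles=frozenset({"finance"}), regions=frozenset({"EU", "US"}))
print([(r["id"], r["credit_limit"]) for r in secured_query(analyst)])  # [(1, None), (3, None)]
print([(r["id"], r["credit_limit"]) for r in secured_query(cfo)])      # [(1, 50000), (2, 75000), (3, 20000)]
```

Because the policy is enforced inside the data service rather than in each consumer, the same service can be reused by different applications without each one re-implementing the access rules.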

A data services federation layer is often seen as a way to take the first step toward SOA. This services layer provides data mediation, or abstraction, between different data consumers and heterogeneous sources. Data services can be virtualized, aggregated views built from sources across the enterprise. They simplify data access and, once created, are highly reusable. This approach eliminates the need to build workflows or hand-code Java, making it possible to automate data service creation and maintenance. Other consumers of data services include business processes, business intelligence applications, master data management (MDM), portals, and Web 2.0 applications.


(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)