
Masoud Kalali has a software engineering degree and has been working on software development projects since 1998. He has experience with a variety of technologies (.NET, J2EE, CORBA, and COM+) on diverse platforms (Solaris, Linux, and Windows). His experience is in software architecture, design, and server-side development.

Masoud has published several articles at Java.net and DZone. He has authored multiple Refcardz published by DZone, including Using XML in Java, Java EE Security, and GlassFish v3. He is one of the founding members of the NetBeans Dream Team and a GlassFish community spotlighted developer. Masoud's recently published book, GlassFish Security, covers GlassFish v3 security and Java EE 6 security.

Masoud's main areas of research and interest are service-oriented architecture and the development and deployment of large-scale systems. In his leisure time he enjoys photography, mountaineering, and camping. Masoud can be followed on his Twitter account.


Demystifying Data Federation for SOA

09.22.2008

In this article Dain Hansen unravels the mystery surrounding data federation. What are the essential requirements for a data federation solution? How is data federation considered a stepping stone to a successful SOA? Why are both data quality and data profiling so critical in bringing about manageable data services? What’s the impact of these emerging federation techniques on data warehousing and business intelligence applications? This article will answer these questions through industry data, architectural patterns in data integration, and compelling case studies that illustrate the importance of embracing data federation.



Introduction: The Need for Reusable Data

In the struggle for information agility, take one part SOA and add two parts data, and the result is the formidable and extremely popular idea of data services. But what lies beyond the hype? Data services help transform data sources into reusable data components for better accessing, aggregating, and managing data. Our SOA forefathers taught us that service-oriented architectures tend to be much more agile than their constrained EAI contemporaries. We need to think about our data architectures in the same way.

There are a number of reasons that an organization might undertake a data-services implementation.

Data is everywhere. The rate at which data grows in organizations is rising, as is the importance of turning that data into useful information.
Data originates from a variety of structured and unstructured sources.
Without data services, data access is too complex a task, involving different data models and data formats for each data source.
Point-to-point connections between data consumers and providers proliferate as IT implementations evolve.
The lack of a real-time, unified view across multiple sources makes data even more difficult to leverage consistently.



Figure 1


For most organizations, data appears on the architectural radar only when there is a mandate to derive specific value from collecting it. The data needed may relate to sales forecasts, campaign results, customer-service metrics, regulatory compliance, operational performance results, or customer profiling. Companies initiating projects that require data must make choices. For instance, should they continue to connect data in the same customized, rigid, point-to-point way that they use for applications (Figure 1)?

Experienced IT professionals know that single-use data integration projects create long-term maintenance challenges and offer negligible ROI on the data integration portions of the project. The better strategy is to apply SOA principles to data integration, turning data into a service that is available as logical modules, each with a standards-based interface. This enables users to access and use the service more easily, improves data visibility, and promotes greater reuse.
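As a rough illustration of what "data as a service" can mean in code, the sketch below has a consumer depend on a service contract rather than on a specific source. The names CustomerDataService, CrmCustomerService, and greeting, along with the sample record, are hypothetical and not taken from any product; an in-memory dictionary stands in for a real system.

```python
from abc import ABC, abstractmethod


class CustomerDataService(ABC):
    """Hypothetical service contract: consumers depend on this, not on a source."""

    @abstractmethod
    def get_customer(self, customer_id: str) -> dict:
        """Return a customer record with the shared keys 'id' and 'name'."""


class CrmCustomerService(CustomerDataService):
    """One possible provider, backed by an in-memory stand-in for a CRM system."""

    def __init__(self, records: dict):
        self._records = records

    def get_customer(self, customer_id: str) -> dict:
        row = self._records[customer_id]
        # Map the source-specific field name to the shared contract.
        return {"id": customer_id, "name": row["full_name"]}


def greeting(service: CustomerDataService, customer_id: str) -> str:
    """A consumer written against the contract only."""
    return "Hello, " + service.get_customer(customer_id)["name"]


# Assumed sample data, for illustration only.
crm = CrmCustomerService({"c1": {"full_name": "Ada Lopez"}})
print(greeting(crm, "c1"))
```

Because greeting only knows the contract, a different provider (a warehouse, a federated view) could in principle be swapped in without touching the consumer, which is the reuse argument made above.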



Understanding Data Federation Patterns

There is broad agreement in the industry that data services have a transformational influence on enterprise data-centric architectures. Our analysis indicates that there are three important scenarios in which data can be exposed as reusable services (Figure 2):
Simple Data Access
Data Hub
Data Federation Services
For a solution to live up to the name data services, it must achieve more than simple data access. Many off-the-shelf SOA, BPM, and database products, and even development tools, include basic functionality for accessing data sources. Data services can also be generated from data-consolidation (bulk data) scenarios, in which ETL is used to build a data hub.

Finally, data federation is defined as the capability to aggregate information across multiple sources into a single view. It leaves data at the source and consolidates information virtually, much as an enterprise service bus (ESB) virtualizes messages, but for data. In essence, data federation gives companies a real-time view across multiple sources that can be reused as a service.
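A minimal sketch of the federation idea, under simplifying assumptions: two in-memory structures stand in for separate source systems, and a view object queries them on demand instead of copying their data. All names here (CRM, ORDERS, crm_lookup, order_lookup, FederatedCustomerView) are hypothetical; a real federation product would also handle query planning, security, and caching, which this sketch ignores.

```python
from typing import Callable, Dict, List

# Hypothetical sources. Each stays where it is and is only read on demand.
CRM: Dict[str, dict] = {"c1": {"name": "Ada Lopez"}, "c2": {"name": "Raj Patel"}}
ORDERS: List[dict] = [
    {"customer_id": "c1", "total": 120.0},
    {"customer_id": "c1", "total": 30.0},
    {"customer_id": "c2", "total": 75.5},
]


def crm_lookup(customer_id: str) -> dict:
    """Fetch profile fields from the (stand-in) CRM source."""
    return dict(CRM.get(customer_id, {}))


def order_lookup(customer_id: str) -> dict:
    """Summarize orders from the (stand-in) order system."""
    totals = [o["total"] for o in ORDERS if o["customer_id"] == customer_id]
    return {"order_count": len(totals), "order_total": sum(totals)}


class FederatedCustomerView:
    """Builds a single view per request; no data is copied or stored here."""

    def __init__(self, sources: List[Callable[[str], dict]]):
        self._sources = sources

    def get(self, customer_id: str) -> dict:
        view = {"id": customer_id}
        for fetch in self._sources:
            # Each source contributes its fields at query time.
            view.update(fetch(customer_id))
        return view


view = FederatedCustomerView([crm_lookup, order_lookup])
result = view.get("c1")
print(result)
```

Since the view holds no data of its own, a new order appended to ORDERS shows up in the next call to view.get, which is the "real-time" property described above. A data hub built with ETL would instead copy the rows and serve them from the copy until the next load.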


Figure 2


We’ll next look at these three examples in detail and discuss the pros and cons of implementing each pattern.
Published at DZone with permission of its author, Masoud Kalali.
