Demystifying Data Federation for SOA



Turning Bad Data Good

Bad data puts enterprise software projects at risk. Because inaccurate and inconsistent data exists almost everywhere, the demand for trusted data continues to spiral upward, driven by investments in packaged applications and business intelligence applications. Strategic IT initiatives like MDM add further pressure. Further complicating the matter, regulatory compliance initiatives (such as Sarbanes-Oxley, the USA PATRIOT Act, and Basel II) require tracing the source of the data used in financial reports, as well as examination, tracking (through snapshots), and certification of the state and quality of the business data.


Figure 5


Technology alone doesn’t deliver trusted data. Information managers need to define what data quality means to their organizations. Those definitions can be implemented through data governance programs that set data quality rules and the processes by which the rules are maintained, approved, and iterated on over time.

Data quality generally requires data movement. Data cleansing is often shown as part of a data hub approach, as in Figure 6. That is not to say that data federation cannot implement a more real-time style of data quality during data access, but the cleansing actions must be architected carefully for performance when they run in the query access step of federation. For example, many MDM systems require a cleansed customer record to be presented on the application screen. If that record must be retrieved from a source system that applies no cleansing, the cleansing can be performed in real time once the data is accessed from the source. This type of real-time data cleansing is tricky to optimize, but entirely possible in practice.
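To make the idea of cleansing at access time concrete, here is a minimal Python sketch. It is not taken from any particular MDM or federation product: the source record, the field names, and the cleansing rules are all hypothetical, and a real system would also need the performance tuning discussed above.

```python
import re

# Hypothetical raw record as it might sit in a source system
# (illustrative data only; the source is never modified).
SOURCE_CRM = {
    "C001": {
        "name": "  jane   DOE ",
        "phone": "(555) 010-1234",
        "email": "Jane.Doe@Example.COM ",
    },
}


def clean_name(value):
    # Collapse repeated whitespace and normalise capitalisation.
    return " ".join(value.split()).title()


def clean_phone(value):
    # Keep digits only.
    return re.sub(r"\D", "", value)


def clean_email(value):
    # Trim surrounding whitespace and lower-case the address.
    return value.strip().lower()


# Assumed rule set: one cleansing function per field name.
CLEANSING_RULES = {"name": clean_name, "phone": clean_phone, "email": clean_email}


def fetch_customer(customer_id, source=SOURCE_CRM, rules=CLEANSING_RULES):
    """Read a record from the source and cleanse it on the way out."""
    raw = source[customer_id]
    # Fields without a rule pass through unchanged.
    return {field: rules.get(field, lambda v: v)(value) for field, value in raw.items()}


cleansed = fetch_customer("C001")
print(cleansed)
```

The cleansing happens on every read and the source system keeps its original values, which is why the cost of each rule matters when it sits in the query path.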

Regardless of the technology, without a fundamental data governance strategy at the core, these initiatives may suffer, just as many SOA implementations languished without a strong SOA Governance foundation to give greater visibility and control.



Consolidate or Federate… or Both?

At first glance it may appear that data federation and data consolidation are polar opposites. Federation acts to abstract data from multiple sources into a view; consolidation acts to move data into a central hub. In practice, we’re starting to see companies that use the best of both worlds together in their enterprise data-centric architectures. Most pragmatic organizations start by integrating business services in a ‘point-to-point’ manner and only shift to a virtualization approach when the need arises.


Figure 6


For example, a company may require a data hub for storing Web storefront purchasing data. It uses consolidation/ETL to build, deploy, and manage this data hub, and it can use change data capture technology to keep the hub in sync with real-time updates. But a call center application that needs customer information and a whole lot more may need to access data beyond the single customer hub. The data used by the call center may also live across other data marts and other application constituents, hence an opportunity for data federation as an alternative to designing a new call center operational data store (Figure 6).
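The call center scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two separate in-memory SQLite databases stand in for the consolidated customer hub and a support data mart, and the table names, columns, and the call_center_view function are hypothetical. The point is that the combined view is assembled at query time rather than copied into a new operational data store.

```python
import sqlite3


def make_customer_hub():
    # Stand-in for the consolidated customer hub (assumed schema).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO customer VALUES ('C001', 'Jane Doe')")
    return db


def make_support_mart():
    # Stand-in for a separate data mart holding support tickets (assumed schema).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ticket (customer_id TEXT, subject TEXT)")
    db.executemany(
        "INSERT INTO ticket VALUES (?, ?)",
        [("C001", "Late delivery"), ("C001", "Refund request")],
    )
    return db


def call_center_view(customer_id, hub, mart):
    """Build one customer view by querying both sources at access time."""
    row = hub.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    tickets = [
        subject
        for (subject,) in mart.execute(
            "SELECT subject FROM ticket WHERE customer_id = ? ORDER BY rowid",
            (customer_id,),
        )
    ]
    return {"id": row[0], "name": row[1], "tickets": tickets}


hub = make_customer_hub()
mart = make_support_mart()
view = call_center_view("C001", hub, mart)
print(view)
```

Neither source is changed or duplicated; the hub stays the system of record for customers while the federated function supplies the broader view the call center screen needs.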

An exciting element of this platform is that vendors are merging different data integration technologies into unified solutions, enabling multiple tools to be used at will. We will undoubtedly see more of these approaches as federation becomes useful for business intelligence and MDM applications, which are broadening the need for more flexible and agile data integration.


(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)