

Top 10 SOA Pitfalls: #4 - Incorrectly applied Canonical Data Model

06.26.2008

Earlier this week Vincent explained the BDUF pitfall, and this week we’ll continue with #4: Incorrectly applied Canonical Data Model (CDM).

A CDM is one of the silver bullets often fired in SOA projects. It is supposed to address miscommunication, ease integration and reduce integration costs. It can certainly do all of this, but an attempt to use a CDM can also turn your SOA project into an endless discussion when the model tries to cover too much, is not aligned with the business, or lacks design principles.

A Canonical Data Model (sometimes called a CIM: Common Information Model) defines the business entities relevant for a specific integration domain, their relations and their semantics. What value does a well-defined and correctly used CDM add? First of all, it facilitates a common understanding of what a business entity really is. For example, is the ‘Customer’ business entity a person or an organization? Or is ‘Customer’ a role that can be played by a ‘Person’ or ‘Organization’ entity? It likewise facilitates a common understanding of the relations between business entities. This common understanding eases communication between departments and, on a broader scale, between organizations, as illustrated by the SID model of the TM Forum. Lastly, integration costs can be significantly reduced if the systems to be integrated speak the same language, that is, use the same concepts (language here is not a programming language, but a shared understanding of what the business entities are and how they relate).
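To make the ‘Customer’ question and the cost argument concrete, here is a minimal sketch in Python. It assumes the "Customer is a role played by a Person or an Organization" reading; the class names, the two source systems (a CRM and a billing system) and their field names are all hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Person:
    given_name: str
    family_name: str


@dataclass(frozen=True)
class Organization:
    legal_name: str


# A party is either a person or an organization.
Party = Union[Person, Organization]


@dataclass(frozen=True)
class Customer:
    """Canonical 'Customer': a role played by a party, not a party itself."""
    customer_id: str
    party: Party


def from_crm(record: dict) -> Customer:
    """Translate a record from a hypothetical CRM that stores people."""
    return Customer(
        customer_id=record["id"],
        party=Person(record["first"], record["last"]),
    )


def from_billing(record: dict) -> Customer:
    """Translate a record from a hypothetical billing system that stores companies."""
    return Customer(
        customer_id=record["account_no"],
        party=Organization(record["company"]),
    )


def display_name(customer: Customer) -> str:
    """Consumers only need to understand the canonical model."""
    party = customer.party
    if isinstance(party, Person):
        return f"{party.given_name} {party.family_name}"
    return party.legal_name


def translator_count(systems: int, canonical: bool) -> int:
    """Directed translators needed: 2n via a canonical model, n(n-1) point-to-point."""
    return 2 * systems if canonical else systems * (systems - 1)
```

Each system needs one translator to and one from the canonical model, so five systems need 10 translators instead of the 20 required when every system translates directly to every other one. With only two or three systems the canonical route is not cheaper in translator count; the saving appears as the number of integrated systems grows.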

What are the pitfalls when attempting to create and use a CDM? CDM creators often try to boil the ocean and include each and every piece of information used in the organization. This explodes the number of entities to be modeled and turns the CDM initiative into an endless exercise. A CDM is intended to be used in the integration domain and should therefore only include entities that are relevant in that domain. Another pitfall refers back to SOA Pitfall #10, Not Invented Here Syndrome: CDMs developed from the ground up. Models that could be reused are ignored, even though various potentially reusable domain models are available (SID, UDEF and AFD). Some are industry specific, but even then definitions for customers, contracts, etc. can often be reused.

The next pitfall is the big flat CDM without any structure. This makes the model hard to use and understand, even when you only need to interact with a small part of it, and it slows down adoption of the model. Adoption is also slowed down by inconsistencies in the naming conventions and modeling patterns used. One of the biggest pitfalls is not consulting domain experts when defining the CDM entities and their relations. A CDM, just like any IT artifact, should support the business. It is therefore crucial to ensure that the model reflects the business and is not a pure IT view. And lastly there is the pitfall of CDMs based on vendor models or current applications. A CDM should model business concepts and should therefore not be bound to vendors or current applications. Both vendors and systems come and go; your business hopefully survives them.

To prevent failure of the CDM the following guidelines can help:

  • Develop a CDM in the context of concrete projects and include the entities that are needed for these projects. Sure, you need to think ahead a little, but this does not imply the model should include every entity you can think of. When looking ahead, consider where the model should be extensible and focus on the entities that are currently needed.
  • A CDM is intended to be used for integration and should therefore only cover entities that are used on the integration layer. Entities that are not exchanged between systems should not be part of the CDM. The bright red area of the figure "CDM Coverage" illustrates the information that should be part of the CDM.
  • Divide the model into a number of domains with strong cohesion between the entities in a domain and loose coupling between the various domains. This eases understanding and adoption of the model because readers can easily focus on the parts that are relevant for them.
  • Check if there are models available that you can use as a base. There are industry-specific models available, and sometimes these models also include generic concepts like customers, products, contracts, prices, etc. that can be used perfectly well outside that particular industry. Examples are SID, UDEF and AFD.
  • Collaborate with domain experts. They know the business best and can help you ensure that the model reflects the business.
  • Define a limited set of design principles to be used. This will lead to a more consistent and easier to understand model. Clear naming conventions for entities, attributes and relations also help in this area.
  • Provide examples that illustrate how the model should be used, how it should not be used and the reasoning behind it.
  • When distributing the model, do it in a form that readers can easily navigate.
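Two of these guidelines, structuring the model into domains and limiting it to what crosses the integration layer, lend themselves to a simple automated check. The following is a rough sketch in Python, not a prescribed tool: the `Domain` class, the domain and entity names, and the `exchanged` list are all hypothetical, and how you obtain the list of entities that are really exchanged between systems is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A cohesive group of CDM entities, e.g. 'party' or 'product'."""
    name: str
    entities: set[str] = field(default_factory=set)
    # References to entities owned by other domains, written as "domain.Entity".
    external_refs: set[str] = field(default_factory=set)


def entities_outside_integration(domains, exchanged):
    """Return modeled entities that no message between systems carries."""
    modeled = {f"{d.name}.{e}" for d in domains for e in d.entities}
    return sorted(modeled - set(exchanged))


def dangling_refs(domains):
    """Return cross-domain references that point at no modeled entity."""
    modeled = {f"{d.name}.{e}" for d in domains for e in d.entities}
    return sorted(ref for d in domains for ref in d.external_refs if ref not in modeled)


party = Domain("party", {"Customer", "Person", "Organization"})
product = Domain(
    "product",
    {"Product", "Price", "ShelfLayout"},
    external_refs={"party.Customer"},
)

# Hypothetical list of entities that actually cross system boundaries.
exchanged = [
    "party.Customer", "party.Person", "party.Organization",
    "product.Product", "product.Price",
]

# Candidates for removal from the CDM (here: product.ShelfLayout).
print(entities_outside_integration([party, product], exchanged))
# Cross-domain references that need fixing (here: none).
print(dangling_refs([party, product]))
```

Keeping cross-domain references explicit, as in `external_refs`, is one way to make the coupling between domains visible, so that it stays loose on purpose rather than by accident.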
Defining a CDM is a challenging exercise, but following these guidelines should help you meet the challenge. Not using a CDM in an SOA can introduce extra complexity, because there will be many point-to-point connections at the data level. As stated in pitfall #6 (SOA does not solve complexity automatically), a CDM is one of the items that can reduce complexity.

Next week Viktor will take us to #3...

Published at DZone with permission of its author, Gero Vermaas.
