Tuesday, July 22, 2008

Getting Meta All The Time

I recently did an interview with Inside Reference Data magazine, and here is the outcome.

Metadata may not be the best word to use to try to get senior management excited about reference data projects. But metadata management is a vital part of enterprise data management (EDM), and as EDM projects are now maturing, metadata is swiftly moving up the agenda. Tine Thoresen explores the best strategies for implementing metadata tools and systems.

John Carroll, who recently retired from Merrill Lynch, had been working in the same role for 10 years. He was initially hired to build on the vision of an enterprise data repository. But this type of project is not done overnight. Carroll was still working on enhancing the system 10 years on. This is the reality of reference data initiatives - the projects often seem never-ending. Yet, many enterprise data management projects are now entering a new phase - the metadata management phase. Firms are starting to focus more on standardizing and centralizing 'data about the data.'

Peter Serenita, chief data officer at JP Morgan Chase, says the types of reference data technology he expects firms to invest in this year are metadata management tools. "Firms will invest in the ability to define data using data management tools so that this information is not buried in code," he says (Inside Reference Data, Reference Data Technology Special Report, May 2008).

But so far there have not been many metadata success stories in the financial industry. These types of projects can require a lot of resources, as a large portion of the work tends to be taxonomic. Adam Honore, senior analyst at Aite Group, says some of the larger firms are currently working on these projects, but that it is probably too early to talk about successes.

Yet, this is not to say that metadata management tools cannot add tremendous value. "By providing visibility to the data definitions, the firm's ability to manage the data increases exponentially," says Serenita. It is often difficult to improve data quality if data definitions are all over the place. Everyone needs to be on the same page - both to improve reporting and consolidate silos.

To succeed, firms simply need to identify the best strategy to overcome challenges such as structuring the data, increasing automation, keeping the content current and managing the refresh rate. But this is not that simple when little public information exists on how financial institutions can get there.

So, as with every other data problem, can firms turn to vendors for help? Honore, who focuses on financial services, says the projects he has seen have been internal developments, and he does not know of anyone who is using off-the-shelf tools.

There are, however, many vendors targeting the metadata management space across industries. Stu Carty, founder and metadata solutions expert at Gavilian Research, says there are 17 vendors offering metadata solutions. The providers can be divided into two groups - one group offers independent metadata management tools that can be bought separately and the other offers metadata management tools bundled into a tool suite.

The third option, which Honore mentioned, is building systems in-house. Carty says most firms might start with internal developments, mainly using Excel, but since vendors are getting better and better, off-the-shelf solutions are becoming more viable. Metadata management solutions are database applications that can be filled with descriptions and relationships. Carty says Google is an example of a metadata tool - probably the biggest one.

Finding the Right Product

Although Google might help firms find information, selecting vendors requires more thorough research. When assessing applications, firms should look for utilities that harvest metadata and populate the repository automatically, according to Carty. He says most customers will be able to load 80% of definitions into the system automatically, while 20% will typically be entered manually. So when reviewing applications, firms should be shooting for that 80%.

And standards can offer better integration opportunities. Greg Keller, vice-president, product management at metadata management product vendor Embarcadero Technologies, advises firms to move on if the tool does not speak 'standards.' "There is no panacea, but the only way to collect, refine, integrate and communicate this data (about data) is to ensure it can bi-directionally integrate via open standards such as the OMG's (Object Management Group) set of data interchange standards," he says, adding that OMG's forthcoming Information Management Metamodel (IMM) implementation is rooted in universally defining storage for data standards.

Keller also says firms should choose model-driven products - "critical to be able to express complex business rules in a manner that can abstract the information in a graphical format" - and products that embrace logical and physical data modeling as a means to express the core underlying metadata.

Ease of use is also vital to review when assessing metadata tools. This might sound very basic, but it is particularly important in this market, where some tools have traditionally had a poor reputation. Carty says some products are difficult to use and implement, but the best vendors today recognize that problem, are focusing on making it easier for customers and are leveraging new technology.

So the area is worth exploring. Some firms have seen early successes, according to Keller, who has witnessed some "well-implemented metadata management programs in large household banks both in the UK and US." In most cases, the projects revolved around data rationalisation, refining and documenting the metadata values of core business attributes in an effort to point to live data in order to continually assess quality, he says.

And as metadata management tools mature, the industry is set to see more of these projects succeed and materialize.

Q&A

The Metadata Master

IRD talks to Peter Serenita, chief data officer at JPMorgan Worldwide Securities Services, about how metadata management tools can help add transparency and flexibility to a data management program

Why will metadata management be an area firms will invest in this year?

The way to manage data effectively is via metadata. In 'the old days,' data projects were treated like standard technology projects and the technologies that were used were standard technology tools. Therefore, analysis was performed with a standard functional design and most of the data management logic was built into code. This worked tactically but the result was costly to maintain and treated data like transactions. With the use of metadata tools we can use data-style tools to manage the data and expose the end-to-end process instead of burying it in code. This adds transparency and flexibility to the data and the events that affect that data. It would also improve time to market not just for the initial solution but more importantly to any subsequent changes.

Would this be an area where it is worth looking at packaged solutions or are in-house builds more suitable?

Packaged solutions would certainly be preferable. Previously, metadata products were focused on specific areas of data management. These include business intelligence/data warehousing tools and ETL (Extract, Transform, Load) tools that traditionally leveraged (managed) metadata for their own specific purposes. I believe there is a real opportunity here for a firm to focus on the overall management of data via a metadata tool from the analysis phase (ie, determining the end-to-end transformations that data goes through from its origination all the way to the end-user consumption).

In what ways can metadata management tools benefit the business?

The metadata management tools can help in two ways. Firstly, during the data analysis phase, the metadata tool can help the team automate the analysis of the data between a set of vendors and the firm's standard reference data repository, as well as between the standard reference data repository and the downstream subscriber(s). The software would compare the data between the two sources (vendor to repository or repository to subscriber) and produce a report of how the data fields between the two sources align. An example is that the field CPN_Date in the repository matches the field DT_Coupon 98% of the time; however, the field CPN_Date needs to be multiplied by 100 to equal DT_Coupon. This would now define the transformation rule needed to move the coupon data from the repository to the downstream subscriber system. This software would allow the data analysis/data integration team to speed up their analysis as well as back test the transformations on an on-going basis to ensure good data quality. The second benefit is that you can then implement the transformation via a rule-set instead of burying the transformations in code. This will provide much more visibility to the transformation rules and allow for more agility in any necessary modifications.
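The field-alignment analysis described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any bank or vendor tool works: the names `ScaleRule`, `match_rate` and `infer_scale_rule`, the list of candidate factors and the 95% threshold are all hypothetical, and the numbers are made-up values standing in for the contents of CPN_Date and DT_Coupon. The sketch measures how often one field equals the other under a candidate multiplier, then keeps the winning transformation as a data object rather than as logic buried in code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ScaleRule:
    """A transformation rule stored as data: target = source * factor."""
    source_field: str
    target_field: str
    factor: float

    def apply(self, value: float) -> float:
        return value * self.factor


def match_rate(source: Sequence[float], target: Sequence[float], factor: float) -> float:
    """Fraction of paired rows where source * factor equals target."""
    pairs = list(zip(source, target))
    if not pairs:
        return 0.0
    hits = sum(
        1 for s, t in pairs
        if abs(s * factor - t) <= 1e-9 * max(1.0, abs(t))
    )
    return hits / len(pairs)


def infer_scale_rule(
    source_field: str,
    source: Sequence[float],
    target_field: str,
    target: Sequence[float],
    candidate_factors: Sequence[float] = (1.0, 10.0, 100.0, 0.1, 0.01),
    min_rate: float = 0.95,
) -> Optional[Tuple[ScaleRule, float]]:
    """Try each candidate factor and keep the one that aligns the most rows.

    Returns None when no candidate reaches min_rate (an assumed threshold).
    """
    best_factor = None
    best_rate = 0.0
    for factor in candidate_factors:
        rate = match_rate(source, target, factor)
        if rate > best_rate:
            best_factor, best_rate = factor, rate
    if best_factor is None or best_rate < min_rate:
        return None
    return ScaleRule(source_field, target_field, best_factor), best_rate


# Hypothetical sample: 50 repository rows, one of which fails to align.
repository_values = [float(i) for i in range(1, 51)]
subscriber_values = [v * 100 for v in repository_values]
subscriber_values[10] = -1.0  # deliberately bad row

result = infer_scale_rule("CPN_Date", repository_values, "DT_Coupon", subscriber_values)
if result is not None:
    rule, rate = result
    print(f"{rule.source_field} * {rule.factor:g} matches "
          f"{rule.target_field} on {rate:.0%} of rows")
```

Because the rule is a plain object, the same `match_rate` call can be rerun against fresh data to back test it, and changing the transformation means changing a value rather than rewriting code.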

Are there any pitfalls?

Metadata management tools represent continued automation of the data management process, so I would view it as a significant step forward as we continue to develop data-related tools instead of traditional technology tools to solve data management issues. The only pitfall I can see is if a particular tool locks in to a proprietary solution and does not integrate with all phases of the data management process, from analysis to execution/run-time.

How can it impact other MDM strategies?

The metadata tools can be used across all the master data management data domains. The issue of understanding data from origination to consumption is universal for client data, security data or any other data. True management of the data requires tools specifically built for data management, and the way to manage data is via a tool to manage the definitions (ie, metadata) and the content's consistency to the definitions.

Who would typically be the sponsor of a metadata management project within a financial institution? And who should own it?

The data management program should own the metadata management project as it is a key tool in its arsenal. Previously, metadata tools were managed 'to purpose,' that is, the metadata to support data warehousing was managed by the data warehouse team and so forth. By having the data management program sponsor the metadata management project/tool, the data analyst and data team can manage the data standards as well as the translation between those standards and other legacy formats.

© 2008 Incisive Media Investments. All rights reserved. Used by permission. First published in IRD June 2008.

2 comments:

Anonymous said...

I read Peter Serenita's quote about data analysis tools and was intrigued: "... The software would compare the data between the two sources (vendor to repository or repository to subscriber) and produce a report of how the data fields between the two sources align. An example is that the field CPN_Date in the repository matches the field DT_Coupon 98% of the time, however, the field CPN_Date needs to be multiplied by 100 to equal DT_Coupon."

The only software that I have heard of that even claims to do what Peter mentions is software from a startup company called Exeros. I was wondering, does JP Morgan Chase use this software? Did it solve their problem? Are there other software companies who also solve the problem of discovering metadata between data sources and then measuring its accuracy?

Greg said...

Thanks for the comment, Anon. Truthfully, I am not sure whether JPMC uses Exeros, and I am not intimately familiar with its features. For the functionality you are describing, Embarcadero Change Manager’s CM/Data module would suit this use case as well. As an example, say you have a Sybase instance and an Oracle one. Perhaps data from Sybase is replicated/ETL’d to Oracle and should be the same once replicated. CM/Data can scan and compare this data across these database platforms to find the discrepancies, report the percentage of differences and detail the exact rows that differ. In any event, this may be interesting for you to have a look at.

Thanks for commenting!