Data quality assurance: what other disciplines can teach us
Data quality, as both a necessity and an ideal, is not unique to the capital markets. Yet little has been written about techniques common in other disciplines, such as data collection, error/anomaly detection, and data cleansing, that can be applied to our world.
Looking for a place to start, I used the world’s most popular MapReduce search engine (aka Google) to look for “Data Quality Assurance”, which led me to an organization named DataONE (www.dataone.org/). According to their website, “DataONE is a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE promotes best practices in data management through responsive educational resources and materials.”
In Capital Markets, “member repositories” are typically Data Repository Service Providers (DRSPs), which are regulated industry utilities to which member firms are required to report their trading activity. Examples are FINRA’s OATS platform for Equities and its TRACE platform for Fixed Income, and various Trade Repositories (TRs) and Swap Data Repositories (SDRs), such as the LSE’s UnaVista and the DTCC’s Global Trade Repository (GTR), for derivatives trades and transactions.
In a sense, DataONE’s support of “enhanced search and discovery of Earth and environmental data” is very similar to how DRSPs provide transparency into trades and transactions across regulated markets and jurisdictions. However, despite the existence of industry organizations like the International Swaps and Derivatives Association (ISDA), which provide a forum for standardization and feedback to regulators, no regulator or group of repositories “promotes best practices in data management”.
The DataONE website contains “Tutorials on Data Management” that are available for download (https://www.dataone.org/education-modules), covering the following topics:
- Why Data Management?
- Data Sharing
- Data Management Planning
- Data Entry and Manipulation
- Data Quality Control and Assurance
- Protecting Your Data
- Data Citation
- Analysis and Workflows
- Legal and Policy Issues
The Data Quality Control and Assurance module is particularly interesting to practitioners within the capital markets responsible for trade and transaction reporting, since what gets reported represents what their firm owns and has bought and sold: how much, at what price, with whom, and, for derivatives, the details of each contract over its lifecycle. Any error or mismatch can result in substantial fines. While the stages in the feedback loop depicted in the Data Life Cycle diagram above are more applicable to the sciences, in which data is collected and measured, the sections below describe a feedback loop that applies to the field of trade and transaction reporting.
Anomaly Detection: Errors of Commission
As most practitioners know, and according to DataONE, there are two types of errors: Errors of Commission, in which incorrect data is entered, and Errors of Omission, in which data has not been entered or recorded.
Errors of commission are common and usually involve a person, e.g. a trader “fat-fingers” the wrong amount or price into a deal capture system (or the wrong counterparty, IBOR, or spread). The most egregious errors can be caught before the bad data starts flowing through the system in which it was entered. Reasonableness checks can be built into trading software to check for what DataONE calls “outliers” – data values that deviate from expected norms. Either these entries can be rejected outright, e.g. a swap with a notional value of 0, or users can be forced to confirm suspicious values, e.g. “Price of $2,000 USD is 200% higher than previous trade for this security. Are you sure? (Y/N)?” However, designers of trade and transaction reporting software need to assume that some errors of commission will flow through regardless, and thus need to have mechanisms in place to filter and catch bad data.
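A reasonableness check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TradeEntry, Verdict, and check_entry, and the 50% deviation threshold, are hypothetical and not taken from any real deal capture system.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    CONFIRM = "confirm"  # the user must answer "Are you sure? (Y/N)"
    REJECT = "reject"


@dataclass(frozen=True)
class TradeEntry:
    security_id: str
    price: Decimal
    notional: Decimal


def check_entry(entry, last_price, max_deviation=Decimal("0.5")):
    """Classify a new entry against the previous trade price for the same security.

    max_deviation is an assumed threshold (0.5 means 50%); a real system
    would calibrate it per product.
    """
    # Hard rule from the text: a notional of 0 is rejected outright.
    if entry.notional == 0:
        return Verdict.REJECT, "Notional value of 0 is not allowed."
    # Without a previous price there is nothing to compare against.
    if last_price is None or last_price == 0:
        return Verdict.ACCEPT, ""
    deviation = (entry.price - last_price) / last_price
    if abs(deviation) > max_deviation:
        direction = "higher" if deviation > 0 else "lower"
        message = (
            f"Price of {entry.price} is {abs(deviation):.0%} {direction} "
            "than previous trade for this security. Are you sure? (Y/N)"
        )
        return Verdict.CONFIRM, message
    return Verdict.ACCEPT, ""
```

Returning a verdict rather than raising an exception lets the calling screen decide whether to block the entry or merely prompt the user.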
In 2017 it’s almost a given that post-trade processing and reporting systems are straight-through, message-based platforms. Industry standard protocols specific to payments, trades, and transactions, such as SWIFT, FIX, and various forms of XML (FpML, ISO 20022), have been endorsed by regulators and industry associations alike, so the ultimate transmission of transaction reports is almost always in the form of an industry standard protocol. To get messages into the correct format, a software layer will always exist that enriches trade details with reference data and maps between a firm’s trading or booking systems and the required format. This software layer can introduce errors of commission just as a human can, so Data Quality Surveillance (DQS) software needs to exist to catch errors regardless of where they’re introduced.
This requires error detection software to be multi-layered. It must understand not only the protocol, and therefore the syntax, of each message but, more importantly, the contents of each field, which requires rules describing the acceptable values for that field. Depending on the type of data field being inspected, the DQS software will need rules for allowable text strings, numerical precision, date formats, text string length, fields that are allowed to be blank, fields whose values depend on each other, etc.
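The field-level layer can be pictured as a table of rules applied to a message that has already passed the syntax check. The sketch below assumes the message has been parsed into a plain dictionary of strings. The field names (side, price, trade_date, counterparty_id, spread, floating_index) and the limits attached to them are invented for illustration and do not come from any particular reporting schema.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def one_of(*allowed):
    return lambda v: None if v in allowed else f"must be one of {sorted(allowed)}"


def max_decimals(places):
    def check(v):
        try:
            exponent = Decimal(v).as_tuple().exponent
        except InvalidOperation:
            return "is not a number"
        if not isinstance(exponent, int):  # NaN or Infinity
            return "is not a finite number"
        return None if -exponent <= places else f"has more than {places} decimal places"
    return check


def date_format(fmt):
    def check(v):
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            return f"does not match date format {fmt}"
        return None
    return check


def max_length(n):
    return lambda v: None if len(v) <= n else f"is longer than {n} characters"


# Hypothetical rule table: which fields may be blank, and what each must satisfy.
FIELD_RULES = {
    "side": {"required": True, "checks": [one_of("BUY", "SELL")]},
    "price": {"required": True, "checks": [max_decimals(4)]},
    "trade_date": {"required": True, "checks": [date_format("%Y-%m-%d")]},
    "counterparty_id": {"required": True, "checks": [max_length(20)]},
    "spread": {"required": False, "checks": [max_decimals(4)]},
    "floating_index": {"required": False, "checks": [max_length(12)]},
}


def spread_needs_index(msg):
    # Dependent fields: a spread only makes sense alongside a floating index.
    if msg.get("spread") and not msg.get("floating_index"):
        return "spread is populated but floating_index is blank"
    return None


CROSS_FIELD_RULES = [spread_needs_index]


def validate(msg):
    """Return a list of human-readable rule violations (empty if none)."""
    errors = []
    for name, rule in FIELD_RULES.items():
        value = msg.get(name, "")
        if value == "":
            if rule["required"]:
                errors.append(f"{name} is required")
            continue
        for check in rule["checks"]:
            problem = check(value)
            if problem:
                errors.append(f"{name} {problem}")
    for rule in CROSS_FIELD_RULES:
        problem = rule(msg)
        if problem:
            errors.append(problem)
    return errors
```

Keeping the rules in data rather than in code paths means a new field or a changed limit is a one-line change to the table.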
Anomaly Detection: Errors of Omission
Regulators are concerned not only with accurately reported messages, but with COMPLETELY reported activities. Errors of omission, in which firms are found to have under-reported their trades and transactions, typically result in fines in the range of millions of pounds, euros, or dollars.
By the time a trade or transaction reporting message is formatted and ready for inspection by DQS software, the decision of whether a trading activity should be reported, and WHERE it should be reported, has already been made by upstream systems. Hence the need for a COMPLETENESS control that is run on a periodic basis. Typically this will entail receiving a feed of all of a firm’s trading activity, running a set of rules to produce a set of expected trade and transaction reporting messages that SHOULD have been reported, and then matching those against the trade and transaction reports that were ACTUALLY reported. This will produce four buckets of reporting completeness:
- Correctly Reported
- Under-Reported (expected reports were missing from actual)
- Over-Reported (actual reports were not expected)
- Correctly Not Reported (Activities not eligible for reporting were not reported)
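The four buckets fall out naturally from set operations once expected and actual reports share a key. The following is a minimal sketch, assuming each report is identified by a (trade id, repository) pair. The Activity class, the ROUTING table, and the repository names TR_A and SDR_B are hypothetical stand-ins for a firm's real eligibility rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    trade_id: str
    asset_class: str
    jurisdiction: str


# Hypothetical eligibility rules: (asset class, jurisdiction) -> repository.
ROUTING = {
    ("derivative", "EU"): "TR_A",
    ("derivative", "US"): "SDR_B",
}


def expected_repository(activity):
    """Return where the activity SHOULD be reported, or None if not eligible."""
    return ROUTING.get((activity.asset_class, activity.jurisdiction))


def completeness(activities, actual_reports):
    """Bucket activity into the four completeness categories.

    actual_reports is an iterable of (trade_id, repository) pairs that
    were ACTUALLY reported.
    """
    expected = set()
    not_eligible = set()
    for activity in activities:
        repo = expected_repository(activity)
        if repo is None:
            not_eligible.add(activity.trade_id)
        else:
            expected.add((activity.trade_id, repo))
    actual = set(actual_reports)
    reported_ids = {trade_id for trade_id, _ in actual}
    return {
        "correctly_reported": expected & actual,
        "under_reported": expected - actual,    # expected but missing
        "over_reported": actual - expected,     # reported but not expected
        "correctly_not_reported": not_eligible - reported_ids,
    }
```

With this keying, a trade sent to the wrong repository shows up twice: as under-reported at the expected repository and as over-reported at the one that received it.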
Data Quality Assurance
According to DataONE, Data Quality Assurance (DQA) covers “activities that involve monitoring and maintaining the quality of data”. For the groups within firms responsible for the accuracy and completeness of trade and transaction reports, DQA requires a dashboard from which authorized users can detect, investigate, and resolve data quality issues. This implies the existence of a workflow platform from which issues that are identified can be assigned, investigated, resolved, and closed.
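The assign, investigate, resolve, and close steps amount to a small state machine. The sketch below is one possible shape for it: the Status values, the ALLOWED transition table (including the assumption that a resolved issue can be reopened for investigation), and the Issue class are all illustrative choices rather than a description of any existing workflow product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IDENTIFIED = "identified"
    ASSIGNED = "assigned"
    INVESTIGATING = "investigating"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transitions; a real platform would likely allow reassignment too.
ALLOWED = {
    Status.IDENTIFIED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.INVESTIGATING},
    Status.INVESTIGATING: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED, Status.INVESTIGATING},
    Status.CLOSED: set(),
}


@dataclass
class Issue:
    issue_id: str
    description: str
    status: Status = Status.IDENTIFIED
    history: list = field(default_factory=list)

    def move_to(self, new_status, user):
        """Advance the issue, recording who made the change for audit purposes."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"Cannot move from {self.status.value} to {new_status.value}"
            )
        self.history.append((self.status, new_status, user))
        self.status = new_status
```

Recording every transition with the acting user gives the control framework an audit trail for each data quality issue.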
Reconciliation with the repositories that have received a firm’s trade and transaction reports is a key step in the DQA process. Ultimately, even with DQS tools in place, a firm needs to guard itself against the possibility that data stored within a repository doesn’t match the firm’s books and records. Thus the DQA dashboard needs to provide users access to the following categories of trade and transaction reports:
- Correctly Reported Accurately
- Correctly Reported with breaks: the report is at the correct repository but not all fields match
- Under Reported
- Over Reported
Note that the “Correctly Not Reported” category need not flow into the DQA platform: if a firm didn’t report what it didn’t need to, then no action is necessary.
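A field-by-field reconciliation that produces these dashboard categories might look like the sketch below. It assumes both sides have been reduced to dictionaries keyed by a shared report identifier, with each value a dictionary of field names to values. The function name reconcile and the category keys are hypothetical.

```python
def reconcile(firm_reports, repository_reports):
    """Compare the firm's books and records with what a repository holds.

    Both arguments map report_id -> {field_name: value}.
    """
    categories = {
        "reported_accurately": [],
        "reported_with_breaks": {},  # report_id -> {field: (firm, repository)}
        "under_reported": [],
        "over_reported": [],
    }
    for report_id, firm_fields in sorted(firm_reports.items()):
        repo_fields = repository_reports.get(report_id)
        if repo_fields is None:
            # The firm expected this report, but the repository has no record.
            categories["under_reported"].append(report_id)
            continue
        # Collect every field whose value differs, including fields
        # present on only one side.
        breaks = {
            name: (firm_fields.get(name), repo_fields.get(name))
            for name in sorted(firm_fields.keys() | repo_fields.keys())
            if firm_fields.get(name) != repo_fields.get(name)
        }
        if breaks:
            categories["reported_with_breaks"][report_id] = breaks
        else:
            categories["reported_accurately"].append(report_id)
    # Anything the repository holds that the firm did not expect.
    categories["over_reported"] = sorted(
        set(repository_reports) - set(firm_reports)
    )
    return categories
```

Storing both values for each break gives the investigating user the firm's view and the repository's view side by side.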
The final step is the existence of a feedback loop by which corrected trade and transaction messages can be (re)submitted. From within the context of a control framework, this may require integration with a firm’s reporting platform, but often it can be a separate path to repositories that is used to submit trade or transaction reports that had been omitted or have been corrected. Once submitted, the Workflow component in the DQA platform will be updated to reflect that the data quality issue has been resolved.
As providers of Trade Reporting Data Quality Surveillance and Assurance solutions, we found it interesting and reassuring to discover DataONE, an academic organization that is supportive of having content it has created used in this article.
As a rule, most capital markets firms have borrowed from disciplines like the natural sciences to model the behavior of financial markets and specific securities. In the case of data quality, especially in the middle/back office world of trade and transaction reporting, most data quality controls and assurance have been designed to look for Negative Acknowledgments (“NACKs”) from Trade Repositories, which is an inefficient and entirely reactive approach to maintaining data quality.
By looking at how fields outside of Financial Services approach and address Data Quality, we can learn from their best practices.
This article was first published in edition 10 of Rocket, our magazine.