Indeed, using Data Analytics is an appealing proposition, given its visually attractive, feel-good reporting that showcases patterns, relationships and trends. Indeed, using Data Analytics allows greater reliance to be placed on reporting, owing to its greater credibility. Indeed, using Data Analytics offers greater comfort because of its perceived objectivity, since it reports figures rather than opinions and statements.

But should it be a replacement for all sorts of reporting? And more importantly, should it be the basis of all decision making?

Certainly not! The simple reason is that Data Analytics is neither all-encompassing nor free from bias, much like non-analytics-based approaches. This in turn means that, like those approaches, its shortcomings can be corrected by exercising due professional care.

Let’s get to that once we’ve reviewed the problems with relying on data analytics.

Problems with Data Analytics Reliance

Each problem with data and analytics is set out below, followed by how it impacts the analysis.

Integrity of sources

If the source from which the data is to be procured is not credible, the data will be misleading.

For instance, the integrity of the source could be impaired if it lacks access controls, authorization, input controls, data processing controls, output controls, etc.

Objectivity of sources

If the source from which the data is obtained is subjective / biased, the data will not be valid for the purpose for which it is being analyzed.

For example, a source wouldn’t be objective if it captures only anomalous or non-compliant data. Nor would it be objective if it is influenced by unusual events, so that entries pertaining to them are recorded in a way that gives them greater weight. The same applies if it is based on questionnaires with closed and leading questions that invite respondent bias. The data in all these cases would not be representative of the entire population.

Methodologies and Approaches of Data Extraction

All of the following affect the validity of the data: the methodology used to obtain access to it, the stage of the process at which it will be utilized, the form it will be in (raw / refined), the parameters used to query it and the sampling approach.

For instance, when data is obtained indirectly, after its purpose has been revealed to those providing it, the resultant data might be skewed.

Similarly, the data would not be fit for use in either of two cases. The first is when the analysis is undertaken so late in the process that a result or approach already arrived at can no longer usefully be altered. The second is when the data is obtained before controls have been applied.

Also, if the query through which data is obtained is incomplete or incorrect, the resultant data will have the same issues. And if the sampling method is biased in favor of certain amounts or time periods, the data will not be representative of the population.
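To make the sampling point concrete, here is a minimal Python sketch comparing two samples drawn from the same population. One sample is deliberately skewed toward the largest amounts; the other is a simple random sample. The population is synthetic: lognormally distributed amounts generated with fixed seeds so the run is repeatable. The helper names `biased_sample` and `random_sample` are hypothetical illustrations, not part of any audit tool.

```python
import random
import statistics

def biased_sample(amounts, n):
    """Picks the n largest amounts: a selection skewed toward high values."""
    return sorted(amounts, reverse=True)[:n]

def random_sample(amounts, n, seed=42):
    """Simple random sample without replacement; every item has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(amounts, n)

# Hypothetical population of 1,000 transaction amounts (synthetic data only).
gen = random.Random(7)
population = [round(gen.lognormvariate(6, 1), 2) for _ in range(1000)]

pop_mean = statistics.mean(population)
biased_mean = statistics.mean(biased_sample(population, 50))
random_mean = statistics.mean(random_sample(population, 50))

print(f"Population mean:    {pop_mean:,.2f}")
print(f"Biased sample mean: {biased_mean:,.2f}")
print(f"Random sample mean: {random_mean:,.2f}")
```

The top-50 selection overstates the average transaction size by a wide margin, while the random sample stays close to the population mean. Any conclusion drawn from the skewed sample would carry the same distortion.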

Aggregation (summarizing), Modelling and Compiling

The techniques used to aggregate, model and compile the data have a bearing on the quality of the data procured for analysis.

Data aggregation might be impossible or inaccurate if the common fields (keys) across multiple datasets are not aligned for the purpose of the analysis or contain different data. Incomplete or incorrect aggregation would, for example, distort the determination of the extent of compliance and non-compliance, resulting in a misleading analysis. If aggregation proves impossible, datasets meaningful to the analysis might have to be ignored.
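The key-alignment problem can be illustrated with a minimal Python sketch. It assumes two hypothetical datasets, invoices and a vendor master, in which the same vendors are keyed differently: leading zeros, a trailing space and letter case. A naive join matches nothing. A join on normalized keys matches everything and also reveals an invoice to an unapproved vendor that the naive join would have missed. The `normalize_key` rule and all field names are assumptions made for this example.

```python
def normalize_key(key):
    """Hypothetical rule: trim spaces, upper-case, then strip leading zeros."""
    cleaned = str(key).strip().upper()
    return cleaned.lstrip("0") or "0"

def join_on_key(left, right, key_fn=lambda k: k):
    """Inner-joins two lists of dicts on 'vendor_id'; returns (matched, unmatched_left)."""
    index = {key_fn(r["vendor_id"]): r for r in right}
    matched, unmatched = [], []
    for row in left:
        partner = index.get(key_fn(row["vendor_id"]))
        if partner is None:
            unmatched.append(row)
        else:
            matched.append({**row, **partner})
    return matched, unmatched

# Illustrative data: the same vendors are keyed differently in each system.
invoices = [
    {"vendor_id": "00123", "amount": 500.0},
    {"vendor_id": "456 ", "amount": 1200.0},
    {"vendor_id": "a789", "amount": 300.0},
]
vendor_master = [
    {"vendor_id": "123", "approved": True},
    {"vendor_id": "456", "approved": False},
    {"vendor_id": "A789", "approved": True},
]

raw_matched, raw_unmatched = join_on_key(invoices, vendor_master)
norm_matched, norm_unmatched = join_on_key(invoices, vendor_master, normalize_key)

# Non-compliance exposure: invoiced amounts paid to unapproved vendors.
exposure = sum(r["amount"] for r in norm_matched if not r["approved"])

print(len(raw_matched), "matched before normalizing;", len(raw_unmatched), "unmatched")
print(len(norm_matched), "matched after normalizing;", len(norm_unmatched), "unmatched")
print("Exposure to unapproved vendors:", exposure)
```

Stripping leading zeros is an assumed rule for this example only. In a real system it could merge genuinely distinct IDs, so any normalization should be agreed with the owners of both datasets.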

Datasets that are gibberish remain so however extensively they are modelled, assuming they can be modelled at all! Furthermore, the modelling technique might not suit the objectives of the analysis.

The compilation (aggregating findings) exercise might be strictly confined to the purpose of the analysis, leaving out other revelations that could lead to significant inferences. Conversely, the compilation technique might not be aligned with the purpose of the analysis at all.

Availability and use of multiple sources

The availability of multiple data sources for analysis might not be fully explored simply owing to a lack of information, or might deliberately be left unexplored owing to a lack of resources and time.

The available alternatives might be better suited to the purpose of the analysis, or including them might reveal differing inferences or lend more credence to the analysis. If these alternatives are ignored, meaningful insights cannot be derived even if the objectives of the analysis are met, leading to subpar use of the analysis.

Correlation and linkages with other datasets

If trends and patterns identified from the analysis of one dataset are not correlated and linked with trends and patterns from other datasets, the analysis could be incomplete or, worse, misleading.

Inferences drawn from the analysis of a dataset cannot be assumed to be comprehensive if they are not correlated with other information. Furthermore, inferences drawn from a single dataset could contradict those drawn from numerous other datasets or information.

Also, this other information could identify recurring events that explain the anomalies found in an analysis. If these events are not considered, the conclusions drawn could be factually incorrect.
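A minimal Python sketch of that idea follows. It flags unusual months in one dataset, then checks them against a second, independent dataset of known recurring events. The monthly figures and the event calendar are invented for illustration. The flagging rule (more than 1.5 population standard deviations from the mean) is an assumed threshold, not a standard. Only anomalies the other data cannot explain are left for follow-up.

```python
import statistics

# Hypothetical monthly expense totals for one year (illustrative data only).
monthly_expense = {
    "Jan": 100, "Feb": 98, "Mar": 102, "Apr": 99, "May": 101, "Jun": 180,
    "Jul": 100, "Aug": 97, "Sep": 103, "Oct": 100, "Nov": 99, "Dec": 175,
}

# Hypothetical second dataset: recurring events known from other sources.
recurring_events = {"Dec": "annual bonus payout"}

def flag_anomalies(series, threshold=1.5):
    """Flags months lying more than `threshold` population std devs from the mean."""
    values = list(series.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [month for month, value in series.items() if abs(value - mean) > threshold * stdev]

flagged = flag_anomalies(monthly_expense)
explained = {m: recurring_events[m] for m in flagged if m in recurring_events}
unexplained = [m for m in flagged if m not in recurring_events]

print("Flagged:", flagged)                     # Flagged: ['Jun', 'Dec']
print("Explained by other data:", explained)   # Explained by other data: {'Dec': 'annual bonus payout'}
print("Still needs follow-up:", unexplained)   # Still needs follow-up: ['Jun']
```

Reporting both flagged months as exceptions would overstate the problem. Linking the second dataset narrows the follow-up to the one month nothing else accounts for.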

Incompetent design of database

This one is basic. A poorly structured database will either fail to capture all relevant data (lack of completeness) or capture data that isn’t aligned with the objectives of the process.

A meaningful analysis will then necessarily require the inclusion of a number of datasets, or the creation of one through a fit-for-purpose query. Otherwise, even a multitude of analyses over a dataset drawn from an incompetently designed database would serve no purpose.

Attraction bias

Attraction to fancy visual representations of data is yet another reason data analysis is often sought after. On its own, though, it makes the most nonsensical case for data analytics.

All these issues indicate that data analytics shouldn’t be the sole approach to reporting and decision making.

This is because, even when done carefully, data analytics cannot consider the factors that aren’t captured by data. Within the realm of performance management, it is widely believed that not everything can be objectively measured through quantifiable metrics, the most notable exceptions being softer / human considerations. In the same way, a certain weight should always be given to judgement.

That is exactly the place data analytics should be afforded: an aid, a tool, a means to an end, something that supplements! That is also how the problems described above can be managed. Its use, role, purpose, objectives, data sources and methodologies need to be planned in advance using conventional wisdom and approaches! Indeed, if done this way, data analytics could be the most significant and objective reporting and decision-making tool one could use.

As for us internal auditors, data analytics is pivotal across our work. It supports audit planning and the analysis of transaction data to determine the state of controls (flagging and segregating anomalies, etc.). It helps determine the extent of detailed testing and present the monetary exposures involved. It can even showcase the audit work in terms of figures and visuals.

But the patterns, trends and relationships it shows need to be corroborated by / align with information obtained through other regular audit procedures, whether already obtained or obtained going forward. No analytics can eliminate the need for detailed testing of transactions to confirm anomalies and unearth their causes. And, most importantly and to begin with, it must be established whether the dataset is a complete and accurate reflection of the state of the process under review!
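One common way to test whether an extract is complete and accurate is to reconcile it against control totals obtained independently from the source system. The Python sketch below compares record counts and amount totals and flags duplicate IDs. The field names, the `reconcile` helper and the control figures are hypothetical; a real reconciliation would use whatever totals the system of record actually reports.

```python
from collections import Counter
from decimal import Decimal

def reconcile(records, control_count, control_total):
    """Compares an extract with control totals taken independently from the source."""
    extract_total = sum((Decimal(r["amount"]) for r in records), Decimal("0"))
    id_counts = Counter(r["id"] for r in records)
    return {
        "count_difference": control_count - len(records),
        "total_difference": control_total - extract_total,
        "duplicate_ids": sorted(i for i, c in id_counts.items() if c > 1),
    }

# Illustrative extract and hypothetical control totals from the source system.
extract = [
    {"id": "T1", "amount": "250.00"},
    {"id": "T2", "amount": "125.50"},
    {"id": "T2", "amount": "125.50"},  # duplicated row
]
result = reconcile(extract, control_count=4, control_total=Decimal("900.00"))
print(result)
```

Any non-zero difference or duplicate means the extract should not be analyzed until the gap is explained. Decimal is used so that monetary totals compare exactly rather than through floating-point approximations.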

 

I was once preparing a data analysis of employees’ medical expenses and chronic ailments. When I was told it could be used to make decisions about their medical fitness for work, I was excited by the interesting prospects it offered.

Thankfully, no decision was made on its basis; management used its typical approach of picking and choosing candidates from amongst the employees to meet a restructuring requirement. It was only then that I realized that, had a decision been made and executed solely on the basis of that data analysis, it would have been nothing short of a disaster.

That is because the relevant, even more meaningful and linkable datasets not included in that analysis were employee performance metrics, attendance records and regular work processing results! An assessment of medical fitness for a job should ideally draw on these, not simply on costs!