Alteryx Automation Series Validation

Alteryx Automation Series: Validation

Building trust in data in any organisation is always a challenge. When you are working in a fast paced environment and the next change in data is happening before the last one has been signed off, you need to have a sure fire way of keeping on top of your data validity.

Here is why Alteryx, Alteryx Server and an Alteryx ACE are super perfect for this task.

Looking under the hood

When you have access to your base code and KPI definitions and you need to compare them against the reporting layer. In all organisations data can be messy, it can go through a number of data points, transformations and when it lands in the final destination it can be very different from the starting point. Therefore data validation is key to ensure that data is monitored and kept fresh for your end users.

This does of course mean you are taking ownership of the data’s accuracy, because without ownership you can never raise the profile, gain the access you need and develop the relationships within the organisation to fix any data issues you may uncover. So make sure you are in it for the right reasons.

Validation through data points does have a particular design pattern in Alteryx that you can follow, its a standard one and you may even recognise it from some of your workflows already.

Firstly you need to understand your KPIs and your definitions of these KPIs before you jump in. i.e. Sales = Gross_Sales – Returns, so that when you are calculating these at each data point you have the correct detail.

Secondly you need to know the architecture of your data flow within the business and have access to these systems so that you know how many data points you need to test.

Thirdly do not try and go to granular with this, keep high level, i.e. monthly or totals across each data point

Lastly, you need access to the data points, Alteryx will do all the hard work but make sure you have permissions and your mission is understood for the benefits to the business. (such as ensuring the data at the reporting layer is accurate)

Example of a Validation Design Pattern in Alteryx

As you can see we first lookup the current period as this runs daily, depending on your business you could use a Date Time Tool here but sometimes that is not possible for example you may run marketing campaign style where you have a business period of two weeks or two months, so this is why the initial input depends entirely on your business. The formula tools are only there if you need to specifically run the validation for a particular business period, its a quick way of not having to change your workflow but having the tools in place to do so, if you need to.

Then with the dynamic input we grab the KPI definitions at each data point. This can be multiple places within the organisation, so you can see here I have two but the reality is that its more likely to be three or four in any business.

Before I go onto transpose I just wanted to show you an example of the data before and afterwards so you can visualise what is happening here.

Before and After the Transpose Tool

Transposing the data is important so that you can have as many KPIs as you need as rows, ensuring that the business period is your KEY, KPI Name is your NAME and the KPIs are your VALUES. As we add data points we can then use a JOIN tool to ensure that we make it as easy as possible to compare them across systems, as we all know its easiest to compare them left to right.

Now an important factor for your join is that you need to join on the business period value and the Name of your KPI, this is why it is imperative in your SQL Queries within the dynamic inputs you call your KPIs the same name, even if their names within that system may be different. Personally I always use the reporting name as the KPI name and the Source data system as Data Point 1. Why the reporting name and not carry out the source name through to the end? Well this is purely down to the business users / end users at the end of the day. If you are to be visible about this information, which I encourage, they will understand the reporting name of SALES, PROFIT versus BI34, FIELD0192, for example.

Post Join Tool – renaming Values columns to relate to the
Data Points

Within your join, rename your VALUE fields from your two data streams to the name of your Data Points and ensure that your source system is always the first, then the next data point is your second, etc. Once this is complete it is to the Formula tool, where a few columns need setting up so that your output looks like the following:

Post Formula Tool – Final Output

Within your formula tool set up the following:

  • Difference – This can be absolute or just pure difference (Data Point 2 – Data Point 1)
  • % Error – 1 – (Data Point 2 / Data Point 1)
  • Validation – if Difference = 0 then “VALID” else “INVALID” endif
  • Validation Engine – “Alteryx”
  • Date – DateTimeNow() – set as a Date datatype
  • Time – DateTimeNow() – set as a Time datatype

Now all that is left to do is set it up to run once a day on your Alteryx Server and save it to your Reporting database and overlay a visual on it for your end users and yourself. You can also extend this by sending an email to your data team should these results be “INVALID” so that they can act up on it immediately. We have an amazing dashboard that shows us source as 100% and then the reporting layer at 99.5% so that we can see the timing differences and the potential gap in the data whenever we need, its perfect for validating what we need before we embark on a data project.

So today we have learnt how to quickly set up validation between your data points within your organisation to build TRUST within your organisation on data within your reporting layer using Alteryx and Alteryx Server. Just remember it is only quick if you have access to the data points in the first place. Enjoy owning your data.

The next post in the series is going to help you think dynamically within Alteryx and when or why you would change an input tool to a dynamic input tool.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s