A common problem I have come across when geocoding data from the UK is extracting the postcode from a single string field. The issue being unable to dynamically extract the Postcode without having to split a single column into multiple columns, based on a delimiter and then creating some logic to capture the Postcode from the correct columns. Previous solutions I’ve come up with to solve this problem, utilising such a methodology are typically one-offs tailored for a specific data set and not repeatable. However the first the Alteryx User Group (Midlands) I attended showed me an alternative solution. In this blog I going to show how to solve this problem first in an Alteryx workflow then produce a Macro so that I can reuse this solution. Then finally I am going to turn my macro into a HTML macro using the HTML SDK.
The first Alteryx User Group I attended exposed me to Regex (Regular expression) which is a sequence of characters that define a search pattern. Regex is a very useful and powerful toolset particular for extracting postcodes as there are only 6 sequences for UK postcodes so you can code for each sequence and extract from a string field.
Below is the Alteryx workflow I developed which I will explain in more detail. The workflow doesn’t utilises many tools and essentially creates a Record ID for each row of data to be joined again later to ensure none of the original data in the workflow that is inputted is lost. The Regex tools extracts the Postcodes from the character string for all 6 possible Postcode formats, plus outputs the reminder of the original string column minus the postcodes parsed into 6 relevant fields respectively. The formula tool combines all the columns together using an If formula to merge all the columns into one Postcode column and one address field. The Select tool deselects the unwanted fields then the Join tool combines the data stream of the extracted postcodes to the data stream with the original data, this is so any data that was not parsed will be outputted in the right join to be investigated.
The Regex tool in the workflow is set to parse as an output method and the Regular Expression holds the sequence of characters I am looking for. A number of output fields are created for each sequence I am searching for. In fact I create six Original_Field, Address and Postcode fields as that is how many sequences I am searching for. I will admit I could write a more condensed Regular Expression however this expression in my opinion is easy to troubleshoot and explain as there are 6 possible postcode formats I am looking for. Also this Regular Expression gives me two columns an output column with the postcodes and then a column minis the postcode sequence I have parsed.
As stated there are six possible formats for a UK Postcode the Regex Code below shows in more detail the code that parses the Postcode.
The output of this Regex in Alteryx will create a column an Original Field an Address and Postcode for each Postcode sequence. As you can see from the output the Regex is extracting the Postcode from the Address field. AA9 9AA was extracted in Postcode1 because in the Regex the sequence of “\u\u\d\s\d\u\u” was found in the string and the same occurs for the other sequences I have searched for.
My approach to parse UK postcodes create six versions of every columns and some of the rows of data have null values. Therefore I utilise a formula tool with an If formulas to replace any null rows with a row that does contain data for each relevant column.
The select tool removes unwanted fields as I have used the formula tool to condense the data I have parsed into three columns as seen below. The Output has an Original Field which is the data being inputted then an Address field which is the string minis the sequence parsed from regex and then finally the Postcode field, which is parsed due to matching the Regex sequences found from the input field.
The last tool is the Join tool which allows any other columns from the data set to be joined onto the data stream of parsed postcodes. Any data that hasn’t been parsed will come out of the Right output and can be investigated as to the reason why, while anything that has parsed will come out from the Join Output.
The result of the Join output shows that I have successfully extracted the Postcodes from the string field and I have also created an address field. The Regex I utilised in this workflow could be utilised for other data sets and could be part of a dynamic and re-usable solution.
The first way to re-use this Regex is to create a Macro in Alteryx. A macro in Alteryx is a workflow or a group of tools built into a single tool of an analytical process that you perform repeatedly. An added benefit of macros is you can give your users options for how they use the macro. The Macro can be saved to location on your Computer or network added to the Macros Tools in Alteryx and even the icon can be customised or you could use standard icons provided by Alteryx.
To change the original Alteryx workflow into an Alteryx Macro wasn’t difficult all I needed was a Macro Input and Two Macro Outputs and a Dropdown interface tool to allow a user to choose which field to parse. The only change required from the Alteryx workflow was a select tool this was in order for the action to update the column name to be used in the Macro. Otherwise the workflow would error as the regex tool wouldn’t know which field to parse. However for my macro version of this solution I wanted enable the end user to choose the output they receive so have added in a radio button option which disables a tool container when selected. The options of the radio buttons affects if you receive an address column as in some cases you just may want the postcode to geocode and wish to leave the original field intact.
The workflow now has two tools containers both of which are enabled, however when the end user selects one of the options of the radio buttons it subsequently disables one of the tools containers. The results of the enabled tool container are pushed through the union tool and is outputted. This setup means no matter what option is selected, results should always be passed through this macro. The tool containers are controlled by the radio buttons interface tools for example when the user selects Regex Parse: Postcode it will disable the tool container labelled “Regex Parse: Address and Postcode”. I utilise a dropdown tool to enable the end user to select what field is the address string they want to parse the postcode from. The results of these changes means now this solution is more dynamic and can be more easily applied to other datasets.
When creating a macro Alteryx provides you with a list of default standard icons images however for my macro I have created my own custom icon, which I created using a software called Inkscape. Weekly Challenge 97 is very useful if you want to get the tool icon to use to create your own custom icon that looks similar to the other Alteryx tools as I have done for this macro.
The output of the weekly challenge 97 is that you create an image for every tool icon in Alteryx. I utilised the Regex icon and used Inkscape to trace the icon and then I fill in the white spaces with the same colour of the icon and then I paste an envelope icon on top then save the image as new icon for the Postcode Parser Macro.
Why create a custom tool (Html Macro)? It makes sharing easier as the file format .yxi is just a zipped package with all the tool and components. When opened the tool is saved automatically and added to the tool bar and is ready to use. The Alternative of just using a standard macro within Alteryx and sharing with other users it requires access to the same location, where the macro is saved and setting up Macro repository via user’s settings in designer. This process can be time consuming and requires a number of steps the alternative offers a more seamless approach.
From previous experience of hosting macros on a shared network drive if your user’s connection to that network drive is interrupted they also lose access to macros. Another issue I found is the Macro tool category is you have to scroll to find on the right hand side of the tool bar and is not necessarily the most intuitive place.
There a number of useful links when creating customs tool which helped me.
Alteryx also provides an example of both Html and Python SDK in your version of Alteryx Designer which is useful when creating your own tool as they provide examples widget saving you starting from scratch.
The most important thing when creating a custom tool is the folder structure essentially you need to create a 3 tier folder structure. The first tier has a folder with the tool to be installed the config.xml and then an icon. Then next tier holds a folder with the supporting macro then another config.xml for the tool then an html file and icon that make up the tool. The last tier holds the macro of which the tool is based on.
Config File: PostcodePaserToolConfig.xml
This xml tells the Alteryx engine what the tool is and how to use it. It essentially the back end plumbing between the macro and the tool. Here you need adjust the Engine Settings the GUI Settings and the connections.
The connection section should match the number of Inputs and Outputs as your in macro they also must have the same name in the connections as in the macro. The reason for this is so Alteryx knows what the inputs and outputs streams of the tool are. For the EngineDllEntryPoint and the EngineSetting sections it points to the Marco I’ve built so that the instructions from the tool is passed to the macro. Most importantly you can select the Category for which the tool should be included in. As I have chosen Parse it should be installed with the other Parse tools such as the Regex tool.
Building the User Interface (PostcodePaserToolGui.html)
For the user interface I utilised HTML – GUI library example as it has a vast choices of widgets. Building the interface was trickier than I originally anticipated. I had to reach out to the Alteryx customer support and the community to find a solution. The solution to building an interface similar to my macro did mean I had to make some changes to the original macro. For the Postcode Parser Tool I needed a DropDown widget and choose field selector enabling my user to select a field from their workflow. For the radios buttons I had to utilise a radiogroup and specify my options and the values that would be passed into the workflow when a radio button was selected. The dataName needs to match the names of the interface tools used in the Macro.
As the dataName needs to match the names of the interface tools and also as the radiogroup is passing a string value when selected I had to configure the macro to accommodate for this. I used a dropdown interface rather than two radio buttons with the selection of the dropdown matching the values of the RadioGroupSelector. This setup still has the same effect of the original macro when the end user makes a selection only one of the tool containers will be enabled and pass through the results.
Despite changing the macro from using a Radio buttons to a dropdown to control which tool container is enabled the interface shows the options as radio buttons due to selecting the radiogroup widget from GUI library.
This config file is the Metainfo displayed on the pop-up when the tool is being installed.
The MetaInfo provides important information to the user but allows you to version and author your tool.
To share the New HTML macro you need the zip your tool specific folder.
After you zipped all the files change the format to .yxi a warning error will appear but press yes.
This is the final step and now you have your installer for your custom tool.
Once this tool is shared the user just needs to click on it and it will automatically install on that users version of Alteryx designer and be placed into the tool category you specified.
Hope you find this solution helpful and shows how to can develop a solution into a reusable tool. Please leave a comment below or reach out to me on Twitter @JosephSerpis or on the Alteryx Community and Linkedin.