Location Optimisation (part 2)

In Part 1, we introduced the Location Optimiser macro and how the Alteryx sample workflows demonstrate the p-median approach to solving location-allocation problems (minimising the overall weighted distance that demand points have to travel to a set of potential facilities). In this part, we'll give some examples to bring it to life, and discuss some alternatives to simply minimising distance.

The approach needed will depend on how the location optimisation problem is framed. The minimise impedance problem was discussed in part 1 of the blog in the form of the p-median problem; this approach minimises the sum of weighted costs between demand points and facilities. Another approach is to maximise coverage in some way, capturing the most demand from a fixed number of facilities within a set distance or time. The result is that the most demand (people, patients, fires, crimes etc.) within the distance cut-off is selected (a fixed cut-off means some demand may be left uncovered if it sits too far from any chosen facility). In both of these, you need to know how many facilities you want to add. In cases where you don't know this, the problem can be framed as minimising the number of facilities – that is, maximising coverage whilst minimising the facility count. For a bit more reading on these, see the useful overview in the ArcGIS Network Analyst help.

In practice, we may be asked questions that require us to re-frame them into one of these location optimisation approaches. Let's imagine that we were launching a new brand into the UK. We might ask "how many locations would we need for 80% of the population to live within 30 miles of a store?". This is really a minimise-facilities problem, with an acknowledgement that 20% of the population will live further than 30 miles away. Whilst we are trying to minimise the number of facilities, we do so by maximising the population counts at each location, choosing those that are most densely populated.

Of course, we want to use the Location Optimiser macro (which is what this blog is all about, after all!), but you'll remember that one of its inputs is the number of facilities we want to add to the network – exactly what we are trying to find! In fact, we can use a combination of an iterative macro to estimate the number of facilities, and then a location optimiser macro to find the best combination of n facilities (the one covering the most population). This saves us running multiple configurations of the location optimiser, which takes some time, whilst also making sure that we get the best overall combination of facilities.

Input Data

First, we need some demand points and some supply points. The former is just small-area population counts (for example, census Output Areas or LSOAs, represented as centroids). For the latter, we need to create some potential facilities, as we don't actually have any locations in mind. We can do this with a 10km grid around the UK coastline, extracting the centroids with the Spatial Info or Formula tool.
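
If you were building the candidate grid outside Alteryx, a minimal Python sketch might look like the one below. The extent values are illustrative assumptions (a rough British National Grid bounding box), not the exact ones used here:

    import numpy as np

    # Approximate EPSG:27700 extent covering Great Britain -- illustrative only
    xmin, ymin, xmax, ymax = 0, 0, 700_000, 1_300_000
    step = 10_000  # 10 km grid

    # Candidate supply points sit at the centre of each 10 km cell
    xs = np.arange(xmin + step / 2, xmax, step)
    ys = np.arange(ymin + step / 2, ymax, step)
    grid_x, grid_y = np.meshgrid(xs, ys)
    candidates = np.column_stack([grid_x.ravel(), grid_y.ravel()])
    print(len(candidates), "candidate supply points (before clipping to the coastline)")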

Estimating the number of facilities with an iterative macro

We want this macro to, run by run, select the facility with the highest population within 30 miles, calculate the running percentage of the overall population, and tell us how many facilities to run the location optimiser for. The challenge is that iterative macros which run spatial processes in Alteryx can be (painfully) slow. However, we can design our macro to run entirely on joins and numeric calculations, doing any spatial work outside the iteration itself.

First, we create a large, long-thin table of supply and demand ID pairs (one demand point can fall within many supply points' 30-mile radii), the demand at each demand point, and the total demand for each supply point. In our example, there are almost 14 million pairs.

Supply-Demand Pairs with Demand and Total Demand at the supply point

On the first run, we want to choose the supply point that has the highest total demand and use this as our first facility. Here, 12.5 million people live within 30 miles of the supply point, spread across almost 39k demand points.  

Next we need to burn down, or remove, any of these demand points from the looping data. This is so that once a demand point is allocated, it can't be used again. We do this by joining all of the supply-demand pairs to the demand points that have been used, and only taking forward the unmatched pairs.

If we were to map the remaining demand points, we would see a large hole around London, the most densely populated part of the UK:

Note the hole covering London

The final step is to re-create the input table of supply-demand pairs and re-calculate each supply point's total demand for the next iteration. In doing so, the ranking of supply points by demand changes: many supply points that had a high proportion of their demand removed after the first iteration are no longer viable as facilities. Accordingly, in iteration 2, the supply point in the next most densely populated part of the UK is chosen, and so on. Here is the macro in its entirety:

If we calculate the running percent of population coverage for each iteration, this gives us a similar table to before, and we can map the facilities needed to reach coverage of 80% of the population within 30 miles. In this case, it's 15 facilities.
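
For anyone who prefers to see the logic as code, the whole iterative macro boils down to a greedy loop over the pairs table. A minimal Python sketch, assuming pairs is a list of (supply_id, demand_id) tuples within 30 miles of each other and demand maps each demand point to its population:

    from collections import defaultdict

    def greedy_select(pairs, demand, target_share):
        """Greedy burn-down: pick the supply point covering the most
        uncovered demand, remove that demand, repeat until the target."""
        total = sum(demand.values())
        covered, chosen = set(), []
        while sum(demand[d] for d in covered) < target_share * total:
            totals = defaultdict(float)
            for s, d in pairs:
                if d not in covered:          # burn-down: skip allocated demand
                    totals[s] += demand[d]
            best = max(totals, key=totals.get)
            chosen.append(best)
            covered |= {d for s, d in pairs if s == best}
        return chosen

    # greedy_select(pairs, demand, 0.8) -> the ~15 facility IDs found above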

This all runs in about 8 minutes, much quicker than the compute heavy location optimiser would be for assessing multiple facilities. Here are the final locations selected:

Final solution by iterating.

This method is really a brute-force approach that prioritises cumulative population density above all else, in a single fixed order. In fact, a better solution (higher population overall) could be reached by a different combination of facilities with less overlap between them. The iterative approach does not allow this. So…

The location optimiser

With the output from the iterative macro – 15 – we set this as the number of facilities to find. Inside the location optimiser macro, we use the Spatial Match tool to find which demand points are within the 30-mile trade area. As the macro randomly chooses which 15 facilities form the solution, we may have situations where a demand point is within 30 miles of more than one facility in the solution. We can remove these duplicated demand points with the Unique tool (it doesn't really matter which facility they are assigned to in this case, since we are assessing the random selection of 15 facilities as a whole).

This assessment or scoring is done simply by summing up the population within the solution, and we want to change our macro settings to maximise the population who live within 30 miles of the set of facilities (a higher score).      

Then we can calculate the percentage of the population that the best 15 facilities cover within 30 miles. The table below compares this to the iterative approach; as you can see, whilst it's pretty marginal, the location optimiser does find a better overall solution.

Method               Population    UK Total Population    % of population
Iterative Macro      51,123,291    63,182,178             80.9%
Location Optimiser   51,231,284    63,182,178             81.1%
Best Selected 15 Locations

As might be expected, the result is basically a map of the major cities and their surroundings – wherever the combined population density is highest. This took about 15 minutes to run. Incidentally, a solution with 14 facilities almost reaches the cut-off – 79.1% (again, marginally higher than the equivalent from the iterative macro alone).

You'll notice that the two approaches lead to quite similar results, both in terms of where they pick and the number of locations needed. However, the location optimiser finds a slightly better solution (81.1% vs 80.9%) using the same number of facilities. This is because the location optimiser searches for the best overall combination, whilst the iterative approach is cumulative and has only one possible outcome – there is no flexibility in the combination of facilities it will choose. For minimise-facilities problems, a combination of the two may work well to reduce computation time: first run an iterative approach to estimate the likely number of facilities, then run a location optimiser macro.

Obviously in reality, constraints of existing locations, capacity, closeness to road links and costs (and many more) play a much more significant role than this simple example, and using vehicle journey times would likely yield a different result as well. Nevertheless, this hopefully gives an idea of how location optimisation is a really powerful technique that can be used to solve many typical location-allocation problems in a few different ways.

Location Optimis(z)ation in Alteryx

Ever wondered what the mysterious 4th macro type in Alteryx is? The one at the bottom, called the Location Optimizer macro… well, hopefully this blog (in two parts) will be useful. This type of macro is designed to answer what are known as location-allocation problems (more on this later). According to Alteryx, this is "a macro that runs in multiple iterations determining the best locations to add or remove from a location network". This post gives a brief introduction to what these problems are and how they are solved, by examining the Alteryx sample workflow. In the next part, we will show it in action.

Location allocation problems have been written about in GIS and operations research since the 60s and 70s (for some detailed background reading see here, or Tomintz, Clarke & Alfadhli in chapter 11 of Chris Brunsdon and Alex Singleton's excellent Geocomputation: A Practical Primer). At their simplest, they try to choose the set of facilities which costs the least to serve the most demand. They are a specific type of optimisation problem where geography, and the link between a set of origins (or demand points) and supply points (or facilities), play the key role. The idea is to find the optimal set of supply points that serve the demand as effectively as possible. There are many practical applications of this problem, from choosing locations for depots, stores and pharmacies, to optimising the location of police stations and hospitals.

In mature networks, these types of approaches can be more theoretical and strategic than practical – after all, it is unlikely that lots of locations would be closed just to open a different set of slightly more optimal locations. Moreover, in industries where the opening and closing of locations is dynamic, careful consideration of the local market – and indeed micro-location factors – is likely to matter more in decision making than creating the optimal network as a whole, and this is before we get into issues of site acquisition and planning permission. However, this doesn't preclude the approach from being useful for some purposes.

One of the earliest and most used examples is the p-median problem, attributed to Hakimi in 1964 (I couldn't find a pdf of this, but details here). This tries to find the optimum set of p locations that minimises the weighted distance of the system. It's probably easiest to see this in practice, via the Alteryx sample workflows (Optimise location by minimising distance), with a small bit of explanation first. This calculates the distance between a set of origin (demand) points (the orange boxes) and a set of potential (supply) locations (blue dots), and in doing so chooses which single facility has the lowest overall cost. The diagrams below explain this bit by bit. Let's say that we have 3 potential locations, but we don't really want to use all of these; instead we need to minimise the costs of operating them by choosing 1. The question in this case is: which one is the best?

To choose the best facility, we need to loop through each facility in turn, calculating the total distances (and weighting these by the demand at each demand point), and then select the best single facility which has the lowest (weighted) distance overall.
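
As a toy illustration of that loop in Python (made-up coordinates and demand weights, straight-line distance):

    from math import dist  # Python 3.8+

    demand_points = [((0, 0), 120), ((4, 0), 80), ((2, 3), 50)]   # (location, demand)
    candidates = {"A": (1, 1), "B": (3, 0), "C": (2, 2)}          # potential facilities

    def weighted_cost(facility):
        # Total demand-weighted distance if this facility served every demand point
        return sum(dist(facility, xy) * w for xy, w in demand_points)

    best = min(candidates, key=lambda name: weighted_cost(candidates[name]))
    print(best, round(weighted_cost(candidates[best]), 1))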

Obviously in reality, the scale of the analysis is likely to be much larger – with many more potential locations and many more demand points, and we will probably want to select more than just 1 facility. In practice this often means hundreds or thousands of potential locations that can be selected in thousands of possible combinations. This has to be done by running multiple iterations until the (heuristically) optimal solution is found.

Let’s have a look at an example via the Alteryx sample workflow, which is really useful for understanding the nuts and bolts. As with all macros, we need to run it from another workflow, and the example tries to add in a set of 3 new facilities to an existing network of 6 facilities, from a candidate list of over 1800 potential locations. The idea is to find the best combination of 3 new facilities to add to the existing 6 facilities. At the heart of the sample is the LocOpt_MinDistanceOptimizer, as shown below:

Alteryx Location Optimiser minimise-distance macro

Stepping through this, the demand locations coming into this from the D input already have the distance to the nearest existing facility appended to each of them. This is the current view of the network.

Note the existing distance field on the right

The macro runs by randomly selecting 3 new locations from the list of potential locations. It compares these to the current network by:

  1. Finding the nearest of the new supply points to each demand point
  2. Establishing whether this "new" distance is less than the existing distance (using the two fields and a minimum formula)
  3. Calculating the weighted average distance (using demand as the weighting field) for the entire network as a whole.

After 3, we have a single value for all demand and supply points which represents the “score” for the iteration’s combination of locations. The score and the existing distances are retained ready for the next iteration, which selects a different random combination of potential locations, and again tries to calculate the score. The use of the minimum function serves to decide the best demand-supply pair each time, and so a good pairing (which minimises the distance) is retained until a better pairing (with a lower distance) comes along. This is repeated many times with different combinations until the lowest score is found.
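
A hedged Python sketch of that scoring step (the names here are mine, not Alteryx's): existing holds each demand point's current nearest-facility distance, and combo is one random set of candidate locations:

    from math import dist

    def score(combo, demand_xy, demand_wt, existing):
        """Demand-weighted average distance with `combo` added to the network."""
        num = den = 0.0
        for i, xy in demand_xy.items():
            new_d = min(dist(xy, c) for c in combo)   # nearest new facility
            best_d = min(existing[i], new_d)          # the minimum-formula step
            num += best_d * demand_wt[i]
            den += demand_wt[i]
        return num / den

    # The optimiser effectively repeats this over many random combos,
    # keeping the combination with the lowest score.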

This happens REALLY quickly (if you use straight-line distance). The sample actually runs hundreds of iterations across multiple phases (see below). The phases start with random combinations (phase 1) before proceeding to assess smaller changes to the best combinations in phase 2, according to this link. I couldn't find much more on this – perhaps someone on the Alteryx Engine side who knows can add to this?

The settings for the macro are pretty easy. You just need to specify the output that you want to use as the score for each iteration, and whether you are trying to find a higher score or lower score, which will vary depending on your use. In the sample, it optimises for a lower score, as we want to find the lowest overall distance for each demand point.

And really, that’s it for the introduction to location optimiser macros. In part 2, we’ll describe a bit more about the types of location-allocation problems, and show how these can be used in more practical settings.

Another Kind of Zoom

At the moment we are all loving connecting with family and friends through Zoom. Where would we be without video calling these days? You have to love and embrace the technological age that we have been blessed with, especially during these difficult times.

Tonight, I just wanted to write a quick blog to share a really useful tip for those who create maps and automated reports within Alteryx. In particular, for those times when you are trying to zoom in on a particular area of a map, or when a catchment includes small areas too far away to be significant – you don't want to remove them, but you do need to zoom into the main area of the catchment, as that is the focus of the task in hand. You know when you have this:

Well I have a useful tip for you!

The workflow

Let me walk you through…

Within the first container, I am just creating an example dataset to work with: a few points and a Trade Area tool to create a polygon, with a separate polygon away from the main area. When we report on that, it gives the following, which you have already seen, but just for completeness:

As you can see, this pulls the focus away from the main area of the map and directs your eye around Coventry and Rugby, when in fact you need to be focusing on that not-so-hidden Mickey over Northampton.

As this is one multi-polygon, we first need to split it into its detailed regions and then find the area of each region individually, keeping its RecordID – especially if we are working with a larger dataset or automated reporting.

So we do this with the Poly-Split tool followed by the Spatial Info tool.

Poly-Split settings

The next logical step is to select the largest polygon from each catchment or multi-polygon area by using the Sort and Sample tools (ensuring that you group your data by RecordID in the Sample tool).

Sample settings

Once you have your area of interest selected for each RecordID, create a bounding box using another Spatial Info tool.

Spatial Info settings

The final part of this magic is to set up your Report Map tool correctly. You want two feeds of data: the data itself, and the bounding box. You can see in the images below that we use the bounding box feed (renamed as Zoom to Layer) but don't display it or show it in the legend – we only use it to zoom in. The data itself we do show in the legend.

Then as if by magic, you get the following:

A much better centralised image on the area of focus, directing the reader exactly where they need to be, compared to our previous image of:

This could easily become a macro with a drop-down selection for RecordID and another for selecting the spatial object. You could then have the option of selecting the largest or smallest item in the object set, amending the sort to suit.
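
If you ever need the same trick outside a workflow, a minimal sketch with shapely (an assumption – any geometry library with area and bounds would do) looks like this:

    from shapely.geometry import MultiPolygon, Polygon, box

    catchment = MultiPolygon([
        Polygon([(0, 0), (10, 0), (10, 8), (0, 8)]),        # main area
        Polygon([(30, 30), (31, 30), (31, 31), (30, 31)]),  # small far-away part
    ])

    # Poly-Split equivalent: break into parts, keep the largest by area
    largest = max(catchment.geoms, key=lambda part: part.area)

    # Spatial Info equivalent: its bounding box becomes the zoom-to layer
    zoom_to = box(*largest.bounds)
    print(zoom_to.bounds)   # feed this extent to the map, but don't draw it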

In summary, I have guided you through how to zoom into specific areas of your multi-polygons within Alteryx, for reporting automation, so that your user can see instantly what you are trying to share with them via the Report Map tool.

COVID19 – Automation is key

During the COVID19 Pandemic, I was asked to provide some maps and support with the automation of the downloading of the data. This was all going great just like your normal workflow until the UK Government and PHE decided to change their website. Cue another day or two added to the job. Life’s little pleasures.

The new website: https://coronavirus.data.gov.uk/

It's simple enough; however, they have been exceptionally clever and locked down their download links for the data on the front end.

In this blog I want to support other businesses who may also need this data and need to automate the feed into their databases so that it can feed internal maps.

So how did I find access to the data?

Using the developer tools, I was able to look at all the parts that make up the webpage. To do this, hit F12, go to the Network tab and hit Refresh on your page, and it will fill up with lots of information.

On the Network tab you have the ability to filter by anything; in this example I used "json" and it brought back this very useful list. Let's take a closer look.

Now I don't know about you, but that second entry down is looking very promising indeed: "data_202004171502.json"

To give it a try, right click and Copy > Link Address.

Paste this into a new web browser and you get the following:

This is perfect – exactly what we need. It has the local authority regions, along with the numbers of cases and deaths and the relevant date. I already have the spatial boundaries to match this data feed, so I am happy to proceed to automate collecting this file from the website daily.

So let's break down the file URL:

https://c19pub.azureedge.net/data_202004171502.json

We have three relevant parts here: the URL prefix "https://c19pub.azureedge.net/data_"

the date and time the file was uploaded: "202004171502"

and finally the file type: ".json"

The only part which needs thought is the date and time element, as we don't know what time of day they are likely to upload the file. If it were just the date, that part would be easy, but as you can see there is a 1502 at the end, meaning the file was uploaded at 15:02. We can verify this against the website image at the start of the blog – check the date the page was last updated.

"Ah, so automate the grab of that," you think, "and then you have the date and time." Nah – it's not there in a standard HTML grab of the website using the Download tool. I had an option here: move over to Python and Selenium, or stick with Alteryx and BaseA. I knew Alteryx could do this, so I stuck with it.

Generate Rows

So I know the file carries today's date when updated (or a previous date, depending on when you run it – be mindful of that), so that part is covered. Dealing with the time problem was as simple as generating rows for every minute of the day from 00:00 to 24:00, or 0 to 2400. This way I would be able to "catch" whichever time the file was updated, and using the download headers from the Download tool you can easily filter out the winning file.

Now, because the Generate Rows tool doesn't present the number in the 4 characters required for a time, it is important to use a Formula tool to pad-left that baby out to 4 characters with 0s.

Within the Formula tool, I also build the URL back up with the prefix, the newly calculated date and time, and finally the .json. This is what is fed into the Download tool.
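
In Python terms, the Generate Rows + pad-left step amounts to something like this sketch (the URL prefix is the one above; probing every candidate is slow, which is exactly why the faster method below is worth it):

    import requests

    prefix = "https://c19pub.azureedge.net/data_"
    date = "20200417"   # run date in yyyymmdd

    # One candidate per minute of the day, zero-padded to 4 characters (the pad-left step)
    candidates = [f"{prefix}{date}{h:02d}{m:02d}.json"
                  for h in range(24) for m in range(60)]

    # Probe each candidate; the one that answers 200 is the winning file
    winner = next((url for url in candidates
                   if requests.head(url, timeout=5).status_code == 200), None)
    print(winner)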

The data is in JSON, so of course a JSON Parse tool is the best choice here to break it down for transformation.

Alternative and faster method…

So, after some valuable comments on this post, I was able to dig deeper into the website and find the following (thank you James & John):

What is it, you ask? It is an XML list of the items within an Azure Blob container – completely overlooked the first time I viewed the site, because I was specifically looking for .json and other data files. What is useful about this link is that it tells us the name of every file within the container. So we no longer have to "guess" the file time: we can fetch the list of files, discover which one is the latest, and download that. I revised my workflow and it now looks like this:

Because we are no longer hitting 2,400 web pages, this process is now 10x faster – well worth the change.
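
For reference, here's a hedged Python sketch of the faster route. The listing link itself isn't reproduced above, so the URL below assumes Azure's standard List Blobs call on a public container:

    import re
    import requests
    import xml.etree.ElementTree as ET

    # Assumed endpoint: Azure's List Blobs operation on a public container
    list_url = "https://c19pub.azureedge.net/?restype=container&comp=list"

    xml_text = requests.get(list_url, timeout=10).text
    names = [el.text for el in ET.fromstring(xml_text).iter("Name")]

    # Keep only the data files, then take the newest by the embedded timestamp
    data_files = sorted(n for n in names if re.fullmatch(r"data_\d{12}\.json", n))
    print("https://c19pub.azureedge.net/" + data_files[-1])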

Back to the transformation…

Then, finally, once I had the data how I needed it, I created a few spatial layers.

The whole process runs in just over a minute (revised time: around 5 seconds!) and can be run every day on an Alteryx Server to update all your data feeds, keeping you up to date with the COVID19 stats within your business domain. With the daily shift of numbers, and different areas being impacted more than others, it's important to stay ahead. Automation is the key.

This then leads to creating wonderful maps in ESRI to feed to the business and support our decision making.

COVID19 Deaths by Country (18 April 2020)
Cases by Government Office Regions (18 April 2020)
Cases by Upper Tier Local Authorities (18 April 2020)

Non Overlapping Geography within Alteryx – How to do it like a Pro!

Part 2 – Rebuilding Geography where Trade Areas are split across Rivers

So welcome back to Part 2 of Non Overlapping Geography within Alteryx. So far, in Part 1, we have built non-overlapping trade areas and removed the parts of the polygons which spanned the rivers around the coastline.

That left us with the following predicament: a store trade area spanning the river. If we are counting the nearest population to a store, this isn't right, as the population here cannot access the store over the river. This would leave large errors around these areas if we were modelling with these population numbers, so refinement is key.

So how do we deal with this? Well, the first thing we need to do is identify all those regions which now have more than a single part after being cut by the coastline. We can do this using the Spatial Info tool: select the spatial field we are interested in and check the Number of Parts checkbox. This outputs the number of parts within each spatial object, making it easy to identify those with more than one.

All regions which have only a single polygon are fed straight to the end of the process. For those with more than one part, a number of things need to happen. Firstly, we remove small error polygons from the cookie cut by using the area of each polygon; those which are really small and insignificant are simply dropped from the process. Then we need to create a flag telling us whether the store is within the region part or not. As regions are broken into at least two parts around the coastline, it's important to understand which part is the main region for the store and which are additional parts. This is done within the Formula tool with ST_Within(Point, Polygon).
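
Outside Alteryx, the same flag is one call in most geometry libraries. A small shapely sketch of the ST_Within test, with toy geometries:

    from shapely.geometry import Point, Polygon

    store = Point(2, 2)
    parts = [
        Polygon([(0, 0), (5, 0), (5, 5), (0, 5)]),    # part containing the store
        Polygon([(8, 0), (12, 0), (12, 4), (8, 4)]),  # orphan part over the river
    ]

    # ST_Within(Point, Polygon) equivalent, evaluated per region part
    flags = [store.within(part) for part in parts]
    print(flags)   # [True, False] -> the first part is the main store region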

Next is the clever bit: we filter using the store-within flag, and the region parts which contain the store become the root of the new store polygons, whilst the parts split from their original areas are fed in as the universe of the Spatial Match. Using Touches or Intersects, we can now join areas together where they are next to a main store area. What we see here is the areas on the right side of the river being joined up with a region on the same side.

As you can see here, the Grimsby region now has a section highlighted to be attached which was originally part of a region from over the river.

Where there is only one combination of a part being added to a main store region, we go ahead, combine those regions and feed them to the end. However, it would be no fun if it were that easy: a spatial match can return many candidate combinations. So how do we decide which main store region should gain the additional region parts? What if there are two, three or more additional parts?

To find the right region to join to, we take the centroid of the combined region parts for each combination, and then calculate how far it is from the store. The pairing with the smallest distance wins, and goes forward to join the rest of the stores in the final list.

The image above shows how the centroid of the new, larger polygon shifts when joined with a new part. The distance between the original store and the centroid of the enlarged polygon determines which store gains the extra area – the smaller the distance, the better. The two areas above are competing for the central part, and the store on the left-hand side wins it because the centroid of its enlarged polygon is closer to the store. The polygon on the right then reverts to its original state without the additional part, losing that gain of trade area.
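
A rough shapely sketch of that tie-break, with made-up geometries: union each candidate pairing, take its centroid, and keep the store whose centroid ends up closest:

    from shapely.geometry import Point, Polygon
    from shapely.ops import unary_union

    orphan = Polygon([(5, 0), (7, 0), (7, 2), (5, 2)])   # the split-off part
    stores = {   # store id -> (store location, its main region)
        "left":  (Point(2.5, 1), Polygon([(0, 0), (5, 0), (5, 2), (0, 2)])),
        "right": (Point(12, 1), Polygon([(7, 0), (14, 0), (14, 2), (7, 2)])),
    }

    def centroid_distance(store, region):
        # Distance from the store to the centroid of region + orphan combined
        return store.distance(unary_union([region, orphan]).centroid)

    winner = min(stores, key=lambda k: centroid_distance(*stores[k]))
    print(winner)   # "left" here: its enlarged polygon keeps the closer centroid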

Now we have a final store list of non-overlapping boundaries that don’t span over the river.

We have seen lots of examples of Hull, so here is a before and after image of Edinburgh, Scotland.

Combining all the feeds and checking we have just one unique record per store, we can then do analysis such as calculating the number of people or households within each area, for use within regression models. As this is a macro, we can add it to a process so that when new stores open, everything is recalculated dynamically as part of a feed within Alteryx. It takes all of 50 seconds to update these figures.

In this blog, I have shown some handy techniques for reassigning polygons which have been split by a river, so that a closer store which initially lost that trade can be reassigned it and have it counted where it counts. I have found that my modelling has increased in accuracy purely through improving this process, so it's worth giving it a shot.

Data Sources to support you on your journey through Part 2:

Populated Weighted Centroids

https://www.nrscotland.gov.uk/statistics-and-data/geography/our-products/census-datasets/2011-census/2011-boundaries

http://geoportal.statistics.gov.uk/datasets/output-areas-december-2011-population-weighted-centroids

https://data.gov.ie

Create a union between these and join them to the population numbers you want to use, using the OA code. Unfortunately, England, Scotland and Ireland are covered by three different sources and combining these datasets is required to gain a fuller picture; however, once completed, you have a great dataset that you can use again and again.

There are many places you can source statistical datasets such as postcode-level population stats: CACI, Experian, etc. These will of course give you better granularity and better results when doing spatial matching like in this example.

Non Overlapping Geography within Alteryx – How to do it like a Pro!

Part 1 – Non Overlapping Geography and Inverse UK Cookie Cutting

So we have all wanted to do it: a 10-mile radius around a set of locations with the overlap between them removed, so you are left with trading areas exclusive to a single location. This is great for calculating the population served by each location, as each population point will only be served by one location.

A Hidden Mickey?
Non Overlapping Polygons within Alteryx

This is easily achievable in Alteryx with the Trade Area tool, using a specific radius value and checking the Eliminate Overlap box, as shown below.

The concept is great and should give you a great looking map like so…

If you need some locations to play with, see my data sources list, I have included a link to Geolytix Open Supermarkets dataset, great for playing with location analysis.

Voronoi Polygons or Non Overlapping Trade Areas

These are also known as Voronoi polygons: each polygon consists of all the points in the plane which are closer to its location than to any other. I am always taken back to science lessons when I create Voronoi polygons, as they remind me of looking at plant cells through a microscope.
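
For the curious, shapely (1.8+, an assumption about your environment) can generate the same thing in a few lines – a minimal sketch:

    from shapely.geometry import MultiPoint, box
    from shapely.ops import voronoi_diagram

    stores = MultiPoint([(1, 1), (4, 2), (2, 5), (6, 6)])
    frame = box(0, 0, 8, 8)   # outer extent, akin to a very large radius

    cells = voronoi_diagram(stores, envelope=frame)
    trade_areas = [cell.intersection(frame) for cell in cells.geoms]
    # Each polygon holds the points closer to its own store than to any other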

But it's not all plain sailing

However, when you look around the coastline, rivers, islands, etc., we have a little bit of a problem.

Hull, UK – Non Overlapping Boundaries spanning over the River Humber

As you can see around the River Humber here in Hull (UK), the radius of the trading area expands over the coastline as well as the rivers. This is not possible in the real world: customers from over the river do not have access to this location – there is no bridge or magical transportation system (it's not Disney). So how do we rectify this problem?

There are two steps we need to take in order to complete this like a pro. It's easy to leave this and accept a level of error; however, we can reduce that error by doing the following:

  1. Remove the sections of the non-overlapping trade areas which span into the sea and rivers
  2. Rebuild the Geography where the trade areas are split from their main store area (I will cover this in Part 2 of this blog)

The first thing to focus on is removing the sections of the polygons which cross the rivers. We can do this within Alteryx by building a cookie cutter and clipping these areas. The concept is well known within the geospatial field: we take a boundary, cookie-cut the parts we don't want and remove them, keeping the important parts we need.

To do this in Alteryx in this context, we need to create an inverse of the UK, to remove the parts we don't need along the rivers. Here are the steps to do that.

Inverse UK Boundary Workflow
UK Coastline, ROI & NI and River Thames Boundaries

To achieve this you will need a coastline boundary. This can be as detailed or as generalised as you need – the more detail you have, the better for cutting around the coast, but the more complex the processing becomes, so it's a personal / job-specific preference. These boundaries can be found in the sources section, and you can generalise them in Alteryx beforehand to remove some of the complexity if needed. For this task, I sourced two datasets: Republic of Ireland boundaries and the mainland UK coastline. I also created my own third boundary file around the River Thames in London, as none of the datasets I found were as detailed as I needed – good old-fashioned digitising is sometimes the quickest way to reach your goal. I completed this in ESRI ArcGIS Pro, though it can be achieved in a MapInput tool too. (See the data sources section at the end for links to the files used.)

Point to note: the more generalisation you carry out on your boundary files, the more work you will have to do around the complex coastline of the UK. However, the more complex you leave it, the longer it takes to load and process. It's a fine balance – I'll leave that one in your hands.

Then create a larger bounding box, within a MapInput tool, which expands beyond the boundaries. This can then be used to create the inverse boundary of the UK. In other words, it's like cookie-cutting the UK out of the bounding box; within Alteryx we use the Spatial Process tool and cut one from the other.

Inverse UK Boundary – The Perfect Cookie Cutter
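
In code, the inverse mask is a single geometric difference – a small shapely sketch with a toy "landmass" standing in for the real boundary files:

    from shapely.geometry import Polygon, box

    landmass = Polygon([(2, 2), (8, 2), (8, 9), (5, 6), (2, 9)])  # toy coastline
    frame = box(0, 0, 10, 10)   # bounding box expanding beyond the boundary

    # Spatial Process equivalent: cut the landmass from the box -> inverse mask
    inverse = frame.difference(landmass)

    # Clipping a trade area with the mask removes everything over water
    trade_area = Polygon([(4, 4), (9, 4), (9, 8), (4, 8)])
    clipped = trade_area.difference(inverse)
    print(clipped.area < trade_area.area)   # True: the wet bits are gone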

This then leaves you with polygons that are split in two across rivers. Depending on how detailed your coastline boundaries are, some polygons will be split and some further up the river will not. In the case of Hull below, I was happy with where the splits stopped, as there was a bridge connecting the final polygon.

Hull UK, post cookie cut leaving split regions across the River Humber

So how do we deal with the fact these polygons are now split across the rivers? We know that the population in the highlighted polygons in the image above cannot easily get across the river to be served by that store. So what do we do? We need to reassign these split parts – especially those without a store within them – to a nearby polygon that does contain a store.

This is why I love spatial analytics. Concepts such as split, touches and intersects, as well as area calculation and counting the number of parts of a region – all available in Alteryx – can help us with the above predicament.

I will continue this in Part 2 of this blog, but let's recap what we have done here. We have sourced boundary files and locations, and created non-overlapping boundaries using the Trade Area tool. Then we have created an inverse UK boundary to cut the non-overlapping boundaries where they span rivers around the coastline.

There are many ways to achieve this within Alteryx; however, I chose this route because time was critical – I needed to be able to update population statistics on many scenarios very quickly.

Data Sources to support you on your journey through Part 1:

Ireland Boundaries – Administrative County Boundaries

https://data.gov.ie/dataset/census-2011-boundary-files

UK Coastline Boundary – OS Open Boundary Line

https://www.ordnancesurvey.co.uk/opendatadownload/products.html

River Thames Boundary

Plotted myself in ArcGIS Pro using the Web Mercator projection; this can also be created in a MapInput tool if needed.

Geolytix Open Supermarkets Dataset

https://geolytix.co.uk/#!geodata

Can I make Alteryx a Quiz Master?

In 2020 everyone is spending more time at home and having video calls with families, friends and colleagues, and some are holding quizzes during these calls. It led me to wonder whether it was possible to use Alteryx to host these. The short answer is yes, you can! I have to acknowledge this is not an original idea: Dr Tim Rains (DRains) first showed me this was possible in 2018. I did, though, take this on as a challenge, with the objective of not reaching out to him to ask how he solved it. So, one Saturday morning, I came up with the following to make Alteryx a Quiz Master.

It was relatively easy to build a process in Alteryx; the below screenshot shows the entire workflow – a Text Input of words you want your computer to speak, and a batch macro. In summary, each row of the Text Input is read into the batch macro. Per batch, the computer speaks the text within that row of data, then waits a specified amount of time to give the quiz participants time to write down their answers.

Workflow Overview

The Text Input holds all the text I want my computer to speak, including a welcome message and a list of questions to be asked.

Text Input of words to be spoken

The configuration of the batch macro is really simple, with no fields being used in the Group By tab, and for the Questions tab I'm using the field "Questions", which holds all the text to be spoken by my computer.

Batch Macro Configuration

The macro itself has few tools: a Macro Input, then a Filter tool which selects the correct batch every time it runs. The next tool is WaitASecond, in which I can specify the number of seconds before the data in the macro is passed downstream.

Batch Macro Overview

The WaitASecond tool is actually a CReW macro released in 2016; more information about it can be found here.

WaitASecond Tool Overview

In the Formula tool I create two columns. One is a VBS_Script, with the Question column (the text to be spoken) inserted as a parameter per batch. The second is the file path used to output the file later in the process.

Formula Tool VBS Syntax Creation

Next in the process I use a Select tool to remove unwanted fields (not a requirement), then a Blob Convert tool to convert the VBS_Script field created in my Formula tool into a Blob data type.

Blob Convert

The reason for this conversion is shown in the Blob Output tool, where I output the Blob field as a VBS file, with the option "Modify File Name By:" set to "Replacing Entire Path with Field" using the file path column.

Blob Output

The result of this Blob conversion and output is a VBS file, where the highlighted text is replaced per batch. This VBS file will play once it's clicked; however, I want the file to play automatically without needing to be clicked.

VBS File Syntax

This is where the Run Command tool comes in: it executes a batch file, which calls the VBS file I'm creating per batch. This makes my computer speak the text within that VBS file every batch.

Run Command Overview

The syntax within the bat file is very simple: it's just start and the name of the VBS file.

Batch File Syntax
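
For anyone curious what this looks like outside Alteryx, here's a hedged Python sketch of the same loop (Windows-only – SAPI.SpVoice is the same speech object a VBS file of this kind calls; the file names and the 20-second wait are my assumptions):

    import subprocess
    import time
    from pathlib import Path

    questions = [
        "Welcome to the quiz!",
        "Question 1: what is the capital of France?",
    ]

    for i, text in enumerate(questions):
        # One-line SAPI script per batch, standing in for the blog's VBS file
        vbs = f'CreateObject("SAPI.SpVoice").Speak "{text}"'
        path = Path(f"question_{i:02d}.vbs")
        path.write_text(vbs)
        # Equivalent of the bat file's `start question_XX.vbs`
        subprocess.run(["cscript", "//nologo", str(path)], check=False)
        time.sleep(20)   # WaitASecond equivalent: pause for answers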

The result of all of this is that when I press run in Alteryx Designer, Alteryx becomes a quiz master and starts to fire off questions.

Hope you found this blog helpful and have seen how easy it is to turn Alteryx into a quiz master. Please leave a comment or reach out to me on Twitter @JosephSerpis or on the Alteryx Community and LinkedIn.

Using Alteryx to Map Sales Territories

Have you ever needed to create sales regions? In this blog I am going to show you how easy it is using Alteryx. Typically a team will do a pencil exercise to allocate stores and then hand it off to a GIS team to map the regions and ensure they make sense. In my fictitious scenario I am mapping the sales regions of Giuseppe Stores so that area managers know the territories they cover.

My dataset is a list of stores with a centroid point and the regions that have been allocated via the pencil exercise. From this list of stores I will create sales regions using the tools in Alteryx.

Giuseppe Stores Data Input

Turning these points into the polygons of the sales regions we wish to generate is quite simple and can be done using the Trade Area tool. The key is to generate non-overlapping trade areas, or Voronoi polygons.

Non Overlapping Trade Area/Voronoi Polygons

In the Trade Area tool configuration window you just need to tick Eliminate Overlap and specify a high enough radius. This will generate non-overlapping trade areas, as seen above; however, they don't quite look like sales regions yet.

Trade Area Tool Configuration

The key is the Summarize tool, which is part of the Transform category in Alteryx Designer but has the ability to perform a number of spatial functions. In this instance we use the Combine spatial function, grouping by the regions field. This combines all the spatial objects within each region group.

Combined Regions (Summarize Tool)

The numerous trade areas have now all been combined; however, as you can see, they extend into the sea and into other countries. The next step is to clip the regions to the extent of the UK, removing the excess areas of the sales regions I am generating.

I simply downloaded a shapefile of the UK and brought it into my Alteryx workflow, then appended it to my regions. Using the Spatial Process tool, I selected Create Intersection Object, which creates a polygon of the area where both polygons overlap – cutting off everything that extends into the sea or into other countries outside the UK.

Clipped Sales Regions

As you can see, the regions are now clipped to the UK extent, look sensible, and can be turned into a thematic map for presentation.
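
The dissolve-and-clip steps map neatly onto geopandas if you ever need them in code (a sketch under assumed file names and a shared CRS):

    import geopandas as gpd

    # Assumed inputs: voronoi trade areas with a 'region' column, and a UK boundary
    trade_areas = gpd.read_file("trade_areas.shp")
    uk = gpd.read_file("uk_boundary.shp")

    # Summarize tool equivalent: combine spatial objects grouped by region
    regions = trade_areas.dissolve(by="region").reset_index()

    # Spatial Process equivalent: intersection clips the regions to the UK extent
    sales_regions = gpd.overlay(regions, uk, how="intersection")
    sales_regions.to_file("sales_regions.shp")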

Due to the odd shape of the UK, I had to perform some manual interventions to stop regions crossing rivers, or the sea between Northern Ireland and Scotland. I did this by using the Map Input tool to digitise some fake stores, pushing the boundaries of the regions when I generated the non-overlapping areas.

Digitising Map Input Tool

For speed when making this blog, I digitised lines and named them after the region names, then used the Poly-Split tool to break each line into a series of points, and a Union tool to bring my store input and digitised points together before generating the Voronoi polygons.

Poly-Split Line to Points

I would recommend the line-drawing approach for manual interventions when going through multiple iterations of generating sales regions. When building the final sales regions, though – especially if the output is going into SQL, which requires the geographies not to be broken – I would digitise using points and check you don't have any multi-split or broken polygons in the regions you are producing.

Sales Regions for Giuseppe Stores

By putting the regions into Tableau and creating a thematic map, you can see that I have successfully generated sales regions from an input of points, which can be presented back to the area managers to show them the territories they cover.

Hope you found this blog helpful and have seen how easy mapping regions is in Alteryx. Please leave a comment or reach out to me on Twitter @JosephSerpis or on the Alteryx Community and LinkedIn.

Advent of Code

Christmas, Christmas, Christmas…. Well, it's Christmas time, pretty lady, and code is falling to the ground! (Christmas Chronicles, anyone? My girls loved this scene.)

Yes, it's that time of year when we all get competitive and have a great time playing against each other across the world. We are already moaning here in the UK, as the 5am release time of the challenges is well against us. But I am not doing too badly after the first day of Advent. I can guarantee, with the week I have ahead, that I will not be anywhere near 10th, so I am going to take this small victory whilst I can. 🙂

Advent of Code Website

The Private Leaderboard

Why should I take part in AoC?

  • Fun
  • Challenging
  • The theme is always great for playing along, this year’s theme is SPACE
  • You can complete it in any programming language you like
  • You can complete it in Alteryx
  • You can try and complete it in Alteryx using BaseA (see below)
  • You can join a leaderboard with the ACEs and beat us! It’s all down to timing… you could wake up at 5am UK time and smash it out.

What on earth is BaseA?

There is a concept amongst ACEs that I want to let you in on, called "BaseA". This is when you complete a challenge like Advent of Code without using code – no Python, R or Run Command tool, no SDK usage, and no Download tool to get answers from an API, as these make the challenges far easier. Therefore they are BANNED. Achieving these challenges using Alteryx's core tools forces us to get our heads around the problem in a different way. It also helps us push Alteryx as far as it can go. More info in James Dunkerley's post.

So you are a first timer… how do you get started?

Firstly, log in and join the leaderboard set up by our very own James Dunkerley: https://adventofcode.com/2019/leaderboard/private/view/453066, using the code 453066-ca912f80 on the "join a leaderboard" page, which looks like this:

Then you have to view it to see the wonderful stats as above.

Once you are linked in to the leaderboard and logged on via Twitter, GitHub or Reddit, you can get started on the puzzles.

Now let's look at a puzzle. No, I am not going to give you any answers! But there may be clues, so if you don't want any tips at all, please stop reading. Come back when you are done – good on you!

ARE YOU SURE?

Welcome to my NOT so HIDDEN Mickey

Well, you have scrolled this far because you are after some tips on Part 2 of Day 1, because the first part is simple maths, which is explained in the text.

Puzzle – Part 1
Puzzle – Part 2

Part 2, however, requires some thought, and those who are not strong in macros may feel like they have fallen at the first hurdle. I am going to point you to some really useful resources to help you out, all of which can be found on our fantastic Alteryx Community.

Treyson’s post on macros here is really useful: https://community.alteryx.com/t5/Engine-Works-Blog/Hello-Iterative-Macro-My-Old-Friend/ba-p/420308

Weekly Challenge #12 – Creating a HR Hierarchy: https://community.alteryx.com/t5/Weekly-Challenge/Challenge-12-Creating-an-HR-Hierarchy/td-p/36740

A post by PaulT that has some great info and visuals to support those who love visuals: https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-create-an-Iterative-Macro-The-Collatz-Conjecture-Alteryx/ta-p/108830

I hope these get you started and enjoying Advent of Code – it's great to learn, push yourself and achieve something you never thought possible. If you find it too easy, try again in Python or R and challenge your friends.

Enjoy this year's Advent of Code – see you on the leaderboard!

My Journey to Alteryx Ace-hood

I am an Alteryx Ace!

I was so thrilled when I found out I was an Alteryx ACE; they are an amazing group of people who are held in such high esteem. I'm not sure I will ever get used to the fact that I am now counted among them.

My first thought after finding out I was an Alteryx ACE was that the blog I share with Samantha Hughes and Tim Rains, "Intersections and Overlap", might need to change its subtitle from "The Geospatial Trinity – Two Geeks and a Hipster" to "The Geospatial Trinity – Two Alteryx Aces and a Doctor", given the successful year we all had.

Where did my Alteryx Journey begin and how I got here?

My Alteryx journey actually began by accident and coincidence: I became one of the first 4 people within Sainsbury's to use Alteryx. However, I was not originally meant to be one of them. At the time we had Alteryx on test desktop computers with a higher version of Windows and higher specs than our standard machines, and the machine I eventually commandeered was meant for my line manager, who left the business just as Alteryx landed at Sainsbury's. As it was a higher-spec machine, I started using it for my GIS analysis in MapInfo. My colleagues Tim and Sam both encouraged me to use Alteryx, and I was instantly impressed by how fast and easy it was to use. I very quickly became a power user, automating various aspects of my role and helping our team's strategy of removing the SAS dependency. I have Iain Sterland, our Head of Department, to thank for letting me use the machine – and for letting me keep Alteryx.

After the decommissioning of SAS within our department, we moved on to removing MapInfo, replacing it with Esri ArcGIS Pro and redeveloping bespoke tools in Alteryx. During this time we were also developing other use cases and providing self-service analytics to the organisation for all things spatial. A key aspect of my role in Sainsbury's was to answer any question regarding spatial, and Alteryx greatly helped in answering them. This was great training and helped me build my expertise in Alteryx, as I never knew what question I might face.

After attending Inspire Europe 2017, I decided to take up the Alteryx weekly challenges to improve my skillset and get into the weekly challenge top 10. I eventually succeeded, becoming a top challenger in February 2018. I then went on to become certified, passing my Core certification in May 2018. This awoke a competitive side in my team, with all of us trying to become the first Advanced certified. I am happy to say that I was the first person to pass both certifications, but it was an intense competition between us all. This set the trend, with other members of the team and various Alteryx users in the department becoming certified.

In the summer of 2018 I convinced my colleagues Sam and Tim that we should do a talk at Inspire, as I felt we had a story to tell. The talk was well received, we got some amazing feedback, and it was great presenting with two colleagues I had worked so closely with for over 5 years. To the delight of both Tim and Sam, I was extremely nervous before the talk, rather than my usual confident self.

Inspire Europe 2018 (Me, Samantha Hughes, Tim Rains)

As all good things must end, I had taken the decision to leave the team and Sainsbury's to join Keyrus as a consultant, and had handed in my notice a few weeks before the Inspire Europe conference in 2018. My nervousness before the talk came from wanting to deliver a good presentation with the team I was leaving, and from wanting to impress the colleagues from my new company sat in the audience, watching me present.

My first one-to-one with my line manager Jamie Laird after joining Keyrus was about potentially aiming to become an Alteryx ACE. We had an honest conversation about what activities we thought would be required to achieve this. One of the things we spoke about was blogging, so when Sam and Tim asked me in early 2019 if I would be interested in blogging with them, I immediately said yes. The ideas session we had for the name of the blog was so funny – I was on a train at the time, laughing out loud at some of the suggestions. My first ever blog post was republished on the Alteryx Community, and the blog has continued with a number of successful posts, now also featuring as one of the blogs on Alteryx's website.

Having been an Alteryx user for 5 years, I decided to give back to the Alteryx community and began answering questions on the community page. I quickly became one of the top solution authors and thoroughly enjoyed sharing my expertise and helping others.

This led to today, the 17th of October, when Libby announced me as an Alteryx ACE in London, where I was able to celebrate this honour.

Thank You

I would like to thank the many people who supported me in becoming an Alteryx ACE. Iain Sterland, Samantha Hughes and Tim Rains had a massive part to play, with years of training and mentorship. Jamie Laird gave me guidance and support, and was the person who actually steered me towards aiming to become an Alteryx ACE. Thank you to everyone who nominated me – this is amazing.

I look forward to sharing my Alteryx knowledge as an Alteryx ACE. You can also find me on Twitter @JosephSerpis or on the Alteryx Community and LinkedIn.