An Introduction to PostGIS (with a use case in QGIS)

This post veers off my usual takes on Tableau and focuses on a slightly more technical topic. Today I would like to introduce you to PostGIS — an open-source, freely available spatial database extender for PostgreSQL.

Why should I care about PostGIS?

For those of you who love geographic data (like me) and SQL (also like me), this brings both of those worlds together. It gives you a way to store your spatial data without keeping all those pesky shapefiles on your hard drive.

The other powerful feature is all the spatial SQL functions it comes packaged with. For example, let’s say you have a table of point locations with their coordinates, but you want to find which county each one falls into. You could launch QGIS (or ArcMap) and run a spatial join, but if you are doing this on a regular basis, it could get pretty annoying.

Instead, you can just write a short SQL query:

SELECT *
FROM points, counties
WHERE ST_Contains(counties.geom, points.geom);

(Note that ST_Contains(A, B) is true when geometry B falls inside geometry A, so the containing county goes first.)

(assuming your geometries are stored in the respective geom fields).
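Under the hood, a containment test like ST_Contains ultimately comes down to point-in-polygon math. Purely for intuition, here is a minimal Python sketch of the classic ray-casting test (PostGIS’s actual implementation is far more robust; it handles holes, edge cases, and spatial indexes):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray to the right and count edge crossings.
    `polygon` is a list of (x, y) vertices; an odd number of crossings = inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A unit-square "county" and two test points
county = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_polygon(0.5, 0.5, county))  # True  (inside)
print(point_in_polygon(2.0, 0.5, county))  # False (outside)
```

A real spatial join runs a test like this for every point/county pair (pruned by a spatial index), which is exactly the work the one query above pushes into the database.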

Finally, Tableau is working on supporting spatial queries and geometry types according to tweets by Kent Marten:


I won’t go into that beta (or is it alpha) version, but it’s pretty exciting.

Without further comment, let’s get started.

Getting Started

There are a ton of tutorials out there on how to download and install PostGIS, so I’ll try not to repeat too much.

1.) Download and install PostgreSQL, picking the appropriate version for your operating system. I’m on a Windows machine, so I picked version 11.1 for Windows x86-64.

2.) Click on the installer and follow the instructions. Select all the default components. If it’s not already checked at the end of the install, click on the checkbox to launch StackBuilder.

3.) After picking the server on your local machine, you’ll see a menu like this:


Click on Spatial Extensions, select the PostGIS 2.5 bundle, and click Next.

4.) Click through the instructions, and you’ll come to a menu like this:


On my installs, I typically check the box to “Create spatial database”.

5.) Name the database whatever you like; I just named mine “postgis”. Click through the menus to finish the installation.

Loading some data

Now we get into the fun part: loading and querying some data.

1.) You now have a new program/application on your computer called pgAdmin 4. Pull that up. It should launch a new browser tab. You should see “Servers” on the left hand side with the ability to expand it. If you expand it, you should see something like this:


But we have no data.

2.) How do we get some spatial data into our PostgreSQL database? Pretty easily, actually. Your PostGIS installation comes with a program called “PostGIS 2.0 Shapefile and DBF Loader Exporter.”

If you start this program, you’ll see this:


Under “View Connection Details…” type in the credentials for your PostgreSQL database and the database you want to import your shapefile into. Mine is postgis.

3.) Next, we are going to add a file. Let’s add a shapefile of the United States. To keep it simple, let’s just download and import the TIGER 2018 states shapefile. To download it, go to the US Census TIGER/Line – Geography site; click on Download > Web Interface and select “States (and equivalent)”. It will download a zip file.

4.) Unzip the files. Go back to your Shapefile Importer Exporter, click “Add file”, and navigate to where you unzipped the files you just downloaded. Once you select the shapefile, you will see this:


Click Import. It will take a second, and if all works, it should say “Shapefile import completed.”

5.) Go back to pgAdmin. Under your database tree, right click on postgis (or whatever you named your spatial database) and click Refresh. Navigate to Schemas > Tables and you should see the shapefile you just loaded!


If you right click on it, and select View/Edit data, you can see all the fields it imported, as well as a geometry field (which is the most important part).

Let’s visualize it in QGIS!

1.) Open up QGIS. (If you don’t have it, download it here.)

2.) There’s a couple different ways of bringing it into QGIS. The easiest is to click on Layer > Add Layer > Add PostGIS layers…

3.) Click on New Connection, type in your credentials and click Connect. Navigate to your state table and click add.


4.) It will then prompt you to define a CRS. Click OK, and your map will appear!


Yes, I know that’s not very exciting, but hey, it works!

That’s it for now!







Building a Month-to-Date vs. Prior Year KPI Dashboard with Set Actions

Special thanks to Lindsey Poulter for her input and guidance with this post!

In this post I’m going to take a page out of my friend Lindsey Poulter (@datavizlinds)’s book and show how to build a KPI dashboard using set actions.

In the business world, we deal a lot with KPIs, and people love Big Ass Numbers. They love up/down arrows, and (very soon) they will love set actions; they just don’t know it yet.

The What?

Here’s what we are creating:



At my company, one of our major dashboards was built pre-Tableau, and it involves A LOT of JavaScript, HTML and Java. The calculations behind it are pretty easy (all done in-database), and for each major business category (let’s say there are three), you can drill down and drill up (like an accordion) to get more detail.

Sounds pretty cool. Problem is, a consultant developed it, so making changes to it is a hassle. So how can we reproduce this in Tableau?

Set Actions to the Rescue!

Getting Started

WARNING – This dashboard is going to involve a lot of calculated fields. We will build the set first.

Building the Set Actions

This is going to look pretty familiar to Lindsey’s blog post on adding bar charts into a drilldown table with set actions.  #WorkoutWednesday 2019 Week 1 also will look familiar. We will be using the Superstore dataset.

1.) Open a new Sheet and drag Category to the rows shelf.

2.) Create a new set from Category. To do this, right click on Category in the Dimensions pane and select Create > Set. Name the set “Category Set”.

3.) Create a new calculated field called “Subcategory Breakout.” This tells Tableau what to do when a Category value is clicked on (and added to the set). This will be the drill down, where clicking on any category will then show the corresponding sub-categories.

IF [Category Set] THEN [Sub-Category] END

4.) Add “Subcategory Breakout” to the rows shelf.


5.) Add a new worksheet set action. This can be done by going to worksheet > actions > add action > change set values. Run the action on select, with clearing the selection removing the set values.


6.) Add subtotals to the view by going to analysis > totals > add all subtotals

7.) Add “Subcategory Breakout” to the colors mark. Then, on the color legend, right click on “Null” and select “hide”. This will hide the “null” rows that appear due to the category not being in the set and will leave only the total rows for non-selected categories.

8.) Remove “Subcategory Breakout” from the colors mark.

9.) Go to Analysis > Totals > Column Totals to Top.

10.) Right click on Total in your table and click Format. Get rid of the Total label so it shows blank.

11.) Create a new calculated field called “Category Label” with the following.

IF [Category Set] THEN "▼ " + [Category] ELSE "► " + [Category] END

Shifting to the KPIs

To create the KPIs, I like to incorporate window functions, LOD (Level of Detail) calcs and lookup functions. (If there’s an easier way, let me know!)

1.) Create a new calculated field called “Month / Year Calc”. Since our Superstore data only has a transactional level of detail, we need to figure out which month the orders fell in. All I’m doing here is creating a new date for the first of the month.

DATE(STR(MONTH([Order Date])) + "/01/" + STR(YEAR([Order Date])))

2.) Create a new calculated field called “Max Month”. This will find the absolute max month in the entire dataset.

{max([Month / Year Calc])}

3.) Create a new calculated field called “Most Recent Month Flag”. This will be a boolean (T/F) field telling us if it’s the most recent month based on the month you have in question.

DATEDIFF('month',[Month / Year Calc],[Max Month])=0
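If it helps to see the logic outside of Tableau, steps 1–3 amount to this little Python sketch (the dates are made up, purely to show the logic):

```python
from datetime import date

# A few hypothetical order dates
orders = [date(2018, 12, 3), date(2018, 12, 19), date(2018, 11, 7)]

# "Month / Year Calc": truncate each order date to the first of its month
def month_of(d):
    return d.replace(day=1)

# "Max Month": the most recent month anywhere in the dataset
max_month = max(month_of(d) for d in orders)

# "Most Recent Month Flag": does each row fall in that month?
flags = [month_of(d) == max_month for d in orders]
print(max_month, flags)  # 2018-12-01 [True, True, False]
```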

4.) Create 2 new calculated fields: one called 0.5 and one called 2, with the values {min(0.5)} and {min(2)} in the calcs, respectively. These will be our placeholders for our table. I put {} around these so they don’t get summed across the whole dataset and turn into a huge number.

5.) Create a new calculated field called “Most Recent Month Sales” with the following calculation. This is probably one of my most used calculated fields in Tableau. I picked it up a few years back when watching a video by Andy Kriebel that combines KPIs with Sparklines.

LOOKUP(WINDOW_SUM(SUM(IIF([Most Recent Month Flag],{FIXED [Sub-Category], [Category], [Order Date], [Product Name]: MAX([Sales])},NULL))),0)

This is a bit complex, so I’ll explain it in pieces:

a.) {FIXED [Sub-Category], [Category], [Order Date], [Product Name]: MAX([Sales])}

This is an LOD calc. All we are doing here is taking the max sales at the category, sub-category, order date and product name level of granularity. You can take max, avg, or min if you choose. Max is usually my weapon of choice.

b.) sum(iif([Most Recent Month Flag],{FIXED [Sub-Category], [Category], [Order Date], [Product Name]: max([Sales])},null))

This wraps that LOD calc in an IIF (immediate if) statement and does a sum over it. It is saying: if it’s the most recent month, sum the data; otherwise make it null.

c.) LOOKUP(WINDOW_SUM(sum(iif([Most Recent Month Flag],{FIXED [Sub-Category], [Category], [Order Date], [Product Name]: max([Sales])},null))),0)

This wraps a window sum and a lookup function around the whole thing. It is saying: after taking the window sum, look up the value at offset 0 (that is, the current row), so the result behaves like a single reusable number.
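To see what the whole calc computes, here is the same pipeline sketched in Python on a few hypothetical rows (products and numbers are made up; the dict plays the role of the FIXED LOD, and the final sum plays the role of the SUM + WINDOW_SUM over the flagged month):

```python
from datetime import date

# (order date, product, sales) — hypothetical line items,
# including a duplicate row that the LOD de-duplicates
rows = [
    (date(2018, 12, 3), "Stapler", 120.0),
    (date(2018, 12, 3), "Stapler", 120.0),  # duplicate line item
    (date(2018, 12, 9), "Chair",   300.0),
    (date(2018, 11, 7), "Chair",   250.0),  # not the most recent month
]

# {FIXED [Order Date], [Product Name]: MAX([Sales])}: one value per key
lod = {}
for d, product, sales in rows:
    lod[(d, product)] = max(lod.get((d, product), float("-inf")), sales)

# The flag: does the key fall in the dataset's most recent month?
max_month = max(d.replace(day=1) for d, _, _ in rows)

# IIF(flag, value, NULL) then SUM / WINDOW_SUM: total over the flagged keys
most_recent_month_sales = sum(
    v for (d, _), v in lod.items() if d.replace(day=1) == max_month
)
print(most_recent_month_sales)  # 420.0
```

Note the duplicate Stapler row only counts once: that is exactly what the FIXED LOD buys you before the sum.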

Now we are going to use a similar process to create the prior year KPIs.

6.) Create a new calculated field called “Same Month – Last Year”. This will be a boolean (T/F) field telling us if it’s the same month last year as your most recent month calculation.

DATEDIFF('month',[Month / Year Calc],[Max Month])=12

7.) Like the “Most Recent Month Sales” calculated field, create a calculated field called “Most Recent Month Sales – PY” with the following calculation.

LOOKUP(WINDOW_SUM(sum(iif([Same Month – Last Year],{FIXED [Sub-Category], [Category], [Order Date], [Product Name]: max([Sales])},null))),0)

8.) Create a new calculated field called “% Sales over PY”:

([Most Recent Month Sales]-[Most Recent Month Sales – PY])/[Most Recent Month Sales – PY]

Under Default Properties > Number Formatting, make it a percentage with 1 decimal place.

9.) Now I’m going to use a trick I learned from Corey Jones during his presentation at TC18. Let’s create 2 fields we will use as KPI indicators in our table:

a.) Pos Sales over PY

IF [Most Recent Month Sales]-[Most Recent Month Sales – PY] > 0 THEN "▲" END

b.) Neg Sales over PY

IF [Most Recent Month Sales]-[Most Recent Month Sales – PY] < 0 THEN "▼" END
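The % calc and the two indicator fields boil down to a percent-change formula plus a sign test. Here is a quick Python sketch (the numbers are made up):

```python
def sales_vs_prior_year(current, prior):
    """% Sales over PY plus the up/down indicator, mirroring the calcs above."""
    pct = (current - prior) / prior
    arrow = "▲" if current > prior else ("▼" if current < prior else "")
    return pct, arrow

pct, arrow = sales_vs_prior_year(420.0, 350.0)
print(f"{pct:.1%} {arrow}")  # 20.0% ▲
```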

Formatting Sales

Next, let’s format Sales. I really liked this look from Jacob Olsufka‘s Superstore Insights Dashboard, so I implemented it here.

a.) Sales – <K

IF [Most Recent Month Sales] < 1000 THEN [Most Recent Month Sales] END

Right click on “Sales – <K”. Under Default Properties > Number formatting, click on Currency (Custom), and edit to 0 decimal places.

b.) Sales – K

IF [Most Recent Month Sales] >= 1000 AND [Most Recent Month Sales] < 1000000 THEN [Most Recent Month Sales] END

Right click on “Sales – K”. Under Default Properties > Number formatting, click on Currency (Custom), edit to 1 decimal place, and change Display Units to “Thousands (K)”.

c.) Sales – M

IF [Most Recent Month Sales] >= 1000000 AND [Most Recent Month Sales] < 1000000000 THEN [Most Recent Month Sales] END

Right click on “Sales – M”. Under Default Properties > Number formatting, click on Currency (Custom), edit to 1 decimal place, and change Display Units to “Millions (M)”.
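The three bucketed fields are really one piece of display logic. Here is a Python sketch of the same thresholds (Tableau’s currency formatting will differ slightly in the details):

```python
def format_sales(v):
    """Bucketed currency display mirroring the Sales – <K / K / M fields."""
    if v < 1_000:
        return f"${v:,.0f}"          # under a thousand: whole dollars
    if v < 1_000_000:
        return f"${v / 1_000:,.1f}K"  # thousands, one decimal
    return f"${v / 1_000_000:,.1f}M"  # millions, one decimal

print(format_sales(920))        # $920
print(format_sales(15_300))     # $15.3K
print(format_sales(2_500_000))  # $2.5M
```

Splitting one number into three mutually exclusive fields like this lets each field carry its own default number format, which is the whole trick.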

Back to our table (for now)

1.) Drag your 0.5 field and your 2 field to the columns shelf.

2.) On your sum(0.5) mark, change the mark type to Circle, and drag Sales – <K, Sales – K and Sales – M onto the Label mark.

Under Color, change the color to white, and under Size, make it as small as possible.

Click on the label, and make sure all 3 fields are on the same line.

Finally, right click on each field under marks. Click on Compute Using > Cell.

3.) On your sum(2) mark, change the mark type to Circle, and drag % Sales over PY, Neg Sales over PY, and Pos Sales over PY onto Label.

Under Color, change the color to white, and under Size, make it as small as possible.

Click on the label, and make sure all 3 fields are on the same line. Change the color of the Neg Sales over PY to Red and Pos Sales over PY to Green.


4.) Up on the columns shelf, right click on SUM(2), click Dual Axis, and then synchronize the axes. It may add “Measure Names” to Color; remove it if it comes up.

5.) Right click on the 2 axis, select Edit Axis, and fix the range from 0 to 5.


6.) You can pick the sort order for Category and Subcategory Breakout. What I did: for each of those fields on the rows shelf, right click, select Sort, and sort by sum of Sales, descending.

7.) Click on Category on the rows shelf, and click Show Header to hide the header.

8.) On your table, right click on one of the row labels and click “Hide Field Labels for Rows.”

9.) Use the steps above to add whatever metrics you like to your table!

Wrapping it up

Create a new dashboard and put your table onto it. Make sure you add a dashboard action (Dashboard > Actions > Change Set Values) for your set action, like we did above at the worksheet level.


The rest of the dashboard I created is just formatting, filters and using BANs!

This dashboard was a deep dive into both KPIs and set actions, and there’s a lot to digest here, but if you lasted this long, here is a link to the completed dashboard.




Where Should I Put My Filters? – Part 1

Nicole Lohr (@nicole_lohr9) had an interesting poll on Twitter earlier this month:


I was torn on this one. I typically put them at the top, but depending on how many filters/parameters there are, I might put them on the left or the right, depending on how I’m feeling that day. After contemplating it, I really don’t know what to do with filters. They are a necessary evil in the business world, though. I feel like this guy:


In this post, I’m going to go through three of the simplest options for filters on a dashboard.

1.) Filters at the top

The first option is to put the filters at the top of your dashboard. If you don’t have many of them (5-6 or fewer) — and they don’t need much explanation — then this should work. Here’s an example using a dashboard from a previous post.


2.) Filters at the Right

If filters at the top don’t work, I prefer filters on the right over the left, because I like to put my most important metric in the top left, since that’s where eyes land first. I would rather not have filters be the first thing people see, but it all depends on the flow of the dashboard. Here’s an example of filters on the right:


3.) Filters on the Left

If you absolutely have to, you can put the filters on the left. Of the three options I presented, I find this one the least desirable, but if you have no other space for filters, this will have to do.



That’s it, for now. Here’s a link to the workbook if you want to download and play with it.

In Part 2, I’ll explain some advanced techniques: collapsible menus using containers.

All for now.

Happy New Year!


Three main techniques for dashboard layouts

Before I got into statistics and data visualization, I was a journalist and copy editor.

Throughout J-school, every textbook and professor teaches you about the inverted pyramid when you are writing a story. For those not familiar, it is the concept of putting the most newsworthy info at the top of your story, followed by supporting details, with the “nice to know” info at the end.

As a copy editor, my job was to lay out and design the newspaper. We had to know design concepts: descending font sizes in headlines, spacing, where photos and graphics were placed on the page. As an introduction, I read (“read” is a loose term) this book:


Tim Harrower shows examples of great front pages, and gives copy editors examples to use in their layouts.

Fast forward a few years. I haven’t worked for a newspaper since before Twitter was a thing, and a few years ago, I found this software called Tableau.

Game changer.

For the first couple years, I got by in Tableau without any design standards, because Tableau provides a lot of tools out of the box for you not to screw things up too bad. With my background in journalism, I had an intuitive design sense. I knew what looked right, where to place maps and certain charts to get the right impact and tell the right story.

Being a huge Tableau evangelist at my company, I started helping other people build their dashboards. I realized that they don’t all have the same design background, and most get into Tableau because they are really, really good with data. So with this post I wanted to go over some basic design and layout techniques for dashboards.

This post is intended to be a primer on these layouts. There are a lot of other people who know a hell of a lot more than me about this topic, and I will try to point those folks out where necessary.

For this I took Jacob Olsufka’s Superstore Sales Insights Dashboard and reworked it to use each of these layouts.

1.) The Inverted Pyramid Layout

This layout displays the most significant information at the top of the dashboard, trends in the middle, and details in the bottom.

This acts like an article in a newspaper. If you don’t scroll to the bottom, you still get the most important information at the top of the dashboard.


(Image from Sisense)

Here’s how it would work out with Superstore data. I love BANs (Big Ass Numbers), so you’ll see them displayed prominently.


You see the main metrics at the top, some trends in the middle, and then some granular information at the bottom to fill in some of the gaps.

2.) Gutenberg Diagram or Z-pattern

The Gutenberg Diagram is often attributed to Edmund Arnold in the 1950s. His work was around newspapers and eye movement. His theory was that people typically start reading at the top left, so you should put the most important information there, followed by the top right, bottom left and bottom right. See the following diagram:


We can design a dashboard in the same way.


In full disclosure, my initial dashboards were like this because of the containers built into Tableau (most without the BANs). This is a great layout. Just add filters and you should be set!

3.) F-pattern

The F-pattern is an adaptation of the Z-pattern that came with the advent of eye tracking and web design. In 2006, the Nielsen Norman Group examined how 232 users viewed web pages. The findings concluded that the main reading behavior followed an F-pattern.


Hey, if it works for text, let’s use it for dashboard design. Here’s the same Superstore dashboard using an F-pattern layout:


(I could have done a little more justice to this one, but you get the idea.)


In full disclosure, Tableau has done a lot of research on eye tracking with Andy Cotgreave and Amy Alberts. They found that BANs matter, particularly at the top left of the dashboard. They also found that form is a part of function, meaning the eyes of the user will follow the form (or layout) of your dashboard. (Andy, please correct me if I misspoke!)

My main purpose was to give some basic layout templates, and potentially some new ideas for your next dashboard!

Here’s a link to the workbook if you want to use it for your next project!


Four different ways to visualize spatial data in Tableau

I had a break in the action, so it’s time to get caught up on a few blog posts!

The one I would like to do today is reproducing Sarah Battersby’s presentation from #TC18. Sarah did a great job of explaining it during the conference, but for those not familiar with heat maps and QGIS, this post will give some steps and background to reproduce it.

We are going to be working with some Oil and Gas locations in Pennsylvania. You can download it here and play along at home.

Today, we are going to be looking at 4 different ways of displaying spatial point data in Tableau.

1.) Just the points (ma’am).

The first and easiest way of displaying spatial point data is just displaying the points. To do this, open Tableau and click on Spatial File. Look for the file called OilGasLocations2018_12.shp.

Once you have a sheet open, double click on Geometry, and the points will appear. Add Site Id to the detail (just to be safe), and you should have what the screenshot shows below.

If I have a new spatial dataset, this is usually my go-to. I want to see how it looks on a map. It’s visualizing a distribution, the spatial equivalent of looking at a histogram. While this shows us something, it’s not really telling a story. We have a bunch of oil/gas wells on top of each other, and with this visual, you can’t really see the density of the wells.

2.) Custom geographies.

So let’s bring in another dataset. For example, what is the density by county? I just pulled down the county shapefile from the same site. Back in your data source, bring in your county shapefile, and when you join, click Intersects, like the screenshot below.


This will bring in the county that each oil/gas well falls in. To be perfectly honest, this is slow for large datasets. If it were a bigger dataset, I would probably load it into a database (probably PostgreSQL) and do the intersect there instead of within Tableau.

To visualize this, open up a new sheet, then find Geoid10 in Dimensions. In this dataset, this is the FIPS code (Federal Information Processing Standards), a unique identifier for each county. Right click on it, click Geographic Role, and choose County. Then double click on it to view it on the map. To add some density to it, drag Number of Records to Color. You should end up with something like this:


You can see there are more than 24K Oil/Gas locations in McKean County! In most cases I would clean this up, add a different color scale and most likely throw it on a dashboard.

But there are other ways!

3.) Density map.

This is one of the most anticipated features I have been waiting for in Tableau. Kent Marten showed it off at the Devs on Stage at #Data17, and I’ve been patiently (or impatiently!) waiting for it to come to production.

Anyway, to display those points as a density map, use the following steps:

1.) Open a new sheet.

2.) Double click on Geometry like you did to display the points. Add Site Id to the Detail mark. Then, instead of Automatic, choose Density as the mark type. You should end up with something like this:


There’s not really a standard size or color scheme to go by. Tableau gives you a lot of flexibility and you can change it depending on your liking. In color, I have it set to 70% Intensity, and I chose the Density Multi-color for color.

This shows the density of the sites. You can see that McKean County has the most locations, but now you can also see the finer level of detail that a county map has a hard time showing.

4.) Hexbin map.

This will bin our data into hexagonal bins and will be a mix between our density map and our county map. It shows a similar viz to the density map, but with the granularity to find which locations fall into which bin.
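If you’re curious what hexagonal binning actually does, it is just “snap each point to its nearest hexagon center and count.” Here is an illustrative Python sketch using the standard axial-coordinate conversion and cube rounding (the cell size, layout and coordinates here are arbitrary; the tool we use below may do its grid math differently):

```python
import math
from collections import Counter

def hex_bin(x, y, size):
    """Assign a point to a hexagonal cell, returned as (q, r) axial coordinates."""
    # pointy-top axial coordinates of the fractional hex containing the point
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # cube-round to the nearest hex center (cube coordinates must sum to zero)
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

# Three made-up points: the first two share a cell, the third lands elsewhere
points = [(0.1, 0.1), (0.3, -0.2), (10.0, 0.0)]
counts = Counter(hex_bin(x, y, 1.0) for x, y in points)
print(counts)
```

The per-cell counts are exactly what ends up on color in the final hexbin map.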

This requires bringing in a second program: QGIS. QGIS, previously known as Quantum GIS, is an open-source desktop GIS software that’s free and fairly flexible. It is a great alternative to ESRI’s ArcMap, but that’s probably worth another blog post comparing the two.

1.) Download QGIS here.

2.) Click on Plugins > Manage and Install Plugins, and search for MMQGIS and install the plugin MMQGIS.

3.) Load your point shapefile into QGIS.

Go to Layer > Add Layer > Add Vector Layer.

4.) To create the hexagon grid, go to MMQGIS –> Create –> Create Grid Layer.

5.) In the Grid GUI that pops up, select “Hexagons” from the “Shape Type” drop down. This will also create another shapefile on your computer, which you can join to (I called mine grid.shp).

6.) For convenience, I added an ID field to my grid.shp. To do this, under Layers in QGIS, click on Open Attribute Table. Click on the abacus icon (Open Field Calculator), name the field ID, and double click on row_number. Then click OK.

7.) Export it to a shapefile. Under Layers, right click on grid.shp, click Export > Save Features As…, and choose ESRI Shapefile.

8.) Once you have exported the grid to a shapefile, bring it into your data source and use the same intersect process as we did for the custom geographies.

9.) Double click on your new Geometry field on your hexgrid shapefile, and add your ID field to a detail mark. Next add a Sum of number of records, and with a little formatting, you can end up with something like this:


In Conclusion

There are many ways to show spatial point data on a map; it comes down to the story you are trying to tell. I hope this post gives you some background and direction the next time you have some spatial point data!

The workbook is up on Tableau Public if you want to see the final product.





Time Series Decomposition Explained (and an application in Tableau)

Time Series Decomposition is a technique I use regularly, namely because I work with fairly seasonal sales data for my day job.

While this has become second nature because I use it often, it could use some explanation to those who are not familiar. Just a warning, this will get technical with R and an application in Tableau.

What is Time Series Decomposition?

In non-math terms, Time Series Decomposition is a technique of breaking time series data into parts so you can analyze its parts instead of the whole. Typically, algorithms break them into their trend, seasonal and remainder (part that can’t be explained by trend or seasonality) parts.

There are two different types of time Series decomposition, additive and multiplicative.

  • The additive model is useful when the seasonal variation is relatively constant over time.
  • The multiplicative model is useful when the seasonal variation increases over time.

There are a few ways to estimate the seasonal variation, I typically use moving averages, and with most of the data I use it comes out additive.
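To make the moving-average approach concrete, here is a toy additive decomposition in pure Python on synthetic monthly data. Real tools (like the R functions used later in this post) are considerably more careful; this just shows the idea of splitting a series into trend, seasonal and remainder parts:

```python
import math

def decompose_additive(series, period=12):
    """Classical additive decomposition via a centered moving average."""
    n = len(series)
    half = period // 2
    # 1) trend: 2x12 centered moving average (endpoints weighted by 1/2)
    trend = [None] * n
    for i in range(half, n - half):
        w = series[i - half:i + half + 1]
        trend[i] = (w[0] / 2 + sum(w[1:-1]) + w[-1] / 2) / period
    # 2) seasonal: average the detrended values month by month, then center them
    by_month = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_month[i % period].append(series[i] - trend[i])
    means = [sum(v) / len(v) for v in by_month]
    overall = sum(means) / period
    seasonal = [means[i % period] - overall for i in range(n)]
    # 3) remainder: whatever trend + seasonal can't explain
    remainder = [series[i] - trend[i] - seasonal[i] if trend[i] is not None else None
                 for i in range(n)]
    return trend, seasonal, remainder

# Three years of fake monthly data: an upward trend plus a 12-month cycle
series = [100 + 0.5 * i + 10 * math.sin(2 * math.pi * i / 12) for i in range(36)]
trend, seasonal, remainder = decompose_additive(series)
print(round(trend[12], 2), round(seasonal[3], 2))  # 106.0 10.0
```

On this clean synthetic series the method recovers the linear trend and the sine-shaped seasonality almost exactly, leaving a remainder of essentially zero; on real data, the remainder is where the interesting surprises live.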

So why should I care?

Time series decomposition can help explain a lot of business problems. For example, let’s say your boss comes to you and asks, “We just ran a promotion; what’s the impact on sales?” Or, “How are we trending in X product?” Instead of supplying a line chart that looks like a cardiogram, what if you could show him/her the true trend?

That’s how I got into time series decomposition. Sales guys LOVE year over year comparisons. I loathe them. Seasonality has a HUGE impact on these numbers. What if last year you ran a huge promotion? What if it is 10 degrees warmer this month compared to same month last year? Time Series decomposition takes all these into account to give a better look at the true trends, the true seasonality and what is just random noise.

Let’s look at some data.

Here’s a dataset of monthly beer sales in millions of barrels, from 01/1975 to 12/1990. It comes from the TSA library in R. You can download it here: beersales. Here’s what it looks like plotted:

Just by looking at it, you can see the seasonality fairly easily. The variation looks relatively constant and consistent, so we can guess it’s an additive model. (You can test it if you want.)

Cool, we have some time series data.

Now let’s decompose it.

If you are using Tableau, you are going to be connecting to R. For those not familiar with R, I know that can sound scary, but it’s really easy.

1.) Download R. Most people will tell you to go to CRAN and download it from there, but my personal preference is to download it from the Microsoft R Application Network. I have a couple reasons why I prefer this: a.) It includes the Math Kernel Libraries (MKL), which support multi-threading. b.) Its libraries are stored in MRAN (a mirror of CRAN), which promotes reproducibility.

2.) Once you have R installed, install the Rserve library:

install.packages("Rserve")


3.) While still in R, run the following commands to launch Rserve:

library(Rserve)
Rserve()

It will then output something like the following:

4.) Go back into Tableau. Click on Help > Settings and Performance > Manage External Service Connection


5.) I’m using Tableau 2018.3, and this is a screenshot of what you will see:

Under server, type localhost, and click on Test Connection to make sure you configured it correctly. If not, go back and follow the steps again.

Finally, we’re going to decompose some data

1.) Once we have all of that set up, we can finally get into the meat of the analysis.

We’re going to be using the stl function from R, which stands for “Seasonal-Trend decomposition procedure based on Loess.” I tend to lean on this one a lot for weekly, monthly and yearly data. stl is in the base stats package in R, so we won’t have to install any libraries to get it to work.

2.) Set up a new calculated field called Trend with the following calculation:

SCRIPT_REAL("tmp_table <- ts(.arg1, frequency=12);
stl_data <- stl(tmp_table, s.window='periodic', robust=TRUE);
as.numeric(stl_data$time.series[,'trend'])",
SUM([Sales]))

(Swap SUM([Sales]) for however your measure is named; the last line pulls the trend column out of the stl result, and the measure is passed to R as .arg1.)

I’ll walk through this one.

tmp_table <- ts(.arg1, frequency=12)

This just means I’m creating a temporary time series object in R with a frequency of 12, since our data is monthly and there are 12 months in every year. If you had weekly data, it would be 52.

stl_data <- stl(tmp_table, s.window='periodic', robust=TRUE);

Now we are using the stl function on that time series object. To read more, the documentation is here. The s.window='periodic' part is my default. Setting robust=TRUE essentially makes the trend and seasonality estimates more robust to outliers, so more of the unexplained variation ends up in the remainder.

Now that we’ve made our trend, we can make our seasonality and remainder (or random) components using the same method; only the column we pull out of the stl result changes (again, swap SUM([Sales]) for your measure).

Seasonality:

SCRIPT_REAL("tmp_table <- ts(.arg1, frequency=12);
stl_data <- stl(tmp_table, s.window='periodic', robust=TRUE);
as.numeric(stl_data$time.series[,'seasonal'])",
SUM([Sales]))

Remainder:

SCRIPT_REAL("tmp_table <- ts(.arg1, frequency=12);
stl_data <- stl(tmp_table, s.window='periodic', robust=TRUE);
as.numeric(stl_data$time.series[,'remainder'])",
SUM([Sales]))

Throwing these all on the rows, and doing a little bit of formatting, we see the following:


It raises a lot of questions, like: What happened in December 1975? Or May 1985? Or January 1982?

As you can see, time series decomposition can open up a lot more avenues for analysis and interesting data points.

I hope you found this useful and potentially a new way of looking at your time series data!



Proportional Brushing with Tableau

I have a confession to make. I’m in love with another software package other than Tableau. It’s called JMP. I’ve used JMP for about the last 7 years or so, and I fell in love pretty quickly with its Graph Builder and Distribution capabilities.

One of their core principles is that every stat has a graph and every graph has a stat. Building complex models, such as neural nets or forest models, is also quick and easy. As of JMP 14, they also added a way to export your models to SQL, which is pretty awesome. Anyway, I digress.

In JMP, the main example data source is called Big Class. In the Tableau world, Big Class is the Superstore dataset; it’s pretty much what everyone starts out with. So instead of using Superstore, I’m going to bring JMP and Big Class into Tableau. Here’s an export of Big Class for you to look at: Big Class.

Like I said, in JMP you can build distributions pretty quickly, and even export them to interactive HTML (pretty cool stuff).

So how do you do this in Tableau? It used to be pretty much impossible. Now, with set actions, it is very possible. Here’s a walkthrough of how to create a similar viz.

1.) Create 4 sheets, one for each variable. For gender, create a bar chart, and for age, height, and weight, create histograms from the Show Me pane. Each histogram should create its bins as dimensions.

2.) Drag them all onto a dashboard, and distribute evenly.

You should end up with something like this.


3.) Go back to your Gender sheet (or any other sheet). Create a set on gender (sex). Then add it to the color mark on each of your sheets.

4.) On your dashboard, add a set action under Dashboard > Actions > Change Set Values. Under “Run action on,” choose Select; set the target to Gender Set, and click “Add all values to set.” (Screenshot below.)


5.) Now, going back to your dashboard, you see the following behavior.

That’s it!

Here’s the link to the workbook on Tableau Public.




Creating a Drill-Down Map Using Set Actions

Hey guys, welcome to the first blog post! I’ve been looking forward to Tableau 2018.3 since the beta testing, knowing that density marks were going to be included. However, Set Actions are a game changer.

In this post, I’m going to go through how to create a drill-down map, from state to counties using set actions.

First, I have a dataset I’ve compiled of state, county and zip code relationships. You can download it here: state_fips_zip.

Like with Superstore, let’s create a few sets and dynamic fields:

1.) Create a set of States. Call it State Set

2.) Create a set of Counties (FIPS). Call it County Set.

3.) Create a Calculated Field Called Dynamic County:

IF [State Set] THEN [FIPS] ELSE [State] END

4.) Create a Calculated Field Called Dynamic Zip:

  • If both sets contain the clicked values, return the zip code; otherwise return NULL.
  • If the result is NULL, IFNULL falls back to [Dynamic County].

IFNULL(IF [State Set] AND [County Set] THEN [Zip Code] END, [Dynamic County])
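The two calcs form a simple fallback chain: state until a state is selected, then county, then zip. Here is the same logic as a Python sketch (the Pennsylvania values are just example inputs):

```python
def dynamic_zip(state, fips, zip_code, state_set, county_set):
    # Dynamic County: drill from state to county once the state is in the set
    dynamic_county = fips if state in state_set else state
    # Dynamic Zip: only show zips once both the state AND the county are selected
    if state in state_set and fips in county_set:
        return zip_code
    return dynamic_county

# nothing selected -> state level; drill into PA -> county; then a county -> zip
print(dynamic_zip("PA", "42003", "15212", set(), set()))       # PA
print(dynamic_zip("PA", "42003", "15212", {"PA"}, set()))      # 42003
print(dynamic_zip("PA", "42003", "15212", {"PA"}, {"42003"}))  # 15212
```

Because the set actions add and remove members as you click, this one field quietly changes the level of detail of the whole map.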

5.) Now create a worksheet with State, Dynamic County and Dynamic Zip in the details.


6.) Create a new dashboard and create 2 set actions, one for the State Set and one for the County Set:

7.) By clicking on a state, you see that it works…kind of. You need to go back to your sheet, and change some geographic roles.


8.) Now, going back to your dashboard, you can drill down from state to county with ease!

Here’s a link to the workbook on Tableau Public. There are still issues with Tableau Public and set actions, so you’ll have to download it to get the full effect.