A Sankey diagram, also known as a Sankey chart, visualizes the movement or flow of resources, energy, or data between different points, showing how quantities flow from one set of entities to another.
The width of the links in a Sankey diagram is proportional to the magnitude of the flow. Thicker links represent bigger flows, and thinner links show smaller flows.
The example Sankey diagram below shows how a particular individual’s monthly income is distributed across various spending categories.
Click here to download the example file
Main Components of a Sankey Diagram
The main components of a Sankey chart are:
- Nodes/Pillars – Entities or categories between the flows represented by pillars or blocks. They represent the sources, intermediaries, or destinations. The nodes in the example Sankey chart above are Wages, Rent, Royalties, Utilities, etc.
- Links/Lines – Represent the flow between the nodes, drawn as bands or lines. The thickness of the links is proportional to the size or quantity of the flow, allowing for a quick visual comparison of different flow magnitudes.
- Flow Direction – The direction of the links indicates the flow path from one node to another. Flows typically move left to right, but they can also move vertically or in other directions.
- Flow Values – The values (quantities) represented by the thickness of the links may be shown as labels to provide more detailed information to help users understand the exact volume of resources or energy being transferred between nodes.
- Colors – Different colors distinguish between various categories of flow, nodes, or pathways.
- Legends and Labels – They explain the meaning of colors, line thicknesses, or categories.
Excel doesn’t offer a built-in option for generating Sankey charts in its charts library.
While you can use third-party add-ins to generate Sankey diagrams in Excel, I will show you how to create a Sankey chart without third-party add-ins.
You need to prepare the data for the Sankey diagram before you can construct the diagram.
Click here to download the example file
Prepare Data for a Sankey Diagram
Suppose you have the data range below showing a particular individual’s income distribution. You want to use a Sankey diagram to visualize the data.
You can use the steps below to prepare the data for the diagram.
Step #1: Convert the Data Range to an Excel Table
Convert the data range above to an Excel table using the below steps.
- Select any cell in the data range and press CTRL + T.
- On the Create Table dialog box, verify that Excel has correctly guessed the address of the data range and click OK.
The above step converts the data range to an Excel table, as shown below.
Step #2: Name the Table
Do the following to name the table you have created in Step #1:
- Select any cell in the table.
- Open the Table Design contextual tab, click the Table Name box in the Properties group, and replace the default table name with the desired name.
In this example, I have named the table ‘SourceData.’
Step #3: Define a Value for Spacing Categories in the Sankey Chart
Do the following to define a value for spacing the categories in the Sankey chart:
- Enter an appropriate value in a cell below the table. In this case, I have entered 200 in cell B7.
- Select the cell containing the value, open the Formulas tab, and click Define Name on the Defined Names group.
The above step opens the New Name dialog box.
- On the New Name dialog box, enter the desired name on the Name box and click OK.
Step #4: Create the Sankey Lines/Links Table
Create a Sankey lines/links table with all possible combinations of rows and columns and after each category, enter the value in the ‘Spacing’ named range as shown below.
Note: Name the table ‘SankeyLines’ in the Name box as explained in Step #2 above.
After creating the Sankey lines table, enter the formula below in cell C2 to fill in the relevant values from the ‘SourceData’ table:
=IF(LEFT([@Source],7)="Spacing",Spacing,INDEX(SourceData,MATCH([@Source],SourceData[From/To],0),MATCH([@Target],SourceData[#Headers],0)))
Notice that when you enter the formula in the table it is automatically copied to all the other cells in the column.
The formula checks if the value in the ‘Source’ column starts with ‘Spacing.’ If TRUE the formula returns the value in the named ‘Spacing’ range you defined in Step #3 above.
Otherwise, the formula uses the INDEX function to look up a value in the ‘SourceData’ table using a row and column determined by the ‘Source’ and ‘Target’ values and returns the corresponding value.
Insert an End Position Column
Add an End Position column to the above table to contain values indicating the order of the links at the destination or target nodes of the Sankey diagram.
Enter 1 for the first ‘Rent,’ 2 for the second ‘Rent,’ 3 for the third ‘Rent,’ 4 for the fourth ‘Rent,’ and 5 for the first ‘Spacing,’ and so on. After finishing entering the end position of one ‘Target,’ the next position goes to the ‘Spacing’ with no position at the top.
The results should look as shown below.
Calculate the Width of the Sankey Lines and the Space Above and Below Them
Add 12 columns to the table above to calculate the Sankey lines’ width and the space above and below them. You will use the results to create a 100% stacked area chart.
The resultant table should look as shown below.
The table below shows the formulas used in the 12 additional columns.
Column | Formula |
---|---|
Above Start | =SUM(SankeyLines[[#Headers],[Value]]:[@Value])-[@Value] |
Above Mid 1 | =[@[Above Start]] |
Above Mid 2 | =[@[Above End]] |
Above End | =SUM([Value])-SUMIFS([Value],[End Position],”>=”&[@[End Position]]) |
Value Start | =[@Value] |
Value Mid 1 | =[@Value] |
Value Mid 2 | =[@Value] |
Value End | =[@Value] |
Below Start | =SUM([Value])-[@[Above Start]]-[@[Value Start]] |
Below Mid 1 | =SUM([Value])-[@[Above Mid 1]]-[@[Value Mid 1]] |
Below Mid 2 | =SUM([Value])-[@[Above Mid 2]]-[@[Value Mid 2]] |
Below End | =SUM([Value])-[@[Above End]]-[@[Value End]] |
Step #5: Create Data Tables for Sankey Source and Target Pillars/Nodes
Data Table for Source Pillars
To create the data table for the source pillars, enter the sum of values of all source categories in the ‘Source’ column of the ‘SankeyLines’ table, inserting the ‘Spacing’ value between each category. Name the table ‘SourcePillars’ in the Name box.
Use the formula below to enter the sum of values of all the source categories and insert the ‘Spacing’ value between the categories:
=IF(ISNUMBER(SEARCH(“Spacing*”,[@Source])),Spacing,SUMIF(SankeyLines[Source],[@Source],SankeyLines[Value]))
The formula checks if ‘Spacing’ is found in the ‘Source’ column of the ‘SourcePillars’ table.
The formula returns the value of the ‘Spacing’ named range if found. Otherwise, the formula sums the values from the ‘SankeyLines’ table that match the current [@Source] value.
Data Table for Target Pillars
To create the data table for the target or destination pillars, enter the sum of values of all target categories in the ‘Target’ column of the ‘SankeyLines’ table, inserting the ‘Spacing’ value between each category. Name the table ‘TargetPillars’ in the Name box.
Use the formula below to enter the sum of values of all target categories in the ‘Target’ column of the ‘SankeyLines’ table, and insert the ‘Spacing’ value between the categories:
=IF(ISNUMBER(SEARCH(“Spacing*”,[@Target])),Spacing,SUMIF(SankeyLines[Target],[@Target],SankeyLines[Value]))
The formula checks if ‘Spacing’ is found in the ‘Target’ column of the ‘TargetPillars’ table. The formula returns the value of the ‘Spacing’ named range if found.
Otherwise, the formula sums the values from the ‘SankeyLines’ table that match the current [@Target] value.
Step #6: Create a Named Range for Spacing the X-axis
You will require spacing values for the x-axis of the chart.
Create a named range ‘Space’ and enter the values 0, 10, 90, and 100 in the cells B1, C1, D1, and E1 respectively.
Now you have all the data you need to construct the Sankey diagram.
Also read: How to Make Box Plot (Box and Whisker Chart) in Excel?
Construct the Sankey Diagram
You can use the steps below to create the Sankey chart.
Step #1: Draw the Individual Sankey Lines
- Click the Insert tab, open the ‘Insert Line or Area Chart’ drop-down menu, and select the ‘100% Stacked Area’ chart option in the 2-D Area section.
The above step inserts an empty chart into the worksheet.
- Right-click the empty chart and select ‘Select Data’ on the shortcut menu.
The above step opens the ‘Select Data Source’ dialog box.
- Click ‘Add’ on the ‘Legend Entries (Series)’ box on the ‘Select Data Source’ dialog box.
The above step opens the ‘Edit Series’ dialog box.
- Use the Edit Series dialog box to add 3 data series representing the space above the Sankey line, the width of the Sankey line, and the space below the Sankey line from the ‘SankeyLines’ table.
You add the series from the ‘SankeyLines’ table.
- Click the Edit button on the ‘Horizontal (Category) Axis Labels’ box on the Select Data Source dialog box to open the Axis Labels dialog box and select the values in the ‘Space’ named range.
- Click OK on the Select Data Source dialog box.
The above steps insert the first chart into the worksheet, as shown below.
Format the Chart to Form a Sankey Line
Do the following to the above chart to form a Sankey line:
- Double-click the x-axis to open the Format Axis task pane on the right of the Excel window. Select the ‘Values in reverse order’ option to reverse the order of the values on the y-axis.
- Change the color and transparency of each series as follows on the Format Axis task pane:
- Series 1 – Select no fill. Note that Series 1 is invisible on this chart because it has 0 values.
- Series 2 – Select a desired color and 50% transparency.
- Series 3 – Select no fill.
Uncheck all the options on the Chart Elements menu.
The chart should appear as shown below.
Double-click the Chart Area to open the Format Chart Area task pane on the right of the Excel window. Select the options ‘No fill’ and ‘No line.’
The first Sankey line appears as shown below.
- Repeat the above six steps for all the rows in the ‘SankeyLines’ table except the spacing rows.
After creating the Sankey lines for a category, arrange them in a neat, consecutive order, one directly following the other. For example, the Sankey lines for the ‘Wages’ category should appear as depicted below.
After creating and grouping the Sankey lines for each category, you can combine them, as shown below.
Use text boxes to add source and destination pillars to the Sankey chart. Populate the text boxes with the relevant data from the ‘SourcePillars’ and ‘TargetPillars’ tables. Fill each text box with a different color for distinction.
The final chart should look like the one below.
I have shown you how to create a Sankey diagram in Excel. I hope you found the tutorial helpful.
Other articles you may also like: