Log in | Sign up

Resolver One: The Python-powered spreadsheet
Download or buy ($99)


Tutorial

About This Tutorial

This tutorial starts out by describing some simple, traditional aspects of Resolver One, and works up towards the more advanced and interesting features, highlighting ways of working that produce spreadsheets that are easy to maintain and modify, while avoiding the types of errors that are common in other spreadsheets.

It will take about 1 hour to work through the whole tutorial.

Installation

If you haven't already got a copy of Resolver One, you can get it from our download page.

Import Data

To begin the tutorial, we will begin by importing some data from an Excel document, which you can download from here: cambridge-1961-2008.xls.

Try This Now

Select from Resolver One's pull-down menus: File / Import... / From Excel (No formulae), and navigate to the .xls file.

Note

Note: Most Excel documents should import seamlessly into Resolver One. If you have one that doesn't, or if you find any other things that you wish you could do but can't, then we'd love to hear about it, so we can keep on making Resolver One better.

Once imported, you should be able to see a sheet containing historical weather data, recorded in Cambridge in the UK. Included for each month are the mean minimum and maximum temperatures, the number of frost days, the rainfall in millimetres and the sunshine in hours.

images/s01-import.png

Note

This and other meteorological data has been made publicly available by the UK Met Office.

Simple formulae and Drag fill

Traditional spreadsheet-like operations in Resolver work as you would expect them to. For example, to find the range between each month's minimum and maximum temperatures, we can add a formula in a new column to subtract Tmin from Tmax.

Try This Now

Add a new column heading by clicking in cell H1, and typing the new heading, TRange.

Now, to calculate the first month's temperature range, in cell H2 enter the formula =C2 - D2

images/s02-simple-formula.png

Traditionally, this formula would be copied downwards to populate the rest of the cells in column H, for instance using drag fill.

Try This Now

Click on cell H2 again to select it, and move the mouse near the bottom-right corner of the cell. The mouse cursor becomes a cross-hair, and a small black rectangle appears on the cell's bottom-right corner.

images/s03-dragfill1.png

Drag this rectangle downwards to indicate the cells that the formula should be copied into.

images/s03-dragfill2.png

You don't have to cover the entire column, just a few cells will do for now.

images/s03-dragfill3.png

See how the original formula is modified in each row, so that while the original references cells C2 and D2, the copy in H3 references C3 and D3.

While drag fill is supported by Resolver One, techniques like this have some limitations. In particular, it can be awkward to use when there are large quantities of data, and it is prone to failure when new rows are added, since the new rows might not be included in the previously filled area. One approach which does not suffer from these problems is to use column formulae instead.

Column Formulae

Column formulae allow you to enter a single formula which applies to every cell in a column. Where you would ordinarily reference cells using a column-letter and a row number, in a column formula you would commonly replace the row number with an underscore character, for example, in the column formula:

=C_ - D_

the underscores indicate that each cell in the column should reference values from the corresponding row of the C and D columns.

Try This Now

Clear all the populated cells in column H, by selecting them and pressing Delete.

Now, select the whole H column by clicking on the 3D-shaded H cell, and enter the column formula in the formula bar, above the grid: =C_ - D_

images/s04-column-formula1.png

Once the formula is entered, you may see a red cross in row 1. This is an error caused by trying to evaluate the column formula in row 1 (which is attempting to subtract the word 'Tmin' from 'Tmax', an understandably erroneous operation.)

images/s04-column-formula2.png

This is nothing to worry about. You can override this error by entering a header for the column in cell H1 - enter the word TRange.

images/s04-column-formula3.png

Note the asterisk at the top of column H, to remind you that this column is being populated by a column formula.

Entering a column formula like this populates all cells in the column, and isn't susceptible to erroneously missing out any rows that are added to the sheet in future, as drag-filling or copying might be.

Note

Similarly, row formulae can be used to populate all the cells in a row.

Familiar and Compatible, with Benefits

Resolver One supports many of the functions and idioms that are familiar from other spreadsheets.

Try This Now

Add the following formaula to cell I2, to calculate the average rainfall over all recorded months:

=average(F1:F999)

The average function is one of many provided in Resolver One to ensure formula brought forward from legacy spreadsheets continue to work as you expect.

F1:F999 represents a cell range, a rectangular area of a spreadsheet. In this instance, this formula will produce an error if the spreadsheet grows beyond 999 rows. Increasing the number beyond what is required also introduces a performance penalty. Fortunately, there is a better way:

Try This Now

Replace the function in I2 with:

=average(F_)

This function produces the average value of all the numbers in column F, no matter how many rows are added to the spreadsheet.

Try This Now

Clear the cell I2

Using a Header Row

By designating one row as the header row, we can replace the cryptic C and D column labels in our formulae with the text of the column headers.

This uses a syntax that surrounds the column header with hash (#) symbols.

Try This Now

Right click in row 1, and select Set Header Row (1).

Now the =C_ - D_ formula in column H can be replaced with:

=#Tmax#_ - #Tmin#_

Although this formula is longer than the previous version, it is more self-explanatory, making your spreadsheet more comprehensible and self-documenting. The sections below cover ways to tackle formulae that threaten to become too lengthy or complex.

Note

As well as header rows, you can use header columns.

User-defined functions

Imagine a more complex formula, for example a heuristic which uses weather statistics to approximate the rate of evaporation from lakes each month. This could be added to column I as a column formula, such as this evaporation formula:

=(#Sun#_ * #Tmax#_ + (720 - #Sun#_)* #Tmin#_) / 789

However, to give correct results, the temperatures used must be clipped at freezing point (which is zero degrees Centigrade):

=(#Sun#_ * max(0, #Tmax#_) + (720 - #Sun#_) * max(0, #Tmin#_)) / 789

Formula such as this are easy enough to create, but as they get larger and more complex, they soon become difficult to understand and to modify further. Lengthy formulae like this are better expressed as a Python function.

Note

Python is a powerful programming language, which is embedded in Resolver One as a scripting language, and is ideal for implementing functionality such as this.

More specifically, Resolver One embeds IronPython, which is a reimplementation of the Python language by Microsoft, in .NET. This combines all the elegance of the Python language with the ability to use .NET features, such as calling C# libraries.

Directly below the spreadsheet grid is Resolver One's coding pane. In here is Python code representing the entire spreadsheet. Feel free to look around in here if you wish.

images/s05-codingpane.png

Somewhere around the middle of the coding pane is an empty green area, the pre-formula usercode section. Usercode is our term for any Python code added to the coding pane by you, the user. We shall now add the Python equivalent of the lengthy evaporation formula into this pre-formula usercode section.

Try This Now

The following Python is equivalent to the evaporaton formula described above. Scroll the code pane up or down to find the green pre-formula usercode section, and add this Python function into it (you can just copy-and-paste it in):

def evaporation(sun, tmin, tmax):
    tmin = max(0, tmin)
    tmax = max(0, tmax)
    sun_evap = sun * tmax
    indirect_evap = (720 - sun) * tmin
    return (sun_evap + indirect_evap) / 789

Having defined this function in the pre-formula usercode, we can now use it in a spreadsheet formula.

Enter the word Evap into cell I1, as a heading for the column.

Populate column I using the evaporation function, by selecting the whole column (clicking on the 3D-shaded I cell at the top of the grid), and entering the column formula:

=evaporation(#Sun#_, #Tmin#_, #Tmax#_)

The values calculated by this function appear in the grid.

images/s05-values-from-usercode-function.png

This column formula calls the evaporation function for every row on the spreadsheet, passing in the values of Sun, Tmin and Tmax from that row. The value which is returned at the end of the function is then displayed in that row of the grid.

The Python function is more verbose than the original evaporation formula. However, the separation of the calculation into discrete steps, combined with the use of meaningful variable names, makes the Python version more comprehensible and maintainable.

Notice how the green pre-formula usercode section is useful for code which must be executed before the spreadsheet formulae are evaluated, such as in this case, where we define a function which is then used by the spreadsheet formulae.

Doing More With Python and Usercode

The transition from a spreadsheet formula to a Python function opens the door to many more powerful techniques. Indeed, it is this tightly integrated marriage of a spreadsheet's intuitive visual accessibility with a modern, fully-featured programming language which makes Resolver One special.

Conditional Formatting

Formatting can be applied to the Evaporation column to modify its appearance, for example limiting the display to a couple of decimal places.

Try This Now

Right click on the shaded I column heading to select the whole column, and select Format Number... from the context menu.

images/s06-formatting-contextmenu.png

Set the number of decimal places to 2 and hit OK.

This dialog could also be used to set the formatting of a column to display negative numbers in red. However, that's not exactly what we want in this instance. What would be more useful is to highlight cells for which the rate of evaporation is greater than the rainfall for that month.

Conditional formatting like this is exactly the sort of problem-specific, customised behaviour that is well-suited to a solution in usercode.

Try This Now

Scroll the the code pane down to the bottom, where you will find a yellow post-formulae usercode section.

images/s08-postformula-usercode.png

Add the following Python to this yellow section:

sheet1 = workbook['weatherdata']

def isDrought(row):
    return row['Evap'] > row['Rain']

for row in sheet1.ContentRows:
    if isDrought(row):
        row.Cells['Evap'].BackColor = Color.Yellow

After making changes to usercode like this, you can apply the changes by either clicking outside the coding pane (eg. in the grid), or else pressing F9. The effect is then visible in the grid:

images/s09-conditional-formatting.png

The above code is fairly straightforward, even if you don't know know Python:

The variable sheet1 is created as a synonym for the worksheet. The for statement then loops through every row in the worksheet. If the row represents a drought month (because the value for Evap is greater than the value for Rain) then we turn the cell in the Evap column yellow.

Since this formatting relies upon values in column I that are calculated by spreadsheet formulae, we must use the yellow post-formula section, to ensure that the formatting is applied after the cell values have been calculated.

Custom Formatters

A custom formatter is a Python function which is used to modify the value that appears in a cell. It can modify the text in a cell, but not the rendering (the font or colors) of the cell.

For example, we could change the appearance of the Month column, to replace the ugly numbers with month names.

Try This Now

Add the following formatting function to the yellow post-formula usercode, after the other code we already added to this section (it reuses the previously-defined sheet1 variable):

from calendar import month_abbr

def monthFormat(value, _):
    if ISNUMBER(value) and 1 <= value <= 12:
        return month_abbr[value]
    else:
        return value

sheet1.Cols['Month'].Formatter = monthFormat

The months now appear with their names, instead of numbers:

images/s10-custom-formatter.png

This code imports an array of abbreviated month names ('Jan', 'Feb', etc) from the Python standard library, and then defines the function monthFormat, which will convert a number from 1 to 12 into its corresponding entry in this array of month names.

Note that the second parameter to the format function (described in the documentation linked above) is not used in this example. It is conventional in Python to name unused function parameters '_' or '__' (single or double underscore.)

The formatting function is then applied to the spreadsheet's Month column. So when Resolver One displays this column of the spreadsheet, it calls this function for each cell in turn, to determine what to display within it.

Remember, though, that format functions only modify the appearance of the cell on screen. The underlying data in the cell is still unchanged.

Try This Now

Enter the following formula into cell J1:

=B2 + B3

The result is 3, since B1 and B2 still contain 1 and 2 respectively, even though the formatter makes them look like 'Jan' and 'Feb'.

Clear cell J1 again (click on it and press delete).

Note

Like many of the features discussed in this tutorial, a custom formatter can be applied to an individual cell, or to a row, a column, a cell range or a whole worksheet.

Usercode calculations

Usercode can be used for more than just formatting - it can also calculate new values in the spreadsheet. Here we are going to calculate the chances of heavy rains occurring within a given month of the year.

Try This Now

Change to Sheet 2, by clicking on its tab.

Enter the column headings Month in A1, and Prob in B1.

Make row 1 the header row by selecting it, right-clicking in it, and selecting Set Header Row (1). You might like to also make it bold with Ctrl-B or the toolbar button.

Next, add the following usercode to the end of the yellow post-formula usercode section, to calculate the values in this worksheet:

sheet2 = workbook['Sheet2']

def calc_probs():
    totals = [0] * 12 # num datapoints for each month
    counts = [0] * 12 # num high stat occurences for month
    for row in sheet1.ContentRows:
        month = row['Month'] - 1
        totals[month] += 1
        counts[month] += row['Rain'] > 80

    return [counts[x] / totals[x] for x in range(0, 12)]

def display_probs(probs):
    for month in range(0, 12):
        row = sheet2.Rows[month+2]
        row['Month'] = month+1
        row['Prob'] = probs[month]

display_probs(calc_probs())

Don't forget to hit F9 (or click on the grid) after changing usercode, to see the updates in the grid.

images/s11-usercode-calc-prob-raw.png

The calc_probs function in the above code returns a list of 12 values, where the nth value in the list represents the proportion of times month[n] had rainfall higher than 80mm.

Note

Function calc_probs performs a division of count by total, for each month of the year. This is done using true division.

In Python 2.x, division usually produces an integer result for integer inputs, rounding down if there is any remainder (eg. 5 / 4 == 1) Resolver changes this by importing true division from Python's built-in module __future__. Within Resolver, dividing integers with a remainder will produce a floating point result (eg. 5 / 4 == 1.25)

The display_probs function uses a list of 12 values to populate the cells A1:B13 on sheet 2, with the month numbers in column A, and the probabilities in column B.

The final line display_probs(calc_probs()), merely calls these two functions, passing the output of one into the input of the other.

The figures this produces are a bit ugly-looking. Let's add some formatting to spruce them up.

Try This Now

Format the months using the same month formatter we previously defined, by adding this to the end of the yellow post-formula usercode:

sheet2.Cols['Month'].Formatter = monthFormat

Format the probability values using the GUI - select column B and right click on it. Choose Format Number... and format the column as a Percentage, to 2 decimal places.

images/s11-usercode-calc-prob.png

Named Cell Ranges

A cell range is a rectangular area of a sheet, for example as used in the formula =SUM(A1:A20). A cell range can be given a name to make them easy to use and pass around.

We can modify the probability calculation performed above to make it a little more flexible and useful. Instead of embedding the predicate row['Rain'] > 80 in the function that calculates probabilities, we can parameterise this condition using values from the grid.

Try This Now

Put the value Rain into cell D2, and 80 into D3.

Click on D2 to select it, and then select from the pull-down menus: Data / Edit Names....

In the Edit Names dialog, click New and enter the name statname. See how the location of this single-cell named cell range is already populated for us by us selecting cell D2. Click OK to close the Edit Names dialog.

Click cell D3 to select it, and create a second named cell range, this one called statmax. The Edit Names dialog should look like this when you're done:

images/s14-edit-names-dialog.png

Now we can use these named ranges in our usercode. Find the line near the end of function calc_probs which tallies up the number of times a month experiences high rainfall:

        counts[month] += row['Rain'] > 80

This line relies on Python treating False as 0 and True as 1 when added to the number stored in counts[month].

Change the line now, to use the values from the named ranges instead, like so:

        counts[month] += row[statname.Value] > statmax.Value

This produces the same results on the grid as before. Now, however, we can edit the values in cells D2 and D3 to get probability information on other weather statistics. For example, the probability of getting more than 120 hours sun in a month:

images/s15-prob-sun.png

Run Workbook

RunWorkbook is a function that allows one spreadsheet document to pass its own data in to the processes or calculations defined in a second document. The results of that calculation are then available for the first spreadsheet to use.

This can be used to define calculation once, in a single document, and then use that process from many other spreadsheet documents, each passing in their own sets of data and retrieving their own results. When the calculation needs to be fixed or modified, these changes only need be made in a single place, not in every document.

Let's try to reuse the probability calculations defined above from a different spreadsheet document.

Try This Now

Save your current document as engine.rsl - this document will be doing our probability calculations.

Make a copy of the following new spreadsheet of weather data: oxford-1853-1903.xls. This will be supplying the data for the calculations.

Import the oxford data into a new Resolver document, as you did before, using File / Import... / From Excel (No formulae) and navigate to the new oxford .xls file.

Save this newly-imported sheet as oxford.rsl, in the same directory as engine.rsl.

Paste the following usercode into the green pre-formula usercode of oxford.rsl:

sheet1 = workbook['data']
sheet2 = workbook['sheet2']

weatherdata = CellRange(
    sheet1.Cells.A1,
    sheet1.Cells['G', sheet1.MaxRow])
overrides = {
    ('weatherdata', 1, 1): weatherdata,
    'statname': sheet2.D2,
    'statmax': sheet2.D3,
}

engine_workbook = RunWorkbook('engine.rsl', overrides)

prob_sheet = engine_workbook['Sheet2']
results = CellRange(prob_sheet.Cells.A1, prob_sheet.Cells.B13)
CopyRange(results, sheet2.Cells.A1)

The crux of this code is the call to RunWorkbook about 2/3 of the way down. All the code before this, creating weatherdata and overrides, prepares the Oxford weather data to be passed in to the calculation. All the code after RunWorkbook retrieves the results calculated by engine.rsl and displays them here.

The code does not work, as yet, because it is expecting to find values for statname and statmax in cells D2 and D3.

Try This Now

In oxford.rsl's sheet 2, enter the value Frost into cell D2, and 5 into D3.

After a pause, while the other document is accessed, sheet 2 is filled with the probabilities for encountering 5 or more days of frost in a month, for Oxford during the latter half of the 19th century.

images/s17-runworkbook.png

Note how oxford.rsl contains no logic for calculating probabilities. It passes its data to engine.rsl, which performs all the calculations, and then retrieves the results for display.

Note

There are more samples online showing ways to use RunWorkbook, calculating value-at-risk, and the Black-Scholes pricing model.

Calling External Components

Resolver One embeds IronPython instead of regular Python (CPython, as it's known.) IronPython can seamlessly use external .NET components, and Python modules.

Calling .NET components

For instance, we can draw a graph of data in Resolver one using ZedGraph, an open-source .NET graphing library.

Try This Now

Download a copy of the ZedGraph.dll from here. ZedGraph is redistributed here in binary form for dynamic linking, under the terms of its LGPL license.

Copy it into the same directory as your .rsl file.

Note

Alternatively, DLL files such as this can be put into any directory on your Python sys.path, which is populated from the environment variable IRONPYTHONPATH.

ZedGraph exposes, amongst many other things, a C# class GraphPane, which has methods AddCurve, to add data to a chart, and GetImage, to return the rendered chart as a .NET Bitmap.

We can call these C# methods directly from our usercode or cell formulae.

Try This Now

Add the following code to the end of the yellow post-formula usercode:

# some imports from .NET
from System import Math
from System.Drawing import Bitmap, Graphics, RectangleF

# the clr module lets us add references to new .NET DLLs
import clr
clr.AddReference("ZedGraph")

# Now we can import from the DLL as though it were Python code
from ZedGraph import GraphPane, PointPairList, SymbolType

# Display a chart using the ZedGraph API
def display_chart(title, xlabel, xdata, ylabel, ydata):
    pane = GraphPane(
        RectangleF(0, 0, 640, 480),
        title,
        xlabel,
        ylabel)
    points = PointPairList()
    for x, y in zip(xdata, ydata):
        points.Add(x, y)

    pane.AddCurve(ylabel, points, Color.Blue, SymbolType.Diamond)
    pane.AxisChange(Graphics.FromImage(Bitmap(1, 1)))

    # Display the chart as a bitmap on a Resolver image workbook
    workbook.AddImageWorksheet(title, pane.GetImage())

# Convert the 'year'/'month' columns into x co-ordinates
# use the 'rain' column as y co-ordinates
xdata = [row['Year'] + (row['Month'] - 1) / 12
         for row in sheet1.ContentRows]
ydata = [row['Rain'] for row in sheet1.ContentRows]

display_chart("Rainfall by month", "Date", xdata,
              "Rain (mm)", ydata)

The above code adds a new worksheet, an image worksheet, which contains the bitmap of the rendered chart:

images/s12-chart.png

Using this technique, your spreadsheets can easily use functionality from external 3rd party .NET components. Additionally, your own functionality that is common to several spreadsheets can be moved out into a single external component, and shared between all the sheets that require it.

Calling Python Modules

It's equally simple to use external Python modules, either your own or 3rd party ones.

Try This Now

Put the following function into a new file called notchfilter.py, in the same directory as your .rsl file:

from Library.Compatibility import AVERAGE

def notchfilter(values, width):
    output = []
    for index in range(len(values)):
        begin = max(0, index - width)
        end = min(index + width + 1, len(values))
        output.append(AVERAGE(values[begin:end]))
    return output

We can now import this function and call it from our usercode, or from a cell formulae.

Insert the following just before the call to display_chart at the end of the yellow post-formula usercode:

from notchfilter import notchfilter
ydata = notchfilter(ydata, 7)

By applying notchfilter to the y-values, we average out each month's rainfall with the seven months on either side of it, thus reducing the rapid seasonal variation of rainfall levels to expose longer-term trends.

images/s13-chart-filtered.png

Note

Your usercode can import functions or classes from Python modules located anywhere on your IRONPYTHONPATH, not just from modules in the current directory. Once they have been imported, they can be used from usercode or from cell formulae.

The only major limitation with this technique is that IronPython will not currently import those Python libraries which use C-extensions. Resolver Systems are working on an open-source project called Ironclad to address this deficiency.

This technique makes it easy to use 3rd party Python modules in your spreadsheets, or to move functionality that is common to several spreadsheets out into its own external module, to prevent error-prone duplication of functionality.

Wrapup

That's the end of this tutorial. If you want more, there are other samples that get installed with Resolver One.

Otherwise, check out Resolver One's main documentation page, or the reference documentation for Resolver One's object model and spreadsheeting functions.

Comments