spellbook.plot

High-level functions for creating and saving plots

Functions:

pairplot(data, xs[, ys, fontsize, histplot_args])

Create a pairplot

parallel_coordinates(data, features, target, ...)

Parallel coordinates plot

plot_1D(data, x[, xlabel, fontsize, ...])

Create a single univariate plot

plot_2D(data, x, y[, relative, fontsize, ...])

Create a single bivariate/correlation plot

plot_confusion_matrix(confusion_matrix, ...)

Create a confusion matrix heatmap plot

plot_grid_1D(nrows, ncols, data[, target, ...])

Create a grid of univariate plots

plot_grid_2D(nrows, ncols, data, xs, ys[, ...])

Create a grid of bivariate/correlation plots

save(fig, filename[, dpi])

Save a plot to a file

Functions

pairplot

spellbook.plot.pairplot(data, xs, ys=None, fontsize=12.0, histplot_args={})

Create a pairplot

[Figure: example 3x5 pairplot]

The plot does not need to contain the same variables, or the same number of variables, on the x-axes and the y-axes: it can be rectangular, with any number of rows and any number of columns. Subplots with the same variable on both axes are detected automatically, wherever they are located in the pairplot, and show the appropriate 1D/univariate distribution instead of a 2D/bivariate/correlation plot. This makes it possible to split a full and possibly large pairplot of all variables into separate, smaller pieces of arbitrary size.

The visual representation of the distributions and correlations is chosen automatically depending on the types of the variables (categorical, ordinal, continuous).
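Splitting a large pairplot into pieces amounts to choosing different `xs` and `ys` lists. The following standard-library sketch (it does not call spellbook) enumerates pieces that together tile the full pairplot and lists, for each piece, the cells that have the same variable on both axes, i.e. the cells that get a 1D distribution. The helpers `chunks`, `split_pairplot` and `diagonal_cells` are hypothetical, and treating `ys` as the rows is an assumption.

```python
from itertools import product


def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_pairplot(variables, max_cols, max_rows):
    """Hypothetical helper: return (xs, ys) pairs that together tile the
    full pairplot of `variables` against themselves."""
    return [(xs, ys)
            for ys in chunks(variables, max_rows)
            for xs in chunks(variables, max_cols)]


def diagonal_cells(xs, ys):
    """Return the (row, col) cells of an xs-by-ys grid whose x and y variables
    are identical; these cells show a 1D distribution."""
    return [(row, col)
            for (row, y), (col, x) in product(enumerate(ys), enumerate(xs))
            if x == y]


variables = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
for xs, ys in split_pairplot(variables, max_cols=5, max_rows=3):
    # each piece could be drawn with spellbook.plot.pairplot(data, xs=xs, ys=ys)
    print(ys, 'x', xs, '->', diagonal_cells(xs, ys))

print(diagonal_cells(['a', 'b', 'c', 'd', 'e'], ['d', 'e', 'f']))  # [(0, 3), (1, 4)]
```

The last line shows that the "diagonal" cells of a piece need not lie on its own diagonal, which is why they have to be detected by name rather than by position.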

Parameters
  • data (pandas.DataFrame) – The dataset to plot

  • xs (typing.List[str]) – Names of the variables to plot on the x-axis

  • ys (typing.Optional[typing.List[str]]) – Optional. Names of the variables to plot on the y-axes. If not specified, the same variables will be shown on the x-axes and the y-axes.

  • fontsize (float) – Optional. Baseline fontsize for all elements, presumably the size that matplotlib's relative size medium corresponds to

  • histplot_args (dict) – Optional. Arguments for seaborn.histplot(), which is used to draw the histograms
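In matplotlib, relative font sizes such as `'small'`, `'medium'` and `'large'` are scaled against the base size `rcParams['font.size']`, with `'medium'` equal to the base size. Assuming spellbook uses `fontsize` as that base size (an assumption this page does not confirm), the named sizes would resolve as in the following sketch. The factors mirror the main entries of `matplotlib.font_manager.font_scalings`; `resolve_fontsize` is a hypothetical helper.

```python
# Scaling factors of matplotlib's relative font sizes
# (main entries of matplotlib.font_manager.font_scalings).
FONT_SCALINGS = {
    'xx-small': 0.579,
    'x-small': 0.694,
    'small': 0.833,
    'medium': 1.0,
    'large': 1.2,
    'x-large': 1.44,
    'xx-large': 1.728,
}


def resolve_fontsize(size, base=12.0):
    """Hypothetical helper: turn a named or numeric font size into points,
    assuming `base` plays the role of rcParams['font.size']."""
    if isinstance(size, str):
        return base * FONT_SCALINGS[size]
    return float(size)


for name in ('small', 'medium', 'large'):
    print(name, round(resolve_fontsize(name), 2))
```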

Return type

matplotlib.figure.Figure

Returns

The figure containing the grid of plots

parallel_coordinates

spellbook.plot.parallel_coordinates(data, features, target, categories, fontsize=None, shift=0.3)

Parallel coordinates plot

[Figure: example parallel coordinates plot]

Based on Parallel Coordinates in Matplotlib, but extended to also support categorical variables.

For categorical variables, a random uniform shift is applied to spread the lines around the position of their class/category. This indicates how the datapoints in a particular class/category are composed in terms of the target labels/classes. Furthermore, the shift interval is sized according to the number of datapoints in the respective class/category, giving an impression of how many datapoints that class contains.

Parameters

Todo

Support more than the 10 colours included in Matplotlib’s tableau colours

Return type

matplotlib.figure.Figure
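The random shift for categorical variables can be sketched as follows: each category is placed at an integer position, and each datapoint is moved by a uniform random offset whose interval grows with the size of its category. The exact sizing rule is not documented here; this sketch assumes an interval width of `shift` times the category's count relative to the largest category, which is a hypothetical choice. `jitter_categories` is a hypothetical helper, not part of spellbook.

```python
import random
from collections import Counter


def jitter_categories(values, shift=0.3, seed=None):
    """Hypothetical helper: map categorical values to jittered numeric positions.

    Categories are placed at 0, 1, 2, ... in order of first appearance. Each
    point gets a uniform offset in [-w/2, w/2] with w = shift * n_cat / n_max,
    so that larger categories are spread over a wider interval.
    """
    rng = random.Random(seed)
    positions = {}
    for value in values:
        positions.setdefault(value, len(positions))
    counts = Counter(values)
    n_max = max(counts.values())
    jittered = []
    for value in values:
        width = shift * counts[value] / n_max
        jittered.append(positions[value] + rng.uniform(-width / 2, width / 2))
    return jittered, positions


values = ['red'] * 6 + ['green'] * 3 + ['blue']
ys, positions = jitter_categories(values, shift=0.3, seed=0)
print(positions)
```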

plot_1D

spellbook.plot.plot_1D(data, x, xlabel=None, fontsize=12.0, figure_args={}, barchart_args={}, histogram_args={}, histplot_args={}, statsbox_args={})

Create a single univariate plot

The type of the variable (categorical or continuous) is determined automatically, and either spellbook.plot1D.barchart() or spellbook.plot1D.histogram() is called accordingly.

Parameters
Return type

matplotlib.figure.Figure

Returns

The figure containing the plot
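The automatic choice between a barchart and a histogram could look like the following sketch. The classification rule spellbook actually uses (`spellbook.plotutils.get_data_kind()`) is not described on this page; the heuristic below, in which non-numeric values, booleans, or integers with few distinct levels count as categorical, is an assumption for illustration, and `guess_kind` and `choose_1d_plot` are hypothetical names.

```python
def guess_kind(values, max_int_levels=10):
    """Hypothetical heuristic: classify a column as 'categorical' or 'continuous'."""
    # bool is a subclass of int, so booleans are checked first
    if all(isinstance(v, bool) for v in values):
        return 'categorical'
    if not all(isinstance(v, (int, float)) for v in values):
        return 'categorical'
    # integer-valued columns with only a few distinct levels are treated as categorical
    if all(isinstance(v, int) for v in values) and len(set(values)) <= max_int_levels:
        return 'categorical'
    return 'continuous'


def choose_1d_plot(values):
    """Return the spellbook function plot_1D would delegate to under this heuristic."""
    return {'categorical': 'spellbook.plot1D.barchart',
            'continuous': 'spellbook.plot1D.histogram'}[guess_kind(values)]


print(choose_1d_plot(['cat', 'dog', 'cat']))
print(choose_1d_plot([0.5, 1.7, 2.2]))
```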

plot_2D

spellbook.plot.plot_2D(data, x, y, relative=False, fontsize=12.0, figure_args={}, heatmap_args={}, violinplot_args={}, cathist_args={}, scatterplot_args={})

Create a single bivariate/correlation plot

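The types of the variables (categorical or continuous) are determined automatically and the corresponding 2D plotting function is called:

  • categorical x, categorical y: spellbook.plot2D.heatmap()

  • categorical x, continuous y: spellbook.plot2D.violinplot()

  • continuous x, categorical y: spellbook.plot2D.categorical_histogram()

  • continuous x, continuous y: spellbook.plot2D.scatterplot()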

Parameters
  • data (pandas.DataFrame) – The dataset to plot

  • x (str) – Name of the variable to plot on the x-axis

  • y (str) – Name of the variable to plot on the y-axis

  • relative (bool) –

    Optional. Whether the heatmaps drawn with spellbook.plot2D.heatmap() are normalised

    • True: heatmap will be column-normalised (normalisation = norm-col)

    • False: heatmap will show absolute numbers (normalisation = count)

  • fontsize (float) – Optional. Baseline fontsize for all elements, presumably the size that matplotlib's relative size medium corresponds to

  • figure_args (dict) – Optional. Arguments for the creation of the matplotlib.figure.Figure with matplotlib.pyplot.figure()

  • heatmap_args (dict) – Optional. Arguments passed on to spellbook.plot2D.heatmap() for correlations between a categorical variable on the x-axis and a categorical variable on the y-axis

  • violinplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.violinplot() for correlations between a categorical variable on the x-axis and a continuous variable on the y-axis

  • cathist_args (dict) – Optional. Arguments passed on to spellbook.plot2D.categorical_histogram() for correlations between a continuous variable on the x-axis and a categorical variable on the y-axis

  • scatterplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.scatterplot() for correlations between a continuous variable on the x-axis and a continuous variable on the y-axis

Return type

matplotlib.figure.Figure

Returns

The figure containing the plot
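The dispatch from the kinds of the two variables to the 2D plotting function, and to the argument dictionary forwarded to it, can be summarised as a lookup table. This sketch only restates the mapping given in the parameter list; it is not spellbook's code, and `DISPATCH_2D` and `dispatch_2d` are hypothetical names.

```python
# (kind of x, kind of y) -> (plotting function, name of the forwarded argument dict)
DISPATCH_2D = {
    ('categorical', 'categorical'): ('spellbook.plot2D.heatmap', 'heatmap_args'),
    ('categorical', 'continuous'): ('spellbook.plot2D.violinplot', 'violinplot_args'),
    ('continuous', 'categorical'): ('spellbook.plot2D.categorical_histogram', 'cathist_args'),
    ('continuous', 'continuous'): ('spellbook.plot2D.scatterplot', 'scatterplot_args'),
}


def dispatch_2d(x_kind, y_kind):
    """Hypothetical helper: look up which function handles a pair of variable kinds."""
    try:
        return DISPATCH_2D[(x_kind, y_kind)]
    except KeyError:
        raise ValueError(f'unknown variable kinds: {x_kind!r}, {y_kind!r}') from None


print(dispatch_2d('continuous', 'categorical'))
```

In the advanced example below, x='age' is continuous and the target is categorical, which is why its options are passed via cathist_args.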

Examples:

  • simple example

    fig = sb.plot.plot_2D(data=data, x='age', y=target, fontsize=14.0)
    
  • advanced example

    The target variable has two categories, so two histograms are stacked on top of each other. Via the histogram_args key of the cathist_args parameter, a list of two dictionaries is passed on to spellbook.plot2D.categorical_histogram(), one dictionary for each of the two categories. Each of these dictionaries is then passed on to spellbook.plot1D.histogram().

    fig = sb.plot.plot_2D(
              data=data, x='age', y=target, fontsize=11.0,
              cathist_args = {
                  'histogram_args': [
                      dict(
                          show_stats=True,
                          statsbox_args = {'alignment': 'bl'}
                      ),
                      dict(
                          show_stats=True,
                          statsbox_args = {
                              'y': 0.96,
                              'text_args': {
                                  # RGBA white with 50% alpha/opacity
                                  'backgroundcolor': (1.0, 1.0, 1.0, 0.5)
                              }
                          }
                      )
                  ]
              })
    

plot_confusion_matrix

spellbook.plot.plot_confusion_matrix(confusion_matrix, class_names, class_ids=None, normalisation='count', crop=True, figsize=(5.8, 5.3), fontsize=None, fontsize_annotations=None)

Create a confusion matrix heatmap plot

[Figure: example confusion matrix showing absolute frequencies]

Both the absolute and the relative frequencies can be shown, the latter normalised by the true labels, by the predicted labels or across all their combinations. The desired behaviour is specified with the parameter normalisation.

Parameters
  • confusion_matrix (tf.Tensor) – The confusion matrix

  • class_names (typing.List[str]) – List of the class names

  • class_ids (typing.Optional[typing.List[int]]) – Optional, list of IDs for each target class. These IDs are shown on the x-axis and, together with the class names, on the y-axis.

  • normalisation (str) –

    Optional, indicates if the absolute or relative frequencies should be plotted

    • count: Numbers of datapoints

    • norm-all: Percentages normalised across all combinations of the true and the predicted classes/labels

    • norm-true: Percentages normalised across the true labels

    • norm-pred: Percentages normalised across the predicted classes

  • figsize (typing.Tuple[float, float]) – Optional, size (width, height) of the figure in inches

  • crop (bool) –

    Plots with normalisation set to norm-true/norm-pred do not include the SUM row/column, respectively. When crop is set to

    • True, the excluded SUM row/column is removed from the heatmap matrix, thus making it occupy a larger portion of the plot

    • False, the excluded SUM row/column is kept empty but still included in the heatmap matrix, so as to make each cell appear in the same position as with normalisation set to count or norm-all

  • fontsize (typing.Optional[float]) – Optional. Baseline fontsize for all elements, presumably the size that matplotlib's relative size medium corresponds to

  • fontsize_annotations (typing.Union[str, float, None]) – Optional. Fontsize for the annotations, in any form accepted by matplotlib.text.Text.set_fontsize().

Return type

matplotlib.figure.Figure

Returns

The figure containing the plot
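The four normalisation modes can be illustrated on a small matrix. This sketch assumes the convention of `tf.math.confusion_matrix()`, where rows correspond to the true labels and columns to the predicted labels; it works on plain nested lists, returns percentages and leaves out the SUM row/column. `normalise` is a hypothetical helper, not part of spellbook.

```python
def normalise(matrix, normalisation='count'):
    """Hypothetical helper: normalise a confusion matrix given as nested lists.

    Rows are true labels, columns are predicted labels; results are in percent.
    """
    if normalisation == 'count':
        return [list(row) for row in matrix]
    if normalisation == 'norm-all':
        total = sum(sum(row) for row in matrix)
        return [[100.0 * v / total for v in row] for row in matrix]
    if normalisation == 'norm-true':
        # each row (true label) sums to 100 %
        return [[100.0 * v / sum(row) for v in row] for row in matrix]
    if normalisation == 'norm-pred':
        # each column (predicted label) sums to 100 %
        col_sums = [sum(col) for col in zip(*matrix)]
        return [[100.0 * v / s for v, s in zip(row, col_sums)] for row in matrix]
    raise ValueError(f'unknown normalisation: {normalisation!r}')


cm = [[8, 2],
      [1, 9]]
print(normalise(cm, 'norm-true'))  # [[80.0, 20.0], [10.0, 90.0]]
```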

used in

Serving TensorFlow Models in Docker

plot_grid_1D

spellbook.plot.plot_grid_1D(nrows, ncols, data, target=None, features=None, xlabels=None, fontsize=12.0, figure_args={}, stats=True, stats_align=None, binwidths=None, histogram_args={})

Create a grid of univariate plots

The type, and hence the visual representation, of each variable is determined automatically via spellbook.plotutils.get_data_kind(). Categorical variables are shown as barcharts and continuous variables as univariate/1D histograms. Summary statistics boxes can be shown for the histograms.

[Figure: example grid of univariate plots]

Parameters
  • nrows (int) – Number of rows

  • ncols (int) – Number of columns

  • data (pandas.DataFrame) – The dataset to plot

  • target (typing.Optional[str]) – Optional. The name of the target variable. If specified, the target variable will be plotted first and highlighted by plotting it in orange. Either target or features has to be specified.

  • features (typing.Optional[typing.List[str]]) – Optional. List with the names of the feature variables. If specified, the feature variables will be plotted after the target variable. Either target or features has to be specified.

  • xlabels (typing.Union[str, typing.List[str], None]) – Optional. The titles of the x-axes. If unspecified or set to None, the names of the variables, as specified by target and features, are used.

  • fontsize (float) – Optional. Baseline fontsize for all elements, presumably the size that matplotlib's relative size medium corresponds to

  • figure_args (dict) – Optional. Arguments for the creation of the returned matplotlib.figure.Figure with matplotlib.pyplot.figure()

  • stats (typing.Union[bool, typing.List[bool]]) – Optional. Bool or list of bools that indicate if statistics boxes are shown in each plot

  • stats_align (typing.Union[str, typing.List[str], None]) – Optional. Alignment string or list of alignment strings, one for each plot

  • binwidths (typing.Union[float, typing.List[float], None]) – Optional. Float or list of floats that indicate the binwidth in each plot

  • histogram_args (dict) – Optional. Dictionary of parameters and values that are passed to spellbook.plot1D.histogram()

Return type

matplotlib.figure.Figure

Returns

Figure containing the grid of plots

Example

import pandas as pd
import spellbook as sb
data = pd.read_csv('dataset.csv')
plot_vars = sb.plot.plot_grid_1D(2, 4, data,
    target='z', features=['x', 'y'],
    stats=True, stats_align=['tl', 'br', 'tr'])
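In the example above, three variables are drawn on a 2-by-4 grid, so five cells stay empty, and per-plot options such as `stats` or `stats_align` can be given either as a single value or as a list. The following sketch shows one way such a layout could be computed: the target first, then the features, filled row by row, with scalar options repeated for every plot. Row-major filling and this broadcasting rule are assumptions; `grid_layout` and `per_plot` are hypothetical helpers.

```python
def per_plot(value, n):
    """Hypothetical helper: repeat a scalar option for n plots, or check a list."""
    if isinstance(value, list):
        if len(value) != n:
            raise ValueError(f'expected {n} values, got {len(value)}')
        return value
    return [value] * n


def grid_layout(nrows, ncols, target=None, features=None):
    """Hypothetical helper: assign each variable a (row, col) cell, row by row."""
    if target is None and not features:
        raise ValueError('either target or features has to be specified')
    variables = ([target] if target is not None else []) + list(features or [])
    if len(variables) > nrows * ncols:
        raise ValueError('grid is too small for all variables')
    return {var: divmod(i, ncols) for i, var in enumerate(variables)}


layout = grid_layout(2, 4, target='z', features=['x', 'y'])
print(layout)  # {'z': (0, 0), 'x': (0, 1), 'y': (0, 2)}
print(per_plot(True, len(layout)))  # [True, True, True]
```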

plot_grid_2D

spellbook.plot.plot_grid_2D(nrows, ncols, data, xs, ys, relative=False, fontsize=12.0, figure_args={}, heatmap_args={}, violinplot_args={}, cathist_args={}, scatterplot_args={})

Create a grid of bivariate/correlation plots

Parameters
  • nrows (int) – Number of rows

  • ncols (int) – Number of columns

  • data (pandas.DataFrame) – The dataset to plot

  • xs (typing.List[str]) – Names of the variables to plot on the x-axis

  • ys (typing.List[str]) – Names of the variables to plot on the y-axis

  • relative (bool) –

    Optional. Whether the heatmaps drawn with spellbook.plot2D.heatmap() are normalised

    • True: heatmap will be column-normalised (normalisation = norm-col)

    • False: heatmap will show absolute numbers (normalisation = count)

  • fontsize (float) – Optional. Baseline fontsize for all elements, presumably the size that matplotlib's relative size medium corresponds to

  • figure_args (dict) – Optional. Arguments for the creation of the matplotlib.figure.Figure with matplotlib.pyplot.figure()

  • heatmap_args (dict) – Optional. Arguments passed on to spellbook.plot2D.heatmap() for correlations between a categorical variable on the x-axis and a categorical variable on the y-axis

  • violinplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.violinplot() for correlations between a categorical variable on the x-axis and a continuous variable on the y-axis

  • cathist_args (dict) – Optional. Arguments passed on to spellbook.plot2D.categorical_histogram() for correlations between a continuous variable on the x-axis and a categorical variable on the y-axis

  • scatterplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.scatterplot() for correlations between a continuous variable on the x-axis and a continuous variable on the y-axis

Return type

matplotlib.figure.Figure

Returns

The figure containing the grid of plots
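With relative=True, the heatmaps are column-normalised (norm-col). Assuming this means that each column of counts, i.e. each category of the x-axis variable, is divided by its column total so that every column sums to 100 %, the operation looks like this sketch. `column_normalise` is a hypothetical helper working on nested lists indexed as `counts[y_category][x_category]`; mapping empty columns to 0 is a design choice of the sketch to avoid division by zero.

```python
def column_normalise(counts):
    """Hypothetical helper: convert counts[row][col] into column percentages."""
    col_sums = [sum(col) for col in zip(*counts)]
    # empty columns are mapped to 0 % instead of raising ZeroDivisionError
    return [[100.0 * v / s if s else 0.0 for v, s in zip(row, col_sums)]
            for row in counts]


counts = [[3, 1],
          [1, 4]]
print(column_normalise(counts))  # [[75.0, 20.0], [25.0, 80.0]]
```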

save

spellbook.plot.save(fig, filename, dpi=200)

Save a plot to a file

Parameters
  • fig (matplotlib.figure.Figure) – The figure to save

  • filename (str) – The filename under which to save the plot

  • dpi (int) – Optional. Resolution in dots per inch
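For raster formats, `dpi` sets how many pixels are written per inch of figure size, so the image dimensions follow roughly from the figure size in inches. This small sketch assumes the figure is saved without tight cropping of its bounding box, which this page does not specify for `save`; `pixel_size` is a hypothetical helper.

```python
def pixel_size(figsize_inches, dpi=200):
    """Hypothetical helper: approximate pixel dimensions of a raster image saved at `dpi`."""
    width, height = figsize_inches
    return round(width * dpi), round(height * dpi)


# default figsize of plot_confusion_matrix, saved with the default dpi of save
print(pixel_size((5.8, 5.3)))  # (1160, 1060)
```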