spellbook.plot#

High-level functions for creating and saving plots

Functions:

`pairplot`(data, xs[, ys, fontsize, histplot_args])	Create a pairplot
`parallel_coordinates`(data, features, target, ...)	Parallel coordinates plot
`plot_1D`(data, x[, xlabel, fontsize, ...])	Create a single univariate plot
`plot_2D`(data, x, y[, relative, fontsize, ...])	Create a single bivariate/correlation plot
`plot_confusion_matrix`(confusion_matrix, ...)	Create a confusion matrix heatmap plot
`plot_grid_1D`(nrows, ncols, data[, target, ...])	Create a grid of univariate plots
`plot_grid_2D`(nrows, ncols, data, xs, ys[, ...])	Create a grid of bivariate/correlation plots
`save`(fig, filename[, dpi])	Save a plot to a file

Functions#

pairplot#

spellbook.plot.pairplot(data, xs, ys=None, fontsize=12.0, histplot_args={})[source]#

Create a pairplot

The plot does not need to contain the same variables or the same number of variables on x and y. It can be rectangular, with any number of rows and any number of columns. Subplots with the same variable on x and y are detected automatically, wherever they are located in the pairplot, and show the appropriate 1D/univariate distribution instead of a 2D/bivariate/correlation plot. This behaviour makes it possible to split a full and possibly large pairplot for all variables into separate smaller pieces of arbitrary size.

The visual representation of the distributions and correlations is chosen automatically depending on the type of random variables (categorical, ordinal, continuous).

Parameters

data (pandas.DataFrame) – The dataset to plot
xs (typing.List[str]) – Names of the variables to plot on the x-axes
ys (typing.Optional[typing.List[str]]) – Optional. Names of the variables to plot on the y-axes. If not specified, the same variables will be shown on the x-axes and the y-axes.
fontsize (float) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
histplot_args (dict) – Optional. Arguments for seaborn.histplot(), which is used to draw the histograms

Return type

matplotlib.figure.Figure

Returns

The figure containing the grid of plots
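
The automatic detection of diagonal-like cells described above can be illustrated without any plotting. The following is a minimal sketch, not spellbook's implementation: the helper name pairplot_layout is hypothetical, and it assumes that rows correspond to ys and columns to xs.

```python
from typing import List, Optional


def pairplot_layout(xs: List[str], ys: Optional[List[str]] = None) -> List[List[str]]:
    """Return, for every grid cell, whether it would hold a 1D or a 2D plot.

    Hypothetical helper for illustration only; assumes one row per entry
    in ys and one column per entry in xs.
    """
    if ys is None:
        ys = xs  # documented default: same variables on both axes
    # A cell is univariate exactly when its x and y variable coincide,
    # regardless of where in the grid that happens.
    return [["1D" if x == y else "2D" for x in xs] for y in ys]


# A rectangular 2x3 piece of a larger pairplot
layout = pairplot_layout(xs=["age", "height", "weight"], ys=["weight", "age"])
for row in layout:
    print(row)
```

In this 2x3 piece, the univariate cells do not lie on a diagonal: they appear wherever the x and y variable names match.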

parallel_coordinates#

spellbook.plot.parallel_coordinates(data, features, target, categories, fontsize=None, shift=0.3)[source]#

Parallel coordinates plot

Based on Parallel Coordinates in Matplotlib, but extended to also support categorical variables.

For categorical variables, a random uniform shift is applied to spread the lines in the vicinity of the respective classes. This gives an indication of how the datapoints in a particular class/category are composed in terms of the target labels/classes. Furthermore, the shift interval is sized according to the number of datapoints in the respective class/category, which gives an impression of how many datapoints there are in that class.

Parameters

data (pandas.DataFrame) – The dataset to plot
features (typing.List[str]) – The names of the feature variables
target (str) – The name of the target variable
categories (typing.Dict[str, typing.Dict[int, str]]) – Dictionary holding the category codes/indices and names as returned by spellbook.input.encode_categories()
fontsize (typing.Optional[float]) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
shift (float) – Optional. The half-size of the interval for uniformly shifting categorical variables

Todo

Support more than the 10 colours included in Matplotlib’s tableau colours

Return type

matplotlib.figure.Figure
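
The random uniform shift for categorical variables can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper name jitter_codes is hypothetical, and the scaling rule (the most populated category receives the full half-width shift, smaller categories proportionally less) is an assumption, since the exact sizing is not documented here.

```python
import random
from collections import Counter
from typing import List


def jitter_codes(codes: List[int], shift: float = 0.3, seed: int = 0) -> List[float]:
    """Spread integer category codes with a uniform random shift.

    Hypothetical helper; the proportional scaling below is an assumption.
    """
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    counts = Counter(codes)
    largest = max(counts.values())
    jittered = []
    for code in codes:
        # Half-width of the shift interval grows with the category size
        half_width = shift * counts[code] / largest
        jittered.append(code + rng.uniform(-half_width, half_width))
    return jittered


codes = [0, 0, 0, 0, 1, 1, 2]
positions = jitter_codes(codes)
for code, position in zip(codes, positions):
    print(code, round(position, 3))
```

With this rule, lines passing through a sparsely populated category stay close together, while lines through a well-populated category fan out over a wider band.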

plot_1D#

spellbook.plot.plot_1D(data, x, xlabel=None, fontsize=12.0, figure_args={}, barchart_args={}, histogram_args={}, histplot_args={}, statsbox_args={})[source]#

Create a single univariate plot

The type of the variable (categorical or continuous) is determined automatically and either spellbook.plot1D.barchart() or spellbook.plot1D.histogram() is called.

Parameters

data (pandas.DataFrame) – The dataset to plot
x (str) – Name of the variable to plot
xlabel (typing.Optional[str]) – Optional. Title of the x-axis. If unspecified or set to None, the name of the variable, as specified by x, will be used.
fontsize (float) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
figure_args (dict) – Optional. Arguments for the creation of the matplotlib.figure.Figure with matplotlib.pyplot.figure()
barchart_args (dict) – Optional. Arguments passed on to spellbook.plot1D.barchart() for categorical data
histogram_args (dict) – Optional. Arguments passed on to spellbook.plot1D.histogram() for continuous data
histplot_args (dict) – Optional. Arguments for seaborn.histplot(), which is used to draw the plot
statsbox_args (dict) – Optional. Arguments passed on by spellbook.plot1D.histogram to spellbook.plotutils.statsbox()

Return type

matplotlib.figure.Figure

Returns

The figure containing the plot

plot_2D#

spellbook.plot.plot_2D(data, x, y, relative=False, fontsize=12.0, figure_args={}, heatmap_args={}, violinplot_args={}, cathist_args={}, scatterplot_args={})[source]#

Create a single bivariate/correlation plot

The types of the variables (categorical or continuous) are determined automatically and the corresponding 2D plotting function is called:

x is categorical and y is categorical: spellbook.plot2D.heatmap()
x is categorical and y is continuous: spellbook.plot2D.violinplot()
x is continuous and y is categorical: spellbook.plot2D.categorical_histogram()
x is continuous and y is continuous: spellbook.plot2D.scatterplot()
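
The four cases above amount to a lookup on the pair of variable kinds. A minimal sketch of that dispatch, assuming the kinds are represented by the strings "categorical" and "continuous" (the table name, the helper select_plot_2d and the string encoding are hypothetical; only the function names on the right-hand side come from this page):

```python
from typing import Dict, Tuple

# (kind of x, kind of y) -> name of the 2D plotting function listed above
PLOT_2D_DISPATCH: Dict[Tuple[str, str], str] = {
    ("categorical", "categorical"): "spellbook.plot2D.heatmap",
    ("categorical", "continuous"): "spellbook.plot2D.violinplot",
    ("continuous", "categorical"): "spellbook.plot2D.categorical_histogram",
    ("continuous", "continuous"): "spellbook.plot2D.scatterplot",
}


def select_plot_2d(x_kind: str, y_kind: str) -> str:
    """Return the name of the plotting function for a pair of variable kinds."""
    try:
        return PLOT_2D_DISPATCH[(x_kind, y_kind)]
    except KeyError:
        raise ValueError(f"unsupported kinds: x={x_kind!r}, y={y_kind!r}") from None


print(select_plot_2d("continuous", "categorical"))
```

Note that the mapping is not symmetric: swapping x and y for a mixed pair selects a different plot type.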

Parameters

data (pandas.DataFrame) – The dataset to plot
x (str) – Name of the variable to plot on the x-axis
y (str) – Name of the variable to plot on the y-axis
relative (bool) –
Optional. Whether the heatmaps drawn with spellbook.plot2D.heatmap() should be normalised
- True: heatmap will be column-normalised (normalisation = norm-col)
- False: heatmap will show absolute numbers (normalisation = count)
fontsize (float) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
figure_args (dict) – Optional. Arguments for the creation of the matplotlib.figure.Figure with matplotlib.pyplot.figure()
heatmap_args (dict) – Optional. Arguments passed on to spellbook.plot2D.heatmap() for correlations between a categorical variable on the x-axis and a categorical variable on the y-axis
violinplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.violinplot() for correlations between a categorical variable on the x-axis and a continuous variable on the y-axis
cathist_args (dict) – Optional. Arguments passed on to spellbook.plot2D.categorical_histogram() for correlations between a continuous variable on the x-axis and a categorical variable on the y-axis
scatterplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.scatterplot() for correlations between a continuous variable on the x-axis and a continuous variable on the y-axis

Return type

matplotlib.figure.Figure

Returns

The figure containing the plot

Examples:

simple example

fig = sb.plot.plot_2D(data=data, x='age', y=target, fontsize=14.0)

advanced example

The target variable has two categories, so two histograms will be stacked on top of each other. Via the histogram_args parameter, a list of two dictionaries is passed on to spellbook.plot2D.categorical_histogram(), one dictionary for each of the two categories. Each of the dictionaries is then passed on to spellbook.plot1D.histogram().

fig = sb.plot.plot_2D(
          data=data, x='age', y=target, fontsize=11.0,
          cathist_args = {
              'histogram_args': [
                  dict(
                      show_stats=True,
                      statsbox_args = {'alignment': 'bl'}
                  ),
                  dict(
                      show_stats=True,
                      statsbox_args = {
                          'y': 0.96,
                          'text_args': {
                              # RGBA white with 50% alpha/opacity
                              'backgroundcolor': (1.0, 1.0, 1.0, 0.5)
                          }
                      }
                  )
              ]
          })

plot_confusion_matrix#

spellbook.plot.plot_confusion_matrix(confusion_matrix, class_names, class_ids=None, normalisation='count', crop=True, figsize=(5.8, 5.3), fontsize=None, fontsize_annotations=None)[source]#

Create a confusion matrix heatmap plot

../_images/confusion-matrix-absolute.png

Both the absolute frequencies and the relative frequencies, normalised either by the true labels, the predicted labels or their combinations, can be shown. The desired behaviour is specified with the parameter normalisation.

Parameters

confusion_matrix (tf.Tensor) – The confusion matrix
class_names (typing.List[str]) – List of the class names
class_ids (typing.Optional[typing.List[int]]) – Optional, list of IDs for each target class. These IDs are shown on the x-axis and, together with the class names, on the y-axis.
normalisation (str) –
Optional, indicates if the absolute or relative frequencies should be plotted
- count: Numbers of datapoints
- norm-all: Percentages normalised across all combinations of the true and the predicted classes/labels
- norm-true: Percentages normalised across the true labels
- norm-pred: Percentages normalised across the predicted classes
figsize (typing.Tuple[float, float]) – Optional, size (width, height) of the figure in inches
crop (bool) –
Plots with normalisation set to norm-true/norm-pred do not include the SUM row/column, respectively. When crop is set to
- True, the excluded SUM row/column is removed from the heatmap matrix, thus making it occupy a larger portion of the plot
- False, the excluded SUM row/column is kept empty but still included in the heatmap matrix, so as to make each cell appear in the same position as with normalisation set to count or norm-all
fontsize (typing.Optional[float]) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
fontsize_annotations (typing.Union[str, float, None]) – Optional. Fontsize for the annotations. As specified in matplotlib.text.Text.set_fontsize().

Return type

matplotlib.figure.Figure

Returns

The figure containing the plot
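
The four normalisation modes can be worked through on a small matrix. The following sketch uses plain nested lists instead of a tf.Tensor and is not the library's implementation: the helper normalise_confusion_matrix is hypothetical, and it assumes that rows hold the true classes and columns the predicted classes.

```python
from typing import List


def _percent(value: float, total: float) -> float:
    """Percentage of total, defined as 0.0 for an empty row/column."""
    return 100.0 * value / total if total else 0.0


def normalise_confusion_matrix(
    matrix: List[List[int]], normalisation: str = "count"
) -> List[List[float]]:
    """Hypothetical helper; assumes rows = true classes, columns = predicted."""
    if normalisation == "count":
        return [[float(v) for v in row] for row in matrix]
    if normalisation == "norm-all":
        total = sum(sum(row) for row in matrix)
        return [[_percent(v, total) for v in row] for row in matrix]
    if normalisation == "norm-true":
        # Each row (true class) sums to 100 %
        return [[_percent(v, sum(row)) for v in row] for row in matrix]
    if normalisation == "norm-pred":
        # Each column (predicted class) sums to 100 %
        col_sums = [sum(col) for col in zip(*matrix)]
        return [[_percent(v, s) for v, s in zip(row, col_sums)] for row in matrix]
    raise ValueError(f"unknown normalisation: {normalisation!r}")


matrix = [[8, 2], [1, 9]]
norm_all = normalise_confusion_matrix(matrix, "norm-all")
norm_true = normalise_confusion_matrix(matrix, "norm-true")
norm_pred = normalise_confusion_matrix(matrix, "norm-pred")
print(norm_all)
print(norm_true)
print(norm_pred)
```

Under the stated row/column assumption, norm-true reads as "of all datapoints that truly belong to a class, which share was predicted as each class", and norm-pred as "of all datapoints predicted as a class, which share truly belongs to each class".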

used in#

Serving TensorFlow Models in Docker


plot_grid_1D#

spellbook.plot.plot_grid_1D(nrows, ncols, data, target=None, features=None, xlabels=None, fontsize=12.0, figure_args={}, stats=True, stats_align=None, binwidths=None, histogram_args={})[source]#

Create a grid of univariate plots

The type / visual representation of each variable is determined automatically via spellbook.plotutils.get_data_kind(). Categorical variables are shown as barcharts and continuous variables are shown as univariate / 1D histograms. Summary statistics boxes can be shown for the histograms.

Parameters

nrows (int) – Number of rows
ncols (int) – Number of columns
data (pandas.DataFrame) – The dataset to plot
target (typing.Optional[str]) – Optional. The name of the target variable. If specified, the target variable will be plotted first and highlighted by plotting it in orange. Either target or features has to be specified.
features (typing.Optional[typing.List[str]]) – Optional. List with the names of the feature variables. If specified, the feature variables will be plotted after the target variable. Either target or features has to be specified.
xlabels (typing.Union[str, typing.List[str], None]) – Optional. The titles of the x-axes. If unspecified or set to None, the names of the variables, as specified by target and features, will be used.
fontsize (float) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
figure_args (dict) – Optional. Arguments for the creation of the returned matplotlib.figure.Figure with matplotlib.pyplot.figure()
stats (typing.Union[bool, typing.List[bool]]) – Optional. Bool or list of bools that indicate if statistics boxes are shown in each plot
stats_align (typing.Union[str, typing.List[str], None]) – Optional. List of alignment strings, one for each plot
binwidths (typing.Union[float, typing.List[float], None]) – Optional. Float or list of floats that indicate the binwidth in each plot
histogram_args (dict) – Optional. Dictionary of parameters and values that are passed to spellbook.plot1D.histogram()

Return type

matplotlib.figure.Figure

Returns

Figure containing the grid of plots

Example

import pandas as pd
import spellbook as sb
data = pd.read_csv('dataset.csv')
plot_vars = sb.plot.plot_grid_1D(2, 4, data,
    target='z', features=['x', 'y'],
    stats=True, stats_align=['tl', 'br', 'tr'])
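
Several parameters (stats, stats_align, binwidths) accept either a single value or one value per plot, and the target is plotted before the features. A minimal sketch of how such arguments could be expanded, assuming a single value is simply repeated for every plot; the helper names plot_order and per_plot are hypothetical and do not come from spellbook:

```python
from typing import List, Optional, Sequence, TypeVar, Union

T = TypeVar("T")


def plot_order(target: Optional[str], features: Optional[List[str]]) -> List[str]:
    """Target first (if given), followed by the features."""
    if target is None and features is None:
        raise ValueError("either target or features has to be specified")
    ordered = [] if target is None else [target]
    return ordered + list(features or [])


def per_plot(value: Union[T, Sequence[T]], n_plots: int) -> List[T]:
    """Expand a single value to one entry per plot; validate given lists."""
    if isinstance(value, (list, tuple)):
        if len(value) != n_plots:
            raise ValueError(f"expected {n_plots} values, got {len(value)}")
        return list(value)
    return [value] * n_plots  # strings count as single values here


variables = plot_order(target="z", features=["x", "y"])
stats = per_plot(True, len(variables))
stats_align = per_plot(["tl", "br", "tr"], len(variables))
print(variables, stats, stats_align)
```

With the arguments from the example above, the alignment 'tl' would apply to the target z and 'br' and 'tr' to the features x and y, under the assumption that list entries follow the plotting order.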

plot_grid_2D#

spellbook.plot.plot_grid_2D(nrows, ncols, data, xs, ys, relative=False, fontsize=12.0, figure_args={}, heatmap_args={}, violinplot_args={}, cathist_args={}, scatterplot_args={})[source]#

Create a grid of bivariate/correlation plots

Parameters

nrows (int) – Number of rows
ncols (int) – Number of columns
data (pandas.DataFrame) – The dataset to plot
xs (typing.List[str]) – Names of the variables to plot on the x-axis
ys (typing.List[str]) – Names of the variables to plot on the y-axis
relative (bool) –
Optional. Whether the heatmaps drawn with spellbook.plot2D.heatmap() should be normalised
- True: heatmap will be column-normalised (normalisation = norm-col)
- False: heatmap will show absolute numbers (normalisation = count)
fontsize (float) – Optional. Baseline fontsize for all elements; this is probably the size that the relative fontsize medium corresponds to.
figure_args (dict) – Optional. Arguments for the creation of the matplotlib.figure.Figure with matplotlib.pyplot.figure()
heatmap_args (dict) – Optional. Arguments passed on to spellbook.plot2D.heatmap() for correlations between a categorical variable on the x-axis and a categorical variable on the y-axis
violinplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.violinplot() for correlations between a categorical variable on the x-axis and a continuous variable on the y-axis
cathist_args (dict) – Optional. Arguments passed on to spellbook.plot2D.categorical_histogram() for correlations between a continuous variable on the x-axis and a categorical variable on the y-axis
scatterplot_args (dict) – Optional. Arguments passed on to spellbook.plot2D.scatterplot() for correlations between a continuous variable on the x-axis and a continuous variable on the y-axis

Return type

matplotlib.figure.Figure

Returns

The figure containing the grid of plots

save#

spellbook.plot.save(fig, filename, dpi=200)[source]#

Save a plot to a file

Parameters

fig (matplotlib.figure.Figure) – The figure to save
filename (str) – The filename under which to save the plot
dpi (int) – Optional. Resolution in dots per inch
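
As a rough guide to choosing dpi, the pixel dimensions of a raster image follow from the figure size in inches. The sketch below assumes that dpi carries its usual matplotlib meaning and that the full figure is written without cropping; the helper pixel_size is hypothetical, and the figure size used is the default figsize of plot_confusion_matrix.

```python
from typing import Tuple


def pixel_size(figsize: Tuple[float, float], dpi: int = 200) -> Tuple[int, int]:
    """Approximate pixel dimensions of a figure of the given size in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


size = pixel_size((5.8, 5.3), dpi=200)
print(size)
```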
