Welcome to slearn’s documentation!
slearn is a package linking symbolic representations (SAX, ABBA, and fABBA) with scikit-learn machine learning for time series prediction. Symbolic representations of time series have proved useful in time series motif discovery, clustering, classification, forecasting, anomaly detection, and related tasks. They not only reduce the dimensionality of a time series but also speed up downstream tasks. S. Elsworth and S. Güttel [Time series forecasting using LSTM networks: a symbolic approach, arXiv, 2020] demonstrated that symbolic forecasting greatly reduces the sensitivity of Long Short-Term Memory networks to hyperparameter settings. How to appropriately deploy machine learning algorithms on symbols instead of raw time series remains a practical challenge. To support research on symbolic representations, we developed this Python library to simplify applying machine learning algorithms to symbolic data.
Before getting started, please install the slearn package by following the installation guide below.
Installation guide
slearn has the following dependencies for its clustering functionality:
numpy>=1.21
scipy>1.6.0
pandas
lightgbm
scikit-learn
To install the current release via pip, use:
pip install slearn
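Before installing, it can help to check which of the dependencies listed above are already importable. The following is a minimal sketch using only the standard library. The helper name `missing_dependencies` is hypothetical and not part of slearn, and the check only tests importability, not the version constraints.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above
# (the scikit-learn distribution is imported as "sklearn").
DEPENDENCIES = ["numpy", "scipy", "pandas", "lightgbm", "sklearn"]


def missing_dependencies(modules=DEPENDENCIES):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

An empty list means every listed module was found; pip normally resolves missing ones automatically during installation.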
Note
The documentation is still in progress.
Symbolic sequence prediction with machine learning
Machine learning with symbols
Given a sequence of symbols, how would you predict the symbols that follow using machine learning? An intuitive way is to transform the symbols into numerical labels, choose an appropriate window size (lag) for the input features, and then define a classification problem. slearn builds a pipeline for this process and provides a user-friendly API.
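The idea of turning a symbolic sequence into a classification problem can be sketched in plain Python. This is only an illustration of the windowing step, not slearn's implementation; the function name `encode_windows` and its return values are assumptions made for this example.

```python
def encode_windows(sequence, ws):
    """Turn a symbolic sequence into (features, targets, labels) with lag `ws`."""
    if ws < 1 or ws >= len(sequence):
        raise ValueError("ws must satisfy 1 <= ws < len(sequence)")
    # Map each distinct symbol to an integer label, in sorted order.
    labels = {symbol: i for i, symbol in enumerate(sorted(set(sequence)))}
    encoded = [labels[symbol] for symbol in sequence]
    # Each sample is a window of `ws` labels; its target is the label that follows.
    features = [encoded[i:i + ws] for i in range(len(encoded) - ws)]
    targets = [encoded[i + ws] for i in range(len(encoded) - ws)]
    return features, targets, labels


features, targets, labels = encode_windows("aaaabbbccd", ws=3)
print(labels)        # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(features[:3])  # [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
print(targets[:3])   # [0, 1, 1]
```

A sequence of length n yields n - ws training samples, so a larger window gives the classifier more context but fewer examples.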
First import the package:
from slearn import symbolicML
We can predict any symbolic sequence by choosing the classifiers available in scikit-learn. Currently slearn supports:
| Classifiers | Parameter call |
|---|---|
| Multi-layer Perceptron | 'MLPClassifier' |
| K-Nearest Neighbors | 'KNeighborsClassifier' |
| Gaussian Naive Bayes | 'GaussianNB' |
| Decision Tree | 'DecisionTreeClassifier' |
| Support Vector Classification | 'SVC' |
| Radial-basis Function Kernel | 'RBF' |
| Logistic Regression | 'LogisticRegression' |
| Quadratic Discriminant Analysis | 'QuadraticDiscriminantAnalysis' |
| AdaBoost classifier | 'AdaBoostClassifier' |
| Random Forest | 'RandomForestClassifier' |
| LightGBM | 'LGBM' |
Now we predict a simple synthetic symbolic sequence
string = 'aaaabbbccd'
First, we define the classifier by specifying ws (window size or lag) and classifier_name, following the table above, and initialize with:
sbml = symbolicML(classifier_name="MLPClassifier", ws=3, random_seed=0, verbose=0)
Then we use the method encode to split the sequence into features x and targets y for training the model:
x, y = sbml.encode(string)
Then we use the method forecast to predict the following symbols:
pred = sbml.forecast(x, y, step=5, hidden_layer_sizes=(10,10), learning_rate_init=0.1)
The parameters x, y, and step are fixed; the remaining parameters depend on the classifier you specify, and their settings follow the scikit-learn library. For a neural network, you can define the parameters hidden_layer_sizes and learning_rate_init, while for a support vector machine you might define C.
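To illustrate what forecasting `step` symbols ahead can look like, here is a sketch of recursive forecasting: predict one symbol from the last `ws` symbols, append it, and slide the window forward. It is not slearn's implementation. A simple lookup table of "which symbol most often followed this window" stands in for the scikit-learn classifier, and the names `fit_lookup` and `forecast_symbols` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_lookup(sequence, ws):
    """Count, for each window of length `ws`, the symbols that followed it."""
    table = defaultdict(Counter)
    for i in range(len(sequence) - ws):
        table[sequence[i:i + ws]][sequence[i + ws]] += 1
    return table


def forecast_symbols(sequence, ws, step):
    """Predict `step` symbols, feeding each prediction back into the window."""
    table = fit_lookup(sequence, ws)
    # Unseen windows fall back to the most frequent symbol overall.
    fallback = Counter(sequence).most_common(1)[0][0]
    history = sequence
    predicted = []
    for _ in range(step):
        window = history[-ws:]
        if window in table:
            symbol = table[window].most_common(1)[0][0]
        else:
            symbol = fallback
        predicted.append(symbol)
        history += symbol  # the prediction becomes part of the next window
    return "".join(predicted)


print(forecast_symbols("abcabcabc", ws=2, step=4))  # abca
```

Because each prediction is fed back as input, an early mistake propagates to later steps, which is one reason the choice of classifier and window size matters.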
Generating symbols
The slearn library also contains functions for generating strings of tunable complexity, using the LZW compression method as a basis to approximate Kolmogorov complexity.
from slearn import *
df_strings = LZWStringLibrary(symbols=3, complexity=[3, 9])
df_strings
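As a rough illustration of how LZW compression can serve as a complexity proxy, the sketch below counts the number of codes a basic LZW encoder emits for a string: repetitive strings compress into fewer codes than irregular ones of the same length. The function name `lzw_complexity` is hypothetical, and the exact measure used inside slearn may differ.

```python
def lzw_complexity(string):
    """Return the number of codes emitted by a basic LZW encoder."""
    # Start with one dictionary entry per distinct symbol.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(string)))}
    phrase = ""
    codes = []
    for ch in string:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the current phrase while it is known.
            phrase = candidate
        else:
            # Emit the known phrase and register the new, longer one.
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)
            phrase = ch
    if phrase:
        codes.append(dictionary[phrase])
    return len(codes)


print(lzw_complexity("aaaaaaaa"))  # 4
print(lzw_complexity("abababab"))  # 5
print(lzw_complexity("abcdefgh"))  # 8
```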

You can also run an RNN test on the symbols you generate:
import pandas as pd
df_iters = pd.DataFrame()
for i, string in enumerate(df_strings['string']):
    # all columns except the last are passed on as keyword arguments
    kwargs = df_strings.iloc[i, :-1].to_dict()
    seed_string = df_strings.iloc[i, -1]
    df_iter = RNN_Iteration(seed_string, iterations=2, architecture='LSTM', **kwargs)
    df_iter.loc[:, list(kwargs.keys())] = list(kwargs.values())
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df_iters = pd.concat([df_iters, df_iter])
df_iter.reset_index(drop=True, inplace=True)
df_iters.reset_index(drop=True, inplace=True)
print(df_iters)

Time series forecasting with symbolic representation
The slearn package contains fast symbolic representation methods, namely SAX and fABBA (more methods will be included).
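For readers new to symbolic representations, the following sketch shows the basic SAX idea using only the standard library: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. This is a textbook-style illustration, not the implementation shipped with slearn; the function name `sax` and its parameters are assumptions for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word of length `n_segments`."""
    if not 1 <= n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    mean, std = fmean(series), pstdev(series)
    # z-normalize; a constant series is mapped to all zeros.
    z = [(v - mean) / std if std > 0 else 0.0 for v in series]
    # Piecewise aggregate approximation: the mean of each segment.
    n = len(z)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [fmean(z[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
    # Breakpoints that cut N(0, 1) into equiprobable regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    # Each segment mean becomes the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


print(sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet_size=2))  # ab
print(sax(list(range(8)), n_segments=4, alphabet_size=4))                # abcd
```

Eight numeric values are reduced to a word of two or four letters, which is the dimensionality reduction that the downstream classifier benefits from.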
Summary
You can select any available classifier and symbolic representation method (currently SAX, ABBA and fABBA are supported) for prediction. As before, the parameters of the chosen classifier are the same as in the scikit-learn library. We usually deploy the ABBA symbolic representation, since it achieves better forecasting than SAX.
slearn provides a user-friendly API; time series forecasting proceeds as follows:
Step 1: Define the window size (feature size), the number of forecasting steps, the symbolic representation method (SAX or fABBA), and the classifier.
Step 2: Transform time series into symbols with user specified parameters defined for symbolic representation.
Step 3: Define the classifier parameters and forecast the future values.
Now we illustrate how to use slearn with symbolic representation to forecast time series step by step.
First of all, we load the libraries and data, and set the number of symbols we would like to predict.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from slearn import *
time_series = pd.read_csv("Amazon.csv") # load the required dataset, here we use Amazon stock daily close price.
ts = time_series.Close.values
step = 50
We start off by initializing slearn with fABBA (alternative options: SAX and ABBA) and the GaussianNB classifier, setting the window size to 3 and step to 50:
sl = slearn(method='fABBA', ws=3, step=step, classifier_name="GaussianNB") # step 1
Next we transform the time series into symbols with the method set_symbols:
sl.set_symbols(series=ts, tol=0.01, alpha=0.2) # step 2
Then we predict the time series with the method predict:
abba_nb_pred = sl.predict(var_smoothing=0.001) # step 3
Together, we combine the code with four classifiers:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from slearn import *
np.random.seed(0)
time_series = pd.read_csv("Amazon.csv")
ts = time_series.Close.values
length = len(ts)
train, test = ts[:round(0.9*length)], ts[round(0.9*length):]
sl = slearn(method='fABBA', ws=8, step=1000, classifier_name="GaussianNB")
sl.set_symbols(series=train, tol=0.01, alpha=0.1)
abba_nb_pred = sl.predict(var_smoothing=0.001)
sl = slearn(method='fABBA', ws=8, step=1000, classifier_name="DecisionTreeClassifier")
sl.set_symbols(series=train, tol=0.01, alpha=0.1)
abba_nn_pred = sl.predict(max_depth=10, random_state=0)
sl = slearn(method='fABBA', ws=8, step=1000, classifier_name="KNeighborsClassifier")
sl.set_symbols(series=train, tol=0.01, alpha=0.1)
abba_kn_pred = sl.predict(n_neighbors=10)
sl = slearn(method='fABBA', ws=8, step=100, classifier_name="SVC")
sl.set_symbols(series=train, tol=0.01, alpha=0.1)
abba_svc_pred = sl.predict(C=20)
min_len = np.min([len(test), len(abba_nb_pred), len(abba_nn_pred), len(abba_kn_pred), len(abba_svc_pred)]) # common length of all plotted series
plt.figure(figsize=(20, 5))
sns.set(font_scale=1.5, style="whitegrid")
sns.lineplot(data=test[:min_len], linewidth=6, color='k', label='ground truth')
sns.lineplot(data=abba_nb_pred[:min_len], linewidth=6, color='tomato', label='prediction (ABBA - GaussianNB)')
sns.lineplot(data=abba_nn_pred[:min_len], linewidth=6, color='m', label='prediction (ABBA - DecisionTreeClassifier)')
sns.lineplot(data=abba_kn_pred[:min_len], linewidth=6, color='c', label='prediction (ABBA - KNeighborsClassifier)')
sns.lineplot(data=abba_svc_pred[:min_len], linewidth=6, color='yellowgreen', label='prediction (ABBA - Support Vector Classification)')
plt.legend()
plt.tick_params(axis='both', labelsize=15)
plt.savefig('demo1.png', bbox_inches = 'tight')
plt.show()
The result is as plotted below:

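To complement the visual comparison with a number, one option is to compute the mean absolute error between the ground truth and a prediction over their common length, mirroring the truncation to min_len used for plotting. The sketch below is an illustration only: the helper name `mae_on_common_length` is hypothetical, and the values in the usage lines are made up, not results from the Amazon data.

```python
def mae_on_common_length(truth, prediction):
    """Mean absolute error over the first min(len(truth), len(prediction)) points."""
    n = min(len(truth), len(prediction))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    # zip stops at the shorter sequence, so only the common prefix is compared.
    return sum(abs(t - p) for t, p in zip(truth, prediction)) / n


# Made-up numbers for illustration only.
truth = [1.0, 2.0, 3.0]
prediction = [2.0, 2.0, 5.0, 9.0]
print(mae_on_common_length(truth, prediction))  # 1.0
```

In the example above, the same function could be applied to test and each of the prediction arrays, for instance test and abba_nb_pred, to rank the classifiers.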
API Reference
A package linking symbolic representation with scikit-learn for time series prediction.
Classifier for symbolic sequences.
Modified from https://github.com/nla-group/TARZAN
License
Copyright (c) 2022 Numerical Linear Algebra Group, Department of Mathematics, The University of Manchester
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.