Library reference¶
Base classes¶
This module defines the base classes: Symbol, Alphabet and Sequence, their attributes and methods.
A symbolic sequence is a list of symbols taken from a finite alphabet of length \(k\).
Internally, symbols are encoded as integers from \(0\) to \(k-1\), called ivals (for Integer VALueS), and all computations use this representation. For human readability, each symbol also has a string value (the svals), which is associated with the integer representation when needed.
This module also provides two specific alphabets:
>>> boolean_alphabet
Alphabet(Symbol(0 | False), Symbol(1 | True))
>>> binary_alphabet
Alphabet(Symbol(0 | 0), Symbol(1 | 1))
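The ival/sval encoding described above can be illustrated without scyseq. The following is a minimal pure-Python sketch; the helper name `encode` is hypothetical, and assigning ivals in order of first appearance is an assumption made for the example.

```python
def encode(svals):
    """Map each distinct string value to an integer in 0..k-1."""
    sval_to_ival = {}
    ivals = []
    for s in svals:
        # Assign the next free integer the first time a symbol is seen.
        if s not in sval_to_ival:
            sval_to_ival[s] = len(sval_to_ival)
        ivals.append(sval_to_ival[s])
    return ivals, sval_to_ival


ivals, mapping = encode(["rest", "walk", "rest", "run", "walk"])
print(ivals)    # [0, 1, 0, 2, 1]
print(mapping)  # {'rest': 0, 'walk': 1, 'run': 2}
```

All computations can then run on the integer list, while the mapping is only consulted for display.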
- class scyseq.sequence.Symbol(value)¶
A symbol (or state) is used to define the state of the system at time \(t\).
- __init__(value)¶
Initialize an instance of Symbol.
- Parameters:
value (int or str) – Value associated with the sval property. Must be an integer or a string. It is automatically converted to a string.
Examples:
>>> Symbol(1)  # 1 is converted to a string
Symbol(- | 1)
>>> Symbol('One')
Symbol(- | One)
It has two properties: a string value named sval and an integer value named ival, which is only assigned once the symbol is inserted in an alphabet.
>>> my_symbol = Symbol('me')
>>> my_symbol.sval
'me'
>>> my_symbol.ival  # returns None
- property sval¶
The “string value” of the Symbol (i.e. its “name”), which can be accessed or changed (set) but not deleted (the deleter raises an exception so that the behavior is explicit).
If the symbol is inserted in an alphabet, the new sval must not already exist in that alphabet.
- property ival¶
The “integer value” of the Symbol, which can be accessed but neither changed nor deleted.
The setter and deleter raise an exception so that the behavior is explicit.
- __eq__(other)¶
Returns True if self.ival == other.ival and self.sval == other.sval
- class scyseq.sequence.Alphabet(symbols)¶
The set of symbols that can be visited in a Sequence.
An Alphabet behaves like a bidirectional dictionary between ivals and svals, with restrictions that keep the mapping consistent (no item assignment, no duplicate svals).
- __init__(symbols)¶
Initialize an instance of the Alphabet class
- Parameters:
symbols (int or list or tuple) – The objects used to build the alphabet
Alphabets can be created using:
the length (an integer):
>>> Alphabet(3)
Alphabet(Symbol(0 | 0), Symbol(1 | 1), Symbol(2 | 2))
a list or tuple of strings:
>>> Alphabet(['a', 'b', 'c'])
Alphabet(Symbol(0 | a), Symbol(1 | b), Symbol(2 | c))
a list or tuple of symbols:
>>> Alphabet([Symbol('s0'), Symbol('s1'), Symbol('s2')])
Alphabet(Symbol(0 | s0), Symbol(1 | s1), Symbol(2 | s2))
Alphabets can be created with a list of states:
>>> state1 = Symbol('One')
>>> state2 = Symbol('Two')
>>> state3 = Symbol('Three')
>>> alpha = Alphabet([state1, state2, state3])
>>> alpha
Alphabet(Symbol(0 | One), Symbol(1 | Two), Symbol(2 | Three))
>>> print(alpha)
((0 | One), (1 | Two), (2 | Three))
>>> len(alpha)
3
>>> alpha[0]
Symbol(0 | One)
Symbols cannot be replaced directly:
>>> alpha[1] = Symbol('Deux')
Traceback (most recent call last):
...
scyseq.exceptions.AlphabetAccessError: 'Alphabet' object does not support item assignment
But their sval can be changed:
>>> alpha[1].sval = 'Deux'
>>> alpha
Alphabet(Symbol(0 | One), Symbol(1 | Deux), Symbol(2 | Three))
Several symbols can be renamed at once by passing a dictionary of the form {ival: new sval} to the rename method.
>>> alpha.rename({0 : 'Uno', 2 : 'Tre'})
>>> alpha
Alphabet(Symbol(0 | Uno), Symbol(1 | Deux), Symbol(2 | Tre))
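The "bidirectional dictionary with restrictions" idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class `TinyAlphabet` and its methods are hypothetical and are not part of scyseq; the sketch merely mimics the two restrictions shown above (unique svals, renaming through a dictionary keyed by ival).

```python
class TinyAlphabet:
    """Hypothetical two-way ival <-> sval mapping with unique svals."""

    def __init__(self, svals):
        if len(set(svals)) != len(svals):
            raise ValueError("svals must be unique")
        self._svals = list(svals)

    def sval(self, ival):
        # ival -> sval lookup
        return self._svals[ival]

    def ival(self, sval):
        # sval -> ival lookup (unambiguous because svals are unique)
        return self._svals.index(sval)

    def rename(self, replacement):
        # Build a candidate first so a rejected rename leaves the mapping untouched.
        candidate = list(self._svals)
        for ival, new_sval in replacement.items():
            candidate[ival] = new_sval
        if len(set(candidate)) != len(candidate):
            raise ValueError("renaming would create duplicate svals")
        self._svals = candidate


alpha = TinyAlphabet(["One", "Two", "Three"])
alpha.rename({0: "Uno", 2: "Tre"})
print(alpha.sval(0), alpha.ival("Tre"))  # Uno 2
```

Validating the whole replacement before applying it keeps the mapping consistent even when a rename is refused.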
- __eq__(other)¶
Two alphabets are equal if they have the same length and if their ivals and svals coincide.
>>> alpha_a = Alphabet(['a','b','c'])
>>> alpha_b = Alphabet(['a','b','c'])
>>> alpha_c = Alphabet(3)
>>> alpha_a == alpha_b
True
>>> alpha_a == alpha_c
False
>>> alpha_c.rename({0 : 'a', 1 : 'b', 2 : 'c'})
>>> alpha_a == alpha_c
True
- property svals¶
The tuple of string values in the alphabet
>>> alpha_a = Alphabet(['a','b','c'])
>>> alpha_a.svals
('a', 'b', 'c')
- property ivals¶
The tuple of integer values in the alphabet
>>> alpha_a = Alphabet(['a','b','c'])
>>> alpha_a.ivals
(0, 1, 2)
- items()¶
Returns the (ival, sval) pair of each symbol.
>>> alpha_a = Alphabet(['a','b','c'])
>>> list(alpha_a.items())
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> dict(alpha_a.items())
{0: 'a', 1: 'b', 2: 'c'}
- scyseq.sequence.boolean_alphabet instance of Alphabet¶
A predefined Alphabet with the two symbols (0 | False) and (1 | True) representing boolean values.
- scyseq.sequence.binary_alphabet instance of Alphabet¶
A predefined Alphabet with the two symbols (0 | 0) and (1 | 1) representing binary values.
- class scyseq.sequence.Sequence(symbols, alphabet, check=True)¶
Defines a symbolic sequence coded using integers in \(\{0, \dots, k-1\}\), together with its methods.
- __init__(symbols, alphabet, check=True)¶
Initializes a Sequence object.
- Parameters:
symbols (an object that can be coerced into an np.array of integers.) – the sequence of symbols.
alphabet – the alphabet, given either as an Alphabet or as the alphabet length
check (boolean) – should the validity of the construction be checked
or
- Parameters:
s – a sequence object
- Exc:
TypeError: when parameter a is not given and s is not a sequence
- Raises:
ValueError: when a is neither a dict nor an int
ValueError: when s contains negative values
AlphabetError: when s contains values greater than or equal to k
DictionaryError: if keys of d are not in \(\{0, \dots, k-1\}\)
- Returns:
a Sequence object with attributes s, k and d
>>> seqA = Sequence([1, 0, 0, 2, 0, 0, 0, 2, 2, 0], 3)
>>> seqA
Sequence: [1 0 0 2 0 0 0 2 2 0]
Alphabet(Symbol(0 | 0), Symbol(1 | 1), Symbol(2 | 2))
N = 10 ; k = 3
- property alphabet¶
The alphabet property
- property k¶
The length of the alphabet
- property ivals¶
The tuple of integer values
- property svals¶
The tuple of string values
- iteritems()¶
Returns the pairs ival : sval
- iterivals()¶
Returns the iterator over the integer values
- itersvals()¶
Returns the iterator over the string values
- __eq__(other)¶
Return self==value.
- rename(replacement)¶
Rename the svals of a sequence (in fact the svals of the alphabet)
See the implementation in the operations module:
rename()
- roll(step)¶
Roll the sequence by step positions (with periodic boundary conditions).
See the implementation in the operations module:
roll()
- reduce()¶
Reduce the sequence, i.e. collapse each run of consecutive repeated symbols into a single symbol.
See the implementation in the operations module:
reduce()
- count(value=None)¶
Count the occurrences of value (or of every symbol if value is None).
See the implementation in the operations module:
count()
- frequency(value=None)¶
Returns the frequency of value
See the implementation in the operations module:
frequency()
Operations¶
Definitions of operations on Sequence and Alphabet objects.
The methods of these objects are wrappers around these operations.
- scyseq.operations.rename(obj, replacement)¶
To rename symbols of an alphabet or a sequence in place, pass a dictionary with integers as keys and strings as values, so that the correspondence between ivals and new svals is explicit.
- Parameters:
replacement (dict) – The dictionary which describes the replacement.
>>> alpha_d = Alphabet(['a','b','c', 'd'])
>>> alpha_d
Alphabet(Symbol(0 | a), Symbol(1 | b), Symbol(2 | c), Symbol(3 | d))
>>> alpha_d.rename({1: 'One', 3: 'Three'})
>>> alpha_d
Alphabet(Symbol(0 | a), Symbol(1 | One), Symbol(2 | c), Symbol(3 | Three))
The replacement argument must be a valid replacement candidate. Exceptions are raised when it is not:
>>> alpha_d.rename(['bad1', 'bad2', 'bad3'])  # not a dictionary
Traceback (most recent call last):
...
TypeError: The input must be a dictionary
>>> alpha_d.rename({'bad1': 'bad2', 'bad3': 'bad4'})  # not {int : str}
Traceback (most recent call last):
...
scyseq.exceptions.AlphabetAccessError: Replacements should be {integer : string, ...}
>>> alpha_d.rename({0: 'bad0', 1: 'bad0'})  # values are not different
Traceback (most recent call last):
...
scyseq.exceptions.AlphabetAccessError: Replacement values should all be different.
>>> alpha_d.rename({0: 'Three'})  # value already exists
Traceback (most recent call last):
...
scyseq.exceptions.AlphabetAccessError: Symbol 'Three' already exists in alphabet
- scyseq.operations.roll(obj, step)¶
Roll the sequence
>>> seq = Sequence([0, 1, 2, 0, 1], 3)
>>> roll(seq, 2).ivals.tolist()
[0, 1, 0, 1, 2]
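Rolling with periodic boundary conditions means that symbols pushed past one end re-enter at the other. A minimal sketch on a plain list follows; `roll_list` is a hypothetical helper written for this illustration, not the library's implementation.

```python
def roll_list(values, step):
    """Shift values to the right by step positions, wrapping around."""
    n = len(values)
    if n == 0:
        return []
    k = step % n  # also handles negative steps and steps larger than n
    return values[n - k:] + values[:n - k]


print(roll_list([0, 1, 2, 0, 1], 2))   # [0, 1, 0, 1, 2]
print(roll_list([0, 1, 2, 0, 1], -1))  # [1, 2, 0, 1, 0]
```

The first call reproduces the ivals of the example above; a negative step shifts to the left.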
- scyseq.operations.reverse(obj)¶
Reverse the sequence
>>> seq = Sequence([0, 1, 2, 0, 1], 3)
>>> reverse(seq).ivals.tolist()
[1, 0, 2, 1, 0]
- scyseq.operations.shuffle(obj)¶
Shuffle the sequence
>>> seq = Sequence([0, 1, 2, 0, 1], 3)
>>> shuffled = shuffle(seq)
>>> sorted(shuffled.ivals.tolist())
[0, 0, 1, 1, 2]
>>> seq.ivals.tolist()
[0, 1, 2, 0, 1]
- scyseq.operations.reduce(obj)¶
Returns a reduced sequence, i.e. one in which each run of consecutive repeated symbols is collapsed into a single symbol.
>>> seq = Sequence([0, 0, 2, 2, 2, 0, 1, 1], 3)
>>> reduce(seq).ivals.tolist()
[0, 2, 0, 1]
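The reduction shown above amounts to keeping one representative per run of identical symbols. A minimal sketch on a plain list, using the standard library's `itertools.groupby` (the helper name `reduce_list` is hypothetical):

```python
from itertools import groupby


def reduce_list(values):
    """Keep a single symbol for each run of consecutive identical symbols."""
    # groupby yields one (key, group) pair per run of equal consecutive items.
    return [key for key, _ in groupby(values)]


print(reduce_list([0, 0, 2, 2, 2, 0, 1, 1]))  # [0, 2, 0, 1]
```

Note that a symbol may still appear several times in the result, as long as its occurrences are not adjacent.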
- scyseq.operations.count(obj, value=None)¶
Counts the occurrences of each symbol in \(\{0, \dots, k-1\}\) if value is None, or the occurrences of the symbol value otherwise.
- Parameters:
obj (Sequence) – The sequence object to count symbols in.
value (int or str, optional) – The specific symbol value to count. If None, counts all symbols.
- Returns:
An array of counts for each symbol, or a single integer count if value is provided.
- Return type:
numpy.ndarray or int
- scyseq.operations.frequency(obj, value=None)¶
Returns the empirical probability (relative frequency) of each symbol in \(\{0, \dots, k-1\}\).
- Parameters:
obj (Sequence) – The sequence object.
value (int or str, optional) – The specific symbol value to find the probability of. If None, computes probabilities for all symbols.
- Returns:
An array of floats representing probabilities, or a single float if value is provided.
- Return type:
numpy.ndarray or float
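Counting and frequency estimation can be sketched on a plain list of ivals as follows. The helpers `count_symbols` and `frequencies` are hypothetical names for this illustration and return Python lists rather than NumPy arrays.

```python
def count_symbols(ivals, k):
    """Number of occurrences of each symbol 0..k-1."""
    counts = [0] * k
    for i in ivals:
        counts[i] += 1
    return counts


def frequencies(ivals, k):
    """Relative frequency of each symbol (counts divided by sequence length)."""
    n = len(ivals)
    return [c / n for c in count_symbols(ivals, k)]


seq = [1, 0, 0, 2, 0, 0, 0, 2, 2, 0]
print(count_symbols(seq, 3))  # [6, 1, 3]
print(frequencies(seq, 3))    # [0.6, 0.1, 0.3]
```

Passing k explicitly ensures that symbols of the alphabet that never occur still get a count of zero.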
- scyseq.operations.transform(seq, correspondance, new_alphabet=None)¶
Transforms the initial sequence according to the correspondence iterable.
- Parameters:
- Returns:
The new mathematically transformed Sequence.
- Return type:
Sequence
Example
>>> seq = Sequence([0, 2, 0, 1], 3)
>>> transform(seq, [1, 0, 0]).ivals.tolist()
[1, 0, 1, 0]
>>> alphabet = Alphabet(['low', 'high'])
>>> transformed = transform(seq, [1, 0, 0], alphabet)
>>> transformed.alphabet.svals
('low', 'high')
- scyseq.operations.recode(lseq, new_alphabet=False, sep='+', names=None)¶
Recodes a list of sequences with (possibly) different alphabets but the same length into a single sequence. Passing Sequences of different lengths is an error. A new dictionary is built for the new sequence.
- Parameters:
lseq (list) – A list of Sequence objects.
new_alphabet (bool, optional) – Whether to generate a new alphabet instead of integers, defaults to False.
sep (str, optional) – Separator to use if new_alphabet is True, defaults to ‘+’.
names (list, optional) – Optional names for the new alphabets.
- Raises:
LengthError – When the Sequences have different lengths.
- Returns:
A newly recoded Sequence object.
- Return type:
Sequence
Example
>>> seq_a = Sequence([0, 0, 1, 1], 2)
>>> seq_b = Sequence([0, 1, 0, 1], 2)
>>> recoded = recode([seq_a, seq_b])
>>> recoded.ivals.tolist()
[0, 1, 2, 3]
>>> recoded.k
4
>>> named = recode([seq_a, seq_b], new_alphabet=True, names=['x', 'y'])
>>> named.alphabet.svals
('x_0+y_0', 'x_0+y_1', 'x_1+y_0', 'x_1+y_1')
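One way to picture the recoding of two sequences into one is a positional (mixed-radix) code: each pair of simultaneous symbols gets its own integer. The sketch below is an assumption about the encoding, chosen because it reproduces the ivals of the example above; `recode_pair` is a hypothetical helper and the library may number the joint symbols differently.

```python
def recode_pair(a, k_a, b, k_b):
    """Combine two equal-length ival lists into one list over k_a * k_b symbols."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # The pair (x, y) is mapped to x * k_b + y, which is unique for
    # 0 <= x < k_a and 0 <= y < k_b.
    return [x * k_b + y for x, y in zip(a, b)], k_a * k_b


codes, k = recode_pair([0, 0, 1, 1], 2, [0, 1, 0, 1], 2)
print(codes, k)  # [0, 1, 2, 3] 4
```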
- scyseq.operations.words(seq, wlen, new_alphabet=False)¶
Returns a sequence encoded according to the overlapping words of length wlen in seq.
>>> seq = Sequence([0, 0, 1, 1, 0], 2)
>>> word_seq = words(seq, 2)
>>> word_seq.ivals.tolist()
[0, 1, 3, 2]
>>> word_seq.k
4
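The word encoding can be sketched as a sliding window whose content is read as a base-k number. This is an assumption that happens to reproduce the example above; `word_codes` is a hypothetical helper, not the library function.

```python
def word_codes(ivals, k, wlen):
    """Encode every overlapping word of length wlen as an integer in 0..k**wlen - 1."""
    codes = []
    for start in range(len(ivals) - wlen + 1):
        code = 0
        for symbol in ivals[start:start + wlen]:
            code = code * k + symbol  # base-k positional encoding
        codes.append(code)
    return codes


print(word_codes([0, 0, 1, 1, 0], 2, 2))  # [0, 1, 3, 2]
```

A sequence of length N yields N - wlen + 1 words, which is why the example result has four symbols.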
Discretisation and partition¶
- scyseq.discretize.symbolize(arr, bins, d=None)¶
Convert an array of continuous values into a symbolic sequence using bins.
- Parameters:
arr (numpy.ndarray) – Array of continuous values.
bins (array_like) – Array of bin edges.
d (dict, optional) – Optional dictionary (param kept for compatibility).
- Returns:
A generated symbolic sequence based on the bins.
- Return type:
Sequence
- scyseq.discretize.partition(arr, method='histogram', nbin=10, d=None)¶
Discretize a continuous series according to method.
Methods are described in Hlavackova-Schindler et al. Physics Reports 441 (2007) 1–46 pages 14–19
- method = ‘histogram’
simple histogram method with equidistant binning
- method = ‘marginal_equiquantization’
marginal equiquantization, i.e. the bins are chosen so that each holds, as nearly as possible, the same number of observations.
- Parameters:
arr (numpy.ndarray) – A continuous series of values.
method (str) – A string in [“histogram”, “marginal_equiquantization”].
nbin (int) – The number of bins ie the length of the alphabet.
d (dict, optional) – A dictionary.
- Raises:
NotImplementedError – If method is not in the list above.
- Returns:
A symbolic Sequence.
- Return type:
Sequence
Tests and examples of how the module works:
>>> x = np.linspace(0,10,11)
>>> x
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
>>> seq = partition(x, method='histogram', nbin=6)
>>> seq
Sequence: [0 0 1 1 2 3 3 4 4 5 5]
Alphabet(Symbol(0 | 0), Symbol(1 | 1), Symbol(2 | 2), Symbol(3 | 3), Symbol(4 | 4), Symbol(5 | 5))
N = 11 ; k = 6
>>> seq = partition(x, method='marginal_equiquantization',nbin=6)
>>> seq
Sequence: [0 0 1 1 2 2 3 3 4 4 5]
Alphabet(Symbol(0 | 0), Symbol(1 | 1), Symbol(2 | 2), Symbol(3 | 3), Symbol(4 | 4), Symbol(5 | 5))
N = 11 ; k = 6
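The difference between the two methods can be sketched in pure Python. Both helpers below are hypothetical and written only for illustration: the first uses equal-width bins between the minimum and maximum, the second assigns bins by rank so that bin populations are as equal as possible. For this input they give the same ivals as the example above, but the library's handling of bin edges and ties may differ.

```python
def histogram_bins(values, nbin):
    """Equal-width binning; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        # Scale to [0, nbin]; the maximum is clipped into the last bin.
        idx = int((v - lo) * nbin / (hi - lo))
        out.append(min(idx, nbin - 1))
    return out


def equiquantile_bins(values, nbin):
    """Rank-based binning: each bin receives roughly len(values) / nbin points."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0] * n
    for rank, i in enumerate(order):
        out[i] = rank * nbin // n
    return out


x = [float(v) for v in range(11)]
print(histogram_bins(x, 6))     # [0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5]
print(equiquantile_bins(x, 6))  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
```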
- scyseq.discretize.subdivision(data, iter_max)¶
Ulam method. Adaptive subdivision technique.
Based on: Set oriented numerical methods for dynamical systems Dellnitz M. and Junge O. Handbook of dynamical systems vol. 2 p. 221-264 Elsevier 2002.
and
Numerical approximation of random attractors Keller H. and Ochs G. in “Stochastic dynamics” Crauel H. and Gundlach M. Eds Springer 1999. p. 93-115
- Parameters:
data (numpy.ndarray) – The input matrix/array to subdivide.
iter_max (int) – Maximum number of box iterations.
- Returns:
A tuple containing (boxes, refs).
- Return type:
tuple
- scyseq.discretize.phase_cluster(data, nb_symb, target_dim=2)¶
Computes the symbolic dynamics of multivariate data. It is based on clustering the “phase space” of the channels of an MEG time signal.
- Parameters:
data (numpy.ndarray) – The input matrix; rows are channels and columns are time points. Must be an array.
nb_symb (int) – The number of bins used for the clustering, i.e. the number of symbols of the symbolic sequences that will be created.
target_dim (int, optional) – The number of eigenvectors kept for projecting the data.
- Returns:
The computed clustering result arrays.
- Return type:
numpy.ndarray
Algorithmic complexity¶
Utilities for algorithmic complexity on symbolic sequences.
The public helpers in this module compute Lempel-Ziv complexities on
integer-encoded symbolic sequences. lz76 and lz77 accept non-empty
one-dimensional arrays of integers and can optionally return the parsing
history used to count phrases. lempel_ziv works on
scyseq.sequence.Sequence objects and returns either the raw or the
normalized complexity score.
- scyseq.algorithmic.contains_sublist(lst, sublst)¶
Return whether sublst appears contiguously inside lst.
- Parameters:
lst – Sequence to inspect.
sublst – Candidate contiguous subsequence.
- Returns:
True if sublst appears in lst, False otherwise.
The empty sublist is considered to be contained in every list.
>>> contains_sublist([1, 2, 3, 4], [2, 3])
True
>>> contains_sublist([1, 2, 3], [2, 4])
False
>>> contains_sublist([1, 2], [])
True
- scyseq.algorithmic.lz76(arr, summary=False)¶
Return the Lempel-Ziv complexity obtained with the LZ76 parsing.
- Parameters:
arr – Non-empty array-like object of integers.
summary – If True, also return the parsing history.
- Returns:
Either an integer or a tuple (complexity, history) when summary=True.
- Raises:
IndexError if arr is empty.
The returned history is the ordered list of phrases discovered during the parsing.
>>> arr = np.array([0, 0, 1, 0, 1, 1], dtype=np.uint8)
>>> lz76(arr)
3
>>> lz76(arr, summary=True)
(3, [[0], [0, 1], [0, 1, 1]])
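To make the notion of "phrases" concrete, here is a minimal and deliberately unoptimized sketch of the classical LZ76 (Kaspar-Schuster style) parsing on a plain list. The helper names `contains` and `lz76_phrases` are hypothetical; this is a textbook-style illustration that reproduces the history shown above, not the library's implementation. Each phrase is extended for as long as it can be copied from the text that precedes its last symbol.

```python
def contains(hay, needle):
    """True if needle occurs contiguously in hay (the empty needle always does)."""
    m = len(needle)
    return any(hay[i:i + m] == needle for i in range(len(hay) - m + 1))


def lz76_phrases(seq):
    """Return the list of LZ76 phrases of seq; its length is the complexity."""
    seq = list(seq)
    n = len(seq)
    phrases = []
    start = 0
    while start < n:
        length = 1
        # Grow the candidate while it already occurs in the text seen so far
        # (everything up to, but excluding, the candidate's last symbol).
        while start + length <= n and contains(
            seq[:start + length - 1], seq[start:start + length]
        ):
            length += 1
        phrases.append(seq[start:min(start + length, n)])
        start += length
    return phrases


history = lz76_phrases([0, 0, 1, 0, 1, 1])
print(len(history), history)  # 3 [[0], [0, 1], [0, 1, 1]]
```

The repeated sublist search makes this sketch cubic in the worst case; it is meant for reading, not for long sequences.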
- scyseq.algorithmic.lz77(arr, summary=False)¶
Return the Lempel-Ziv complexity obtained with the LZ77 parsing.
- Parameters:
arr – Non-empty array-like object of integers.
summary – If True, also return the parsing history.
- Returns:
Either an integer or a tuple (complexity, history) when summary=True.
- Raises:
IndexError if arr is empty.
The returned history is the ordered list of phrases discovered during the parsing.
>>> arr = np.array([0, 0, 1, 0, 1, 1], dtype=np.uint8)
>>> lz77(arr)
3
>>> lz77(arr, summary=True)
(3, [[0], [0, 1], [0, 1, 1]])
- scyseq.algorithmic.lempel_ziv(seq, parsing='lz76', norm=False, nbsur=None)¶
Return the Lempel-Ziv complexity of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
parsing – Parsing name in ["lz76", "lz77"].
norm – If True, normalize the raw score with surrogate sequences.
nbsur – Number of surrogate sequences used for normalization.
- Returns:
A float complexity score.
- Raises:
NotImplementedError if parsing is not implemented.
ValueError if norm is True and nbsur is not provided.
>>> seq = S.Sequence(np.array([0, 0, 1, 0, 1, 1], dtype=np.uint8), 2)
>>> lempel_ziv(seq)
1.292481250360578
>>> np.random.seed(123)
>>> lempel_ziv(seq, norm=True, nbsur=5)
0.0
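The first value in the example above is numerically consistent with scaling the phrase count \(c = 3\) by \(\log_k(N) / N\), with \(N = 6\) and \(k = 2\). The sketch below only reproduces that arithmetic; whether the library computes its score exactly this way is an assumption, and `scaled_complexity` is a hypothetical name.

```python
import math


def scaled_complexity(c, n, k):
    """Phrase count c scaled by log_k(n) / n (assumed form of the score)."""
    return c * math.log(n, k) / n


print(scaled_complexity(3, 6, 2))  # approximately 1.2925
```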
Stochastic and Markov¶
Defines stochastic matrices
- scyseq.stochastic.conditional_matrix(dependent, conditioning, smooth=None)¶
Returns the conditional matrix ie P(y=j | x=i).
This is estimated using the maximum likelihood estimator.
The case P(x) = 0 is handled with add-k smoothing:
\(P(y \mid x) = (P(x,y) + k) / (P(x) + k |A_y|)\)
with \(|A_y|\) the alphabet length of the dependent sequence and \(k\) the smoothing constant (the smooth parameter).
- Parameters:
dependent – a symbolic Sequence object
conditioning – a symbolic Sequence object
smooth – smoothing with add-k (see below)
- Returns:
A numpy.array of floats
If smooth is None, no smoothing is applied. This can lead to non-stochastic matrices containing NaN where P(x) = 0.
If smooth == 0 and P(x) == 0, an exception is raised (the stochastic matrix cannot be computed).
NB: each row should sum to one (from every state, one should go somewhere), i.e. np.sum(matrix, axis=1) == [[1]…[1]]; see markov_sequence in generate.py.
Example :
>>> np.random.seed(9)
>>> a = np.random.choice([0,1],1000,replace=True, p=[0.7,0.3])
>>> np.random.seed(6)
>>> b = np.random.choice([0,1],1000,replace=True, p=[0.7,0.3])
>>> A = S.Alphabet(['a','b'])
>>> seq1 = S.Sequence(a,A)
>>> seq2 = S.Sequence(b,A)
>>> conditional_matrix(seq1, seq2)
array([[0.71947674, 0.28052326],
       [0.69551282, 0.30448718]])
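The effect of add-k smoothing can be seen on a small hand-made example. The sketch below is not the library function: `add_k_conditional` is a hypothetical helper, and it applies the smoothing to raw counts \(N(x,y)\) and \(N(x)\), which is the usual form of add-k smoothing and an assumption with respect to the formula above.

```python
def add_k_conditional(dependent, conditioning, k_dep, k_cond, smooth=0.0):
    """Estimate P(y=j | x=i); row i is the distribution of y given x=i."""
    counts = [[0] * k_dep for _ in range(k_cond)]
    for y, x in zip(dependent, conditioning):
        counts[x][y] += 1
    matrix = []
    for row in counts:
        total = sum(row) + smooth * k_dep
        if total == 0:
            # A conditioning symbol was never observed and nothing was added.
            raise ZeroDivisionError("unseen conditioning symbol and no smoothing")
        matrix.append([(c + smooth) / total for c in row])
    return matrix


y = [0, 1, 0, 0, 1, 0]
x = [0, 0, 1, 1, 0, 0]
print(add_k_conditional(y, x, 2, 2))
# [[0.5, 0.5], [1.0, 0.0]]
print(add_k_conditional(y, x, 2, 3, smooth=1.0))
# [[0.5, 0.5], [0.75, 0.25], [0.5, 0.5]]
```

In the second call the conditioning alphabet has a third symbol that never occurs; smoothing turns its row into a uniform distribution instead of a division by zero.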
- scyseq.stochastic.transition_matrix(seq, time=1, smooth=0)¶
Returns the transition matrix.
This is estimated using the maximum likelihood estimator.
- Parameters:
seq – a symbolic Sequence object
- Returns:
A numpy.array of floats
Example :
>>> np.random.seed(9)
>>> a = np.random.choice([0,1],1000,replace=True, p=[0.7,0.3])
>>> A = S.Alphabet(['a','b'])
>>> seq = S.Sequence(a, A)
>>> transition_matrix(seq)
array([[0.70224719, 0.29775281],
       [0.73519164, 0.26480836]])
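The maximum likelihood estimate of a transition matrix is simply the row-normalized table of observed transitions. A minimal sketch without smoothing follows; `transition_sketch` is a hypothetical helper and assumes that every symbol occurs at least once as the origin of a transition.

```python
def transition_sketch(ivals, k, time=1):
    """Row i holds the estimated probabilities of going from i to each j after `time` steps."""
    counts = [[0] * k for _ in range(k)]
    for current, future in zip(ivals, ivals[time:]):
        counts[current][future] += 1
    # Dividing each row by its total gives the maximum likelihood estimate.
    return [[c / sum(row) for c in row] for row in counts]


print(transition_sketch([0, 1, 0, 0, 1, 0, 0, 1, 0], 2))
# [[0.4, 0.6], [1.0, 0.0]]
```

In this toy sequence symbol 1 is always followed by 0, hence the second row.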
- scyseq.stochastic.influence_matrix(seq1, seq2, time=1, smooth=0)¶
Returns the influence matrix ie P(x1(T+t)=j | x2(T)=i).
This is estimated using the maximum likelihood estimator.
- Parameters:
seq1 – a symbolic Sequence object
seq2 – a symbolic Sequence object
- Returns:
A numpy.array of floats
Example :
>>> np.random.seed(9)
>>> a = np.random.choice([0,1],1000,replace=True, p=[0.7,0.3])
>>> np.random.seed(6)
>>> b = np.random.choice([0,1],1000,replace=True, p=[0.7,0.3])
>>> A = S.Alphabet(['a','b'])
>>> seq1 = S.Sequence(a,A)
>>> seq2 = S.Sequence(b,A)
>>> influence_matrix(seq1, seq2)
array([[0.70887918, 0.29112082],
       [0.71794872, 0.28205128]])
Information theory¶
Information-theoretic measures for symbolic sequences.
This module gathers entropy, mutual-information, and transfer-entropy helpers
for scyseq.sequence.Sequence objects. All logarithms are natural
logarithms, so the returned values are expressed in nats.
- scyseq.information.metric_entropy(seq)¶
Return Shannon’s metric entropy of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> metric_entropy(seq)
0.6003511877776578
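For reference, Shannon's entropy of the empirical symbol distribution, \(H = -\sum_i p_i \ln p_i\), can be sketched in a few lines of standard-library Python. The helper `shannon_entropy_nats` is a hypothetical name used only in this illustration.

```python
import math
from collections import Counter


def shannon_entropy_nats(ivals):
    """Entropy (in nats) of the empirical distribution of the symbols."""
    n = len(ivals)
    return -sum((c / n) * math.log(c / n) for c in Counter(ivals).values())


print(shannon_entropy_nats([0, 1, 0, 1]))  # ln 2, approximately 0.6931
print(shannon_entropy_nats([0, 0, 0, 1]))  # approximately 0.5623
```

The maximum for a binary alphabet is \(\ln 2\); the value in the example above is lower because the two symbols occur with probabilities close to 0.7 and 0.3.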
- scyseq.information.H(seq)¶
Return Shannon’s metric entropy of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> metric_entropy(seq)
0.6003511877776578
- scyseq.information.shannon_entropy(seq)¶
Return Shannon’s metric entropy of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> metric_entropy(seq)
0.6003511877776578
- scyseq.information.topological_entropy(seq)¶
Return the topological entropy of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> topological_entropy(seq)
0.6931471805599453
- scyseq.information.T(seq)¶
Return the topological entropy of a symbolic sequence.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> topological_entropy(seq)
0.6931471805599453
- scyseq.information.renyi_entropy(seq, coef)¶
Return the Renyi entropy of order coef.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
coef – Renyi order. The formula used here assumes coef != 1.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> renyi_entropy(seq, 0.9)
0.6088567303148161
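The restriction coef != 1 comes from the standard definition of the Renyi entropy, \(H_\alpha = \frac{1}{1-\alpha} \ln \sum_i p_i^\alpha\), whose denominator vanishes at \(\alpha = 1\) (where the limit is Shannon's entropy). A minimal sketch of that textbook formula follows; `renyi_entropy_nats` is a hypothetical helper, and it is an assumption that the library uses exactly this expression.

```python
import math
from collections import Counter


def renyi_entropy_nats(ivals, coef):
    """Renyi entropy of order coef (in nats) of the empirical symbol distribution."""
    if coef == 1:
        raise ValueError("the formula is undefined for coef == 1")
    n = len(ivals)
    total = sum((c / n) ** coef for c in Counter(ivals).values())
    return math.log(total) / (1 - coef)


print(renyi_entropy_nats([0, 1, 0, 1], 2))    # ln 2 for a uniform distribution
print(renyi_entropy_nats([0, 0, 0, 1], 0.9))  # slightly above the Shannon value
```

For a uniform distribution the result is \(\ln k\) whatever the order; for other distributions it decreases as the order grows.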
- scyseq.information.R(seq, coef)¶
Return the Renyi entropy of order coef.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
coef – Renyi order. The formula used here assumes coef != 1.
- Returns:
A float entropy value in nats.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> renyi_entropy(seq, 0.9)
0.6088567303148161
- scyseq.information.block_entropy(seq, wlen)¶
Return the entropy of the overlapping words of length wlen.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
wlen – Positive word length not exceeding len(seq).
- Returns:
A float entropy value in nats.
- Raises:
ValueError if wlen is invalid.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> block_entropy(seq, 6)
3.577559335188841
- scyseq.information.entropy_rate(seq, wlen, method='average')¶
Return an entropy-rate estimate based on block entropies.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
wlen – Positive word length not exceeding len(seq).
method – One of ["average", "difference"].
- Returns:
The entropy-rate estimate as a float.
- Raises:
ValueError if wlen is invalid.
NotImplementedError if method is unsupported.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> entropy_rate(seq, 6)
0.5962598891981402
>>> entropy_rate(seq, 6, method="difference")
0.5689107029836205
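The two estimators can be sketched from block entropies \(H_n\) of overlapping words. In the sketch below, "average" is taken to be \(H_n / n\) and "difference" to be \(H_n - H_{n-1}\); the first reading agrees with the documented values (3.5776 / 6 ≈ 0.5963), the second is an assumption. The helper names are hypothetical.

```python
import math
from collections import Counter


def block_entropy_nats(ivals, wlen):
    """Entropy (in nats) of the overlapping words of length wlen."""
    words = [tuple(ivals[i:i + wlen]) for i in range(len(ivals) - wlen + 1)]
    n = len(words)
    return -sum((c / n) * math.log(c / n) for c in Counter(words).values())


def entropy_rate_sketch(ivals, wlen, method="average"):
    if method == "average":
        return block_entropy_nats(ivals, wlen) / wlen
    if method == "difference":
        return block_entropy_nats(ivals, wlen) - block_entropy_nats(ivals, wlen - 1)
    raise NotImplementedError(method)


periodic = [0, 1] * 50
print(entropy_rate_sketch(periodic, 2))                # close to ln(2) / 2
print(entropy_rate_sketch(periodic, 2, "difference"))  # close to 0
```

A periodic sequence has zero entropy rate; the difference estimate sees this already at word length 2, whereas the average estimate only decays like \(1/n\).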
- scyseq.information.effective_complexity(seq, n_max)¶
Return Grassberger’s effective complexity estimate.
- Parameters:
seq – A symbolic scyseq.sequence.Sequence.
n_max – Maximum block length used in the estimate.
- Returns:
A float effective-complexity value.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq = sq.Sequence(a, alpha)
>>> effective_complexity(seq, 6)
0.05784767682768299
- scyseq.information.mutual_information(seq1, seq2)¶
Return the mutual information between two symbolic sequences.
- Parameters:
seq1 – First symbolic scyseq.sequence.Sequence.
seq2 – Second symbolic scyseq.sequence.Sequence.
- Returns:
The mutual information as a float.
- Raises:
LengthError if the sequences do not have the same length.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> np.random.seed(6)
>>> b = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq1 = sq.Sequence(a, alpha)
>>> seq2 = sq.Sequence(b, alpha)
>>> mutual_information(seq1, seq2)
0.0002988020334349084
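Mutual information can be written in terms of entropies as \(I(X;Y) = H(X) + H(Y) - H(X,Y)\). The sketch below uses this standard identity on plain lists; the helper names are hypothetical and it is an assumption that the library follows the same route.

```python
import math
from collections import Counter


def entropy_nats(items):
    """Entropy (in nats) of the empirical distribution of any hashable items."""
    items = list(items)
    n = len(items)
    return -sum((c / n) * math.log(c / n) for c in Counter(items).values())


def mutual_information_sketch(a, b):
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # The joint entropy is the entropy of the sequence of pairs.
    return entropy_nats(a) + entropy_nats(b) - entropy_nats(zip(a, b))


print(mutual_information_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # independent: about 0
print(mutual_information_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # deterministic: about ln 2
```

The near-zero value in the example above reflects the fact that the two sequences were drawn independently.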
- scyseq.information.multi_information(seq1, seq2, seq3)¶
Return the three-variable mutual information for symbolic sequences.
- Parameters:
seq1 – First symbolic scyseq.sequence.Sequence.
seq2 – Second symbolic scyseq.sequence.Sequence.
seq3 – Third symbolic scyseq.sequence.Sequence.
- Returns:
The three-variable mutual information as a float.
- Raises:
LengthError if the sequences do not have the same length.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> np.random.seed(6)
>>> b = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> np.random.seed(3)
>>> c = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq1 = sq.Sequence(a, alpha)
>>> seq2 = sq.Sequence(b, alpha)
>>> seq3 = sq.Sequence(c, alpha)
>>> multi_information(seq1, seq2, seq3)
-4.8757282800737656e-05
- scyseq.information.transfer_entropy(seq1, seq1p, seq2)¶
Return the symbolic transfer entropy from seq2 to seq1.
seq1 corresponds to \(x_t\), seq1p to \(x_{t+1}\), and seq2 to \(y_t\).
- Parameters:
seq1 – Target sequence at time \(t\).
seq1p – Shifted version of the target sequence at time \(t+1\).
seq2 – Driving sequence at time \(t\).
- Returns:
The transfer entropy as a float.
- Raises:
LengthError if the sequences do not have the same length.
>>> np.random.seed(9)
>>> a = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> np.random.seed(6)
>>> b = np.random.choice([0, 1], 1000, replace=True, p=[0.7, 0.3])
>>> alpha = sq.Alphabet(["a", "b"])
>>> seq1 = sq.Sequence(a, alpha)
>>> seq2 = sq.Sequence(b, alpha)
>>> transfer_entropy(seq1[:-1], seq1[1:], seq2[:-1])
0.00019242807727182232
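Transfer entropy from \(y\) to \(x\) is the conditional mutual information \(I(x_{t+1}; y_t \mid x_t)\), which expands into joint entropies as \(H(x_{t+1}, x_t) + H(x_t, y_t) - H(x_t) - H(x_{t+1}, x_t, y_t)\). The sketch below applies this standard identity to plain lists; the helper names are hypothetical and the correspondence with the library's internals is an assumption. The toy data are built so that \(x_{t+1} = y_t\), i.e. \(y\) fully drives \(x\).

```python
import math
from collections import Counter


def entropy_nats(items):
    """Entropy (in nats) of the empirical distribution of any hashable items."""
    items = list(items)
    n = len(items)
    return -sum((c / n) * math.log(c / n) for c in Counter(items).values())


def transfer_entropy_sketch(x, x_next, y):
    if not (len(x) == len(x_next) == len(y)):
        raise ValueError("sequences must have the same length")
    return (entropy_nats(zip(x_next, x)) + entropy_nats(zip(x, y))
            - entropy_nats(x) - entropy_nats(zip(x_next, x, y)))


y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
x = [0] + y[:-1]  # x copies y with a delay of one step
print(transfer_entropy_sketch(x[:-1], x[1:], y[:-1]) > 0)  # True
```

The slicing mirrors the example above: dropping the last element gives the series at time \(t\), dropping the first gives the series at time \(t+1\).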
Sequence generators¶
Generation of specified symbolic sequences
- scyseq.generator.generate(method, N, k, *args)¶
Generates a Sequence according to a method
- Parameters:
method (str) – a string in [“uniform”, “markov”, “binary_logistic”]
N (int) – the length of the sequence
k (int) – the length of the alphabet
args – supplementary parameters (transition matrix and order for “markov”, and parameter for “binary_logistic”)
- Raises:
NotImplementedError if method is not in the list above.
- Returns:
A Sequence object.
- scyseq.generator.uniform_sequence(length, alen)¶
Returns a uniform random sequence.
- Parameters:
length – the length of the sequence
alen – the length of the alphabet
- Returns:
a Sequence object
- scyseq.generator.binary_map1d_sequence(length, map1d, xinit, threshold=0.5, skip=100)¶
Returns a binary sequence with a specified one-dimensional map dynamics
map1d can be specified as, for example, map1d = lambda x: 3.4 * x * (1 - x), or as any function that defines x(t+1) as a function of x(t).
- Parameters:
length – the length of the sequence
threshold – the threshold value used to make a binary sequence
- Returns:
A binary Sequence
- scyseq.generator.binary_logistic_sequence(length, param, xinit, threshold=0.5, skip=100)¶
Returns a binary sequence with logistic dynamics according to the parameter \(\mu\).
The equation used here is: \(x(t+1) = \mu \, x(t) \, (1 - x(t))\)
- Parameters:
length – the length of the sequence
param – the parameter \(\mu\) of the logistic equation
threshold – the threshold value used to make a binary sequence
- Returns:
A binary Sequence
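The construction can be sketched as follows: iterate the logistic map, discard an initial transient, then threshold each value. This is an illustration under assumptions, not the library code: `binary_logistic_sketch` is a hypothetical helper, and both the strict comparison with the threshold and the point at which the first symbol is emitted are choices made for the example.

```python
def binary_logistic_sketch(length, mu, xinit, threshold=0.5, skip=100):
    """Binary symbols obtained by thresholding a logistic-map trajectory."""
    x = xinit
    for _ in range(skip):
        x = mu * x * (1 - x)  # discard the transient
    symbols = []
    for _ in range(length):
        x = mu * x * (1 - x)
        symbols.append(1 if x > threshold else 0)
    return symbols


# For mu = 2.5 the map converges to the fixed point 0.6, above the threshold.
print(binary_logistic_sketch(5, 2.5, 0.2))  # [1, 1, 1, 1, 1]
# For mu = 3.9 the dynamics is chaotic and both symbols appear.
print(binary_logistic_sketch(20, 3.9, 0.2))
```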
- scyseq.generator.markov_sequence(length, alen, markov_matrix, order)¶
Returns a sequence generated by a Markov process of order \(o\) with transition matrix \(M\).
- Parameters:
length – the length of the sequence
alen – the length \(k\) of the alphabet
markov_matrix – the transition matrix \(M\)
order – the order \(o\) of the Markov process
- Raises:
ValueError: if the shape of \(M\) does not correspond to the order of the process, i.e. \(k^o \times k\)
- Returns:
A Sequence object
NB:
Each row of the Markov matrix gives the probabilities of transitioning to each of the \(k\) symbols of the alphabet, so every row sums to one: sum(markov_matrix[line]) == 1, i.e. np.sum(matrix, axis=1) == [[1]…[1]].
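For the simplest case (order 1, where the matrix is \(k \times k\)), the generation loop can be sketched with the standard library. `markov_order1` is a hypothetical helper; the start state and the row-sum tolerance are assumptions made for the example, and higher orders are not covered.

```python
import random


def markov_order1(length, matrix, start=0, rng=None):
    """Draw a first-order Markov chain; matrix[i][j] is P(next = j | current = i)."""
    rng = rng or random.Random()
    k = len(matrix)
    for row in matrix:
        if len(row) != k or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must have k entries summing to one")
    state = start
    out = []
    for _ in range(length):
        out.append(state)
        # The current state selects the row used as sampling weights.
        state = rng.choices(range(k), weights=matrix[state])[0]
    return out


# A deterministic matrix makes the output predictable: the chain alternates.
print(markov_order1(6, [[0.0, 1.0], [1.0, 0.0]]))  # [0, 1, 0, 1, 0, 1]
print(markov_order1(10, [[0.7, 0.3], [0.4, 0.6]], rng=random.Random(1)))
```

Checking the row sums up front turns a malformed matrix into an immediate error instead of a silently biased sequence.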
Input-output¶
- scyseq.io.read_codix(fname, data_only=True)¶
Reads a data file produced by the codix encoder of the codix software suite for behavioral studies.
Returns in all cases a dictionary with data[‘site’][‘code’] = Sequence.
Visualisation¶
- scyseq.viz.get_state_colors(alphabet, cmap_name='viridis')¶
Get consistent color scheme for alphabet states. Returns a colormap and normalization that ensures each state gets the same color across plots.
- Parameters:
alphabet – A symbolic Alphabet object
cmap_name – Name of matplotlib colormap (default: ‘viridis’)
- Returns:
Tuple of (cmap, norm) for consistent coloring
- scyseq.viz.plot(seq, xlabel='Time', ylabel='States', title='Simple plot', labelsize=15, titlesize=25, color='blue', **kwargs)¶
Simple (discrete / symbolic) time series plot
- scyseq.viz.plot_bar(seq, xlabel='Time', ylabel='States', title='Bar plot', labelsize=15, titlesize=25, cmap_name='viridis', legend=False, legend_title='States', **kwargs)¶
Plots a barcode-like graph with consistent state colors.
- scyseq.viz.plot_color(seq, aspect='auto', title='Sequence', xlabel='Time', labelsize=15, titlesize=25, cmap_name='viridis', figsize=(10, 2.4), legend=True, legend_title='States', **kwargs)¶
Plots a sequence as a color strip with consistent state colors.
- scyseq.viz.plot_grid(seq1, seq2, xlabel='1st sequence', ylabel='2nd Sequence', title='Grid plot', labelsize=15, titlesize=25, color='blue', alpha=0.3, scale=100, jitter=0.4, **kwargs)¶
Plots state-space grids, inspired by
Hollenstein T. (2013) State space grids. Springer.
- scyseq.viz.plot_independence(seq1, seq2, xlabel='1st sequence', ylabel='2nd Sequence', title='Independence plot', labelsize=15, titlesize=25, color=('blue', 'red'), alpha=0.3, scale=100, **kwargs)¶
Plots state-space grids representing the elements of the mutual information between sequences.
Exceptions¶
Exception classes for the scyseq library.
- exception scyseq.exceptions.ScyseqError¶
Base exception for all errors raised by the scyseq library.
- __weakref__¶
list of weak references to the object
- exception scyseq.exceptions.SymbolError¶
Base exception for symbol-related issues.
- exception scyseq.exceptions.SymbolDefinitionError(value, msg)¶
Exception raised when a symbol cannot be defined.
- __init__(value, msg)¶
- exception scyseq.exceptions.SymbolAccessError(msg)¶
Exception raised when a symbol cannot be accessed.
- __init__(msg)¶
- exception scyseq.exceptions.AlphabetError¶
Base exception for alphabet-related issues.
- exception scyseq.exceptions.AlphabetAccessError(msg)¶
Exception raised when an alphabet cannot be accessed.
- __init__(msg)¶
- exception scyseq.exceptions.InvalidSymbolError(symbol, alphabet)¶
Raised when an invalid symbol is used in an alphabet or sequence.
- __init__(symbol, alphabet)¶
- exception scyseq.exceptions.EmptyAlphabetError¶
Raised when attempting to use an empty alphabet.
- __init__()¶
- exception scyseq.exceptions.SequenceError¶
Base exception for sequence-related issues.
- exception scyseq.exceptions.SequenceParseError(sequence, message='Unable to parse sequence.')¶
Raised when parsing a sequence fails due to invalid format.
- __init__(sequence, message='Unable to parse sequence.')¶
Recurrence quantification¶
This module contains functions for the quantification of symbolic recurrence plots.
Some references are:
Faure and Lesne (2010) Recurrence plots for symbolic sequences. International Journal of Bifurcation and Chaos
Zou et al. (2015) Identifying coupling directions by recurrences. In Recurrence Quantification Analysis.