Package Documentation

pydaddy.Characterize

class pydaddy.characterize.Characterize(data, t=1.0, Dt=1, dt=1, bins=None, inc=None, inc_x=None, inc_y=None, n_trials=1, show_summary=True, **kwargs)

Bases: object

Initialize a PyDaddy object for further analysis.

Parameters:

data (list) – Time series data to be analysed: data = [x] for scalar data and data = [x1, x2] for vector data, where x, x1 and x2 are numpy arrays.
t (float, array, optional(default=1.0)) – t can be either a float representing the time-interval between observations, or a numpy array containing the time-stamps of the individual observations (Note: PyDaddy only supports uniformly spaced time-series, even when time-stamps are provided).
bins (int, optional(default=20)) – Number of bins for computing bin-wise averages of drift and diffusion (Binwise averages are used only for visualization.)
show_summary (bool, optional(default=True)) – If True, a summary text and summary figure will be shown.
Dt (int, optional(default=1)) – Subsampling factor for drift computation. When provided, the time-series will be sub-sampled by this factor while computing drift.
dt (int, optional(default=1)) – Subsampling factor for diffusion computation. When provided, the time-series will be sub-sampled by this factor while computing diffusion.
inc (float, optional(default=0.01)) – For scalar data, instead of specifying bins, the widths (increments) of the bins can also be provided.
inc_x (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_x is the increment in the x-dimension.
inc_y (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_y is the increment in the y-dimension.
n_trials (int, optional(default=1)) – Number of trials, when the data is a concatenated time series of multiple trials.
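The Dt and dt parameters sub-sample the series before increments are taken for the drift and the diffusion respectively, and bins controls the bin-wise averages. The sketch below is a minimal, dependency-free illustration of that idea for scalar data. It assumes the conventions F(x) ≈ <Δx>/Δt and G(x) ≈ <Δx²>/Δt (some texts include a factor of 1/2 in the diffusion), and the helpers km_estimates, bin_average and slope are hypothetical; it is not PyDaddy's internal implementation.

```python
import random


def km_estimates(x, t_int=1.0, Dt=1, dt=1):
    # Hypothetical helper (not PyDaddy's API): raw, unbinned Kramers-Moyal
    # estimates for a scalar series sampled every t_int time units. The series
    # is sub-sampled by Dt for the drift and by dt for the diffusion.
    xd, xg = x[::Dt], x[::dt]
    drift = [(b - a) / (Dt * t_int) for a, b in zip(xd, xd[1:])]
    diffusion = [(b - a) ** 2 / (dt * t_int) for a, b in zip(xg, xg[1:])]
    # Each estimate is attributed to the state at the start of its increment.
    return xd[:-1], drift, xg[:-1], diffusion


def bin_average(states, values, bins=20):
    # Average values[i] (estimated at states[i]) inside equal-width bins.
    lo, hi = min(states), max(states)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    sums, counts = [0.0] * bins, [0] * bins
    for s, v in zip(states, values):
        k = min(int((s - lo) / width), bins - 1)
        sums[k] += v
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    means = [sm / c if c else float("nan") for sm, c in zip(sums, counts)]
    return centers, means


def slope(xs, ys):
    # Ordinary least-squares slope of ys against xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Example: Ornstein-Uhlenbeck process dx = -x dt + dW sampled every 0.01,
# so the true drift is F(x) = -x and, under the convention above, G(x) = 1.
random.seed(0)
t_int, x = 0.01, [0.0]
for _ in range(50_000):
    x.append(x[-1] - x[-1] * t_int + random.gauss(0.0, t_int ** 0.5))

xs, drift, xs_g, diffusion = km_estimates(x, t_int, Dt=1, dt=1)
centers, binned_drift = bin_average(xs, drift, bins=20)
print(round(slope(xs, drift), 2))                 # expected to be near -1
print(round(sum(diffusion) / len(diffusion), 2))  # expected to be near 1
```

Larger Dt or dt values reduce the influence of measurement noise at the cost of fewer increments and a coarser time resolution.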

Returns:

output – Daddy object which can be used for further analysis and visualization. See pydaddy.daddy.Daddy for details.

Return type:

pydaddy.daddy.Daddy

pydaddy.characterize.load_sample_dataset(name)

Load one of the sample datasets. For more details on the datasets, see Sample Datasets.

Available data sets:

'fish-data-etroplus'
'cell-data-cellhopping'
'model-data-scalar-pairwise'
'model-data-scalar-ternary'
'model-data-vector-pairwise'
'model-data-vector-ternary'

Parameters:

name (str) – name of the data set

Returns:

data (list) – timeseries data
t (float, array) – time interval between observations, or an array of time-stamps
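Because t may be either a single sampling interval or an array of time-stamps, and PyDaddy only supports uniformly spaced series, it can be useful to reduce t to one interval and check the spacing before further processing. A small sketch; the helper sampling_interval and its rel_tol tolerance are hypothetical and not part of PyDaddy.

```python
def sampling_interval(t, rel_tol=1e-6):
    # Hypothetical helper (not part of PyDaddy): accept either a float interval
    # or a sequence of time-stamps and return the sampling interval, raising an
    # error if the time-stamps are not uniformly spaced.
    if isinstance(t, (int, float)):
        return float(t)
    steps = [b - a for a, b in zip(t, t[1:])]
    if not steps:
        raise ValueError("need at least two time-stamps")
    mean_step = sum(steps) / len(steps)
    if any(abs(s - mean_step) > rel_tol * abs(mean_step) for s in steps):
        raise ValueError("time-stamps are not uniformly spaced")
    return mean_step


print(sampling_interval(0.1))                   # 0.1
print(sampling_interval([0.0, 0.5, 1.0, 1.5]))  # 0.5
```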

pydaddy.Daddy

class pydaddy.daddy.Daddy(ddsde, **kwargs)

Bases: Preprocessing, Visualize

An object of this type is returned by pydaddy.characterize.Characterize. This is the main workhorse class of PyDaddy, and contains functionality to compute drift and diffusion, visualize results, and perform diagnostic tests. See the individual method documentation for more details.

autocorrelation(lags=1000)

Show the autocorrelation plot of the data.
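A minimal sketch of a sample autocorrelation function for lags 0 to lags, normalized to 1 at lag 0. It illustrates the quantity being plotted; PyDaddy's own normalization and implementation may differ, and the function name is hypothetical.

```python
def autocorrelation(x, lags=1000):
    # Sample autocorrelation of a scalar series at lags 0..lags (inclusive),
    # normalized so that the value at lag 0 is exactly 1.
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    lags = min(lags, n - 1)  # cannot look further back than the series allows
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(lags + 1)]


print(autocorrelation([1.0, -1.0] * 50, lags=2))  # [1.0, -0.99, 0.98]
```

This direct sum costs O(n × lags); for long series an FFT-based computation is the usual faster alternative.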

cross_diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)

Show an interactive figure of the cross-diffusion function (only for vector data). The bin-wise averaged estimates of the cross-diffusion will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:

limits (tuple, (default=None)) – If specified, sets the y-axis limits for the cross diffusion function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the cross diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in the vector case.)
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

For scalar analysis
x_label : x axis label

y_label : y axis label

For vector analysis
title1 : first plot title

x_label1 : first plot x label

y_label1 : first plot y label

z_label1 : first plot z label

title2 : second plot title

x_label2 : second plot x label

y_label2 : second plot y label

z_label2 : second plot z label
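For vector data, the cross-diffusion is commonly estimated as the bin-wise average of Δx1 Δx2 / Δt over a 2-D grid of states. The sketch below assumes that convention (without a factor of 1/2), one-sample increments and a square grid of bins × bins cells; cross_diffusion_binned is a hypothetical helper, not PyDaddy's implementation.

```python
import random


def cross_diffusion_binned(x1, x2, t_int=1.0, bins=10):
    # Hypothetical helper: bin-wise average of dx1 * dx2 / t_int on a
    # bins x bins grid over (x1, x2); returns {(i, j): mean} for non-empty bins.
    lo1, lo2 = min(x1), min(x2)
    w1 = (max(x1) - lo1) / bins or 1.0
    w2 = (max(x2) - lo2) / bins or 1.0
    sums, counts = {}, {}
    for k in range(len(x1) - 1):
        i = min(int((x1[k] - lo1) / w1), bins - 1)
        j = min(int((x2[k] - lo2) / w2), bins - 1)
        v = (x1[k + 1] - x1[k]) * (x2[k + 1] - x2[k]) / t_int
        sums[i, j] = sums.get((i, j), 0.0) + v
        counts[i, j] = counts.get((i, j), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Example: two OU processes whose noise terms have correlation rho = 0.5,
# so the true cross-diffusion is 0.5 everywhere.
random.seed(1)
t_int, rho = 0.01, 0.5
x1, x2 = [0.0], [0.0]
for _ in range(20_000):
    a, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x1.append(x1[-1] - x1[-1] * t_int + t_int ** 0.5 * a)
    x2.append(x2[-1] - x2[-1] * t_int
              + t_int ** 0.5 * (rho * a + (1 - rho ** 2) ** 0.5 * b))

overall = cross_diffusion_binned(x1, x2, t_int, bins=1)
print(overall)  # a single bin, expected to be near {(0, 0): 0.5}
```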

diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)

Show an interactive figure of the diffusion function. The bin-wise averaged estimates of the diffusion will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:

limits (tuple, (default=None)) – If specified, sets the y-axis limits for the diffusion function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in the vector case.)
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

For scalar analysis
x_label : x axis label

y_label : y axis label

For vector analysis
title1 : first plot title

x_label1 : first plot x label

y_label1 : first plot y label

z_label1 : first plot z label

title2 : second plot title

x_label2 : second plot x label

y_label2 : second plot y label

z_label2 : second plot z label

drift(limits=None, polar=False, slider_timescales=None, **plot_text)

Show an interactive figure of the drift function. The bin-wise averaged estimates of the drift will be shown. If a polynomial has already been fitted with fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:

limits (tuple, (default=None)) – If specified, sets the y-axis limits for the drift function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the drift function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in the vector case.)
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

For scalar analysis
x_label : x axis label

y_label : y axis label

For vector analysis
title1 : first plot title

x_label1 : first plot x label

y_label1 : first plot y label

z_label1 : first plot z label

title2 : second plot title

x_label2 : second plot x label

y_label2 : second plot y label

z_label2 : second plot z label
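With polar=True, only the part of the function inside the unit circle is plotted. For a vector grid this amounts to masking out bins whose centers lie outside |M| <= 1, as in the sketch below. It assumes a square grid of bin width inc over [-1, 1] × [-1, 1] (compare inc_x and inc_y in Characterize); unit_circle_bins is hypothetical and not PyDaddy's plotting code.

```python
def unit_circle_bins(inc=0.1):
    # Hypothetical helper: centers of a square grid with bin width `inc` over
    # [-1, 1] x [-1, 1], keeping only the bins whose center lies inside the
    # unit circle (where |M| <= 1).
    n = round(2 / inc)
    centers = [-1 + (k + 0.5) * inc for k in range(n)]
    return [(cx, cy) for cx in centers for cy in centers
            if cx * cx + cy * cy <= 1.0]


print(len(unit_circle_bins(0.5)))  # 12: the 4 corner bins of the 4 x 4 grid are dropped
```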

export_data(filename=None, raw=False)

Returns a pandas dataframe containing the drift and diffusion values. Optionally, the data is also saved as a CSV file.

Parameters:

filename (str, optional(default=None)) – If provided, the data will be saved as a CSV at the given path. Else, a dataframe will be returned.
raw (bool, optional(default=False)) – If True, the drift and diffusion will be returned as raw, unbinned data. Otherwise (default), the bin-wise averaged Kramers-Moyal coefficients are returned.

Returns:

Pandas dataframe containing the estimated drift and diffusion coefficients.

Return type:

pandas.DataFrame

fit(function_name, order=None, threshold=0.05, alpha=0, tune=False, thresholds=None, library=None, plot=False)

Fit analytical expressions to drift/diffusion functions using sparse regression. By default, a polynomial with a specified maximum degree will be fitted. Alternatively, you can also provide a library of custom functions for fitting.

Parameters:

function_name (str) – Name of the function to fit. Can be ‘F’ or ‘G’ for scalar; ‘F1’, ‘F2’, ‘G11’, ‘G22’, ‘G12’, ‘G21’ for vector.
order (int) – Order (maximum degree) of the polynomial to fit.
threshold (float, (default=0.05)) – Sparsification threshold
tune (bool, (default=False)) – If True, the sparsification threshold will be automatically set using cross-validation.
alpha (float, (default=0.0)) – Optional regularization term for ridge regression. Useful when data is too noisy, but has a side effect of shrinking the estimated coefficients when set to high values.
thresholds (list, (default=None)) – With tune=True, a list of thresholds over which to search can optionally be provided. If not provided, this will be chosen automatically as the range between the minimum and maximum coefficients in the initial fit.
library (list, (default=None)) – A custom library of non-polynomial functions can optionally be provided. If provided, the functions will be fitted as a sparse linear combination of the terms in the library.

Returns:

res – object representing the fitted polynomial.

Return type:

fitters.Poly1D or fitters.Poly2D
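Sparse regression with a sparsification threshold and an optional ridge term is often implemented as sequentially thresholded least squares (STLSQ), as in SINDy: fit all library terms, set coefficients with magnitude below the threshold to zero, refit the remaining terms, and repeat. The dependency-free sketch below implements STLSQ for a polynomial library under the assumption that fit() follows a scheme of this kind; solve, ridge and stlsq_poly are hypothetical helpers and do not reproduce PyDaddy's fitters module.

```python
import random


def solve(M, b):
    # Solve the square system M w = b by Gaussian elimination with partial pivoting.
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (A[r][n] - sum(A[r][k] * w[k] for k in range(r + 1, n))) / A[r][r]
    return w


def ridge(X, y, alpha=0.0):
    # Minimize |y - X w|^2 + alpha |w|^2 through the normal equations.
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def stlsq_poly(x, y, order, threshold=0.05, alpha=0.0, max_iter=10):
    # Sequentially thresholded least squares over the library 1, x, ..., x**order:
    # fit, zero every coefficient smaller than `threshold`, refit the survivors,
    # and repeat until the set of active terms stops changing.
    X = [[xi ** d for d in range(order + 1)] for xi in x]
    active = list(range(order + 1))
    coef = [0.0] * (order + 1)
    for _ in range(max_iter):
        w = ridge([[row[d] for d in active] for row in X], y, alpha)
        coef = [0.0] * (order + 1)
        for d, wd in zip(active, w):
            coef[d] = wd
        keep = [d for d in active if abs(coef[d]) >= threshold]
        if keep == active:
            break
        active = keep
    return coef


# Example: noisy samples of y = 1 - 2x + 0.5x^3, fitted with a degree-5 library.
random.seed(1)
x = [-1 + 2 * i / 199 for i in range(200)]
y = [1 - 2 * xi + 0.5 * xi ** 3 + random.gauss(0.0, 0.01) for xi in x]
coef = stlsq_poly(x, y, order=5, threshold=0.1)
print([round(c, 2) for c in coef])  # expected to be close to [1.0, -2.0, 0.0, 0.5, 0.0, 0.0]
```

A larger alpha shrinks all coefficients toward zero, which is why the fit() description warns against setting it too high.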

model_diagnostics(oversample=1)

Perform model self-consistency diagnostics. Generates a simulated time series with the same length and sampling interval as the original time series, and re-estimates the drift and diffusion from the simulated time series. The re-estimated drift and diffusion should match the original estimates.

Parameters:

oversample (int, (default=1)) – Factor by which to oversample while simulating the SDE. If provided, the SDE will be simulated at t_int / oversample and then subsampled to t_int. This is useful when t_int is large enough to cause large errors in the SDE simulation.
The following are plotted:
- Histogram of the original time series overlaid with that of the simulated time series.
- Drift and diffusion of the original time series overlaid with that of the simulated time series.
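The oversampling idea can be sketched as follows: integrate the fitted SDE with Euler-Maruyama at step t_int / oversample, keep every oversample-th point so the output has spacing t_int, and re-estimate the drift from the simulated series. The fitted model F(x) = -x, G(x) = 1, the diffusion convention G(x) ≈ <Δx²>/Δt, and the helpers simulate_oversampled and drift_slope are assumptions for illustration; this is not PyDaddy's implementation.

```python
import random


def simulate_oversampled(F, G, t_int, timepoints, x0=0.0, oversample=1, seed=None):
    # Euler-Maruyama for dx = F(x) dt + sqrt(G(x)) dW, integrated with the finer
    # step t_int / oversample but recorded only every `oversample` steps, so the
    # returned series has the original sampling interval t_int.
    # Negative G values (possible with a fitted polynomial) are clipped to 0 here.
    rng = random.Random(seed)
    h = t_int / oversample
    x, out = x0, [x0]
    while len(out) < timepoints:
        for _ in range(oversample):
            x += F(x) * h + (max(G(x), 0.0) * h) ** 0.5 * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def drift_slope(x, t_int):
    # Least-squares slope of the raw drift estimate (dx / t_int) against x;
    # for a linear drift F(x) = -k x this re-estimates -k.
    d = [(b - a) / t_int for a, b in zip(x, x[1:])]
    xs = x[:-1]
    mx, md = sum(xs) / len(xs), sum(d) / len(d)
    return (sum((u - mx) * (v - md) for u, v in zip(xs, d))
            / sum((u - mx) ** 2 for u in xs))


# Self-consistency check for an assumed fitted model F(x) = -x, G(x) = 1:
# simulate at the original sampling interval and re-estimate the drift.
sim = simulate_oversampled(lambda x: -x, lambda x: 1.0, t_int=0.01,
                           timepoints=50_000, oversample=4, seed=2)
print(round(drift_slope(sim, 0.01), 2))  # expected to be near -1
```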

noise_diagnostics(loc=None)

Perform diagnostics on the noise residuals, to ensure that all assumptions for SDE estimation are met. Generates a plot with:

Distribution (1D or 2D histogram) of the residuals, and their QQ-plots against a theoretically expected Gaussian. The residual distribution is expected to be a Gaussian.

Autocorrelation of the residuals. The autocorrelation time should be close to 0.

Plot of the 2nd versus 4th jump moments. This plot should be a straight line. (Only for scalar data.)
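The first two checks can be sketched directly: standardize each increment by removing the drift contribution and dividing by the expected noise amplitude, then verify that the residuals have mean near 0, variance near 1, kurtosis near 3 (as for a Gaussian) and negligible lag-1 autocorrelation. The residual definition (x[n+1] - x[n] - F(x[n]) Δt) / sqrt(G(x[n]) Δt), the assumed model and the helper names are assumptions, not PyDaddy's code; the QQ-plots and the jump-moment plot are not reproduced.

```python
import random


def noise_residuals(x, F, G, t_int):
    # Standardized residuals: the part of each increment not explained by the
    # drift, divided by the expected noise amplitude sqrt(G(x) * t_int).
    return [(b - a - F(a) * t_int) / (G(a) * t_int) ** 0.5
            for a, b in zip(x, x[1:])]


def residual_checks(r):
    # Mean, variance, kurtosis and lag-1 autocorrelation of the residuals.
    # For well-behaved residuals expect roughly 0, 1, 3 (Gaussian) and 0.
    n = len(r)
    m = sum(r) / n
    var = sum((v - m) ** 2 for v in r) / n
    kurt = sum((v - m) ** 4 for v in r) / n / var ** 2
    lag1 = sum((r[i] - m) * (r[i + 1] - m) for i in range(n - 1)) / (n * var)
    return m, var, kurt, lag1


# Example with an assumed model F(x) = -x, G(x) = 1 that generated the data.
random.seed(3)
t_int, x = 0.01, [0.0]
for _ in range(20_000):
    x.append(x[-1] - x[-1] * t_int + t_int ** 0.5 * random.gauss(0.0, 1.0))

m, var, kurt, lag1 = residual_checks(
    noise_residuals(x, lambda v: -v, lambda v: 1.0, t_int))
print(round(m, 2), round(var, 2), round(kurt, 2), round(lag1, 2))
```

A strongly non-zero lag-1 value suggests the sampling interval is shorter than the correlation time of the noise, in which case sub-sampling (Dt, dt) may help.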

parameters()

Get all given and assumed parameters used for the analysis.

Returns:

params – all given and assumed parameters used for the analysis

Return type:

dict, json

simulate(t_int, timepoints, x0=None)

Generate simulated time-series with the fitted SDE model.

Generates a simulated time series, with specified sampling time and duration, based on the SDE model discovered by PyDaddy. The drift and diffusion functions should be fitted using the fit() function before calling simulate().

Parameters:

t_int (float) – Sampling time for the simulated time-series
timepoints (int) – Number of time-points to simulate
x0 (float (scalar) or list of two floats (vector), (default=None)) – Initial condition. If no value is passed, 0 ([0, 0] for vector) is taken as the initial condition.

Returns:

x

Return type:

Simulated time series of length timepoints.
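For vector data, an Euler-Maruyama step needs correlated noise whose covariance per unit time is the diffusion matrix. A minimal sketch, assuming G12 = G21 (a symmetric, positive-definite matrix [[G11, G12], [G12, G22]]), plain Python callables in place of fitted functions, and that timepoints includes the initial condition; simulate_vector is hypothetical and not PyDaddy's simulate().

```python
import math
import random


def simulate_vector(F1, F2, G11, G22, G12, t_int, timepoints, x0=None, seed=None):
    # Euler-Maruyama for a 2-D SDE with drift (F1, F2) and symmetric diffusion
    # matrix [[G11, G12], [G12, G22]], assumed positive definite at every state.
    rng = random.Random(seed)
    x, y = (0.0, 0.0) if x0 is None else (float(x0[0]), float(x0[1]))
    out = [(x, y)]
    s = math.sqrt(t_int)
    while len(out) < timepoints:
        a, b, c = G11(x, y), G12(x, y), G22(x, y)
        # Cholesky factor L of the diffusion matrix (L L^T = G) turns two
        # independent standard normals into correlated noise.
        l11 = math.sqrt(a)
        l21 = b / l11
        l22 = math.sqrt(c - l21 * l21)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Both components are updated from the same (old) state.
        x, y = (x + F1(x, y) * t_int + s * l11 * z1,
                y + F2(x, y) * t_int + s * (l21 * z1 + l22 * z2))
        out.append((x, y))
    return out


# Assumed model: linear restoring drift, constant diffusion with G12 = 0.3.
path = simulate_vector(lambda x, y: -x, lambda x, y: -y,
                       lambda x, y: 1.0, lambda x, y: 1.0, lambda x, y: 0.3,
                       t_int=0.01, timepoints=20_000, seed=4)
print(path[0])  # (0.0, 0.0), the default initial condition
```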

summary(start=0, end=1000, kde=True, tick_size=12, title_size=15, label_size=15, label_pad=8, n_ticks=3, ret_fig=False, **plot_text)

Print a summary of the data and show the summary plots. (This is the same summary plot produced by Characterize().)

Parameters:

start (int, (default=0)) – starting index, begin plotting timeseries from this point
end (int, (default=1000)) – end index, plot the time series up to this index
kde (bool, (default=True)) – if True, plot kde for histograms
title_size (int, (default=15)) – title font size
tick_size (int, (default=12)) – axis tick size
label_size (int, (default=15)) – label font size
label_pad (int, (default=8)) – axis label padding
n_ticks (int, (default=3)) – number of axis ticks
ret_fig (bool, (default=False)) – if True, return the figure object
**plot_text –
titles and axis labels of the plots

For scalar analysis summary plot:

timeseries_title : title of timeseries plot

timeseries_xlabel : x label of timeseries

timeseries_ylabel : y label of timeseries

drift_title : drift plot title

drift_xlabel : drift plot x label

drift_ylabel : drift plot y label

diffusion_title : diffusion plot title

diffusion_xlabel : diffusion plot x label

diffusion_ylabel : diffusion plot y label

For vector analysis summary plot:

timeseries1_title : first timeseries plot title

timeseries1_ylabel : first timeseries plot y label

timeseries1_xlabel : first timeseries plot x label

timeseries1_legend1 : first timeseries (Mx) legend label

timeseries1_legend2 : first timeseries (My) legend label

timeseries2_title : second timeseries plot title

timeseries2_xlabel : second timeseries plot x label

timeseries2_ylabel : second timeseries plot y label

2dhist1_title : Mx 2d histogram title

2dhist1_xlabel : Mx 2d histogram x label

2dhist1_ylabel : Mx 2d histogram y label

2dhist2_title : My 2d histogram title

2dhist2_xlabel : My 2d histogram x label

2dhist2_ylabel : My 2d histogram y label

2dhist3_title : M 2d histogram title

2dhist3_xlabel : M 2d histogram x label

2dhist3_ylabel : M 2d histogram y label

3dhist_title : 3d histogram title

3dhist_xlabel : 3d histogram x label

3dhist_ylabel : 3d histogram y label

3dhist_zlabel : 3d histogram z label

driftx_title : drift x plot title

driftx_xlabel : drift x plot x label

driftx_ylabel : drift x plot y label

driftx_zlabel : drift x plot z label

drifty_title : drift y plot title

drifty_xlabel : drift y plot x label

drifty_ylabel : drift y plot y label

drifty_zlabel : drift y plot z label

diffusionx_title : diffusion x plot title

diffusionx_xlabel : diffusion x plot x label

diffusionx_ylabel : diffusion x plot y label

diffusionx_zlabel : diffusion x plot z label

diffusiony_title : diffusion y plot title

diffusiony_xlabel : diffusion y plot x label

diffusiony_ylabel : diffusion y plot y label

diffusiony_zlabel : diffusion y plot z label

Return type:

None, or figure

Raises:

ValueError – If start is greater than end