Package Documentation

pydaddy.Characterize

class pydaddy.characterize.Characterize(data, t=1.0, Dt=1, dt=1, bins=None, inc=None, inc_x=None, inc_y=None, n_trials=1, show_summary=True, **kwargs)

Bases: object

Initialize a PyDaddy object for further analysis.

Parameters:
  • data (list) – Time series data to be analysed. Use data = [x] for scalar data and data = [x1, x2] for vector data, where x, x1 and x2 are numpy.array objects.

  • t (float, array, optional(default=1.0)) – t can be either a float representing the time-interval between observations, or a numpy array containing the time-stamps of the individual observations (Note: PyDaddy only supports uniformly spaced time-series, even when time-stamps are provided).

  • bins (int, optional(default=20)) – Number of bins for computing bin-wise averages of drift and diffusion (Binwise averages are used only for visualization.)

  • show_summary (bool, optional(default=True)) – If true, a summary text and summary figure will be shown.

  • Dt (int, optional(default=1)) – Subsampling factor for drift computation. When provided, the time-series will be sub-sampled by this factor while computing drift.

  • dt (int, optional(default=1)) – Subsampling factor for diffusion computation. When provided, the time-series will be sub-sampled by this factor while computing diffusion.

  • inc (float, optional(default=0.01)) – For scalar data, instead of specifying bins, the widths (increments) of the bins can also be provided.

  • inc_x (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_x is the increment in the x-dimension.

  • inc_y (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_y is the increment in the y-dimension.

  • n_trials (int, optional(default=1)) – Number of trials. For multiple trials, the concatenated time series of all trials is used.

Returns:

output – Daddy object which can be used for further analysis and visualization. See pydaddy.daddy.Daddy for details.

Return type:

pydaddy.daddy.Daddy
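To make the roles of Dt, dt and bins concrete, the following is a minimal pure-Python sketch of the textbook jump-moment estimator for scalar data. It is not PyDaddy's implementation: the function name binwise_jump_moments and the exact normalisation are assumptions made for illustration only.

```python
import math


def binwise_jump_moments(x, t=1.0, Dt=1, dt=1, bins=20):
    """Return (bin centres, drift, diffusion) for a scalar series x.

    Illustrative only (not PyDaddy code). Drift uses increments over Dt
    samples, diffusion uses squared increments over dt samples. Bins that
    receive no samples are reported as NaN.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # guard against a constant series

    def averaged(step, power):
        sums, counts = [0.0] * bins, [0] * bins
        for i in range(len(x) - step):
            # The maximum value is clamped into the last bin.
            b = min(int((x[i] - lo) / width), bins - 1)
            sums[b] += (x[i + step] - x[i]) ** power / (step * t)
            counts[b] += 1
        return [s / n if n else math.nan for s, n in zip(sums, counts)]

    centres = [lo + (k + 0.5) * width for k in range(bins)]
    return centres, averaged(Dt, 1), averaged(dt, 2)


# Noise-free decay x[k+1] = 0.9 * x[k] sampled every 0.1 time units,
# for which the estimated drift in each bin is close to -x.
x = [0.9 ** k for k in range(50)]
centres, drift, diffusion = binwise_jump_moments(x, t=0.1, bins=5)
```

Because the drift and diffusion loops take separate step sizes, the two estimates can be computed at different subsampling factors, which is the purpose of having both Dt and dt.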

pydaddy.characterize.load_sample_dataset(name)

Load one of the sample datasets. For more details on the datasets, see Sample Datasets.

Available data sets:

'fish-data-etroplus'
'cell-data-cellhopping'
'model-data-scalar-pairwise'
'model-data-scalar-ternary'
'model-data-vector-pairwise'
'model-data-vector-ternary'
Parameters:

name (str) – name of the data set

Returns:

  • data (list) – timeseries data

  • t (float, array) – timescale
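One possible way to combine load_sample_dataset with Characterize is sketched below. The helper characterize_sample and the SAMPLE_DATASETS tuple are hypothetical names introduced here, not part of PyDaddy. The import is deferred so that the name check can run even when PyDaddy is not installed.

```python
# Dataset names copied from the list above.
SAMPLE_DATASETS = (
    'fish-data-etroplus',
    'cell-data-cellhopping',
    'model-data-scalar-pairwise',
    'model-data-scalar-ternary',
    'model-data-vector-pairwise',
    'model-data-vector-ternary',
)


def characterize_sample(name, bins=20):
    """Load a sample dataset by name and return the resulting Daddy object.

    Hypothetical convenience wrapper, shown for illustration only.
    """
    if name not in SAMPLE_DATASETS:
        raise ValueError(f"unknown sample dataset: {name!r}")
    # Module path and signatures as documented in this reference.
    from pydaddy.characterize import Characterize, load_sample_dataset
    data, t = load_sample_dataset(name)
    return Characterize(data, t, bins=bins, show_summary=False)
```

With PyDaddy installed, characterize_sample('model-data-scalar-pairwise') would return an object on which the methods documented below can be called.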

pydaddy.Daddy

class pydaddy.daddy.Daddy(ddsde, **kwargs)

Bases: Preprocessing, Visualize

An object of this type is returned by pydaddy.characterize.Characterize. This is the main workhorse class of PyDaddy, and contains functionality to compute drift and diffusion, visualize results, and perform diagnostic tests. See the individual method documentation for more details.

autocorrelation(lags=1000)

Show the autocorrelation plot of the data.
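For readers unfamiliar with the quantity being plotted, here is a minimal sketch of a normalised sample autocorrelation function. It is an illustration under the usual textbook definition, not the code PyDaddy runs; the function below only shares the method's name and its lags argument.

```python
def autocorrelation(x, lags=1000):
    """Return the autocorrelation of x at lags 0..lags (capped at len(x) - 1).

    Illustrative only. Assumes x is not constant, since the variance is
    used as the normalising factor.
    """
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    variance = sum(c * c for c in centred)
    max_lag = min(lags, n - 1)
    return [
        sum(centred[i] * centred[i + k] for i in range(n - k)) / variance
        for k in range(max_lag + 1)
    ]


# A strictly alternating series is strongly anti-correlated at lag 1.
acf = autocorrelation([1.0, -1.0] * 50, lags=3)
```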

cross_diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)

Show an interactive figure of the cross-diffusion function (only for vector data). The bin-wise averaged estimates of the cross-diffusion will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:
  • limits (tuple, (default=None)) – If specified, sets the y-axis limits for the cross diffusion function. Useful to get a clearer view when there are outliers.

  • polar (bool, (default=False)) – If True, plot the cross diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).

  • **plot_text (dict) –

    To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

    For scalar analysis

    x_label : x axis label

    y_label : y axis label

    For vector analysis

    title1 : first plot title

    x_label1 : first plot x label

    y_label1 : first plot y label

    z_label1 : first plot z label

    title2 : second plot title

    x_label2 : second plot x label

    y_label2 : second plot y label

    z_label2 : second plot z label

diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)

Show an interactive figure of the diffusion function. The bin-wise averaged estimates of the diffusion will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:
  • limits (tuple, (default=None)) – If specified, sets the y-axis limits for the diffusion function. Useful to get a clearer view when there are outliers.

  • polar (bool, (default=False)) – If True, plot the diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).

  • **plot_text (dict) –

    To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

    For scalar analysis

    x_label : x axis label

    y_label : y axis label

    For vector analysis

    title1 : first plot title

    x_label1 : first plot x label

    y_label1 : first plot y label

    z_label1 : first plot z label

    title2 : second plot title

    x_label2 : second plot x label

    y_label2 : second plot y label

    z_label2 : second plot z label

drift(limits=None, polar=False, slider_timescales=None, **plot_text)

Show an interactive figure of the drift function. The bin-wise averaged estimates of the drift will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.

Parameters:
  • limits (tuple, (default=None)) – If specified, sets the y-axis limits for the drift function. Useful to get a clearer view when there are outliers.

  • polar (bool, (default=False)) – If True, plot the drift function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).

  • **plot_text (dict) –

    To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:

    For scalar analysis

    x_label : x axis label

    y_label : y axis label

    For vector analysis

    title1 : first plot title

    x_label1 : first plot x label

    y_label1 : first plot y label

    z_label1 : first plot z label

    title2 : second plot title

    x_label2 : second plot x label

    y_label2 : second plot y label

    z_label2 : second plot z label

export_data(filename=None, raw=False)

Returns a pandas dataframe containing the drift and diffusion values. Optionally, the data is also saved as a CSV file.

Parameters:
  • filename (str, optional(default=None)) – If provided, the data will be saved as a CSV at the given path. Else, a dataframe will be returned.

  • raw (bool, optional(default=False)) – If True, the drift and diffusion will be returned as raw, unbinned data. Otherwise (default), the drift and diffusion are returned as bin-wise averaged Kramers-Moyal coefficients.

Returns:

Pandas dataframe containing the estimated drift and diffusion coefficients.

Return type:

DataFrame

fit(function_name, order=None, threshold=0.05, alpha=0, tune=False, thresholds=None, library=None, plot=False)

Fit analytical expressions to drift/diffusion functions using sparse regression. By default, a polynomial with a specified maximum degree will be fitted. Alternatively, you can also provide a library of custom functions for fitting.

Parameters:
  • function_name (str,) – Name of the function to fit. Can be ‘F’ or ‘G’ for scalar; ‘F1’, ‘F2’, ‘G11’, ‘G22’, ‘G12’, ‘G21’ for vector

  • order (int,) – Order (maximum degree) of the polynomial to fit.

  • threshold (float, (default=0.05)) – Sparsification threshold

  • tune (bool, (default=False)) – If True, the sparsification threshold will be automatically set using cross-validation.

  • alpha (float, (default=0.0)) – Optional regularization term for ridge regression. Useful when data is too noisy, but has a side effect of shrinking the estimated coefficients when set to high values.

  • thresholds (list, (default=None)) – With tune=True, a list of thresholds to search over can optionally be provided. If not provided, the search range is chosen automatically as the range between the minimum and maximum coefficients in the initial fit.

  • library (list, (default=None)) – A custom library of non-polynomial functions can optionally be provided. If provided, the functions will be fitted as a sparse linear combination of the terms in the library.

Returns:

res – Object representing the fitted polynomial.

Return type:

fitters.Poly1D or fitters.Poly2D
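The interplay between order, threshold and alpha can be illustrated with a small sketch of sequentially thresholded least squares, a common way to perform sparse regression. Whether PyDaddy uses exactly this scheme is an assumption; the helpers solve_ridge and sparse_poly_fit are hypothetical names written in pure Python for illustration.

```python
def solve_ridge(design, y, alpha=0.0):
    """Solve (A^T A + alpha I) c = A^T y by Gauss-Jordan elimination."""
    n = len(design[0])
    # Augmented normal-equation matrix [A^T A + alpha I | A^T y].
    m = [
        [sum(r[i] * r[j] for r in design) + (alpha if i == j else 0.0)
         for j in range(n)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sparse_poly_fit(x, y, order=3, threshold=0.05, alpha=0.0):
    """Fit y ~ sum_p c_p x^p, repeatedly dropping terms with |c_p| < threshold.

    Returns coefficients in increasing order of degree; dropped terms are 0.
    """
    active = list(range(order + 1))
    while active:
        design = [[xi ** p for p in active] for xi in x]
        coeffs = solve_ridge(design, y, alpha)
        kept = [p for p, c in zip(active, coeffs) if abs(c) >= threshold]
        if kept == active:
            break  # nothing was dropped, so the fit has converged
        active = kept
    result = [0.0] * (order + 1)
    if active:
        for p, c in zip(active, coeffs):
            result[p] = c
    return result


# Recover the sparse cubic y = x - x^3 from noise-free samples.
xs = [-2 + 0.1 * i for i in range(41)]
ys = [xi - xi ** 3 for xi in xs]
fitted = sparse_poly_fit(xs, ys, order=3, threshold=0.05)
```

A larger threshold removes more terms and gives a sparser model, while a nonzero alpha stabilises the solve at the cost of shrinking the coefficients, as noted in the parameter description above.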

model_diagnostics(oversample=1)

Perform model self-consistency diagnostics. Generates a simulated time series with the same length and sampling interval as the original time series, and re-estimates the drift and diffusion from the simulated time series. The re-estimated drift and diffusion should match the original estimates.

Parameters:
  • oversample (int, (default=1)) – Factor by which to oversample while simulating the SDE. If provided, the SDE will be simulated at t_int / oversample and then subsampled to t_int. This is useful when t_int is large enough to cause large errors in the SDE simulation.

The following are plotted:

    • Histogram of the original time series overlaid with that of the simulated time series.

    • Drift and diffusion of the original time series overlaid with that of the simulated time series.
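The effect of oversample can be sketched with a plain Euler-Maruyama integrator that takes oversample sub-steps of size t_int / oversample and records only every oversample-th value. This is an illustration, not PyDaddy's simulator: the function name simulate_oversampled, the seed argument, and the convention dx = F(x) dt + sqrt(G(x)) dW are assumptions.

```python
import math
import random


def simulate_oversampled(drift, diffusion, t_int, timepoints,
                         x0=0.0, oversample=1, seed=None):
    """Simulate a scalar SDE and return `timepoints` samples spaced t_int apart.

    Illustrative only. `drift` and `diffusion` are callables of x, with
    `diffusion` assumed to return the (non-negative) squared noise amplitude.
    """
    rng = random.Random(seed)
    h = t_int / oversample  # internal integration step
    x, path = x0, [x0]
    for _sample in range(timepoints - 1):
        for _substep in range(oversample):
            noise = rng.gauss(0.0, 1.0)
            x += drift(x) * h + math.sqrt(diffusion(x) * h) * noise
        path.append(x)  # keep only every oversample-th value
    return path


# Noise-free decay dx = -x dt, whose exact solution at time 1 is exp(-1).
coarse = simulate_oversampled(lambda x: -x, lambda x: 0.0,
                              t_int=0.1, timepoints=11, x0=1.0)
fine = simulate_oversampled(lambda x: -x, lambda x: 0.0,
                            t_int=0.1, timepoints=11, x0=1.0, oversample=10)
```

In this noise-free case the oversampled path ends closer to exp(-1) than the coarse one, which is the kind of discretisation error that oversampling is meant to reduce.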

noise_diagnostics(loc=None)

Perform diagnostics on the noise-residual, to ensure that all assumptions for SDE estimation are met. Generates a plot with:

  • Distribution (1D or 2D histogram) of the residuals, and their QQ-plots against a theoretically expected Gaussian. The residual distribution is expected to be a Gaussian.

  • Autocorrelation of the residuals. The autocorrelation time should be close to 0.

  • Plot of the 2nd versus 4th jump moments. This plot should be a straight line. (Only for scalar data.)
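One common form of the moment check relies on the fact that zero-mean Gaussian increments satisfy E[z^4] = 3 (E[z^2])^2. The sketch below computes the corresponding ratio for a list of increments. Whether PyDaddy's plot uses exactly this comparison is an assumption, and gaussian_moment_ratio is a hypothetical helper name.

```python
import random


def gaussian_moment_ratio(increments):
    """Return K4 / (3 * K2**2), which is close to 1 for zero-mean Gaussian data.

    Illustrative only. K2 and K4 are the raw second and fourth sample moments.
    """
    n = len(increments)
    k2 = sum(v ** 2 for v in increments) / n
    k4 = sum(v ** 4 for v in increments) / n
    return k4 / (3.0 * k2 ** 2)


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 0.5) for _ in range(20000)]
two_valued = [1.0, -1.0] * 10  # clearly non-Gaussian increments

ratio_gaussian = gaussian_moment_ratio(gaussian)
ratio_two_valued = gaussian_moment_ratio(two_valued)
```

A ratio far from 1, as for the two-valued series, suggests that the increments are not Gaussian.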

parameters()

Get all given and assumed parameters used for the analysis

Returns:

params – all parameters, both given and assumed, that were used for the analysis

Return type:

dict, json

simulate(t_int, timepoints, x0=None)

Generate simulated time-series with the fitted SDE model.

Generates a simulated timeseries, with specified sampling time and duration, based on the SDE model discovered by PyDaddy. The drift and diffusion functions should be fitted using the fit() function before using simulate().

Parameters:
  • t_int (float) – Sampling time for the simulated time-series

  • timepoints (int) – Number of time-points to simulate

  • x0 (float (scalar) or list of two floats (vector), (default=None)) – Initial condition. If no value is passed, 0 ([0, 0] for vector) is taken as the initial condition.

Returns:

x – Simulated time series containing timepoints time points.

summary(start=0, end=1000, kde=True, tick_size=12, title_size=15, label_size=15, label_pad=8, n_ticks=3, ret_fig=False, **plot_text)

Print a summary of the data and show the summary plots. (This is the same summary plot produced by Characterize().)

Parameters:
  • start (int, (default=0)) – starting index, begin plotting timeseries from this point

  • end (int, (default=1000)) – end point, plots timeseries till this index

  • kde (bool, (default=True)) – if True, plot kde for histograms

  • title_size (int, (default=15)) – title font size

  • tick_size (int, (default=12)) – axis tick size

  • label_size (int, (default=15)) – label font size

  • label_pad (int, (default=8)) – axis label padding

  • n_ticks (int, (default=3)) – number of axis ticks

  • ret_fig (bool, (default=False)) – if True return figure object

  • **plot_text

    plots’ title and axis texts

    For scalar analysis summary plot:

    timeseries_title : title of timeseries plot

    timeseries_xlabel : x label of timeseries

    timeseries_ylabel : y label of timeseries

    drift_title : drift plot title

    drift_xlabel : drift plot x label

    drift_ylabel : drift plot ylabel

    diffusion_title : diffusion plot title

    diffusion_xlabel : diffusion plot x label

    diffusion_ylabel : diffusion plot y label

    For vector analysis summary plot:

    timeseries1_title : first timeseries plot title

    timeseries1_ylabel : first timeseries plot ylabel

    timeseries1_xlabel : first timeseries plot xlabel

    timeseries1_legend1 : first timeseries (Mx) legend label

    timeseries1_legend2 : first timeseries (My) legend label

    timeseries2_title : second timeseries plot title

    timeseries2_xlabel : second timeseries plot x label

    timeseries2_ylabel : second timeseries plot y label

    2dhist1_title : Mx 2d histogram title

    2dhist1_xlabel : Mx 2d histogram x label

    2dhist1_ylabel : Mx 2d histogram y label

    2dhist2_title : My 2d histogram title

    2dhist2_xlabel : My 2d histogram x label

    2dhist2_ylabel : My 2d histogram y label

    2dhist3_title : M 2d histogram title

    2dhist3_xlabel : M 2d histogram x label

    2dhist3_ylabel : M 2d histogram y label

    3dhist_title : 3d histogram title

    3dhist_xlabel : 3d histogram x label

    3dhist_ylabel : 3d histogram y label

    3dhist_zlabel : 3d histogram z label

    driftx_title : drift x plot title

    driftx_xlabel : drift x plot x label

    driftx_ylabel : drift x plot y label

    driftx_zlabel : drift x plot z label

    drifty_title : drift y plot title

    drifty_xlabel : drift y plot x label

    drifty_ylabel : drift y plot y label

    drifty_zlabel : drift y plot z label

    diffusionx_title : diffusion x plot title

    diffusionx_xlabel : diffusion x plot x label

    diffusionx_ylabel : diffusion x plot y label

    diffusionx_zlabel : diffusion x plot z label

    diffusiony_title : diffusion y plot title

    diffusiony_xlabel : diffusion y plot x label

    diffusiony_ylabel : diffusion y plot y label

    diffusiony_zlabel : diffusion y plot z label

Return type:

None, or figure

Raises:

ValueError – If start is greater than end