Package Documentation
pydaddy.Characterize
- class pydaddy.characterize.Characterize(data, t=1.0, Dt=1, dt=1, bins=None, inc=None, inc_x=None, inc_y=None, n_trials=1, show_summary=True, **kwargs)
Bases: object
Initialize a PyDaddy object for further analysis.
- Parameters:
data (list) – Time series data to be analysed. Use data = [x] for scalar data and data = [x1, x2] for vector data, where x, x1 and x2 are numpy arrays.
t (float, array, optional(default=1.0)) – t can be either a float representing the time-interval between observations, or a numpy array containing the time-stamps of the individual observations (Note: PyDaddy only supports uniformly spaced time-series, even when time-stamps are provided).
bins (int, optional(default=20)) – Number of bins for computing bin-wise averages of drift and diffusion. Bin-wise averages are used only for visualization.
show_summary (bool, optional(default=True)) – If True, a summary text and summary figure will be shown.
Dt (int, optional(default=1)) – Subsampling factor for drift computation. When provided, the time-series will be sub-sampled by this factor while computing drift.
dt (int, optional(default=1)) – Subsampling factor for diffusion computation. When provided, the time-series will be sub-sampled by this factor while computing diffusion.
inc (float, optional(default=0.01)) – For scalar data, instead of specifying bins, the widths (increments) of the bins can also be provided.
inc_x (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_x is the increment in the x-dimension.
inc_y (float, optional(default=0.1)) – For vector data, instead of specifying bins, the widths (increments) of the bins can also be provided. inc_y is the increment in the y-dimension.
n_trials (int, optional(default=1)) – Number of trials, for when the input is the concatenated time series of multiple trials.
- Returns:
output – Daddy object which can be used for further analysis and visualization. See pydaddy.daddy.Daddy for details.
- Return type:
pydaddy.daddy.Daddy
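The roles of bins, Dt and dt can be illustrated with a minimal pure-Python sketch of bin-wise conditional-moment estimates. This is not PyDaddy's implementation. It assumes that drift is the mean increment over a lag of Dt samples divided by the elapsed time, and that diffusion is the mean squared increment over a lag of dt samples divided by the elapsed time. The helpers simulate_ou and binned_moment are hypothetical names introduced only for this illustration.

```python
import math
import random


def simulate_ou(n, t_int, theta=1.0, sigma=0.5, seed=0):
    """Synthetic test data: Euler-Maruyama path of dx = -theta*x dt + sigma dW."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        dw = rng.gauss(0.0, 1.0) * math.sqrt(t_int)
        x.append(x[-1] - theta * x[-1] * t_int + sigma * dw)
    return x


def binned_moment(x, t_int, lag, power, bins):
    """Bin-wise mean of (x[i + lag] - x[i]) ** power / (lag * t_int)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    sums, counts = [0.0] * bins, [0] * bins
    for i in range(len(x) - lag):
        # Clamp so that the maximum value falls into the last bin.
        b = min(int((x[i] - lo) / width), bins - 1)
        sums[b] += (x[i + lag] - x[i]) ** power / (lag * t_int)
        counts[b] += 1
    centers = [lo + (b + 0.5) * width for b in range(bins)]
    # Empty bins are reported as NaN.
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return centers, means


t_int = 0.01
x = simulate_ou(n=50_000, t_int=t_int)
# Here lag plays the role assumed for Dt (drift) and dt (diffusion).
centers, drift = binned_moment(x, t_int, lag=1, power=1, bins=10)
_, diffusion = binned_moment(x, t_int, lag=1, power=2, bins=10)
```

For this synthetic series the diffusion estimates in well-populated bins should lie close to sigma ** 2, and the drift estimates should decrease roughly linearly with x.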
- pydaddy.characterize.load_sample_dataset(name)
Load one of the sample datasets. For more details on the datasets, see Sample Datasets.
Available data sets:
'fish-data-etroplus'
'cell-data-cellhopping'
'model-data-scalar-pairwise'
'model-data-scalar-ternary'
'model-data-vector-pairwise'
'model-data-vector-ternary'
- Parameters:
name (str) – name of the data set
- Returns:
data (list) – timeseries data
t (float, array) – timescale
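A possible way to combine the two entry points documented above is sketched below. The wrapper characterize_sample and the constant SAMPLE_DATASETS are hypothetical helpers, not part of PyDaddy. The import paths and argument names follow the signatures listed on this page, and the PyDaddy import is deferred so that the name check can run even when the package is not installed.

```python
SAMPLE_DATASETS = (
    "fish-data-etroplus",
    "cell-data-cellhopping",
    "model-data-scalar-pairwise",
    "model-data-scalar-ternary",
    "model-data-vector-pairwise",
    "model-data-vector-ternary",
)


def characterize_sample(name, bins=20):
    """Load a sample dataset by name and return the resulting Daddy object."""
    if name not in SAMPLE_DATASETS:
        raise ValueError(f"unknown sample dataset: {name!r}")
    # Imported lazily: only needed once the name has been validated.
    from pydaddy.characterize import Characterize, load_sample_dataset

    data, t = load_sample_dataset(name)
    # show_summary=False suppresses the summary text and figure.
    return Characterize(data, t, bins=bins, show_summary=False)
```

Calling characterize_sample('model-data-scalar-pairwise') would then give an object on which the Daddy methods described below can be used.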
pydaddy.Daddy
- class pydaddy.daddy.Daddy(ddsde, **kwargs)
Bases: Preprocessing, Visualize
An object of this type is returned by pydaddy.characterize.Characterize. This is the main workhorse class of PyDaddy, and contains functionality to compute drift and diffusion, visualize results, and perform diagnostic tests. See the individual method documentation for more details.
- autocorrelation(lags=1000)
Show the autocorrelation plot of the data.
- cross_diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)
Show an interactive figure of the cross-diffusion function (only for vector data). The bin-wise averaged estimates of the cross-diffusion will be shown. If a polynomial has already been fitted with the fit() function, a surface plot of the fitted function will also be shown.
- Parameters:
limits (tuple, (default=None)) – If specified, sets the y-axis limits for the cross diffusion function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the cross diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:
- For scalar analysis
x_label : x axis label
y_label : y axis label
- For vector analysis
title1 : first plot title
x_label1 : first plot x label
y_label1 : first plot y label
z_label1 : first plot z label
title2 : second plot title
x_label2 : second plot x label
y_label2 : second plot y label
z_label2 : second plot z label
- diffusion(slider_timescales=None, limits=None, polar=False, **plot_text)
Show an interactive figure of the diffusion function. The bin-wise averaged estimates of the diffusion will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.
- Parameters:
limits (tuple, (default=None)) – If specified, sets the y-axis limits for the diffusion function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the diffusion function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:
- For scalar analysis
x_label : x axis label
y_label : y axis label
- For vector analysis
title1 : first plot title
x_label1 : first plot x label
y_label1 : first plot y label
z_label1 : first plot z label
title2 : second plot title
x_label2 : second plot x label
y_label2 : second plot y label
z_label2 : second plot z label
- drift(limits=None, polar=False, slider_timescales=None, **plot_text)
Show an interactive figure of the drift function. The bin-wise averaged estimates of the drift will be shown. If a polynomial has already been fitted with the fit() function, a line (scalar)/surface (vector) plot of the fitted function will also be shown.
- Parameters:
limits (tuple, (default=None)) – If specified, sets the y-axis limits for the drift function. Useful to get a clearer view when there are outliers.
polar (bool, (default=False)) – If True, plot the drift function only within a unit circle. Useful to get a better visualization in situations where |M| has an upper bound. (Used only in vector case).
**plot_text (dict) –
To customize the captions, labels and layout of the plot, plot parameters can be passed as a dict. Available options are given below:
- For scalar analysis
x_label : x axis label
y_label : y axis label
- For vector analysis
title1 : first plot title
x_label1 : first plot x label
y_label1 : first plot y label
z_label1 : first plot z label
title2 : second plot title
x_label2 : second plot x label
y_label2 : second plot y label
z_label2 : second plot z label
- export_data(filename=None, raw=False)
Returns a pandas dataframe containing the drift and diffusion values. Optionally, the data is also saved as a CSV file.
- Parameters:
filename (str, optional(default=None)) – If provided, the data will be saved as a CSV at the given path. Else, a dataframe will be returned.
raw (bool, optional(default=False)) – If True, the drift and diffusion will be returned as raw, unbinned data. Otherwise (default), drift and diffusion are returned as bin-wise averaged Kramers-Moyal coefficients.
- Returns:
Pandas dataframe containing the estimated drift and diffusion coefficients.
- Return type:
DataFrame
- fit(function_name, order=None, threshold=0.05, alpha=0, tune=False, thresholds=None, library=None, plot=False)
Fit analytical expressions to drift/diffusion functions using sparse regression. By default, a polynomial with a specified maximum degree will be fitted. Alternatively, you can also provide a library of custom functions for fitting.
- Parameters:
function_name (str) – Name of the function to fit. Can be ‘F’ or ‘G’ for scalar; ‘F1’, ‘F2’, ‘G11’, ‘G22’, ‘G12’, ‘G21’ for vector.
order (int) – Order (maximum degree) of the polynomial to fit.
threshold (float, (default=0.05)) – Sparsification threshold
tune (bool, (default=False)) – If True, the sparsification threshold will be automatically set using cross-validation.
alpha (float, (default=0.0)) – Optional regularization term for ridge regression. Useful when data is too noisy, but has a side effect of shrinking the estimated coefficients when set to high values.
thresholds (list, (default=None)) – With tune=True, a list of thresholds to search over can optionally be provided. If not present, this will be chosen automatically as the range between the minimum and maximum coefficients in the initial fit.
library (list, (default=None)) – A custom library of non-polynomial functions can optionally be provided. If provided, the function will be fitted as a sparse linear combination of the terms in the library.
- Returns:
res – Object representing the fitted polynomial.
- Return type:
fitters.Poly1D or fitters.Poly2D
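To make the roles of order, threshold and alpha concrete, here is a minimal sketch of one common form of thresholded sparse regression: sequentially thresholded least squares with an optional ridge term. Whether PyDaddy uses exactly this procedure is an assumption, and solve, ridge_fit and sparse_poly_fit are hypothetical helpers written in pure Python for illustration.

```python
def solve(a, b):
    """Solve the square system a @ z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ridge_fit(xs, ys, powers, alpha):
    """Least squares on the monomials x**p (p in powers) with ridge penalty alpha."""
    a = [
        [sum(x ** (p + q) for x in xs) + (alpha if p == q else 0.0) for q in powers]
        for p in powers
    ]
    b = [sum(y * x ** p for x, y in zip(xs, ys)) for p in powers]
    return solve(a, b)


def sparse_poly_fit(xs, ys, order, threshold=0.05, alpha=0.0):
    """Refit repeatedly, dropping terms whose coefficient magnitude is below threshold."""
    active = list(range(order + 1))
    coef = []
    while active:
        coef = ridge_fit(xs, ys, active, alpha)
        kept = [p for p, c in zip(active, coef) if abs(c) >= threshold]
        if kept == active:
            break
        active = kept
    full = [0.0] * (order + 1)  # coefficients of x**0 .. x**order
    for p, c in zip(active, coef):
        full[p] = c
    return full


# Noise-free samples of a cubic drift, x - x**3.
xs = [-1.5 + 3.0 * i / 200 for i in range(201)]
ys = [x - x ** 3 for x in xs]
coefficients = sparse_poly_fit(xs, ys, order=3, threshold=0.05)
```

In this sketch, a larger threshold removes more terms, while a positive alpha stabilises the fit at the cost of shrinking the surviving coefficients, as noted in the parameter description above.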
- model_diagnostics(oversample=1)
Perform model self-consistency diagnostics. Generates a simulated time series with the same length and sampling interval as the original time series, and re-estimates the drift and diffusion from the simulated time series. The re-estimated drift and diffusion should match the original estimates.
- Parameters:
oversample (int, (default=1)) – Factor by which to oversample while simulating the SDE. If provided, the SDE will be simulated at t_int / oversample and then subsampled to t_int. This is useful when t_int is large enough to cause large errors in the SDE simulation.
The following are plotted:
Histogram of the original time series overlaid with that of the simulated time series.
Drift and diffusion of the original time series overlaid with that of the simulated time series.
- noise_diagnostics(loc=None)
Perform diagnostics on the noise residual, to ensure that all assumptions for SDE estimation are met. Generates a plot with:
Distribution (1D or 2D histogram) of the residuals, and their QQ-plots against a theoretically expected Gaussian. The residual distribution is expected to be a Gaussian.
Autocorrelation of the residuals. The autocorrelation time should be close to 0.
Plot of the 2nd versus 4th jump moments. This plot should be a straight line. (Only for scalar data.)
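Two of these checks can be sketched numerically. The example below is an illustration only, not PyDaddy's code: it uses synthetic Gaussian numbers as a stand-in for real residuals and computes the lag-1 autocorrelation (expected near 0) and the ratio of the fourth central moment to the squared variance (3 for a Gaussian, which is presumably the relation behind the 2nd-versus-4th moment plot). The function names are hypothetical.

```python
import random


def lag1_autocorrelation(r):
    """Sample autocorrelation of the sequence r at lag 1."""
    n = len(r)
    mean = sum(r) / n
    var = sum((v - mean) ** 2 for v in r) / n
    cov = sum((r[i] - mean) * (r[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


def moment_ratio(r):
    """Fourth central moment divided by the squared variance (3 for a Gaussian)."""
    n = len(r)
    mean = sum(r) / n
    m2 = sum((v - mean) ** 2 for v in r) / n
    m4 = sum((v - mean) ** 4 for v in r) / n
    return m4 / m2 ** 2


rng = random.Random(1)
# Stand-in for real residuals: independent standard Gaussian samples.
residuals = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
acf1 = lag1_autocorrelation(residuals)
ratio = moment_ratio(residuals)
```

Residuals that are correlated or heavy-tailed would show up as an acf1 clearly away from 0 or a ratio clearly away from 3.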
- parameters()
Get all given and assumed parameters used for the analysis
- Returns:
params – all given and assumed parameters used for the analysis
- Return type:
dict, json
- simulate(t_int, timepoints, x0=None)
Generate simulated time-series with the fitted SDE model.
Generates a simulated time series, with specified sampling time and duration, based on the SDE model discovered by PyDaddy. The drift and diffusion functions should be fitted using the fit() function before using simulate().
- Parameters:
t_int (float) – Sampling time for the simulated time-series
timepoints (int) – Number of time-points to simulate
x0 (float (scalar) or list of two floats (vector), (default=None)) – Initial condition. If no value is passed, 0 ([0, 0] for vector) is taken as the initial condition.
- Returns:
x – Simulated time series containing timepoints time points.
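The kind of simulation described here can be sketched with a plain Euler-Maruyama integrator for scalar data. This is a minimal illustration under stated assumptions, not PyDaddy's implementation. It assumes the diffusion function is the squared noise amplitude, so the noise term uses its square root. simulate_sde is a hypothetical helper, and its oversample argument mirrors the idea described for model_diagnostics().

```python
import math
import random


def simulate_sde(drift, diffusion, t_int, timepoints, x0=0.0, oversample=1, seed=None):
    """Euler-Maruyama path sampled every t_int, integrated with step t_int / oversample."""
    rng = random.Random(seed)
    h = t_int / oversample
    x = x0
    path = [x]
    for _ in range(timepoints - 1):
        # Take `oversample` fine steps, then record one sample.
        for _ in range(oversample):
            noise = math.sqrt(diffusion(x) * h) * rng.gauss(0.0, 1.0)
            x = x + drift(x) * h + noise
        path.append(x)
    return path


# Illustrative model: cubic drift with constant diffusion.
path = simulate_sde(
    drift=lambda x: x - x ** 3,
    diffusion=lambda x: 0.1,
    t_int=0.01,
    timepoints=1000,
    x0=0.0,
    seed=0,
)
```

The diffusion function must be non-negative wherever the path goes, otherwise the square root fails. Increasing oversample reduces the discretization error when t_int is large.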
- summary(start=0, end=1000, kde=True, tick_size=12, title_size=15, label_size=15, label_pad=8, n_ticks=3, ret_fig=False, **plot_text)
Print a summary of the data and show the summary plots. (This is the same summary plot produced by Characterize().)
- Parameters:
start (int, (default=0)) – starting index, begin plotting timeseries from this point
end (int, (default=1000)) – end point, plots timeseries till this index
kde (bool, (default=True)) – if True, plot kde for histograms
title_size (int, (default=15)) – title font size
tick_size (int, (default=12)) – axis tick size
label_size (int, (default=15)) – label font size
label_pad (int, (default=8)) – axis label padding
n_ticks (int, (default=3)) – number of axis ticks
ret_fig (bool, (default=False)) – if True, return the figure object
**plot_text –
Titles and axis labels of the plots.
For scalar analysis summary plot:
timeseries_title : title of timeseries plot
timeseries_xlabel : x label of timeseries
timeseries_ylabel : y label of timeseries
drift_title : drift plot title
drift_xlabel : drift plot x label
drift_ylabel : drift plot y label
diffusion_title : diffusion plot title
diffusion_xlabel : diffusion plot x label
diffusion_ylabel : diffusion plot y label
For vector analysis summary plot:
timeseries1_title : first timeseries plot title
timeseries1_ylabel : first timeseries plot y label
timeseries1_xlabel : first timeseries plot x label
timeseries1_legend1 : first timeseries (Mx) legend label
timeseries1_legend2 : first timeseries (My) legend label
timeseries2_title : second timeseries plot title
timeseries2_xlabel : second timeseries plot x label
timeseries2_ylabel : second timeseries plot y label
2dhist1_title : Mx 2d histogram title
2dhist1_xlabel : Mx 2d histogram x label
2dhist1_ylabel : Mx 2d histogram y label
2dhist2_title : My 2d histogram title
2dhist2_xlabel : My 2d histogram x label
2dhist2_ylabel : My 2d histogram y label
2dhist3_title : M 2d histogram title
2dhist3_xlabel : M 2d histogram x label
2dhist3_ylabel : M 2d histogram y label
3dhist_title : 3d histogram title
3dhist_xlabel : 3d histogram x label
3dhist_ylabel : 3d histogram y label
3dhist_zlabel : 3d histogram z label
driftx_title : drift x plot title
driftx_xlabel : drift x plot x label
driftx_ylabel : drift x plot y label
driftx_zlabel : drift x plot z label
drifty_title : drift y plot title
drifty_xlabel : drift y plot x label
drifty_ylabel : drift y plot y label
drifty_zlabel : drift y plot z label
diffusionx_title : diffusion x plot title
diffusionx_xlabel : diffusion x plot x label
diffusionx_ylabel : diffusion x plot y label
diffusionx_zlabel : diffusion x plot z label
diffusiony_title : diffusion y plot title
diffusiony_xlabel : diffusion y plot x label
diffusiony_ylabel : diffusion y plot y label
diffusiony_zlabel : diffusion y plot z label
- Return type:
None, or figure
- Raises:
ValueError – If start is greater than end