Usage Tips

  • When working with 1-D data stored as a 1-D array, remember to wrap the array in a list before passing it to PyDaddy, like so: pydaddy.Characterize([x], ...). Otherwise, PyDaddy will throw an error.

  • PyDaddy expects uniformly sampled time-series. If your dataset is sampled with irregular time intervals, resample the time-series to a uniform sampling interval before using PyDaddy (a library like traces will be useful for this).

  • The simulate() function (pydaddy.daddy.Daddy.simulate()) generates simulated time series from the SDE estimated by PyDaddy. Use it if you need to do advanced testing or diagnostics on simulated data.

  • When necessary, you can ‘hack’ the simulate() (pydaddy.daddy.Daddy.simulate()) function to use custom drift and diffusion functions instead of the fitted ones. To do this, assign appropriate functions to ddsde.F and ddsde.G (ddsde.F1, ddsde.F2, ddsde.G11, ddsde.G12, ddsde.G22 for vector data).

  • The fit() (pydaddy.daddy.Daddy.fit()) function has an alpha parameter, which is a ridge regularization parameter. This is useful when the data is noisy or has outliers. If you think fit() is overfitting the data, try a non-zero value for alpha and see if the fits improve (very high values, as high as 10e6 or 10e7, may often be required to see noticeable effects). Be aware that large values of alpha have the side effect of shrinking the estimated parameters.

  • If noise_diagnostics() (pydaddy.daddy.Daddy.noise_diagnostics()) suggests that the noise autocorrelation is too high, a straightforward workaround is to subsample the data until the noise correlation goes away. PyDaddy provides an easy way to do this: initialize Characterize() (pydaddy.characterize.Characterize) with parameters Dt=T, dt=T, where T is the autocorrelation time (in number of time-steps) rounded to the nearest integer. However, note that larger values of T can distort the estimated drift and diffusion functions. Specifically, when the sampling interval is too large, the estimated drift will be linear and the estimated diffusion will be quadratic, regardless of the shape of the actual drift and diffusion functions.

  • If the estimated drift is linear and the estimated diffusion is quadratic, the analysis results may not be reliable, and additional checks may be required: these results can appear spuriously when the sampling interval is too large [1].

  • Sometimes, model_diagnostics() (pydaddy.daddy.Daddy.model_diagnostics()) can incorrectly suggest that the estimated model is inconsistent. This often happens when the sampling time of the data is too large, and can be due to errors in the SDE simulation rather than a genuine model inconsistency. To tackle this, model_diagnostics() has an oversample parameter that specifies an oversampling factor, so that the simulation uses an integration timestep smaller than the sampling interval. See the function documentation (pydaddy.daddy.Daddy.model_diagnostics()) for more details.
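
The resampling and list-wrapping tips above can be sketched as follows. This is a minimal illustration, not PyDaddy functionality: it uses plain linear interpolation with NumPy rather than the traces library, the helper name resample_uniform and the toy signal are hypothetical, and the t keyword in the commented-out Characterize call is an assumption about the signature.

```python
import numpy as np


def resample_uniform(t, x, dt):
    """Linearly interpolate samples (t, x) onto a uniform grid with spacing dt.

    Hypothetical helper for illustration; not part of PyDaddy.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    order = np.argsort(t)  # np.interp needs increasing sample times
    t, x = t[order], x[order]
    # Number of grid points that fit between the first and last sample.
    n = int(np.floor((t[-1] - t[0]) / dt + 1e-9)) + 1
    t_uniform = t[0] + dt * np.arange(n)
    return t_uniform, np.interp(t_uniform, t, x)


# Irregularly sampled toy signal (synthetic, for illustration only).
rng = np.random.default_rng(0)
t_irregular = np.sort(rng.uniform(0.0, 10.0, size=500))
x_irregular = np.sin(t_irregular)

t_uniform, x_uniform = resample_uniform(t_irregular, x_irregular, dt=0.05)

# The uniformly sampled 1-D array is then passed wrapped in a list.
# Not executed here; the `t` keyword is an assumption:
#   import pydaddy
#   ddsde = pydaddy.Characterize([x_uniform], t=0.05)
```

Linear interpolation is only reasonable when the gaps in the original data are small compared to the time scale of the dynamics; for noisy data, the choice of resampling scheme can itself affect the estimated diffusion.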

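The idea behind simulating with custom drift and diffusion functions, and behind the oversample parameter, can be sketched with a stand-alone Euler-Maruyama integrator. This is an illustrative stand-in, not PyDaddy's own simulate() implementation, which may use a different integration scheme. The function simulate_sde and the example F and G are hypothetical, and G is assumed here to be the diffusion (squared noise amplitude), so the noise term uses its square root.

```python
import numpy as np


def simulate_sde(F, G, x0, dt, n_steps, oversample=1, seed=None):
    """Euler-Maruyama integration of dx = F(x) dt + sqrt(G(x)) dW.

    The internal integration step is dt / oversample, but only every
    oversample-th point is stored, so the output is sampled at interval dt.
    """
    rng = np.random.default_rng(seed)
    h = dt / oversample  # internal integration timestep
    x = np.empty(n_steps)
    x[0] = x0
    current = float(x0)
    for i in range(1, n_steps):
        for _ in range(oversample):
            dW = rng.normal(0.0, np.sqrt(h))  # Wiener increment over h
            # Clip G at zero so the square root stays real.
            current += F(current) * h + np.sqrt(max(G(current), 0.0)) * dW
        x[i] = current
    return x


# Hypothetical custom drift and diffusion functions (assumptions).
def F(x):
    return x - x**3  # bistable drift


def G(x):
    return 0.25  # constant diffusion


# Same sampling interval, but integrated with a 10x finer timestep.
x_sim = simulate_sde(F, G, x0=0.0, dt=0.1, n_steps=1000, oversample=10, seed=1)

# With a fitted PyDaddy object, the tip above would instead assign:
#   ddsde.F = F
#   ddsde.G = G
```

With oversample=1 the integration step equals the sampling interval, which is where discretization errors in the simulated series become more likely when that interval is large.
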
[1] Riera, R., & Anteneodo, C. (2010). Validation of drift and diffusion coefficients from experimental data. Journal of Statistical Mechanics: Theory and Experiment, 2010(04), P04020.