Version changes
2.0.0
This blog post recapitulates the main changes in the andi_datasets library. These changes have two goals: to simplify and standardize access to the different available diffusion models and, most importantly, to include the diffusion models that will be considered during the second AnDi challenge.
Name change: `andi-datasets` is now `andi_datasets`
Using a hyphen in the library's name carried some problems, mainly due to how pip and python deal with it. In our case, to install the library you had to use `pip install andi-datasets` (sadly, the package `andi` already exists…), but to import the package you had to call `import andi`. This was inconsistent and carried some other problems, mostly related to the nbdev package (see below). That is why we have changed the name of the package, such that it is now found on pip as `pip install andi_datasets` and is imported in python via `import andi_datasets`.
Library structure
One of the main changes is that we have switched to an nbdev-style library, where all the code is developed in notebooks and then compiled into `.py` files via the nbdev compiler. In terms of library usage, nothing has really changed (aside from the changes listed below), but we find that this way of developing code will help us maintain a better package and make it easier to add new features.
Then, we have restructured the whole package to accommodate the new diffusion models as well as to standardize the use of the library. Here is a little scheme of how the library is organized:
In summary, the classes `datasets_XXX` are used to generate, save and load the trajectories produced by the classes `models_XXX`, which store the diffusion models, either theoretical or phenomenological. The former were used for the AnDi 1 challenge; you can find more details about them in the challenge's paper or in this updated notebook tutorial. The latter are the basis of the AnDi 2022 challenge and are simulated by means of fractional Brownian motion plus some extra tweaks. You can start playing around with these models and learn from them via this notebook tutorial.
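To give an idea of the fractional Brownian motion underlying the phenomenological models, here is a minimal numpy sketch (not the library's implementation) that samples fBm exactly from its covariance; the anomalous exponent relates to the Hurst exponent as alpha = 2H:

```python
import numpy as np

def fbm_trajectory(T=100, H=0.5, seed=0):
    """Sample a 1D fractional Brownian motion of length T with Hurst exponent H.

    Uses the exact fBm covariance, cov(s, t) = 0.5 * (s^2H + t^2H - |s - t|^2H),
    and a Cholesky factorization of the covariance matrix.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1, dtype=float)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov)         # exact but O(T^3)
    return L @ rng.standard_normal(T)   # correlated Gaussian sample
```

This Cholesky approach is only practical for short trajectories; fast generators typically rely on circulant embedding instead.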
On the other hand, we have created a completely new class that gathers the generators for the various challenge datasets. This allows you to generate datasets similar to the ones that will be used in the challenge. Note that the one for Challenge 2 is subject to changes!
We have also created a new class, called `analysis`, that gives you access to common analysis methods for diffusion trajectories, such as MSD-based fittings of the diffusion coefficient and anomalous exponent, calculations of the velocity autocorrelation function, and more.
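As an illustration of what an MSD-based fit does (a conceptual sketch, not the `analysis` class API), one can estimate the anomalous exponent from the log-log slope of the time-averaged MSD:

```python
import numpy as np

def tamsd(traj, max_lag):
    """Time-averaged mean squared displacement for lags 1..max_lag.

    traj: array of shape (T,) or (T, d).
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim == 1:
        traj = traj[:, None]
    return np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                     for lag in range(1, max_lag + 1)])

def fit_alpha(traj, max_lag=10):
    """Estimate the anomalous exponent as the log-log slope of the TA-MSD."""
    lags = np.arange(1, max_lag + 1)
    msd = tamsd(traj, max_lag)
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha
```

For ordinary Brownian motion this estimate comes out close to 1, while a ballistic trajectory gives 2.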
Lastly, we have organized the different auxiliary functions in three files:
- `utils_videos` allows you to merge our library with `deeptrack`, so that you can generate experimentally realistic videos from the trajectories you create with `andi_datasets`. A tutorial about it can be found in the Tutorials tab.
- `utils_trajectories` gathers all functions related to the creation of trajectories.
- `utils_challenge` gathers all functions needed to correctly prepare the trajectories and their labels for use in the AnDi Challenge. It also contains the metrics and the evaluators for the second challenge.
Accessing diffusion models
To be fair, accessing the diffusion models directly in the last version was a bit messy. That's what new versions are here for! Let's focus first on the theoretical models. They can now simply be accessed from the `models_theory` class. Their inputs stay the same as before (`alpha` for the anomalous exponent and `T` for the length of the trajectory), and we have added an extra variable to easily set the dimension of the diffusion. To give an example, a 3D ATTM trajectory can now be easily created using:
from andi_datasets.models_theory import models_theory
trajectory = models_theory().attm(alpha = 0.8, T = 10, dimension = 3)
Again, more examples of these theoretical models are given in their corresponding tutorial notebook.
For the new phenomenological models, it works the same! However, in this case we have a few more parameters than just the anomalous exponent. You can explore them here.
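To give a flavor of the kind of phenomenology these models add on top of plain diffusion (an illustrative sketch, with made-up names and parameters, not the library's API), here is a 1D walker whose diffusion coefficient switches between two states at random times:

```python
import numpy as np

def two_state_trajectory(T=200, Ds=(1.0, 0.05), p_switch=0.02, seed=0):
    """1D random walk whose diffusion coefficient switches between two states.

    At every step the walker changes state with probability p_switch;
    displacements are Gaussian with variance 2 * D (time step dt = 1).
    Returns the positions and the state label at each step.
    """
    rng = np.random.default_rng(seed)
    state = np.zeros(T, dtype=int)
    for t in range(1, T):
        state[t] = state[t - 1] ^ (rng.random() < p_switch)  # flip state or keep it
    steps = rng.standard_normal(T) * np.sqrt(2 * np.asarray(Ds)[state])
    return np.cumsum(steps), state
```

The actual phenomenological models combine this kind of state switching with fractional Brownian motion and further effects such as trapping and confinement.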
Other changes
- Variable name change in `models_theory().create_dataset`: `N` is now `N_model`. This makes it clearer that `N_model` refers to the number of trajectories per model, while we use `N` for the total number of trajectories in other functions.
- Corrected how noise is applied in Task 3. The noise is now added after the segmentation, which ensures that two subsequent segments don't have different noises.
- Added an extra variable in `datasets_challenge().challenge_theory_dataset`, `return_noise`, which, if `True`, makes the function also output the noise amplitudes added to each trajectory. This will help users get all the information they need when creating their datasets.
- Changed how noise is applied in `models_theory.challenge_theory_dataset`, such that all components of trajectories with dimension 2 and 3 have the same noise amplitude.
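For illustration (a sketch of the behavior, not the library's code), the last change amounts to drawing a single noise amplitude per trajectory and applying it to every spatial component:

```python
import numpy as np

rng = np.random.default_rng(0)
traj = np.cumsum(rng.standard_normal((100, 2)), axis=0)  # clean 2D trajectory
sigma = rng.uniform(0.1, 1.0)                            # one amplitude per trajectory
noisy = traj + sigma * rng.standard_normal(traj.shape)   # same sigma on x and y
```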