datasets_challenge

ANDI 1 challenge (theory)


source

challenge_theory_dataset

 challenge_theory_dataset (N:numpy.ndarray|int=1000, max_T:int=1000,
                           min_T:int=10, tasks:list|int=[1, 2, 3],
                           dimensions:list|int=[1, 2, 3],
                           load_dataset:{'False','True'}=False,
                           save_dataset:{'False','True'}=False,
                           path_datasets:str='',
                           load_labels:{'False','True'}=True,
                           load_trajectories:{'False','True'}=False,
                           save_trajectories:{'False','True'}=False,
                           path_trajectories:str='datasets/',
                           N_save:int=1000, t_save:int=1000,
                           return_noise:{'False','True'}=False)

Creates a dataset similar to the one given in the ANDI 1 challenge. Check the webpage of the challenge for more details. The default values are similar to the ones used to generate the available dataset.

The function returns 6 variables: three for the trajectories (one per task) and three for the corresponding labels. Each variable is a list of three lists, one per dimension in ascending order. If one of the tasks/dimensions was not calculated, the corresponding list will be empty.

See the tutorials in our Github repository to learn about this function.

Type Default Details
N numpy.ndarray | int 1000 Number of trajectories per class (i.e. size = # models x # classes). If int, all classes have the same number of trajectories.
max_T int 1000 Maximum length of the trajectories in the dataset.
min_T int 10 Minimum length of the trajectories in the dataset.
tasks list | int [1, 2, 3] Task(s) of the ANDI challenge I for which datasets will be generated.
dimensions list | int [1, 2, 3] Dimension(s) for which trajectories will be generated. Three possible values: 1, 2 and 3.
load_dataset {‘False’, ‘True’} False If True, the module loads existing datasets from the files task{}.txt and ref{}.txt.
save_dataset {‘False’, ‘True’} False If True, the module saves the datasets in a .txt following the competition format.
path_datasets str '' Path from which to load the dataset.
load_labels {‘False’, ‘True’} True If False, only loads trajectories and avoids the files refX.txt.
load_trajectories {‘False’, ‘True’} False If True, the module loads the trajectories of an .h5 file.
save_trajectories {‘False’, ‘True’} False If True, the module saves a .h5 file for each model considered, with N_save trajectories and T = T_save.
path_trajectories str datasets/ Path from where to load trajectories.
N_save int 1000 Number of trajectories to save for each exponent/model. Advice: at the beginning,
save a big dataset (i.e. with the default t_save and N_save), which allows you to load any
other combination of T and N.
t_save int 1000 Length of the trajectories to be saved. See comments on N_save.
return_noise {‘False’, ‘True’} False If True, returns the amplitudes of the noises added to the trajectories.
Returns multiple Xn (lists): trajectories
Yn (lists): labels
loc_noise_tn (lists): localization noise amplitudes
diff_noise_tn (lists): variance of the diffusion noise
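The nested-list layout of the returned variables can be illustrated with mock data (these values are invented for illustration and are not produced by the library):

```python
# Mock illustration of the structure returned by challenge_theory_dataset:
# one variable per task (X1, X2, X3 for trajectories; Y1, Y2, Y3 for labels),
# each holding three lists, one per dimension (1D, 2D, 3D).
X1 = [[[0.0, 0.1, 0.3]],   # 1D trajectories for task 1 (here, a single short one)
      [],                  # 2D not requested -> empty list
      []]                  # 3D not requested -> empty list
Y1 = [[0.7], [], []]       # matching labels (e.g. an anomalous exponent)

# Trajectories and labels for a given dimension share an index:
for dim_idx, (trajs, labs) in enumerate(zip(X1, Y1)):
    for traj, lab in zip(trajs, labs):
        print(f'dim {dim_idx + 1}: length {len(traj)}, label {lab}')
```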

ANDI 2 challenge (phenom)


source

_defaults_andi2

 _defaults_andi2 ()

This class defines the default values set for the ANDI 2 challenge.


source

_get_dic_andi2

 _get_dic_andi2 (model)

Given the number label of a diffusion model, returns a default dictionary of the model’s parameters to be fed to create_dataset. The numeration is as follows: 1: single state; 2: N-state; 3: immobilization; 4: dimerization; 5: confinement.

Type Details
model int in [1, 5] Number of the diffusion model
Returns dictionary Dictionary containing the default parameters for ANDI 2 of the indicated model.
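The numeration above can be written down as a small lookup helper. The model names used here are assumptions taken from the messages printed by challenge_phenom_dataset in the examples further down (“single_state”, “multi_state”, …); they are not an official library constant:

```python
# Assumed mapping from the ANDI 2 model numeration to model names,
# based on the output of challenge_phenom_dataset shown in the examples.
ANDI2_MODELS = {
    1: 'single_state',
    2: 'multi_state',
    3: 'immobile_traps',
    4: 'dimerization',
    5: 'confinement',
}

def model_name(model: int) -> str:
    """Return the name of a diffusion model given its number label (1-5)."""
    if model not in ANDI2_MODELS:
        raise ValueError(f'model must be in 1..5, got {model}')
    return ANDI2_MODELS[model]
```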

source

challenge_phenom_dataset

 challenge_phenom_dataset (experiments=5, dics=None, repeat_exp=True,
                           num_fovs=1, return_timestep_labs=False,
                           save_data=False, path='data/', prefix='',
                           get_video=False, num_vip=None,
                           get_video_masks=False, files_reorg=False,
                           path_reorg='ref/', save_labels_reorg=False,
                           delete_raw=False)

Creates a dataset with the same structure as the ones given in the ANDI 2 challenge. Default values for the various diffusion models have been set so as to be in the same ranges as the ones expected for the challenge. For details, check the ANDI 2 challenge webpage (soon).

This function will generate as many experiments (each associated with one of the diffusion models) as requested. There are two ways of defining them:
- Give the number of experiments to create (plus optional parameters such as repeat_exp). The diffusion parameters are then taken from the default values in datasets_phenom._defaults_andi2.
- Feed a list of dictionaries (dics) from which the data will be generated.
For each experiment, as many fields of view as wanted can be generated.

Type Default Details
experiments int 5 - if int: number of experiments to generate. Each experiment is
generated from one of the available diffusion models.
- if list: diffusion models to generate (numeration starting at 1!).
dics NoneType None If given, uses this to set the parameters of the experiments.
Must be of length equal to experiments.
This overrides any info about chosen models, as the model is set by the dictionary.
repeat_exp bool True Does not enter into play if experiments is a list.
If True: picks the diffusion model at random from the pool.
If False: picks the diffusion models in order from the pool.
num_fovs int 1 Number of field of views to get trajectories from in each experiment.
return_timestep_labs bool False If True, the output trajectory dataframes also contain the labels alpha, D and state at each time step.
save_data bool False If True, saves all pertinent data.
path str data/ Path where to store the data.
prefix str '' Extra prefix that can be added in front of the files’ names.
get_video bool False If true, get as output the videos generated with Deeptrack for the generated datasets (see utils_videos for details).
num_vip NoneType None Number of VIP highlighted in the videos.
get_video_masks bool False If True, get masks of videos
files_reorg bool False If True, this function also creates a folder named path_reorg inside path with the same data, reorganized à la ANDI 2 challenge.
path_reorg str ref/ Folder where the reorganized dataset will be created
save_labels_reorg bool False Whether to also save the labels in the reorganized dataset. This is needed if you want to create a reference dataset for the scoring program.
No need if you are just creating data to predict.
delete_raw bool False If True, deletes the raw dataset so that only the reorganized one is maintained.
Returns tuple - trajs_out:
list of length (experiments x num_fovs). Each element is a dataframe
containing the trajectories of a particular experiment/FOV, in order of
generation (i.e. [exp1_fov1, exp1_fov2, …, exp2_fov1, …]).
If return_timestep_labs = True, the dataframes also contain the labels
at each time step.
- labels_traj_out:
list of the same length as trajs_out containing the labels of the
corresponding trajectories. Each element contains a list with the
labels of each trajectory, following the scheme:
[idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, …, state_N]
- labels_ens_out:
list of the same length as trajs_out containing the ensemble labels of
the given experiment. See the description of the output matrix in
utils_challenge._extract_ensemble()

This function generates trajectory datasets like the ones considered in the ANDI 2 challenge. It is based on models_phenom.create_dataset but additionally:

  • Applies a field of view (FOV).
  • Adds localization noise.
  • Smooths the labeling of trajectories to a minimum segment length of 5.
  • Extracts ensemble properties.
  • Generates videos, if asked.
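The label smoothing step can be illustrated with a conceptual sketch. This is not the library's implementation, only a minimal version of the idea (runs of states shorter than the minimum length are absorbed into a neighbouring segment):

```python
# Conceptual sketch of smoothing a per-timestep state sequence so that
# no segment is shorter than min_len: short runs are absorbed into the
# preceding segment (or the following one, at the start of the sequence).
def smooth_labels(states, min_len=5):
    states = list(states)
    changed = True
    while changed:
        changed = False
        # find runs as (start, length) pairs
        runs, start = [], 0
        for i in range(1, len(states) + 1):
            if i == len(states) or states[i] != states[start]:
                runs.append((start, i - start))
                start = i
        for run_start, length in runs:
            if length < min_len and len(runs) > 1:
                # absorb the short run into a neighbouring segment
                fill = states[run_start - 1] if run_start > 0 else states[run_start + length]
                states[run_start:run_start + length] = [fill] * length
                changed = True
                break
    return states
```

For instance, a 2-step excursion into state 1 inside a long state-0 trajectory is relabeled as state 0 throughout.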

Outputs:

  • trajs_out: list of length (experiments x num_fovs). Each element is a dataframe containing the trajectories of a particular experiment/FOV, in order of generation (i.e. [exp1_fov1, exp1_fov2, …, exp2_fov1, …]). If return_timestep_labs = True, the dataframes also contain the labels at each time step.

  • trajs_out: if get_video = True, returns the video for each experiment/FOV.

  • labels_traj_out: list of the same length as trajs_out containing the labels of the corresponding trajectories. Each element contains a list with the labels of each trajectory, following the scheme: [idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, …, state_N]

  • labels_ens_out: list of the same length as trajs_out containing the ensemble labels of the given experiment. See the description of the output matrix in utils_challenge._extract_ensemble().
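The per-trajectory label scheme quoted above can be unpacked with a few lines of plain Python. The exact layout assumed here (each segment contributes D, alpha and state, plus a changepoint CP for every segment except the last) is an interpretation of the scheme, not library code:

```python
# Illustrative parser for a per-trajectory label row of the form
# [idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, state_2, ..., state_N]
def parse_traj_labels(row):
    idx_traj, rest = row[0], row[1:]
    segments = []
    i = 0
    while i < len(rest):
        D, alpha, state = rest[i], rest[i + 1], rest[i + 2]
        cp = rest[i + 3] if i + 3 < len(rest) else None  # last segment has no CP
        segments.append({'D': D, 'alpha': alpha, 'state': state, 'CP': cp})
        i += 4
    return idx_traj, segments
```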

Examples

We generate a dataset of trajectories from 5 different experiments. Since we do not state otherwise, each experiment will correspond to one of the 5 diffusion models considered in ANDI 2 (phenom).

num_experiments, num_fovs = 5, 1

dics = []
for i in range(num_experiments):    
    dic = _get_dic_andi2(i+1)    
    if i == 0: 
        dic.update({'dim':2})
    dics.append(dic)    
    
df_list, _, _ = challenge_phenom_dataset(experiments = num_experiments, 
                                            num_fovs = num_fovs, 
                                            dics = dics,
                                            return_timestep_labs = True, 
                                            # get_video = True, num_vip = 10,
                                            # save_data = False, path = Path('dataset/'),                                 
                                            # files_reorg = False, path_reorg = Path('reorg/'), save_labels_reorg = False, delete_raw = True
                                             )
Creating dataset for Exp_0 (single_state).
Creating dataset for Exp_1 (multi_state).
Creating dataset for Exp_2 (immobile_traps).
Creating dataset for Exp_3 (dimerization).
Creating dataset for Exp_4 (confinement).

Distributions parameters

We first check how the diffusion parameters of the generated trajectories are distributed.

import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, len(df_list), figsize = (len(df_list)*2, 2*2), tight_layout = True)

for df, ax, dic in zip(df_list, axs.transpose(), dics):
    alphas = df['alpha']
    Ds = df['D']
    states = df['state']
    for u in np.unique(states):
        ax[0].hist(alphas[states == u], density = True)
        ax[1].hist(Ds[states == u], density = True)

    ax[0].set_title(dic['model'])
plt.setp(axs[:,0], ylabel = 'Frequency')
plt.setp(axs[0,:], xlabel = r'$\alpha$')
plt.setp(axs[1,:], xlabel = r'$D$')

FOVs

We can also check that generating multiple FOVs from each experiment actually chooses random FOVs in the desired space.

num_fovs = 3; num_experiments = 5
df_fov, _, lab_e = challenge_phenom_dataset(experiments = [1, 2, 3, 4, 5],
                                            num_fovs = num_fovs, 
                                            return_timestep_labs = True
                                            )
Creating dataset for Exp_0 (single_state).
Creating dataset for Exp_1 (multi_state).
Creating dataset for Exp_2 (immobile_traps).
Creating dataset for Exp_3 (dimerization).
Creating dataset for Exp_4 (confinement).
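Since the flat output lists are ordered [exp1_fov1, exp1_fov2, …, exp2_fov1, …], the element for a given experiment/FOV pair can be recovered with simple index arithmetic. The helper below is not part of the library, just a small 0-based illustration:

```python
# Flat index of (experiment `exp`, field of view `fov`) in the output lists,
# given num_fovs FOVs per experiment (0-based indices).
def fov_index(exp, fov, num_fovs):
    return exp * num_fovs + fov

num_experiments, num_fovs = 5, 3
flat = [f'exp{e}_fov{f}' for e in range(num_experiments) for f in range(num_fovs)]
assert flat[fov_index(3, 1, num_fovs)] == 'exp3_fov1'
```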