```python
num_experiments, num_fovs = 2, 1

dics = []
for i in range(num_experiments):
    dic = _get_dic_andi2(i+1)
    if i == 0: dic.update({'dim': 2})
    dics.append(dic)

df_list, _, _, _ = challenge_phenom_dataset(experiments = num_experiments,
                                            num_fovs = num_fovs,
                                            dics = dics,
                                            return_timestep_labs = False,
                                            get_video = True, num_vip = 3,
                                            save_data = True, path = 'dataset/',
                                            files_reorg = True, path_reorg = 'reorg/',
                                            save_labels_reorg = False, delete_raw = True)
```
datasets_challenge
ANDI 1 challenge (theory)
challenge_theory_dataset
challenge_theory_dataset (N:numpy.ndarray|int=1000, max_T:int=1000, min_T:int=10, tasks:list|int=[1, 2, 3], dimensions:list|int=[1, 2, 3], load_dataset:{'True','False'}=False, save_dataset:{'True','False'}=False, path_datasets:str='', load_labels:{'True','False'}=True, load_trajectories:{'True','False'}=False, save_trajectories:{'True','False'}=False, path_trajectories:str='datasets/', N_save:int=1000, t_save:int=1000, return_noise:{'True','False'}=False)
Creates a dataset similar to the one given in the ANDI 1 challenge. Check the webpage of the challenge for more details. The default values are similar to the ones used to generate the available dataset.
The function returns 6 variables: three for the trajectories and three for the corresponding labels. Each variable is a list of three lists, one per dimension, in ascending order. If one of the tasks/dimensions was not calculated, the corresponding list will be empty.
See the tutorials in our Github repository to learn about this function.
| | Type | Default | Details |
|---|---|---|---|
| N | numpy.ndarray or int | 1000 | Number of trajectories per class (i.e. size = # models x # classes). If int, all classes have the same number. |
| max_T | int | 1000 | Maximum length of the trajectories in the dataset. |
| min_T | int | 10 | Minimum length of the trajectories in the dataset. |
| tasks | list or int | [1, 2, 3] | Task(s) of the ANDI challenge I for which datasets will be generated. |
| dimensions | list or int | [1, 2, 3] | Dimension(s) for which trajectories will be generated. Three possible values: 1, 2 and 3. |
| load_dataset | {'True', 'False'} | False | If True, the module loads existing datasets from the files task{}.txt and ref{}.txt. |
| save_dataset | {'True', 'False'} | False | If True, the module saves the datasets in a .txt following the competition format. |
| path_datasets | str | '' | Path from where to load the dataset. |
| load_labels | {'True', 'False'} | True | If False, only loads trajectories and avoids the files refX.txt. |
| load_trajectories | {'True', 'False'} | False | If True, the module loads the trajectories from an .h5 file. |
| save_trajectories | {'True', 'False'} | False | If True, the module saves a .h5 file for each model considered, with N_save trajectories and T = t_save. |
| path_trajectories | str | datasets/ | Path from where to load trajectories. |
| N_save | int | 1000 | Number of trajectories to save for each exponent/model. Advice: save a big dataset at the beginning (i.e. with the default t_save and N_save), which allows you to load any other combination of T and N later. |
| t_save | int | 1000 | Length of the trajectories to be saved. See comments on N_save. |
| return_noise | {'True', 'False'} | False | If True, returns the amplitudes of the noises added to the trajectories. |
| Returns | multiple | | Xn (lists): trajectories. Yn (lists): labels. loc_noise_tn (lists): localization noise amplitudes. diff_noise_tn (lists): variance of the diffusion noise. |
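As an illustration, here is a minimal usage sketch. It assumes the `andi-datasets` package is installed (`pip install andi-datasets`); the variable names are ours, and the call is guarded so the sketch degrades gracefully when the package is absent.

```python
# Hypothetical usage sketch; requires `pip install andi-datasets`.
try:
    from andi_datasets.datasets_challenge import challenge_theory_dataset
except ImportError:
    challenge_theory_dataset = None  # library not available: sketch only

if challenge_theory_dataset is not None:
    # 100 trajectories for task 2 only, in 2D only.
    X1, X2, X3, Y1, Y2, Y3 = challenge_theory_dataset(N = 100, tasks = 2, dimensions = 2)
    # Each return value is a list of three lists, one per dimension in
    # ascending order: the task-2 2D trajectories live at index 1
    # (= dimension - 1); lists for tasks/dimensions that were not
    # requested stay empty.
    trajs_2d = X2[1]
```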
ANDI 2 challenge (phenom)
_defaults_andi2
_defaults_andi2 ()
This class defines the default values set for the ANDI 2 challenge.
_get_dic_andi2
_get_dic_andi2 (model)
Given the numeric label of a diffusion model, returns a default dictionary of the model's parameters to be fed to create_dataset. The numbering is as follows: 1: single state; 2: N-state; 3: immobilization; 4: dimerization; 5: confinement.
| | Type | Details |
|---|---|---|
| model | int in [1,6] | Number of the diffusion model |
| Returns | dictionary | Dictionary containing the default parameters for ANDI 2 of the indicated model. |
challenge_phenom_dataset
challenge_phenom_dataset (experiments=5, dics=None, repeat_exp=True, num_fovs=1, return_timestep_labs=False, save_data=False, path='data/', prefix='', get_video=False, num_vip=None, get_video_masks=False, files_reorg=False, path_reorg='ref/', save_labels_reorg=False, delete_raw=False)
Creates datasets with the same structure as the ones given in the ANDI 2 challenge. Default values for the various diffusion models have been set so as to be in the same ranges as the ones expected for the challenge. For details, check the ANDI 2 challenge webpage (soon).
This function will generate as many experiments (each associated to one of the diffusion models) as demanded. There are two ways of defining that:
- Give the number of experiments to create (plus optional parameters such as repeat_exp). The diffusion parameters are then taken from the default values in datasets_phenom._defaults_andi2.
- Feed a list of dictionaries (dics) from which the data will be generated.
For each experiment, as many fields of view as wanted can be generated.
| | Type | Default | Details |
|---|---|---|---|
| experiments | int | 5 | If int: number of experiments to generate, each from one of the available diffusion models. If list: diffusion models to generate (numbering starts at 1!). |
| dics | NoneType | None | If given, uses this to set the parameters of the experiments. Must be of length equal to experiments. This overrides any info about chosen models, as the model is set by the dictionary. |
| repeat_exp | bool | True | Does not enter into play if experiments is a list. If True, picks the diffusion model at random from the pool; if False, picks it in an ordered way. |
| num_fovs | int | 1 | Number of fields of view to get trajectories from in each experiment. |
| return_timestep_labs | bool | False | If True, the output trajectory dataframes also contain the labels alpha, D and state at each time step. |
| save_data | bool | False | If True, saves all pertinent data. |
| path | str | data/ | Path where to store the data. |
| prefix | str | '' | Extra prefix that can be added in front of the files' names. |
| get_video | bool | False | If True, also outputs the videos generated with Deeptrack for the generated datasets (see utils_videos for details). |
| num_vip | NoneType | None | Number of VIPs highlighted in the videos. |
| get_video_masks | bool | False | If True, gets the masks of the videos. |
| files_reorg | bool | False | If True, this function also creates a folder named path_reorg inside path with the same data, organized à la ANDI 2 challenge. |
| path_reorg | str | ref/ | Folder where the reorganized dataset will be created. |
| save_labels_reorg | bool | False | Whether to also save the labels in the reorganized dataset. This is needed to create a reference dataset for the scoring program; there is no need if you are just creating data to predict. |
| delete_raw | bool | False | If True, deletes the raw dataset so that only the reorganized one is kept. |
| Returns | tuple | | trajs_out: list of length (experiments x num_fovs); each element is a dataframe containing the trajectories of a particular experiment/FOV, in order of generation (i.e. [exp1_fov1, exp1_fov2, ..., exp2_fov1, ...]). If return_timestep_labs = True, the dataframes also contain the labels at each time step. labels_traj_out: list of the same length as trajs_out containing the labels of the corresponding trajectories; each element contains a list with the labels of each trajectory, following the scheme [idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, ..., state_N]. labels_ens_out: list of the same length as trajs_out containing the ensemble labels of the given experiment; see the description of the output matrix in utils_challenge._extract_ensemble(). |
This function generates trajectory datasets like the ones considered in the ANDI 2 challenge. It is based on models_phenom.create_dataset, but additionally:
- applies a field of view (FOV);
- adds localization noise;
- smooths the labeling of trajectories to a minimum segment length of 5;
- extracts ensemble properties;
- generates videos, if asked.
Outputs:
- trajs_out: list of length (experiments x num_fovs). Each element is a dataframe containing the trajectories of a particular experiment/FOV, in order of generation (i.e. [exp1_fov1, exp1_fov2, ..., exp2_fov1, ...]). If return_timestep_labs = True, the dataframes also contain the labels at each time step.
- if get_video = True, also returns the video for each experiment/FOV.
- labels_traj_out: list of the same length as trajs_out containing the labels of the corresponding trajectories. Each element contains a list with the labels of each trajectory, following the scheme: [idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, ..., state_N]
- labels_ens_out: list of the same length as trajs_out containing the ensemble labels of the given experiment. See the description of the output matrix in utils_challenge._extract_ensemble().
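To make the per-trajectory label scheme concrete, here is a small helper that splits one such row into per-segment tuples. The helper name `split_traj_labels` is ours, not part of the library; it only assumes the scheme stated above, where each segment except the last is followed by its changepoint CP_i.

```python
def split_traj_labels(row):
    """Split one labels_traj_out entry, formatted as
    [idx_traj, D_1, alpha_1, state_1, CP_1, D_2, alpha_2, state_2, ..., state_N],
    into (idx_traj, [(D_i, alpha_i, state_i, CP_i or None), ...]).
    The last segment carries no changepoint, hence None."""
    idx, rest = row[0], list(row[1:])
    segments = []
    while rest:
        D, alpha, state = rest[:3]
        cp = rest[3] if len(rest) > 3 else None  # last segment has no CP
        segments.append((D, alpha, state, cp))
        rest = rest[4:]
    return idx, segments

# A trajectory with two segments and a changepoint at frame 50:
idx, segs = split_traj_labels([0, 0.1, 1.2, 2, 50, 0.5, 0.8, 1])
```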
Examples
We generate a dataset of trajectories from 5 different experiments. Since we do not state otherwise, each experiment will correspond to one of the 5 diffusion models considered in ANDI 2 (phenom).
Parameter distributions
We first check how the diffusion parameters of the generated trajectories are distributed.
```python
fig, axs = plt.subplots(2, len(df_list), figsize = (len(df_list)*2, 2*2), tight_layout = True)

for df, ax, dic in zip(df_list, axs.transpose(), dics):
    alphas = df['alpha']
    Ds = df['D']
    states = df['state']
    for u in np.unique(states):
        ax[0].hist(alphas[states == u], density = 1)
        ax[1].hist(Ds[states == u], density = 1)

    ax[0].set_title(dic['model'])

plt.setp(axs[:,0], ylabel = 'Frequency')
plt.setp(axs[0,:], xlabel = r'$\alpha$')
plt.setp(axs[1,:], xlabel = r'$D$');
```
FOVs
We can also check that generating multiple FOVs for every experiment actually chooses random FOVs in the desired space.
```python
num_fovs = 3; num_experiments = 4

df_fov, _ , lab_e = challenge_phenom_dataset(experiments = [1,2,3,4,5],
                                             num_fovs = num_fovs,
                                             return_timestep_labs = True
                                             )
```
Creating dataset for Exp_0 (single_state).
Creating dataset for Exp_1 (multi_state).
Creating dataset for Exp_2 (immobile_traps).
Creating dataset for Exp_3 (dimerization).
Creating dataset for Exp_4 (confinement).