utils_challenge

Managing data

Smoothing labels

These functions are used to smooth a given vector of labels of heterogeneous processes by means of a majority filter. They also allow defining a minimum segment length.

label_filter

label_filter (label:numpy.ndarray, window_size:int=5, min_seg:int=3)

Given a vector of changing labels, applies a majority filter to smooth it. Then enforces that every segment of a particular label is longer than or equal to the given minimum segment length min_seg.

 | Type | Default | Details |
---|---|---|---|
label | ndarray | | Vector to filter by majority vote |
window_size | int | 5 | Size of the window in which the majority filter is applied. |
min_seg | int | 3 | Minimum segment length allowed in the output array. |
Returns | ndarray | | Filtered label vector |

Example

We create a set of trajectories from models_phenom.multi_state with a high probability of changing states. This makes segments very short. We then filter the labels to ensure that no segment is shorter than the desired minimum.

fig, axs = plt.subplots(3, 3, figsize = (9, 5), tight_layout = True)
window_size = 5
min_seg = 3
for ax in axs.flatten():
    traj, labs = models_phenom()._multiple_state_traj(T = 50, alphas = [0.7, 0.8], Ds = [0.01, 0.1],
                                                      M = [[0.50, 0.50], [0.5, 0.5]])
    filtered_d = label_filter(labs[:,1],
                              min_seg = min_seg,
                              window_size = window_size)
    ax.plot(labs[:, 1], '.', label = 'True label')
    ax.plot(filtered_d, label = r'Filtered label')
axs[0,0].set_title(f'Majority filter with window size = {window_size}')
axs[0,0].legend()
plt.setp(axs, xticklabels = [], yticklabels = []);
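For intuition, the core of the majority filter can be sketched in a few lines of numpy. This is a simplified stand-in, not the library implementation, and majority_filter_sketch is a hypothetical helper (it omits the min_seg enforcement step):

import numpy as np

def majority_filter_sketch(label, window_size = 5):
    # replace each entry by the most frequent value in a centered window
    half = window_size // 2
    out = np.empty_like(label)
    for i in range(len(label)):
        window = label[max(0, i - half) : i + half + 1]
        vals, counts = np.unique(window, return_counts = True)
        out[i] = vals[np.argmax(counts)]
    return out

noisy = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1])
majority_filter_sketch(noisy)   # isolated flips are absorbed by the majority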
New population percentages after filtering
Note that smoothing the signal will have an effect on the actual proportion of time a particle spends in each state. This will be taken into account in the challenge. Here we showcase this effect:
T = 100
traj, labs = models_phenom().multi_state(N = 500, alphas = [[0.7, 1],[0.4,2]], Ds = [[0, 1], [1, 0]], T = T)
res_t = np.array([])
res_ft = np.array([])
for label in tqdm(labs.transpose(1,0,2)[:,:,0]):
    # raw labels: find changepoints and collect residence times
    CP = np.argwhere(label[1:] != label[:-1]).flatten()
    if CP[-1] != T-1: CP = np.append(CP, T-1)
    CP = np.append(0, CP)

    res_t = np.append(res_t, CP[1:] - CP[:-1])

    # filtered labels: same, after applying the majority filter
    filt = label_filter(label)

    CP_f = np.argwhere(filt[1:] != filt[:-1]).flatten()
    if CP_f[-1] != T-1: CP_f = np.append(CP_f, T-1)
    CP_f = np.append(0, CP_f)

    res_ft = np.append(res_ft, CP_f[1:] - CP_f[:-1])
We now show the new transition rates (i.e. 1 over the mean residence time of a given state). Because we enforce a minimum segment length of 3, segments of length 1 and 2 are absorbed into their neighbours, so we can approximate the filtered transition rate as the original one times 2/3:
print(f' True transition rate: {1/np.mean(res_t)}\n',
f'Filtered transition rate: {1/np.mean(res_ft)}\n',
f'True rate x 2/3: {1/np.mean(res_t)*(2/3)}')
True transition rate: 0.10947474747474747
Filtered transition rate: 0.07402020202020201
True rate x 2/3: 0.07298316498316498
Continuous labels to list of features
The labels in the challenge will be the list of \(n\) changepoints as well as the \(n+1\) diffusion properties (\(D\) and \(\alpha\)) for each segment. This function transforms the stepwise labels into three lists: CPs, \(\alpha\)s and \(D\)s.
label_continuous_to_list
label_continuous_to_list (labs)
Given an array of T x 2 labels containing the anomalous exponent and diffusion coefficient at each timestep, returns 3 arrays, each containing the changepoints, exponents and coefficients, respectively. If labs is of size T x 3, then we consider that diffusive states are given and also return those.
Type | Details | |
---|---|---|
labs | array | T x 2 or T x 3 labels containing the anomalous exponent, diffusion and diffusive state. |
Returns | tuple | - First element is the list of change points - The rest are corresponding segment properties (order: alpha, Ds and states) |
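A minimal hand-made example may help; this is a sketch, and the commented outputs assume the conventions described above (the final changepoint marks the trajectory length):

import numpy as np
from andi_datasets.utils_challenge import label_continuous_to_list

labs = np.zeros((6, 2))
labs[:3] = [0.5, 1.0]   # alpha = 0.5, D = 1.0 in the first segment
labs[3:] = [1.0, 2.0]   # alpha = 1.0, D = 2.0 in the second segment
CP, alphas, Ds = label_continuous_to_list(labs)
# expected: CP ~ [3, 6], alphas ~ [0.5, 1.0], Ds ~ [1.0, 2.0]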
# Generate the trajectory
trajs, labels = models_phenom().multi_state(N = 1, T = 50)

# Transform the labels:
CP, alphas, Ds, _ = label_continuous_to_list(labels[:,-1,:])

plt.figure(figsize=(5, 3))
plt.plot(labels[:, -1, 1], 'o', alpha = 0.4, label = 'Continuous label')
plt.scatter(CP-1, Ds, c = 'C1', label = 'CP-1 and value of previous segment')
plt.legend(); plt.xlabel('T'); plt.ylabel(r'$\alpha$')
Text(0, 0.5, '$\\alpha$')
List of features to continuous labels
This function does the opposite of label_continuous_to_list: from a list of properties such as the one used in the ANDI 2 challenge, it creates continuous labels.
label_list_to_continuous
label_list_to_continuous (CP, label)
Given a list of change points and the labels of the diffusion properties of the resulting segments, generates an array of continuous labels. The last change point indicates the array length.
Type | Details | |
---|---|---|
CP | array, list | list of change points. Last change point indicates label length. |
label | array, list | list of segment properties |
Returns | array | Continuous label created from the given change points and segment properties |
CP = [3,24,34]
label = [0.5, 0.4, 1]
cont = label_list_to_continuous(CP, label)
plt.figure(figsize = (3,1))
plt.plot(cont, c = 'C1')
[plt.axvline(c, c = 'k', ls = '--') for c in CP[:-1]];
Storing array data in dataframe
array_to_df
array_to_df (trajs, labels, min_length=10, fov_origin=[0, 0], fov_length=100.0, cutoff_length=10)
Given arrays for the positions and labels of trajectories, creates a dataframe with that data. The function also applies the demanded FOV. If you don't want a field of view, choose a FOV length bigger (smaller) than your maximum (minimum) trajectory position.
Type | Default | Details | |
---|---|---|---|
trajs | array | Trajectories to store in the df (dimension: T x N x 3) | |
labels | array | Labels to store in the df (dimension: T x N x 3) | |
min_length | int | 10 | |
fov_origin | list | [0, 0] | Bottom left point of the square defining the FOV. |
fov_length | float | 100.0 | Size of the box defining the FOV. |
cutoff_length | int | 10 | Minimum length of a trajectory inside the FOV to be considered in the output dataset. |
Returns | tuple | - df_in (dataframe): dataframe with trajectories - df_out (dataframe): dataframe with labels |
#trajs, labels = models_phenom().multi_state(T = 200, N = 10, alphas=[0.5, 1], Ds = [1,1], L = 100)
trajs, labels = models_phenom().single_state(T = 200, N = 10)

# Changing dimensions
trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_out = array_to_df(trajs, labels)
df_out.head()
 | traj_idx | Ds | alphas | states | changepoints |
---|---|---|---|---|---|
0 | 0 | [1.0] | [1.0] | [2.0] | [115] |
1 | 1 | [1.0] | [1.0] | [2.0] | [22] |
2 | 2 | [1.0] | [1.0] | [2.0] | [10] |
3 | 3 | [1.0] | [1.0] | [2.0] | [39] |
4 | 4 | [1.0] | [1.0] | [2.0] | [28] |
Transform ANDI dataframe to array with padding
df_to_array
df_to_array (df, pad=-1)
Transforms a dataframe like the ones given in the ANDI 2 challenge (i.e. 4 columns: traj_idx, frame, x, y) into a numpy array. To deal with irregular temporal supports, we pad the array wherever the trajectory is not present. The output array has the typical shape of ANDI datasets: T x N x 2.
Type | Default | Details | |
---|---|---|---|
df | dataframe | | Dataframe with four columns: 'traj_idx', the trajectory index; 'frame', the time frame; and 'x' and 'y', the positions of the particle. |
pad | int | -1 | Number to use as padding. |
Returns | array | Array containing the trajectories from the dataframe, with usual ANDI shape (TxNx2). |
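A small sketch of the padding behaviour, using a hypothetical dataframe in which trajectory 1 is absent at frame 1:

import pandas as pd
from andi_datasets.utils_challenge import df_to_array

df = pd.DataFrame({'traj_idx': [0, 0, 0, 1, 1],
                   'frame':    [0, 1, 2, 0, 2],
                   'x': [0.0, 0.1, 0.2, 5.0, 5.2],
                   'y': [0.0, 0.0, 0.1, 5.0, 5.1]})
trajs = df_to_array(df, pad = -1)
# expected shape: T x N x 2, here (3, 2, 2), with trajs[1, 1] filled with the pad value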
Reorganize folder for challenge if non-overlapping FOVs

The outputs of datasets_challenge.challenge_phenom_dataset are not in the appropriate form if one considers the case of non-overlapping FOVs. The latter means that, instead of taking n_fovs from the same experiment, we repeat the same experiment n_fovs times. This function rearranges the folders to get the proper structure proposed in the paper.
file_nonOverlap_reOrg
file_nonOverlap_reOrg (raw_folder:pathlib.Path, target_folder:pathlib.Path, experiments:int, num_fovs:int, tracks=[1, 2], save_labels=False, task=['single', 'ensemble'], print_percentage=True)
This considers that you have n_fovs*n_experiments 'fake' experiments and organizes them based on the challenge instructions.
Type | Default | Details | |
---|---|---|---|
raw_folder | Path | Original folder with data produced by datasets_challenge.challenge_phenom_dataset | |
target_folder | Path | Folder where to put reorganized files | |
experiments | int | Number of experiments | |
num_fovs | int | Number of FOVS | |
tracks | list | [1, 2] | Tracks to consider. |
save_labels | bool | False | If True, moves all data (labels, etc.). Set to True only when saving reference / groundtruth data. If True, the trajectories for the video track are also saved. |
task | list | ['single', 'ensemble'] | Which tasks to consider. |
print_percentage | bool | True | If True, prints the percentage of states for each experiment. |
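A hypothetical call could look as follows (folder names are placeholders):

from pathlib import Path

file_nonOverlap_reOrg(raw_folder = Path('raw_experiments'),
                      target_folder = Path('challenge_dataset'),
                      experiments = 5, num_fovs = 3,
                      save_labels = True)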
Isolate VIP particles
get_VIP
get_VIP (array_trajs, num_vip=5, min_distance_part=2, pad=-1, boundary=False, boundary_origin=(0, 0), min_distance_bound=0, sort_length=True)
Given an array of trajectories, finds the VIP particles that participants will need to characterize in the video track.
The function first finds the particles that exist at frame 0 (i.e. whose first value is different from pad). Then, it iterates over these particles to find num_vip particles that are at a distance greater than min_distance_part from each other in the first frame.
Type | Default | Details | |
---|---|---|---|
array_trajs | array | Position of the trajectories that will be considered for the VIP search. | |
num_vip | int | 5 | Number of VIP particles to flag. |
min_distance_part | int | 2 | Minimum distance between two VIP particles. |
pad | int | -1 | Number used to indicate in the temporal support that the particle is outside of the FOV. |
boundary | bool | False | If float, defines the length of the box acting as boundary |
boundary_origin | tuple | (0, 0) | X and Y coords of the boundary |
min_distance_bound | int | 0 | Minimum distance a particle has to be from the boundary in order to be considered a VIP particle. |
sort_length | bool | True | If True, candidates for VIP particles are chosen in descending order of trajectory length. This ensures that the longest ones are chosen. |
Returns | list | List of indices of the chosen VIP particles |
# define random trajectories
array_trajs = np.random.rand(200, 10, 2)*10
# insert paddings to make first trajectories finish earlier
pad = -1
array_trajs[100:, :, :] = pad
array_trajs[0, 3, 0] = pad
array_trajs.shape
(200, 10, 2)
get_VIP(array_trajs, num_vip = 5, min_distance_part = 0, pad = pad,
        boundary = 10, boundary_origin = (0,0), min_distance_bound = 0,
        sort_length = True)
[0, 1, 2, 4, 5]
ANDI 2 challenge metrics
Setting maximum errors for different metrics
Changepoint pairing
We use an assignment algorithm to pair predicted and groundtruth changepoints. From there, we will calculate the various metrics of the challenge.
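For intuition, the pairing can be sketched with scipy's linear_sum_assignment (the standard solver for the Munkres/Hungarian algorithm); this is a simplified stand-in for changepoint_assignment, with toy inputs:

import numpy as np
from scipy.optimize import linear_sum_assignment

GT = np.array([10, 40, 80])
preds = np.array([12, 45, 78, 90])
# cost matrix of pairwise distances between changepoints
cost = np.abs(GT[:, None] - preds[None, :])
row, col = linear_sum_assignment(cost)
# the row[i]-th groundtruth changepoint is paired with the col[i]-th prediction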
changepoint_assignment
changepoint_assignment (GT, preds)
Given a list of groundtruth and predicted changepoints, solves the assignment problem via the Munkres algorithm (aka Hungarian algorithm) and returns two arrays containing the index of the paired groundtruth and predicted changepoints, respectively.
The distance between change points is the Euclidean distance.
Type | Details | |
---|---|---|
GT | list | List of groundtruth change points. |
preds | list | List of predicted change points. |
Returns | tuple | - tuple of two arrays, each corresponding to the assigned GT and pred changepoints - Cost matrix |
ngts = 10; npreds = 6; T = 100
GT = np.sort(np.random.choice(np.arange(1,T), ngts, replace = False))
preds = np.sort(np.random.choice(np.arange(1,T)*0.5, npreds, replace = False)).astype(int)
print('GT:', GT)
print('Pred:', preds)
changepoint_assignment(GT, preds)[0]
GT: [ 2 8 24 33 34 54 55 64 73 85]
Pred: [ 8 11 16 30 36 47]
(array([0, 1, 2, 3, 4, 5], dtype=int64),
array([1, 0, 2, 3, 4, 5], dtype=int64))
changepoint_alpha_beta
changepoint_alpha_beta (GT, preds, threshold=10)
Calculates the alpha and beta measures of paired changepoints. Inspired by Supplementary Note 3 in https://www.nature.com/articles/nmeth.2808
Type | Default | Details | |
---|---|---|---|
GT | list | List of groundtruth change points. | |
preds | list | List of predicted change points. | |
threshold | int | 10 | Distance from which predictions are considered to have failed. They are then assigned this number. |
Returns | tuple | alpha, beta |
labels = [r'Random Guess + $N_p>N_{gt}$',
          r'Random Guess + $N_p<N_{gt}$',
          r'GT + rand $\in [-3, 3]$',
          r'GT + rand $\in [-1, 1]$']
fig, ax = plt.subplots(figsize = (4,3))
alpha = 0.2

T = 200; ngts = 15;

for case, (label, color) in enumerate(zip(labels, ['C0', 'C1', 'C2', 'C3'])):
    alphas, betas = [], []
    for _ in range(100):
        GT = np.sort(np.random.choice(np.arange(1,T), ngts, replace = False))
        if case == 0:
            npreds = np.random.randint(low = ngts, high = ngts*2)
            preds = np.sort(np.random.choice(np.arange(1,T), npreds, replace = False))
        elif case == 1:
            npreds = np.random.randint(low = 1, high = ngts)
            preds = np.sort(np.random.choice(np.arange(1,T), npreds, replace = False))
        elif case == 2:
            preds = GT + np.random.randint(-3, 3, ngts)
        elif case == 3:
            preds = GT + np.random.randint(-1, 1, ngts)

        alpha, beta = changepoint_alpha_beta(GT, preds)

        alphas.append(alpha)
        betas.append(beta)

    ax.scatter(alphas, betas, c = color, alpha = alpha)
    ax.scatter(np.mean(alphas), np.mean(betas), c = color, label = label, s = 50, marker = 's', edgecolors = 'k')
plt.setp(ax, xlabel = r'$\alpha$', ylabel = r'$\beta$')
ax.legend(loc = (1.01,0.4))
<matplotlib.legend.Legend>
jaccard_index
jaccard_index (TP:int, FP:int, FN:int)
Given the number of true positives, false positives and false negatives, calculates the Jaccard index.
Type | Details | |
---|---|---|
TP | int | true positive |
FP | int | false positive |
FN | int | false negative |
Returns | float | Jaccard Index |
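As a quick sanity check, assuming the standard definition JI = TP / (TP + FP + FN):

from andi_datasets.utils_challenge import jaccard_index

# 8 correctly paired changepoints, 2 spurious predictions, 2 missed events
jaccard_index(TP = 8, FP = 2, FN = 2)   # 8 / (8 + 2 + 2) ≈ 0.67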
single_changepoint_error
single_changepoint_error (GT, preds, threshold=5)
Given the groundtruth and predicted changepoints for a single trajectory, first solves the assignment problem between changepoints, then calculates the RMSE of the true positive pairs and the Jaccard index.
Type | Default | Details | |
---|---|---|---|
GT | list | List of groundtruth change points. | |
preds | list | List of predicted change points. | |
threshold | int | 5 | Distance from which predictions are considered to have failed. They are then assigned this number. |
Returns | tuple | - TP_rmse: root mean square error of the true positive change points. - Jaccard Index of the ensemble predictions |
ensemble_changepoint_error
ensemble_changepoint_error (GT_ensemble, pred_ensemble, threshold=5)
Given an ensemble of groundtruth and predicted change points, iterates over each trajectory's changepoints. For each, it solves the assignment problem between changepoints. Then, it calculates the RMSE of the true positive pairs and the Jaccard index over the whole ensemble of changepoints (i.e. not their mean w.r.t. the trajectories).
Type | Default | Details | |
---|---|---|---|
GT_ensemble | list, array | Ensemble of groundtruth change points. | |
pred_ensemble | list | Ensemble of predicted change points. | |
threshold | int | 5 | Distance from which predictions are considered to have failed. They are then assigned this number. |
Returns | tuple | - TP_rmse: root mean square error of the true positive change points. - Jaccard Index of the ensemble predictions |
labels = ['Random Guess + Incorrect number',
          r'GT + rand $\in [-3, 3]$',
          r'GT + rand $\in [-1, 1]$']
fig, ax = plt.subplots(figsize = (3,3))
alpha = 0.2

T = 200; ngts = 10; npreds = 8

for case, (label, color) in enumerate(zip(labels, ['C0', 'C1', 'C2'])):
    rmse, ji = [], []
    GT, preds = [], []
    for _ in range(100):
        GT.append(np.sort(np.random.choice(np.arange(1,T), ngts, replace = False)))
        if case == 0:
            preds.append(np.sort(np.random.choice(np.arange(1,T), npreds, replace = False)))
        elif case == 1:
            preds.append(GT[-1] + np.random.randint(-3, 3, ngts))
        elif case == 2:
            preds.append(GT[-1] + np.random.randint(-1, 1, ngts))

        assignment, _ = changepoint_assignment(GT[-1], preds[-1])
        assignment = np.array(assignment)

        RMSE, JI = single_changepoint_error(GT[-1], preds[-1], threshold = 5)

        rmse.append(RMSE)
        ji.append(JI)

    rmse_e, ji_e = ensemble_changepoint_error(GT, preds, threshold = 5)

    ax.scatter(rmse, ji, c = color, alpha = alpha)
    ax.scatter(rmse_e, ji_e, c = color, label = label, s = 50, marker = 's', edgecolors = 'k')
plt.setp(ax, xlabel = 'TP RMSE', ylabel = 'Jaccard')
ax.legend(loc = (0.91,0.4))
<matplotlib.legend.Legend>
Segments pairing
Here we focus on pairing the segments arising from a list of changepoints. We will use this later to compare the predicted physical properties of each segment.
create_binary_segment
create_binary_segment (CP:list, T:int)
Given a set of changepoints and the length of the trajectory, creates segments which are equal to one in the temporal support of the segment and zero otherwise.
Type | Details | |
---|---|---|
CP | list | list of changepoints |
T | int | length of the trajectory |
Returns | list | list of arrays with value 1 in the temporal support of the current segment. |
T = 50
GT = np.sort(np.random.choice(np.arange(1,T), 10, replace = False))
plt.figure(figsize = (4,3))
for idx, x in enumerate(create_binary_segment(GT, T)):
    plt.plot(x*idx, 'o')
jaccard_between_segments
jaccard_between_segments (gt, pred)
Given two segments, calculates the Jaccard index between them, considering TP as correct labelings, FN as missed events and FP as leftover predictions.
Type | Details | |
---|---|---|
gt | array | groundtruth segment, equal to one in the temporal support of the given segment, zero otherwise. |
pred | array | predicted segment, equal to one in the temporal support of the given segment, zero otherwise. |
Returns | float | Jaccard index between the given segments. |
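A minimal hand-made example of what this computes (values chosen for illustration):

import numpy as np
from andi_datasets.utils_challenge import jaccard_between_segments

gt   = np.array([0, 1, 1, 1, 0, 0])   # segment occupies t = 1..3
pred = np.array([0, 0, 1, 1, 1, 0])   # segment occupies t = 2..4
# overlap (TP) = 2, missed (FN) = 1, leftover (FP) = 1 -> JI = 2/4 = 0.5
jaccard_between_segments(gt, pred)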
segment_assignment
segment_assignment (GT, preds, T:int=None)
Given a list of groundtruth and predicted changepoints, generates a set of segments. Then constructs a cost matrix by calculating the Jaccard index between segments. From this cost matrix, we solve the assignment problem via the Munkres algorithm (aka Hungarian algorithm) and return two arrays containing the indices of the paired groundtruth and predicted segments, respectively.
If T = None, we consider that GT and preds may have different lengths. In that case, the end of the segments is the last CP of each set of CPs.
Type | Default | Details | |
---|---|---|---|
GT | list | List of groundtruth change points. | |
preds | list | List of predicted change points. | |
T | int | None | Length of the trajectory. If None, considers different GT and preds length. |
Returns | tuple | - tuple of two arrays, each corresponding to the assigned GT and pred changepoints - Cost matrix calculated via JI of segments |
Examples
Predictions close to groundtruth
T = 200;
ngts = 10;
GT = np.sort(np.random.choice(np.arange(1,T), ngts, replace = False))
preds = np.sort(GT + np.random.randint(-5, 5, 1))

seg_GT = create_binary_segment(GT, T)
seg_preds = create_binary_segment(preds, T)

[row_ind, col_ind], cost_matrix = segment_assignment(GT, preds, T)

fig, axs = plt.subplots(2, 5, figsize = (15, 6))
for r, c, ax in zip(row_ind, col_ind, axs.flatten()):
    ax.set_title(f'1 - JI = {np.round(cost_matrix[r, c], 2)}')
    ax.plot(seg_GT[r], label = 'Groundtruth')
    ax.plot(seg_preds[c], label = 'Prediction')
axs[0,0].legend()
<matplotlib.legend.Legend>
Different lengths for predictions and groundtruth
T1 = 200; T2 = 100
ngts = 10;
GT = np.sort(np.random.choice(np.arange(1,T1), ngts, replace = False))
preds = np.sort(np.random.choice(np.arange(1,T2), 5, replace = False))

seg_GT = create_binary_segment(GT, T1)
seg_preds = create_binary_segment(preds, T2)

[row_ind, col_ind], cost_matrix = segment_assignment(GT, preds)

fig, axs = plt.subplots(1, 5, figsize = (15, 3))
for r, c, ax in zip(row_ind, col_ind, axs.flatten()):
    ax.set_title(f'1 - JI = {np.round(cost_matrix[r, c], 2)}')
    ax.plot(seg_GT[r], label = 'Groundtruth')
    ax.plot(seg_preds[c], label = 'Prediction')
axs[0].legend()
<matplotlib.legend.Legend>
Predictions very different from groundtruth
T = 200;
ngts = 5; npreds = 5;
GT = np.sort(np.random.choice(np.arange(1,T), ngts, replace = False))
preds = np.sort(np.random.choice(np.arange(1,T), npreds, replace = False))

seg_GT = create_binary_segment(GT, T)
seg_preds = create_binary_segment(preds, T)

[row_ind, col_ind], cost_matrix = segment_assignment(GT, preds, T)

fig, axs = plt.subplots(1, 5, figsize = (15, 3))
for r, c, ax in zip(row_ind, col_ind, axs.flatten()):
    ax.set_title(f'1 - JI = {np.round(cost_matrix[r, c], 2)}')
    ax.plot(seg_GT[r], label = 'Groundtruth')
    ax.plot(seg_preds[c], label = 'Prediction')
axs[0].legend()
<matplotlib.legend.Legend>
Segment properties comparison
We use the segment pairing functions that we have defined above to compute various metrics between the properties of predicted and groundtruth segments.
Metrics of segment properties
metric_diffusive_state
metric_diffusive_state (gt=None, pred=None)
Compute the F1 score between diffusive states.
metric_diffusion_coefficient
metric_diffusion_coefficient (gt=None, pred=None, threshold_min=1e-12, max_error=190.86835960820298)
Compute the mean squared log error (msle) between diffusion coefficients. Checks the current bounds of diffusion from models_phenom to calculate the maximum error.
metric_anomalous_exponent
metric_anomalous_exponent (gt=None, pred=None, max_error=1.999)
Compute the mean absolute error (mae) between anomalous exponents. Checks the current bounds of anomalous exponents from models_phenom to calculate the maximum error.
x = np.random.rand(100)
y = np.random.rand(100)
metric_diffusion_coefficient(x+2, y+2, threshold_min=-2)
0.014261449910975834
Pairing and metrics calculation
check_no_changepoints
check_no_changepoints (GT_cp, GT_alpha, GT_D, GT_s, preds_cp, preds_alpha, preds_D, preds_s, T:bool|int=None)
Given predictions over changepoints and variables, checks whether change points are absent in both GT and preds. If so, takes that into account when pairing the variables.
Type | Default | Details | |
---|---|---|---|
GT_cp | list, int, float | Groundtruth change points | |
GT_alpha | list, float | Groundtruth anomalous exponent | |
GT_D | list, float | Groundtruth diffusion coefficient | |
GT_s | list, float | Groundtruth diffusive state | |
preds_cp | list, int, float | Predicted change points | |
preds_alpha | list, float | Predicted anomalous exponent | |
preds_D | list, float | Predicted diffusion coefficient | |
preds_s | list, float | Predicted diffusive state | |
T | bool, int | None | (optional) Length of the trajectories. If None, the last change point is taken as the length.
Returns | tuple | - False if there are change points, True if change points were missing. - The next three elements are either all None if change points were detected, or the paired exponents, coefficients and states if some change points were missing.
segment_property_errors
segment_property_errors (GT_cp, GT_alpha, GT_D, GT_s, preds_cp, preds_alpha, preds_D, preds_s, return_pairs=False, T=None)
Given predictions over change points and the values of the diffusion parameters in the generated segments, computes the defined metrics.
Type | Default | Details | |
---|---|---|---|
GT_cp | list, int, float | Groundtruth change points | |
GT_alpha | list, float | Groundtruth anomalous exponent | |
GT_D | list, float | Groundtruth diffusion coefficient | |
GT_s | list, float | Groundtruth diffusive state | |
preds_cp | list, int, float | Predicted change points | |
preds_alpha | list, float | Predicted anomalous exponent | |
preds_D | list, float | Predicted diffusion coefficient | |
preds_s | list, float | Predicted diffusive state | |
return_pairs | bool | False | If True, returns the assignment pairs for each diffusive property.
T | NoneType | None | (optional) Length of the trajectories. If none, last change point is length. |
Returns | tuple | - if return_pairs = True, returns the assigned pairs of diffusive properties - if return_pairs = False, returns the errors for each diffusive property |
We generate some random predictions to check how the metrics behave. We also consider errors in the change point predictions, hence there will be some segment mismatches, which will affect the predictions of the diffusive properties:
T = 200;
ngts = 10;
errors_alpha = np.linspace(0, 1, ngts)
errors_d = np.linspace(0, 10, ngts)
errors_s = np.linspace(0, 1, ngts)

metric_a, metric_d, metric_s = [], [], []
for error_a, error_d, error_s in zip(errors_alpha, errors_d, errors_s):
    la, ld, ls = [], [], []
    for _ in range(100):
        GT_cp = np.sort(np.random.choice(np.arange(1,T-1), ngts, replace = False))
        preds_cp = np.sort(np.random.choice(np.arange(1,T-1), ngts, replace = False))

        GT_alpha = np.random.rand(GT_cp.shape[0]+1)
        preds_alpha = GT_alpha + np.random.randn(preds_cp.shape[0]+1)*error_a

        GT_D = np.abs(np.random.randn(GT_cp.shape[0]+1)*10)
        preds_D = GT_D + np.abs(np.random.randn(preds_cp.shape[0]+1))*error_d

        GT_s = np.random.randint(0, 5, GT_cp.shape[0]+1)
        coin = np.random.rand(len(GT_s))
        preds_s = GT_s.copy()
        preds_s[coin < error_s] = np.random.randint(0, 5, len(coin[coin < error_s]))

        m_a, m_d, m_s = segment_property_errors(GT_cp, GT_alpha, GT_D, GT_s, preds_cp, preds_alpha, preds_D, preds_s, T = T)

        la.append(m_a); ld.append(m_d); ls.append(m_s)

    metric_a.append(np.mean(la))
    metric_d.append(np.mean(ld))
    metric_s.append(np.mean(ls))
With no error in the changepoint predictions:
fig, ax = plt.subplots(1, 3, figsize = (9, 3), tight_layout = True)

ax[0].plot(np.arange(ngts), errors_alpha, c = 'C0', ls = '--', label = 'Expected with no assignment error')
ax[0].plot(np.arange(ngts), metric_a, c = 'C0')
ax[0].set_title(r'Error in $\alpha$ (MAE)')

#ax[1].plot(np.arange(ngts), errors_d, c = 'C1', ls = '--')
ax[1].plot(np.arange(ngts), metric_d, c = 'C1')
ax[1].set_title(r'Error in $D$ (MSLE)')

ax[2].plot(np.arange(ngts), metric_s, c = 'C1')
ax[2].set_title(r'Error in states (JI)')

plt.setp(ax, xlabel = 'Error magnitude')
[Text(0.5, 0, 'Error magnitude'),
Text(0.5, 0, 'Error magnitude'),
Text(0.5, 0, 'Error magnitude')]
With error in the changepoint predictions:
fig, ax = plt.subplots(1, 3, figsize = (9, 3), tight_layout = True)

ax[0].plot(np.arange(ngts), errors_alpha, c = 'C0', ls = '--', label = 'Expected with no assignment error')
ax[0].plot(np.arange(ngts), metric_a, c = 'C0')
ax[0].set_title(r'Error in $\alpha$ (MAE)')

#ax[1].plot(np.arange(ngts), errors_d, c = 'C1', ls = '--')
ax[1].plot(np.arange(ngts), metric_d, c = 'C1')
ax[1].set_title(r'Error in $D$ (MSLE)')

ax[2].plot(np.arange(ngts), metric_s, c = 'C1')
ax[2].set_title(r'Error in states (JI)')

plt.setp(ax, xlabel = 'Error magnitude')
[Text(0.5, 0, 'Error magnitude'),
Text(0.5, 0, 'Error magnitude'),
Text(0.5, 0, 'Error magnitude')]
Ensemble metrics
Get ensemble information
extract_ensemble
extract_ensemble (state_label, dic)
Given an array of the diffusive state and a dictionary with the diffusion information, returns a summary of the ensemble properties for the current dataset.
Type | Details | |
---|---|---|
state_label | array | Array containing the diffusive state of the particles in the dataset. For multi-state and dimerization, this must be the number associated to the state (for dimerization, 0 is free, 1 is dimerized). For the rest, we follow the numeration of models_phenom().lab_state. |
dic | dict | Dictionary containing the information of the input dataset. |
Returns | array | Matrix containing the ensemble information of the input dataset, with one column per state and the following rows: mu_alpha1 mu_alpha2 …; sigma_alpha1 sigma_alpha2 …; mu_D1 mu_D2 …; sigma_D1 sigma_D2 …; counts_state1 counts_state2 … |
Generate distribution and distances
multimode_dist
multimode_dist (params, weights, bound, x, normalized=False, min_var=1e-09)
Generates a multimodal distribution with given parameters. Also accounts for single mode if weight is float or int.
Type | Default | Details | |
---|---|---|---|
params | list | Mean and variances of every mode. | |
weights | list, float | Weight of every mode. If float, we consider a single mode. | |
bound | tuple | Bounds (min, max) of the functions support. | |
x | array | Support upon which the distribution is created. | |
normalized | bool | False | |
min_var | float | 1e-09 |
# True distribution
x = np.logspace(np.log10(models_phenom().bound_D[0]),
                np.log10(models_phenom().bound_D[1]), 100)
weights = [0.0005,0.9]
params_true = [[0.0,0],[1.5,0.5]]
true = multimode_dist(params_true, weights, bound = models_phenom().bound_D, x = x, normalized = False, min_var=1e-9)
plt.semilogx(x, true)
distribution_distance
distribution_distance (p:array, q:array, x:array=None, metric='wasserstein')
Calculates distance between two distributions.
Type | Default | Details | |
---|---|---|---|
p | array | distribution 1 | |
q | array | distribution 2 | |
x | array | None | support of the distributions (not needed for MAE) |
metric | str | wasserstein | distance metric (either ‘wasserstein’ or ‘mae’) |
Returns | float | distance between distributions |
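For intuition about the default metric: the 1D Wasserstein distance between two point masses is simply the distance between their locations, which is why it grows smoothly as a predicted mode drifts away from the true one. A library-independent sketch with scipy:

from scipy.stats import wasserstein_distance

# two point masses at 0.5 and 1.5: the distance is |1.5 - 0.5| = 1
wasserstein_distance([0.5], [1.5])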
Tests distance
Normal scenario
means = np.linspace(0, 2, 30)
normalize = False
fig = plt.figure(figsize=(15, 4))
gs = fig.add_gridspec(2, 10)

# True distribution
x = np.arange(0, 3, 0.01)
params = [[1.7,0.01]]
weights = [1]
true = multimode_dist(params, weights, bound = [0, 3], x = x, normalized = normalize)

range_x = (1,2)
idx_range = np.argwhere((x>range_x[0]) & (x<range_x[1])).flatten()

MSE = []
wass = []
for idx, mean in enumerate(means):
    params = [[mean, 0.01]]
    weights = [1]
    pred = multimode_dist(params, weights, bound = [0, 3], x = x, normalized = normalize)
    MSE.append(distribution_distance(true, pred, metric = 'mae'))
    wass.append(distribution_distance(true, pred, x))

    if idx % 3 == 0:
        ax = fig.add_subplot(gs[0, int(idx/3)])
        ax.plot(x, true, label = 'True')
        ax.plot(x, pred, label = 'Predicted')
        plt.setp(ax, yticks = []);

        if idx == 0:
            ax.legend()

        ax.axvline(range_x[0])
        ax.axvline(range_x[1])

ax_dist = fig.add_subplot(gs[1, :])
ax_dist.plot(MSE, '-o', label = 'MAE')
ax_dist.plot(wass, '-o', label = 'wasserstein')
ax_dist.legend()
plt.setp(ax_dist, ylabel = 'MAE')
ax_dist.grid()
Having a variance = 0 (as in immobile)
Wasserstein:
means = np.linspace(0, 2, 30)
normalize = False

fig = plt.figure(figsize=(15, 7))
gs = fig.add_gridspec(4, 10)

# True distribution
x = np.arange(0, 3, 0.01)
weights = [0.3, 0.9]

params_var0 = [[0,0.0],[1,0.1]]
true_var0 = multimode_dist(params_var0, weights, bound = [0, 3], x = x, normalized = normalize)

params = [[0,0.1],[1,0.1]]
true = multimode_dist(params, weights, bound = [0, 3], x = x, normalized = normalize)

wass_var0 = []
wass = []
mae_var0 = []
mae = []
for idx, mean in enumerate(means):
    params = [[mean, 0.01]]
    weights = [1]
    pred = multimode_dist(params, weights, bound = [0, 3], x = x, normalized = normalize)

    wass_var0.append(distribution_distance(true_var0, pred, x))
    wass.append(distribution_distance(true, pred, x))
    mae_var0.append(distribution_distance(true_var0, pred, metric = 'mae'))
    mae.append(distribution_distance(true, pred, metric = 'mae'))

    if idx % 3 == 0:
        ax0 = fig.add_subplot(gs[0, int(idx/3)])
        ax0.plot(x, np.log(true_var0), label = 'log(True)', c = 'C0')
        ax0.plot(x, pred, label = 'Predicted', c = 'k')
        plt.setp(ax0, yticks = [], ylim = (-5, 5));

        ax = fig.add_subplot(gs[1, int(idx/3)])
        ax.plot(x, true, label = 'True', c = 'C1')
        ax.plot(x, pred, label = 'Predicted', c = 'k')
        plt.setp(ax, yticks = []);

        if idx == 0:
            ax0.legend()
            ax.legend()

ax_wass = fig.add_subplot(gs[2, :])
ax_wass.plot(wass_var0, '-o', label = 'Var_0 = 0')
ax_wass.plot(wass, '-o', label = r'Var_0 $\neq$ 0')
ax_wass.legend()
ax_wass.set_ylabel('wass dist')

ax_mse = fig.add_subplot(gs[3, :])
ax_mse.plot(mae_var0, '-o', label = 'Var_0 = 0')
ax_mse.plot(mae, '-o', label = r'Var_0 $\neq$ 0')
ax_mse.set_yscale('log')
ax_mse.set_ylabel('mse dist')

# ax_dist.grid()
Text(0, 0.5, 'mse dist')
Checking how the predicted variance affects the Wasserstein distance:
variances = np.logspace(-12, -1, 300)
# True distribution
x = np.logspace(-12, 1, 10000)
weights = [1]
params_true = [[0.0,0]]
true = multimode_dist(params_true, weights, bound = [1e-9, 3], x = x, normalized = normalize, min_var=1e-7)

dist = []
for idx, var in enumerate(variances):
    params = [[0.5, var]]
    weights = [1]
    pred = multimode_dist(params, weights, bound = [1e-9, 3], x = x, normalized = normalize, min_var=1e-7)
    dist.append(distribution_distance(true, pred, x=x))

plt.plot(variances, np.array(dist)+1, 'o')
plt.axvline(params_true[0][1], c = 'k', label = 'True variance')
plt.legend()
plt.xscale('log')
plt.yscale('log')

plt.xlabel('Variance prediction')
plt.ylabel('Wasserstein distance')
Text(0, 0.5, 'Wasserstein distance')
Checking if we are considering a peak at 0
# True distribution
x = np.logspace(np.log10(models_phenom().bound_D[0]),
                np.log10(models_phenom().bound_D[1]), 100)
weights = [0.0005,0.9]
params_true = [[0.0,0],[1.5,0.5]]
true = multimode_dist(params_true, weights, bound = models_phenom().bound_D, x = x, normalized = normalize)
plt.plot(x, true)
plt.xscale('log')
Testing maximum value of Wasserstein distance for considered \(\alpha\) and \(D\) ranges
from andi_datasets.utils_challenge import multimode_dist, distribution_distance

min_a, max_a = models_phenom().bound_alpha[0], models_phenom().bound_alpha[1]
x = np.arange(min_a, max_a, 0.01)
normalize = False

distmax = multimode_dist([[max_a,0.0001]], [1], bound = [min_a, max_a], x = x, normalized = normalize)
distmin = multimode_dist([[min_a,0.0001]], [1], bound = [min_a, max_a], x = x, normalized = normalize)
distribution_distance(distmax, distmin, x)
1.982486622823773
min_d, max_d = models_phenom().bound_D[0], models_phenom().bound_D[1]
x = np.logspace(np.log10(models_phenom().bound_D[0]),
                np.log10(models_phenom().bound_D[1]), 100)

distmax = multimode_dist([[max_d,0.1]], [1], bound = [min_d, max_d], x = x, normalized = normalize)
distmin = multimode_dist([[min_d,0.01]], [1], bound = [min_d, max_d], x = x, normalized = normalize)
distribution_distance(distmax, distmin, x)
-0.004896474885754287
Calculate ensemble metric
error_Ensemble_dataset
error_Ensemble_dataset (true_data, pred_data, size_support=1000000, metric='wasserstein', return_distributions=False)
Calculates the ensemble metrics for the ANDI 2 challenge. The input are matrices of shape:
col1 (state 1) | col2 (state 2) | col3 (state 3) | … |
---|---|---|---|
\(\mu_a^1\) | \(\mu_a^2\) | \(\mu_a^3\) | … |
\(\sigma_a^1\) | \(\sigma_a^2\) | \(\sigma_a^3\) | … |
\(\mu_D^1\) | \(\mu_D^2\) | \(\mu_D^3\) | … |
\(\sigma_D^1\) | \(\sigma_D^2\) | \(\sigma_D^3\) | … |
\(N_1\) | \(N_2\) | \(N_3\) | … |
Type | Default | Details | |
---|---|---|---|
true_data | array | Matrix containing the groundtruth data. | |
pred_data | array | Matrix containing the predicted data. | |
size_support | int | 1000000 | size of the support of the distributions |
metric | str | wasserstein | metric used to calculate distance between distributions |
return_distributions | bool | False | If True, the function also outputs the generated distributions. |
Returns | tuple | - distance_alpha: distance between anomalous exponents - distance_D: distance between diffusion coefficients - dists (if asked): distributions of both groundtruth and predicted data. Order: true_a, true_D, pred_a, pred_D |
track = 1
# Choose the paths
PATH_PRED = f'../../testing/data/fourth_round/pred_carlo/Track{track}/'
PATH_TRUE = '../../testing/data/fourth_round/true/fourth_round/'

distance_D, distance_alpha = [], []
for exp in [2]:
    true = np.loadtxt(PATH_TRUE+f'exp_{exp}/ensemble_labels.txt', skiprows = 1, delimiter = ';')
    pred = np.loadtxt(PATH_PRED+f'exp_{exp}/ensemble_pred.txt', skiprows = 1, delimiter = ';')

    df_true = pandas.DataFrame(data = true.reshape(1,5) if exp == 1 else true.transpose(),
                               columns = [r'mean $\alpha$', r'var $\alpha$', r'mean $D$', r'var $D$', '% residence time'])
    df_pred = pandas.DataFrame(data = pred.reshape(1,5) if exp == 1 else pred.transpose(),
                               columns = [r'mean $\alpha$', r'var $\alpha$', r'mean $D$', r'var $D$', '% residence time'])

    # print(f'Experiment {exp}: \nGroundtruth:')
    # display(df_true)
    # print('Prediction:')
    # display(df_pred)
    # print('------ \n')

    distance_a_exp, distance_D_exp, dists = error_Ensemble_dataset(true, pred, return_distributions = True)

    distance_D.append(distance_D_exp)
    distance_alpha.append(distance_a_exp)

print(f'Distance distribution D = {np.mean(distance_D)}')
print(fr'Distance distribution $\alpha$ = {np.mean(distance_alpha)}')
Distance distribution D = 0.06045239400247898
Distance distribution $\alpha$ = 0.10273932574583697
Single trajectory metrics
The participants will have to output predictions in a .txt file where each line corresponds to the predictions of one trajectory. The values have to be ordered as:

0, d\(_0\), a\(_0\), s\(_0\), t\(_1\), d\(_1\), a\(_1\), s\(_1\), t\(_2\), d\(_2\), a\(_2\), s\(_2\), …, t\(_n\), d\(_n\), a\(_n\), s\(_n\), \(T\)

where the first number corresponds to the trajectory index, and d\(_i\), a\(_i\), s\(_i\) correspond to the diffusion coefficient, anomalous exponent and diffusive state of the \(i\)-th segment. For the latter, we use the following code:

- 0: immobile
- 1: confined
- 2: free (unconstrained)
- 3: directed

Last, t\(_j\) corresponds to the \(j\)-th changepoint. The last changepoint \(T\) corresponds to the length of the trajectory. Each prediction must contain \(C\) changepoints and \(C\) segment property values (one d, a, s triplet per segment). If this is not fulfilled, the whole trajectory is considered mispredicted.

The .txt file will first be inspected. The data will then be collected into a dataframe.
check_prediction_length
check_prediction_length (pred)
Given a trajectory segments prediction, checks whether it has C changepoints and C+1 segment property values. As it must also contain the index of the trajectory, this is summarized by the length of the prediction being a multiple of 4. In some cases, the user also needs to predict the final point of the trajectory; in that case, we will have a remainder of 1.
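A quick check of the counting rule above, with a hand-made prediction line (trajectory index 7, two segments separated by a changepoint at t = 120, final time T = 200):

pred = [7, 1.3, 0.9, 2, 120, 0.02, 0.4, 0, 200]
# 1 index + 2 segments x 3 properties + 1 changepoint + the final T = 9 values
len(pred) % 4   # -> 1, i.e. the final point of the trajectory was also predicted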
separate_prediction_values
separate_prediction_values (pred)
Given a prediction over trajectory segments, extracts the predictions for each segment property as well as the changepoint values.
load_file_to_df
load_file_to_df (path_file, columns=['traj_idx', 'Ds', 'alphas', 'states', 'changepoints'])
Given the path of a .txt file, extracts the segmentation predictions based on the rules of the ANDI 2 challenge.
Saving fake data for test
file_gt, file_p = [], []
T = 200; ngts = 10;
for traj in range(100):
    GT_cp = np.sort(np.random.choice(np.arange(1,T), ngts, replace = False))
    preds_cp = np.sort(np.random.choice(np.arange(1,T+50), ngts, replace = False))

    GT_alpha = np.random.rand(GT_cp.shape[0]+1)
    preds_alpha = GT_alpha# + 0.1 #np.random.randn(preds_cp.shape[0]+1)*0.1

    GT_D = np.abs(np.random.randn(GT_cp.shape[0]+1)*10)
    preds_D = GT_D + 1.5 #np.abs(np.random.randn(preds_cp.shape[0]+1))*1.6

    GT_state = np.random.randint(0, high = 5, size = GT_cp.shape[0]+1)
    preds_state = np.random.randint(0, high = 5, size = preds_cp.shape[0]+1)

    list_gt, list_p = [traj, GT_D[0], GT_alpha[0], GT_state[0]], [traj, preds_D[0], preds_alpha[0], preds_state[0]]
    for gtc, gta, gtd, gts, pc, pa, pd, ps in zip(GT_cp, GT_alpha[1:], GT_D[1:], GT_state[1:], preds_cp, preds_alpha[1:], preds_D[1:], preds_state[1:]):
        list_gt += [gtc, gtd, gta, gts]
        list_p += [pc, pd, pa, ps]

    file_gt.append(list_gt)
    if traj != 6:
        file_p.append(list_p)

pred_path, true_path = 'pred_test.txt', 'true_test.txt'
np.savetxt(true_path, file_gt, delimiter=',')
np.savetxt(pred_path, file_p, delimiter=',')
Recovering the data
pred_path, true_path = 'pred_test.txt', 'true_test.txt'

df_pred = load_file_to_df(pred_path)
df_true = load_file_to_df(true_path)
error_SingleTraj_dataset
error_SingleTraj_dataset (df_pred, df_true, threshold_error_alpha=None, max_val_alpha=2, min_val_alpha=0, threshold_error_D=None, max_val_D=1000000.0, min_val_D=1e-06, threshold_error_s=None, threshold_cp=None, prints=True, disable_tqdm=False)
Given two dataframes, corresponding to the predicted and true labels of a set of trajectories from the ANDI 2 challenge, calculates the corresponding metrics. Both dataframes must contain the following columns (in no particular order): traj_idx | alphas | Ds | changepoints | states. df_true must also contain a column 'T'.
Type | Default | Details | |
---|---|---|---|
df_pred | dataframe | Predictions | |
df_true | dataframe | Groundtruth | |
threshold_error_alpha | NoneType | None | (same for D, s, cp) Maximum possible error allowed. If bigger, it is substituted by this error. |
max_val_alpha | int | 2 | (same for D, s, cp) Maximum value of the parameter. |
min_val_alpha | int | 0 | (same for D, s, cp) Minimum value of the parameter. |
threshold_error_D | NoneType | None | |
max_val_D | float | 1000000.0 | |
min_val_D | float | 1e-06 | |
threshold_error_s | NoneType | None | |
threshold_cp | NoneType | None | |
prints | bool | True | |
disable_tqdm | bool | False | If True, disables the progress bar. |
Returns | tuple | - rmse_CP: root mean squared error of the change points - JI: Jaccard index of the change points - error_alpha: mean absolute error of the anomalous exponents - error_D: mean squared log error of the diffusion coefficients - error_s: Jaccard index of the diffusive states
Test
Two datasets with same number of trajs
trajs, labels = models_phenom().immobile_traps(T = 200, N = 250, alphas=0.5, Ds = 1, L = 20, Nt = 100, Pb = 1, Pu = 0.5)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_trues = array_to_df(trajs, labels)

trajs, labels = models_phenom().immobile_traps(T = 200, N = 250, alphas=[0.5, 0.1], Ds = 1, L = 20, Nt = 100, Pb = 1, Pu = 0.5)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_preds = array_to_df(trajs, labels)
error_SingleTraj_dataset(df_preds, df_trues, prints = True, disable_tqdm=True);
Summary of metrics assesments:
Changepoint Metrics
RMSE: 4.187
Jaccard Index: 0.421
Diffusion property metrics
Metric anomalous exponent: 0.3062749061106691
Metric diffusion coefficient: 0.22970698514281537
Metric diffusive state: 0.5218950064020487
Two datasets with different number of trajectories
trajs, labels = models_phenom().immobile_traps(T = 200, N = 350, alphas=[0.5,0.01], Ds = [1., 0.1], L = 20, Nt = 100, Pb = 1, Pu = 0.5)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_trues = array_to_df(trajs, labels, label_values=[0.5, 1], diff_states=[3, 2])

trajs, labels = models_phenom().immobile_traps(T = 200, N = 250, alphas=[0.5, 0.1], Ds = 1, L = 20, Nt = 100, Pb = 1, Pu = 0.5)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_preds = array_to_df(trajs, labels, label_values=[0.5, 1], diff_states=[3, 2])
error_SingleTraj_dataset(df_preds, df_trues, prints = True)
Summary of metrics assesments:
100 missing trajectory/ies.
Changepoint Metrics
RMSE: 4.051
Jaccard Index: 0.441
Diffusion property metrics
Metric anomalous exponent: 0.35483874584715985
Metric diffusion coefficient: 3.1690909054732668
Metric diffusive state: 0.4913685263947961
(4.050708208970335,
0.4407643312101911,
0.35483874584715985,
3.1690909054732668,
0.4913685263947961)
trajs, labels = models_phenom().immobile_traps(T = 200, N = 5, alphas=[0.5,0.01], Ds = [1., 0.1], L = 20, Nt = 100, Pb = 1, Pu = 0.5)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_preds = array_to_df(trajs, labels, label_values=[0.5, 1], diff_states=[3, 2])

trajs, labels = models_phenom().multi_state(T = 200, N = 7, L = 20, M = np.array([[0.9,0.1],[0.9,0.1]]))

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_trues = array_to_df(trajs, labels, label_values=[0.5, 1], diff_states=[3, 2])
error_SingleTraj_dataset(df_preds, df_trues, prints = True);
Summary of metrics assesments:
2 missing trajectory/ies.
Changepoint Metrics
RMSE: 2.903
Jaccard Index: 0.188
Diffusion property metrics
Metric anomalous exponent: 0.8269399281523714
Metric diffusion coefficient: 8.262443034681892
Metric diffusive state: 0.41379310344827586
Dataset with no changepoints
L = 250
# TRUES
trajs, labels = models_phenom().single_state(T = 200, N = 250, alphas=[0.5, 0.01], Ds = [1,0], L = L)
trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_trues = array_to_df(trajs, labels, fov_length = L+1)

# PREDS
trajs, labels = models_phenom().single_state(T = 200, N = 250, L = L)

trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_preds = array_to_df(trajs, labels, fov_length = L+1)
error_SingleTraj_dataset(df_preds, df_trues, prints = True, disable_tqdm=True);
Summary of metrics assesments:
Changepoint Metrics
RMSE: 0
Jaccard Index: 1
Diffusion property metrics
Metric anomalous exponent: 0.5024630319750929
Metric diffusion coefficient: 0.0
Metric diffusive state: 1.0
Dataset with no changepoints but different lengths T + one pred with CP
Because T is not considered for prediction, this alone would give JI = 1. Because we add one CP to one of the N predictions, it counts as one FP, hence decreasing the JI by 1/N, i.e. to (N-1)/N.
L = 250
T_true = 200; T_pred = 100; N = 5

# TRUES
trajs, labels = models_phenom().single_state(T = T_true, N = N, alphas=[0.5, 0.01], Ds = [1,0], L = L)
trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_trues = array_to_df(trajs, labels, fov_length = L+1)

# PREDS
trajs, labels = models_phenom().single_state(T = T_true, N = N, L = L)
trajs = trajs.transpose((1, 0, 2)).copy()
labels = labels.transpose(1, 0, 2)

df_in, df_preds = array_to_df(trajs, labels, fov_length = L+1)

# Adding one CP to one prediction
df_preds['changepoints'].loc[0] = [50, T_true]
df_preds['Ds'].loc[0] = [1, 1]
df_preds['alphas'].loc[0] = [1, 1]
df_preds['states'].loc[0] = [2, 2]
error_SingleTraj_dataset(df_preds, df_trues, prints = True, disable_tqdm=True);
Summary of metrics assesments:
Changepoint Metrics
RMSE: 0
Jaccard Index: 0.8
Diffusion property metrics
Metric anomalous exponent: 0.5232892181068773
Metric diffusion coefficient: 0.0
Metric diffusive state: 1.0
Codalab scoring program
Single trajectory
run_single_task
run_single_task (exp_nums, track, submit_dir, truth_dir)
when_error_single
when_error_single (wrn_str)
Ensemble
run_ensemble_task
run_ensemble_task (exp_nums, track, submit_dir, truth_dir)
Parent program
codalab_scoring
codalab_scoring (INPUT_DIR=None, OUTPUT_DIR=None)
Type | Default | Details | |
---|---|---|---|
INPUT_DIR | NoneType | None | directory where to find the reference and predicted labels |
OUTPUT_DIR | NoneType | None | directory where the scores will be saved (scores.txt) |
codalab_scoring_local
codalab_scoring_local (submit_dir, truth_dir, output_dir, scores_filename='scores.txt', html_filename='scores.html', dfs_suffix=None)
Local version of codalab_scoring, allowing for custom save names and without df swapping. Labelling is as follows: Track 1: videos, 2: trajectories; Task 1: single, 2: ensemble.
Type | Default | Details | |
---|---|---|---|
submit_dir | directory to where to find the predicted labels (i.e. folder containing folders track_1 and/or track_2) | ||
truth_dir | directory to where to find the reference labels (i.e. folder containing folders track_1 and track_2) | ||
output_dir | directory where the scores will be saved | ||
scores_filename | str | scores.txt | name of the txt scores file |
html_filename | str | scores.html | name of the html scores file |
dfs_suffix | NoneType | None | if str, suffix of the df filename: df_task_{1|2}track{1|2}_{dfs_suffix}.csv |
Helper to transform results dataset in reference dataset
transform_ref_to_res
transform_ref_to_res (base_path:str, track:str, num_fovs:int)
Transforms an organized reference dataset into a valid submission dataset. Note that we do not account for VIP indices in track_1, so scoring this track will later yield an error.
Type | Details | |
---|---|---|
base_path | str | path where to find the folder to reorganize |
track | str | either ‘track_1’ or ‘track_2’ |
num_fovs | int | Number of FOVs
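A hypothetical call (base_path is a placeholder):

transform_ref_to_res(base_path = 'reference_dataset', track = 'track_2', num_fovs = 3)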