Group feature extraction

[1]:
import movekit as mkit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
[2]:
path = "./datasets/fish-5-cleaned.csv"
data = mkit.read_data(path)
data = mkit.extract_features(data)
data.head()
Extracting all absolute features: 100%|██████████| 100.0/100 [00:01<00:00, 67.56it/s]
[2]:
time animal_id x y distance average_speed average_acceleration direction stopped turning
0 1 312 405.29 417.76 0.0 0.210217 -0.006079 (0.0, 0.0) 1 0.0
1 1 511 369.99 428.78 0.0 0.020944 0.000041 (0.0, 0.0) 1 0.0
2 1 607 390.33 405.89 0.0 0.070235 0.000344 (0.0, 0.0) 1 0.0
3 1 811 445.15 411.94 0.0 0.370500 0.007092 (0.0, 0.0) 1 0.0
4 1 905 366.06 451.76 0.0 0.118000 -0.003975 (0.0, 0.0) 1 0.0

Detecting outliers

The function detects outliers with the k-nearest-neighbors (KNN) algorithm. The user can choose the features used for detection, the number of nearest neighbors considered for the outlier classification, the metric used to calculate distances, the method used to aggregate the neighbor distances, and the expected share of outliers (contamination).
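To make the role of each parameter concrete, here is a minimal NumPy sketch of KNN-based outlier scoring. It is an illustration under stated assumptions, not movekit's implementation: the helper name `knn_outlier_flags` is hypothetical, only Euclidean distance is handled, and the threshold is taken as a simple quantile of the scores.

```python
import numpy as np

def knn_outlier_flags(features, n_neighbors=5, contamination=0.01, method="mean"):
    """Hypothetical helper: flag rows that lie far from their k nearest neighbours."""
    X = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances between all rows.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is each point's distance to itself (0), so skip it.
    nearest = np.sort(dists, axis=1)[:, 1:n_neighbors + 1]
    # Aggregate the k neighbour distances into one score per row.
    if method == "mean":
        scores = nearest.mean(axis=1)
    elif method == "median":
        scores = np.median(nearest, axis=1)
    else:
        raise ValueError("method must be 'mean' or 'median' in this sketch")
    # Rows whose score exceeds the (1 - contamination) quantile are flagged as 1.
    threshold = np.quantile(scores, 1.0 - contamination)
    return (scores > threshold).astype(int), scores

# Made-up data: 50 points around the origin, with one moved far away.
rng = np.random.default_rng(0)
points = rng.normal(0.0, 1.0, size=(50, 2))
points[0] = [25.0, 25.0]
flags, scores = knn_outlier_flags(points, n_neighbors=5, contamination=0.02)
```

A larger `contamination` lowers the threshold and flags more rows, while `n_neighbors` and `method` change how strongly a single close neighbour can mask an isolated point.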

[3]:
# Detect outliers based on KNN.
# mkit.outlier_detection(dataset, features=["distance", "average_speed", "average_acceleration",
# "stopped", "turning"], contamination=0.01, n_neighbors=5, method="mean", metric="minkowski")
outs = mkit.outlier_detection(data)
# Show the first rows flagged as outliers
outs[outs.loc[:,"outlier"] == 1].head()
[3]:
time animal_id outlier x y distance average_speed average_acceleration direction stopped turning
5 2 312 1 405.31 417.37 0.390512 0.192177 -0.006451 (0.02, -0.39) 1 0.000000
8 2 811 1 445.48 412.26 0.459674 0.387983 0.007893 (0.33, 0.32) 1 0.000000
2603 521 811 1 71.65 333.29 0.472652 0.341161 0.030547 (0.05, 0.47) 1 0.437319
2608 522 811 1 71.56 334.19 0.904489 0.347270 0.029050 (-0.09, 0.9) 1 0.978928
3486 698 511 1 113.96 283.46 4.342637 3.888347 0.326774 (2.12, -3.79) 0 0.999926
[4]:
# Same function, different parameters
other_outs = mkit.outlier_detection(
    dataset=data, features=["average_speed", "average_acceleration"],
    contamination=0.05, n_neighbors=8, method="median", metric="euclidean")

# Show the first rows flagged as outliers
other_outs[other_outs.loc[:,"outlier"] == 1].head()
[4]:
time animal_id outlier x y distance average_speed average_acceleration direction stopped turning
1527 306 607 1 176.53 416.77 3.010399 3.195286 -0.075984 (-3.0, -0.25) 0 0.999919
1607 322 607 1 130.81 410.81 1.780169 2.088872 -0.221172 (-1.77, -0.19) 0 0.999996
1612 323 607 1 129.26 410.63 1.560417 1.824672 -0.227409 (-1.55, -0.18) 0 0.999962
1617 324 607 1 127.95 410.58 1.310954 1.561242 -0.224810 (-1.31, -0.05) 0 0.997001
1622 325 607 1 126.90 410.58 1.050000 1.296308 -0.216147 (-1.05, 0.0) 0 0.999272

Group-level Analysis

Below we perform analysis at the group level. This consists of:
- Group-level averages
- Centroid and medoid computation
- A dynamic time warping matrix
- A clustering over time based on absolute features
- The centroid direction
- The heading difference of each animal with respect to the current centroid
- The group polarization for each timestep

Obtain group-level records for each point in time

Each record contains the total distance covered by the group, the mean speed, the mean acceleration, and the mean distance from the centroid at one timestamp. If the input does not contain centroid or feature data, these are calculated first and a warning is shown.
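The aggregation described here can be sketched with a plain pandas `groupby`. This is an illustration only: the input values are invented, and the assumption is that the input already carries a `distance_to_centroid` column; it is not a description of how `mkit.group_movement` works internally.

```python
import pandas as pd

# Tiny made-up table using the column names from this notebook (values are invented).
records = pd.DataFrame({
    "time": [1, 1, 2, 2],
    "animal_id": [312, 511, 312, 511],
    "distance": [0.0, 0.0, 0.5, 0.7],
    "average_speed": [0.2, 0.1, 0.3, 0.1],
    "average_acceleration": [0.0, 0.0, 0.01, -0.01],
    "distance_to_centroid": [10.0, 10.0, 12.0, 12.0],
})

# One row per timestamp: summed distance, and the mean of each remaining column.
group_records = records.groupby("time").agg(
    total_dist=("distance", "sum"),
    mean_speed=("average_speed", "mean"),
    mean_acceleration=("average_acceleration", "mean"),
    mean_distance_centroid=("distance_to_centroid", "mean"),
)
print(group_records)
```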

[5]:
group_data = mkit.group_movement(data)
group_data.head()
Calculating centroid distances: 100%|██████████| 1000/1000 [00:07<00:00, 132.25it/s]
[5]:
time  total_dist  mean_speed  mean_acceleration  mean_distance_centroid
1       0.000000    0.157979          -0.000515                 29.4616
2       1.174908    0.157641          -0.000331                 29.5850
3       1.025155    0.155610           0.001818                 29.6914
4       0.918960    0.153579           0.001551                 29.7782
5       0.830461    0.153341           0.001603                 29.8518

Obtain centroid, medoid and distance to centroid for each movement record

[6]:
movement = mkit.centroid_medoid_computation(data, object_output = False)
movement.head()
Calculating centroid distances: 100%|██████████| 1000/1000 [00:05<00:00, 188.64it/s]
[6]:
time animal_id outlier x y distance average_speed average_acceleration direction stopped turning x_centroid y_centroid medoid distance_to_centroid
0 1 312 0 405.29 417.76 0.0 0.210217 -0.006079 (0.0, 0.0) 1 0.0 395.364 423.226 312 11.331
1 1 511 0 369.99 428.78 0.0 0.020944 0.000041 (0.0, 0.0) 1 0.0 395.364 423.226 312 25.975
2 1 607 0 390.33 405.89 0.0 0.070235 0.000344 (0.0, 0.0) 1 0.0 395.364 423.226 312 18.052
3 1 811 0 445.15 411.94 0.0 0.370500 0.007092 (0.0, 0.0) 1 0.0 395.364 423.226 312 51.049
4 1 905 0 366.06 451.76 0.0 0.118000 -0.003975 (0.0, 0.0) 1 0.0 395.364 423.226 312 40.901
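The centroid columns in the table above can be checked by hand for time 1. The sketch below uses only the five positions shown there and assumes the medoid is the animal closest to the centroid; that reading is consistent with the output (animal 312 has the smallest distance), but it is an assumption about movekit's definition rather than a confirmed one.

```python
import numpy as np

# Positions of the five animals at time 1, copied from the table above.
animal_ids = np.array([312, 511, 607, 811, 905])
xy = np.array([
    [405.29, 417.76],
    [369.99, 428.78],
    [390.33, 405.89],
    [445.15, 411.94],
    [366.06, 451.76],
])

# Centroid: the mean position of all animals at this timestamp.
centroid = xy.mean(axis=0)  # approximately (395.364, 423.226)
# Euclidean distance of every animal to the centroid.
distance_to_centroid = np.linalg.norm(xy - centroid, axis=1)
# Assumed medoid: the animal with the smallest distance to the centroid.
medoid = animal_ids[np.argmin(distance_to_centroid)]  # 312
```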

Get the heading difference between the centroid's direction and each animal's direction

The heading difference is the cosine similarity of the two direction vectors, so it ranges from -1 to 1. A value of 1 means the animal and the centroid move in the same direction, 0 means their directions are perpendicular, and -1 means they move in opposite directions.
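As a minimal sketch of that computation, the function below evaluates the cosine similarity of two 2-D direction vectors. The name `heading_difference` is hypothetical, and returning 0.0 when either vector is (0, 0) is an assumption chosen to mirror the 0.0 shown for the first time step in this notebook.

```python
import numpy as np

def heading_difference(animal_direction, centroid_direction):
    """Cosine similarity of two direction vectors; 0.0 if either has zero length."""
    a = np.asarray(animal_direction, dtype=float)
    c = np.asarray(centroid_direction, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(c)
    if norm_product == 0.0:
        # Assumption: an undefined angle (zero-length vector) is reported as 0.0.
        return 0.0
    return float(np.dot(a, c) / norm_product)

same = heading_difference((1.0, 0.0), (2.0, 0.0))       # same direction
opposite = heading_difference((1.0, 0.0), (-3.0, 0.0))  # opposite direction
sideways = heading_difference((1.0, 0.0), (0.0, 1.0))   # perpendicular
```

The vector lengths cancel out, so only the angle between the two headings matters, not how fast the animal or the centroid is moving.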

[7]:
centroid_dir = mkit.compute_centroid_direction(data).sort_values(['time','animal_id'])
heading_diff = mkit.get_heading_difference(data)
heading_diff.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:02<00:00, 361.77it/s]
Computing centroid direction: 100%|██████████| 100.0/100 [00:00<00:00, 758.23it/s]
Calculating centroid distances: 100%|██████████| 1000/1000 [00:04<00:00, 222.59it/s]
Calculating heading difference: 100%|██████████| 100.0/100 [00:01<00:00, 76.04it/s]
[7]:
time animal_id outlier x y distance average_speed average_acceleration direction stopped turning x_centroid y_centroid medoid distance_to_centroid centroid_direction heading_difference
0 1 312 0 405.29 417.76 0.0 0.210217 -0.006079 (0.0, 0.0) 1 0.0 395.364 423.226 312 11.331 (0.0, 0.0) 0.0
1 1 511 0 369.99 428.78 0.0 0.020944 0.000041 (0.0, 0.0) 1 0.0 395.364 423.226 312 25.975 (0.0, 0.0) 0.0
2 1 607 0 390.33 405.89 0.0 0.070235 0.000344 (0.0, 0.0) 1 0.0 395.364 423.226 312 18.052 (0.0, 0.0) 0.0
3 1 811 0 445.15 411.94 0.0 0.370500 0.007092 (0.0, 0.0) 1 0.0 395.364 423.226 312 51.049 (0.0, 0.0) 0.0
4 1 905 0 366.06 451.76 0.0 0.118000 -0.003975 (0.0, 0.0) 1 0.0 395.364 423.226 312 40.901 (0.0, 0.0) 0.0

Obtain a distance matrix based on dynamic time warping

The row and column labels are the animal IDs. Each entry is the dynamic time warping (DTW) distance between the trajectories of the two animals, so lower values indicate more similar trajectories.
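For intuition about what the entries measure, here is a small sketch of the classic dynamic-programming form of DTW with a Euclidean step cost, applied to toy trajectories. It is not movekit's code: the notebook's comments refer to the fastdtw package, which computes an approximation, whereas this sketch is the exact quadratic-time version, and the names `dtw_distance` and `dtw_matrix_sketch` are hypothetical.

```python
import numpy as np

def dtw_distance(traj_a, traj_b):
    """Exact DTW cost between two sequences of 2-D points (Euclidean step cost)."""
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def dtw_matrix_sketch(trajectories):
    """Symmetric matrix of pairwise DTW costs for a dict {id: trajectory}."""
    ids = list(trajectories)
    matrix = np.zeros((len(ids), len(ids)))
    for r in range(len(ids)):
        for c in range(r + 1, len(ids)):
            d = dtw_distance(trajectories[ids[r]], trajectories[ids[c]])
            matrix[r, c] = matrix[c, r] = d
    return ids, matrix

# Toy trajectories: "b" follows the same path as "a" but lingers at the start,
# while "c" runs parallel to "a" at a constant offset of 1.
trajectories = {
    "a": [(0, 0), (1, 0), (2, 0)],
    "b": [(0, 0), (0, 0), (1, 0), (2, 0)],
    "c": [(0, 1), (1, 1), (2, 1)],
}
ids, matrix = dtw_matrix_sketch(trajectories)
```

In this toy case "a" and "b" get a cost of 0 because warping absorbs the difference in timing, while "a" and "c" keep a positive cost because their paths never coincide.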

[8]:
# Compute dynamic time warping between the trajectories of all animals. The lower the value for two animals, the more similar their trajectories are according to the DTW algorithm.
# mkit.dtw_matrix(preprocessed_data, path=False, distance=euclidean)
# preprocessed_data: DataFrame containing the movement data.
# path: Boolean specifying whether the matrix of DTW paths (the warping path for every pair of sequences examined) is returned as well.
# distance: Specifies which distance measure to use. Default: "euclidean". Other example alternatives are pdist or minkowski (all distances defined by the fastdtw package are possible).

mkit.dtw_matrix(data)
Calculating dynamic time warping: 100%|██████████| 5/5 [00:07<00:00,  1.51s/it]
[8]:
              312           511           607           811           905
312      0.000000  30843.085403  32859.600139  42461.524553  37916.447829
511  30843.085403      0.000000  26931.014323  47116.708116  20967.960073
607  32859.600139  26931.014323      0.000000  39859.787924  35711.718898
811  42461.524553  47116.708116  39859.787924      0.000000  38379.806433
905  37916.447829  20967.960073  35711.718898  38379.806433      0.000000