Group feature extraction¶

[1]:

import movekit as mkit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

[2]:

path = "./datasets/fish-5-cleaned.csv"
data = mkit.read_data(path)
data = mkit.extract_features(data)
data.head()

Extracting all absolute features: 100%|██████████| 100.0/100 [00:01<00:00, 67.56it/s]

[2]:

	time	animal_id	x	y	distance	average_speed	average_acceleration	direction	stopped	turning
0	1	312	405.29	417.76	0.0	0.210217	-0.006079	(0.0, 0.0)	1	0.0
1	1	511	369.99	428.78	0.0	0.020944	0.000041	(0.0, 0.0)	1	0.0
2	1	607	390.33	405.89	0.0	0.070235	0.000344	(0.0, 0.0)	1	0.0
3	1	811	445.15	411.94	0.0	0.370500	0.007092	(0.0, 0.0)	1	0.0
4	1	905	366.06	451.76	0.0	0.118000	-0.003975	(0.0, 0.0)	1	0.0

Detecting outliers¶

The function detects outliers with the k-nearest-neighbors (KNN) algorithm. The user can choose the features to consider, the number of nearest neighbors used for the outlier classification, the metric used to calculate distances, the method used to aggregate the neighbor distances, and the expected share of outliers (contamination).

[3]:

# Detect outliers based on KNN.
# mkit.outlier_detection(dataset, features=["distance", "average_speed", "average_acceleration",
# "stopped", "turning"], contamination=0.01, n_neighbors=5, method="mean", metric="minkowski")
outs = mkit.outlier_detection(data)
# Show the first rows that were flagged as outliers
outs[outs.loc[:,"outlier"] == 1].head()

[3]:

	time	animal_id	outlier	x	y	distance	average_speed	average_acceleration	direction	stopped	turning
5	2	312	1	405.31	417.37	0.390512	0.192177	-0.006451	(0.02, -0.39)	1	0.000000
8	2	811	1	445.48	412.26	0.459674	0.387983	0.007893	(0.33, 0.32)	1	0.000000
2603	521	811	1	71.65	333.29	0.472652	0.341161	0.030547	(0.05, 0.47)	1	0.437319
2608	522	811	1	71.56	334.19	0.904489	0.347270	0.029050	(-0.09, 0.9)	1	0.978928
3486	698	511	1	113.96	283.46	4.342637	3.888347	0.326774	(2.12, -3.79)	0	0.999926

[4]:

# Same function, different parameters
other_outs = mkit.outlier_detection(
    dataset=data,
    features=["average_speed", "average_acceleration"],
    contamination=0.05,
    n_neighbors=8,
    method="median",
    metric="euclidean",
)

# Show the first rows that were flagged as outliers
other_outs[other_outs.loc[:, "outlier"] == 1].head()

[4]:

	time	animal_id	outlier	x	y	distance	average_speed	average_acceleration	direction	stopped	turning
1527	306	607	1	176.53	416.77	3.010399	3.195286	-0.075984	(-3.0, -0.25)	0	0.999919
1607	322	607	1	130.81	410.81	1.780169	2.088872	-0.221172	(-1.77, -0.19)	0	0.999996
1612	323	607	1	129.26	410.63	1.560417	1.824672	-0.227409	(-1.55, -0.18)	0	0.999962
1617	324	607	1	127.95	410.58	1.310954	1.561242	-0.224810	(-1.31, -0.05)	0	0.997001
1622	325	607	1	126.90	410.58	1.050000	1.296308	-0.216147	(-1.05, 0.0)	0	0.999272
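
To make the roles of the parameters above more concrete, the following is a minimal NumPy sketch of KNN-based outlier scoring. It is not movekit's implementation: the function name `knn_outlier_flags`, the fixed Euclidean metric, and the toy data are assumptions made for illustration only. Each row is scored by aggregating the distances to its nearest neighbors, and the share of rows given by `contamination` with the highest scores is flagged.

```python
import numpy as np


def knn_outlier_flags(X, n_neighbors=5, contamination=0.01, method="mean"):
    """Hypothetical helper: flag rows whose aggregated k-NN distance is unusually large."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all rows (fine for small data sets).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is each point's distance to itself (0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:n_neighbors + 1]
    if method == "mean":
        scores = knn.mean(axis=1)
    elif method == "median":
        scores = np.median(knn, axis=1)
    elif method == "largest":
        scores = knn[:, -1]
    else:
        raise ValueError(f"unknown aggregation method: {method}")
    # Rows scoring above the (1 - contamination) quantile are flagged as outliers.
    threshold = np.quantile(scores, 1.0 - contamination)
    return (scores > threshold).astype(int), scores


# Toy data (assumption): 20 clustered points plus one far-away point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)), [[50.0, 50.0]]])
flags, scores = knn_outlier_flags(X, n_neighbors=5, contamination=0.05, method="median")
print(np.flatnonzero(flags))  # indices of the rows flagged as outliers
```

A larger `contamination` lowers the threshold and flags more rows, while `n_neighbors` and `method` control how strongly a single close neighbor can mask an otherwise isolated point.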

Group-level Analysis¶

Below we perform analysis at the group level. This consists of:

- Group-level averages
- Centroid and medoid computation
- A dynamic time warping matrix
- A clustering over time based on absolute features
- The centroid direction
- The heading difference of each animal with respect to the current centroid
- The group polarization for each timestep

Obtain group-level records for each point in time¶

Records consist of the total distance covered by the group, the mean speed, the mean acceleration, and the mean distance from the centroid for each timestamp. If the input does not contain centroid or feature data, these are calculated first and a warning is shown.

[5]:

group_data = mkit.group_movement(data)
group_data.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:07<00:00, 132.25it/s]

[5]:

	total_dist	mean_speed	mean_acceleration	mean_distance_centroid
time
1	0.000000	0.157979	-0.000515	29.4616
2	1.174908	0.157641	-0.000331	29.5850
3	1.025155	0.155610	0.001818	29.6914
4	0.918960	0.153579	0.001551	29.7782
5	0.830461	0.153341	0.001603	29.8518
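
As a rough illustration of what these group-level records contain, the sketch below aggregates per-animal values with a plain pandas `groupby`. It is an assumption about the aggregation, not movekit's code; the rows are the time-1 values printed in this notebook's output tables, and the variable names are made up for the example.

```python
import pandas as pd

# Per-animal records for time 1, copied from this notebook's output tables.
records = pd.DataFrame({
    "time": [1, 1, 1, 1, 1],
    "animal_id": [312, 511, 607, 811, 905],
    "distance": [0.0, 0.0, 0.0, 0.0, 0.0],
    "average_speed": [0.210217, 0.020944, 0.070235, 0.370500, 0.118000],
    "average_acceleration": [-0.006079, 0.000041, 0.000344, 0.007092, -0.003975],
    "distance_to_centroid": [11.331, 25.975, 18.052, 51.049, 40.901],
})

# One row per timestamp: summed distance and mean of the other features.
group_records = records.groupby("time").agg(
    total_dist=("distance", "sum"),
    mean_speed=("average_speed", "mean"),
    mean_acceleration=("average_acceleration", "mean"),
    mean_distance_centroid=("distance_to_centroid", "mean"),
)
print(group_records.round(6))
```

For time 1 the means agree with the first row of the `group_movement` output above up to rounding.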

Obtain centroid, medoid and distance to centroid for each movement record¶

[6]:

movement = mkit.centroid_medoid_computation(data, object_output = False)
movement.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:05<00:00, 188.64it/s]

[6]:

	time	animal_id	outlier	x	y	distance	average_speed	average_acceleration	direction	stopped	turning	x_centroid	y_centroid	medoid	distance_to_centroid
0	1	312	0	405.29	417.76	0.0	0.210217	-0.006079	(0.0, 0.0)	1	0.0	395.364	423.226	312	11.331
1	1	511	0	369.99	428.78	0.0	0.020944	0.000041	(0.0, 0.0)	1	0.0	395.364	423.226	312	25.975
2	1	607	0	390.33	405.89	0.0	0.070235	0.000344	(0.0, 0.0)	1	0.0	395.364	423.226	312	18.052
3	1	811	0	445.15	411.94	0.0	0.370500	0.007092	(0.0, 0.0)	1	0.0	395.364	423.226	312	51.049
4	1	905	0	366.06	451.76	0.0	0.118000	-0.003975	(0.0, 0.0)	1	0.0	395.364	423.226	312	40.901
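
The columns added here can be reproduced by hand for a single timestamp. The sketch below assumes that the centroid is the mean position of all animals at a given time and that the medoid is the animal closest to that centroid, which is consistent with the output above but is not taken from movekit's source. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Positions at time 1, copied from the table above.
positions = pd.DataFrame({
    "time": [1, 1, 1, 1, 1],
    "animal_id": [312, 511, 607, 811, 905],
    "x": [405.29, 369.99, 390.33, 445.15, 366.06],
    "y": [417.76, 428.78, 405.89, 411.94, 451.76],
})

# Centroid: mean x and y of all animals at the same timestamp.
centroids = positions.groupby("time")[["x", "y"]].transform("mean")
positions["x_centroid"] = centroids["x"]
positions["y_centroid"] = centroids["y"]

# Euclidean distance of each animal to its timestamp's centroid.
positions["distance_to_centroid"] = np.hypot(
    positions["x"] - positions["x_centroid"],
    positions["y"] - positions["y_centroid"],
)

# Medoid (assumed definition): the animal closest to the centroid at each timestamp.
closest_rows = positions.groupby("time")["distance_to_centroid"].idxmin()
medoid_by_time = positions.loc[closest_rows].set_index("time")["animal_id"]
positions["medoid"] = positions["time"].map(medoid_by_time)

print(positions.round(3))
```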

Get the heading difference between the centroid’s direction and each animal’s direction¶

The heading difference is the cosine similarity of the two direction vectors and therefore ranges from -1 to 1. A value of 1 indicates that the animal and the centroid move in the same direction, while -1 indicates that they move in opposite directions.

[7]:

centroid_dir = mkit.compute_centroid_direction(data).sort_values(['time','animal_id'])
heading_diff = mkit.get_heading_difference(data)
heading_diff.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:02<00:00, 361.77it/s]
Computing centroid direction: 100%|██████████| 100.0/100 [00:00<00:00, 758.23it/s]
Calculating centroid distances: 100%|██████████| 1000/1000 [00:04<00:00, 222.59it/s]
Calculating heading difference: 100%|██████████| 100.0/100 [00:01<00:00, 76.04it/s]

[7]:

	time	animal_id	outlier	x	y	distance	average_speed	average_acceleration	direction	stopped	turning	x_centroid	y_centroid	medoid	distance_to_centroid	centroid_direction	heading_difference
0	1	312	0	405.29	417.76	0.0	0.210217	-0.006079	(0.0, 0.0)	1	0.0	395.364	423.226	312	11.331	(0.0, 0.0)	0.0
1	1	511	0	369.99	428.78	0.0	0.020944	0.000041	(0.0, 0.0)	1	0.0	395.364	423.226	312	25.975	(0.0, 0.0)	0.0
2	1	607	0	390.33	405.89	0.0	0.070235	0.000344	(0.0, 0.0)	1	0.0	395.364	423.226	312	18.052	(0.0, 0.0)	0.0
3	1	811	0	445.15	411.94	0.0	0.370500	0.007092	(0.0, 0.0)	1	0.0	395.364	423.226	312	51.049	(0.0, 0.0)	0.0
4	1	905	0	366.06	451.76	0.0	0.118000	-0.003975	(0.0, 0.0)	1	0.0	395.364	423.226	312	40.901	(0.0, 0.0)	0.0
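
The cosine similarity behind the `heading_difference` column can be sketched in a few lines of NumPy. The helper name `heading_difference` and the choice to return 0.0 when either vector has zero length are assumptions; the latter matches the 0.0 values shown above for `(0.0, 0.0)` directions but is not confirmed from movekit's source.

```python
import numpy as np


def heading_difference(direction, centroid_direction):
    """Hypothetical helper: cosine similarity of two 2D direction vectors."""
    a = np.asarray(direction, dtype=float)
    b = np.asarray(centroid_direction, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        # Assumption: a stationary animal or centroid has no heading, so report 0.0.
        return 0.0
    return float(np.dot(a, b) / norm_product)


same = heading_difference((1.0, 0.0), (2.0, 0.0))         # parallel vectors
opposite = heading_difference((1.0, 0.0), (-3.0, 0.0))    # antiparallel vectors
orthogonal = heading_difference((0.0, 1.0), (1.0, 0.0))   # perpendicular vectors
stationary = heading_difference((0.0, 0.0), (0.0, 0.0))   # zero-length vectors
print(same, opposite, orthogonal, stationary)
```

Because both vectors are normalized, only the angle between them matters; the speed of the animal or of the centroid does not affect the result.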

Obtain a matrix, based on dynamic time warping¶

Rows and columns are indexed by animal ID. Each entry is the DTW distance between the trajectories of the two animals: the lower the value, the more similar the trajectories.

[8]:

# Compute dynamic time warping between the trajectories of all animals. The lower the value for two animals, the more similar their trajectories are according to the DTW algorithm.
# mkit.dtw_matrix(preprocessed_data, path=False, distance=euclidean)
# preprocessed_data: DataFrame containing the movement data.
# path: Boolean specifying whether the matrix of DTW paths (the warping path for every examined pair of sequences) is returned as well.
# distance: Distance measure to use. Default: "euclidean". Other example alternatives are pdist or minkowski (all distances defined by the fastdtw package are possible).

mkit.dtw_matrix(data)

Calculating dynamic time warping: 100%|██████████| 5/5 [00:07<00:00,  1.51s/it]

[8]:

	312	511	607	811	905
312	0.000000	30843.085403	32859.600139	42461.524553	37916.447829
511	30843.085403	0.000000	26931.014323	47116.708116	20967.960073
607	32859.600139	26931.014323	0.000000	39859.787924	35711.718898
811	42461.524553	47116.708116	39859.787924	0.000000	38379.806433
905	37916.447829	20967.960073	35711.718898	38379.806433	0.000000
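
To show what a single entry of such a matrix measures, here is a minimal sketch of the classic dynamic-programming DTW distance with a Euclidean point cost. It is an exact O(n·m) version written for illustration and not the approximate fastdtw algorithm referred to in the comments above; the function name `dtw_distance` and the toy trajectories are assumptions.

```python
import numpy as np
import pandas as pd


def dtw_distance(a, b):
    """Hypothetical helper: exact DTW distance between two 2D trajectories."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] is the cheapest alignment of the first i points of a with the first j of b.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Toy trajectories (assumption): lists of (x, y) positions per animal.
trajectories = {
    "a": [[0, 0], [1, 0], [2, 0]],
    "b": [[0, 0], [1, 0], [1, 0], [2, 0]],  # same path as "a" with one repeated point
    "c": [[0, 1], [1, 1], [2, 1]],          # "a" shifted by one unit in y
}
ids = list(trajectories)
matrix = pd.DataFrame(
    [[dtw_distance(trajectories[i], trajectories[j]) for j in ids] for i in ids],
    index=ids,
    columns=ids,
)
print(matrix)
```

The pair `a`/`b` has distance 0 because warping absorbs the repeated point, whereas `a`/`c` accumulates one unit of cost per aligned point. As in the output above, the matrix is symmetric with zeros on the diagonal.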