movekit package

Submodules

movekit.clustering module

movekit.clustering.clustering(algorithm, data, **kwargs)[source]

Clustering of spatio-temporal data. :param algorithm: Choose between dbscan, hdbscan, agglomerative, kmeans, optics, spectral, affinitypropagation, birch. :param data: DataFrame to perform clustering on. :return: labels as numpy array where the label in the first position corresponds to the first row of the input data.

movekit.clustering.clustering_with_splits(algorithm, data, frame_size, **kwargs)[source]

Clustering of spatio-temporal data. :param algorithm: Choose between dbscan, hdbscan, agglomerative, optics, spectral, affinitypropagation. :param data: DataFrame to perform clustering on. :param frame_size: the dataset is partitioned into frames and merged afterwards. :return: labels as numpy array where the label in the first position corresponds to the first row of the input data.

movekit.clustering.compute_centroid_direction(data, colname='centroid_direction', group_output=False, only_centroid=True)[source]

Calculate the direction of the centroid. Calculates the centroid if it is not in the input data. :param data: pandas DataFrame with x/y positional data and animal_ids; may optionally include the centroid. :param colname: Name of the column. Default: centroid_direction. :param group_output: Boolean, defines form of output. Default: Animal-Level. :param only_centroid: Boolean in case we just want to compute the centroids. Default: True. :return: pandas DataFrame with centroid direction included

movekit.clustering.compute_polarization(preprocessed_data, group_output=False)[source]

Compute the polarization of a group at all record timepoints. More information about the formula: https://bit.ly/2xZ8uSI and https://bit.ly/3aWfbDv. As the formula only takes angles as input, the polarization for 2D data is calculated by first computing the direction angles of the individual movers and then computing the polarization from these angles. For 3D data, the polarization is calculated as described for 2D data for each pairwise combination of the three dimensions, and the mean of the three results is taken as the polarization. :param preprocessed_data: Pandas Dataframe with or without previously extracted features. :return: Pandas Dataframe, with extracted features along with a new “polarization” variable.
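As an illustration of a polarization formula that takes only angles as input, the following is a minimal sketch. It assumes the common definition of polarization as the length of the mean unit heading vector; whether movekit uses exactly this form is not stated here, and the function name `polarization` is hypothetical.

```python
import math


def polarization(angles_deg):
    """Length of the mean unit heading vector, a value between 0 and 1.

    Assumption: polarization is defined as |sum of unit vectors| / N.
    This is a sketch, not movekit's implementation.
    """
    n = len(angles_deg)
    sum_x = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_x, sum_y) / n


# All movers heading the same way give a value close to 1,
# two movers heading in opposite directions give a value close to 0.
aligned = polarization([90.0, 90.0, 90.0])
opposed = polarization([0.0, 180.0])
```

Under this definition, a value near 1 indicates a strongly aligned group and a value near 0 indicates no common heading.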

movekit.clustering.dtw_matrix(preprocessed_data, path=False, distance=<function euclidean>)[source]

Obtain dynamic time warping amongst all trajectories from the grouped animal-records. :param preprocessed_data: pandas Dataframe containing the movement records. :param path: Boolean to specify if matrix of dtw-path gets returned as well. :param distance: Specify which distance measure to use. Default: “euclidean”. (ex. Alternatives: pdist, minkowski) :return: pandas Dataframe with distances between trajectories.
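For readers unfamiliar with dynamic time warping, here is a minimal sketch of the classic dynamic-programming formulation for two trajectories with Euclidean point distances. The function name `dtw_distance` is hypothetical, and the code is not taken from movekit, which may rely on a dedicated library.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic O(n*m) dynamic time warping cost between two point sequences.

    Sketch only: uses Euclidean distance between points and no window constraint.
    """
    inf = float("inf")
    n, m = len(traj_a), len(traj_b)
    # cost[i][j] = minimal accumulated cost aligning the first i and j points
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


path_a = [(0, 0), (1, 0), (2, 0)]
path_a_delayed = [(0, 0), (0, 0), (1, 0), (2, 0)]  # same route, one repeated sample
path_b = [(0, 1), (1, 1), (2, 1)]                  # parallel route, offset by 1
```

The delayed copy of a trajectory has zero cost because warping absorbs the repeated sample, while the parallel route accumulates the offset at every aligned point.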

movekit.clustering.get_group_data(preprocessed_data)[source]

Helper function to get all group data at one place. :param preprocessed_data: pandas DataFrame, containing preprocessed movement records. :return: pd DataFrame containing all relevant group variables

movekit.clustering.get_heading_difference(preprocessed_data)[source]

Calculate the difference between the animal’s direction and the centroid’s direction for each timestep. The difference is measured by the cosine similarity of the two direction vectors. The value ranges from -1 to 1, with 1 meaning the animal and the centroid have the same direction and -1 meaning they have opposite directions. :param preprocessed_data: Pandas Dataframe containing preprocessed animal records. :return: Pandas Dataframe containing animal and centroid directions as well as the heading difference.
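The cosine similarity used as the heading difference can be sketched as follows. The function name `cosine_similarity` is hypothetical, and returning NaN for a zero-length vector is an assumption of this sketch rather than documented movekit behaviour.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two direction vectors.

    Returns NaN if either vector has zero length (assumption of this sketch).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else float("nan")


same = cosine_similarity((1, 0), (2, 0))        # same direction
opposite = cosine_similarity((1, 0), (-3, 0))   # opposite direction
orthogonal = cosine_similarity((1, 0), (0, 5))  # perpendicular
```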

movekit.clustering.get_spatial_objects(preprocessed_data, group_output=False)[source]

Function to calculate convex hull, voronoi diagram and delaunay triangulation objects and also volumes of the first two objects. Please visit https://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/spatial.html for detailed documentation of spatial attributes. :param preprocessed_data: Pandas Df, containing x and y coordinates. :param group_output: Boolean, default: False, If true, one line per time capture for entire animal group. :return: DataFrame either for each animal or for group at each time, containing convex hull and voronoi diagram area as well as convex hull, voronoi diagram and delaunay triangulation object.

movekit.clustering.get_trajectories(data_groups)[source]

Obtain trajectories out of a grouped dictionary with multiple ids. :param data_groups: Grouped dictionary by animal_id. :return: Grouped dictionary by animal id, containing tuples of positions in 2d coordinate system.

movekit.clustering.voronoi_volumes(points)[source]

Function to calculate area in a voronoi-diagram. Used in function below. :param points: Nested list, indicating points with coordinates. :return: Volume for each point, infinite if area is not closed to each direction (usually outmost points).

movekit.feature_extraction module

movekit.feature_extraction.centroid_medoid_computation(data, only_centroid=False, object_output=False)[source]

Calculates the data point (animal_id) closest to center/centroid/medoid for a time step. Uses group by on the ‘time’ attribute. :param data: Pandas DataFrame containing movement records :param only_centroid: Boolean in case we just want to compute the centroids. Default: False. :param object_output: Boolean whether to create a point object for the calculated centroids. Default: False. :return: Pandas DataFrame containing computed medoids & centroids
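The idea for a single time step can be sketched in a few lines: the centroid is the mean position, and the medoid is taken as the animal closest to that centroid. The function name `centroid_and_medoid` and the dictionary input are hypothetical choices for this illustration; movekit itself operates on a DataFrame grouped by ‘time’.

```python
import math


def centroid_and_medoid(positions):
    """positions: dict mapping animal_id -> (x, y) for one time step.

    Returns the centroid (mean position) and the animal_id closest to it.
    """
    n = len(positions)
    cx = sum(p[0] for p in positions.values()) / n
    cy = sum(p[1] for p in positions.values()) / n
    medoid = min(positions, key=lambda aid: math.dist(positions[aid], (cx, cy)))
    return (cx, cy), medoid


centroid, medoid = centroid_and_medoid({1: (0, 0), 2: (4, 0), 3: (2, 3)})
```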

movekit.feature_extraction.compute_average_acceleration(data_animal_id_groups, fps)[source]

Compute average acceleration of mover based on fps parameter. The formula used for calculating average acceleration is: (Final Speed - Initial Speed) / (Total Time Taken). The size of the traveling window is determined by the fps parameter. For example, with fps=4 at timestamp 5: (speed at timestamp 7 - speed at timestamp 3) / 4. With fps=3 at timestamp 5: (speed at timestamp 6.5 - speed at timestamp 3.5) / 3 (interpolation is used in this case if timestamps 3.5 and 6.5 do not exist). :param data_animal_id_groups: dictionary with ‘animal_id’ as keys. :param fps: integer to define size of window for integer-formatted time or string to define size of window for datetime-formatted time (for possible units refer to: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). :return: dictionary, including measure for ‘average_acceleration’.
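The windowed formula can be sketched as follows for integer-formatted time. The helper names `interpolate` and `average_acceleration` are hypothetical, and linear interpolation between neighbouring timestamps is an assumption of this sketch; the interpolation method movekit uses is not specified here.

```python
def interpolate(series, t):
    """Linearly interpolate a {timestamp: value} mapping at time t.

    Assumes t lies within the range of known timestamps.
    """
    if t in series:
        return series[t]
    lower = max(k for k in series if k < t)
    upper = min(k for k in series if k > t)
    frac = (t - lower) / (upper - lower)
    return series[lower] + frac * (series[upper] - series[lower])


def average_acceleration(speed, t, window):
    """(speed at t + window/2 - speed at t - window/2) / window."""
    half = window / 2
    return (interpolate(speed, t + half) - interpolate(speed, t - half)) / window


# Hypothetical speeds at timestamps 1..9 (speed = t squared)
speed = {t: float(t * t) for t in range(1, 10)}
acc_even = average_acceleration(speed, 5, 4)  # uses timestamps 3 and 7
acc_odd = average_acceleration(speed, 5, 3)   # interpolates at 3.5 and 6.5
```

With the even window, the endpoints fall on existing timestamps; with the odd window, both endpoints are interpolated between their neighbours.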

movekit.feature_extraction.compute_average_speed(data_animal_id_groups, fps)[source]

Compute average speed of mover based on fps parameter. The formula used for calculating average speed is: (Total Distance traveled) / (Total time taken). The size of the traveling window is determined by the fps parameter. For example, with fps=4 at timestamp 5: (distance covered from timestamp 3 to timestamp 7) / 4. With fps=3 at timestamp 5: (distance covered from timestamp 3.5 to timestamp 6.5) / 3 (interpolation is used in this case if timestamps 3.5 and 6.5 do not exist). :param data_animal_id_groups: dictionary with ‘animal_id’ as keys. :param fps: integer to define size of window for integer-formatted time or string to define size of window for datetime-formatted time (for possible units refer to: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). :return: dictionary, including measure for ‘average_speed’.
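A minimal sketch of the even-window case, assuming integer timestamps and per-step distances already computed. The function name `average_speed` and the input layout are hypothetical; the odd-window case, which needs interpolation, is deliberately left out.

```python
def average_speed(step_distance, t, window):
    """Distance covered from t - window/2 to t + window/2, divided by window.

    step_distance[k] is the distance travelled between timestamps k-1 and k.
    Sketch only: handles even integer windows on integer timestamps.
    """
    half = window // 2
    covered = sum(step_distance[k] for k in range(t - half + 1, t + half + 1))
    return covered / window


# Hypothetical per-step distances for timestamps 1..8
step_distance = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 2.0, 6: 3.0, 7: 2.0, 8: 1.0}
speed_at_5 = average_speed(step_distance, 5, 4)  # steps 4, 5, 6 and 7
```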

movekit.feature_extraction.compute_direction(data_animal_id_groups, pbar, param_x='x', param_y='y', param_z='z', colname='direction')[source]

Computes the movement vector for each timestamp by checking the difference of the coordinates to the previous timestamp. :param data_animal_id_groups: dictionary containing the data frames for each animal :param pbar: percentage bar filled with 10% already :param param_x: column name of the x coordinate :param param_y: column name of the y coordinate :param param_z: column name of the z coordinate :param colname: the name to appear in the new DataFrame for the calculated direction :return: dictionary containing computed ‘direction’ attribute

movekit.feature_extraction.compute_direction_angle(data, param_x='x', param_y='y', colname='direction_angle')[source]

Computes the angle of rotation of an animal between two timesteps. Only possible if coordinates are 2D only. :param data: dataframe containing the movement records :param param_x: column name of the x coordinate :param param_y: column name of the y coordinate :param colname: the name to appear in the new DataFrame for the direction angle computed. :return: dataframe containing computed ‘direction_angle’ as angle from 0-360 degrees (x-axis to the right is 0 degrees)
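One way to obtain an angle in the range 0-360 degrees with the positive x-axis at 0 degrees is `atan2` followed by a modulo. This sketch assumes angles increase counterclockwise; the function name `direction_angle` is hypothetical and the code is not movekit's implementation.

```python
import math


def direction_angle(prev, curr):
    """Angle in degrees, in [0, 360), of the movement from prev to curr.

    0 degrees points along the positive x-axis; angles increase
    counterclockwise (assumption of this sketch).
    """
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


east = direction_angle((0, 0), (1, 0))
north = direction_angle((0, 0), (0, 1))
south = direction_angle((0, 0), (0, -1))
```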

movekit.feature_extraction.compute_distance(data_animal_id_groups, param_x='x', param_y='y', param_z='z')[source]

Calculate metric distance of animals in between two timesteps. :param data_animal_id_groups: dictionary ordered by ‘animal_id’. :param param_x: Column name to be recognized as x. Default “x”. :param param_y: Column name to be recognized as y. Default “y”. :param param_z: Column name to be recognized as z. Default “z”. :return: dictionary containing computed ‘distance’ attribute.

movekit.feature_extraction.compute_similarity(data, weights, p=2)[source]

Compute positional similarity between animals. Computing the positional similarity in a distance matrix according to animal_id for each time step. :param data: pandas DataFrame, containing preprocessed movement records. :param weights: dictionary, giving variable’s weights in weighted distance calculation. :param p: integer, giving p-norm for Minkowski, weighted and unweighted. Default: 2. :return: pandas DataFrame, including computed similarities.
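To illustrate the role of the weights and of p, here is a sketch of a weighted Minkowski distance between two records. It assumes the convention in which each weight multiplies the p-th power of the coordinate difference (the convention used by current SciPy); the function name `weighted_minkowski` is hypothetical.

```python
def weighted_minkowski(u, v, w, p=2):
    """(sum_i w_i * |u_i - v_i|**p) ** (1/p).

    Assumption: weights are applied to the p-th powers of the differences.
    """
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(u, v, w)) ** (1 / p)


euclid = weighted_minkowski((0, 0), (3, 4), (1, 1), p=2)     # plain Euclidean
manhattan = weighted_minkowski((0, 0), (3, 4), (1, 1), p=1)  # plain Manhattan
x_only = weighted_minkowski((0, 0), (3, 4), (1, 0), p=2)     # y is ignored
```

Setting a weight to 0 removes that variable from the distance, and p=2 with unit weights reduces to the Euclidean distance.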

movekit.feature_extraction.compute_similarity_multiproccessing(data, weights, p=2)[source]

Compute positional similarity between animals. Computing the positional similarity in a distance matrix according to animal_id for each time step. :param data: pandas DataFrame, containing preprocessed movement records. :param weights: dictionary, giving variable’s weights in weighted distance calculation. :param p: integer, giving p-norm for Minkowski, weighted and unweighted. Default: 2. :return: pandas DataFrame, including computed similarities.

movekit.feature_extraction.compute_turning(data_animal_id_groups, param_direction='direction', colname='turning')[source]

Computes the turning for a mover between two timesteps as the cosine similarity between its direction vectors. :param data_animal_id_groups: dictionary ordered by ‘animal_id’. :param param_direction: Column name to be recognized as direction. Default “direction”. :param colname: the name of the new column added which contains the computed cosine similarity. :return: data_animal_id_groups

movekit.feature_extraction.compute_turning_angle(data, colname='turning_angle', direction_angle_name='direction_angle')[source]

Computes the turning angle for a mover between two timesteps as the difference of its direction angle. Only possible for 2D data. :param data: dataframe containing the movement records. :param colname: the name of the new column to be added. :param direction_angle_name: the name of the column containing the direction angle for each movement record. :return: dataframe containing an additional column with the difference in degrees between current and previous timestamp for each record. Note that the difference cannot exceed ±180 degrees.
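Keeping the difference of two direction angles within ±180 degrees requires wrapping it. A minimal sketch follows; the function name `turning_angle` is hypothetical, and the half-open range [-180, 180) is a choice of this sketch.

```python
def turning_angle(prev_angle, curr_angle):
    """Signed difference curr - prev in degrees, wrapped into [-180, 180)."""
    return (curr_angle - prev_angle + 180.0) % 360.0 - 180.0


left_turn = turning_angle(350.0, 10.0)   # crossing 0 degrees counterclockwise
right_turn = turning_angle(10.0, 350.0)  # crossing 0 degrees clockwise
quarter = turning_angle(0.0, 90.0)
```

Without the wrap, the first two cases would yield -340 and 340 degrees instead of the 20-degree turns they actually represent.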

movekit.feature_extraction.computing_stops(data_animal_id_groups, threshold_speed)[source]

Calculate the absolute feature ‘stopped’, which marks a record as a stop based on a threshold: the value is 1 if ‘average_speed’ <= threshold_speed and 0 otherwise. :param data_animal_id_groups: dictionary with ‘animal_id’ as keys. :param threshold_speed: integer, defining maximum value for ‘average_speed’ to be considered as a stop. :return: dictionary, including variable ‘stopped’.

movekit.feature_extraction.distance_by_time(data, frm, to)[source]

Computes the distance between positions for a particular time window for all movers. :param data: pandas DataFrame with all records of movements. :param frm: int defining the start of the time window. Note that if time is stored as a date (if input data has time not stored as numeric type it is automatically converted to datetime), the parameter has to be set using a datetime format: mkit.distance_by_time(data, “2008-01-01”, “2010-10-01”) :param to: int defining the end of the time window (inclusive), i.e. the end point up to which records are extracted. :return: pandas DataFrame with animal_id and distance

movekit.feature_extraction.euclidean_dist(data)[source]

Compute the euclidean distance between movers for one individual grouped time step using the Scipy ‘pdist’ and ‘squareform’ methods. :param data: Preprocessed pandas DataFrame with positional record data containing no duplicates. :return: pandas DataFrame, including computed euclidean distances.

movekit.feature_extraction.euclidean_dist_multiproccessing(data)[source]

Compute the euclidean distance between movers for one individual grouped time step using the Scipy ‘pdist’ and ‘squareform’ methods. :param data: Preprocessed pandas DataFrame with positional record data containing no duplicates. :return: pandas DataFrame, including computed euclidean distances.

movekit.feature_extraction.explore_features(data)[source]

Show percentage of environment space explored by each individual animal, using the minimum and maximum of the coordinates given by the ‘x’ and ‘y’ (and ‘z’) features in the input DataFrame. :param data: pandas DataFrame, containing preprocessed movement records. :return: None.

movekit.feature_extraction.explore_features_geospatial(preprocessed_data)[source]

Show exploration of environment space by each animal using ‘shapely’ package. Gives singular descriptions of polygon area covered by each animal and combined. Additionally a plot of the respective areas is provided. :param preprocessed_data: pandas DataFrame, containing preprocessed movement records. :return: None.

movekit.feature_extraction.extract_features(data, fps=10, stop_threshold=0.5)[source]

Calculate and return all absolute features for input animal group. Combines the functions grouping_data(), compute_distance(), compute_direction(), compute_average_speed(), compute_average_acceleration() and computing_stops() on the DataFrame. :param data: pandas DataFrame with all records of movements. :param fps: size of window used to calculate average speed and average acceleration: integer to define size of window for integer-formatted time or string to define size of window for datetime-formatted time (for possible units refer to: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). :param stop_threshold: integer to specify threshold for average speed, such that we consider timestamp a “stop”. :return: pandas DataFrame with additional variables consisting of all relevant features.

movekit.feature_extraction.extract_features_multiproccessing(data, fps=10, stop_threshold=0.5)[source]

Calculate and return all absolute features for input animal group. Combines the functions grouping_data(), compute_distance(), compute_direction(), compute_average_speed(), compute_average_acceleration() and computing_stops() on the DataFrame. :param data: pandas DataFrame with all records of movements. :param fps: integer to specify the size of the window examined for calculating average speed and average acceleration. :param stop_threshold: integer to specify threshold for average speed, such that we consider timestamp a “stop”. :return: pandas DataFrame with additional variables consisting of all relevant features.

movekit.feature_extraction.getis_ord(data, x_grids_per_t=3, y_grids_per_t=3, time_grids=3)[source]

Calculate the Getis-Ord G* statistic for each x-y-time interval of the data. Interval size is specified by input. For more information about how the statistic is calculated please refer to: https://sigspatial2016.sigspatial.org/giscup2016/problem :param data: pandas DataFrame containing the movement data in the columns x, y and time. :param x_grids_per_t: int defining how many x intervals there are for each time step. The x axis is subdivided uniformly, e.g. if the maximum value of x in the data is 100 and the minimum value is 10, setting x_grids_per_t = 3 gives 3 intervals for each time step ([10,40),[40,70),[70,100]). :param y_grids_per_t: int defining how many y intervals there are for each time step. The y axis is subdivided uniformly, e.g. if the maximum value of y in the data is 50 and the minimum value is 10, setting y_grids_per_t = 4 gives 4 intervals for each time step ([10,20),[20,30),[30,40),[40,50]). :param time_grids: int defining how many time intervals there are. The time axis is subdivided uniformly, e.g. if the maximum value of time in the data is 500 and the minimum value is 0, setting time_grids = 5 gives 5 time intervals ([0,100),[100,200),[200,300),[300,400),[400,500]). Note that if one defines e.g. x_grids_per_t = 3, y_grids_per_t = 3 and time_grids = 5, the space-time cube used for calculating G* contains 3*3*5=45 intervals. :return: pandas DataFrame containing the Getis-Ord statistic for each examined interval (intervals are defined by six columns giving the respective start and end values of the intervals’ x-coordinate, y-coordinate and time).
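The uniform subdivision of an axis can be sketched as below. The helper names `uniform_intervals` and `bin_index` are hypothetical and only illustrate how the interval bounds follow from the minimum, the maximum and the number of grids; they do not compute the G* statistic itself.

```python
def uniform_intervals(lo, hi, n):
    """Split [lo, hi] into n equally wide intervals, returned as (start, end) pairs."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def bin_index(value, lo, hi, n):
    """Index of the interval containing value; the maximum falls into the last one."""
    width = (hi - lo) / n
    return min(int((value - lo) / width), n - 1)


x_bins = uniform_intervals(10, 100, 3)  # minimum 10, maximum 100, 3 grids
y_bins = uniform_intervals(10, 50, 4)   # minimum 10, maximum 50, 4 grids
cells = 3 * 3 * 5                       # x grids * y grids * time grids
```

Each interval is half-open except the last one, which includes the maximum so that every record is assigned to a cell.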

movekit.feature_extraction.group_movement(feats)[source]

Returns aggregated movement data, such as distance, mean speed, mean acceleration and mean distance to centroid for the entire group at each time capture. :param feats: pd DataFrame with animal-specific data - if no features contained, they will be extracted. :return: pd DataFrame with group-specific values for each time-capture

movekit.feature_extraction.grouping_data(processed_data, pick_vars=None, preprocessedMethod=False)[source]

Function to group data records by ‘animal_id’. Adds additional attributes/columns, if features aren’t extracted yet. :param processed_data: pd.DataFrame with all preprocessed records. :param preprocessedMethod: Boolean whether calling method is from preprocessing to check whether columns for features are added. :return: dictionary with ‘animal_id’ as key and all records as value.

movekit.feature_extraction.hausdorff_distance(data, mover1=None, mover2=None)[source]

Calculate the Hausdorff-Distance between trajectories of different movers. :param data: pandas DataFrame containing movement records. :param mover1: animal_id of the first mover if Hausdorff distance is just to be calculated between two movers. :param mover2: animal_id of the second mover if Hausdorff distance is just to be calculated between two movers :return: Hausdorff distance between two specified movers. If no movers are specified, Hausdorff distance between all movers in the data to each other as a Pandas DataFrame.
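For reference, the symmetric Hausdorff distance between two trajectories, treated as point sets, can be sketched in plain Python. The function names are hypothetical, and this brute-force version is meant only to show the definition, not how movekit computes it.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


traj_1 = [(0, 0), (1, 0)]
traj_2 = [(0, 0), (1, 0), (1, 3)]  # same start, then a detour to (1, 3)
```

Every point of `traj_1` also lies on `traj_2`, so one directed distance is 0; the detour point (1, 3) is 3 units from the nearest point of `traj_1`, which determines the symmetric distance.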

movekit.feature_extraction.movement_stopping_durations(data, stop_threshold=0.5)[source]

Split trajectories of movers in stopping and moving phases and return the duration of each phase. :param data: pandas DataFrame containing preprocessed movement records. :param stop_threshold: integer to specify threshold for average speed, such that we consider timestamp a “stop”. :return: dictionary with animal_id as key and DataFrame with the different phases and their durations as value.
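Splitting a sequence of records into stopping and moving phases amounts to grouping consecutive records with the same stop flag. A sketch using `itertools.groupby` follows; the function name `phase_durations` is hypothetical, and measuring a phase's duration as its number of records is an assumption of this sketch.

```python
from itertools import groupby


def phase_durations(avg_speeds, stop_threshold=0.5):
    """Return (phase, number_of_records) pairs for consecutive runs.

    A record counts as stopped if its average speed <= stop_threshold.
    Assumption: duration is measured in number of records.
    """
    flags = [s <= stop_threshold for s in avg_speeds]
    return [("stopped" if flag else "moving", len(list(run)))
            for flag, run in groupby(flags)]


phases = phase_durations([0.1, 0.2, 1.0, 2.0, 3.0, 0.5])
```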

movekit.feature_extraction.outlier_by_threshold(data, feature_thresholds, remove=False)[source]

Identify outliers by user-given features with specific minimum and maximum thresholds. :param data: data on which outliers are detected. :param feature_thresholds: dictionary containing the features as keys and the minimum/maximum threshold as a two-element list. For example, to declare all data points having an average speed < 5 or > 10 as outliers: feature_thresholds = {‘average_speed’: [5,10]} :param remove: Boolean deciding whether outliers should be removed in returned dataframe. Default: False (outliers are not removed). :return: Dataframe containing information for each record whether it is an outlier according to the defined threshold values.
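The thresholding rule can be sketched without pandas as follows. The function name `flag_outliers` and the list-of-dicts input are hypothetical; the sketch treats values equal to a threshold as inliers, matching the strict "<" and ">" in the example above.

```python
def flag_outliers(records, feature_thresholds):
    """Return one boolean per record: True if any feature lies outside [min, max]."""
    return [
        any(not (lo <= record[feature] <= hi)
            for feature, (lo, hi) in feature_thresholds.items())
        for record in records
    ]


records = [{"average_speed": s} for s in (3, 5, 7, 10, 12)]
flags = flag_outliers(records, {"average_speed": [5, 10]})
```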

movekit.feature_extraction.outlier_detection(dataset, features=['distance', 'average_speed', 'average_acceleration', 'stopped', 'turning'], remove=False, algorithm='KNN', **kwargs)[source]

Detect outliers based on different pyod algorithms. Note: User may decide on different parameters specific to algorithm chosen. :param dataset: Dataframe containing the movement records. :param features: list of features to detect outliers upon. Default: [“distance”, “average_speed”, “average_acceleration”, “stopped”, “turning”]. :param remove: Boolean deciding whether outliers should be removed in returned dataframe. Default: False (outliers are not removed). :param algorithm: String defining which algorithm to use for finding the outliers. The following algorithms are available: “KNN”, “ECOD”, “COPOD”, “ABOD”, “MAD”, “SOS”, “KDE”, “Sampling”, “GMM”, “PCA”, “KPCA”, “MCD”, “CD”, “OCSVM”, “LMDD”, “LOF”, “COF”, “CBLOF”, “LOCI”, “HBOS”, “SOD”, “ROD”, “IForest”, “INNE”, “LSCP”, “LODA”, “VAE”, “SO_GAAL”, “MO_GAAL”, “DeepSVDD”, “AnoGAN”, “ALAD”, “R-Graph”. Additional available algorithms: FastABOD: call algorithm=“ABOD” with method=“fast”, AvgKNN: call algorithm=“KNN” with method=“mean”, MedKNN: call algorithm=“KNN” with method=“median”. Default algorithm is “KNN”. For more information regarding all the algorithms refer to: https://pyod.readthedocs.io/en/latest/pyod.models.html# :param kwargs: Specific to the algorithm additional parameters can be specified. For further information about the algorithm-specific parameters once again refer to: https://pyod.readthedocs.io/en/latest/pyod.models.html# For the default algorithm “KNN” some additional parameters are for example: contamination - the contamination threshold, n_neighbors - the number of neighbors, method - which KNN to use, and metric - for distance computation. :return: Dataframe containing information for each movement record whether outlier or not.

movekit.feature_extraction.regrouping_data(data_animal_id_groups)[source]

Concatenate all Pandas DataFrames in grouped dictionary into one. :param data_animal_id_groups: dictionary ordered by ‘animal_id’. :return: Pandas DataFrame containing all records of all animal_ids.

movekit.feature_extraction.segment_data(data, feature, threshold, csv=False, fps=10, stop_threshold=0.5)[source]

Segment data in subsets by feature values using a given threshold value. For instance, by using the average speed as feature split the dataset in segments above and below a given threshold. :param data: dataframe containing the feature which is used to split the dataset. Note that if feature is ‘distance’, ‘average_speed’, ‘average_acceleration’, ‘direction’, ‘stopped’ or ‘turning’, feature can also be extracted within the function. In that case one should define the input parameters to use when extract_features() is called. :param feature: column name of the feature used to split data in subsets. :param threshold: threshold used to split data according to feature value. :param csv: Boolean, defining if each subset shall be exported locally as singular csv. :param fps: used if features are not extracted before but within the function by calling extract_features(): size of window used to calculate average speed and average acceleration: integer to define size of window for integer-formatted time or string to define size of window for datetime-formatted time (For possible units refer to:https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.) :param stop_threshold: used if features are not extracted before but within the function by calling extract_features(): integer to specify threshold for average speed, such that we consider timestamp a “stop”. :return: dictionary with id of different movers as key and a list of all the subsets for this mover as values. Subsets are thereby stored as dataframe.

movekit.feature_extraction.similarity_computation(group, w, p)[source]

Compute similarity between records. Compute the Minkowski similarity for one individual grouped time step using the Scipy pdist and squareform methods. :param group: pandas DataFrame, containing preprocessed movement records. :param w: array, consisting of the weight vector. :param p: double, applies the respective p-norm for weighted Minkowski. :return: pandas DataFrame, including the distances of the records.

movekit.feature_extraction.split_movement_trajectory(data, stop_threshold=0.5, csv=False)[source]

Split trajectories of movers in stopping and moving phases. :param data: pandas DataFrame containing preprocessed movement records. :param stop_threshold: integer to specify threshold for average speed, such that we consider timestamp a “stop”. :param csv: Boolean, defining if each phase shall be exported locally as singular csv. :return: dictionary with animal_id as key and list of individual dataFrames for each movement phase as values.

movekit.feature_extraction.timewise_dict(data)[source]

Group records by timestep. :param data: pd.DataFrame with all preprocessed records. :return: dictionary with ‘time’ as key and all animal records as value.

movekit.feature_extraction.ts_all_features(data)[source]

Perform time series analysis on record data. :param data: pandas DataFrame, containing preprocessed movement records and features. :return: pandas DataFrame, containing extracted time series features for each id for each feature.

movekit.feature_extraction.ts_feature(data, feature)[source]

Perform time series analysis by extracting specified time series features from record data. :param data: pandas DataFrame, containing preprocessed movement records and features. :param feature: time series feature which is extracted from the movement records. :return: pandas DataFrame, containing defined extracted time series features for each id for each feature.

movekit.io module

movekit.io.parse_csv(path_to_file, time_format, three_dim)[source]

Read CSV file into Pandas DataFrame. :param path_to_file: Complete path/relative path to CSV file along with file name. :param time_format: If time is given in an unusual format, the format has to be indicated for the conversion. :param three_dim: Boolean defining whether data contains three dimensional coordinates :return: Pandas DataFrame containing imported data.

movekit.io.parse_excel(path_to_file, sheet, time_format, three_dim)[source]

Read Excel file into Pandas DataFrame. :param path_to_file: Complete path/relative path to Excel file along with file name. :param sheet: name of specific sheet given, by default first sheet of the Excel workbook. :param time_format: If time is given in an unusual format, the format has to be indicated for the conversion. :param three_dim: Boolean defining whether data contains three dimensional coordinates :return: Pandas DataFrame containing imported data.

movekit.io.read_data(path, sheet=0, time_format='undefined', three_dim=False)[source]

Function to import data from ‘csv’, ‘xlsx’ and ‘xls’ files. :param path: Complete path/relative path to the file along with file name. :param sheet: name of specific sheet given (Excel files only), by default first sheet of the Excel workbook. :param time_format: If time is given in an unusual format, the format has to be indicated for the conversion. :param three_dim: Boolean defining whether data contains three dimensional coordinates :return: Pandas DataFrame containing imported data.

movekit.io.read_movebank(path_to_file, animal_id='individual-local-identifier')[source]

Function to import CSV and Excel files from the Movebank database. :param path_to_file: Complete path/relative path to file along with file name :param animal_id: Column name of the unique animal identifier (converted to be animal_id) :return: Data frame in a format required for using the movekit package.

movekit.io.read_with_geometry(path, animal_id='name', time='time', x='x', y='y', z='z', coordinate_conversion=False, time_format='undefined', geopandas=True)[source]

Function to import files containing both data and geometry (e.g. GeoPackage, (Geo)JSON, Shapefile and many more). :param path: Complete path/relative path to file along with file name :param animal_id: Key name of the unique animal identifier (e.g. as defined as property value in the geojson feature) :param time: Key name of time (e.g. as defined as property value in the geojson feature) :param x: Key name of x variable (e.g. as defined as property value in the geojson feature) :param y: Key name of y variable (e.g. as defined as property value in the geojson feature) :param z: Key name of z variable (e.g. as defined as property value in the geojson feature) :param coordinate_conversion: Boolean defining whether the x,y (and z) coordinates are stored in the geometry object of the imported data (e.g. as geometry value in a geojson feature). Note that if coordinate_conversion=True, the function searches for a point object in the geometry column and converts its coordinates to individual columns of the returned data frame. (In case of multiple geometries for an observation, so called ‘geometry collections’, the first point object is converted.) :param time_format: If time is given in an unusual format, the format has to be indicated for the conversion. :param geopandas: Boolean defining whether the returned data frame is a geopandas data frame (containing geometry objects in column ‘geometry’) or a pandas data frame (not containing a ‘geometry’ column). :return: Geopandas or pandas data frame in a format required for using the movekit package.

movekit.network module

movekit.network.network_time_graphlist(preprocessed_data, object_type='delaunay_object', fps=10, stop_threshold=0.5)[source]

Calculates a network list for each timestep based on delaunay triangulation (currently only one available). :param preprocessed_data: Pandas DataFrame containing movement records. :param object_type: delaunay_object - currently the only one available. :param fps: as the returned network graph contains features such as average speed and average acceleration, the fps parameter defines the size of the travel window used to calculate these features. (refer to extract_features for a more detailed description of the parameter) :param stop_threshold: as the returned network graph contains a feature defining whether a timestamp is a stop, this parameter defines the average speed threshold for a timestamp to be a stop. (refer to extract_features for a more detailed description of the parameter) :return: List of nx graphs based on delaunay triangulation, containing singular, group and relational attributes on nodes, graph and edges.

movekit.plot module

movekit.plot.animate_movement(data, viewsize)[source]

Animated version of plot_movement function. Animates ‘x’ and ‘y’ attributes for given Pandas DataFrame in specified time frame. :param data: Pandas DataFrame (should be sorted by ‘time’ attribute). :param viewsize: Int. Define how many time steps/frames should be visible in the animation.

movekit.plot.plot_animal(inp_data, animal_id)[source]

Plot individual animal’s ‘x’ and ‘y’ coordinates. :param inp_data: DataFrame containing ‘x’ & ‘y’ attributes. :param animal_id: ID of animal to be plotted. :return: None.

movekit.plot.plot_animal_timesteps(data)[source]

Plot the number of time steps for each ‘animal_id’. :param data: DataFrame containing movement records. :return: None

movekit.plot.plot_geodata(data, latitude_colname='location-lat', longitude_colname='location-long', animal_list=[], movement_lines=False)[source]

Function to plot geo data on an interactive map using OpenStreetMap. :param data: DataFrame containing the movement records :param latitude_colname: name of the column containing the latitude of each movement record :param longitude_colname: name of the column containing the longitude of each movement record :param animal_list: list containing animal_id’s of all animals to be plotted (Default: every animal in data is plotted) :param movement_lines: Boolean whether movement lines between different location markers of animals are plotted :return: map object containing markers for each tracked animal position

movekit.plot.plot_heatmap(data, time0_start, time0_end, round_digits=1, font_size=10, linewidth=0.5)[source]

Plot a heatmap for the mover for a user-defined time interval. :param data: DataFrame returned by function getis_ord(), containing xy-interval coordinates and the respective Getis-Ord statistic. :param time0_start: beginning time of the earliest interval included in the heatmap. :param time0_end: beginning time of the latest interval included in the heatmap. :param round_digits: for a clear axis description, the xy-values of the displayed intervals are rounded to a user-defined number of digits. :param font_size: font size of the axis ticks. :param linewidth: width of the line dividing each cell in the heatmap.

movekit.plot.plot_movement(data, frm, to)[source]

Plot ‘x’ and ‘y’ attributes for given Pandas DataFrame in specified time frame. :param data: Pandas DataFrame (should be sorted by ‘time’ attribute). :param frm: Starting time step. Note that if time is stored as a date (input data whose time is not stored as a numeric type is automatically converted to datetime), the parameter has to be set using a datetime format: mkit.plot_movement(data, "2008-01-01", "2010-10-01"). :param to: Ending time step. :return: None.

movekit.plot.plot_pace(avg_speed_data, feature='speed')[source]

Plot the extracted average speed feature for each animal. :param avg_speed_data: pandas DataFrame including the average speed feature. :return: None.

movekit.plot.save_animation_plot(animation_object, filename)[source]

Save the animation as a GIF in the working directory. (Saving as an mp4 file is not working at the moment because of a moviepy import error.) :param animation_object: created animation object. :param filename: name of the two files which are created.

movekit.plot.save_geodata_map(map, filename)[source]

Save the created geodata map as a file. :param map: map object to be saved. :param filename: name of the newly created file containing the map.

movekit.preprocess module

movekit.preprocess.convert_latlon(data, latitude='latitude', longitude='longitude', replace=True)[source]

Project data from GPS coordinates (latitude and longitude) to the cartesian coordinate system :param data: DataFrame with GPS coordinates :param latitude: str. Name of the column where latitude is stored :param longitude: str. Name of the column where longitude is stored :param replace: bool. Flag whether the xy columns should replace the latlon columns :return: DataFrame after the transformation where latitude is projected into y and longitude is projected into x
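To illustrate what projecting latitude into y and longitude into x means, here is a rough sketch using an equirectangular approximation. This is a swapped-in technique chosen for brevity; the source does not state which projection movekit applies, and `equirectangular_xy`, the reference point, and the Earth radius constant are all assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption of this sketch)

def equirectangular_xy(lat_deg, lon_deg, lat0_deg=0.0, lon0_deg=0.0):
    """Approximate planar (x, y) in metres relative to a reference point.

    Longitude maps to x (scaled by the cosine of the reference latitude),
    latitude maps to y. Only reasonable for small study areas.
    """
    lat0 = math.radians(lat0_deg)
    x = EARTH_RADIUS_M * math.radians(lon_deg - lon0_deg) * math.cos(lat0)
    y = EARTH_RADIUS_M * math.radians(lat_deg - lat0_deg)
    return x, y

# One degree of latitude north of the reference point.
x, y = equirectangular_xy(lat_deg=1.0, lon_deg=0.0)
```

For data spanning large areas, a proper map projection from a dedicated library would be the safer choice.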

movekit.preprocess.convert_measueres(preprocessed_data, x_min=0, x_max=1, y_min=0, y_max=1, z_min=0, z_max=1)[source]

Create a linear scale with input parameters for x,y for transformation of position data. :param preprocessed_data: Pandas DataFrame only with x and y position data :param x_min: int minimum for x - default: 0. :param x_max: int maximum for x - default: 1. :param y_min: int minimum for y - default: 0. :param y_max: int maximum for y - default: 1. :param z_min: int minimum for z - default: 0. :param z_max: int maximum for z - default: 1. :return: Pandas DataFrame with linearly transformed position data.
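A linear transformation of this kind can be sketched as follows. The example assumes that the observed minimum and maximum of each column are mapped onto the requested bounds, which the source does not spell out; `rescale_column` is a hypothetical helper, not part of movekit.

```python
import pandas as pd

def rescale_column(values: pd.Series, new_min: float, new_max: float) -> pd.Series:
    """Map the observed range of `values` linearly onto [new_min, new_max]."""
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        # A constant column has no range to stretch; place it at new_min.
        return pd.Series(float(new_min), index=values.index)
    scale = (new_max - new_min) / (old_max - old_min)
    return (values - old_min) * scale + new_min

positions = pd.DataFrame({"x": [10.0, 20.0, 30.0], "y": [5.0, 5.5, 7.0]})
scaled = positions.assign(
    x=rescale_column(positions["x"], 0, 1),
    y=rescale_column(positions["y"], 0, 1),
)
```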

movekit.preprocess.delete_mover(data, animal_id)[source]

Delete a particular mover from the DataFrame :param data: DataFrame :param animal_id: int. The animal_id as found in the column animal_id :return: DataFrame

movekit.preprocess.filter_dataframe(data, frm, to)[source]

Extract records of assigned time frame from preprocessed movement record data. :param data: Pandas DataFrame, containing preprocessed movement record data. :param frm: Int, defining starting point from where to extract records. Note that if time is stored as a date (input data whose time is not stored as a numeric type is automatically converted to datetime), the parameter has to be set using a datetime format: mkit.filter_dataframe(data, "2008-01-01", "2010-10-01"). :param to: Int, defining end point up to where to extract records. :return: Pandas DataFrame, filtered by records matching the defined frame in ‘from’-‘to’.
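The filtering described here amounts to a boolean mask on the ‘time’ column. The sketch below is not movekit's code; `filter_by_time` is hypothetical, and treating both bounds as inclusive is an assumption, since the source does not say how the end points are handled.

```python
import pandas as pd

def filter_by_time(data: pd.DataFrame, frm, to) -> pd.DataFrame:
    """Keep rows whose 'time' lies in [frm, to]; both bounds inclusive here."""
    mask = (data["time"] >= frm) & (data["time"] <= to)
    return data.loc[mask]

# Numeric time steps.
numeric = pd.DataFrame({"time": [1, 2, 3, 4], "animal_id": [312] * 4})
subset = filter_by_time(numeric, 2, 3)

# Datetime time steps: convert the bounds explicitly so that the
# comparison is datetime-to-datetime.
dated = pd.DataFrame({
    "time": pd.to_datetime(["2007-06-01", "2008-06-01", "2011-01-01"]),
    "animal_id": [312] * 3,
})
dated_subset = filter_by_time(
    dated, pd.Timestamp("2008-01-01"), pd.Timestamp("2010-10-01")
)
```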

movekit.preprocess.from_dataframe(data, dictionary)[source]

Reformat an existing DataFrame to make it compatible with movekit :param data: pandas DataFrame. The data to be reformatted :param dictionary: Key-value pairs of column names. Keys store the old column names. The respective new column names are stored as their values. Values that need to be defined include ‘time’, ‘animal_id’, ‘x’ and ‘y’ :return: pandas DataFrame
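The shape of the expected dictionary can be illustrated with pandas' own `DataFrame.rename`. This is a sketch under the assumption that the reformatting is essentially a column rename; the source column names (`timestamp`, `fish`, `pos_x`, `pos_y`) are invented for the example.

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": [1, 2],
    "fish": [312, 312],
    "pos_x": [0.1, 0.2],
    "pos_y": [0.5, 0.6],
})

# Keys are the existing column names, values are the names movekit expects.
column_map = {"timestamp": "time", "fish": "animal_id", "pos_x": "x", "pos_y": "y"}

# Check that every required target name is defined before renaming.
required = {"time", "animal_id", "x", "y"}
missing = required - set(column_map.values())
if missing:
    raise ValueError(f"mapping does not define: {sorted(missing)}")

renamed = raw.rename(columns=column_map)
```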

movekit.preprocess.interpolate(data, limit=1, limit_direction='forward', inplace=False, method='linear', order=1, date_format=False)[source]

Interpolate over missing values in pandas DataFrame of movement records. Interpolation methods consist of “linear”, “polynomial”, “time”, “index”, “pad”. (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html) :param data: Pandas DataFrame of movement records. :param limit: Maximum number of consecutive NaNs to fill. :param limit_direction: If limit is specified, consecutive NaNs will be filled in this direction. :param inplace: Update the data in place if possible. :param method: Interpolation technique to use. Default is “linear”. :param order: To be used in case of polynomial or spline interpolation. :param date_format: Boolean to define whether time is some kind of date format. In this case column type has to be converted before calling interpolate. :return: Interpolated DataFrame.
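The effect of `limit` and `limit_direction` is easiest to see on a small gap. The sketch below calls pandas' `Series.interpolate` directly on a hypothetical single-animal track rather than going through movekit, so it only demonstrates the underlying pandas behaviour.

```python
import numpy as np
import pandas as pd

track = pd.DataFrame({
    "time": [1, 2, 3, 4],
    "animal_id": [312, 312, 312, 312],
    "x": [1.0, np.nan, np.nan, 4.0],
})

# With limit=1 and a forward direction, only the first NaN of the
# two-value gap is filled; the second one stays missing.
filled = track.assign(
    x=track["x"].interpolate(method="linear", limit=1, limit_direction="forward")
)
```

Raising `limit` to 2 would fill the whole gap in this example.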

movekit.preprocess.normalize(data)[source]

Normalizes values for the ‘x’ and ‘y’ column :param data: DataFrame to perform preprocessing on :return: normalized DataFrame

movekit.preprocess.plot_missing_values(data)[source]

Plot the missing values of an animal-ID against time. :param data: Pandas DataFrame containing records of movement. :return: None.

movekit.preprocess.preprocess(data, dropna=True, interpolation=False, limit=1, limit_direction='forward', inplace=False, method='linear', order=1, date_format=False)[source]

Function to perform data preprocessing. Print the number of missing values per column; Drop columns with missing values for ‘time’ and ‘animal_id’; Remove the duplicated rows found. :param data: DataFrame to perform preprocessing on. :param dropna: Optional parameter to drop columns with missing values for ‘time’ and ‘animal_id’. :param interpolation: Optional parameter to perform interpolation. :param limit: Maximum number of consecutive NaNs to fill. :param limit_direction: If limit is specified, consecutive NaNs will be filled in this direction. :param inplace: Update the data in place if possible. :param method: Interpolation technique to use. Default is “linear”. :param order: To be used in case of polynomial or spline interpolation. :param date_format: Boolean to define whether time is some kind of date format. Important for interpolation. :return: Preprocessed DataFrame.
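The three steps listed above can be approximated with plain pandas calls. This is a sketch, not the library's implementation; in particular it assumes that the records (rows) lacking ‘time’ or ‘animal_id’ are the ones removed, and the sample data is invented.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "time": [1, 1, 2, np.nan],
    "animal_id": [312, 312, 312, 511],
    "x": [0.1, 0.1, np.nan, 0.4],
    "y": [0.5, 0.5, 0.6, 0.7],
})

# Step 1: count missing values per column.
missing_per_column = raw.isnull().sum()

# Steps 2 and 3: drop records without 'time' or 'animal_id' (assumed to be
# row-wise), then remove exact duplicate rows.
cleaned = (
    raw.dropna(subset=["time", "animal_id"])
       .drop_duplicates()
       .reset_index(drop=True)
)
```

Missing ‘x’ values survive this step; they are what the optional interpolation is meant to handle.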

movekit.preprocess.print_duplicate(df)[source]

Print rows that are duplicates. :param df: Pandas DataFrame of movement records. :return: None.

movekit.preprocess.print_missing(df)[source]

Print the missing values for each column. :param df: Pandas DataFrame of movement records. :return: None.

movekit.preprocess.replace_parts_animal_movement(data_groups, animal_id, time_array, replacement_value_x, replacement_value_y, replacement_value_z=None)[source]

Replace subsets (segments) of animal movement based on some indices e.g. time. This function can be used to remove outliers.

Example usage:

    data_groups = grouping_data(data)
    arr_index = np.array([10, 20, 200, 20000, 40000, 43200])
    replaced_data_groups = replace_parts_animal_movement(data_groups, 811, arr_index, 100, 90)

Parameters:
  • data_groups – DataFrame containing the movement records.

  • animal_id – Int defining ‘animal_id’ whose movements have to be replaced.

  • time_array – Array defining time indices whose movements have to be replaced (array of integers if time has integer format, array of datetime strings if time has datetime format)

  • replacement_value_x – Int value that will replace all ‘x’ attribute values in ‘time_array’.

  • replacement_value_y – Int value that will replace all ‘y’ attribute values in ‘time_array’.

  • replacement_value_z – Int value that will replace all ‘z’ attribute values in ‘time_array’. (optional)

Returns:

Dictionary with replaced subsets.
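The replacement logic can be sketched on a single DataFrame with a boolean mask. Note the differences from the documented function: `replace_positions` is a hypothetical helper, it handles only ‘x’ and ‘y’, and it returns a DataFrame rather than a dictionary.

```python
import pandas as pd

def replace_positions(data, animal_id, times, new_x, new_y):
    """Return a copy with x/y overwritten for one animal at the given times."""
    out = data.copy()
    mask = (out["animal_id"] == animal_id) & out["time"].isin(times)
    out.loc[mask, ["x", "y"]] = [new_x, new_y]
    return out

records = pd.DataFrame({
    "time": [1, 2, 3, 1, 2, 3],
    "animal_id": [811, 811, 811, 905, 905, 905],
    "x": [1.0, 500.0, 3.0, 4.0, 5.0, 6.0],
    "y": [1.0, 900.0, 3.0, 4.0, 5.0, 6.0],
})

# Overwrite the outlier of animal 811 at time step 2.
patched = replace_positions(records, 811, [2], 100.0, 90.0)
```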

movekit.preprocess.resample_random(data_groups, downsample_size)[source]

Resample the movement data of each animal - by downsampling at random time intervals. This is done to reduce resolution of the dataset. This function does this by randomly choosing samples from each animal. :param data_groups: DataFrame containing the movement records. :param downsample_size: Int sample size to which each animal has to be reduced by downsampling. :return: DataFrame, modified from original size ‘data_groups’ to ‘downsample_size’.
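Random downsampling per animal can be sketched with pandas' `DataFrameGroupBy.sample` (available in pandas 1.1 and later). This illustrates the idea only; the fixed `random_state` and the sample data are choices made for the example, not part of movekit.

```python
import pandas as pd

records = pd.DataFrame({
    "time": list(range(10)) * 2,
    "animal_id": [312] * 10 + [511] * 10,
    "x": [float(i) for i in range(20)],
})

# Draw 4 records per animal without replacement, then restore time order.
sampled = (
    records.groupby("animal_id")
           .sample(n=4, random_state=0)
           .sort_values(["animal_id", "time"])
)
```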

movekit.preprocess.resample_systematic(data_groups, downsample_size)[source]

Resample the movement data of each animal - by downsampling at fixed time intervals. This is done to reduce the resolution of the dataset. This function does this by systematically choosing samples from each animal. :param data_groups: DataFrame containing the movement records. :param downsample_size: Int sample size to which each animal has to be reduced by downsampling. :return: DataFrame, modified from original size ‘data_groups’ to ‘downsample_size’.
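One way to pick samples at fixed intervals is to select evenly spaced row positions from a time-sorted track. The sketch below assumes this interpretation; `downsample_evenly` is hypothetical and works on the records of a single animal.

```python
import numpy as np
import pandas as pd

def downsample_evenly(track: pd.DataFrame, size: int) -> pd.DataFrame:
    """Pick `size` rows at evenly spaced positions of a time-sorted track."""
    track = track.sort_values("time")
    if size >= len(track):
        return track  # nothing to reduce
    positions = np.linspace(0, len(track) - 1, num=size).round().astype(int)
    return track.iloc[positions]

track = pd.DataFrame({
    "time": list(range(9)),
    "animal_id": [312] * 9,
    "x": np.arange(9.0),
})
reduced = downsample_evenly(track, 3)
```

For several animals, the same helper could be applied to each group of a `groupby("animal_id")`.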

movekit.preprocess.split_trajectories(data_groups, segment, fuzzy_segment=0, csv=False)[source]

Split trajectory of a single animal into several segments based on specific criterion.

Example usage:

    data_groups = group_animals(data)
    split_trajectories(data_groups, segment = 5, fuzzy_segment = 5)

Parameters:
  • data_groups – DataFrame with movement records.

  • segment – Int, defining point where the animals are split into several Pandas Data Frames.

  • fuzzy_segment – Int, defining interval which will overlap on either side of the segments.

  • csv – Boolean, defining if each interval shall be exported locally as singular csv

Returns:

Dictionary with the created DataFrames for each animal.
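The interplay of `segment` and `fuzzy_segment` can be sketched as splitting with overlap. The example assumes that `segment` is the number of parts and `fuzzy_segment` the number of extra records added on either side of each part; the source leaves both readings open, and `split_with_overlap` is a hypothetical single-animal helper.

```python
import pandas as pd

def split_with_overlap(track: pd.DataFrame, n_segments: int, overlap: int = 0):
    """Split a time-sorted track into at most n_segments parts.

    Each part is widened by `overlap` rows on both sides, clipped at the
    ends of the track.
    """
    track = track.sort_values("time").reset_index(drop=True)
    size = -(-len(track) // n_segments)  # ceiling division
    parts = []
    for start in range(0, len(track), size):
        lo = max(start - overlap, 0)
        hi = min(start + size + overlap, len(track))
        parts.append(track.iloc[lo:hi])
    return parts

track = pd.DataFrame({"time": list(range(10)), "animal_id": [312] * 10})
parts = split_with_overlap(track, n_segments=2, overlap=1)
```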

movekit.utils module

movekit.utils.angle(vec1, vec2)[source]

Calculate angle between two vectors. :param vec1: vector 1. :param vec2: vector 2. :return: angle in degrees.
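The standard relationship between cosine similarity and the angle in degrees can be sketched with the standard library alone. The helper names `cosine_of_angle` and `angle_degrees` are hypothetical stand-ins, and the handling of zero-length vectors is a choice made for this example rather than documented movekit behaviour.

```python
import math

def cosine_of_angle(vec1, vec2):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("angle is undefined for a zero-length vector")
    return dot / (norm1 * norm2)

def angle_degrees(vec1, vec2):
    """Angle between the vectors in degrees, in the range [0, 180]."""
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos = max(-1.0, min(1.0, cosine_of_angle(vec1, vec2)))
    return math.degrees(math.acos(cos))

right_angle = angle_degrees((1, 0), (0, 1))
opposite = angle_degrees((1, 0), (-2, 0))
```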

movekit.utils.cosine_similarity(vec1, vec2)[source]

movekit.utils.presence_3d(data)[source]

Check whether data is 3-dimensional.

Parameters:

data – pandas Dataframe containing the movement records.

Returns:

Boolean indicating whether a column ‘z’ is present in the DataFrame.

Module contents