Workflow compilation, use and scheduling
Within MoveApps, Apps can be combined into workflows (Fig. 1(2)), which define an ordered set of steps to access, process and analyse data. The process of building workflows is simple and intuitive in the platform’s graphical user interface, where users can browse Apps, view details of an App’s developers, purpose and documentation and select chosen Apps to add to a workflow. The list of Apps is alphabetically ordered, includes a short description of each App and is searchable by keywords. Each workflow is visually represented by connected containerised Apps, including access points to e.g. App details, options with descriptions for available settings and result overviews, as well as buttons to initiate or stop workflow runs. Workflows can be saved, edited and run for specific use cases.
Every workflow starts with a core App that loads data into the system (Fig. 1(4)). As MoveApps has been set up as a partner platform to the Movebank database within the Movebank Ecosystem [41], it is most convenient to directly import animal movement data stored in Movebank using the "Movebank" App. This core App allows users to log into Movebank to browse and securely transfer data based on their user access permissions within the Movebank database, which accommodates both public and controlled-access data, provides support to harmonize data to a shared format and vocabulary, and supports live data feeds [41]. Relying on Movebank for input of data to MoveApps thus provides a secure method to share data between collaborators, allows users without access to data storage or a fast internet connection to input large data volumes, reduces problems in analysis caused by inconsistent or unknown data formats, and supports automated reporting procedures during data collection (see example workflows below). Alternatively, uploading data files (.rds or .csv) from a personal cloud folder (Dropbox, Google Drive) is supported. This option offers flexibility to prepare multi-study datasets prior to importing to MoveApps, as well as to support Apps that incorporate other local data sources as part of tracking data analysis. The data are then passed on to the next App in the appropriate format and processed accordingly. Presently, analyses on data sets of up to 2 million locations are possible in a MoveApps workflow.
After data import, subsequent Apps can be added by selection from a list of all available Apps that accept the appropriate input and provide output in the required format. Input and output formats are filtered and matched automatically by the system. Once a workflow is compiled, it can be executed (Fig. 1(3)). The user can follow the progress of each App in a workflow by the colour-indication of its state (idle, starting, working, post-processing or in error). Workflows are managed so that two Apps are always active concurrently, thus reserving system memory, which is the main bottleneck in App execution. In the present system, up to 20 workflows can run at once; additional requests are queued.
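The automatic matching of input and output formats can be pictured with a minimal sketch. It assumes that each App declares one input type and one output type; the class AppSpec, the function compatible_next_apps, the App names and the type labels "tracks" and "table" are all hypothetical and are not taken from the MoveApps code base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical description of an App's declared data interface."""
    name: str
    input_type: Optional[str]   # None: a data-loading App that starts a workflow
    output_type: Optional[str]  # None: the App passes no data on to a next App


def compatible_next_apps(workflow: List[AppSpec],
                         catalogue: List[AppSpec]) -> List[AppSpec]:
    """Return the Apps that can be appended to the workflow."""
    if not workflow:
        # An empty workflow can only be started by a data-loading App.
        return [app for app in catalogue if app.input_type is None]
    last_output = workflow[-1].output_type
    if last_output is None:
        return []
    # Offer only Apps whose input type equals the previous output type.
    return [app for app in catalogue if app.input_type == last_output]


# Illustrative catalogue (assumed names and types).
CATALOGUE = [
    AppSpec("Load Data", None, "tracks"),
    AppSpec("Thin Data", "tracks", "tracks"),
    AppSpec("Plot Map", "tracks", None),
    AppSpec("Summarise Table", "table", "table"),
]

workflow = [CATALOGUE[0]]
print([app.name for app in compatible_next_apps(workflow, CATALOGUE)])
```

In this sketch the selection list shown to the user is simply the return value of compatible_next_apps, so an App with a non-matching input type never appears as an option.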
Because MoveApps is cloud based, workflows run independently of the local machine and results from complex and time-intensive workflows can be checked after login at a later time. While the container structure of the workflow leads to somewhat longer runtimes in MoveApps than if the code were executed locally (see example workflows below), we consider this downside to be more than offset by the increased flexibility for users and other advantages of containers (see above). The workflow run can be stopped or re-started at any time. R-shiny Apps that invoke user interfaces can be opened after the App has finished; users can then examine its results and interact with the interface according to the App’s programming features (Fig. 1(5)).
App details can be viewed at any time by opening the App menu. From this menu, users can change settings or access logs (process run, warning or error messages). Users can also “pin” a workflow at a certain App to retain the results of that App and all preceding Apps in the workflow. As a result, only Apps subsequent to the “pinned” App are re-executed when a workflow is re-started. The purpose is to avoid re-running e.g. initial data access and preparation steps that can be time-consuming with large datasets, thus providing ease of use when iteratively composing workflows and testing App settings. Each App that returns data also generates a short summary of the output data (e.g. time interval, number of animals and positions), which can be viewed easily at any time after the App has finished running. This allows the user to swiftly review App results, identify possible errors or unexpected results of the App, and better understand how each App relates to the workflow output. Finally, each workflow can be cloned into several workflow instances that analyse different datasets or are run using different user-specified parameter settings in one or more of its Apps. Managed by Kubernetes, this allows parallel execution for easy exploration of the influence of the workflow’s parameter space on the results. All workflows and their instances are saved in the user account for future reference.
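The effect of pinning can be sketched as a simple result cache. The sketch assumes that each App is a function from input data to output data and that results are stored by App name; the function run_workflow and the toy Apps are hypothetical and do not reflect how MoveApps stores results internally.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

App = Tuple[str, Callable[[Any], Any]]  # (App name, processing function)


def run_workflow(apps: List[App],
                 data: Any,
                 cache: Dict[str, Any],
                 pinned_index: Optional[int] = None) -> Tuple[Any, List[str]]:
    """Run Apps in order and return (final result, names of executed Apps).

    If the workflow is pinned at an App whose result is already cached,
    execution resumes with the first App after the pin.
    """
    start = 0
    if pinned_index is not None and apps[pinned_index][0] in cache:
        data = cache[apps[pinned_index][0]]
        start = pinned_index + 1

    executed = []
    for name, func in apps[start:]:
        data = func(data)
        cache[name] = data  # keep each result so that any App can be pinned later
        executed.append(name)
    return data, executed


# Toy workflow: load numbers, keep the even ones, add them up.
apps = [
    ("load", lambda _: [1, 2, 3, 4]),
    ("filter", lambda xs: [x for x in xs if x % 2 == 0]),
    ("sum", sum),
]
cache: Dict[str, Any] = {}
print(run_workflow(apps, None, cache))                  # first run executes all Apps
print(run_workflow(apps, None, cache, pinned_index=1))  # re-run executes only "sum"
```

The second call reuses the stored output of the pinned "filter" App, which mirrors the stated purpose of skipping slow data access and preparation steps while settings of later Apps are being tested.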
Workflow instances can be started manually or scheduled to run automatically and without further interaction at fixed time intervals. This is especially useful when up-to-date information about tagged animals is required on a regular basis. Results of the scheduled runs can be accessed in the MoveApps platform or via a secure API (Fig. 1(6)). Users have the option to request an E-mail notification after each scheduled run is completed, containing either a link to the MoveApps site for output access and download or including selected output files as attachments. Alert notifications can be integrated into the E-mail, for example with the “Email Alert” App. To avoid system overload by scheduled workflows that are not used any more, we have set a quota of 12 or 30 repeats (depending on run intervals) that needs to be reset by the user. A note on the current state of the quota is included in each notification E-mail.
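The repeat quota for scheduled runs can be illustrated with a small counter. This is a sketch under stated assumptions: the class ScheduledInstance and the wording of the quota note are hypothetical, and because the text does not specify which run intervals receive 12 and which receive 30 repeats, the quota is passed in as a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduledInstance:
    """Hypothetical bookkeeping for one scheduled workflow instance."""
    quota: int                          # 12 or 30; the interval mapping is not given
    remaining: int = field(init=False)  # scheduled runs left before a reset is needed

    def __post_init__(self) -> None:
        self.remaining = self.quota

    def try_run(self) -> bool:
        """Consume one repeat; return False once the quota is used up."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def reset(self) -> None:
        """Called when the user confirms that the schedule is still in use."""
        self.remaining = self.quota

    def quota_note(self) -> str:
        """Text that could be added to the notification E-mail (assumed wording)."""
        return f"{self.remaining} of {self.quota} scheduled runs remaining"


instance = ScheduledInstance(quota=12)
instance.try_run()
print(instance.quota_note())
```

Once try_run returns False, the schedule stays paused until reset is called, which corresponds to the user resetting the quota.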
Share, cite and publish
For replication, collaboration or other joint work, it is possible to share workflows with other MoveApps users (Fig. 1(7)). Workflows can be either shared publicly or with specific users. Recipients can load a shared workflow into their account's dashboard and edit it there independently of the original workflow. It is possible to add two kinds of messages with shared workflows: (1) an open text field that allows the user to provide a brief description of the workflow and (2) a data source message which is by default filled with details of the dataset used by the original workflow creator. Thus, sensitive data are not transferred. Recipients of workflows must access the input data from their own accounts, which maintains the integrity of data sharing rights as managed by users in Movebank.
The importance of transparency and reproducibility based on open data and open code/methods has been repeatedly highlighted [35, 36], especially if ecological applications are involved that can have important or controversial implications for science or management and are hard to impossible to replicate [22]. Further, there is a need to ensure that researchers receive professional benefit and recognition for sharing code [9]. Therefore, MoveApps provides a citation for all Apps (Fig. 1(8)) and offers the option to publish and acquire a digital object identifier (DOI) for workflows that are related to a published paper and dataset (Fig. 1(9)).
To support reproducibility and comprehensive documentation of published analyses, the published workflows, their related Apps (including settings and source code) and metadata describing the operating system, libraries, packages and run-time versions used are archived in the Movebank Data Repository (Fig. 1(9)). This is a free and well-established repository in the movement ecology community [31, 41] that provides persistent identifiers for future access and is accepted by scientific journals. The repository is developed in accordance with the FAIR [42] and TRUST [43] data principles. For publication and archiving of workflows, users are required to provide a description of the workflow and each contained instance, the names of all contributors, funding sources and license type. Similar information for each App used in the workflow is extracted from its custom specification file. Finally, we require each published workflow to be publicly shared on the MoveApps platform for easy discovery and reuse, allowing any MoveApps user to reproduce the analysis. Thus, in combination with MoveApps' serverless and modular structure, this archiving service helps to ensure the future reusability of code and replicability of published results, as well as the possibility to assess, modify and improve code and related analytical methods. For replication outside of MoveApps, archived workflows can be downloaded for local use, and old R-environments and R-package versions can be accessed from the CRAN website.
Example workflows
We illustrate the use of MoveApps with two example workflows that address common analysis needs: using the “Morning Report” and the “Migration Mapper”, we analyse a published set of migration tracks of greater white-fronted geese (Anser a. albifrons; Movebank study: "Migration timing in white-fronted geese (data from [45])", [44]). These workflows were developed to showcase the use of the platform and discuss possible extensions to the beta version. The workflows have been made public on MoveApps to be used by all registered users and have been published in the Movebank Data Repository [46, 47].
The “Morning Report” workflow (Fig. 2a, https://doi.org/10.5441/001/1.h4c0p8bv, [46]) is made up of two Apps, the “Movebank” App and the “Morning Report” App, where the latter extracts an overview of a dataset with times of tag activity, plots of tag properties and a small interactive map. This is meant to be used for projects with active tags to explore tag performance, identify changes in behaviour and possibly find the animals in the field. Four Apps (called “Morning Report pdf Overview”, “Morning Report pdf Attribute Plots”, “Morning Report pdf Property Plots” and “Morning Report pdf Maps”) were recently developed, which can be combined into a workflow that provides ".pdf" artefact files containing a time overview for all animals/tags, various data properties and track maps for download. These files can be taken into the field, sent by E-mail or accessed via API.
The user interface output of the workflow (Fig. 2b) reveals that there were (at least) six different animals with available data during the past 5 months in the dataset. The time range, number of locations and distances moved are indicated. For the selected animal, we can see that from mid-June to the end of August, no data were available. After this period, autumn migration commenced and the large displacements and route are visible in the plots and map. To assess performance, we ran the workflow on both MoveApps and on a local installation of RStudio. The workflow took 3:15 min to run on MoveApps, of which the longest part was taken up by loading the data (2:55 min). In comparison, running the same code in RStudio on a local system (Intel Core i7, 16 GB RAM, Windows 10 64-bit) required 2:55 min in total, with 2:46 min for loading the data. Relative performance will vary with the processing power available to users outside of MoveApps.
The “Migration Mapper” workflow (Fig. 3a, https://doi.org/10.5441/001/1.7tq16jr8, [47]) is a more complex workflow made up of six Apps that load data from Movebank, remove outliers, thin the data, filter by season, segment the data by speed and then plot the remaining locations as a density raster. The raster plot is provided as a user interface in which the user can change raster size for more detail vs. better visibility. The division of the workflow’s functionality into many small Apps has notable advantages: Modular runs of independent Docker instances are more stable and run on fewer resources than one large, complex App. Furthermore, each App can be used in new workflows or can be replaced in the present workflow by different or more advanced App versions or Apps that have similar functionality.
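The benefit of splitting functionality into small, exchangeable steps can be shown with a toy pipeline. This is only an illustrative sketch and not the code of the Apps in the workflow: the step functions thin, keep_fast and density, the (x, y, speed) point format and all threshold values are assumptions made for the example.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, speed); an assumed toy format


def thin(every: int) -> Callable[[List[Point]], List[Point]]:
    """Step that keeps every n-th location."""
    return lambda pts: pts[::every]


def keep_fast(min_speed: float) -> Callable[[List[Point]], List[Point]]:
    """Step that keeps locations at or above a speed threshold."""
    return lambda pts: [p for p in pts if p[2] >= min_speed]


def density(cell: float) -> Callable[[List[Point]], Counter]:
    """Final step that counts locations per square grid cell."""
    def step(pts: List[Point]) -> Counter:
        return Counter((int(x // cell), int(y // cell)) for x, y, _ in pts)
    return step


def run_pipeline(steps: List[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each step into the next one."""
    return reduce(lambda acc, step: step(acc), steps, data)


points: List[Point] = [
    (0.2, 0.1, 12.0), (0.4, 0.3, 0.5), (1.5, 0.2, 15.0),
    (1.7, 0.9, 14.0), (2.1, 2.2, 0.2), (1.1, 0.4, 11.0),
]

# Steps can be added, dropped or swapped without touching the others.
print(run_pipeline([keep_fast(5.0), density(1.0)], points))
print(run_pipeline([thin(2), keep_fast(5.0), density(1.0)], points))
```

Because every step shares the same interface, replacing keep_fast with a more advanced segmentation step would leave the rest of the pipeline unchanged, which is the design property described above.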
The user interface outputs of the two different workflow instances show the routes of greater white-fronted geese during spring migration (Fig. 3b) and autumn migration (Fig. 3c). Densely travelled areas become visible by the heat map colours and indicate movement rather than resting, because only flight locations were selected using the “Segment Data by Speed” App. The maps confirm the known differences between the two migrations: During spring the geese fly in a wide front, using many different routes, whereas during autumn most of them use the coastal route which they pass quickly [45]. The runtimes of the workflow for spring and autumn migration differed only minimally, each taking about 5:20 min on MoveApps, and 3:00 min in local RStudio (see above).
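The reported runtimes of both example workflows can be put side by side with a few lines of arithmetic. The sketch below uses only the times stated in the text; the helper names to_seconds and slowdown are ours, and the Migration Mapper times are the approximate values given above.

```python
def to_seconds(mmss: str) -> int:
    """Convert a runtime written as 'm:ss' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(moveapps: str, local: str) -> float:
    """Runtime on MoveApps divided by runtime on the local system."""
    return to_seconds(moveapps) / to_seconds(local)


# (MoveApps runtime, local runtime) as reported in the text.
reported = {
    "Morning Report": ("3:15", "2:55"),
    "Migration Mapper": ("5:20", "3:00"),
}

for name, (moveapps, local) in reported.items():
    print(f"{name}: {slowdown(moveapps, local):.2f} x local runtime")

# Share of the Morning Report runtime spent on loading data.
load_share_moveapps = to_seconds("2:55") / to_seconds("3:15")
load_share_local = to_seconds("2:46") / to_seconds("2:55")
print(f"loading share: {load_share_moveapps:.0%} (MoveApps), {load_share_local:.0%} (local)")
```

With these figures the Morning Report run took about 11% longer on MoveApps than locally and the Migration Mapper run about 78% longer, while data loading accounted for roughly 90% or more of the Morning Report runtime on both systems.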