Pre-downloading data#
By default, VirtualShip will automatically ‘stream’ data from the Copernicus Marine Service via the copernicusmarine toolbox. However, for users who wish to manage data locally, it is possible to pre-download the required datasets and feed them into VirtualShip simulations.
As outlined in the Quickstart Guide, the virtualship run command supports an optional --from-data argument, which allows users to specify a local directory containing the necessary data files.
Tip
See the for example… section for an example data download workflow.
Data requirements#
For pre-downloaded data, VirtualShip only supports daily and monthly resolution physical and biogeochemical data, along with a static bathymetry file.
In addition, all pre-downloaded data must be split into separate files per timestep (i.e. one .nc file per day or month).
Note
Monthly data: when using monthly data, ensure that your final .nc file download is for the month after your expedition schedule end date. This is to ensure that a Parcels FieldSet can be generated under-the-hood which fully covers the expedition period. For example, if your expedition runs from 1st May to 15th May, your final monthly data file should be in June. Daily data files only need to cover the expedition period exactly.
Note
Argo and Drifter data: if using Argo floats or Drifters in your expedition, ensure that: 1) the temporal extent of the downloaded data also accounts for the full lifetime of the instruments, not just the expedition period, and 2) the spatial bounds of the downloaded data also accounts for the likely drift distance of the instruments over their lifetimes. Otherwise, simulations will end prematurely (out-of-bounds errors) when the data runs out.
Further, VirtualShip expects pre-downloaded data to be organised in a specific directory & filename structure within the specified local data directory. The expected structure is as outlined in the subsequent sections.
Directory structure#
Assuming the local data directory (as supplied in the --from-data argument) is named data/, the expected subdirectory structure is:
.
└── data
├── bathymetry # containing the singular bathymetry .nc file
├── bgc # containing biogeochemical .nc files
└── phys # containing physical .nc files
Filename conventions#
Within these subdirectories, the expected filename conventions are:
Physical data files (in
data/phys/) should be named as follows:<COPERNICUS_DATESET_NOMENCLATURE>_<YYYY_MM_DD>.nce.g.
cmems_mod_glo_phy-all_my_0.25deg_P1D-m_1998_05_01.ncand so on for each timestep.
Biogeochemical data files (in
data/bgc/) should be named as follows:<COPERNICUS_DATESET_NOMENCLATURE>_<YYYY_MM_DD>.nce.g.
cmems_mod_glo_bgc_my_0.25deg_P1M-m_1998_05_01.ncand so on for each timestep.
Bathymetry data file (in
data/bathymetry/) should be named as follows:cmems_mod_glo_phy_anfc_0.083deg_static_bathymetry.nc
Tip
Take care to use an underscore (_) as the separator between date components in the filenames (i.e. YYYY_MM_DD).
Note
Using the <COPERNICUS_DATESET_NOMENCLATURE> in the filenames is vital in order to correctly identify the temporal resolution of the data (daily or monthly). The P1D in the example above indicates daily data, whereas P1M would indicate monthly data.
See here for more information on Copernicus dataset nomenclature.
See also our own documentation on the Copernicus datasets used natively by VirtualShip when ‘streaming’ data if you wish to use the same datasets for pre-download.
Note
Monthly data: the DD component of the date in the filename for monthly .nc files should always be 01, representing the first day of the month. This ensures that a Parcels FieldSet can be generated under-the-hood which fully covers the expedition period from the start.
Further assumptions#
The following assumptions are also made about the data:
All pre-downloaded data files must be in NetCDF format (
.nc).Physical data files must contain the following variables:
uo,vo,so,thetaoOr these strings must appear as substrings within the variable names (e.g.
uo_gloris acceptable foruo).
If using BGC instruments (e.g.
CTD_BGC), the relevant biogeochemical data files must contain the following variables:o2,chl,no3,po4,nppv,ph,phyc.Or these strings must appear as substrings within the variable names (e.g.
o2_gloris acceptable foro2).
Bathymetry data files must contain a variable named
deptho.
Also of note#
Whilst not mandatory to use data downloaded only from the Copernicus Marine Service (any existing data you may hold can be re-organised accordingly), the assumptions made by VirtualShip regarding directory structure and filename conventions are motivated by alignment with the Copernicus Marine Service’s practices.
If you want to use pre-existing data with VirtualShip, which you may have accessed from a different source, it is possible to do so by restructuring and/or renaming your data files as necessary.
The whole VirtualShip pre-downloaded data workflow should support global data or subsets thereof, provided the data files contain the necessary variables and are structured as outlined above.
For example…#
Example Python code for automating the data download from Copernicus Marine can be found in Example Copernicus Download.