Getting Started

The ICEBERG project is focused on helping domain scientists make the most of high resolution satellite imagery. Most domain scientists don't have a lot of experience using high-performance computers or remote clusters, so here we offer a basic getting started guide. If you are already familiar with running software on a high-performance cluster or are a software developer interested in extending the ICEBERG tools for new applications, this guide is not for you and you should go here instead.

All of the projects developed by the ICEBERG team to date involve the classification of features or the detection of specific objects in high-resolution imagery. Our land cover classification project classifies geological features based on the reflectance properties of individual pixels, whereas the other projects use computer vision models that also take features at larger spatial scales into account. Either way, the goal is to take satellite imagery and extract useful information from it in an automated fashion that is easily scaled up to cover very large areas of interest (far beyond what one could do without the aid of computers).

We've developed a set of models designed for specific science applications (finding penguins or seals or rivers, for example) as well as a more general set of tools that help deploy these models efficiently on a remote computing cluster, the latter of which we will refer to as "middleware".

To get you started, we'll walk through how to run the models we've developed on your own computer (assuming you have the system requirements; see below) and then walk you through the documentation for running them on a remote cluster.


System requirements to run ICEBERG tools on your local desktop/laptop computer

Systems such as XSEDE Bridges are recommended, but other resources, such as a local cluster, desktop, or laptop, can be used if CUDA, Python 3, and an NVIDIA P100 GPU are present. Start-up allocations are readily available on XSEDE; apply here.

Minimum system requirements for running the LandCover tools:

Python 3

Minimum system requirements for running all the other ICEBERG models:

Python 3
CPU and GPU (tested with an NVIDIA P100) plus CUDA and cuDNN

All other dependencies will be installed with the software links below.
  • Penguins
  • Seals
  • Rivers
  • LandCover - coming in fall 2020.

Setup and installation

These instructions are specific to XSEDE Bridges, but other resources can be used if CUDA, Python 3, GDAL (Rivers only), and an NVIDIA P100 GPU are present (the LandCover project requires only Python 3). On non-Bridges systems, skip the 'module load' commands, which are specific to Bridges.

For Unix or Mac Users: Log in to Bridges via ssh using a Unix or Mac command line terminal. Login is available to Bridges directly or through the XSEDE portal. Please see the Bridges User's Guide.

For Windows Users: Many tools are available for ssh access to Bridges. Please see Ubuntu, MobaXterm or PuTTY.

The lines below beginning with '$' or '[my_env] $' are commands to enter (or copy and paste) into your terminal; note that all commands are case-sensitive, meaning capital and lowercase letters are differentiated. Everything following a '#' is a comment explaining the reason for the command and should not be included in what you enter. Lines that do not start with '$' or '[my_env] $' are output you should expect to see.

$ pwd
/home/username
$ cd $SCRATCH                      # switch to your working space.
$ mkdir ICEBERG                    # create a directory to work in.
$ cd ICEBERG                       # move into your working directory.
$ module load cuda                 # load parallel computing architecture.
$ module load python3              # load correct python version.
$ module load gdal/2.2.1           # load gdal (for the Rivers pipeline only, it is not needed for the other pipelines).
$ virtualenv my_env             # create a virtual environment to isolate your work from the default system.  This can be called anything you like that describes your work.
$ source my_env/bin/activate    # activate your environment. Notice the command line prompt changes to show your environment on the next line.
[my_env] $ pwd
/pylon5/group/username/ICEBERG
[my_env] $ export PYTHONPATH=<pwd_output>/my_env/lib/python3.5/site-packages # set a system variable to point python to your specific code. (Replace <pwd_output> with the result of the pwd command above.)
[my_env] $ pip install iceberg_seals.search # pip is the python package installer; it downloads and installs the requested software (iceberg_seals.search in this case) from a repository. (This may take several minutes.)

Getting ready for processing

These instructions apply to Bridges and may not apply if you are using other resources.

You will start a new session using the ssh steps above, then execute the following commands:

$ interact -p GPU-small --gres=gpu:p100:1 -t 00:30:00 # request a compute node for 30 minutes.
                                   # This package has been tested on P100 GPUs on Bridges, but any other
                                   # resource offering the same GPUs should work. Receiving an allocation
                                   # may take a minute or two (or more); see the Bridges User's Guide for
                                   # details on the interact command. For a single node, 30 minutes per
                                   # image is recommended.
$ cd $SCRATCH                      # switch to your working space.
$ cd ICEBERG                       # move into your environment directory.
$ module load cuda                 # load parallel computing architecture (this is not needed for the LandCover pipeline).
$ module load python3              # load correct python version.
$ module load gdal/2.2.1           # load gdal (for the Rivers pipeline only, it is not needed for the other pipelines).
$ source my_env/bin/activate    # activate the environment you created earlier. Notice the command line prompt changes to show your environment on the next line.
[my_env] $ pwd
/pylon5/group/username/ICEBERG
[my_env] $ export PYTHONPATH=<pwd_output>/my_env/lib/python3.5/site-packages # set a system variable to point python to your specific code. (Replace <pwd_output> with the result of the pwd command above.)
[my_env] $ mkdir ICEBERG_run    # this can be called anything you like that describes your work.  It is where your input imagery is located and where output will be written.
[my_env] $ cd ICEBERG_run
You are now ready to process imagery with the instructions below!

Single image processing

Penguins

Coming soon

Seals

- Download the pre-trained model. Experienced users can also train their own model; code is provided in the Seals GitHub Repository, but no packaged version is offered.

Seals predicting is executed in two steps:
1) Create tiles from an input GeoTIFF image and write them to the output folder. The scale_bands parameter (in pixels) depends on the trained model being used; the default of 224 matches the pre-trained model downloaded above. If you use your own model, the scale_bands may be different.
From the command line and virtual environment used for the installation steps above, the tiling command is:

[iceberg_seals] $ iceberg_seals.tiling --scale_bands=224 --input_image=image_abspath --output_folder=./test

Where image_abspath is the path to the imagery you want to process.
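
Under the hood, tiling simply cuts the large GeoTIFF into fixed-size windows. The sketch below illustrates the idea using rasterio; it is not the actual iceberg_seals.tiling implementation, and the tile naming scheme is made up for illustration.

import os
import rasterio
from rasterio.windows import Window

def tile_image(input_image, output_folder, tile_size=224):
    """Cut a GeoTIFF into tile_size x tile_size windows, saving each as its own GeoTIFF."""
    os.makedirs(output_folder, exist_ok=True)
    with rasterio.open(input_image) as src:
        for row in range(0, src.height - tile_size + 1, tile_size):
            for col in range(0, src.width - tile_size + 1, tile_size):
                window = Window(col, row, tile_size, tile_size)
                profile = src.profile.copy()
                profile.update(width=tile_size, height=tile_size,
                               transform=src.window_transform(window))
                # Filename encodes the tile position; the naming is illustrative only.
                out_path = os.path.join(output_folder, f"tile_{row}_{col}.tif")
                with rasterio.open(out_path, "w", **profile) as dst:
                    dst.write(src.read(window=window))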

2) Detect seals on each tile and output counts and confidence for each tile.

[iceberg_seals] $ iceberg_seals.predicting --input_image=image_filename --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image

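
The per-tile counts and confidences can then be aggregated into a scene-level estimate. Below is a minimal sketch that assumes, hypothetically, the output is a CSV with count and confidence columns; check the actual output format of your run before using anything like it.

import csv

def total_seal_count(csv_path, min_confidence=0.5):
    """Sum per-tile counts, keeping only tiles at or above a confidence threshold.

    The 'count' and 'confidence' column names are assumptions, not the
    documented iceberg_seals.predicting output schema.
    """
    total = 0.0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["confidence"]) >= min_confidence:
                total += float(row["count"])
    return total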

Rivers

- Download a pre-trained model at: Pending - a default model will be released in Oct 2020
You can download it to your local machine and use scp, ftp, rsync, or Globus to [transfer it to Bridges](https://portal.xsede.org/psc-bridges).

Rivers predicting is executed in three steps:
1) Create tiles from an input GeoTIFF image and write them to the output folder. The scale_bands parameter (in pixels) depends on the trained model being used; the default of 224 matches the pre-trained model downloaded above. If you use your own model, the scale_bands may be different.
From the command line and virtual environment used for the installation steps above, the tiling command is:

[iceberg_rivers] $ iceberg_rivers.tiling --scale_bands=224 --input_image=image_abspath --output_folder=./test

2) Then, detect rivers on each tile and output counts and confidence for each tile.

[iceberg_rivers] $ iceberg_rivers.predicting --input_image=image_filename --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image

3) Finally, mosaic all the tiles back into one image.

[iceberg_rivers] $ iceberg_rivers.mosaic --input_WV image --input masks_folder --tile_size 224 --step 112 --output_folder ./mosaic

Where 'image' is the original input image and 'masks_folder' is the folder containing the predicting output.
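
With --tile_size 224 and --step 112, adjacent tiles overlap by 50%, so the mosaic step must reconcile overlapping predictions. The sketch below shows one common approach, averaging predictions wherever tiles overlap; the actual iceberg_rivers.mosaic implementation may blend tiles differently.

import numpy as np

def mosaic_tiles(tiles, positions, out_shape, tile_size=224):
    """Stitch overlapping prediction tiles into one array by averaging.

    tiles:     list of (tile_size, tile_size) prediction arrays
    positions: list of (row, col) upper-left corners, one per tile
    out_shape: (rows, cols) of the full output image
    """
    acc = np.zeros(out_shape, dtype=np.float64)     # running sum of predictions
    weight = np.zeros(out_shape, dtype=np.float64)  # number of tiles covering each pixel
    for tile, (row, col) in zip(tiles, positions):
        acc[row:row + tile_size, col:col + tile_size] += tile
        weight[row:row + tile_size, col:col + tile_size] += 1
    return acc / np.maximum(weight, 1)  # average where covered; zeros elsewhere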


LandCover

- There is a pre-processed library of landcover classification shapefiles: Pending - a default library will be released in late 2020
- Here is the map of regions available: Pending - the pre-processed map will be released in late 2020

If you have your own imagery to process and classify, LandCover is executed in five steps (available in late 2020):
1) Perform top-of-atmosphere (TOA) radiance calibration on raw digital number (DN) imagery.
From the command line and virtual environment used for the installation steps above, the TOA command is:

[iceberg_landcover] $ iceberg_landcover.rad -ip Data

Where Data is the path to the input imagery.

The script searches the input path for two file types: raw M1BS (shorthand for multispectral) .tif image files and an M1BS .xml file. A raw image filename does not contain keywords such as “atmcorr” or “refl”, which mark the output .tif files of the other LandCover scripts. The script then reads the metadata in the .xml file to determine whether the raw images are WorldView-2 or WorldView-3 images and, once determined, performs the algorithm described by Chander et al. [1]. The output file is a TOA radiance image whose filename is the raw image file’s filename with the string “_rad” appended.
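
The DN-to-radiance conversion itself is a per-band linear rescaling. Here is a minimal sketch of that step; the gain and offset values are placeholders, since the real per-band calibration factors are read from the image’s .xml metadata, and the full procedure is the one in Chander et al. [1].

import numpy as np

def dn_to_radiance(dn, gain, offset):
    """Convert raw digital numbers to TOA spectral radiance, band by band.

    dn:     (bands, rows, cols) array of raw digital numbers
    gain:   (bands,) per-band calibration gain (in practice, from the .xml metadata)
    offset: (bands,) per-band calibration offset (in practice, from the .xml metadata)
    """
    dn = dn.astype(np.float64)
    gain = np.asarray(gain, dtype=np.float64)
    offset = np.asarray(offset, dtype=np.float64)
    return gain[:, None, None] * dn + offset[:, None, None]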


2) Perform atmospheric corrections on the TOA imagery.

[iceberg_landcover] $ iceberg_landcover.atmcorr -ip Data

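
The documentation above does not spell out which correction method atmcorr uses; the reference list includes Chavez’s image-based dark-object subtraction [2], so the sketch below shows that technique as one plausible approach. Treat it as an assumption, not a description of the script.

import numpy as np

def dark_object_subtraction(radiance, percentile=0.01):
    """Subtract a per-band 'dark object' value, taken as atmospheric path radiance.

    Assumes the darkest pixels in each band should be near zero; following
    Chavez [2], their residual signal is attributed to atmospheric scattering.
    Whether atmcorr uses this exact method is an assumption.
    """
    corrected = np.empty_like(radiance, dtype=np.float64)
    for b in range(radiance.shape[0]):
        dark = np.percentile(radiance[b], percentile)  # near-minimum value per band
        corrected[b] = np.clip(radiance[b] - dark, 0.0, None)
    return corrected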
3) Perform surface reflectance calibration on the atmospherically corrected imagery.

[iceberg_landcover] $ iceberg_landcover.refl -ip Data

The script makes use of a dictionary residing within earth_sun_dist.py containing Earth-Sun distances in AU, which are used in the conversion from an atmospherically corrected image to a surface reflectance image. Like the rad.py script, this script searches for two file types: atmospherically corrected .tif image files and an M1BS .xml file. If no atmospherically corrected images exist, the script will instead use a TOA image, which is assumed to be atmospherically corrected, to obtain a surface reflectance image. If the requisite files exist, the script uses the algorithm described by Updike & Comp [4]. The output file is a surface reflectance image whose filename is the atmospherically corrected image’s filename with the string “_refl” appended.
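
The radiance-to-reflectance conversion in Updike & Comp [4] is the standard formula reflectance = (pi * L * d^2) / (Esun * cos(theta_s)), where d is the Earth-Sun distance in AU (the earth_sun_dist.py lookup), Esun is the band-averaged solar irradiance, and theta_s is the solar zenith angle. A minimal sketch, with variable handling that is illustrative rather than taken from the script:

import numpy as np

def radiance_to_reflectance(radiance, esun, d_au, solar_zenith_deg):
    """Convert radiance to reflectance per Updike & Comp [4].

    radiance: (bands, rows, cols) radiance image
    esun:     (bands,) band-averaged solar irradiance (from the .xml metadata)
    d_au:     Earth-Sun distance in astronomical units
    solar_zenith_deg: solar zenith angle in degrees (from the .xml metadata)
    """
    esun = np.asarray(esun, dtype=np.float64)
    cos_theta = np.cos(np.radians(solar_zenith_deg))
    return (np.pi * radiance * d_au ** 2) / (esun[:, None, None] * cos_theta)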


4) Use the surface reflectance imagery to create a classification map.

[iceberg_landcover] $ iceberg_landcover.classify -ip Data

The script searches for the output .tif image files from refl.py and the corresponding M1BS .xml file. Once found, it computes the sum of the band values for each pixel and classifies each pixel as ice, shadow or water, or geology based on that sum. The output file is an image in which each pixel carries one of the three classifications; its filename is the surface reflectance image’s filename with the string “_class” appended.
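
The per-pixel rule reduces to thresholding the band sum. A minimal sketch follows; the threshold values and class codes here are placeholders, not the ones the script actually uses.

import numpy as np

# Arbitrary class codes for this sketch only.
ICE, SHADOW_OR_WATER, GEOLOGY = 1, 2, 3

def classify_pixels(reflectance, low=0.15, high=0.60):
    """Classify each pixel from the sum of its band reflectances.

    Bright pixels (high band sum) become ice, dark pixels become shadow or
    water, and everything in between becomes geology. The low/high cutoffs
    are illustrative placeholders.
    """
    band_sum = reflectance.sum(axis=0)            # (rows, cols)
    classes = np.full(band_sum.shape, GEOLOGY, dtype=np.uint8)
    classes[band_sum >= high] = ICE
    classes[band_sum <= low] = SHADOW_OR_WATER
    return classes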


5) Finally, create shapefiles from the classification map.

[iceberg_landcover] $ iceberg_landcover.shapefiles -ip Data

The script searches for the classified .tif image files output by classify.py. Once found, it creates a shapefile from the raster by reading each pixel’s classification. The output file has the same filename as the classified image file but is a .shp file rather than a .tif file.
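
Polygonizing a classified raster amounts to tracing connected regions of same-class pixels. A minimal sketch using rasterio and fiona; the field name and output schema are assumptions, not the script’s actual layout.

import fiona
import rasterio
from rasterio.features import shapes

def raster_to_shapefile(class_tif, out_shp):
    """Write one polygon per connected region of identically classified pixels."""
    with rasterio.open(class_tif) as src:
        classes = src.read(1).astype("int32")  # shapes() requires an integer dtype
        schema = {"geometry": "Polygon", "properties": {"class": "int"}}
        with fiona.open(out_shp, "w", driver="ESRI Shapefile",
                        schema=schema, crs_wkt=src.crs.to_wkt()) as dst:
            for geom, value in shapes(classes, transform=src.transform):
                dst.write({"geometry": geom, "properties": {"class": int(value)}})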


Dependencies:
  • GDAL (>=1.11)
  • rasterio (1.1.6)
  • xml.etree.ElementTree
  • argparse
  • scipy.stats (1.5.2)
  • math
  • sys
  • numpy
  • os
Each version number indicates the latest package version the scripts are known to work with.

References:
[1] Chander, G., Markham, B.L. & Helder, D.L. 2009. Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment, 113, 893–903.
[2] Chavez Jr, P.S. 1996. Image-based atmospheric corrections - revisited and improved. Photogrammetric Engineering & Remote Sensing, 62, 1025–1036.
[3] Lu, D., Mausel, P., Brondizio, E. & Moran, E. 2002. Assessment of atmospheric correction methods for Landsat TM data applicable to Amazon basin LBA research. International Journal of Remote Sensing, 23, 2651–2671.
[4] Updike, T. & Comp, C. 2010. Radiometric use of WorldView-2 imagery: technical note. Longmont, CO: DigitalGlobe, 16 pp.

Large-scale batch processing with the ICEBERG Middleware

The ICEBERG Middleware transitions desktop image processing to large-scale processing on high performance computing (HPC) platforms. Specifically, the serial tools installed above, which process one image at a time, can be run concurrently as parallel tasks that process many images at once on multi-core systems.
Instructions for installation and execution can be found here.
Detailed documentation and background can be found here.