The ICEBERG project is focused on helping domain scientists make the most of high-resolution satellite imagery.
Most domain scientists don't have much experience using high-performance computers or remote clusters,
so here we offer a basic getting-started guide. If you are already familiar with running software on a
high-performance cluster, or are a software developer interested in extending the ICEBERG tools for new
applications, this guide is not for you; please see the developer documentation instead.
All of the projects developed by the ICEBERG team to date involve the classification of features or
the detection of specific objects in high-resolution imagery. Our land cover classification project
classifies geological features based on the reflectance properties of individual pixels, whereas
the other projects use computer vision models that also take into account features at larger
spatial scales. Either way, the goal is to take satellite imagery and extract useful information
from it in an automated fashion that is easily scaled up to cover very large areas of interest
(far beyond what one could do without the aid of computers).
We've developed a set of models designed for specific science applications (finding penguins,
seals, or rivers, for example) as well as a more general set of tools that help deploy these models
efficiently on a remote computing cluster; the latter we will refer to as "middleware".
To get you started, we'll walk through how to run the models we've developed
on your own computer (assuming you have the system requirements; see below), and then we'll walk you
through the documentation for running them on a remote cluster.
These instructions are specific to XSEDE Bridges, but other resources can be used as long as CUDA, Python 3, GDAL
(needed for the Rivers pipeline only), and an NVIDIA P100 GPU are available (the LandCover pipeline requires only Python 3).
On systems other than Bridges, the 'module load' commands below can be skipped, as they are specific to Bridges.
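If you plan to use a resource other than Bridges, you can quickly verify the prerequisites with each tool's standard version check:
$ python3 --version # confirm Python 3 is available.
$ gdalinfo --version # confirm GDAL is installed (needed for the Rivers pipeline only).
$ nvidia-smi # confirm an NVIDIA GPU is visible.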
For Unix or Mac Users:
Log in to Bridges via ssh using a Unix or Mac command-line terminal. Login is available to Bridges directly or through
the XSEDE portal. Please see the Bridges User's Guide.
For Windows Users:
Many tools are available for ssh access to Bridges. Please see Ubuntu, MobaXterm, or PuTTY.
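For example, a direct login from a terminal might look like the following (bridges.psc.edu was Bridges' login hostname at the time of writing; replace 'username' with your own PSC username):
$ ssh username@bridges.psc.edu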
The lines below following a '$' are commands to enter (or cut and paste) into your terminal. Note that all commands are
case-sensitive, meaning capital and lowercase letters are differentiated. Everything following a '#' is a comment
explaining the reason for the command and should not be included in what you enter. Lines that do not start with '$'
or '[my_env] $' are output you should expect to see.
$ pwd
/home/username
$ cd $SCRATCH # switch to your working space.
$ mkdir ICEBERG # create a directory to work in.
$ cd ICEBERG # move into your working directory.
$ module load cuda # load parallel computing architecture.
$ module load python3 # load correct python version.
$ module load gdal/2.2.1 # load gdal (for the Rivers pipeline only, it is not needed for the other pipelines).
$ virtualenv my_env # create a virtual environment to isolate your work from the default system. This can be called anything you like that describes your work.
$ source my_env/bin/activate # activate your environment. Notice the command line prompt changes to show your environment on the next line.
[my_env] $ pwd
/pylon5/group/username/ICEBERG
[my_env] $ export PYTHONPATH=$(pwd)/my_env/lib/python3.5/site-packages # point python to the packages installed in your environment; $(pwd) expands to your current directory, i.e. the result of the pwd command above.
[my_env] $ pip install iceberg_seals.search # pip is a python tool that installs the requested software (iceberg_seals.search in this case) from a repository (this may take several minutes).
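If the install completes without errors, you can optionally confirm the package is visible in your environment (pip show is standard pip functionality):
[my_env] $ pip show iceberg_seals.search # prints the installed version and location.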
$ interact -p GPU-small --gres=gpu:p100:1 -t 00:30:00 # request a compute node with one P100 GPU for 30 minutes (it may take a minute or more to receive an allocation).
This package has been tested on P100 GPUs on Bridges, but that does not exclude any other resource that offers the same GPUs.
See the Bridges User's Guide for more details on the interact command.
For a single node, 30 minutes per image is recommended.
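Once the allocation is granted, your session moves from the login node to a compute node; a quick way to confirm this is:
$ hostname # prints the compute node's name, confirming you are no longer on the login node.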
$ cd $SCRATCH # switch to your working space.
$ cd ICEBERG # move into your environment directory.
$ module load cuda # load parallel computing architecture (this is not needed for the LandCover pipeline).
$ module load python3 # load correct python version.
$ module load gdal/2.2.1 # load gdal (for the Rivers pipeline only, it is not needed for the other pipelines).
$ source my_env/bin/activate # activate the environment you created earlier. Notice the command line prompt changes to show your environment on the next line.
[my_env] $ pwd
/pylon5/group/username/ICEBERG
[my_env] $ export PYTHONPATH=$(pwd)/my_env/lib/python3.5/site-packages # point python to the packages installed in your environment; $(pwd) expands to your current directory, i.e. the result of the pwd command above.
[my_env] $ mkdir ICEBERG_run # this can be called anything you like that describes your work. It is where your input imagery is located and where output will be written.
[my_env] $ cd ICEBERG_run
You are now ready to process imagery with the instructions below!
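Each pipeline installs its own command-line tools. Assuming they follow standard Python argparse conventions (an assumption; consult the project documentation if unsure), you can list all available options for any of them with --help, for example:
[my_env] $ iceberg_seals.tiling --help # prints the full list of flags and their meanings.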
1) First, tile the input image into smaller chunks.
[iceberg_seals] $ iceberg_seals.tiling --scale_bands=224 --input_image=image_abspath --output_folder=./test
Where image_abspath is the path to the imagery you want to process.
2) Then, detect seals on each tile and output counts and confidence for each tile.
[iceberg_seals] $ iceberg_seals.predicting --input_image=image_filename --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image
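For example, with a hypothetical image named WV03_scene.tif in your run directory (the filename is illustrative only), the two steps would look like:
[iceberg_seals] $ iceberg_seals.tiling --scale_bands=224 --input_image=$SCRATCH/ICEBERG/ICEBERG_run/WV03_scene.tif --output_folder=./test
[iceberg_seals] $ iceberg_seals.predicting --input_image=WV03_scene.tif --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image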
1) First, tile the input image into smaller chunks.
[iceberg_rivers] $ iceberg_rivers.tiling --scale_bands=224 --input_image=image_abspath --output_folder=./test
2) Then, detect rivers on each tile and output counts and confidence for each tile.
[iceberg_rivers] $ iceberg_rivers.predicting --input_image=image_filename --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image
3) Finally, mosaic all the tiles back into one image.
[iceberg_rivers] $ iceberg_rivers.mosaic --input_WV image --input masks_folder --tile_size 224 --step 112 --output_folder ./mosaic
Where 'image' is the original input image and 'masks_folder' is the folder containing the predicting output.
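Putting the three steps together with a hypothetical image named WV03_river_scene.tif (illustrative only; here the mosaic step reads the masks written by the predicting step to ./test_image):
[iceberg_rivers] $ iceberg_rivers.tiling --scale_bands=224 --input_image=$SCRATCH/ICEBERG/ICEBERG_run/WV03_river_scene.tif --output_folder=./test
[iceberg_rivers] $ iceberg_rivers.predicting --input_image=WV03_river_scene.tif --model_architecture=UnetCntWRN --hyperparameter_set=A --training_set=test_vanilla --test_folder=./test --model_path=./ --output_folder=./test_image
[iceberg_rivers] $ iceberg_rivers.mosaic --input_WV WV03_river_scene.tif --input ./test_image --tile_size 224 --step 112 --output_folder ./mosaic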
1) First, perform radiometric calibration on the raw input imagery.
[iceberg_landcover] $ iceberg_landcover.rad -ip Data
Where Data is the path to the input imagery.
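The LandCover scripts expect the directory to contain the image as a .tif file alongside its M1BS .xml metadata file; a listing of a hypothetical input directory (filenames illustrative only) might look like:
[iceberg_landcover] $ ls Data
WV02_scene_M1BS.tif
WV02_scene_M1BS.xml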
2) Then, perform atmospheric correction on the calibrated imagery.
[iceberg_landcover] $ iceberg_landcover.atmcorr -ip Data
3) Perform surface reflectance calibration on the atmospherically corrected imagery.
[iceberg_landcover] $ iceberg_landcover.refl -ip Data
The script makes use of a dictionary residing within earth_sun_dist.py containing the
Earth-Sun distances in AU. These distances are used in the conversion from an
atmospherically-corrected image to a surface reflectance image. Like the rad.py script,
this script searches for two file types: atmospherically-corrected .tif (image) files
and an M1BS .xml file. However, if no atmospherically-corrected images exist, the script
will instead use a TOA image, which is assumed to be atmospherically corrected, to
obtain a surface reflectance image. If the requisite files exist, the script uses the
algorithm described in Updike et al. [4]. The output file is a surface reflectance image,
whose filename is the atmospherically-corrected image’s filename with the string “_refl”
appended to the end.
4) Next, classify each pixel of the surface reflectance imagery.
[iceberg_landcover] $ iceberg_landcover.classify -ip Data
The script searches for the output .tif (image) files from refl.py and the corresponding
M1BS .xml file. Once found, the script computes the sum of bands for each pixel. Each
individual pixel is then classified as ice, shadow/water, or geology based on this sum.
The output file is an image in which each pixel is assigned one of the three
aforementioned classes. Its filename is the surface reflectance image’s filename
with the string “_class” appended to the end.
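For a hypothetical reflectance file, the naming chain would therefore look like this: the refl step writes WV02_scene_refl.tif and the classify step writes WV02_scene_refl_class.tif alongside it (filenames illustrative only):
[iceberg_landcover] $ ls Data
WV02_scene_refl.tif
WV02_scene_refl_class.tif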
5) Finally, generate shapefiles from the classified imagery.
[iceberg_landcover] $ iceberg_landcover.shapefiles -ip Data
The script searches for the output .tif (image) files from classify.py. Once found,
it creates a shapefile from the classified raster by reading each pixel’s classification. The output file
has the same filename as the classified image file but is a .shp file rather than a .tif file.
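Putting it all together, a complete LandCover run over a directory named Data executes the five steps in order:
[iceberg_landcover] $ iceberg_landcover.rad -ip Data
[iceberg_landcover] $ iceberg_landcover.atmcorr -ip Data
[iceberg_landcover] $ iceberg_landcover.refl -ip Data
[iceberg_landcover] $ iceberg_landcover.classify -ip Data
[iceberg_landcover] $ iceberg_landcover.shapefiles -ip Data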