Land Cover Classification using Sentinel2 and Deep Learning
Intro
This post is about getting to know LULC classificaiton using Sentinel 2 based on deep learing.
Sentinel 2 Dataset pre-porcess in SNAP
Sentinel-2 is a mission of two twin satellites, promoted by the European Space Agency, travelling around the Globe, completing a full circle every ten days. The two satellites are flying in the same orbit but phased at 180°, which means that they co-operate to provide freshly captured imagery every five days. One is called S2A and its twin is called S2B1!
Data Download
1 |
|
Download with aria2c
-
download
product.meta4
-
cd
to the folder that you want to place -
aria2c --http-user=mengyuchi --http-passwd=5212109mengle --check-certificate=false --max-concurrent-downloads=2 -M products.meta4
Download with python
-
Setup python environment for python
-
Open
Anaconda powershell prompt
, installsentinelsat
package4
1 |
|
Sen2Cor
Sen2Cor in SNAP
-
Unzip Sentinel2 file
-
Install Sen2Cor in SNAP Tools
-
Optical\Thematic Land Processing\Sen2Cor
-
I\O Parameter: Select .xml file in
D:\Data\Sentinel2\xuzhou\Original\S2A_MSIL1C_20161206T030112_N0204_R032_T50SMC_20161206T030749.SAFE\MTD_MSIL1C.xml
-
Processing Parameter: Resolution → All
-
Wait for processing
Sen2Cor in cmd
-
Download Sen2Cor v2.10
-
Unzip
-
Open
dir
in command prompt -
Copy Sentinel2 L1C data file path
L2A_Process.bat I:\Data\Sentinel2\xuzhou\L1C\S2A_MSIL1C_20200320T025541_N0209_R032_T50SMC_20200320T060846.SAFE
-
Run command
L2A_Process.bat I:\Data\Sentinel2\xuzhou\L1C\S2A_MSIL1C_20200320T025541_N0209_R032_T50SMD_20200320T060846.SAFE
-
Wait around 10 mins.
-
Check results in SNAP or QGIS.
Sen2Cor using bat script
-
Create new
txt
file in folderE:\Code\Sentinel2_L1C_to_L2A
-
Copy these code in
txt
file
1 |
|
-
Save as
Sen2cor_weak.bat
-
Open
Windows powershell
,cd
to DirE:\Code\Sentinel2_L1C_to_L2A
and run./Sen2cor_weak.bat
-
Error
Sen2cor 2.10.01 - Product metadata file cannot be read with 14.2 baseline L1C
The solution to this error: using
Sen2Cor-2.05.05-win64
to process sentinel 2 data that produced before 20176.
- Output
dir
for data before 2017 isE:\DATA\LULC\Sentinel2\L1C\Before_2017
, and outputdir
for data after 2017 isE:\DATA\LULC\Sentinel2\L2A
Sen2Res in SNAP
- Get
Extent
ofAOI
inArcGIS
- Extent for
Xuzhou_center
in deg
1 |
|
- Extent for
feng_pei
in deg
1 |
|
Geo Coordinates
setting:
1 |
|
Code for
subset
1 |
|
-
Raster
→subset
-
Optical
→Sentinel2 super-resolution
-
Edit
Geo Coordinates
,I/O Parameters → Directory
in panel
-
Run, this would take a while
-
Total time spend for processing: 01:48:46.160
Mosaic
- Raster → Geometric → Mosaicking
Subset in SNAP
Due to long processing time of Sen2Res, we have to clip our original L2A product into sub-areas2. We use subset
tool in SNAP for this process.
- Open multi L2A products in SNAP.
Raster
→Subset
Using SNAP-GPT tool to process Sentinel2 data
Setup
GPT for Resampling and subset
LULC classification using lstm-sentinel2-landcover
Code for Deep Learning for Land Cover Change Detection5
Python environment setup
-
conda create -n n_env_py37 python=3.7
-
conda activate n_env_py37
-
cd
to folderC:\Users\mengy\Documents\VS workspace\LSTM_Sentinel2_land_cover\lstm-sentinel2-landcover
-
conda config --env --add channels conda-forge
pip install numpy
-
conda install -c conda-forge fire
-
conda create -n tf tensorflow
conda activate tf
-
conda install -c conda-forge segmentation-models-pytorch
-
conda install -c conda-forge scikit-learn
-
conda install -c conda-forge scikit-image
-
conda install -c conda-forge tqdm
-
conda install -c conda-forge gdal
-
conda install -c conda-forge rasterio
Data preparation and placement
-
Sentinel images:
data/raw/sentinel/
-
Ground Truth shapefile:
data/raw/ground_truth/gt_shapefile/
-
shapefile of area of interest (AOI) = shape of GT as shapefile:
data/raw/ground_truth/overallshape/
Pre-processing
Example models are in src/models/
, the data generators used for training are in src/helpers/image_functions.py
.
-
baseline_cnn.py
is a simple FCN that trains on single images at a time (i.e. no sequences). It can be used as a pretrained base model for the LSTM models, as shown in e.g.lstm_fixed_seq.py
. -
lstm_fixed_seq.py
is a FCN+LSTM that trains with sequences of images, however the images that build the sequence are fixed (``. -
lstm_random_seq.py
is also a FCN+LSTM that randomly builds a new sequence of images for each training batch.
Trained models are saved in models/.
Tensorboard logs are saved in separate subdirectories of reports/logs/
; this way they can be called via tensorboard --logdir=reports/logs
(when in main directory).
Dataset Prepare
NDVI
- NDVI Processer
Optical—>Thematic Land Processing—>Vegetation Radiometric Indices—>NDVI Processor
-
Red source band: B4
-
NIR source band: B8
- Use
Band Math
-Right Click
or Raster
to select Band Math
- Band Math Expression:
($2.B8 - $2.B4) / ($2.B8 + $2.B4)
植被红边位置指数 Red-Edge Position Index (REP)
S2REP Processor
-
Set output folder in
I/O Parameters
-
Set parameters in
Processing parameters
-
Red source band1: B4
-
Red source band2: SRB5
-
Red source band3: SRB6
-
NIR source band: SRB7
水体指数MNDWI
考虑到重采样后具有中红外波段,这里使用它们计算改进的归一化水体指数MNDWI7.
-
Green source band: B3
-
NIR source band: SRB11
土壤亮度指数BI2 (The second Brightness Index)
-
Red source band: B4
-
Green source band: B3
-
NIR source band: B8
Pricipal components analysis (PCA)
- In SNAP
Orfeo ToolBox is an open-source project for state-of-the-art remote sensing, including a fast image viewer, apps callable from Bash, Python or QGIS, and a powerful C++ API.
-
Open image
S2A_MSIL2A_20210921T025551_N9999_R032_T50_20221119T225446_super_resolved_mosaic
-
Delete bands
_count
-
Export to
Geotiff
-
Open in
Orfeo Toolboxs
C:\Users\mengy\Downloads\Programs\OTB-8.1.1-Win64\monteverdi.bat
-
File
Open
Geotiff -
在
View
下选项中勾选OTB-Applications browser
-
OTB-Applications browser
—>Image Filtering
—>DemensionalityReduction
-
Execute
- In Gee
主成分分析是将众多具有相关性的数据指标,重新组合成一组新的指标,新形成的指标互不相关,并且前几个主成分能代表原始数据的大部分信息16。
Performing PCA per image over imageCollection in Google Earth Engine17.
主成分分析是将众多具有相关性的数据指标,重新组合成一组新的指标,新形成的指标互不相关,并且前几个主成分能代表原始数据的大部分信息18。
Sources:
Error: Function getnewbandnames
not defined.
Solution:17
1 |
|
GLCM (Grey Level Co-occurrence Matrix)
- In
SNAP
:
Raster
->Image Analysis
->Texture Analysis
->Grey Level Co-occurrence Matrix
Two GLCM features were computed for each Sentinel-2 band – Correlation and Entropy – resulting in a total of GLCM 80 layers.
- In ENVI:
-
Filters- > Texture -> Occurrence Measures
-
Open file
S2A_MSIL2A_20210921T025551_N9999_R032_T50_20221119T225446_super_resolved_mosaic.tif
-
Choose
Textures to Compute
-
Choose window size, e.p
3*3
-
Choose ouput location and file name
- In Python:
-
package
scikit-image
graycomatrix
andgreycoprops
11 -
scikit-image feature是一个强大的python可以调用的计算特征库。对于常见的图像特征可以直接调用scikit-image feature中封装好的函数来计算,速度也比自己编写的函数快12.
-
灰度共生矩阵法(GLCM, Gray-level co-occurrence matrix),就是通过计算灰度图像得到它的共生矩阵,然后透过计算该共生矩阵得到矩阵的部分特征值,来分别代表图像的某些纹理特征(纹理的定义仍是难点)13。
-
Fast Gray-Level Co-Occurrence Matrix (GLCM) by numpy. This script calculates GLCM without per pixel For loop, and works faster than GLCM on scikit-image14.
-
conda install -c anaconda scikit-image
-
- In GEE:
Another method to obtain GLCM and it’s textures are using GEE (Google Earth Engine).
Codes are saving in https://code.earthengine.google.com/GLCM_test
-
Run
GLCM_test
script and inTasks
tab, click run. -
After task submitted to Google Drive, download it to local drive.
D:\DATA\LULC\01_Prepare\GEE\GLCM
-
If the
GeoTiff
is too large, GEE converted it into several tiles15. -
In
QGIS
use raster toolRaster - Misscellaneous - Merge
to mosaic all tiles.
Band Merge
In Graph Bilder
Read
->Band Merge
->Write
Band Select
-
Raster
->Data Concersion
->Band Select
-
Select bands without
_count
and_flag
-
Edit
.dim
file -
Delete
<Flag_Coding name="flags">
in.dim
-
Delete
<Masks>
in.dim
-
Delete
<VALID_MASK_TERM>B2_count > 0</VALID_MASK_TERM>
for all bands in.dim
DEM and slope
- SRTM V3 DEM
- Extent of
select and stack
dataset- southBound 34.01983866595984 ascii
- northBound 34.58781619824366 ascii
- eastBound 117.7167699067148 ascii
- westBound 116.81001605948157 ascii
- In ArcGIS Pro
mosaic to New Raster
- Extent same as layer
xuzhou_center_dissolve
- Extent of
Process in SNAP
- Prepare stacked dataset with all Sentinel bands, indexs and DEM.
When stack sentinel bands with DEM, we use collocation
to porcesss.
Band Merge is not the correct operator to bring two products together. I recommend the Collocation tool1.
Training dataset
Stacked dataset
-
Sentinel-2A
- 10m: B2 B3 B4 B8
- 20m: SRB5 SRB6 SRB7 SRB8A SRB9 SRB11 SRB12
-
SRTM V3 DEM
- Extent of
select and stack
dataset- southBound 34.01983866595984 ascii
- northBound 34.58781619824366 ascii
- eastBound 117.7167699067148 ascii
- westBound 116.81001605948157 ascii
- In ArcGIS Pro
mosaic to New Raster
- Extent same as layer
xuzhou_center_dissolve
- Extent of
-
slope
-
Indexs
-
NDVI
-
MNDWI
-
REPI
-
BI2
-
NDBI = (SWIR - NIR) / (SWIR + NIR)
- Sentinel2A SWIR: B11, NIR: B8A (in SNAP: (SRB11 - SRB8A) / (SRB11 + SRB8A))
- Landsat 8 SWIR:B6, NIR:B5
-
EVI2
Raster Calculator Expression
inQGIS
:2.5 * ( "20210921_fullstack_rename@4" - "20210921_fullstack_rename@3" ) / ( "20210921_fullstack_rename@4" + 2.4 * "20210921_fullstack_rename@3" + 1
-
NDRE
Raster Calculator Expression
:(NIR - RED) / (NIR + RED)
-
-
Texture Analysis
Gray-Level co-occurrence matrix (GLCM)
-
Bands_GLCM
-
PCA_GLCM
To combine all PCA_GLCM bands into one, we process all PCA_GLCM output in QGIS.
- use
split image tool
fromOrfeo toolbox
21.
OTB
Applications are fully integrated inQGIS
since QGIS 3.8. You can configure OTB for QGIS. Since QGIS 3.22: the plugin is not activated by default. It should be activated in the plugins settings (Plugins/Manage and Install Plugins... toolbar
). The plugin should then be configured as detailed in the QGIS documentation (see the links provided above).
You can see OTB under 'Providers’22:
- Expand OTB tab
- Tick Activate option
- Set OTB folder. This is location of your OTB installation.
- Set OTB application folder. This is location of your OTB applications. <OTB_FOLDER>/lib/otb/applications
- Click ;ok’ to save settings and close dialog. If settings are correct, you will have OTB algorithms loaded in Processing toolbox
-
Load all
PCA_GLCM
image inQGIS
-
In
Processing
-Toolbox
-OTB
-Image Manipulation
-SplitImage
- Builds a
VRT
(Virtual Dataset) that is a mosaic of the list of inputGDAL
-supported rasters. With a mosaic you can merge several raster files23.
Raster
-Miscellaneous
-Build Virtual Raster
- Get fullstack raster imfomations
Raster
-Miscellaneous
-Raster Information
- Rename band names in
fullstack.vrt
There is no way to name bands in QGIS while merging24.
But it can be done after the file is created, by editing their
.aux.xml
file. It works for both.tif
and.img
files, as far as I’ve researched.The solution is to include after each
<PAMRasterBand band="1">
,<PAMRasterBand band="2">
, etc, a subelement<Description>YourBandName</Description>
. I haven’t tested it forVRT
datasets, but the link at the end suggests that it should be the same procedure.
For vrt
files:
You can still put then again by editing the merged
.vrt
and adding the<Metadata> <MDI key="Band">LandsatBand X</MDI> </Metadata>
at appropriate place25.
Use Plugins: Install plugin Rename bands
for QGIS
26!
- Export renamed
vrt
togeotiff
To output a compressed tif, use the Processing
Toolbox
->GDAL
->Translate (Convert Format)
and specify High compression in theAdvanced parameters
->Additional Creation options
->Profile
27
- Repeoject to
EPSG:32650
Raster
-Projections
-Warp
Set option for Nodata Value
to 0
and Output file resolution
to 10
- Roads map
We use roads map extracted from OSM D:\DATA\LSS\Roads\OSM\Roads_xuzhou_OSM.shp
and process in QGIS
.
-
Vector
-Data Managment Tools
-Reproject Layer
-
Raster
-Conversion
-Rasterise (Vector to Raster)
-
Layer Properties
-Transparency
-Add Values Manually
-From 0 to 0 set 100
List of stacked bands:
Nr. | Band |
---|---|
1 | B2 |
2 | B3 |
3 | B4 |
4 | B5 |
5 | B6 |
6 | B7 |
7 | B8 |
8 | B8A |
9 | B9 |
10 | B11 |
11 | B12 |
12 | NDVI |
13 | NDRE |
14 | EVI2 |
15 | BI2 |
16 | MNDWI |
17 | NDBI |
18 | DEM |
19 | slope |
20 | aspect |
-
Remove
s2rep
due to low performance.- In QGIS,
GDAL
->Raster conversion
->Rearrange bands
- In QGIS,
-
Add
Aspect
bands using DEM
Final Processing Layers
-
Fullstack @
D:\DATA\LULC\01_Prepare\Sentinel2\07_Final_outputs\20210921_fullstack.tif
-
PCA_GLCM @
D:/DATA/LULC/01_Prepare/GEE/GLCM/clip/20210920_PCA_GLCM_fullstack_resize.tif
-
Ground Truth @
D:/DATA/LULC/04_Validation/ESRI_10m/50S_20210101-20220101.tif
-
Clip Raster by Extent - Rename - Resize to
(8187,6239)
-
Fullstack @
D:/DATA/LULC/02_Training/03_Final_Input/20210921_fullstack_xz.tif
-
PCA_GLCM @
D:/DATA/LULC/02_Training/03_Final_Input/20210921_PCA_GLCM_xz.tif
-
Ground Truth @
D:/DATA/LULC/04_Validation/ESRI_10m/20210921_GroundTruth_xz.tif
Process in ArcGIS
Using SRTM DEM to generate slope in Xuzhou.
Process in SNAP
- Prepare stacked dataset with all Sentinel bands, indexs and DEM.
When stack sentinel bands with DEM, we use collocation
to porcesss.
Band Merge is not the correct operator to bring two products together. I recommend the Collocation tool1.
Training Dataset
Google Earth Pro + Sentinel2 + Land Use Map 2014 + Bing Map
Create training dataset in QGIS.
Land-cover types:
- Build-up (red)
- Farmland (green)
- Forest (cyan)
- Meadow (yellow)
- Water (blue)
- Other land (black)
Get Imagery metadata for Bing Map
Bing Virtual Earth
data can be added in QGIS using HCMGIS
plugin, but this dataset is without date information. For better visual interpretation of satellite and aerial imagery, we need to obtain metadata for Bing Map
. Here is the solution.
Use the following URL templates to get metadata for imagery that is hosted by Bing Maps. The imagery metadata returned includes URLs and dimensions for imagery tiles, ranges of zoom levels, and imagery vintage information8.
1 |
|
Error: Access was denied.
I pasted this example URL from your Imagery Metadata webpage: https://dev.virtualearth.net/REST/V1/Imagery/Metadata/Aerial/40.714550167322159,-74.007124900817871?zl=15&o=xml&key={BingMapsKey}.
It returned a 401 error code ‘Access was denied. You may have entered your credentials incorrectly, or you might not have access to the requested resource or operation.’
Solution:
1 |
|
Url to get imagery metadata: https://dev.virtualearth.net/REST/V1/Imagery/Metadata/Aerial/34.3788,117.280379?zl=16&o=xml&key=AtlVktKCspTDTH1P31NUy7hkyx8d202D7UiTaarrJv96IU2v8bzk_vJeH3D_C93b
1 |
|
Get Imagery metadata for Google Satellite
Obtain imagery date from visual interpretation in Google Earth Pro: 2022-09-05
Google API
not available yet, maybe use this way later.
Create Land Type shapefile
-
Create
New Shapefile Layer
-
Build-up
-
Farmland
-
Forest
-
Meadow
-
Water
-
-
- Reproject
Vector
-> Data Management Tools
-> Reproject Layer
- Merge
Vector
-> Data Management Tools
-> Merge Vector Layers
Ground Truth Data
Sentinel-2 10m land cover time series
This layer displays a global map of land use/land cover (LULC) derived from ESA Sentinel-2 imagery at 10m resolution. Each year is generated from Impact Observatory’s deep learning AI land classification model used a massive training dataset of billions of human-labeled image pixels developed by the National Geographic Society. The global maps were produced by applying this model to the Sentinel-2 scene collection on Microsoft’s Planetary Computer, processing over 400,000 Earth observations per year9.
Class definitions
-
Water
Areas where water was predominantly present throughout the year; may not cover areas with sporadic or ephemeral water; contains little to no sparse vegetation, no rock outcrop nor built up features like docks; examples: rivers, ponds, lakes, oceans, flooded salt plains. -
Trees
Any significant clustering of tall (~15 feet or higher) dense vegetation, typically with a closed or dense canopy; examples: wooded vegetation, clusters of dense tall vegetation within savannas, plantations, swamp or mangroves (dense/tall vegetation with ephemeral water or canopy too thick to detect water underneath). -
Flooded vegetation
Areas of any type of vegetation with obvious intermixing of water throughout a majority of the year; seasonally flooded area that is a mix of grass/shrub/trees/bare ground; examples: flooded mangroves, emergent vegetation, rice paddies and other heavily irrigated and inundated agriculture. -
Crops
Human planted/plotted cereals, grasses, and crops not at tree height; examples: corn, wheat, soy, fallow plots of structured land. -
Built Area
Human made structures; major road and rail networks; large homogenous impervious surfaces including parking structures, office buildings and residential housing; examples: houses, dense villages / towns / cities, paved roads, asphalt. -
Bare ground
Areas of rock or soil with very sparse to no vegetation for the entire year; large areas of sand and deserts with no to little vegetation; examples: exposed rock or soil, desert and sand dunes, dry salt flats/pans, dried lake beds, mines. -
Snow/Ice
Large homogenous areas of permanent snow or ice, typically only in mountain areas or highest latitudes; examples: glaciers, permanent snowpack, snow fields. -
Clouds
No land cover information due to persistent cloud cover. -
Rangeland
Open areas covered in homogenous grasses with little to no taller vegetation; wild cereals and grasses with no obvious human plotting (i.e., not a plotted field); examples: natural meadows and fields with sparse to no tree cover, open savanna with few to no trees, parks/golf courses/lawns, pastures. Mix of small clusters of plants or single plants dispersed on a landscape that shows exposed soil or rock; scrub-filled clearings within dense forests that are clearly not taller than trees; examples: moderate to sparse cover of bushes, shrubs and tufts of grass, savannas with very sparse grasses, trees or other plants.
Data Visualization
To dispaly all sentinel bands in python, we use:
As we discussed, the data contains 12 bands. let’s visualize each band using the
EarhPy
package. theplot_bands()
the method takes the stack of the bands and plots along with custom titles which can be done by passing unique titles for each image as a list of titles using the title= parameter10.
Preprocessing
Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation. The scaled data is divided into train and test data in the ratio of 30:70. The below code is used to scale and split the data10.
1 |
|
Visualizetion map of KNN
Let’s visualize the classification map of K-NNC, the below code is used to predict the labels of the Sundarbans data and plot the data using plot_bands() method from the earthpy package10.
1 |
|
Sentinel-2 10 m Land Use/Land Cover Time series Downloader
This application provides access to individual 10-meter resolution GeoTIFF scenes for all land masses on the planet, for each year from 2017-20213.
All scenes for each year are also available to download as a zip file: 2017, 2018, 2019, 2020, 2021.
Each annual zip download is approximately 60 GB of global extent.