Tips for completing a PhD thesis without validation due to data unavailability

This post collects tips from Gemini on completing a PhD thesis when validation is impossible because the necessary data are unavailable.

Missing ground truth is a common and significant challenge in remote sensing, especially when working on regions with restricted data access, such as some areas in China. While direct ground-truth validation (with GPS/GNSS data or highly accurate, localized mining maps) is the gold standard, it is possible to complete a PhD thesis without it, provided you implement robust alternative strategies and frame your research rigorously.

The key is to shift from “validation with ground truth” to “demonstrating the scientific robustness, plausibility, and utility of your results through other means.” Your thesis will need to be meticulously structured to address the absence of direct validation head-on.

Here’s how you can approach it:

1. In-Depth Internal Consistency Analysis

This focuses on the quality and reliability of your InSAR processing itself.

Multi-Orbit and Multi-Track Analysis: If both are available, process SAR data from ascending and descending orbits over your study area. While not direct validation, consistency (or interpretable differences) between these independent processing chains gives strong confidence in the derived deformation patterns.
Comparison of Different InSAR Techniques (e.g., PS-InSAR vs. SBAS): Apply both PS-InSAR (StaMPS) and SBAS (MintPy) to your data. Compare the deformation patterns, time series, and velocities obtained from each. While they might highlight different aspects of deformation, consistency in general trends and magnitudes would be a powerful internal validation.
Coherence Analysis: Systematically analyze spatial and temporal coherence. Areas with persistently high coherence tend to yield more reliable InSAR results. Discuss how your coherence thresholds were chosen and how they impact your results.
Phase Unwrapping Quality Assessment: Discuss methods used to mitigate unwrapping errors (e.g., residual phase analysis, phase closure in MintPy). While you can’t guarantee perfection without ground truth, demonstrating the application of best practices is crucial.
Sensitivity Analysis of Processing Parameters: Run your InSAR processing pipeline with variations in key parameters (e.g., different filtering levels, coherence thresholds, atmospheric correction models). Analyze how the final deformation maps change. If the overall patterns remain consistent, it suggests robustness.
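
The cross-comparisons in this section (PS-InSAR vs. SBAS, or two runs with different parameters) can be made quantitative with a few agreement statistics. Here is a minimal sketch, assuming both results have already been resampled to the same grid and referenced to the same point. The function name `compare_velocity_maps` and the synthetic arrays are illustrative assumptions, not part of StaMPS or MintPy.

```python
import numpy as np


def compare_velocity_maps(v_a, v_b):
    """Agreement statistics for two co-registered LOS velocity grids (mm/yr).

    NaN marks pixels where a method produced no measurement.
    """
    v_a = np.asarray(v_a, dtype=float)
    v_b = np.asarray(v_b, dtype=float)
    common = np.isfinite(v_a) & np.isfinite(v_b)  # pixels measured by both
    a, b = v_a[common], v_b[common]
    diff = a - b
    return {
        "n_common": int(common.sum()),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),
    }


# Synthetic stand-ins for PS-InSAR and SBAS velocity maps (not real data).
rng = np.random.default_rng(0)
signal = rng.normal(-10.0, 5.0, size=(50, 50))
ps_velocity = signal + rng.normal(0.0, 1.0, size=signal.shape)
sbas_velocity = signal + rng.normal(0.0, 1.0, size=signal.shape)
ps_velocity[rng.random(signal.shape) < 0.6] = np.nan  # PS coverage is sparser

stats = compare_velocity_maps(ps_velocity, sbas_velocity)
print(stats)
```

The same function can be reused for the sensitivity analysis: compare the baseline run against each parameter variation and report how bias, RMSE, and correlation change.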

2. Comparison with Published Research and Regional Studies

This is a critical alternative when direct ground truth is unavailable for your specific site.

Systematic Literature Review: Conduct a thorough review of published InSAR studies (especially those using Sentinel-1) on land subsidence in other post-coal mining areas globally, or even in other regions of China.
Qualitative Comparison of Patterns: Compare the spatial patterns of subsidence you observe with those reported in the literature for similar geological/mining contexts. For instance, do your subsidence bowls align with known longwall panel extents, or do they follow expected geological fault lines as seen in other studies?
Quantitative Comparison (where possible): If published studies provide subsidence rates (e.g., mm/year) for similar features or areas, compare your derived magnitudes. Acknowledge that a direct numerical match is unlikely due to different timeframes, sensors, and processing, but look for consistency in orders of magnitude or relative severity.
Identify Similarities and Differences: Discuss why your results might be similar or different. This shows critical thinking and a deep understanding of InSAR.
Reference Existing National/Regional Assessments: Even if not “ground truth,” studies like the national-scale assessment of land subsidence in China’s major cities (which often use InSAR) can provide a broader context and indirect support for your localized findings.

3. Qualitative and Interpretive Validation using Ancillary Data

Leverage all available, non-confidential data for contextual support.

High-Resolution Optical Imagery (e.g., Google Earth Historical Imagery, Planet Labs Data if accessible):
- Visual Correlation: Overlay your InSAR deformation maps onto high-resolution optical images. Look for visual evidence of subsidence: cracks in roads/buildings, tilted structures, localized flooding/ponding, changes in vegetation stress, or visible ground fissures. These “anecdotal” observations, when systematically documented, can provide powerful qualitative support.
- Identify Stable Areas: Use optical imagery to identify areas that appear undisturbed (e.g., bedrock outcrops, stable infrastructure) and verify that your InSAR results show minimal to no deformation in these locations.
Publicly Available Topographic Data (DEMs): Although the DEM is already an input to processing, discussing its quality and origin (e.g., SRTM, Copernicus DEM) can add confidence.
General Mining Information: Even without confidential maps, general knowledge about the mining history in your region (e.g., types of mining, depth, approximate operational periods if published in general reports or academic papers) can be correlated with observed deformation. For instance, if you know longwall mining ceased in a certain area 20 years ago, and you still detect slow, residual subsidence, this is a plausible finding.
Geological Maps: Overlay your InSAR results on publicly available geological maps. Discuss how subsidence patterns relate to underlying geology, fault lines, or areas with thick unconsolidated sediments prone to compaction.
Local News/Reports (if available in English or translatable): Sometimes local media or non-confidential environmental reports might mention ground deformation issues, which can provide qualitative anecdotal evidence.
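
The "Identify Stable Areas" check above can be reported as numbers rather than a visual impression. The sketch below assumes you have a velocity grid and a boolean mask of pixels you judged stable from optical imagery. The helper `stable_area_stats`, the synthetic bowl, and the mask location are all hypothetical.

```python
import numpy as np


def stable_area_stats(velocity, stable_mask):
    """Summarize LOS velocity (mm/yr) inside a mask of presumed-stable pixels."""
    velocity = np.asarray(velocity, dtype=float)
    mask = np.asarray(stable_mask, dtype=bool) & np.isfinite(velocity)
    values = velocity[mask]
    return {
        "n_pixels": int(values.size),
        "mean": float(values.mean()),
        "std": float(values.std(ddof=1)),
    }


# Synthetic velocity map: one subsidence bowl plus measurement noise.
rng = np.random.default_rng(1)
y, x = np.mgrid[0:100, 0:100]
bowl = -30.0 * np.exp(-((x - 50) ** 2 + (y - 50) ** 2) / (2 * 10.0 ** 2))
velocity = bowl + rng.normal(0.0, 0.5, size=bowl.shape)

# Hypothetical stable area, e.g. a bedrock outcrop digitized from optical imagery.
stable_mask = np.zeros(bowl.shape, dtype=bool)
stable_mask[0:10, 0:10] = True

stable = stable_area_stats(velocity, stable_mask)
print(stable)
```

A mean close to zero in the stable area supports the choice of reference, and the standard deviation there gives a rough empirical noise floor against which the subsidence signal can be judged.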

4. Rigorous Error Analysis and Uncertainty Quantification

This becomes even more critical in the absence of direct validation.

Model the Theoretical Error Budget: Discuss the various sources of error in InSAR (atmospheric, orbital, DEM, unwrapping, noise) and, based on literature, provide estimates of their potential magnitude and how your processing strategy aims to mitigate them.
Statistical Uncertainty from Time Series: MintPy and StaMPS provide statistical measures of uncertainty (e.g., standard deviation of velocity, RMS of residuals). Report and interpret these statistical uncertainties.
Confidence in Trends: Even if absolute values are hard to validate, you can argue for high confidence in the trends and relative deformation rates within your study area.
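
To make the statistical uncertainty concrete, here is a minimal sketch of how a velocity, its standard deviation, and the residual RMS can be derived from one pixel's displacement time series with an ordinary least-squares line fit. It is an illustration under the assumption of uncorrelated noise, not the estimator MintPy or StaMPS actually implements. Real InSAR noise is temporally correlated, so this standard deviation tends to be optimistic. The function name `fit_velocity` and the synthetic series are assumptions.

```python
import numpy as np


def fit_velocity(t_years, disp_mm):
    """Fit disp = offset + velocity * t and return (velocity, velocity_std, rms)."""
    t = np.asarray(t_years, dtype=float)
    d = np.asarray(disp_mm, dtype=float)
    design = np.column_stack([np.ones_like(t), t])  # columns: offset, rate
    coef, *_ = np.linalg.lstsq(design, d, rcond=None)
    residuals = d - design @ coef
    dof = t.size - 2  # two fitted parameters
    sigma2 = float(residuals @ residuals) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)  # parameter covariance
    velocity = float(coef[1])
    velocity_std = float(np.sqrt(cov[1, 1]))
    rms = float(np.sqrt(np.mean(residuals ** 2)))
    return velocity, velocity_std, rms


# Synthetic series: 3 years at a 12-day revisit, -20 mm/yr plus 3 mm noise.
rng = np.random.default_rng(2)
t = np.arange(0.0, 3.0, 12.0 / 365.25)
disp = -20.0 * t + rng.normal(0.0, 3.0, size=t.size)

velocity, velocity_std, rms = fit_velocity(t, disp)
print(f"velocity = {velocity:.1f} +/- {velocity_std:.1f} mm/yr, "
      f"residual RMS = {rms:.1f} mm")
```

Reporting the velocity together with its standard deviation lets you state which deformation signals are clearly above the noise and which are not.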

5. Shift or Emphasize Your Thesis Contribution

If direct validation is truly impossible, your thesis needs to have a strong “Plan B” for its core contribution.

Methodological Focus: Your contribution could be the optimization or adaptation of an open-source InSAR processing chain for challenging post-mining environments, particularly in the context of data scarcity.
Feasibility Study: Frame your work as a feasibility study demonstrating the potential of InSAR for monitoring subsidence in regions where traditional methods are difficult or impossible.
Spatial-Temporal Characterization: Focus on comprehensively characterizing the spatial patterns and temporal evolution of subsidence as observed by SAR, highlighting novel insights into the deformation mechanisms that you can infer from your analysis (even if the absolute magnitude cannot be independently verified).

6. Transparent Discussion of Limitations and Future Work

Dedicated Section: Have a clear section in your discussion or conclusions that explicitly addresses the lack of direct ground truth validation.
Explain Why: Briefly explain the challenges in acquiring confidential data in China.
Implications: Discuss what the absence of direct validation means for the certainty and generalizability of your results.
Future Outlook: Suggest how future research (e.g., via new collaborations, changes in data policies, or deployment of temporary GNSS stations by local partners) could provide the necessary validation.

Most Importantly: Constant Communication with Your Supervisor

Discuss all these strategies with your PhD supervisor from the very beginning. They need to be fully aware of the data constraints and approve your approach to demonstrating scientific rigor without direct ground truth. Their guidance will be invaluable in shaping your thesis to meet academic standards under these challenging circumstances.

Internal Data Verification Method used in Result Validation

“Internal data verification” is a possible and often necessary way to demonstrate the reliability of your InSAR results, especially when external ground truth such as GNSS data or accurate mining maps is unavailable. This approach focuses on the consistency and robustness of your results within the InSAR processing chain and with other remotely sensed or contextual information.

It’s a common strategy in data-scarce or inaccessible regions. You can’t claim “absolute accuracy” without ground truth, but you can build a strong case for “relative accuracy,” “consistency,” and “plausibility.”

What is Internal Data Verification for InSAR?

Internal data verification for InSAR typically involves:

Cross-Comparison of InSAR Products:
- Ascending vs. Descending Orbits: Comparing deformation patterns and time series derived from SAR data acquired from different viewing geometries (the satellite passes from south to north on ascending orbits and from north to south on descending orbits). While they measure deformation along different Lines-of-Sight (LOS), consistent deformation patterns or the ability to derive plausible 2D components (vertical and east-west) can provide strong internal evidence. The north-south component is poorly constrained by near-polar orbits, so a full 3D solution generally needs additional observations.
- Different InSAR Algorithms (e.g., PS-InSAR vs. SBAS): Running both StaMPS (PS-InSAR) and MintPy (SBAS) on the same dataset and comparing the results. PS-InSAR focuses on stable points, while SBAS typically provides broader coverage. Consistency in areas where both techniques are applicable indicates reliability.
- Varying Processing Parameters: Testing how changes in key processing parameters (e.g., coherence thresholds, filtering levels, atmospheric correction methods) impact your final results. If the core deformation patterns remain stable across reasonable parameter variations, it suggests robustness.
Analysis of InSAR Quality Metrics:
- Coherence Maps: High coherence indicates a reliable interferometric phase. Areas with consistently low coherence should be flagged as less reliable. You can use coherence evolution over time to assess the stability of scatterers.
- Phase Residuals/Noise: After applying deformation models and atmospheric corrections, analyzing the residual phase can give insights into the remaining noise in your data. Lower residuals generally indicate higher quality. Tools like MintPy provide metrics for this.
- Statistical Properties of Time Series: Examining the standard deviation of your deformation time series or velocity estimates. Lower standard deviations suggest more precise measurements.
Correlation with Ancillary Geospatial Data (Proxy Validation):
- Optical Imagery (e.g., Google Earth historical images, commercial high-resolution imagery): Visually compare your deformation maps with optical images. Look for evidence of ground cracks, building damage, infrastructure deformation, or changes in water bodies that align with your InSAR-derived subsidence. This is qualitative but can be compelling.
- Geological Maps: Overlay your deformation on regional geological maps to see if subsidence correlates with known geological structures (e.g., faults, soft soil layers, areas of karst) or specific lithologies prone to compaction.
- Mining Activity Information: Even without confidential mining maps, general knowledge about the location of mining areas (e.g., from public maps, open research papers, historical land use maps) can be used to correlate observed subsidence with the presence of mining.
- Topographic Data (DEMs): Analyze how subsidence patterns relate to topography or changes in drainage patterns that might indicate deformation.
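
The ascending/descending comparison listed above is often carried out by decomposing the two LOS measurements into east-west and vertical components. The sketch below assumes a right-looking sensor, LOS displacement positive toward the satellite, heading measured clockwise from north, and a negligible north-south contribution. The incidence and heading values are illustrative, not taken from any specific dataset, and the function names are hypothetical.

```python
import numpy as np


def los_coefficients(incidence_deg, heading_deg):
    """Sensitivity of LOS displacement to (east, north, up) motion."""
    theta = np.deg2rad(incidence_deg)
    alpha = np.deg2rad(heading_deg)
    east = -np.sin(theta) * np.cos(alpha)
    north = np.sin(theta) * np.sin(alpha)
    up = np.cos(theta)
    return east, north, up


def decompose_east_up(los_asc, los_desc, inc_asc, head_asc, inc_desc, head_desc):
    """Solve for east and up displacement, assuming north motion is negligible."""
    e_a, _, u_a = los_coefficients(inc_asc, head_asc)
    e_d, _, u_d = los_coefficients(inc_desc, head_desc)
    geometry = np.array([[e_a, u_a], [e_d, u_d]])
    out_shape = np.shape(los_asc)
    observations = np.stack([np.asarray(los_asc, dtype=float),
                             np.asarray(los_desc, dtype=float)]).reshape(2, -1)
    east, up = np.linalg.solve(geometry, observations)
    return east.reshape(out_shape), up.reshape(out_shape)


# Illustrative geometry (assumed values).
INC_DEG = 39.0
HEAD_ASC_DEG, HEAD_DESC_DEG = -12.0, -168.0

# Synthetic ground motion in mm/yr, projected into both viewing geometries.
true_east = np.array([5.0, 0.0, -3.0])
true_up = np.array([-20.0, -8.0, 0.0])
e_a, _, u_a = los_coefficients(INC_DEG, HEAD_ASC_DEG)
e_d, _, u_d = los_coefficients(INC_DEG, HEAD_DESC_DEG)
los_asc = e_a * true_east + u_a * true_up
los_desc = e_d * true_east + u_d * true_up

east, up = decompose_east_up(los_asc, los_desc,
                             INC_DEG, HEAD_ASC_DEG, INC_DEG, HEAD_DESC_DEG)
print(east, up)
```

For mining subsidence, a decomposition that yields a dominant vertical component with horizontal motion pointing toward the bowl center is physically plausible, which is exactly the kind of internal evidence described above.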

Research Papers Using Internal Verification (due to lack of direct ground truth):

Here are examples of how researchers address validation without extensive direct ground truth, often relying on internal consistency, comparisons between SAR datasets/methods, or proxy data:

Using Ascending and Descending Orbits for 2D/3D Decomposition and Consistency:
- Hanssen, R.F. (2001). Radar Interferometry: Data Interpretation and Error Analysis. Kluwer Academic Publishers. While a textbook, it lays the foundational theory for combining ascending and descending LOS measurements to derive vertical and horizontal components, which inherently relies on the internal consistency of the two independent SAR datasets. Many papers apply this concept, noting that consistency between the derived components is a form of validation.
- Samsonov, S.V., P. d’Oreye, and F. Kervyn (2014). “Source models and time series of deformation in the Virunga Volcanic Province, D.R. Congo, from InSAR and GPS data.” Geophysical Journal International, 199(2), 793–808. While they do use some GPS, a significant part of their validation involves comparing the deformation fields derived from different SAR tracks (ascending/descending) and showing how they fit the same source model, providing a strong internal consistency check in a data-scarce region.
- Jolivet, R., et al. (2013). “Deformation of the Anatolian block from 2003 to 2010: a new GPS velocity field from the Turkish GNSS network.” Journal of Geophysical Research: Solid Earth, 118(7), 3502–3519. While focused on GPS, many subsequent InSAR papers (e.g., in tectonics) use such regional GPS fields for some validation, but then rely on extensive ascending/descending comparisons to fill spatial gaps where GPS is absent. The consistency between InSAR and the sparse GPS acts as partial validation, but the main spatial validation comes from internal SAR consistency.
Comparing Different InSAR Methodologies (PS-InSAR vs. SBAS) and InSAR Quality Metrics:
- Tibaldi, A., et al. (2016). “Ground deformation processes in the Campi Flegrei caldera (Italy) from 2005 to 2010 based on ENVISAT and COSMO-SkyMed InSAR data.” Journal of Volcanology and Geothermal Research, 327, 461-477. This paper uses both PS-InSAR and SBAS techniques and compares their results to characterize complex volcanic deformation. While some limited ground data might exist, the direct inter-comparison of the two InSAR products forms a significant part of their validation, discussing where each method performs better and where results converge.
- Pepe, A., et al. (2017). “Exploiting Sentinel-1 Data for Ground Deformation Monitoring in the Metropolitan Area of Naples (Italy) by Using the P-SBAS Approach.” Remote Sensing, 9(4), 315. This paper extensively discusses the use of coherence and statistical analysis of time series as indicators of reliability, especially in areas where ground truth is not uniformly available across the entire study region. They focus on the high density of measurements and the consistency of the deformation trends.
Qualitative Validation with Topographical Features and Known Events:
- Bell, J. W., et al. (2008). “New insights into long-term subsidence in the Las Vegas Valley, Nevada, from interferometric synthetic aperture radar.” Geology, 36(6), 463-466. While they had some historical leveling data, a significant part of their analysis and “validation” relies on the correlation of InSAR-derived subsidence patterns with geological features (e.g., fault activity, basin geometry) and historical records of groundwater pumping, illustrating the “plausibility” of the InSAR results.
- Qu, F., et al. (2015). “Monitoring large-scale urban deformation using TerraSAR-X PSI and SBAS techniques in Shanghai, China.” Remote Sensing of Environment, 166, 178-190. This is an excellent example from China. While Shanghai has some leveling benchmarks, for large urban areas, it’s impossible to have full ground truth. The authors combine PSI and SBAS, heavily rely on the consistency between the two, and correlate observed deformation patterns with known geological features (soft soil distribution, fault lines) and urban development activities.

When writing your thesis, remember to:

Clearly state the limitations: Acknowledge upfront that direct ground-truth validation was not possible and explain why.
Be meticulous in your alternative verification: Detail how you used internal consistency, cross-comparison, and ancillary data to build confidence in your results.
Discuss the implications: Explain what the lack of direct validation means for the certainty of your absolute measurements, but also emphasize what aspects (e.g., spatial patterns, relative rates, temporal trends) you can have high confidence in.
Suggest future work: Always include recommendations for how direct validation could be achieved if conditions change (e.g., through future collaborations, deployment of GNSS).

By following these strategies and citing relevant literature that employs similar approaches, you can build a scientifically robust PhD thesis even without traditional ground truth validation.