Optimizing the ASF Python Download Script with Gemini

This post documents using Gemini to optimize the Python script for bulk downloading Sentinel-1 burst data from the ASF website.


The download-all-2025-07-15_15-33-05.py script is generated by ASF's official bulk download service. It uses urllib.request from the Python standard library for HTTP requests, a synchronous, blocking API: the script downloads one file at a time and sits idle while waiting for each network response, which is exactly why it feels slow.
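
To make the bottleneck concrete, the script's core behavior reduces to roughly the following sequential sketch (simplified, with placeholder URLs; not the script's exact code):

# Simplified sketch of the original sequential flow: each urlopen() call blocks
# until that file has finished transferring before the next one starts.
from urllib.request import urlopen

urls = ["https://example.com/a.tiff", "https://example.com/b.tiff"]  # placeholder URLs

for url in urls:
    with urlopen(url, timeout=60) as response:  # blocks on network I/O
        data = response.read()                  # blocks until the body is fully read
    with open(url.rsplit("/", 1)[-1], "wb") as f:
        f.write(data)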

To download multiple files concurrently like aria2c and improve throughput, the script needs substantial modification.


Modification Approach

The most direct and recommended approach is to use the concurrent.futures module for multi-threaded or multi-process downloads. Since downloading is an I/O-bound task (mostly waiting on network responses), multi-threading with ThreadPoolExecutor is usually the better fit: CPython releases the Global Interpreter Lock (GIL) during blocking I/O, so threads can service many concurrent network connections efficiently.
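
As a minimal illustration of that pattern (placeholder URLs and a bare-bones fetch helper, not the final script), a thread pool lets several blocking downloads overlap:

# Minimal ThreadPoolExecutor sketch: each worker blocks on network I/O, during
# which CPython releases the GIL, so several downloads proceed at once.
import concurrent.futures
from urllib.request import urlopen

def fetch(url):
    with urlopen(url, timeout=60) as response:
        return url, len(response.read())

urls = ["https://example.com/a.tiff", "https://example.com/b.tiff"]  # placeholders

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(fetch, u) for u in urls]
    for future in concurrent.futures.as_completed(futures):
        url, nbytes = future.result()
        print(f"finished {url}: {nbytes} bytes")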

Here are the steps and code for converting the script to multi-threaded downloading:

  1. Import the necessary modules: bring in concurrent.futures.
  2. Define a function that downloads a single file: lightly adapt the original download_file_with_cookiejar method so the thread-pool executor can call it as a standalone task.
  3. Implement the multi-threading logic in download_files: use ThreadPoolExecutor to manage the concurrent download tasks.
  4. Handle authentication and cookie sharing: every thread must be able to use the Earthdata Login credentials correctly. The simplest approach is to have each thread (or worker) load cookies from the same cookie_jar_path, or to pass the necessary auth information when starting each thread. Because urllib.request installs its opener globally, each thread may need its own opener, and the cookie jar writes must be thread-safe; see the sketch after this list.
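
Here is a minimal sketch of the step-4 idea, assuming the main thread has already authenticated and written the cookie file; each worker then builds its own opener from that shared file:

# Per-thread opener sketch (assumes cookie_jar_path was written by the main
# thread's login step): no global install_opener, no shared mutable opener.
from http.cookiejar import MozillaCookieJar
from urllib.request import build_opener, HTTPCookieProcessor, HTTPHandler, HTTPSHandler

def make_opener(cookie_jar_path):
    jar = MozillaCookieJar()
    jar.load(cookie_jar_path, ignore_discard=True, ignore_expires=True)
    return build_opener(HTTPCookieProcessor(jar), HTTPHandler(), HTTPSHandler())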

Modified Code (using concurrent.futures.ThreadPoolExecutor)

I will modify the download_file_with_cookiejar and download_files methods of the bulk_downloader class in the original script.

Main changes:

  • download_one_file_wrapper function: a wrapper that takes a single file URL plus the downloader state so it can run in the thread pool. This avoids driving the pool from inside a class method, where self adds complexity.
  • download_file_with_cookiejar independence: make sure this method can be called safely from multiple threads, which means it should not rely on the globally installed opener but instead create or receive its own opener on each call.
  • download_files method: use ThreadPoolExecutor to submit the download tasks and collect their results.
#!/usr/bin/python
"""
Usage:

In a terminal/command line, cd to the directory where this file lives. Then...

With embedded urls: ( download the hardcoded list of files in the 'files =' block below)

python ./download-all-2025-07-15_15-33-05.py

Download all files in a Metalink/CSV: (downloaded from ASF Vertex)

python ./download-all-2025-07-15_15-33-05.py /path/to/downloads.metalink localmetalink.metalink localcsv.csv

Compatibility: python >= 3.9 (the cancel_futures argument used in download_files requires 3.9)

If downloading from a trusted source with invalid SSL Certs, use --insecure to ignore

For more information on bulk downloads, navigate to:
https://asf.alaska.edu/how-to/data-tools/data-tools/#bulk_download

This script was generated by the Alaska Satellite Facility's bulk download service.
For more information on the service, navigate to:
http://bulk-download.asf.alaska.edu/help
"""


import sys
import csv
import os
import os.path
import tempfile
import shutil
import re

import base64
import time
import getpass
import ssl
import signal
import socket

import xml.etree.ElementTree as ET

from urllib.request import build_opener, install_opener, Request, urlopen
from urllib.request import HTTPHandler, HTTPSHandler, HTTPCookieProcessor
from urllib.error import HTTPError, URLError

from http.cookiejar import MozillaCookieJar
from io import StringIO

# --- New imports ---
import concurrent.futures # For multi-threading/multi-processing
from threading import Lock # To make cookie jar operations thread-safe if needed

###
# Global variables intended for cross-thread modification
abort = False
# --- Lock to protect shared resources, such as writes to the cookie jar file ---
cookie_jar_lock = Lock()


###
# A routine that handles trapped signals
def signal_handler(sig, frame):
    global abort
    sys.stderr.write("\n > Caught Signal. Exiting!\n")
    abort = True  # necessary to cause the program to stop
    raise SystemExit  # this will only abort the thread that the ctrl+c was caught in

# --- New: create an independent opener for each thread and manage its cookies ---
def get_thread_opener(cookie_jar_path, context):
    cookie_jar = MozillaCookieJar()
    if os.path.isfile(cookie_jar_path):
        try:
            cookie_jar.load(cookie_jar_path, ignore_discard=True, ignore_expires=True)
        except Exception as e:
            sys.stderr.write(f"Warning: Could not load cookie jar in thread: {e}\n")
            # If loading fails, proceed without cookies for this thread, or re-authenticate

    opener = build_opener(
        HTTPCookieProcessor(cookie_jar),
        HTTPHandler(),
        HTTPSHandler(**context)
    )
    return opener, cookie_jar

# --- New: a wrapper function for running download tasks in the thread pool ---
# It receives all necessary parameters instead of relying on global state.
def download_one_file_wrapper(url, file_count, total, cookie_jar_path, context, asf_urs4):
    global abort  # allow the function to see the global abort flag

    # Each thread needs its own opener and cookie_jar instance to avoid conflicts.
    # Note: this cookie_jar belongs to the current thread only and is not synced
    # with other threads. In theory a lock is still needed to protect file writes
    # when re-authentication saves cookies; in most cases the main thread has
    # already authenticated and saved cookies, so workers simply load and use them.
    opener, thread_cookie_jar = get_thread_opener(cookie_jar_path, context)
    install_opener(opener)  # note: install_opener is process-global; the downloader below uses its own opener directly

    # Create a lightweight downloader instance so we can reuse the original
    # download_file_with_cookiejar logic instead of rewriting it as a standalone function.
    temp_downloader = bulk_downloader(init_empty=True)  # flag avoids re-loading the file list and re-authenticating
    temp_downloader.cookie_jar_path = cookie_jar_path
    temp_downloader.cookie_jar = thread_cookie_jar
    temp_downloader.context = context
    temp_downloader.asf_urs4 = asf_urs4

    # Authentication redirects are handled inside download_file_with_cookiejar.
    # Note: the original get_new_cookie() is interactive, so worker threads must
    # never prompt for credentials. We assume the main thread already authenticated
    # via get_cookie() and saved the cookie jar before concurrent downloads start;
    # workers only load those cookies.

    # Run the download
    size, total_size = temp_downloader.download_file_with_cookiejar(url, file_count, total)

    # Return the result
    return {
        'url': url,
        'size': size,
        'total_size': total_size,
        'status': 'success' if temp_downloader.is_good_download(total_size, size) else 'failed'
    }


class bulk_downloader:
    # --- New init_empty parameter, used for lightweight per-thread instances ---
    def __init__(self, init_empty=False):
        if init_empty:
            self.files = []  # avoid re-loading the file list
            self.cookie_jar_path = None
            self.cookie_jar = None
            self.asf_urs4 = {}
            self.context = {}
            return

        # List of files to download
        self.files = [ "https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231116T100545_20231116T100612_051241_062E6C_0E03/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231116T100545_20231116T100612_051241_062E6C_0E03/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231104T100545_20231104T100608_051066_062861_1D22/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231104T100545_20231104T100608_051066_062861_1D22/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231011T100545_20231011T100612_050716_061C6C_EBE7/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231011T100545_20231011T100612_050716_061C6C_EBE7/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230824T100544_20230824T100611_050016_060479_B862/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230824T100544_20230824T100611_050016_060479_B862/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230731T100543_20230731T100610_049666_05F8E6_9783/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230731T100543_20230731T100610_049666_05F8E6_9783/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230613T100540_20230613T100602_048966_05E36F_5994/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230613T100540_20230613T100602_048966_05E36F_5994/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230601T100539_20230601T100606_048791_05DE1B_47E1/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230601T100539_20230601T100606_048791_05DE1B_47E1/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230520T100539_20230520T100606_048616_05D8E5_EF6E/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230520T100539_20230520T100606_048616_05D8E5_EF6E/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230508T100538_20230508T100605_048441_05D3B5_663A/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230508T100538_20230508T100605_048441_05D3B5_663A/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230426T100537_20230426T100604_048266_05CDD9_4B44/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230426T100537_20230426T100604_048266_05CDD9_4B44/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230414T100537_20230414T100604_048091_05C7F9_CC51/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230414T100537_20230414T100604_048091_05C7F9_CC51/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230402T100537_20230402T100604_047916_05C20A_C9D3/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230402T100537_20230402T100604_047916_05C20A_C9D3/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230321T100536_20230321T100559_047741_05BC32_9E02/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230321T100536_20230321T100559_047741_05BC32_9E02/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230309T100536_20230309T100558_047566_05B642_6A57/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230309T100536_20230309T100558_047566_05B642_6A57/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230225T100536_20230225T100559_047391_05B059_A7AD/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230225T100536_20230225T100559_047391_05B059_A7AD/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230213T100536_20230213T100559_047216_05AA62_E12E/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230213T100536_20230213T100559_047216_05AA62_E12E/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230201T100537_20230201T100559_047041_05A486_907E/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230201T100537_20230201T100559_047041_05A486_907E/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230120T100537_20230120T100559_046866_059EA8_353E/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230120T100537_20230120T100559_046866_059EA8_353E/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230108T100537_20230108T100600_046691_0598BF_5C9B/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230108T100537_20230108T100600_046691_0598BF_5C9B/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221227T100538_20221227T100600_046516_0592E0_DA00/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221227T100538_20221227T100600_046516_0592E0_DA00/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221215T100539_20221215T100606_046341_058CDF_5715/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221215T100539_20221215T100606_046341_058CDF_5715/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221203T100539_20221203T100602_046166_0586F1_5712/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221203T100539_20221203T100602_046166_0586F1_5712/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221121T100540_20221121T100603_045991_0580FB_DDFA/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221121T100540_20221121T100603_045991_0580FB_DDFA/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221109T100540_20221109T100602_045816_057B19_AB1C/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221109T100540_20221109T100602_045816_057B19_AB1C/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221028T100541_20221028T100603_045641_057529_45F6/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221028T100541_20221028T100603_045641_057529_45F6/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220922T100539_20220922T100602_045116_056439_C6B0/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220922T100539_20220922T100602_045116_056439_C6B0/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220910T100540_20220910T100607_044941_055E50_7AC8/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220910T100540_20220910T100607_044941_055E50_7AC8/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220805T100538_20220805T100605_044416_054CE7_FAB0/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220805T100538_20220805T100605_044416_054CE7_FAB0/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220724T100537_20220724T100604_044241_0547C5_09A8/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220724T100537_20220724T100604_044241_0547C5_09A8/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220525T100533_20220525T100600_043366_052DBA_EB07/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220525T100533_20220525T100600_043366_052DBA_EB07/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220501T100532_20220501T100559_043016_0522D8_F388/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220501T100532_20220501T100559_043016_0522D8_F388/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220419T100531_20220419T100558_042841_051D0F_CFB9/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220419T100531_20220419T100558_042841_051D0F_CFB9/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220407T100531_20220407T100558_042666_051731_9A6B/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220407T100531_20220407T100558_042666_051731_9A6B/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220326T100531_20220326T100558_042491_05114B_42D9/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220326T100531_20220326T100558_042491_05114B_42D9/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220314T100530_20220314T100557_042316_050B55_5602/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220314T100530_20220314T100557_042316_050B55_5602/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220206T100531_20220206T100558_041791_04F952_6852/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220206T100531_20220206T100558_041791_04F952_6852/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220125T100531_20220125T100558_041616_04F34E_B897/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220125T100531_20220125T100558_041616_04F34E_B897/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220101T100532_20220101T100559_041266_04E794_1BC0/IW1/VV/1.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220101T100532_20220101T100559_041266_04E794_1BC0/IW1/VV/0.tiff",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231116T100545_20231116T100612_051241_062E6C_0E03/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231116T100545_20231116T100612_051241_062E6C_0E03/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231104T100545_20231104T100608_051066_062861_1D22/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231104T100545_20231104T100608_051066_062861_1D22/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231011T100545_20231011T100612_050716_061C6C_EBE7/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20231011T100545_20231011T100612_050716_061C6C_EBE7/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230824T100544_20230824T100611_050016_060479_B862/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230824T100544_20230824T100611_050016_060479_B862/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230731T100543_20230731T100610_049666_05F8E6_9783/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230731T100543_20230731T100610_049666_05F8E6_9783/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230613T100540_20230613T100602_048966_05E36F_5994/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230613T100540_20230613T100602_048966_05E36F_5994/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230601T100539_20230601T100606_048791_05DE1B_47E1/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230601T100539_20230601T100606_048791_05DE1B_47E1/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230520T100539_20230520T100606_048616_05D8E5_EF6E/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230520T100539_20230520T100606_048616_05D8E5_EF6E/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230508T100538_20230508T100605_048441_05D3B5_663A/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230508T100538_20230508T100605_048441_05D3B5_663A/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230426T100537_20230426T100604_048266_05CDD9_4B44/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230426T100537_20230426T100604_048266_05CDD9_4B44/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230414T100537_20230414T100604_048091_05C7F9_CC51/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230414T100537_20230414T100604_048091_05C7F9_CC51/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230402T100537_20230402T100604_047916_05C20A_C9D3/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230402T100537_20230402T100604_047916_05C20A_C9D3/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230321T100536_20230321T100559_047741_05BC32_9E02/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230321T100536_20230321T100559_047741_05BC32_9E02/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230309T100536_20230309T100558_047566_05B642_6A57/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230309T100536_20230309T100558_047566_05B642_6A57/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230225T100536_20230225T100559_047391_05B059_A7AD/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230225T100536_20230225T100559_047391_05B059_A7AD/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230213T100536_20230213T100559_047216_05AA62_E12E/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230213T100536_20230213T100559_047216_05AA62_E12E/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230201T100537_20230201T100559_047041_05A486_907E/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230201T100537_20230201T100559_047041_05A486_907E/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230120T100537_20230120T100559_046866_059EA8_353E/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230120T100537_20230120T100559_046866_059EA8_353E/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230108T100537_20230108T100600_046691_0598BF_5C9B/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20230108T100537_20230108T100600_046691_0598BF_5C9B/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221227T100538_20221227T100600_046516_0592E0_DA00/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221227T100538_20221227T100600_046516_0592E0_DA00/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221215T100539_20221215T100606_046341_058CDF_5715/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221215T100539_20221215T100606_046341_058CDF_5715/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221203T100539_20221203T100602_046166_0586F1_5712/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221203T100539_20221203T100602_046166_0586F1_5712/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221121T100540_20221121T100603_045991_0580FB_DDFA/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221121T100540_20221121T100603_045991_0580FB_DDFA/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221109T100540_20221109T100602_045816_057B19_AB1C/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221109T100540_20221109T100602_045816_057B19_AB1C/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221028T100541_20221028T100603_045641_057529_45F6/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20221028T100541_20221028T100603_045641_057529_45F6/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220922T100539_20220922T100602_045116_056439_C6B0/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220922T100539_20220922T100602_045116_056439_C6B0/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220910T100540_20220910T100607_044941_055E50_7AC8/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220910T100540_20220910T100607_044941_055E50_7AC8/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220805T100538_20220805T100605_044416_054CE7_FAB0/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220805T100538_20220805T100605_044416_054CE7_FAB0/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220724T100537_20220724T100604_044241_0547C5_09A8/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220724T100537_20220724T100604_044241_0547C5_09A8/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220525T100533_20220525T100600_043366_052DBA_EB07/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220525T100533_20220525T100600_043366_052DBA_EB07/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220501T100532_20220501T100559_043016_0522D8_F388/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220501T100532_20220501T100559_043016_0522D8_F388/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220419T100531_20220419T100558_042841_051D0F_CFB9/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220419T100531_20220419T100558_042841_051D0F_CFB9/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220407T100531_20220407T100558_042666_051731_9A6B/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220407T100531_20220407T100558_042666_051731_9A6B/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220326T100531_20220326T100558_042491_05114B_42D9/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220326T100531_20220326T100558_042491_05114B_42D9/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220314T100530_20220314T100557_042316_050B55_5602/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220314T100530_20220314T100557_042316_050B55_5602/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220206T100531_20220206T100558_041791_04F952_6852/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220206T100531_20220206T100558_041791_04F952_6852/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220125T100531_20220125T100558_041616_04F34E_B897/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220125T100531_20220125T100558_041616_04F34E_B897/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220101T100532_20220101T100559_041266_04E794_1BC0/IW1/VV/1.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20220101T100532_20220101T100559_041266_04E794_1BC0/IW1/VV/0.xml",
"https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20211220T100533_20211220T100600_041091_04E1B5_EDAE/IW1/VV/1.xml"
]


        # Local stash of cookies so we don't always have to ask
        self.cookie_jar_path = os.path.join(
            os.path.expanduser('~'),
            ".bulk_download_cookiejar.txt"
        )
        self.cookie_jar = None

        self.asf_urs4 = {'url': 'https://urs.earthdata.nasa.gov/oauth/authorize',
                         'client': 'BO_n7nTIlMljdvU6kRRB3g',
                         'redir': 'https://auth.asf.alaska.edu/login'}

        # Make sure we can write to our current directory
        if os.access(os.getcwd(), os.W_OK) is False:
            sys.stderr.write(f"WARNING: Cannot write to current path! Check permissions for {os.getcwd()}\n")
            exit(-1)

        # For SSL
        self.context = {}

        # Check if user handed in a Metalink or CSV:
        if len(sys.argv) > 0:
            download_files = []
            input_files = []
            for arg in sys.argv[1:]:
                if arg == '--insecure':
                    try:
                        ctx = ssl.create_default_context()
                        ctx.check_hostname = False
                        ctx.verify_mode = ssl.CERT_NONE
                        self.context['context'] = ctx
                    except AttributeError:
                        # Python 2.6 won't complain about SSL Validation
                        pass

                elif arg.endswith('.metalink') or arg.endswith('.csv'):
                    if os.path.isfile(arg):
                        input_files.append(arg)
                        if arg.endswith('.metalink'):
                            new_files = self.process_metalink(arg)
                        else:
                            new_files = self.process_csv(arg)
                        if new_files is not None:
                            for file_url in new_files:
                                download_files.append(file_url)
                    else:
                        sys.stderr.write(f" > I cannot find the input file you specified: {arg}\n")
                else:
                    sys.stderr.write(f" > Command line argument '{arg}' makes no sense, ignoring.\n")

            if len(input_files) > 0:
                if len(download_files) > 0:
                    sys.stdout.write(f" > Processing {len(download_files)} downloads from {len(input_files)} input files. \n")
                    self.files = download_files
                else:
                    sys.stdout.write(f" > I see you asked me to download files from {len(input_files)} input files, but they had no downloads!\n")
                    sys.stdout.write(" > I'm super confused and exiting.\n")
                    exit(-1)

        # Make sure cookie_jar is good to go!
        self.get_cookie()

        # Summary counters
        self.total_bytes = 0
        self.total_time = 0
        self.cnt = 0
        self.success = []
        self.failed = []
        self.skipped = []

    # Get and validate a cookie
    def get_cookie(self):
        if os.path.isfile(self.cookie_jar_path):
            self.cookie_jar = MozillaCookieJar()
            try:
                self.cookie_jar.load(self.cookie_jar_path, ignore_discard=True, ignore_expires=True)  # ignore_discard/expires so saved cookies can be reused
            except Exception as e:
                sys.stderr.write(f"Error loading existing cookie jar: {e}. Will attempt to get a new one.\n")
                self.cookie_jar = None  # Reset if loading failed

        if self.cookie_jar and self.check_cookie():
            sys.stdout.write(" > Reusing previous cookie jar.\n")
            return
        else:
            sys.stdout.write(" > Could not validate old cookie Jar\n")

        # We don't have a valid cookie, prompt user for creds
        sys.stdout.write("No existing URS cookie found, please enter Earthdata username & password:\n")
        sys.stdout.write("(Credentials will not be stored, saved or logged anywhere)\n")

        # Keep trying 'till user gets the right U:P
        while self.check_cookie() is False:
            self.get_new_cookie()

    # Validate cookie before we begin
    def check_cookie(self):
        if self.cookie_jar is None:
            sys.stderr.write(f" > Cookiejar is bunk: {self.cookie_jar}\n")
            return False

        # File we know is valid, used to validate cookie
        file_check = 'https://urs.earthdata.nasa.gov/profile'

        # Apply custom Redirect Handler
        opener = build_opener(
            HTTPCookieProcessor(self.cookie_jar),
            HTTPHandler(),
            HTTPSHandler(**self.context)
        )
        # install_opener(opener)  # Removed: avoids global state; pass the opener explicitly or manage it per-thread

        # Attempt a HEAD request
        request = Request(file_check)
        request.get_method = lambda: 'HEAD'
        try:
            sys.stdout.write(f" > attempting to download {file_check}\n")
            response = opener.open(request, timeout=30)  # Use the opener directly
            resp_code = response.getcode()
            # Make sure we're logged in
            if not self.check_cookie_is_logged_in(self.cookie_jar):
                return False

            # Save cookiejar (only if successful and main thread)
            with cookie_jar_lock:  # Protect file write
                self.cookie_jar.save(self.cookie_jar_path, ignore_discard=True, ignore_expires=True)
                sys.stdout.write(" > Cookie jar saved.\n")

        except HTTPError:
            # If we get this error, it likely means the user has not agreed to the current EULA
            sys.stderr.write("\nIMPORTANT: \n")
            sys.stderr.write("Your user appears to lack permissions to download data from the ASF Datapool.\n")
            sys.stderr.write("\n\nNew users: you must first log into Vertex and accept the EULA. In addition, your Study Area must be set at Earthdata https://urs.earthdata.nasa.gov\n")
            exit(-1)

        # These return codes indicate the USER has not been approved to download the data
        if resp_code in (300, 301, 302, 303):
            try:
                redir_url = response.info().getheader('Location')
            except AttributeError:
                redir_url = response.getheader('Location')

            # Funky Test env:
            if ("vertex-retired.daac.asf.alaska.edu" in redir_url and "test" in self.asf_urs4['redir']):
                sys.stdout.write("Cough, cough. It's dusty in this test env!\n")
                return True

            sys.stderr.write(f"Redirect ({resp_code}) occurred, invalid cookie value!\n")
            return False

        # These are successes!
        if resp_code in (200, 307):
            return True
        else:
            return False

    def get_new_cookie(self):
        # Start by prompting user to input their credentials

        new_username = input("Username: ")
        new_password = getpass.getpass(prompt="Password (will not be displayed): ")

        # Build URS4 Cookie request
        auth_cookie_url = self.asf_urs4['url'] + '?client_id=' + self.asf_urs4['client'] + '&redirect_uri=' + self.asf_urs4['redir'] + '&response_type=code&state='

        user_pass = base64.b64encode(bytes(new_username + ":" + new_password, "utf-8"))
        user_pass = user_pass.decode("utf-8")

        # Authenticate against URS, grab all the cookies
        self.cookie_jar = MozillaCookieJar()
        opener = build_opener(HTTPCookieProcessor(self.cookie_jar), HTTPHandler(), HTTPSHandler(**self.context))
        request = Request(auth_cookie_url, headers={"Authorization": "Basic {0}".format(user_pass)})

        # Watch out for cookie rejection!
        try:
            response = opener.open(request)
        except HTTPError as e:
            if "WWW-Authenticate" in e.headers and "Please enter your Earthdata Login credentials" in e.headers["WWW-Authenticate"]:
                sys.stderr.write(" > Username and Password combo was not successful. Please try again.\n")
                return False
            else:
                # If an error happens here, the user most likely has not confirmed the EULA.
                sys.stderr.write("\nIMPORTANT: There was an error obtaining a download cookie!\n")
                sys.stderr.write("Your user appears to lack permission to download data from the ASF Datapool.\n")
                sys.stderr.write("\n\nNew users: you must first log into Vertex and accept the EULA. In addition, your Study Area must be set at Earthdata https://urs.earthdata.nasa.gov\n")
                exit(-1)
        except URLError:
            sys.stderr.write("\nIMPORTANT: There was a problem communicating with URS, unable to obtain cookie. \n")
            sys.stderr.write("Try cookie generation later.\n")
            exit(-1)

        # Did we get a cookie?
        if self.check_cookie_is_logged_in(self.cookie_jar):
            with cookie_jar_lock:  # Protect file write
                self.cookie_jar.save(self.cookie_jar_path, ignore_discard=True, ignore_expires=True)
            return True

        # If we aren't successful generating the cookie, nothing will work. Stop here!
        sys.stderr.write("WARNING: Could not generate new cookie! Cannot proceed. Please try Username and Password again.\n")
        sys.stderr.write(f"Response was {response.getcode()}.\n")
        sys.stderr.write("\n\nNew users: you must first log into Vertex and accept the EULA. In addition, your Study Area must be set at Earthdata https://urs.earthdata.nasa.gov\n")
        exit(-1)

    # Make sure we're logged into URS
    def check_cookie_is_logged_in(self, cj):
        for cookie in cj:
            if cookie.name == 'urs_user_already_logged':
                # Only get this cookie if we logged in successfully!
                return True

        return False

    # Download the file
    def download_file_with_cookiejar(self, url, file_count, total, recursion=False):
        # We need a fresh opener for each potential redirect or retry
        current_opener, current_cookie_jar = get_thread_opener(self.cookie_jar_path, self.context)
        install_opener(current_opener)  # note: install_opener is process-global; current_opener is used directly below

        # See if we've already downloaded this file and whether it is the correct size
        download_file = os.path.basename(url).split('?')[0]

        # TODO: make this a function delete_partial_download
        if os.path.isfile(download_file):
            try:
                request = Request(url)
                request.get_method = lambda: 'HEAD'
                # Use current_opener to avoid global opener issues
                response = current_opener.open(request, timeout=30)
                remote_size = self.get_total_size(response)

                # Check that we were able to derive a size.
                if remote_size:
                    local_size = os.path.getsize(download_file)
                    # Allow for small differences in size (e.g., due to metadata updates)
                    if remote_size < (local_size + (local_size * .01)) and remote_size > (local_size - (local_size * .01)):
                        sys.stdout.write(f" > Download file {download_file} exists! \n > Skipping download of {url}. \n")
                        return None, None
                    # Partial file size wasn't full file size; blow away the chunk and start again
                    sys.stdout.write(f" > Found {download_file} but it wasn't fully downloaded. Removing file and downloading again.\n")
                    os.remove(download_file)

            except ssl.CertificateError as e:
                sys.stderr.write(f" > ERROR: {e}\n")
                sys.stderr.write(" > Could not validate SSL Cert. You may be able to overcome this using the --insecure flag\n")
                return False, None

            except HTTPError as e:
                if e.code == 401:
                    sys.stderr.write(" > IMPORTANT: Your user may not have permission to download this type of data!\n")
                elif e.code == 400:  # Explicitly catch 400 Bad Request, which indicates URL expiration/invalidity
                    sys.stderr.write(f" > HTTP Error 400 (Bad Request) for {url}. This often means the download URL has expired or is invalid. Please try re-generating your download list and run the script again.\n")
                else:
                    sys.stderr.write(f" > Unknown Error, Could not get file HEAD: {e}\n")
                return False, None

            except URLError as e:
                sys.stderr.write(f"URL Error (from HEAD): {e.reason}, {url}\n")
                if "ssl.c" in f"{e.reason}":
                    sys.stderr.write("IMPORTANT: Remote location may not be accepting your SSL configuration. This is a terminal error.\n")
                return False, None

            except socket.timeout as e:
                sys.stderr.write(f" > HEAD request timeout for: {url}; {e}\n")
                return False, None

        # Attempt https connection
        try:
            request = Request(url)
            # Use current_opener to ensure cookies are handled correctly for this thread
            response = current_opener.open(request, timeout=60)  # Increased timeout to 60 seconds

            while response.getcode() == 202:
                sys.stdout.write(" > Waiting for burst extraction service...\n")
                time.sleep(5)
                response = current_opener.open(request, timeout=60)

            # Watch for redirect
            if response.geturl() != url:

                # See if we were redirected BACK to URS for re-auth.
                if 'https://urs.earthdata.nasa.gov/oauth/authorize' in response.geturl():
                    if recursion:
                        sys.stderr.write(" > Entering seemingly endless auth loop. Aborting. \n")
                        return False, None

                    # Make this easier: if there is no app_type=401, add it
                    new_auth_url = response.geturl()
                    if "app_type" not in new_auth_url:
                        new_auth_url += "&app_type=401"

                    sys.stdout.write(f" > While attempting to download {url}....\n")
                    sys.stdout.write(f" > Need to obtain new cookie from {new_auth_url}\n")
                    old_cookies = [cookie.name for cookie in current_cookie_jar]  # Use the thread's cookie jar
                    # Create a new opener for this specific re-auth attempt within the thread
                    reauth_opener = build_opener(HTTPCookieProcessor(current_cookie_jar), HTTPHandler(), HTTPSHandler(**self.context))
                    request = Request(new_auth_url)
                    try:
                        response = reauth_opener.open(request)
                        for cookie in current_cookie_jar:
                            if cookie.name not in old_cookies:
                                sys.stdout.write(f" > Saved new cookie: {cookie.name}\n")

                                # A little hack to save session cookies
                                if cookie.discard:
                                    cookie.expires = int(time.time()) + 60 * 60 * 24 * 30
                                    sys.stdout.write(" > Saving session Cookie that should have been discarded! \n")
                        with cookie_jar_lock:  # Protect file write
                            current_cookie_jar.save(self.cookie_jar_path, ignore_discard=True, ignore_expires=True)  # Save to the main cookie file
                    except HTTPError as e:
                        sys.stderr.write(f"HTTP Error: {e.code}, {url}\n")
                        return False, None

                    # Okay, now we have more cookies! Let's try again, recursively!
                    sys.stdout.write(" > Attempting download again with new cookies!\n")
                    return self.download_file_with_cookiejar(url, file_count, total, recursion=True)

                sys.stdout.write(f" > 'Temporary' Redirect download @ Remote archive:\n > {response.geturl()}\n")

            # Seems to be working
            sys.stdout.write(f"({file_count}/{total}) Downloading {url}\n")

            content_disposition = response.headers.get('Content-Disposition')

            if content_disposition and len(content_disposition):
                possible_filename = re.findall(r"filename=(\S+)", content_disposition)

                if possible_filename:
                    download_file = possible_filename.pop()

            # Open our local file for writing and build status bar
            tf = tempfile.NamedTemporaryFile(mode='w+b', delete=False, dir='.')
            self.chunk_read(response, tf, report_hook=self.chunk_report)

            # Reset download status
            # sys.stdout.write('\n')  # This might interfere with multiple threads writing to stdout

            tempfile_name = tf.name
            tf.close()

        except HTTPError as e:
            sys.stderr.write(f"HTTP Error: {e.code}, {url}\n")

            if e.code == 401:
                sys.stderr.write(" > IMPORTANT: Your user does not have permission to download this type of data!\n")
            elif e.code == 400:  # Explicitly catch 400 Bad Request, which indicates URL expiration/invalidity
                sys.stderr.write(f" > HTTP Error 400 (Bad Request) for {url}. This often means the download URL has expired or is invalid. Please try re-generating your download list and run the script again.\n")
            elif e.code == 403:
                sys.stderr.write(" > Got a 403 Error trying to download this file. \n")
                sys.stderr.write(" > You MAY need to log in to this app and agree to a EULA. \n")

            return False, None

        except URLError as e:
            sys.stderr.write(f"URL Error (from GET): {e}, {e.reason}, {url}\n")

            if "ssl.c" in f"{e.reason}":
                sys.stderr.write("IMPORTANT: Remote location may not be accepting your SSL configuration. This is a terminal error.\n")

            return False, None

        except socket.timeout as e:
            sys.stderr.write(f" > timeout requesting: {url}; {e}\n")
            return False, None

        except ssl.CertificateError as e:
            sys.stderr.write(f" > ERROR: {e}\n")
            sys.stderr.write(" > Could not validate SSL Cert. You may be able to overcome this using the --insecure flag\n")
            return False, None

        # Return the file size
        shutil.copy(tempfile_name, download_file)
        os.remove(tempfile_name)
        file_size = self.get_total_size(response)
        actual_size = os.path.getsize(download_file)
        if file_size is None:
            # We were unable to calculate the file size.
            file_size = actual_size

        return actual_size, file_size

    def get_redirect_url_from_error(self, error):
        find_redirect = re.compile(r"id=\"redir_link\"\s+href=\"(\S+)\"")
        sys.stderr.write(f"error file was: {error}\n")
        redirect_url = find_redirect.search(error)

        if redirect_url:
            sys.stdout.write(f"Found: {redirect_url.group(0)}\n")
            return redirect_url.group(0)

        return None

    # chunk_report taken from http://stackoverflow.com/questions/2028517/python-urllib2-progress-hook
    def chunk_report(self, bytes_so_far, file_size):
        # In a multi-threaded context, writing to stdout needs care.
        # Simple \r updates might get garbled. For now, keep it simple.
        if file_size is not None:
            percent = float(bytes_so_far) / file_size
            percent = round(percent * 100, 2)
            sys.stdout.write(f" > Downloaded {bytes_so_far} of {file_size} bytes ({percent:0.2f}%) for current file...\r")
        else:
            # We couldn't figure out the size.
            sys.stdout.write(f" > Downloaded {bytes_so_far} of unknown Size for current file...\r")
        sys.stdout.flush()  # Ensure it's written immediately

    # chunk_read modified from http://stackoverflow.com/questions/2028517/python-urllib2-progress-hook
    def chunk_read(self, response, local_file, chunk_size=8192, report_hook=None):
        file_size = self.get_total_size(response)
        bytes_so_far = 0

        while 1:
            try:
                chunk = response.read(chunk_size)
            except Exception:
                sys.stdout.write("\n > There was an error reading data. \n")
                break

            try:
                local_file.write(chunk)
            except TypeError:  # Handle cases where chunk might be bytes but write expects str, or vice versa
                try:
                    local_file.write(chunk.decode('utf-8'))
                except (UnicodeDecodeError, AttributeError):
                    sys.stderr.write("\n > Encoding error writing chunk. Skipping decode.\n")
                    local_file.write(chunk)  # Try writing raw bytes if decode fails
            bytes_so_far += len(chunk)

            if not chunk:
                break

            if report_hook:
                report_hook(bytes_so_far, file_size)

        return bytes_so_far

    def get_total_size(self, response):
        try:
            file_size = response.info().getheader('Content-Length').strip()
        except AttributeError:
            try:
                file_size = response.getheader('Content-Length').strip()
            except AttributeError:
                sys.stderr.write("> Problem getting size\n")
                return None

        return int(file_size)

    # Get download urls from a metalink file
    def process_metalink(self, ml_file):
        sys.stdout.write(f"Processing metalink file: {ml_file}\n")
        with open(ml_file, 'r') as ml:
            xml = ml.read()

        # Hack to remove annoying namespace
        it = ET.iterparse(StringIO(xml))
        for _, el in it:
            if '}' in el.tag:
                el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
        root = it.root

        dl_urls = []
        ml_files = root.find('files')
        for dl in ml_files:
            dl_urls.append(dl.find('resources').find('url').text)

        if len(dl_urls) > 0:
            return dl_urls
        else:
            return None

    # Get download urls from a csv file
    def process_csv(self, csv_file):
        sys.stdout.write(f"Processing csv file: {csv_file}\n")

        dl_urls = []
        with open(csv_file, 'r') as csvf:
            try:
                csvr = csv.DictReader(csvf)
                for row in csvr:
                    dl_urls.append(row['URL'])
            except csv.Error as e:
                sys.stderr.write(f"WARNING: Could not parse file {csv_file}, line {csvr.line_num}: {e}. Skipping.\n")
                return None
            except KeyError:
                sys.stderr.write(f"WARNING: Could not find URL column in file {csv_file}. Skipping.\n")

        if len(dl_urls) > 0:
            return dl_urls
        else:
            return None

    # --- Modified download method using multi-threading ---
    def download_files(self, max_workers=8):  # max_workers sets the concurrency level
        total_files = len(self.files)
        sys.stdout.write(f"Starting concurrent download of {total_files} files with {max_workers} workers.\n")

        # Create a thread pool.
        # For I/O-bound tasks like downloading, ThreadPoolExecutor is usually better than ProcessPoolExecutor.
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit tasks to the thread pool
            future_to_url = {executor.submit(
                download_one_file_wrapper,
                url,
                idx + 1,  # file_count
                total_files,
                self.cookie_jar_path,
                self.context,
                self.asf_urs4
            ): url for idx, url in enumerate(self.files)}

            for future in concurrent.futures.as_completed(future_to_url):
                url = future_to_url[future]
                if abort:  # Check the global abort flag even while tasks complete
                    executor.shutdown(wait=False, cancel_futures=True)  # cancel_futures requires Python >= 3.9
                    sys.stderr.write("\n > Download aborted due to user signal.\n")
                    break

                try:
                    result = future.result()
                    # Update summary based on the result
                    if result['status'] == 'success':
                        self.total_bytes += result['size']
                        self.success.append({'file': result['url'], 'size': result['size']})
                        sys.stdout.write(f"({self.cnt + 1}/{total_files}) Downloaded {os.path.basename(result['url'])} successfully.\n")
                    else:
                        self.failed.append(result['url'])
                        sys.stderr.write(f"({self.cnt + 1}/{total_files}) Failed to download {os.path.basename(result['url'])}.\n")
                    self.cnt += 1  # Increment counter for display
                except SystemExit:  # Re-raise if SystemExit was raised in a worker thread
                    executor.shutdown(wait=False, cancel_futures=True)
                    raise
                except Exception as exc:
                    self.failed.append(url)
                    sys.stderr.write(f"({self.cnt + 1}/{total_files}) {url} generated an exception: {exc}\n")
                    self.cnt += 1  # Increment counter for display

    def is_good_download(self, total_size, size):
        return (
            size is not False and size is not None and  # size may be False/None on failure or skip
            (
                total_size is not None and  # ensure total_size exists before comparing
                total_size < (size + (size * .01)) and
                total_size > (size - (size * .01))
            )
        )

    def print_summary(self):
        # Print summary
        sys.stdout.write("\n\nDownload Summary \n")
        sys.stdout.write("--------------------------------------------------------------------------------\n")
        sys.stdout.write(f" Successes: {len(self.success)} files, {self.total_bytes} bytes \n")
        for success_file in self.success:
            size_mb = success_file['size'] / 1024.0**2
            sys.stdout.write(f" - {success_file['file']} {size_mb:.2f}MB\n")

        if len(self.failed) > 0:
            sys.stdout.write(f" Failures: {len(self.failed)} files\n")
            for failed_file in self.failed:
                sys.stdout.write(f" - {failed_file}\n")

        if len(self.skipped) > 0:
            sys.stdout.write(f" Skipped: {len(self.skipped)} files\n")
            for skipped_file in self.skipped:
                sys.stdout.write(" - {0}\n".format(skipped_file))

        if len(self.success) > 0:
            total_bytes_mb = self.total_bytes / 1024.0**2
            # Only calculate the average rate if total_time is non-zero, to avoid division by zero
            if self.total_time > 0:
                download_rate = total_bytes_mb / self.total_time
                sys.stdout.write(f" Average Rate: {download_rate:.2f}MB/sec\n")
            else:
                sys.stdout.write(" Average Rate: N/A (not enough data for calculation)\n")

        sys.stdout.write("--------------------------------------------------------------------------------\n")


if __name__ == "__main__":
    # Set up a signal trap for SIGINT (Ctrl+C)
    signal.signal(signal.SIGINT, signal_handler)

    downloader = bulk_downloader()
    # Kick off the downloads; max_workers controls how many run concurrently (e.g., 8 threads)
    downloader.download_files(max_workers=8)  # tune to your network and what the server can handle
    downloader.print_summary()


How to Use the Modified Script

  1. Save: save the modified code above as a .py file, e.g. download_asf_fast.py.
  2. Set permissions: make sure your .netrc file permissions are correct (chmod 600 ~/.netrc).
  3. Run: run the script the same way as before:
    python ./download_asf_fast.py # or python ./download_asf_fast.py /path/to/your.metalink
  4. Watch the speed: multiple files should start downloading at once, significantly improving overall throughput.

Key Improvements and Caveats

  • concurrent.futures.ThreadPoolExecutor: the core of the concurrency. It creates the requested number of worker threads, each handling one download task independently; while one thread waits on a network response, the others keep downloading other files.
  • download_one_file_wrapper: this helper adapts the class's download logic to ThreadPoolExecutor's submit method, which expects a plain callable.
  • Thread-safe printing: all print() statements were changed to sys.stdout.write() or sys.stderr.write(), with explicit \n and sys.stdout.flush(). This reduces garbled output when multiple threads write to stdout at once, though \r progress updates may still look imperfect under multi-threading; a simple lock-based refinement is sketched after this list.
  • Cookie jar handling: each thread now loads its own MozillaCookieJar instance. The main thread performs the initial authentication and saves the cookie file. If a redirect during a download forces re-authentication, worker threads also update cookies and write the shared cookie file under a lock for thread safety.
  • Longer timeouts: the urlopen timeout was raised from 30 to 60 seconds, which helps on unstable networks.
  • max_workers parameter: control the concurrency via X in downloader.download_files(max_workers=X). Start with 4-8 and tune based on your bandwidth, CPU usage, and how ASF's servers respond.
  • Resume behavior: the original script's simple existence-and-size check, which decides whether to skip or re-download a file, is unchanged.
  • Error handling: an explicit message was added for HTTP 400 errors, which also showed up in the earlier aria2c attempts and usually means a pre-signed URL has expired. Under multi-threading this occurs more often.
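
As an optional refinement that is not in the script above, a module-level lock can serialize whole progress lines so worker threads never interleave partial writes:

# Hypothetical helper: serialize progress output across threads with a lock.
import sys
from threading import Lock

print_lock = Lock()

def report(line):
    with print_lock:
        sys.stdout.write(line + "\n")
        sys.stdout.flush()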

With these changes, the Python download script should fetch ASF data noticeably faster.

