From 6fc19a017ff845a5de1fb77bc3aa297ed9f4de06 Mon Sep 17 00:00:00 2001
From: "trop[bot]" <37223003+trop[bot]@users.noreply.github.com>
Date: Sun, 5 Apr 2026 21:24:01 -0700
Subject: [PATCH] ci: zstd-compress the src cache and drop the doubled win_toolchain (#50705)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* ci: shrink src cache and fix Windows tar cleanup

- Exclude platform-specific toolchains (llvm-build, rust-toolchain) from
  the src cache; all platforms now fetch them via fix-sync post-restore
- Exclude unused test data and benchmarks: blink/web_tests, jetstream,
  speedometer, catapult/tracing/test_data, swiftshader/tests/regres
- Fix Windows restore leaving the tarball on disk after extraction
  ($src_cache was scoped to the previous PowerShell step)
- Bump src-cache key v1 -> v2

Co-authored-by: Samuel Attard

* ci: fetch llvm/rust toolchains in gn-check and clang-tidy

These workflows restore the src cache but don't run fix-sync. Now that
llvm-build and rust-toolchain are excluded from the cache, they need to
download them directly — gn gen read_file()s both, and clang-tidy runs
the binary from llvm-build.

Co-authored-by: Samuel Attard

* ci: fetch clang-tidy package explicitly

update.py's default 'clang' package doesn't include the clang-tidy
binary; it ships as a separate package.

Co-authored-by: Samuel Attard

* ci: preserve blink/web_tests/BUILD.gn when stripping test data

//BUILD.gn references //third_party/blink/web_tests:wpt_tests as a
target label, so the BUILD.gn must exist for gn gen. The data = [...]
entries it declares are runtime-only and not existence-checked at gen
time, so the actual test directories can still be removed.

Co-authored-by: Samuel Attard

* ci: compress src cache with zstd and drop gclient sync -vv

The src cache was an uncompressed tar (~16GB after exclusions). Switch
to zstd -T0 --long=30 for ~4x smaller transfer and multi-threaded
compression.
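As a self-contained sketch of that create step and the matching restore pipe (the directory layout and file names here are illustrative, not the real CI paths):

```shell
# Minimal demo of the cache pipes described above; paths are illustrative.
set -e
# Skip gracefully when zstd isn't installed on this machine.
command -v zstd >/dev/null 2>&1 || { echo "zstd not installed; skipping"; exit 0; }

workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p src/third_party
echo "hello" > src/third_party/README

# Create: stream the tar into multi-threaded zstd. --long=30 enables a
# 1GB match window; -f overwrites the output so re-runs don't fail.
tar -cf - src | zstd -T0 --long=30 -f -o cache.tar

# Restore: --long=30 must be repeated on decompress, otherwise zstd
# refuses archives whose window exceeds its default limit.
mkdir temp-cache
zstd -d --long=30 -c cache.tar | tar -xf - -C temp-cache
cat temp-cache/src/third_party/README
```

Note the asymmetry: the `--long` flag is not recorded as "just a compression option" — the decompressor must also be told to allow the large window, which is why every restore path below passes `--long=30` as well.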
Decompress on restore:
- Linux/macOS: zstd -d -c | tar -xf -
- Windows: zstd -d to an intermediate .tar, then the existing 7z -snld20
  extraction (preserves symlink handling)

All filename references updated .tar -> .tar.zst. -f added to the two -o
invocations so re-runs overwrite instead of failing.

Also drop -vv from gclient sync; default verbosity is sufficient.

Co-authored-by: Samuel Attard

* ci: keep .tar extension for src cache (zstd content inside)

The sas-sidecar that issues Azure SAS tokens validates filenames against
/^v[0-9]+-[a-z\-]+-[a-f0-9]+\.(tar|tgz)$/ and is not easily redeployed,
so keep the .tar extension and decode zstd on restore.

Windows decompresses to a distinct intermediate (src_cache.tar) so input
and output don't collide.

Co-authored-by: Samuel Attard

* ci: log NTFS 8.3/lastaccess/Defender state before Windows cache extract

Temporary diagnostics to see whether 8.3 short-name generation is the
cause of the ~20 min tar extraction.

Co-authored-by: Samuel Attard

* ci: revert src-cache exclusion additions

The new exclusions (web_tests contents, jetstream, speedometer, catapult
test_data, regres, llvm-build, rust-toolchain) caused siso/RBE cache
misses — even data-only deps are part of action input hashes. Revert to
the original exclusion list and drop the corresponding toolchain-fetch
plumbing.

zstd compression, the Windows tar cleanup, and the -vv removal remain.

Co-authored-by: Samuel Attard

* ci: drop win_toolchain from src cache; remove NTFS diagnostics

The Windows src cache includes 14.6GB of depot_tools/win_toolchain —
7.3GB of MSVC/SDK doubled because tar captures both the vs_files.ciopfs
backing store and the live ciopfs mount at vs_files/. Every Windows
cache consumer already re-fetches this via vs_toolchain.py update
--force (fix-sync for build/publish, inline for gn-check/clang-tidy),
so the cached copy is never used.

Diagnostics removed — CI confirmed 8dot3, last-access, and Defender are
all already off on the AKS Windows nodes.
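Since the keep-the-.tar-extension decision above hinges on that sidecar regex, here is a quick local sanity check; `sidecar_accepts` is a hypothetical stand-in for the sas-sidecar's validation, not code from this repo:

```shell
# Hypothetical stand-in for the sas-sidecar's filename allowlist quoted
# above: /^v[0-9]+-[a-z\-]+-[a-f0-9]+\.(tar|tgz)$/
sidecar_accepts() {
  echo "$1" | grep -Eq '^v[0-9]+-[a-z-]+-[a-f0-9]+\.(tar|tgz)$'
}

# zstd bytes inside, .tar name outside: still passes the allowlist.
sidecar_accepts "v2-src-cache-0a1b2c3d.tar" && echo "accepted"
# An honest .tar.zst name would be rejected, hence the kept extension.
sidecar_accepts "v2-src-cache-0a1b2c3d.tar.zst" || echo "rejected"
```

This is also why the cache key format matters: `v2-src-cache-<hex depshash>` fits the `v[0-9]+-[a-z\-]+-[a-f0-9]+` shape, so bumping the version prefix stays compatible with the sidecar.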
Co-authored-by: Samuel Attard

* ci: unmount ciopfs vs_files before removing win_toolchain

vs_files is a live ciopfs mount during the win-targeted checkout; rm -rf
fails with EBUSY until it's unmounted.

Co-authored-by: Samuel Attard

* ci: skip win_toolchain download during checkout instead of removing after

fusermount isn't on the checkout container, so the ciopfs mount can't be
torn down before rm. Setting DEPOT_TOOLS_WIN_TOOLCHAIN=0 makes the
win_toolchain hook a no-op (vs_toolchain.py:525-527), so there's no
download and no mount. All Windows consumers re-fetch it post-restore
anyway. The rm -rf stays as a safety net.

Co-authored-by: Samuel Attard

* ci: also set ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN=0 for checkout sync

build.yml sets ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN=1 at the job level for
the Windows checkout, which makes e d inject DEPOT_TOOLS_WIN_TOOLCHAIN=1
and override the inline =0. Need both: the ELECTRON_ var stops e d from
overriding, the plain one stops vs_toolchain.py from defaulting to 1.

Co-authored-by: Samuel Attard

* ci: extract Windows src cache with piped tar instead of 7z

7z takes ~20 min to extract the ~1.1M-entry tar regardless of size —
~1ms per entry of header parsing and path handling, single-threaded,
well under the 75k IOPS / 1000 MBps the ephemeral disk can do. Switch
to the same zstd -d | tar -xf - pipe used on Linux/macOS (via Git Bash
tar). No intermediate src_cache.tar, download deleted after extract.

The -snld20 flag was working around 7z's own "dangerous symlink"
refusal; GNU tar extracts symlinks as-is so it shouldn't be needed.

Co-authored-by: Samuel Attard

* ci: keep depot_tools/win_toolchain scripts in src cache

The rm -rf removed get_toolchain_if_necessary.py (a depot_tools source
file), breaking vs_toolchain.py update --force on restore.
DEPOT_TOOLS_WIN_TOOLCHAIN=0 on the sync already prevents the vs_files
download, so the rm was only removing scripts.
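The "need both variables" layering described a few commits up can be modeled with two shell functions; `e_d` and `toolchain_hook` are toy stand-ins for `e d` and vs_toolchain.py (which are really Python tools), not their actual implementations:

```shell
# Toy model of the override chain: the wrapper (like e d) injects
# DEPOT_TOOLS_WIN_TOOLCHAIN from the ELECTRON_ variable when that is set,
# clobbering any inline assignment on the command.
e_d() {
  if [ -n "${ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN:-}" ]; then
    export DEPOT_TOOLS_WIN_TOOLCHAIN="$ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN"
  fi
  "$@"
}

# Stand-in for the hook's behavior: defaults to 1 when the var is unset.
toolchain_hook() {
  echo "win_toolchain=${DEPOT_TOOLS_WIN_TOOLCHAIN:-1}"
}

# Job-level ELECTRON_...=1 clobbers an inline =0 on its own:
ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN=1 DEPOT_TOOLS_WIN_TOOLCHAIN=0 e_d toolchain_hook
# Setting both to 0 keeps the hook disabled end to end:
ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN=0 DEPOT_TOOLS_WIN_TOOLCHAIN=0 e_d toolchain_hook
```

The first call prints `win_toolchain=1`, the second `win_toolchain=0`: the ELECTRON_ variable silences the wrapper's injection and the plain variable silences the hook's default.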
Co-authored-by: Samuel Attard

* ci: split src cache into 4 parallel-extractable shards

Windows tar extraction is ~1ms/entry for ~1.2M entries (~20 min)
regardless of tool, well under the 75k IOPS / 1000 MBps the D16lds_v5
ephemeral disk can do. Tar is a sequential stream so the only way to
parallelize is to split at creation time.

Shards (balanced by entry count, ~220-360k each):
  a: src/third_party/blink
  b: src/third_party/{dawn,electron_node,tflite,devtools-frontend}
  c: src/third_party (rest)
  d: src (excluding third_party)

DEPSHASH is now the raw hash; shard files are
v2-src-cache-shard-{a..d}-${DEPSHASH}.tar (all pass the sas-sidecar
filename regex). sas-token is now a JSON keyed by shard letter. All
restore paths extract the four shards in parallel with per-PID wait so a
failed shard aborts the step.

Co-authored-by: Samuel Attard

* Revert "ci: split src cache into 4 parallel-extractable shards"

This reverts commit 970574998b3be77d8e98d7c0c7f3d346ac8f935c.

Co-authored-by: Samuel Attard

---------

Co-authored-by: trop[bot] <37223003+trop[bot]@users.noreply.github.com>
Co-authored-by: Samuel Attard
---
 .github/actions/checkout/action.yml             |  8 +++---
 .github/actions/restore-cache-aks/action.yml    |  2 +-
 .../actions/restore-cache-azcopy/action.yml     | 25 ++++++++-----------
 .../workflows/pipeline-electron-docs-only.yml   |  2 +-
 .../pipeline-segment-electron-build.yml         |  2 +-
 .../pipeline-segment-electron-clang-tidy.yml    |  2 +-
 .../pipeline-segment-electron-gn-check.yml      |  2 +-
 .../pipeline-segment-electron-publish.yml       |  2 +-
 8 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/.github/actions/checkout/action.yml b/.github/actions/checkout/action.yml
index f14f5d306d..85bedf2eae 100644
--- a/.github/actions/checkout/action.yml
+++ b/.github/actions/checkout/action.yml
@@ -28,7 +28,7 @@ runs:
       shell: bash
       run: |
         node src/electron/script/generate-deps-hash.js
-        DEPSHASH="v1-src-cache-$(cat src/electron/.depshash)"
+        DEPSHASH="v2-src-cache-$(cat src/electron/.depshash)"
         echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
         echo "CACHE_FILE=$DEPSHASH.tar" >> $GITHUB_ENV
         if [ "${{ inputs.target-platform }}" = "win" ]; then
@@ -109,7 +109,7 @@ runs:
           echo "target_os=['$TARGET_OS']" >> ./.gclient
         fi
 
-        ELECTRON_USE_THREE_WAY_MERGE_FOR_PATCHES=1 e d gclient sync --with_branch_heads --with_tags -vv
+        ELECTRON_DEPOT_TOOLS_WIN_TOOLCHAIN=0 DEPOT_TOOLS_WIN_TOOLCHAIN=0 ELECTRON_USE_THREE_WAY_MERGE_FOR_PATCHES=1 e d gclient sync --with_branch_heads --with_tags
         if [[ "${{ inputs.is-release }}" != "true" ]]; then
           # Re-export all the patches to check if there were changes.
           python3 src/electron/script/export_all_patches.py src/electron/patches/config.json
@@ -187,7 +187,9 @@ runs:
       shell: bash
       run: |
         echo "Uncompressed src size: $(du -sh src | cut -f1 -d' ')"
-        tar -cf $CACHE_FILE src
+        # Named .tar but zstd-compressed; the sas-sidecar's filename allowlist
+        # only permits .tar/.tgz so we keep the extension and decode on restore.
+        tar -cf - src | zstd -T0 --long=30 -f -o $CACHE_FILE
         echo "Compressed src to $(du -sh $CACHE_FILE | cut -f1 -d' ')"
         cp ./$CACHE_FILE $CACHE_DRIVE/
     - name: Persist Src Cache
diff --git a/.github/actions/restore-cache-aks/action.yml b/.github/actions/restore-cache-aks/action.yml
index b614b3a076..2522193274 100644
--- a/.github/actions/restore-cache-aks/action.yml
+++ b/.github/actions/restore-cache-aks/action.yml
@@ -31,7 +31,7 @@ runs:
         fi
 
         mkdir temp-cache
-        tar -xf $cache_path -C temp-cache
+        zstd -d --long=30 -c $cache_path | tar -xf - -C temp-cache
         echo "Unzipped cache is $(du -sh temp-cache/src | cut -f1)"
 
         if [ -d "temp-cache/src" ]; then
diff --git a/.github/actions/restore-cache-azcopy/action.yml b/.github/actions/restore-cache-azcopy/action.yml
index ee8fe62905..7099ea8fda 100644
--- a/.github/actions/restore-cache-azcopy/action.yml
+++ b/.github/actions/restore-cache-azcopy/action.yml
@@ -61,9 +61,9 @@ runs:
           echo "Cache is empty - exiting"
           exit 1
         fi
-        
+
         mkdir temp-cache
-        tar -xf $DEPSHASH.tar -C temp-cache
+        zstd -d --long=30 -c $DEPSHASH.tar | tar -xf - -C temp-cache
         echo "Unzipped cache is $(du -sh temp-cache/src | cut -f1)"
 
         if [ -d "temp-cache/src" ]; then
@@ -85,19 +85,17 @@ runs:
 
     - name: Unzip and Ensure Src Cache (Windows)
       if: ${{ inputs.target-platform == 'win' }}
-      shell: powershell
+      shell: bash
       run: |
-        $src_cache = "$env:DEPSHASH.tar"
-        $cache_size = $(Get-Item $src_cache).length
-        Write-Host "Downloaded cache is $cache_size"
-        if ($cache_size -eq 0) {
-          Write-Host "Cache is empty - exiting"
+        echo "Downloaded cache is $(du -sh $DEPSHASH.tar | cut -f1)"
+        if [ `du $DEPSHASH.tar | cut -f1` = "0" ]; then
+          echo "Cache is empty - exiting"
           exit 1
-        }
+        fi
 
-        $TEMP_DIR=New-Item -ItemType Directory -Path temp-cache
-        $TEMP_DIR_PATH = $TEMP_DIR.FullName
-        C:\ProgramData\Chocolatey\bin\7z.exe -y -snld20 x $src_cache -o"$TEMP_DIR_PATH"
+        mkdir temp-cache
+        zstd -d --long=30 -c $DEPSHASH.tar | tar -xf - -C temp-cache
+        rm -f $DEPSHASH.tar
 
     - name: Move Src Cache (Windows)
       if: ${{ inputs.target-platform == 'win' }}
@@ -112,9 +110,6 @@ runs:
           Write-Host "Relocating Cache"
           Remove-Item -Recurse -Force src
           Move-Item temp-cache\src src
-
-          Write-Host "Deleting zip file"
-          Remove-Item -Force $src_cache
         }
         if (-Not (Test-Path "src\third_party\blink")) {
           Write-Host "Cache was not correctly restored - exiting"
diff --git a/.github/workflows/pipeline-electron-docs-only.yml b/.github/workflows/pipeline-electron-docs-only.yml
index b3e22f3168..038822c57d 100644
--- a/.github/workflows/pipeline-electron-docs-only.yml
+++ b/.github/workflows/pipeline-electron-docs-only.yml
@@ -35,7 +35,7 @@ jobs:
       - name: Generate DEPS Hash
         run: |
           node src/electron/script/generate-deps-hash.js
-          DEPSHASH=v1-src-cache-$(cat src/electron/.depshash)
+          DEPSHASH=v2-src-cache-$(cat src/electron/.depshash)
           echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
           echo "CACHE_PATH=$DEPSHASH.tar" >> $GITHUB_ENV
       - name: Restore src cache via AKS
diff --git a/.github/workflows/pipeline-segment-electron-build.yml b/.github/workflows/pipeline-segment-electron-build.yml
index 1a67587ab5..ce507de736 100644
--- a/.github/workflows/pipeline-segment-electron-build.yml
+++ b/.github/workflows/pipeline-segment-electron-build.yml
@@ -156,7 +156,7 @@ jobs:
      - name: Generate DEPS Hash
        run: |
          node src/electron/script/generate-deps-hash.js
-         DEPSHASH=v1-src-cache-$(cat src/electron/.depshash)
+         DEPSHASH=v2-src-cache-$(cat src/electron/.depshash)
          echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
          echo "CACHE_PATH=$DEPSHASH.tar" >> $GITHUB_ENV
      - name: Restore src cache via AZCopy
diff --git a/.github/workflows/pipeline-segment-electron-clang-tidy.yml b/.github/workflows/pipeline-segment-electron-clang-tidy.yml
index c7ddb39403..8924322e40 100644
--- a/.github/workflows/pipeline-segment-electron-clang-tidy.yml
+++ b/.github/workflows/pipeline-segment-electron-clang-tidy.yml
@@ -80,7 +80,7 @@ jobs:
      - name: Generate DEPS Hash
        run: |
          node src/electron/script/generate-deps-hash.js
-         DEPSHASH=v1-src-cache-$(cat src/electron/.depshash)
+         DEPSHASH=v2-src-cache-$(cat src/electron/.depshash)
          echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
          echo "CACHE_PATH=$DEPSHASH.tar" >> $GITHUB_ENV
      - name: Restore src cache via AZCopy
diff --git a/.github/workflows/pipeline-segment-electron-gn-check.yml b/.github/workflows/pipeline-segment-electron-gn-check.yml
index 0c28e2c8c1..2e23a1466c 100644
--- a/.github/workflows/pipeline-segment-electron-gn-check.yml
+++ b/.github/workflows/pipeline-segment-electron-gn-check.yml
@@ -81,7 +81,7 @@ jobs:
      - name: Generate DEPS Hash
        run: |
          node src/electron/script/generate-deps-hash.js
-         DEPSHASH=v1-src-cache-$(cat src/electron/.depshash)
+         DEPSHASH=v2-src-cache-$(cat src/electron/.depshash)
          echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
          echo "CACHE_PATH=$DEPSHASH.tar" >> $GITHUB_ENV
      - name: Restore src cache via AZCopy
diff --git a/.github/workflows/pipeline-segment-electron-publish.yml b/.github/workflows/pipeline-segment-electron-publish.yml
index 5de845ba94..cddeb6028a 100644
--- a/.github/workflows/pipeline-segment-electron-publish.yml
+++ b/.github/workflows/pipeline-segment-electron-publish.yml
@@ -165,7 +165,7 @@ jobs:
      - name: Generate DEPS Hash
        run: |
          node src/electron/script/generate-deps-hash.js
-         DEPSHASH=v1-src-cache-$(cat src/electron/.depshash)
+         DEPSHASH=v2-src-cache-$(cat src/electron/.depshash)
          echo "DEPSHASH=$DEPSHASH" >> $GITHUB_ENV
          echo "CACHE_PATH=$DEPSHASH.tar" >> $GITHUB_ENV
      - name: Restore src cache via AZCopy