```
2025-05-09 11:54:15,171 ERROR Error executing graph 954e6fc8-9c90-46fa-be5b-4063eb519ec7: Block ID AIMusicGeneratorBlock error: 44f6c8ad-d75c-4ae1-8209-aad1c0326928 is already in use
```
### Changes 🏗️
This PR removes the global variables that interfered with how
`load_all_blocks` is cached.
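For illustration, a minimal sketch of the idea, assuming the loader is memoized with `functools.cache`. The two block classes and the registry layout are hypothetical stand-ins, not the project's actual code.

```python
from functools import cache


# Hypothetical stand-ins for block classes discovered at import time.
class PrintBlock:
    id = "print"


class MathBlock:
    id = "math"


@cache
def load_all_blocks() -> dict[str, type]:
    """Build the registry once; the cache owns the result, not a module global."""
    available: dict[str, type] = {}
    for block_cls in (PrintBlock, MathBlock):
        if block_cls.id in available:
            # Same kind of failure as the "already in use" error in the log above.
            raise ValueError(f"Block ID {block_cls.id} is already in use")
        available[block_cls.id] = block_cls
    return available
```

Because the registry is a local variable, a repeated call cannot find leftover entries from an earlier call and trip the duplicate-ID check.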
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] CI
Currently, there is no guarantee that an error will be reported right
away, and late execution is a serious issue that needs to be addressed
quickly.
### Changes 🏗️
Provide a direct alert when a late execution occurs.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Manual run on the late executions job on artificially created late
executions.
This is a follow-up to
https://github.com/Significant-Gravitas/AutoGPT/pull/9903
The continued graph execution restarted all the execution stats from
zero, making the execution stats misleading.
### Changes 🏗️
Continue the execution stats when continuing the graph execution.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Existing tests, manual graph run with the graph execution aborted
midway.
Some code paths in the notification and scheduler services made
synchronous HTTP calls that run long, blocking jobs. This keeps the
service threads busy-waiting.
### Changes 🏗️
* Remove queue_notification API
* Remove DTO
* Move heavy tasks into the executor
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Manually executing notification service jobs through the scheduler
API
The listing page throws an exception on deployment because of a Supabase
auth issue.
### Changes 🏗️
Catch the exception when getting the library agent. This reverts the
behavior of the listing page: it will always show "Add to Library" when
the user is logged in.
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
The goal of this change is a quick and temporary tweak to improve the
display of output text in the Agent Runs screen.
This change is made anticipating that these outputs will be properly
improved in the near future, and is thus just a temporary change in
order to display text in a human readable format.
### Changes 🏗️
There is one change in this PR:
- The class of the Agent Output textbox is changed to properly display
text without impacting the design.
Below is a before and after of this change:
**Before**

**After**

### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
---------
Co-authored-by: Bentlybro <Github@bentlybro.com>
- Resolves #9918
- Follow-up fix for #9914
### Changes 🏗️
- In `get_graph_execution_schedules`, skip jobs when their kwargs can't
be parsed as `GraphExecutionJobArgs`
- Rename methods of `Scheduler` to clarify their scope (scheduled
*graph* executions)
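A rough sketch of the skip-on-parse-failure behavior. `GraphExecutionJobArgs` is shown here as a simplified dataclass with assumed fields, and `parse_graph_jobs` is a hypothetical helper rather than the actual scheduler method.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GraphExecutionJobArgs:
    # Simplified stand-in; the real model's fields are not shown in this PR.
    graph_id: str
    user_id: str


def parse_graph_jobs(jobs: list[dict[str, Any]]) -> list[GraphExecutionJobArgs]:
    """Keep only the jobs whose kwargs describe a scheduled graph execution."""
    parsed: list[GraphExecutionJobArgs] = []
    for kwargs in jobs:
        try:
            parsed.append(GraphExecutionJobArgs(**kwargs))
        except TypeError:
            # Kwargs belong to some other kind of scheduled job; skip it
            # instead of failing the whole listing request.
            continue
    return parsed
```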
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Go to `/library/agents/[id]` (which calls `GET /api/schedules`)
- [x] -> `GET /api/schedules` request returns HTTP 200
If a node has a multi-credentials input (e.g. AI Text Generator block)
but the discriminator value (e.g. model choice) is missing, the input
can't be discriminated into a single-provider input. Discrimination into
a single-provider input is necessary to make a graph-level credentials
input for use in the Library.
### Changes 🏗️
- feat(backend): Require discriminator fields to always have a value
- dx(frontend): Improve typing of discriminator stuff
- dx(frontend): Fix typing in `NodeOneOfDiscriminatorField` component
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Saving & running graphs with and without credentials works
normally
- Note: We don't have any blocks with a discriminator that doesn't have
a default value, so currently I don't think it's possible to produce a
case where this mechanism would be triggered.
The Library Agent credentials UX (#9789) currently doesn't work for
sub-graphs.
### Changes 🏗️
- Include sub-graphs in generating `Graph.credentials_input_schema`
- Propagate `node_credentials_input_map` into `AgentExecutionBlock`
executions
- Fix: also apply `node_credentials_input_map` in `_enqueue_next_nodes`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Import a graph with sub-graphs that need credentials
- Run this agent from the Library
- [x] -> Should work
Introduce a late execution check scheduled job. The late threshold
duration is configurable.
This initial version only reports the error to Sentry.
### Changes 🏗️
* Added late execution check scheduled job
* Move the registration weekly notification processing job out of API
call and calling it directly from the scheduler service.
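As an illustration of the check, here is a minimal sketch that assumes "late" means an execution has been waiting longer than the configured threshold. The function name and the input shape (execution ID mapped to queue time) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_late_executions(
    queued_at: dict[str, datetime],
    threshold: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return IDs of executions that have waited longer than the threshold."""
    now = now or datetime.now(timezone.utc)
    return sorted(eid for eid, ts in queued_at.items() if now - ts > threshold)
```

A scheduled job could call this periodically and report any returned IDs as an error.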
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Manual firing of scheduled job through an exposed API
The scheduler shouldn't scale with the REST API.
### Changes 🏗️
Pulls the scheduler out into its own service.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] test it
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Currently, agent listings on the Marketplace have poor UX.
### Changes 🏗️
- Add function and endpoint to check if user has `LibraryAgent` by given
`storeListingVersionId`
- Redesign listing buttons
- `Add to library` shown when user is logged in and doesn't have an
agent in library
- `See runs` shown when user is logged in and has the agent in the library
- `Download agent` always shown
- Disabled buttons during processing (adding/downloading)
- Stop raising when owner is trying to add own agent. Now it'll simply
redirect to Library.
- Remove button appearing/flickering after a delay on listing page -
logged in status is now checked in server component.
- Show error toast on adding/redirecting to library and downloading
error
- Update breadcrumbs and page title to say `Marketplace` instead of
`Store`
- `font-geist` -> `font-sans` (`font-geist` var doesn't exist)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Button on a listing is `Add to library` (no library agent)
- [x] Agent can be added and user is redirected
- [x] Button on the listing is `See runs` and clicking it redirects to
the library agent
- [x] Remove agent from library
- [x] Button shows `Add to library` again
- [x] Agent can be re-added
- [x] Agent can be downloaded
- [x] `Add to library` Button is hidden when user is logged out
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
The process initializer on the process pool should never fail, but we do
network-related work there.
This causes the pool to end up in a broken state.
### Changes 🏗️
Remove the health check step on process initializer.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Existing CI test
Our OAuth review wants us to drop this in favor of a different scope that
will require additional work.
### Changes 🏗️
Disables the oauth sheets scopes in prod
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] set env locally
A collection of updates regarding onboarding and wallet.
### Changes 🏗️
- `try-except` instead of `if` when rewarding (skip unnecessary db call)
- Make external services question onboarding step optional
- Add `SmartImage` component to lazy load images with pulse animation
and use it throughout onboarding
- Use store agent name instead of graph name (run page)
- Fix some images breaking layout on the agent card (run page)
- Center agent card vertically and horizontally (center on the left half
of page) (run page)
- Delay and tweak confetti when opening wallet and when task finished
(wallet)
- Flash wallet when credits change value
- Make tutorial video grayscale on completed steps (wallet)
- Fix confetti triggering on page refresh (wallet)
- Redirect to agent run page instead of Library after onboarding
- Expand task groups by default (wallet) - this means tutorial videos
are visible by default
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Services step is optional and skipping it doesn't break onboarding
- [x] `SmartImage` works properly
- [x] Agent card is aligned properly, including on page scroll
- [x] Wallet flashes when the credits value changes
- [x] User is redirected to the agent runs page after onboarding
Currently, the agent/graph execution engine consumes the execution
queue and acknowledges a message only after fully completing the
execution or failing it.
However, if the agent executor fails due to a hardware/resource issue,
or does not manage to acknowledge the execution message, another agent
executor will pick it up and start the execution again from the
beginning.
The scope of this PR is to make the next executor continue the
pre-existing execution instead of starting it all over from the
beginning.
### Changes 🏗️
* Removed `start_node_execs` from `GraphExecutionEntry`
* Populate the starting graph node from the DB query instead (fetching
Running & Queued node executions).
* Removed `get_incomplete_node_executions` from DB manager.
* Use `get_node_executions` with a status filter instead.
* Allow graph execution to end in non-FAILED/COMPLETED status, e.g, when
the executor is interrupted, it should be stuck in the running status,
and let other executors continue the task.
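The resume logic can be sketched roughly as follows. The data shapes are simplified assumptions, and `get_node_executions` here is an in-memory stand-in for the DB query, not its real signature.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionStatus(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class NodeExecution:
    node_exec_id: str
    status: ExecutionStatus


def get_node_executions(
    all_execs: list[NodeExecution], statuses: set[ExecutionStatus]
) -> list[NodeExecution]:
    """Stand-in for the DB query with a status filter."""
    return [e for e in all_execs if e.status in statuses]


def starting_node_executions(all_execs: list[NodeExecution]) -> list[NodeExecution]:
    """Resume from whatever is still queued or was interrupted while running."""
    return get_node_executions(
        all_execs, {ExecutionStatus.QUEUED, ExecutionStatus.RUNNING}
    )
```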
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run an agent, stop the executor midway, re-run the executor; the
execution should be continued instead of restarted.
### Changes 🏗️
* Avoid executing any agent with a zero balance.
* Make node execution count global across agents for a single user.
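A hedged sketch of both rules. It assumes `execution_cost_count_threshold` is the number of node executions per charge and `execution_cost_per_threshold` is the cost of each charge; the PR does not spell out these semantics, and `UsageTracker` is a hypothetical helper.

```python
from collections import defaultdict


class InsufficientBalanceError(Exception):
    pass


class UsageTracker:
    def __init__(self, count_threshold: int, cost_per_threshold: int) -> None:
        self.count_threshold = count_threshold  # assumed: executions per charge
        self.cost_per_threshold = cost_per_threshold  # assumed: cost of one charge
        # Keyed by user only, so the count is shared across all of a user's agents.
        self._counts: dict[str, int] = defaultdict(int)

    def charge(self, user_id: str, balance: int) -> int:
        """Return the cost to charge for one more node execution."""
        if balance <= 0:
            raise InsufficientBalanceError(f"User {user_id} has no balance left")
        self._counts[user_id] += 1
        if self._counts[user_id] % self.count_threshold == 0:
            return self.cost_per_threshold
        return 0
```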
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run agents by tweaking the `execution_cost_count_threshold` &
`execution_cost_per_threshold` values.
Using sync code in an async route often blocks the event loop, which
impacts stability.
The current RPC system only provides a synchronous client to call the
service endpoints.
The scope of this PR is to fully decouple the client and server
signatures, allowing the client to mix and match async and sync calls
in client code without changing the async/sync nature of the server.
### Changes 🏗️
* Add support for flexible async/sync RPC client.
* Migrate scheduler client to all-async client.
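To illustrate the decoupling, here is a minimal in-process sketch. The real client goes over RPC, and the class and method names below are hypothetical.

```python
import asyncio
from typing import Any, Callable


class SchedulerService:
    """Server side: always async, regardless of how it is called."""

    async def get_schedules(self, user_id: str) -> list[str]:
        await asyncio.sleep(0)  # stands in for real async I/O
        return [f"schedule-for-{user_id}"]


class AsyncClient:
    """Exposes the service methods as coroutines."""

    def __init__(self, service: SchedulerService) -> None:
        self._service = service

    def __getattr__(self, name: str) -> Callable[..., Any]:
        return getattr(self._service, name)


class SyncClient:
    """Exposes the same methods as blocking calls."""

    def __init__(self, service: SchedulerService) -> None:
        self._service = service

    def __getattr__(self, name: str) -> Callable[..., Any]:
        method = getattr(self._service, name)

        def call(*args: Any, **kwargs: Any) -> Any:
            # Only safe from code that is not already inside an event loop.
            return asyncio.run(method(*args, **kwargs))

        return call
```

An async route would use `await AsyncClient(service).get_schedules(...)`, while sync code can keep calling `SyncClient(service).get_schedules(...)`; the service definition is the same in both cases.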
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Scheduler route test.
- [x] Modified service_test.py
- [x] Run normal agent executions
Fixes the admin add-dollars flow. In `add-money-button.tsx`, the
`handleApproveSubmit` action used `formatCredits` for the value, which
is wrong. This fix converts the dollar amount to cents instead:
```diff
<form action={handleApproveSubmit}>
<input type="hidden" name="id" value={userId} />
<input
type="hidden"
name="amount"
- value={formatCredits(Number(dollarAmount))}
+ value={Math.round(parseFloat(dollarAmount) * 100)}
/>
```
I was able to add $1, $0.10 and $0.01.
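For reference, the same conversion sketched in Python. The function name is hypothetical; the point is to multiply first and round afterwards, since float multiplication can land slightly off a whole number.

```python
def dollars_to_cents(dollar_amount: str) -> int:
    """Convert a dollar string such as "0.10" to an integer number of cents."""
    # Round after multiplying so float noise cannot drop or add a cent.
    return round(float(dollar_amount) * 100)
```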

```
FAILED test/model_test.py::test_agent_preset_from_db - pydantic_core._pydantic_core.ValidationError: 1 validation error for AgentNodeExecutionInputOutput
E pydantic_core._pydantic_core.ValidationError: 1 validation error for AgentNodeExecutionInputOutput
E data
E JSON input should be string, bytes or bytearray [type=json_type, input_value=Json, input_type=Json]
E For further information visit https://errors.pydantic.dev/2.11/v/json_type
```
### Changes 🏗️
Manually creating a Prisma model often breaks, and we have such an
instance in the test.
This PR fixes the test to make the new Pydantic happy.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] CI
We need a way to refund people who spend money on agents without making
manual db actions.
### Changes 🏗️
- Adds a bunch of functionality for refunding users
- Adds reasons and admin id for actions
- Add admin to db manager
- Add UI for this for the admin panel
- Clean up pagination controls
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Test by importing dev db as baseline
- [x] Add transactions on top for "refund", and make sure all existing
transactions work
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
When an executor dies, an ongoing execution will not be retried and will
just be stuck in the running status.
This change avoids such a scenario by allowing an execution of an entry
that is not in QUEUED status with the low-probability risk of double
execution.
### Changes 🏗️
* Allow non-QUEUED status to be re-executed.
* Improve cleanup of node & graph executor.
* Consume cancellation requests in a separate thread to avoid
being blocked by other messages.
* Remove unused retry loop on the execution manager.
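The separate cancellation consumer can be pictured with this small sketch. It uses an in-memory queue in place of the real message broker, and `CancellationListener` is a hypothetical name.

```python
import queue
import threading
from typing import Optional


class CancellationListener:
    """Consumes cancel requests on its own thread, apart from run messages."""

    def __init__(self) -> None:
        self._requests: "queue.Queue[Optional[str]]" = queue.Queue()
        self._cancelled: set[str] = set()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._consume, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def request_cancel(self, exec_id: str) -> None:
        self._requests.put(exec_id)

    def stop(self) -> None:
        self._requests.put(None)  # sentinel: drain pending requests, then exit
        self._thread.join()

    def is_cancelled(self, exec_id: str) -> bool:
        with self._lock:
            return exec_id in self._cancelled

    def _consume(self) -> None:
        while True:
            exec_id = self._requests.get()
            if exec_id is None:
                return
            with self._lock:
                self._cancelled.add(exec_id)
```

Since cancel requests have their own consumer, a long-running execution message on another thread cannot delay them.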
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run agent, kill the server, re-run it, agent restarted.
This PR fixes [Issue
#9883](https://github.com/Significant-Gravitas/AutoGPT/issues/9883),
where the SendWebRequestBlock crashes when receiving a 204 No Content
response, such as when posting to a Discord webhook. The fix ensures
that empty responses are handled gracefully, and the block does not
crash.
### Changes 🏗️
- Added a check to handle empty HTTP responses (like 204 status) in
SendWebRequestBlock
- Fallback to empty string or None if there is no response content
- Prevents server errors when parsing non-existent response bodies
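A minimal sketch of the empty-response handling described above. `parse_response_body` is a hypothetical helper, not the block's actual code, and it assumes the fallback for an empty body is `None`.

```python
import json
from typing import Any


def parse_response_body(status_code: int, body: bytes) -> Any:
    """Return parsed JSON, raw text, or None when there is no content."""
    if status_code == 204 or not body.strip():
        # Nothing to parse, e.g. a Discord webhook replying 204 No Content.
        return None
    try:
        return json.loads(body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8); fall back to plain text.
        return body.decode("utf-8", errors="replace")
```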
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Send a POST request to an endpoint that returns 204 No Content
- [x] Confirm that SendWebRequestBlock handles it without crashing
- [x] Confirm that regular 200 OK JSON responses still work
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Lohith-11 <lohithr011@gamil.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Add Note to "Getting Started" page for Raspberry Pi 5 page size issue
with `supabase-vector` that prevents `docker compose up` from running
successfully.
### Changes 🏗️
- Added a Note to the "Getting Started" page that explains a change in
Raspberry Pi OS for Raspberry Pi 5s, and how to revert the change to
avoid an issue running the backend on Docker.
### Checklist 📋
#### For code changes:
- [x] No code changes
#### For configuration changes:
- [x] No configuration changes
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
We used the wrong attribute for the short description.
### Changes 🏗️
Uses the sub-heading instead.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] check the expected text shows
For admins to approve agents for the marketplace, we need to be able to
run them. This is a quick workaround for downloading them so you can put
them in your marketplace to check.
### Changes 🏗️
- clones various endpoints related to downloading into an admin side
with logging, and admin checks
- adds download button and removes open in builder action
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Test downloading agents from local marketplace
- Fixes #9882
We’re currently using optional multi select, and it’s working great.
We’re able to correctly determine the data type for it. However, there’s
a small issue. We’re not using the correct subSchema that is inside
anyOf on the multi select input. This is why we’re getting the problem
on the Twitter block. It’s the only one that’s using this type of input,
so it’s the only one that’s affected.

---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Changes 🏗️
Bring back PrintConsoleBlock
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Print console block
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
The transaction with zero payment amount will not generate a payment ID,
so the checkout failed for this scenario.
### Changes 🏗️
Don't use payment id as transaction key on top-up with zero payment
amount.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Top-up with stripe coupon
Some node executions that failed end up stuck
in the RUNNING status because the execution failed to release its lock:
```
2025-04-24 20:53:31,573 INFO [ExecutionManager|uid:25eba2d1-e9c1-44bc-88c7-43e0f4fbad5a|gid:01f8c315-c163-4dd1-a8a0-d396477c5a9f|nid:f8bf84ae-b1f0-4434-8f04-80f43852bc30]|geid:2e1b35c6-0d2f-4e97-adea-f6fe0d9965d0|neid:590b29ea-63ee-4e24-a429-de5a3e191e72|-] Failed node execution 590b29ea-63ee-4e24-a429-de5a3e191e72: Cannot release a lock that's no longer owned
```
### Changes 🏗️
Check the ownership of the lock before releasing.
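The ownership check can be sketched with an in-memory stand-in for the distributed lock. `TokenLock` and `safe_release` are hypothetical names, and `expire()` simulates the lock timing out while the node is still running.

```python
import threading
import uuid
from typing import Optional


class TokenLock:
    """In-memory stand-in for a distributed lock with an owner token."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._owner: Optional[str] = None

    def acquire(self) -> Optional[str]:
        with self._guard:
            if self._owner is not None:
                return None
            self._owner = uuid.uuid4().hex
            return self._owner

    def expire(self) -> None:
        # Simulates the lock timing out before the holder releases it.
        with self._guard:
            self._owner = None

    def owned(self, token: str) -> bool:
        with self._guard:
            return self._owner == token

    def release(self, token: str) -> None:
        with self._guard:
            if self._owner != token:
                raise RuntimeError("Cannot release a lock that's no longer owned")
            self._owner = None


def safe_release(lock: TokenLock, token: str) -> bool:
    """Release only if we still own the lock; never raise on a lost lock."""
    if lock.owned(token):
        lock.release(token)
        return True
    return False
```

The check and the release are two separate steps here, so a real implementation may still want to catch the release error for the narrow window between them.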
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Existing CI tests.
(cherry picked from commit ef022720d5)
👋 Hi there! This PR was automatically generated by Autofix 🤖
This fix was triggered by Toran Bruce Richards.
Fixes
[AUTOGPT-SERVER-1ZY](https://sentry.io/organizations/significant-gravitas/issues/6386687527/).
The issue was that: `llm_call` calculates `max_tokens` without
considering `input_tokens`, causing OpenRouter API errors when the
context window is exceeded.
- Implements a function `estimate_token_count` to estimate the number of
tokens in a list of messages.
- Calculates available tokens based on the context window, estimated
input tokens, and user-defined max tokens.
- Adjusts `max_tokens` for LLM calls to prevent exceeding context window
limits.
- Reduces `max_tokens` by 15% and retries if a token limit error is
encountered during LLM calls.
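The steps above might look roughly like this. The 4-characters-per-token heuristic, `TokenLimitError`, and the `llm_call` callable signature are assumptions for illustration, not the code generated by the fix.

```python
from typing import Callable


class TokenLimitError(Exception):
    pass


def estimate_token_count(messages: list[dict[str, str]]) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return sum(len(m.get("content", "")) for m in messages) // 4


def available_max_tokens(
    context_window: int, input_tokens: int, user_max_tokens: int
) -> int:
    """Cap the output budget by what is left of the context window."""
    return max(0, min(user_max_tokens, context_window - input_tokens))


def call_with_retry(
    llm_call: Callable[[list[dict[str, str]], int], str],
    messages: list[dict[str, str]],
    context_window: int,
    user_max_tokens: int,
    retries: int = 3,
) -> str:
    max_tokens = available_max_tokens(
        context_window, estimate_token_count(messages), user_max_tokens
    )
    for _ in range(retries):
        try:
            return llm_call(messages, max_tokens)
        except TokenLimitError:
            # Shrink the output budget by 15% and try again.
            max_tokens = max_tokens * 85 // 100
    raise TokenLimitError("Still over the token limit after retries")
```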
If you have any questions or feedback for the Sentry team about this
fix, please email [autofix@sentry.io](mailto:autofix@sentry.io) with the
Run ID: 32838.
---------
Co-authored-by: sentry-autofix[bot] <157164994+sentry-autofix[bot]@users.noreply.github.com>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
### Changes 🏗️
Provide a system toggle for disabling the billing page:
NEXT_PUBLIC_SHOW_BILLING_PAGE
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Toggle `NEXT_PUBLIC_SHOW_BILLING_PAGE` value.
The Smart Decision Block could not work with sub-agents that have
custom-named inputs, and the beads were not properly propagated in the
execution UI. This PR fixes both issues.
### Changes 🏗️
* Introduce an easy to parse format of tool edge:
`{tool}_^_{func}_~_{arg}`. Graph using SmartDecisionBlock needs to be
re-saved before execution to work.
* Reduce clutter in the smart decision block logic.
* Fix beads not being shown for a smart decision block tool calling.
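To show why the new format is easy to parse, here is a small sketch. The helper names are hypothetical; only the `{tool}_^_{func}_~_{arg}` layout comes from this PR.

```python
TOOL_SEP = "_^_"
ARG_SEP = "_~_"


def build_tool_edge(tool: str, func: str, arg: str) -> str:
    return f"{tool}{TOOL_SEP}{func}{ARG_SEP}{arg}"


def parse_tool_edge(name: str) -> tuple[str, str, str]:
    """Split an edge name into (tool, func, arg); raises ValueError if malformed."""
    tool, rest = name.split(TOOL_SEP, 1)
    func, arg = rest.split(ARG_SEP, 1)
    return tool, func, arg
```

Because the separators are unlikely to occur in ordinary names, function and argument names containing underscores or spaces survive a round trip.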
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Execute an SDM with some special character input as a tool
<img width="672" alt="image"
src="https://github.com/user-attachments/assets/873556b3-c16a-4dd1-ad84-bc86c636c406"
/>
Update "Edit a copy" modal text when copying marketplace agent in
Library. Update agent action buttons to reflect the design accurately.
### Changes 🏗️
- Update modal text
- Disable copying owned agents (only marketplace allowed)
- `Open in Builder` -> `Customize agent`
- Disabled `Customize agent` instead of hiding
- Change `Delete agent` to non-destructive design
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
Strip secrets, credentials when forking agent
### Changes 🏗️
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
Currently, we have no visibility into the state of the execution manager.
This PR improves its observability by exposing
Prometheus metrics.
### Changes 🏗️
Re-use the execution manager port to expose the Prometheus metrics.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Hit /metrics on 8002 port
### Changes 🏗️
Set process starting mode to forkserver instead of spawn, if possible,
for performance benefits.
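A minimal sketch of the fallback using the standard `multiprocessing` API. The helper names are hypothetical.

```python
import multiprocessing
from multiprocessing.context import BaseContext


def preferred_start_method() -> str:
    """Use forkserver where the platform offers it, otherwise spawn."""
    methods = multiprocessing.get_all_start_methods()
    return "forkserver" if "forkserver" in methods else "spawn"


def make_context() -> BaseContext:
    # A context keeps the choice local instead of changing the global default.
    return multiprocessing.get_context(preferred_start_method())
```

The resulting context can be passed to `concurrent.futures.ProcessPoolExecutor` through its `mp_context` argument.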
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Existing tests
Executor process initialization can fail and cause this error:
```
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
```
### Changes 🏗️
Add a retry to reduce the chance of the initialization error occurring.
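One way such a retry could look, as a sketch only. `retry_on_broken_pool` and its parameters are hypothetical; the actual retry mechanism used in the PR is not shown here.

```python
import time
from concurrent.futures.process import BrokenProcessPool
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_broken_pool(
    fn: Callable[[], T], attempts: int = 3, delay: float = 0.0
) -> T:
    """Call fn, retrying when the process pool reports itself as broken."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except BrokenProcessPool:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```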
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Existing tests