Abstract persistent files through Apache OpenDAL #174

Closed
opened 2025-07-08 08:41:50 -04:00 by AtHeartEngineer · 0 comments

Originally created by @txase on 2/21/2025

This PR is the first step toward incorporating the changes from the AWS Serverless POC in #5591. It contains two commits:

  1. Refactor the codebase to access persistent files through Apache OpenDAL (https://opendal.apache.org/)
  2. Add support for an optional AWS S3 OpenDAL backend behind the new s3 feature flag

I decided to put these two commits into one PR to make it easier for me to get the new abstractions right, and to demonstrate what it takes to add an additional OpenDAL backend. I'm happy to split these into separate sequential PRs if that's preferred.

The first commit looks large at first glance, but many of its changes simply make file accesses asynchronous and fallible by returning Result types. I think the key changes worth reviewing are:

  • src/auth.rs: Slightly refactored initialize_keys() so that the private key file can be accessed through OpenDAL
  • src/config.rs:
    • CONFIG is still a synchronous Lazy type, but its value now has to be calculated by async methods. I shoehorned in a tiny tokio async thread that runs to completion to calculate the value. The alternative, patching every use of CONFIG to be async for no runtime benefit, didn't seem worth doing.
    • See the new opendal_operator_for_path() and the abstracted CONFIG.opendal_operator_for_path_type() methods for the core of how operators are managed for various paths.
  • src/util.rs: Adds a new function save_temp_file() that abstracts saving the TempFiles (https://api.rocket.rs/v0.5/rocket/fs/enum.TempFile) that Rocket creates when files are uploaded
  • Wherever persistent data files are used, we now create an OpenDAL Operator (https://docs.rs/opendal/latest/opendal/struct.Operator.html) to access them. This includes config.json, sends, attachments, and icons.
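The CONFIG approach described above (a synchronous lazy global whose value is produced by async code run to completion on its own thread) can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the PR's code: `Config`, `load_config()`, `ThreadWaker` and `block_on()` are hypothetical stand-ins, the hand-rolled `block_on()` replaces the tokio runtime the PR uses, and `std::sync::LazyLock` (Rust 1.80+) replaces `Lazy`. With tokio, a dedicated thread also matters because calling `Runtime::block_on` from inside an existing runtime panics.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, LazyLock};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for the real configuration struct.
#[derive(Debug)]
pub struct Config {
    pub data_folder: String,
}

/// Hypothetical async loader; in the PR, config.json is read through an OpenDAL Operator.
async fn load_config() -> Result<Config, String> {
    Ok(Config { data_folder: "data".to_string() })
}

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor that polls one future to completion on the current thread.
/// Stands in for a tokio current-thread runtime's `block_on`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks us; a spurious wakeup just re-polls.
            Poll::Pending => thread::park(),
        }
    }
}

/// Still a synchronous global: on first access, the async loader is driven
/// to completion on a dedicated thread and the result is cached.
pub static CONFIG: LazyLock<Config> = LazyLock::new(|| {
    thread::spawn(|| block_on(load_config()))
        .join()
        .expect("config loader thread panicked")
        .expect("failed to load config")
});
```

Callers keep reading fields such as `CONFIG.data_folder` synchronously; only the first access pays for spawning the loader thread.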

The second commit is much smaller and more straightforward. The only thing worth pointing out is that OpenDAL uses reqsign (https://docs.rs/reqsign/latest/reqsign/) under the covers to configure AWS credentials. However, AWS SDK configs have repeatedly been extended with new ways of generating credentials. For example, I use AWS Identity Center (aka AWS SSO) to generate temporary access tokens in my dev environment. reqsign doesn't support AWS SSO configs, but it has an escape hatch for loading credentials, which I used: inside it, I call the official AWS SDK config and credential crates to generate credentials. The one annoying part of the escape hatch is that reqsign's AwsCredentialLoad (https://docs.rs/reqsign/latest/reqsign/trait.AwsCredentialLoad.html) trait uses anyhow::Result, so we have to pull in anyhow just for this escape hatch :(.
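The escape-hatch idea (plugging custom credential-loading code into a signer that only knows some credential sources) can be illustrated with a dependency-free sketch. It does not use reqsign's real API: `Credential`, `LoadResult`, `CredentialLoad`, `StaticKeysLoader`, `FnLoader` and `load_first()` are hypothetical names, the trait is kept synchronous for brevity, and `Box<dyn Error>` stands in for `anyhow::Error`. The assumed convention is that a loader returns `Ok(None)` when it has nothing to offer, so a custom loader can sit alongside built-in ones.

```rust
use std::error::Error;

/// Hypothetical credential type; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Credential {
    pub access_key_id: String,
    pub secret_access_key: String,
    pub session_token: Option<String>,
}

/// `Box<dyn Error>` plays the role that `anyhow::Error` plays in reqsign.
pub type LoadResult = Result<Option<Credential>, Box<dyn Error>>;

/// Hypothetical loader trait (synchronous here, for brevity).
pub trait CredentialLoad {
    /// `Ok(None)` means "no credentials from this source", not a failure.
    fn load_credential(&self) -> LoadResult;
}

/// Stand-in for a built-in source that may not understand a given config.
pub struct StaticKeysLoader(pub Option<Credential>);

impl CredentialLoad for StaticKeysLoader {
    fn load_credential(&self) -> LoadResult {
        Ok(self.0.clone())
    }
}

/// Escape hatch: wraps arbitrary code as a loader. In the PR, this is where the
/// official AWS SDK crates are asked for (e.g. SSO) credentials.
pub struct FnLoader<F>(pub F);

impl<F: Fn() -> LoadResult> CredentialLoad for FnLoader<F> {
    fn load_credential(&self) -> LoadResult {
        (self.0)()
    }
}

/// Consult loaders in order: the first `Some` wins, and any error aborts the search.
pub fn load_first(loaders: &[&dyn CredentialLoad]) -> LoadResult {
    for loader in loaders {
        if let Some(cred) = loader.load_credential()? {
            return Ok(Some(cred));
        }
    }
    Ok(None)
}
```

Because the closure's error type is whatever the trait dictates, wrapping SDK code this way forces its errors into that type, which mirrors why the PR has to pull in anyhow.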

Trying it out

These changes should be a behavioral no-op for existing use cases. The one minor change is that the attachments, icon_cache, and sends folders are no longer created at startup, because the OpenDAL FS service creates each of them when the first Operator for it is instantiated.
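The lazy folder creation can be pictured with a std-only sketch. `LocalStore` and its methods are hypothetical stand-ins for an Operator backed by the OpenDAL FS service; the only behavior modeled is the one stated above, that the root folder is created when the first Operator for it is instantiated rather than at startup.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an Operator rooted at a local folder.
pub struct LocalStore {
    root: PathBuf,
}

impl LocalStore {
    /// Creates the root folder (and any missing parents) on construction,
    /// so it does not need to exist at startup. Safe to call repeatedly.
    pub fn new(root: impl AsRef<Path>) -> io::Result<Self> {
        let root = root.as_ref().to_path_buf();
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    /// Writes `bytes` to `name`, relative to the root folder.
    pub fn write(&self, name: &str, bytes: &[u8]) -> io::Result<()> {
        fs::write(self.root.join(name), bytes)
    }

    /// Reads `name`, relative to the root folder.
    pub fn read(&self, name: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(name))
    }
}
```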

To try out the new S3 changes:

  1. Build with the s3 feature turned on
  2. Configure an AWS profile (this implementation honors all standard env vars like AWS_PROFILE and AWS_REGION along with standard AWS configs like ~/.aws/config)
  3. Set the following config values:
    • DATA_FOLDER -> s3://<bucket in the matching AWS region>[/<optional path prefix>]
    • ALLOWED_CONNECT_SRC -> https://<bucket>.s3.<region>.amazonaws.com (required if using web-vault)
    • TMP_FOLDER -> data/tmp (or your preference, but must be set to a local path)
    • TEMPLATES_FOLDER -> data/templates (or your preference, but must be set to a local path)
    • DATABASE_URL -> data/db.sqlite3 (or your preference, but must be set to a valid value)
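How a DATA_FOLDER value like the one above might be routed to a storage backend can be sketched as follows. The `Backend` enum and `backend_for_path()` are hypothetical and only illustrate the idea behind opendal_operator_for_path(); they are not its implementation. The sketch treats `s3://<bucket>[/<prefix>]` as S3 and any other value as a local filesystem path.

```rust
/// Hypothetical description of which storage backend a configured path maps to.
#[derive(Debug, PartialEq)]
pub enum Backend {
    /// Local filesystem rooted at the given path.
    Fs { root: String },
    /// S3 bucket with an optional key prefix (empty string if none).
    S3 { bucket: String, prefix: String },
}

/// Classify a configured folder such as DATA_FOLDER.
pub fn backend_for_path(path: &str) -> Result<Backend, String> {
    match path.strip_prefix("s3://") {
        Some(rest) => {
            // Split "bucket/optional/prefix" at the first '/'.
            let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
            if bucket.is_empty() {
                return Err(format!("missing bucket name in {path:?}"));
            }
            Ok(Backend::S3 {
                bucket: bucket.to_string(),
                // Drop trailing slashes so "s3://b/p/" and "s3://b/p" match.
                prefix: prefix.trim_end_matches('/').to_string(),
            })
        }
        None => Ok(Backend::Fs { root: path.to_string() }),
    }
}
```

In the PR the S3 backend only exists when built with the s3 feature, so a real implementation would presumably gate the S3 branch behind that feature flag.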
Reference: github/vaultwarden#174