Unverified Commit 94540748 authored by ysaito1001's avatar ysaito1001 Committed by GitHub
Browse files

Fix panic when failing to create `Duration` for exponential backoff from a large float (#3621)

## Motivation and Context
Avoids panic when [Duration for exponential
backoff](https://github.com/smithy-lang/smithy-rs/blob/main/rust-runtime/aws-smithy-runtime/src/client/retries/strategy/standard.rs#L150)
could not be created from a large float.

## Description

[Duration::from_secs_f64](https://doc.rust-lang.org/core/time/struct.Duration.html#method.from_secs_f64)
may panic. This PR switches to use a fallible sibling
`Duration::try_from_secs_f64` to avoid panics. If
`Duration::try_from_secs_f64` returns an `Err` we fallback to
`max_backoff` for subsequent retries. Furthermore, we learned from
internal discussion that jitter also needs to be applied to
`max_backoff`. This PR updates `calculate_exponential_backoff` to handle
all the said business logic in one place.

## Testing
- Added a unit test
`should_not_panic_when_exponential_backoff_duration_could_not_be_created`
- Manually verified reproduction steps provided [in the original
PR](https://github.com/awslabs/aws-sdk-rust/issues/1133)
<details> 
<summary>More details</summary>

```
#[tokio::test]
async fn repro_1133() {
    let config = aws_config::from_env()
        .region(Region::new("non-existing-region")) // forces retries
        .retry_config(
            RetryConfig::standard()
                .with_initial_backoff(Duration::from_millis(1))
                .with_max_attempts(100),
        )
        .timeout_config(
            TimeoutConfig::builder()
                .operation_attempt_timeout(Duration::from_secs(180))
                .operation_timeout(Duration::from_secs(600))
                .build(),
        )
        .load()
        .await;

    let client: Client = Client::new(&config);
    let res = client
        .list_objects_v2()
        .bucket("bucket-name-does-not-matter")
        .send()
        .await;

    dbg!(res);
}
```

Without changes in this PR:
```
---- repro_1133 stdout ----
thread 'repro_1133' panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/time.rs:741:23:
can not convert float seconds to Duration: value is either too big or NaN
stack backtrace:
...

failures:
    repro_1133

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 9 filtered out; finished in 338.18s
```

With changes in this PR:
```
// no panic
---- repro_1133 stdout ----
res = Err(
    TimeoutError(
        TimeoutError {
            source: MaybeTimeoutError {
                kind: Operation,
                duration: 600s,
            },
        },
    ),
)
```
</details> 

## Checklist
<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [x] I have updated `CHANGELOG.next.toml` if I made changes to the
smithy-rs codegen or runtime crates
- [x] I have updated `CHANGELOG.next.toml` if I made changes to the AWS
SDK, generated SDK code, or SDK runtime crates

## Appendix
<details> 
<summary>runtime-versioner bug fix</summary>

This PR also fixes a limitation in `runtime-versioner audit`. I included
the fix in the PR because the issue occurred with special conditions,
and we don't get to reproduce it every day. The way the issue manifests
is this.

1. We have a branch from the main whose latest release tag at the time
was `release-2024-04-30`
2. The main has moved ahead and a new `smithy-rs` release has been made
`release-2024-05-08`
3. We perform `git merge main`, pre-commit hooks run, and we then get
`audit` failures from `runtime-versioner`:
```
2024-05-10T16:54:36.434968Z  WARN runtime_versioner::command::audit: there are newer releases since 'release-2024-04-30'
aws-config was changed and version bumped, but the new version number (1.4.0) has already been published to crates.io. Choose a new version number.
aws-runtime was changed and version bumped, but the new version number (1.2.2) has already been published to crates.io. Choose a new version number.
aws-smithy-runtime-api was changed and version bumped, but the new version number (1.6.0) has already been published to crates.io. Choose a new version number.
aws-smithy-types was changed and version bumped, but the new version number (1.1.9) has already been published to crates.io. Choose a new version number.
aws-types was changed and version bumped, but the new version number (1.2.1) has already been published to crates.io. Choose a new version number.
Error: there are audit failures in the runtime crates
```
This happens because when the latest `main` is being merged to our
branch, `runtime-versioner audit` should use `previous_release_tag`
`release-2024-05-08` to perform audit but [pre-commit hooks run the
tool](https://github.com/smithy-lang/smithy-rs/blob/main/.pre-commit-hooks/runtime-versioner.sh#L8)
using the latest previous release tag that can be traced back from
`HEAD` of our branch, which is `release-2024-04-30`. Hence the error.

The fix adds an environment variable
`SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG` to specify a
previous release tag override, in addition to a `--previous-release-tag`
command-line argument.
Furthermore, the fix has relaxed certain checks in `audit`. Taking our
example for instance, when `HEAD` is now behind `release-2024-05-08`,
it's OK to fail even if `release-2024-05-08` is not the ancestor of
`HEAD` (as stated above, `git merge-base --is-ancestor` does not know
that while main is being merged) as long as `release-2024-04-28` (the
latest release seen from `HEAD`) is the ancestor of
`release-2024-05-08`.

```
   release-2024-04-28               release-2024-05-08           
            │                                │                   
────────────┼───────────────┬────────────────┼───────x─────► main
            │               │HEAD            │       x           
                            │ of                     x           
                            │branch                  x           
                            │                        x           
                            │                        x           
                            │              xxxxxxxxxxx           
                            │              x                     
                            │              x  git merge main     
                            │              x                     
                            │              ▼                     
                            │                                    
                            └──────────────────► feature branch  
```
To use the fix, set the environment variable with the new release tag
and perform `git merge main`:
```
➜  smithy-rs git:(ysaito/fix-panic-in-exp-backoff) ✗ export SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG=release-2024-05-08
➜  smithy-rs git:(ysaito/fix-panic-in-exp-backoff) ✗ git merge main
     ...
     Running `/Users/awsaito/src/smithy-rs/tools/target/debug/runtime-versioner audit`
2024-05-10T19:32:26.665578Z  WARN runtime_versioner::tag: expected previous release to be 'release-2024-04-30', but 'release-2024-05-08' was specified. Proceeding with 'release-2024-05-08'.
SUCCESS
```

</summary>

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
parent 14ccc50f
Loading
Loading
Loading
Loading
+13 −1
Original line number Original line Diff line number Diff line
@@ -10,3 +10,15 @@
# references = ["smithy-rs#920"]
# references = ["smithy-rs#920"]
# meta = { "breaking" = false, "tada" = false, "bug" = false, "target" = "client | server | all"}
# meta = { "breaking" = false, "tada" = false, "bug" = false, "target" = "client | server | all"}
# author = "rcoh"
# author = "rcoh"

[[aws-sdk-rust]]
message = "Fix panics that occurred when `Duration` for exponential backoff could not be created from too big a float."
references = ["aws-sdk-rust#1133"]
meta = { "breaking" = false, "tada" = false, "bug" = true }
author = "ysaito1001"

[[smithy-rs]]
message = "Fix panics that occurred when `Duration` for exponential backoff could not be created from too big a float."
references = ["aws-sdk-rust#1133"]
meta = { "breaking" = false, "tada" = false, "bug" = true, "target" = "client" }
author = "ysaito1001"
+1 −1
Original line number Original line Diff line number Diff line
[package]
[package]
name = "aws-smithy-runtime"
name = "aws-smithy-runtime"
version = "1.5.0"
version = "1.5.1"
authors = ["AWS Rust SDK Team <aws-sdk-rust@amazon.com>", "Zelda Hessler <zhessler@amazon.com>"]
authors = ["AWS Rust SDK Team <aws-sdk-rust@amazon.com>", "Zelda Hessler <zhessler@amazon.com>"]
description = "The new smithy runtime crate"
description = "The new smithy runtime crate"
edition = "2021"
edition = "2021"
+54 −16
Original line number Original line Diff line number Diff line
@@ -138,7 +138,7 @@ impl StandardRetryStrategy {
                    } else {
                    } else {
                        fastrand::f64()
                        fastrand::f64()
                    };
                    };
                    let backoff = calculate_exponential_backoff(
                    Ok(calculate_exponential_backoff(
                        // Generate a random base multiplier to create jitter
                        // Generate a random base multiplier to create jitter
                        base,
                        base,
                        // Get the backoff time multiplier in seconds (with fractional seconds)
                        // Get the backoff time multiplier in seconds (with fractional seconds)
@@ -146,8 +146,9 @@ impl StandardRetryStrategy {
                        // `self.local.attempts` tracks number of requests made including the initial request
                        // `self.local.attempts` tracks number of requests made including the initial request
                        // The initial attempt shouldn't count towards backoff calculations, so we subtract it
                        // The initial attempt shouldn't count towards backoff calculations, so we subtract it
                        request_attempts - 1,
                        request_attempts - 1,
                    );
                        // Maximum backoff duration as a fallback to prevent overflow when calculating a power
                    Ok(Duration::from_secs_f64(backoff).min(retry_cfg.max_backoff()))
                        retry_cfg.max_backoff(),
                    ))
                }
                }
            }
            }
            RetryAction::RetryForbidden | RetryAction::NoActionIndicated => {
            RetryAction::RetryForbidden | RetryAction::NoActionIndicated => {
@@ -288,11 +289,29 @@ fn check_rate_limiter_for_delay(
    None
    None
}
}


fn calculate_exponential_backoff(base: f64, initial_backoff: f64, retry_attempts: u32) -> f64 {
fn calculate_exponential_backoff(
    2_u32
    base: f64,
    initial_backoff: f64,
    retry_attempts: u32,
    max_backoff: Duration,
) -> Duration {
    let result = match 2_u32
        .checked_pow(retry_attempts)
        .checked_pow(retry_attempts)
        .map(|backoff| (backoff as f64) * base * initial_backoff)
        .map(|power| (power as f64) * initial_backoff)
        .unwrap_or(f64::MAX)
    {
        Some(backoff) => match Duration::try_from_secs_f64(backoff) {
            Ok(result) => result.min(max_backoff),
            Err(e) => {
                tracing::warn!("falling back to {max_backoff:?} as `Duration` could not be created for exponential backoff: {e}");
                max_backoff
            }
        },
        None => max_backoff,
    };

    // Apply jitter to `result`, and note that it can be applied to `max_backoff`.
    // Won't panic because `base` is either in range 0..1 or a constant 1 in testing (if configured).
    result.mul_f64(base)
}
}


fn get_seconds_since_unix_epoch(runtime_components: &RuntimeComponents) -> f64 {
fn get_seconds_since_unix_epoch(runtime_components: &RuntimeComponents) -> f64 {
@@ -425,6 +444,23 @@ mod tests {
        assert_eq!(ShouldAttempt::No, actual);
        assert_eq!(ShouldAttempt::No, actual);
    }
    }


    #[test]
    fn should_not_panic_when_exponential_backoff_duration_could_not_be_created() {
        let (ctx, rc, cfg) = set_up_cfg_and_context(
            ErrorKind::TransientError,
            // Greater than 32 when subtracted by 1 in `calculate_backoff`, causing overflow in `calculate_exponential_backoff`
            33,
            RetryConfig::standard()
                .with_use_static_exponential_base(true)
                .with_max_attempts(100), // Any value greater than 33 will do
        );
        let strategy = StandardRetryStrategy::new();
        let actual = strategy
            .should_attempt_retry(&ctx, &rc, &cfg)
            .expect("method is infallible for this use");
        assert_eq!(ShouldAttempt::YesAfterDelay(MAX_BACKOFF), actual);
    }

    #[derive(Debug)]
    #[derive(Debug)]
    struct ServerError;
    struct ServerError;
    impl fmt::Display for ServerError {
    impl fmt::Display for ServerError {
@@ -868,14 +904,16 @@ mod tests {
        assert_eq!(token_bucket.available_permits(), 480);
        assert_eq!(token_bucket.available_permits(), 480);
    }
    }


    const MAX_BACKOFF: Duration = Duration::from_secs(20);

    #[test]
    #[test]
    fn calculate_exponential_backoff_where_initial_backoff_is_one() {
    fn calculate_exponential_backoff_where_initial_backoff_is_one() {
        let initial_backoff = 1.0;
        let initial_backoff = 1.0;


        for (attempt, expected_backoff) in [initial_backoff, 2.0, 4.0].into_iter().enumerate() {
        for (attempt, expected_backoff) in [initial_backoff, 2.0, 4.0].into_iter().enumerate() {
            let actual_backoff =
            let actual_backoff =
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32);
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32, MAX_BACKOFF);
            assert_eq!(expected_backoff, actual_backoff);
            assert_eq!(Duration::from_secs_f64(expected_backoff), actual_backoff);
        }
        }
    }
    }


@@ -885,8 +923,8 @@ mod tests {


        for (attempt, expected_backoff) in [initial_backoff, 6.0, 12.0].into_iter().enumerate() {
        for (attempt, expected_backoff) in [initial_backoff, 6.0, 12.0].into_iter().enumerate() {
            let actual_backoff =
            let actual_backoff =
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32);
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32, MAX_BACKOFF);
            assert_eq!(expected_backoff, actual_backoff);
            assert_eq!(Duration::from_secs_f64(expected_backoff), actual_backoff);
        }
        }
    }
    }


@@ -896,17 +934,17 @@ mod tests {


        for (attempt, expected_backoff) in [initial_backoff, 0.06, 0.12].into_iter().enumerate() {
        for (attempt, expected_backoff) in [initial_backoff, 0.06, 0.12].into_iter().enumerate() {
            let actual_backoff =
            let actual_backoff =
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32);
                calculate_exponential_backoff(1.0, initial_backoff, attempt as u32, MAX_BACKOFF);
            assert_eq!(expected_backoff, actual_backoff);
            assert_eq!(Duration::from_secs_f64(expected_backoff), actual_backoff);
        }
        }
    }
    }


    #[test]
    #[test]
    fn calculate_backoff_overflow() {
    fn calculate_backoff_overflow_should_gracefully_fallback_to_max_backoff() {
        // avoid overflow for a silly large amount of retry attempts
        // avoid overflow for a silly large amount of retry attempts
        assert_eq!(
        assert_eq!(
            calculate_exponential_backoff(1_f64, 10_f64, 100000),
            MAX_BACKOFF,
            f64::MAX
            calculate_exponential_backoff(1_f64, 10_f64, 100000, MAX_BACKOFF),
        );
        );
    }
    }
}
}
+1 −1
Original line number Original line Diff line number Diff line
@@ -16,7 +16,7 @@ opt-level = 0
[dependencies]
[dependencies]
anyhow = "1.0.75"
anyhow = "1.0.75"
camino = "1.1.6"
camino = "1.1.6"
clap = { version = "4.4.11", features = ["derive"] }
clap = { version = "4.4.11", features = ["derive", "env"] }
crates-index = "2.3.0"
crates-index = "2.3.0"
indicatif = "0.17.7"
indicatif = "0.17.7"
reqwest = { version = "0.11.22", features = ["blocking"] }
reqwest = { version = "0.11.22", features = ["blocking"] }
+3 −1
Original line number Original line Diff line number Diff line
@@ -32,7 +32,9 @@ pub fn audit(args: Audit) -> Result<()> {
    let previous_release_tag =
    let previous_release_tag =
        previous_release_tag(&repo, &release_tags, args.previous_release_tag.as_deref())?;
        previous_release_tag(&repo, &release_tags, args.previous_release_tag.as_deref())?;
    if release_tags.first() != Some(&previous_release_tag) {
    if release_tags.first() != Some(&previous_release_tag) {
        tracing::warn!("there are newer releases since '{previous_release_tag}'");
        tracing::warn!("there are newer releases since '{previous_release_tag}'. \
            Consider specifying a more recent release tag using the `--previous-release-tag` command-line argument or \
            the `SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG` environment variable if audit fails.");
    }
    }


    let next_crates = discover_runtime_crates(&repo.root).context("next")?;
    let next_crates = discover_runtime_crates(&repo.root).context("next")?;
Loading