Skip to content
Unverified Commit 94540748 authored by ysaito1001's avatar ysaito1001 Committed by GitHub
Browse files

Fix panic when failing to create `Duration` for exponential backoff from a large float (#3621)

## Motivation and Context
Avoids panic when [Duration for exponential
backoff](https://github.com/smithy-lang/smithy-rs/blob/main/rust-runtime/aws-smithy-runtime/src/client/retries/strategy/standard.rs#L150)
could not be created from a large float.

## Description

[Duration::from_secs_f64](https://doc.rust-lang.org/core/time/struct.Duration.html#method.from_secs_f64)
may panic. This PR switches to use a fallible sibling
`Duration::try_from_secs_f64` to avoid panics. If
`Duration::try_from_secs_f64` returns an `Err` we fallback to
`max_backoff` for subsequent retries. Furthermore, we learned from
internal discussion that jitter also needs to be applied to
`max_backoff`. This PR updates `calculate_exponential_backoff` to handle
all the said business logic in one place.

## Testing
- Added a unit test
`should_not_panic_when_exponential_backoff_duration_could_not_be_created`
- Manually verified reproduction steps provided [in the original
PR](https://github.com/awslabs/aws-sdk-rust/issues/1133)
<details> 
<summary>More details</summary>

```
#[tokio::test]
async fn repro_1133() {
    let config = aws_config::from_env()
        .region(Region::new("non-existing-region")) // forces retries
        .retry_config(
            RetryConfig::standard()
                .with_initial_backoff(Duration::from_millis(1))
                .with_max_attempts(100),
        )
        .timeout_config(
            TimeoutConfig::builder()
                .operation_attempt_timeout(Duration::from_secs(180))
                .operation_timeout(Duration::from_secs(600))
                .build(),
        )
        .load()
        .await;

    let client: Client = Client::new(&config);
    let res = client
        .list_objects_v2()
        .bucket("bucket-name-does-not-matter")
        .send()
        .await;

    dbg!(res);
}
```

Without changes in this PR:
```
---- repro_1133 stdout ----
thread 'repro_1133' panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/time.rs:741:23:
can not convert float seconds to Duration: value is either too big or NaN
stack backtrace:
...

failures:
    repro_1133

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 9 filtered out; finished in 338.18s
```

With changes in this PR:
```
// no panic
---- repro_1133 stdout ----
res = Err(
    TimeoutError(
        TimeoutError {
            source: MaybeTimeoutError {
                kind: Operation,
                duration: 600s,
            },
        },
    ),
)
```
</details> 

## Checklist
<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [x] I have updated `CHANGELOG.next.toml` if I made changes to the
smithy-rs codegen or runtime crates
- [x] I have updated `CHANGELOG.next.toml` if I made changes to the AWS
SDK, generated SDK code, or SDK runtime crates

## Appendix
<details> 
<summary>runtime-versioner bug fix</summary>

This PR also fixes a limitation in `runtime-versioner audit`. I included
the fix in the PR because the issue occurred with special conditions,
and we don't get to reproduce it every day. The way the issue manifests
is this.

1. We have a branch from the main whose latest release tag at the time
was `release-2024-04-30`
2. The main has moved ahead and a new `smithy-rs` release has been made
`release-2024-05-08`
3. We perform `git merge main`, pre-commit hooks run, and we then get
`audit` failures from `runtime-versioner`:
```
2024-05-10T16:54:36.434968Z  WARN runtime_versioner::command::audit: there are newer releases since 'release-2024-04-30'
aws-config was changed and version bumped, but the new version number (1.4.0) has already been published to crates.io. Choose a new version number.
aws-runtime was changed and version bumped, but the new version number (1.2.2) has already been published to crates.io. Choose a new version number.
aws-smithy-runtime-api was changed and version bumped, but the new version number (1.6.0) has already been published to crates.io. Choose a new version number.
aws-smithy-types was changed and version bumped, but the new version number (1.1.9) has already been published to crates.io. Choose a new version number.
aws-types was changed and version bumped, but the new version number (1.2.1) has already been published to crates.io. Choose a new version number.
Error: there are audit failures in the runtime crates
```
This happens because when the latest `main` is being merged to our
branch, `runtime-versioner audit` should use `previous_release_tag`
`release-2024-05-08` to perform audit but [pre-commit hooks run the
tool](https://github.com/smithy-lang/smithy-rs/blob/main/.pre-commit-hooks/runtime-versioner.sh#L8)
using the latest previous release tag that can be traced back from
`HEAD` of our branch, which is `release-2024-04-30`. Hence the error.

The fix adds an environment variable
`SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG` to specify a
previous release tag override, in addition to a `--previous-release-tag`
command-line argument.
Furthermore, the fix has relaxed certain checks in `audit`. Taking our
example for instance, when `HEAD` is now behind `release-2024-05-08`,
it's OK to fail even if `release-2024-05-08` is not the ancestor of
`HEAD` (as stated above, `git merge-base --is-ancestor` does not know
that while main is being merged) as long as `release-2024-04-28` (the
latest release seen from `HEAD`) is the ancestor of
`release-2024-05-08`.

```
   release-2024-04-28               release-2024-05-08           
            │                                │                   
────────────┼───────────────┬────────────────┼───────x─────► main
            │               │HEAD            │       x           
                            │ of                     x           
                            │branch                  x           
                            │                        x           
                            │                        x           
                            │              xxxxxxxxxxx           
                            │              x                     
                            │              x  git merge main     
                            │              x                     
                            │              ▼                     
                            │                                    
                            └──────────────────► feature branch  
```
To use the fix, set the environment variable with the new release tag
and perform `git merge main`:
```
➜  smithy-rs git:(ysaito/fix-panic-in-exp-backoff) ✗ export SMITHY_RS_RUNTIME_VERSIONER_AUDIT_PREVIOUS_RELEASE_TAG=release-2024-05-08
➜  smithy-rs git:(ysaito/fix-panic-in-exp-backoff) ✗ git merge main
     ...
     Running `/Users/awsaito/src/smithy-rs/tools/target/debug/runtime-versioner audit`
2024-05-10T19:32:26.665578Z  WARN runtime_versioner::tag: expected previous release to be 'release-2024-04-30', but 'release-2024-05-08' was specified. Proceeding with 'release-2024-05-08'.
SUCCESS
```

</summary>

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
parent 14ccc50f
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment