Skip to content
Unverified Commit 1e803494 authored by ysaito1001's avatar ysaito1001 Committed by GitHub
Browse files

Make `TransferredBytes` be the top of the list in `BinLabel` (#3871)

## Motivation and Context
https://github.com/awslabs/aws-sdk-rust/issues/1202

## Description
The issue above demonstrated the incorrect
[BinLabel](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L100-L119)
ordering in
[LogBuffer](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L173-L183),
the underlying data structure we use for stall stream protection.

The following trace logs are generated from executing the reproduction
steps in the issue above. In the file labeled "no_sleep," we have
commented out
`std::thread::sleep(std::time::Duration::from_millis(120));` from the
reproducer so the updated code can be tested as the happy path.


[s3_throughput_min_repro_no_sleep.log](https://github.com/user-attachments/files/17299373/s3_throughput_min_repro_no_sleep.log)


[s3_throughput_min_repro_with_sleep.log](https://github.com/user-attachments/files/17299447/s3_throughput_min_repro_with_sleep.log)

In both files, it’s important to note that `Bin`s assigned
`TransferredBytes` can be overwritten by `Pending` due to
[`ThroughputLogs::push`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L346).
Once a `Bin` is labeled as `Pending`, it cannot be re-labeled.

When this occurs, the only way to avoid the stall stream protection
check going into the grace period is for time to advance beyond the
current `Bin`'s resolution, the `LogBuffer` pushes a new `Bin` during
[`catch_up`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L355)
, and this new `Bin` hopefully [gets
assigned](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L78-L79)
a `TransferredBytes`. However, this new `Bin` could also be overwritten
by Pending in a subsequent call to
[`MinimumThroughputDownloadBody::poll_data`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L78-L79),
which can trigger the the grace period if the overall `LogBuffer` looks
like it's violated the stall stream protection check.

The reproducer without sleep does not fail the stall stream protection
obviously because the execution completes way before the grace period
ends, but more importantly because the execution periodically assigns
new `TransferredBytes` `Bin`s in the throughput logs. This effectively
resets the grace period for the stall stream protection (search for
`throughput recovered; exiting grace period` in the
`s3_throughput_min_repro_no_sleep.log`). However, with sleep, `Bin`s
labeled as `TransferredBytes` are frequently (and almost immediately)
overwritten by `Pending`. This results in the execution being unable to
exit the grace period, ultimately leading to a stall stream protection
error.

To resolve this, we make `TransferredBytes` be the top priority in
`BinLabel`. This means once a new `Bin` has earned `TransferredBytes`,
it's green for that time resolution and that it should not be revoked by
`Pending` overwriting it to make it look like no bytes transferred
during that time.

## Testing
- Existing tests in CI
- Added unit tests for `BinLabel` ordering and for `ThroughputLogs`
- Passed the customer's reproduction step
- To confirm the stall stream protection for download still works, I
switched off WiFi while running the customer's reproducer (with sleep)
and it successfully failed with the stall stream protection error:
```
---- s3_throughput_min_repro stdout ----
2024-10-08T23:29:24.999477Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s
2024-10-08T23:29:24.999513Z TRACE aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: received poll pending
2024-10-08T23:29:24.999530Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s
2024-10-08T23:29:25.081811Z TRACE aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: received poll pending
2024-10-08T23:29:25.081938Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s
test s3_throughput_min_repro ... FAILED
...
called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Error { kind: StreamingError(ThroughputBelowMinimum { expected: Throughput { bytes_read: 1, per_time_elapsed: 1s }, actual: Throughput { bytes_read: 0, per_time_elapsed: 1s } }) } }
```

## Checklist
<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [x] For changes to the smithy-rs codegen or runtime crates, I have
created a changelog entry Markdown file in the `.changelog` directory,
specifying "client," "server," or both in the `applies_to` key.
- [x] For changes to the AWS SDK, generated SDK code, or SDK runtime
crates, I have created a changelog entry Markdown file in the
`.changelog` directory, specifying "aws-sdk-rust" in the `applies_to`
key.

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
parent 07fe4266
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment