Unverified Commit 7c86ecfc authored by david-perez's avatar david-perez Committed by GitHub
Browse files

Add server SDK constraint traits RFC (#1199)

parent 3652538c
Loading
Loading
Loading
Loading
+1 −0
Original line number Diff line number Diff line
@@ -34,3 +34,4 @@
- [RFC-0022: Error Context and Compatibility](./rfc0022_error_context_and_compatibility.md)
- [RFC-0023: Evolving the new service builder API](./rfc0023_refine_builder.md)
- [RFC-0024: RequestID](./rfc0024_request_id.md)
- [RFC-0025: Constraint traits](./rfc0025_constraint_traits.md)
+397 −0
Original line number Diff line number Diff line
RFC: Constraint traits
======================

> Status: Implemented.
>
> See the description of [the PR][builders-of-builders] that laid the
> foundation for the implementation of constraint traits for a complete
> reference. See the [Better Constraint Violations] RFC too for subsequent
> improvements to this design.
>
> See the uber [tracking issue] for pending work.

[builders-of-builders]: https://github.com/awslabs/smithy-rs/pull/1342
[tracking issue]: https://github.com/awslabs/smithy-rs/issues/1401

[Constraint traits] are used to constrain the values that can be provided for a
shape.

For example, given the following Smithy model,

```smithy
@length(min: 18)
Integer Age
```

the integer `Age` must take values greater than or equal to 18.

Constraint traits are most useful when enforced as part of _input_ model
validation to a service. When a server receives a request whose contents
deserialize to input data that violates the modeled constraints, the operation
execution's preconditions are not met, and as such rejecting the request
without executing the operation is expected behavior.

Constraint traits can also be applied to operation output member shapes, but
the expectation is that service implementations _not fail_ to render a response
when an output value does not meet the specified constraints. From
[awslabs/smithy#1039]:

> This might seem counterintuitive, but our philosophy is that a change in
> server-side state should not be hidden from the caller unless absolutely
> necessary. Refusing to service an invalid request should always prevent
> server-side state changes, but refusing to send a response will not, as
> there's generally no reasonable route for a server implementation to unwind
> state changes due to a response serialization failure.

_In general_, clients should not enforce constraint traits in generated code.
Clients must also never enforce constraint traits when sending requests. This
is because:

* addition and removal of constraint traits are backwards-compatible from a
  client's perspective (although this is _not_ documented anywhere in the
  Smithy specification),
* the client may have been generated with an older version of the model; and
* the most recent model version might have lifted some constraints.

On the other hand, server SDKs constitute _the_ source of truth for the
service's behavior, so they interpret the model in all its strictness.

The Smithy spec defines 8 constraint traits:

* [`enum`],
* [`idRef`],
* [`length`],
* [`pattern`],
* [`private`],
* [`range`],
* [`required`]; and
* [`uniqueItems`].

The `idRef` and `private` traits are enforced at SDK generation time by the
[`awslabs/smithy`] libraries and bear no relation to generated Rust code.

The only constraint trait enforcement that is generated by smithy-rs clients
should be and is the `enum` trait, which renders Rust `enum`s.

The `required` trait is already and only enforced by smithy-rs servers since
[#1148](https://github.com/awslabs/smithy-rs/pull/1148).

That leaves 4 traits: `length`, `pattern`, `range`, and `uniqueItems`.

[Constraint traits]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html
[awslabs/smithy#1039]: https://github.com/awslabs/smithy/issues/1039
[`enum`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#enum-trait
[`idRef`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#idref-trait
[`length`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait
[`pattern`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait
[`private`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#private-trait
[`range`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#range-trait
[`required`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#required-trait
[`uniqueItems`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueItems-trait
[`awslabs/smithy`]: https://github.com/awslabs/smithy

Implementation
--------------

This section addresses how to implement and enforce the `length`, `pattern`,
`range`, and `uniqueItems` traits. We will use the `length` trait applied to a
`string` shape as a running example. The implementation of this trait mostly
carries over to the other three.

### Example implementation for the `length` trait

Consider the following Smithy model:

```smithy
@length(min: 1, max: 69)
string NiceString
```

The central idea to the implementation of constraint traits is: [parse, don't
validate]. Instead of code-generating a Rust `String` to represent `NiceString`
values and perform the _validation_ at request deserialization, we can
[leverage Rust's type system to guarantee domain invariants]. We can generate a
wrapper [tuple struct] that _parses_ the string's value and is "tight" in the
set of values it can accept:

```rust
pub struct NiceString(String);

impl TryFrom<String> for NiceString {
    type Error = nice_string::ConstraintViolation;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        let num_code_points = value.chars().count();
        if 1 <= num_code_points && num_code_points <= 69 {
            Ok(Self(value))
        } else {
            Err(nice_string::ConstraintViolation::Length(num_code_points))
        }
    }
}
```

(Note that we're using the _linear_ time check `chars().count()` instead of
`len()` on the input value, since the Smithy specification says the `length`
trait counts [the number of _Unicode code points_ when applied to string
shapes].)

The goal is to enforce, at the type-system level, that these constrained
structs always hold valid data. It should be impossible for the service
implementer, without resorting to `unsafe` Rust, to construct a `NiceString`
that violates the model. The actual check is performed in the implementation of
[`TryFrom`]`<InnerType>` for the generated struct, which makes it convenient to use
the [`?` operator for error propagation]. Each constrained struct will have a
related `std::error::Error` enum type to signal the _first_ parsing failure,
with one enum variant per applied constraint trait:

```rust
pub mod nice_string {
    pub enum ConstraintViolation {
        /// Validation error holding the number of Unicode code points found, when a value between `1` and
        /// `69` (inclusive) was expected.
        Length(usize),
    }

    impl std::error::Error for ConstraintViolation {}
}
```

[`std::error::Error`] requires [`Display`] and [`Debug`]. We will
`#[derive(Debug)]`, unless the shape also has the [`sensitive` trait], in which
case we will just print the name of the struct:

```rust
impl std::fmt::Debug for ConstraintViolation {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let mut formatter = f.debug_struct("ConstraintViolation");
        formatter.finish()
    }
}
```

`Display` is used to produce human-friendlier representations. Its
implementation might be called when formatting a [400 HTTP response message] in
certain protocols, for example.

[parse, don't validate]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
[leverage Rust's type system to guarantee domain invariants]: https://www.lpalmieri.com/posts/2020-12-11-zero-to-production-6-domain-modelling/
[tuple struct]: https://doc.rust-lang.org/1.9.0/book/structs.html#tuple-structs
[the number of _Unicode code points_ when applied to string shapes]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait
[`?` operator for error propagation]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator
[`std::error::Error`]: https://doc.rust-lang.org/nightly/std/error/trait.Error.html
[`sensitive` trait]: https://awslabs.github.io/smithy/1.0/spec/core/documentation-traits.html#sensitive-trait
[400 HTTP response message]: https://developer.mozilla.org/en-US/docs/web/http/status/400
[`TryFrom`]: https://doc.rust-lang.org/std/convert/trait.TryFrom.html
[`Display`]: https://doc.rust-lang.org/std/fmt/trait.Display.html
[`Debug`]: https://doc.rust-lang.org/std/fmt/trait.Debug.html

### Request deserialization

We will continue to deserialize the different parts of the HTTP message into
the regular Rust standard library types. However, just before the
deserialization function returns, we will convert the type into the wrapper
tuple struct that will eventually be handed over to the operation handler. This
is what we're _already_ doing when deserializing strings into `enums`. For
example, given the Smithy model:

```smithy
@enum([
    { name: "Spanish", value: "es" },
    { name: "English", value: "en" },
    { name: "Japanese", value: "jp" },
])
string Language
```

the code the client generates when deserializing a string from a JSON document
into the `Language` enum is (excerpt):

```rust
...
match key.to_unescaped()?.as_ref() {
    "language" => {
        builder = builder.set_language(
            aws_smithy_json::deserialize::token::expect_string_or_null(
                tokens.next(),
            )?
            .map(|s| {
                s.to_unescaped()
                    .map(|u| crate::model::Language::from(u.as_ref()))
            })
            .transpose()?,
        );
    }
    _ => aws_smithy_json::deserialize::token::skip_value(tokens)?,
}
...
```

Note how the `String` gets converted to the enum via `Language::from()`.

```rust
impl std::convert::From<&str> for Language {
    fn from(s: &str) -> Self {
        match s {
            "es" => Language::Spanish,
            "en" => Language::English,
            "jp" => Language::Japanese,
            other => Language::Unknown(other.to_owned()),
        }
    }
}
```

For constrained shapes we would do the same to parse the inner deserialized
value into the wrapper tuple struct, except for these differences:

1. For enums, the client generates an `Unknown` variant that "contains new
   variants that have been added since this code was generated". The server
   does not need such a variant ([#1187]).
2. Conversions into the tuple struct are fallible (`try_from()` instead of
   `from()`). These errors will result in a `my_struct::ConstraintViolation`.

`length` trait
--------------

We will enforce the length constraint by calling `len()` on Rust's `Vec`
(`list` and `set` shapes), `HashMap` (`map` shapes) and our
[`aws_smithy_types::Blob`] (`bytes` shapes).

We will enforce the length constraint trait on `String` (`string` shapes) by
calling `.chars().count()`.

`pattern` trait
---------------

The [`pattern` trait]

> restricts string shape values to a specified regular expression.

We will implement this by using the `regex`'s crate [`is_match`]. We will use
[`once_cell`] to compile the regex only the first time it is required.

`uniqueItems` trait
-------------------

The [`uniqueItems` trait]

> indicates that the items in a [`List`] MUST be unique.

If the list shape is `sparse`, more than one `null` value violates this
constraint.

We will enforce this by copying _references_ to the `Vec`'s elements into a
`HashSet` and checking that the sizes of both containers coincide.

[`Eq`]: https://doc.rust-lang.org/std/cmp/trait.Eq.html
[even if its members are not equipped with an equivalence relation]: https://github.com/awslabs/smithy/issues/1093
[`List`]: https://awslabs.github.io/smithy/1.0/spec/core/model.html#list
[`uniqueItems` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueitems-trait
[`is_match`]: https://docs.rs/regex/latest/regex/struct.Regex.html#method.is_match
[`pattern` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait
[`aws_smithy_types::Blob`]: https://docs.rs/aws-smithy-types/latest/aws_smithy_types/struct.Blob.html
[`aws_smithy_http_server::rejection::SmithyRejection`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/enum.SmithyRejection.html
[`aws_smithy_http_server::rejection::Deserialize`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/struct.Deserialize.html
[#1187]: https://github.com/awslabs/smithy-rs/issues/1187
[`once_cell`]: https://docs.rs/once_cell/latest/once_cell/

Trait precedence and naming of the tuple struct
-----------------------------------------------

From [the spec]:

> Some constraints can be applied to shapes as well as structure members. If a
> constraint of the same type is applied to a structure member and the shape
> that the member targets, the trait applied to the member takes precedence.

```smithy
structure ShoppingCart {
    @range(min: 7, max:12)
    numberOfItems: PositiveInteger
}

@range(min: 1)
integer PositiveInteger
```

In the above example,

> the range trait applied to numberOfItems takes precedence over the one
> applied to PositiveInteger. The resolved minimum will be 7, and the maximum
> 12.

When the constraint trait is applied to a member shape, the tuple struct's name
will be the PascalCased name of the member shape, `NumberOfItems`.

[the spec]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html?highlight=enum#trait-precedence

Unresolved questions
--------------------

1. Should we code-generate unsigned integer types (`u16`, `u32`, `u64`) when
   the `range` trait is applied with `min` set to a value greater than or equal
   to 0?
    - A user has even suggested to use the `std::num::NonZeroUX` types (e.g.
      [`NonZeroU64`]) when `range` is applied with `min` set to a value greater
      than 0.
    - UPDATE: This requires further design work. There are interoperability
      concerns: for example, the positive range of a
      `u32` is strictly greater than that of an `i32`, so clients wouldn't be
      able to receive values within the non-overlapping range.
2. In request deserialization, should we fail with the first violation and
   immediately render a response, or attempt to parse the entire request and
   provide a complete and structured report?
    - UPDATE: We will provide a response containing all violations. See the
      "Collecting Constraint Violations" section in the [Better Constraint
      Violations] RFC.
3. Should we provide a mechanism for the service implementer to construct a
   Rust type violating the modeled constraints in their business logic e.g. a
   `T::new_unchecked()` constructor? This could be useful (1) when the user
   _knows_ the provided inner value does not violate the constraints and
   doesn't want to incur the performance penalty of the check; (2) when the
   struct is in a transient invalid state. However:
    - (2) is arguably a modelling mistake and a separate struct to represent
      the transient state would be a better approach,
    - the user could use `unsafe` Rust to bypass the validation; and
    - adding this constructor is a backwards-compatible change, so it can
      always be added later if this feature is requested.
    - UPDATE: We decided to punt on this until users express interest.

[`NonZeroU64`]: https://doc.rust-lang.org/std/num/struct.NonZeroU64.html
[Better Constraint Violations]: https://github.com/awslabs/smithy-rs/pull/2040

Alternative design
------------------

An alternative design with less public API surface would be to perform
constraint validation at request deserialization, but hand over a regular
"loose" type (e.g. `String` instead of `NiceString`) that allows for values
violating the constraints. If we were to implement this approach, we can
implement it by wrapping the incoming value in the aforementioned tuple struct
to perform the validation, and immediately unwrap it.

Comparative advantages:

* Validation remains an internal detail of the framework. If the semantics of a
  constraint trait change, the behavior of the service is still
  backwards-incompatibly affected, but user code is not.
* Less "invasive". Baking validation in the generated type might be deemed as
  the service framework overreaching responsibilities.

Comparative disadvantages:

* It becomes _possible_ to send responses with invalid operation outputs. All
  the service framework could do is log the validation errors.
* Baking validation at the type-system level gets rid of an entire class of
  logic errors.
* Less idiomatic (this is subjective). The pattern of wrapping a more primitive
  type to guarantee domain invariants is widespread in the Rust ecosystem. The
  standard library makes use of it extensively.

Note that _both_ designs are backwards incompatible in the sense that you can't
migrate from one to the other without breaking user code.

UPDATE: We ended up implementing both designs, adding a flag to opt into the
alternative design. Refer to the mentions of the `publicConstrainedTypes` flag
in the description of the [Builders of builders PR][builders-of-builders].