Add server SDK constraint traits RFC (#1199) (7c86ecfc) · Commits · Public Repositories / Smithy Rust

design/src/rfcs/overview.md

+1 −0

Original line number	Diff line number	Diff line
		@@ -34,3 +34,4 @@
		- [RFC-0022: Error Context and Compatibility](./rfc0022_error_context_and_compatibility.md)
		- [RFC-0023: Evolving the new service builder API](./rfc0023_refine_builder.md)
		- [RFC-0024: RequestID](./rfc0024_request_id.md)
		- [RFC-0025: Constraint traits](./rfc0025_constraint_traits.md)

design/src/rfcs/rfc0025_constraint_traits.md

0 → 100644

+397 −0

Original line number	Diff line number	Diff line
		RFC: Constraint traits
		======================

		> Status: Implemented.
		>
		> See the description of [the PR][builders-of-builders] that laid the
		> foundation for the implementation of constraint traits for a complete
		> reference. See the [Better Constraint Violations] RFC too for subsequent
		> improvements to this design.
		>
		> See the uber [tracking issue] for pending work.

		[builders-of-builders]: https://github.com/awslabs/smithy-rs/pull/1342
		[tracking issue]: https://github.com/awslabs/smithy-rs/issues/1401

		[Constraint traits] are used to constrain the values that can be provided for a
		shape.

		For example, given the following Smithy model,

		```smithy
		@length(min: 18)
		Integer Age
		```

		the integer `Age` must take values greater than or equal to 18.

		Constraint traits are most useful when enforced as part of _input_ model
		validation to a service. When a server receives a request whose contents
		deserialize to input data that violates the modeled constraints, the operation
		execution's preconditions are not met, and as such rejecting the request
		without executing the operation is expected behavior.

		Constraint traits can also be applied to operation output member shapes, but
		the expectation is that service implementations _not fail_ to render a response
		when an output value does not meet the specified constraints. From
		[awslabs/smithy#1039]:

		> This might seem counterintuitive, but our philosophy is that a change in
		> server-side state should not be hidden from the caller unless absolutely
		> necessary. Refusing to service an invalid request should always prevent
		> server-side state changes, but refusing to send a response will not, as
		> there's generally no reasonable route for a server implementation to unwind
		> state changes due to a response serialization failure.

		_In general_, clients should not enforce constraint traits in generated code.
		Clients must also never enforce constraint traits when sending requests. This
		is because:

		* addition and removal of constraint traits are backwards-compatible from a
		client's perspective (although this is _not_ documented anywhere in the
		Smithy specification),
		* the client may have been generated with an older version of the model; and
		* the most recent model version might have lifted some constraints.

		On the other hand, server SDKs constitute _the_ source of truth for the
		service's behavior, so they interpret the model in all its strictness.

		The Smithy spec defines 8 constraint traits:

		* [`enum`],
		* [`idRef`],
		* [`length`],
		* [`pattern`],
		* [`private`],
		* [`range`],
		* [`required`]; and
		* [`uniqueItems`].

		The `idRef` and `private` traits are enforced at SDK generation time by the
		[`awslabs/smithy`] libraries and bear no relation to generated Rust code.

		The only constraint trait enforcement that is generated by smithy-rs clients
		should be and is the `enum` trait, which renders Rust `enum`s.

		The `required` trait is already and only enforced by smithy-rs servers since
		[#1148](https://github.com/awslabs/smithy-rs/pull/1148).

		That leaves 4 traits: `length`, `pattern`, `range`, and `uniqueItems`.

		[Constraint traits]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html
		[awslabs/smithy#1039]: https://github.com/awslabs/smithy/issues/1039
		[`enum`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#enum-trait
		[`idRef`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#idref-trait
		[`length`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait
		[`pattern`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait
		[`private`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#private-trait
		[`range`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#range-trait
		[`required`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#required-trait
		[`uniqueItems`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueItems-trait
		[`awslabs/smithy`]: https://github.com/awslabs/smithy

		Implementation
		--------------

		This section addresses how to implement and enforce the `length`, `pattern`,
		`range`, and `uniqueItems` traits. We will use the `length` trait applied to a
		`string` shape as a running example. The implementation of this trait mostly
		carries over to the other three.

		### Example implementation for the `length` trait

		Consider the following Smithy model:

		```smithy
		@length(min: 1, max: 69)
		string NiceString
		```

		The central idea to the implementation of constraint traits is: [parse, don't
		validate]. Instead of code-generating a Rust `String` to represent `NiceString`
		values and perform the _validation_ at request deserialization, we can
		[leverage Rust's type system to guarantee domain invariants]. We can generate a
		wrapper [tuple struct] that _parses_ the string's value and is "tight" in the
		set of values it can accept:

		```rust
		pub struct NiceString(String);

		impl TryFrom<String> for NiceString {
		type Error = nice_string::ConstraintViolation;

		fn try_from(value: String) -> Result<Self, Self::Error> {
		let num_code_points = value.chars().count();
		if 1 <= num_code_points && num_code_points <= 69 {
		Ok(Self(value))
		} else {
		Err(nice_string::ConstraintViolation::Length(num_code_points))
		}
		}
		}
		```

		(Note that we're using the _linear_ time check `chars().count()` instead of
		`len()` on the input value, since the Smithy specification says the `length`
		trait counts [the number of _Unicode code points_ when applied to string
		shapes].)

		The goal is to enforce, at the type-system level, that these constrained
		structs always hold valid data. It should be impossible for the service
		implementer, without resorting to `unsafe` Rust, to construct a `NiceString`
		that violates the model. The actual check is performed in the implementation of
		[`TryFrom`]`<InnerType>` for the generated struct, which makes it convenient to use
		the [`?` operator for error propagation]. Each constrained struct will have a
		related `std::error::Error` enum type to signal the _first_ parsing failure,
		with one enum variant per applied constraint trait:

		```rust
		pub mod nice_string {
		pub enum ConstraintViolation {
		/// Validation error holding the number of Unicode code points found, when a value between `1` and
		/// `69` (inclusive) was expected.
		Length(usize),
		}

		impl std::error::Error for ConstraintViolation {}
		}
		```

		[`std::error::Error`] requires [`Display`] and [`Debug`]. We will
		`#[derive(Debug)]`, unless the shape also has the [`sensitive` trait], in which
		case we will just print the name of the struct:

		```rust
		impl std::fmt::Debug for ConstraintViolation {
		fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
		let mut formatter = f.debug_struct("ConstraintViolation");
		formatter.finish()
		}
		}
		```

		`Display` is used to produce human-friendlier representations. Its
		implementation might be called when formatting a [400 HTTP response message] in
		certain protocols, for example.

		[parse, don't validate]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
		[leverage Rust's type system to guarantee domain invariants]: https://www.lpalmieri.com/posts/2020-12-11-zero-to-production-6-domain-modelling/
		[tuple struct]: https://doc.rust-lang.org/1.9.0/book/structs.html#tuple-structs
		[the number of _Unicode code points_ when applied to string shapes]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait
		[`?` operator for error propagation]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator
		[`std::error::Error`]: https://doc.rust-lang.org/nightly/std/error/trait.Error.html
		[`sensitive` trait]: https://awslabs.github.io/smithy/1.0/spec/core/documentation-traits.html#sensitive-trait
		[400 HTTP response message]: https://developer.mozilla.org/en-US/docs/web/http/status/400
		[`TryFrom`]: https://doc.rust-lang.org/std/convert/trait.TryFrom.html
		[`Display`]: https://doc.rust-lang.org/std/fmt/trait.Display.html
		[`Debug`]: https://doc.rust-lang.org/std/fmt/trait.Debug.html

		### Request deserialization

		We will continue to deserialize the different parts of the HTTP message into
		the regular Rust standard library types. However, just before the
		deserialization function returns, we will convert the type into the wrapper
		tuple struct that will eventually be handed over to the operation handler. This
		is what we're _already_ doing when deserializing strings into `enums`. For
		example, given the Smithy model:

		```smithy
		@enum([
		{ name: "Spanish", value: "es" },
		{ name: "English", value: "en" },
		{ name: "Japanese", value: "jp" },
		])
		string Language
		```

		the code the client generates when deserializing a string from a JSON document
		into the `Language` enum is (excerpt):

		```rust
		...
		match key.to_unescaped()?.as_ref() {
		"language" => {
		builder = builder.set_language(
		aws_smithy_json::deserialize::token::expect_string_or_null(
		tokens.next(),
		)?
		.map(\|s\| {
		s.to_unescaped()
		.map(\|u\| crate::model::Language::from(u.as_ref()))
		})
		.transpose()?,
		);
		}
		_ => aws_smithy_json::deserialize::token::skip_value(tokens)?,
		}
		...
		```

		Note how the `String` gets converted to the enum via `Language::from()`.

		```rust
		impl std::convert::From<&str> for Language {
		fn from(s: &str) -> Self {
		match s {
		"es" => Language::Spanish,
		"en" => Language::English,
		"jp" => Language::Japanese,
		other => Language::Unknown(other.to_owned()),
		}
		}
		}
		```

		For constrained shapes we would do the same to parse the inner deserialized
		value into the wrapper tuple struct, except for these differences:

		1. For enums, the client generates an `Unknown` variant that "contains new
		variants that have been added since this code was generated". The server
		does not need such a variant ([#1187]).
		2. Conversions into the tuple struct are fallible (`try_from()` instead of
		`from()`). These errors will result in a `my_struct::ConstraintViolation`.

		`length` trait
		--------------

		We will enforce the length constraint by calling `len()` on Rust's `Vec`
		(`list` and `set` shapes), `HashMap` (`map` shapes) and our
		[`aws_smithy_types::Blob`] (`bytes` shapes).

		We will enforce the length constraint trait on `String` (`string` shapes) by
		calling `.chars().count()`.

		`pattern` trait
		---------------

		The [`pattern` trait]

		> restricts string shape values to a specified regular expression.

		We will implement this by using the `regex`'s crate [`is_match`]. We will use
		[`once_cell`] to compile the regex only the first time it is required.

		`uniqueItems` trait
		-------------------

		The [`uniqueItems` trait]

		> indicates that the items in a [`List`] MUST be unique.

		If the list shape is `sparse`, more than one `null` value violates this
		constraint.

		We will enforce this by copying _references_ to the `Vec`'s elements into a
		`HashSet` and checking that the sizes of both containers coincide.

		[`Eq`]: https://doc.rust-lang.org/std/cmp/trait.Eq.html
		[even if its members are not equipped with an equivalence relation]: https://github.com/awslabs/smithy/issues/1093
		[`List`]: https://awslabs.github.io/smithy/1.0/spec/core/model.html#list
		[`uniqueItems` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueitems-trait
		[`is_match`]: https://docs.rs/regex/latest/regex/struct.Regex.html#method.is_match
		[`pattern` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait
		[`aws_smithy_types::Blob`]: https://docs.rs/aws-smithy-types/latest/aws_smithy_types/struct.Blob.html
		[`aws_smithy_http_server::rejection::SmithyRejection`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/enum.SmithyRejection.html
		[`aws_smithy_http_server::rejection::Deserialize`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/struct.Deserialize.html
		[#1187]: https://github.com/awslabs/smithy-rs/issues/1187
		[`once_cell`]: https://docs.rs/once_cell/latest/once_cell/

		Trait precedence and naming of the tuple struct
		-----------------------------------------------

		From [the spec]:

		> Some constraints can be applied to shapes as well as structure members. If a
		> constraint of the same type is applied to a structure member and the shape
		> that the member targets, the trait applied to the member takes precedence.

		```smithy
		structure ShoppingCart {
		@range(min: 7, max:12)
		numberOfItems: PositiveInteger
		}

		@range(min: 1)
		integer PositiveInteger
		```

		In the above example,

		> the range trait applied to numberOfItems takes precedence over the one
		> applied to PositiveInteger. The resolved minimum will be 7, and the maximum
		> 12.

		When the constraint trait is applied to a member shape, the tuple struct's name
		will be the PascalCased name of the member shape, `NumberOfItems`.

		[the spec]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html?highlight=enum#trait-precedence

		Unresolved questions
		--------------------

		1. Should we code-generate unsigned integer types (`u16`, `u32`, `u64`) when
		the `range` trait is applied with `min` set to a value greater than or equal
		to 0?
		- A user has even suggested to use the `std::num::NonZeroUX` types (e.g.
		[`NonZeroU64`]) when `range` is applied with `min` set to a value greater
		than 0.
		- UPDATE: This requires further design work. There are interoperability
		concerns: for example, the positive range of a
		`u32` is strictly greater than that of an `i32`, so clients wouldn't be
		able to receive values within the non-overlapping range.
		2. In request deserialization, should we fail with the first violation and
		immediately render a response, or attempt to parse the entire request and
		provide a complete and structured report?
		- UPDATE: We will provide a response containing all violations. See the
		"Collecting Constraint Violations" section in the [Better Constraint
		Violations] RFC.
		3. Should we provide a mechanism for the service implementer to construct a
		Rust type violating the modeled constraints in their business logic e.g. a
		`T::new_unchecked()` constructor? This could be useful (1) when the user
		_knows_ the provided inner value does not violate the constraints and
		doesn't want to incur the performance penalty of the check; (2) when the
		struct is in a transient invalid state. However:
		- (2) is arguably a modelling mistake and a separate struct to represent
		the transient state would be a better approach,
		- the user could use `unsafe` Rust to bypass the validation; and
		- adding this constructor is a backwards-compatible change, so it can
		always be added later if this feature is requested.
		- UPDATE: We decided to punt on this until users express interest.

		[`NonZeroU64`]: https://doc.rust-lang.org/std/num/struct.NonZeroU64.html
		[Better Constraint Violations]: https://github.com/awslabs/smithy-rs/pull/2040

		Alternative design
		------------------

		An alternative design with less public API surface would be to perform
		constraint validation at request deserialization, but hand over a regular
		"loose" type (e.g. `String` instead of `NiceString`) that allows for values
		violating the constraints. If we were to implement this approach, we can
		implement it by wrapping the incoming value in the aforementioned tuple struct
		to perform the validation, and immediately unwrap it.

		Comparative advantages:

		* Validation remains an internal detail of the framework. If the semantics of a
		constraint trait change, the behavior of the service is still
		backwards-incompatibly affected, but user code is not.
		* Less "invasive". Baking validation in the generated type might be deemed as
		the service framework overreaching responsibilities.

		Comparative disadvantages:

		* It becomes _possible_ to send responses with invalid operation outputs. All
		the service framework could do is log the validation errors.
		* Baking validation at the type-system level gets rid of an entire class of
		logic errors.
		* Less idiomatic (this is subjective). The pattern of wrapping a more primitive
		type to guarantee domain invariants is widespread in the Rust ecosystem. The
		standard library makes use of it extensively.

		Note that _both_ designs are backwards incompatible in the sense that you can't
		migrate from one to the other without breaking user code.

		UPDATE: We ended up implementing both designs, adding a flag to opt into the
		alternative design. Refer to the mentions of the `publicConstrainedTypes` flag
		in the description of the [Builders of builders PR][builders-of-builders].