ftl: Initial headers (e9a236d2) · Commits · Public Repositories / spdk

doc/Doxyfile

+1 −0

Original line number	Diff line number	Diff line
		@@ -811,6 +811,7 @@ INPUT += \
		concurrency.md \
		directory_structure.md \
		event.md \
		ftl.md \
		getting_started.md \
		ioat.md \
		iscsi.md \

doc/ftl.md

0 → 100644

+134 −0

		# Flash Translation Layer {#ftl}

		The Flash Translation Layer library provides block device access on top of non-block SSDs
		implementing the Open Channel interface. It handles the logical-to-physical address mapping,
		responds to asynchronous media management events, and manages the defragmentation process.

		# Terminology {#ftl_terminology}

		## Logical to physical address map

		* Shorthand: L2P

		Contains the mapping of the logical addresses (LBA) to their on-disk physical location (PPA). The
		LBAs are contiguous and range from 0 to the number of surfaced blocks (the number of spare blocks
		is calculated during device formation and is subtracted from the available address space). The
		spare blocks account for chunks going offline throughout the lifespan of the device and also
		provide the necessary buffer for data [defragmentation](#ftl_reloc).
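
The mapping described above can be pictured as a flat array indexed by LBA. The following is a minimal sketch under that assumption, not the library's implementation: `struct l2p_table`, the `l2p_*` helpers and the `L2P_INVALID_PPA` sentinel are hypothetical names, and a PPA is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel meaning "this LBA has not been written yet". */
#define L2P_INVALID_PPA UINT64_MAX

/* Hypothetical flat L2P table: one physical address per surfaced LBA. */
struct l2p_table {
	uint64_t num_lbas;	/* surfaced blocks: total blocks minus spares */
	uint64_t *ppa;		/* ppa[lba] holds the current physical location */
};

/* Spare blocks are subtracted from the available address space. */
static uint64_t
l2p_surfaced_lbas(uint64_t total_blocks, uint64_t spare_blocks)
{
	assert(spare_blocks <= total_blocks);
	return total_blocks - spare_blocks;
}

/* Allocate the table and mark every LBA as unmapped. Returns 0 or -1. */
static int
l2p_init(struct l2p_table *l2p, uint64_t total_blocks, uint64_t spare_blocks)
{
	uint64_t i;

	l2p->num_lbas = l2p_surfaced_lbas(total_blocks, spare_blocks);
	l2p->ppa = malloc(l2p->num_lbas * sizeof(*l2p->ppa));
	if (l2p->ppa == NULL) {
		return -1;
	}
	for (i = 0; i < l2p->num_lbas; i++) {
		l2p->ppa[i] = L2P_INVALID_PPA;
	}
	return 0;
}

/* Look up the current physical location of an LBA. */
static uint64_t
l2p_get(const struct l2p_table *l2p, uint64_t lba)
{
	assert(lba < l2p->num_lbas);
	return l2p->ppa[lba];
}

/* Remap an LBA; returns the previous PPA so the old copy can be invalidated. */
static uint64_t
l2p_set(struct l2p_table *l2p, uint64_t lba, uint64_t ppa)
{
	uint64_t old;

	assert(lba < l2p->num_lbas);
	old = l2p->ppa[lba];
	l2p->ppa[lba] = ppa;
	return old;
}
```

Returning the old PPA from `l2p_set` is a design choice of this sketch: it gives the caller what it needs to mark the previous location as invalid in the owning band.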

		## Band {#ftl_band}

		A band is a collection of chunks, each belonging to a different parallel unit. All writes to
		a band follow the same pattern: a batch of logical blocks is written to one chunk, another batch
		to the next one, and so on. This ensures the parallelism of the write operations, as they can be
		executed independently on different chunks. Each band keeps track of the LBAs it consists of, as
		well as their validity, as some of the data will be invalidated by subsequent writes to the same
		logical address. The L2P mapping can be restored from the SSD by reading this information in order
		from the oldest band to the youngest.

		                  +--------------+        +--------------+                        +--------------+
		         band 1   |   chunk 1    +--------+     chk 1    +---- --- --- --- --- ---+     chk 1    |
		                  +--------------+        +--------------+                        +--------------+
		         band 2   |   chunk 2    +--------+     chk 2    +---- --- --- --- --- ---+     chk 2    |
		                  +--------------+        +--------------+                        +--------------+
		         band 3   |   chunk 3    +--------+     chk 3    +---- --- --- --- --- ---+     chk 3    |
		                  +--------------+        +--------------+                        +--------------+
		                  |     ...      |        |     ...      |                        |     ...      |
		                  +--------------+        +--------------+                        +--------------+
		         band m   |   chunk m    +--------+     chk m    +---- --- --- --- --- ---+     chk m    |
		                  +--------------+        +--------------+                        +--------------+
		                  |     ...      |        |     ...      |                        |     ...      |
		                  +--------------+        +--------------+                        +--------------+

		                  parallel unit 1              pu 2                                    pu n
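
The write pattern of a band (one batch per chunk, then on to the next chunk) amounts to round-robin striping. The sketch below illustrates that arithmetic only; `band_locate`, `struct band_pos` and the parameters `num_chunks` and `batch_size` are hypothetical names, and it assumes every chunk in the band is online and that all batches have the same size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where a block of the band lands. */
struct band_pos {
	uint32_t chunk;		/* index of the chunk (parallel unit) within the band */
	uint64_t offset;	/* block offset inside that chunk */
};

/*
 * Translate a block offset within a band into a chunk index and an offset
 * within that chunk, assuming batches rotate across the chunks in order.
 */
static struct band_pos
band_locate(uint64_t band_offset, uint32_t num_chunks, uint32_t batch_size)
{
	struct band_pos pos;
	uint64_t batch;

	assert(num_chunks > 0 && batch_size > 0);

	batch = band_offset / batch_size;		/* batch number within the band */
	pos.chunk = (uint32_t)(batch % num_chunks);	/* batches rotate across chunks */
	/* Full rotations completed so far, plus the position inside the batch. */
	pos.offset = (batch / num_chunks) * batch_size + band_offset % batch_size;
	return pos;
}
```

With 4 chunks and batches of 16 blocks, block 35 of the band belongs to the third batch, so it lands in the third chunk (index 2) at offset 3.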

		The address map and valid map are, along with several other things (e.g. UUID of the device it's
		part of, number of surfaced LBAs, band's sequence number, etc.), parts of the band's metadata. The
		metadata is split into two parts:
		* the head part, containing information already known when opening the band (device's UUID, band's
		sequence number, etc.), located at the beginning blocks of the band,
		* the tail part, containing the address map and the valid map, located at the end of the band.


		        head metadata              band's data                tail metadata
		    +-------------------+-------------------------------+----------------------+
		    |chk 1|...|chk n|...|...|chk 1|...|           | ... |  chk m-1  |  chk m   |
		    |lbk 1|   |lbk 1|   |   |lbk x|   |           |     |  lblk y   |  lblk y  |
		    +-------------------+-------------+-----------------+----------------------+


		Bands are written sequentially (in the way described earlier). Before a band can be
		written to, all of its chunks need to be erased. During that time, the band is considered to be in
		the `PREP` state. After that is done, the band transitions to the `OPENING` state, in which head
		metadata is being written. Then the band moves to the `OPEN` state and actual user data can be
		written to the band. Once the whole available space is filled, tail metadata is written and the band
		transitions to the `CLOSING` state. When that finishes, the band becomes `CLOSED`.
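
The band lifecycle above is a linear state machine. A minimal sketch of it follows; the `band_state` enum and both helper functions are hypothetical names that mirror the states named in the text, and reusing a `CLOSED` band (after it has been fully invalidated and erased again) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the band states named in the text. */
enum band_state {
	BAND_PREP,	/* chunks are being erased */
	BAND_OPENING,	/* head metadata is being written */
	BAND_OPEN,	/* user data can be written */
	BAND_CLOSING,	/* tail metadata is being written */
	BAND_CLOSED	/* tail metadata has been written */
};

/*
 * Advance a band to the next state of the write lifecycle.
 * Returns 0 on success, -1 if the band is already CLOSED.
 */
static int
band_advance(enum band_state *state)
{
	assert(state != NULL);

	if (*state == BAND_CLOSED) {
		return -1;
	}
	/* The enumerators are declared in lifecycle order. */
	*state = (enum band_state)(*state + 1);
	return 0;
}

/* Only a band in the OPEN state accepts user data. */
static int
band_accepts_user_writes(enum band_state state)
{
	return state == BAND_OPEN;
}
```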

		## Ring write buffer {#ftl_rwb}

		* Shorthand: RWB

		Because the smallest write size the SSD may support can be a multiple of block size, in order to
		support writes to a single block, the data needs to be buffered. The write buffer is the solution to
		this problem. It consists of a number of pre-allocated buffers called batches, each of size allowing
		for a single transfer to the SSD. A single batch is divided into block-sized buffer entries.

		                write buffer
		    +-----------------------------------+
		    |batch 1                            |
		    |   +-----------------------------+ |
		    |   |rwb    |rwb    | ... |rwb    | |
		    |   |entry 1|entry 2|     |entry n| |
		    |   +-----------------------------+ |
		    +-----------------------------------+
		    | ...                               |
		    +-----------------------------------+
		    |batch m                            |
		    |   +-----------------------------+ |
		    |   |rwb    |rwb    | ... |rwb    | |
		    |   |entry 1|entry 2|     |entry n| |
		    |   +-----------------------------+ |
		    +-----------------------------------+

		When a write is scheduled, it needs to acquire an entry for each of its blocks and copy the data
		onto this buffer. Once all blocks are copied, the write can be signalled as completed to the user.
		In the meantime, the `rwb` is polled for filled batches and, if one is found, it's sent to the SSD.
		After that operation is completed the whole batch can be freed. For the whole time the data is in
		the `rwb`, the L2P points at the buffer entry instead of a location on the SSD. This allows for
		servicing read requests from the buffer.
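
The acquire, poll and free cycle described above could be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: all names (`struct rwb`, `rwb_acquire`, `rwb_poll`, `rwb_batch_release`) and the sizes are hypothetical, the buffer is used from a single thread, and each call handles exactly one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE		4096	/* assumed logical block size */
#define ENTRIES_PER_BATCH	4	/* assumed number of blocks per SSD transfer */
#define NUM_BATCHES		2	/* assumed number of pre-allocated batches */

struct rwb_entry {
	uint64_t lba;			/* logical address cached by this entry */
	uint8_t data[BLOCK_SIZE];	/* copy of the user's block */
};

struct rwb_batch {
	size_t filled;			/* number of entries in use */
	int submitted;			/* set once the batch was handed to the SSD */
	struct rwb_entry entries[ENTRIES_PER_BATCH];
};

struct rwb {
	size_t cur;			/* batch currently being filled */
	struct rwb_batch batches[NUM_BATCHES];
};

/* Copy one block into the buffer. Returns the entry (the L2P would point at
 * it while the data is buffered), or NULL if no entry is available. */
static struct rwb_entry *
rwb_acquire(struct rwb *rwb, uint64_t lba, const void *data)
{
	struct rwb_batch *batch = &rwb->batches[rwb->cur];
	struct rwb_entry *entry;

	if (batch->filled == ENTRIES_PER_BATCH) {
		return NULL;	/* the ring wrapped onto a batch not yet freed */
	}
	entry = &batch->entries[batch->filled++];
	entry->lba = lba;
	memcpy(entry->data, data, BLOCK_SIZE);
	if (batch->filled == ENTRIES_PER_BATCH) {
		rwb->cur = (rwb->cur + 1) % NUM_BATCHES;
	}
	return entry;
}

/* Poll for a filled batch that has not been sent to the SSD yet. */
static struct rwb_batch *
rwb_poll(struct rwb *rwb)
{
	size_t i;

	for (i = 0; i < NUM_BATCHES; i++) {
		struct rwb_batch *batch = &rwb->batches[i];

		if (batch->filled == ENTRIES_PER_BATCH && !batch->submitted) {
			batch->submitted = 1;
			return batch;
		}
	}
	return NULL;
}

/* Free the whole batch once the SSD write has completed. */
static void
rwb_batch_release(struct rwb_batch *batch)
{
	batch->filled = 0;
	batch->submitted = 0;
}
```

When `rwb_acquire` returns `NULL`, the caller has to wait until an in-flight batch is released, which is how a full buffer applies back pressure to incoming writes in this sketch.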

		## Defragmentation and relocation {#ftl_reloc}

		* Shorthand: defrag, reloc

		Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
		band might contain old data that wastes space. As there is no way to overwrite an already
		written block, this data will stay there until the whole chunk is reset. This might create a
		situation in which all of the bands contain some valid data and no band can be erased, so no writes
		can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
		bands, so that they can be reused.

		                    band                                           band
		    +-----------------------------------+          +-----------------------------------+
		    | **    *   *   ***    *   ***  * * |          |                                   |
		    |**   *    *     *    *    *   *   *|  +---->  |                                   |
		    |*     ***      *        *     *    |          |                                   |
		    +-----------------------------------+          +-----------------------------------+

		Valid blocks are marked with an asterisk '\*'.

		Another reason for data relocation might be an event from the SSD telling us that the data might
		become corrupt if it's not relocated. This might happen due to its old age (if it was written a
		long time ago) or due to read disturb (a media characteristic that causes corruption of neighbouring
		blocks during a read operation).

		The module responsible for data relocation is called `reloc`. When a band is chosen for
		defragmentation or an ANM (asynchronous NAND management) event is received, the appropriate blocks
		are marked as required to be moved. The `reloc` module takes a band that has some such blocks
		marked, checks their validity and, if they're still valid, copies them.

		Choosing a band for defragmentation depends on several factors: its valid ratio (1) (proportion of
		valid blocks to all user blocks), its age (2) (when it was written) and its write count / wear level
		index of its chunks (3) (how many times the band was written to). The lower the ratio (1), the
		higher its age (2) and the lower its write count (3), the higher the chance the band will be chosen
		for defrag.
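
The text names the three factors and the direction in which each one pushes, but not how they are weighted. The sketch below therefore uses an arbitrary formula chosen only to respect those directions; `struct band_info`, `band_defrag_score` and `pick_defrag_band` are hypothetical names, and the scoring expression is an assumption rather than the library's heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-band statistics used for choosing a defrag candidate. */
struct band_info {
	uint64_t valid_blocks;	/* blocks still referenced by the L2P */
	uint64_t user_blocks;	/* all user blocks in the band */
	uint64_t age;		/* larger means written longer ago */
	uint64_t write_count;	/* how many times the band was written to */
};

/* Factor (1): proportion of valid blocks to all user blocks. */
static double
band_valid_ratio(const struct band_info *band)
{
	assert(band->user_blocks > 0);
	return (double)band->valid_blocks / (double)band->user_blocks;
}

/*
 * Assumed scoring formula: grows as the valid ratio drops, as the age
 * rises, and as the write count drops. The weighting is illustrative only.
 */
static double
band_defrag_score(const struct band_info *band)
{
	return (1.0 - band_valid_ratio(band)) * (1.0 + (double)band->age) /
	       (1.0 + (double)band->write_count);
}

/* Return the index of the best candidate, or -1 if there are no bands. */
static int
pick_defrag_band(const struct band_info *bands, size_t count)
{
	int best = -1;
	double best_score = -1.0;
	size_t i;

	for (i = 0; i < count; i++) {
		double score = band_defrag_score(&bands[i]);

		if (score > best_score) {
			best_score = score;
			best = (int)i;
		}
	}
	return best;
}
```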

doc/prog_guides.md

+1 −0

		@@ -5,3 +5,4 @@
		- @subpage bdev_pg
		- @subpage bdev_module
		- @subpage nvmf_tgt_pg
		- @subpage ftl

include/spdk/ftl.h

0 → 100644

+267 −0

		/*-
		* BSD LICENSE
		*
		* Copyright (c) Intel Corporation.
		* All rights reserved.
		*
		* Redistribution and use in source and binary forms, with or without
		* modification, are permitted provided that the following conditions
		* are met:
		*
		* * Redistributions of source code must retain the above copyright
		* notice, this list of conditions and the following disclaimer.
		* * Redistributions in binary form must reproduce the above copyright
		* notice, this list of conditions and the following disclaimer in
		* the documentation and/or other materials provided with the
		* distribution.
		* * Neither the name of Intel Corporation nor the names of its
		* contributors may be used to endorse or promote products derived
		* from this software without specific prior written permission.
		*
		* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
		* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
		* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
		* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
		* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
		* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
		* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
		* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
		* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
		* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
		* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
		*/

		#ifndef SPDK_FTL_H
		#define SPDK_FTL_H

		#include <spdk/stdinc.h>
		#include <spdk/nvme.h>
		#include <spdk/nvme_ocssd.h>
		#include <spdk/uuid.h>
		#include <spdk/thread.h>

		struct spdk_ftl_dev;

		/* Limit thresholds */
		enum {
			SPDK_FTL_LIMIT_CRIT,
			SPDK_FTL_LIMIT_HIGH,
			SPDK_FTL_LIMIT_LOW,
			SPDK_FTL_LIMIT_START,
			SPDK_FTL_LIMIT_MAX
		};

		struct spdk_ftl_limit {
			/* Threshold from which the limiting starts */
			size_t					thld;

			/* Limit percentage */
			size_t					limit;
		};

		struct spdk_ftl_conf {
			/* Number of reserved addresses not exposed to the user */
			size_t					lba_rsvd;

			/* Write buffer size */
			size_t					rwb_size;

			/* Threshold for opening new band */
			size_t					band_thld;

			/* Trace enabled flag */
			int					trace;

			/* Trace file name */
			const char				*trace_path;

			/* Maximum IO depth per band relocate */
			size_t					max_reloc_qdepth;

			/* Maximum active band relocates */
			size_t					max_active_relocs;

			/* IO pool size per user thread */
			size_t					user_io_pool_size;

			struct {
				/* Lowest percentage of invalid lbks for a band to be defragged */
				size_t				invalid_thld;

				/* User writes limits */
				struct spdk_ftl_limit		limits[SPDK_FTL_LIMIT_MAX];
			} defrag;
		};

		/* Range of parallel units (inclusive) */
		struct spdk_ftl_punit_range {
			unsigned int				begin;
			unsigned int				end;
		};

		enum spdk_ftl_mode {
			/* Create new device */
			SPDK_FTL_MODE_CREATE = (1 << 0),
		};

		struct spdk_ftl_dev_init_opts {
			/* NVMe controller */
			struct spdk_nvme_ctrlr			*ctrlr;
			/* Controller's transport ID */
			struct spdk_nvme_transport_id		trid;

			/* Thread responsible for core tasks execution */
			struct spdk_thread			*core_thread;
			/* Thread responsible for read requests */
			struct spdk_thread			*read_thread;

			/* Device's config */
			struct spdk_ftl_conf			*conf;
			/* Device's name */
			const char				*name;
			/* Parallel unit range */
			struct spdk_ftl_punit_range		range;
			/* Mode flags */
			unsigned int				mode;
			/* Device UUID (valid when restoring device from disk) */
			struct spdk_uuid			uuid;
		};

		struct spdk_ftl_attrs {
			/* Device's UUID */
			struct spdk_uuid			uuid;
			/* Parallel unit range */
			struct spdk_ftl_punit_range		range;
			/* Number of logical blocks */
			uint64_t				lbk_cnt;
			/* Logical block size */
			size_t					lbk_size;
		};

		struct ftl_module_init_opts {
			/* Thread on which to poll for ANM events */
			struct spdk_thread			*anm_thread;
		};

		typedef void (*spdk_ftl_fn)(void *, int);
		typedef void (*spdk_ftl_init_fn)(struct spdk_ftl_dev *, void *, int);

		/**
		* Initialize the FTL module.
		*
		* \param opts module configuration
		* \param cb callback function to call when the module is initialized
		* \param cb_arg callback's argument
		*
		* \return 0 if successfully started initialization, negative values if
		* resources could not be allocated.
		*/
		int spdk_ftl_module_init(const struct ftl_module_init_opts *opts, spdk_ftl_fn cb, void *cb_arg);

		/**
		* Deinitialize the FTL module. All FTL devices have to be unregistered prior to
		* calling this function.
		*
		* \param cb callback function to call when the deinitialization is completed
		* \param cb_arg callback's argument
		*
		* \return 0 if successfully scheduled deinitialization, negative errno
		* otherwise.
		*/
		int spdk_ftl_module_fini(spdk_ftl_fn cb, void *cb_arg);

		/**
		* Initialize the FTL on given NVMe device and parallel unit range.
		*
		* Covers the following:
		* - initialize and register NVMe ctrlr,
		* - retrieve geometry and check if the device has proper configuration,
		* - allocate buffers and resources,
		* - initialize internal structures,
		* - initialize internal thread(s),
		* - restore or create L2P table.
		*
		* \param opts configuration for new device
		* \param cb callback function to call when the device is created
		* \param cb_arg callback's argument
		*
		* \return 0 if initialization was started successfully, negative errno otherwise.
		*/
		int spdk_ftl_dev_init(const struct spdk_ftl_dev_init_opts *opts, spdk_ftl_init_fn cb, void *cb_arg);

		/**
		* Deinitialize and free given device.
		*
		* \param dev device
		* \param cb callback function to call when the device is freed
		* \param cb_arg callback's argument
		*
		* \return 0 if successfully scheduled free, negative errno otherwise.
		*/
		int spdk_ftl_dev_free(struct spdk_ftl_dev *dev, spdk_ftl_fn cb, void *cb_arg);

		/**
		* Initialize FTL configuration structure with default values.
		*
		* \param conf FTL configuration to initialize
		*/
		void spdk_ftl_conf_init_defaults(struct spdk_ftl_conf *conf);

		/**
		* Retrieve device’s attributes.
		*
		* \param dev device
		* \param attr Attribute structure to fill
		*
		* \return 0 if successfully initialized, negated EINVAL otherwise.
		*/
		int spdk_ftl_dev_get_attrs(const struct spdk_ftl_dev *dev, struct spdk_ftl_attrs *attr);

		/**
		* Submits a read to the specified device.
		*
		* \param dev Device
		* \param ch I/O channel
		* \param lba Starting LBA to read the data
		* \param lba_cnt Number of sectors to read
		* \param iov Single IO vector or pointer to IO vector table
		* \param iov_cnt Number of IO vectors
		* \param cb_fn Callback function to invoke when the I/O is completed
		* \param cb_arg Argument to pass to the callback function
		*
		* \return 0 if successfully submitted, negated EINVAL otherwise.
		*/
		int spdk_ftl_read(struct spdk_ftl_dev *dev, struct spdk_io_channel *ch, uint64_t lba,
				  size_t lba_cnt,
				  struct iovec *iov, size_t iov_cnt, spdk_ftl_fn cb_fn, void *cb_arg);

		/**
		* Submits a write to the specified device.
		*
		* \param dev Device
		* \param ch I/O channel
		* \param lba Starting LBA to write the data
		* \param lba_cnt Number of sectors to write
		* \param iov Single IO vector or pointer to IO vector table
		* \param iov_cnt Number of IO vectors
		* \param cb_fn Callback function to invoke when the I/O is completed
		* \param cb_arg Argument to pass to the callback function
		*
		* \return 0 if successfully submitted, negative values otherwise.
		*/
		int spdk_ftl_write(struct spdk_ftl_dev *dev, struct spdk_io_channel *ch, uint64_t lba,
				   size_t lba_cnt,
				   struct iovec *iov, size_t iov_cnt, spdk_ftl_fn cb_fn, void *cb_arg);

		/**
		* Submits a flush request to the specified device.
		*
		* \param dev device
		* \param cb_fn Callback function to invoke when all prior IOs have been completed
		* \param cb_arg Argument to pass to the callback function
		*
		* \return 0 if successfully submitted, negated EINVAL or ENOMEM otherwise.
		*/
		int spdk_ftl_flush(struct spdk_ftl_dev *dev, spdk_ftl_fn cb_fn, void *cb_arg);

		#endif /* SPDK_FTL_H */
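
Every asynchronous call in this header reports its result through a callback that receives the caller's `cb_arg` pointer and an `int`. The sketch below shows that pattern in isolation so it builds without SPDK. The `ftl_fn` typedef is declared locally with the same shape as `spdk_ftl_fn`, `mock_submit` is a hypothetical stand-in for a real submission call and completes inline, and treating the `int` as a status where 0 means success is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local typedef with the same shape as spdk_ftl_fn: (context, status). */
typedef void (*ftl_fn)(void *ctx, int status);

/* Hypothetical per-request context owned by the caller. */
struct io_ctx {
	int done;	/* set once the completion callback has run */
	int status;	/* status reported by the callback */
};

/* Completion callback: records the outcome in the caller's context. */
static void
io_done(void *ctx, int status)
{
	struct io_ctx *io = ctx;

	io->done = 1;
	io->status = status;
}

/*
 * Hypothetical stand-in for an asynchronous submission call. It validates
 * its arguments and, unlike a real device, completes the request inline.
 */
static int
mock_submit(ftl_fn cb_fn, void *cb_arg)
{
	if (cb_fn == NULL) {
		return -EINVAL;
	}
	cb_fn(cb_arg, 0);
	return 0;
}
```

The return value of the submission call only says whether the request was accepted; the outcome of the operation itself arrives later through the callback, so the context must stay alive until the callback has run.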

lib/ftl/ftl_core.h

0 → 100644

+434 −0

File added.

Preview size limit exceeded, changes collapsed.