Earlier, we created multiple threads to process admin commands to cuse
devices: one for each NVMe disk and one for each of its namespaces.
This resulted in a large number of threads on machines with multiple
NVMe disks. Moreover, most of these threads stay largely idle because
they only handle admin commands on their cuse device and are not in
the IO hot path. It therefore makes sense to consolidate all these
threads into a single thread that handles all the commands. This patch
does exactly that by creating a 'cuse_thread' to receive and process
admin commands to all the attached cuse devices.

Note that we have created two lists, one with pending sessions to be
polled on and the other with actively polled cuse sessions. The reason
for creating two lists is to avoid taking any locks in the poller loop
of cuse_thread(). Whenever a new session is to be polled on, we add the
session to 'g_pending_device_head' and generate an eventfd event by
writing to 'g_cuse_thread_msg_fd'. This event is observed by the
cuse_thread, which then takes the 'g_pending_device_mtx' lock and moves
the session from 'g_pending_device_head' to the 'g_active_device_head'
list. In this way we take the lock only when we need to add a new
session to be polled on. We use 'g_cuse_session_fdgrp' to poll on the
fds of all fuse sessions in the 'g_active_device_head' list.

List of field modifications in cuse_device:

1. pthread_t tid: This field is removed, as we now have a single
   thread polling for all admin commands.

2. bool force_exit: Older code ensured that fuse_session_reset() was
   invoked before cuse_lowlevel_teardown() by calling pthread_join() on
   that cuse_thread's tid during force destruction of the cuse device.
   This order is important because cuse_lowlevel_teardown() frees the
   fuse_session memory on which fuse_session_reset() operates. If this
   order is not maintained, we could hit a use-after-free error.
   To strictly follow the order, we have added a bool field
   'force_exit' to the cuse_device struct. When this bool is set to
   true, the onus to call cuse_lowlevel_teardown() is on the
   cuse_thread, where we maintain this order. This field also signals
   the cuse_thread to free the cuse_device, since the session exit has
   been done by the spdk library itself.

3. int fuse_efd: We need the fd of the fuse session in order to
   remove it from the fd group after the fuse session has exited. We
   can't use fuse_session_fd() once the session has exited, because
   the function definition states that it will return an undefined
   value. Hence we store the fd beforehand.

4. TAILQ_ENTRY(cuse_device) cuse_thread_tailq: Since we store the same
   struct in two different lists, we need a different tail link for
   each of the lists. The two lists are 'g_ctrlr_ctx_head' and either
   'g_pending_device_head' or 'g_active_device_head', depending on
   whether we have started polling on that session.

[Test Fix]
We also fix test issues opened up by this patch. They are:

1. In test_cuse_update() we invoked nvme_cuse_start() directly, which
   led to failures because helper data structures for cuse_thread were
   not initialised. This initialisation now takes place in the first
   call to spdk_nvme_cuse_register(). In the test code we now invoke
   the register function rather than directly starting the cuse
   device.

2. In test_nvme_cuse_stop() we created the spdk controller and
   multiple cuse devices associated with it, then stopped them by
   invoking nvme_cuse_stop(). We expected all the space allocated for
   cuse devices to be freed by the stop call, but this is no longer the
   case. Since we had not registered the devices by invoking
   spdk_nvme_cuse_register(), they were not polled by the cuse thread,
   which is now responsible for freeing the cuse_device memory. We fix
   this by registering the controller with spdk before stopping it.
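
For illustration, the pending/active handoff described in this message
can be sketched as follows. This is a minimal standalone sketch under
stated assumptions, not the code in this patch: every demo_* name, plus
g_pending, g_active, g_pending_mtx and g_msg_fd, is a hypothetical
stand-in for the cuse_device lists, 'g_pending_device_mtx' and
'g_cuse_thread_msg_fd', and a plain non-blocking eventfd read stands in
for the fd group callback.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/queue.h>
#include <unistd.h>

/* Hypothetical stand-in for cuse_device; only the list link matters here. */
struct demo_session {
	int id;
	/* Link used for the pending list OR the active list, never both. */
	TAILQ_ENTRY(demo_session) thread_tailq;
};

TAILQ_HEAD(demo_list, demo_session);

static struct demo_list g_pending = TAILQ_HEAD_INITIALIZER(g_pending);
static struct demo_list g_active = TAILQ_HEAD_INITIALIZER(g_active);
static pthread_mutex_t g_pending_mtx = PTHREAD_MUTEX_INITIALIZER;
static int g_msg_fd = -1;

/* Create the eventfd used to wake the poller. Returns 0 on success. */
static int
demo_init(void)
{
	g_msg_fd = eventfd(0, EFD_NONBLOCK);
	return g_msg_fd < 0 ? -1 : 0;
}

/* Producer side: queue a session under the lock, then signal the poller. */
static int
demo_enqueue(struct demo_session *s)
{
	uint64_t one = 1;

	pthread_mutex_lock(&g_pending_mtx);
	TAILQ_INSERT_TAIL(&g_pending, s, thread_tailq);
	pthread_mutex_unlock(&g_pending_mtx);

	return write(g_msg_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Poller side: meant to run only when g_msg_fd is readable, so the lock
 * is taken only when there is something to move. Returns the number of
 * sessions moved from the pending list to the active list.
 */
static int
demo_drain_pending(void)
{
	struct demo_session *s;
	uint64_t count;
	int moved = 0;

	/* Reading resets the eventfd counter; a failed read means no signal. */
	if (read(g_msg_fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
		return 0;
	}

	pthread_mutex_lock(&g_pending_mtx);
	while ((s = TAILQ_FIRST(&g_pending)) != NULL) {
		TAILQ_REMOVE(&g_pending, s, thread_tailq);
		TAILQ_INSERT_TAIL(&g_active, s, thread_tailq);
		moved++;
	}
	pthread_mutex_unlock(&g_pending_mtx);

	return moved;
}
```

In this sketch the poller touches the mutex only inside
demo_drain_pending(), which does work only after the eventfd has been
written, so walking the active list on every poll iteration needs no
lock.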

We also now spin on g_device_fdgrp before ending the test to ensure
that all resources of the cuse_thread have been freed.

Change-Id: I0c1f5d57841ef670ba407cf4f08c3bbbd1bcf78a
Signed-off-by: Yash Raj Singh <yash.rajsingh@nutanix.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/21593
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Vasilii Ivanov <ivanov.vas@xinnor.io>