Transmission Control in NB-IoT With Model-Based Reinforcement Learning

In Narrowband Internet of Things (NB-IoT), the control of uplink transmissions is a complex task involving device scheduling, resource allocation in the carrier, and the configuration of link-adaptation parameters. Existing heuristic proposals partially address the problem, but reinforcement learning (RL) is, a priori, the most promising approach, given its success in similar control problems. However, the low sample efficiency of conventional (model-free) RL algorithms is an important limitation for their deployment in real systems. During their initial learning stages, RL agents need to explore the policy space, selecting actions that are, in general, highly ineffective. In an NB-IoT access network this implies a disproportionate increase in transmission delays. In this paper, we make two contributions to enable the adoption of RL in NB-IoT. First, we present a multi-agent architecture based on the principle of task division. Second, we propose a new model-based RL algorithm for link adaptation characterized by its high sample efficiency. The combination of these two strategies results in an algorithm that, during the learning phase, is able to maintain the transmission delay on the order of hundreds of milliseconds, whereas model-free RL algorithms cause delays of up to several seconds. This allows our approach to be deployed, without prior training, in an operating NB-IoT network and learn to control it efficiently without degrading its performance.
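The core idea of a sample-efficient, model-based link-adaptation agent can be sketched as follows. This is a minimal illustration we wrote, not the paper's algorithm: the agent maintains an empirical model of transmission success per (channel state, MCS) pair and plans greedily on that model to maximize expected throughput. All class and parameter names are hypothetical.

```python
class ModelBasedLinkAdapter:
    """Toy model-based RL agent for link adaptation (illustrative only).

    The agent estimates, per (channel_state, mcs) pair, the empirical
    success probability of a transmission, then greedily selects the MCS
    that maximizes expected throughput under the learned model. Because
    every observed outcome refines the model directly, far fewer samples
    are wasted on ineffective actions than in model-free exploration.
    """

    def __init__(self, n_states, mcs_rates):
        self.mcs_rates = mcs_rates  # nominal bits per transmission for each MCS
        # success/attempt counters of the learned model, with a Laplace prior
        self.succ = [[1] * len(mcs_rates) for _ in range(n_states)]
        self.att = [[2] * len(mcs_rates) for _ in range(n_states)]

    def update(self, state, mcs, success):
        """Feed one observed transmission outcome into the model."""
        self.att[state][mcs] += 1
        self.succ[state][mcs] += int(success)

    def select_mcs(self, state):
        """Plan greedily on the learned model: maximize expected throughput."""
        expected = [self.succ[state][m] / self.att[state][m] * rate
                    for m, rate in enumerate(self.mcs_rates)]
        return max(range(len(self.mcs_rates)), key=expected.__getitem__)
```

After a few dozen observed outcomes per state, `select_mcs` already avoids MCS values that fail in poor channel conditions, which is the behavior that keeps delays bounded during learning.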

Dynamic Multihop Routing in Terahertz Flow-Guided Nanosensor Networks: A Reinforcement Learning Approach

The Internet of Nano-Things (IoNT) is an emerging paradigm in which devices sized to the nanoscale (nanonodes) and transmitting in the terahertz (THz) band can become decisive actors in future medical applications. Flow-guided nanonetworks are well-known THz networks aimed at deploying the IoNT inside the human body, among other uses. In these networks, nanonodes flowing through the bloodstream monitor sensitive biological/physical parameters and dispatch these data via electromagnetic (EM) waves to a nanorouter implanted in human tissue, which operates as a gateway to external Internet connectivity devices. Under these premises, two shortcomings arise. First, the use of the THz band greatly limits the nanonode's communication range. Second, the nanonodes lack resources for processing, memory, and batteries. To minimize the impact of these concerns on EM nanocommunications, a novel dynamic multihop routing scheme is proposed to model the in-body, flow-guided nanonetwork architecture. To this end, a reinforcement learning-based framework is conceived, combining the features of EM nanocommunications with hemodynamics, i.e., fluid dynamics applied to the bloodstream. A generic Markov decision process (MDP) approach is derived to maximize the throughput metric, analytically modeling: 1) the movement of the nanonodes in the bloodstream as laminar flow; 2) energy consumption (including energy-harvesting issues); and 3) prioritized events. A thorough THz flow-guided nanonetwork case study is also defined. Under the umbrella of this case, diverse testbeds are planned to create a procedure of evaluation, validation, and discussion. Results reveal that multihop scenarios obtain better performance than direct nanonode-nanorouter communication, specifically the two-hop scenario, which, for instance, quadrupled the throughput in a hand vein without sharply penalizing other aspects such as energy consumption.
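The routing decision described above, relaying through an intermediate hop versus transmitting directly to the nanorouter, can be illustrated with a generic MDP solver. The two-state example below is entirely invented (states, transition probabilities, and throughput/energy rewards are ours, not the paper's model), but it shows the kind of formulation an MDP-based routing framework optimizes.

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Generic MDP solver via value iteration.

    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the
    expected one-step reward (e.g., delivered bits minus an energy cost).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in actions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

def greedy_policy(states, actions, P, R, V, gamma=0.9):
    """Extract the throughput-maximizing action in each state."""
    return {s: max(actions[s],
                   key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                       for p, s2 in P[s][a]))
            for s in states}
```

With a low direct-transmission reward when the node is far from the nanorouter and a small energy cost for relaying, the resulting policy relays when far and transmits directly when near, mirroring the multihop-beats-direct result the abstract reports.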

Bridging Nano- and Body Area Networks: A Full Architecture for Cardiovascular Health Applications

Cardiovascular events occurring in the bloodstream are responsible for about 40% of human deaths in developed countries. Motivated by this fact, we present a new global network architecture for the diagnosis and treatment of cardiovascular events, focusing on problems related to pulmonary artery occlusion, i.e., situations of artery blockage by a blood clot. The proposed system is based on bio-sensors for detection of artery blockage and bio-actuators for releasing appropriate medicines, both types of devices being implanted in pulmonary arteries. The system can be used by a person leading an active life and provides bidirectional communication with medical personnel via nano-nodes circulating in the bloodstream, constituting an in-body area network. We derive an analytical model for calculating the required number of nano-nodes to detect artery blockage and the probability of activating a bio-actuator. We also analyze the performance of the body area component of the system in terms of path loss and wireless link budget. Results show that the system can diagnose a blocked artery in about 3 hours and that after another 3 hours medicines can be released in the exact spot of the artery occlusion, while with current medical practices the average time for diagnosis varies from 5 to 9 days.
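The flavor of the analytical model mentioned above can be conveyed with a simple sketch of our own devising (not the paper's actual derivation): if each circulation pass of each nano-node past the occlusion is treated as an independent Bernoulli trial with detection probability p, the probability that at least one node detects the blockage within a time budget follows directly, and the required number of nodes for a target confidence can be solved in closed form. All numeric parameters below (per-pass probability, circulation period) are illustrative assumptions.

```python
import math

def detection_probability(n_nodes, p_pass, t_hours, cycle_minutes=1.0):
    """P(at least one of n_nodes detects the occlusion within t_hours),
    modeling each circulation pass as an independent Bernoulli trial.
    cycle_minutes is the assumed blood-circulation period per pass."""
    passes_per_node = (t_hours * 60.0) / cycle_minutes
    return 1.0 - (1.0 - p_pass) ** (n_nodes * passes_per_node)

def nodes_for_target(p_target, p_pass, t_hours, cycle_minutes=1.0):
    """Smallest number of nano-nodes achieving a target detection
    probability within the time budget (inverting the formula above)."""
    passes_per_node = (t_hours * 60.0) / cycle_minutes
    return math.ceil(math.log(1.0 - p_target)
                     / (passes_per_node * math.log(1.0 - p_pass)))
```

Under this kind of model, a target such as "99% detection within 3 hours" directly yields the required fleet size, which is the trade-off the abstract's 3-hour diagnosis figure reflects.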

Model-Based Reinforcement Learning with Kernels for Resource Allocation in RAN Slices

This paper addresses the dynamic allocation of RAN resources among network slices, aiming at maximizing resource efficiency while assuring the fulfillment of the service level agreements (SLAs) for each slice. It is a challenging stochastic control problem, since slices are characterized by multiple random variables and several objectives must be managed in parallel. Moreover, coexisting slices can have different descriptors and behaviors according to their type of service (e.g., enhanced mobile broadband, eMBB, or massive machine type communication, mMTC). Most of the existing proposals for this problem use a model-free RL (MFRL) strategy. The main drawback of MFRL algorithms is their low sample efficiency which, in an online learning scenario (i.e., when the agents learn on an operating network), may lead to long periods of resource over-provisioning and frequent SLA violations. To overcome this limitation, we follow a model-based RL (MBRL) approach built upon a novel modeling strategy that comprises a kernel-based classifier and a self-assessment mechanism. In numerical experiments, our proposal, referred to as kernel-based RL (KBRL), clearly outperforms state-of-the-art RL algorithms in terms of SLA fulfillment, resource efficiency, and computational overhead.
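A simplified stand-in for the kernel-based modeling idea can be sketched as follows. This is our own minimal illustration, not KBRL itself: a kernel smoother over past (load, allocation) observations predicts the probability of an SLA violation, and the agent plans on that model by choosing the cheapest allocation whose predicted risk stays within budget. All names and thresholds are assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    """Radial-basis-function kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

class KernelSLAClassifier:
    """Kernel-weighted classifier of SLA violations (illustrative only)."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.samples = []  # list of (features, violated) observations

    def observe(self, features, violated):
        self.samples.append((features, 1.0 if violated else 0.0))

    def violation_prob(self, features):
        """Kernel-smoothed estimate of P(SLA violation | features)."""
        if not self.samples:
            return 1.0  # pessimistic prior before any data
        w = [rbf(features, f, self.gamma) for f, _ in self.samples]
        return sum(wi * v for wi, (_, v) in zip(w, self.samples)) / sum(w)

    def min_safe_allocation(self, load, allocations, risk=0.05):
        """Model-based planning: cheapest allocation whose predicted
        violation probability is within the risk budget."""
        for a in sorted(allocations):
            if self.violation_prob((load, a)) <= risk:
                return a
        return max(allocations)
```

The point of such a model is that every observed slice outcome immediately tightens the allocation decision, avoiding the long over-provisioning periods that model-free exploration incurs.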

Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow’s Intersections

In recent years, the growing development of Connected Autonomous Vehicles (CAV), Intelligent Transport Systems (ITS), and 5G communication networks has led to the advent of Autonomous Intersection Management (AIM) systems. AIMs present a new paradigm for CAV control in future cities, taking control of CAVs in scenarios where cooperation is necessary and allowing safe and efficient traffic flows, eliminating traffic signals. So far, the development of AIM algorithms has been based on basic control algorithms, without the ability to adapt to or keep learning from new situations. To solve this, in this paper we present a new advanced AIM approach based on end-to-end Multi-Agent Deep Reinforcement Learning (MADRL) and trained using Curriculum through Self-Play, called advanced Reinforced AIM (adv.RAIM). adv.RAIM enables the control of CAVs at intersections in a collaborative way, autonomously learning complex real-life traffic dynamics. In addition, adv.RAIM provides a new way to build smarter AIMs capable of proactively controlling CAVs in other highly complex scenarios. Results show remarkable improvements when compared to traffic light control techniques (reducing travel time by 59% or reducing time lost due to congestion by 95%), as well as outperforming other recently proposed AIMs (reducing waiting time by 56%), highlighting the advantages of using MADRL.
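The essence of learning signal-free intersection control can be conveyed with a deliberately tiny sketch, entirely ours and far simpler than adv.RAIM: two CAVs meet at a conflicting intersection cell, and a centrally trained controller learns, via one-step Q-learning over joint actions, that exactly one vehicle should cross at a time. The reward values (collision penalty, progress bonus, waiting cost) are invented for illustration.

```python
import random

def train_joint_controller(episodes=2000, alpha=0.2, epsilon=0.2, seed=0):
    """Learn a joint go/yield policy for two CAVs at a conflict point.

    One-step Q-learning over the joint action space, with epsilon-greedy
    exploration; rewards are illustrative placeholders.
    """
    rng = random.Random(seed)
    actions = [('go', 'go'), ('go', 'yield'), ('yield', 'go'), ('yield', 'yield')]
    reward = {('go', 'go'): -10.0,       # both enter the cell: collision
              ('go', 'yield'): 1.0,      # one crosses, one waits briefly
              ('yield', 'go'): 1.0,
              ('yield', 'yield'): -0.2}  # both wait: time lost to congestion
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        a = rng.choice(actions) if rng.random() < epsilon else max(q, key=q.get)
        q[a] += alpha * (reward[a] - q[a])  # incremental value update
    return max(q, key=q.get)  # best learned joint action
```

Even this bandit-sized example learns the one-crosses-one-yields coordination pattern; scaling the same principle to many vehicles, continuous dynamics, and curriculum self-play is what the full MADRL approach addresses.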