Reinforcement learning (RL) has the potential to be applied to many real-world applications. In our research, we investigate applications of RL and multi-agent RL. Currently, we focus on two applications: traffic signal control and electronic design automation (EDA). Traffic signals that coordinate traffic movements are key to transportation efficiency. However, conventional traffic signal control relies heavily on pre-defined rules and assumptions about traffic conditions and is far from intelligent. RL, which learns by directly interacting with the environment, has great potential for traffic signal control in building smart cities. EDA involves many combinatorial optimization problems. Many of them are currently solved by heuristics whose results are often far from optimal, and how to achieve better performance has been a long-standing problem. RL, which aims to optimize the long-term return, naturally fits many problems in EDA, so we also pay attention to this domain. In the following, we introduce some of our studies; for details, please refer to the papers.
HiLight
The objective of traffic signal control is to optimize average travel time, which is a delayed reward over a long time horizon in the context of RL. However, existing work simplifies the optimization by using queue length, waiting time, delay, etc., as immediate rewards and presumes these short-term targets are always aligned with the objective. Nevertheless, these targets may deviate from the objective in different road networks with various traffic patterns. Moreover, it remains unsolved how to cooperatively control traffic signals to directly optimize average travel time. To address these challenges, we propose a hierarchical and cooperative reinforcement learning method, HiLight. HiLight enables each agent to learn a high-level policy that optimizes the objective locally by selecting among sub-policies that respectively optimize short-term targets. In addition, the high-level policy considers the objective in the neighborhood with adaptive weighting to encourage agents to cooperate on the objective across the road network. Empirically, we demonstrate that HiLight outperforms state-of-the-art RL methods for traffic signal control on real road networks with real traffic.
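The hierarchical control loop can be pictured with a small sketch. The snippet below is only an illustrative toy, not the HiLight implementation: the class names `SubPolicy` and `HighLevelPolicy`, the tabular Q-learning updates, and the fixed list of short-term targets are assumptions made for illustration. It shows the two-level structure of a high-level policy choosing among target-specific sub-policies, with a high-level reward that mixes the local objective and the neighborhood objective through an adaptive weight.

```python
# Toy sketch (not the HiLight implementation) of one intersection agent:
# a high-level policy selects which short-term sub-policy controls the
# next interval; its reward mixes local and neighborhood objectives.
import numpy as np

rng = np.random.default_rng(0)

N_PHASES = 4                                          # signal phases
TARGETS = ["queue_length", "waiting_time", "delay"]   # short-term targets


class SubPolicy:
    """Toy epsilon-greedy tabular policy optimizing one short-term target."""
    def __init__(self, n_states, n_actions):
        self.q = np.zeros((n_states, n_actions))

    def act(self, state, eps=0.1):
        if rng.random() < eps:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next, alpha=0.1, gamma=0.95):
        td_target = r + gamma * self.q[s_next].max()
        self.q[s, a] += alpha * (td_target - self.q[s, a])


class HighLevelPolicy:
    """Chooses which sub-policy controls the next interval; its reward is
    the local travel-time objective mixed with the neighborhood objective."""
    def __init__(self, n_states, n_subpolicies):
        self.q = np.zeros((n_states, n_subpolicies))

    def act(self, state, eps=0.1):
        if rng.random() < eps:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, s, option, local_obj, neighbor_obj, weight,
               s_next, alpha=0.1, gamma=0.95):
        # Adaptive weighting: blend the agent's own objective with the
        # objective observed in its neighborhood.
        r = (1.0 - weight) * local_obj + weight * neighbor_obj
        td_target = r + gamma * self.q[s_next].max()
        self.q[s, option] += alpha * (td_target - self.q[s, option])


# Toy usage: one agent with a discretized local state space of size 16.
agent_high = HighLevelPolicy(n_states=16, n_subpolicies=len(TARGETS))
agent_subs = [SubPolicy(n_states=16, n_actions=N_PHASES) for _ in TARGETS]
```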
Net Order Exploration in Detailed Routing
The net orders in detailed routing are crucial to routing closure, especially for modern routers that follow the sequential routing paradigm with a rip-up and reroute scheme. In advanced technology nodes, detailed routing has to deal with complicated design rules and large problem sizes, making its performance more sensitive to the order of the nets to be routed. In the literature, net orders are mostly determined by simple heuristic rules tuned for specific benchmarks. We propose an asynchronous reinforcement learning (RL) framework to automatically search for ordering strategies. By asynchronously querying the router and training the RL agents, we generate high-performance routing sequences that achieve better solution quality.
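As a rough illustration of the asynchronous query-and-train idea, the sketch below (not the framework from the paper) uses Python threads: worker threads sample candidate net orders from a score table and "query" a stand-in evaluator, while a trainer thread consumes the resulting (order, reward) pairs and updates the scores with a baseline-corrected update. The function `evaluate_order`, the score-table policy, and all constants are hypothetical placeholders; in practice the query would invoke the actual detailed router.

```python
# Toy sketch of asynchronous querying and training for net ordering.
import queue
import random
import threading

NETS = list(range(8))          # toy net IDs
results = queue.Queue()        # (order, reward) pairs from workers


def evaluate_order(order):
    # Placeholder for a detailed-router call; this toy reward simply
    # prefers orders close to ascending net ID.
    return -sum(abs(pos - n) for pos, n in enumerate(order))


def worker(scores, lock, n_rollouts=50):
    for _ in range(n_rollouts):
        with lock:
            # Sample an order tilted toward higher-scored nets first
            # (a noisy stand-in for a learned ordering policy).
            order = sorted(NETS, key=lambda n: -(scores[n] + random.random()))
        reward = evaluate_order(order)       # asynchronous "router" query
        results.put((order, reward))


def trainer(scores, lock, n_updates=200, lr=0.05):
    baseline = 0.0
    for step in range(n_updates):
        order, reward = results.get()
        baseline += (reward - baseline) / (step + 1)   # running mean reward
        adv = reward - baseline
        with lock:
            # Nets placed early in above-average orders get higher scores.
            for pos, n in enumerate(order):
                scores[n] += lr * adv * (len(order) - pos)


scores = {n: 0.0 for n in NETS}
lock = threading.Lock()
workers = [threading.Thread(target=worker, args=(scores, lock)) for _ in range(4)]
train_thread = threading.Thread(target=trainer, args=(scores, lock))
for th in workers:
    th.start()
train_thread.start()
for th in workers:
    th.join()
train_thread.join()
print(sorted(NETS, key=lambda n: -scores[n]))   # learned net ordering
```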