ServiceNow Research

Learning Heuristics for the TSP by Policy Gradient


The aim of the study is to provide interesting insights on how efficient machine learning algorithms could be adapted to solve com- binatorial optimization problems in conjunction with existing heuristic procedures. More specifically, we extend the neural combinatorial opti- mization framework to solve the traveling salesman problem (TSP). In this framework, the city coordinates are used as inputs and the neural network is trained using reinforcement learning to predict a distribution over city permutations. Our proposed framework differs from the one in [1] since we do not make use of the Long Short-Term Memory (LSTM) architecture and we opted to design our own critic to compute a baseline for the tour length which results in more efficient learning. More impor- tantly, we further enhance the solution approach with the well-known 2-opt heuristic. The results show that the performance of the proposed framework alone is generally as good as high performance heuristics (OR- Tools). When the framework is equipped with a simple 2-opt procedure, it could outperform such heuristics and achieve close to optimal results on 2D Euclidean graphs. This demonstrates that our approach based on machine learning techniques could learn good heuristics which, once being enhanced with a simple local search, yield promising results.

International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research
Alexandre Lacoste
Alexandre Lacoste
Research Scientist

Research Scientist at Human Decision Support located at Montreal, QC, Canada.