Our new PhD student Johan started on Nov 1st. He is an industrial PhD student from Kairos Logic, a company with people who were previously at Sony. Johan will sit on the 4th floor of the E building. He will be working on the Vehicle Routing Problem for warehouses. The idea is to reduce planning time using learning and simulation.
Johan is a new PhD student in our group. He is an industrial PhD student, funded through the Wallenberg Autonomous Systems program (WASP).
Abstract of the planned thesis work
As in many projects related to autonomous systems, the scientific challenge is derived from a use case that aims to exploit the possibilities of autonomous, intelligent agents.
The use case in this proposal is material handling for customer orders in a warehouse similar to the fulfillment centers of Amazon. The goal of this thesis is the development of a high-level planning capability that allows a fleet of autonomous agents to handle customer orders in the warehouse efficiently. However, unlike Amazon’s Kiva system, our autonomous agents are envisioned not to move the shelves but to visit the various pick locations. Such situations are common in, e.g., kitting applications in lean production and can be found, e.g., at car makers like PSA or Volvo.
The problem is often formalized as a Vehicle Routing Problem (VRP), which generalizes the Travelling Salesman Problem (TSP) to several agents. Both TSP and VRP are known to be NP-hard. In the warehouse setting, there are additional challenges and complications, such as:
• Orders usually consist of several parts, and an order should generally be completed by a single agent.
• Orders keep coming in dynamically and have to be distributed to the available robots.
• Orders might come with varying levels of urgency.
• Agents can have different capacities.
• Agents can block each other and must comply with right-of-way rules.
We call this the “dynamic warehouse VRP”.
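To make the ingredients listed above concrete, the following is a minimal Python sketch of how orders and agents could be represented, together with a simple greedy dispatch baseline. It is not the formalization the project will develop. All names (`Order`, `Agent`, `dispatch`), the grid coordinates, the Manhattan distance and the rule that an agent's capacity must cover a whole order are assumptions made purely for illustration. Blocking and right-of-way rules are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (x, y) grid cell; an assumed warehouse layout


@dataclass
class Order:
    order_id: int
    picks: List[Location]  # all pick locations that belong to this order
    urgency: int = 0       # higher value means more urgent


@dataclass
class Agent:
    agent_id: int
    position: Location
    capacity: int          # assumed: max number of parts the agent can carry
    route: List[Location] = field(default_factory=list)


def manhattan(a: Location, b: Location) -> int:
    """Travel distance on a grid without obstacles (a simplification)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def route_length(start: Location, stops: List[Location]) -> int:
    """Total distance when visiting the stops in the given order."""
    total, here = 0, start
    for stop in stops:
        total += manhattan(here, stop)
        here = stop
    return total


def dispatch(orders: List[Order], agents: List[Agent]) -> Dict[int, int]:
    """Greedy baseline: handle the most urgent orders first and give each
    order, as a whole, to the capable agent with the least added travel."""
    assignment: Dict[int, int] = {}
    for order in sorted(orders, key=lambda o: -o.urgency):
        capable = [a for a in agents if a.capacity >= len(order.picks)]
        if not capable:
            continue  # no agent can carry this order; it stays unassigned

        def added_cost(agent: Agent) -> int:
            tail = agent.route[-1] if agent.route else agent.position
            return route_length(tail, order.picks)

        best = min(capable, key=added_cost)
        best.route.extend(order.picks)
        assignment[order.order_id] = best.agent_id
    return assignment
```

Calling `dispatch` again whenever new orders arrive would give a crude treatment of the dynamic case; a baseline of this kind is mainly useful as a reference point for more advanced approaches.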
In this project, we want to explore a reinforcement learning approach to the problem, in which the robots decide on the fly which item to collect next, based on the situation at hand and on a previously trained set of policies. In detail, we will
1. formalize the VRP in a way that allows a proper comparison with other approaches
2. collect and share data sets, real and simulated, that make it possible to benchmark, verify and validate approaches
3. develop our own optimization approach based on recent advances in deep reinforcement learning, adapted to the warehouse setting
4. validate and verify the approach on a 70 m² mockup with 100 small robots.
The project will be executed in an agile fashion with regular evaluations, and the approaches will be tested in simulated settings of real warehouses and possibly in a real warehouse as well.
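The idea of a trained policy that decides which item to collect next can be illustrated with a deliberately tiny example. The sketch below uses tabular Q-learning, not the deep reinforcement learning the project plans to build on, for a single agent in one aisle. The pick locations in `ITEMS`, the reward (negative travel distance) and all hyperparameters are made-up assumptions for illustration only.

```python
import random

START = 0
ITEMS = (1, 2, 5)  # pick locations along one aisle (toy data, assumed)


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning over states (position, remaining items)."""
    rng = random.Random(seed)
    q = {}  # (position, remaining items, next item) -> estimated return
    for _ in range(episodes):
        pos, remaining = START, frozenset(ITEMS)
        while remaining:
            actions = sorted(remaining)
            if rng.random() < epsilon:
                item = rng.choice(actions)  # explore a random next item
            else:
                # exploit: next item with the best current estimate
                item = max(actions,
                           key=lambda a: q.get((pos, remaining, a), 0.0))
            reward = -abs(item - pos)  # travel cost as negative reward
            nxt = remaining - {item}
            # best estimated return from the successor state (0 if done)
            future = max((q.get((item, nxt, a), 0.0) for a in nxt),
                         default=0.0)
            old = q.get((pos, remaining, item), 0.0)
            q[(pos, remaining, item)] = old + alpha * (
                reward + gamma * future - old)
            pos, remaining = item, nxt
    return q


def greedy_order(q):
    """Roll out the learned policy without exploration."""
    pos, remaining, order = START, frozenset(ITEMS), []
    while remaining:
        item = max(sorted(remaining),
                   key=lambda a: q.get((pos, remaining, a), 0.0))
        order.append(item)
        pos, remaining = item, remaining - {item}
    return order
```

After training, `greedy_order(train())` returns the pick sequence the learned table prefers. A table of this kind grows exponentially with the number of items, which is one reason function approximation is attractive for realistic warehouse sizes.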