To understand the surrounding environment in real time, a central component to the robot is an object detection module — a program that inputs images and returns a list of object boxes. An image is a large three-dimensional array consisting of a myriad of numbers representing pixel intensities. These values change significantly when the image is taken at night instead of daytime; when the object’s color, scale or position changes, or when the object itself is truncated or occluded. In the robot software, there are sets of trainable units, mostly neural networks, where the code is written by the model itself. The program is represented by a set of weights. At first, these numbers are randomly initialized, and the program’s output is random as well. The engineers present the model examples of what they would like to predict and ask the network to get better the next time it sees a similar input. By iteratively changing the weights, the optimization algorithm searches for programs that predict bounding boxes more and more accurately.