How Computer Vision Works for Self-Driving Cars

It looks like self-driving cars will be chauffeuring humans around in the foreseeable future.

Think about it: in the next couple of decades, all you may need to do is hop into a car, input an address, and chillax. We’re moving towards a world where human drivers are obsolete and vehicles do all the driving for us. Automation frees up time, so you can do all sorts of tasks while riding in a car.

Self-driving cars could also make the roads a lot safer.

Did you know that around 33,000 Americans die in car crashes every year? It’s a sad but true fact. Computers in cars promise more precise driving and could remove much of the human error behind those fatalities. With drunk and unskilled drivers no longer behind the wheel, authorities would have far fewer wrecks to sift through for a VIN check.

The question now is, how can self-driving cars do all this? How can they “see”?

Computer Vision

Autonomous vehicles typically have several cameras installed in strategic locations. Together, these cameras give the car a 360-degree view of its surroundings. For instance, Tesla installs eight “surround cameras” on its vehicles for all-around visibility at a range of up to 250 meters.

Cameras mounted on self-driving cars are crucial for the following tasks:

  • Detecting and classifying traffic signs
  • Estimating road curvature
  • Detecting and classifying obstacles
  • Finding its proper lane
  • Detecting pedestrians for an emergency stop

In short, a self-driving car “sees” the world around it by using computer vision, or “perception,” as it is often called. However, cameras aren’t the only tool a self-driving car uses to perceive its environment. Autonomous vehicles also build a “digital map” of their surroundings using other sensors such as lidar (laser scanners) and radar.

Camera Tasks

For a self-driving car to understand the world around it, the software behind its cameras relies on object detection. This task is a two-part process: image classification and image localization.

Image Classification

This task is all about determining what the objects in the image are, such as a person or a vehicle. Image classification works by training a neural network on many labeled examples until it can recognize different objects. This kind of training is called “deep learning,” which we’ll talk about in a bit. Objects a car has to recognize every day include traffic lights, pedestrians, and other road obstacles.
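To make the idea concrete, here is a minimal sketch of the last step of a classifier: turning the raw scores a network produces into a single label. The label list and the scores are made up for illustration, and a real perception system would get its scores from a trained network rather than a hard-coded list.

```python
import math

# Hypothetical class labels; a real perception stack would define its own.
LABELS = ["pedestrian", "vehicle", "traffic_light", "stop_sign"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels=LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Made-up scores standing in for the output of a trained network.
label, confidence = classify([0.3, 2.9, 0.1, -1.2])
print(label, round(confidence, 2))
```

The highest score wins, and the probability gives the driving system a rough sense of how sure the classifier is.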

Image Localization

This process provides the specific location of these objects (pedestrians, traffic lights) within the camera’s field of view, typically as a box drawn around each object.
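As a rough sketch, a located object can be represented as a label plus a rectangle in pixel coordinates. The Detection class and its field names below are hypothetical, and the numbers are invented. The iou function computes intersection over union, a common way to measure how closely a predicted box matches a hand-labeled one.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """A hypothetical detection: a label and a box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self):
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height

def iou(a, b):
    """Intersection over union: 0 for disjoint boxes, 1 for identical ones."""
    overlap_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    overlap_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    union = a.area() + b.area() - intersection
    return intersection / union

# Invented boxes: what the network predicted vs. what a human labeled.
predicted = Detection("pedestrian", 100, 50, 200, 250)
labeled = Detection("pedestrian", 110, 60, 210, 260)
print(round(iou(predicted, labeled), 3))
```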

The computer needs to do all this fast enough to send the results to the driving system in time for split-second decisions. Suffice it to say, the results have to be both fast and accurate.

Deep Learning

Deep neural networks learn from data and are the dominant approach when it comes to working with camera images and video. For instance, to teach a self-driving car what a stop sign looks like, manufacturers “feed” it thousands of stop sign images. This approach allows the computer to “learn” by gradually adjusting the network’s internal parameters until it recognizes the sign reliably.
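The sketch below shows that learning loop at its smallest scale. It is not a deep network: it is a single artificial neuron (logistic regression) trained on two invented features per image, "redness" and a normalized corner count, both assumptions made purely for illustration. The principle is the same one deep networks apply across millions of parameters: make a prediction, measure the error, and nudge the weights to reduce it.

```python
import math

# Toy data: (redness, corner_count / 8) with label 1 = stop sign, 0 = not.
# Features and values are invented for illustration only.
EXAMPLES = [
    ((0.9, 1.0), 1), ((0.8, 1.0), 1), ((0.95, 0.9), 1),
    ((0.2, 0.5), 0), ((0.1, 0.4), 0), ((0.7, 0.4), 0),
]

def predict(weights, bias, features):
    """Return the estimated probability that the features show a stop sign."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, learning_rate=0.5):
    """Repeatedly adjust the weights to shrink the prediction error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, target in examples:
            error = predict(weights, bias, features) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(EXAMPLES)
print(predict(weights, bias, (0.85, 1.0)) > 0.5)  # red, eight corners
print(predict(weights, bias, (0.15, 0.5)) > 0.5)  # not red, four corners
```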

Deep neural networks are usually trained and run on GPU (graphics processing unit) hardware rather than on CPUs alone. GPUs can perform thousands of computations in parallel, whereas a CPU has far fewer cores and handles only a handful of calculations at a time, albeit very quickly.


Self-driving cars need computer vision to perceive their environment. Cameras provide the means to detect and classify objects on the road so that the onboard driving computer can make decisions quickly. Although it’s only a matter of time before self-driving cars hit the road, there’s still a long way to go.

Cameras still struggle with estimating distance, velocity, depth, and height. However, advancements in technology, such as stereo cameras and deep learning, promise to move the future of self-driving cars forward. For now, lidar and radar are still the go-to sensors for estimating these measurements.
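For a sense of how a stereo camera estimates distance, here is a minimal sketch of the standard pinhole relation: depth equals focal length times the distance between the two cameras, divided by the disparity (how far the object shifts between the left and right images). The focal length and baseline values are illustrative assumptions, not taken from any particular vehicle.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate distance in meters using depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Illustrative setup: 1000-pixel focal length, cameras 0.5 m apart.
near = depth_from_disparity(1000.0, 0.5, 50.0)        # 10.0 m
far = depth_from_disparity(1000.0, 0.5, 5.0)          # 100.0 m
far_off_by_one = depth_from_disparity(1000.0, 0.5, 4.0)  # 125.0 m
print(near, far, far_off_by_one)
```

Note how a single pixel of disparity error at long range moves the estimate from 100 to 125 meters, which is one reason camera-only distance estimation is hard.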
