Docker Storage

Docker File System

Here’s an overview of the Docker file system, specifically the /var/lib/docker directory and its subfolders:

/var/lib/docker

This is the default directory where Docker stores its data, including images, containers, and volumes. The /var/lib/docker directory is divided into several subfolders, each serving a specific purpose:

Subfolders:

  1. overlay2 (or aufs on older installations): This subfolder contains the layered file system for Docker images and containers. It’s where Docker stores the individual layers of an image, as well as each container’s writable layer.
  2. containers: This subfolder stores metadata about each container, including its configuration, logs, and runtime information.
  3. image: This subfolder contains metadata about each Docker image, including its configuration, layers, and dependencies.
  4. layers: This subfolder stores the individual layers of Docker images, which are used to build the layered file system.
  5. network: This subfolder contains information about Docker networks, including network configurations and IP address allocations.
  6. plugins: This subfolder stores plugins that extend Docker’s functionality, such as volume plugins or network plugins.
  7. tmp: This subfolder is used for temporary storage during Docker operations, such as building images or creating containers.
  8. volumes: This subfolder stores data volumes, which are directories shared between the host machine and containers.
  9. buildkit: This subfolder is used by the BuildKit build system, which is used to build Docker images.

Note: The exact structure and contents of the /var/lib/docker directory may vary depending on the Docker version, configuration, and usage.

Run the docker info command to view the storage driver currently in use.
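A minimal check of the active driver looks like this (guarded so the snippet is harmless on machines without Docker; overlay2 is the typical value on a modern Linux host):

```shell
# Query the storage driver from the Docker daemon; fall back to a
# placeholder when Docker is not installed or the daemon is not running.
if command -v docker >/dev/null 2>&1; then
  driver="$(docker info --format '{{.Driver}}' 2>/dev/null || echo unknown)"
else
  driver="unknown"
fi
echo "storage driver: $driver"
```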

Layered Architecture

Docker uses a layered file system, also known as a union mount file system, to manage images and containers. This file system is composed of multiple layers, each representing a specific component of the image or container.

Here’s a breakdown of the layers:

  1. Base Image Layer: This is the bottom-most layer, which contains the base operating system (e.g., Ubuntu, CentOS) and its dependencies. This is not the host OS. This is the base Docker image that serves as the foundation for your Docker image. Here’s an example to illustrate the difference:

    • Host OS: Ubuntu 20.04 (running on the machine)
    • Base image: python:3.9-slim (a Docker image that serves as the base for your custom image)
    • Your custom image: my-python-app (built on top of the python:3.9-slim base image)

    In this example, the host OS is Ubuntu 20.04, but the base image layer in your my-python-app image is the python:3.9-slim image, which is a separate entity from the host OS.

  2. Intermediate Layers: These layers contain the application code, dependencies, and configurations. Each layer builds upon the previous one, allowing for efficient reuse of layers.

  3. Container Layer: This is the top-most layer, which is unique to each container and contains any changes made to the container’s file system.

When a container is created, Docker uses a copy-on-write mechanism to create a new layer on top of the base image layer. This allows containers to share the same base image layer, reducing storage usage and improving performance.
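The copy-on-write behavior can be sketched with docker diff, which lists only the changes held in a container’s writable layer (assumes a running Docker daemon and the alpine image; the container name cow-demo is just an example, and the snippet substitutes sample output when Docker is unavailable):

```shell
# Create a file inside a container, then ask Docker what the
# writable layer holds on top of the read-only image layers.
if command -v docker >/dev/null 2>&1; then
  docker run --name cow-demo alpine sh -c 'echo hello > /new-file'
  changes="$(docker diff cow-demo)"   # lists changes, e.g. "A /new-file"
  docker rm cow-demo                  # removing the container discards the layer
else
  changes="A /new-file"               # sample output for illustration
fi
echo "$changes"
```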

Docker also uses a concept called volumes to persist data even after a container is deleted or recreated. Volumes are directories that are shared between the host machine and the container, allowing data to be preserved across container restarts.

Docker uses storage drivers to maintain the layered architecture: creating the writable container layer, moving files between layers to enable copy-on-write, and so on.

Docker reuses the existing layers until they change. Once a step changes, Docker creates new layers from that step onwards, even if cached versions already exist.

For example, suppose a Dockerfile has 10 steps. On the very first build, Docker runs all of them. On the second build, Docker reuses the existing cached layers as much as possible. If the Dockerfile is then changed at step 4, steps 4 through 10 are rebuilt even though cached layers exist for them. Caching works from top to bottom: once any step changes, every step from that point to the end is built fresh.
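This caching behavior can be seen with a toy Dockerfile (the directory, tag, and base image here are arbitrary examples; the build commands are guarded since they need a Docker daemon):

```shell
# Write a small three-step Dockerfile to a scratch directory.
mkdir -p /tmp/cache-demo && cd /tmp/cache-demo
cat > Dockerfile <<'EOF'
# Step 1: base image layer
FROM alpine:3.19
# Step 2: cached on later builds while unchanged
RUN echo "step 2" > /a
# Step 3: editing this step (or any step above it) forces this
# step and everything below it to be rebuilt
RUN echo "step 3" > /b
EOF
if command -v docker >/dev/null 2>&1; then
  docker build -t cache-demo .   # first build executes every step
  docker build -t cache-demo .   # second build reports the steps as CACHED
fi
```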

Reusing Existing Layers

Let’s dive into what happens when a second Docker image is deployed.

Assumptions:

  • We have a Docker host machine with a running Docker daemon.
  • We’ve already deployed a first Docker image, let’s call it image1.
  • We’re now deploying a second Docker image, let’s call it image2.

Step 1: Pulling the Image

When we deploy image2, Docker checks if the image is already present on the host machine. If not, it pulls the image from a registry (e.g., Docker Hub) or a local repository.

Step 2: Layer Reuse

Docker checks if any layers from image1 can be reused for image2. Since both images share a common base image (e.g., ubuntu:latest), some layers might be identical. Docker reuses these layers to avoid duplicating data and reduce storage usage.
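The reuse is visible in the pull output itself. A sketch (ubuntu:22.04 is an arbitrary example; the second image is hypothetical, so its pull is shown as a comment):

```shell
# Pull a first image; its layers are stored under /var/lib/docker.
# Guarded because it needs a Docker daemon and network access.
if command -v docker >/dev/null 2>&1; then
  docker pull ubuntu:22.04 || true
fi
# Pulling a second image built FROM ubuntu:22.04 would print
# "Already exists" for each shared base layer instead of downloading it:
#   docker pull registry.example.com/image2:latest   # hypothetical image
# `docker system df` then summarizes how much space images share on disk.
reuse_note="shared layers print: Already exists"
echo "$reuse_note"
```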

Step 3: Creating a New Container

Docker creates a new container from the image2 image. This container gets its own unique ID, and Docker initializes a new writable layer on top of the image’s read-only layers (including any layers shared with image1). This writable layer stores data created by the container, such as log files, temp files, or any files modified inside the container. When the container is destroyed, all of these changes are lost.

Step 4: Container Startup

Docker starts the new container, and the container process begins executing. The container has its own isolated environment, including its own process space, network stack, and file system.

Step 5: Layer Management

Docker manages the layers for both image1 and image2 containers. The layers are stored in the /var/lib/docker/aufs (or /var/lib/docker/overlay2) directory, which is shared between containers.

Key Benefits:

  • Layer reuse: Docker reuses layers from image1 to reduce storage usage and improve deployment speed.
  • Efficient storage: Docker stores only the differences between the two images, rather than duplicating the entire image.
  • Isolated environments: Each container has its own isolated environment, ensuring that image1 and image2 containers don’t interfere with each other.

By reusing layers and efficiently managing storage, Docker enables fast and lightweight deployment of multiple images on a single host machine.

Docker Volumes

Docker volumes are a way to persist data even after a container is deleted or recreated. They allow you to store data outside of the container’s file system, making it possible to share data between containers or preserve data even when a container is restarted or deleted.

Types of Docker Volumes

There are three types of Docker volumes:

  1. Named Volumes: These are volumes that are created with a specific name. They are stored in the /var/lib/docker/volumes directory on the host machine.
  2. Anonymous Volumes: These are volumes that are created without a specific name. They are stored in the /var/lib/docker/volumes directory on the host machine, but they are not easily identifiable.
  3. Bind Mounts: These are volumes that are mounted from a specific directory on the host machine to a container. They allow you to access files on the host machine from within a container.

How to Create a Docker Volume

To create a Docker volume, you can use the docker volume create command. For example:

docker volume create database-volume

This will create a named volume called database-volume.

How to Use a Docker Volume

To use a Docker volume, you need to mount it to a container. You can do this by using the -v flag when running a container. For example:

docker run -v database-volume:/var/lib/mysql mysql

This will mount the database-volume volume to the /var/lib/mysql directory inside the container. If you do not create the volume before running docker run -v, Docker creates it automatically with the name you specified. For example, running docker run -v database-volume2:/var/lib/mysql mysql directly, without first running docker volume create database-volume2, automatically creates a volume called database-volume2.

You should be able to see all the volumes under /var/lib/docker/volumes folder on the host. This is called volume mounting.
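You can confirm where a named volume lives on the host with docker volume inspect (guarded; the path in the fallback branch is the typical location, not something the snippet verifies):

```shell
# Create a named volume and ask Docker for its host-side mountpoint.
if command -v docker >/dev/null 2>&1; then
  docker volume create database-volume
  mountpoint="$(docker volume inspect --format '{{.Mountpoint}}' database-volume)"
else
  mountpoint="/var/lib/docker/volumes/database-volume/_data"  # typical path
fi
echo "$mountpoint"
```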

In the above examples, database-volume and database-volume2 volumes are created under /var/lib/docker/volumes folder. If you have a folder in some other location, and you want to use that location as your docker volume for storing the container data, you have to use the full path in the docker run command as shown below:

docker run -v /home/user01/Desktop/mysqldata:/var/lib/mysql mysql
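The same bind mount can also be written with the more explicit --mount syntax, which is equivalent to the -v form above (same hypothetical host path; a real run additionally needs MySQL configuration such as the MYSQL_ROOT_PASSWORD environment variable):

```shell
# Bind mount a host directory into the container using --mount
# (type=bind distinguishes this from a named volume mount)
docker run --mount type=bind,source=/home/user01/Desktop/mysqldata,target=/var/lib/mysql mysql
```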

Benefits of Docker Volumes

Docker volumes provide several benefits, including:

  • Data persistence: Volumes allow you to persist data even after a container is deleted or recreated.
  • Data sharing: Volumes allow you to share data between containers.
  • Flexibility: Volumes provide a flexible way to manage data in a containerized environment.

Common Use Cases for Docker Volumes

Docker volumes are commonly used in the following scenarios:

  • Database storage: Volumes are often used to store database data, such as MySQL or PostgreSQL databases.
  • File sharing: Volumes are used to share files between containers, such as sharing a configuration file or a data file.
  • Persistent storage: Volumes are used to provide persistent storage for applications that require it, such as caching layers or message queues.

Docker Storage Drivers

Docker storage drivers control how Docker stores and manages image layers and each container’s writable layer on the host’s file system. Different drivers are built on different backing technologies, such as overlayfs, Btrfs, ZFS, or the device mapper framework.

Types of Docker Storage Drivers

There are several types of Docker storage drivers, including:

  1. aufs: An older union mount storage driver; it was the default on early versions of Docker but has since been deprecated.
  2. overlay: A storage driver that provides a union mount file system, similar to aufs.
  3. overlay2: An improved version of the overlay storage driver with better performance and reliability; it is the default on modern Linux systems.
  4. devicemapper: A storage driver that uses the device mapper framework to provide a thin provisioning layer.
  5. btrfs: A storage driver that uses the Btrfs file system to provide a copy-on-write file system.
  6. zfs: A storage driver that uses the ZFS file system to provide a copy-on-write file system.
  7. vfs: A fallback storage driver that makes a full copy of each layer instead of using copy-on-write. It is slow and space-hungry, but it works on any file system, so it is mainly used for testing.
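To select a driver explicitly, the daemon can be configured through /etc/docker/daemon.json. This is a configuration sketch for a Linux host (it assumes systemd, requires root, and restarts the daemon, so don’t run it blindly):

```shell
# Write the storage-driver setting and restart the Docker daemon
# (overlay2 shown as the common choice on modern Linux)
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}
EOF
sudo systemctl restart docker
```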

How Docker Storage Drivers Work

When you run a Docker container, the storage driver is responsible for managing the container’s file system. The storage driver creates a thin layer on top of the underlying storage system, which allows Docker to write data to the container’s file system.

Here’s a high-level overview of how Docker storage drivers work:

  1. Container creation: When you create a Docker container, the storage driver creates a new layer on top of the underlying storage system.
  2. Write operations: When you write data to the container’s file system, the storage driver writes the data to the underlying storage system.
  3. Read operations: When you read data from the container’s file system, the storage driver reads the data from the underlying storage system.
