This article mixes AI-generated and human-created content. The AI output was reviewed by the author.
Model & hardware used:
- Model: TheBloke/Mixtral-8x7B-v0.1 (Q4_K_M)
- Run on CPU: AMD Ryzen 9 5950X
- RAM usage: 38.45 GB
- Run on GPU: RTX 3070 8 GB (GPU offloading with LM Studio)
- GPU offload: 10 threads
- Speed: ~5 tok/s
AI: Welcome to our latest article, where we will guide you through the process of running an AI model on a Windows system using Docker. As the use of AI and machine learning models becomes increasingly prevalent in various industries, it is essential to understand how to deploy these models efficiently and effectively.
Docker provides a lightweight and portable solution for packaging and deploying applications, including AI models, making it an ideal choice for Windows users who want to streamline their deployment process. In this article, we will explore the benefits of using Docker, discuss the prerequisites for running Docker on Windows, and walk you through a step-by-step guide on how to containerize your AI model using Docker.
By the end of this article, you will have a solid understanding of how to use Docker to deploy your AI models on a Windows system, enabling you to save time and resources while ensuring consistent performance across different environments. Let’s get started!
Human: I only wanted to write a quick article about running a model with Docker, but it seems Mixtral has other plans for me tonight. So let’s see how to run AND containerize AI models on Windows (which are, in fact, two different topics). ^^’
I will let the AI provide you with short explanations of Docker while I prepare everything. View it as a quick tutorial; if you want to reproduce each step on Windows, you will need:
- A computer with 8 GB of RAM or more
- Windows (I’m using Windows 11)
- Docker Desktop
- An AI model: this one, for example (download the Q4_K_M)
AI: Docker is an open-source platform that enables developers to package and deploy applications in containers. Containers are lightweight, portable, and self-contained units that include all the necessary dependencies and configurations required to run an application, making it easier to deploy and manage applications across different environments.
Docker provides a consistent runtime environment for applications, allowing developers to avoid compatibility issues that can arise when deploying applications on different operating systems or infrastructure. It also simplifies the deployment process by enabling developers to package their applications into a single container image, which can be easily deployed to any system with Docker installed.
Overall, Docker has become an essential tool for modern software development and deployment, offering benefits such as portability, consistency, and efficiency, making it easier to manage complex microservices architectures and simplify the deployment of applications.
Human: You could think of a container as a small virtual machine, if that’s easier to understand. What matters for us in this article is that you can run command lines on a Docker container from the host. The AI will explain it to you; I’m still preparing some things for this tutorial.
Hint: if you are using Docker for the first time, run this command (with cmd):
docker run -p 8080:80 nginx:latest
It will download an image and launch a container with a web server (nginx), exposing it on port 8080. You should be able to reach it here: http://127.0.0.1:8080
This way you will be able to reproduce what the AI does next.
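If you prefer the terminal to the browser, you can also check the container with curl (bundled with recent Windows 10/11 builds). This is just a sketch, and it assumes the nginx container above is still running:

```shell
# Ask the nginx container (published on host port 8080) for its default
# page, discarding the body and printing only the HTTP status code.
curl -s -o NUL -w "%{http_code}\n" http://127.0.0.1:8080
# A running container answers with: 200
```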
AI: To run a command in Docker on Windows, you can use the `docker exec` command. This command allows you to execute a command inside a running Docker container.
Here’s an example of how to use it:
- First, you need to find the ID or name of your Docker container using the `docker ps` command:
C:\> docker ps
CONTAINER ID   IMAGE            COMMAND           CREATED       STATUS       PORTS   NAMES
6f5a37c2438f   myimage:latest   "python app.py"   2 hours ago   Up 2 hours           mycontainer
In this example, the ID of the container is `6f5a37c2438f`, and its name is `mycontainer`.
- Next, you can use the `docker exec` command to run a command inside the container. For example, if you want to run the `ls` command inside the container to list the files in the current directory, you can use the following command:
C:\> docker exec -it mycontainer ls
app.py config.py Dockerfile requirements.txt
In this example, the `-it` flag is used to allocate a pseudo-TTY and keep STDIN open so that you can interact with the command.
Here’s another example of running a Python script inside a container:
C:\> docker exec -it mycontainer python app.py arg1 arg2
In this example, `app.py` is the name of the Python script that you want to run, and `arg1` and `arg2` are arguments passed to the script.
Note that you can also use the container ID instead of its name in the above commands. Also, make sure that the command you are trying to run is installed inside the container and accessible from the path where you are running it.
Human: Ok, perfect. Let’s explain the `docker run` command we will need.
AI: The `docker run` command is used to create a new container from an image and start it. It has several options that can be used to configure the container, such as setting environment variables, mapping ports, and specifying volumes. Here’s an example of how to use the `docker run` command with some common options:
- First, you need to find the image you want to run using the `docker images` command:
$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED      SIZE
myimage      latest   e94f8c5a1234   2 days ago   1GB
In this example, the image name is `myimage`, and its tag is `latest`.
- Next, you can use the `docker run` command to start a new container from the image with some options:
$ docker run -d --name mycontainer -p 5000:5000 -e MY_ENV_VAR=myvalue -v myvolume:/app/data myimage:latest
Here’s what each option does:
- `-d`: Run the container in detached mode (in the background).
- `--name mycontainer`: Set the name of the container to `mycontainer`.
- `-p 5000:5000`: Map port 5000 inside the container to port 5000 on the host machine.
- `-e MY_ENV_VAR=myvalue`: Set an environment variable named `MY_ENV_VAR` with a value of `myvalue` inside the container.
- `-v myvolume:/app/data`: Create a volume named `myvolume` and mount it at `/app/data` inside the container.
- `myimage:latest`: The image to use to create the container, with its tag `latest`.
This command will start a new container from the `myimage:latest` image, set its name to `mycontainer`, map port 5000 on the host machine to port 5000 inside the container, set an environment variable named `MY_ENV_VAR` with a value of `myvalue`, create a volume named `myvolume`, and mount it at `/app/data` inside the container.
Note that you can also add commands to be executed inside the container when it starts by appending them after the image name, like so:
$ docker run -d --name mycontainer myimage:latest python app.py arg1 arg2
In this example, `python app.py arg1 arg2` is the command that will be executed inside the container when it starts.
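The options above fit into a simple lifecycle that you will repeat constantly when working with containers. A quick sketch (the container name `web` is just an example):

```shell
# Create and start a container in the background
docker run -d --name web -p 8080:80 nginx:latest
# Follow what the container writes to stdout/stderr
docker logs web
# Stop the container, then remove it (and free its name)
docker stop web
docker rm web
```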
Human: Ok, now that you better understand the basics of Docker, we can get to the fun part: running an AI model in Docker. 😀
You will need to trust me, as some steps will take time to complete, but the AI will explain everything right after.
Do as follows:
- Download the model weights: https://huggingface.co/TheBloke/blossom-v3-mistral-7B-GGUF/resolve/main/blossom-v3-mistral-7b.Q4_K_M.gguf?download=true
- Put it wherever you want, as long as the path does not contain spaces (it’s easier without spaces, I promise)
- Launch your own version of the following command:
docker run -p 8181:8181 -v <path to the directory where you put the model>:/models ghcr.io/ggerganov/llama.cpp:full --server -m /models/blossom-v3-mistral-7b.Q4_K_M.gguf --port 8181 --host 0.0.0.0
- Go get a coffee and come back to read the AI’s explanation of this command
AI: Let’s break down the `docker run` command you provided and explain its options:
- `docker run`: This is the base command to create a new container from an image and start it.
- `-p 8181:8181`: This option maps port 8181 inside the container to port 8181 on the host machine. When you access `http://localhost:8181` on your host machine, it will be forwarded to port 8181 inside the container.
- `-v <path to the directory where you put the model>:/models`: This option mounts a volume from the host machine into the container. In this case, it maps the directory `<path to the directory where you put the model>` on the host machine to the directory `/models` inside the container.
- `ghcr.io/ggerganov/llama.cpp:full`: This is the image used to create the container. It specifies the location of the image on the GitHub Container Registry and includes the tag `full`.
- `--server`: This option starts the server mode of llama.cpp, which serves the LLM (large language model) inside the container.
- `-m /models/blossom-v3-mistral-7b.Q4_K_M.gguf`: This option specifies the path to the pre-trained model file inside the container. In this case, it points to the file located at `/models/blossom-v3-mistral-7b.Q4_K_M.gguf`.
- `--port 8181`: This option sets the port that the LLM server inside the container will listen on for incoming requests. In this case, it’s set to port 8181.
- `--host 0.0.0.0`: This option makes the server listen on all network interfaces inside the container, so that the published port is reachable from the host.
Overall, this command creates a new container from the `ghcr.io/ggerganov/llama.cpp:full` image, maps port 8181 inside the container to port 8181 on the host machine, mounts a volume from the host machine into the container, starts the server mode, specifies the path to the pre-trained model file inside the container, and sets the port that the LLM server will listen on. Once the container is started, you can access the LLM server via `http://localhost:8181` on your host machine and interact with it by sending HTTP requests to that address.
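Once the server is up, you can already talk to it from the command line via the llama.cpp server’s `/completion` endpoint. A sketch for cmd, where `^` continues a line and inner quotes are escaped with backslashes:

```shell
# Send a short prompt to the llama.cpp server started above;
# n_predict caps the number of tokens it will generate.
curl -s http://127.0.0.1:8181/completion ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\": \"Hello, who are you?\", \"n_predict\": 64}"
```

The response is a JSON object whose `content` field holds the generated text.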
Human: After some time, depending on your computer’s performance, you should see something like this:
Available slots:
-> Slot 0 - max context: 512
{"timestamp":1703897816,"level":"INFO","function":"main","line":3141,"message":"HTTP server listening","port":"8181","hostname":"0.0.0.0"}
llama server listening at http://0.0.0.0:8181
all slots are idle and system prompt is empty, clear the KV cache
As stated, the model is now running (congratulations) and should be available here: http://127.0.0.1:8181
When visiting this page you should see this :
And if you say hello, it should answer. 🙂
Again, congratulations: you are now running a real AI model on your computer.
If you are a more advanced user and want to use some Python to play with the llama.cpp API, you can follow the next steps while I prepare the next big topic: shipping the model in a Docker image.
AI: Here’s an example of Python code using the `requests` library to interact with the LLM (large language model) running inside the Docker container:
import requests
# Set the URL for the LLM server running inside the Docker container
url = "http://localhost:8181/completion"
# Define a prompt to send to the LLM server
prompt = "What is the capital of France?"
# Send a POST request to the LLM server with the prompt as JSON data
response = requests.post(url, json={"prompt": prompt})
# Extract the generated text from the response
generated_text = response.json()["content"]
# Print the generated text
print("Generated text:", generated_text)
In this example code, we first set the URL for the LLM server running inside the Docker container to `http://localhost:8181/completion`. We then define a prompt to send to the LLM server as a string. In this case, we ask “What is the capital of France?”
Next, we use the `requests` library to send a POST request to the LLM server with the prompt as JSON data in the body of the request. The LLM server will generate text based on the prompt and return it as a response.
Finally, we extract the generated text from the response using the `json()` method and print it to the console. In this example, we expect the generated text to be “The capital of France is Paris.”
Note that you can modify the prompt string to ask any question or make any statement you want to generate text for. Additionally, you can adjust the `requests.post()` call to include additional parameters or headers as needed for your specific use case.
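Going a bit further, the `/completion` endpoint also accepts generation parameters in the same JSON body, such as `n_predict` (maximum tokens to generate) and `temperature`. A small helper might look like this (a sketch: the function names are my own, and it assumes the server is running on port 8181):

```python
import requests

def build_payload(prompt, n_predict=64, temperature=0.7):
    """Build the JSON body for llama.cpp's /completion endpoint."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,      # maximum number of tokens to generate
        "temperature": temperature,  # sampling temperature (lower = more deterministic)
    }

def ask_model(prompt, url="http://localhost:8181/completion", **params):
    """Send a prompt to the server and return the generated text."""
    response = requests.post(url, json=build_payload(prompt, **params))
    response.raise_for_status()
    return response.json()["content"]
```

You would then call, for example, `ask_model("What is the capital of France?", temperature=0.2)`.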
Human: In addition, you can check the API documentation here.
Human: Now let’s do something really fun: shipping the model in a container.
In order to understand what follows, you need to know what a Dockerfile is, along with some key concepts of Docker image building. So, again, let the AI explain it; I’m still finishing some shenanigans. 🙂
AI: A Dockerfile is a text file that contains instructions for building a Docker image. It specifies the base image to use, any necessary dependencies or libraries to install, and how to run the application inside the container. The Dockerfile serves as a blueprint for creating a customized Docker image tailored to your specific application needs.
Here are some examples of common instructions found in a Dockerfile:
- `FROM`: This instruction specifies the base image to use for the new image being built. For example, `FROM python:3.9-slim-buster` would use the official Python 3.9 slim version as the base image.
- `WORKDIR`: This instruction sets the working directory inside the container for subsequent instructions. For example, `WORKDIR /app` would set the working directory to `/app` inside the container.
- `COPY`: This instruction copies files or directories from the host machine into the container. For example, `COPY requirements.txt .` would copy the `requirements.txt` file from the host machine to the current directory (`.`) inside the container.
- `RUN`: This instruction runs a command inside the container. For example, `RUN pip install -r requirements.txt` would run the `pip install -r requirements.txt` command inside the container to install any necessary Python packages.
- `CMD`: This instruction specifies the default command to run when a container is launched from the image. For example, `CMD ["python", "app.py"]` would run the `python app.py` command by default when launching a container from the image.
To build a Docker image from a Dockerfile, you can use the following command:
$ docker build -t myimage:latest .
In this example command, `docker build` is used to build an image using the Dockerfile in the current directory (`.`). The `-t` flag sets the tag for the new image to `myimage:latest`, indicating that it is the latest version of the `myimage` image.
During the image building process, Docker will execute each instruction in the Dockerfile sequentially to create a new layer on top of the previous one. Each layer contains only the changes made by the corresponding instruction, making the final image smaller and faster to download or upload. Once all instructions have been executed, the resulting image can be used to launch containers with the desired application and dependencies pre-installed.
Overall, Dockerfiles provide a convenient way to automate the process of building customized Docker images for your applications. By specifying the exact steps required to build an image in a text file, you can easily share and reproduce consistent builds across different machines and environments.
Human: Great, now that you have basic knowledge of what a Dockerfile is and how to build an image, let’s do it.
- First, you need to create a directory and copy the desired model into it.
(Use `MKDIR` to create the directory from the Windows command line, and `cd /D E:\directory` if you want to change disk.)
Now create a file named `model.Dockerfile` and edit it.
- In this Dockerfile, you will add the following:
FROM ghcr.io/ggerganov/llama.cpp:full
RUN mkdir -p /models
COPY blossom-v3-mistral-7b.Q4_K_M.gguf /models/.
EXPOSE 8181/tcp
ENTRYPOINT ["/app/.devops/tools.sh", "--server", "-m", "/models/blossom-v3-mistral-7b.Q4_K_M.gguf", "--port", "8181", "--host", "0.0.0.0"]
Don’t worry, we will explain it in a few minutes.
- Then, in the exact same directory, run the following command:
docker build -t blossom:v1 -f model.Dockerfile .
It may take some time, so let’s explain a bit what we are doing.
First of all, this Dockerfile is the configuration file for the `docker build` command. Docker will use the information in this file to build your image.
Here is each line explained:
- The `FROM` instruction tells Docker which image you want to build on, as Docker images have “layers” (one base image and several layer images). For instance, the llama.cpp image we are using is based on the Ubuntu 22.04 image, which is the true base image. So we are building on top of that.
- The `RUN` instruction tells Docker to create a directory inside the image.
- The `COPY` instruction tells Docker to copy the model into the “models” directory inside the image.
- The `EXPOSE` instruction does not publish the port; it only tells Docker which port is used by the image. Images never publish ports, as that is a decision that should always be made by the administrator.
- The `ENTRYPOINT` instruction gives Docker the command to run when launching a container from this image. Here we are using the script developed by ggerganov for launching llama.cpp with the right parameters.
Now let’s look at the `docker build` command:
- `-t`: the parameter that defines the name and the tag of your image (it makes it easier to launch). Here I chose to name it “blossom” and tag it as v1. It’s arbitrary, so you can name it whatever you want as long as it’s lowercase.
- `-f`: specifies the Dockerfile you want to use, in case you have multiple versions.
- The dot at the end specifies the working directory (the build context).
And finally, when everything is ready and your image is built, you can run it:
docker run -p 8181:8181 blossom:v1
As you can see, we publish the port when running the container.
You can also launch the container from the Docker Desktop interface by going to the Images tab on the left:
Now you know how to run an AI model with Docker on Windows and how to ship it in your own image. Congratulations. 😀