{"id":81,"date":"2023-12-30T04:41:47","date_gmt":"2023-12-30T04:41:47","guid":{"rendered":"https:\/\/dataops-blog.com\/?p=81"},"modified":"2023-12-30T05:09:18","modified_gmt":"2023-12-30T05:09:18","slug":"running-ai-model-on-windows-using-docker","status":"publish","type":"post","link":"https:\/\/dataops-blog.com\/index.php\/2023\/12\/30\/running-ai-model-on-windows-using-docker\/","title":{"rendered":"Running &#038; shipping AI model on Windows using Docker"},"content":{"rendered":"\n<pre class=\"wp-block-preformatted\">This article mix AI generated content and human created content. AI generation is reviewed by the  autor.<\/pre>\n\n\n\n<p>Model &amp; hardware used:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Model:&nbsp;<a href=\"https:\/\/huggingface.co\/TheBloke\/Mixtral-8x7B-v0.1-GGUF\" target=\"_blank\" rel=\"noreferrer noopener\">Thebloke\/Mixtral-8x7B-v0.1<\/a>&nbsp;(q4_k_m)<\/li>\n\n\n\n<li>Run on CPU:&nbsp;<a href=\"https:\/\/www.amazon.com\/AMD-Ryzen-5950X-32-Thread-Processor\/dp\/B0815Y8J9N\/ref=sr_1_1?crid=21LCVRYQB6XCU&amp;keywords=5950x&amp;qid=1703711329&amp;sprefix=5950x%252Caps%252C210&amp;sr=8-1&amp;_encoding=UTF8&amp;tag=dataopsblog-20&amp;linkCode=ur2&amp;linkId=5ef58d261f7fc1646ccdef9409168d2c&amp;camp=1789&amp;creative=9325\" target=\"_blank\" rel=\"noreferrer noopener\">AMD 5950x<\/a><\/li>\n\n\n\n<li>RAM usage: 38.45Go<\/li>\n\n\n\n<li>Run on GPU:\u00a0<a href=\"https:\/\/www.amazon.com\/GIGABYTE-REV2-0-WINDFORCE-GV-N3070GAMING-OC-8GD\/dp\/B098Q4M4WH\/ref=sr_1_1?crid=1S8KSBI8UHAG1&amp;keywords=rtx+3070&amp;qid=1703728122&amp;sprefix=rtx+3070%252Caps%252C252&amp;sr=8-1&amp;_encoding=UTF8&amp;tag=dataopsblog-20&amp;linkCode=ur2&amp;linkId=3bd5637eb0a6674a14a88d4fe322f529&amp;camp=1789&amp;creative=9325\" target=\"_blank\" rel=\"noreferrer noopener\">RTX 3070 8G<\/a>\u00a0(GPU offloading with LMstudio)<\/li>\n\n\n\n<li>GPU Offload: 10 Threads<\/li>\n\n\n\n<li>Speed: ~5tok\/s<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>AI:<\/strong> Welcome to our latest article, where we will guide you through the process of running an AI model on a Windows system using Docker. As the use of AI and machine learning models becomes increasingly prevalent in various industries, it is essential to understand how to deploy these models efficiently and effectively.<\/p>\n\n\n\n<p>Docker provides a lightweight and portable solution for packaging and deploying applications, including AI models, making it an ideal choice for Windows users who want to streamline their deployment process. In this article, we will explore the benefits of using Docker, discuss the prerequisites for running Docker on Windows, and walk you through a step-by-step guide on how to containerize your AI model using Docker.<\/p>\n\n\n\n<p>By the end of this article, you will have a solid understanding of how to use Docker to deploy your AI models on a Windows system, enabling you to save time and resources while ensuring consistent performance across different environments. Let&#8217;s get started!<\/p>\n\n\n\n<p><strong>Human:<\/strong> I only wanted to write a quick article about running a model with Docker, but it seems Mixtral has other plans for me tonight. So let&#8217;s see how to run AND containerize AI models on Windows (which are, in fact, two different topics). ^^&#8217;<\/p>\n\n\n\n<p>I will let the AI provide you with short explanations on Docker while I prepare everything. 
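<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>(A quick sanity check first, assuming Docker Desktop is already installed: the two commands below only verify that the Docker client and engine answer; they are not part of the tutorial itself.)<\/p>\n<pre><code class=\"language-bash\"># Check the client is installed\ndocker --version\n# Check the engine (daemon) is actually running\ndocker info\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>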
View it as a quick tutorial; if you want to reproduce each step on Windows, you will need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A computer with 8 GB of RAM or more<\/li>\n\n\n\n<li>Windows (I&#8217;m using Win11)<\/li>\n\n\n\n<li><a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker Desktop<\/a><\/li>\n\n\n\n<li>An AI model: <a href=\"https:\/\/huggingface.co\/TheBloke\/blossom-v3-mistral-7B-GGUF\/tree\/main\" target=\"_blank\" rel=\"noreferrer noopener\">this one for example<\/a> (download the Q4_K_M)<\/li>\n<\/ul>\n\n\n\n<p><strong>AI: <\/strong>Docker is an open-source platform that enables developers to package and deploy applications in containers. Containers are lightweight, portable, and self-contained units that include all the necessary dependencies and configurations required to run an application, making it easier to deploy and manage applications across different environments.<\/p>\n\n\n\n<p>Docker provides a consistent runtime environment for applications, allowing developers to avoid compatibility issues that can arise when deploying applications on different operating systems or infrastructure. It also simplifies the deployment process by enabling developers to package their applications into a single container image, which can be easily deployed to any system with Docker installed.<\/p>\n\n\n\n<p>Overall, Docker has become an essential tool for modern software development and deployment, offering benefits such as portability, consistency, and efficiency, making it easier to manage complex microservices architectures and simplify the deployment of applications.<\/p>\n\n\n\n<p><strong>Human:<\/strong> You can see a container as a small virtual machine if it&#8217;s easier to understand. What matters for us in this article is that you can run command lines on a Docker container from the host. 
The AI will explain it to you, I&#8217;m still preparing some things for this tutorial.<\/p>\n\n\n\n<p>Hint: if you are using Docker for the first time, use this command (with cmd):<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><pre><code class=\"language-bash\">docker run -p 8080:80 nginx:latest\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>It will download an image and launch a container with a web server (nginx) exposed on port 8080; you should be able to reach it here: <a href=\"http:\/\/127.0.0.1:8080\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/127.0.0.1:8080<\/a><\/p>\n\n\n\n<p>This way you will be able to reproduce what the AI is doing next.<\/p>\n\n\n\n<p><strong>AI:<\/strong> To run a command in Docker on Windows, you can use the <code>docker exec<\/code> command. This command allows you to execute a command inside a <strong>running<\/strong> Docker container. <\/p>\n\n\n\n<p>Here&#8217;s an example of how to use it:<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><ol>\n<li>First, you need to find the ID or name of your Docker container using the <code>docker ps<\/code> command:<\/li>\n<\/ol>\n<pre><code class=\"language-bash\">C:\\&gt; docker ps\nCONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n6f5a37c2438f myimage:latest &quot;python app.py&quot; 2 hours ago Up 2 hours mycontainer\n<\/code><\/pre>\n<p>In this example, the ID of the container is <code>6f5a37c2438f<\/code>, and its name is <code>mycontainer<\/code>.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><ol start=\"2\">\n<li>Next, you can use the <code>docker exec<\/code> command to run a command inside the container. 
For example, if you want to run the <code>ls<\/code> command inside the container to list the files in the current directory, you can use the following command:<\/li>\n<\/ol>\n<pre><code class=\"language-bash\">C:\\&gt; docker exec -it mycontainer ls\napp.py config.py Dockerfile requirements.txt\n<\/code><\/pre>\n<p>In this example, the <code>-it<\/code> flag is used to allocate a pseudo-TTY and keep STDIN open so that you can interact with the command.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Here\u2019s another example of running a Python script inside a container:<\/p>\n<pre><code class=\"language-bash\">C:\\&gt; docker exec -it mycontainer python app.py arg1 arg2\n<\/code><\/pre>\n<p>In this example, <code>app.py<\/code> is the name of the Python script that you want to run, and <code>arg1<\/code> and <code>arg2<\/code> are arguments passed to the script.<\/p>\n<\/div>\n\n\n\n<p>Note that you can also use the container ID instead of its name in the above commands. Also, make sure that the command you are trying to run is installed inside the container and accessible from the path where you are running it.<\/p>\n\n\n\n<p><strong>Human:<\/strong> Ok, perfect. Let&#8217;s explain the &#8220;docker run&#8221; command we will need.<\/p>\n\n\n\n<p><strong>AI:<\/strong> The <code>docker run<\/code> command is used to create a new container from an image and start the container. It has several options that can be used to configure the container, such as setting environment variables, mapping ports, and specifying volumes. 
Here&#8217;s an example of how to use the <code>docker run<\/code> command with some common options:<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><ol>\n<li>First, you need to find the image you want to run using the <code>docker images<\/code> command:<\/li>\n<\/ol>\n<pre><code class=\"language-bash\">$ docker images\nREPOSITORY TAG IMAGE ID CREATED SIZE\nmyimage latest e94f8c5a1234 2 days ago 1GB\n<\/code><\/pre>\n<p>In this example, the image name is <code>myimage<\/code>, and its tag is <code>latest<\/code>.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><ol start=\"2\">\n<li>Next, you can use the <code>docker run<\/code> command to start a new container from the image with some options:<\/li>\n<\/ol>\n<pre><code class=\"language-bash\">$ docker run -d --name mycontainer -p 5000:5000 -e MY_ENV_VAR=myvalue -v myvolume:\/app\/data myimage:latest\n<\/code><\/pre>\n<p>Here\u2019s what each option does:<\/p>\n<ul>\n<li><code>-d<\/code>: Run the container in detached mode (in the background).<\/li>\n<li><code>--name mycontainer<\/code>: Set the name of the container to <code>mycontainer<\/code>.<\/li>\n<li><code>-p 5000:5000<\/code>: Map port 5000 inside the container to port 5000 on the host machine.<\/li>\n<li><code>-e MY_ENV_VAR=myvalue<\/code>: Set an environment variable named <code>MY_ENV_VAR<\/code> with a value of <code>myvalue<\/code> inside the container.<\/li>\n<li><code>-v myvolume:\/app\/data<\/code>: Create a volume named <code>myvolume<\/code> and mount it to <code>\/app\/data<\/code> inside the container.<\/li>\n<li><code>myimage:latest<\/code>: The image to use to create the container, with its tag <code>latest<\/code>.<\/li>\n<\/ul>\n<p>This command will start a new container from the <code>myimage:latest<\/code> image, set its name to <code>mycontainer<\/code>, map port 5000 on the host machine to port 5000 inside the container, set an environment variable named <code>MY_ENV_VAR<\/code> with a value of <code>myvalue<\/code>, create a 
volume named <code>myvolume<\/code>, and mount it to <code>\/app\/data<\/code> inside the container.<\/p>\n<p>Note that you can also add commands to be executed inside the container when it starts by appending them after the image name, like so:<\/p>\n<pre><code class=\"language-bash\">$ docker run -d --name mycontainer myimage:latest python app.py arg1 arg2\n<\/code><\/pre>\n<p>In this example, <code>python app.py arg1 arg2<\/code> is the command that will be executed inside the container when it starts.<\/p>\n<\/div>\n\n\n\n<p><strong>Human:<\/strong> Ok, now that you better understand the basics of Docker, we can see the fun part: an AI model running in Docker. \ud83d\ude00<\/p>\n\n\n\n<p>You will need to trust me, as things may take some time to complete, but the AI will explain everything right after.<\/p>\n\n\n\n<p>Do as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Download the model weights: <a href=\"https:\/\/huggingface.co\/TheBloke\/blossom-v3-mistral-7B-GGUF\/resolve\/main\/blossom-v3-mistral-7b.Q4_K_M.gguf?download=true\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/TheBloke\/blossom-v3-mistral-7B-GGUF\/resolve\/main\/blossom-v3-mistral-7b.Q4_K_M.gguf?download=true<\/a><\/li>\n\n\n\n<li>Put it wherever you want, as long as the path does not contain spaces (it&#8217;s easier without spaces, I promise)<\/li>\n\n\n\n<li>Launch your own version of the following command:<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><pre><code class=\"language-bash\">docker run -p 8181:8181 -v &lt;path to the directory where you put the model&gt;:\/models ghcr.io\/ggerganov\/llama.cpp:full --server -m \/models\/blossom-v3-mistral-7b.Q4_K_M.gguf --port 8181 --host 0.0.0.0\n<\/code><\/pre>\n<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Go get a coffee and come back to read the AI&#8217;s explanation of this command<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p><strong>AI:<\/strong> Let\u2019s break down the <code>docker 
run<\/code> command you provided and explain its options:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><ol>\n<li><code>docker run<\/code>: This is the base command to create a new container from an image and start it.<\/li>\n<li><code>-p 8181:8181<\/code>: This option maps port 8181 inside the container to port 8181 on the host machine. When you access <code>http:\/\/localhost:8181<\/code> on your host machine, it will be forwarded to <code>http:\/\/localhost:8181<\/code> inside the container.<\/li>\n<li><code>-v &lt;path to the directory where you put the model&gt;:\/models<\/code>: This option mounts a volume from the host machine to the container. In this case, it maps the directory <code>&lt;path to the directory where you put the model&gt;<\/code> on the host machine to the directory <code>\/models<\/code> inside the container.<\/li>\n<li><code>ghcr.io\/ggerganov\/llama.cpp:full<\/code>: This is the image name to use for creating the container. It specifies the location of the image on the GitHub Container Registry (ghcr.io) and includes the tag <code>full<\/code>.<\/li>\n<li><code>--server<\/code>: This option sets the server mode for the LLM (large language model) that will be running inside the container.<\/li>\n<li><code>-m \/models\/blossom-v3-mistral-7b.Q4_K_M.gguf<\/code>: This option specifies the path to the pre-trained LLM model file inside the container. In this case, it points to the file located at <code>\/models\/blossom-v3-mistral-7b.Q4_K_M.gguf<\/code>.<\/li>\n<li><code>--port 8181<\/code>: This option sets the port that the LLM server inside the container will listen on for incoming requests. 
In this case, it\u2019s set to port 8181.<\/li>\n<li><code>--host 0.0.0.0<\/code>: This option makes the server listen on all network interfaces inside the container, so that the port published with <code>-p<\/code> can actually reach it.<\/li>\n<\/ol>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Overall, this command creates a new container from the image <code>ghcr.io\/ggerganov\/llama.cpp:full<\/code>, maps port 8181 inside the container to port 8181 on the host machine, mounts a volume from the host machine to the container, sets the server mode for the LLM, specifies the path to the pre-trained LLM model file inside the container, and sets the port that the LLM server will listen on. Once the container is started, you can access the LLM server via <code>http:\/\/localhost:8181<\/code> on your host machine and interact with it by sending HTTP requests to that address.<\/p>\n<\/div>\n\n\n\n<p><strong>Human:<\/strong> After some time, depending on your computer&#8217;s performance, you should be able to see something like this:<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><pre><code class=\"language-bash\">Available slots:\n -&gt; Slot 0 - max context: 512\n{&quot;timestamp&quot;:1703897816,&quot;level&quot;:&quot;INFO&quot;,&quot;function&quot;:&quot;main&quot;,&quot;line&quot;:3141,&quot;message&quot;:&quot;HTTP server listening&quot;,&quot;port&quot;:&quot;8181&quot;,&quot;hostname&quot;:&quot;0.0.0.0&quot;}\n\nllama server listening at http:\/\/0.0.0.0:8181\n\nall slots are idle and system prompt is empty, clear the KV cache\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>As stated, the model is now running (congratulations) and should be available here: <a href=\"http:\/\/127.0.0.1:8181\" target=\"_blank\" rel=\"noreferrer noopener\">http:\/\/127.0.0.1:8181<\/a><\/p>\n\n\n\n<p>When visiting this page, you should see this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"632\" height=\"968\" src=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp.png\" alt=\"\" class=\"wp-image-84\" 
srcset=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp.png 632w, https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp-196x300.png 196w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><\/figure>\n\n\n\n<p>And if you say hello, it should answer. \ud83d\ude42<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"654\" height=\"282\" src=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp2.png\" alt=\"\" class=\"wp-image-85\" srcset=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp2.png 654w, https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/llamacpp2-300x129.png 300w\" sizes=\"auto, (max-width: 654px) 100vw, 654px\" \/><\/figure>\n\n\n\n<p>Again, congratulations, you are now running a real AI model on your computer.<\/p>\n\n\n\n<p>If you are a more advanced user and want to use some Python to play with the llama.cpp API, you can follow the next steps while I prepare the next big topic: shipping the model in a Docker image.<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p><strong>AI:<\/strong> Here\u2019s an example Python code using the <code>requests<\/code> library to interact with the LLM (large language model) running inside the Docker container:<\/p>\n<pre><code class=\"language-python\">import requests\n\n# Set the URL for the LLM server running inside the Docker container\nurl = &quot;http:\/\/localhost:8181\/completion&quot;\n\n# Define a prompt to send to the LLM server\nprompt = &quot;What is the capital of France?&quot;\n\n# Send a POST request to the LLM server with the prompt as JSON data\nresponse = requests.post(url, json={&quot;prompt&quot;: prompt})\n\n# Extract the generated text from the response\ngenerated_text = response.json()[&quot;content&quot;]\n\n# Print the generated text\nprint(&quot;Generated text:&quot;, generated_text)\n<\/code><\/pre>\n<p>In this example code, we first set the URL for the LLM server 
running inside the Docker container to <code>http:\/\/localhost:8181\/completion<\/code>. We then define a prompt to send to the LLM server as a string. In this case, we ask \u201cWhat is the capital of France?\u201d<\/p>\n<p>Next, we use the <code>requests<\/code> library to send a POST request to the LLM server with the prompt as JSON data in the body of the request. The LLM server will generate text based on the prompt and return it as a response.<\/p>\n<p>Finally, we extract the generated text from the response using the <code>json()<\/code> method and print it to the console. In this example, we expect the generated text to be \u201cThe capital of France is Paris.\u201d<\/p>\n<p>Note that you can modify the prompt string to ask any question or make any statement you want to generate text for. Additionally, you can adjust the <code>requests.post()<\/code> function call to include additional parameters or headers as needed for your specific use case.<\/p>\n<\/div>\n\n\n\n<p><strong>Human:<\/strong> In addition, you can check the API documentation <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\/blob\/master\/examples\/server\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Human:<\/strong> Now let&#8217;s do something really fun: <strong>shipping the model in a container<\/strong>.<\/p>\n\n\n\n<p>In order to understand what will follow, you need to understand what a Dockerfile is and some key concepts around Docker image building, so, again, let the AI explain it; I&#8217;m still finishing some shenanigans. \ud83d\ude42<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p><strong>AI:<\/strong> A Dockerfile is a text file that contains instructions for building a Docker image. It specifies the base image to use, any necessary dependencies or libraries to install, and how to run the application inside the container. 
The Dockerfile serves as a blueprint for creating a customized Docker image tailored to your specific application needs.<\/p>\n<p>Here are some examples of common instructions found in a Dockerfile:<\/p>\n<ul>\n<li><code>FROM<\/code>: This instruction specifies the base image to use for the new image being built. For example, <code>FROM python:3.9-slim-buster<\/code> would use the official Python 3.9 slim version as the base image.<\/li>\n<li><code>WORKDIR<\/code>: This instruction sets the working directory inside the container for subsequent instructions. For example, <code>WORKDIR \/app<\/code> would set the working directory to <code>\/app<\/code> inside the container.<\/li>\n<li><code>COPY<\/code>: This instruction copies files or directories from the host machine to the container. For example, <code>COPY requirements.txt .<\/code> would copy the <code>requirements.txt<\/code> file from the host machine to the current directory (<code>.<\/code>) inside the container.<\/li>\n<li><code>RUN<\/code>: This instruction runs a command inside the container. For example, <code>RUN pip install -r requirements.txt<\/code> would run the <code>pip install -r requirements.txt<\/code> command inside the container to install any necessary Python packages.<\/li>\n<li><code>CMD<\/code>: This instruction specifies the default command to run when a container is launched from the image. For example, <code>CMD [&quot;python&quot;, &quot;app.py&quot;]<\/code> would run the <code>python app.py<\/code> command by default when launching a container from the image.<\/li>\n<\/ul>\n<p>To build a Docker image from a Dockerfile, you can use the following command:<\/p>\n<pre><code class=\"language-bash\">$ docker build -t myimage:latest .\n<\/code><\/pre>\n<p>In this example command, <code>docker build<\/code> is used to build an image using the Dockerfile in the current directory (<code>.<\/code>). 
The <code>-t<\/code> flag sets the tag for the new image to <code>myimage:latest<\/code>, indicating that it is the latest version of the <code>myimage<\/code> image.<\/p>\n<p>During the image building process, Docker will execute each instruction in the Dockerfile sequentially to create a new layer on top of the previous one. Each layer contains only the changes made by the corresponding instruction, making the final image smaller and faster to download or upload. Once all instructions have been executed, the resulting image can be used to launch containers with the desired application and dependencies pre-installed.<\/p>\n<p>Overall, Dockerfiles provide a convenient way to automate the process of building customized Docker images for your applications. By specifying the exact steps required to build an image in a text file, you can easily share and reproduce consistent builds across different machines and environments.<\/p>\n<\/div>\n\n\n\n<p><strong>Human:<\/strong> Great, now that you have basic knowledge of what a Dockerfile is and how to build an image, let&#8217;s do it.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>First, you need to create a directory and copy the desired model into it.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>(<code>mkdir<\/code> to create the directory from the Windows command line, <code>cd \/D E:\\directory<\/code> if you want to change drive)<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>Now create a file named <code>blossom.Dockerfile<\/code> and edit it.<\/p>\n<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li> In this Dockerfile you will add the following:<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><pre><code class=\"language-bash\">FROM ghcr.io\/ggerganov\/llama.cpp:full\nRUN mkdir -p \/models\nCOPY blossom-v3-mistral-7b.Q4_K_M.gguf \/models\/.\nEXPOSE 8181\/tcp\nENTRYPOINT [&quot;\/app\/.devops\/tools.sh&quot;, &quot;--server&quot;, &quot;-m&quot;, 
&quot;\/models\/blossom-v3-mistral-7b.Q4_K_M.gguf&quot;, &quot;--port&quot;, &quot;8181&quot;, &quot;--host&quot;, &quot;0.0.0.0&quot;]\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>Don&#8217;t worry, we will explain it in a few minutes.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Then, in the exact same directory, run the following command:<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p><code>docker build -t blossom:v1 -f blossom.Dockerfile .<\/code><\/p>\n<\/div>\n\n\n\n<p>It may take some time, so we will explain a bit what we are doing.<\/p>\n\n\n\n<p>First of all, this Dockerfile is the configuration file for the <code>docker build<\/code> command. Docker will use the information in this file to build your image.<\/p>\n\n\n\n<p>Here is each line explained:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <code>FROM<\/code> instruction tells Docker which image you want to build on, as Docker images have &#8220;layers&#8221; (one base image and several layer images). For instance, the llama.cpp image we are using is based on the Ubuntu 22.04 image, which is the true base image. So we are building on top of that.<\/li>\n\n\n\n<li>The <code>RUN<\/code> instruction tells Docker to run a command at build time; here it creates the <code>\/models<\/code> directory inside the image.<\/li>\n\n\n\n<li>The <code>COPY<\/code> instruction tells Docker to copy the model into the &#8220;models&#8221; directory inside the image.<\/li>\n\n\n\n<li>The <code>EXPOSE<\/code> instruction doesn&#8217;t publish the port, it only tells Docker which port is used by the image. Images never publish ports, as it&#8217;s a decision that should always be taken by the administrator.<\/li>\n\n\n\n<li>The <code>ENTRYPOINT<\/code> instruction gives Docker the command to run when launching the container using this image. 
Here we are using the script developed by ggerganov to launch llama.cpp with the right parameters.<\/li>\n<\/ul>\n\n\n\n<p>Now let&#8217;s look at the <code>docker build<\/code> command:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>-t<\/code>: the parameter to define the name and the tag of your image (it makes the image easier to launch). Here I chose to name it &#8220;blossom&#8221; and tagged it as v1. It&#8217;s arbitrary, so you can name it whatever you want as long as it&#8217;s in lowercase.<\/li>\n\n\n\n<li><code>-f<\/code>: specifies the Dockerfile you want to use, in case you have multiple versions.<\/li>\n\n\n\n<li>The dot specifies the build context (here, the current working directory).<\/li>\n<\/ul>\n\n\n\n<p>And finally, when everything is ready and your image is built, you can run it:<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><pre><code class=\"language-bash\">docker run -p 8181:8181 blossom:v1\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>As you can see, we publish the port when running the container.<\/p>\n\n\n\n<p>You can also launch the container from the Docker Desktop interface by going to the Images tab on the left:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"499\" src=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/docker-blossom-1024x499.png\" alt=\"\" class=\"wp-image-86\" srcset=\"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/docker-blossom-1024x499.png 1024w, https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/docker-blossom-300x146.png 300w, https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/docker-blossom-768x374.png 768w, https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/docker-blossom.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Now you know how to run an AI model with Docker on Windows and how to ship it in your own image. Congratulations. 
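<\/p>\n\n\n\n<div class=\"wp-block-jetpack-markdown\"><p>As a last bonus, if you want to actually ship the image somewhere else, here is a possible sketch (the registry name <code>myregistry<\/code> is only a placeholder, adapt it to your own registry or account):<\/p>\n<pre><code class=\"language-bash\"># Tag and push the image to a container registry\ndocker tag blossom:v1 myregistry\/blossom:v1\ndocker push myregistry\/blossom:v1\n\n# Or move it without any registry, as a tar archive\ndocker save -o blossom-v1.tar blossom:v1\ndocker load -i blossom-v1.tar\n<\/code><\/pre>\n<\/div>\n\n\n\n<p>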
\ud83d\ude00<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to our latest article, where we will guide you through the process of running an AI model on a Windows system using Docker. As the use of AI and machine learning models becomes increasingly prevalent in various industries, it is essential to understand how to deploy these models efficiently and effectively.<\/p>\n","protected":false},"author":1,"featured_media":92,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[20],"tags":[17,24,25,21,22,27,23,26],"class_list":["post-81","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tutorial","tag-ai","tag-container-building","tag-container-shipping","tag-docker","tag-dockerfile","tag-llama-cpp","tag-llm","tag-windows"],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/dataops-blog.com\/wp-content\/uploads\/2023\/12\/Docker.png","_links":{"self":[{"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/posts\/81","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/comments?post=81"}],"version-history":[{"count":0,"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/posts\/81\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dataops-blo
g.com\/index.php\/wp-json\/wp\/v2\/media\/92"}],"wp:attachment":[{"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/media?parent=81"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/categories?post=81"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataops-blog.com\/index.php\/wp-json\/wp\/v2\/tags?post=81"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}