Serge LLaMA AI Chat - Windows Podman Container Setup


Serge is a chat interface built on llama.cpp for running Alpaca models. Because it is fully dockerized, it was the perfect choice for me, and the software appears to be trustworthy.
I made the choice to run Serge with WSL2 and Podman on Windows. Podman with WSL2 is very easy to install and use.

I was inspired by this article, which describes how to use WSL2 and Docker with Serge, but I quickly decided to make things easier by using Podman. For example:

  • The Podman installation automatically installs WSL2
  • Podman will use a minimal Fedora installation (Ubuntu not needed)
  • No git installation or git clone needed
  • Docker Desktop not needed (Podman Desktop exists, but I have not tested it)

Install Podman and WSL2

  • Podman is an open-source, more secure and lightweight alternative to Docker. I have largely followed this tutorial.
  • WSL2 is the Windows Subsystem for Linux and lets users run a GNU/Linux environment directly on Windows. It sounds very difficult but it’s easy to install!

Follow the steps below:

  1. Go to the Podman releases page, scroll down to the Assets of the latest version and download the installation file podman-<version>-setup.exe.
  2. Start the installation and make sure Install WSL if not present is checked.
    • Click Install
    • Let the computer restart after the installation
    • After the restart the installation will be finished

Now configure WSL2 to have enough memory available for Serge. I have adjusted this within the global config.

  1. In Windows Explorer, go to %UserProfile%, create the file .wslconfig, and add the following to it:
# Settings apply across all Linux distros running on WSL 2
[wsl2]

# Limits VM memory to use no more than 20 GB, this can be set as whole numbers using GB or MB
memory=20GB

# Optional: Sets the VM to use two virtual processors
#processors=2

# Optional: Sets amount of swap storage space to 8GB, default is 25% of available RAM
#swap=8GB

Adjust the following:

memory=20GB
Replace 20GB with the maximum amount of memory you want the WSL2 virtual machine, and therefore Serge, to use. Here you can find more information about memory usage.
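If you would rather generate the file than type it by hand, the config above can be sketched in Python. This is only a minimal sketch: the function names render_wslconfig and write_wslconfig are my own (hypothetical) helpers, and it assumes the file belongs in your user profile folder, as described above.

```python
from pathlib import Path


def render_wslconfig(memory_gb, processors=None, swap_gb=None):
    """Return the text of a .wslconfig file with a [wsl2] section."""
    lines = ["[wsl2]", f"memory={memory_gb}GB"]
    # The optional settings are only written when a value is given.
    if processors is not None:
        lines.append(f"processors={processors}")
    if swap_gb is not None:
        lines.append(f"swap={swap_gb}GB")
    return "\n".join(lines) + "\n"


def write_wslconfig(text, home=None):
    """Write the text to <home>/.wslconfig; home defaults to the user profile."""
    target = (Path(home) if home is not None else Path.home()) / ".wslconfig"
    target.write_text(text, encoding="utf-8")
    return target
```

For example, write_wslconfig(render_wslconfig(20)) would create the file with memory=20GB. Note that it overwrites an existing .wslconfig without asking.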

  2. Start Windows PowerShell (you can find this app by searching in the Start menu)
  3. Apply the new config by restarting WSL2: wsl --shutdown
  4. Run the following commands to initialize and start the virtual machine with Podman:
    • podman machine init
    • podman machine start

Install Serge

  1. Choose the location for Serge (in my case c:\podman\serge) and create the folders for the weights and data:
    • mkdir c:\podman\serge\weights
    • mkdir c:\podman\serge\datadb
  2. Create the batch file which will create the Serge container:
    • notepad c:\podman\serge\podman-run-serge.bat (choose yes to create the file)

Paste the following to Notepad:

podman run -d --name=serge --hostname=serge -p 8008:8008 -v c:\podman\serge\weights:/usr/src/app/weights -v c:\podman\serge\datadb:/data/db/ --restart unless-stopped ghcr.io/serge-chat/serge:main

Adjust the location of Serge (c:\podman\serge) if needed. Close Notepad and save the changes.
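If you keep Serge in a different folder, both volume paths in the command have to change together. As a rough illustration, the same command can be assembled from one base folder in Python. The function name build_serge_run_command and the host_port parameter are my own (hypothetical) additions; the flags and the image name are copied from the batch file above.

```python
from pathlib import PureWindowsPath


def build_serge_run_command(base_dir, host_port=8008):
    """Return the podman run arguments for a Serge folder on the Windows host."""
    base = PureWindowsPath(base_dir)
    return [
        "podman", "run", "-d",
        "--name=serge", "--hostname=serge",
        # host port : container port (the container side stays 8008)
        "-p", f"{host_port}:8008",
        # host folder : folder inside the container
        "-v", f"{base / 'weights'}:/usr/src/app/weights",
        "-v", f"{base / 'datadb'}:/data/db/",
        "--restart", "unless-stopped",
        "ghcr.io/serge-chat/serge:main",
    ]


if __name__ == "__main__":
    # Prints a single line that could be pasted into the batch file.
    print(" ".join(build_serge_run_command(r"c:\podman\serge")))
```

The sketch assumes the folder path contains no spaces; a path with spaces would need quoting before it is pasted into a batch file.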

  3. Run the batch file to create the Serge container:
    • c:\podman\serge\podman-run-serge.bat
    • You can check if the container is running with the podman ps -a command (optional)

With this batch file you can easily update Serge by removing the container and running the batch file again (I won’t explain this further here).
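For readers who want a starting point, one possible update sequence is sketched below as a small Python helper that renders a second batch file. This is my own assumption rather than a tested procedure: it stops and removes the container, pulls the image again (otherwise the locally cached image would be reused), and then calls the existing batch file. The function name render_update_batch is hypothetical.

```python
def render_update_batch(run_batch, container="serge",
                        image="ghcr.io/serge-chat/serge:main"):
    """Return the text of a batch file that recreates the container from a fresh image."""
    lines = [
        f"podman stop {container}",   # stop the running container
        f"podman rm {container}",     # remove it so the name is free again
        f"podman pull {image}",       # fetch the current image for this tag
        f"call {run_batch}",          # recreate the container with the original batch file
    ]
    # Batch files conventionally use Windows line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_update_batch(r"c:\podman\serge\podman-run-serge.bat"))
```

The weights and datadb folders live on the Windows side and are only mounted into the container, so removing the container should not delete them.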

Using Serge

VERY IMPORTANT: Every time you restart Windows and want to use Serge, you have to manually start the virtual machine and the container with podman machine start and podman start serge. This can also be automated, but I won’t describe that further here.
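The two manual commands can also be wrapped in a small script so that a single double-click does both. The following Python sketch is only an illustration: start_serge and STARTUP_COMMANDS are names I made up, and it treats any non-zero exit code as a failure, which may be too strict if the virtual machine is already running.

```python
import subprocess

# The two commands from the note, in the order they have to run.
STARTUP_COMMANDS = [
    ["podman", "machine", "start"],
    ["podman", "start", "serge"],
]


def start_serge(commands=STARTUP_COMMANDS, runner=subprocess.run):
    """Run each command in order; return the first command that fails, or None."""
    for argv in commands:
        result = runner(argv)
        if result.returncode != 0:
            return argv  # stop here, the later commands depend on this one
    return None


if __name__ == "__main__":
    failed = start_serge()
    if failed is None:
        print("Serge should now be reachable in the browser.")
    else:
        print("Command failed:", " ".join(failed))
```

The runner parameter exists only so the function can be tried without Podman installed; in normal use the default, subprocess.run, is what executes the commands.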

Serge can be reached in the browser via the URL:

http://127.0.0.1:8008/
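If the page does not load, it helps to know whether anything is answering on that address at all. A minimal check, using only the Python standard library, could look like the sketch below. The function name serge_is_up is my own, and the sketch assumes that any successful HTTP response on the URL above means Serge is running.

```python
import urllib.request


def serge_is_up(url="http://127.0.0.1:8008/", timeout=3.0,
                opener=urllib.request.urlopen):
    """Return True if the URL answers with a successful HTTP status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


if __name__ == "__main__":
    print("Serge is up" if serge_is_up() else "Serge is not reachable")
```

A result of False usually means the virtual machine or the container has not been started yet.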

You can find the downloaded models in the weights folder, for example c:\podman\serge\weights. There you can also place manually downloaded models.

If you want to check the memory usage of Serge and the models you can run the following commands in PowerShell: wsl and then free -mh
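If you want to log the memory usage instead of reading it off the screen, the output can be parsed. The sketch below is a rough Python example that reads the plain free -m format (values in MiB, without the -h option). The function name parse_free_m is my own, and the numbers in SAMPLE are made up purely to show the layout.

```python
# Made-up example of what `free -m` prints; the real numbers will differ.
SAMPLE = """\
               total        used        free      shared  buff/cache   available
Mem:           20031        6120       11800          12        2111       13600
Swap:           5120           0        5120
"""


def parse_free_m(text):
    """Parse `free -m` output into {row name: {column name: MiB}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip simply stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(columns, (int(v) for v in values)))
    return table


if __name__ == "__main__":
    usage = parse_free_m(SAMPLE)
    print("Memory used (MiB):", usage["Mem"]["used"])
```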

Due to hardware limits, I personally use 7B or 13B models that require less than 20GB of memory. Chatting is still not fast (responses can take minutes), but it is fun to experiment with!

