Version 2.1 (updated Sep 11, 2024)

Helm charts are a popular method for deploying and managing applications on Kubernetes. They provide a standard, reproducible method of installing complex software systems. However, third-party Helm charts often need to be customized to meet specific operational requirements. This guide explains how to customize Helm charts directly at the source level, allowing for flexibility and ensuring ease of updates.

State of the Art

Hannah’s blog post explores the advantages of integrating Helm with GitOps, particularly using Argo CD. For a detailed exploration of Argo CD’s capabilities with Helm, refer to the official Argo CD documentation.

Trevor outlines three deployment patterns for Helm charts with Argo CD:

An Argo application that points to a chart in a Helm repository.
An Argo application that points to a chart in a Git repository containing binary Helm packages.
An Argo application that points to a Kustomization file, which then renders a Helm chart.

The first two patterns restrict customizations to those explicitly permitted by the chart’s author. The third allows far more extensive modifications, but it complicates local troubleshooting, and patches can be applied only to the rendered manifests, not to the chart’s source.

A new proposed approach aims to merge the benefits of these patterns, offering both customization flexibility and simplicity in deployment.

Helm with git-upstream

This pattern enables users to customize the source code of the chart and record these modifications as git commits. When a new version of the Helm chart is released, users can update the chart and re-apply their patches using the git-upstream tool.

The git-upstream script stores these modifications in a patch queue, which acts like a pair of glasses through which users view the upstream Helm chart—allowing the chart to be updated while preserving the patches.

The git-upstream tool addresses a common issue with standard git: the difficulty in tracking an upstream project. Traditional git allows users to store custom patches on a branch; however, when new updates are made to the upstream project, users must either merge these changes into their custom branch or rebase their branch onto the latest upstream changes—both approaches come with drawbacks. Merging can obscure the visibility of custom changes as they become lost in the branch’s history. Rebasing can cause errors for any user who has previously cloned and checked out the custom branch when they run a git pull. The git-upstream tool introduces a third option, the “rebase & weld” operation, which combines the benefits of both merging and rebasing without their respective disadvantages. For an in-depth explanation of the rebase & weld method, refer to my previous blog post on the underlying concepts of git-upstream.

Let’s proceed with installing git-upstream.

Installation

To access the latest version of git-upstream, I recommend installing it directly from the source. As of this writing, git-upstream requires Python 3.9; newer versions are not supported. Install Python 3.9 on your workstation using the method suited to your operating system. If you’re unsure how to install an older version of Python, you can search online for “install old version of Python on [your system].” After setting up Python, execute the following commands:

git clone https://opendev.org/x/git-upstream.git
cd git-upstream
python3.9 -m pip install -r requirements.txt
python3.9 -m pip install .
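
Before running the pip commands, it can help to confirm which interpreter you are about to use. The snippet below is only a sketch: it assumes the interpreter is available as python3.9 on your PATH, and the helper name is_python39 is made up for this example.

```shell
# Succeed only if the given `python --version` output reports a 3.9.x release.
is_python39() {
  case "$1" in
    "Python 3.9."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_python39 "$(python3.9 --version)" && echo "Python 3.9 found"
```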

Repo Creation

GitHub provides convenient buttons for forking a repository and syncing with an upstream source. However, we’ll opt for a different method because GitHub’s standard merge or rebase does not employ the “rebase & weld” strategy. Additionally, GitHub does not permit making forks private or internal, even for GitHub Enterprise users. Therefore, we will use the command line to fork the repository.

While I typically recommend using the same name for the repository as the original to maintain a clear lineage, sometimes this is not descriptive enough for your project’s specific needs. In this instance, we will use a more informative name to clarify the repository’s purpose. Here are the names:

Original: https://github.com/kubernetes/dashboard
Fork: https://github.com/akorzynski/k8s-dashboard

Create the repository using the standard GUI:

For personal GitHub accounts, navigate to: https://github.com/new
For GitHub Enterprise accounts, use the link below and replace MY_ORG with your organization’s identifier:
https://github.com/organizations/MY_ORG/repositories/new

Do not create a README file or a license for this repository. All files will be cloned from the original open-source repository. We need the repository to be completely empty initially, as indicated in the screenshot provided:

Defining the Environment

Set up the following environment variables to suit your specific situation. Once defined, these variables allow you to easily copy and paste commands from the subsequent sections without needing to modify them:

# source organization
UPSTREAM_ORG=kubernetes 

# destination organization
MY_ORG=akorzynski

# source repository
UPSTREAM_REPO=dashboard

# destination repository
MY_REPO=k8s-dashboard

# path in the repo
CHART_PATH=charts/kubernetes-dashboard
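
Because every later command relies on these variables, a quick sanity check can catch a typo or a forgotten assignment early. This is a minimal sketch; the function name check_env is hypothetical and not part of any tool mentioned in this article.

```shell
# Fail early if any of the variables defined above is unset or empty,
# otherwise print the locations that the later commands will use.
check_env() {
  for name in UPSTREAM_ORG MY_ORG UPSTREAM_REPO MY_REPO CHART_PATH; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      return 1
    fi
  done
  echo "upstream: https://github.com/$UPSTREAM_ORG/$UPSTREAM_REPO"
  echo "fork:     https://github.com/$MY_ORG/$MY_REPO"
  echo "chart:    $CHART_PATH"
}
```

Run check_env after defining the variables and compare the printed URLs with the repositories you intend to use.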

Copying Repository Contents

Run the following commands:

# Clone the repository
git clone "https://github.com/$MY_ORG/$MY_REPO"
cd "$MY_REPO"

# Add the upstream repository as a new remote, 
# so that you can fetch the data from it
git remote add upstream "https://github.com/$UPSTREAM_ORG/$UPSTREAM_REPO"

# Fetch the data
git fetch --all

# Retrieve the head branch name
HEAD_BRANCH="$(git remote show upstream | \
               grep '^ *HEAD branch:' | \
               grep --only-matching '[^ ]*$' )"
              
# Push the head branch to your internal repository
git push origin --set-upstream \
  "upstream/$HEAD_BRANCH:refs/heads/$HEAD_BRANCH"

# Push the data to your internal repository, 
# prefixing branches with 'upstream/'
# (command based on git-upstream's documentation)
git for-each-ref refs/remotes/upstream --format "%(refname:short)" | \
  sed -e 's@\(upstream/\(.*\)\)$@\1:refs/heads/upstream/\2@' | \
  xargs git push --tags origin

# Checkout the head branch
git checkout --track "origin/$HEAD_BRANCH"
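
The two text-processing steps in the commands above are easier to follow on sample input. The sketch below wraps the same grep and sed expressions in functions and feeds them made-up data; the function names and the branch names master and release/7.x are illustrative assumptions, not values taken from the dashboard repository.

```shell
# Extract the branch name from `git remote show <remote>` output read on stdin.
parse_head_branch() {
  grep '^ *HEAD branch:' | grep --only-matching '[^ ]*$'
}

# Turn short remote-tracking names (one per line on stdin) into push refspecs
# that store each upstream branch under the 'upstream/' prefix.
to_push_refspecs() {
  sed -e 's@\(upstream/\(.*\)\)$@\1:refs/heads/upstream/\2@'
}

printf '  HEAD branch: master\n' | parse_head_branch
# prints: master

printf 'upstream/master\nupstream/release/7.x\n' | to_push_refspecs
# prints: upstream/master:refs/heads/upstream/master
#         upstream/release/7.x:refs/heads/upstream/release/7.x
```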

Downloading Helm Dependencies

When managing Helm dependencies, all updates will be recorded in Git as source code, with each update operation (which may include one or more dependencies) captured in a distinct commit. This structured approach ensures that any modifications to the dependencies are easily traceable and reversible.

Committing Dependency Updates

Initially, all Helm dependencies will be downloaded and committed to the repository in a single commit. This initial commit will include a unique Change-Id at the bottom of the commit message, providing a clear reference point. Subsequent updates to the dependencies will also be stored in their respective commits but linked to the same Change-Id, enabling continuous tracking across various updates.

What is a Change-Id?

A Change-Id is a unique identifier that tracks one logical change across successive versions of a commit, which makes it particularly useful for reviewing and managing changes in source code. It has the form of a SHA-1 hash prefixed with an I to distinguish it from Git commit SHAs. By convention, a Change-Id is stored at the end of the commit message. For example, the last line of the commit message might look like this:

Change-Id: I01078eafe753956fbcddc3895c61f18c922f804a

This identifier was first introduced by the Gerrit code review tool to associate changes with specific reviews. Although git-upstream utilizes Change-Ids, using Gerrit is not necessary for this process, as the git-upstream tool itself implements support for these identifiers.

Generating a Unique Change-Id

You can generate a unique Change-Id using an online SHA-1 generator or by executing the following command:

CHANGE_ID="I$(xxd -p -l 20 /dev/random)"

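This command reads 20 random bytes and prints them as 40 hexadecimal digits, the same shape as a SHA-1 hash, so each Change-Id you generate is unique for practical purposes and can track the lineage of changes. On systems where /dev/random may block, /dev/urandom works equally well here.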
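
If xxd is not installed, the same kind of identifier can be produced with head, od, and tr. The sketch below also includes a format check; both function names are hypothetical helpers written for this example.

```shell
# Generate "I" followed by 40 hex digits read from /dev/urandom.
new_change_id() {
  printf 'I%s\n' "$(head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

# Succeed only if the argument has the shape of a Change-Id:
# the letter I followed by exactly 40 lowercase hex digits.
is_change_id() {
  printf '%s\n' "$1" | grep -Eq '^I[0-9a-f]{40}$'
}

# Example:
#   CHANGE_ID="$(new_change_id)"
#   is_change_id "$CHANGE_ID" && echo "looks valid"
```

Generate the value once and keep it somewhere safe, since later dependency updates need to reuse the same Change-Id.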

Commands to Update Dependencies

To initially download and commit the Helm dependencies, and to update them when necessary, execute the following commands:

# Go to the directory containing the chart
cd "$CHART_PATH"

# Remove all dependencies
git rm -r --force --ignore-unmatch charts/*

# Download the dependencies as *.tgz archives
helm dependency update

# Uncompress the archives
find charts -name '*.tgz' -exec tar -C charts -zxvf '{}' ';'

# Remove the archives
rm -f charts/*.tgz

# Add the updated dependencies to git index.
# We use --force, because the original maintainer
# may have included these files in .gitignore.
git add --force charts Chart.lock

# Commit the dependencies, adding a Change-Id string to the commit message.
git commit -m "helm dependency update" -m "Change-Id: $CHANGE_ID"
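
After committing, you may want to confirm that the Change-Id really ended up in the commit message. The helper below is a small sketch that reads a commit message on standard input; the name extract_change_id is an assumption made for this example.

```shell
# Print the last Change-Id value found in a commit message read on stdin.
extract_change_id() {
  sed -n 's/^Change-Id: //p' | tail -n 1
}

# Example (run inside the repository after the commit above):
#   git log -1 --format=%B | extract_change_id
```

The printed value should match the contents of $CHANGE_ID.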

Committing Your Changes

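Now you can develop your customizations to the Helm chart using your favorite editor. Once your changes are complete, commit them to your repository. Note that git commit -a only picks up changes to files that Git already tracks, so run git add on any new files first: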

git commit -a -m "My change"

Deploying

At this point, we encounter a challenge within the existing GitOps ecosystem: Helm tools typically expect to deploy charts from binary tarballs, not from the source code of the chart in text form. Although there is a plugin called helm-git, it requires users to store binary tarballs in Git, as it does not read Helm source code. Similarly, the native Helm support in Argo CD can only read binary tarballs, stored either in a Helm repository or in Git. Therefore, we need a solution that supports deploying a chart directly from its source code stored in Git.

The ideal scenario would involve Helm deploying a chart using only the following information:

The URL of the Git repository
The Git ref (tag, commit SHA, or branch)
The path to the source code of the chart within the repository

In the future, it would be beneficial to extend both Helm and Argo CD to support specifying the location of the chart using the information above. Until then, we have two workarounds:

Pre-render the chart using helm template, save the rendered Kubernetes manifests in another Git repository, and point Argo CD to those rendered manifests.
Publish the Helm chart in binary form to a Helm repository and point Argo CD to that repository.
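
As a rough starting point, the core command behind each workaround can be sketched as follows. This is not a complete pipeline: the function names, the DRY_RUN switch, and the release and directory names in the example are all assumptions made for illustration, and pushing the results to another Git repository or to a Helm repository is left out.

```shell
# Print the command instead of running it when DRY_RUN is set,
# which is handy for reviewing what would be executed.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Workaround 1: render the chart source into plain Kubernetes manifests.
# Arguments: release name, chart directory, output directory.
render_chart() {
  run helm template "$1" "$2" --output-dir "$3"
}

# Workaround 2: package the chart source into a binary tarball.
# Arguments: chart directory, destination directory.
package_chart() {
  run helm package "$1" --destination "$2"
}

# Example (from the root of the repository):
#   render_chart dashboard "$CHART_PATH" rendered
#   package_chart "$CHART_PATH" dist
```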

I have implemented the first workaround for a client, but it requires significant custom scripting, which is outside the scope of this article. The second option is easier to implement, and there are plenty of online resources explaining how to publish a Helm chart. The instructions will vary depending on the type of Helm repository available to you, so I won’t cover those details here.

Importing Upstream

What happens when the upstream Helm chart changes and you need to update it? This is where git-upstream becomes essential.

First, identify the branch you want to import (it might also be a tag, but we will focus on branches here). Usually, this branch is upstream/$HEAD_BRANCH. Then, run the following commands:

# Remove the upstream remote to avoid ambiguity
# with the commands below
git remote remove upstream

# Define a variable with the upstream branch to import
UPSTREAM_BRANCH="upstream/$HEAD_BRANCH"

# Checkout the upstream branch
git checkout "$UPSTREAM_BRANCH"

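Re-run the commands listed in the Commands to Update Dependencies section, reusing the same $CHANGE_ID value as before so that the new dependency commit is linked to the earlier one. After completing those steps, execute the following commands: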

# Checkout your head branch
git checkout "$HEAD_BRANCH"

# Return to the root directory of the repository
# (the dependency commands changed into the chart directory)
cd "$(git rev-parse --show-toplevel)"

# Create an import branch without the last "welding merge"
git upstream import --no-merge --force --import-branch "import/$UPSTREAM_BRANCH" "$UPSTREAM_BRANCH"

# Switch to the import branch
git checkout "import/$UPSTREAM_BRANCH"

# Optional: you can use the interactive rebase
# to alter the patch queue as required
git rebase --interactive "import/$UPSTREAM_BRANCH-base"

# Return to head branch
git checkout "$HEAD_BRANCH"

# Create the final "welding merge"
git upstream import --finish --import-branch "import/$UPSTREAM_BRANCH" "$UPSTREAM_BRANCH"
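
While the import branch exists, it can be useful to list the commits that make up your patch queue, for example before running the final welding merge. The sketch below only builds the revision range from the branch names used in the commands above; the function name patch_queue_range is hypothetical.

```shell
# Build the revision range that spans the patch queue of an import branch,
# following the "import/<branch>" and "import/<branch>-base" names used above.
patch_queue_range() {
  printf 'import/%s-base..import/%s\n' "$1" "$1"
}

# Example (run inside the repository):
#   git log --oneline "$(patch_queue_range "$UPSTREAM_BRANCH")"
```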

Conclusion

We are done! You have now completed the steps to tailor a third-party Helm chart. I envision a future where Helm charts are routinely distributed in source form using this method, rather than in binary form. Let me know your thoughts in the comments section.