
A Python function to ignore a path with .git/info/exclude

One of the things I work on is a Catalogue API for searching Wellcome Collection’s collections. The API returns “works”, and a work can include metadata and images – for example, the individual pages of a digitised book. Sometimes I need all the images from a work, so I wrote a Python script that uses the API to find all the images, then download them to a local directory.

Like all our code, this script lives in a Git repo, and it saves images to the working directory – usually the folder with the script in. This means the images it saves are visible to Git, and the next time I run git status, it’ll offer to track the images alongside our code. That’s not what I want – these images are temporary downloads, not part of our codebase.

I could remember not to check in these images, and delete them when I’m done – but they clutter up my Git tooling, and I could make a mistake.

I could add the images to .gitignore, but that gets tracked as part of the codebase. Nobody else needs to be ignoring these paths, because they haven’t downloaded these images.

I could save the images in a different directory, but that’s slightly less convenient.

I realised this was a good use case for .git/info/exclude – a place for gitignore rules that shouldn’t be tracked as part of the repo history. Each clone can have different rules in this file. If I ignore the downloaded images in this file, they won’t show up in git status, but nor will the ignore rules be shared with anyone else.
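To see the effect in isolation, here is a minimal sketch that creates a throwaway repo, adds a rule to .git/info/exclude, and asks Git which files it considers untracked before and after. The file names and the `untracked_files` and `demo` helpers are made up for illustration, and it assumes `git` is available on your PATH.

```python
import pathlib
import subprocess
import tempfile


def untracked_files(repo):
    """Return the untracked paths that `git status` reports for `repo`."""
    output = subprocess.check_output(
        ["git", "status", "--porcelain", "--untracked-files=all"], cwd=repo
    ).decode("utf8")
    # In the porcelain format, untracked entries are prefixed with "?? ".
    return sorted(
        line[3:] for line in output.splitlines() if line.startswith("?? ")
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        subprocess.check_call(["git", "init", "--quiet"], cwd=repo)

        # A pretend script plus a pretend downloaded image.
        (repo / "download.py").write_text("# the script\n")
        (repo / "images").mkdir()
        (repo / "images" / "page1.jpg").write_bytes(b"")

        before = untracked_files(repo)

        # Ignore the images in this clone only; no .gitignore is created.
        exclude = repo / ".git" / "info" / "exclude"
        exclude.parent.mkdir(exist_ok=True)
        with open(exclude, "a") as exclude_file:
            exclude_file.write("images\n")

        after = untracked_files(repo)
        return before, after
```

In a default Git setup, `demo()` should return the script and the image as untracked in the first list, and only the script in the second. Nothing was added to the working tree to achieve that, so there is no new file for Git to offer to track.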

I could add images to .git/info/exclude by hand, but I’m already running a script, so I extended the script to ignore these files automatically. I wrote a Python function that appends a path to .git/info/exclude:

import os
import subprocess


def ignore_path_locally(path):
    """
    Tell Git to ignore a path, but without adding it to the .gitignore.

    This function instead adds paths to .git/info/exclude.

    See https://alexwlchan.net/2015/git-info-exclude/
    """
    # Get the absolute path to the root of the repo.
    # See https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---show-toplevel
    repo_root = subprocess.check_output([
        "git", "rev-parse", "--show-toplevel"]).strip().decode("utf8")

    # Get the path of the file/directory to ignore, relative to the root
    # of the repo.
    path_to_ignore = os.path.relpath(path, start=repo_root)

    # Get the path to info/exclude inside the .git directory.
    # See https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---git-pathltpathgt
    git_info_exclude_path = subprocess.check_output(
        ["git", "rev-parse", "--git-path", "info/exclude"]).strip().decode("utf8")

    with open(git_info_exclude_path, "a") as exclude_file:
        exclude_file.write(path_to_ignore + "\n")

Inside the script, I call this function to add downloaded paths to .git/info/exclude.
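One thing to be aware of: the function appends a line every time it is called, so running the script repeatedly against the same directory leaves duplicate rules in .git/info/exclude. Git is happy with duplicate rules, so this is harmless, but a variant that checks first might look like the sketch below. The name `append_rule_once` is invented for this example and isn't part of the script described above; it also takes the path to the exclude file as an argument rather than asking Git for it.

```python
def append_rule_once(exclude_path, rule):
    """
    Append `rule` to the file at `exclude_path`, unless an identical
    line is already there.

    Returns True if the file was changed, False otherwise.
    """
    try:
        with open(exclude_path) as exclude_file:
            existing = exclude_file.read()
    except FileNotFoundError:
        existing = ""

    # Rules are matched line by line, so compare against whole lines.
    if rule in existing.splitlines():
        return False

    with open(exclude_path, "a") as exclude_file:
        # Don't glue the new rule onto an unterminated last line.
        if existing and not existing.endswith("\n"):
            exclude_file.write("\n")
        exclude_file.write(rule + "\n")

    return True
```

Inside `ignore_path_locally`, the final `with open(...)` block could be swapped for a call like `append_rule_once(git_info_exclude_path, path_to_ignore)`.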

Hopefully the comments mean you can see how this function works. There are a couple of Git features it’s using that are worth highlighting: