Monday, 24 October 2022

Updating Binary File on GitHub using Contents API

After successfully updating a plain text file using the GitHub Repository Contents API, I tried to do the same thing with an Excel file. I understand that git isn't really designed to store binaries; however, this is what my client needs.

Here are the relevant lines of Python code:

import base64
from github import Github  # PyGithub

# Get the XLSX file from the repo to get its SHA
g = Github(my_admin_token)
repo = g.get_repo("theowner/therepo")
contents = repo.get_contents("myfile.xlsx", ref="main")
sha = contents.sha

# So far, so good. We have the SHA.

# Read the bytes we want to use to replace the contents of the file
with open('my_new_file.xlsx', 'rb') as f:
    data = f.read()
base64_encoded_data = base64.b64encode(data)

# Update the XLSX file in the repo with the new bytes
result = repo.update_file(contents.path, "auto-committed", base64_encoded_data,
            sha, branch="main") 

print("Result of update_file:")
print(result)

# Result: {'commit': Commit(sha="88f46eb99ce6c1d7d7d287fb8913a7f92f6faeb2"), 'content': ContentFile(path="myfile.xlsx")}

You'd think everything went well. However, when I look at the file on GitHub, it's a mass of Base64-encoded data; it somehow "loses the fact that it's an Excel file" in translation. When I click on the file in the GitHub user interface and choose Download, I get the big blob of Base64 text instead of the XLSX file.

There doesn't seem to be a way to tell the API what encoding I want to use, e.g., there doesn't seem to be a way to set HTTP headers on the call.
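One likely cause, offered as a sketch rather than a confirmed diagnosis: PyGithub's update_file() accepts str or bytes and Base64-encodes the content itself before sending it, so data that is already Base64-encoded ends up encoded twice, and GitHub stores the Base64 text as the file. The helper name replace_binary_file and the stand-in bytes below are hypothetical; repo is assumed to be the object returned by g.get_repo(...).

```python
import base64


def replace_binary_file(repo, repo_path, local_path, message, branch="main"):
    """Replace a binary file via PyGithub, passing the raw bytes.

    Hypothetical helper: `repo` is assumed to be a PyGithub Repository,
    such as the one returned by g.get_repo("theowner/therepo").
    """
    contents = repo.get_contents(repo_path, ref=branch)
    with open(local_path, "rb") as f:
        raw = f.read()
    # Pass raw bytes: update_file() does the Base64 encoding on its own.
    return repo.update_file(contents.path, message, raw, contents.sha,
                            branch=branch)


# Offline illustration of the double encoding, using stand-in bytes.
# The API decodes the payload once before storing the blob.
raw = b"PK\x03\x04 stand-in for XLSX bytes"
once = base64.b64encode(raw)      # what update_file() sends for raw input
twice = base64.b64encode(once)    # what it sends for pre-encoded input
stored_if_pre_encoded = base64.b64decode(twice)  # Base64 text, not the file
stored_if_raw = base64.b64decode(once)           # the original bytes
```

Under that assumption, calling replace_binary_file(repo, "myfile.xlsx", "my_new_file.xlsx", "auto-committed") would store the XLSX bytes themselves, with no encoding header required.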

I also tried using the Python requests library to PUT (per the docs) to the GitHub API:

result = requests.put('https://api.github.com/repos/myname/myrepo/contents/myfile.xlsx', {
    "headers": {
        "Accept": "application/vnd.github.VERSION.raw",
        "Authorization": "token my_admin_token"
    },
    "committer": {'name': 'My Name', 'email': 'me@mymail.com'},
    "message": "Did it work?",
    "branch": "main",
    "content": base64_encoded_data})

and I get an HTTP 404.

I tried playing with the Accept header types as well. No dice.
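A possible explanation for the 404, sketched under assumptions: the second positional argument of requests.put() is the request body (data), so the "headers" dict above is sent as form data and the request goes out without an Authorization header. GitHub answers unauthenticated requests to a private repository with 404 (whether this repo is private is an assumption). The Contents API also expects a JSON body whose content is a Base64 string, plus the sha of the file being replaced. The function names build_update_request and put_file are hypothetical.

```python
import base64


def build_update_request(owner, repo, path, data, sha, token,
                         message="Did it work?", branch="main"):
    """Return (url, headers, body) for PUT /repos/{owner}/{repo}/contents/{path}."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {token}",
    }
    body = {
        "message": message,
        "branch": branch,
        "sha": sha,  # blob SHA of the existing file, e.g. contents.sha
        # JSON needs text, so decode the Base64 bytes to an ASCII string.
        "content": base64.b64encode(data).decode("ascii"),
    }
    return url, headers, body


def put_file(url, headers, body):
    """Send the update; headers and JSON body go in as keyword arguments."""
    import requests  # imported here so the builder above stays dependency-free
    response = requests.put(url, headers=headers, json=body)
    response.raise_for_status()
    return response.json()
```

A call might look like put_file(*build_update_request("myname", "myrepo", "myfile.xlsx", data, sha, my_admin_token)), where data holds the raw bytes of the new file and sha comes from a prior GET of the same path.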

I ran into various other issues trying this with curl.

If you have a working sample of updating or replacing an XLSX file on GitHub using curl, Python, etc., I'd love to see it! Thanks.



from Updating Binary File on Github using Contents API
