Monday 30 October 2023

How to query arxiv daily based on keywords and write the results in a Google doc?

I want to find the papers published every day in the computer science section of arXiv whose titles match a list of keywords, and write their titles and arXiv links to my Google Doc (i.e., append them to the end of what's already written).

For example, the Google doc can look as follows:

  1. Test-time Augmentation for Factual Probing
  2. Controlled Decoding from Language Models

And so on...

My list of search keywords:

arxiv_keywords = ['machine learning', 'llm', 'potato']

The matching should be case-insensitive, and a title should be returned if it contains any of the keywords. For example, the following made-up titles should all be returned: "Machine learninG is a mystery", "LLM-based models are weird", and "potatoes are tasty when turned into fries".
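A minimal sketch of that matching rule, assuming a plain case-insensitive substring test (so "potato" also matches "potatoes", as required). The helper names `normalize` and `matches_keywords` are hypothetical. Whitespace is collapsed before comparing because arXiv titles may wrap across lines, which would otherwise break a multi-word keyword such as "machine learning".

```python
# The keyword list from the question.
arxiv_keywords = ['machine learning', 'llm', 'potato']


def normalize(text: str) -> str:
    """Collapse whitespace runs (arXiv titles can wrap) and fold case."""
    return " ".join(text.split()).casefold()


def matches_keywords(title: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs anywhere in the title, ignoring case."""
    folded = normalize(title)
    return any(normalize(kw) in folded for kw in keywords)


if __name__ == "__main__":
    examples = [
        "Machine learninG is a mystery",
        "LLM-based models are weird",
        "potatoes are tasty when turned into fries",
        "Controlled Decoding from Language Models",
    ]
    for title in examples:
        print(matches_keywords(title, arxiv_keywords), title)
```

Because this is a substring test, "llm" would also match inside an unrelated word such as "smallmouth"; if that matters, a regular expression with a leading word boundary (e.g. `re.search(r"\bllm", title, re.IGNORECASE)`) is stricter while still letting "potato" match "potatoes".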

My Google Doc is located at my_drive/Research/my_google_doc_name.
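One possible way to append to that document without a password is the Google Docs API via the `google-api-python-client` and `google-auth` packages, authenticating as a service account. This is a sketch under assumptions: the key file `service_account.json` and the `DOCUMENT_ID` placeholder are hypothetical, the Doc must be shared with the service account's email address, and the ID is copied from the Doc's URL (`https://docs.google.com/document/d/<ID>/edit`) rather than resolved from the `my_drive/Research/...` path. An `insertText` request with an empty `endOfSegmentLocation` writes at the end of the body, just before the document's final newline, which is why each entry starts with a newline.

```python
# Hypothetical placeholders: your own key file and the ID from the Doc's URL.
SERVICE_ACCOUNT_FILE = "service_account.json"
DOCUMENT_ID = "YOUR_DOCUMENT_ID"
SCOPES = ["https://www.googleapis.com/auth/documents"]


def format_entries(papers, start_number=1):
    """Render (title, link) pairs as numbered lines such as '3. Title - link'.

    Each entry is prefixed with a newline because text inserted at the end of
    the body lands before the final newline, i.e. at the end of the last line.
    """
    return "".join(
        f"\n{n}. {title} - {link}"
        for n, (title, link) in enumerate(papers, start=start_number)
    )


def append_to_doc(text, document_id=DOCUMENT_ID, key_file=SERVICE_ACCOUNT_FILE):
    """Append text to the end of the Doc's body with a single batchUpdate call."""
    if not text:
        return  # nothing new to write
    # Imported lazily so format_entries works without the Google packages installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    docs = build("docs", "v1", credentials=creds)
    # An empty endOfSegmentLocation refers to the end of the document body.
    requests = [{"insertText": {"endOfSegmentLocation": {}, "text": text}}]
    docs.documents().batchUpdate(documentId=document_id, body={"requests": requests}).execute()
```

The caller has to supply `start_number` (3 for the document shown above); counting the existing entries would need an extra `documents().get` call. This writes plain numbered lines, so if the existing list uses Docs' automatic numbering, the appended paragraphs will not be part of it.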

I found this SO question that asks about querying arXiv for a specific year based on one keyword, but my request differs from theirs in several ways, which complicates things for me:

  1. I only need to query the computer science section. Based on this SO question, the results seem to differ between querying from the general arXiv website and using the advanced search.
  2. I need the query to run automatically once a day, so I'm not sure how to update the dates automatically.
  3. I'm not sure how to modify their script to handle multiple keywords.
  4. I'm not sure how to append the results to a Google Doc, which, from what I understand from here and here, requires a separate query and probably some way of entering my password.
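For the first three points, one option (a different route from sending a keyword search to arXiv) is arXiv's OAI-PMH interface, which can restrict a harvest to the `cs` set and to records whose datestamp is on or after a `from` date; the keyword filtering then happens locally. A sketch assuming the endpoint `https://export.arxiv.org/oai2` and the `oai_dc` metadata format; `parse_page`, `harvest_cs` and the `pause` value are hypothetical, and errors such as an HTTP 503 with `Retry-After` are not handled.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed arXiv OAI-PMH endpoint and the XML namespaces used in oai_dc responses.
OAI_URL = "https://export.arxiv.org/oai2"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_page(xml_text):
    """Parse one ListRecords response into ([(title, link), ...], token or None)."""
    root = ET.fromstring(xml_text)
    papers = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        # Deleted records have status="deleted" and carry no metadata.
        if header is None or header.get("status") == "deleted":
            continue
        # Header identifiers look like "oai:arXiv.org:2310.12345".
        arxiv_id = header.findtext("oai:identifier", "", NS).rsplit(":", 1)[-1]
        title = record.findtext(".//dc:title", "", NS)
        # Titles can wrap across lines; collapse the whitespace.
        papers.append((" ".join(title.split()), f"https://arxiv.org/abs/{arxiv_id}"))
    # The last page has an empty resumptionToken element (or none at all).
    token = root.findtext(".//oai:resumptionToken", "", NS).strip()
    return papers, token or None


def harvest_cs(from_date, pause=10.0):
    """Yield (title, link) for 'cs' records whose datestamp is >= from_date (YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "set": "cs",
              "metadataPrefix": "oai_dc", "from": from_date}
    while True:
        url = f"{OAI_URL}?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url, timeout=60) as resp:
            papers, token = parse_page(resp.read())
        yield from papers
        if token is None:
            break
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token}
        time.sleep(pause)  # hypothetical politeness delay between pages
```

Filtering could then look like `[(t, l) for t, l in harvest_cs("2023-10-29") if any(k in t.lower() for k in arxiv_keywords)]`. Because the datestamp also changes when a paper is revised, older papers can reappear, so deduplicating on the link before writing is worthwhile.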

I found that I can run a Python script automatically using cron, following this link.
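Cron handles the scheduling; what remains is working out which dates each run should cover. A minimal sketch, assuming the script remembers where its last successful run stopped in a small state file, so the dates never need editing by hand and a day the machine was off is caught up on the next run. `STATE_FILE`, `query_window`, `load_last_end` and `save_last_end` are hypothetical names.

```python
import datetime as dt
from pathlib import Path
from typing import Optional

# Hypothetical file holding the end date of the last successful run.
STATE_FILE = Path.home() / ".arxiv_daily_last_end"


def query_window(today: dt.date, last_end: Optional[dt.date] = None) -> tuple[dt.date, dt.date]:
    """Return the half-open window [start, end) of dates to query.

    start is where the previous successful run stopped, or yesterday on the
    first run; end is today, so consecutive windows never overlap.
    """
    start = last_end if last_end is not None else today - dt.timedelta(days=1)
    return start, today


def load_last_end(state_file: Path = STATE_FILE) -> Optional[dt.date]:
    """Read the stored end date, or return None if the script has never run."""
    if not state_file.exists():
        return None
    return dt.date.fromisoformat(state_file.read_text().strip())


def save_last_end(end: dt.date, state_file: Path = STATE_FILE) -> None:
    """Remember where this run stopped; call only after the Doc was updated."""
    state_file.write_text(end.isoformat())
```

In the script, `start, end = query_window(dt.date.today(), load_last_end())` gives the dates to fetch, and `save_last_end(end)` is called only once the results are written. A crontab line such as `0 9 * * * /usr/bin/python3 /path/to/arxiv_daily.py` (path hypothetical) would then run it every day at 09:00; arXiv does not announce new papers every day of the week, so some runs will legitimately find nothing.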

Overall, there seems to be a lot going on, which confuses me, and I'm not entirely sure how to handle all the parts.



from How to query arxiv daily based on keywords and write the results in a Google doc?
