Sunday, March 6, 2022

Pubsub Crawl

Today was another foray into Google Cloud Functions. This time, my goal was to begin to string things together using Google Pub/Sub. This type of approach forms the backbone of modern "serverless" workflows. 

My final use case requires hitting an API (with the appropriate credentials), retrieving a list of file URLs, and downloading those files to a cloud storage location. Because my brain is not as agile as it was when I was younger and I can't hold too many things in mind at once, I like to break things down into smaller chunks to help with troubleshooting. (This is probably a good practice regardless of age.) Previously, I had been interacting with things via gcloud commands in the CLI, but I broke down and decided to use the console for this bit.

My first hurdle was hitting an API with credentials. When experimenting with cloud functions before, I was always using the --allow-unauthenticated flag, which is really not good practice. I spent a bit of time trying to figure out the most secure way to do things, but ended up going down lots of rabbit holes in Google's IAM documentation. At the end of the day, creating a service account and requiring authentication seemed like the safest bet.
 
By default, Google Cloud Functions use the App Engine default service account, which has an "Editor" role in the project. This is considered a "primitive" role (along with "Owner" and "Viewer"; Google now calls these "basic" roles) and is not as secure as using more granular permissions. Because I knew that I would need to invoke the function, send Pub/Sub messages, and access an API key stored in Google Secret Manager, I created a new service account with just those permissions. This account will be linked to the Cloud Function.
 
Previously, I had been using the fake data API at https://jsonplaceholder.typicode.com for practice. That's a great resource, but it doesn't require authentication. For that, I turned to https://www.alphavantage.co/, a site that allows you to create an access token for their stock tracker. One can receive stock information by hitting an HTTP endpoint with a stock symbol and the access token.
 
I created a key/value pair in Google Secret Manager to hold the token, and then spent a while trying to get everything to work, following some of the sample scripts in the Google documentation (https://cloud.google.com/secret-manager/docs/creating-and-accessing-secrets). My first bare-bones function literally just tried to return the secret value. Once I could grab the apikey value from a function, I rewrote the function to iterate through a list of stock symbols and create URL strings that incorporate the stock symbol and apikey.
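For anyone following along, here is a minimal sketch of what this step might look like in Python. It is not the exact script from my function: the helper names (get_secret, build_urls), the project and secret IDs you would pass in, and the choice of the TIME_SERIES_DAILY query are all assumptions made for illustration. The Secret Manager call follows the pattern in Google's sample code and needs the google-cloud-secret-manager package installed.

```python
from urllib.parse import urlencode

# Alpha Vantage's query endpoint; the "function" parameter selects the data set.
BASE_URL = "https://www.alphavantage.co/query"


def get_secret(project_id, secret_id, version="latest"):
    """Return the payload of a Secret Manager secret as a string.

    project_id and secret_id are placeholders for your own values.
    """
    # Imported here so build_urls() can be tried without the client library.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")


def build_urls(symbols, apikey, function="TIME_SERIES_DAILY"):
    """Build one request URL per stock symbol.

    TIME_SERIES_DAILY is an assumed choice; any Alpha Vantage function
    that takes a symbol would be built the same way.
    """
    urls = []
    for symbol in symbols:
        query = urlencode({"function": function, "symbol": symbol, "apikey": apikey})
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

Inside the cloud function, something like build_urls(["IBM", "MSFT"], get_secret("my-project", "my-secret")) would tie the two together. The service account linked to the function needs permission to read the secret for get_secret to succeed.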

The next step was to create a pubsub topic and subscriptions, which I did through the console interface. The idea is that the URLs I had just created could be sent as pubsub messages to other cloud functions that would do the actual downloading. Although creating the topic and subscriptions was relatively straightforward, it took 15 versions of the cloud function script to finally get things working. At least half of the revisions were due to typos and/or forgetting to import a package, with the rest being logical errors or variable issues.
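The publishing side can be sketched roughly as follows. Again, this is an illustration rather than my final script: publish_urls and its arguments are hypothetical names, and the project and topic IDs are placeholders. It assumes the google-cloud-pubsub package, whose PublisherClient expects message data as bytes and returns a future for each published message.

```python
def publish_urls(project_id, topic_id, urls, publisher=None):
    """Publish each URL as its own Pub/Sub message and return the message IDs.

    A publisher can be passed in (handy for testing); otherwise a real
    PublisherClient is created.
    """
    if publisher is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        from google.cloud import pubsub_v1

        publisher = pubsub_v1.PublisherClient()

    # Full resource name of the form projects/<project>/topics/<topic>.
    topic_path = publisher.topic_path(project_id, topic_id)

    # Pub/Sub message data must be bytes, so encode each URL string.
    futures = [publisher.publish(topic_path, data=url.encode("utf-8")) for url in urls]

    # result() blocks until the message is accepted and returns its ID.
    return [future.result() for future in futures]
```

Forgetting to encode the string to bytes is exactly the kind of small mistake that can eat up a few of those revisions, and waiting on result() before the function returns helps make sure the messages are actually sent.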
 
So, now I had a cloud function that could retrieve a stored secret, create appropriately formatted URLs, and send them to a pubsub topic as individual messages. The next step is to create a cloud function that is triggered by those messages and downloads the files. 
 
But that's a task for another day.
 
