¡hola! 👋🏼

how to solve permission error from airflow official docker image

what i learned

tl;dr: when you use the official Airflow docker image, make sure the AIRFLOW_UID variable matches the UID of your user on the host (and AIRFLOW_GID=0, aka root) or you're going to get permission errors.

i was working on deploying Airflow on a VM at work this week and got a permission error (Errno 13) from the python logging config inside the containers. When i first started working with this docker-compose.yml i used the suggested echo -e "AIRFLOW_UID=$(id -u)" > .env command, which took my user id on my local machine (let's say it's 506) and assigned it to the AIRFLOW_UID key. Now that i am working in the VM and have extended my .env file with other settings, i figured i could just use a copy of the same file. Everything else worked fine, except airflow could not write logs: my user on the VM has a different user id, so the containers were still running as 506, which does not have permission to write to the ./logs/ directory there.

If you google this error you'll find, among a sea of almost-right answers, that most of the solutions online are variations of "change the logs folder's permissions to 777", meaning anyone can read, write, and execute the contents of the logs. That works. However, you don't really need everyone to be able to read and write, just this airflow user. Updating AIRFLOW_UID in the VM's .env file to match my user there worked perfectly without having to mess with the permissions.
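
since the suggested echo command overwrites .env, here is a minimal sketch of how the UID could be refreshed on a new machine while keeping the other keys in the file. set_airflow_uid is a hypothetical helper name i made up for this example, not something that ships with Airflow.

```shell
#!/bin/sh
# Hypothetical helper (not part of Airflow): replace the AIRFLOW_UID line
# in an env file while leaving every other key untouched.
set_airflow_uid() {
    env_file="$1"
    uid="$2"
    tmp="${env_file}.tmp"

    if [ -f "$env_file" ]; then
        # Keep every line except an existing AIRFLOW_UID entry.
        # grep exits non-zero when nothing is left, hence the "|| true".
        grep -v '^AIRFLOW_UID=' "$env_file" > "$tmp" || true
    else
        : > "$tmp"
    fi

    # Append the UID and swap the new file into place.
    printf 'AIRFLOW_UID=%s\n' "$uid" >> "$tmp"
    mv "$tmp" "$env_file"
}

# Use the UID of whoever runs this on the current machine.
set_airflow_uid .env "$(id -u)"
```

run it from the folder that holds docker-compose.yml and restart the containers so they pick up the new value.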

about jq [ ] syntax

what i learned

If you want to dump a list of objects you're constructing from some other json, you need to wrap your entire jq filter in square brackets ( [] ). Otherwise jq writes each object out one at a time, and the output as a whole is not a single valid JSON document. For example, running something like

jq '.[] | {id: .id, title: .title, created: .created }'

returns →

{
  "id": "123",
  "title": "page 1",
  "created": "2022-01-25T23:15:00.000Z"
}
{
  "id": "124",
  "title": "page 2",
  "created": "2022-01-26T13:18:15.000Z"
}
{
  "id": "125",
  "title": "page 3",
  "created": "2022-01-27T18:37:05.000Z"
}

Each object is valid JSON on its own, but saved to a file this output is not a single valid JSON document. However, if you wrap your entire expression in square brackets [], jq collects all the objects into one list instead of emitting them one at a time.

jq '[.[] | { id: .id, title: .title, created: .created }]'

returns →

[
  {
    "id": "123",
    "title": "page 1",
    "created": "2022-01-25T23:15:00.000Z"
  },
  {
    "id": "124",
    "title": "page 2",
    "created": "2022-01-26T13:18:15.000Z"
  },
  {
    "id": "125",
    "title": "page 3",
    "created": "2022-01-27T18:37:05.000Z"
  }
]
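
to see the difference without a real file, here is a small sketch that runs both filters against a made-up two-page input (the pages variable and its contents are assumptions for the demo, and jq needs to be installed). it uses jq -c so each JSON value lands on its own line.

```shell
#!/bin/sh
# Made-up sample input: an array of page objects with one extra field.
pages='[{"id":"123","title":"page 1","created":"2022-01-25T23:15:00.000Z","draft":true},{"id":"124","title":"page 2","created":"2022-01-26T13:18:15.000Z","draft":false}]'

# Without brackets: jq emits a stream of separate objects, one per line with -c.
stream=$(printf '%s' "$pages" | jq -c '.[] | {id: .id, title: .title, created: .created}')

# With brackets: jq collects the same objects into a single array.
array=$(printf '%s' "$pages" | jq -c '[.[] | {id: .id, title: .title, created: .created}]')

printf 'stream:\n%s\n' "$stream"
printf 'array:\n%s\n' "$array"
```

the bracketed result can be piped straight back into jq or loaded by any JSON parser, while the stream has to be read one value at a time.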

how to execute a shell script in the current shell

what i learned

when you execute a shell script, by default it starts a new shell, runs the script in that shell, and then closes it. if you want the script to, for example, set environment variables, you need to run it in the current shell. let's say you have a short shell script called env_vars.sh that sets the database url as an environment variable.

#!/bin/bash
export DATABASE_URL="super_secret_url"

if you run

sh env_vars.sh

in your terminal, it runs the script in a new shell, so those environment variables are not set in your current shell and are unavailable to your other scripts.

to run it in your current shell, put a dot and a space in front of the path to the script

. ./env_vars.sh

this way your environment variables are set in your current shell and you can use them as expected. in bash, source ./env_vars.sh does the same thing.
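
here is a quick sketch that shows both behaviours side by side. it writes the same env_vars.sh into a temporary folder (the folder and the after_run / after_source variable names are just for this demo) and checks DATABASE_URL after each way of running it.

```shell
#!/bin/sh
# Write the example script into a throwaway directory.
workdir=$(mktemp -d)
cat > "$workdir/env_vars.sh" <<'EOF'
#!/bin/bash
export DATABASE_URL="super_secret_url"
EOF

# Start from a clean slate.
unset DATABASE_URL

# Executed in a child shell: the export disappears when that shell exits.
sh "$workdir/env_vars.sh"
after_run="${DATABASE_URL:-unset}"

# Sourced into the current shell: the export sticks around.
. "$workdir/env_vars.sh"
after_source="${DATABASE_URL:-unset}"

echo "after sh: $after_run"      # prints "after sh: unset"
echo "after . : $after_source"   # prints "after . : super_secret_url"
```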

Making open data more accessible with datasette

California recently released data on the stops made by officers from the 8 largest agencies in the state. The data covers July through December 2018. This is the first wave of data releases that will take effect over the coming years. The data covers more than 1.8 million stops across the state. While this is a step in the right direction, a single .csv file of around 640 megabytes, with more than 1.8 million rows and more than 140 columns, could be intimidating for some of the people who would benefit from exploring this data: local leaders, journalists, activists, and organizers, to name a few.