Tell HN: Do Not Assume Anything

news.ycombinator.com
4 min read
very easy
OK so here is something I learned yesterday and today:*Story 1:*I was tasked to modify a Looker view. It's just a bunch of small updates so I figured it should be easy. One task is to modify a metrics A. Before the modification, A = B - C and after A = B - C - D. OK this is super easy, so I simple put B - C - D and call it a day.I quickly figured out that D could be NULL so this messes up the calculation sometimes. Now that I have zero experience with Looker but I know a lot about SQL, so I know I need something similar to IFNULL(). It happened that there was a NULLIF() in a neighbouring metrics so I told myself, Oh this must be Looker's way to implement IFNULL(), you know, guys need new names for everything.Rest of the story is that I spent a good part of today to figure out why A gives really weird values. I removed NULLIF(D, 0) and everything looked fine. I added it back and weird results came back. I pulled my hairs and chanted YHVH. I tried everything except the right thing. Eventually I gave up and pulled out Google. Well that's it, NULLIF() is *completely* different from IFNULL()!*Story 2:*There is an Airflow DAG called blah which dumps data into a BigQuery table. The DAG does not have a `time_partitioning` part and neither does it have a `clustered_fields` part. Somehow the DAG runs fine in production but fails in test. Eventually, after some Googling and discussion with a more senior colleague, I figured that I need to add a `time_partitioning` part to the DAG to remove the error.It was added and the error was removed. All tests went through and the BigQuery table in test env looked fine. The change was pushed into production shortly. One less ticket to deal with. Profit!Well not so fast. A few hours later I got a message on Slack that the DAG is failing on production. What went wrong? Eventually, after combing through the log, we found out that the clustering fields is missing from the DAG. This really caught me off because it was not there in the first place!I still haven't figured out the full picture, but here is what I think that happened (and I need to dig deeper to remove that "think" word): If `time_partitioning` and `clustered_fields` are missing from the DAG, somehow the default values (stored in BigQuery) are used. However, if I mention `time_partitioning` without specifying `clustered_fields`, it actually thinks there is NO clustered fields and removed it from BigQuery! Again I'm still not 100% sure about this, but looks like this is what happened.Now you are probably wondering: Why did your new DAG pass the test env? Well that's a bit embarrassing, I found out that the table in test env actually ONLY has partitions, but has no cluster fields, so that's why the new DAG passed the test env. Comments URL: https://news.ycombinator.com/item?id=31416412 Points: 1 # Comments: 0
OK so here is something I learned yesterday and today: *Story 1:* I was tasked to modify a Looker view. It's just a bunch of small updates so I figured it should be easy. One task is to modify a metrics A. Before the modification, A = B - C and after A = B - C - D. OK this is super easy, so I simple put B - C - D and call it a day. I quickly figured out that D could be NULL so this messes up the calculation sometimes. Now that I have zero experience with Looker but I know a lot about SQL, so I know I need something similar to IFNULL(). It happened that there was a NULLIF() in a neighbouring metrics so I told myself, Oh this must be Looker's way to implement IFNULL(), you know, guys need new names for everything. Rest of the story is that I spent a good part of today to figure out why A gives really weird values. I removed NULLIF(D, 0) and everything looked fine. I added it back and weird results came back. I…
Read full article