This question isn’t OE related but here goes:

If anyone here is following my J1 saga, you’ll remember that I took a temp job because I very foolishly believed the recruiter when she told me it was C2H. I’m now at the point where it’s time for me to motivate the boss to true up the recruiter’s 🐂💩 and hire me for real.

This is a data quality role where they hired me because I have some very niche subject matter expertise. It’s not airline data, but I’m going to use that for my examples. No one else on the team has any “not-airline” experience even though that’s what the data is about. I’m finding problems all over the place. I’m finding flights arriving with more fuel than when they took off. I’m finding duplicated flight numbers. I found a case where there were no first class tickets sold. When I asked about this, they told me that they had checked with Southwest and were told this was fine because Southwest doesn’t have first class. Unfortunately, they applied that to other airlines that do.

It’s a mess, and I discovered last week that much (maybe all) of this is an artifact of their internal ETL process. The data coming in doesn’t have these issues. Now, I don’t want to tell them they have an ugly baby, but they very clearly do. I’m trying to figure out a way to communicate this to them such that they think I’m a genius who can find issues that no one else can, rather than an obnoxious know-it-all who is saying that their years-old process sucks and needs a massive revamp. Any tips?

At my last job the data engineers had an absolutely toxic relationship with the analysts, and I would have gleefully humiliated them for putting out data saying the plane crash survivors were buried in a cemetery a mile away. These people are sweet and have gone out of their way to be welcoming to me.
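For anyone who wants the checks from the post in concrete form, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Flight` fields, the `NO_FIRST_CLASS` set, and the airline name `OtherAir` are hypothetical, and the real data isn't airline data anyway. The point is that the "no first class" exemption is scoped to the one airline it was confirmed for rather than applied to everyone.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    # Hypothetical record layout, invented for this example.
    airline: str
    flight_no: str
    fuel_departure: float
    fuel_arrival: float
    first_class_sold: int


# Assumption: only airlines confirmed to have no first class go in here.
NO_FIRST_CLASS = {"Southwest"}


def check_flight(f: Flight) -> list[str]:
    """Return the list of rule violations for one flight."""
    problems = []
    if f.fuel_arrival > f.fuel_departure:
        problems.append("arrived with more fuel than at departure")
    # Zero first class sales is only acceptable for exempt airlines.
    if f.first_class_sold == 0 and f.airline not in NO_FIRST_CLASS:
        problems.append("no first class tickets sold")
    return problems


flights = [
    Flight("Southwest", "F100", 5000.0, 1200.0, 0),
    Flight("OtherAir", "F200", 5000.0, 1200.0, 0),
    Flight("OtherAir", "F201", 5000.0, 6100.0, 12),
]
report = {f.flight_no: check_flight(f) for f in flights}
```

In this sketch `report` maps each flight number to its violations, so an exempt airline's flight comes back clean while the same pattern on another airline gets flagged.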
9 Replies
Needle · 5mo ago
Thread automatically created by DataGeek(2J) in #🤔|questions
grumpy-cyan · 5mo ago
I think a good place to start is to find a few obviously nonsensical problems, like the landing with more fuel thing or the burying survivors thing, and then explain that they stem from the pipeline. I think even a moron would realize they have a problem there once they see multiple big data quality issues coming from the same place. Gotta bring it up one problem at a time tho, so they have time to digest each issue.
inland-turquoise · 5mo ago
Maybe a PowerPoint that goes in order from the least problematic issue to the most problematic one would work for getting your points across? If the company has problems they absolutely need to address, you need to give them a sales pitch on why it behooves them to invest time and money to fix them. Just brainstorming
quickest-silver · 5mo ago
That’s interesting. It means I’m going to have to get access to various intermediate steps in the pipeline. (Right now I just have the raw data from the outside source and the final result.) I wonder if they would consider it stepping on their toes if I start digging into that. At my old job they definitely would have been affronted by an analyst doing that but here they might appreciate the help.
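A rough sketch of what that digging could look like, assuming snapshots of each pipeline step can be pulled as lists of rows. The stage names (`raw`, `staging`, `final`) and the column names are hypothetical. The idea is to run the same check at every step and report the first one where it stops passing.

```python
def fuel_is_consistent(rows):
    """True if no row arrives with more fuel than it departed with."""
    return all(r["fuel_arrival"] <= r["fuel_departure"] for r in rows)


def first_failing_stage(stages, check):
    """Return the name of the first stage whose rows fail the check, else None."""
    for name, rows in stages:
        if not check(rows):
            return name
    return None


# Hypothetical snapshots: the values get swapped somewhere before "staging".
stages = [
    ("raw", [{"fuel_departure": 5000, "fuel_arrival": 1200}]),
    ("staging", [{"fuel_departure": 1200, "fuel_arrival": 5000}]),
    ("final", [{"fuel_departure": 1200, "fuel_arrival": 5000}]),
]
culprit = first_failing_stage(stages, fuel_is_consistent)
```

Pointing at one step where a check first breaks is also a gentler message than "the whole process is wrong".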
Ed · 5mo ago
Might have to be careful with how you communicate the bug. Some engineers may not appreciate the implied blaming or finger-pointing, especially if it's in a public forum/channel that their manager can see. If I were the engineer, I would be pissed if it was escalated to my manager without me knowing.
quickest-silver · 5mo ago
Wanted to thank all you guys for your comments and give you a follow-up on what happened. After years of working with data engineers, I should have anticipated this. When I showed them that their data said they had buried the survivors, they pulled out a Source to Target Mapping document which said that the data was supposed to say exactly that. It was a deliberate design decision, and I wasn’t supposed to interpret it literally. 😂 I told them that this was an unintuitive design and suggested that they call out how this works as prominently as possible so the data users don’t make the same mistake I did. They are looking into the duplicated flight numbers. We think that updates from the source are being interpreted as new entries rather than replacements.
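To illustrate the suspected cause of the duplicates, here is a small sketch contrasting an append-only load with a keyed upsert. This is not their actual ETL code; the function names, the `flight_no` key, and the `status` column are all made up for the example.

```python
def load_append(table, records):
    """Treat every incoming record as a new entry (the suspected bug)."""
    return table + records


def load_upsert(table, records, key="flight_no"):
    """Replace an existing row when an incoming record has the same key."""
    merged = {row[key]: row for row in table}
    for rec in records:
        merged[rec[key]] = rec  # update replaces, new key inserts
    return list(merged.values())


existing = [{"flight_no": "F100", "status": "scheduled"}]
update = [{"flight_no": "F100", "status": "delayed"}]

appended = load_append(existing, update)
upserted = load_upsert(existing, update)
```

With the append load the same flight number shows up twice, once per version; with the upsert there is a single row carrying the latest status.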
grumpy-cyan · 5mo ago
That’s unexpected lmfao
inland-turquoise · 5mo ago
No offense, but that org has some shitty DEs if they haven’t checked for duplicates, NULLs, suspicious data volume trends, etc.
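For reference, a bare-bones version of those three checks using only the Python standard library. The helper names, the sample columns, and the 50% volume tolerance are assumptions chosen for the example, not a recommendation for any particular threshold.

```python
from collections import Counter
from statistics import median


def count_duplicates(rows, key):
    """Number of extra rows that share a key with an earlier row."""
    counts = Counter(row[key] for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


def count_nulls(rows, column):
    """Number of rows where the column is missing or None."""
    return sum(1 for row in rows if row.get(column) is None)


def volume_is_suspicious(daily_counts, today, tolerance=0.5):
    """Flag today's row count if it is far from the recent median."""
    baseline = median(daily_counts)
    return abs(today - baseline) > tolerance * baseline


rows = [
    {"id": 1, "value": None},
    {"id": 1, "value": 2},
    {"id": 2, "value": 3},
]
dupes = count_duplicates(rows, "id")
nulls = count_nulls(rows, "value")
low_day = volume_is_suspicious([100, 110, 90, 105], today=30)
normal_day = volume_is_suspicious([100, 110, 90, 105], today=95)
```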
quickest-silver · 5mo ago
Apparently, they looked for them, found many, and then consulted with someone who had no idea what they were talking about who told them it was fine. Since they thought it was supposed to be like that they wrote those mistakes into the requirements. I’m the temp here trying to get hired on permanently at this very cushy job. I’m not going to rock the boat. I’m just going to point out that this could cause problems downstream and leave it at that. If I start making noise they will find it very easy to get rid of me once my contract ends.