This message was first sent to subscribers of the Data Craze Weekly newsletter.
I remember all too well the time when NoSQL was going mainstream. The phrase “SQL is dead” was all over the place back then. Call me old-fashioned, but I was always sceptical of such strong words. NoSQL is great: it can speed up development, especially when you are on a greenfield project and can architect everything yourself, but … in the companies where dollars were flowing like a river, SQL was always there, standing in the shadows and doing its “mundane” job.
You don’t need to know SQL by heart, but the linked article makes an important point: SQL is a de facto standard in the majority of role descriptions, from Frontend, through Backend and Machine Learning, to Analytics.
Bartosz Konieczny, the author of the linked article, is doing a great job in the data engineering space. One example, among others, is this post covering the new features in the cloud world (across multiple cloud providers) from a data perspective.
If we no longer work with a certain technology on a daily basis (but used to, and are still somewhat interested), we would normally miss such things. Sometimes these things are pearls that can help our future (or current) clients. Example: the AWS data warehouse (Redshift) now officially offers a Serverless option as generally available (no more infrastructure management).
Are you wondering what exactly a data warehouse is, or what tools are needed for data processing?
The linked article will give you solid fundamentals, plus you can build your own data pipeline in practice using a tool that is popular nowadays: Airflow. There is a catch though … the article will not explain how to configure all of the tools.
However, feel free to start by just checking it out and reading the code. Understand what DAGs are and how the Scheduler works, and later you can jump into configuration.
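If the DAG idea feels abstract, here is a minimal sketch in plain Python. It is not Airflow code: it only uses the standard library's `graphlib` (Python 3.9+) to show how a scheduler could derive a valid run order from task dependencies. The task names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_datasets": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_datasets"},
}


def execution_order(dag):
    """Return one valid order in which the tasks could run.

    A task only appears after all of its dependencies. If the graph
    contained a cycle, graphlib would raise CycleError, which is why
    a pipeline has to be a directed *acyclic* graph.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

The two extract tasks do not depend on each other, so their relative order is not fixed. A real scheduler could run them in parallel.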
One of the most “used” interview questions for data-related positions (Business Intelligence / Data Engineer etc.):
What is the difference between ETL and ELT?
For someone who is preparing for such a role it might be a no-brainer. But what if the recruiter takes it a step further and asks:
What is CDC (Change Data Capture), or what exactly is data virtualization?
In the attached article the author gives solid fundamentals of all the main data processing concepts. You don’t need to Google through X pages. Check it, make notes, but most importantly understand the differences.
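To make the ETL vs ELT difference concrete, a small sketch may help. It is only an illustration under my own assumptions: an in-memory SQLite database stands in for the warehouse, and the tables, columns and sample rows are hypothetical. In ETL the cleaning happens in application code before loading. In ELT the raw data is loaded first and then transformed with SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw export: everything arrives as messy text.
RAW_ORDERS = [
    ("1", " Alice ", "10.50"),
    ("2", "BOB", "3.00"),
]

SELECT_CLEAN = "SELECT id, customer, amount_cents FROM orders ORDER BY id"


def run_etl(rows):
    """ETL: Extract, Transform in Python, then Load the clean rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    cleaned = [
        (int(order_id), name.strip().lower(), round(float(amount) * 100))
        for order_id, name, amount in rows
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    return conn.execute(SELECT_CLEAN).fetchall()


def run_elt(rows):
    """ELT: Extract, Load the raw rows as-is, then Transform with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # The transformation runs inside the "warehouse", on top of raw data.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(id AS INTEGER) AS id,
               lower(trim(customer)) AS customer,
               CAST(round(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_cents
        FROM raw_orders
        """
    )
    return conn.execute(SELECT_CLEAN).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_ORDERS))
    print(run_elt(RAW_ORDERS))
```

Both paths end with the same clean table. The practical difference is where the transformation runs, and that ELT keeps the raw data around in the warehouse for later re-processing.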
remark – are you a Markdown fan or a heavy user? Do you keep your notes in an app that supports MD (e.g. Obsidian, Logseq)? Why not take it a step further: add a bit of HTML and CSS, copy your notes in, and create rich slides ready for your presentation.
Some examples: https://remarkjs.com/#1
#SQL I’ve recently found quite a solid interview question (presumably from Uber). It will test your SQL skills, namely Window Functions and HAVING.
“Write a query that, for each person who has used Uber at least twice, calculates the time difference between their first and second ride.”
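Here is one possible answer, sketched with Python's built-in `sqlite3` so you can run it locally. The `rides` table, its columns and the sample rows are my assumptions, since the question does not specify a schema. It also needs SQLite 3.25 or newer for window functions, and the date arithmetic will look different in other databases.

```python
import sqlite3

# ROW_NUMBER() numbers each person's rides chronologically; keeping only
# rides 1 and 2 and requiring exactly two rows (HAVING) drops one-time riders.
QUERY = """
WITH ranked AS (
    SELECT person,
           ride_time,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY ride_time) AS ride_no
    FROM rides
)
SELECT person,
       (CAST(strftime('%s', MAX(ride_time)) AS INTEGER)
        - CAST(strftime('%s', MIN(ride_time)) AS INTEGER)) / 60 AS minutes_between
FROM ranked
WHERE ride_no <= 2
GROUP BY person
HAVING COUNT(*) = 2
ORDER BY person
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical rides."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (person TEXT, ride_time TEXT)")
    conn.executemany(
        "INSERT INTO rides VALUES (?, ?)",
        [
            ("alice", "2022-07-01 08:00:00"),
            ("alice", "2022-07-01 09:30:00"),
            ("alice", "2022-07-03 10:00:00"),
            ("bob", "2022-07-02 18:15:00"),  # single ride, filtered out
            ("carol", "2022-07-02 12:00:00"),
            ("carol", "2022-07-02 12:45:00"),
        ],
    )
    return conn


def first_two_ride_gaps(conn):
    """Return (person, minutes between first and second ride) pairs."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for person, minutes in first_two_ride_gaps(build_demo_db()):
        print(person, minutes)
```

An alternative is `LEAD(ride_time)` over the same partition, reading the gap from the row where `ride_no = 1`. I went with the GROUP BY version because the question explicitly hints at HAVING.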
- Data Engineer – Tessian – UK / EU Remote – £40,000 – £100,000