Data Craze Weekly #2

This message was first sent to subscribers of the Data Craze Weekly newsletter.



    Week in Data

    SQL is King … again 👑

    I remember all too well the time when NoSQL was going mainstream. The phrase “SQL is dead” was everywhere. Call me old-fashioned, but I was always sceptical of such strong words. NoSQL is great: it can speed up development, especially on a greenfield project where you can architect everything yourself, but … in the companies where dollars were flowing like a river, SQL was still there, standing in the shadows and doing its “mundane” job.

    You don’t need to know SQL by heart, but the linked article makes an important point: SQL is the de facto standard in the majority of role descriptions, from Frontend, through Backend, to Machine Learning and Analytics.

    Link: https://spectrum.ieee.org/the-rise-of-sql

    What’s new in the Cloud from a Data perspective

    Bartosz Konieczny, the author of the linked article, is doing a great job in the data engineering space. One example is this post, which lists new features across multiple cloud providers from the data perspective.

    If we no longer work with a certain technology on a daily basis (but used to, and are still somewhat interested), we would normally miss such things. Sometimes these things are pearls that can help our future (or current) clients. Example: the AWS data warehouse (Redshift) now officially offers a Serverless option as generally available (no more infrastructure management).

    Link: https://www.waitingforcode.com/data-engineering-cloud/what-new-cloud-data-engineers-part-7-05-08-2022/read

    Warehouse and processing of data - Fundamentals

    Are you wondering what exactly a data warehouse is, or what tools are needed for data processing?

    The linked article will give you solid fundamentals, plus you can build your own data pipeline in practice using a popular tool - Airflow. There is a catch though … the article will not explain how to configure all of the tools.

    However, feel free to start by just checking it out and reading the code. Understand what DAGs are and how the Scheduler works, and later you can jump into configuration.

    Link: https://medium.com/@devparmar967/a-quick-guide-for-building-datawarehouse-and-etl-pipelines-with-airflow-19cce17017bd
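
    To get a feel for what a DAG is before touching Airflow, here is a minimal sketch in plain Python. It does not use the Airflow API; the task names and the run_pipeline helper are made up for illustration, and the standard-library graphlib module (Python 3.9+) stands in for a scheduler that only starts a task once all of its upstream tasks are done.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
}


def run_pipeline(dag):
    """Return the task names in an order that respects every dependency."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    executed = []
    while sorter.is_active():
        # Tasks whose dependencies are all finished; sorted for a stable order.
        for task in sorted(sorter.get_ready()):
            executed.append(task)  # a real scheduler would hand this to a worker
            sorter.done(task)      # marking it done unlocks downstream tasks
    return executed


order = run_pipeline(PIPELINE)
print(order)
```

    The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, which is why the sketch fails early on a cycle instead of looping forever.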

    ETL / ELT / ETLT or maybe data virtualization?

    One of the most “used” interview questions for data-related positions (Business Intelligence / Data Engineer etc.):

    What is the difference between ETL and ELT?

    For someone who is preparing for such a role it might be a no-brainer. But what if the recruiter takes it a step further and asks:

    What is CDC (Change Data Capture), or what exactly is data virtualization?

    In the linked article the author gives solid fundamentals of all the main data processing concepts. You don’t need to Google through X pages. Check it, make notes, but most importantly understand the differences.

    Link: https://medium.com/codex/data-pipeline-architecture-variety-of-ways-you-can-build-your-data-pipeline-66b3dd456df1
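
    The ETL vs ELT difference is easier to remember with a toy example. The sketch below is not taken from the linked article; the table names, the sample rows and the helper functions are all hypothetical, and an in-memory SQLite database plays the role of the warehouse. In ETL the data is cleaned in application code before loading; in ELT the raw data is loaded first and cleaned with SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw source rows: (order_id, amount as text, country code)
RAW_ORDERS = [
    (1, "10.50", "pl"),
    (2, "20.00", "PL"),
    (3, "5.25", "de"),
]


def run_etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    cleaned = [(oid, float(amount), country.upper())
               for oid, amount, country in RAW_ORDERS]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the warehouse."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(country)       AS country
        FROM orders_raw
        """
    )


def revenue_by_country(conn, table):
    # The table name comes from this script only, never from user input.
    query = (f"SELECT country, SUM(amount) FROM {table} "
             "GROUP BY country ORDER BY country")
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_result = revenue_by_country(conn, "orders_etl")
elt_result = revenue_by_country(conn, "orders_elt")
print(etl_result, elt_result)
```

    Both paths end with the same clean table. The practical difference is where the transformation runs and that ELT keeps the untouched raw data (orders_raw) in the warehouse, so it can be re-transformed later.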

    Tools

    remark – are you a Markdown fan or a heavy user? Do you keep your notes in an app that supports MD (e.g. Obsidian, Logseq)? Why not take it a step further: add a bit of HTML and CSS, copy your notes, and create rich slides ready for your presentation.

    Link: https://github.com/gnab/remark

    Some examples: https://remarkjs.com/#1

    Check Your Skills

    #SQL I’ve recently found quite a solid interview question (presumably from Uber). It will test your SQL skills - Window Functions and HAVING.

    “Write a query that, for each person who has used Uber at least twice, calculates the time difference between their first and second ride.”

    Link: https://app.bigtechinterviews.com/challenge/68J6fB6sJ42PwUjkVxdwtR
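
    If you want to try the idea locally first, here is one possible approach, not the official solution. The rides table, its columns and the sample data are assumptions made up for this sketch (the real challenge schema may differ), and SQLite is used through Python, which needs SQLite 3.25 or newer for window functions. ROW_NUMBER() numbers each person's rides in time order, and HAVING keeps only people who have both a first and a second ride.

```python
import sqlite3

# Hypothetical schema and data; the real challenge may use other names.
RIDES = [
    ("alice", "2022-08-01 10:00:00"),
    ("alice", "2022-08-01 10:45:00"),
    ("alice", "2022-08-03 09:00:00"),
    ("bob",   "2022-08-02 08:00:00"),   # only one ride, must be excluded
    ("carol", "2022-08-01 23:30:00"),
    ("carol", "2022-08-02 00:30:00"),
]

QUERY = """
WITH ranked AS (
    SELECT user_id,
           ride_time,
           ROW_NUMBER() OVER (
               PARTITION BY user_id ORDER BY ride_time
           ) AS ride_no
    FROM rides
)
SELECT user_id,
       CAST(strftime('%s', MAX(CASE WHEN ride_no = 2 THEN ride_time END)) AS INTEGER)
     - CAST(strftime('%s', MAX(CASE WHEN ride_no = 1 THEN ride_time END)) AS INTEGER)
       AS seconds_between
FROM ranked
WHERE ride_no <= 2          -- only the first two rides matter
GROUP BY user_id
HAVING COUNT(*) = 2         -- drop people with a single ride
ORDER BY user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (user_id TEXT, ride_time TEXT)")
conn.executemany("INSERT INTO rides VALUES (?, ?)", RIDES)
result = conn.execute(QUERY).fetchall()
print(result)
```

    The difference is returned in seconds to keep the arithmetic exact; on another database you would likely use its native timestamp subtraction instead of strftime.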

    Data Jobs