Disclaimer: This post was originally published on the datacraze.pl domain. It was migrated here, as the previous domain will be run in Polish.
The year 2017 is coming to an end, but before this chapter is completely closed, let's take on a challenge and set out 10 Data & Analytics predictions for 2018.
1. Smart Data
Starting with a buzzword – smart data. According to the TechTarget portal, "Smart data is digital information that is formatted so it can be acted upon at the collection point before being sent to a downstream analytics platform for further data consolidation and analytics."
For me, the term smart data should have a slightly different definition. We as a society create huge amounts of data every second, but that doesn't mean this data is really useful. Applying the Pareto 80/20 principle (I would go even further, to 95/5), 80% of the data being created is junk and only 20% is valuable. Smart data should be defined as data that brings real value without any prior transformations, adjustments, etc. According to a Forbes article, 80% of a data scientist's work is data preparation – and let's face it, these guys and gals are well-paid employees.
If that 80% of work could be lowered to 50% (I wouldn't expect it to reach 0% – that's practically impossible), think how many more insights companies could get for the same price, and how much easier it would be for a data scientist to focus on the task rather than on cleaning the data. However, this requires "smart data": data that is ready to be used by machine learning algorithms, cleaned of junk and noise, and sifted – at the extraction phase, or even at the gathering phase – from all the crappy stuff produced along the way.
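To make the idea concrete, here is a minimal Python sketch of sifting at the collection point: records that fail basic validity checks are dropped before they ever reach the downstream analytics platform. The record fields and validity rules are illustrative assumptions, not any real product's logic.

```python
def is_valuable(record):
    """Keep only records that carry a usable signal (hypothetical rules)."""
    return (
        record.get("sensor_id") is not None           # must be attributable
        and isinstance(record.get("value"), (int, float))  # must be a measurement
        and -1000 < record["value"] < 1000            # reject obvious glitches
    )

raw_stream = [
    {"sensor_id": "s1", "value": 21.5},
    {"sensor_id": None, "value": 3.0},     # orphaned reading -> junk
    {"sensor_id": "s2", "value": "ERR"},   # corrupted payload -> junk
    {"sensor_id": "s3", "value": 99999},   # physically impossible -> junk
    {"sensor_id": "s4", "value": 19.8},
]

# Sift at the point of collection, before any downstream processing:
smart_data = [r for r in raw_stream if is_valuable(r)]
print(len(smart_data))  # 2 of 5 raw records survive the sieve
```

The interesting part is not the three toy rules but where they run: before the data lands in your platform, not after.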
2. Machine Learning embedded into ETL and analytics tools
Another hot topic that had its renaissance in 2017: machine learning. Most likely you've heard blabbering that AI will replace us all, etc. Sure, whatever – my opinion is close to the great quote from the one and only Uncle Ben from Marvel's Spider-Man: "with great power comes great responsibility". There is huge potential in machine learning (deep learning / reinforcement learning / neural networks, you name it), and honestly there has been no better time than now to make use of it.
You may say that there are already tools on the market with built-in ML blocks for ETL or advanced analytics, and you would be perfectly right – my idea behind this point is slightly different. I'm not expecting each and every person in the IT world who works with data on a daily basis to know ML algorithms and concepts (although that would be magnificent). Instead, I can imagine advanced analytics and machine learning being embedded into existing and new tools used in data & analytics processes without being explicitly named as such: they will simply be coded into the product itself, and they will guide users and developers, or even make decisions on their own during execution. A few examples:
- Automated product classification during the ETL process into a data warehouse;
- Data value adjustments (spelling corrections, etc.) for dimensions;
- Automated ETL step behaviour on processing errors – if similar cases were flagged in the past as corner cases, not failures, the process should match them and let them pass into the DWH;
- Sentiment analysis built into ETL flows.
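As a toy illustration of the second bullet, here is a sketch of automatic correction of misspelled dimension values during an ETL step. Plain fuzzy matching from Python's standard library stands in for a trained model, and the dimension values are made up:

```python
import difflib

# Hypothetical reference list of valid dimension members:
KNOWN_VALUES = ["Germany", "Poland", "France", "Spain"]

def correct_dimension(value, known=KNOWN_VALUES, cutoff=0.7):
    """Snap a dirty dimension value to its closest known member."""
    match = difflib.get_close_matches(value, known, n=1, cutoff=cutoff)
    return match[0] if match else value  # leave truly unknown values alone

assert correct_dimension("Germny") == "Germany"
assert correct_dimension("Polland") == "Poland"
assert correct_dimension("Atlantis") == "Atlantis"  # no close match -> passthrough
```

In a real product the matching would be a learned model tuned on your data, but the shape of the step – dirty value in, corrected value out, inside the ETL flow – is the same.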
3. Effective Metadata processing and Data Governance
Something more cumbersome, not really shiny, not really visible to your customers and users – but how crucially important. Having a lot of data is one thing; knowing the exact meaning of the objects, governing usage and performance statistics, and analysing ETL and usage logs is a totally different thing.
My expectation is that this point will connect with points 1 and 2: some brand new cool products or services will emerge on the market, or some awesome features will be added to existing solutions. Currently (at least in my experience – I couldn't find any reliable market surveys) not much time is spent on effective governance and metadata processing, yet in many cases metadata contains information even more valuable than the regular data. With the great increase in big data and data science topics, metadata management should become a core competency of a company's data & analytics function.
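To show what "effective metadata processing" can mean in practice, here is a minimal sketch of capturing operational metadata alongside the data itself: each ETL step records what it did, so governance questions ("where did this table come from, when, how many rows?") can be answered later. The step names and record layout are illustrative assumptions.

```python
import datetime

metadata_log = []  # in a real setup this would live in a metadata store

def run_step(name, func, rows):
    """Execute one ETL step and log its operational metadata."""
    start = datetime.datetime.now(datetime.timezone.utc)
    out = func(rows)
    metadata_log.append({
        "step": name,
        "rows_in": len(rows),
        "rows_out": len(out),
        "started_at": start.isoformat(),
    })
    return out

rows = [{"id": i, "amount": i * 10} for i in range(100)]
rows = run_step("filter_negatives", lambda rs: [r for r in rs if r["amount"] >= 0], rows)
rows = run_step("dedupe", lambda rs: list({r["id"]: r for r in rs}.values()), rows)

# The log itself is now queryable metadata about the pipeline run:
print([(m["step"], m["rows_in"], m["rows_out"]) for m in metadata_log])
```

The point is that the log is produced as a by-product of running the pipeline, not reconstructed afterwards – which is exactly where most teams currently spend too little effort.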
4. “Augmented Intelligence”
That's an awesome concept, which I was first introduced to during the Qlik Qonnections event in Florida, and it is also related to the second point described above. The goal here is to combine human understanding of data and its insights with a machine-intelligence-driven approach to analytics. A human is still – and let's hope always will be – the decision maker, but algorithms may and should make it easier to explore and present insights from your data, identify patterns, and prepare key visualizations for the people using data & analytics products. You can read more about it on the Qlik page.
5. Data & Analytics expansion in Clouds
For tech start-ups built on cloud solutions, or for companies that have fully migrated (with all their ERPs / DWHs), it will be even easier. For long-time market players that still have a lot of systems and sources in on-site data centers, it may be more problematic to fully use the potential of cloud services. But the fact is that this business and its services are destined to get even bigger in 2018.
Each year the top players (Amazon Web Services / Microsoft Azure / Google Cloud Platform) reveal a couple of new services that aim to compete with current market products, or can be used along with services already in use – and honestly, that's the best thing cloud service providers can do. If you are already fully invested in one particular platform, why not use its services in the data & analytics area?
To name just a few services for more visibility (read the details in the attached links if you want): AWS Glue, Amazon QuickSight, Amazon Kinesis, Amazon Redshift, Azure HDInsight, Azure Stream Analytics, Azure Data Factory, Google Cloud Datalab, Google Cloud Dataflow, Google Cloud Dataprep, Google Data Studio.
6. Self-service as a standard
If I got $1 for each time I heard "self-service" in 2017, I would most likely be a billionaire by now. This is a hot topic, and it should be: employees and customers need the possibility to play with data in any way they want, as frankly speaking they know their data best. IT departments should therefore concentrate more on quality and on-time delivery of data, rather than on building pixel-perfect reports.
However, this doesn't mean that customers / employees should be left alone with data in dashboards – such an approach may lead to a lot of difficulties, among others: poor performance of the analytics platform, data dumps to Excel (rather than exploring data in the analytics environment), creating multiple sources of truth, etc.
As data & analytics experts, or as companies providing such services, we should focus explicitly on enabling people to make use of their data in an easy way – but in addition, we need to guide and educate them on how to get the most from it.
My bet for 2018 is that the majority of products, companies, and services on the market will focus on "teaching a man to fish rather than simply giving him a fish" – this was already highly visible in 2017, and it will only continue to be a trending topic.
7. Analytics Assistants (Bots / Voice Recognition)
What if you could talk to your PC, or to an Alexa- or Google Assistant-enabled speaker, and it would answer with the data you are looking for?
In 2017 this made for a "wow" effect during presentations of analytics products (examples: https://www.youtube.com/watch?v=1LT1XJc5zpw, or https://community.qlik.com/blogs/qlikviewdesignblog/2017/04/18/push-the-boundaries-of-analytics-qlik-sense-bot-video ); in 2018 it may appear as a must-have feature, and honestly I'm really looking forward to further use cases of bots and voice recognition in the data & analytics world.
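Stripped to its core, the idea is a mapping from a question to a metric lookup. Real products use proper natural language and voice recognition; the naive keyword matching and the metric values below are stand-in assumptions, just to show the shape of such an assistant:

```python
# Hypothetical metric store the "bot" can answer from:
METRICS = {
    "revenue": 1_250_000,
    "orders": 8_431,
    "churn": 0.042,
}

def answer(question):
    """Return the first metric whose name appears in the question."""
    q = question.lower()
    for name, value in METRICS.items():
        if name in q:
            return f"{name}: {value}"
    return "Sorry, I don't know that metric yet."

print(answer("Hey bot, what was our revenue last month?"))  # revenue: 1250000
print(answer("How many orders did we ship?"))               # orders: 8431
```

Everything hard about the real feature lives in the first step (understanding the question); the rest is a plain query against your analytics platform.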
8. Increase in leveraging Data Virtualization solutions
Let's embrace another abstraction layer in the data world – without fear, my friends. Moving, wrangling, and storing data is expensive and time-consuming. With the increase in data sources at each and every company, data virtualization solutions are becoming more visible on the market (https://www.gartner.com/reviews/market/Data-Virtualization). They can save time and money by removing the need for extensive ETL and data storage projects, but most importantly, they aim to produce quick and accurate insights across multiple sources without requiring much technical detail about the data, such as how it is formatted at the source or where it is physically located.
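As a toy illustration of the idea, here is one query spanning two separate "sources" without physically consolidating them first. Two in-memory SQLite databases stand in for remote systems, and the table and column names are made up; real virtualization products federate across genuinely remote, heterogeneous systems:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # plays the role of the virtual layer
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

# Two "systems", each with its own data:
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# One logical query across both sources, with no ETL step in between:
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM crm.customers c JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

The consumer writes one query and never learns where each table physically lives – which is precisely the promise of the virtualization layer.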
9. “All in one / Cockpit Dashboards”
For sure, your company has (or you've come across) a lot of dashboards for different purposes – from financial income statements to supply chain planning. These dashboards may have self-service capabilities and can present different crucial KPIs and insights. It is all good and easy when a user needs only one particular dashboard for their daily tasks; however, in the majority of cases your colleagues / customers will use a couple of dashboards, looking for different data.
For this particular case, "all-in-one dashboards" (I don't know if that is the correct business name) were created / invented / popularized. In this concept, one dashboard presents the most critical KPIs / data from two or more different applications, along with all the needed filtering and other capabilities.
This trend is not new, but it will only grow in the upcoming years (both in how it is technologically handled and in the needs of the business), as the vast majority of users monitor more than one dashboard daily. You can help your clients by introducing a custom-made dashboard (if your BI tool provides the relevant capabilities, using the standard web stack – HTML / CSS / JS plus the tool's API), or you can use your BI tool's built-in solutions; for a particular example, check out the Qlik Mashups concept.
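The cockpit idea boils down to pulling the headline KPIs from several applications and merging them into one view. In a real mashup the fetch functions would be calls to each BI tool's API; here they are hypothetical stand-ins returning made-up numbers:

```python
# Stand-ins for API calls to two separate applications (hypothetical):
def fetch_finance_kpis():
    return {"net_income": 420_000, "operating_margin": 0.18}

def fetch_supply_chain_kpis():
    return {"on_time_delivery": 0.94, "inventory_turns": 7.2}

def build_cockpit(*sources):
    """Merge KPI dicts from several apps, prefixing each KPI by its source."""
    cockpit = {}
    for source_name, fetch in sources:
        for kpi, value in fetch().items():
            cockpit[f"{source_name}.{kpi}"] = value
    return cockpit

view = build_cockpit(("finance", fetch_finance_kpis),
                     ("supply_chain", fetch_supply_chain_kpis))
print(sorted(view))
```

Prefixing by source keeps the merged view unambiguous when two applications happen to expose a KPI with the same name.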
10. Data Certification / Data Officers
With the amount of data growing every second, there is an increasing need to distinguish "correct" data from data that is not really useful in a particular case. If a company is struggling to find the single source of truth for, say, its financial data, data certification or data officers may come into play.
What / who would that be, and how would it benefit the company? Such a position, or process, would basically mark a group of data / data marts as valid for a particular use case, or for a business unit to use in its processing. The aim of this validation would be to create data marts marked as certified and eligible for use by other business parties – meaning the data stored there follows correct business logic and provides correct insights.
Furthermore, with the GDPR regulation coming into force in May 2018, how companies should properly address it will be a hot topic. More about it: https://en.wikipedia.org/wiki/General_Data_Protection_Regulation#Summary
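A toy sketch of what such a certification process could look like as data: data marts get flagged as certified for a given use case, and consumers can check the flag before building on top of them. All names here are illustrative assumptions, not an existing standard:

```python
import datetime

certified = {}  # mart name -> certification record (a toy registry)

def certify(mart, approved_by, use_case):
    """Record that a data officer approved a mart for one use case."""
    certified[mart] = {
        "approved_by": approved_by,
        "use_case": use_case,
        "certified_on": datetime.date.today().isoformat(),
    }

def is_certified(mart, use_case):
    """Consumers check this before trusting a mart for their purpose."""
    rec = certified.get(mart)
    return rec is not None and rec["use_case"] == use_case

certify("finance_mart", "chief_data_officer", "financial_reporting")

assert is_certified("finance_mart", "financial_reporting")
assert not is_certified("finance_mart", "marketing_attribution")   # wrong use case
assert not is_certified("shadow_excel_export", "financial_reporting")
```

Note that certification is scoped to a use case: a mart valid for financial reporting is not automatically valid for anything else, which is what keeps the "single source of truth" claim honest.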
Bonus Point – Data Security
Data is an asset – it always was. Either you protect this asset, or you go down because you failed to do so.
We could name all the top failures from 2017 here, but frankly that's not the point. And obviously it's not only about big companies: if you are a one-man army serving your customers with a great product but you are not doing everything you can to protect their data, then you are not doing a good job. "To Protect and to Serve" is the motto of the Los Angeles Police Department – you can use it as your motto as well, for your customers and the data they have entrusted you with.
Whether or not the above predictions come true, 2018 will be a year of data, as data is the foundation for all companies. We will hear about companies doing a great job using their data, and we will hear about others' failures.
It's a great time to be in the data & analytics business, and it will be even better in the upcoming year – don't oversleep it.
What are your predictions for 2018 in data & analytics area?