Today’s Data: More than numbers

    Megan van der Ham
    20/05/2021
    Data

    When people think about data, they often imagine endless numbers and difficult coding work. Do not get me wrong: this is obviously a big part of Data Science, but there is far more to it than numbers. Data can really tell a story. This month’s blog emphasises the importance of data visualization. Building on the first blog, we know that the purpose of data analysis is to gain insights, and data visualization is just as important as the other steps. The brain picks up patterns and outliers far more easily from a visual summary than from a file that contains thousands of numbers. Imagine you have 100,000 observations and you want to know the distribution of, for example, race in your sample. It is much easier to create a pie chart that shows this distribution exactly than to scroll through the observations and roughly remember the few races you come across most often. Visualization leads to better data exploration, which is just one of the many areas where it comes in handy.
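
    To make the 100,000-observation example concrete, here is a minimal Python sketch. The group names, their weights and the helper names `category_shares` and `plot_pie` are all made up for illustration, and the pie chart step assumes that matplotlib is installed.

```python
from collections import Counter
import random


def category_shares(values):
    """Return each category's share of the sample as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.most_common()}


def plot_pie(shares, path="distribution.png"):
    """Save the shares as a pie chart (assumes matplotlib is installed)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares.keys()), autopct="%1.1f%%")
    ax.set_title("Distribution in the sample")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical sample: 100,000 observations drawn from four made-up groups.
random.seed(0)
sample = random.choices(
    ["Group A", "Group B", "Group C", "Group D"],
    weights=[50, 30, 15, 5],
    k=100_000,
)

shares = category_shares(sample)
for category, pct in shares.items():
    print(f"{category}: {pct:.1f}%")
```

    Calling `plot_pie(shares)` afterwards would save the same percentages as a pie chart, which is the picture you would actually show to others.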

    There are many different ways to visualize data, and it is important to choose the visualization that conveys what you want to find in the data. Line charts are often used to illustrate changes over time. Bar charts make it easier to compare values across multiple variables. Histograms look like bar charts, but they show how often values fall within each range rather than trends over time. Scatterplots are often used to find correlations within the data. Pie charts help when you want to visualize percentages. It is also possible to visualize data on maps, for example when you want to take location into account. Lastly, a heat map is a matrix in which colour represents the values, which is quicker to interpret than the numbers themselves.
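
    The difference between a bar chart and a histogram is easiest to see in the counting step behind a histogram. The sketch below is only an illustration: the grades are invented, the helper `histogram` is a hypothetical name, and the chart is printed as text so that no extra software is needed.

```python
def histogram(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # A value exactly on the top edge is counted in the last bin.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical exam grades, invented for this example.
grades = [5.5, 6.0, 7.2, 8.1, 6.8, 4.9, 7.5, 9.0, 6.1, 7.9, 5.8, 6.6]

low, high, bins = 4.0, 10.0, 3
counts = histogram(grades, bins, low, high)
width = (high - low) / bins

# Print one row per interval: the bar length is the frequency.
for i, n in enumerate(counts):
    start = low + i * width
    print(f"{start:.0f}-{start + width:.0f}: {'#' * n} ({n})")
```

    Each bar stands for a range of values and its length for how many observations fall in that range, whereas a bar chart would show one bar per category.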

    In recent decades a lot of software has been developed that creates these visualizations within a few minutes and requires no coding skills. One of the simplest ways to perform data visualization is by using Microsoft PowerPoint. If you are not a Data Scientist, it gives you enough options to display your data in a helpful way. However, if your goal is to really decompose your data, it will not let you get the best out of it. The best-known data visualization tool is Tableau, which offers extensive data exploration in an intuitive interface, with many options for creating the exact visualization you want. Qlik is another major data visualization tool. It is very intuitive and comprehensive, just like Tableau. Microsoft has its own data visualization tool, called Power BI, whose layout looks a lot like the other Microsoft applications, which makes it easy to work with. Lastly, Plotly is a data visualization tool used by Google and New York University, among others. It gives you well-organized charts in a few minutes and is available for languages such as Python and R, as well as for Excel. Want to become an expert in these data visualization tools? At the end of this blog I have put some links to (free) courses that will help you master them.

    Data visualization is not only important when you want to explore your data and identify patterns; it is also very important when you want to communicate your findings to others. It makes it easier to share insights with everyone involved, even people who have no understanding of Data Science. It also helps to hold the audience’s interest, since they can understand the information that is given to them. This skill is useful not only in Data Science but in almost every career: think of teachers wanting to show students their grades, brokers needing to analyse stock performance and consultants having to examine a certain market. Improving your data visualization skills will help you no matter what career you are pursuing.

    Next month’s blog will be the last of this Data Science series, and to close it in the best way possible, I will tell you all about the education and career possibilities in the field of Data Science.

    Links to the data visualization tools:
    Tableau
    Qlik
    Power BI 
    Plotly