A scatter diagram, also called a scatter plot or scatter graph, is a control and support tool for checking whether a correlation exists between two quantitative variables. Its purpose is to explore a possible cause-and-effect relationship between the variables and to test hypotheses about it. A visible correlation on its own does not prove that one variable causes the other.
A scatter chart gives a visual comparison of two sets of values. It is best suited to data such as survey results, test results, and demographics.
Let’s learn more about its features and uses.
What is a scatter plot?
A scatter plot is a data visualization used when there are many data points and you want to highlight the patterns they share. It is useful for spotting outliers and for understanding how the data is distributed.
If the data forms a band that extends from the bottom left to the top right, there is most likely a positive correlation between the two variables. If the band goes from top left to bottom right, a negative correlation is likely. If it’s hard to see a pattern, there’s probably no correlation.
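One common way to put a number on this visual reading is the Pearson correlation coefficient, which the text above does not name. The following is a minimal sketch: the function name `pearson_r` and the sample data are made up for illustration. A value near +1 corresponds to a band rising from bottom left to top right, a value near -1 to a band falling from top left to bottom right, and a value near 0 to no clear linear pattern.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, chosen only to illustrate the two directions
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]      # drifts upward as x grows
falling = [10, 8, 6, 4, 2]    # drops steadily as x grows

r_up = pearson_r(x_values, rising)     # positive value
r_down = pearson_r(x_values, falling)  # negative value
print(f"rising: {r_up:.2f}, falling: {r_down:.2f}")
```

The coefficient only measures how well a straight line describes the points, so it complements the plot rather than replacing it.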
When to use a scatter plot?
Use a scatter plot for one of the following reasons:
- You want to show the relationship between two variables
- You need a compact data visualization
Don’t use a scatter chart if readers need to take in the information at a glance or to read exact values for individual data points.
Advantages of scatter plots
Here are some of the benefits of using a scatter plot to visualize your data:
- Shows the relationship between two variables.
- It is well suited to revealing non-linear patterns.
- The range of the data, that is, its minimum and maximum values, can be read directly from the plot.
- Observation and reading are simple.
- Plotting the scatterplot is easy.
Correlation in scatter plots
The relationship between two variables is called correlation. For example, what you weigh is related (correlated) to what you eat.
Correlation has two main directions: positive and negative. If the data points form a line running from low values of “x” and “y” to high values of both, they are positively correlated. If the points start at high y values and fall to low y values as x increases, they are negatively correlated.
In a positive correlation, both variables increase together. For example, the more you exercise, the better your cardiovascular health will be. “Positive” does not necessarily mean “good,” however: the more you smoke, the more likely you are to get cancer, and the more you drive, the more likely you are to get into a car accident.
Scatter plots use a collection of points placed using Cartesian coordinates to display the values of two variables.
Various types of correlation can be interpreted through the patterns shown in scatter plots. These are: positive (values increase together), negative (one value decreases as the other increases), null (no correlation), linear, exponential, and U-shaped.
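The difference between a linear and a U-shaped pattern can be illustrated numerically. The sketch below uses invented data and a hypothetical helper, `pearson_r`, which computes the Pearson coefficient (a measure of linear correlation that the text above does not name). It suggests why a plot is still worth looking at: a perfectly U-shaped relationship can score close to zero on a linear measure even though the variables are clearly related.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (measures linear association only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: x is symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]   # straight-line pattern
u_shaped_ys = [x ** 2 for x in xs]    # U-shaped pattern

r_linear = pearson_r(xs, linear_ys)   # close to 1: strong linear correlation
r_u = pearson_r(xs, u_shaped_ys)      # close to 0 despite an exact relationship
print(f"linear: {r_linear:.2f}, U-shaped: {r_u:.2f}")
```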
The strength of the correlation can be judged by how closely the points cluster together on the graph: the tighter the cluster, the stronger the correlation. Points that lie far from the overall group of points are known as outliers.
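There is no single definition of “far from the group.” As one possible interpretation, the sketch below fits a straight line through the points and flags any point whose vertical distance from that line exceeds two standard deviations of all such distances. The helper names, the threshold `k=2.0`, and the data are assumptions made for illustration, not part of the original description.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def find_outliers(xs, ys, k=2.0):
    """Return points whose residual exceeds k standard deviations (assumed rule)."""
    slope, intercept = fit_line(xs, ys)
    # Residual: vertical distance of each point from the fitted line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Least-squares residuals average to zero, so this is their standard deviation
    spread = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > k * spread]

# Hypothetical data: y is roughly 2 * x, except for the point at x = 4
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 20, 10, 12, 14, 16]

outliers = find_outliers(xs, ys)
print(outliers)
```

Because the outlier itself pulls the fitted line toward it, this simple rule can miss extreme points in very small data sets; more robust rules exist, but they are beyond the scope of this sketch.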
Lines or curves can be fitted to the plot to aid analysis. They are drawn as close as possible to all the points, showing what the data would look like condensed into a single line. Such a line is commonly known as a line of best fit or trend line, and it can be used to make estimates by interpolation.
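A straight line of best fit is often computed with ordinary least squares, which is one common choice rather than the only one. The sketch below, using invented measurements and a hypothetical `fit_line` helper, fits such a line and then interpolates an estimate for an x value that lies between two observed points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements that roughly follow a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)

# Interpolation: estimate y for an x inside the observed range (between 2 and 3)
x_new = 2.5
estimate = slope * x_new + intercept
print(f"y = {slope:.2f} * x + {intercept:.2f}; estimate at x = {x_new}: {estimate:.2f}")
```

Estimates for x values inside the observed range tend to be more trustworthy than estimates outside it, where the fitted trend may no longer hold.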