Why does one need to use the GROUP BY function when using COUNT?

196 Views Asked by At

I am new to SQL. I have the the following query that works, attached below.

I tried it multiple times without the GROUP BY function and it didn't work. I was curious as to what this function does as I can't understand any descriptions of it online, specifically this article: https://www.w3schools.com/sql/sql_groupby.asp. I know it causes the rows of the items table to be collected into groups, but why do I need to do that for this query to work?

SELECT 
    usertype,
    CONCAT(start_station_name," to ", end_station_name) AS route,
    COUNT(*) as num_trips,
    ROUND(AVG(cast(tripduration as int64)/60), 2) AS duration
FROM 
    `bigquery-public-data.new_york_citibike.citibike_trips` 
GROUP BY
    start_station_name, end_station_name, usertype
LIMIT 10
2

There are 2 best solutions below

0
On
select col1, col2, col3, concatenate(col1, col2)
from table_
where col1 = 'some_value'

This kind of SQL refers just to the rows concatenate(col1, col2) also refers to the rows.

COUNT(*) is one of the aggregate functions that means they are referring to the groups of rows, that's why you need to use the GROUP BY clause. With group by you can perform HAVING -> what is WHERE for the rows, that is HAVING for the groups of rows.

Try to create table with 3 fields of any type as above, populate them with few rows, -> col1 has all values present, col2 has just 2 values, col3 has 1 value, and then try to perform counting of each column to see what happens.

0
On

I'm not sure how to explain why you need a group by with the definition. Because sounds like you went through some definitions on those sites. However, let me give you an example:

Let's say you have a store. People come visit your store and purchase different items that you sell day to day and your store maintains the ledger of each transaction.

Imagine, now you are trying to prepare a report for this month or week or current day that your report should contain total sales, total sales by say category, or total sales by each product (whatever was sold for the period).

Let's assume your store sales data table is like this below:

Sn| Date| Product| Qty| Rate| TotalAmount | Customer
-----------------------------------------------------
1  10-12   Sugar    10    20    200          A
2  10-12   Rice     5     50    250          B
3  10-12   Rice     2     50    100          C
4  10-12   Potato   15    3     45           C
5  10-12   Cola      2    15    30           Cash
6  10-13   Rice     1     50    50           Cash

Now comes what group by can give you along with aggregate function. In your report you want to see:

1 Total Sales Amount or Transaction Count
2 Total Sales each day Or Transaction Count each day
3 Total Cash Sale vs Customer Sales
4 Total Sale by each Product

All these questions is asking us to run some aggregation on our data. Count, Sum, Average etc are all aggregations. And based on our aggregation we "may" need to group our records. Example:

select count(*) from ledger   

this gives me total count of sale from ledger, I'm not required to provide any group by here becasue i'm not using any of other fields into my selection list that I would perform aggregation based on the group of data. Another example:

select count(*), Product from ledger 

If I do not provide group by then its gonna throw the exception because now our select list has the column that affect the aggregration we are trying to get. In this example we are trying to get the Total Sales Count by Product. Our requirement in this example clearly says "By Product". That means we are trying to group the sales data by product we have and print the total sales Count.

select count(*), Product from Ledger 
group by Product

When you are using the aggregate function along with other fields in your select list then you must supply all the non-aggregate fields in your group by. If you're not selecting any other fields than aggregate fields then group by fields is "optional". Hope this helps a bit to understand why we need group by.

Suggestion: I would ask you to create a simple table like this and ask yourself for the different kinds of reports that you want to see. That way you will learn a lot.