Problem Getting Count for Complex SELECT

Alles, was MariaDB und MySQL betrifft, kann hier besprochen werden.

Problem Getting Count for Complex SELECT

Postby JamesCobban » 18. September 2021 03:23

My application implements a genealogical database. The database is necessarily complex because it models the real world, and real world families are messy. At a spot where I wish to identify all of the people who have a particular surname either because it is the one they were born with, or one that they acquired through adoption or legal name change I have the following query to determine how many such individuals there are:

Code: Select all
SELECT COUNT(*) FROM Persons JOIN Names ON Names.IDIR=Persons.IDIR JOIN tblNR ON tblNR.Surname=Names.Surname LEFT JOIN Events AS EBirth ON (EBirth.IDIR=Persons.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN Events AS EDeath ON (EDeath.IDIR=Persons.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE Names.`Surname`='Smith' AND Names.TreeName='' AND Names.`Type`>=0 GROUP BY Persons.IDIR, Persons.ID, EBirth.EventD, EDeath.EventD, Names.Surname, Names.GivenName, Names.`Type`, EBirth.EventSD, Persons.BirthSD


Each record in the table Persons represents an individual in the family tree. Each record in the table Names is a name by which a person is known. Each record in the table Events describes something that happened to a person.

If I issue this query with the list of fields extracted from the database I get 1,588 rows. However I do not want to display 1,588 rows at once to the user as it would create a very long web-page which the user would have to scroll through and could potentially exceed the amount of memory in the server engine. So I issue the query first with COUNT(*) and then with the list of fields and a LIMIT clause. However starting some time ago, both with MySQL and MariaDB, my application gave unexpected results. Instead of giving me a single row containing a single field containing the value 1588, the above SELECT instead returns an array containing 1,588 rows each of which contains a field with the value 1! Since my application code is only expecting a single row response, it displayed 1 as the number of matching records. This appears to be a consequence of the presence of the GROUP BY clause, which is there to hide cosmetically duplicate records. If I remove the GROUP BY CLAUSE I get the expected single row response. To recover from this I had to add an extra query, for the above specific case:

Code: Select all
SELECT COUNT(*) FROM Names  WHERE Names.`Surname`='Smith' AND Names.TreeName='' AND Names.`Type`>=0


However note that in unusual situations this will give the wrong total because of the missing GROUP BY clause.

I cannot find this issue discussed anywhere on the Web and would like to understand what I am doing that results in the SELECT COUNT(*) returning an array of rows instead of a single row with the total count.
JamesCobban
 
Posts: 4
Joined: 18. August 2021 02:09
XAMPP version: na
Operating System: ubuntu linux

Re: Problem Getting Count for Complex SELECT

Postby Altrea » 18. September 2021 07:24

We don't provide any support via personal channels like PM, email, Skype, TeamViewer!

It's like porn for programmers 8)
User avatar
Altrea
AF Moderator
 
Posts: 11207
Joined: 17. August 2009 13:05
XAMPP version: several
Operating System: Windows 10 Pro x64

Re: Problem Getting Count for Complex SELECT

Postby Nobbie » 18. September 2021 10:32

Actually, this "unexpected behaviour" nowadays is the correct behaviour, because COUNT(*) relates to the Grouped results which obviously have 1 row per Group.

Instead of evaluating this Select after fetching it, simply execute a mysqli_num_rows() in your PHP script. This gives you the count you are looking for.
Nobbie
 
Posts: 12199
Joined: 09. March 2008 13:04

Re: Problem Getting Count for Complex SELECT

Postby JamesCobban » 30. September 2021 03:50

Thank you Altrea and Nobbie for looking at my issue.

I cannot use mysqli_num_rows() because I am using PDO and every help page I can find instructs me to replace mysqli_num_rows with a SELECT COUNT(*) when migrating to PDO which I did over 12 years ago. For example see https://phpdelusions.net/pdo#count. Since the behavior changed simultaneously in both MariaDB and MySQL I appreciate that it was intentionally changed to resolve an issue. But I still need to know the total number of records that will be returned when I issue the SELECT with those GROUP BY clauses. With the current behavior if the main query would return a million rows, I need to know that before I issue the main query. I cannot issue the query and count how many rows I got back. But with this intentional behavior I instead get a response set that has a million rows in it, with each row containing the value 1. I would have to read in those million rows and add up all of the 1s to get the total number of records. That would almost certainly blow my PHP instance out of the water!

Specifically the database maintains historical information about people in a family tree. The query seeks to include individuals who match particular names and may also require that they born or died within certain date ranges or born or died at particular locations, etc. The specific query here is a very simple one which only looks at surnames, but I wanted to give the simplest possible example while looking for a solution. The birth and death dates and locations are not recorded in the Persons record because historical documents may supply different dates so the database schema records those dates in separate records in the Events table along with every other fact about the person. I use the GROUP BY because one possibility is that researchers accidentally created two birth events with the same date, and I do not want to return two records which contain the same information. If I do distinct only on the Persons fields then if two birth event records exist that both match the date range but have different values of EventD I do want to present both of those records to the user because they are different.

OK. I rearranged the query for the count of the number of records which would be returned to:
Code: Select all
SELECT COUNT(DISTINCT tblIR.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD) FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0

This returned the value 28. But if I removed the COUNT and just asked for all of the distinct rows:
Code: Select all
SELECT DISTINCT tblIR.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0

It returned 221 rows! This is also the number of rows returned by:
Code: Select all
SELECT tblIR.*, EBirth.EventD AS EBirthD, EDeath.EventD AS EDeathD, tblNX.Surname AS indexsurname, tblNX.GivenName AS indexgivenname, tblNX.`Order` AS indextype FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0 GROUP BY `tblIR`.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD ORDER BY `tblIR`.Surname,`tblIR`.GivenName,COALESCE(EBirth.EventSD,`tblIR`.BirthSD,-99999999)


After many experiments I found the following worked, doing a count of a subquery:
Code: Select all
SELECT COUNT(1) FROM (SELECT DISTINCT tblIR.*, EBirth.EventD AS EBirthD, EDeath.EventD AS EDeathD, tblNX.Surname AS indexsurname, tblNX.GivenName AS indexgivenname, tblNX.`Order` AS indextype FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0) AS Temp


This technique will probably also work with the original query using GROUP BY statements.
JamesCobban
 
Posts: 4
Joined: 18. August 2021 02:09
XAMPP version: na
Operating System: ubuntu linux

Re: Problem Getting Count for Complex SELECT

Postby Nobbie » 30. September 2021 13:39

JamesCobban wrote:But I still need to know the total number of records that will be returned


Actually - i dont think you really NEED the total number of records. Browsing thru a table (as a result of any Select) using Limit to show only some lines per page does NOT require the knowledge of the total number of rows. If have written many PHP/MySQL scripts (like forum script for example, or also a database for holiday evaluation etc.) and i never needed to "know" how many rows total a Select created. Its not difficult to browse thru the records only with the help of a few session variables, which keeps the information about the current "position" in order to navigate thru the data. And when a fetch() returns NULL, I am at the end.
Nobbie
 
Posts: 12199
Joined: 09. March 2008 13:04


Return to MariaDB - MySQL

Who is online

Users browsing this forum: No registered users and 5 guests