Thank you Altrea and Nobbie for looking at my issue.
I cannot use mysqli_num_rows() because I am using PDO and every help page I can find instructs me to replace mysqli_num_rows with a SELECT COUNT(*) when migrating to PDO which I did over 12 years ago. For example see https://phpdelusions.net/pdo#count. Since the behavior changed simultaneously in both MariaDB and MySQL I appreciate that it was intentionally changed to resolve an issue. But I still need to know the
total number of records that will be returned when I issue the SELECT with those GROUP BY clauses. With the current behavior if the main query would return a million rows, I need to know that
before I issue the main query. I cannot issue the query and count how many rows I got back. But with this intentional behavior I instead get a response set that has a
million rows in it, with each row containing the value 1. I would have to read in those
million rows and add up all of the 1s to get the total number of records. That would almost certainly blow my PHP instance out of the water!
Specifically the database maintains historical information about people in a family tree. The query seeks to include individuals who match particular names and may also require that they born or died within certain date ranges or born or died at particular locations, etc. The specific query here is a very simple one which only looks at surnames, but I wanted to give the simplest possible example while looking for a solution. The birth and death dates and locations are not recorded in the Persons record because historical documents may supply different dates so the database schema records those dates in separate records in the Events table along with every other fact about the person. I use the GROUP BY because one possibility is that researchers accidentally created two birth events with the same date, and I do not want to return two records which contain the same information. If I do distinct only on the Persons fields then if two birth event records exist that both match the date range but have different values of EventD I
do want to present both of those records to the user because they
are different.
OK. I rearranged the query for the count of the number of records which would be returned to:
- Code: Select all
SELECT COUNT(DISTINCT tblIR.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD) FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0
This returned the value 28. But if I removed the COUNT and just asked for all of the distinct rows:
- Code: Select all
SELECT DISTINCT tblIR.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0
It returned
221 rows! This is also the number of rows returned by:
- Code: Select all
SELECT tblIR.*, EBirth.EventD AS EBirthD, EDeath.EventD AS EDeathD, tblNX.Surname AS indexsurname, tblNX.GivenName AS indexgivenname, tblNX.`Order` AS indextype FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0 GROUP BY `tblIR`.IDIR, tblIR.ID, EBirth.EventD, EDeath.EventD, tblNX.Surname, tblNX.GivenName, tblNX.`Order`, EBirth.EventSD, tblIR.BirthSD ORDER BY `tblIR`.Surname,`tblIR`.GivenName,COALESCE(EBirth.EventSD,`tblIR`.BirthSD,-99999999)
After many experiments I found the following worked, doing a count of a subquery:
- Code: Select all
SELECT COUNT(1) FROM (SELECT DISTINCT tblIR.*, EBirth.EventD AS EBirthD, EDeath.EventD AS EDeathD, tblNX.Surname AS indexsurname, tblNX.GivenName AS indexgivenname, tblNX.`Order` AS indextype FROM tblIR JOIN tblNX ON tblNX.IDIR=tblIR.IDIR JOIN tblNR ON tblNR.Surname=tblNX.Surname LEFT JOIN tblER AS EBirth ON (EBirth.IDIR=tblIR.IDIR AND EBirth.IDType=0 AND EBirth.IDET=3 AND EBirth.Preferred=1) LEFT JOIN tblER AS EDeath ON (EDeath.IDIR=tblIR.IDIR AND EDeath.IDType=0 AND EDeath.IDET=6 AND EDeath.Preferred=1) WHERE tblNX.`Surname`='Sinclair' AND tblNX.TreeName='' AND tblNX.`Order`>=0) AS Temp
This technique will probably also work with the original query using GROUP BY statements.