Very slow self-join query vs MySQL

Alles, was MariaDB und MySQL betrifft, kann hier besprochen werden.

Very slow self-join query vs MySQL

Postby Blackmyre » 03. July 2021 20:05

I have a query that runs 400 times slower in a default install of MariaDB than it does in MySQL. Query is:
Code: Select all
SELECT `prices`.* FROM `prices` LEFT JOIN `prices` `p2` ON p2.commodity_guid=prices.commodity_guid
and year(p2.date)=year(prices.date)
and month(p2.date)=month(prices.date)
and prices.date<p2.date;

The [InnoDB] table definition is:
Code: Select all
show columns in prices;
+----------------+---------------+------+-----+---------------------+-------+
| Field          | Type          | Null | Key | Default             | Extra |
+----------------+---------------+------+-----+---------------------+-------+
| guid           | varchar(32)   | NO   | PRI | NULL                |       |
| commodity_guid | varchar(32)   | NO   |     | NULL                |       |
| currency_guid  | varchar(32)   | NO   |     | NULL                |       |
| date           | datetime      | NO   |     | 1970-01-01 00:00:00 |       |
| source         | varchar(2048) | YES  |     | NULL                |       |
| type           | varchar(2048) | YES  |     | NULL                |       |
| value_num      | bigint(20)    | NO   |     | NULL                |       |
| value_denom    | bigint(20)    | NO   |     | NULL                |       |
+----------------+---------------+------+-----+---------------------+-------+

There are about 10,000 rows and no indexes (database not under my control).

The query runs fine on my development machine, which uses MySQL (v8.0) but very slowly on the target platform (MariaDB 10.1). I have now installed MariaDB (10.1 and 10.6) alongside MySQL on my development machine and I am seeing the same difference: the query takes 16 seconds in MariaDB but only 0.4 seconds in MySQL (both with default settings, as far as I am aware).

I'm no database expert and I have no idea what is likely to be causing such a huge difference in performance. Can anyone suggest how I could find and fix whatever is causing the bottleneck?
Blackmyre
 
Posts: 5
Joined: 03. July 2021 19:33
XAMPP version: LAMP
Operating System: Linux

Re: Very slow self-join query vs MySQL

Postby Nobbie » 04. July 2021 22:50

What is the goal of that weird query?
Nobbie
 
Posts: 13175
Joined: 09. March 2008 13:04

Re: Very slow self-join query vs MySQL

Postby Blackmyre » 05. July 2021 08:17

It's part of a "greatest n per group" query. It makes more sense in its complete form with other related tables, I reduced it to this while investigating the performance problem. Regardless of the goal though, it's valid SQL that performs 400x slower in MariaDB than in MySQL.
Blackmyre
 
Posts: 5
Joined: 03. July 2021 19:33
XAMPP version: LAMP
Operating System: Linux

Re: Very slow self-join query vs MySQL

Postby Nobbie » 05. July 2021 17:15

Blackmyre wrote:It's part of a "greatest n per group" query.


Thats what i thought. Why not doing an intelligent Query instead of that brut force dumb query? For example:

Code: Select all
Select commodity_guid, MAX(date) From Prices Group By commodity_guid, year(prices.date), month(prices.date)


Only an example. i am pretty sure that your brut force query is NOT the way to go. I do not believe that you have to join the table to itself. That is bad programming.

Blackmyre wrote:Regardless of the goal though, it's valid SQL that performs 400x slower in MariaDB than in MySQL.


Its "correct" syntax, but its bad programming and a huge join without any index is running slowly, yes. I dont know if MariaDB or MySQL performs any optimization, it may also be a matter of caching, but it is a weird query and without indexes a huge task. Its not the way to go.
Nobbie
 
Posts: 13175
Joined: 09. March 2008 13:04

Re: Very slow self-join query vs MySQL

Postby Blackmyre » 06. July 2021 08:58

As I understand it the self outer join is a standard and well established way to solve the greatest-n-per-group problem, where the row is required rather than just a grouped column or an aggregate. It's an approach that was recommended to me some time ago by someone far more experienced with SQL than I am.

The aim of the query is to get the latest price value for each commodity per month. I could use a GROUP BY query to get the latest date (per commodity per month) with something like MAX(date), but what I need is the value columns from the latest date rather than the date itself. The self outer join approach gets the row containing the latest date, so other columns from the row are then available. My SQL experience is fairly limited but this approach has always seemed to perform very well, and does absolutely fine in this case on MySQL. Presumably it is something to do with optimisations, I was just very surprised to see such a huge difference on MariaDB which I had understood to be a drop-in replacement that usually performs better.

I get the impression that windowing functions might be a better approach. I've never used those before so I'll do a bit of reading around, although they're only supported in later versions of MariaDB and I don't know whether I'll be able to upgrade it on the target machine. I would also have to see whether the windowing functions are supported by the ORM that generates the SQL. Never a dull moment :)
Blackmyre
 
Posts: 5
Joined: 03. July 2021 19:33
XAMPP version: LAMP
Operating System: Linux

Re: Very slow self-join query vs MySQL

Postby Nobbie » 06. July 2021 12:02

Blackmyre wrote:As I understand it the self outer join is a standard and well established way to solve the greatest-n-per-group problem, where the row is required rather than just a grouped column or an aggregate.


Yes and no, its "old school" (and i am old school anyway), but its not a MUST and you always have to keep in mind the size of your data and if you have proper indexes. In your case, you have plenty of data and no index, your join results in a table with at least 100.000.000 (!) lines (one hundert million!) to compare. That is quite big.

A good SQL programmer must always keep in mind performance issues, not so important for batch programming, but super important for online programming. The Select above should (i could not test it due to missing data) result in a table with commodity_guid and the maximum date per month (exactly what you are looking for). Simply put the result into another table and finally join it with the prices table - that should run thousand times as fast as your SQL. I only show the SQL statements, you probably embed these into PHP code or so.

Code: Select all
Select into P2 commodity_guid As cg, MAX(date) As md From Prices Group By commodity_guid, year(date), month(date)
Select ... FROM prices, P2 WHERE commudity_guid = cg AND date = md


Do not Select * in an SQL, instead apply all columns precisely which you need (Select * is not good programming style). You could also put the whole thing into one SQL Select with subselect, but as said, i am "old fashioned" as well (i learned twenty years ago with MySQL 3.0, there was no subselect) and anyway, its easer to read to put into two selects. Keep it simple and easy! You should give it a try.
Nobbie
 
Posts: 13175
Joined: 09. March 2008 13:04

Re: Very slow self-join query vs MySQL

Postby Blackmyre » 08. July 2021 08:38

Interesting. I had tried a similar approach as a subquery but that only returned a single column, which didn't help. I never thought about creating a separate table (said I have limited database skills). The program uses a read-only account but I should be able to get the CREATE TEMPORARY TABLES privilege. I don't know if it will be possible to get the ORM to take this approach but for this task I could bypass the ORM. Thanks for the suggestion.
Nobbie wrote:Do not Select * in an SQL, instead apply all columns precisely which you need (Select * is not good programming style).

Understood, and in my own code I wouldn't select more columns than I need. The snippet I posted was extracted from ORM-generated code, and the ORM presumably wants all columns so it can populate the object properties.
You could also put the whole thing into one SQL Select with subselect,

Haven't heard of subselect, will look it up, thanks.
Blackmyre
 
Posts: 5
Joined: 03. July 2021 19:33
XAMPP version: LAMP
Operating System: Linux

Re: Very slow self-join query vs MySQL

Postby Nobbie » 08. July 2021 13:23

What is "ORM"?
Nobbie
 
Posts: 13175
Joined: 09. March 2008 13:04

Re: Very slow self-join query vs MySQL

Postby Blackmyre » 08. July 2021 18:00

https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping
Blackmyre
 
Posts: 5
Joined: 03. July 2021 19:33
XAMPP version: LAMP
Operating System: Linux


Return to MariaDB - MySQL

Who is online

Users browsing this forum: No registered users and 14 guests