Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unlimited key lengths issue #168

Merged
merged 13 commits into from
Dec 20, 2024

Conversation

wojtekn
Copy link
Contributor

@wojtekn wojtekn commented Dec 12, 2024

Fixes #167

I propose to fix an issue where dumping the table with keys that use multiple long fields produces SQL code that produces an error.

`meta_value` text DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id_meta_key_meta_value` (`order_id`,`meta_key`(100),`meta_value`(82)),
KEY `meta_key_value` (`meta_key`(100),`meta_value`(82))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JanJakes, currently, it skips all lengths.

As we don't store the lengths, do you have any idea how to approach that in a hotfix? Assume 100 for any varchar or text fields?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekn Using something like 100 sounds like a reasonable hotfix. Maybe we could go for 191, as WordPress seems to use that value.

Copy link

@katinthehatsite katinthehatsite Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was fixing these CREATE TABLE statements manually to attempt using it for import in Jetpack, I used 191 for length as well and that fixed the issue for me and I was able to work with these statements. This was an example:

DROP TABLE IF EXISTS `wp_wc_orders_meta`;
CREATE TABLE `wp_wc_orders_meta` (
	`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
	`order_id` bigint(20) unsigned DEFAULT NULL,
	`meta_key` varchar(255) DEFAULT NULL,
	`meta_value` text DEFAULT NULL,
	PRIMARY KEY (`id`),
	KEY `order_id_meta_key_meta_value` (`order_id`, `meta_key`, `meta_value`(191)),
	KEY `meta_key_value` (`meta_key`, `meta_value`(191))
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekn Using something like 100 sounds like a reasonable hotfix. Maybe we could go for 191, as WordPress seems to use that value.

Thanks for sharing @JanJakes . It seems they calculated 191 assuming the limit of 767 bytes and utf8mb4, which uses 4 bytes per character. It works for the index with one field, but if we used such a value for an index that has more fields, it wouldn't work.

Currently, at least on WoA sites, the limit is 3072 bytes, and the collation used is latin1_swedish_ci, which uses 1 byte per character. I played with the queries shown in the issue description, and it allowed me to add an index when I capped the text field length to 2809 characters. With one more, it failed:

	KEY `order_id_meta_key_meta_value` (`order_id`, `meta_key`, `meta_value`(2810))

However, if collation were set to utf8mb4, I would need to set it to 511 instead. If the limit for the index was 767 bytes, the meta_key alone would exceed it.

I propose going with an arbitrary value of 100 to keep things simple. With that one, for any varchar and text field, for given settings, it could support:

  • utf8mb4 and limit of 767 bytes: one text/varchar field plus a few integers
  • utf8mb4 and limit of 3072 bytes: seven text/varchar fields plus a few integers
  • latin1_swedish_ci and limit of 767 bytes: seven text/varchar fields plus a few integers
  • latin1_swedish_ci and limit of 3072 bytes: 30 text/varchar fields plus a few integers

If we dropped it to 80, in the first, worst-case scenario, it would support two text/varchar fields and a few integers. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@katinthehatsite, according to my tests, values higher than 191 would make it work for you, too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekn

It works for the index with one field, but if we used such a value for an index that has more fields, it wouldn't work.

Ah, right!

I propose going with an arbitrary value of 100 to keep things simple.
If we dropped it to 80, in the first, worst-case scenario, it would support two text/varchar fields and a few integers.

That sounds reasonable to me. I think 80 is fine as well. Unless some string values are expected to share very long prefixes, I actually don't see much need to index that much. That said, WP tries to make use of the 191 limit, so pushing it to something like 80 or 100 might be the way to go.

On the other hand, if we expect some plugins to index 3 string fields (which I'm sure can happen), then we may even go with something like 50. After all, these numbers are the maximums of what can be indexed, and I would expect the actual values to be much shorter in most cases.

@wojtekn wojtekn marked this pull request as ready for review December 12, 2024 16:30
@wojtekn wojtekn marked this pull request as draft December 12, 2024 17:53
@wojtekn
Copy link
Contributor Author

wojtekn commented Dec 12, 2024

It seems we are not there yet. The current implementation results in producing an incorrect CREATE TABLE as the key field length can't be longer than field length itself:

DROP TABLE IF EXISTS `_tmp_table`;
CREATE TABLE `_tmp_table` (
	`field` varchar(20) NOT NULL DEFAULT 'Y',
	KEY `field` (`field`(100))
);

@wojtekn wojtekn marked this pull request as ready for review December 13, 2024 10:28
@wojtekn
Copy link
Contributor Author

wojtekn commented Dec 16, 2024

The issue mentioned in previous comment is fixed.

Copy link
Collaborator

@JanJakes JanJakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but we could try to follow more nuances of the syntax and more types, if that's not too much trouble.

$data_length = $key_length_limit;

// Extract the length from the data type. Make it lower if needed.
if ( 1 === preg_match( '/^(\w+)\((\d+)\)$/', $data_type, $matches ) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekn I think this would fail to capture types with spaces, such as varchar (20), character varying(20), etc. I'm not sure if they are normalized when saved.

Not too important, but I guess it can be done without a regex with something like (int) (explode( '(', $data_type )[ 1 ] ?? '' ), which would return > 0 when the length exists and 0 otherwise.

}

// Set the data length to the varchar and text key length
if ( 'text' === $data_type || str_starts_with( $data_type, 'varchar' ) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to be the case as well for all the variants of text (tinytext, text, mediumtext, longtext), as well as blob (tinyblob, blob, mediumblob, longblob), and varbinary. With the binary data, the limits are in bytes, so the space is potentially bigger, but it's probably fine to cap them at 100 too.

With char and binary it seems to work fine, as they are limited to the length of 255 (tinytext and tinyblob are too, but they still require limiting the index).

Since some of the data types can be written in many ways (char varying, character varying, varchar, national varchar, nvarchar, nchar varchar, national char varying, nchar varying, long char varying, long character varying, long varchar, ...), and there are all the text and blob variants, we could probably just check whether the type contains one of the following substrings — varchar, varying, text, blob, varbinary.

@fredrikekelund
Copy link

I addressed the final review comments in #169. I know it's clunky to open a new PR, but I couldn't find a better way to put up my commits.

@wojtekn
Copy link
Contributor Author

wojtekn commented Dec 19, 2024

@fredrikekelund, thanks for opening another PR. What if we used $field_types_translation to make the condition more generic? I've updated this PR.

$data_length = $key_length_limit;

// Extract the length from the data type. Make it lower if needed. Skip 'unsigned' parts and whitespace.
if ( 1 === preg_match( '/^(\w+)\s*\(\s*(\d+)\s*\)\s*(.*)?$/', $data_type, $matches ) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a tiny nitpick:

  • I guess the trailing (.*)? is equivalent to just .*, or we can simply drop the $ as anchoring the beginning is enough.
  • The Skip 'unsigned' parts piece of the comment is probably rather irrelevant, as we're not trying to match integers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for ideas, I cleaned up this regex more.

@wojtekn wojtekn force-pushed the fix/unlimited-key-lengths branch from 86547ee to a6c240e Compare December 20, 2024 10:24
Copy link
Collaborator

@JanJakes JanJakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@JanJakes JanJakes merged commit 658f32f into WordPress:develop Dec 20, 2024
8 checks passed
@JanJakes JanJakes mentioned this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create table produces too long keys
4 participants