-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unlimited key lengths issue #168
Fix unlimited key lengths issue #168
Conversation
tests/WP_SQLite_Translator_Tests.php
Outdated
`meta_value` text DEFAULT NULL, | ||
PRIMARY KEY (`id`), | ||
KEY `order_id_meta_key_meta_value` (`order_id`,`meta_key`(100),`meta_value`(82)), | ||
KEY `meta_key_value` (`meta_key`(100),`meta_value`(82)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JanJakes, currently, it skips all lengths.
As we don't store the lengths, do you have any idea how to approach that in a hotfix? Assume 100 for any varchar or text fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtekn Using something like 100 sounds like a reasonable hotfix. Maybe we could go for 191, as WordPress seems to use that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was fixing these CREATE TABLE statements manually to attempt using it for import in Jetpack, I used 191
for length as well and that fixed the issue for me and I was able to work with these statements. This was an example:
DROP TABLE IF EXISTS `wp_wc_orders_meta`;
CREATE TABLE `wp_wc_orders_meta` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned DEFAULT NULL,
`meta_key` varchar(255) DEFAULT NULL,
`meta_value` text DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id_meta_key_meta_value` (`order_id`, `meta_key`, `meta_value`(191)),
KEY `meta_key_value` (`meta_key`, `meta_value`(191))
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtekn Using something like 100 sounds like a reasonable hotfix. Maybe we could go for 191, as WordPress seems to use that value.
Thanks for sharing @JanJakes . It seems they calculated 191 assuming the limit of 767 bytes and utf8mb4
, which uses 4 bytes per character. It works for the index with one field, but if we used such a value for an index that has more fields, it wouldn't work.
Currently, at least on WoA sites, the limit is 3072 bytes, and the collation used is latin1_swedish_ci
, which uses 1 byte per character. I played with the queries shown in the issue description, and it allowed me to add an index when I capped the text field length to 2809 characters. With one more, it failed:
KEY `order_id_meta_key_meta_value` (`order_id`, `meta_key`, `meta_value`(2810))
However, if collation were set to utf8mb4
, I would need to set it to 511 instead. If the limit for the index was 767 bytes, the meta_key
alone would exceed it.
I propose going with an arbitrary value of 100 to keep things simple. With that one, for any varchar and text field, for given settings, it could support:
utf8mb4
and limit of 767 bytes: one text/varchar field plus a few integersutf8mb4
and limit of 3072 bytes: seven text/varchar fields plus a few integerslatin1_swedish_ci
and limit of 767 bytes: seven text/varchar fields plus a few integerslatin1_swedish_ci
and limit of 3072 bytes: 30 text/varchar fields plus a few integers
If we dropped it to 80, in the first, worst-case scenario, it would support two text/varchar fields and a few integers. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@katinthehatsite, according to my tests, values higher than 191 would make it work for you, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for the index with one field, but if we used such a value for an index that has more fields, it wouldn't work.
Ah, right!
I propose going with an arbitrary value of 100 to keep things simple.
If we dropped it to 80, in the first, worst-case scenario, it would support two text/varchar fields and a few integers.
That sounds reasonable to me. I think 80 is fine as well. Unless some string values are expected to share very long prefixes, I actually don't see much need to index that much. That said, WP tries to make use of the 191 limit, so pushing it to something like 80 or 100 might be the way to go.
On the other hand, if we expect some plugins to index 3 string fields (which I'm sure can happen), then we may even go with something like 50. After all, these numbers are the maximums of what can be indexed, and I would expect the actual values to be much shorter in most cases.
It seems we are not there yet. The current implementation results in producing an incorrect
|
The issue mentioned in previous comment is fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but we could try to follow more nuances of the syntax and more types, if that's not too much trouble.
$data_length = $key_length_limit; | ||
|
||
// Extract the length from the data type. Make it lower if needed. | ||
if ( 1 === preg_match( '/^(\w+)\((\d+)\)$/', $data_type, $matches ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wojtekn I think this would fail to capture types with spaces, such as varchar (20)
, character varying(20)
, etc. I'm not sure if they are normalized when saved.
Not too important, but I guess it can be done without a regex with something like (int) (explode( '(', $data_type )[ 1 ] ?? '' )
, which would return > 0
when the length exists and 0
otherwise.
} | ||
|
||
// Set the data length to the varchar and text key length | ||
if ( 'text' === $data_type || str_starts_with( $data_type, 'varchar' ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely to be the case as well for all the variants of text
(tinytext
, text
, mediumtext
, longtext
), as well as blob
(tinyblob
, blob
, mediumblob
, longblob
), and varbinary
. With the binary data, the limits are in bytes, so the space is potentially bigger, but it's probably fine to cap them at 100 too.
With char
and binary
it seems to work fine, as they are limited to the length of 255 (tinytext
and tinyblob
are too, but they still require limiting the index).
Since some of the data types can be written in many ways (char varying
, character varying
, varchar
, national varchar
, nvarchar
, nchar varchar
, national char varying
, nchar varying
, long char varying
, long character varying
, long varchar
, ...), and there are all the text
and blob
variants, we could probably just check whether the type contains one of the following substrings — varchar
, varying
, text
, blob
, varbinary
.
I addressed the final review comments in #169. I know it's clunky to open a new PR, but I couldn't find a better way to put up my commits. |
@fredrikekelund, thanks for opening another PR. What if we used |
$data_length = $key_length_limit; | ||
|
||
// Extract the length from the data type. Make it lower if needed. Skip 'unsigned' parts and whitespace. | ||
if ( 1 === preg_match( '/^(\w+)\s*\(\s*(\d+)\s*\)\s*(.*)?$/', $data_type, $matches ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a tiny nitpick:
- I guess the trailing
(.*)?
is equivalent to just.*
, or we can simply drop the$
as anchoring the beginning is enough. - The
Skip 'unsigned' parts
piece of the comment is probably rather irrelevant, as we're not trying to match integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for ideas, I cleaned up this regex more.
86547ee
to
a6c240e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Fixes #167
I propose to fix an issue where dumping the table with keys that use multiple long fields produces SQL code that produces an error.