-
-
Notifications
You must be signed in to change notification settings - Fork 13
migrate from NHunspell to WeCantSpell.Hunspell #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s with modifying the word list at runtime.
If it is of interest to you, I added "Add" methods in the following release: https://www.nuget.org/packages/WeCantSpell.Hunspell/6.0.0 . |
…some bugs around normalization and the prototype words
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the updated WordList being serialized back out to disk for WeCantSpell, am I missing it? (I'm pretty sure that we do with NHunspell because we end up suggesting people use that dictionary in LibreOffice ect.
Reviewed 5 of 7 files at r1, 2 of 2 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @hahn-kev)
src/SIL.LCModel.Core/SpellChecking/SpellEngine.cs
line 93 at r2 (raw file):
private bool _isVernacular; private bool _gotIsVernacular;
Real question. Now that we have the option which is better, two bools or one bool?
WeCantSpell doesn't support that. Add operates on the in-memory internal state only. It would be a huge pile of problems to solve to save after modification. For example, should comments be preserved on save? What about the ordering of lines in the dic/aff files? What about any simplifications or fixes that occur during load, should those be preserved? OH HECKIN' HECK what about the file encodings 🫠. Just ... so much to stress over 😬. Does NHunspell support saving? I never noticed that in their API. To work with v6 , you would have to keep something like:
On load, load "base", then add words from your own file. When adding a word in memory, add it to both the WordList and "stuff-to-add.txt", either directly to the file or in to an in memory See also: aarondandy/WeCantSpell.Hunspell#103
If you are asking me, one private bool? _isVernacular;
public bool IsVernacular
{
get
{
_isVernacular ??= Check(SpellingHelper.PrototypeWord);
return _isVernacular.GetValueOrDefault();
}
} |
@jasonleenaylor I think saving changes is handled outside NHunspell in an Exception file which you can see here liblcm/src/SIL.LCModel.Core/SpellChecking/SpellEngine.cs Lines 98 to 142 in aee8c1a
SetStatusInternal is abstract and implemented by the specific library, this is basically exactly what aarondandy was suggesting. There's no code that I can see which is saving from NHunspell, I'm not even sure how you would do that given it's API.
|
I've updated this PR to remove NHunspell, and remove the strong name requirement, we're no longer strong naming LCM dlls. I'm not sure if this would qualify as a breaking change. If there's a program using LCM which requires strong named assemblies then this would be breaking, but I wouldn't expect that. As it is there's a bunch of solutions on nuget to sign other nuget packages automatically to work around this problem, so anyone downstream that still wants strong names I would recommend that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 17 files at r5, all commit messages.
Reviewable status: 15 of 19 files reviewed, all discussions resolved (waiting on @jasonleenaylor)
tests/SIL.LCModel.Core.Tests/App.config
line 7 at r5 (raw file):
<listeners> <clear /> <add name="FwTraceListener" type="SIL.LCModel.Utils.EnvVarTraceListener, SIL.LCModel.Utils, Culture=neutral, PublicKeyToken=f245775b81dcfaab" initializeData="assertuienabled='true' logfilename='%temp%/asserts.log'" />
What did Culture=neutral
do and why isn't it needed anymore?
src/SIL.LCModel.Core/SpellChecking/SpellEngineWeCantSpell.cs
line 0 at r5 (raw file):
.editorconfig
has max_line_length = 98
but this file has a number of lines longer than that.
src/SIL.LCModel.Core/SpellChecking/SpellEngineWeCantSpell.cs
line 1 at r5 (raw file):
using System;
It looks like using System;
isn't needed.
Directory.Build.props
line 18 at r5 (raw file):
<AppendTargetFrameworkToOutputPath>false</AppendTargetFrameworkToOutputPath> <PackageOutputPath>$(MSBuildThisFileDirectory)/artifacts</PackageOutputPath> <WarningsAsErrors>NU1605;CS8002</WarningsAsErrors>
This file has a mix of space and tab indenting.
src/SIL.LCModel.Core/SpellChecking/SpellEngine.cs
line 11 at r5 (raw file):
using System.Text; using Icu; using SIL.PlatformUtilities;
Looks like using SIL.PlatformUtilities;
is no longer needed.
tests/SIL.LCModel.Core.Tests/SpellChecking/SpellingHelperTests.cs
line 8 at r5 (raw file):
using System.IO; using System.Linq; using System.Runtime.InteropServices;
Is using System.Runtime.InteropServices;
still needed?
src/SIL.LCModel.Core/SpellChecking/SpellEngine.cs
line 94 at r2 (raw file):
private bool _isVernacular; private bool _gotIsVernacular; public bool IsVernacular
According to https://stackoverflow.com/a/78230896/10210583, an empty /// <inheritdoc /> is useless; since you've dropped it from
IsVernacular`, you might as well drop the other ones in this file.
It technically is a breaking change and is why I don't want to get myself involved in any strong naming 😅. I think you should be careful about doing this but that doesn't mean you can't obviously. If you understand your user base well enough (maybe your user base is you) it makes it a lot easier. It has been a long while since I have had to deal with GAC, strong naming, or Full Framework dependencies but I think that is where you may run into most issues. That said I haven't researched it fully myself and I forgot everything I might have known about it a decade ago... and I never looked back!
This along with the public key token is part of the assembly's fully qualified name. If you aren't doing strong naming, you don't need that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 17 files at r5.
Reviewable status: 17 of 19 files reviewed, all discussions resolved (waiting on @jasonleenaylor)
Thanks for the review @imnasnainaec, I've applied changes from your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r6, all commit messages.
Reviewable status: 18 of 19 files reviewed, all discussions resolved (waiting on @jasonleenaylor)
Directory.Build.props
line 18 at r5 (raw file):
Previously, imnasnainaec (D. Ror.) wrote…
This file has a mix of space and tab indenting.
It still does.
…spaces in Directory.Build.props
Alright @imnasnainaec it turns out the props file wasn't caught by the editor config which is why my editor was formatting it incorrectly. I've fixed that and the whitespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r7, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @hahn-kev)
Right now LCM is using NHunspell, which hasn't been updated in 9 years and only supports dotnet framework (it does work on windows through compatibility, but not any other platform).
It looks like the suggested alternative is WeCantSpell.Hunspell.
One issue with WeCantSpell is that it makes the wordlist read only once it's been parsed. There's a discussion about that here, from what I can tell that's the biggest hurdle blocking us from adopting the library. I gave it a try, but I don't understand some of the Hunspell details enough to figure out what's going on in our old version.
Waiting for #306 to be merged in so the tests run.
This change is