Description
Gorse version
Latest nightly
Describe the bug
Collaborative recommendations are not being created for most users; only a very small fraction appear to succeed.
Workers also crash frequently with this error:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1bb7c0a]
goroutine 1 [running]:
github.com/zhenghaoz/gorse/logics.(*MatrixFactorization).Search(0x0, {0xc026aa8dc0?, 0xc059c80f60?, 0x18?}, 0x1e37e00?)
/src/logics/cf.go:58 +0x2a
github.com/zhenghaoz/gorse/worker.(*Worker).collaborativeRecommendHNSW(0xc000298900, 0x0, {0xc059c80f60, 0x18}, {0x24a9540, 0xc04d7b6580}, 0xc03bc40608)
/src/worker/worker.go:867 +0x17c
github.com/zhenghaoz/gorse/worker.(*Worker).Recommend.func2(0x0, 0x1477)
/src/worker/worker.go:620 +0xc9a
github.com/zhenghaoz/gorse/common/parallel.Parallel(0x4793, 0xc021f16810?, 0xc047c1a0a0)
/src/common/parallel/parallel.go:40 +0xf7
github.com/zhenghaoz/gorse/worker.(*Worker).Recommend(0xc000298900, {0xc062c80000, 0x4793, 0x4955})
/src/worker/worker.go:574 +0xbca
....
To Reproduce
I believe this is a race condition, but I'm not sure exactly which circumstances trigger it.
1. w.matrixFactorization is nil here: Line 620 in ec03de6
2. It's set to nil here: Line 276 in ec03de6
3. It's supposed to be created here: Line 545 in ec03de6

Whenever 2 occurs between 3 and 1, this error is going to be triggered. I believe there needs to be a mutex protecting it, or alternatively use atomic pointers and update both the model and matrixFactorization atomically.
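
To illustrate the atomic-pointer alternative, here is a minimal sketch, not Gorse's actual code. It assumes Go 1.19 or later for atomic.Pointer. The names cfModel, mfIndex, cfState, worker, swap and recommend are all hypothetical stand-ins for the worker's model, its matrixFactorization index, and the code paths that replace and read them. The idea is to bundle the model and the index into one struct, build that struct completely, and then publish it with a single atomic store, so a reader can never observe a model whose index is nil.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cfModel and mfIndex are hypothetical stand-ins for the trained
// collaborative filtering model and the search index built from it.
type cfModel struct{ version int }

type mfIndex struct{ version int }

// Search has a pointer receiver and reads a field, so calling it on a
// nil *mfIndex panics with a nil pointer dereference.
func (m *mfIndex) Search(n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = m.version
	}
	return out
}

// cfState bundles the model with its index so that the pair is always
// published and observed together.
type cfState struct {
	model *cfModel
	index *mfIndex
}

// worker is a hypothetical reduction of the real worker to the one
// field that matters for this race.
type worker struct {
	state atomic.Pointer[cfState]
}

// swap builds the complete state first and only then publishes it with
// one atomic store. There is no window in which the index is nil.
func (w *worker) swap(version int) {
	next := &cfState{
		model: &cfModel{version: version},
		index: &mfIndex{version: version},
	}
	w.state.Store(next)
}

// recommend loads one snapshot and uses only that snapshot, so a
// concurrent swap cannot invalidate it halfway through.
func (w *worker) recommend(n int) ([]int, bool) {
	s := w.state.Load()
	if s == nil || s.index == nil {
		return nil, false // no model published yet: skip instead of panicking
	}
	return s.index.Search(n), true
}

func main() {
	w := &worker{}
	w.swap(1)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for v := 2; v <= 1000; v++ {
			w.swap(v) // simulated model reloads racing with reads
		}
	}()

	for i := 0; i < 1000; i++ {
		if _, ok := w.recommend(3); !ok {
			panic("reader observed a missing index")
		}
	}
	wg.Wait()
	fmt.Println("all reads completed")
}
```

A sync.RWMutex around both fields would work as well. The single-pointer snapshot just keeps readers lock-free and makes it impossible to update one field without the other.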