
Replication of cache data to other nodes #35

Open
shannonantony opened this issue Jan 21, 2019 · 20 comments

@shannonantony

Implement replication of the nuster node cache to other nodes.

@jiangwenyuan
Owner

jiangwenyuan commented Jan 21, 2019

That would be great; in fact, it's on the todo list. I'll work on this after the HAProxy v1.9 migration, some cache headers, and the disk persistence implementation. Progress is slow as I don't have as much spare time as I used to.

@shannonantony
Author

shannonantony commented Jan 21, 2019 via email

@shannonantony
Author

shannonantony commented Jan 21, 2019 via email

@jiangwenyuan
Owner

Hi, I haven't looked into peers yet; it would be great if I could just use the sync feature from peers.

@igorescobar

Meanwhile, what I'm doing to "solve" this issue is using AWS EFS, which allows me to have several nodes/instances/containers pointing to the same data volume for caching. So regardless of how many containers go up or down, they always share the same volume for caching, which is great. 👍

@hugos99

hugos99 commented Jul 30, 2020

I am trying to deploy nuster in a "cluster" (multiple containers) configuration, and the only problem I am having is that while the containers share the cache on a shared disk rather than in memory, the disk loader doesn't synchronize the cache across the different containers. Is there a way to force the disk loader to constantly ensure all containers have the updated cache?

@jiangwenyuan
Owner

@hugos99 What's your setup? A local disk directory mounted in multiple containers?

@hugos99

hugos99 commented Jul 30, 2020

@jiangwenyuan It's a Kubernetes cluster with a shared NFS folder across all containers; everything resolves to a single shared folder with multiple containers accessing it.

@jiangwenyuan
Owner

jiangwenyuan commented Jul 30, 2020

@hugos99 That should work, then. The disk loader only loads the cache files' metainfo into memory on startup; it does not sync or update cache files. Since you are using a shared NFS folder, all containers see the same content, so there's no need to sync. If container 1 creates a cache file in the shared NFS folder, container 2 can use that file to serve the identical request.
Make sure you are using memory off disk on.
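For reference, a minimal configuration along those lines might look like the sketch below (an illustrative example, not taken from this thread; the dir path, ports, and backend address are placeholders, and dir would point at the shared NFS/EFS mount):

    global
        master-worker
        # dir must point at the shared volume so every container sees the same cache files
        nuster cache on data-size 200m dir /mnt/shared-cache
    defaults
        mode http
    frontend fe
        bind *:8080
        nuster cache on
        # disk-only caching: no per-container in-memory copy of the response data
        nuster rule all memory off disk on ttl 3600
        default_backend be
    backend be
        server app1 app:8080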

@hugos99

hugos99 commented Jul 30, 2020

Yeah, I know, but since the load process only occurs on startup and doesn't keep checking for new cached data, if a pod dies, the other living pods don't "know" about all the data that pod cached, and so they perform the request again.

This could be solved if the disk load thread operated constantly, trying to read the data every X nanoseconds.

For reference, I am using memory off disk on to ensure that the cache is disk only.

@jiangwenyuan I made a fork of the repository with an ugly fix for this problem; check it out at https://github.com/HugoS99/nuster/blob/master/src/nuster/store/disk.c. I basically just make the thread operate constantly.

@hugos99

hugos99 commented Jul 30, 2020

@jiangwenyuan Want me to create a pull request with my ugly fix?

@jiangwenyuan
Owner

@hugos99 You are right, and I now understand why the high cache MISS issue in #82 happens in a multiple-container setup.

Once the loader is done, nuster does not check the disk anymore and relies only on memory, so if another container creates a new file, it does not know about it.

Thanks for the proposal, but constantly loading does not solve this problem (a request can still arrive before the next loading round), and it is not necessary in a single-nuster setup.

I will probably add a new mode for multiple-nuster setups.

@hugos99

hugos99 commented Jul 30, 2020

I agree that a new "cluster" mode should be created, but I believe the constant read is the most generic and natural way to facilitate this feature. The best solution would be to let the user set the time between loading rounds; you may see my implementation for reference (where each cycle takes 300ms). Simply allow the user to set that value, with the clarification that there may be missed caches with this setup; a user can set 0 to ensure the best possible synchronization.

@jiangwenyuan
Owner

nst_disk_load does not load all files at once, only several files per round. If you have something like 10M files, it will take a long time to complete the load, and it will generate a lot of IO.

@hugos99

hugos99 commented Jul 30, 2020

So a method that compares the files already loaded against those not yet loaded is required? With that, the solution would be valid, right?

@jiangwenyuan
Owner

Yeah. The simplest solution that pops into my head is a new mode that always checks the disk if the entry does not exist in memory.

So currently the logic is (for disk on memory off):

  • check memory (does disk.file exist?)
    • yes, use disk.file
    • no, loaded?
      • no, check disk (does the x/xx/uuid file exist?)
      • yes, goto backend

So the new mode is:

  • check memory (does disk.file exist?)
    • yes, use disk.file
    • no, loaded?
      • no, check disk (does the x/xx/uuid file exist?)
      • yes and (new mode), check disk (does the x/xx/uuid file exist?)
      • else, goto backend
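For illustration, a small self-contained C sketch of that decision path (hypothetical names and types, not the actual nuster code, which lives in src/nuster/cache/engine.c) could look like:

    #include <stdbool.h>

    /* Hypothetical illustration of the lookup flow described above. */
    enum action { USE_DISK_FILE, CHECK_DISK, GOTO_BACKEND };

    static enum action decide(bool disk_file_in_memory, bool loader_done,
                              bool always_check) {
        if (disk_file_in_memory)               /* memory dict already knows disk.file */
            return USE_DISK_FILE;
        if (!loader_done || always_check)      /* loader still running, or new mode on */
            return CHECK_DISK;                 /* look for the x/xx/uuid file on disk */
        return GOTO_BACKEND;                   /* treat as a cache MISS */
    }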

@hugos99

hugos99 commented Jul 30, 2020

That looks like a great, simple solution! If you tell me the file where that logic is performed, I can try to implement an initial draft solution!

@jiangwenyuan
Owner

jiangwenyuan commented Jul 30, 2020

Cool! Thanks a lot! I'll update this comment later.

@hugos99 After some thinking, I found that this always-check should be put at the global level instead of the rule level, as the disk directory applies to all rules.

So my thought is:

a new mode in https://github.com/jiangwenyuan/nuster#global-nuster-cachenosql

nuster cache on|off [data-size size] [dict-size size] [dir DIR] [dict-cleaner n] [data-cleaner n] [disk-cleaner n] [disk-loader n] [disk-saver n] [clean-temp on|off] [disk-always-check on|off]

nuster nosql on|off [data-size size] [dict-size size] [dir DIR] [dict-cleaner n] [data-cleaner n] [disk-cleaner n] [disk-loader n] [disk-saver n] [clean-temp on|off] [disk-always-check on|off]

disk-always-check? shared-disk-mode? Let me know if you have better words :)

Referring to clean-temp, add a new var (say disk_always_check) in https://github.com/jiangwenyuan/nuster/blob/master/include/haproxy/global-t.h#L196 (and for nosql)

and here https://github.com/jiangwenyuan/nuster/blob/master/src/haproxy.c#L196 (and nosql)

add parser: https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/parser.c#L642 (and nosql)

and https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/cache/engine.c#L810
https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/nosql/engine.c#L1020

        if(!disk->loaded || disk_always_check) {

Probably that's all for the check.

But you might need to check the code that uses loaded, like https://github.com/jiangwenyuan/nuster/blob/master/src/nuster/cache/engine.c#L984, to make sure it works in multiple-container mode when we delete the cache.
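Pulling those steps together, here is a rough self-contained sketch of the new flag and the check; all names are hypothetical stand-ins for the real globals and engine code linked above:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the new global option parsed from `disk-always-check on|off`
     * (final keyword name still undecided); would live alongside clean_temp. */
    static bool disk_always_check = false;

    /* Stand-in for the per-store loader state. */
    struct nst_disk_state { bool loaded; };

    /* Shared by the cache and nosql engine check sites: consult the disk while
     * the loader is still running, or always when the new mode is enabled. */
    static bool need_disk_check(const struct nst_disk_state *disk) {
        return !disk->loaded || disk_always_check;
    }

    int main(void) {
        struct nst_disk_state disk = { .loaded = true };
        disk_always_check = true;   /* simulate the new mode being turned on */
        printf("check disk: %s\n", need_disk_check(&disk) ? "yes" : "no");
        return 0;
    }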

@hugos99

hugos99 commented Jul 30, 2020

Hey @jiangwenyuan, I just finished the initial implementation. The only thing missing is the delete portion; I didn't understand that code, so I thought I should ask you: do you want a PR for this?

@jiangwenyuan
Owner

@hugos99 Sure, thanks
