Possible Bug in DataplaneAPI: Dangling and Duplicated Transaction Files with Concurrent Jobs (HAProxy Terraform Provider) #358

Open
cepitacio opened this issue Jan 24, 2025 · 4 comments

Comments

@cepitacio

Description:
I have encountered a potential bug in the DataplaneAPI while working on an HAProxy provider. The issue arises when running concurrent jobs that interact with the API across multiple workspaces.

The problem does not occur when using a single workspace. Specifically:

  • Dangling Transactions: Some transactions remain uncommitted and incomplete.
  • Duplicate Files: The same transaction files also appear in the outdated/ directory.

This behavior suggests there might be an issue with transaction isolation or handling in high-concurrency, multi-workspace scenarios.

Steps to Reproduce:
Use my HAProxy provider available on the Terraform Registry: https://registry.terraform.io/providers/cepitacio/haproxy/latest

  • Set up the DataplaneAPI and configure multiple workspaces.
  • Run multiple concurrent terraform apply jobs using the provider to interact with the API.

Check the transaction directories:

  • Observe uncommitted transaction files in the main directory.
  • Note that duplicates of these files appear in the outdated/ directory.
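The directory check above can be scripted. This is a minimal sketch, assuming the layout described in this issue (a main transactions directory with an outdated/ subdirectory). The function name find_duplicated_transactions is made up for illustration and is not part of Data Plane API or the provider.

```python
from pathlib import Path


def find_duplicated_transactions(transactions_dir):
    """Return names of transaction files that exist both in the main
    transactions directory and in its outdated/ subdirectory."""
    root = Path(transactions_dir)
    outdated = root / "outdated"
    if not root.is_dir() or not outdated.is_dir():
        return []
    # Only regular files count; failed/ and outdated/ are directories.
    in_progress = {p.name for p in root.iterdir() if p.is_file()}
    stale = {p.name for p in outdated.iterdir() if p.is_file()}
    # A name in both sets is a transaction file that was duplicated.
    return sorted(in_progress & stale)
```

Calling it on the transactions directory after a batch of concurrent applies should return an empty list if cleanup worked. Any name it returns is a file that exists in both places.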

Expected Behavior:

  • Transactions should either be committed or cleaned up properly.
  • No duplicate transaction files should exist, even under concurrent usage across multiple workspaces.

Actual Behavior:

  • Some transactions remain uncommitted and are left dangling.
  • Duplicate transaction files are created in the outdated/ directory.

Environment:
DataplaneAPI version: reproduced with v2.9.2 and v2.9.8
HAProxy version: 2.9.0
Terraform provider: https://registry.terraform.io/providers/cepitacio/haproxy/latest

Additional Notes:
I am relatively new to Go and Terraform provider development, so I might have missed something in my implementation. However, the issue seems to be directly related to the API's behavior under concurrent requests.
The issue does not occur when jobs are executed sequentially in a single workspace, which suggests it may be related to handling concurrent API transactions across workspaces.

@cepitacio
Author

Here is an example of what we are seeing with concurrent jobs:

[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/
total 39
drwxr-xr-x. 2 root root  4096 Jan 24 22:20 failed
-rw-r--r--. 1 root root  6509 Jan 24 22:23 haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
-rw-r--r--. 1 root root 16186 Jan 24 22:23 haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9f
drwxr-xr-x. 2 root root 12288 Jan 24 22:23 outdated

The two files above are also present in the outdated/ directory under the same names (note that the sizes differ):

[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/outdated/ | grep haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
-rw-r--r--. 1 root root  6708 Jan 24 22:23 haproxy.cfg.29ab53bf-436a-421a-a39b-6f86f7fe81f1
[root@haproxy1 haproxy]# ll /tmp/dataplaneapi/transactions/outdated/ | grep haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9
-rw-r--r--. 1 root root 16387 Jan 24 22:23 haproxy.cfg.db6e7ac4-fc93-4e75-ae80-667bac1f5f9f

@mjuraga
Collaborator

mjuraga commented Feb 3, 2025

Data Plane API uses optimistic locking with transactions, which means high concurrency isn't available in this case. If multiple transactions are started on the same version of the file, only the first one committed will succeed; all the rest become outdated and cannot be committed. Hope this helps with your issue.
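To illustrate the behaviour described here, a toy in-memory model of version-based optimistic locking follows. It is only a sketch: ToyConfigStore and OutdatedTransaction are hypothetical names, and nothing here reflects how Data Plane API is actually implemented internally.

```python
class OutdatedTransaction(Exception):
    """Raised when a transaction's base version is no longer current."""


class ToyConfigStore:
    """In-memory stand-in for a versioned configuration file."""

    def __init__(self):
        self.version = 1
        self.lines = []

    def start_transaction(self):
        # A transaction remembers the version it was started from.
        return {"base_version": self.version, "changes": []}

    def commit(self, txn):
        # Optimistic check: commit only if nobody committed in between.
        if txn["base_version"] != self.version:
            raise OutdatedTransaction(
                f"started on v{txn['base_version']}, current is v{self.version}"
            )
        self.lines.extend(txn["changes"])
        self.version += 1
        return self.version


store = ToyConfigStore()
first = store.start_transaction()
second = store.start_transaction()  # same base version as `first`
first["changes"].append("backend a")
second["changes"].append("backend b")

store.commit(first)  # first commit on this version succeeds
try:
    store.commit(second)  # base version is now stale
    second_committed = True
except OutdatedTransaction:
    second_committed = False
```

In this model the second transaction can never be committed. The client has to start a new transaction on the current version and reapply its changes.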

@cepitacio
Author

cepitacio commented Feb 3, 2025

@mjuraga thanks for the quick response! While I understand the optimistic locking mechanism, the issue I’m encountering is with the cleanup process in multi-workspace scenarios.

When running multiple workspaces concurrently (around 13), I’m seeing the following:

  • Dangling files remain in the /tmp/dataplaneapi/transactions/ directory (uncommitted transactions) and are not cleaned up as they should be.
  • The same files also appear in the /tmp/dataplaneapi/transactions/outdated/ directory, even though they should have been cleaned up, or marked as outdated and removed, once the transaction became outdated. A transaction file should never exist in both directories (in progress in transactions/ and outdated in outdated/).

The provider handles concurrency by implementing a retry mechanism when a transaction version or commit becomes outdated. However, the cleanup process seems to fail in high-concurrency scenarios, leaving outdated files behind.
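For reference, the general shape of such a retry loop is sketched below. This is not the provider's actual code: the four callables (start, apply_changes, commit, discard) and OutdatedError are hypothetical hooks standing in for whatever client calls are used. Whether explicitly discarding an outdated transaction is needed, or has any effect on the leftover files, is an assumption that this thread does not settle.

```python
import time


class OutdatedError(Exception):
    """Hypothetical error for a commit rejected as outdated."""


def apply_with_retry(start, apply_changes, commit, discard,
                     max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run start -> apply_changes -> commit, retrying on OutdatedError.

    All four callables are caller-supplied placeholders, not
    Data Plane API client functions.
    """
    for attempt in range(1, max_attempts + 1):
        txn = start()  # new transaction on the current version
        try:
            apply_changes(txn)
            return commit(txn)
        except OutdatedError:
            # Discard the stale transaction so it is not left dangling.
            discard(txn)
            if attempt == max_attempts:
                raise
            # Exponential backoff before starting over.
            sleep(base_delay * 2 ** (attempt - 1))
        except Exception:
            # Any other failure: clean up, then surface the error.
            discard(txn)
            raise
```

The point of the sketch is that every transaction started is either committed or explicitly discarded, so a leftover file would have to come from the server side rather than from an abandoned client transaction.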

@mjuraga
Collaborator

mjuraga commented Feb 5, 2025

Oh, I understand now, thank you. We can treat this as a bug and fix it. I'll get back to you with a fix.
