
Unable to create any new tables in namespace IllegalStateException #1458

Open

lkindere opened this issue Apr 25, 2025 · 2 comments
Labels
bug Something isn't working

Comments


lkindere commented Apr 25, 2025

Describe the bug

Hello,

We wanted to drop some tables from Polaris; this was done using PyIceberg's catalog.drop_table in a loop over several hundred tables.
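
For reference, the loop looked roughly like this (a minimal sketch; the catalog URI, credentials, and namespace name are placeholders, not our actual configuration):

```python
from pyiceberg.catalog import load_catalog

# Placeholder connection properties for a Polaris REST catalog.
catalog = load_catalog(
    "polaris",
    **{
        "uri": "https://polaris.example.com/api/catalog",
        "credential": "<client_id>:<client_secret>",
        "warehouse": "my_catalog",
    },
)

# Drop every table in the namespace; identifiers come back as tuples.
for identifier in catalog.list_tables("affected_namespace"):
    catalog.drop_table(identifier)
```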

After this we noticed that, when trying to create a table, the following exception would be thrown every single time, without fail.

Creating a new namespace works fine, creating and dropping tables in that new namespace works fine, and reading tables in the affected namespace works fine. The only issue seems to be creating new tables, and only in this specific namespace.

Does anyone have any idea what the root cause might be?

ServiceFailureException: Server error: IllegalStateException: Unable to resolve sibling entities to validate location - could not resolvenull
at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:118)
at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:102)
at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:211)
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:323)
at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:262)
at org.apache.iceberg.rest.HTTPClient.post(HTTPClient.java:368)
at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:112)
at org.apache.iceberg.rest.RESTSessionCatalog$Builder.create(RESTSessionCatalog.java:737)
at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:262)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:258)
at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:247)
at org.apache.spark.sql.connector.catalog.TableCatalog.createTable(TableCatalog.java:246)
at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:58)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$2(V2CommandExec.scala:48)
at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$1(V2CommandExec.scala:48)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:47)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:45)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:56)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$5(QueryExecution.scala:425)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:425)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:194)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:425)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$10(SQLExecution.scala:475)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:826)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:334)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:205)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:763)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:421)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1219)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:417)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:355)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:414)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:388)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:511)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:511)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:379)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:375)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:487)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:388)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:436)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:388)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:314)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:311)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:343)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:131)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1217)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1217)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:122)
at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:989)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:973)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:1012)
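
As the trace shows, the failure surfaces from a plain CREATE TABLE issued through spark.sql. A sketch of the kind of statement that hits it (catalog, namespace, and table names are placeholders, and the session is assumed to already be configured with the Polaris REST catalog):

```python
from pyspark.sql import SparkSession

# Assumes an existing session whose "polaris" catalog points at the
# Polaris REST endpoint (catalog configs omitted for brevity).
spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE polaris.affected_namespace.new_table (
        id BIGINT,
        data STRING
    ) USING iceberg
""")
```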

To Reproduce

No response

Actual Behavior

No response

Expected Behavior

No response

Additional context

No response

System information

No response

lkindere added the bug label on Apr 25, 2025
lkindere (Author) commented Apr 25, 2025

Upon further investigation, it seems that one of the tables did not really get dropped: it was still visible when investigating the database directly (somehow the row persisted), but trying to load or drop the table would just say it does not exist.

After deleting the row from the "entities" table in Postgres, the issue seems to be solved; however, I am unsure of the risks of this approach, or of the root cause.
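
For anyone hitting the same thing, this is roughly what the manual cleanup looked like (a sketch only: the "entities" table name is from the Polaris Postgres metastore as noted above, but the column I filter on here, "name", is an assumption about the schema; verify the row carefully before deleting anything):

```python
import psycopg2

# Placeholder DSN for the Polaris metastore database.
conn = psycopg2.connect("dbname=polaris user=polaris")

with conn, conn.cursor() as cur:
    # Locate the row for the table that refused to drop (placeholder name).
    cur.execute("SELECT * FROM entities WHERE name = %s", ("stuck_table",))
    print(cur.fetchall())

    # Only after confirming this is the orphaned entity:
    cur.execute("DELETE FROM entities WHERE name = %s", ("stuck_table",))
```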

@lkindere lkindere changed the title Unable to create any new tables in namespace Unable to create any new tables in namespace IllegalStateException Apr 25, 2025
MonkeyCanCode (Contributor) commented:

@lkindere With older versions (anything before #1092), it is possible to lose a transaction under high concurrency. I saw all sorts of strange issues locally without enforcing transaction isolation (missing commits as well). If you were using the older version with transaction isolation enforced, it is still possible that, when two changes are needed for two tables, one of the changes gets silently dropped, causing an invalid table reference. Do keep in mind that with transaction isolation you may see a performance downgrade. This will be resolved by the plain JDBC route for entities management.
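
To make "enforcing transaction isolation" concrete at the Postgres level, here is a sketch of pinning a connection to SERIALIZABLE and verifying it (an illustration only, not Polaris's actual configuration mechanism):

```python
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

# Placeholder DSN. Under SERIALIZABLE, concurrent writers that would
# otherwise silently interleave instead fail with a serialization error
# and must retry, at some cost to throughput.
conn = psycopg2.connect("dbname=polaris user=polaris")
conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)

with conn, conn.cursor() as cur:
    cur.execute("SHOW transaction_isolation")
    print(cur.fetchone())  # ('serializable',)
```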
