[BUG] Error in get_initial_predictions #56

baike687 · 2025-05-07T13:49:03Z

Hello, I was trying to use mLLMCelltype for annotation, and it worked before. However, this time it returns:

consensus_results <- interactive_consensus_annotation(
input = T_NK_seurat_markers_res0.7_sub,
tissue_name = "T and NK cell clusters in tumor, adjacent tissue, and PBMC", # provide tissue context
models = c(
"deepseek-reasoner",
"gpt-4o" ),
api_keys = list(
deepseek = "xxxxxxxxxx",
openai = "xxxxxxxxxx"
),
top_gene_count = 20,
controversy_threshold = 0.5)
Phase 1: Getting initial predictions from all models...
[2025-05-07 15:42:52] Processing input with Model: deepseek-reasoner (Provider: deepseek)
[2025-05-07 15:42:52] Processing input with Model: gpt-4o (Provider: openai)
Error in get_initial_predictions(input = input, tissue_name = tissue_name, :
No models successfully completed predictions. Please check API keys and model availability.
In addition: Warning messages:
1: In logger$log_entry("INFO", cache_msg) :
No active log file. Call start_cluster_discussion first.
2: In logger$log_entry("INFO", "Phase 1: Getting initial predictions from all models...") :
No active log file. Call start_cluster_discussion first.
3: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
4: In value[3L] :
Failed to get predictions from deepseek-reasoner: missing value where TRUE/FALSE needed
5: In logger$log_entry("WARNING", warning_msg) :
No active log file. Call start_cluster_discussion first.
6: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
7: In value[3L] :
Failed to get predictions from gpt-4o: missing value where TRUE/FALSE needed
8: In logger$log_entry("WARNING", warning_msg) :
No active log file. Call start_cluster_discussion first.

And after changing the API keys, this problem still exists. I am not sure whether it is related to the package. Thank you.

sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cowplot_1.1.3 ggplot2_3.5.1 dplyr_1.1.4 Seurat_5.2.1 SeuratObject_5.0.2 sp_2.2-0
[7] mLLMCelltype_1.2.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0 magrittr_2.0.3 spatstat.utils_3.1-3
[6] farver_2.1.2 rmarkdown_2.29 fs_1.6.5 vctrs_0.6.5 ROCR_1.0-11
[11] memoise_2.0.1 spatstat.explore_3.4-2 htmltools_0.5.8.1 usethis_3.1.0 curl_6.2.2
[16] sctransform_0.4.1 parallelly_1.43.0 KernSmooth_2.23-26 htmlwidgets_1.6.4 desc_1.4.3
[21] ica_1.0-3 plyr_1.8.9 plotly_4.10.4 zoo_1.8-13 cachem_1.1.0
[26] igraph_2.1.4 mime_0.13 lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.7-3
[31] R6_2.6.1 fastmap_1.2.0 fitdistrplus_1.2-2 future_1.34.0 shiny_1.10.0
[36] digest_0.6.37 colorspace_2.1-1 patchwork_1.3.0 ps_1.9.0 tensor_1.5
[41] RSpectra_0.16-2 irlba_2.3.5.1 pkgload_1.4.0 progressr_0.15.1 spatstat.sparse_3.1-0
[46] httr_1.4.7 polyclip_1.10-7 abind_1.4-8 compiler_4.4.2 remotes_2.5.0
[51] withr_3.0.2 fastDummies_1.7.5 pkgbuild_1.4.7 MASS_7.3-65 sessioninfo_1.2.3
[56] tools_4.4.2 lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.3 goftest_1.2-3
[61] glue_1.8.0 callr_3.7.6 nlme_3.1-168 promises_1.3.2 grid_4.4.2
[66] Rtsne_0.17 cluster_2.1.8.1 reshape2_1.4.4 generics_0.1.3 gtable_0.3.6
[71] spatstat.data_3.1-6 tidyr_1.3.1 data.table_1.17.0 spatstat.geom_3.3-6 RcppAnnoy_0.0.22
[76] ggrepel_0.9.6 RANN_2.6.2 pillar_1.10.2 stringr_1.5.1 spam_2.11-1
[81] RcppHNSW_0.6.0 later_1.4.1 splines_4.4.2 lattice_0.22-6 survival_3.8-3
[86] deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.1.1 pbapply_1.7-2 knitr_1.50
[91] gridExtra_2.3 scattermore_1.2 xfun_0.51 devtools_2.4.5 matrixStats_1.5.0
[96] stringi_1.8.7 lazyeval_0.2.2 yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20
[101] tibble_3.2.1 cli_3.6.5 uwot_0.2.3 xtable_1.8-4 reticulate_1.42.0
[106] munsell_0.5.1 processx_3.8.6 Rcpp_1.0.14 globals_0.16.3 spatstat.random_3.3-3
[111] png_0.1-8 spatstat.univar_3.1-2 parallel_4.4.2 ellipsis_0.3.2 dotCall64_1.2
[116] profvis_0.4.0 urlchecker_1.0.1 listenv_0.9.1 viridisLite_0.4.2 scales_1.3.0
[121] ggridges_0.5.6 purrr_1.0.4 rlang_1.1.6

cafferychen777 · 2025-05-07T22:32:19Z

Hi @baike687,

Thank you for reporting this issue. I've identified the root cause of the problem, and it's related to the cluster ID format in your data file.

The Problem

The error message you're seeing:

Error in get_initial_predictions(input = input, tissue_name = tissue_name, :
No models successfully completed predictions. Please check API keys and model availability.

And particularly these warnings:

3: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
4: In value[3L] :
Failed to get predictions from deepseek-reasoner: missing value where TRUE/FALSE needed

These indicate that there's an issue with the cluster IDs in your data.

Root Cause

I've examined the exact same data file (/Users/apple/Downloads/T_NK_seurat_markers_res0.7_sub.rds) and found that it contains non-standard cluster IDs. The cluster column in this file is a factor with mixed format IDs, including:

Simple numeric IDs like "4", "0", "2"
Complex IDs with underscores like "7_0", "9_2", "6_0_1"

When the mLLMCelltype package tries to process these IDs, it attempts to convert them to numeric values. The complex IDs with underscores cannot be converted to numbers, resulting in NA values, which then causes the error you're experiencing.
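
The failure mode described above can be reproduced in plain R, independent of the package. This is only a minimal sketch with made-up cluster IDs; it illustrates the coercion behaviour in base R and is not the package's actual internal code:

```r
# Illustrative cluster IDs mixing plain numbers and underscore-separated IDs
ids <- factor(c("4", "0", "7_0", "6_0_1"))

# Coerce via character; IDs containing underscores cannot be parsed as numbers
# (without suppressWarnings() this emits "NAs introduced by coercion")
numeric_ids <- suppressWarnings(as.numeric(as.character(ids)))
print(numeric_ids)

# Using an NA inside a condition raises an error instead of returning a value
result <- tryCatch(
  if (numeric_ids[3] > 0) "positive" else "non-positive",
  error = function(e) conditionMessage(e)
)
print(result)
```

The first two IDs convert cleanly, while the underscore IDs become NA, and any later `if` that tests such a value stops with an error rather than a TRUE/FALSE result.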

Solution

To fix this issue, you need to standardize the cluster IDs before using the mLLMCelltype package. Here's how you can do it:

# Load the data (.rds files are read with readRDS(), not load())
T_NK_seurat_markers_res0.7_sub <- readRDS('/Users/apple/Downloads/T_NK_seurat_markers_res0.7_sub.rds')

# Check the structure of the data
str(T_NK_seurat_markers_res0.7_sub)

# Create a mapping from original cluster IDs to numeric IDs
original_ids <- levels(T_NK_seurat_markers_res0.7_sub$cluster)
id_mapping <- data.frame(
  original = original_ids,
  numeric = seq(0, length(original_ids) - 1)
)

# Create a copy of the data with standardized cluster IDs
standardized_data <- T_NK_seurat_markers_res0.7_sub
standardized_data$original_cluster <- standardized_data$cluster  # Save original IDs for reference
standardized_data$cluster <- id_mapping$numeric[match(as.character(standardized_data$cluster), id_mapping$original)]

# Now use the standardized data with mLLMCelltype
consensus_results <- interactive_consensus_annotation(
  input = standardized_data,
  tissue_name = "T and NK cell clusters in tumor, adjacent tissue, and PBMC",
  models = c(
    "deepseek-reasoner",
    "gpt-4o"
  ),
  api_keys = list(
    deepseek = "your-deepseek-key",
    openai = "your-openai-key"
  ),
  top_gene_count = 20,
  controversy_threshold = 0.5
)
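
Because the mapping renumbers every cluster (including ones that were already plain numbers), it may help to translate results back to the original IDs afterwards. The sketch below uses toy data; the `annotations` vector and its cell type labels are hypothetical placeholders, and the real structure of `consensus_results` may differ:

```r
# Toy mapping in the same shape as id_mapping in the snippet above
original_ids <- c("0", "2", "4", "6_0_1", "7_0", "9_2")
id_mapping <- data.frame(
  original = original_ids,
  numeric = seq(0, length(original_ids) - 1)
)

# Hypothetical annotations keyed by the new numeric IDs (placeholder labels)
annotations <- c("0" = "CD4 T", "3" = "NK", "4" = "CD8 T")

# Look up the original cluster ID for each numeric ID and relabel
restored <- id_mapping$original[
  match(as.numeric(names(annotations)), id_mapping$numeric)
]
names(annotations) <- restored
print(annotations)
```

Keeping `id_mapping` (or the `original_cluster` column) alongside the results avoids confusing, for example, the new numeric ID 3 with an original cluster named "3".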

Documentation Update

We've recently updated the documentation to include warnings about this issue. The mLLMCelltype package requires numeric cluster IDs or values that can be cleanly converted to numeric. Non-numeric cluster IDs (like "cluster_1", "T_cells", or "7_0") may cause errors or unexpected behavior.

You can find this warning in the updated documentation for the annotate_cell_types and interactive_consensus_annotation functions, as well as in the README files.

Future Improvements

We're working on making the package more robust to handle various cluster ID formats. In the meantime, please ensure your cluster IDs are numeric or can be cleanly converted to numeric values before using the package.
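
One way to check this up front is a small pre-flight test on the cluster column. The helper below is a hypothetical sketch and not part of mLLMCelltype; it only reports which IDs would turn into NA under numeric coercion:

```r
# Hypothetical helper: return the cluster IDs that do not convert to numeric
find_non_numeric_clusters <- function(cluster) {
  ids <- unique(as.character(cluster))
  ids[is.na(suppressWarnings(as.numeric(ids)))]
}

# Example with illustrative IDs; an empty result means the column is safe
bad <- find_non_numeric_clusters(c("0", "2", "7_0", "cluster_1"))
print(bad)
```

Running it on the `cluster` column of the marker table before calling `interactive_consensus_annotation` shows whether the standardization step is needed.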

Thank you for bringing this issue to our attention. It helps us improve the package for everyone.

Best,

baike687 · 2025-05-08T13:38:20Z

Thank you so much for your prompt response! @cafferychen777

cafferychen777 added the "by design" label (Intentional behavior by design) on May 7, 2025
