
[BUG] Error in get_initial_predictions #56


Open
baike687 opened this issue May 7, 2025 · 2 comments
Labels: by design (Intentional behavior by design.)


@baike687

baike687 commented May 7, 2025

Hello, I was using mLLMCelltype for annotation and it had worked successfully before. However, this time it returns:

consensus_results <- interactive_consensus_annotation(
  input = T_NK_seurat_markers_res0.7_sub,
  tissue_name = "T and NK cell clusters in tumor, adjacent tissue, and PBMC", # Provide tissue context
  models = c(
    "deepseek-reasoner",
    "gpt-4o"
  ),
  api_keys = list(
    deepseek = "xxxxxxxxxx",
    openai = "xxxxxxxxxx"
  ),
  top_gene_count = 20,
  controversy_threshold = 0.5
)
Phase 1: Getting initial predictions from all models...
[2025-05-07 15:42:52] Processing input with Model: deepseek-reasoner (Provider: deepseek)
[2025-05-07 15:42:52] Processing input with Model: gpt-4o (Provider: openai)
Error in get_initial_predictions(input = input, tissue_name = tissue_name, :
No models successfully completed predictions. Please check API keys and model availability.
In addition: Warning messages:
1: In logger$log_entry("INFO", cache_msg) :
No active log file. Call start_cluster_discussion first.
2: In logger$log_entry("INFO", "Phase 1: Getting initial predictions from all models...") :
No active log file. Call start_cluster_discussion first.
3: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
4: In value[3L] :
Failed to get predictions from deepseek-reasoner: missing value where TRUE/FALSE needed
5: In logger$log_entry("WARNING", warning_msg) :
No active log file. Call start_cluster_discussion first.
6: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
7: In value[3L] :
Failed to get predictions from gpt-4o: missing value where TRUE/FALSE needed
8: In logger$log_entry("WARNING", warning_msg) :
No active log file. Call start_cluster_discussion first.

The problem persists even after changing the API keys. I am not sure whether it is related to the package. Thank you.

sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cowplot_1.1.3 ggplot2_3.5.1 dplyr_1.1.4 Seurat_5.2.1 SeuratObject_5.0.2 sp_2.2-0
[7] mLLMCelltype_1.2.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0 magrittr_2.0.3 spatstat.utils_3.1-3
[6] farver_2.1.2 rmarkdown_2.29 fs_1.6.5 vctrs_0.6.5 ROCR_1.0-11
[11] memoise_2.0.1 spatstat.explore_3.4-2 htmltools_0.5.8.1 usethis_3.1.0 curl_6.2.2
[16] sctransform_0.4.1 parallelly_1.43.0 KernSmooth_2.23-26 htmlwidgets_1.6.4 desc_1.4.3
[21] ica_1.0-3 plyr_1.8.9 plotly_4.10.4 zoo_1.8-13 cachem_1.1.0
[26] igraph_2.1.4 mime_0.13 lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.7-3
[31] R6_2.6.1 fastmap_1.2.0 fitdistrplus_1.2-2 future_1.34.0 shiny_1.10.0
[36] digest_0.6.37 colorspace_2.1-1 patchwork_1.3.0 ps_1.9.0 tensor_1.5
[41] RSpectra_0.16-2 irlba_2.3.5.1 pkgload_1.4.0 progressr_0.15.1 spatstat.sparse_3.1-0
[46] httr_1.4.7 polyclip_1.10-7 abind_1.4-8 compiler_4.4.2 remotes_2.5.0
[51] withr_3.0.2 fastDummies_1.7.5 pkgbuild_1.4.7 MASS_7.3-65 sessioninfo_1.2.3
[56] tools_4.4.2 lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.3 goftest_1.2-3
[61] glue_1.8.0 callr_3.7.6 nlme_3.1-168 promises_1.3.2 grid_4.4.2
[66] Rtsne_0.17 cluster_2.1.8.1 reshape2_1.4.4 generics_0.1.3 gtable_0.3.6
[71] spatstat.data_3.1-6 tidyr_1.3.1 data.table_1.17.0 spatstat.geom_3.3-6 RcppAnnoy_0.0.22
[76] ggrepel_0.9.6 RANN_2.6.2 pillar_1.10.2 stringr_1.5.1 spam_2.11-1
[81] RcppHNSW_0.6.0 later_1.4.1 splines_4.4.2 lattice_0.22-6 survival_3.8-3
[86] deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.1.1 pbapply_1.7-2 knitr_1.50
[91] gridExtra_2.3 scattermore_1.2 xfun_0.51 devtools_2.4.5 matrixStats_1.5.0
[96] stringi_1.8.7 lazyeval_0.2.2 yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20
[101] tibble_3.2.1 cli_3.6.5 uwot_0.2.3 xtable_1.8-4 reticulate_1.42.0
[106] munsell_0.5.1 processx_3.8.6 Rcpp_1.0.14 globals_0.16.3 spatstat.random_3.3-3
[111] png_0.1-8 spatstat.univar_3.1-2 parallel_4.4.2 ellipsis_0.3.2 dotCall64_1.2
[116] profvis_0.4.0 urlchecker_1.0.1 listenv_0.9.1 viridisLite_0.4.2 scales_1.3.0
[121] ggridges_0.5.6 purrr_1.0.4 rlang_1.1.6

@cafferychen777
Owner

Hi @baike687,

Thank you for reporting this issue. I've identified the root cause of the problem, and it's related to the cluster ID format in your data file.

The Problem

The error message you're seeing:

Error in get_initial_predictions(input = input, tissue_name = tissue_name, :
No models successfully completed predictions. Please check API keys and model availability.

And particularly these warnings:

3: In create_annotation_prompt(input, tissue_name, top_gene_count) :
NAs introduced by coercion
4: In value[3L] :
Failed to get predictions from deepseek-reasoner: missing value where TRUE/FALSE needed

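These warnings point to a problem with the cluster IDs in your data: "NAs introduced by coercion" is what R emits when a string cannot be converted to a number, and "missing value where TRUE/FALSE needed" is raised when an if() condition evaluates to NA.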

Root Cause

I examined the same data file (/Users/apple/Downloads/T_NK_seurat_markers_res0.7_sub.rds) and found that it contains non-standard cluster IDs. The cluster column in this file is a factor with mixed-format IDs, including:

  • Simple numeric IDs like "4", "0", "2"
  • Complex IDs with underscores like "7_0", "9_2", "6_0_1"

When the mLLMCelltype package tries to process these IDs, it attempts to convert them to numeric values. The complex IDs with underscores cannot be converted to numbers, resulting in NA values, which then causes the error you're experiencing.
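A minimal base-R sketch reproduces both messages with the IDs listed above. It assumes, as described here, that the IDs are converted with as.numeric() and that the result later feeds an if() condition; it illustrates the failure mode and is not mLLMCelltype's internal code.

```r
# Cluster IDs of both kinds reported for this data file
ids <- c("4", "0", "2", "7_0", "9_2", "6_0_1")

# Convert to numeric, capturing the coercion warning instead of printing it
warn_msg <- NULL
numeric_ids <- withCallingHandlers(
  as.numeric(ids),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(numeric_ids)  # the underscore IDs become NA
print(warn_msg)     # "NAs introduced by coercion"

# An NA inside an if() condition produces the error seen in the report
err_msg <- tryCatch(
  if (numeric_ids[4] >= 0) "valid ID" else "invalid ID",
  error = function(e) conditionMessage(e)
)
print(err_msg)      # "missing value where TRUE/FALSE needed"
```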

Solution

To fix this issue, you need to standardize the cluster IDs before using the mLLMCelltype package. Here's how you can do it:

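library(mLLMCelltype)

# Load the data (.rds files are read with readRDS(); load() only works for .RData files)
T_NK_seurat_markers_res0.7_sub <- readRDS('/Users/apple/Downloads/T_NK_seurat_markers_res0.7_sub.rds')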

# Check the structure of the data
str(T_NK_seurat_markers_res0.7_sub)

# Create a mapping from original cluster IDs to numeric IDs
original_ids <- levels(T_NK_seurat_markers_res0.7_sub$cluster)
id_mapping <- data.frame(
  original = original_ids,
  numeric = seq(0, length(original_ids) - 1)
)

# Create a copy of the data with standardized cluster IDs
standardized_data <- T_NK_seurat_markers_res0.7_sub
standardized_data$original_cluster <- standardized_data$cluster  # Save original IDs for reference
standardized_data$cluster <- id_mapping$numeric[match(as.character(standardized_data$cluster), id_mapping$original)]

# Now use the standardized data with mLLMCelltype
consensus_results <- interactive_consensus_annotation(
  input = standardized_data,
  tissue_name = "T and NK cell clusters in tumor, adjacent tissue, and PBMC",
  models = c(
    "deepseek-reasoner",
    "gpt-4o"
  ),
  api_keys = list(
    deepseek = "your-deepseek-key",
    openai = "your-openai-key"
  ),
  top_gene_count = 20,
  controversy_threshold = 0.5
)
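The mapping can be checked without the original file. This sketch runs the same steps on a small hypothetical marker table (gene names and cluster assignments are made up, shaped like the gene and cluster columns of Seurat's FindAllMarkers() output) and confirms that every new ID is numeric and that the original IDs can be recovered from id_mapping.

```r
# Hypothetical toy marker table with mixed-format cluster IDs
markers <- data.frame(
  gene = c("CD3E", "NKG7", "GZMB", "CCR7", "IL7R", "FOXP3"),
  cluster = factor(c("0", "4", "7_0", "7_0", "9_2", "6_0_1")),
  stringsAsFactors = FALSE
)

# Same mapping as above: each factor level gets a consecutive integer from 0
original_ids <- levels(markers$cluster)
id_mapping <- data.frame(
  original = original_ids,
  numeric = seq(0, length(original_ids) - 1),
  stringsAsFactors = FALSE
)

standardized <- markers
standardized$original_cluster <- standardized$cluster  # keep original IDs
standardized$cluster <- id_mapping$numeric[match(as.character(standardized$cluster), id_mapping$original)]
print(id_mapping)

# Every new ID is a plain number, and the mapping can be inverted later
stopifnot(!anyNA(standardized$cluster))
recovered <- id_mapping$original[match(standardized$cluster, id_mapping$numeric)]
stopifnot(identical(recovered, as.character(markers$cluster)))
```

Note that the new IDs are level positions, not the old numbers: with these levels, original cluster "4" becomes 1. Keeping id_mapping (or the original_cluster column) makes it possible to translate the resulting annotations back to the original cluster names.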

Documentation Update

We've recently updated the documentation to include warnings about this issue. The mLLMCelltype package requires numeric cluster IDs or values that can be cleanly converted to numeric. Non-numeric cluster IDs (like "cluster_1", "T_cells", or "7_0") may cause errors or unexpected behavior.

You can find this warning in the updated documentation for the annotate_cell_types and interactive_consensus_annotation functions, as well as in the README files.
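Before calling the annotation functions, it may help to list which IDs would fail this conversion. The helper below, find_nonnumeric_cluster_ids(), is hypothetical and not part of mLLMCelltype; it assumes the requirement is simply that as.numeric() succeeds on each ID.

```r
# Hypothetical helper (not part of mLLMCelltype): return the cluster IDs
# that would become NA when converted with as.numeric()
find_nonnumeric_cluster_ids <- function(cluster) {
  ids <- unique(as.character(cluster))  # use labels, not factor codes
  converted <- suppressWarnings(as.numeric(ids))
  ids[is.na(converted)]
}

clusters <- factor(c("0", "4", "7_0", "cluster_1", "T_cells", "12"))
bad_ids <- find_nonnumeric_cluster_ids(clusters)
if (length(bad_ids) > 0) {
  message("Non-numeric cluster IDs found: ", paste(bad_ids, collapse = ", "))
}
```

The helper converts through as.character() first because as.numeric() on a factor returns the internal level codes rather than the labels (for example, as.numeric(factor(c("10", "2"))) gives c(1, 2)), which would silently hide the problem.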

Future Improvements

We're working on making the package more robust to handle various cluster ID formats. In the meantime, please ensure your cluster IDs are numeric or can be cleanly converted to numeric values before using the package.

Thank you for bringing this issue to our attention. It helps us improve the package for everyone.

Best,

@cafferychen777 added the "by design" label (Intentional behavior by design.) on May 7, 2025
@baike687
Author

baike687 commented May 8, 2025

Thank you so much for your prompt response! @cafferychen777
