[BUG] Error in get_initial_predictions #56
Comments
Hi @baike687,

Thank you for reporting this issue. I've identified the root cause of the problem, and it's related to the cluster ID format in your data file.

The Problem

The error message you're seeing:
And particularly these warnings:
These indicate that there's an issue with the cluster IDs in your data.

Root Cause

I've examined the exact same data file (
When the mLLMCelltype package tries to process these IDs, it attempts to convert them to numeric values. The complex IDs with underscores cannot be converted to numbers, resulting in NA values, which then causes the error you're experiencing.

Solution

To fix this issue, you need to standardize the cluster IDs before using the mLLMCelltype package. Here's how you can do it:

# Load the data
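# Note: load() expects a file written with save(); if this file was written with saveRDS(), use readRDS() and assign the result instead
load('/Users/apple/Downloads/T_NK_seurat_markers_res0.7_sub.rds')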
# Check the structure of the data
str(T_NK_seurat_markers_res0.7_sub)
# Create a mapping from original cluster IDs to numeric IDs
original_ids <- levels(T_NK_seurat_markers_res0.7_sub$cluster)
id_mapping <- data.frame(
  original = original_ids,
  numeric = seq(0, length(original_ids) - 1)
)
# Create a copy of the data with standardized cluster IDs
standardized_data <- T_NK_seurat_markers_res0.7_sub
standardized_data$original_cluster <- standardized_data$cluster # Save original IDs for reference
standardized_data$cluster <- id_mapping$numeric[match(as.character(standardized_data$cluster), id_mapping$original)]
# Now use the standardized data with mLLMCelltype
consensus_results <- interactive_consensus_annotation(
  input = standardized_data,
  tissue_name = "T and NK cell clusters in tumor, adjacent tissue, and PBMC",
  models = c(
    "deepseek-reasoner",
    "gpt-4o"
  ),
  api_keys = list(
    deepseek = "your-deepseek-key",
    openai = "your-openai-key"
  ),
  top_gene_count = 20,
  controversy_threshold = 0.5
)

Documentation Update

We've recently updated the documentation to include warnings about this issue. The mLLMCelltype package requires numeric cluster IDs or values that can be cleanly converted to numeric. Non-numeric cluster IDs (like "cluster_1", "T_cells", or "7_0") may cause errors or unexpected behavior. You can find this warning in the updated documentation for the

Future Improvements

We're working on making the package more robust to handle various cluster ID formats. In the meantime, please ensure your cluster IDs are numeric or can be cleanly converted to numeric values before using the package.

Thank you for bringing this issue to our attention. It helps us improve the package for everyone.

Best,
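
For readers who want to check their own marker table before running the annotation, here is a minimal sketch of the failure mode described above. It assumes the package converts cluster IDs with something equivalent to base R's as.numeric(); the thread does not show the package internals, so treat this as an illustration only. The `ids` vector is hypothetical, with "7_0" taken from the example quoted in this thread.

```r
# Hypothetical cluster IDs; "7_0" mirrors the example ID mentioned in this thread
ids <- c("0", "1", "7_0")

# as.numeric() cannot parse "7_0", so that entry becomes NA
# (base R also emits a coercion warning, suppressed here for readability)
converted <- suppressWarnings(as.numeric(ids))
print(converted)

# Pre-flight check: report any IDs that would not convert cleanly
bad_ids <- ids[is.na(converted)]
if (length(bad_ids) > 0) {
  message("Non-numeric cluster IDs: ", paste(bad_ids, collapse = ", "))
}
```

If the cluster column is a factor, convert it with as.character() before running this check, because as.numeric() applied directly to a factor returns the internal level codes and not the labels.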
Thank you so much for your prompt response! @cafferychen777
Hello, I was trying to use mLLMCelltype in the annotation and it went successfully before. However, this time it returns:
And after changing the API keys, this problem still exists. I am not sure whether it is related to the package. Thank you.