[BUG] Error with Cluster Index Handling in Version 1.1.4 #12
Hello @luna2terra,

Thank you for reporting this issue with the cluster index handling in mLLMCelltype version 1.1.4. I'd like to help troubleshoot this problem as soon as possible.

Could you please share the processing code that triggered this error? It would be extremely helpful to see exactly how you're setting up and calling the functions. If possible, please also send your dataset CSV to my email at [email protected] so I can reproduce the issue directly. Rest assured, your data will only be used for debugging purposes.

For reference, I've created an example script that demonstrates how we process CSV files with cluster indices in a way that works correctly. You can see this example below:

# First install the latest version of mLLMCelltype
devtools::install_github("cafferychen777/mLLMCelltype", subdir = "R")
# mLLMCelltype example using CSV file as input
# This script directly uses marker genes from the CSV file, avoiding recalculation through Seurat each time
# Load necessary packages
library(mLLMCelltype)
# Create cache and log directories
cache_dir <- "/Users/apple/Research/mLLMCelltype/R/examples/cache"
log_dir <- "/Users/apple/Research/mLLMCelltype/R/examples/logs"
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(log_dir, showWarnings = FALSE, recursive = TRUE)
# Read CSV file content
cat_heart_markers_file <- "/Users/apple/Research/LLMCelltype/data/reference/Cat_Heart_markers.csv"
file_content <- readLines(cat_heart_markers_file)
# Skip header row
data_lines <- file_content[-1]
# Check data structure
cat("Number of data rows: ", length(data_lines), "\n")
cat("First data row: ", data_lines[1], "\n")
# Convert data to list format, using numeric indices as keys
marker_genes_list <- list()
cluster_names <- c()
# First collect all cluster names
for (line in data_lines) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  cluster_names <- c(cluster_names, parts[1])
}
# Then create marker_genes_list with numeric indices
# seq_along() is safe even if the file has no data rows (1:0 would iterate twice)
for (i in seq_along(data_lines)) {
  line <- data_lines[i]
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  # First part is the cluster name
  cluster_name <- parts[1]
  # Use index as key (0-based index, compatible with Seurat)
  cluster_id <- as.character(i - 1)
  # Remaining parts are genes
  genes <- parts[-1]
  # Filter out NA and empty strings
  genes <- genes[!is.na(genes) & genes != ""]
  # Add to marker_genes_list
  marker_genes_list[[cluster_id]] <- list(genes = genes)
  # Print mapping relationship
  cat(sprintf("Mapping cluster '%s' to index %s\n", cluster_name, cluster_id))
}
# Print the processed marker_genes_list structure
cat("\nProcessed marker_genes_list structure:\n")
for (cluster in names(marker_genes_list)) {
  cat(sprintf("Cluster: %s, Genes: %s\n",
              cluster,
              paste(head(marker_genes_list[[cluster]]$genes, 5), collapse = ", ")))
}
# Set API keys
api_keys <- list(
  gemini = "YOUR_GEMINI_API_KEY",
  qwen = "YOUR_QWEN_API_KEY",
  grok = "YOUR_GROK_API_KEY",
  openrouter = "YOUR_OPENROUTER_API_KEY"
)
# Run consensus annotation
cat("\nStarting interactive_consensus_annotation...\n")
consensus_results <- interactive_consensus_annotation(
  input = marker_genes_list,
  tissue_name = "cat heart",  # Cat heart data
  models = c(
    "gemini-2.0-flash",
    "gemini-1.5-pro",
    "qwen-max-2025-01-25",
    "grok-3-latest",
    "anthropic/claude-3-7-sonnet-20250219",
    "openai/gpt-4o"
  ),
  api_keys = api_keys,
  controversy_threshold = 0.6,
  entropy_threshold = 1.0,
  max_discussion_rounds = 3,
  cache_dir = cache_dir,
  log_dir = log_dir
)
# Save results
saveRDS(consensus_results, "/Users/apple/Research/mLLMCelltype/R/examples/cat_heart_results.rds")
# Print results summary
cat("\nResults summary:\n")
cat("Available fields:", paste(names(consensus_results), collapse=", "), "\n\n")
# Print final annotations
cat("Final cell type annotations:\n")
for (cluster in names(consensus_results$final_annotations)) {
  cat(sprintf("%s: %s\n", cluster, consensus_results$final_annotations[[cluster]]))
}
# Print controversial clusters
cat("\nControversial clusters:", paste(consensus_results$controversial_clusters, collapse=", "), "\n")
# Check number of clusters
cat("\nCluster count check:\n")
cat("Number of input clusters:", length(marker_genes_list), "\n")
cat("Number of finally annotated clusters:", length(consensus_results$final_annotations), "\n")
cat("Number of controversial clusters:", length(consensus_results$controversial_clusters), "\n")
# Check if additional clusters were added
all_clusters <- unique(c(
  names(marker_genes_list),
  names(consensus_results$final_annotations),
  consensus_results$controversial_clusters
))
cat("All occurring clusters:", paste(all_clusters, collapse = ", "), "\n")
if (length(all_clusters) > length(marker_genes_list)) {
  extra_clusters <- setdiff(all_clusters, names(marker_genes_list))
  cat("Warning: Additional clusters found:", paste(extra_clusters, collapse = ", "), "\n")
}

The key points to note in this approach:
If you could adapt this approach for your own CSV format and data paths, it might resolve the issue. The main thing to ensure is that all cluster indices are non-negative and that they follow a consistent pattern.

Once I receive your code and data, I'll be able to investigate further and provide a more targeted solution.

Thank you for your patience and for helping improve mLLMCelltype!

Best regards,
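P.S. As a rough illustration of the index check mentioned above, here is a minimal sketch. `check_cluster_ids()` is a hypothetical helper written for this thread, not a function exported by mLLMCelltype, and it assumes that "a consistent pattern" means consecutive indices starting at 0. The package's own validation may differ.

```r
# Hypothetical helper (not part of mLLMCelltype): checks the names of a
# marker gene list before it is passed to the annotation function.
check_cluster_ids <- function(marker_genes_list) {
  ids <- names(marker_genes_list)
  if (is.null(ids) || any(is.na(ids)) || any(ids == "")) {
    stop("Every cluster needs a non-empty name.")
  }
  # Accept only non-negative integers such as "0", "1", "12";
  # this rejects "-1", "1.5" and free-text labels.
  is_index <- grepl("^[0-9]+$", ids)
  if (!all(is_index)) {
    stop("Invalid cluster indices: ", paste(ids[!is_index], collapse = ", "))
  }
  if (anyDuplicated(ids) > 0) {
    stop("Duplicated cluster indices: ",
         paste(unique(ids[duplicated(ids)]), collapse = ", "))
  }
  # Assumption: indices should be consecutive and start at 0
  expected <- seq_along(ids) - 1L
  if (!identical(sort(as.integer(ids)), expected)) {
    stop("Cluster indices are not consecutive starting at 0.")
  }
  invisible(TRUE)
}
```

Calling `check_cluster_ids(marker_genes_list)` just before the annotation step would stop early with a message naming the offending indices, which may make it easier to see where a negative index comes from.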
Important Note: Before running the script, please make sure to:
These steps will help ensure that you're working with a clean environment and the latest bug fixes.
Hello @luna2terra,

Thank you for reporting the issue with cluster index handling in mLLMCelltype version 1.1.4. We provided a potential solution in our previous comments.

I wanted to follow up and check whether the suggested approach resolved your problem. Were you able to successfully process your CSV files using the example code we shared?

If you're still experiencing issues, please let us know and we'd be happy to provide further assistance. Your feedback is invaluable in helping us improve the package.

Best regards,
I encountered an issue while using the mLLMCelltype package (version 1.1.4). It seems that the cluster index handling is not functioning as expected. Specifically, I received errors related to negative indices when processing my CSV input files.
Steps to Reproduce:
interactive_consensus_annotation() function using this file.
Expected Behavior:
The function should accept the input files without errors related to cluster indices, ensuring that they start from 0.
Actual Behavior:
I encountered the following error:
Environment:
Additional Context:
I noticed that the recent update mentions strict validation for input cluster indices, but it appears that there might still be issues affecting users with specific datasets.
Suggested Fix:
Could you please investigate this issue? It might be helpful to include additional validation checks in the code to handle negative indices more gracefully.
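To make the suggestion concrete, here is one way more graceful handling could look. This is only a sketch: `reindex_clusters()` is a hypothetical helper, not an existing mLLMCelltype function, and it assumes that re-mapping whatever labels appear to consecutive 0-based indices (with a warning for negative ones) is acceptable.

```r
# Hypothetical helper (not part of mLLMCelltype): renames clusters to
# consecutive 0-based indices and keeps the original labels for lookup.
reindex_clusters <- function(marker_genes_list) {
  original <- names(marker_genes_list)
  if (is.null(original)) {
    # Unnamed input: fall back to positional labels
    original <- as.character(seq_along(marker_genes_list))
  }
  new_ids <- as.character(seq_along(marker_genes_list) - 1L)

  # Warn instead of failing when negative indices are present
  negative <- grepl("^-[0-9]+$", original)
  if (any(negative)) {
    warning("Negative cluster indices were re-mapped: ",
            paste(original[negative], collapse = ", "))
  }

  names(marker_genes_list) <- new_ids
  list(
    markers = marker_genes_list,
    mapping = data.frame(original = original, index = new_ids,
                         stringsAsFactors = FALSE)
  )
}
```

The returned `mapping` table lets the annotations be traced back to the original cluster labels after the run.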
Thank you for your assistance!