Hello, I deployed the R1 7B model locally and wanted to measure the similarity between common Chinese tokens in the vocabulary, so I extracted the token embedding vectors with the following code:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./my_path/DeepSeek-R1-Distill-Qwen-7B")
model = AutoModel.from_pretrained("./my_path/DeepSeek-R1-Distill-Qwen-7B")
model.eval()
embeddings = model.get_input_embeddings().weight.data

I computed the Euclidean distance between the token "苹果" (apple) and every other Chinese token and sorted the results in ascending order, but the 100 nearest tokens are semantically very far from "苹果". Why is that? Am I extracting the embedding vectors incorrectly, or is something else going on?

I also tried Euclidean distances between English tokens, and none of the 20 tokens nearest to "apple" are semantically close to it either!
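One thing worth noting: the input-embedding matrix of a causal LM is trained for next-token prediction, not as a standalone semantic word-embedding space, so raw Euclidean neighbors are often dominated by vector-norm and token-frequency effects rather than meaning. Cosine similarity is the more common choice for this kind of probe. A minimal sketch of the cosine-based lookup, using a small random matrix in place of the real `model.get_input_embeddings().weight.data` so it runs stand-alone (`nearest_tokens` is a hypothetical helper, not part of transformers):

```python
import numpy as np

# Stand-in for the real embedding matrix: in practice this would be
# model.get_input_embeddings().weight.data converted to a numpy array.
rng = np.random.default_rng(0)
vocab_size, dim = 1000, 64
embeddings = rng.standard_normal((vocab_size, dim))

def nearest_tokens(token_id, k=20):
    """Return the k token ids whose embeddings have the highest cosine
    similarity to `token_id`, excluding the token itself."""
    # L2-normalise every row so a plain dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[token_id]   # shape (vocab_size,)
    order = np.argsort(-sims)          # highest similarity first
    return [int(i) for i in order if i != token_id][:k]

neighbors = nearest_tokens(42, k=20)
```

Even with cosine similarity, the neighbors of a 7B decoder's input embeddings need not look as cleanly semantic as word2vec neighbors do; the metric choice only removes the norm artifact, not the training-objective mismatch.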