Fine-tuning on a custom dataset: ch_PP-OCRv3_rec_distillation.yml and ch_PP-OCRv4_rec_distillation.yml both have bugs #14872
Comments
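Could you provide a minimal reproducible demo? I'll give it a try.
You mean that training with ch_PP-OCRv4_rec_distillation.yml raises an error? That one reproduces every time; I have seen about five or six issues reporting it, and a fix seems to be scheduled. @GreatV If distillation training with ch_PP-OCRv3_rec_distillation.yml does not converge, I can send you my config.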
PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml Lines 44 to 46 in 75526f0
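Possible causes
If the pretrained field is empty, the teacher model may start training from random weights instead of the PP-OCRv3 pretrained weights. This could degrade performance on printed text while the model adapts to the handwriting in the training data.
The training data may consist mostly of handwritten samples, so the model performs well on handwriting but poorly on printed text. Make sure the training data contains enough printed samples.
freeze_params: false in the config means the teacher model's parameters are updated during training. This may bias the teacher toward the handwriting features in the training data and hurt printed-text performance.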
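The points above can be turned into a quick sanity check on the Teacher block before training starts. The sketch below is only an illustration and not part of PaddleOCR: `check_teacher` is a hypothetical helper, the config is passed as a plain dict (as it would look after loading the YAML), and the warning texts encode the hypotheses from this thread rather than confirmed behavior. Weights may also be loaded through another setting such as Global.pretrained_model, which this sketch does not inspect.

```python
from typing import Any, Dict, List


def check_teacher(arch: Dict[str, Any]) -> List[str]:
    """Return warnings for risky Teacher settings (hypothetical helper)."""
    teacher = arch.get("Models", {}).get("Teacher", {})
    pretrained = teacher.get("pretrained")  # an empty YAML value loads as None
    frozen = bool(teacher.get("freeze_params", False))

    warnings = []
    if not pretrained:
        warnings.append(
            "Teacher.pretrained is empty: teacher may start from random weights"
        )
        if frozen:
            warnings.append(
                "Teacher is frozen without pretrained weights: "
                "the student would distill from an untrained teacher"
            )
    if not frozen:
        warnings.append(
            "freeze_params is false: teacher weights are updated "
            "and can drift toward the fine-tuning data"
        )
    return warnings


# Mirrors the Teacher block quoted in this issue (pretrained left empty).
arch = {
    "Models": {
        "Teacher": {
            "pretrained": None,
            "freeze_params": False,
            "return_all_feats": True,
            "algorithm": "SVTR_LCNet",
        }
    }
}

for message in check_teacher(arch):
    print("WARNING:", message)
```

Under these assumptions, an empty pretrained field combined with freeze_params: true would be flagged as well, which is one possible explanation for a loss that stays high; the thread does not confirm this.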
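@GreatV https://aistudio.baidu.com/projectdetail/4330587 Could you share the modified ch_PP-OCRv3_rec_distillation.yml from your official handwritten text recognition example? The doc in your repo, https://paddlepaddle.github.io/PaddleOCR/latest/applications/%E6%89%8B%E5%86%99%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB.html , only says: configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml epoch_num: 100 # number of training epochs lr: Train: That description is rather vague. The example appears to have no teacher or student, yet a file named ch_PP-OCRv3_rec_distillation.yml should have both. Could you provide the complete ch_PP-OCRv3_rec_distillation.yml as modified for this example? I have already spent several days going down wrong paths, and many examples online simply copy this description. Does this example configure a teacher and a student at all? If it does, how exactly are they configured? Sometimes the advice is freeze_params: false, sometimes true. I just want to see the ch_PP-OCRv3_rec_distillation.yml you used when the handwriting example trained successfully.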
🔎 Search before asking
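🐛 Bug (Problem description)
From bug 14870: ch_PP-OCRv3_rec_distillation.yml vs ch_PP-OCRv4_rec_distillation.yml
If you use PP-OCRv3 as the base model, it is recommended to use ch_PP-OCRv3_rec_distillation.yml for distillation training, to stay consistent with the PP-OCRv3 training strategy.
My answer: I originally trained following the official documentation:
14870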
Architecture:
  model_type: &model_type "rec"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: false  # official default is false
      return_all_feats: true
      model_type: *model_type
      algorithm: SVTR_LCNet
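With this config, training runs and converges, but the model only seems to work on handwriting; recognition of printed text, which worked before, is lost. I then suspected that the freeze_params: false setting was the problem. Following GreatV's suggestion in bug 14866, I changed it to freeze_params: true, but after a long training run the loss is still very high and does not converge: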
Models:
  Teacher:
    pretrained:
    freeze_params: true
    return_all_feats: true
    model_type: *model_type
    algorithm: SVTR_LCNet
    Transform:
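The dataset is a handwritten OCR collection that combines the Chinese Academy of Sciences handwriting data with open-source data from the web: https://aistudio.baidu.com/datasetdetail/102884/0
If you use PP-OCRv4 as the base model, you should use ch_PP-OCRv4_rec_distillation.yml, because PP-OCRv4 may include optimizations or new adjustments to the distillation strategy.
Training with ch_PP-OCRv4_rec_distillation.yml raises an error: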
python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distillation.yml
[2025/03/17 15:13:06] ppocr ERROR: When parsing line train_data/138440.png 率普遍较高如方正
, error happened with msg: Traceback (most recent call last):
File "/home/PaddleOCR/ppocr/data/simple_dataset.py", line 137, in __getitem__
outs = transform(data, self.ops)
File "/home/PaddleOCR/ppocr/data/imaug/__init__.py", line 73, in transform
data = op(data)
File "/home/PaddleOCR/ppocr/data/imaug/operators.py", line 123, in __call__
data_list.append(data[key])
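KeyError: 'valid_ratio'
I then tried removing valid_ratio from keep_keys under KeepKeys in the config:
- KeepKeys:
    keep_keys:
    - image
    - label_ctc
    - label_gtc
    - length
    - valid_ratio
With valid_ratio removed, a different error is raised: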
Exception in thread Thread-1 (_thread_loop):
Traceback (most recent call last):
File "/root/anaconda3/envs/ocr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/ocr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/dataloader_iter.py", line 619, in _thread_loop
batch = self._get_data()
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/dataloader_iter.py", line 766, in _get_data
batch.reraise()
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/worker.py", line 195, in reraise
raise self.exc_type(msg)
ValueError: DataLoader worker(1) caught ValueError with message:
Traceback (most recent call last):
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/worker.py", line 380, in _worker_loop
batch = fetcher.fetch(indices)
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/fetcher.py", line 85, in fetch
data = self.collate_fn(data)
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/collate.py", line 75, in default_collate_fn
return [default_collate_fn(fields) for fields in zip(*batch)]
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/collate.py", line 75, in <listcomp>
return [default_collate_fn(fields) for fields in zip(*batch)]
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/collate.py", line 56, in default_collate_fn
batch = np.stack(batch, axis=0)
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/numpy/core/shape_base.py", line 449, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
Traceback (most recent call last):
File "/home/PaddleOCR/tools/train.py", line 270, in <module>
main(config, device, logger, vdl_writer, seed)
File "/home/PaddleOCR/tools/train.py", line 223, in main
program.train(
File "/home/PaddleOCR/tools/program.py", line 312, in train
for idx, batch in enumerate(train_dataloader):
File "/root/anaconda3/envs/ocr/lib/python3.10/site-packages/paddle/io/dataloader/dataloader_iter.py", line 840, in __next__
self.reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed != true, but received killed_:1 == true:1.] (at ../paddle/phi/core/operators/reader/blocking_queue.h:175)
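Both failures above can be reproduced in isolation, which may help narrow down where the pipeline and the config disagree. The sketch below is not PaddleOCR code: `keep_keys` imitates the `data_list.append(data[key])` line from the first traceback, and `inconsistent_fields` is a hypothetical helper that reports which field of a batch has differing shapes, the condition under which `np.stack` raises "all input arrays must have the same shape". Samples are modeled as plain dicts, lists, and tuples so the block runs without Paddle or NumPy; the sample data is invented for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple


def keep_keys(data: Dict[str, Any], keys: Sequence[str]) -> List[Any]:
    """Imitates the KeepKeys step: pick the listed keys, in order."""
    # Raises KeyError if no earlier op ever set one of the keys.
    return [data[key] for key in keys]


def shape_of(value: Any) -> Tuple[int, ...]:
    """Shape of an array-like; falls back to nested list lengths."""
    if hasattr(value, "shape"):
        return tuple(value.shape)
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def inconsistent_fields(
    batch: Sequence[Sequence[Any]],
) -> Dict[int, List[Tuple[int, ...]]]:
    """Map field index -> distinct shapes, for fields that cannot be stacked."""
    bad = {}
    for idx, fields in enumerate(zip(*batch)):
        shapes = sorted({shape_of(field) for field in fields})
        if len(shapes) > 1:
            bad[idx] = shapes
    return bad


# A sample whose pipeline never produced `valid_ratio`
# (assumed cause of the first error).
sample = {"image": [[0, 0], [0, 0]], "label_ctc": [1, 2, 0], "length": 2}
try:
    keep_keys(sample, ["image", "label_ctc", "length", "valid_ratio"])
except KeyError as err:
    print("missing key:", err)

# Two samples whose second field has different lengths (hypothetical data).
batch = [
    ([[0, 0], [0, 0]], [1, 2, 0]),
    ([[0, 0], [0, 0]], [1, 2, 3, 4]),
]
print(inconsistent_fields(batch))
```

Running a check like `inconsistent_fields` on one real batch, in the same order as keep_keys, would show which entry has a variable shape. Whether that entry is the image or one of the labels is not stated in this thread.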
🏃♂️ Environment (Runtime environment)
release/2.10.0
🌰 Minimal Reproducible Example
See the bug description above.