Code Migration
To balance the tasks of text detection, recognition and key information extraction, the initial version of MMOCR made design compromises that left it with many shortcomings. In this 1.0 release, MMOCR synchronizes its new model architecture to align as much as possible with the overall OpenMMLab design and to achieve structural uniformity within the algorithm library. Although this upgrade is not fully backward compatible, we summarize below the changes that may be of interest to developers.
Fundamental Changes
The functional boundaries of modules were not clearly defined in MMOCR 0.x. In MMOCR 1.0, we address this issue by refactoring the design of model modules. Here are some major changes in 1.0:
- MMOCR 1.0 no longer supports named entity recognition tasks, since they are outside the scope of OCR.
- The module that computes the loss in a model is named Module Loss. It is also responsible for converting gold annotations into loss targets. Another module, Postprocessor, is responsible for decoding the raw model output into `DataSample` for the corresponding task at prediction time.
- The inputs of all models are now organized as a dictionary that consists of two keys: `inputs`, containing the original features of the images, and `List[DataSample]`, containing the meta-information of the images. At training time, the output format of a model is standardized to a dictionary containing the loss tensors. Similarly, a model generates a sequence of `DataSample`s containing the prediction outputs in testing.
- In MMOCR 0.x, the majority of classes named `XXLoss` had implementations closely bound to the corresponding model, while their names made it hard for users to tell them apart from generic losses like `DiceLoss`. In 1.0, they are renamed to the form `XXModuleLoss` (e.g. `DBLoss` was renamed to `DBModuleLoss`). The key to their configurations in config files is also changed from `loss` to `module_loss`.
- The names of generic loss classes that are not related to the model implementation are kept as `XXLoss` (e.g. `MaskedBCELoss`). They are all placed under `mmocr/models/common/losses`.
- Changes under `mmocr/models/common/losses`: `DiceLoss` is renamed to `MaskedDiceLoss`. `FocalLoss` has been removed.
- MMOCR 1.0 adds a Dictionary module which originates from label converter. It is used in text recognition and key information extraction tasks.
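As a rough illustration of the loss renaming described above, the sketch below rewrites a 0.x-style head config. `migrate_loss_cfg` is a hypothetical helper, not part of MMOCR, and it assumes the `loss` entry holds a model-bound loss such as `DBLoss`. Generic losses such as `MaskedBCELoss` keep the `XXLoss` naming and would need separate handling. The `DBHead` type in the usage lines is only illustrative.

```python
def migrate_loss_cfg(head_cfg: dict) -> dict:
    """Hypothetical helper (not an MMOCR API): move `loss` to `module_loss`."""
    new_cfg = dict(head_cfg)  # shallow copy so the input is left untouched
    if "loss" not in new_cfg:
        return new_cfg
    loss_cfg = dict(new_cfg.pop("loss"))
    loss_type = loss_cfg.get("type", "")
    # Assumption: the loss is model-bound, so XXLoss becomes XXModuleLoss.
    if loss_type.endswith("Loss") and not loss_type.endswith("ModuleLoss"):
        loss_cfg["type"] = loss_type[: -len("Loss")] + "ModuleLoss"
    new_cfg["module_loss"] = loss_cfg
    return new_cfg


# Illustrative 0.x-style head config
old_head = dict(type="DBHead", loss=dict(type="DBLoss"))
new_head = migrate_loss_cfg(old_head)
```

Any loss-specific arguments in the old `loss` entry are carried over unchanged. Whether they are still valid in 1.0 has to be checked against the new class.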
Text Detection Models
Key Changes (TL;DR)
- The model weights from MMOCR 0.x still work in 1.0, but the fields starting with `bbox_head` in the state dict (`state_dict`) need to be renamed to `det_head`.
- `XXTargets` transforms, which were responsible for generating detection targets, have been merged into `XXModuleLoss`.
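The key renaming mentioned above could be done along the following lines. This is a minimal sketch on a plain dictionary: `rename_det_head_keys` is a hypothetical helper, and the key names in the toy state dict are made up for illustration.

```python
def rename_det_head_keys(state_dict: dict) -> dict:
    """Hypothetical helper: rename keys starting with `bbox_head` to `det_head`."""
    old_prefix, new_prefix = "bbox_head", "det_head"
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            # Replace only the leading prefix and keep the rest of the key.
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


# Toy stand-in for a checkpoint's state dict; real values would be tensors.
old_state = {"backbone.conv1.weight": 1, "bbox_head.binarize.0.weight": 2}
new_state = rename_det_head_keys(old_state)
```

With a real checkpoint, the same function would be applied to the mapping loaded by `torch.load`, which is typically found under a `state_dict` key, though the exact layout depends on how the checkpoint was saved.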
SingleStageTextDetector
- The original inheritance chain was `mmdet.BaseDetector->SingleStageDetector->SingleStageTextDetector`. Now `SingleStageTextDetector` inherits directly from `BaseDetector` without an extra dependency on MMDetection, and `SingleStageDetector` is deleted.
- `bbox_head` is renamed to `det_head`.
- `train_cfg`, `test_cfg` and `pretrained` fields are removed.
- `forward_train()` and `simple_test()` are refactored to `loss()` and `predict()`. The part of `simple_test()` that was responsible for splitting the raw output of the model and feeding it into `head.get_boundary()` is integrated into `BaseTextDetPostProcessor`.
- `TextDetectorMixin` has been removed since its implementation overlaps with `TextDetLocalVisualizer`.
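The field changes listed above can be sketched as a small config rewrite. `migrate_detector_cfg` is a hypothetical helper, not an MMOCR function, and the `DBNet` and `DBHead` type names in the usage lines are only illustrative.

```python
# Fields that the text above says were removed from the detector in 1.0.
REMOVED_FIELDS = ("train_cfg", "test_cfg", "pretrained")


def migrate_detector_cfg(model_cfg: dict) -> dict:
    """Hypothetical helper: drop removed fields and rename `bbox_head`."""
    new_cfg = {k: v for k, v in model_cfg.items() if k not in REMOVED_FIELDS}
    if "bbox_head" in new_cfg:
        new_cfg["det_head"] = new_cfg.pop("bbox_head")
    return new_cfg


# Illustrative 0.x-style detector config
old_model = dict(
    type="DBNet",
    bbox_head=dict(type="DBHead"),
    train_cfg=None,
    test_cfg=None,
    pretrained=None,
)
new_model = migrate_detector_cfg(old_model)
```

The sketch only touches the top-level detector fields. Nested changes, such as the `loss` to `module_loss` rename inside the head, are a separate step.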
Head
- `HeadMixin`, the base class that `XXXHead` had to inherit from in version 0.x, has been replaced by `BaseTextDetHead`. `get_boundary()` and `resize_boundary()` are now rewritten as `__call__()` and `rescale()` in `BaseTextDetPostProcessor`.
ModuleLoss
- Data transforms `XXXTargets` in text detection tasks are all moved to `XXXModuleLoss._get_target_single()`. Target-related configurations are no longer specified in the data pipeline but in `XXXModuleLoss` instead.
Postprocessor
- The logic in the original `XXXPostprocessor.__call__()` is transferred to the refactored `XXXPostprocessor.get_text_instances()`.
- `BasePostprocessor` is refactored to `BaseTextDetPostProcessor`. This base class splits and processes the model output predictions one by one and supports automatic scaling of the output polygon or bounding box based on `scale_factor`.
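The control flow described above (process predictions one sample at a time, decode them in `get_text_instances()`, then rescale) might look roughly like this toy class. It is a simplified stand-in on plain Python lists, not MMOCR code. In particular, dividing by `scale_factor` to map polygons back to the original image is an assumption about the scaling direction, and treating the raw output as ready-made polygons is a placeholder for model-specific decoding.

```python
from typing import List, Sequence, Tuple

Polygon = List[float]  # flat coordinates: [x1, y1, x2, y2, ...]


class ToyTextDetPostProcessor:
    """Simplified illustration of the base postprocessor flow."""

    def __call__(
        self,
        raw_outputs: Sequence[Sequence[Polygon]],
        scale_factors: Sequence[Tuple[float, float]],
    ) -> List[List[Polygon]]:
        results = []
        # Handle the batch one sample at a time.
        for raw, scale in zip(raw_outputs, scale_factors):
            polygons = self.get_text_instances(raw)
            results.append([self.rescale(p, scale) for p in polygons])
        return results

    def get_text_instances(self, raw: Sequence[Polygon]) -> List[Polygon]:
        # Placeholder: a real subclass would decode the model output here.
        return [list(p) for p in raw]

    @staticmethod
    def rescale(polygon: Polygon, scale_factor: Tuple[float, float]) -> Polygon:
        # Assumption: scale_factor is (width_scale, height_scale) and
        # dividing by it maps coordinates back to the original image.
        sx, sy = scale_factor
        return [v / (sx if i % 2 == 0 else sy) for i, v in enumerate(polygon)]


post = ToyTextDetPostProcessor()
rescaled = post([[[0, 0, 10, 0, 10, 20, 0, 20]]], [(2.0, 4.0)])
```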
Text Recognition
Key Changes (TL;DR)
- Because the character order has changed and some bugs in the model architecture have been fixed, the recognition model weights from 0.x can no longer be used directly in 1.0. We will provide a migration script and tutorial for those who need it.
- Support for SegOCR has been removed. TPS-CRNN will still be supported in a later version.
- Test time augmentation will be supported in the upcoming release.
- The label converter module has been removed and its functions have been split into Dictionary, ModuleLoss and Postprocessor.
- The definition of `max_seq_len` has been unified and now it represents the original output length of the model.
Label Converter
The original label converters had spelling errors (written as label convertors). We fixed them by removing label converters from this project.
The part responsible for converting characters/strings to and from numeric indexes was extracted to Dictionary.
In older versions, different label converters would have different special character sets and character order. In version 0.x, the character order was as follows.
Converter | Character order |
---|---|
AttnConvertor, ABIConvertor | <UKN>, <BOS/EOS>, <PAD>, characters |
CTCConvertor | <BLK>, <UKN>, characters |
In 1.0, instead of designing different dictionaries and character orders for different tasks, we have a unified Dictionary implementation with the character order always as characters, <BOS/EOS>, <PAD>, <UKN>. <BLK> in CTCConvertor has been equivalently replaced by <PAD>.
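To make the unified ordering concrete, here is a toy vocabulary builder with simple string and index conversions. It is not the MMOCR `Dictionary` class. All function names are hypothetical, and the sketch assumes all three special tokens are always present.

```python
from typing import List

SPECIAL_TOKENS = ["<BOS/EOS>", "<PAD>", "<UKN>"]


def build_toy_vocab(characters: str) -> List[str]:
    """Toy vocabulary in the 1.0 order: characters, <BOS/EOS>, <PAD>, <UKN>."""
    return list(characters) + SPECIAL_TOKENS


def str_to_indexes(text: str, vocab: List[str]) -> List[int]:
    """Map each character to its index, falling back to <UKN>."""
    char2idx = {c: i for i, c in enumerate(vocab)}
    unknown = char2idx["<UKN>"]
    return [char2idx.get(c, unknown) for c in text]


def indexes_to_str(indexes: List[int], vocab: List[str]) -> str:
    """Map indexes back to text, skipping the special tokens."""
    return "".join(vocab[i] for i in indexes if vocab[i] not in SPECIAL_TOKENS)


vocab = build_toy_vocab("abc")
encoded = str_to_indexes("cabz", vocab)  # "z" is not in the vocabulary
decoded = indexes_to_str(encoded + [vocab.index("<PAD>")], vocab)
```

Because the characters come first, their indexes do not shift when special tokens are added after them. This is also why weights trained with the 0.x ordering do not line up with a 1.0 dictionary.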
Label converters originally supported three ways to initialize dictionaries: `dict_type`, `dict_file` and `dict_list`, which are now reduced to `dict_file` only in `Dictionary`. The pre-defined character sets originally supported in `dict_type` have also been moved into the `dicts/` directory. The corresponding mapping is as follows:

MMOCR 0.x: dict_type | MMOCR 1.0: Dict path |
---|---|
DICT90 | dicts/english_digits_symbols.txt |
DICT91 | dicts/english_digits_symbols_space.txt |
DICT36 | dicts/lower_english_digits.txt |
DICT37 | dicts/lower_english_digits_space.txt |

The implementation of `str2tensor()` in label converter has been moved to `ModuleLoss.get_targets()`. The following table shows the correspondence between the old and new method implementations. Note that the old and new implementations are not identical.

MMOCR 0.x | MMOCR 1.0 | Note |
---|---|---|
ABIConvertor.str2tensor(), AttnConvertor.str2tensor() | BaseTextRecogModuleLoss.get_targets() | The different implementations between ABIConvertor.str2tensor() and AttnConvertor.str2tensor() have been unified in the new version. |
CTCConvertor.str2tensor() | CTCModuleLoss.get_targets() | |
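When porting old configs, the `dict_type` to dictionary file mapping listed earlier in this section can be captured as a simple lookup. `dict_file_for` is a hypothetical helper. Only the four names and paths come from the mapping above.

```python
# Mapping from 0.x `dict_type` names to 1.0 dictionary files.
DICT_TYPE_TO_FILE = {
    "DICT90": "dicts/english_digits_symbols.txt",
    "DICT91": "dicts/english_digits_symbols_space.txt",
    "DICT36": "dicts/lower_english_digits.txt",
    "DICT37": "dicts/lower_english_digits_space.txt",
}


def dict_file_for(dict_type: str) -> str:
    """Hypothetical helper: look up the 1.0 dictionary file for a 0.x dict_type."""
    try:
        return DICT_TYPE_TO_FILE[dict_type]
    except KeyError:
        raise ValueError(
            f"No predefined dictionary file for {dict_type!r}"
        ) from None


lower_digits_file = dict_file_for("DICT36")
```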
The implementation of `tensor2idx()` in label converter has been moved to `Postprocessor.get_single_prediction()`. The following table shows the correspondence between the old and new method implementations. Note that the old and new implementations are not identical.

MMOCR 0.x | MMOCR 1.0 |
---|---|
ABIConvertor.tensor2idx(), AttnConvertor.tensor2idx() | AttentionPostprocessor.get_single_prediction() |
CTCConvertor.tensor2idx() | CTCPostProcessor.get_single_prediction() |

Key Information Extraction
Key Changes (TL;DR)
- Due to changes in the inputs to the model, the model weights obtained in 0.x can no longer be directly used in 1.0.
KIEDataset & OpensetKIEDataset
- The part that reads data is kept in `WildReceiptDataset`.
- The part that additionally processes the nodes and edges is moved to `LoadKIEAnnotation`.
- The part that uses dictionaries to transform text is moved to `SDMGRHead.convert_text()`, with the help of Dictionary.
- The part of `compute_relation()` that computes the relationships between text boxes is moved to `SDMGRHead.compute_relations()`. It's now done inside the model.
- The part that evaluates the model performance is done in `F1Metric`.
- The part of `OpensetKIEDataset` that processes the model's edge outputs is moved to `SDMGRPostProcessor`.
SDMGR
- `show_result()` is integrated into `KIEVisualizer`.
- The part of `forward_test()` that post-processes the output is organized in `SDMGRPostProcessor`.
Utils Migration
Utility functions are now grouped together under `mmocr/utils/`. Here are the scopes of the files in this directory:
- bbox_utils.py: bounding box related functions.
- check_argument.py: used to check argument type.
- collect_env.py: used to collect running environment.
- data_converter_utils.py: used for data format conversion.
- fileio.py: file input and output related functions.
- img_utils.py: image processing related functions.
- mask_utils.py: mask related functions.
- ocr.py: used for MMOCR inference.
- parsers.py: used for parsing datasets.
- polygon_utils.py: polygon related functions.
- setup_env.py: used to initialize MMOCR.
- string_utils.py: string related functions.
- typing.py: defines the abbreviation of types used in MMOCR.