# Inference

In OpenMMLab, all the inference operations are unified into a new interface - `Inferencer`. `Inferencer` is designed to expose a neat and simple API to users, and shares a very similar interface across different OpenMMLab libraries.

In MMOCR, Inferencers are constructed at different levels of task abstraction.

- **Standard Inferencer**: Following OpenMMLab's convention, each fundamental task in MMOCR has a standard Inferencer, namely `TextDetInferencer` (text detection), `TextRecInferencer` (text recognition), `TextSpottingInferencer` (end-to-end OCR), and `KIEInferencer` (key information extraction). They are designed to perform inference on a single task, and can be chained together to perform inference on a series of tasks. They also share a very similar interface, have a standard input/output protocol, and overall follow the OpenMMLab design.
- **MMOCRInferencer**: We also provide `MMOCRInferencer`, a convenient inference interface designed only for MMOCR. It encapsulates and chains all the Inferencers in MMOCR, so users can use this Inferencer to perform a series of tasks on an image and directly get the final result in an end-to-end manner. *However, its interface differs somewhat from the standard Inferencers, and some standard Inferencer functionality is sacrificed for the sake of simplicity.*

For new users, we recommend using **MMOCRInferencer** to test out different combinations of models.

If you are a developer and wish to integrate the models into your own project, we recommend using **standard Inferencers**, as they are more flexible and standardized, equipped with full functionalities.

## Basic Usage

`````{tabs}

````{group-tab} MMOCRInferencer

As of now, `MMOCRInferencer` can perform inference on the following tasks:

- Text detection
- Text recognition
- OCR (text detection + text recognition)
- Key information extraction (text detection + text recognition + key information extraction)
- *OCR (text spotting)* (coming soon)

For convenience, `MMOCRInferencer` provides both Python and command line interfaces. For example, if you want to perform OCR inference on `demo/demo_text_ocr.jpg` with `DBNet` as the text detection model and `SAR` as the text recognition model, you can simply run the following command:

::::{tabs}

:::{code-tab} python
>>> from mmocr.apis import MMOCRInferencer
>>> # Load models into memory
>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
>>> # Perform inference
>>> ocr('demo/demo_text_ocr.jpg', show=True)
:::

:::{code-tab} bash
python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show
:::
::::

The resulting OCR output will be displayed in a new window.
```{note}
If you are running MMOCR on a server without GUI, or via an SSH tunnel with X11 forwarding disabled, the `show` option will not work. However, you can still save visualizations to files by setting the `out_dir` and `save_vis=True` arguments. Read [Dumping Results](#dumping-results) for details.
```

Depending on the initialization arguments, `MMOCRInferencer` can run in different modes. For example, it runs in KIE mode if it is initialized with `det`, `rec` and `kie` specified.

::::{tabs}

:::{code-tab} python
>>> kie = MMOCRInferencer(det='DBNet', rec='SAR', kie='SDMGR')
>>> kie('demo/demo_kie.jpeg', show=True)
:::

:::{code-tab} bash
python tools/infer.py demo/demo_kie.jpeg --det DBNet --rec SAR --kie SDMGR --show
:::
::::

The output image will be shown in a popup window.
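Besides popping up a window, every call also returns the predictions, so you can consume them programmatically. Below is a minimal sketch using the OCR setup from above; the available keys are documented in the [Output](#output) section.

```python
>>> from mmocr.apis import MMOCRInferencer
>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
>>> result = ocr('demo/demo_text_ocr.jpg')
>>> # One entry in 'predictions' per input image
>>> pred = result['predictions'][0]
>>> # Pair each recognized text with its confidence score
>>> list(zip(pred['rec_texts'], pred['rec_scores']))
```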

As you may have noticed, the Python interface and the command line interface of `MMOCRInferencer` are very similar. The following sections use the Python interface as an example to introduce the usage of `MMOCRInferencer`. For more information about the command line interface, please refer to [Command Line Interface](#command-line-interface).

````

````{group-tab} Standard Inferencer

In general, all the standard Inferencers across OpenMMLab share a very similar interface. The following example shows how to use `TextDetInferencer` to perform inference on a single image.

```python
>>> from mmocr.apis import TextDetInferencer
>>> # Load models into memory
>>> inferencer = TextDetInferencer(model='DBNet')
>>> # Inference
>>> inferencer('demo/demo_text_ocr.jpg', show=True)
```

The visualization result will be shown in a popup window.
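As mentioned earlier, standard Inferencers can also be chained to cover multi-stage pipelines. Here is a simplified sketch that chains `TextDetInferencer` and `TextRecInferencer` by cropping the axis-aligned `bboxes` from the detection output (see [Output](#output) for the key formats); a production pipeline would typically crop with the tighter `polygons` instead.

```python
>>> import mmcv
>>> from mmocr.apis import TextDetInferencer, TextRecInferencer
>>> det = TextDetInferencer(model='DBNet')
>>> rec = TextRecInferencer(model='SAR')
>>> img = mmcv.imread('demo/demo_text_ocr.jpg')  # BGR array
>>> # Stage 1: detect text regions
>>> det_pred = det(img)['predictions'][0]
>>> # Stage 2: crop each axis-aligned box and recognize it
>>> crops = [img[int(y1):int(y2), int(x1):int(x2)]
...          for x1, y1, x2, y2 in det_pred['bboxes']]
>>> texts = [p['text'] for p in rec(crops)['predictions']]
```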
````

`````

## Initialization

Each Inferencer must be initialized with a model. You can also choose the inference device during initialization.

### Model Initialization

`````{tabs}

````{group-tab} MMOCRInferencer

For each task, `MMOCRInferencer` takes two arguments in the form of `xxx` and `xxx_weights` (e.g. `det` and `det_weights`) for initialization, and there are many ways to initialize a model for inference. We will take `det` and `det_weights` as an example to illustrate some typical ways to initialize a model.

- To infer with one of MMOCR's pre-trained models, you can pass its name to the argument `det`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo. Check [Weights](../modelzoo.md#weights) for available model names.

  ```python
  >>> MMOCRInferencer(det='DBNet')
  ```

- To load a custom config and weights, you can pass the path to the config file to `det` and the path to the weights to `det_weights`.

  ```python
  >>> MMOCRInferencer(det='path/to/dbnet_config.py', det_weights='path/to/dbnet.pth')
  ```

You may click on the "Standard Inferencer" tab to find more initialization methods.

````

````{group-tab} Standard Inferencer

Every standard `Inferencer` accepts two parameters, `model` and `weights`. (In `MMOCRInferencer`, they are referred to as `xxx` and `xxx_weights`.)

- `model` takes either the name of a model or the path to a config file as input. The name of a model is obtained from the model's metafile ([example](https://github.com/open-mmlab/mmocr/blob/1.x/configs/textdet/dbnet/metafile.yml)) indexed in [model-index.yml](https://github.com/open-mmlab/mmocr/blob/1.x/model-index.yml). You can find the list of available weights [here](../modelzoo.md#weights).
- `weights` accepts the path to a weight file.
There are various ways to initialize a model.

- To infer with one of MMOCR's pre-trained models, you can pass its name to `model`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo.

  ```python
  >>> from mmocr.apis import TextDetInferencer
  >>> inferencer = TextDetInferencer(model='DBNet')
  ```

  ```{note}
  The model type must match the Inferencer type.
  ```

  You can load other weights by passing their path/URL to `weights`.

  ```python
  >>> inferencer = TextDetInferencer(model='DBNet', weights='path/to/dbnet.pth')
  ```

- To load a custom config and weights, you can pass the path to the config file to `model` and the path to the weights to `weights`.

  ```python
  >>> inferencer = TextDetInferencer(model='path/to/dbnet_config.py', weights='path/to/dbnet.pth')
  ```

- By default, [MMEngine](https://github.com/open-mmlab/mmengine/) dumps the config into the weight file. If you have a weight file trained with MMEngine, you can also pass its path to `weights` without specifying `model`:

  ```python
  >>> # It will raise an error if the config cannot be found in the weight file
  >>> inferencer = TextDetInferencer(weights='path/to/dbnet.pth')
  ```

- Passing a config file to `model` without specifying `weights` will result in a randomly initialized model.

````

`````

### Device

Each Inferencer instance is bound to a device. By default, the best device is automatically decided by [MMEngine](https://github.com/open-mmlab/mmengine/). You can also alter the device by specifying the `device` argument. For example, you can use the following code to create an Inferencer on GPU 1.

`````{tabs}

````{group-tab} MMOCRInferencer

```python
>>> inferencer = MMOCRInferencer(det='DBNet', device='cuda:1')
```

````

````{group-tab} Standard Inferencer

```python
>>> inferencer = TextDetInferencer(model='DBNet', device='cuda:1')
```

````

`````

To create an Inferencer on CPU:

`````{tabs}

````{group-tab} MMOCRInferencer

```python
>>> inferencer = MMOCRInferencer(det='DBNet', device='cpu')
```

````

````{group-tab} Standard Inferencer

```python
>>> inferencer = TextDetInferencer(model='DBNet', device='cpu')
```

````

`````

Refer to [torch.device](torch.device) for all the supported forms.

## Inference

Once the Inferencer is initialized, you can directly pass in the raw data to be inferred and get the inference results from the return value.

### Input

`````{tabs}

````{tab} MMOCRInferencer / TextDetInferencer / TextRecInferencer / TextSpottingInferencer

Input can be either of these types:

- str: Path/URL to the image.

  ```python
  >>> inferencer('demo/demo_text_ocr.jpg')
  ```

- array: Image in numpy array. It should be in BGR order.

  ```python
  >>> import mmcv
  >>> array = mmcv.imread('demo/demo_text_ocr.jpg')
  >>> inferencer(array)
  ```

- list: A list of the basic types above. Each element in the list will be processed separately.

  ```python
  >>> inferencer(['img_1.jpg', 'img_2.jpg'])
  >>> # You can even mix the types
  >>> inferencer(['img_1.jpg', array])
  ```

- str: Path to a directory. All images in the directory will be processed.

  ```python
  >>> inferencer('tests/data/det_toy_dataset/imgs/test/')
  ```

````

````{tab} KIEInferencer

Input can be a dict or list[dict], where each dictionary contains the following keys:

- `img` (str or ndarray): Path to the image or the image itself. If the KIE Inferencer is used in no-visual mode, this key is not required. If it's a numpy array, it should be in BGR order.
- `img_shape` (tuple(int, int)): Image shape in (H, W). Only required when the KIE Inferencer is used in no-visual mode and no `img` is provided.
- `instances` (list[dict]): A list of instances, where each `instance` looks like the following:

  ```python
  {
      # A numpy array of shape (N, 4), with one bounding box per
      # row in (x1, y1, x2, y2) order.
      "bbox": np.array([[x1, y1, x2, y2], [x1, y1, x2, y2], ...],
                       dtype=np.int32),

      # List of texts.
      "texts": ['text1', 'text2', ...],
  }
  ```

````

`````

### Output

By default, each `Inferencer` returns the prediction results in a dictionary format.

- `visualization` contains the visualized predictions. It is an empty list by default unless `return_vis=True`.
- `predictions` contains the prediction results in a JSON-serializable format. As presented below, the contents differ slightly depending on the task type.

`````{tabs}

:::{group-tab} MMOCRInferencer

```python
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'det_polygons': [...],  # 2d list of length (N,), format: [x1, y1, x2, y2, ...]
        'det_scores': [...],  # float list of length (N,)
        'det_bboxes': [...],  # 2d list of shape (N, 4), format: [min_x, min_y, max_x, max_y]
        'rec_texts': [...],  # str list of length (N,)
        'rec_scores': [...],  # float list of length (N,)
        'kie_labels': [...],  # node labels, length (N,)
        'kie_scores': [...],  # node scores, length (N,)
        'kie_edge_scores': [...],  # edge scores, shape (N, N)
        'kie_edge_labels': [...]  # edge labels, shape (N, N)
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```

:::

:::{group-tab} Standard Inferencer

````{tabs}

```{code-tab} python TextDetInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'polygons': [...],  # 2d list of len (N,), in the format of [x1, y1, x2, y2, ...]
        'bboxes': [...],  # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
        'scores': [...]  # list of float, len (N,)
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```

```{code-tab} python TextRecInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'text': '...',  # a string
        'scores': 0.1,  # a float
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```

```{code-tab} python TextSpottingInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'polygons': [...],  # 2d list of len (N,), in the format of [x1, y1, x2, y2, ...]
        'bboxes': [...],  # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
        'scores': [...],  # list of float, len (N,)
        'texts': ['...',]  # list of texts, len (N,)
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```

```{code-tab} python KIEInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'labels': [...],  # node labels, len (N,)
        'scores': [...],  # node scores, len (N,)
        'edge_scores': [...],  # edge scores, shape (N, N)
        'edge_labels': [...],  # edge labels, shape (N, N)
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```

````

:::

`````

If you wish to get the raw outputs from the model, you can set `return_datasamples` to `True` to get the original [DataSample](structures.md)s, which will be stored in `predictions`.

### Dumping Results

Apart from obtaining predictions from the return value, you can also export the predictions/visualizations to files by setting the `out_dir` and `save_pred`/`save_vis` arguments.

```python
>>> inferencer('img_1.jpg', out_dir='outputs/', save_pred=True, save_vis=True)
```

This results in a directory structure like:

```text
outputs
├── preds
│   └── img_1.json
└── vis
    └── img_1.jpg
```

Each output file is named after the corresponding input image. If the input image is an array, the filename will be a number starting from 0.
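Since `save_pred` dumps plain JSON, the saved predictions can be loaded back without MMOCR. Below is a minimal sketch, assuming the dump above was produced by a `TextDetInferencer`, so that the keys follow the detection format described in [Output](#output):

```python
>>> import json
>>> # Load the predictions dumped for img_1.jpg
>>> with open('outputs/preds/img_1.json') as f:
...     pred = json.load(f)
>>> # Inspect the detected regions and their scores
>>> len(pred['polygons']), pred['scores'][:3]
```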
### Batch Inference

You can customize the batch size by setting `batch_size`. The default batch size is 1.

## API

Here are extensive lists of parameters that you can use.

````{tabs}

```{group-tab} MMOCRInferencer

**MMOCRInferencer.\_\_init\_\_():**

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `det` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text detection algorithm. It's the path to the config file or the model name defined in metafile. |
| `det_weights` | str, optional | None | Path to the custom checkpoint file of the selected det model. If it is not specified and `det` is a model name of metafile, the weights will be loaded from metafile. |
| `rec` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text recognition algorithm. It's the path to the config file or the model name defined in metafile. |
| `rec_weights` | str, optional | None | Path to the custom checkpoint file of the selected rec model. If it is not specified and `rec` is a model name of metafile, the weights will be loaded from metafile. |
| `kie` \[1\] | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained key information extraction algorithm. It's the path to the config file or the model name defined in metafile. |
| `kie_weights` | str, optional | None | Path to the custom checkpoint file of the selected kie model. If it is not specified and `kie` is a model name of metafile, the weights will be loaded from metafile. |
| `device` | str, optional | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |

\[1\]: `kie` is only effective when both text detection and recognition models are specified.

**MMOCRInferencer.\_\_call\_\_()**

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `inputs` | str/list/tuple/np.array | **required** | A path to an image or a folder, a numpy array, or a list/tuple of image paths or numpy arrays. |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
| `batch_size` | int | 1 | Inference batch size. |
| `det_batch_size` | int, optional | None | Inference batch size for the text detection model. Overrides `batch_size` if it is not None. |
| `rec_batch_size` | int, optional | None | Inference batch size for the text recognition model. Overrides `batch_size` if it is not None. |
| `kie_batch_size` | int, optional | None | Inference batch size for the KIE model. Overrides `batch_size` if it is not None. |
| `return_vis` | bool | False | Whether to return the visualization result. |
| `print_result` | bool | False | Whether to print the inference result to the console. |
| `show` | bool | False | Whether to display the visualization results in a popup window. |
| `wait_time` | float | 0 | Display interval of `show`, in seconds. |
| `out_dir` | str | `results/` | Output directory of results. |
| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |

```

```{group-tab} Standard Inferencer

**Inferencer.\_\_init\_\_():**

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `model` | str or [Weights](../modelzoo.html#weights), optional | None | Path to the config file or the model name defined in metafile. |
| `weights` | str, optional | None | Path to the custom checkpoint file of the selected model. If it is not specified and `model` is a model name of metafile, the weights will be loaded from metafile. |
| `device` | str, optional | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |

**Inferencer.\_\_call\_\_()**

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `inputs` | str/list/tuple/np.array | **required** | A path to an image or a folder, a numpy array, or a list/tuple of image paths or numpy arrays. |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
| `batch_size` | int | 1 | Inference batch size. |
| `progress_bar` | bool | True | Whether to show a progress bar. |
| `return_vis` | bool | False | Whether to return the visualization result. |
| `print_result` | bool | False | Whether to print the inference result to the console. |
| `show` | bool | False | Whether to display the visualization results in a popup window. |
| `wait_time` | float | 0 | Display interval of `show`, in seconds. |
| `draw_pred` | bool | True | Whether to draw predicted bounding boxes. *Only applicable to `TextDetInferencer` and `TextSpottingInferencer`.* |
| `out_dir` | str | `results/` | Output directory of results. |
| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |

```

````

## Command Line Interface

```{note}
This section is only applicable to `MMOCRInferencer`.
```

You can use `tools/infer.py` to perform inference through `MMOCRInferencer`. Its general usage is as follows:

```bash
python tools/infer.py INPUT_PATH [--det DET] [--det-weights ...] ...
```

where `INPUT_PATH` is a required field, which should be a path to an image or a folder. Command-line parameters map to the Python interface parameters as follows:

- To convert a Python interface parameter to its command-line counterpart, prefix it with `--` and replace each underscore `_` with a hyphen `-`. For example, `out_dir` becomes `--out-dir`.
- For boolean parameters, putting the flag on the command line is equivalent to setting it to True. For example, `--show` sets the `show` parameter to True.

In addition, the command line will not display the inference result by default. You can use the `--print-result` parameter to view the inference result.
Here is an example:

```bash
python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show --print-result
```

Running this command will give the following result:

```text
{'predictions': [{'rec_texts': ['CBank', 'Docbcba', 'GROUP', 'MAUN', 'CROBINSONS', 'AOCOC', '916M3', 'BOO9', 'Oven', 'BRANDS', 'ARETAIL', '14', '70S', 'ROUND', 'SALE', 'YEAR', 'ALLY', 'SALE', 'SALE'], 'rec_scores': [0.9753464579582214, ...], 'det_polygons': [[551.9930285844646, 411.9138765335083, 553.6153911653112, 383.53195309638977, 620.2410061195247, 387.33785033226013, 618.6186435386782, 415.71977376937866], ...], 'det_scores': [0.8230461478233337, ...]}]}
```
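For comparison, here is a sketch of the same invocation through the Python interface, using only parameters listed in the [API](#api) tables above:

```python
>>> from mmocr.apis import MMOCRInferencer
>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
>>> # show/print_result mirror the --show/--print-result CLI flags
>>> result = ocr('demo/demo_text_ocr.jpg', show=True, print_result=True)
```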