# Inference
In OpenMMLab, all inference operations are unified into a new interface, `Inferencer`. `Inferencer` is designed to expose a neat and simple API to users, and shares a very similar interface across different OpenMMLab libraries.
In MMOCR, Inferencers are constructed in different levels of task abstraction.
- **Standard Inferencer**: Following OpenMMLab's convention, each fundamental task in MMOCR has a standard Inferencer, namely `TextDetInferencer` (text detection), `TextRecInferencer` (text recognition), `TextSpottingInferencer` (end-to-end OCR), and `KIEInferencer` (key information extraction). They are designed to perform inference on a single task, and can be chained together to perform inference on a series of tasks. They also share a very similar interface, follow a standard input/output protocol, and overall follow the OpenMMLab design.
- **MMOCRInferencer**: We also provide `MMOCRInferencer`, a convenient inference interface designed only for MMOCR. It encapsulates and chains all the Inferencers in MMOCR, so users can use it to perform a series of tasks on an image and directly get the final result in an end-to-end manner. *However, its interface differs somewhat from that of the standard Inferencers, and some standard Inferencer functionalities are sacrificed for the sake of simplicity.*
For new users, we recommend using **MMOCRInferencer** to test out different combinations of models.
If you are a developer and wish to integrate the models into your own project, we recommend using **standard Inferencers**, as they are more flexible and standardized, and come with full functionality.
## Basic Usage
`````{tabs}
````{group-tab} MMOCRInferencer
As of now, `MMOCRInferencer` can perform inference on the following tasks:
- Text detection
- Text recognition
- OCR (text detection + text recognition)
- Key information extraction (text detection + text recognition + key information extraction)
- *OCR (text spotting)* (coming soon)
For convenience, `MMOCRInferencer` provides both Python and command-line interfaces. For example, if you want to perform OCR inference on `demo/demo_text_ocr.jpg` with `DBNet` as the text detection model and `SAR` as the text recognition model, you can simply run the following command:
::::{tabs}
:::{code-tab} python
>>> from mmocr.apis import MMOCRInferencer
>>> # Load models into memory
>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
>>> # Perform inference
>>> ocr('demo/demo_text_ocr.jpg', show=True)
:::
:::{code-tab} bash
python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show
:::
::::
The resulting OCR output will be displayed in a new window:
<div align="center">
<img src="https://user-images.githubusercontent.com/22607038/220563262-e9c1ab52-9b96-4d9c-bcb6-f55ff0b9e1be.png" height="250"/>
</div>
```{note}
If you are running MMOCR on a server without GUI or via SSH tunnel with X11 forwarding disabled, the `show` option will not work. However, you can still save visualizations to files by setting `out_dir` and `save_vis=True` arguments. Read [Dumping Results](#dumping-results) for details.
```
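For instance, the following call dumps the rendered result to `outputs/vis/` instead of popping up a window:
```python
>>> # Save the visualization to disk; nothing is displayed
>>> ocr('demo/demo_text_ocr.jpg', out_dir='outputs/', save_vis=True)
```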
Depending on the initialization arguments, `MMOCRInferencer` can run in different modes. For example, it can run in KIE mode if it is initialized with `det`, `rec` and `kie` specified.
::::{tabs}
:::{code-tab} python
>>> kie = MMOCRInferencer(det='DBNet', rec='SAR', kie='SDMGR')
>>> kie('demo/demo_kie.jpeg', show=True)
:::
:::{code-tab} bash
python tools/infer.py demo/demo_kie.jpeg --det DBNet --rec SAR --kie SDMGR --show
:::
::::
The output image should look like this:
<div align="center">
<img src="https://user-images.githubusercontent.com/22607038/220569700-fd4894bc-f65a-405e-95e7-ebd2d614aedd.png" height="250"/>
</div>
<br />
As you may have noticed, the Python interface and the command-line interface of `MMOCRInferencer` are very similar. The following sections use the Python interface as an example to introduce the usage of `MMOCRInferencer`. For more information about the command-line interface, please refer to [Command Line Interface](#command-line-interface).
````
````{group-tab} Standard Inferencer
In general, all the standard Inferencers across OpenMMLab share a very similar interface. The following example shows how to use `TextDetInferencer` to perform inference on a single image.
```python
>>> from mmocr.apis import TextDetInferencer
>>> # Load models into memory
>>> inferencer = TextDetInferencer(model='DBNet')
>>> # Inference
>>> inferencer('demo/demo_text_ocr.jpg', show=True)
```
The visualization result should look like:
<div align="center">
<img src="https://user-images.githubusercontent.com/22607038/221418215-2431d0e9-e16e-4deb-9c52-f8b86801706a.png" height="250"/>
</div>
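Standard Inferencers can also be chained into a multi-stage pipeline. Below is a minimal sketch of manually chaining detection and recognition, assuming `mmcv.imcrop` for cropping the detected boxes (in practice, `MMOCRInferencer` already wraps this up end-to-end):
```python
>>> import mmcv
>>> import numpy as np
>>> from mmocr.apis import TextDetInferencer, TextRecInferencer
>>> det = TextDetInferencer(model='DBNet')
>>> rec = TextRecInferencer(model='SAR')
>>> img = mmcv.imread('demo/demo_text_ocr.jpg')
>>> # Detect text regions, then recognize each cropped region separately
>>> bboxes = det(img)['predictions'][0]['bboxes']
>>> patches = mmcv.imcrop(img, np.array(bboxes, dtype=np.int32))
>>> texts = [rec(patch)['predictions'][0]['text'] for patch in patches]
```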
````
`````
## Initialization
Each Inferencer must be initialized with a model. You can also choose the inference device during initialization.
### Model Initialization
`````{tabs}
````{group-tab} MMOCRInferencer
For each task, `MMOCRInferencer` takes two arguments in the form of `xxx` and `xxx_weights` (e.g. `det` and `det_weights`) for initialization, and there are many ways to initialize a model for inference. We will take `det` and `det_weights` as an example to illustrate some typical ways to initialize a model.
- To infer with MMOCR's pre-trained model, you can simply pass its name to the argument `det`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo. Check [Weights](../modelzoo.md#weights) for available model names.
```python
>>> MMOCRInferencer(det='DBNet')
```
- To load custom config and weight, you can pass the path to the config file to `det` and the path to the weight to `det_weights`.
```python
>>> MMOCRInferencer(det='path/to/dbnet_config.py', det_weights='path/to/dbnet.pth')
```
You may click on the "Standard Inferencer" tab to find more initialization methods.
````
````{group-tab} Standard Inferencer
Every standard `Inferencer` accepts two parameters, `model` and `weights`. (In `MMOCRInferencer`, they are referred to as `xxx` and `xxx_weights`)
- `model` takes either the name of a model, or the path to a config file as input. The name of a model is obtained from the model's metafile ([Example](https://github.com/open-mmlab/mmocr/blob/1.x/configs/textdet/dbnet/metafile.yml)) indexed from [model-index.yml](https://github.com/open-mmlab/mmocr/blob/1.x/model-index.yml). You can find the list of available weights [here](../modelzoo.md#weights).
- `weights` accepts the path to a weight file.
<br />
There are various ways to initialize a model.
- To infer with MMOCR's pre-trained model, you can pass its name to `model`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo.
```python
>>> from mmocr.apis import TextDetInferencer
>>> inferencer = TextDetInferencer(model='DBNet')
```
```{note}
The model type must match the Inferencer type.
```
You can load another weight by passing its path/url to `weights`.
```python
>>> inferencer = TextDetInferencer(model='DBNet', weights='path/to/dbnet.pth')
```
- To load custom config and weight, you can pass the path to the config file to `model` and the path to the weight to `weights`.
```python
>>> inferencer = TextDetInferencer(model='path/to/dbnet_config.py', weights='path/to/dbnet.pth')
```
- By default, [MMEngine](https://github.com/open-mmlab/mmengine/) dumps the config into the weight file. If you have a weight file trained with MMEngine, you can also pass its path to `weights` without specifying `model`:
```python
>>> # It will raise an error if the config file cannot be found in the weight
>>> inferencer = TextDetInferencer(weights='path/to/dbnet.pth')
```
- Passing a config file to `model` without specifying `weights` will result in a randomly initialized model.
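For example, with the same placeholder config path as above:
```python
>>> # No weights given, so the model parameters are randomly initialized
>>> inferencer = TextDetInferencer(model='path/to/dbnet_config.py')
```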
````
`````
### Device
Each Inferencer instance is bound to a device.
By default, the best device is automatically decided by [MMEngine](https://github.com/open-mmlab/mmengine/). You can also alter the device by specifying the `device` argument. For example, you can use the following code to create an Inferencer on GPU 1.
`````{tabs}
````{group-tab} MMOCRInferencer
```python
>>> inferencer = MMOCRInferencer(det='DBNet', device='cuda:1')
```
````
````{group-tab} Standard Inferencer
```python
>>> inferencer = TextDetInferencer(model='DBNet', device='cuda:1')
```
````
`````
To create an Inferencer on CPU:
`````{tabs}
````{group-tab} MMOCRInferencer
```python
>>> inferencer = MMOCRInferencer(det='DBNet', device='cpu')
```
````
````{group-tab} Standard Inferencer
```python
>>> inferencer = TextDetInferencer(model='DBNet', device='cpu')
```
````
`````
Refer to [torch.device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) for all the supported forms.
## Inference
Once the Inferencer is initialized, you can directly pass in the raw data to be inferred and get the inference results from the return value.
### Input
`````{tabs}
````{tab} MMOCRInferencer / TextDetInferencer / TextRecInferencer / TextSpottingInferencer
Input can be any of the following types:
- str: Path/URL to the image.
```python
>>> inferencer('demo/demo_text_ocr.jpg')
```
- array: Image as a numpy array, in BGR order.
```python
>>> import mmcv
>>> array = mmcv.imread('demo/demo_text_ocr.jpg')
>>> inferencer(array)
```
- list: A list of basic types above. Each element in the list will be processed separately.
```python
>>> inferencer(['img_1.jpg', 'img_2.jpg'])
>>> # You can even mix the types
>>> inferencer(['img_1.jpg', array])
```
- str: Path to the directory. All images in the directory will be processed.
```python
>>> inferencer('tests/data/det_toy_dataset/imgs/test/')
```
````
````{tab} KIEInferencer
Input can be a dict or list[dict], where each dictionary contains the
following keys:
- `img` (str or ndarray): Path to the image or the image itself. If it is a numpy array, it should be in BGR order. This key is not required when the KIE Inferencer is used in no-visual mode.
- `img_shape` (tuple(int, int)): Image shape in (H, W). Only required when the KIE Inferencer is used in no-visual mode and no `img` is provided.
- `instances` (list[dict]): A list of instances.
Each `instance` looks like the following:
```python
{
# A nested list of 4 numbers representing the bounding box of
# the instance, in (x1, y1, x2, y2) order.
"bbox": np.array([[x1, y1, x2, y2], [x1, y1, x2, y2], ...],
dtype=np.int32),
# List of texts.
"texts": ['text1', 'text2', ...],
}
```
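Putting it together, a minimal sketch of a `KIEInferencer` call might look like the following (the box coordinates and texts are placeholders; in practice they come from upstream detection and recognition results):
```python
>>> import numpy as np
>>> from mmocr.apis import KIEInferencer
>>> kie = KIEInferencer(model='SDMGR')
>>> kie({
...     'img': 'demo/demo_kie.jpeg',
...     'instances': [{
...         'bbox': np.array([[0, 0, 50, 20]], dtype=np.int32),  # placeholder box
...         'texts': ['text1'],  # placeholder text
...     }],
... })
```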
````
`````
### Output
By default, each `Inferencer` returns the prediction results in a dictionary format.
- `visualization` contains the visualized predictions. It is an empty list by default unless `return_vis=True`.
- `predictions` contains the prediction results in a JSON-serializable format. As presented below, the contents differ slightly depending on the task type.
`````{tabs}
:::{group-tab} MMOCRInferencer
```python
{
'predictions' : [
# Each instance corresponds to an input image
{
'det_polygons': [...], # 2d list of length (N,), format: [x1, y1, x2, y2, ...]
'det_scores': [...], # float list of length (N,)
'det_bboxes': [...], # 2d list of shape (N, 4), format: [min_x, min_y, max_x, max_y]
'rec_texts': [...], # str list of length (N,)
'rec_scores': [...], # float list of length (N,)
'kie_labels': [...], # node labels, length (N, )
'kie_scores': [...], # node scores, length (N, )
'kie_edge_scores': [...], # edge scores, shape (N, N)
'kie_edge_labels': [...] # edge labels, shape (N, N)
},
...
],
'visualization' : [
array(..., dtype=uint8),
]
}
```
:::
:::{group-tab} Standard Inferencer
````{tabs}
```{code-tab} python TextDetInferencer
{
'predictions' : [
# Each instance corresponds to an input image
{
'polygons': [...], # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
'bboxes': [...], # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
'scores': [...] # list of float, len (N, )
},
]
'visualization' : [
array(..., dtype=uint8),
]
}
```
```{code-tab} python TextRecInferencer
{
'predictions' : [
# Each instance corresponds to an input image
{
'text': '...', # a string
'scores': 0.1, # a float
},
...
]
'visualization' : [
array(..., dtype=uint8),
]
}
```
```{code-tab} python TextSpottingInferencer
{
'predictions' : [
# Each instance corresponds to an input image
{
'polygons': [...], # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
'bboxes': [...], # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
'scores': [...] # list of float, len (N, )
'texts': ['...',] # list of texts, len (N, )
},
]
'visualization' : [
array(..., dtype=uint8),
]
}
```
```{code-tab} python KIEInferencer
{
'predictions' : [
# Each instance corresponds to an input image
{
'labels': [...], # node label, len (N,)
'scores': [...], # node scores, len (N, )
'edge_scores': [...], # edge scores, shape (N, N)
'edge_labels': [...], # edge labels, shape (N, N)
},
]
'visualization' : [
array(..., dtype=uint8),
]
}
```
````
:::
`````
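Sticking with the default dictionary format, the results can be picked apart like this (shown for `TextDetInferencer`; other Inferencers follow the same pattern with their own keys):
```python
>>> result = inferencer('demo/demo_text_ocr.jpg')
>>> pred = result['predictions'][0]  # first (and only) input image
>>> # polygons and scores are parallel lists over the N detected instances
>>> for polygon, score in zip(pred['polygons'], pred['scores']):
...     print(score, polygon)
```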
If you wish to get the raw outputs from the model, you can set `return_datasamples` to `True` to get the original [DataSample](structures.md), which will be stored in `predictions`.
### Dumping Results
Apart from obtaining predictions from the return value, you can also export the predictions/visualizations to files by setting `out_dir` and `save_pred`/`save_vis` arguments.
```python
>>> inferencer('img_1.jpg', out_dir='outputs/', save_pred=True, save_vis=True)
```
This results in a directory structure like:
```text
outputs
β”œβ”€β”€ preds
β”‚ └── img_1.json
└── vis
└── img_1.jpg
```
Each output file is named after the corresponding input image. If the input image is given as an array, the filename will be a number starting from 0.
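Since the saved predictions are plain JSON, they can be loaded back without MMOCR (assuming the `outputs/` layout above):
```python
>>> import json
>>> with open('outputs/preds/img_1.json') as f:
...     pred = json.load(f)
```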
### Batch Inference
You can customize the batch size by setting `batch_size`. The default batch size is 1.
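For example, the following call groups the three inputs into batches of two (paths are placeholders):
```python
>>> inferencer(['img_1.jpg', 'img_2.jpg', 'img_3.jpg'], batch_size=2)
```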
## API
Here are extensive lists of parameters that you can use.
````{tabs}
```{group-tab} MMOCRInferencer
**MMOCRInferencer.\_\_init\_\_():**
| Arguments | Type | Default | Description |
| ------------- | ---------------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `det` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text detection algorithm. It's the path to the config file or the model name defined in metafile. |
| `det_weights` | str, optional | None | Path to the custom checkpoint file of the selected det model. If it is not specified and "det" is a model name of metafile, the weights will be loaded from metafile. |
| `rec`         | str or [Weights](../modelzoo.html#weights), optional | None    | Pretrained text recognition algorithm. It's the path to the config file or the model name defined in metafile.                                                         |
| `rec_weights` | str, optional                                         | None    | Path to the custom checkpoint file of the selected rec model. If it is not specified and "rec" is a model name of metafile, the weights will be loaded from metafile.  |
| `kie` \[1\]   | str or [Weights](../modelzoo.html#weights), optional | None    | Pretrained key information extraction algorithm. It's the path to the config file or the model name defined in metafile.                                               |
| `kie_weights` | str, optional                                         | None    | Path to the custom checkpoint file of the selected kie model. If it is not specified and "kie" is a model name of metafile, the weights will be loaded from metafile.  |
| `device` | str, optional | None | Device used for inference, accepting all allowed strings by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |
\[1\]: `kie` is only effective when both text detection and recognition models are specified.
**MMOCRInferencer.\_\_call\_\_()**
| Arguments | Type | Default | Description |
| -------------------- | ----------------------- | ------------ | ------------------------------------------------------------------------------------------------ |
| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
| `batch_size` | int | 1 | Inference batch size. |
| `det_batch_size`     | int, optional           | None         | Inference batch size for the text detection model. Overrides `batch_size` if it is not None.      |
| `rec_batch_size`     | int, optional           | None         | Inference batch size for the text recognition model. Overrides `batch_size` if it is not None.    |
| `kie_batch_size`     | int, optional           | None         | Inference batch size for the KIE model. Overrides `batch_size` if it is not None.                 |
| `return_vis` | bool | False | Whether to return the visualization result. |
| `print_result` | bool | False | Whether to print the inference result to the console. |
| `show` | bool | False | Whether to display the visualization results in a popup window. |
| `wait_time`          | float                   | 0            | Display interval of the visualization, in seconds.                                                 |
| `out_dir` | str | `results/` | Output directory of results. |
| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |
```
```{group-tab} Standard Inferencer
**Inferencer.\_\_init\_\_():**
| Arguments | Type | Default | Description |
| --------- | ---------------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model` | str or [Weights](../modelzoo.html#weights), optional | None | Path to the config file or the model name defined in metafile. |
| `weights` | str, optional                                         | None    | Path to the custom checkpoint file of the selected model. If it is not specified and `model` is a model name of metafile, the weights will be loaded from metafile.               |
| `device` | str, optional | None | Device used for inference, accepting all allowed strings by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |
**Inferencer.\_\_call\_\_()**
| Arguments | Type | Default | Description |
| -------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------- |
| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
| `batch_size` | int | 1 | Inference batch size. |
| `progress_bar` | bool | True | Whether to show a progress bar. |
| `return_vis` | bool | False | Whether to return the visualization result. |
| `print_result` | bool | False | Whether to print the inference result to the console. |
| `show` | bool | False | Whether to display the visualization results in a popup window. |
| `wait_time`          | float                   | 0            | Display interval of the visualization, in seconds.                                                                |
| `draw_pred` | bool | True | Whether to draw predicted bounding boxes. *Only applicable on `TextDetInferencer` and `TextSpottingInferencer`.* |
| `out_dir` | str | `results/` | Output directory of results. |
| `save_vis` | bool | False | Whether to save the visualization results to `out_dir`. |
| `save_pred` | bool | False | Whether to save the inference results to `out_dir`. |
```
````
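As a quick illustration, several of the call arguments listed above can be combined in a single call (a sketch with a standard Inferencer; `MMOCRInferencer` accepts the same arguments plus the per-task batch sizes):
```python
>>> result = inferencer(
...     'demo/demo_text_ocr.jpg',
...     return_vis=True,   # keep the rendered image in the return value
...     save_pred=True,    # also dump the JSON prediction to out_dir
...     out_dir='outputs/')
>>> vis_image = result['visualization'][0]  # uint8 numpy array
```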
## Command Line Interface
```{note}
This section is only applicable to `MMOCRInferencer`.
```
You can use `tools/infer.py` to perform inference through `MMOCRInferencer`.
Its general usage is as follows:
```bash
python tools/infer.py INPUT_PATH [--det DET] [--det-weights ...] ...
```
where `INPUT_PATH` is a required field, which should be a path to an image or a folder. Command-line parameters map to the Python interface parameters as follows:
- To convert a Python interface parameter to its command-line form, prefix it with `--` and replace each underscore `_` with a hyphen `-`. For example, `out_dir` becomes `--out-dir`.
- For boolean parameters, adding the flag to the command is equivalent to setting the parameter to `True`. For example, `--show` sets the `show` parameter to `True`.
In addition, the command line does not print the inference result by default. You can use the `--print-result` parameter to view it.
Here is an example:
```bash
python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show --print-result
```
Running this command will give the following result:
```bash
{'predictions': [{'rec_texts': ['CBank', 'Docbcba', 'GROUP', 'MAUN', 'CROBINSONS', 'AOCOC', '916M3', 'BOO9', 'Oven', 'BRANDS', 'ARETAIL', '14', '70<UKN>S', 'ROUND', 'SALE', 'YEAR', 'ALLY', 'SALE', 'SALE'],
'rec_scores': [0.9753464579582214, ...], 'det_polygons': [[551.9930285844646, 411.9138765335083, 553.6153911653112,
383.53195309638977, 620.2410061195247, 387.33785033226013, 618.6186435386782, 415.71977376937866], ...], 'det_scores': [0.8230461478233337, ...]}]}
```