---
library_name: transformers
tags: []
---

# Model Card for Model ID

## Code to create model

```python
import torch
from transformers import GroundingDinoConfig, GroundingDinoForObjectDetection, AutoProcessor

model_id = 'IDEA-Research/grounding-dino-tiny'
config = GroundingDinoConfig.from_pretrained(
    model_id,
    decoder_layers=1,
    decoder_attention_heads=2,
    encoder_layers=1,
    encoder_attention_heads=2,
    text_config=dict(
        num_attention_heads=2,
        num_hidden_layers=1,
        hidden_size=32,
    ),
    backbone_config=dict(
        attention_probs_dropout_prob=0.0,
        depths=[1, 1, 2, 1],
        drop_path_rate=0.1,
        embed_dim=12,
        encoder_stride=32,
        hidden_act="gelu",
        hidden_dropout_prob=0.0,
        hidden_size=48,
        image_size=224,
        initializer_range=0.02,
        layer_norm_eps=1e-05,
        mlp_ratio=4.0,
        num_channels=3,
        num_heads=[1, 2, 3, 4],
        num_layers=4,
        out_features=["stage2", "stage3", "stage4"],
        out_indices=[2, 3, 4],
        patch_size=4,
        stage_names=["stem", "stage1", "stage2", "stage3", "stage4"],
        window_size=7
    )
)

# Create model and randomize all weights
model = GroundingDinoForObjectDetection(config)

torch.manual_seed(0)  # Set for reproducibility
for name, param in model.named_parameters():
    param.data = torch.randn_like(param)

processor = AutoProcessor.from_pretrained(model_id)

print(model.num_parameters())  # 7751525
```

## Code to export to ONNX

```python
import requests

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from transformers.models.grounding_dino.modeling_grounding_dino import (
    GroundingDinoObjectDetectionOutput,
)

# Workaround for the following export error:
# torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::__ior_' to ONNX opset version 16 is not supported.
# Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues.
torch.Tensor.__ior__ = lambda self, other: self.__or__(other)

# model_id = "IDEA-Research/grounding-dino-tiny"
model_id = "hf-internal-testing/tiny-random-GroundingDinoForObjectDetection"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)
old_forward = model.forward


def new_forward(*args, **kwargs):
    output = old_forward(*args, **kwargs, return_dict=True)

    # Only return the logits and pred_boxes
    return GroundingDinoObjectDetectionOutput(
        logits=output.logits, pred_boxes=output.pred_boxes
    )


model.forward = new_forward

image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw).resize((800, 800))
text = "a cat."  # NB: text queries need to be lowercased and end with a dot

# Run python model
inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[image.size[::-1]],
)

text_axes = {
    "input_ids": {1: "sequence_length"},
    "token_type_ids": {1: "sequence_length"},
    "attention_mask": {1: "sequence_length"},
}
image_axes = {}
output_axes = {
    "logits": {1: "num_queries"},
    "pred_boxes": {1: "num_queries"},
}
input_names = [
    "pixel_values",
    "input_ids",
    "token_type_ids",
    "attention_mask",
    "pixel_mask",
]

# Input to the model
x = tuple(inputs[key] for key in input_names)

# Export the model
torch.onnx.export(
    model,  # model being run
    x,  # model input (or a tuple for multiple inputs)
    "model.onnx",  # where to save the model (can be a file or file-like object)
    export_params=True,  # store the trained parameter weights inside the model file
    opset_version=16,  # the ONNX opset version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=input_names,
    output_names=list(output_axes.keys()),
    dynamic_axes={
        **text_axes,
        **image_axes,
        **output_axes,
    },
)
```

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]