microsoft/table-transformer-structure-recognition · Training model with Custom Data

Nov 14, 2022

I want to train this model on custom data for my own table structures. The accuracy of the default model is not up to par for them. Is there a repo that could help with this?

nielsr

Nov 15, 2022

Hi,

Refer to this notebook (be sure to replace the model and processor): https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb

Yasharma

Nov 17, 2022

Hi @nielsr ,

I am having trouble downloading the data from the Microsoft open dataset site. Do you have any suggestions on annotating/tagging custom table structure data, or on downloading the PubTables-1M dataset from there?

Reference: https://msropendata.com/datasets/505fcbe3-1383-42b1-913a-f651b8b712d3

Issue: Not able to log in to Microsoft open dataset.
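On the annotation part of this question: the DETR fine-tuning notebook linked above works, as far as I can tell, from COCO-style annotations, so one option is to store custom table-structure labels in that layout. Below is a minimal sketch under that assumption. The class names come from later posts in this thread, and the file name, image size, box coordinates and the `make_coco_dataset` helper are all hypothetical.

```python
import json

# Hypothetical class names taken from this thread; adjust to your own label set.
CLASS_NAMES = ["table", "column", "row", "header"]


def make_coco_dataset(images, boxes_per_image):
    """Build a COCO-style dict.

    images: list of (file_name, width, height) tuples.
    boxes_per_image: one list per image of
        (class_name, x_min, y_min, x_max, y_max) tuples in pixels.
    """
    categories = [{"id": i, "name": name} for i, name in enumerate(CLASS_NAMES)]
    name_to_id = {cat["name"]: cat["id"] for cat in categories}
    coco = {"images": [], "annotations": [], "categories": categories}
    ann_id = 0
    for image_id, (image, boxes) in enumerate(zip(images, boxes_per_image)):
        file_name, width, height = image
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for class_name, x0, y0, x1, y1 in boxes:
            w, h = x1 - x0, y1 - y0
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": name_to_id[class_name],
                # COCO boxes are [x_min, y_min, width, height] in pixels.
                "bbox": [x0, y0, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Made-up example: one image with a table, one row and one column.
example = make_coco_dataset(
    images=[("table_0001.png", 800, 600)],
    boxes_per_image=[[
        ("table", 50, 40, 750, 560),
        ("row", 50, 40, 750, 100),
        ("column", 50, 40, 250, 560),
    ]],
)
print(json.dumps(example["annotations"][1]))
```

Rows, columns and headers are simply additional boxes on the same image, so a single table image typically carries many overlapping annotations.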

ankitom

Apr 13, 2023

Hi @nielsr ,

I fine-tuned the Microsoft table detector on custom data using your approach and the results are great. But when I tried to fine-tune the Microsoft table structure recognition model with four classes (table, column, row and header), the results were very bad. Any suggestions on how to fine-tune the structure recognition model?

qooob

Jun 1, 2023

edited Jun 1, 2023

@nielsr just to add to the above: I have a high-quality dataset that I used to try to fine-tune the model, but fine-tuning only weakened it, making it worse than the pre-trained weights. I'm having a hard time figuring out a few things: whether the pretrained weights should be loaded through DetrForObjectDetection or TableTransformerForObjectDetection, what other parameters should be used, and what the purpose of no_timm is in the original DETR object detection fine-tuning example. I'm also not sure how to account during training for the fact that Table Transformer normalizes before the MLP instead of after.

This worked (no errors during training), but gave bad results:
self.model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition")
This gave errors during training and bad results:
self.model = DetrForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition")
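On the choice of class: as far as I understand, the checkpoint's config is of the Table Transformer type, so TableTransformerForObjectDetection is the matching class, and the normalize-before behaviour appears to be built into it rather than something to configure during training. The sketch below shows one way to load the checkpoint with a custom label set, following the pattern from the Transformers object detection guide. It is an assumption-laden sketch, not a verified recipe for good results: the four class names are the hypothetical label set from this thread, and `build_label_maps` and `load_structure_model` are illustrative helper names.

```python
# Hypothetical four-class label set from this thread; replace with your own.
CLASS_NAMES = ["table", "column", "row", "header"]


def build_label_maps(class_names):
    """Return (id2label, label2id) dicts with contiguous ids starting at 0."""
    id2label = {i: name for i, name in enumerate(class_names)}
    label2id = {name: i for i, name in id2label.items()}
    return id2label, label2id


def load_structure_model(class_names=CLASS_NAMES):
    """Load the structure recognition checkpoint with a custom label set."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import TableTransformerForObjectDetection

    id2label, label2id = build_label_maps(class_names)
    return TableTransformerForObjectDetection.from_pretrained(
        "microsoft/table-transformer-structure-recognition",
        id2label=id2label,
        label2id=label2id,
        # If the number of classes differs from the checkpoint's, the
        # mismatched head weights are re-initialised instead of raising.
        ignore_mismatched_sizes=True,
    )


id2label, label2id = build_label_maps(CLASS_NAMES)
print(id2label)
```

Calling `load_structure_model()` downloads the checkpoint, so it is left out of the snippet's top level. Any re-initialised head starts from random weights, which may be one reason a short fine-tuning run ends up worse than the pretrained model.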

pathikg

Jan 29, 2024

Hey @ankitom,
Have you tried training it on your custom dataset using AutoModelForObjectDetection?
https://huggingface.co/docs/transformers/tasks/object_detection#training-the-detr-model

I haven't tried it yet, but I plan to do so in the coming weeks and will post an update once it's done.

amitkumarp

Mar 9, 2024

edited Mar 9, 2024

Hey everyone. Does anyone have an annotated image dataset for the structure recognition model?

amitkumarp

Mar 9, 2024

Hey @qooob, can you please share your table dataset?

mali17361

Apr 17, 2024

Hi @amitkumarp. Did you get a custom dataset of tables for fine-tuning?

mali17361

Apr 19, 2024

@pathikg Have you tried it yet?

srv-sh007

Sep 10, 2024

edited Sep 10, 2024

@ankitom have you figured out why your fine-tuned model performs worse? I fine-tuned the table structure recognition model on my custom dataset using the following approach, but the resulting model performed very badly: it could not detect even a single object, whereas the base model's performance was average. Could you please suggest how I can improve the performance?
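When a fine-tuned detector finds nothing at all, one thing that may be worth ruling out is a problem in the training annotations themselves, for example boxes stored as corners instead of width/height, or category ids that do not match the label set. The following is a minimal sketch of such a check, assuming COCO-style annotations. The `find_annotation_problems` helper and the sample data are hypothetical, and this is only one possible cause, not a diagnosis of the issue reported here.

```python
def find_annotation_problems(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    valid_categories = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann["image_id"] not in sizes:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in valid_categories:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        # COCO boxes are [x_min, y_min, width, height] in pixels.
        x, y, w, h = ann["bbox"]
        width, height = sizes[ann["image_id"]]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size {w}x{h}")
        elif x < 0 or y < 0 or x + w > width or y + h > height:
            problems.append(f"annotation {ann['id']}: box extends outside the image")
    return problems


# Made-up sample: annotation 1 stores corner coordinates by mistake,
# annotation 2 uses a category id that is not in the label set.
sample = {
    "images": [{"id": 0, "file_name": "table_0001.png", "width": 800, "height": 600}],
    "categories": [{"id": 0, "name": "table"}, {"id": 1, "name": "row"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 0, "bbox": [50, 40, 700, 520]},
        {"id": 1, "image_id": 0, "category_id": 1, "bbox": [100, 100, 750, 160]},
        {"id": 2, "image_id": 0, "category_id": 7, "bbox": [50, 40, 700, 60]},
    ],
}
problems = find_annotation_problems(sample)
for problem in problems:
    print(problem)
```

A clean report does not guarantee good training, but it removes one common source of silent failure before looking at learning rate, number of epochs or the detection threshold used at inference.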

waterabbit114

9 days ago

If anyone is still looking for a guide on data annotation/preparation and on fine-tuning the Table Transformer (either detection or structure recognition), I have written two articles on how to do it: one on data annotation/preparation and one on fine-tuning. I hope these help, and feel free to let me know if you have any questions. Big thanks and credit to @nielsr for his notebook on fine-tuning DETR for object detection and his other notebooks on inference with Table Transformer.

nielsr

8 days ago

@waterabbit114 that is very cool! However, I see they are on Medium. Any interest in publishing them at https://huggingface.co/blog/community?

See https://huggingface.co/new-blog

waterabbit114

8 days ago

edited 8 days ago

Hi @nielsr, thanks for the suggestion! Medium is just the platform I default to, and I will always keep my articles free to access.

I would be more than happy to port my articles over to the community blog for wider reach and accessibility, but it looks like publishing an article requires a PRO or Enterprise Hub subscription, which I do not have. Let me know if I have misinterpreted this.

Edit: I realized that there is a Blog-Explorers organization that I can request to join in order to publish community articles. I have requested access and will port over my articles if I get it!