Abstract for Transactify

Transactify is an LSTM-based model that predicts the category of an online payment transaction from its text description.

Given inputs such as "Live concert stream on YouTube" or "Coffee at Starbucks", it assigns categories such as "Movies & Entertainment" or "Food & Dining".

This helps users track and organize their spending across categories, giving them clearer financial insights for budgeting.

Transactify is currently trained on 5,000 synthetic transaction records generated with ChatGPT (see Data Collection).

Table of Contents:

1. Data Collection
2. Data Preprocessing
3. Model Building
4. Prediction
5. Conclusion

1. Data Collection:

The dataset consists of 5,000 transaction records generated using ChatGPT, each containing a transaction description and its corresponding category.

Example entries include descriptions like "Live concert stream on YouTube" (Movies & Entertainment) and "Coffee at Starbucks" (Food & Dining).

These records cover various spending categories such as Lifestyle, Movies & Entertainment, Food & Dining, and others.
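Here is a minimal sketch of how such records might be loaded. It assumes they are stored as a CSV file with hypothetical `description` and `category` columns; the project's actual file name and layout are not specified here. The snippet reads the rows with Python's standard `csv` module and then counts records per category, which is a quick way to check for class imbalance before training.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: one row per transaction with "description" and "category"
# columns. The real dataset file may use a different name or column names.
SAMPLE_CSV = """description,category
Live concert stream on YouTube,Movies & Entertainment
Coffee at Starbucks,Food & Dining
"""


def load_records(f):
    """Read (descriptions, categories) from a CSV file-like object."""
    rows = list(csv.DictReader(f))
    descriptions = [row["description"] for row in rows]
    categories = [row["category"] for row in rows]
    return descriptions, categories


descriptions, categories = load_records(io.StringIO(SAMPLE_CSV))
# Per-category counts make under-represented categories easy to spot.
category_counts = Counter(categories)
print(category_counts)
```

To read a real file, pass `open(path, newline="")` in place of the `StringIO` sample.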

2. Data Preprocessing:

The preprocessing step applies several natural language processing (NLP) operations to clean the text and prepare it for model training:

Lowercasing all text.

Removing digits and punctuation with regular expressions (regex).

Fitting the tokenizer on the cleaned text, which builds a vocabulary that maps each word to an integer index.

Applying texts_to_sequences to convert each description into a sequence of those integer indices.

Applying pad_sequences so that all sequences have the same length for input into the LSTM model.

Label encoding the target categories to convert them into integer labels.
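The steps above can be sketched in plain Python to show what each one produces. This is an illustrative stand-in, not the project's code. It mirrors the behavior of Keras's `Tokenizer` (index 0 is reserved for padding, and more frequent words get smaller indices), `texts_to_sequences` (unknown words are dropped), and `pad_sequences` with its default `"pre"` padding and truncation. It also follows the sorted class order that scikit-learn's `LabelEncoder` uses. The regex, the `maxlen` value, and the sample inputs are assumptions.

```python
import re


def clean(text):
    """Lowercase, then replace digits, underscores, and punctuation with spaces."""
    text = text.lower()
    text = re.sub(r"[\d_]|[^\w\s]", " ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def fit_word_index(texts):
    """Build a word -> index vocabulary; index 0 is left free for padding."""
    counts = {}
    for text in texts:
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
    # Most frequent words first; sorted() is stable, so ties keep first-seen order.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each word with its index, skipping words missing from the vocabulary."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]


def pad_sequences(sequences, maxlen):
    """Left-pad with 0 (keeping only the last `maxlen` tokens) so all rows match."""
    padded = []
    for seq in sequences:
        seq = seq[max(0, len(seq) - maxlen):]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


def fit_label_encoder(labels):
    """Assign integer ids to categories in sorted order, as LabelEncoder does."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]


# Illustrative inputs based on the example descriptions.
raw = ["Live concert stream on YouTube", "Coffee at Starbucks, 2 cups!"]
labels = ["Movies & Entertainment", "Food & Dining"]

cleaned = [clean(t) for t in raw]
word_index = fit_word_index(cleaned)
padded = pad_sequences(texts_to_sequences(cleaned, word_index), maxlen=6)
classes, y = fit_label_encoder(labels)
```

In the actual pipeline, the corresponding calls would be Keras's `Tokenizer.fit_on_texts`, `texts_to_sequences`, and `pad_sequences`, plus scikit-learn's `LabelEncoder.fit_transform`.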

After preprocessing, the data is split into training and testing sets, so the model is trained on one portion and evaluated on held-out examples.
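A reproducible train/test split can be sketched as follows. The 80/20 ratio and the fixed seed are assumptions, since the split ratio is not stated. Scikit-learn's `train_test_split` provides this functionality. It also accepts a `stratify` argument that keeps category proportions equal across both sets.

```python
import random


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle indices reproducibly, then hold out `test_size` of the samples for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: each label encodes its sample's value, so alignment is easy to check.
X = list(range(10))
y = [f"label_{i}" for i in X]
X_train, X_test, y_train, y_test = train_test_split(X, y)
```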

3. Model Building:

Embedding Layer: Converts the integer-encoded transaction descriptions into sequences of dense vectors that are learned during training, capturing word semantics and relationships.
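Conceptually, an embedding layer is a trainable lookup table with one row per token index. The sketch below shows the lookup itself. It uses a hypothetical vocabulary size of 10 and an embedding dimension of 4, with random starting values. In the real model, the rows are adjusted during training.

```python
import random

VOCAB_SIZE = 10  # hypothetical: largest word index + 1 (index 0 is padding)
EMBED_DIM = 4    # hypothetical embedding size

rng = random.Random(0)
# One row per token index; a real Embedding layer learns these values during training.
embedding_matrix = [
    [rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)
]


def embed(sequence):
    """Turn a padded sequence of token indices into a sequence of dense vectors."""
    return [embedding_matrix[index] for index in sequence]


vectors = embed([0, 0, 6, 7, 8, 9])  # a padded sequence of length 6
```

Repeated indices (such as the padding zeros) map to the same row. This is why the layer's parameter count depends only on vocabulary size and embedding dimension, not on sequence length.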

LSTM Layer: Processes the embedded sequence one token at a time, learning sequential patterns so the model can use context and the relationships between words across the whole description.
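For reference, the standard LSTM cell update is shown below. Keras's `LSTM` layer uses these equations with its default tanh activation and sigmoid recurrent activation. At step $t$, the cell combines the current embedding $x_t$ with the previous hidden state $h_{t-1}$ and cell state $c_{t-1}$. The forget, input, and output gates decide what to discard, what to store, and what to expose. Assume the layer returns only its final hidden state $h_T$ (the Keras default, `return_sequences=False`). Then $h_T$ summarizes the whole description, and it is what the following layers classify.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication.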

Dropout Layer: Regularizes the network by randomly setting a fraction of its inputs to zero during training, which reduces overfitting and improves generalization. Dropout is inactive at inference time.
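Keras's `Dropout` layer uses inverted dropout. During training, each input is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)`. This keeps the expected activation unchanged, so no rescaling is needed at inference time. Below is a minimal pure-Python sketch of that rule. The rate of 0.5 is a hypothetical value, since the rate used in Transactify is not given.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout over a list of activations.

    Training: zero each value with probability `rate`, scale the rest by 1 / (1 - rate).
    Inference: return the values unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    # rng.random() < rate happens with probability `rate`, which drops the unit.
    return [0.0 if rng.random() < rate else a / keep_prob for a in activations]


rng = random.Random(1)
h = [1.0] * 8
train_out = dropout(h, rate=0.5, training=True, rng=rng)   # each value is 0.0 or 2.0
infer_out = dropout(h, rate=0.5, training=False, rng=rng)  # identical to h
```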

Dense Layer with Softmax Activation: Outputs a probability distribution over the categories. The category with the highest probability is taken as the prediction for each transaction description.
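The output layer computes one score (logit) per category as a weighted sum of its inputs plus a bias. Softmax then turns those scores into probabilities that sum to 1. The sketch below uses made-up weights for a hypothetical 2-dimensional input and 3 categories. Subtracting the largest logit before exponentiating is a standard trick: it avoids overflow without changing the result.

```python
import math


def dense(h, W, b):
    """z_j = sum_i h_i * W[i][j] + b_j, with W shaped (inputs, units) like a Keras Dense kernel."""
    return [sum(h_i * W[i][j] for i, h_i in enumerate(h)) + b[j] for j in range(len(b))]


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: a 2-dimensional input and weights for 3 categories.
h = [1.0, 2.0]
W = [[1.0, 0.0, 0.5],
     [0.5, 0.0, -0.5]]
b = [0.0, 0.0, 0.0]

logits = dense(h, W, b)
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely category
```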

Model Compilation: The model is compiled with the Adam optimizer and accuracy as the evaluation metric. It uses sparse categorical cross-entropy loss, which suits multi-class classification with integer labels such as those produced by the label encoder.
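Sparse categorical cross-entropy is the average of $-\log p_{y}$, the negative log of the probability the model assigns to the true class $y$. Because the labels are integers, no one-hot encoding is needed. The sketch below computes this loss and the accuracy metric for a hypothetical batch of two predictions. It clips probabilities to a small epsilon to avoid $\log 0$, as Keras does internally; the exact epsilon here is an assumption.

```python
import math


def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(probability assigned to each sample's true integer class)."""
    losses = [-math.log(max(probs[label], eps)) for label, probs in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


def accuracy(y_true, y_prob):
    """Fraction of samples whose highest-probability class equals the true class."""
    hits = sum(
        1 for label, probs in zip(y_true, y_prob)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return hits / len(y_true)


# Hypothetical batch: two samples, two categories, integer labels.
y_true = [0, 1]
y_prob = [[0.9, 0.1], [0.2, 0.8]]
loss = sparse_categorical_crossentropy(y_true, y_prob)  # (-log 0.9 - log 0.8) / 2
acc = accuracy(y_true, y_prob)
```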

Model Training: The model is trained for 50 epochs with a batch size of 8. Its loss and accuracy on a validation set are monitored after each epoch.
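With mini-batch training, each epoch makes one weight update per batch. For $N$ training samples, that is $\lceil N / 8 \rceil$ updates per epoch. The sketch assumes a hypothetical $N = 4{,}000$, i.e. 80% of the 5,000 records. The actual split ratio is not stated, nor is whether the validation set is carved out of the training data. Under that assumption, 50 epochs amount to 25,000 updates.

```python
import math

EPOCHS = 50
BATCH_SIZE = 8
N_TRAIN = 4000  # hypothetical: 80% of the 5,000 records


def batch_ranges(n_samples, batch_size):
    """Yield (start, end) index pairs for one epoch; the final batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


steps_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)
total_updates = EPOCHS * steps_per_epoch
```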

Saving the Model and Preprocessing Objects:

The trained model is saved as transactify.h5 for future use.

The tokenizer and label encoder used during preprocessing are saved with joblib as tokenizer.joblib and label_encoder.joblib, respectively. This ensures that new data is tokenized, and labels are encoded and decoded, exactly as during training.
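The fitted tokenizer must be saved rather than refit later because its word-to-index mapping depends on the text it was fit on. Refitting on different text assigns different indices, and the trained model would then receive the wrong inputs. The sketch below demonstrates this with a simplified, hypothetical word-index builder and a hypothetical file name. It uses the standard-library `pickle` module so that it runs without extra dependencies. The project itself uses `joblib.dump(obj, filename)` and `joblib.load(filename)`, which follow the same save-then-load pattern.

```python
import os
import pickle
import tempfile


def fit_word_index(texts):
    """Hypothetical simplified tokenizer: most frequent words get the smallest indices."""
    counts = {}
    for text in texts:
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts, key=counts.get, reverse=True)  # ties keep first-seen order
    return {word: i for i, word in enumerate(ranked, start=1)}


word_index = fit_word_index(["live concert stream on youtube", "coffee at starbucks"])

# Save the fitted vocabulary, then load it back as a prediction script would.
path = os.path.join(tempfile.mkdtemp(), "tokenizer.pkl")
with open(path, "wb") as f:
    pickle.dump(word_index, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# Refitting on the same words in a different order yields different indices.
refit = fit_word_index(["coffee at starbucks", "live concert stream on youtube"])
```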
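
4. Prediction:

Once trained, the model predicts the category of new transaction descriptions.

Each new description is cleaned and converted into a padded sequence with the saved tokenizer. The index of the model's highest-probability output is then mapped back to a category name with the saved label encoder, so users can classify their spending from transaction descriptions alone.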
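The prediction flow can be sketched end to end as below. Everything here is a stand-in: the vocabulary, the category list, and especially `stub_model`, a hypothetical placeholder for the trained network. In the real pipeline, these steps would use:

- the loaded tokenizer's `texts_to_sequences`,
- Keras's `pad_sequences`,
- `model.predict` on the model loaded from transactify.h5, which takes a batch and returns one probability row per input,
- `label_encoder.inverse_transform`, to turn the winning index back into a category name.

```python
import re

# Hypothetical fitted artifacts (stand-ins for tokenizer.joblib and label_encoder.joblib).
WORD_INDEX = {"live": 1, "concert": 2, "stream": 3, "on": 4, "youtube": 5,
              "coffee": 6, "at": 7, "starbucks": 8}
CLASSES = ["Food & Dining", "Movies & Entertainment"]  # sorted, as LabelEncoder stores them
MAXLEN = 6  # hypothetical padded length; must match the length used in training


def clean(text):
    """Same cleaning as in training: lowercase, drop digits and punctuation."""
    text = re.sub(r"[\d_]|[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def stub_model(padded):
    """Hypothetical placeholder for the trained LSTM: returns class probabilities."""
    return [0.2, 0.8] if WORD_INDEX["concert"] in padded else [0.7, 0.3]


def predict_category(description, model_fn=stub_model):
    """Clean, encode, and pad one description, then decode the most likely class."""
    seq = [WORD_INDEX[w] for w in clean(description).split() if w in WORD_INDEX]
    seq = seq[max(0, len(seq) - MAXLEN):]    # keep the last MAXLEN tokens
    padded = [0] * (MAXLEN - len(seq)) + seq  # left-pad with zeros
    probs = model_fn(padded)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


category, confidence = predict_category("Live concert stream on YouTube")
```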

5. Conclusion:

Transactify shows that an LSTM network can categorize transaction descriptions.

However, improving the accuracy and reliability of its predictions requires a larger and more diverse dataset than the current 5,000 ChatGPT-generated records.

Expanding the dataset should help the model generalize across a wider range of spending behaviors. That would yield more precise predictions and give users deeper insights into their spending patterns.

Future work should focus on collecting additional data to refine the model's performance and its applicability in real-world scenarios.

![Expected Output](result.gif)