Graph convolutional networks for social media troll detection using deep feature extraction

This study presents a novel approach to identifying trolls and toxic content on social media using deep learning. We developed a machine-learning model capable of detecting toxic images through their embedded text content. Our approach leverages GloVe word embeddings to enhance the model's predictive accuracy. We also utilized Graph Convolutional Networks (GCNs) to effectively analyze the intricate relationships inherent in social media data. The practical implications of our work are significant, despite some limitations in the model's performance. While the model accurately identifies toxic content more than half of the time, it struggles with precision, correctly identifying positive instances less than 50% of the time. Additionally, its ability to detect all positive cases (recall) is limited, capturing only 40% of them. The F1-score, which is a measure of the model's balance between precision and recall, stands at around 0.4, indicating a need for further refinement to enhance its effectiveness. This research offers a promising step towards more effective monitoring and moderation of toxic content on social platforms.


Introduction
In the ever-evolving landscape of criminal investigations, the convergence of digital forensics and social media data has become a pivotal focal point. The omnipresence of digital devices, ranging from smartphones to smart home systems, unfolds a treasure trove of forensic possibilities [1]. From communication logs to geolocation data, these devices, equipped with sophisticated sensors like RFID and GPS, unveil intricate details about user behavior [2].
The infusion of artificial intelligence (AI) and machine learning into the realm of digital forensics has revolutionized traditional investigative tools. Nowhere is this transformation more pronounced than in the forensic extraction of data from digital devices, with a spotlight on the realm of social media [3,4]. The copious data emanating from social media profiles, spanning posts, comments, messages, images, and videos, assumes a critical role in diverse investigations, from law enforcement to internal corporate probes. Evolution in digital media investigations encompasses open-source research, Wi-Fi survey analysis, IP address tracking, and the application of big data analytics for delving into historical and social networking data [5][6][7][8]. A formidable challenge in digital forensics lies in combating cyber abuse and online toxicity, which encapsulates a spectrum of behaviors from profanity to hate speech. Compounded by the tactic of embedding toxic messages within images shared on social platforms, detecting negative sentiment necessitates a nuanced approach beyond mere keyword searches. Context, language subtleties, and imagery intricacies must be meticulously considered [9][10][11][12][13].
Machine learning, especially the prowess of deep learning models, emerges as a potent tool in sentiment analysis. With a knack for pattern detection, these models exhibit remarkable accuracy in discerning sentiment, a fact substantiated by various studies. The crux lies in their application to the extracted textual content, scrutinizing whether the context of a message veers into offensive territory [14][15][16]. This paper makes a distinctive contribution to the field by crafting a framework adept at extracting and classifying text from images within online messages. The framework's adaptability shines through its capacity to train models on diverse datasets and labels, positioning it as an invaluable asset in the arsenal of digital forensics. This study underscores the imperative of fortifying forensic preparedness to match the cadence of evolving content types, with a specific focus on the synergy of embedded device forensics and the transformative applications of deep learning in this dynamic domain.
The major contributions of this study collectively advance the field of content moderation on social media platforms and offer a promising solution to the ongoing challenge of identifying and mitigating toxic content and 'trolls' in online communities.

Literature review
The analysis of social media data and the information within posts, particularly text embedded in images, is gaining prominence in forensic investigations [17,18]. This process involves extracting text from images and then analyzing its content. Currently, three primary methods are employed for this purpose [19,20]. The first method involves directly extracting the text from the image for analysis. Optical character recognition (OCR) engines like Tesseract are commonly used for this purpose [21][22][23]. Experiments have demonstrated high detection accuracy in various applications, including text detection on book spines and traffic signs [24,25]. The second approach utilizes neural networks to analyze the content of the image for pattern recognition. This method is particularly useful in identifying contextual patterns within images [26][27][28][29][30][31][32][33].
The third method is a hybrid approach that combines both text extraction and neural network analysis to enhance prediction confidence [34][35][36]. Interpretation of the extracted text often employs machine learning and deep learning techniques, commonly used in natural language processing (NLP) tasks like sentiment detection. Various models, including Support Vector Machine (SVM) and Extreme Learning Machine (ELM), have been applied to this task, showing high success rates in classifying texts [37][38][39][40]. Research on the classification of online toxic comments has explored standard machine learning algorithms applied to datasets comprising different types of toxicity. Various methods, including Logistic Regression, K-Nearest Neighbor, SVM, and Decision Tree, have been adapted for multi-label classification problems [41,42]. These methods transform the multi-label problem into a binary classification task, achieving high accuracy and F1-scores, although some bias towards non-toxic classes has been observed [43,44].
Deep learning methods, particularly those employing word embeddings like GloVe, have been proposed for the classification of toxic comments [45]. These models leverage the relationships between words to produce vector representations, enhancing prediction capabilities [46]. The use of convolutional layers and LSTM in conjunction with word embeddings has shown promising results [47]. Data imbalance in toxicity datasets has been a significant challenge, with the majority of comments being non-toxic [48]. To address this, data augmentation techniques have been employed, including the creation of new comments and the substitution of words with synonyms [49]. These methods have improved model performance, with CNN ensemble models showing notable effectiveness. Pre-processing of toxic comments is another area of focus, with techniques like removal of stop words, stemming, and tokenization being employed. Feature extraction based on word length, particularly bigrams, has proven effective [50,51]. Models tested on binary and multi-label classification tasks have shown high accuracy, with Logistic Regression performing well in both scenarios [52].
This analysis underscores the importance of appropriate content pre-processing, the incorporation of word embeddings, and data balancing in enhancing algorithm performance. Deep learning approaches generally yield more robust results. Additionally, the selection of appropriate evaluation metrics is critical, as accuracy alone can be misleading, especially in imbalanced classification scenarios. Metrics that consider true negative values are essential for ensuring the robustness of the solution.

Methodology
The "Dataset for Detection of Cyber-Trolls" (https:// www.kaggle.com/ code/ kevin lwebb/ cyber trolls-explo ration-and-ml) is a comprehensive collection of data designed to facilitate the development and evaluation of machine learning models for identifying cyber-trolling behavior on social media platforms.This dataset comprises a wide range of social media posts, including comments, replies, and image captions, annotated with labels indicating the presence or absence of trolling behavior.Each entry includes the text content of the post, relevant metadata such as timestamps and user information, and a binary label classifying the post as either 'troll' or 'nontroll' .To ensure a robust and diverse dataset, the content was sourced from various social media platforms, covering a broad spectrum of topics and user demographics.Special attention was paid to include examples of subtle trolling behavior, which is often challenging to detect, in addition to more overt instances.The dataset has been preprocessed to remove personally identifiable information to adhere to privacy and ethical standards.Furthermore, the dataset incorporates a range of linguistic styles and expressions, including slang, internet acronyms, and emoticons, making it particularly suited for training models to understand and interpret the nuances of online communication.The JSON format of the dataset allows for easy integration and manipulation in data processing pipelines, facilitating its use in various machine learning frameworks and environments.The Dataset for Detection of Cyber-Trolls.jsonprovides a valuable resource for researchers and practitioners in the field of online behavior analysis, particularly for those focusing on the detection and prevention of online harassment and abusive behavior.The comprehensive process for analyzing social media data, particularly focusing on the classification of potentially toxic content using a Graph Convolutional Network (GCN).Stepwise implementation is shown below:

Preparing and cleaning data
Data Loading: The dataset is loaded from a JSON file.
Label Extraction: Labels for classification (e.g., toxic or not) are extracted from the dataset.
Data Cleaning: This involves converting text to lowercase, removing punctuation and numbers, tokenizing (splitting text into words), removing stopwords (common words that don't contribute much meaning), and lemmatizing (reducing words to their base or root form) (Fig. 1).
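To make these steps concrete, here is a minimal sketch of the loading and cleaning stage, assuming NLTK for tokenization, stopword removal, and lemmatization. The one-JSON-object-per-line layout and the field names `content` and `annotation` reflect the common structure of this Kaggle file, but they are assumptions rather than details confirmed by the paper.

```python
import json
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    """Lowercase, strip numbers and punctuation, tokenize, drop stopwords, lemmatize."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = word_tokenize(text)
    return [lemmatizer.lemmatize(t) for t in tokens if t not in STOPWORDS]

texts, labels = [], []
with open("Dataset for Detection of Cyber-Trolls.json") as f:
    for line in f:  # one JSON record per line (assumed layout)
        record = json.loads(line)
        texts.append(record["content"])
        labels.append(int(record["annotation"]["label"][0]))  # "1" = troll, "0" = non-troll

cleaned_corpus = [clean_text(t) for t in texts]
```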

Feature extraction
TF-IDF Vectorization: Text data is converted into numerical features using Term Frequency-Inverse Document Frequency (TF-IDF), which reflects the importance of words in the corpus.
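A minimal sketch of this step using scikit-learn's TfidfVectorizer; the vocabulary cap of 5,000 terms is an illustrative assumption, not a value reported in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Re-join the cleaned tokens so scikit-learn can vectorize the documents.
documents = [" ".join(tokens) for tokens in cleaned_corpus]

vectorizer = TfidfVectorizer(max_features=5000)  # illustrative vocabulary cap
tfidf_matrix = vectorizer.fit_transform(documents)  # shape: (n_documents, n_terms)
terms = vectorizer.get_feature_names_out()  # one graph node per term
```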

Creating a graph for word similarity
Cosine Similarity: The cosine similarity between word vectors is calculated to measure how similar the words are.
Graph Construction: A graph is constructed where nodes represent words, and edges are formed between words that have a cosine similarity above a certain threshold.
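A sketch of the graph construction, assuming each word is represented by its TF-IDF column vector and assuming DGL as the graph library (consistent with the GraphConv layers named below). The 0.5 similarity threshold is an assumption, as the paper does not report its value.

```python
import dgl
import numpy as np
import torch
from sklearn.metrics.pairwise import cosine_similarity

# Represent each word by its TF-IDF vector across documents.
word_vectors = tfidf_matrix.T  # sparse, shape: (n_terms, n_documents)
similarity = cosine_similarity(word_vectors)  # dense, (n_terms, n_terms)
np.fill_diagonal(similarity, 0.0)  # ignore trivial self-similarity

THRESHOLD = 0.5  # assumed value; edges connect word pairs above it
src, dst = np.nonzero(similarity > THRESHOLD)  # symmetric, so both directions appear

graph = dgl.graph((torch.from_numpy(src), torch.from_numpy(dst)),
                  num_nodes=len(terms))
graph = dgl.add_self_loop(graph)  # GraphConv expects every node to have an in-edge
graph.ndata["feat"] = torch.tensor(word_vectors.toarray(), dtype=torch.float32)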

Graph Convolutional Network (GCN)
GCN Model Definition: A GCN model is defined with two GraphConv layers, where the first layer is a hidden layer and the second is the output layer.
Model Training: The model is trained on the node features (word vectors) with the corresponding labels (e.g., toxic or not).
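A minimal sketch of the two-layer model, assuming DGL's GraphConv; the hidden size of 64 is an illustrative assumption.

```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(nn.Module):
    """Two GraphConv layers: a hidden layer followed by an output layer."""

    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden_size)     # hidden layer
        self.conv2 = GraphConv(hidden_size, num_classes)  # output layer

    def forward(self, g, features):
        h = F.relu(self.conv1(g, features))  # message passing + nonlinearity
        return self.conv2(g, h)              # raw logits for cross-entropy loss
```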

Model training and evaluation
Train-Test Split: The dataset is split into training and test sets.
Training Loop: The GCN model is trained over several epochs, using a cross-entropy loss function and Adam optimizer.
Subgraph Creation: To handle discrepancies in node numbers, a subgraph is created matching the number of features and labels.
Model Evaluation: The trained model is evaluated on the test set to calculate metrics like accuracy, precision, recall, and F1-score.
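A sketch of the training and evaluation loop under the assumptions above. Here `node_labels` stands in for the label vector aligned to the graph's nodes (the subgraph-matching step is elided), and the epoch count, learning rate, and 30% test split are illustrative choices, not values confirmed by the paper.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

model = GCN(in_feats=graph.ndata["feat"].shape[1], hidden_size=64, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

labels_t = torch.tensor(node_labels, dtype=torch.long)  # hypothetical node-level labels
idx_train, idx_test = train_test_split(
    np.arange(graph.num_nodes()), test_size=0.3, random_state=42)
idx_train, idx_test = torch.from_numpy(idx_train), torch.from_numpy(idx_test)

for epoch in range(100):  # illustrative epoch count
    model.train()
    logits = model(graph, graph.ndata["feat"])
    loss = loss_fn(logits[idx_train], labels_t[idx_train])  # train nodes only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(graph, graph.ndata["feat"]).argmax(dim=1)

y_true, y_pred = labels_t[idx_test].numpy(), preds[idx_test].numpy()
print("accuracy ", accuracy_score(y_true, y_pred))
print("precision", precision_score(y_true, y_pred))
print("recall   ", recall_score(y_true, y_pred))
print("f1-score ", f1_score(y_true, y_pred))
```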
This algorithm is a sophisticated approach to text classification, leveraging the relational information among words captured in a graph structure, which is a novel method compared to traditional text classification techniques. The GCN allows for learning complex patterns in the data, potentially leading to more accurate classifications of text as toxic or non-toxic.

Experiments and results
The experimental configuration comprises a 238 GB solid-state drive and 12 GB of RAM, running Windows 10 Pro on an Intel(R) Core(TM) processor. The experimentation environment further incorporates the Google Colab platform, the Python programming language, and a Google Colab GPU.

Evaluation metrics
In the context of assessing machine learning models, commonplace performance metrics encompass accuracy and loss. The accuracy formula is the most prevalent measure for evaluation:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

where:
• TP (True Positives) is the number of correctly predicted positive instances.
• TN (True Negatives) is the number of correctly predicted negative instances.
• FP (False Positives) is the number of incorrectly predicted positive instances.
• FN (False Negatives) is the number of incorrectly predicted negative instances.

Precision
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (2)$$

where:
• TP (True Positives) is the number of correctly predicted positive instances.
• FP (False Positives) is the number of incorrectly predicted positive instances.

Recall (Sensitivity or True Positive Rate)
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (3)$$

where:
• TP (True Positives) is the number of correctly predicted positive instances.
• FN (False Negatives) is the number of incorrectly predicted negative instances.

F1-score
The F1-score is the harmonic mean of precision and recall, combining both metrics into a single value:

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)$$

These equations provide a quantitative way to assess the performance of classification models by measuring their accuracy, precision, recall, and F1-score based on the number of true positives, true negatives, false positives, and false negatives. These metrics are essential for evaluating the effectiveness of machine learning models in tasks like binary classification, where the goal is to classify instances into one of two classes (e.g., positive or negative).

Figure 2 is a confusion matrix used to evaluate the performance of the classification algorithm. It compares the actual target values with those predicted by the model, helping to visualize the accuracy of the classifier on a set of test data (20%) for which the true values are known. The confusion matrix is divided into four quadrants:
Top-Left (Yellow): True Positive (TP) - The number here (1853) represents the instances that were positive and the model correctly predicted them as positive.
Top-Right (Purple): False Negative (FN) - The number here (364) represents the instances that were actually positive but the model incorrectly predicted them as negative.
Bottom-Left (Dark Purple): False Positive (FP) - The number here (571) represents the instances that were actually negative but the model incorrectly predicted them as positive.
Bottom-Right (Green): True Negative (TN) - The number here (1213) represents the instances that were negative and the model correctly predicted them as negative.
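For reference, a confusion matrix like the one in Fig. 2 can be produced with scikit-learn; `y_true` and `y_pred` here are the test-set labels and predictions from the training sketch above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class
ConfusionMatrixDisplay(cm, display_labels=["non-toxic", "toxic"]).plot()
plt.show()
```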
Figure 3 provides a Receiver Operating Characteristic (ROC) curve, a graphical plot used to show the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The x-axis represents the False Positive Rate (FPR), which is the proportion of negative instances that are incorrectly classified as positive.
The y-axis represents the True Positive Rate (TPR), also known as sensitivity or recall, which is the proportion of positive instances that are correctly classified.
The orange line represents the ROC curve of the classifier. Points on the curve represent the TPR and FPR of the classifier at different threshold settings.
The dashed blue line represents a random classifier (a classifier that makes random guesses). It serves as a baseline; any useful classifier should have a curve that lies above this line, indicating performance better than random.
The AUC (Area Under the Curve) is a metric used to quantify the overall performance of a classifier. In this graph, the AUC is 0.77, which indicates good predictive ability. The AUC ranges from 0 to 1, where 1 indicates perfect classification and 0.5 indicates performance no better than random chance.
In summary, this ROC curve suggests that the classifier being evaluated has a good ability to distinguish between the positive and negative classes. The closer the ROC curve is to the top-left corner, the higher the overall accuracy of the test.
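A curve like Fig. 3 can be reproduced with scikit-learn's roc_curve and auc, assuming `probs` holds the positive-class probabilities (e.g., softmaxed GCN logits) for the test nodes from the sketch above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# probs: positive-class probabilities, e.g. softmax(logits)[:, 1] on the test nodes
fpr, tpr, _ = roc_curve(y_true, probs)
roc_auc = auc(fpr, tpr)  # reported as 0.77 in Fig. 3

plt.plot(fpr, tpr, color="orange", label=f"ROC curve (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], color="blue", linestyle="--", label="random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```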
Figure 4 shows a bar chart representing four different evaluation metrics used to assess the performance of a predictive model or classifier. These metrics are calculated using the model's predictions compared to the actual observed outcomes.
Accuracy: The bar for accuracy appears to be just over 0.5, suggesting that the model correctly predicts more than half of the time.
Precision: The precision bar is just under 0.5, indicating that when the model predicts a positive class, it's correct less than half of the time.
Recall: The recall bar is around 0.4, suggesting that the model identifies 40% of all actual positive cases.
F1-score: The F1-score bar is close to 0.4, indicating that the model's precision and recall are somewhat balanced but not particularly high.
Overall, the bars indicate moderate performance of the model across these metrics. The exact values are not provided, but they can be estimated based on the relative heights of the bars. The use of these metrics together provides a more comprehensive understanding of the model's performance than any single metric alone.
The chart indicates that these metrics were computed using 30% of the data as the testing set. Figure 5 shows a confusion matrix, a table often used to describe the performance of a classification model on a set of test data for which the true values are known:
Top-left cell (True Negative): The number of instances that were actual negatives and predicted as negatives.
Top-right cell (False Positive): The number of instances that were actual negatives but predicted as positives.
Bottom-left cell (False Negative-1437): The number of instances that were actual positives but predicted as negatives.
Bottom-right cell (True Positive-928): The number of instances that were actual positives and predicted as positives.
The confusion matrix is color-coded, with the shading corresponding to the values in each cell; darker colors typically represent higher numbers. The side bar acts as a legend indicating the scale of the counts in the cells.
This matrix shows the counts of correct and incorrect predictions broken down by actual and predicted classifications, allowing you to see where the model is making errors as shown in Fig. 5.
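As a consistency check, the recall implied by the Fig. 5 counts matches the reported figure of roughly 40%:

```python
TP, FN = 928, 1437  # counts from the Fig. 5 confusion matrix
recall = TP / (TP + FN)
print(f"recall = {recall:.3f}")  # ≈ 0.392, consistent with the reported ~40% recall
```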
The study presents an innovative approach using deep learning to detect trolls and toxic content on social media, but it does have some notable limitations:
• Limited Precision: The model successfully identifies toxic content more than half the time, but its precision is less than 50%. This means it often incorrectly labels non-toxic content as toxic, leading to a high rate of false positives.
• Suboptimal Recall: The model's ability to detect all positive cases of toxic content (recall) is limited to 40%. This low recall rate indicates that a significant portion of toxic content is not being detected, resulting in many false negatives.
• Moderate F1-Score: An F1-score of around 0.4 reflects a moderate balance between precision and recall. This score, while not insignificant, suggests that the model's overall accuracy in identifying toxic content is quite modest and could be significantly improved.
• Challenges with Embedded Text in Images: The model is designed to detect toxic content through embedded text in images. However, the complexity of interpreting visual content combined with text might pose challenges, especially when the text is stylized or obscured.
• Complexity of Social Media Data: Utilizing Graph Convolutional Networks (GCNs) addresses the complexity of social media data, but the intricate relationships and varying contexts inherent in this data can still pose significant challenges for accurate detection.
• Potential for Overfitting or Bias: Given the nuanced nature of language and imagery on social media, the model may be prone to overfitting on its training data or to inheriting biases present in it.
These limitations highlight the need for further refinement and development to enhance the effectiveness and reliability of the model in detecting toxic content on social media platforms.

Conclusion
Our study marks a significant advancement in the use of deep learning for combating 'troll' behavior and toxic content on social media. By incorporating GloVe word embeddings and utilizing Graph Convolutional Networks (GCNs), we have developed a sophisticated framework that adeptly interprets the complex and interrelated aspects of social media interactions. The model demonstrates a reasonable degree of accuracy, successfully identifying over half of the toxic content. However, it exhibits limitations in precision: fewer than 50% of the instances it flags as toxic are actually toxic. Moreover, the model's recall rate is 40%, indicating room for improvement in recognizing all instances of toxic behavior. The F1-score of around 0.4 reflects these challenges, underscoring the need for ongoing development to improve the model's balance of precision and recall. Despite these limitations, our research makes a substantial contribution to the field of online safety and digital well-being. It paves the way for more sophisticated and effective tools for monitoring and moderating harmful online content. As we continue to refine our model, we anticipate significant improvements in its ability to help provide safer and more positive social media environments. This study not only demonstrates the potential of deep learning in addressing online toxicity but also highlights critical areas for future research and development in this rapidly evolving field.

Fig. 2 Confusion matrix
Fig. 3 Receiver Operating Characteristic (ROC) curve

Fig. 5 Correct and incorrect predictions broken down by actual and predicted classifications