Social networks, and digital communication in general, have evolved at an impressive speed in recent years. They allow everyone to stay in constant contact with family, co-workers, or classmates. This technological progress has also brought a number of drawbacks, one of them being cyberbullying.
Cyberbullying, which is simply bullying carried out through digital devices, primarily targets teenagers. In the past, the problem was more or less confined to school grounds, but technology has removed these boundaries, so the bullying continues unabated and leaves victims no respite. The consequences are serious: the phenomenon has already led to many suicides. It is therefore necessary to detect cyberbullying on social networks and take action accordingly.
During this project, it soon became clear that the lack of existing resources, in particular datasets containing relatively recent cyberbullying texts, would complicate the task. The objective was therefore adjusted: instead of detecting cyberbullying directly, the goal became determining whether or not a text containing insults was hateful.
To this end, around 4,000 tweets were collected and labelled. Different features were extracted from this dataset, and predictions were made, mainly with random forest and neural network models. This process identified the most useful features, which turned out to be the TF-IDF values. Combining these with a few other features yielded an accuracy of 72.76%, a relatively low score for a binary classification problem.
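The best-performing setup described above, TF-IDF features fed into a random forest, can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the tweets, labels, and hyperparameters here are placeholder assumptions.

```python
# Hedged sketch: TF-IDF features + random forest for binary
# hateful (1) / non-hateful (0) classification.
# The mini-corpus below is invented for illustration only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

tweets = [
    "you are a worthless idiot",
    "what an idiot I am, I forgot my keys again",
    "go away, nobody likes you, loser",
    "we lost the game like losers but still had fun",
]
labels = [1, 0, 1, 0]  # hypothetical hateful / non-hateful labels

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),        # unigram + bigram TF-IDF features
    RandomForestClassifier(n_estimators=100,     # hyperparameters are placeholders
                           random_state=0),
)
model.fit(tweets, labels)

# Predict on unseen text; output is a 0/1 class label
print(model.predict(["you are such a loser"]))
```

In practice the project combined these TF-IDF values with a few other features before classification; the pipeline above shows only the TF-IDF backbone.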
It is possible that the model currently in use relies too much on the statistics of the individual insults. For example, if an insult appears predominantly in positive samples rather than negative ones, the model will have difficulty correctly classifying the samples that contain this insult but whose actual class is negative. Of course, the opposite is also true.