About

A Casual Language Joke Modeling Bot

Trained and fine-tuned from the AlekseyKorshuk/gpt2-jokes model on Hugging Face by Magical Macaronis

  • Model: Jokes-GPT
  • Fine-Tuned On: AlekseyKorshuk/gpt2-jokes
  • Dev Time: 3 Weeks
  • Team: Magical Macaronis
  • Epochs: 10
  • Loss: 0.001
  • Repo: Aj-Cdr/jokes-gpt
  • Organization: AI-CAMP

Trained on the Fraser and Jester datasets of approximately 2 million Reddit jokes. The primary intention of this product is to evoke hilarity and lighten someone's mood, while fundamentally testing the proficiency of AI at producing an emotion so simple and yet healthy through something as multi-faceted and variable as comedy.
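
As a rough sketch of how the finished bot can be used, the fine-tuned model can be pulled straight from the Hugging Face Hub (the Aj-Cdr/jokes-gpt repo listed above) and run through a text-generation pipeline. The generation settings below are illustrative assumptions, not the project's final configuration.

```python
# Minimal usage sketch: load the fine-tuned Jokes-GPT model from the Hub
# and generate one joke from a short prompt. Sampling settings are assumptions.
from transformers import pipeline

joke_bot = pipeline("text-generation", model="Aj-Cdr/jokes-gpt")

prompt = "What do you call a"
result = joke_bot(
    prompt,
    max_length=50,           # keep the joke short
    do_sample=True,          # sample for variety instead of greedy decoding
    top_k=50,
    temperature=0.9,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```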

Project Tools & Assets

  • Python: 100%
  • HTML: 100%
  • CSS: 85%
  • JavaScript: 50%
  • Bootstrap: 90%
  • Gradio: 80%
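
Since Gradio is listed among the project tools, here is a minimal sketch of how the joke bot could be wrapped in a Gradio demo; the labels, title, and generation settings are illustrative assumptions rather than the project's actual app code.

```python
# Minimal Gradio demo sketch (assumptions: interface labels and settings).
import gradio as gr
from transformers import pipeline

joke_bot = pipeline("text-generation", model="Aj-Cdr/jokes-gpt")

def tell_joke(prompt):
    # Generate one short completion for the user's prompt.
    output = joke_bot(prompt, max_length=50, do_sample=True, top_k=50)
    return output[0]["generated_text"]

demo = gr.Interface(
    fn=tell_joke,
    inputs=gr.Textbox(label="Start of a joke"),
    outputs=gr.Textbox(label="Jokes-GPT says"),
    title="Jokes-GPT",
)

demo.launch()
```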

Challenges

Along the road to finding the right model, there were many challenges. The first was understanding the concepts behind NLP models and how to implement them; over the course of the first and second weeks, we learned and overcame that obstacle. Next was finding the right model, as discussed in the rest of the slides: everyone tried a different model, and eventually we landed on a decent one. Finally, the quality of the jokes is the last challenge. Filtering out explicitness (bad/inappropriate words) is being addressed by fine-tuning the model and using less explicit data. Time restrictions were another factor in this project; if we were to continue working on it, we would add multilingual support, interactive conversations, and integration with social media.
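
One simple way to attack the explicitness problem mentioned above is to filter the training data before fine-tuning. The sketch below uses a hypothetical blocklist (BAD_WORDS) and assumes the Fraser/short-jokes dataset with a guessed column name; it is an illustration, not the project's actual cleaning script.

```python
# Illustrative sketch: drop jokes containing blocklisted words before training.
# BAD_WORDS is a hypothetical placeholder, not the project's real filter list.
from datasets import load_dataset

BAD_WORDS = {"badword1", "badword2"}   # placeholder entries
TEXT_COLUMN = "text"                   # assumption: check the dataset's schema

def is_clean(example):
    words = example[TEXT_COLUMN].lower().split()
    return not any(word.strip(".,!?") in BAD_WORDS for word in words)

jokes = load_dataset("Fraser/short-jokes", split="train")
clean_jokes = jokes.filter(is_clean)
print(f"Kept {len(clean_jokes)} of {len(jokes)} jokes")
```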

Timeline

Distil-GPT

The pre-trained model that we used initially was DistilGPT-2. The data was split 99.5% test / 0.5% train; out of the 232,000 rows in the dataset, roughly 1,160 rows were used for training. Initially, the validation loss was decreasing, but then runtime errors started to pop up, which forced us to change parameters in the program. The validation loss then fluctuated slightly and started to increase after multiple epochs. Finally, after switching to an A100 GPU, the code processed faster.

  • Errors: memory errors; the model was unable to generate proper text, only gibberish.
  • Loss: 2.1765
  • Hyperparameters = {evaluation_strategy = "epoch", learning_rate = 1e-5, weight_decay = 0.01, push_to_hub = True, num_train_epochs = 3, per_device_train_batch_size = 1} (see the sketch after this list)
  • Input: Cat
    Output: Cat s e r i e s t h o w ? i e n o f l e s t e r y t h e 1 g S t i v e n g. W h u p
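
Below is a rough sketch of how this first DistilGPT-2 run could be reproduced with the Hugging Face Trainer, using the hyperparameters listed above. The dataset choice (Fraser/short-jokes), its column name, and the tokenization details are assumptions.

```python
# Sketch of the initial DistilGPT-2 experiment (hyperparameters from the
# list above; dataset and column names are assumptions).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# 0.5% of the ~232,000 rows for training (~1,160 rows), the rest held out.
dataset = load_dataset("Fraser/short-jokes", split="train")
split = dataset.train_test_split(test_size=0.995, seed=42)

def tokenize(batch):
    # "text" column is an assumption; check the dataset's actual schema.
    return tokenizer(batch["text"], truncation=True, max_length=64)

tokenized = split.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilgpt2-jokes",
    evaluation_strategy="epoch",
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    push_to_hub=True,              # requires a Hugging Face login
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```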

gpt-neo-125m

  • Model: EleutherAI/gpt-neo-125m
  • Loss: 1.7751
  • Dataset: First 1000 items of Jiri Roznovjak’s “Question-Answer Jokes” (Kaggle); attempted as an alternative to Fraser & Jester datasets
  • Training time: ~1-2 minutes
  • Downsides: Outputs are at best only superficially coherent, at worst nonsense not related to jokes.
  • Hyperparameters = {evaluation_strategy = "epoch", learning_rate = 5e-05, train_batch_size = 8, eval_batch_size = 8, seed = 42, num_train_epochs = 1, lr_scheduler_type = linear, optimizer = Adam with betas=(0.9, 0.999) and epsilon=1e-08} (see the data-preparation sketch after this list)
  • Input: Knock Knock
    Output: Knock Knock-out and Knock-out-in-the-world?
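
For this experiment, the Kaggle question-answer jokes had to be turned into plain joke strings before fine-tuning. The sketch below shows one way to prepare the first 1000 items; the CSV filename and the "Question"/"Answer" column names are assumptions.

```python
# Sketch: prepare the first 1000 Question-Answer Jokes for fine-tuning.
# Filename and column names are assumptions about the Kaggle CSV.
import pandas as pd
from datasets import Dataset

df = pd.read_csv("jokes.csv").head(1000)   # first 1000 items only
df["text"] = df["Question"].str.strip() + " " + df["Answer"].str.strip()

qa_jokes = Dataset.from_pandas(df[["text"]])
print(qa_jokes[0]["text"])
```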

Crumb-GPT

  • Added fp16=True to speed up processing; precision was reduced, and the validation loss was higher. There was a lot of experimentation with batch size, learning rate, and other hyperparameters, and the model had no explicit documentation. Time taken to process: 11 hours.
  • Loss: 1.5231
  • Dataset: Fraser Short Jokes
  • Downsides: High Evaluation Loss
  • Hyperparameters = {evaluation_strategy = "epoch", learning_rate = 3e-5, weight_decay = 0.01, num_train_epochs = 15, per_device_train_batch_size = 30} (see the sketch after this list)
  • Input: What do you call a
    Output: What do you call a man with a bad speech impediment? A coffee-o-phile (Credit to John Oliver for this)
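
The distinctive part of this run was mixed precision. Below is a sketch of the training arguments with fp16 enabled, using the hyperparameter values listed above; the output directory and everything else is an assumption.

```python
# Sketch of the Crumb-GPT training arguments (values from the list above).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="crumb-gpt-jokes",      # assumed name
    evaluation_strategy="epoch",
    learning_rate=3e-5,
    weight_decay=0.01,
    num_train_epochs=15,
    per_device_train_batch_size=30,
    fp16=True,                         # mixed precision: faster steps,
                                       # at the cost of some numeric precision
)
```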

Jokes-GPT

  • Pre-trained model: AlekseyKorshuk/gpt2-jokes (Hugging Face)
  • Training data used:
    Hugging Face dataset: Fraser/short-jokes
    Kaggle dataset: Jester 1.7M jokes ratings dataset
  • Hyperparameters = {learning_rate = 5e-05, train_batch_size = 8, eval_batch_size = 8, seed = 42, optimizer = Adam with betas=(0.9, 0.999) and epsilon=1e-08, lr_scheduler_type = linear, num_epochs = 10} (see the sketch after this list)
  • Loss: 0.1577
  • Training time: ~10 minutes, thanks to the multi-GPU support of the pre-trained model and a T4 GPU on Colab.
  • Downsides: Explicitness & Slight Grammar Issues.
  • Input: Your momma's
    Output: Your momma's so fat I said, "Hey momma, we need t oget a big pizza to help her. We'll get a big pizza by noon, we're starving."
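
For reference, here is a sketch of the final fine-tuning setup on top of AlekseyKorshuk/gpt2-jokes with the hyperparameters listed above. The dataset preparation is omitted (it would follow the earlier sketches), and the output directory is an assumption.

```python
# Sketch of the final Jokes-GPT fine-tuning configuration (values from the
# list above; dataset preparation omitted, output_dir assumed).
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          TrainingArguments, set_seed)

set_seed(42)

tokenizer = AutoTokenizer.from_pretrained("AlekseyKorshuk/gpt2-jokes")
model = AutoModelForCausalLM.from_pretrained("AlekseyKorshuk/gpt2-jokes")

args = TrainingArguments(
    output_dir="jokes-gpt",
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's
    # default optimizer settings, so no override is needed here.
)

# The tokenized Fraser and Jester jokes would then be passed to a Trainer
# together with `model` and `args`, as in the DistilGPT-2 sketch above.
```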

Model