Article by Townim Faisal Chowdhury & Syeda Sifat Hasnain
Kaggle is the largest platform for learning, exploring, and refining your skills in the field of data science. The platform is massive, and for a newbie it is easy to get confused about where to begin the road to becoming a skilled data scientist.
The platform hosts millions of datasets, along with competitions where people can test their skills and earn money. Competitions such as Titanic Survival Prediction and Predicting Housing Prices are excellent for novices to practice different machine learning approaches before moving on to larger competitions, and there are also tutorial kernels available.
Kaggle now offers courses for absolute newcomers to machine learning, with numerous well-documented kernels that teach everything from Python basics to R to deep learning.
As the largest machine learning and data science community, Kaggle also has a rich discussion section where one can share questions or experiences on a topic and connect with others. Have a look at these areas and, to begin, participate in some competitions.
In this article we share the experience we gained by participating in two Kaggle competitions: chaii – Hindi and Tamil Question Answering, an NLP problem, and TensorFlow – Help Protect the Great Barrier Reef, a computer vision problem.

chaii - Hindi and Tamil Question Answering
Identify the Answer to Questions Found in Indian Language Passages:
The goal of this competition was to predict answers to real questions about Wikipedia articles.

Figure – Sample Data
A new question answering dataset, chaii-1, was provided with question-answer pairs. The dataset covers Hindi and Tamil, collected without the use of translation, and provides a realistic information-seeking task with questions written by native-speaking expert data annotators. The hosts also provided a baseline model and inference code to build upon.
57th Place - Increasing batch size helps
We had almost lost hope when we saw our ranking on the public LB dropping day by day, so the jump from 557th to 57th place in the final standings was a huge surprise.
We tried the following models:
- XLM-RoBERTa fine-tuned on SQuAD2
- MuRIL
In our case, MuRIL did not perform well on the public leaderboard, so we went forward with only the XLM-RoBERTa SQuAD2 model.
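As a minimal sketch of this setup, such a model can be loaded for extractive QA through the Hugging Face transformers pipeline; the checkpoint name below (deepset/xlm-roberta-large-squad2) is an assumption about which public SQuAD2 variant was used, and our exact fine-tuning setup differed.

```python
from transformers import pipeline

# Load an XLM-RoBERTa checkpoint fine-tuned on SQuAD2 (assumed checkpoint name).
qa = pipeline(
    "question-answering",
    model="deepset/xlm-roberta-large-squad2",
    tokenizer="deepset/xlm-roberta-large-squad2",
)

# Example query in Hindi: "What is the capital of India?"
result = qa(
    question="भारत की राजधानी क्या है?",
    context="भारत की राजधानी नई दिल्ली है और यह देश का प्रशासनिक केंद्र है।",
)
print(result["answer"], round(result["score"], 3))
```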
Increasing the batch size from 4 to 16, we found the model performed noticeably better. Also, validation loss is not a good measure for picking the best model in each fold, so we used the Jaccard score instead.
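The Jaccard score here is the word-level overlap between the predicted and ground-truth answer strings, which is also the competition's evaluation metric; a minimal implementation looks roughly like this:

```python
def jaccard(pred: str, truth: str) -> float:
    """Word-level Jaccard overlap between a predicted and a ground-truth answer."""
    a = set(pred.lower().split())
    b = set(truth.lower().split())
    if not a and not b:
        return 1.0
    inter = a & b
    return len(inter) / (len(a) + len(b) - len(inter))

# Example: predicted span vs. ground-truth answer.
print(jaccard("नई दिल्ली", "दिल्ली"))  # 0.5
```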
When progress stalled, we fine-tuned our current best model on all the external datasets available on Kaggle.
We started this competition only 24 days before it ended. We wish we could have experimented with blending models, trying other models (RemBERT, etc.), and increasing the number of folds.
We also wish we had selected one of our other models for the final submission, because that model performed really well on the private leaderboard.
TensorFlow - Help Protect the Great Barrier Reef
Detect Crown-of-Thorns Starfish in Underwater Image Data
Australia’s Great Barrier Reef, the world’s largest coral reef, is under threat, in part because of the overpopulation of one particular starfish – the coral-eating crown-of-thorns starfish (or COTS for short). Scientists, tourism operators and reef managers established a large-scale intervention program to control COTS outbreaks to ecologically sustainable levels.
The goal of this competition was to accurately identify starfish in real-time by building an object detection model trained on underwater videos of coral reefs.

Figure – Sample Frame from Training Video 1
Three videos with COTS annotations were included in the training set, and the test set was composed of about 13,000 images, of which three were provided for public LB evaluation. The private leaderboard is calculated on approximately 75% of the test data.
This competition was evaluated on the F2 score at different intersection over union (IoU) thresholds (0.3 to 0.8 with a step size of 0.05). The F2 metric weights recall more heavily than precision; in this case it makes sense to tolerate some false positives in order to ensure very few starfish are missed.
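For intuition, here is a minimal sketch of the F2 computation at a single IoU threshold from raw detection counts; the overall metric then averages such scores over the thresholds listed above.

```python
def f2_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from raw detection counts; beta=2 weights recall four times as heavily as precision."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: at one IoU threshold, 8 starfish matched, 4 false alarms, 2 missed.
print(round(f2_score(tp=8, fp=4, fn=2), 3))  # 0.769
```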
244th place - Need more GPU!
First of all, a huge thanks to the competition hosts and Kaggle community. Our team spent a very good amount of time on this competition, and definitely learned a lot about small object detection and competition strategies in general.
Resources Used-
For experiments like image enhancement, style transfer, etc., Colab Pro was used. For training and validation, 1 x NVIDIA Tesla A100 (vCPU 12, Memory 85 GB) and 1 x NVIDIA Tesla V100 (vCPU 12, Memory 78 GB, 3.75 GB) were in use throughout the competition.
Data Analysis and Augmentations-
From the paper published about the competition's data collection process, we learned that the original images are actually of higher resolution than the shared training data, which was 1280×720. Naturally, the underwater scenario, with haze and complex reef structure, added to the difficulty of COTS detection. So, early in the competition, we tried out different image enhancement techniques to improve the image quality as a whole.
Among the methods we tried, CLAHE, BCCR, etc. worked pretty well for the dataset, but due to the real-time constraint we dropped this approach from our list. Here is a very good repository for underwater image enhancement. In addition to classical image processing, FUnIE-GAN was also tried for enhancement, but it did not work out when resizing the images to a higher resolution.
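As an illustration of the CLAHE step, here is a minimal OpenCV sketch applied to the luminance channel of a frame; the clip limit and tile size are illustrative values, not our tuned settings.

```python
import cv2

# Read a frame and apply CLAHE on the L (lightness) channel only, so colours stay intact.
img = cv2.imread("frame.jpg")
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # illustrative parameters
l_eq = clahe.apply(l)

enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("frame_clahe.jpg", enhanced)
```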
Later with the intent to produce more training data, we also tried style transfer on the URPC_2019 underwater dataset using WCT2 from ClovaAI, but could not generate good results.
For augmentations, the albumentations library and YOLO's custom training configurations were used later on; a sketch of such an augmentation pipeline is shown below.

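A minimal sketch of an albumentations pipeline wired up for YOLO-format bounding boxes follows; the specific transforms and parameters are assumptions for illustration, not our exact final configuration.

```python
import albumentations as A

# Illustrative transforms only; the actual final list and parameters were tuned separately.
train_transforms = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        A.Blur(blur_limit=3, p=0.1),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = train_transforms(image=image, bboxes=yolo_boxes, class_labels=labels)
```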
Our Solution
Cross-Validation
3-Fold – hold out one video and train on the other two (sketched below)
5-Fold – CV with sub-sequence from 🐠 Reef – A CV strategy: subsequences! | Kaggle
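Here is a minimal sketch of the 3-fold hold-one-video-out split, assuming the competition's train.csv metadata file with its video_id column:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# train.csv is the competition metadata file; video_id marks which video a frame came from.
df = pd.read_csv("train.csv")
gkf = GroupKFold(n_splits=3)

for fold, (train_idx, val_idx) in enumerate(gkf.split(df, groups=df["video_id"])):
    train_videos = sorted(df.iloc[train_idx]["video_id"].unique())
    val_videos = sorted(df.iloc[val_idx]["video_id"].unique())
    print(f"fold {fold}: train on videos {train_videos}, validate on video {val_videos}")
```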
TPH YOLOv5
For tph-yolov5m6 the highest score was 0.416 (Public LB 0.524) with image size 1664, confidence 0.15, IoU 0.5 and Test Time Augmentation (TTA), trained on videos 0 and 1 (frames with and without annotations) for 100 epochs with batch size 32.
The observation here was that there were more false positives, but strong augmentation and TTA with an increased image size gave better results. We also observed that the model stopped improving after a certain number of epochs, and multi-GPU training did not do well in this case.
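For reference, a larger inference size and TTA can be toggled through the standard YOLOv5 hub interface, roughly as below; TPH-YOLOv5 is a fork, so the exact call may differ, and the weights path is an assumption.

```python
import torch

# Load custom trained weights through the public YOLOv5 hub entry point (path is assumed).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.15   # confidence threshold
model.iou = 0.50    # NMS IoU threshold

# A larger inference size plus augment=True enables test-time augmentation.
results = model("frame.jpg", size=1664, augment=True)
results.print()
```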
Faster-RCNN
For Faster-RCNN the highest score was 0.464 (Public LB 0.346) with the default image size, confidence 0.15, IoU 0.75 and Test Time Augmentation (TTA) with CLAHE, Sharpen, HueSaturationValue and RandomBrightnessContrast, trained on videos 1 and 2 with annotated frames only for 30 epochs and batch size 8.
The experiments with Faster-RCNN also indicated that strong augmentation and TTA give better results. Overfitting was observed here as well, and including non-annotated frames along with parameter tuning could likely have improved the results.
YOLOV5s
For YOLOv5s the highest score was 0.602 (Public LB 0.663) for a single fold with inference image size 6316, confidence 0.2, IoU 0.55 and Test Time Augmentation (TTA), trained for 20 epochs at image size 3008 with batch size 4. Unfortunately, we did not select this model for the final submission.
YOLOv5s was the highest-scoring model our team tested in the competition. Various training configurations were tried: image sizes 3008, 3392 and 4800 with different batch sizes, augmentations and both types of cross-validation, but what worked best was image size 3008 with batch size 4.
Ensembling
Two ensembling implementations were tried – Weighted Boxes Fusion (WBF) and YOLO's Non-Maximum Suppression (NMS). WBF did not work out in most cases, so we went with NMS later on.
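A minimal sketch of WBF using the ensemble-boxes package is shown below; box coordinates must be normalised to [0, 1], and the weights and thresholds here are illustrative rather than our tuned values.

```python
from ensemble_boxes import weighted_boxes_fusion

# One entry per model; boxes are [x_min, y_min, x_max, y_max] normalised to [0, 1].
boxes_list = [
    [[0.10, 0.10, 0.30, 0.30]],   # model 1 prediction
    [[0.12, 0.11, 0.31, 0.29]],   # model 2 prediction
]
scores_list = [[0.90], [0.75]]
labels_list = [[0], [0]]          # single COTS class

boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list,
    weights=[1, 1], iou_thr=0.5, skip_box_thr=0.05,
)
print(boxes, scores, labels)
```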
The best-performing WBF submission was an ensemble of 3 YOLOv5l6 models trained on the 20% subsequence cross-validation splits, which scored 0.418 (Private LB 0.478).
Our best-scoring solution in the competition was also an ensemble of 5 folds trained on the 5-fold subsequence data with image size 3008 and batch size 4 on the V100. It was submitted with confidence threshold 0.25, IoU 0.45, inference image size 4800, TTA and NorFair tracking.
The public LB score for this solution was 0.609 (Private LB 0.668). Unfortunately, we did not select this model for the final submission either, and missed the chance to obtain a better result.
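A rough sketch of how NorFair tracking can be layered on top of per-frame detections follows; the distance function, threshold and detection format here are assumptions for illustration, not our tuned configuration.

```python
import numpy as np
from norfair import Detection, Tracker

def euclidean_distance(detection, tracked_object):
    # Distance between a new detection and an existing track's current estimate.
    return np.linalg.norm(detection.points - tracked_object.estimate)

tracker = Tracker(
    distance_function=euclidean_distance,
    distance_threshold=30,
    initialization_delay=0,  # emit tracks immediately (illustrative setting)
)

# Assumed per-frame detections as (x_center, y_center, confidence) tuples.
frames = [
    [(320.0, 240.0, 0.80)],
    [(324.0, 243.0, 0.72)],
]

for frame_detections in frames:
    detections = [
        Detection(points=np.array([[x, y]]), scores=np.array([score]))
        for x, y, score in frame_detections
    ]
    for obj in tracker.update(detections=detections):
        print(obj.id, obj.estimate)
```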
Performance Booster
- Image upsizing at inference, up to twice the training image size
- Adding a tracker during inference with custom configurations
- Ensembling with YOLO models
What did not Work
- Style transfer
- YOLO training augmentation Copy-Paste
- Test Time Augmentation with CLAHE, Sharpen, etc.
- YOLOv5m
- EfficientDet-d0
Result-Lesson
Our solution scored 0.657, placing us 244th on the private leaderboard.
After trying so many approaches and making 189 submissions, this is actually a pretty poor score, largely because we overfit the public LB. Not overestimating the public LB and being strategic about model selection with proper validation is a good lesson to learn after spending an absurd amount of time in this competition.
Thanks to the Kaggle community and all the teams who shared their approaches, such as video annotation, the subsequence strategy and YOLO training, through notebooks and discussions; these added to our knowledge, showed us new techniques to try and inspired us to keep going until the end.