A group in Japan has provided clear evidence that so-called intrasource balance in COVID-19 chest x-ray data sets is vital to minimize the risk of poor performance of AI deep-learning models, according to a study published November 3 in Scientific Reports.
A group led by Zhang Zhang, PhD, of Tohoku University in Sendai, found that training on an intra-source imbalanced data set of x-rays caused a serious training bias, even though the data set had good intercategory balance.
“Our study reveals that the [intrasource imbalance] of training data can lead to an unreliable performance of deep-learning models,” the group wrote.
When developing deep-learning AI models to detect COVID-19, researchers collect as much data as possible from different medical facilities to avoid the impact of intercategory imbalance (ICI), which means a difference in data quantity among categories.
However, because of the ICI within each medical facility, and because medical data are often isolated and acquired under different settings from one facility to the next, pooled data sets carry what is known as the intra-source imbalance (ISI) characteristic, the authors explained. This imbalance can also affect the performance of deep-learning models, yet it has received negligible attention, they added.
Thus, the group aimed to explore the impact of ISI by comparing versions of a deep-learning model for COVID-19 diagnosis that were trained separately on an intra-source imbalanced chest x-ray data set and on an intra-source balanced one.
One data set, called the Qata-COV19 data set, contained 3,761 COVID-19 positive images from five different public facilities and 3,761 negative x-ray images from seven other public facilities. In comparison, a second data set called the BIMCV data set contained 2,461 positive and 2,461 negative chest x-ray images from a single public facility.
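The difference between the two kinds of balance can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and because only totals are reported here, the five positive-only and seven negative-only Qata-COV19 facilities are pooled into two groups.

```python
from collections import Counter


def category_totals(counts_by_source):
    """Sum image counts per category across all sources."""
    totals = Counter()
    for per_category in counts_by_source.values():
        totals.update(per_category)
    return dict(totals)


def balance_ratio(counts):
    """Smallest category count divided by the largest.

    1.0 means perfectly balanced; 0.0 means a category is missing.
    """
    values = list(counts.values())
    largest = max(values)
    return min(values) / largest if largest else 0.0


def intra_source_balance(counts_by_source):
    """Balance ratio computed separately inside each source."""
    return {src: balance_ratio(c) for src, c in counts_by_source.items()}


# Assumption: per-facility counts are not given, so facilities are pooled.
qata = {
    "five_positive_facilities_pooled": {"positive": 3761, "negative": 0},
    "seven_negative_facilities_pooled": {"positive": 0, "negative": 3761},
}
bimcv = {"single_facility": {"positive": 2461, "negative": 2461}}

for name, data in (("Qata-COV19", qata), ("BIMCV", bimcv)):
    print(name, "intercategory:", balance_ratio(category_totals(data)))
    print(name, "intra-source:", intra_source_balance(data))
```

Under these assumptions, the Qata-COV19 totals are balanced across categories (ratio 1.0) while each source group scores 0.0, because no group contains both categories. The single BIMCV facility scores 1.0 on both measures.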
In brief, the group used a cross-data set approach in which they trained a VGG-16 deep-learning model using the original images from the Qata-COV19 data set and then tested it on the images from the BIMCV data set.
According to the findings, area under the curve (AUC) values were all larger than 0.99 when the VGG-16 model was trained and tested on the original images from Qata-COV19. Significantly, yet inexplicably, the model performed well even when lung regions were hidden in the Qata-COV19 images.
By contrast, although the deep-learning model also performed well at identifying disease in the BIMCV images, its AUC values degraded significantly when the lungs were removed or boxed out of these x-rays, according to the analysis.
“Such different results with different data sets demonstrate that the unreliable performance is related to the ISI,” the group wrote.
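The logic of the lung-hiding comparison can be sketched in a few lines of Python. This is a toy illustration, not the study's pipeline: the 3x3 "images", the `corner_score` stand-in for a model, and the function names are all hypothetical. The stand-in reads only a corner pixel, mimicking a model that has latched onto a source-specific cue instead of the lungs.

```python
def mask_box(image, top, left, bottom, right, fill=0):
    """Return a copy of a 2D image (list of rows) with rows top..bottom-1
    and columns left..right-1 replaced by `fill`."""
    return [
        [fill if top <= r < bottom and left <= c < right else pixel
         for c, pixel in enumerate(row)]
        for r, row in enumerate(image)
    ]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def corner_score(image):
    """Hypothetical shortcut 'model': looks only at the top-left pixel."""
    return image[0][0]


# Toy data: the centre pixel plays the role of the lung region.
images = [
    [[9, 0, 0], [0, 5, 0], [0, 0, 0]],  # positive
    [[8, 0, 0], [0, 1, 0], [0, 0, 0]],  # positive
    [[1, 0, 0], [0, 5, 0], [0, 0, 0]],  # negative
    [[2, 0, 0], [0, 1, 0], [0, 0, 0]],  # negative
]
labels = [1, 1, 0, 0]

masked = [mask_box(img, 1, 1, 2, 2) for img in images]
auc_original = auc(labels, [corner_score(img) for img in images])
auc_masked = auc(labels, [corner_score(img) for img in masked])
print(auc_original, auc_masked)
```

Here the AUC is unchanged after masking, the pattern reported for Qata-COV19. A model that relied on the lung region would instead lose performance once that region is hidden, as reported for BIMCV.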
Ultimately, the study also demonstrates the fundamental “black-box problem” of deep learning, the authors noted. That is, although deep-learning models can achieve high performance on COVID-19 detection, there is a lack of transparency and explanation in how they arrive at their predictions.
Without a sufficient understanding of the machine-made predictions, it becomes very difficult to detect errors in a model’s performance, and the reliability of deep-learning models therefore remains a concern, they wrote.
“Our study revealed a risk of training bias when using an intra-source imbalanced dataset, so researchers should raise their concerns about the intra-source balance when collecting training data to minimize the risk of unreliability,” the group concluded.