TY - JOUR
T1 - Explainable Image Captioning to Identify Ergonomic Problems and Solutions for Construction Workers
AU - Yong, Gunwoo
AU - Liu, Meiyin
AU - Lee, Sang Hyun
N1 - Publisher Copyright: © 2024 American Society of Civil Engineers.
PY - 2024/7/1
Y1 - 2024/7/1
AB - The high incidence of work-related musculoskeletal disorders (WMSDs) in construction remains a pressing concern, causing numerous nonfatal injuries. Preventing WMSDs requires implementing an ergonomic process, which includes identifying ergonomic problems and their corresponding solutions. Finding these problems and solutions on active construction sites demands significant effort from personnel with ergonomics expertise; however, ergonomic experts and training programs are often lacking in construction. To address this gap, the authors applied deep learning (DL)–based explainable image captioning to identify ergonomic problems and their corresponding solutions from images, which are abundant on construction sites. To this end, the authors proposed a vision-language model (VLM), aided by data augmentation, that identifies ergonomic problems and their solutions. The bilingual evaluation understudy (BLEU) score was used to measure the similarity between the ergonomic problems and solutions identified by the proposed VLM and those specified in an ergonomic guideline. In testing on 222 real-site images, the proposed VLM achieved a BLEU-4 score of 0.796, the highest among the compared models: a traditional convolutional neural network–long short-term memory (CNN-LSTM) model and a state-of-the-art VLM, bootstrapping language-image pretraining (BLIP). In addition, the authors developed an explainability module that visualizes which image regions the proposed VLM focuses on when identifying ergonomic problems and which words are most important for identifying ergonomic solutions. The superior BLEU score and the visual explanations demonstrate the potential and credibility of the proposed VLM for identifying ergonomic problems and their solutions. The proposed VLM and explainability module can support implementation of the ergonomic process in construction by identifying ergonomic problems and their solutions using only site images.
UR - https://www.scopus.com/pages/publications/85193201435
U2 - 10.1061/JCCEE5.CPENG-5744
DO - 10.1061/JCCEE5.CPENG-5744
M3 - Article
SN - 0887-3801
VL - 38
JO - Journal of Computing in Civil Engineering
JF - Journal of Computing in Civil Engineering
IS - 4
M1 - 04024022
ER -