Explainable Image Captioning to Identify Ergonomic Problems and Solutions for Construction Workers

Research output: Contribution to journal › Article › peer-review

Abstract

The high occurrence of work-related musculoskeletal disorders (WMSDs) in construction remains a pressing concern, causing numerous nonfatal injuries. Preventing WMSDs requires an ergonomic process that identifies ergonomic problems and their corresponding solutions. Finding these problems and solutions on active construction sites demands significant effort from personnel with ergonomics expertise, yet ergonomic experts and training programs are often lacking in construction. To address this issue, the authors applied deep learning (DL)-based explainable image captioning to identify ergonomic problems and their corresponding solutions from images, which are readily available on construction sites. To this end, the authors proposed a vision-language model (VLM) capable of identifying ergonomic problems and their solutions, aided by data augmentation. The bilingual evaluation understudy (BLEU) score was used to measure the similarity between the ergonomic problems and solutions identified by the proposed VLM and those specified in an ergonomic guideline. When tested on 222 real-site images, the proposed VLM achieved the highest BLEU-4 score, 0.796, compared with a traditional convolutional neural network-long short-term memory (CNN-LSTM) model and a state-of-the-art VLM, bootstrapping language-image pretraining (BLIP). In addition, the authors developed an explainability module that visualizes which areas of an image the proposed VLM focuses on when identifying ergonomic problems and which words are important for identifying ergonomic solutions. The highest BLEU score and the visual explanations demonstrate the potential and credibility of the proposed VLM in identifying ergonomic problems and their solutions. The proposed VLM and explainability module contribute to implementing the ergonomic process in construction by identifying ergonomic problems and their solutions from site images alone.
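For readers unfamiliar with the metric, the following is a minimal sketch of a sentence-level BLEU-4 computation: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is an illustration only, not the authors' evaluation code. The abstract does not state the tokenization, smoothing, or aggregation over the test set, so whitespace tokenization, no smoothing, and a single reference are assumptions here, and the function names and example captions are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumptions (not taken from the paper): lowercase whitespace
    tokenization, uniform n-gram weights, and a score of 0.0 whenever
    any n-gram order has no match.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, for illustration only.
guideline_text = "worker bends forward to lift a heavy load from the floor"
generated_text = "worker bends forward to lift a heavy box from the floor"

exact_score = bleu(guideline_text, guideline_text)
partial_score = bleu(generated_text, guideline_text)
no_overlap_score = bleu("scaffold is missing guardrails", guideline_text)
```

Under these assumptions, an exact match scores 1, a caption sharing no four-word sequence with the reference scores 0, and a near match falls in between. Reported BLEU scores are commonly computed at corpus level and sometimes with smoothing, so values from this sketch are not directly comparable to the 0.796 reported above.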

Original language: American English
Article number: 04024022
Journal: Journal of Computing in Civil Engineering
Volume: 38
Issue number: 4
State: Published - Jul 1 2024

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Computer Science Applications
