AI-Powered Smart Search System

[Design Expo] [Demo]

Capstone Design at UM (Jan. - Dec. 2020):

Junhui Li, Ashwin Pothukuchi, Nathan Nguyen, Hongting Zhu, Zhihao Guo, Rishi Tekriwal, Akik Kothekar

Multidisciplinary Design Program sponsored by ProQuest Co., Ltd.; Faculty Mentor: Brian Noble


Dialog is an online information retrieval system used globally with materially significant databases, which hold over 1.3 billion records. It provides powerful search for professionals such as medical practitioners and researchers, and gives “Similar Documents” suggestions that allow a user to move from article to related article in their research. However, we found that the multiple patent tagging schemes in the dataset and the antiquated document similarity methods hinder the system from delivering quality results and suggestions.

Delivery

  • Applied NLP techniques to train and fine-tune neural network models
  • Used novel evaluation metrics to compare these models and to rank the similarity results
  • Created a deep learning model that assigns a hierarchical code structure to a document (patent or non-patent) based on its text
  • Optimized system to be adopted after product delivery
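
As an illustration of how "Similar Documents" suggestions can be ranked, here is a minimal sketch. It assumes each document has already been embedded as a numeric vector and uses cosine similarity as the score. The project's actual similarity method and evaluation metrics are not described here, so the function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first.

    doc_vecs is a dict mapping a document id to its embedding vector.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, for illustration only).
docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
ranking = rank_similar([1.0, 0.1], docs, top_k=2)
print([doc_id for doc_id, _ in ranking])
```

In practice the vectors would come from a trained language model, and the ranking would be checked against a relevance metric rather than inspected by eye.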

My contributions

  • Compared existing solutions for CPC (Cooperative Patent Classification) code classification; proposed a hierarchical deep learning (DL) model structure that captures class distribution characteristics to predict a single CPC code.
  • Trained and assembled single-label DistilBERT models for each level of the CPC code, from section to group, using reweighting and resampling techniques; achieved 90% precision.
  • Analyzed the coverage error of the multi-label RoBERTa model and located difficult classes with high error; identified hyperparameters to address the extreme multi-label text classification problem.
  • Obtained the MDP Summer Fellowship.
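
To make the terms in the contributions above concrete, the sketch below shows three small helpers: splitting a CPC code into its hierarchy levels, computing inverse-frequency class weights, and computing coverage error for multi-label predictions. These are illustrative assumptions, not the project's code. The weighting formula is the common "balanced" heuristic and may differ from the reweighting actually used, and all function names are hypothetical.

```python
from collections import Counter


def cpc_levels(code):
    """Split a CPC code such as 'H04L 9/32' into its hierarchy levels."""
    code = code.strip()
    subclass = code[:4]                        # e.g. 'H04L'
    main_group = code[4:].strip().split("/")[0]  # e.g. '9'
    return {
        "section": code[0],                    # 'H'
        "class": code[:3],                     # 'H04'
        "subclass": subclass,                  # 'H04L'
        "group": f"{subclass} {main_group}",   # 'H04L 9'
    }


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def coverage_error(y_true, y_score):
    """Average depth in the score ranking needed to cover all true labels.

    y_true holds 0/1 indicator rows; y_score holds per-label scores.
    Ties count against the model; a row with no true labels contributes 0.
    """
    total = 0
    for truth, scores in zip(y_true, y_score):
        true_scores = [s for t, s in zip(truth, scores) if t]
        if not true_scores:
            continue
        lowest = min(true_scores)
        total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


print(cpc_levels("H04L 9/32"))
print(balanced_class_weights(["A", "A", "A", "B"]))
print(coverage_error([[1, 0, 0], [0, 0, 1]],
                     [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]))
```

A hierarchical classifier can train one model per level returned by `cpc_levels`, with rare classes up-weighted in the loss. Coverage error measures how far down the ranked predictions one must go to find every true label, so classes that push this number up are the difficult ones.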

Demo