Introduction:
This project explores the application of Artificial Intelligence (AI) techniques to classifying Deoxyribonucleic Acid (DNA) sequences into their corresponding gene families. The paper shows how DNA sequences can be treated like a human language to be understood and classified. Specifically, we first transformed the DNA sequences into a format closer to human language. Then, we employed Natural Language Processing (NLP) and Multi-layer Perceptron (MLP) algorithms to classify the sequences into 7 gene families. Our research drew DNA sequence data from three organisms: humans, dogs, and chimpanzees. Various experiments were conducted to evaluate the classification performance. Additionally, to validate how well the solution generalizes, we designed experiments that involved cross-domain testing. The experimental results show high accuracy and efficiency, and also reveal intriguing findings for the life sciences.
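The first step described above, turning a DNA sequence into a more language-like format, can be sketched as follows. This is only a minimal illustration and not necessarily the method used in the paper: it assumes the sequence is split into overlapping k-mers that act as "words", and the function names and the value of `k` are hypothetical choices made for this example.

```python
from collections import Counter


def kmer_words(sequence, k=6):
    """Split a DNA string into overlapping k-length 'words' (k-mers).

    Assumption: k-mers are used as the word unit; the source text does not
    name the exact transformation.
    """
    seq = sequence.lower()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def to_sentence(sequence, k=6):
    """Join the k-mer words with spaces so the sequence reads like a sentence."""
    return " ".join(kmer_words(sequence, k))


def bag_of_words(sequence, k=6):
    """Count how often each k-mer word occurs (a simple bag-of-words view)."""
    return Counter(kmer_words(sequence, k))


# Hypothetical toy sequence, far shorter than a real gene.
example = "ATGCATGCA"
words = kmer_words(example, k=4)
sentence = to_sentence(example, k=4)
counts = bag_of_words(example, k=4)
print(sentence)
```

Word counts of this kind could then serve as the input features for a classifier such as an MLP, which would predict one of the 7 gene families.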
Achievements:
The paper "DNA Sequence Automatic Classification—Learn the Life Language using Artificial Intelligence" was published at the 12th International Conference on Soft Computing, Artificial Intelligence and Applications (SCAI), 2023 (listed in Publications).
A demo app is available: enter the required input, such as a DNA sequence, and the app returns the recognition result. An example output is shown below. Note that the server side runs on Colab, so the notebook may be down; please let me know if you run into an issue when trying it. Thanks! (Also listed in Projects.)
The authors, Joseph, Josephine, and Phoebe, successfully presented the work at the AI Teen Talk at East Coast Bays Library and at SCAI 2023.
Nice posters and trailers made by the JJP team
Some photos of the presentation
Media