Dynamic GEN AI-Powered Web Crawling on Azure Using Automation Account and GPT-3.5
Chandan Srinath1, Sakshi Srivastava2
1Chandan Srinath, Digital Aviation Solutions, Boeing India Pvt. Ltd., Bangalore, India.
2Sakshi Srivastava, Digital Aviation Solutions, Boeing India Pvt. Ltd., Bangalore, India.
Manuscript received on 19 September 2024 | Revised Manuscript received on 27 September 2024 | Manuscript Accepted on 15 December 2024 | Manuscript published on 30 December 2024 | PP: 6-10 | Volume-14 Issue-2, December 2024 | Retrieval Number: 100.1/ijeat.B455614021224 | DOI: 10.35940/ijeat.B4556.14021224
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The integration of AI-powered automation in web crawling marks a significant advancement over traditional methods, which were often labor-intensive, inflexible, and prone to security risks. This paper presents a case study on the implementation of a dynamic web crawling solution using Azure Automation Account, leveraging GPT-3.5 from Azure OpenAI services. The approach enables parameterised execution via automation variables, allowing user-defined requirements to guide the crawler’s behaviour more flexibly and intelligently. Unlike previous static methods that required constant manual adjustments, our system uses GPT-3.5’s Natural Language Processing (NLP) capabilities to interpret complex instructions and dynamically adapt to various web structures. After crawling, the data is scanned with ClamAV to verify its integrity and then stored in Azure Blob Storage. SendGrid sends user alerts about scan results and storage status. The system is scheduled to run at regular intervals, fully automating the process while maintaining robust security protocols. The paper also compares traditional web crawling techniques with this AI-driven approach, highlighting improvements in efficiency, security, and adaptability.
Keywords: Azure Automation, ClamAV, GPT-3.5, Web Crawling.
Scope of the Article: Artificial Intelligence and Methods
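The control flow described in the abstract (parameterised crawl, security scan, storage, user alert) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `CrawlParams`, `build_prompt`, and `run_pipeline` are hypothetical names, the four steps are passed in as plain callables rather than real Azure OpenAI, ClamAV, Blob Storage, or SendGrid clients, and skipping storage when the scan fails is an assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrawlParams:
    """Hypothetical stand-in for values read from Automation Account variables."""
    target_url: str
    instructions: str  # free-text user requirement passed to the language model
    container: str     # destination storage container name


def build_prompt(params: CrawlParams) -> str:
    """Combine the target URL and the user requirement into one model prompt."""
    return (
        f"Target site: {params.target_url}\n"
        f"User requirement: {params.instructions}\n"
        "Describe which page elements to extract."
    )


def run_pipeline(
    params: CrawlParams,
    crawl: Callable[[str, str], bytes],  # (url, prompt) -> crawled data
    scan: Callable[[bytes], bool],       # returns True when the data is clean
    store: Callable[[str, bytes], str],  # (container, data) -> stored location
    notify: Callable[[str], None],       # sends the user alert
) -> Optional[str]:
    """Crawl, scan, store clean data, and notify the user in every case."""
    data = crawl(params.target_url, build_prompt(params))
    if not scan(data):
        # Assumption: data that fails the scan is never stored.
        notify(f"Scan failed for {params.target_url}; nothing stored.")
        return None
    location = store(params.container, data)
    notify(f"Scan clean for {params.target_url}; stored at {location}.")
    return location
```

Passing the steps as callables keeps the sketch runnable without cloud credentials; in a scheduled runbook each callable would wrap the corresponding service client.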