Classify Website Content using Langchain, OpenAI and Browserless

The internet holds an overwhelming number of websites and a vast amount of online content. Finding the right information in it can be a daunting and time-consuming task.

Fortunately, with the advent of advanced language models and frameworks such as Langchain, we now have access to powerful AI tools that can streamline various tasks.

In this blog post, you’ll explore the Langchain URL Classifier app (powered by Rowy) and see how it uses Langchain to classify websites.

Langchain URL Classifier - An Overview 🔎

The Langchain URL Classifier is a low-code demo app that shows how to use the Langchain framework with OpenAI to classify URLs and website content, with Browserless retrieving the page content to make the results more accurate. The app takes a website URL and returns a category for the website along with a summary of its purpose.

By following this tutorial, you'll acquire a complete and functional app backend for URL classification. It serves as an excellent foundation that can be customized and adapted to suit your specific use cases as needed.

langchain1.png

What is Langchain? 🦜🔗

Langchain is an AI framework designed for developing applications powered by language models. It distinguishes itself by going beyond mere language model integration, incorporating key principles that make applications more data-aware and dynamic.

By connecting language models to additional data sources, Langchain unlocks new possibilities for developing intelligent applications.

The Langchain URL Classifier primarily uses LLM Chains that enable us to connect several components to form a single, coherent application. We can, for example, build a chain that receives user input, formats it with a PromptTemplate, and then delivers the prepared result to an LLM. We can create more sophisticated chains by merging multiple chains or chains with other components.
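To make the chain idea concrete, here is a minimal plain-Python sketch of the pattern described above. It deliberately does not use Langchain's own classes: `TEMPLATE`, `format_prompt`, `fake_llm`, `run_chain` and `compose` are hypothetical names, and the model call is replaced with a stub so the example runs offline.

```python
from typing import Callable

# Hypothetical prompt template; the app's real prompt may differ.
TEMPLATE = "Classify the website at {url} into a single category."


def format_prompt(template: str, **values: str) -> str:
    """Fill the template's placeholders with user input."""
    return template.format(**values)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt length."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def run_chain(llm: Callable[[str], str], template: str, **values: str) -> str:
    """Chain the two steps: format the prompt, then pass it to the model."""
    return llm(format_prompt(template, **values))


def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Merge several single-input steps into one larger chain."""
    def chained(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return chained


print(run_chain(fake_llm, TEMPLATE, url="https://example.com"))
```

Swapping `fake_llm` for a function that calls a real model leaves the rest of the chain unchanged, which is the main appeal of structuring an app this way.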

Pre-requisites ✅

  • A Rowy account with Rowy Run set up. If you don’t have one, create one at www.rowy.app. Rowy is free for one project with unlimited database tables and functions. For more information on setting up Rowy, check out our documentation at docs.rowy.io/setup/install.

Setting up the Langchain URL Classifier ✨

Get started with the following step-by-step instructions to create a Langchain URL Classifier backend app 🚀

1. Create a table on Rowy

Visit www.rowy.io and click "Sign In". Once you've created your Workspace and set up your Project, select the Langchain URL Classifier template from the "Create a Table" section at the top of the page.

langchain2.png

The template setup wizard will guide you through the steps to set up your table, add your secret keys, and deploy your cloud functions – all in one go!

2. Setting up your Secrets 🤫

As part of the template setup process, you'll be prompted to add two secrets: your OpenAI API Key and Browserless API Key. Here's how to retrieve these secret keys for your project:

2.1 Retrieving the OpenAI API Key ✅

  • Go to the OpenAI Account Dashboard and create an account if you haven’t already.
  • Click on Create a new secret key. langchain3.png
  • Add a name for your secret key, let’s say Rowy Langchain. Click Create.
  • Now, your OpenAI API Key is generated! 🥳 Copy and store it somewhere temporarily. langchain4.png

In the Template Setup Wizard, when prompted to add your OpenAI API Key, click on the Add a key in Secret Manager button. This will redirect you to the GCP Secret Manager. Add your OpenAI API Key as shown below and click Create a Secret.

langchain5.png

Now, you can go back to the setup process. In the Secret dropdown itself, click Refresh to see your OpenAI API Key and select it.

2.2 Retrieving the Browserless API Key ✅

  • Navigate to the Browserless Dashboard and create an account, if you haven’t already. langchain7.png
  • Select a plan that fits your requirements. (You can also opt for the free tier.) langchain8.png
  • Verify your email address according to the prompted steps on the portal.
  • Your Browserless API Key is generated! 🥳 Copy the generated API key from the dashboard. langchain9.png

In the Template Setup Wizard, when prompted to add your Browserless API Key, click on the Add a key in Secret Manager button. This will redirect you to the GCP Secret Manager. Add your API Key as shown below and click Create a Secret.

langchain10.png

Now, you can go back to the setup process. In the Secret dropdown itself, click Refresh to view your Browserless API Key and select it.

3. Deploying Cloud Functions 🚀

The template setup process will then prompt you to deploy the automations. Click the “Deploy Functions” button to deploy your derivatives. Once the deployment completes, click “Proceed” and you will be redirected to your Langchain URL Classifier application. langchain11.png

Now, you’re all set to get hands-on with the application! 🎉 Connect it to the frontend of your choice, experiment with the logic, or use the default Rowy template as a starting point.

Simplifying Web URL Classification 💫

The Langchain URL Classifier is built around two automated data fields: Website Category and Website Summary. When you add a new row, enter the website's URL in the URL field. The derivative functions listen to this field and generate both results dynamically.
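The "listen to a field" behavior can be pictured with a small sketch. This is not Rowy's implementation, only an illustration of the idea in plain Python; the field names (`url`, `category`, `summary`) and the helpers `should_recompute` and `apply_derivatives` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]


def should_recompute(before: Row, after: Row, listener_fields: Iterable[str]) -> bool:
    """True when any listened-to field differs between the two row versions."""
    return any(before.get(name) != after.get(name) for name in listener_fields)


def apply_derivatives(
    before: Row,
    after: Row,
    derivatives: Dict[str, Callable[[Row], Any]],
    listener_fields: Iterable[str] = ("url",),
) -> Row:
    """Return a copy of `after` with derived fields refreshed if needed."""
    updated = dict(after)
    if should_recompute(before, after, listener_fields):
        for field, compute in derivatives.items():
            updated[field] = compute(after)
    return updated


# Stub derivatives standing in for the real model-backed ones.
derivatives = {
    "category": lambda row: "category for " + row["url"],
    "summary": lambda row: "summary for " + row["url"],
}
print(apply_derivatives({}, {"url": "https://example.com"}, derivatives))
```

Edits to fields that are not listened to leave the derived values untouched, so the scraping and model calls only run when the URL itself changes.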

langchain12.png

Website Category

The Website Category field uses an instance of the LLMChain class provided by Langchain. The web scraping service of the Browserless API first retrieves the HTML content of the website. That content is then passed, together with a prompt, to the language model, which responds by classifying the website into a category.
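The category flow can be sketched in plain Python as follows. Several things here are assumptions rather than details from the template: the Browserless endpoint and payload shape (`BROWSERLESS_CONTENT_URL` with a `token` query parameter and a JSON `url` field), the category list, and the prompt wording. The request is only built, not sent, so the sketch runs without network access or API keys.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumed endpoint for Browserless's content API; verify against the
# Browserless documentation for your account before relying on it.
BROWSERLESS_CONTENT_URL = "https://chrome.browserless.io/content"

# Hypothetical category list; the template may use different labels.
CATEGORIES = ["News", "E-commerce", "Blog", "Documentation", "Other"]


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking for a page's HTML."""
    query = urllib.parse.urlencode({"token": api_key})
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BROWSERLESS_CONTENT_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str, max_chars: int = 4000) -> str:
    """Reduce HTML to plain text and cap its length to keep the prompt small."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


def build_category_prompt(page_text: str) -> str:
    """Ask the model to pick exactly one label from CATEGORIES."""
    options = ", ".join(CATEGORIES)
    return (
        f"Classify the website below into one of: {options}.\n"
        "Reply with the category name only.\n\n"
        f"Website text:\n{page_text}"
    )


def parse_category(reply: str) -> str:
    """Map a free-form model reply onto a known category, else 'Other'."""
    cleaned = reply.strip().strip(".").lower()
    for category in CATEGORIES:
        if category.lower() == cleaned:
            return category
    return "Other"
```

Constraining the reply to a fixed list and falling back to "Other" is one way to keep the category column consistent even when the model answers with extra words.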

Website Summary

The Website Summary field uses similar logic to generate a summary of the HTML content returned by the web scraping service.
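One way to picture "similar logic" is that both fields share the same steps and differ only in the instruction sent to the model. The sketch below illustrates that under stated assumptions: the prompt wording in `PROMPTS`, the `run_field` helper, and the `stub_llm` stand-in are all hypothetical and not taken from the template.

```python
from typing import Callable

# Hypothetical prompts; only the instruction differs between the two fields.
PROMPTS = {
    "category": "Classify this website into a single category:\n\n{content}",
    "summary": "Summarize the purpose of this website in two sentences:\n\n{content}",
}


def run_field(
    field: str,
    content: str,
    llm: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Build the prompt for one field and return the model's trimmed reply."""
    prompt = PROMPTS[field].format(content=content[:max_chars])
    return llm(prompt).strip()


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: reports which instruction it received."""
    return " reply to: " + prompt.splitlines()[0] + " "


page = "<h1>Acme Shoes</h1><p>Buy running shoes online.</p>"
print(run_field("summary", page, stub_llm))
print(run_field("category", page, stub_llm))
```

Keeping the prompts in one table makes it easy to add further derived fields later without touching the scraping or model-calling code.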

Conclusion

The Langchain URL Classifier is an innovative application that utilizes the power of language models to simplify web URL classification. By leveraging the capabilities of the Langchain framework, this tool categorizes web URLs based on the content and purpose of the respective websites.

This blog post showed how Rowy can serve as a platform for building on large language models, using the Langchain URL Classifier as the example: an application that dynamically generates a website's category and summary from the provided URL. By simplifying web content navigation and information retrieval, it offers an efficient way to find relevant information.

If you enjoyed our content, please follow us on Twitter, or join our Discord if you have any questions. We look forward to hearing from you! 💜