Using RPA tools for web scraping is a common means of data collection: RPA can greatly improve the efficiency of scraping and reduce the cost of collection. Nstbrowser RPA provides you with a first-class RPA experience and excellent work efficiency.
After reading this tutorial, you will:
You need to:
Now, we can start configuring the RPA workflow for scraping Google Maps search results.
Before searching for the target content, we need to visit our target website: https://www.google.com/maps. Add a "Goto Url" node, and you can visit the target website.
After reaching the website, we need to search for the target address. Here you need to use Chrome DevTools to locate the HTML elements.
Open DevTools and use your mouse to select the search box. Then we can see that the search box has an "id" attribute, which can be used as a CSS selector to locate the input box. So, we need to add an "Input Content" node: choose "Selector" for the Element option, fill in the Selector option with the "id" selector we located for the input box, and enter the content we want to search for in the Content option. With that, we have completed the action of typing in the input box:
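To make the "an id becomes a CSS selector" idea concrete, here is a small Python sketch using only the standard library. The id `searchboxinput` is an invented example for illustration, not necessarily the real id on the live Google Maps page:

```python
from html.parser import HTMLParser

# Sketch of "locate by id": once DevTools shows you an element's id,
# the CSS selector to paste into the RPA node is simply "#" + id.
# The id "searchboxinput" is a hypothetical example.

class IdFinder(HTMLParser):
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found_tag = None  # tag name of the element with that id

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.found_tag = tag

html = '<div><input id="searchboxinput" type="text"></div>'
finder = IdFinder("searchboxinput")
finder.feed(html)

css_selector = f"#{finder.target_id}"  # value for the Selector option
print(finder.found_tag, css_selector)  # input #searchboxinput
```

The same `#id` selector works in any tool that accepts CSS selectors, which is why an id is the most convenient handle when one exists.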
After typing, we need to make Google Maps search for the content we've filled in. Add a "Keyboard" node to simulate pressing "Enter":
Okay, we have now successfully gotten the content we want, and the next step is to scrape it!
Through observation, we can see that Google Maps displays its search results as a list (a very classic layout). Only some key information is shown in the list; if you click on a particular item, all of its detailed information appears next to it.
Again, open the DevTools to locate each result in the list:
Since each item in the list uses the same HTML layout, we need to use the "Loop Element" node to iterate through all the results of the query:
We save each traversed element to the "map" variable and its index to the "map-index" variable for subsequent use.
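The "Loop Element" behavior can be sketched in plain Python, with the "map" and "map-index" values modeled as dictionary entries. The result names below are made-up sample data:

```python
# Script-level analogue of the "Loop Element" node: iterate over the
# matched elements, keeping both the element and its index, the way
# the workflow stores them in "map" and "map-index".
results = ["Cafe A", "Cafe B", "Cafe C"]  # sample data

collected = []
for map_index, map_element in enumerate(results):
    # later nodes inside the loop can read both values
    collected.append({"map-index": map_index, "map": map_element})

print(collected)
```

Keeping the index alongside the element is what lets later steps in the loop refer back to "the item we are currently on".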
All the search results are obtained through a web request, so we have to add a "wait" action before traversing to make sure we get the latest and correct elements. Nstbrowser RPA provides two wait actions: "Wait Time" and "Wait Request".
- "Wait Time": waits for a certain period of time. You can choose a fixed or a random duration according to your specific situation.
- "Wait Request": waits for the end of a network request. It is applicable when the data is obtained through a network request.
After traversing to each item in the results, we need to collect its data.
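The two "Wait Time" modes mentioned above (fixed and random) can be sketched in script form. The durations here are arbitrary example values, not defaults from Nstbrowser RPA:

```python
import random
import time

# Sketch of the two "Wait Time" strategies.

def wait_fixed(seconds):
    # fixed duration: always the same pause
    time.sleep(seconds)

def wait_random(min_s, max_s):
    # random duration: varies per run, which looks less robotic
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

start = time.monotonic()
wait_fixed(0.1)
d = wait_random(0.05, 0.15)
elapsed = time.monotonic() - start
print(round(elapsed, 2))
```

A randomized wait is often preferred for scraping, since identical fixed delays between actions are an easy automation signal to detect.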
Before getting the full information, we need to click on the list item. Use the "Get Element Data" node to locate the target element to click, based on the element saved in the "map" variable:
Then, use the "Click Element" node to simulate the "click":
Make sure these nodes are placed inside the "Loop Element" node so that they will be executed on every iteration of the loop. After performing the above actions, we can already see the specific information of each search result! Now it's time to use the "Get Element Data" node to get the data we want:
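As an illustration of what "Get Element Data" does, here is a standard-library Python sketch that pulls the text out of a located element. The class names ("title", "rating") and the sample HTML are invented for the example and do not correspond to Google Maps' real markup:

```python
from html.parser import HTMLParser

# Sketch of "Get Element Data": extract the text content of the
# element matched by a selector (here, a class name).

class TextByClass(HTMLParser):
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._capture = False
        self.texts = []  # extracted data

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == self.target_class:
            self._capture = True

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.texts.append(data.strip())

html = '<div class="title">Blue Bottle Coffee</div><span class="rating">4.5</span>'
parser = TextByClass("title")
parser.feed(html)
print(parser.texts)  # ['Blue Bottle Coffee']
```

In the actual workflow, you would configure one "Get Element Data" node per piece of information (name, rating, address, and so on), each with its own selector.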
Congratulations!
At this point, we're done scraping information from a single search result!
Of course, collecting data from a single search is not enough, and Nstbrowser's RPA functionality handles this repetitive work with only one node!
The "Repeat Flow" node is used to repeat the execution of existing nodes. All you need to do is configure the number of repetitions or an end condition, and Nstbrowser will repeat the action automatically according to your needs. Suppose we need to scrape data for 2 more requests; then just set the repeat count to 2:
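The "Repeat Flow" idea maps directly onto a loop in script form. Here `scrape_once` is a made-up stand-in for the whole search-and-collect sequence described above:

```python
# Script-level analogue of "Repeat Flow": run an existing step a
# configured number of extra times. scrape_once is a hypothetical
# placeholder for the full workflow (type query, search, loop
# results, collect data).

def scrape_once(query):
    return f"results for {query}"

queries = ["coffee", "pizza", "ramen"]  # sample search terms
repeat_count = 2  # repeat 2 more times after the first run

all_results = [scrape_once(queries[0])]
for i in range(repeat_count):
    all_results.append(scrape_once(queries[i + 1]))

print(len(all_results))  # 3 runs in total
```

An end condition instead of a fixed count would correspond to replacing `range(repeat_count)` with a `while` loop that checks the condition.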
By now, we have acquired all the data we want to collect, and it's time to save it.
Nstbrowser RPA provides two ways to save data: "Save To File" and "Save To Excel".
- "Save To File" provides three file types to choose from: .txt, .csv, and .json.
- "Save To Excel", on the other hand, can only save data to an Excel file.
For easy viewing, we choose to save the collected data to Excel:
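As an aside, the "Save To File" formats have direct standard-library equivalents in script form (.csv and .json shown below; a real Excel file would need a spreadsheet library). The rows are sample data:

```python
import csv
import json

# Sketch of saving scraped rows, analogous to "Save To File".
rows = [
    {"name": "Cafe A", "rating": "4.5"},
    {"name": "Cafe B", "rating": "4.2"},
]

# CSV: one header row, then one line per result
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "rating"])
    writer.writeheader()
    writer.writerows(rows)

# JSON: the same rows as a list of objects
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)

print("saved", len(rows), "rows")
```

CSV is the most convenient choice when the data will be opened in a spreadsheet anyway; JSON preserves nesting if later results become more structured.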
Add the "Save To Excel" node. How do we make the workflow execute automatically? We need to:
Then, we can start collecting data from Google Maps!
After it completes, let's take a look at the results we collected:
It's very cool, isn't it?
You only need to configure the workflow once, and then you can run the scrape anytime. That's the charm of Nstbrowser RPA!
A ready-made workflow for scraping Google Maps search results is now available in the Nstbrowser RPA marketplace, and you can get it directly! Just change the content you want to search for and the path of the file to save the results to, and you can start your RPA scraping journey.