ServiceNow REST Message - Mimic Python WebScraping

Jordan Rose1
Kilo Expert

I am trying to scrape a password-protected site: log in, then pull back the HTML. I can do this through a REST message when the site requires no authentication, but when the site uses an Apache token and has to be logged into through a form POST, I cannot get past the authentication gate. I can work around this hurdle with the following Python script:

import mechanize
import cookielib
from bs4 import BeautifulSoup
import html2text

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Chrome')]

# The site we will navigate into, handling its session
br.open('https://www.acmecorp.com/login')

# View available forms
for f in br.forms():
    print(f)

# Select the second (index one) form (the first form is a search query box)
br.select_form(nr=1)

# User credentials
br.form['userName'] = 'test user'
br.form['password'] = '12345'

# Login
br.submit()

print(br.open('https://www.acmecorp.com/records.do').read())

My question is: can I duplicate this Python functionality in JavaScript/REST? I'd like to avoid relying on Python script calls if possible.

Thanks

5 REPLIES

Chris Sanford1
Kilo Guru

I'm pretty sure it can't be done in REST unless the website has a REST API designed specifically to return a page's HTML. Maybe with JavaScript, but I don't understand what this has to do with ServiceNow.

Jordan Rose1
Kilo Expert

Hey Chris, theoretically I want to crawl an existing external website/repository that my organization has credentials for, extract data and attachments from it, and use them to generate new records in a custom table in ServiceNow. I'd essentially be using this external website as a data source that feeds ServiceNow records (SNOW is the system of record for the integration).

I know I can do this in Python and use the REST API to post to the ServiceNow instance to create the records, but I wanted to avoid relying on Python if possible.
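For reference, the Python-side fallback mentioned here (posting scraped data to the instance) might look roughly like the sketch below. It assumes the ServiceNow Table API endpoint /api/now/table/<table> with basic authentication and uses only the Python 3 standard library, unlike the Python 2 script earlier in the thread. The instance name, table name, field name, and credentials are hypothetical placeholders, and the function only builds the request rather than sending it.

```python
import base64
import json
import urllib.request


def build_create_record_request(instance, table, fields, user, password):
    """Build (but do not send) a Table API request that creates one record."""
    url = "https://{}.service-now.com/api/now/table/{}".format(instance, table)
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),  # record fields as a JSON body
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical instance, table, field, and credentials, for illustration only.
    req = build_create_record_request(
        "example", "u_scraped_record", {"u_title": "Sample"}, "api.user", "secret"
    )
    print(req.get_method(), req.full_url)
    # Actually sending it would be: urllib.request.urlopen(req)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before pointing it at a real instance.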

Check the API of the external website. If they have supported REST/SOAP calls to retrieve data and attachments then yes you can do it in ServiceNow. But I don't think there's any practical way to input a website URL and return a page's HTML in ServiceNow.

Update: I stand corrected. You can indeed create a 'REST' message record in ServiceNow with a website URL as the endpoint, and retrieve a web page's HTML. I just tested with google.com.

Here's my REST Message I created:

[Screenshot: the REST Message record]

Open the 'Default GET' method that is generated automatically at the bottom of the record and click 'Preview Script Usage'. It will give you the JavaScript you need to pull the response body. In my example, the response body was the HTML of google.com.

No idea about authentication though.
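
On the authentication part, it may help to spell out what the mechanize script in the question does at the HTTP level, since any REST or JavaScript replacement would have to reproduce the same steps: load the login page, POST the form fields, keep the cookie the server sets, and send that cookie with the next request. Below is a minimal sketch using only the Python 3 standard library. It assumes the login form has no hidden fields (such as a CSRF token) and that form_action_url is the URL the form posts to; neither is confirmed for the site in the question. The field names userName and password come from the original script; the function and parameter names are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Chrome")]  # same header as the original script
    return opener, jar


def encode_login_form(username, password):
    """URL-encode the two form fields used by the original script."""
    fields = {"userName": username, "password": password}
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_behind_login(opener, login_url, form_action_url, page_url, username, password):
    """Log in through a form POST, then fetch a protected page with the same session."""
    # Step 1: GET the login page so any pre-login cookie is captured.
    opener.open(login_url).close()
    # Step 2: POST the credentials (passing data makes this a POST);
    # a successful login is expected to set the session cookie.
    opener.open(form_action_url, data=encode_login_form(username, password)).close()
    # Step 3: GET the protected page; the opener sends the stored cookies.
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The point of the sketch is that the session cookie has to be carried from the login response to the follow-up request. In ServiceNow, that would mean reading the Set-Cookie header from the login response and setting a Cookie header on the second REST message by hand, assuming the site does not require anything beyond that.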