ServiceNow REST Message - Mimic Python WebScraping
04-12-2018 11:13 AM
I am trying to scrape a password-protected site: log in, then pull back HTML from a page behind the login. I can do this with a REST message when the site needs no authentication, but this site uses an Apache token and must be accessed via a form POST, and I cannot get a REST message past that authentication gate. I can work around this with the following Python script:
import mechanize
import cookielib  # Python 2 module; renamed http.cookiejar in Python 3
from bs4 import BeautifulSoup
import html2text
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Chrome')]
# The site we will navigate into, handling its session
br.open('https://www.acmecorp.com/login')
# View available forms
for f in br.forms():
    print(f)
# Select the second (index one) form (the first form is a search query box)
br.select_form(nr=1)
# User credentials
br.form['userName'] = 'test user'
br.form['password'] = '12345'
# Login
br.submit()
print(br.open('https://www.acmecorp.com/records.do').read())
My question is: can I duplicate this Python functionality in JavaScript/REST somehow? I'd like to avoid relying on Python script calls if possible.
Thanks
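For reference, the mechanize script above boils down to three HTTP requests that share one cookie jar: GET the login page, POST the login form back (including any hidden token fields), then GET the protected page. Any REST client would have to reproduce those same steps. Below is a minimal sketch of that flow using only the Python 3 standard library. The helper names (FormFieldParser, parse_forms, fetch_records) are hypothetical, and the sketch assumes the login form is submitted with POST and that every named input should be sent back, which may not hold for every site.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the name/value pairs of <input> elements, grouped by <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form>, in document order
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Hidden inputs (such as a login token) are captured here too.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def parse_forms(html):
    """Return a list of {'action': ..., 'fields': {...}} dicts, one per form."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.forms


def fetch_records(login_url, records_url, username, password, form_index=1):
    """Log in through the form at form_index, then return the records page HTML."""
    # A cookie-aware opener keeps the session cookie across all three requests.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Chrome")]

    # 1. GET the login page to pick up the session cookie and hidden fields.
    with opener.open(login_url) as resp:
        login_html = resp.read().decode("utf-8", errors="replace")
    form = parse_forms(login_html)[form_index]

    # 2. POST the form back with the credentials filled in.
    fields = dict(form["fields"])
    fields["userName"] = username
    fields["password"] = password
    action_url = urllib.parse.urljoin(login_url, form["action"])
    body = urllib.parse.urlencode(fields).encode("ascii")
    opener.open(action_url, data=body).close()

    # 3. GET the protected page using the now-authenticated session.
    with opener.open(records_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_records('https://www.acmecorp.com/login', 'https://www.acmecorp.com/records.do', 'test user', '12345') would mirror the script above. The point of the sketch is that the hidden form fields and the session cookie both have to be carried from the first request into the second, which is the part a single stateless REST call does not do on its own.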
Labels: Integrations, Scripting and Coding
04-12-2018 12:00 PM
Yep, I did that myself before. The issue is the complex authentication on the site I need to access. Python allows me to make a POST from the login form and then access another page once authenticated. Unfortunately, the site I am accessing does not have a REST/SOAP API available, so I'll probably just end up running a scheduled Python script on a server to parse the HTML and then write to the ServiceNow REST API.
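The write-back half of that plan could look roughly like the sketch below, which builds a ServiceNow Table API insert (POST to /api/now/table/<table> with a JSON body and basic authentication) using the Python 3 standard library. The function name build_table_insert, the instance name, the table u_scraped_record, and the field u_html are all hypothetical placeholders, and basic authentication is an assumption; an instance may require OAuth or another scheme instead.

```python
import base64
import json
import urllib.request


def build_table_insert(instance, table, record, user, password):
    """Build (but do not send) a Table API request that inserts one record."""
    url = "https://{}.service-now.com/api/now/table/{}".format(instance, table)
    # Basic auth header: base64 of "user:password".
    credentials = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Basic " + credentials.decode("ascii"),
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Example (hypothetical instance, table, and field names):
# req = build_table_insert("acme", "u_scraped_record",
#                          {"u_html": "<html>...</html>"}, "api.user", "secret")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keeping the request construction separate from the send makes it easy to inspect or log exactly what the scheduled job is about to write before it touches the instance.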