@nro-bot
Last active December 30, 2018 23:56

Automating Mal-Lys form submission

To use

(install Python first)

$ pip install requests
$ python robo_submit.py

Created by nouyang for jbaek

2fasta.txt

>E0CZH6
MVVKEDGHMVTARQEPRLVLVSITLENNYLTLEAPGMEQIVLPIKLPSSNKIHNCRLFGL
DIKGRDCGDEVAQWFTNYLKTQAYRLVQFDTSMKGRTTKKLYPSESYLQNYEVAYPDCSP
VHLISEASLVDLNTRLKKKVKMEYFRPNIVVSGCEAFEEDTWDELLIGDVEMKRVLSCPR
CVLTTVDPDT
>P16015
MAKEWGYASHNGPDHWHELYPIAKGDNQSPIELHTKDIKHDPSLQPWSASYDPGSAKTIL
NNGKTCRVVFDDTYDRSMLRGGPLSGPYRLRQFHLHWGSSDDHGSEHTVDGVKYAAELHL
VHWNPKYNTFGEALKQPDGIAVVGIFLKIGREKGEFQILLDALDKIKTKGKEAPFTHFDP
SCLFPACRDYWTYHGSFTTPPCEECIVWLLLKEPMTVSSDQMAKLRSLFSSAENEPPVPL
VGNWRPPQPVKGRVVRASFK
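
robo_submit.py

import requests

# Post to the form's action target (result.jsp), not to index.html
URL2 = 'http://app.aporc.org/Mal-Lys/result.jsp'

session = requests.Session()
with open('2fasta.txt', 'rb') as fasta:
    # files= makes requests build a multipart/form-data body, like the upload box
    myfiles = {'FILE': fasta}
    # prepare() reads the file into the body now, so closing it afterwards is safe
    multipart_req = requests.Request('POST', URL2, files=myfiles).prepare()
# print(multipart_req.body.decode('utf-8'))
response = session.send(multipart_req)
print(response.text)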

Useful

Screenshots of using the inspector tool to figure out what is going on

https://photos.app.goo.gl/LLTsrjXDLC5PphjN9

Example response from server

<html>
    <head>
<style type="text/css">
      body{font-family:Arial;margin-left:auto;margin-right:auto;}
    .bgDiv{width:70%;height:100%;
	       border:1px solid #808080;
		   		   text-align:center;margin-left:auto;margin-right:auto;
				   background-color:#F0F8FF;
				   margin-top:20px;
		  }
	.topDiv{width:90%;height:100px;text-align:center;margin-left:auto;margin-right:auto;
	        border:0px solid #808080;
			border-bottom-width:0px;
			margin-top:5px;
	}
	.seqDiv{width:800;text-align:center;margin-left:auto;margin-right:auto;border:1px solid #808080;}
	.batchDiv{width:800;text-align:center;margin-left:auto;margin-right:auto;border:1px solid #808080;}
	.footDiv{width:80%;height:40px;padding-top:10px;text-align:center;margin-left:auto;margin-right:auto;}
	.blankDiv{width:100%;height:20px;background-color:white;}

     span{display:block;font-size:16pt;text-align:left;padding-left:8px;}
	 </style>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title> 
		   Malonyllysine  Result
		</title>
    </head>
    <body bgcolor="#99CCCC">
        <center>
             	<span style="width:100%;height:60px;padding-top:10px;text-align:center">
               |&nbsp;<a href="readme.html" target="_blank">
							<font  class="head3"><a href="ReadMe.html">Read Me</font>
							</a>&nbsp;&nbsp;| 
							&nbsp;<a href="data.html" target="_blank">
							<font  class="head3">Data</font>
							</a>&nbsp;|
							<a href="Citation.html" target="_blank">
							<font  class="head3">Citation</font>
							</a>&nbsp;|
 </span>
 </center>
	<hr style="border-color:blue;height:3px;background-color:blue"/>

<table align="center"><tr><td>You Input Sequences</td></tr><tr><td>Example 1;E0CZH6;Length :190</td></tr><tr><td><font face="Courier" >MVVKEDGHMVTARQEPRLVLVSITLENNYLTLEAPGMEQIVLPIKLPSSNKIHNCRLFGLDIKGRDCGDEVAQWFTNYLKTQAYRLVQFDTSMKGRTTKK<BR>LYPSESYLQNYEVAYPDCSPVHLISEASLVDLNTRLKKKVKMEYFRPNIVVSGCEAFEEDTWDELLIGDVEMKRVLSCPRCVLTTVDPDT</font></td></tr><tr><td>Total number of K  in the example protein# 1 :<font face="Cambria" color='#FF0000'>13 </font></td></tr><tr><td><table border="1" cellspacing="1" cellpadding="4"  margin="40,0,0,0" ><tr><th> Position </th><th> Peptide </th><th>Posterior probability Score</th><th>CutOff</th></tr><tr align="center"><td><font face="Courier" color='#0000FF'>4</font></td><td><font face="Courier" color='#0000FF'>XXXMVV</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>EDGHMVTAR</font></td><td><font color='red'>0.9997</font></td><td>0.5</td></tr><tr align="center"><td><font face="Courier" color='#0000FF'>63</font></td><td><font face="Courier" color='#0000FF'>LFGLDI</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>GRDCGDEVA</font></td><td><font color='red'>0.7754</font></td><td>0.5</td></tr></table><hr></td></tr><tr><td>Example 2;P16015;Length :260</td></tr><tr><td><font face="Courier" >MAKEWGYASHNGPDHWHELYPIAKGDNQSPIELHTKDIKHDPSLQPWSASYDPGSAKTILNNGKTCRVVFDDTYDRSMLRGGPLSGPYRLRQFHLHWGSS<BR>DDHGSEHTVDGVKYAAELHLVHWNPKYNTFGEALKQPDGIAVVGIFLKIGREKGEFQILLDALDKIKTKGKEAPFTHFDPSCLFPACRDYWTYHGSFTTP<BR>PCEECIVWLLLKEPMTVSSDQMAKLRSLFSSAENEPPVPLVGNWRPPQPVKGRVVRASFK</font></td></tr><tr><td>Total number of K  in the example protein# 2 :<font face="Cambria" color='#FF0000'>19 </font></td></tr><tr><td><table border="1" cellspacing="1" cellpadding="4"  margin="40,0,0,0" ><tr><th> Position </th><th> Peptide </th><th>Posterior probability Score</th><th>CutOff</th></tr><tr align="center"><td><font face="Courier" color='#0000FF'>36</font></td><td><font face="Courier" 
color='#0000FF'>PIELHT</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>DIKHDPSLQ</font></td><td><font color='red'>0.9943</font></td><td>0.5</td></tr><tr align="center"><td><font face="Courier" color='#0000FF'>39</font></td><td><font face="Courier" color='#0000FF'>LHTKDI</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>HDPSLQPWS</font></td><td><font color='red'>0.9928</font></td><td>0.5</td></tr><tr align="center"><td><font face="Courier" color='#0000FF'>57</font></td><td><font face="Courier" color='#0000FF'>YDPGSA</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>TILNNGKTC</font></td><td><font color='red'>0.9985</font></td><td>0.5</td></tr><tr align="center"><td><font face="Courier" color='#0000FF'>64</font></td><td><font face="Courier" color='#0000FF'>TILNNG</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>TCRVVFDDT</font></td><td><font color='red'>0.8620</font></td><td>0.5</td></tr><tr align="center"><td><font face="Courier" color='#0000FF'>126</font></td><td><font face="Courier" color='#0000FF'>LVHWNP</font><font face="Courier"  color='red'>K</font><font face="Courier" color='#0000FF'>YNTFGEALK</font></td><td><font color='red'>0.9944</font></td><td>0.5</td></tr></table><hr></td></tr></table> 


<hr>
  <span>
<BR><BR><BR>
<center>
<a href="index.html" target="data_wind"> Go back</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  <a href="javascript:window.close();">Close</a>
</span>
</center>
</body>  
</html> 
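The response reports, for each protein, its length and the total number of K (lysine) residues, and lists candidate sites by 1-based position. A minimal stdlib sketch for sanity-checking those numbers locally against the input FASTA; the helper names parse_fasta and lysine_positions are hypothetical and are not part of Mal-Lys or robo_submit.py.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # sequence lines are wrapped, so collect them and join at the end
            records[header].append(line)
    return {name: ''.join(parts) for name, parts in records.items()}


def lysine_positions(seq):
    """1-based positions of every K, the same numbering as the 'Position' column."""
    return [i for i, residue in enumerate(seq, start=1) if residue == 'K']
```

Run on the contents of 2fasta.txt, this should give length 190 with 13 K for E0CZH6 and length 260 with 19 K for P16015, matching the totals in the response. The result table only lists sites scoring above the cutoff, so its positions (4 and 63 for E0CZH6) are a subset of lysine_positions.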

Potentially useful links

https://stackoverflow.com/questions/13147914/how-to-simulate-http-post-request-using-python-requests-module

http://app.aporc.org/Mal-Lys/index.html

http://jonathansoma.com/lede/foundations-2017/classes/adv-scraping/advanced-scraping-form-submission/

https://stackoverflow.com/questions/23118249/whats-the-difference-between-request-payload-vs-form-data-as-seen-in-chrome

https://franklingu.github.io/programming/2017/10/30/post-multipart-form-data-using-requests/

Key issues

Hm

Submitting to the text-input box does not work

I'm not sure why, but the following does not work.

import requests

URL2 = 'http://app.aporc.org/Mal-Lys/result.jsp'

txt = '''>E0CZH6
MVVKEDGHMVTARQEPRLVLVSITLENNYLTLEAPGMEQIVLPIKLPSSNKIHNCRLFGL
DIKGRDCGDEVAQWFTNYLKTQAYRLVQFDTSMKGRTTKKLYPSESYLQNYEVAYPDCSP
VHLISEASLVDLNTRLKKKVKMEYFRPNIVVSGCEAFEEDTWDELLIGDVEMKRVLSCPR
CVLTTVDPDT
>P16015
MAKEWGYASHNGPDHWHELYPIAKGDNQSPIELHTKDIKHDPSLQPWSASYDPGSAKTIL
NNGKTCRVVFDDTYDRSMLRGGPLSGPYRLRQFHLHWGSSDDHGSEHTVDGVKYAAELHL
VHWNPKYNTFGEALKQPDGIAVVGIFLKIGREKGEFQILLDALDKIKTKGKEAPFTHFDP
SCLFPACRDYWTYHGSFTTPPCEECIVWLLLKEPMTVSSDQMAKLRSLFSSAENEPPVPL
VGNWRPPQPVKGRVVRASFK'''

# Field names taken from the textarea (Seqe) and the submit button (B1).
# Note: data= sends them URL-encoded, not as multipart/form-data.
form_data = {'Seqe': txt, 'B1': 'Submit'}
session = requests.Session()
r = session.post(URL2, data=form_data)

(Instead, I had to submit a file as per http://docs.python-requests.org/en/master/user/quickstart/#post-a-multipart-encoded-file)
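One untested guess at why the text-box attempt fails: data= makes requests send an application/x-www-form-urlencoded body, while the page's form declares enctype="multipart/form-data", so the JSP may simply not parse a URL-encoded body. A minimal sketch, assuming that is the cause: a files= entry whose value is a (None, value) tuple is sent as an ordinary multipart field with no filename, so the pasted text can go in Seqe as multipart. The helper name build_text_request is hypothetical, and whether the server accepts a request with no FILE part has not been checked.

```python
import requests

URL2 = 'http://app.aporc.org/Mal-Lys/result.jsp'


def build_text_request(fasta_text):
    """Prepare a multipart POST that fills the Seqe textarea instead of uploading a file.

    A (None, value) tuple in files= becomes a plain form field: the part gets
    no filename and no Content-Type of its own.
    """
    fields = {
        'Seqe': (None, fasta_text),  # the textarea contents
        'B1': (None, 'Submit'),      # the submit button's name/value pair
    }
    return requests.Request('POST', URL2, files=fields).prepare()
```

It would be sent the same way robo_submit.py sends its request, e.g. requests.Session().send(build_text_request(txt)).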

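Was not sure how to handle the JavaScript

The solution is to submit directly to the form's action URL, result.jsp, instead of the page URL (index.html). The JavaScript on the form (onsubmit="return FormCheck()") appears to be only a browser-side check, so a script that posts straight to result.jsp never runs it.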

<form name="uploadForm" method="post" action="result.jsp"
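A browser resolves the form's action attribute relative to the URL of the page containing the form, which is how "result.jsp" on index.html becomes the URL2 used in the scripts. A small stdlib sketch of that step; the function name form_target is hypothetical.

```python
from urllib.parse import urljoin

PAGE_URL = 'http://app.aporc.org/Mal-Lys/index.html'


def form_target(page_url, action):
    """Resolve a form's action attribute against the page URL, as a browser would.

    An empty action resolves to the page URL itself.
    """
    return urljoin(page_url, action)
```

For this page, form_target(PAGE_URL, 'result.jsp') gives http://app.aporc.org/Mal-Lys/result.jsp.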

Was unclear why my network inspector said 'request payload' rather than 'form data'

They are pretty similar; the inspector just formats 'form data' more prettily. I think the difference comes from the encoding rather than the JavaScript: this form is declared with enctype="multipart/form-data", so the browser sends a multipart body, which the inspector appears to show as a raw 'request payload'. An ordinary form, such as the httpbin pizza form that submits to http://httpbin.org/post, is sent URL-encoded and is shown as parsed 'form data'. (See the picture in the Google Photos album of what submitting the pizza form looks like.)
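To see the two encodings side by side without contacting any server, one can prepare (but not send) the same field both ways with requests and compare the Content-Type header and body. A sketch; the field value is a placeholder, and the httpbin URL is only a dummy since nothing is sent.

```python
import requests

DUMMY_URL = 'http://httpbin.org/post'  # nothing is sent; prepare() only builds the request
fields = {'B1': 'Submit'}

# data= gives a URL-encoded body, which inspectors display as parsed "form data"
urlencoded = requests.Request('POST', DUMMY_URL, data=fields).prepare()

# files= with (None, value) tuples gives a multipart body, shown as a raw "request payload"
multipart = requests.Request(
    'POST', DUMMY_URL, files={k: (None, v) for k, v in fields.items()}
).prepare()

print(urlencoded.headers['Content-Type'])  # application/x-www-form-urlencoded
print(urlencoded.body)                     # B1=Submit
print(multipart.headers['Content-Type'])   # multipart/form-data; boundary=<random>
print(multipart.body.decode('utf-8'))
```

The multipart body printed last has the same shape as the Firefox capture: boundary lines, a Content-Disposition header per field, then the value.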

So we have to mimic this request.

This is the request body I captured with the Firefox network inspector (see screenshots in the photo album):

-----------------------------200507512520056464881016492692
Content-Disposition: form-data; name="Seqe"


-----------------------------200507512520056464881016492692
Content-Disposition: form-data; name="FILE"; filename="2fasta.txt"
Content-Type: text/plain

>E0CZH6
MVVKEDGHMVTARQEPRLVLVSITLENNYLTLEAPGMEQIVLPIKLPSSNKIHNCRLFGL
DIKGRDCGDEVAQWFTNYLKTQAYRLVQFDTSMKGRTTKKLYPSESYLQNYEVAYPDCSP
VHLISEASLVDLNTRLKKKVKMEYFRPNIVVSGCEAFEEDTWDELLIGDVEMKRVLSCPR
CVLTTVDPDT
>P16015
MAKEWGYASHNGPDHWHELYPIAKGDNQSPIELHTKDIKHDPSLQPWSASYDPGSAKTIL
NNGKTCRVVFDDTYDRSMLRGGPLSGPYRLRQFHLHWGSSDDHGSEHTVDGVKYAAELHL
VHWNPKYNTFGEALKQPDGIAVVGIFLKIGREKGEFQILLDALDKIKTKGKEAPFTHFDP
SCLFPACRDYWTYHGSFTTPPCEECIVWLLLKEPMTVSSDQMAKLRSLFSSAENEPPVPL
VGNWRPPQPVKGRVVRASFK

-----------------------------200507512520056464881016492692
Content-Disposition: form-data; name="B1"

Submit
-----------------------------200507512520056464881016492692--
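The capture has three parts, in this order: an empty Seqe textarea, the FILE upload (filename 2fasta.txt, Content-Type text/plain), and B1=Submit. A sketch that rebuilds all three with requests, in case the server turns out to care about the extra fields (robo_submit.py gets results while sending only FILE, so they may be unnecessary). The function name build_browser_like_request is hypothetical. requests chooses its own random boundary, so the boundary string is the part expected to differ from the browser's.

```python
import requests

URL2 = 'http://app.aporc.org/Mal-Lys/result.jsp'


def build_browser_like_request(fasta_bytes, filename='2fasta.txt'):
    """Prepare a POST whose multipart parts mirror the Firefox capture, in the same order."""
    parts = {
        'Seqe': (None, ''),                             # empty textarea, no filename
        'FILE': (filename, fasta_bytes, 'text/plain'),  # the uploaded FASTA file
        'B1': (None, 'Submit'),                         # the clicked submit button
    }
    # dicts keep insertion order, so the parts are encoded Seqe, FILE, B1
    return requests.Request('POST', URL2, files=parts).prepare()
```

Usage would be to read 2fasta.txt in binary mode, pass the bytes in, and hand the result to session.send as robo_submit.py does; printing req.body.decode('utf-8') should look like the capture apart from the boundary.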

Reproducing this required sending a multipart/form-data request, which is what the 'multipart' link explains:

https://franklingu.github.io/programming/2017/10/30/post-multipart-form-data-using-requests/

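And in fact, if you uncomment the # print(multipart_req.body.decode('utf-8')) line in robo_submit.py, you should see something very similar to the "true" POST request above, which I captured by clicking the button manually. The differences are the boundary string and the fact that the script sends only the FILE part, not Seqe or B1.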

The page's HTML is a bit haphazard

For example, the opening 'form' tag above is never closed (it has no final '>'). I wonder if this is why submitting to the text-input box does not work, and why we have to use the file-upload mechanism instead.

Looking at the page's HTML, these are the relevant lines:

<form name="uploadForm" method="post" action="result.jsp" enctype="multipart/form-data" onsubmit="return FormCheck()" <hr="" width="630"><table width="321" align="center"><tbody><tr><td class="style6" width="313" height="28"><div align="justify"><p class="style22">  <span class="style24">|</span> <span class="style26"><a href="ReadMe.html" target="_blank">Read Me</a></span> <span class="style24">|</span> <span class="style31"><a href="data.html" target="_blank" class="style26"> Data</a></span><span class="style24"> |</span> <a href="Citation.html" target="_blank" class="style26">Citation</a> <span class="style24">|</span> </p></div></td></tr></tbody></table><hr width="90%"><table align="center"><tbody><tr><td><p class="style32">Enter or copy/paste query protein sequences in <font color="Red">FASTA</font> format <a href="example.html" target="_blank">(Example)</a>:</p></td></tr><tr><td><p align="center"><textarea name="Seqe" cols="66" style="width: 550px; height: 200px;"></textarea></p></td></tr><tr><td><p class="style32">Upload input file in <font color="Red">FASTA</font> format (<a href="example.fasta" target="example">Example</a>);</p><p class="style32">
Upload your input file:<input type="FILE" name="FILE" size="30" class="style32">
</p></td></tr><tr><td><br><br><br><p align="center">
<input name="B1" type="submit" class="style32" value="Submit">
<input name="B2" type="reset" class="style32" value="Clear">
</p></td></tr></tbody></table><hr width="90%">
</form>
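To pull the field names out of that HTML programmatically, a BeautifulSoup sketch (assuming bs4 is installed) that lists each named control with its type and value. FORM_HTML is a trimmed, well-formed excerpt of the form above, not the live page, and form_controls is a hypothetical helper name. In a browser submission the textarea, the file input and the clicked submit button are sent, while reset buttons never are, which matches the capture (Seqe, FILE, B1, but no B2).

```python
from bs4 import BeautifulSoup

# Trimmed excerpt of the upload form above, with the layout markup removed
FORM_HTML = '''
<form name="uploadForm" method="post" action="result.jsp" enctype="multipart/form-data">
  <textarea name="Seqe" cols="66"></textarea>
  <input type="FILE" name="FILE" size="30">
  <input name="B1" type="submit" value="Submit">
  <input name="B2" type="reset" value="Clear">
</form>
'''


def form_controls(html):
    """Return (name, type, value) for each named input or textarea, in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    controls = []
    for tag in soup.find_all(['input', 'textarea']):
        name = tag.get('name')
        if not name:
            continue  # unnamed controls are never submitted
        kind = 'textarea' if tag.name == 'textarea' else tag.get('type', 'text').lower()
        controls.append((name, kind, tag.get('value')))
    return controls
```

Here the result is Seqe (textarea), FILE (file), B1 (submit, value Submit) and B2 (reset, value Clear).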

Split up more finely

This is for the upload-file input

Upload your input file:<input class="style32" name="FILE" size="30" type="FILE"/></p>

This is for the typing-in-text input

<table align="center">
<tr><td>
<p class="style32">Enter or copy/paste query protein sequences in <font color="Red">FASTA</font> format <a href="example.html" target="_blank">(Example)</a>:</p>
</td></tr>
<tr><td>
<p align="center">
<textarea cols="66" name="Seqe" style="width: 550px; height: 200px;"></textarea>
</p>
</td></tr>

The following are the actual buttons

<p align="center">
<input class="style32" name="B1" type="submit" value="Submit"> <!-- NOTE the VALUE field: it is sent as B1=Submit -->
<input class="style32" name="B2" type="reset" value="Clear"/>
</input></p>

A 'good' request

This is the Firefox capture shown earlier. Its body has three parts, in this order: an empty Seqe field, the FILE part (filename="2fasta.txt", Content-Type: text/plain) carrying the two FASTA records, and B1 with the value Submit.

Next steps

Parse the response using BeautifulSoup (pip install beautifulsoup4). Example syntax:

from bs4 import BeautifulSoup
import requests

URL = 'http://app.aporc.org/Mal-Lys/index.html'
session = requests.Session()
response = session.get(URL)  # GET the page that contains the form

# Name a parser explicitly; otherwise bs4 guesses one and warns
page = BeautifulSoup(response.text, 'html.parser')
somestuff = page.find('textarea', attrs={'name': 'Seqe'})
print(somestuff)
print(somestuff.string)  # None, because there is nothing between the opening and closing <textarea> tags
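Once the POST works, the useful part of the response is the inner results table: one <tr align="center"> row per predicted site, with Position, Peptide, Posterior probability Score and CutOff columns. A BeautifulSoup sketch, assuming that layout stays as in the example response above; parse_sites is a hypothetical helper name, and the rows from all proteins come back in page order without being tagged by protein.

```python
from bs4 import BeautifulSoup


def parse_sites(html):
    """Return one dict per predicted site from a Mal-Lys result page.

    Assumes each result row is <tr align="center"> with four cells:
    Position, Peptide, Posterior probability Score, CutOff.
    """
    soup = BeautifulSoup(html, 'html.parser')
    sites = []
    for row in soup.find_all('tr', align='center'):
        # strip=True trims each text piece and joins them with no separator
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if len(cells) != 4:
            continue  # not a result row
        position, peptide, score, cutoff = cells
        sites.append({
            'position': int(position),
            'peptide': peptide,  # the coloured <font> pieces joined into one string
            'score': float(score),
            'cutoff': float(cutoff),
        })
    return sites
```

Applied to response.text from robo_submit.py, the example response above would yield seven rows, the first being position 4 with peptide XXXMVVKEDGHMVTAR and score 0.9997.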