Automatic HTML Form Submission

Extracting LinkedIn Connections Example

Table of Contents

  • Automatic HTML form submission
  • Content of the attached archive
  • Using the script
  • Technical details

This article shows how to automate HTTP actions such as logging in to a website and retrieving content from different pages. As an example, we will connect to the LinkedIn website, log in with your credentials (if you have an account there), and automatically retrieve your connections or someone else's connections.

LinkedIn Web Connections

The goal is to show you how to script automatic form submission and HTML content retrieval. This can be very useful, for instance, to automatically register a user on a website (provisioning) when no API is available, or to run automated tests against a web interface.

LinkedIn XML Connections

I chose LinkedIn as an example, but it seems that later this year a LinkedIn API will be made available to developers to connect to the site, run searches, and retrieve profiles, connections, etc. Such an API could be a very useful, and potentially dangerous, tool.

Content of the attached archive

Here is the content of the archive:

|__ get_linkedin_connections
\__ docs
    |__ images_linkedin
    |   \__ *.png
    |__ linkedin.txt
    \__ linkedin.html


  • get_linkedin_connections: Main script. It retrieves your connections, or someone else's connections if a key is provided, and displays or saves the result in TXT, CSV or XML format.

  • docs/linkedin.txt: Wiki source of this article

  • docs/images_linkedin/*: All the pictures used in this article

Using the script

You can call the script from the command line, specifying the credentials and the user to check. You can display the list of options at any time with the -h or --help option:

/LinkedIn> ./get_linkedin_connections -h
usage: get_linkedin_connections [options] [output.ext]
retrieve LinkedIn connections and export in different formats
possible formats: TXT, CSV, XML
-h or --help for help

example: get_linkedin_connections -D -w mypass
         get_linkedin_connections -D -W
         get_linkedin_connections -D -w mypass -k 1234567 -o csv
         get_linkedin_connections -D -w mypass -o xml output.xml

  --version        show program's version number and exit
  -h, --help       show this help message and exit
  -c, --changelog  display changelog
  -D USER          user name (email)
  -w PASSWD        password
  -W               prompt for password
  -k KEY           key of the user to check (logged user by default)
  -o OUTTYPE       output type: txt, csv or xml [default: txt]

To retrieve your own connections you can use the following commands:

get_linkedin_connections -D -w mypass 


get_linkedin_connections -D -W 

The result will have the following format:

MyFirstName MyLastName's Connections (key=0123456)

My Connection1 (key=1234567)

My Connection2 (key=1234568)

My Connection3 (key=1234569)

My Connection4 (key=1234570)

My Connection5 (key=1234571)


To retrieve someone else's connections, you need to specify the key corresponding to that user (when listing your connections with the above command, you will see the keys corresponding to each user):

get_linkedin_connections -D -w mypass -k 1234567

The result will have the following format:

My Connection1's Connections (key=1234567)

MyFirstName MyLastName (key=0123456)

My Connection4 (key=1234570)

My Connection5 (key=1234571)

My Connection6 (key=1234572)

My Connection7 (key=1234573)


To export the result to a different format, you can use the following commands. To export as CSV:

get_linkedin_connections -D -w mypass -o csv 

The result will look like the following:

# MyFirstName MyLastName's Connections (key=0123456)
# MyLongTitle
# 20 connections
"key";"name";"title"
"1234567";"My Connection1";"Title1"
"1234568";"My Connection2";"Title2"
"1234569";"My Connection3";"Title3"
"1234570";"My Connection4";"Title4"
"1234571";"My Connection5";"Title5"

To export as XML:

get_linkedin_connections -D -w mypass -o xml 

In that case, the result will look like the following:

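<?xml version="1.0" encoding="utf8"?>
<profile id="0123456">
    <name>MyFirstName MyLastName</name>
    <title>MyLongTitle</title>
    <connections count="20">
        <profile id="1234567">
            <name>My Connection1</name>
            <title>Title1</title>
        </profile>
        <profile id="1234568">
            <name>My Connection2</name>
            <title>Title2</title>
        </profile>
        <profile id="1234569">
            <name>My Connection3</name>
            <title>Title3</title>
        </profile>
        <profile id="1234570">
            <name>My Connection4</name>
            <title>Title4</title>
        </profile>
        <profile id="1234571">
            <name>My Connection5</name>
            <title>Title5</title>
        </profile>
    </connections>
</profile>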

To save the result to an output file, you can use the following commands:

get_linkedin_connections -D -w mypass -o txt output.txt


get_linkedin_connections -D -w mypass -o csv output.csv


get_linkedin_connections -D -w mypass -o xml output.xml

Technical details

This section explains the different parts of the script. The overall flow is to log into LinkedIn, go to the Connections page, and get the user's information and all connections, across multiple pages if applicable. All connections are stored in a dictionary before being processed to generate the output.

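1. The first part imports the modules used by the script, which targets Python 2. The httplib and urllib modules are used to build HTTP requests, connect to web pages, submit a form, and retrieve the HTML result. The codecs module is only used to write UTF-8 files, as LinkedIn uses Unicode characters in names and titles. getpass prompts for the password, re extracts data from the HTML, htmlentitydefs maps HTML entity names to code points, and optparse parses the command line.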


import getpass, httplib, urllib, codecs, sys, re
from htmlentitydefs import name2codepoint as n2cp
from optparse import OptionParser

2. The second part handles command-line arguments and options using the OptionParser class. To use the script, you just need the LinkedIn credentials of the user, an optional key if you want to check someone else's connections, an optional output format (TXT, CSV or XML), and an optional file name if you want to save the result to a file:

changelog = [ "02/03/2008 - v0.1 - retrieve LinkedIn connections" ]

usage = """%prog [options] [output.ext]
retrieve LinkedIn connections and export in different formats
possible formats: TXT, CSV, XML
-h or --help for help

example: %prog -D -w mypass
%prog -D -W
%prog -D -w mypass -k 1234567 -o csv
%prog -D -w mypass -o xml output.xml"""

# Handle command-line options and arguments
parser = OptionParser(usage=usage, version="%prog - 02/03/2008 - v0.1 - Reza Kalfane")
parser.add_option( "-c", "--changelog", action="store_true", dest="changelog", help="display changelog" )
parser.add_option( "-D", action="store", type="string", metavar="USER", dest="user", help="user name (email)" )
parser.add_option( "-w", action="store", type="string", metavar="PASSWD", dest="passwd", help="password" )
parser.add_option( "-W", action="store_true", dest="passwd_i", help="prompt for password" )
parser.add_option( "-k", action="store", metavar="KEY", dest="key", help="key of the user to check (logged user by default)" )
parser.add_option( "-o", action="store", type="choice", metavar="OUTTYPE", dest="out_type", default="txt", help="output type: txt, csv or xml [default: %default]", choices=["txt","csv","xml"] )
(options, args) = parser.parse_args()
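On Python 3, optparse is deprecated in favor of argparse. The same command line could be declared as in the sketch below; it is not the original script, and the positional output argument name and the sample e-mail address are illustrative assumptions.

```python
import argparse

def build_parser():
    # argparse equivalent of the OptionParser setup above (Python 3 sketch)
    parser = argparse.ArgumentParser(
        description="retrieve LinkedIn connections and export in different formats")
    parser.add_argument("-c", "--changelog", action="store_true", help="display changelog")
    parser.add_argument("-D", dest="user", metavar="USER", help="user name (email)")
    parser.add_argument("-w", dest="passwd", metavar="PASSWD", help="password")
    parser.add_argument("-W", dest="passwd_i", action="store_true", help="prompt for password")
    parser.add_argument("-k", dest="key", metavar="KEY",
                        help="key of the user to check (logged user by default)")
    # choices= makes argparse reject anything other than txt, csv or xml
    parser.add_argument("-o", dest="out_type", metavar="OUTTYPE", default="txt",
                        choices=["txt", "csv", "xml"],
                        help="output type: txt, csv or xml [default: %(default)s]")
    # Optional positional argument: the output file (hypothetical name "output")
    parser.add_argument("output", nargs="?", help="output file (output.ext)")
    return parser

# Example: parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(["-D", "me@example.com", "-w", "secret", "-o", "csv", "out.csv"])
```

Passing an explicit list to parse_args() makes the parser easy to exercise without touching sys.argv.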

3. Once the arguments and the options are parsed from the command-line, you can check that everything is valid, display the changelog if requested, and prompt for the password if needed.

# Display changelog and exit
if options.changelog:
    print "\n".join( changelog )
    sys.exit( 0 )

# Prompt for password
if options.passwd_i:
    options.passwd = getpass.getpass()

# Options verifications
if options.user is None or options.passwd is None:
    parser.error( "please specify credentials" )

4. The script uses two functions found on the web to convert HTML entities into Unicode strings:

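# Transform HTML entities
def substitute_entity(match):
    ent = match.group(2)
    if match.group(1) == "#":
        # Numeric entity: int() instead of eval(), which would read "065" as octal
        if ent[0] in "xX":
            return unichr(int(ent[1:], 16))
        return unichr(int(ent))
    cp = n2cp.get(ent)
    if cp:
        return unichr(cp)
    # Unknown entity: keep the original text
    return match.group(0)

def decode_htmlentities(string):
    entity_re = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")
    return entity_re.subn(substitute_entity, string)[0]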
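On Python 3, htmlentitydefs became html.entities and unichr became chr. A minimal sketch of the same decoder follows; it also handles hexadecimal references such as &#xE9; and leaves unknown or malformed entities untouched. The standard library's html.unescape covers the same need in a single call.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")

def substitute_entity(match):
    ent = match.group(2)
    if match.group(1) == "#":
        try:
            if ent[0] in "xX":
                return chr(int(ent[1:], 16))  # hexadecimal reference, e.g. &#xE9;
            return chr(int(ent))              # decimal reference, e.g. &#233;
        except ValueError:
            return match.group(0)             # malformed or out-of-range reference
    cp = name2codepoint.get(ent)              # named entity, e.g. &eacute;
    if cp:
        return chr(cp)
    return match.group(0)                     # unknown entity: keep as is

def decode_htmlentities(string):
    return ENTITY_RE.sub(substitute_entity, string)
```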

5. From there, you need to simulate a login to the LinkedIn website. The main page looks like the following:

LinkedIn Web SignIn

Here is the part which is of interest for us - the login form:

LinkedIn Web SignIn Form

Let's look at the source code of the page to see which field names to use:

<form action="" method="post" accept-charset="UTF-8" name="login">
<td colspan="3" class="reason" name="reason"></td>
<td align="right" width="30%"><label for="session_key-login">Email&nbsp;address:</label></td>
<td colspan="2" width="70%"><input name="session_key" value="" id="session_key-login" size="24" type="text"></td>
<td align="right"><label for="session_password-login">Password:</label></td>
<td colspan="2"><input name="session_password" value="" id="session_password-login" size="24" type="password"></td>
<tr valign="top">
<td><input name="session_login" value="Sign In" class="btn-primary" type="submit"></td>
<td width="20"><a href="" name="forgotPassword" class="forgotpwd">Forgot password?</a></td>
<div style="display: none;" id="cookieDisabled">Make sure you have cookies and Javascript enabled in your browser before signing in.</div>
<script type="text/javascript">
if (navigator.cookieEnabled == true) {
if(document.getElementById('cookieDisabled')) document.getElementById('cookieDisabled').style.display = 'none';
<input name="session_login" value="" id="session_login-login" type="hidden"><input name="session_rikey" value="" id="session_rikey-login" type="hidden">

In the LinkedIn login form, here are the needed fields:

  • session_key, holding the user name (email address)

  • session_password, holding the user's password

  • session_login, which appears twice: as the "Sign In" submit button and as an empty hidden field

  • session_rikey, a hidden field that is empty here
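The form body for these fields can be produced with urlencode by passing a list of pairs, which keeps the field order and allows session_login to appear twice. It also escapes special characters in the user name and password. The sketch below uses Python 3's urllib.parse; build_login_body is a hypothetical helper name, and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_login_body(user, password):
    # A list of pairs (not a dict) so session_login can appear twice,
    # mirroring the submit button and the hidden field of the form
    fields = [
        ("session_key", user),
        ("session_password", password),
        ("session_login", "Sign In"),
        ("session_login", ""),
        ("session_rikey", ""),
    ]
    # urlencode percent-escapes '@', '&', etc. and turns spaces into '+'
    return urlencode(fields)

body = build_login_body("me@example.com", "p&ss word")
```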

Using the HTTPSConnection class from the httplib module, you can connect to the LinkedIn site, fill in the form with the user name and password from the options, submit the form, and get the authentication cookie back from the response. The cookie contains multiple values, including a session ID and information about the logged user, such as the LinkedIn key. You need to store that cookie and send it with later HTTP requests.

# Login
conn = httplib.HTTPSConnection( "" )
headers = {'Content-type': 'application/x-www-form-urlencoded', 'Accept': 'text/plain'}
# urlencode() escapes special characters in the user name and password
params = urllib.urlencode( [ ('session_key', options.user), ('session_password', options.passwd),
                             ('session_login', 'Sign In'), ('session_login', ''), ('session_rikey', '') ] )
conn.request( "POST", "/secure/login", params, headers )
response = conn.getresponse()
cookie = response.getheader( "set-cookie" )
mykey = None
match = re.match( "^.*leo_auth_token=LIM:(.*?):.*$", cookie or "" )
if not match:
    print "Could not log into LinkedIn!"
    sys.exit( 1 )
mykey = match.group( 1 )
if options.key is not None:
    mykey = options.key
# Send the session cookie back with the following requests
# (simplified: the raw Set-Cookie value is reused as is)
headers = { 'Cookie': cookie }
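The key extraction can be isolated in a small function. The Python 3 sketch below assumes, as the regular expression above does, that the leo_auth_token value starts with LIM:<key>: (optionally quoted). member_key_from_cookie is a hypothetical helper, and the cookie layout is an assumption that LinkedIn may well have changed since.

```python
import re

# Assumed token layout, taken from the article's regex: leo_auth_token=LIM:<key>:...
AUTH_TOKEN_RE = re.compile(r'leo_auth_token="?LIM:([^:]+):')

def member_key_from_cookie(set_cookie_header):
    """Return the logged user's key, or None when the auth token is missing."""
    if not set_cookie_header:  # no Set-Cookie header at all: login failed
        return None
    match = AUTH_TOKEN_RE.search(set_cookie_header)
    return match.group(1) if match else None
```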

6. Once logged in, you can connect to the regular site and retrieve the connections page of the selected profile: either the logged user, or another user when a key is specified in the options:

LinkedIn Web Connections

# Get connections
result = ""
conn = httplib.HTTPConnection( "" )
conn.request( "GET", "/profile?viewConns=&key=" + mykey + "&split_page=1", "", headers )
response = conn.getresponse()
htmlresult = response.read()
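Building the query string with urlencode avoids manual concatenation and escapes the key should it ever contain special characters. A Python 3 sketch with a hypothetical connections_path helper, using the same viewConns, key and split_page parameters as the request above:

```python
from urllib.parse import urlencode

def connections_path(key, page=1):
    # Same parameters as the article: empty viewConns, the profile key, the page number
    query = urlencode([("viewConns", ""), ("key", key), ("split_page", page)])
    return "/profile?" + query
```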

7. The connections page contains the user's full name, title, and list of connections, which may be split across multiple pages. You can go through the contents of the first page to get the user's details and the number of connections pages:

# Retrieve user name, title and max connections pages
# from first page
givenname = "?"
familyname = "?"
title = "?"
title_in_next_line = False
splitpage = 1
for line in htmlresult.split( "\n" ):
    match1 = re.match( '^.*<span class="given-name">(.*?)</span>.*', line )
    match2 = re.match( '^.*<span class="family-name">(.*?)</span>.*', line )
    match3 = re.match( '^.*split_page=([0-9]+).*', line )
    match4 = re.match( '^.*<p class="title">.*', line )
    # Given name found
    if match1:
        givenname = match1.group( 1 )
    # Family name found
    if match2:
        familyname = match2.group( 1 )
    # Pages count found: compare page numbers as integers, not strings
    if match3:
        maxpage = max( [ int( p ) for p in re.findall( "split_page=([0-9]+)", line ) ] )
        if maxpage > splitpage:
            splitpage = maxpage
    # Line contains title
    if title_in_next_line:
        match5 = re.match( '^\s*(.*)', line )
        if match5:
            title = match5.group( 1 )
        title_in_next_line = False
    # Next line contains title
    if match4:
        title_in_next_line = True
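One detail matters when reading the page count: re.findall returns strings, and max() on strings compares them lexicographically, so "9" would beat "10". The Python 3 sketch below, with a hypothetical parse_profile_header helper and the same class names as above, converts page numbers to integers first. It uses re.search and strip() instead of the anchored re.match patterns.

```python
import re

GIVEN_RE = re.compile(r'<span class="given-name">(.*?)</span>')
FAMILY_RE = re.compile(r'<span class="family-name">(.*?)</span>')
PAGE_RE = re.compile(r"split_page=([0-9]+)")
TITLE_MARK_RE = re.compile(r'<p class="title">')

def parse_profile_header(html):
    """Return given name, family name, title and page count from the first page."""
    info = {"given": "?", "family": "?", "title": "?", "pages": 1}
    title_in_next_line = False
    for line in html.split("\n"):
        # The title sits on the line following <p class="title">
        if title_in_next_line:
            info["title"] = line.strip()
            title_in_next_line = False
        match = GIVEN_RE.search(line)
        if match:
            info["given"] = match.group(1)
        match = FAMILY_RE.search(line)
        if match:
            info["family"] = match.group(1)
        # Convert every page number to int before taking the maximum
        pages = [int(p) for p in PAGE_RE.findall(line)]
        if pages:
            info["pages"] = max(info["pages"], max(pages))
        if TITLE_MARK_RE.search(line):
            title_in_next_line = True
    return info
```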

8. If there are multiple pages, the script can navigate through them using the split_page parameter in the URL to retrieve all the HTML pages containing connections.

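# Get connections from additional pages
if splitpage > 1:
    for i in range( 2, splitpage + 1 ):
        conn.request( "GET", "/profile?viewConns=&key=" + mykey + "&split_page=" + str( i ), "", headers )
        response = conn.getresponse()
        # Start each page on a new line so lines from two pages are not glued together
        htmlresult += "\n" + response.read()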
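When the pages are concatenated, joining them with a newline keeps the last line of one page from being glued to the first line of the next, which could otherwise hide a match from the line-by-line regular expressions. A Python 3 sketch, where fetch_page is a hypothetical callable returning the HTML of a given page number (for example a wrapper around the GET request above):

```python
def fetch_all_pages(fetch_page, page_count):
    """Return the HTML of pages 1..page_count as a single string.

    fetch_page is a hypothetical callable: page number -> HTML string.
    """
    pages = [fetch_page(page) for page in range(1, page_count + 1)]
    # Join with "\n" so each page starts on its own line
    return "\n".join(pages)
```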

9. Now that you have all the pages of content, you can cycle through each line of the result to extract the key, name and title of each connection and store everything in a dictionary. The dictionary key is a tuple made of the full name in uppercase and the unique LinkedIn key, so sorting the dictionary keys later sorts the connections by name:

# Build connections dictionary
connections = {}
current_key = ""
current_name = ""
current_title = ""
for line in htmlresult.split( "\n" ):
    match1 = re.match( '^.*<span name="connection"><a href=".*?key=(.*?)&.*?">(.*?)</a></span>.*$', line )
    match2 = re.match( '^.*<span name="headline" class="headline">(.*?)</span>.*$', line )
    # Connection key and name found
    if match1:
        current_key = match1.group( 1 )
        current_name = decode_htmlentities( match1.group( 2 ) )
    # Headline found: it completes the current connection entry
    if match2:
        current_title = decode_htmlentities( match2.group( 1 ) )
        connections[ ( current_name.upper(), current_key ) ] = {}
        connections[ ( current_name.upper(), current_key ) ][ "name" ] = current_name
        connections[ ( current_name.upper(), current_key ) ][ "title" ] = current_title
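The same extraction can be written with re.search and Python 3's html.unescape, swapped in here for the decode_htmlentities helper. parse_connections is a hypothetical name, and the markup it expects is the one quoted above, which LinkedIn has most likely changed since. Headlines seen before any connection link are ignored.

```python
import re
from html import unescape

CONN_RE = re.compile(r'<span name="connection"><a href="[^"]*?key=([^&"]*)&[^"]*">(.*?)</a></span>')
HEADLINE_RE = re.compile(r'<span name="headline" class="headline">(.*?)</span>')

def parse_connections(html):
    """Map (UPPERCASE NAME, key) -> {"name": ..., "title": ...}."""
    connections = {}
    current_key, current_name = "", ""
    for line in html.split("\n"):
        match = CONN_RE.search(line)
        if match:
            current_key = match.group(1)
            current_name = unescape(match.group(2))
        match = HEADLINE_RE.search(line)
        # The headline closes the entry started by the preceding connection link
        if match and current_key:
            connections[(current_name.upper(), current_key)] = {
                "name": current_name,
                "title": unescape(match.group(1)),
            }
    return connections
```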

10. Cycle through the resulting dictionary to build the output. Here is the code used to export as text:

# Output
output = ""
# txt
if options.out_type == "txt":
    output += givenname + " " + familyname + "'s Connections (key=" + mykey + ")\n"
    output += title + "\n\n"
    for ( name, key ) in sorted( connections.keys() ):
        output += connections[ ( name, key ) ][ "name" ] + " (key=" + key + ")\n"
        output += connections[ ( name, key ) ][ "title" ] + "\n\n"
    output += str( len( connections ) ) + " connection" + "s"*( len( connections ) > 1 )
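Building a list of lines and joining them once at the end is a common alternative to repeated string concatenation. A Python 3 sketch of the text export with a hypothetical format_txt helper; unlike the code above, it prints "0 connections" rather than "0 connection" for an empty list.

```python
def format_txt(givenname, familyname, mykey, title, connections):
    """Text export: header, one name/title pair per connection, then the count."""
    count = len(connections)
    lines = ["%s %s's Connections (key=%s)" % (givenname, familyname, mykey), title, ""]
    # Sorting the (UPPERCASE NAME, key) tuples sorts the connections by name
    for name_upper, key in sorted(connections):
        entry = connections[(name_upper, key)]
        lines += ["%s (key=%s)" % (entry["name"], key), entry["title"], ""]
    lines.append("%d connection%s" % (count, "" if count == 1 else "s"))
    return "\n".join(lines)
```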

Here is the code used to export as CSV. The first three lines are comments about the user (name and key, title, and number of connections), followed by a header line:

# csv
elif options.out_type == "csv":
    output += "# " + givenname + " " + familyname + "'s Connections (key=" + mykey + ")\n"
    output += "# " + title + "\n"
    output += "# " + str( len( connections ) ) + " connection" + "s"*( len( connections ) > 1 ) + "\n"
    output += '"key";"name";"title"\n'
    for ( name, key ) in sorted( connections.keys() ):
        output += '"%s";"%s";"%s"\n' % ( key, connections[ ( name, key ) ][ "name" ], connections[ ( name, key ) ][ "title" ] )
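The hand-written '"%s";"%s";"%s"' format produces a broken row if a name or title contains a double quote. Python's csv module handles that by doubling embedded quotes. The Python 3 sketch below, with a hypothetical format_csv helper, keeps the semicolon delimiter and fully quoted fields:

```python
import csv
import io

def format_csv(givenname, familyname, mykey, title, connections):
    """CSV export: three comment lines, a header row, one row per connection."""
    count = len(connections)
    buf = io.StringIO()
    buf.write("# %s %s's Connections (key=%s)\n" % (givenname, familyname, mykey))
    buf.write("# %s\n" % title)
    buf.write("# %d connection%s\n" % (count, "" if count == 1 else "s"))
    # QUOTE_ALL and ';' reproduce the "a";"b";"c" layout; embedded quotes are doubled
    writer = csv.writer(buf, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["key", "name", "title"])
    for name_upper, key in sorted(connections):
        entry = connections[(name_upper, key)]
        writer.writerow([key, entry["name"], entry["title"]])
    return buf.getvalue()
```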

Here is the code to export the result as an XML document:

# xml
elif options.out_type == "xml":
    output += '<?xml version="1.0" encoding="utf8"?>\n'
    output += '<profile id="%s">\n' % mykey
    output += '\t<name>%s %s</name>\n' % ( givenname, familyname )
    output += '\t<title>%s</title>\n' % title
    output += '\t<connections count="%s">\n' % len( connections )
    for ( name, key ) in sorted( connections.keys() ):
        output += '\t\t<profile id="%s">\n' % key
        output += '\t\t\t<name>%s</name>\n' % connections[ ( name, key ) ][ "name" ]
        output += '\t\t\t<title>%s</title>\n' % connections[ ( name, key ) ][ "title" ]
        output += '\t\t</profile>\n'
    output += '\t</connections>\n'
    output += "</profile>"
    # Escape ampersands coming from names and titles
    output = re.sub( "&", "&amp;", output )
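Escaping only & still lets a name or title containing < or > break the document. Building the tree with xml.etree.ElementTree escapes all text and attribute values automatically. A Python 3 sketch with a hypothetical format_xml helper; it returns the document without an XML declaration, which can be prepended by hand if needed.

```python
import xml.etree.ElementTree as ET

def format_xml(givenname, familyname, mykey, title, connections):
    """XML export with the same <profile>/<connections> layout as above."""
    root = ET.Element("profile", id=mykey)
    ET.SubElement(root, "name").text = "%s %s" % (givenname, familyname)
    ET.SubElement(root, "title").text = title
    conns = ET.SubElement(root, "connections", count=str(len(connections)))
    for name_upper, key in sorted(connections):
        entry = connections[(name_upper, key)]
        profile = ET.SubElement(conns, "profile", id=key)
        ET.SubElement(profile, "name").text = entry["name"]
        ET.SubElement(profile, "title").text = entry["title"]
    # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")
```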

11. Once you have the final output, you can either display it on the screen or save it in a UTF-8 encoded file:

# Display to standard output or to UTF-8 file
if len( args ) == 0:
    print output
else:
    # UTF-8 file, starting with a byte order mark
    out = open( args[0], "w" )
    out.write( codecs.BOM_UTF8 )
    out.write( output.encode( "utf-8" ) )
    out.close()
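On Python 3, the utf-8-sig codec writes the byte order mark automatically, so the BOM and the encoding step collapse into the open() call. A sketch with a hypothetical save_output helper:

```python
def save_output(output, path=None):
    """Print the result, or save it as UTF-8 with a byte order mark."""
    if path is None:
        print(output)
        return
    # "utf-8-sig" writes the BOM (like codecs.BOM_UTF8 above), then UTF-8 text;
    # the with block closes the file even if writing fails
    with open(path, "w", encoding="utf-8-sig") as out:
        out.write(output)
```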

LinkedIn XML Connections

From there you have a simple export of connections. You can improve the script to access the Profile page for each connection and retrieve all information there, such as contact email, current and previous employers, skills, education, etc.


Through the LinkedIn connections example, this article showed how to access HTML pages and submit forms automatically. This can be very useful for automated tests, or for automatically provisioning a user in a web application when no API is available. As the script relies on the HTML content, which can change over time, it may stop working at some point.

This is not really the preferred way to integrate with a website, but it can be handy for demos, proofs of concept, tests or personal use. Now, let's monitor your LinkedIn connections to see what they are doing!