Michelle Dalal Jian MichelleDalalJian

Discovered as a communication facilitator. Outside the People Operations work, I am an advocate for sustainable living.

19 followers · 7 following

Pinkoi
Taiwan
https://www.linkedin.com/in/michelle-jian-32b8506b/

MichelleDalalJian / py4e_ex_13

Created November 24, 2017 16:40

Extracting Data from XML: The program will prompt for a URL, read the XML data from that URL using urllib, parse it to extract the comment counts, and compute the sum of those counts.

	from urllib import request
	import xml.etree.ElementTree as ET

	url = 'http://python-data.dr-chuck.net/comments_24966.xml'
	print("Retrieving", url)
	html = request.urlopen(url)
	data = html.read()
	print("Retrieved", len(data), "characters")

	tree = ET.fromstring(data)
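
The preview stops right after parsing. As a minimal sketch of the remaining step, the counts can be collected and summed as below. The tag name `count`, the helper name `sum_comment_counts`, and the sample document are assumptions for illustration; the real feed may be structured differently.

```python
import xml.etree.ElementTree as ET


def sum_comment_counts(xml_bytes):
    """Return (how many counts were found, their sum)."""
    tree = ET.fromstring(xml_bytes)
    # './/count' matches every <count> element at any depth (assumed tag name).
    counts = [int(node.text) for node in tree.findall('.//count')]
    return len(counts), sum(counts)


# Made-up sample data standing in for the downloaded document.
sample = b"""<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>3</count></comment>
  </comments>
</commentinfo>"""

found, total = sum_comment_counts(sample)
print("Count:", found)
print("Sum:", total)
```

Keeping the parsing in a function that takes bytes means it can be tried on a local sample before pointing it at the data returned by `request.urlopen(url).read()`.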

MichelleDalalJian / py4e_ex_12_02

Created November 24, 2017 16:04

Following Links in Python: The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for the tag at a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name found.

	from bs4 import BeautifulSoup
	import urllib.request, urllib.parse, urllib.error
	import ssl
	import re

	ctx = ssl.create_default_context()
	ctx.check_hostname = False
	ctx.verify_mode = ssl.CERT_NONE
	url = "http://py4e-data.dr-chuck.net/known_by_Bryce.html"
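
The preview ends before the link-following loop. The loop described above might look roughly like the sketch below. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, and it takes a `fetch` callable instead of calling urllib directly. `AnchorCollector`, `follow_links`, and the three fake pages are hypothetical names and data, not part of the gist.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.hrefs.append(href)


def follow_links(fetch, start_url, position, repeats):
    """Follow the link at the 1-based `position`, `repeats` times in a row."""
    url = start_url
    for _ in range(repeats):
        parser = AnchorCollector()
        parser.feed(fetch(url))
        url = parser.hrefs[position - 1]
    return url


# Made-up pages standing in for the remote data files.
pages = {
    'a.html': '<a href="b.html">B</a><a href="c.html">C</a>',
    'b.html': '<a href="a.html">A</a><a href="c.html">C</a>',
    'c.html': '<a href="a.html">A</a><a href="b.html">B</a>',
}

print(follow_links(pages.get, 'a.html', position=2, repeats=2))
```

For real pages, `fetch` could be a small function that wraps `urllib.request.urlopen(url).read().decode()`.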

MichelleDalalJian / py4e_ex_12_01

Last active August 21, 2024 03:16

Scraping Numbers from HTML using BeautifulSoup: The program will use urllib to read the HTML from the data files below, parse the data, extract the numbers, and compute their sum.

	#Actual data: http://py4e-data.dr-chuck.net/comments_24964.html (Sum ends with 73)

	from urllib import request
	from bs4 import BeautifulSoup
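	html = request.urlopen('http://python-data.dr-chuck.net/comments_24964.html').read()
	# Name the parser explicitly; BeautifulSoup warns when it has to guess one.
	soup = BeautifulSoup(html, 'html.parser')
	tags = soup('span')
	total = 0  # named "total" so the built-in sum() is not shadowed
	for tag in tags:
	    total = total + int(tag.contents[0])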
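
For comparison, the same span-summing idea can be sketched with only the standard library, which is handy when BeautifulSoup is not installed. `SpanSum` and the sample table rows are hypothetical; the sketch assumes every number of interest sits directly inside a `<span>` tag.

```python
from html.parser import HTMLParser


class SpanSum(HTMLParser):
    """Add up the integers found directly inside <span> tags."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.total = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_span = False

    def handle_data(self, data):
        # Ignore text outside spans and anything that is not a plain integer.
        if self.in_span and data.strip().isdigit():
            self.total += int(data)


# Made-up sample rows standing in for the downloaded page.
sample = (
    '<tr><td>Ann</td><td><span class="comments">90</span></td></tr>'
    '<tr><td>Bo</td><td><span class="comments">10</span></td></tr>'
)

parser = SpanSum()
parser.feed(sample)
print(parser.total)
```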

MichelleDalalJian / py4e_ex_12

Created October 7, 2017 14:53

Exploring the HyperText Transport Protocol: You are to retrieve the document at http://data.pr4e.org/intro-short.txt over HTTP in a way that lets you examine the HTTP response headers. There are three ways that you might retrieve this web page and look at the response headers. Preferred: Modify the socket1.py program to retrie…

	import socket

	mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	mysock.connect(('data.pr4e.org', 80))
	cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()
	mysock.send(cmd)

	while True:
	    data = mysock.recv(512)
	    if len(data) < 1:
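
Once the receive loop has gathered the whole reply, the headers still have to be separated from the body. One possible way to do that is sketched below; `split_response` and the sample reply are made up for illustration, and the sketch assumes the full response is already in memory as bytes.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status line, headers dict, body)."""
    # A blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Made-up reply; the real server's headers will differ.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/plain\r\n'
          b'Content-Length: 5\r\n'
          b'\r\n'
          b'hello')

status, headers, body = split_response(sample)
print(status)
for name, value in headers.items():
    print(name, '=', value)
```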

MichelleDalalJian / py4e_ex_11

Created October 7, 2017 14:48

Extracting Data With Regular Expressions: Finding Numbers in a Haystack. In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers. Data Files: We provide two files for this assignment. One is a sample file where we give you the sum for your testi…

	import re

	hand = open("regex_sum_24962.txt")
	x = list()
	for line in hand:
	    y = re.findall('[0-9]+', line)
	    x = x + y

	sum=0
	for z in x:
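
The preview cuts off inside the summing loop. A compact sketch of the whole extract-and-sum step follows; `sum_numbers` and the sample lines are hypothetical, and the function accepts any iterable of strings, so an open file handle works as well as a list.

```python
import re


def sum_numbers(lines):
    """Sum every run of digits found in the given lines."""
    total = 0
    for line in lines:
        # findall returns each run of digits as a string, e.g. ['12', '1929'].
        for number in re.findall('[0-9]+', line):
            total += int(number)
    return total


# Made-up sample text standing in for the assignment's data file.
sample = [
    "Why should you learn to write programs? 7746",
    "12 1929 8827",
    "no numbers on this line",
]
print(sum_numbers(sample))
```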

MichelleDalalJian / py4e_ex_10_02

Created October 7, 2017 14:44

10.2 Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon: From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008. Once you have accumulated the c…

	# input() replaces Python 2's raw_input(), matching the other Python 3 gists
	name = input("Enter file:")
	if len(name) < 1 : name = "mbox-short.txt"
	hand = open(name)

	hours = dict()

	for line in hand:
	    if line.startswith("From "):
	        hour = line.split()[5].split(':')[0]
	        hours[hour] = hours.get(hour, 0) + 1
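
The preview ends once the counts are accumulated. A minimal sketch of the full count-then-report flow is shown below, written as a function so it can be tried on a few lines without the data file. `count_hours`, the example.com addresses, and the sample lines are made up for illustration.

```python
def count_hours(lines):
    """Map each hour string ('09', '18', ...) to its number of 'From ' lines."""
    hours = {}
    for line in lines:
        # The trailing space keeps 'From:' header lines out of the count.
        if line.startswith("From "):
            time_field = line.split()[5]      # e.g. '09:14:16'
            hour = time_field.split(':')[0]   # e.g. '09'
            hours[hour] = hours.get(hour, 0) + 1
    return hours


# Made-up sample lines in the same layout as the mbox 'From ' lines.
sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008",
    "From: a@example.com",
    "From b@example.com Fri Jan  4 18:10:48 2008",
    "From c@example.com Fri Jan  4 09:02:01 2008",
]

# Sorting the (hour, count) pairs prints the distribution in hour order.
for hour, count in sorted(count_hours(sample).items()):
    print(hour, count)
```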

MichelleDalalJian / py4e_ex_09_04

Created October 7, 2017 14:43

9.4 Write a program to read through the mbox-short.txt and figure out who has sent the greatest number of mail messages. The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times th…

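	fname = input("Enter file:")
	# Fall back to the default file when the user just presses Enter
	if len(fname) < 1 : fname = "mbox-short.txt"
	hand = open(fname)

	lst = list()

	for line in hand:
	    # Match 'From ' lines (with a trailing space), as the exercise describes
	    if not line.startswith("From "): continue
	    line = line.split()
	    lst.append(line[1])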
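
The preview collects the sender addresses but stops before counting them. One way to finish the job is sketched below: build the dictionary described above, then pick the key with the largest count. `top_sender` and the sample lines are hypothetical.

```python
def top_sender(lines):
    """Return (address, count) for the most frequent sender, or None if empty."""
    counts = {}
    for line in lines:
        if not line.startswith("From "):
            continue
        sender = line.split()[1]
        counts[sender] = counts.get(sender, 0) + 1
    if not counts:
        return None
    # max() with key=counts.get returns the address with the highest count.
    best = max(counts, key=counts.get)
    return best, counts[best]


# Made-up sample lines in the same layout as the mbox 'From ' lines.
sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008",
    "From b@example.com Fri Jan  4 18:10:48 2008",
    "From: b@example.com",
    "From b@example.com Fri Jan  4 16:10:39 2008",
]
print(top_sender(sample))
```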

MichelleDalalJian / py4e_ex_08_05

Created October 7, 2017 12:54

8.5 Open the file mbox-short.txt and read it line by line. When you find a line that starts with 'From ' like the following line: From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008, you will parse the From line using split() and print out the second word in the line (i.e. the entire address of the person who sent the message). Then print out…

	fhand = open("mbox-short.txt")
	count = 0
	for line in fhand:
	    line = line.rstrip()
	    if line == "": continue

	    words = line.split()
	    if words[0] != "From": continue

	    print(words[1])

MichelleDalalJian / py4e_ex_08_04

Created October 7, 2017 12:53

8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in…

	fhand = open("romeo.txt")

	lst = list()

	for line in fhand:
	    line = line.rstrip()
	    line = line.split()
	    for i in line:
	        if i not in lst:
	            lst.append(i)
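
The preview stops before the sort. The complete build-then-sort flow can be sketched as a small function; `unique_sorted_words` and the two sample lines are made up for illustration. Note that Python's default string sort places capitalized words before lowercase ones.

```python
def unique_sorted_words(lines):
    """Collect each distinct word once, then return the words sorted."""
    words = []
    for line in lines:
        for word in line.split():
            # Membership check mirrors the exercise: append only unseen words.
            if word not in words:
                words.append(word)
    words.sort()
    return words


# Made-up sample lines standing in for romeo.txt.
sample = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
]
print(unique_sorted_words(sample))
```

For large inputs a `set` would make the membership check faster, but the list version follows the exercise text more closely.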

MichelleDalalJian / py4e_ex_07_02

Created October 7, 2017 12:52

7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form: X-DSPAM-Confidence: 0.8475. Count these lines, extract the floating point values from each of the lines, compute the average of those values, and produce an output as shown below. Do not use the sum() function or …

	# Use the file name mbox-short.txt as the file name
	fname = input("Enter file name: ")
	fhand = open(fname)

	count = 0
	for line in fhand:
	    if line.startswith("X-DSPAM-Confidence:") :
	        count = count + 1

	total = 0
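
The preview counts the matching lines but ends before the values are extracted. A single-pass sketch that counts and totals together, without calling sum(), is given below. `average_confidence` and the sample lines are hypothetical.

```python
def average_confidence(lines):
    """Average the X-DSPAM-Confidence values, or return None if there are none."""
    prefix = "X-DSPAM-Confidence:"
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(prefix):
            # Everything after the prefix is the floating point value.
            total = total + float(line[len(prefix):])
            count = count + 1
    if count == 0:
        return None
    return total / count


# Made-up sample lines in the same layout as the mbox headers.
sample = [
    "X-DSPAM-Confidence: 0.8475",
    "X-DSPAM-Probability: 0.0000",
    "X-DSPAM-Confidence: 0.6178",
]
print("Average spam confidence:", average_confidence(sample))
```

Returning None for an empty result avoids a division by zero when the file contains no matching lines.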
