I am learning Python and need guidance on writing some code. I've written a simple program (with pointers from people) that parses a TV show XML feed and prints its values in plain text after performing some string operations.
Code:
feed = urllib.urlopen(rssPage)  # rssPage: address of the XML feed
tree = etree.parse(feed)
x = tree.xpath("/rss/channel/item/title/text()")
x = str(x[0])
for tag in tags:  # tags is a list of items like hdtv, xvid, 720p etc.
    x = re.sub(r'\b' + tag + r'\b', '', x)
z = re.sub(r'[^\w\s]', '', x)
y = tree.xpath("/rss/channel/item/pubDate/text()")
print "%s - %s" % (z.rstrip(), y[0][:16])
The code works fine (it prints the name of the show and the date). Since I am now parsing more than one feed, I thought the better way was to split the functionality into different functions: one to get the values and another to remove the tags. I'm still *very* new to Python and came up with the following code.
Code:
def get_value(feed):
    try:
        url = urllib2.urlopen(feed)
        tree = etree.parse(url)
        x = tree.xpath("/rss/channel/item/title/text()")
        y = tree.xpath("/rss/channel/item/pubDate/text()")
        x = str(x[0])
        y = str(y[0][:16])
        return x
        return y
    except SyntaxError:
        print 'Service Unavailable'
        pass

def del_tag(x):
    tags = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD', '720P', 'IMMERSE', '720p', 'X264']
    for tag in tags:
        x = re.sub(r'\b' + tag + r'\b', '', x)
    y = re.sub(r'[^\w\s]', '', x)

def main():
    a = get_value(rssPage)
    b = del_tag(a)
    print b

if __name__ == '__main__':
    main()
What I want is to supply the XML feed address to the get_value function, which returns the title and date as strings assigned to x and y. Then I run the del_tag function on the title string (x) to remove the tags, and the main() function prints both x and y (Title, Date).
Running this code prints None.
Let the teaching begin
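Two things in the code above explain the None. A function stops at the first return it reaches, so the second return in get_value never runs. And del_tag has no return at all, so calling it always gives back None. What follows is a minimal sketch of the intended flow, not a definitive fix. It makes some assumptions: the standard library's xml.etree.ElementTree stands in for lxml's etree, a made-up SAMPLE_FEED string replaces the network call so the example runs on its own, and re.escape is added as a precaution for tags that might contain regex characters.

```python
import re
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

# Hypothetical sample feed so the sketch runs without fetching a real URL.
SAMPLE_FEED = """<rss><channel><item>
<title>Some Show 1x02 HDTV XviD LOL</title>
<pubDate>Sun, 01 Jan 2012 20:00:00 +0000</pubDate>
</item></channel></rss>"""

TAGS = ['HDTV', 'LOL', 'VTV', 'x264', 'DIMENSION', 'XviD',
        '720P', 'IMMERSE', '720p', 'X264']


def get_value(xml_text):
    """Return (title, date) for the first item in the feed."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    title = root.findtext("channel/item/title")
    date = root.findtext("channel/item/pubDate")
    # A single return statement hands back both values as a tuple.
    return title, date[:16]


def del_tag(title):
    """Remove release tags and punctuation from a title and return the result."""
    for tag in TAGS:
        title = re.sub(r'\b' + re.escape(tag) + r'\b', '', title)
    title = re.sub(r'[^\w\s]', '', title)
    # Without this return, the caller would receive None.
    return title.rstrip()


def main():
    title, date = get_value(SAMPLE_FEED)  # unpack the returned tuple
    print("%s - %s" % (del_tag(title), date))


if __name__ == '__main__':
    main()
```

With the sample feed this prints "Some Show 1x02 - Sun, 01 Jan 2012". For a live feed, the text read from urllib2.urlopen(rssPage) could presumably be passed to get_value in place of SAMPLE_FEED.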