Simple Links Extractor Using Python by Ksetyadi's [B]Log

Those who have already registered for the Search Engine class by Sebastian Thrun and David Evans (both of them professors) should be familiar with the title. Last week, we finished the first homework, which was about grasping basic Python concepts while learning, in a simple way, how search engines work. We learned about string and integer manipulation, how to find certain keywords within a string, and how to extract specific text based on a pattern we wanted.

This is a fundamental concept behind how web crawlers and search engine indexers work. Building on the first lecture, I experimented a little further and created a links extractor. Since the first lecture didn’t explain how to crawl links more than one level deep, the extractor I’ve finished has no depth at all. I’ll update it with more features as the lectures go deeper. I will also put the source code on GitHub in case anybody in the class is interested in modifying the code to fit the current lectures or any other purpose.

So, without further ado, here is the code.
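
As a rough illustration of the idea, here is a minimal sketch of a zero-depth links extractor. It is an example built on assumptions rather than the exact listing: the function names `get_next_link` and `get_all_links`, the sample HTML, and the restriction to double-quoted `<a href="...">` attributes are all hypothetical choices made for illustration. It relies only on `str.find()`, the kind of string searching covered in the first lecture.

```python
def get_next_link(page, start=0):
    """Return (url, next_position) for the first link at or after `start`.

    Returns (None, -1) when no further link is found.
    Assumption: links look like <a href="..."> with double quotes.
    """
    tag_pos = page.find('<a href=', start)
    if tag_pos == -1:
        return None, -1
    open_quote = page.find('"', tag_pos)
    if open_quote == -1:
        return None, -1
    close_quote = page.find('"', open_quote + 1)
    if close_quote == -1:
        return None, -1
    # The URL is the text between the two quotes.
    return page[open_quote + 1:close_quote], close_quote + 1


def get_all_links(page):
    """Collect every link on a single page (no crawling, zero depth)."""
    links = []
    position = 0
    while True:
        url, position = get_next_link(page, position)
        if url is None:
            break
        links.append(url)
    return links


# Hypothetical sample input, used only to demonstrate the functions.
sample = ('<p><a href="http://example.com/a">A</a> and '
          '<a href="http://example.com/b">B</a></p>')
print(get_all_links(sample))
# ['http://example.com/a', 'http://example.com/b']
```

The loop uses `while True` with a `break` instead of a separate status flag. Real-world HTML is messier than this sketch assumes (single quotes, extra attributes, uppercase tags), so a proper HTML parser would be a safer choice outside the class exercise.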

I’m really excited about where this class is heading. Challenges ahead.

Comments

Adi Wirawan said on February 29, 2012 at 1:55 pm:

long time no see brother :)
it seems that i must learn more about python from you.
- ksetyadi said on March 1, 2012 at 1:39 am:
  
  Let’s learn together! :)

hungnv said on March 2, 2012 at 10:52 am:

I don’t see why we need the status variable here; just while True is enough.
text.find() can be replaced with text.startswith() to be more readable.

Ksetyadi's [B]Log

Kristiono Setyadi

Defining life as if it were (programming) languages
