Simple Links Extractor Using Python

Those who have already registered for the Search Engine class by Professors Sebastian Thrun and David Evans should be familiar with the title. Last week, we finished the first homework, which covered basic Python concepts while introducing, in a simple way, how search engines work. We learned about string and integer manipulation, how to find a keyword within a string, and how to extract a specific piece of text based on a pattern.
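As a small illustration of that last idea, here is a sketch of pattern-based extraction using only `str.find` and slicing. The helper name `extract_between` and the sample HTML string are hypothetical, chosen just for this example.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None."""
    start = text.find(start_marker)
    if start == -1:          # start marker not present at all
        return None
    start += len(start_marker)  # skip past the marker itself
    end = text.find(end_marker, start)
    if end == -1:            # no closing marker after the start
        return None
    return text[start:end]


page = 'Visit <a href="http://www.udacity.com">Udacity</a> today'
print(extract_between(page, '<a href="', '"'))  # http://www.udacity.com
```

`str.find` returns -1 when nothing matches, which is why both lookups are checked before slicing.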

These are fundamental concepts behind how web crawlers and search engine indexers work. I've been experimenting a little more, building on the first lecture, to create a link extractor. Since the first lecture didn't explain how to crawl links more than one level deep, the extractor has no depth at all: it only collects the links found on a single page. I'll add more features as the lectures go deeper. I'll also put the source code on GitHub in case anybody in the class is interested in modifying it to fit the current lectures or any other purpose.

So, without further ado, here is the code.
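A minimal sketch of such a single-page extractor follows. It is an illustration, not the original source, and it assumes the find-and-slice approach from the first lecture: links are written as `<a href="...">` with double quotes (single-quoted or unquoted `href` values are not handled). The function names `get_next_target` and `get_all_links` are illustrative.

```python
def get_next_target(page):
    """Locate the first '<a href=' in page.

    Returns (url, end_quote_index), or (None, 0) if no complete link is found.
    """
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    if start_quote == -1:
        return None, 0
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None, 0
    return page[start_quote + 1:end_quote], end_quote


def get_all_links(page):
    """Collect every double-quoted href on a single page (no crawling depth)."""
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url is None:
            break
        links.append(url)
        page = page[endpos:]  # continue searching after the closing quote
    return links


html = '<p><a href="http://a.example/">A</a> and <a href="http://b.example/x">B</a></p>'
print(get_all_links(html))  # ['http://a.example/', 'http://b.example/x']
```

To run it against a live page, the HTML could be fetched first, for example with `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`, and the resulting string passed to `get_all_links`.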

I'm really excited about where this class is heading. Challenges ahead.
