Simple Links Extractor Using Python
Those who have already registered for the Search Engine class by Sebastian Thrun and David Evans (both of them professors) should be familiar with the title. Last week, we finished the first homework, which was about grasping basic Python concepts while learning, in a simple way, how search engines work. We learned about string and integer manipulation, how to find certain keywords within a string, and how to extract specific text based on a pattern we wanted.
This is a fundamental concept behind how web crawlers and search engine indexers work. Based on the first lecture, I’ve been experimenting a little further to create a links extractor. Since the first lecture didn’t explain how to crawl links more than one level deep, I’ve finished the extractor with no depth at all. I’ll update it with more features as the lectures go deeper. I will also put the source code on GitHub in case anybody in the class is interested in modifying the code to fit the current lectures or for any other purpose.
So, without further ado, here is the code.
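A minimal sketch of a zero-depth extractor along these lines, using only the string find() and slicing covered in the first lecture. The names get_next_link, extract_links and sample_page are illustrative assumptions, not necessarily those of the original code. The sketch only recognizes links written exactly as <a href="...">, and it works on a page already held in a string rather than downloading one.

```python
def get_next_link(page, start=0):
    """Return (url, position of closing quote) for the next link at or
    after `start`, or (None, -1) when no further link is found."""
    marker = '<a href="'
    tag_pos = page.find(marker, start)
    if tag_pos == -1:
        return None, -1
    url_start = tag_pos + len(marker)
    url_end = page.find('"', url_start)
    if url_end == -1:
        # Opening quote without a closing one: treat as "no link".
        return None, -1
    return page[url_start:url_end], url_end


def extract_links(page):
    """Collect every link on a single page (no crawling, depth zero)."""
    links = []
    pos = 0
    while True:
        url, end = get_next_link(page, pos)
        if url is None:
            break
        links.append(url)
        pos = end + 1  # continue searching after the link just found
    return links


# Hypothetical page content used only for illustration.
sample_page = (
    '<html><body>'
    '<a href="http://example.com/a">A</a> '
    '<a href="http://example.com/b">B</a>'
    '</body></html>'
)
print(extract_links(sample_page))
```

Passing a start position to find() avoids re-scanning the part of the page that has already been processed.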
I’m really excited about where this class is heading. Challenges ahead.
long time no see brother :)
It seems that I must learn more about Python from you.
Let’s learn together! :)
I don’t see why we need the status variable here; just while True is enough.
text.find() can be replaced with text.startswith() to be more readable.
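A small sketch of the two suggestions side by side. The loop shape and the names (MARKER, links_with_status, links_with_while_true) are hypothetical, assuming the loop is ended by a boolean status flag and that a prefix is tested by comparing text.find(...) against 0.

```python
MARKER = '<a href="'


def links_with_status(page):
    """Loop ended by a status flag (assumed shape, for comparison only)."""
    links = []
    status = True
    while status:
        start = page.find(MARKER)
        if start == -1:
            status = False
        else:
            end = page.find('"', start + len(MARKER))
            if end == -1:
                status = False
            else:
                links.append(page[start + len(MARKER):end])
                page = page[end:]  # drop the part already processed
    return links


def links_with_while_true(page):
    """Same logic, with `while True` and `break` instead of a flag."""
    links = []
    while True:
        start = page.find(MARKER)
        if start == -1:
            break
        end = page.find('"', start + len(MARKER))
        if end == -1:
            break
        links.append(page[start + len(MARKER):end])
        page = page[end:]  # drop the part already processed
    return links


# Two ways to ask "does this text begin with 'http'?"
url = "http://example.com/a"
prefix_via_find = url.find("http") == 0      # works, but hides the intent
prefix_via_startswith = url.startswith("http")  # states the intent directly
```

Both loops return the same list. Note that find() searches the whole string when the prefix is absent, while startswith() only inspects the beginning.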