Geziyor

Geziyor is a blazing fast web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features

  • 1,000+ Requests/Sec
  • Caching (Memory/Disk)
  • Automatic Data Exporting
  • Limit Concurrency Global/Per Domain
  • Automatic response decoding to UTF-8

Usage

Simplest usage

geziyor.NewGeziyor(geziyor.Options{
    StartURLs: []string{"http://api.ipify.org"},
    ParseFunc: func(r *geziyor.Response) {
        fmt.Println(r.Doc.Text())
    },
}).Start()

Export all quotes and authors to the out.json file:

g := geziyor.NewGeziyor(geziyor.Options{
    StartURLs: []string{"http://quotes.toscrape.com/"},
    ParseFunc: func(r *geziyor.Response) {
        r.Doc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
            // Send each quote to the exporter
            r.Exports <- map[string]interface{}{
                "text":   s.Find("span.text").Text(),
                "author": s.Find("small.author").Text(),
            }
        })

        // Follow the next-page link, if there is one
        if href, ok := r.Doc.Find("li.next > a").Attr("href"); ok {
            go r.Geziyor.Get(r.JoinURL(href))
        }
    },
})
g.Start()

Installation

go get github.com/geziyor/geziyor

We highly recommend using Go modules, as this project is still in the development stage.
