Geziyor

Geziyor is a blazing fast web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features

  • 1,000+ Requests/Sec
  • Caching (Memory/Disk)
  • Automatic Data Exporting
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Automatic response decoding to UTF-8

See scraper Options for customization.

Usage

Simplest usage

geziyor.NewGeziyor(geziyor.Options{
    StartURLs: []string{"http://api.ipify.org"},
    ParseFunc: func(r *geziyor.Response) {
        fmt.Println(r.Doc.Text())
    },
}).Start()

Export all quotes and authors to an out.json file.

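// Identifiers are package-qualified to match the first example;
// the variable is named gez so it does not shadow the geziyor package.
gez := geziyor.NewGeziyor(geziyor.Options{
    StartURLs: []string{"http://quotes.toscrape.com/"},
    ParseFunc: func(r *geziyor.Response) {
        r.Doc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
            // Export the text and author of each quote
            r.Exports <- map[string]interface{}{
                "text":   s.Find("span.text").Text(),
                "author": s.Find("small.author").Text(),
            }
        })

        // Follow the link to the next page, if there is one
        if href, ok := r.Doc.Find("li.next > a").Attr("href"); ok {
            go r.Geziyor.Get(r.JoinURL(href))
        }
    },
})
gez.Start()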
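
The "Next Page" step above passes a relative `href` such as `/page/2/` through `r.JoinURL` before requesting it. The sketch below shows how a relative link can be resolved against the page URL with the standard `net/url` package. The `joinURL` function here is a hypothetical stand-in written for illustration; it is not taken from Geziyor and may differ from what `r.JoinURL` actually does.

```go
package main

import (
	"fmt"
	"net/url"
)

// joinURL is a hypothetical helper (not Geziyor's JoinURL). It resolves href
// against base following the usual URL reference-resolution rules, so both
// relative paths and absolute URLs are handled.
func joinURL(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	next, err := joinURL("http://quotes.toscrape.com/", "/page/2/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next) // prints http://quotes.toscrape.com/page/2/
}
```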

Installation

go get github.com/geziyor/geziyor

Status

We highly recommend using Go modules, as this project is still in development and the API is not yet stable.
