Geziyor

Geziyor is a blazing fast web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features

  • 1,000+ Requests/Sec
  • Caching (Memory/Disk)
  • Automatic Data Exporting (JSON, CSV, or custom)
  • Limit Concurrency (Global/Per Domain)
  • Request Delays (Constant/Randomized)
  • Automatic response decoding to UTF-8

See scraper Options for all custom settings.
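
To illustrate what limiting concurrency globally and per domain means in practice, here is a minimal standard-library sketch that uses buffered channels as counting semaphores. It is not Geziyor's implementation and uses none of its API; the names limiter, newLimiter, acquire, release and peakConcurrency are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter caps in-flight work both globally and per domain.
// Hypothetical helper for illustration; not part of Geziyor.
type limiter struct {
	global    chan struct{}
	perDomain int
	mu        sync.Mutex
	domains   map[string]chan struct{}
}

func newLimiter(global, perDomain int) *limiter {
	return &limiter{
		global:    make(chan struct{}, global),
		perDomain: perDomain,
		domains:   make(map[string]chan struct{}),
	}
}

// acquire blocks until a per-domain slot and a global slot are both free.
func (l *limiter) acquire(domain string) {
	l.mu.Lock()
	sem, ok := l.domains[domain]
	if !ok {
		sem = make(chan struct{}, l.perDomain)
		l.domains[domain] = sem
	}
	l.mu.Unlock()
	sem <- struct{}{}      // take the per-domain slot first
	l.global <- struct{}{} // then take a global slot
}

// release frees the slots taken by acquire.
func (l *limiter) release(domain string) {
	<-l.global
	l.mu.Lock()
	sem := l.domains[domain]
	l.mu.Unlock()
	<-sem
}

// peakConcurrency runs n simulated requests against one domain and
// returns the highest number that were in flight at the same time.
func peakConcurrency(l *limiter, domain string, n int) int64 {
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.acquire(domain)
			defer l.release(domain)
			cur := atomic.AddInt64(&inFlight, 1)
			for { // record a new peak if this is the highest count seen
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	l := newLimiter(8, 2) // at most 8 requests overall, 2 per domain
	fmt.Println(peakConcurrency(l, "example.com", 50) <= 2)
}
```

Taking the per-domain slot before the global one means requests queued behind a busy domain do not hold global slots that other domains could use.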

Status

We highly recommend using Go modules, because this project is still in development and its API is not yet stable.

Usage

Simple usage

geziyor.NewGeziyor(geziyor.Options{
    StartURLs: []string{"http://api.ipify.org"},
    ParseFunc: func(r *geziyor.Response) {
        fmt.Println(string(r.Body))
    },
}).Start()

Advanced usage

func main() {
	geziyor.NewGeziyor(geziyor.Options{
		StartURLs: []string{"http://quotes.toscrape.com/"},
		ParseFunc: quotesParse,
		Exporters: []geziyor.Exporter{exporter.JSONExporter{}},
	}).Start()
}

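// quotesParse exports every quote on the page, then follows the "next" link.
func quotesParse(r *geziyor.Response) {
	// Send one record per quote block to the configured exporters.
	r.DocHTML.Find("div.quote").Each(func(i int, s *goquery.Selection) {
		r.Exports <- map[string]interface{}{
			"text":   s.Find("span.text").Text(),
			"author": s.Find("small.author").Text(),
		}
	})
	// If the page has a next-page link, request it and parse it the same way.
	if href, ok := r.DocHTML.Find("li.next > a").Attr("href"); ok {
		go r.Geziyor.Get(r.JoinURL(href), quotesParse)
	}
}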
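
The pagination step above depends on turning a relative href such as /page/2/ into an absolute URL before requesting it. The sketch below shows one way to do that with the standard net/url package. The joinURL function is a hypothetical stand-in written for this example; it assumes r.JoinURL resolves the link against the URL of the response, which the example above suggests but this README does not spell out.

```go
package main

import (
	"fmt"
	"net/url"
)

// joinURL resolves href against the URL of the page it was found on.
// Hypothetical stand-in for r.JoinURL; not Geziyor's own code.
func joinURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// ResolveReference applies the RFC 3986 rules for relative references.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	next, err := joinURL("http://quotes.toscrape.com/", "/page/2/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(next) // http://quotes.toscrape.com/page/2/
}
```

An href that is already absolute is returned unchanged, so the same helper works for links that point to other hosts.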

See the tests for more usage examples.

Installation

go get github.com/geziyor/geziyor