Commit Graph

64 Commits

Author SHA1 Message Date
85597219e6 Refactored client options
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00
0e5230eac8 Remote endpoint support added for JS-rendered requests. Geziyor is beta now. 2019-08-05 15:14:47 +03:00
d19465c44a Robotstxt metrics added. 2019-07-08 14:51:54 +03:00
90d2be2210 Caching policies added.
We used the httpcache library to implement this. Because it could not support different policies, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
0d6c2a6864 Graceful shutdown system implemented 2019-07-06 18:32:13 +03:00
42faa92ece Robots.txt support implemented 2019-07-06 16:18:03 +03:00
2cab68d2ce Middlewares refactored to multiple files in middleware package.
Extractors removed, as they add complexity to the scraper, both in learning and in development.
2019-07-04 21:04:29 +03:00
9adff75509 Retry requests support implemented for client. 2019-07-04 13:36:10 +03:00
33238bc875 Charset detection heuristics added with chardet lib. 2019-07-03 18:08:28 +03:00
4ab7cfd904 Exporter and Extractor interfaces moved to their own packages to simplify the main Geziyor package 2019-07-02 13:22:23 +03:00
c0dd0393e6 Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests. 2019-07-01 15:44:28 +03:00
80f3500a69 Fixed incorrect Chrome response on some sites. 2019-07-01 12:32:15 +03:00
0eda056065 Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
7c383b175f Metrics Server support added for expvar. Refactored some methods. 2019-06-30 19:09:03 +03:00
ec4551a8a0 Making Requests and reading responses refactored to client package. 2019-06-30 16:21:18 +03:00
0eac5f5f40 Fixed exporters bug that was causing last exported items not written to disk. 2019-06-29 16:11:52 +03:00
bd6466a5f2 http package renamed to client to reduce confusion 2019-06-29 14:18:31 +03:00
1e109c555d Request and response moved to http package 2019-06-29 13:36:39 +03:00
59757607eb Pretty print exporter added. Panic counter added to metrics 2019-06-29 11:20:06 +03:00
276b248ebb Synchronized requests support added. Benchmarks added. 2019-06-28 17:28:16 +03:00
b000581c3d Extractors implemented. Exporters name simplified. README Updated for extracting data. Removed go 1.11 support 2019-06-28 13:00:30 +03:00
02df5aa4e8 Fixed issues with non-trailing URLs on rendered requests 2019-06-22 14:47:12 +03:00
a64a262554 HTTP Client can be changed now. Docs updated. 2019-06-22 13:12:05 +03:00
7bc782400c Expvar metrics support added. Metrics refactored into their own package. 2019-06-21 21:37:25 +03:00
88c4b1dd35 Prometheus metrics support added. 2019-06-21 20:05:28 +03:00
141bab0d05 Error handling improved 2019-06-20 10:14:36 +03:00
f88b88986c Delays and logs refactored as middlewares. 2019-06-20 09:54:30 +03:00
514fe2e8d2 Recover system refactored as a middleware 2019-06-19 22:45:40 +03:00
c28b228a12 Response header bug fixed for Chrome 2019-06-18 16:37:06 +03:00
ec83a92eb3 Response header support added for Chrome Rendering 2019-06-18 16:26:40 +03:00
217f3c96df Header and native http.Response support added for Chrome rendering 2019-06-18 16:16:29 +03:00
4177f10de9 Request creation simplified and basic auth test added. 2019-06-17 13:53:34 +03:00
a5ec28664d Cookies support added. 2019-06-17 13:31:19 +03:00
e50fa3b1dc Response middlewares support implemented. 2019-06-16 18:29:07 +03:00
80383ebd6f Middlewares and some string util functions refactored. Added partial Documentation. 2019-06-16 10:38:03 +03:00
ddff3aee25 Request cancellations support added to Middlewares.
Some core functions refactored as middlewares.
Fixed race condition in exporting system. Now, only one goroutine will be responsible for exporting. This fixes concurrency issues on writing.
2019-06-15 22:27:46 +03:00
7b23596a2d Middleware support added. HTML Parsing disable option added.
Goroutine leaks will be tested using leaktest lib.
2019-06-15 17:55:40 +03:00
4799b0f7b4 Fixed goroutine leaks. Updated travis build 2019-06-14 17:30:49 +03:00
f5b3b0d049 Fixed race conditions on exporters.
MaxIdleConns limit disabled to support unlimited requests to all hosts.
MaxIdleConnsPerHost limit increased to speed up requests to same host.
2019-06-14 16:10:36 +03:00
6caf1effd6 Rendered field exported to support rendered requests on Do function. Data races fixed. 2019-06-14 15:23:56 +03:00
1a7d480b36 JS Rendered requests with Chrome support added 2019-06-13 22:08:45 +03:00
76a687e193 Do function refactored 2019-06-13 20:26:07 +03:00
8a6e19a031 New requests in the StartRequests func are now made using Geziyor's methods, not the Requests chan.
Options field exported.
2019-06-13 14:06:37 +03:00
d56ea161a5 Making new requests on StartRequestsFunc is simplified by using channels 2019-06-12 21:54:57 +03:00
f7f4e401e2 Support for adding metadata to requests added. StartRequests function implemented. 2019-06-12 21:30:45 +03:00
bd8d58576f Start requests function implemented. 2019-06-12 12:40:38 +03:00
2f6cb06982 Disabling charset detection implemented. 2019-06-12 11:44:31 +03:00
a311a0f998 CSV exporter support added. Not finished for map type. 2019-06-11 20:42:22 +03:00
bbdc3bcacd Exporters made optional, as some scrapers only want to see data in console. 2019-06-11 18:59:37 +03:00
b8305d5e1a Limiting body reading support implemented. 2019-06-11 16:19:30 +03:00