113 Commits

Author SHA1 Message Date
Musab Gültekin
9b266b6cce Allocator options added 2021-01-28 20:49:01 +03:00
Musab Gültekin
29c29235ae Fixed response error if retrying disabled 2020-09-05 17:24:22 +03:00
Musab Gültekin
7a76a9b95e Allocators separated for transparency. Updated chrome library. 2020-09-05 16:14:41 +03:00
Musab Gültekin
cfb16fe1ee Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome 2019-12-13 00:03:44 +03:00
Musab Gültekin
7d2fe57bab Added error logging for HTML parser. 2019-12-11 13:55:38 +03:00
Musab Gültekin
cbca22fefb Updated chrome protocol library 2019-11-16 20:34:57 +03:00
Musab Gültekin
6645820408 Added logging on allowed domains middleware and duplicate requests 2019-11-16 20:34:09 +03:00
Musab Gültekin
9b8a3837bd Added response joinURL test and updated chromedp. 2019-09-13 14:34:29 +03:00
Musab Gültekin
3264057679 Fixed issue on JoinURL 2019-08-06 17:21:41 +03:00
Musab Gültekin
86d4e80596 Added user-agent test, Fixed failing test 2019-08-05 16:18:44 +03:00
Musab Gültekin
85597219e6 Refactored client options
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00
Musab Gültekin
0e5230eac8 Remote endpoint support added for js rendered requests. Geziyor is beta now. 2019-08-05 15:14:47 +03:00
Musab Gültekin
c117d71fef Updated license 2019-08-05 15:01:48 +03:00
Musab Gültekin
32077d8433 Updated docs for rendered requests 2019-07-26 16:40:42 +03:00
Musab Gültekin
e07ef4d66d Fixed an important rendering bug that caused a client request to be made as well. Updated chromedp dependency 2019-07-26 16:07:09 +03:00
Musab Gültekin
762854e511 Go 1.10 and 1.11 support added by using different methods on reflect package. 2019-07-21 12:08:41 +03:00
Musab Gültekin
df37629d4d Disabled indenting on JSON exporter as it looks so ugly on exported data.
JSONLine still supports indenting.
2019-07-14 03:37:52 +03:00
Musab Gültekin
dfabcb84fd JSON renamed to JSONLine. JSON List support added. 2019-07-14 03:30:59 +03:00
Musab Gültekin
d19465c44a Robotstxt metrics added. 2019-07-08 14:51:54 +03:00
Musab Gültekin
d3c4389c46 Retrying support added for chrome. Fixed robots.txt retry issue. Fixed Meta issue 2019-07-07 19:50:15 +03:00
Musab Gültekin
90d2be2210 Caching policies added.
We used httpcache library to implement this. As it was not possible to support different policies, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
Musab Gültekin
0d6c2a6864 Graceful shut down system implemented 2019-07-06 18:32:13 +03:00
Musab Gültekin
42faa92ece Robots.txt support implemented 2019-07-06 16:18:03 +03:00
Musab Gültekin
2cab68d2ce Middlewares refactored to multiple files in middleware package.
Extractors removed as they add complexity to the scraper, both in learning and in development.
2019-07-04 21:04:29 +03:00
Musab Gültekin
9adff75509 Retry requests support implemented for client. 2019-07-04 13:36:10 +03:00
Musab Gültekin
da03567fae Extractors refactored to support pass by value. Documentation added for request and response. 2019-07-04 02:13:29 +03:00
Musab Gültekin
71683ec6de Chardet removed as it's not good enough at detection. Built-in library is good enough. 2019-07-03 20:54:17 +03:00
Musab Gültekin
33238bc875 Charset detection heuristics added with chardet lib. 2019-07-03 18:08:28 +03:00
Musab Gültekin
b355a566cf Added more tests and refactored exporter tests. Added code coverage badge. 2019-07-02 14:53:06 +03:00
Musab Gültekin
4ab7cfd904 Exporter and Extractor interfaces moved to its own package for simplicity of main Geziyor package 2019-07-02 13:22:23 +03:00
Musab Gültekin
c0dd0393e6 Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests. 2019-07-01 15:44:28 +03:00
Musab Gültekin
80f3500a69 Fixed Chrome response not right on some sites. 2019-07-01 12:32:15 +03:00
Musab Gültekin
fb5b4e3406 README updated according to new package names 2019-06-30 22:21:36 +03:00
Musab Gültekin
0eda056065 Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
Musab Gültekin
7c383b175f Metrics Server support added for expvar. Refactored some methods. 2019-06-30 19:09:03 +03:00
Musab Gültekin
ec4551a8a0 Making Requests and reading responses refactored to client package. 2019-06-30 16:21:18 +03:00
Musab Gültekin
0eac5f5f40 Fixed exporters bug that was causing last exported items not written to disk. 2019-06-29 16:11:52 +03:00
Musab Gültekin
bd6466a5f2 http package renamed to client to reduce confusion 2019-06-29 14:18:31 +03:00
Musab Gültekin
1e109c555d Request and response moved to http package 2019-06-29 13:36:39 +03:00
Musab Gültekin
59757607eb Pretty print exporter added. Panic counter added to metrics 2019-06-29 11:20:06 +03:00
Musab Gültekin
276b248ebb Synchronized requests support added. Benchmarks added. 2019-06-28 17:28:16 +03:00
Musab Gültekin
b000581c3d Extractors implemented. Exporters name simplified. README Updated for extracting data. Removed go 1.11 support 2019-06-28 13:00:30 +03:00
Musab Gültekin
679fd8ab7a Map support added for CSV exporter 2019-06-27 22:39:06 +03:00
Musab Gültekin
8fe194bd10 Added options and tests for exporters. 2019-06-27 16:54:09 +03:00
Musab Gültekin
d20ea47390 Fix Header conversion bug. Map was not canonicalizing keys 2019-06-22 15:04:08 +03:00
Musab Gültekin
02df5aa4e8 Fixed issues on non-trailing URLs on rendered requests 2019-06-22 14:47:12 +03:00
Musab Gültekin
92e7cfefec Fixed README Doc. 2019-06-22 13:13:33 +03:00
Musab Gültekin
a64a262554 HTTP Client can be changed now. Docs updated. 2019-06-22 13:12:05 +03:00
Musab Gültekin
7bc782400c Expvar metrics support added. Metrics refactored to its own package. 2019-06-21 21:37:25 +03:00
Musab Gültekin
88c4b1dd35 Prometheus metrics support added. 2019-06-21 20:05:28 +03:00