138 Commits

Author SHA1 Message Date
Musab Gültekin
6415a775f4 Fix exporter bug 2021-10-14 21:54:46 +03:00
Musab Gültekin
b8bda36f92 JoinURL deprecated 2021-10-05 22:13:00 +03:00
Musab Gültekin
019fe62883
Merge pull request #37 from geziyor/proxy-support
Proxy Support
2021-10-05 21:59:10 +03:00
Musab Gültekin
97ecb7f118 Proxy support 2021-09-24 16:15:20 +03:00
DeepSource Bot
110394a753 Add .deepsource.toml 2021-08-30 18:29:32 +00:00
Musab Gültekin
242b025c9a Set cookie test 2021-08-08 22:08:47 +03:00
Musab Gültekin
53a91d63d6
Merge pull request #29 from albertbronsky/fix-remote-allocator
fixed empty context in call to NewRemoteAllocator
2021-08-08 21:56:27 +03:00
Musab Gültekin
fc67cec165 Update chromedp 2021-08-08 21:29:08 +03:00
Albert Bronsky
f73f83e493
fixed empty context in call to NewRemoteAllocator 2021-08-08 14:07:08 +03:00
Musab Gültekin
d3bdaf6240 Added documentation and tests for request.Meta 2021-05-30 10:43:54 +03:00
Musab Gültekin
a2a91b7b2e Default allocator options are used on rendered scraping. They can be changed by using a custom Client or by changing client options after scraper creation. 2021-05-23 23:48:55 +03:00
Musab Gültekin
5aa2c2540e Default client function moved to client_test.go as it's only used there. 2021-05-23 23:47:43 +03:00
Musab Gültekin
f35d34bc02 chromedp library updated. 2021-05-23 23:14:47 +03:00
Musab Gültekin
16265e524d Response.JoinURL simplified. 2021-05-18 13:31:23 +03:00
Musab Gültekin
3c9a3849e2 Start command now waits for synchronized requests too. This fixes the case where synchronized requests are made from different goroutines.
It doesn't cause any issues on concurrent requests because we already wait for them.
2021-04-19 12:58:47 +03:00
Musab Gültekin
d28beca57a Fix race condition on hosts semaphore 2021-04-17 14:46:45 +03:00
Musab Gültekin
c527d0b885 SIGINT (interrupt) signal handling refactored and fixed so that it works under some conditions 2021-04-17 14:11:17 +03:00
Musab Gültekin
6a23efd175 JoinURL now returns *url.URL and error 2021-04-17 11:12:22 +03:00
Musab Gültekin
9ea67b3554 Use fmt.Errorf instead of errors package. This is a good convention since Go 1.13 2021-04-17 11:11:29 +03:00
Musab Gültekin
fbee722a38 Rate limiting per second implemented 2021-04-16 15:31:31 +03:00
Musab Gültekin
d8252092f7 Add duplicate_requests_test.go 2021-04-16 14:43:42 +03:00
Musab Gültekin
be4d13c0ef Retry checking refactored using util function. 2021-04-14 09:32:42 +03:00
Musab Gültekin
46c4db6b1a Exporters now need to return an error. This is done to keep error logging simple. 2021-04-14 09:30:17 +03:00
Musab Gültekin
e3d79e2574 Added custom logger. Right now, not configurable. 2021-04-13 23:36:42 +03:00
Musab Gültekin
129402d754 Updated chromedp 2021-01-28 20:50:25 +03:00
Musab Gültekin
9b266b6cce Allocator options added 2021-01-28 20:49:01 +03:00
Musab Gültekin
29c29235ae Fixed response error if retrying disabled 2020-09-05 17:24:22 +03:00
Musab Gültekin
7a76a9b95e Allocators separated for transparency. Updated chrome library. 2020-09-05 16:14:41 +03:00
Musab Gültekin
cfb16fe1ee Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome 2019-12-13 00:03:44 +03:00
Musab Gültekin
7d2fe57bab Added error logging for HTML parser. 2019-12-11 13:55:38 +03:00
Musab Gültekin
cbca22fefb Updated chrome protocol library 2019-11-16 20:34:57 +03:00
Musab Gültekin
6645820408 Added logging on allowed domains middleware and duplicate requests 2019-11-16 20:34:09 +03:00
Musab Gültekin
9b8a3837bd Added response joinURL test and updated chromedp. 2019-09-13 14:34:29 +03:00
Musab Gültekin
3264057679 Fixed issue on JoinURL 2019-08-06 17:21:41 +03:00
Musab Gültekin
86d4e80596 Added user-agent test, fixed failing test 2019-08-05 16:18:44 +03:00
Musab Gültekin
85597219e6 Refactored client options
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00
Musab Gültekin
0e5230eac8 Remote endpoint support added for js rendered requests. Geziyor is beta now. 2019-08-05 15:14:47 +03:00
Musab Gültekin
c117d71fef Updated license 2019-08-05 15:01:48 +03:00
Musab Gültekin
32077d8433 Updated docs for rendered requests 2019-07-26 16:40:42 +03:00
Musab Gültekin
e07ef4d66d Fixed an important rendering bug that caused a client request to be made as well. Updated chromedp dependency 2019-07-26 16:07:09 +03:00
Musab Gültekin
762854e511 Go 1.10 and 1.11 support added by using different methods on reflect package. 2019-07-21 12:08:41 +03:00
Musab Gültekin
df37629d4d Disabled indenting on JSON exporter as it looks so ugly on exported data.
JSONLine still supports indenting.
2019-07-14 03:37:52 +03:00
Musab Gültekin
dfabcb84fd JSON renamed to JSONLine. JSON List support added. 2019-07-14 03:30:59 +03:00
Musab Gültekin
d19465c44a Robotstxt metrics added. 2019-07-08 14:51:54 +03:00
Musab Gültekin
d3c4389c46 Retrying support added for chrome. Fixed robots.txt retry issue. Fixed Meta issue 2019-07-07 19:50:15 +03:00
Musab Gültekin
90d2be2210 Caching policies added.
We used httpcache library to implement this. As it was not possible to support different policies, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
Musab Gültekin
0d6c2a6864 Graceful shutdown system implemented 2019-07-06 18:32:13 +03:00
Musab Gültekin
42faa92ece Robots.txt support implemented 2019-07-06 16:18:03 +03:00
Musab Gültekin
2cab68d2ce Middlewares refactored to multiple files in middleware package.
Extractors removed as they add complexity to the scraper, both in learning and in developing.
2019-07-04 21:04:29 +03:00
Musab Gültekin
9adff75509 Retry requests support implemented for client. 2019-07-04 13:36:10 +03:00