97ecb7f118
Proxy support
2021-09-24 16:15:20 +03:00

242b025c9a
Set cookie test
2021-08-08 22:08:47 +03:00

53a91d63d6
Merge pull request #29 from albertbronsky/fix-remote-allocator
fixed empty context in call to NewRemoteAllocator
2021-08-08 21:56:27 +03:00

fc67cec165
Update chromedp
2021-08-08 21:29:08 +03:00

f73f83e493
fixed empty context in call to NewRemoteAllocator
2021-08-08 14:07:08 +03:00

d3bdaf6240
Added documentation and tests for request.Meta
2021-05-30 10:43:54 +03:00

a2a91b7b2e
Default allocator options are used for rendered scraping. They can be changed by using a custom Client or by changing client options after the scraper is created.
2021-05-23 23:48:55 +03:00

5aa2c2540e
Default client function moved to client_test.go as it's only used there.
2021-05-23 23:47:43 +03:00

f35d34bc02
chromedp library updated.
2021-05-23 23:14:47 +03:00

16265e524d
Response.JoinURL simplified.
2021-05-18 13:31:23 +03:00

3c9a3849e2
Start command now waits for synchronized requests too. This fixes the case where synchronized requests are made from different goroutines.
It doesn't cause any issues for concurrent requests because we already wait for them.
2021-04-19 12:58:47 +03:00

d28beca57a
Fix race condition on hosts semaphore
2021-04-17 14:46:45 +03:00

c527d0b885
SIGINT (interrupt) signal receiving refactored; fixed it not working under some conditions
2021-04-17 14:11:17 +03:00

6a23efd175
JoinURL now returns *url.URL and error
2021-04-17 11:12:22 +03:00

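The commit only states the new return types. A minimal sketch of resolving a possibly relative link against a page URL with the standard `net/url` package, returning `*url.URL` and `error`; the `joinURL` function is hypothetical and is not Geziyor's actual `Response.JoinURL`:

```go
package main

import (
	"fmt"
	"net/url"
)

// joinURL resolves relative against base (RFC 3986 reference resolution)
// and reports parse failures to the caller instead of hiding them.
// Hypothetical helper for illustration only.
func joinURL(base, relative string) (*url.URL, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, fmt.Errorf("parsing base URL: %w", err)
	}
	relURL, err := url.Parse(relative)
	if err != nil {
		return nil, fmt.Errorf("parsing relative URL: %w", err)
	}
	return baseURL.ResolveReference(relURL), nil
}

func main() {
	u, err := joinURL("https://example.com/a/b", "../c?page=2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u.String()) // https://example.com/c?page=2
}
```

Returning the parsed `*url.URL` lets callers inspect host or query without re-parsing, and returning the error means a malformed `href` (for example `%zz`) is reported rather than silently producing an empty URL.
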
9ea67b3554
Use fmt.Errorf instead of errors package. This is the preferred convention since Go 1.13.
2021-04-17 11:11:29 +03:00

fbee722a38
Rate limiting per second implemented
2021-04-16 15:31:31 +03:00

d8252092f7
Add duplicate_requests_test.go
2021-04-16 14:43:42 +03:00

be4d13c0ef
Retry checking refactored to use a util function.
2021-04-14 09:32:42 +03:00

46c4db6b1a
Exporters now need to return an error. This was done to simplify error logging.
2021-04-14 09:30:17 +03:00

e3d79e2574
Added custom logger. Right now, not configurable.
2021-04-13 23:36:42 +03:00

129402d754
Updated chromedp
2021-01-28 20:50:25 +03:00

9b266b6cce
Allocator options added
2021-01-28 20:49:01 +03:00

29c29235ae
Fixed response error when retrying is disabled
2020-09-05 17:24:22 +03:00

7a76a9b95e
Allocators separated for transparency. Updated chrome library.
2020-09-05 16:14:41 +03:00

cfb16fe1ee
Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome
2019-12-13 00:03:44 +03:00

7d2fe57bab
Added error logging for HTML parser.
2019-12-11 13:55:38 +03:00

cbca22fefb
Updated chrome protocol library
2019-11-16 20:34:57 +03:00

6645820408
Added logging on allowed domains middleware and duplicate requests
2019-11-16 20:34:09 +03:00

9b8a3837bd
Added response joinURL test and updated chromedp.
2019-09-13 14:34:29 +03:00

3264057679
Fixed issue in JoinURL
2019-08-06 17:21:41 +03:00

86d4e80596
Added user-agent test, fixed failing test
2019-08-05 16:18:44 +03:00

85597219e6
Refactored client options
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00

0e5230eac8
Remote endpoint support added for JS-rendered requests. Geziyor is beta now.
2019-08-05 15:14:47 +03:00

c117d71fef
Updated license
2019-08-05 15:01:48 +03:00

32077d8433
Updated docs for rendered requests
2019-07-26 16:40:42 +03:00

e07ef4d66d
Fixed an important bug in rendering that caused a client request to be made as well. Updated chromedp dependency.
2019-07-26 16:07:09 +03:00

762854e511
Go 1.10 and 1.11 support added by using different methods of the reflect package.
2019-07-21 12:08:41 +03:00

df37629d4d
Disabled indenting in the JSON exporter as it looks ugly in exported data.
JSONLine still supports indenting.
2019-07-14 03:37:52 +03:00

dfabcb84fd
JSON renamed to JSONLine. JSON List support added.
2019-07-14 03:30:59 +03:00

d19465c44a
Robotstxt metrics added.
2019-07-08 14:51:54 +03:00

d3c4389c46
Retrying support added for Chrome. Fixed robots.txt retry issue. Fixed Meta issue.
2019-07-07 19:50:15 +03:00

90d2be2210
Caching policies added.
We used the httpcache library to implement this. As it was not possible to support different policies with it, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00

0d6c2a6864
Graceful shutdown system implemented
2019-07-06 18:32:13 +03:00

42faa92ece
Robots.txt support implemented
2019-07-06 16:18:03 +03:00

2cab68d2ce
Middlewares refactored into multiple files in the middleware package.
Extractors removed as they add complexity to the scraper, both in learning and in developing.
2019-07-04 21:04:29 +03:00

9adff75509
Retry requests support implemented for client.
2019-07-04 13:36:10 +03:00

da03567fae
Extractors refactored to support pass by value. Documentation added for request and response.
2019-07-04 02:13:29 +03:00

71683ec6de
Chardet removed as it's not good enough at detection. The built-in library is good enough.
2019-07-03 20:54:17 +03:00

33238bc875
Charset detection heuristics added with chardet lib.
2019-07-03 18:08:28 +03:00

b355a566cf
Added more tests and refactored exporter tests. Added code coverage badge.
2019-07-02 14:53:06 +03:00