Albert Bronsky
|
f73f83e493
|
fixed empty context in call to NewRemoteAllocator
|
2021-08-08 14:07:08 +03:00 |
|
Musab Gültekin
|
d3bdaf6240
|
Added documentation and tests for request.Meta
|
2021-05-30 10:43:54 +03:00 |
|
Musab Gültekin
|
a2a91b7b2e
|
Default allocator options are used for rendered scraping. They can be changed by using a custom Client or by changing the client options after scraper creation.
|
2021-05-23 23:48:55 +03:00 |
|
Musab Gültekin
|
5aa2c2540e
|
Default client function moved to client_test.go as it's only used there.
|
2021-05-23 23:47:43 +03:00 |
|
Musab Gültekin
|
f35d34bc02
|
chromedp library updated.
|
2021-05-23 23:14:47 +03:00 |
|
Musab Gültekin
|
16265e524d
|
Response.JoinURL simplified.
|
2021-05-18 13:31:23 +03:00 |
|
Musab Gültekin
|
3c9a3849e2
|
Start command now waits for synchronized requests too. This fixes the case where synchronized requests are made from different goroutines.
It doesn't cause any issues for concurrent requests because we already wait for them.
|
2021-04-19 12:58:47 +03:00 |
|
Musab Gültekin
|
d28beca57a
|
Fix race condition on hosts semaphore
|
2021-04-17 14:46:45 +03:00 |
|
Musab Gültekin
|
c527d0b885
|
SIGINT (interrupt) signal handling refactored and fixed for some conditions where it did not work
|
2021-04-17 14:11:17 +03:00 |
|
Musab Gültekin
|
6a23efd175
|
JoinURL now returns *url.URL and error
|
2021-04-17 11:12:22 +03:00 |
|
Musab Gültekin
|
9ea67b3554
|
Use fmt.Errorf instead of the errors package. This is a good convention since Go 1.13.
|
2021-04-17 11:11:29 +03:00 |
|
Musab Gültekin
|
fbee722a38
|
Rate limiting per second implemented
|
2021-04-16 15:31:31 +03:00 |
|
Musab Gültekin
|
d8252092f7
|
Add duplicate_requests_test.go
|
2021-04-16 14:43:42 +03:00 |
|
Musab Gültekin
|
be4d13c0ef
|
Retry checking refactored using util function.
|
2021-04-14 09:32:42 +03:00 |
|
Musab Gültekin
|
46c4db6b1a
|
Exporters now need to return an error. This is done to allow simple error logging.
|
2021-04-14 09:30:17 +03:00 |
|
Musab Gültekin
|
e3d79e2574
|
Added custom logger. Right now, not configurable.
|
2021-04-13 23:36:42 +03:00 |
|
Musab Gültekin
|
129402d754
|
Updated chromedp
|
2021-01-28 20:50:25 +03:00 |
|
Musab Gültekin
|
9b266b6cce
|
Allocator options added
|
2021-01-28 20:49:01 +03:00 |
|
Musab Gültekin
|
29c29235ae
|
Fixed response error if retrying disabled
|
2020-09-05 17:24:22 +03:00 |
|
Musab Gültekin
|
7a76a9b95e
|
Allocators separated for transparency. Updated chrome library.
|
2020-09-05 16:14:41 +03:00 |
|
Musab Gültekin
|
cfb16fe1ee
|
Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome
|
2019-12-13 00:03:44 +03:00 |
|
Musab Gültekin
|
7d2fe57bab
|
Added error logging for HTML parser.
|
2019-12-11 13:55:38 +03:00 |
|
Musab Gültekin
|
cbca22fefb
|
Updated chrome protocol library
|
2019-11-16 20:34:57 +03:00 |
|
Musab Gültekin
|
6645820408
|
Added logging on allowed domains middleware and duplicate requests
|
2019-11-16 20:34:09 +03:00 |
|
Musab Gültekin
|
9b8a3837bd
|
Added response joinURL test and updated chromedp.
|
2019-09-13 14:34:29 +03:00 |
|
Musab Gültekin
|
3264057679
|
Fixed issue on JoinURL
|
2019-08-06 17:21:41 +03:00 |
|
Musab Gültekin
|
86d4e80596
|
Added user-agent test, Fixed failing test
|
2019-08-05 16:18:44 +03:00 |
|
Musab Gültekin
|
85597219e6
|
Refactored client options
Fixed default User-Agent string not being set.
|
2019-08-05 15:42:30 +03:00 |
|
Musab Gültekin
|
0e5230eac8
|
Remote endpoint support added for js rendered requests. Geziyor is beta now.
|
2019-08-05 15:14:47 +03:00 |
|
Musab Gültekin
|
c117d71fef
|
Updated license
|
2019-08-05 15:01:48 +03:00 |
|
Musab Gültekin
|
32077d8433
|
Updated docs for rendered requests
|
2019-07-26 16:40:42 +03:00 |
|
Musab Gültekin
|
e07ef4d66d
|
Fixed an important bug in rendering that caused a client request to be made too. Updated chromedp dependency.
|
2019-07-26 16:07:09 +03:00 |
|
Musab Gültekin
|
762854e511
|
Go 1.10 and 1.11 support added by using different methods on reflect package.
|
2019-07-21 12:08:41 +03:00 |
|
Musab Gültekin
|
df37629d4d
|
Disabled indenting on JSON exporter as it looks so ugly on exported data.
JSONLine still supports indenting.
|
2019-07-14 03:37:52 +03:00 |
|
Musab Gültekin
|
dfabcb84fd
|
JSON renamed to JSONLine. JSON List support added.
|
2019-07-14 03:30:59 +03:00 |
|
Musab Gültekin
|
d19465c44a
|
Robotstxt metrics added.
|
2019-07-08 14:51:54 +03:00 |
|
Musab Gültekin
|
d3c4389c46
|
Retrying support added for chrome. Fixed robots.txt retry issue. Fixed Meta issue
|
2019-07-07 19:50:15 +03:00 |
|
Musab Gültekin
|
90d2be2210
|
Caching policies added.
We used httpcache library to implement this. As it was not possible to support different policies, I mostly copied and modified it.
|
2019-07-07 12:18:40 +03:00 |
|
Musab Gültekin
|
0d6c2a6864
|
Graceful shut down system implemented
|
2019-07-06 18:32:13 +03:00 |
|
Musab Gültekin
|
42faa92ece
|
Robots.txt support implemented
|
2019-07-06 16:18:03 +03:00 |
|
Musab Gültekin
|
2cab68d2ce
|
Middlewares refactored to multiple files in middleware package.
Extractors removed as they add complexity to the scraper, both in learning and in development.
|
2019-07-04 21:04:29 +03:00 |
|
Musab Gültekin
|
9adff75509
|
Retry requests support implemented for client.
|
2019-07-04 13:36:10 +03:00 |
|
Musab Gültekin
|
da03567fae
|
Extractors refactored to support pass by value. Documentation added for request and response.
|
2019-07-04 02:13:29 +03:00 |
|
Musab Gültekin
|
71683ec6de
|
Chardet removed as it's not good enough at detection. The built-in library is good enough.
|
2019-07-03 20:54:17 +03:00 |
|
Musab Gültekin
|
33238bc875
|
Charset detection heuristics added with chardet lib.
|
2019-07-03 18:08:28 +03:00 |
|
Musab Gültekin
|
b355a566cf
|
Added more tests and refactored exporter tests. Added code coverage badge.
|
2019-07-02 14:53:06 +03:00 |
|
Musab Gültekin
|
4ab7cfd904
|
Exporter and Extractor interfaces moved to its own package for simplicity of main Geziyor package
|
2019-07-02 13:22:23 +03:00 |
|
Musab Gültekin
|
c0dd0393e6
|
Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests.
|
2019-07-01 15:44:28 +03:00 |
|
Musab Gültekin
|
80f3500a69
|
Fixed incorrect Chrome response on some sites.
|
2019-07-01 12:32:15 +03:00 |
|
Musab Gültekin
|
fb5b4e3406
|
README updated according to new package names
|
2019-06-30 22:21:36 +03:00 |
|