Musab Gültekin
242b025c9a
Set cookie test
2021-08-08 22:08:47 +03:00
Musab Gültekin
d3bdaf6240
Added documentation and tests for request.Meta
2021-05-30 10:43:54 +03:00
Musab Gültekin
6a23efd175
JoinURL now returns *url.URL and error
2021-04-17 11:12:22 +03:00
Musab Gültekin
86d4e80596
Added user-agent test, Fixed failing test
2019-08-05 16:18:44 +03:00
Musab Gültekin
85597219e6
Refactored client options
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00
Musab Gültekin
0e5230eac8
Remote endpoint support added for js rendered requests. Geziyor is beta now.
2019-08-05 15:14:47 +03:00
Musab Gültekin
90d2be2210
Caching policies added.
We used the httpcache library to implement this. As it could not support different policies, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
Musab Gültekin
0d6c2a6864
Graceful shutdown system implemented
2019-07-06 18:32:13 +03:00
Musab Gültekin
42faa92ece
Robots.txt support implemented
2019-07-06 16:18:03 +03:00
Musab Gültekin
2cab68d2ce
Middlewares refactored to multiple files in middleware package.
Extractors removed as they add complexity to the scraper, both in learning and in development.
2019-07-04 21:04:29 +03:00
Musab Gültekin
9adff75509
Retry requests support implemented for client.
2019-07-04 13:36:10 +03:00
Musab Gültekin
da03567fae
Extractors refactored to support pass by value. Documentation added for request and response.
2019-07-04 02:13:29 +03:00
Musab Gültekin
33238bc875
Charset detection heuristics added with chardet lib.
2019-07-03 18:08:28 +03:00
Musab Gültekin
b355a566cf
Added more tests and refactored exporter tests. Added code coverage badge.
2019-07-02 14:53:06 +03:00
Musab Gültekin
4ab7cfd904
Exporter and Extractor interfaces moved to their own packages to simplify the main Geziyor package
2019-07-02 13:22:23 +03:00
Musab Gültekin
c0dd0393e6
Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests.
2019-07-01 15:44:28 +03:00
Musab Gültekin
80f3500a69
Fixed incorrect Chrome response on some sites.
2019-07-01 12:32:15 +03:00
Musab Gültekin
0eda056065
Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
Musab Gültekin
ec4551a8a0
Making Requests and reading responses refactored to client package.
2019-06-30 16:21:18 +03:00
Musab Gültekin
bd6466a5f2
http package renamed to client to reduce confusion
2019-06-29 14:18:31 +03:00
Musab Gültekin
1e109c555d
Request and response moved to http package
2019-06-29 13:36:39 +03:00
Musab Gültekin
276b248ebb
Synchronized requests support added. Benchmarks added.
2019-06-28 17:28:16 +03:00
Musab Gültekin
b000581c3d
Extractors implemented. Exporter names simplified. README updated for extracting data. Removed Go 1.11 support
2019-06-28 13:00:30 +03:00
Musab Gültekin
a64a262554
HTTP Client can be changed now. Docs updated.
2019-06-22 13:12:05 +03:00
Musab Gültekin
7bc782400c
Expvar metrics support added. Metrics refactored to its own package.
2019-06-21 21:37:25 +03:00
Musab Gültekin
ec83a92eb3
Response header support added for Chrome Rendering
2019-06-18 16:26:40 +03:00
Musab Gültekin
4177f10de9
Request creation simplified and basic auth test added.
2019-06-17 13:53:34 +03:00
Musab Gültekin
a5ec28664d
Cookies support added.
2019-06-17 13:31:19 +03:00
Musab Gültekin
80383ebd6f
Middlewares and some string util functions refactored. Added partial Documentation.
2019-06-16 10:38:03 +03:00
Musab Gültekin
ddff3aee25
Request cancellations support added to Middlewares.
Some core functions refactored as middlewares.
Fixed race condition in exporting system. Now, only one goroutine will be responsible for exporting. This fixes concurrency issues on writing.
2019-06-15 22:27:46 +03:00
Musab Gültekin
7b23596a2d
Middleware support added. HTML Parsing disable option added.
Goroutine leaks will be tested using leaktest lib.
2019-06-15 17:55:40 +03:00
Musab Gültekin
6caf1effd6
Rendered field exported to support rendered requests on Do function. Data races fixed.
2019-06-14 15:23:56 +03:00
Musab Gültekin
1a7d480b36
JS Rendered requests with Chrome support added
2019-06-13 22:08:45 +03:00
Musab Gültekin
8a6e19a031
New requests in the StartRequests func will be made using Geziyor's methods, not the Requests chan
Options field exported.
2019-06-13 14:06:37 +03:00
Musab Gültekin
184081d3bf
README updated for more advanced usage. Updated tests.
2019-06-12 22:22:01 +03:00
Musab Gültekin
d56ea161a5
Making new requests on StartRequestsFunc is simplified by using channels
2019-06-12 21:54:57 +03:00
Musab Gültekin
f7f4e401e2
Support added for attaching metadata to requests. StartRequests function implemented.
2019-06-12 21:30:45 +03:00
Musab Gültekin
bd8d58576f
Start requests function implemented.
2019-06-12 12:40:38 +03:00
Musab Gültekin
a311a0f998
CSV exporter support added. Not finished for map type.
2019-06-11 20:42:22 +03:00
Musab Gültekin
bbdc3bcacd
Exporters made optional, as some scrapers only want to see data in console.
2019-06-11 18:59:37 +03:00
Musab Gültekin
e4e8723426
Callbacks are now mandatory as almost all scrapers use them.
2019-06-11 14:24:48 +03:00
Musab Gültekin
ca2414c5c8
Request callbacks added.
Recover from all panics and continue scraping.
Only parse HTML if response is HTML.
2019-06-09 21:13:30 +03:00
Musab Gültekin
b973c1c064
Request delays support added
2019-06-09 14:24:53 +03:00
Musab Gültekin
9263877339
Support added for exporting data of all types.
2019-06-09 13:22:20 +03:00
Musab Gültekin
d967555b62
Global and Domain Concurrency limit implemented. Updated README
2019-06-09 11:53:40 +03:00
Musab Gültekin
b90908066b
Head API added. Opt renamed to Options. Tests updated. More documentation added.
2019-06-08 20:36:43 +03:00
Musab Gültekin
815ae7eec5
Do request support added. Updated docs.
2019-06-08 19:45:48 +03:00
Musab Gültekin
54c7d3550f
Gezer renamed to Geziyor
2019-06-08 17:14:10 +03:00