d19465c44a
Robots.txt metrics added.
2019-07-08 14:51:54 +03:00
d3c4389c46
Retry support added for Chrome. Fixed robots.txt retry issue. Fixed Meta issue.
2019-07-07 19:50:15 +03:00
90d2be2210
Caching policies added.
...
We used the httpcache library to implement this. As it could not support different policies as-is, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
0d6c2a6864
Graceful shutdown system implemented
2019-07-06 18:32:13 +03:00
42faa92ece
Robots.txt support implemented
2019-07-06 16:18:03 +03:00
2cab68d2ce
Middlewares refactored to multiple files in middleware package.
...
Extractors removed, as they add complexity to the scraper, both in learning and in development.
2019-07-04 21:04:29 +03:00
9adff75509
Retry requests support implemented for client.
2019-07-04 13:36:10 +03:00
da03567fae
Extractors refactored to support pass by value. Documentation added for request and response.
2019-07-04 02:13:29 +03:00
71683ec6de
Chardet removed as it's not good enough at detection. The built-in library is good enough.
2019-07-03 20:54:17 +03:00
33238bc875
Charset detection heuristics added with chardet lib.
2019-07-03 18:08:28 +03:00
b355a566cf
Added more tests and refactored exporter tests. Added code coverage badge.
2019-07-02 14:53:06 +03:00
4ab7cfd904
Exporter and Extractor interfaces moved to their own packages to simplify the main Geziyor package
2019-07-02 13:22:23 +03:00
c0dd0393e6
Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests.
2019-07-01 15:44:28 +03:00
80f3500a69
Fixed incorrect Chrome response on some sites.
2019-07-01 12:32:15 +03:00
fb5b4e3406
README updated according to new package names
2019-06-30 22:21:36 +03:00
0eda056065
Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
...
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
7c383b175f
Metrics Server support added for expvar. Refactored some methods.
2019-06-30 19:09:03 +03:00
ec4551a8a0
Making Requests and reading responses refactored to client package.
2019-06-30 16:21:18 +03:00
0eac5f5f40
Fixed exporter bug that caused the last exported items not to be written to disk.
2019-06-29 16:11:52 +03:00
bd6466a5f2
http package renamed to client to reduce confusion
2019-06-29 14:18:31 +03:00
1e109c555d
Request and response moved to http package
2019-06-29 13:36:39 +03:00
59757607eb
Pretty print exporter added. Panic counter added to metrics
2019-06-29 11:20:06 +03:00
276b248ebb
Synchronized requests support added. Benchmarks added.
2019-06-28 17:28:16 +03:00
b000581c3d
Extractors implemented. Exporters name simplified. README Updated for extracting data. Removed go 1.11 support
2019-06-28 13:00:30 +03:00
679fd8ab7a
Map support added for CSV exporter
2019-06-27 22:39:06 +03:00
8fe194bd10
Added options and tests for exporters.
2019-06-27 16:54:09 +03:00
d20ea47390
Fix Header conversion bug. Map was not canonicalizing keys.
2019-06-22 15:04:08 +03:00
02df5aa4e8
Fixed issues with non-trailing URLs on rendered requests
2019-06-22 14:47:12 +03:00
92e7cfefec
Fixed README Doc.
2019-06-22 13:13:33 +03:00
a64a262554
HTTP Client can be changed now. Docs updated.
2019-06-22 13:12:05 +03:00
7bc782400c
Expvar metrics support added. Metrics refactored to its own package.
2019-06-21 21:37:25 +03:00
88c4b1dd35
Prometheus metrics support added.
2019-06-21 20:05:28 +03:00
141bab0d05
Error handling improved
2019-06-20 10:14:36 +03:00
f88b88986c
Delays and logs refactored as middlewares.
2019-06-20 09:54:30 +03:00
514fe2e8d2
Recover system refactored as a middleware
2019-06-19 22:45:40 +03:00
c28b228a12
Response header bug fixed for Chrome
2019-06-18 16:37:06 +03:00
ec83a92eb3
Response header support added for Chrome Rendering
2019-06-18 16:26:40 +03:00
217f3c96df
Header and native http.Response support added for Chrome rendering
2019-06-18 16:16:29 +03:00
936d157785
Revert "Try parsing HTML even if content-type is empty."
...
This reverts commit f384fc2c
2019-06-18 13:03:00 +03:00
f384fc2c13
Try parsing HTML even if content-type is empty.
2019-06-18 13:00:16 +03:00
4177f10de9
Request creation simplified and basic auth test added.
2019-06-17 13:53:34 +03:00
a5ec28664d
Cookies support added.
2019-06-17 13:31:19 +03:00
dd6687f976
Fixed build issue
2019-06-17 12:21:40 +03:00
e50fa3b1dc
Response middlewares support implemented.
2019-06-16 18:29:07 +03:00
80383ebd6f
Middlewares and some string util functions refactored. Added partial Documentation.
2019-06-16 10:38:03 +03:00
40f673f2e2
Fixed README. More Go versions added for testing
2019-06-15 22:35:51 +03:00
ddff3aee25
Request cancellations support added to Middlewares.
...
Some core functions refactored as middlewares.
Fixed race condition in exporting system. Now, only one goroutine will be responsible for exporting. This fixes concurrency issues on writing.
2019-06-15 22:27:46 +03:00
83a7b9eb87
Merge pull request #4 from NMelis/master
...
Create CONTRIBUTING.md
2019-06-15 18:09:58 +03:00
f65456f18c
Update CONTRIBUTING.md
2019-06-15 18:08:27 +03:00
7b23596a2d
Middleware support added. HTML Parsing disable option added.
...
Goroutine leaks will be tested using leaktest lib.
2019-06-15 17:55:40 +03:00