140 Commits

Author SHA1 Message Date
Musab Gültekin
2cab68d2ce Middlewares refactored to multiple files in middleware package.
Extractors removed as they introduce complexity to the scraper, both in learning and developing.
2019-07-04 21:04:29 +03:00
Musab Gültekin
9adff75509 Retry requests support implemented for client. 2019-07-04 13:36:10 +03:00
Musab Gültekin
da03567fae Extractors refactored to support pass by value. Documentation added for request and response. 2019-07-04 02:13:29 +03:00
Musab Gültekin
71683ec6de Chardet removed as it's not good enough to detect. Built-in library is good enough. 2019-07-03 20:54:17 +03:00
Musab Gültekin
33238bc875 Charset detection heuristics added with chardet lib. 2019-07-03 18:08:28 +03:00
Musab Gültekin
b355a566cf Added more tests and refactored exporter tests. Added code coverage badge. 2019-07-02 14:53:06 +03:00
Musab Gültekin
4ab7cfd904 Exporter and Extractor interfaces moved to their own packages for simplicity of the main Geziyor package 2019-07-02 13:22:23 +03:00
Musab Gültekin
c0dd0393e6 Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests. 2019-07-01 15:44:28 +03:00
Musab Gültekin
80f3500a69 Fixed Chrome response not right on some sites. 2019-07-01 12:32:15 +03:00
Musab Gültekin
fb5b4e3406 README updated according to new package names 2019-06-30 22:21:36 +03:00
Musab Gültekin
0eda056065 Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
Musab Gültekin
7c383b175f Metrics Server support added for expvar. Refactored some methods. 2019-06-30 19:09:03 +03:00
Musab Gültekin
ec4551a8a0 Making Requests and reading responses refactored to client package. 2019-06-30 16:21:18 +03:00
Musab Gültekin
0eac5f5f40 Fixed exporters bug that was causing last exported items not written to disk. 2019-06-29 16:11:52 +03:00
Musab Gültekin
bd6466a5f2 http package renamed to client to reduce confusion 2019-06-29 14:18:31 +03:00
Musab Gültekin
1e109c555d Request and response moved to http package 2019-06-29 13:36:39 +03:00
Musab Gültekin
59757607eb Pretty print exporter added. Panic counter added to metrics 2019-06-29 11:20:06 +03:00
Musab Gültekin
276b248ebb Synchronized requests support added. Benchmarks added. 2019-06-28 17:28:16 +03:00
Musab Gültekin
b000581c3d Extractors implemented. Exporters name simplified. README Updated for extracting data. Removed go 1.11 support 2019-06-28 13:00:30 +03:00
Musab Gültekin
679fd8ab7a Map support added for CSV exporter 2019-06-27 22:39:06 +03:00
Musab Gültekin
8fe194bd10 Added options and tests for exporters. 2019-06-27 16:54:09 +03:00
Musab Gültekin
d20ea47390 Fix Header conversion bug. Map was not canonicalizing keys 2019-06-22 15:04:08 +03:00
Musab Gültekin
02df5aa4e8 Fixed issues on non-trailing URLS on rendered requests 2019-06-22 14:47:12 +03:00
Musab Gültekin
92e7cfefec Fixed README Doc. 2019-06-22 13:13:33 +03:00
Musab Gültekin
a64a262554 HTTP Client can be changed now. Docs updated. 2019-06-22 13:12:05 +03:00
Musab Gültekin
7bc782400c Expvar metrics support added. Metrics refactored to its own package. 2019-06-21 21:37:25 +03:00
Musab Gültekin
88c4b1dd35 Prometheus metrics support added. 2019-06-21 20:05:28 +03:00
Musab Gültekin
141bab0d05 Error handling improved 2019-06-20 10:14:36 +03:00
Musab Gültekin
f88b88986c Delays and logs refactored as middlewares. 2019-06-20 09:54:30 +03:00
Musab Gültekin
514fe2e8d2 Recover system refactored like middleware 2019-06-19 22:45:40 +03:00
Musab Gültekin
c28b228a12 Response header bug fixed for Chrome 2019-06-18 16:37:06 +03:00
Musab Gültekin
ec83a92eb3 Response header support added for Chrome Rendering 2019-06-18 16:26:40 +03:00
Musab Gültekin
217f3c96df Header and native http.Response support added for Chrome rendering 2019-06-18 16:16:29 +03:00
Musab Gültekin
936d157785 Revert "Try parsing HTML even if content-type is empty."
This reverts commit f384fc2c
2019-06-18 13:03:00 +03:00
Musab Gültekin
f384fc2c13 Try parsing HTML even if content-type is empty. 2019-06-18 13:00:16 +03:00
Musab Gültekin
4177f10de9 Request creation simplified and basic auth test added. 2019-06-17 13:53:34 +03:00
Musab Gültekin
a5ec28664d Cookies support added. 2019-06-17 13:31:19 +03:00
Musab Gültekin
dd6687f976 Fixed build issue 2019-06-17 12:21:40 +03:00
Musab Gültekin
e50fa3b1dc Response middlewares support implemented. 2019-06-16 18:29:07 +03:00
Musab Gültekin
80383ebd6f Middlewares and some string util functions refactored. Added partial Documentation. 2019-06-16 10:38:03 +03:00
Musab Gültekin
40f673f2e2 Fixed README. More Go versions added for testing 2019-06-15 22:35:51 +03:00
Musab Gültekin
ddff3aee25 Request cancellations support added to Middlewares.
Some core functions refactored as middlewares.
Fixed race condition in exporting system. Now, only one goroutine will be responsible for exporting. This fixes concurrency issues on writing.
2019-06-15 22:27:46 +03:00
Musab Gültekin
83a7b9eb87
Merge pull request #4 from NMelis/master
Create CONTRIBUTING.md
2019-06-15 18:09:58 +03:00
Musab Gültekin
f65456f18c
Update CONTRIBUTING.md 2019-06-15 18:08:27 +03:00
Musab Gültekin
7b23596a2d Middleware support added. HTML Parsing disable option added.
Goroutine leaks will be tested using leaktest lib.
2019-06-15 17:55:40 +03:00
Melis Nurlan
2e29c47acd
Create CONTRIBUTING.md
could add a description of how to become a contributor?
2019-06-15 19:43:49 +07:00
Musab Gültekin
4799b0f7b4 Fixed goroutine leaks. Updated travis build 2019-06-14 17:30:49 +03:00
Musab Gültekin
f5b3b0d049 Fixed race conditions on exporters.
MaxIdleConns limit disabled to support unlimited requests to all hosts.
MaxIdleConnsPerHost limit increased to speed up requests to same host.
2019-06-14 16:10:36 +03:00
Musab Gültekin
83bfb01856
Merge pull request #3 from isacikgoz/master
Update README.md
2019-06-14 15:34:31 +03:00
Musab Gültekin
b2f32b8830
Merge branch 'master' into master 2019-06-14 15:32:36 +03:00