Administrator
88f37ecc2d
Backup
2024-09-05 18:16:17 +08:00
Administrator
688c516c9f
Initialize
2024-09-04 16:48:42 +08:00
Musab Gültekin
738852f932
Add custom actions for rendered requests & Fix not closing bug
2022-04-29 03:05:31 +03:00
walker088
b1e4683037
Feature: Implement Geziyor.Post which wraps the httpClient(POST)
...
1. Implement Geziyor.Post in the same style as Geziyor.Head
2. Add two examples in geziyor_test (TestPostJson, TestPostFormUrlEncoded)
issue #38
2021-10-21 09:06:34 -03:00
Musab Gültekin
6415a775f4
Fix exporter bug
2021-10-14 21:54:46 +03:00
Musab Gültekin
97ecb7f118
Proxy support
2021-09-24 16:15:20 +03:00
Musab Gültekin
a2a91b7b2e
Default allocator options are used on rendered scraping. It can be changed using custom Client or changing client options after scraper creation.
2021-05-23 23:48:55 +03:00
Musab Gültekin
3c9a3849e2
Start command now waits for synchronized requests too. This fixes the case where requests are made from different goroutines with synchronized requests.
...
It doesn't cause any issues on concurrent requests because we already wait for them.
2021-04-19 12:58:47 +03:00
Musab Gültekin
d28beca57a
Fix race condition on hosts semaphore
2021-04-17 14:46:45 +03:00
Musab Gültekin
c527d0b885
SIGINT (interrupt) signal handling refactored and fixed so that it works under some conditions
2021-04-17 14:11:17 +03:00
Musab Gültekin
fbee722a38
Rate limiting per second implemented
2021-04-16 15:31:31 +03:00
Musab Gültekin
46c4db6b1a
Exporters now need to return an error. This is done to keep error logging simple.
2021-04-14 09:30:17 +03:00
Musab Gültekin
e3d79e2574
Added custom logger. Right now, not configurable.
2021-04-13 23:36:42 +03:00
Musab Gültekin
cfb16fe1ee
Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome
2019-12-13 00:03:44 +03:00
Musab Gültekin
85597219e6
Refactored client options
...
Fixed default User-Agent string not being set.
2019-08-05 15:42:30 +03:00
Musab Gültekin
0e5230eac8
Remote endpoint support added for js rendered requests. Geziyor is beta now.
2019-08-05 15:14:47 +03:00
Musab Gültekin
d19465c44a
Robotstxt metrics added.
2019-07-08 14:51:54 +03:00
Musab Gültekin
90d2be2210
Caching policies added.
...
We used httpcache library to implement this. As it was not possible to support different policies, I mostly copied and modified it.
2019-07-07 12:18:40 +03:00
Musab Gültekin
0d6c2a6864
Graceful shut down system implemented
2019-07-06 18:32:13 +03:00
Musab Gültekin
42faa92ece
Robots.txt support implemented
2019-07-06 16:18:03 +03:00
Musab Gültekin
2cab68d2ce
Middlewares refactored to multiple files in middleware package.
...
Extractors removed as they add complexity to the scraper, both in learning and in development.
2019-07-04 21:04:29 +03:00
Musab Gültekin
9adff75509
Retry requests support implemented for client.
2019-07-04 13:36:10 +03:00
Musab Gültekin
33238bc875
Charset detection heuristics added with chardet lib.
2019-07-03 18:08:28 +03:00
Musab Gültekin
4ab7cfd904
Exporter and Extractor interfaces moved to their own packages to simplify the main Geziyor package
2019-07-02 13:22:23 +03:00
Musab Gültekin
c0dd0393e6
Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests.
2019-07-01 15:44:28 +03:00
Musab Gültekin
80f3500a69
Fixed incorrect Chrome response on some sites.
2019-07-01 12:32:15 +03:00
Musab Gültekin
0eda056065
Attribute extractor added. HTML extractor added. Outer HTML Extractor added.
...
exporter package renamed to export, extractor package renamed to extract for simplicity.
2019-06-30 22:20:17 +03:00
Musab Gültekin
7c383b175f
Metrics Server support added for expvar. Refactored some methods.
2019-06-30 19:09:03 +03:00
Musab Gültekin
ec4551a8a0
Making Requests and reading responses refactored to client package.
2019-06-30 16:21:18 +03:00
Musab Gültekin
0eac5f5f40
Fixed exporter bug that caused the last exported items not to be written to disk.
2019-06-29 16:11:52 +03:00
Musab Gültekin
bd6466a5f2
http package renamed to client to reduce confusion
2019-06-29 14:18:31 +03:00
Musab Gültekin
1e109c555d
Request and response moved to http package
2019-06-29 13:36:39 +03:00
Musab Gültekin
59757607eb
Pretty print exporter added. Panic counter added to metrics
2019-06-29 11:20:06 +03:00
Musab Gültekin
276b248ebb
Synchronized requests support added. Benchmarks added.
2019-06-28 17:28:16 +03:00
Musab Gültekin
b000581c3d
Extractors implemented. Exporter names simplified. README updated for extracting data. Removed Go 1.11 support
2019-06-28 13:00:30 +03:00
Musab Gültekin
02df5aa4e8
Fixed issues with non-trailing URLs on rendered requests
2019-06-22 14:47:12 +03:00
Musab Gültekin
a64a262554
HTTP Client can be changed now. Docs updated.
2019-06-22 13:12:05 +03:00
Musab Gültekin
7bc782400c
Expvar metrics support added. Metrics refactored to its own package.
2019-06-21 21:37:25 +03:00
Musab Gültekin
88c4b1dd35
Prometheus metrics support added.
2019-06-21 20:05:28 +03:00
Musab Gültekin
141bab0d05
Error handling improved
2019-06-20 10:14:36 +03:00
Musab Gültekin
f88b88986c
Delays and logs refactored as middlewares.
2019-06-20 09:54:30 +03:00
Musab Gültekin
514fe2e8d2
Recover system refactored as a middleware
2019-06-19 22:45:40 +03:00
Musab Gültekin
c28b228a12
Response header bug fixed for Chrome
2019-06-18 16:37:06 +03:00
Musab Gültekin
ec83a92eb3
Response header support added for Chrome Rendering
2019-06-18 16:26:40 +03:00
Musab Gültekin
217f3c96df
Header and native http.Response support added for Chrome rendering
2019-06-18 16:16:29 +03:00
Musab Gültekin
4177f10de9
Request creation simplified and basic auth test added.
2019-06-17 13:53:34 +03:00
Musab Gültekin
a5ec28664d
Cookies support added.
2019-06-17 13:31:19 +03:00
Musab Gültekin
e50fa3b1dc
Response middlewares support implemented.
2019-06-16 18:29:07 +03:00
Musab Gültekin
80383ebd6f
Middlewares and some string util functions refactored. Added partial Documentation.
2019-06-16 10:38:03 +03:00
Musab Gültekin
ddff3aee25
Request cancellations support added to Middlewares.
...
Some core functions refactored as middlewares.
Fixed race condition in exporting system. Now, only one goroutine will be responsible for exporting. This fixes concurrency issues on writing.
2019-06-15 22:27:46 +03:00