Commit Graph

  • 88f37ecc2d Backup master Administrator 2024-09-05 18:16:17 +08:00
  • 688c516c9f Initialize Administrator 2024-09-04 16:48:42 +08:00
  • 229b8ca83a Merge pull request #76 from melroy89/update_deps Musab Gultekin 2024-08-12 09:15:56 +03:00
  • 3a0e16934e Update go mods/sum add mod test json files to ignore list Melroy van den Berg 2024-08-11 21:43:29 +02:00
  • a242b58aaa Merge pull request #64 from geziyor/dependabot/go_modules/golang.org/x/text-0.3.8 Musab Gultekin 2023-03-15 16:51:10 +03:00
  • 00c8fa909c Merge pull request #66 from Gnoale/header Musab Gultekin 2023-03-15 16:50:19 +03:00
  • 89c5699cfa fix header nil map assignment Gnoale 2023-03-15 11:07:23 +01:00
  • 6b9a390735 Bump golang.org/x/text from 0.3.7 to 0.3.8 dependabot[bot] 2023-02-23 09:30:35 +00:00
  • 555cdee597 Merge pull request #62 from cristoper/patch-1 Musab Gultekin 2023-02-20 13:14:02 -05:00
  • 9a6a7617b4 PrettyPrint should conform to Exporter interface chris 2023-02-18 13:02:57 -07:00
  • 7349b81754 Merge pull request #58 from glaslos/fix_test Musab Gultekin 2022-12-23 19:32:47 +03:00
  • d9ac07754f more fixes Lukas Rist 2022-12-23 16:00:48 +01:00
  • 6d8cc07ce8 Merge pull request #57 from glaslos/fix_test Musab Gultekin 2022-12-23 17:58:09 +03:00
  • 85d73be641 fix test/code validation issues Lukas Rist 2022-12-23 15:51:54 +01:00
  • 738852f932 Add custom actions for rendered requests & Fix not closing bug Musab Gültekin 2022-04-29 03:05:31 +03:00
  • 34d17a2d3d Merge pull request #44 from harnnless/patch-1 Musab Gültekin 2021-12-11 14:04:10 +03:00
  • b0fd08c670 Fix HTTP2 support Harm Less 2021-12-11 18:05:17 +08:00
  • 369b42cbc6 Merge pull request #39 from Walker088/wrap_post_method_support Musab Gültekin 2021-10-21 22:19:25 +03:00
  • 88238010b2 fix: DeepSource, Unused parameter detected in function walker088 2021-10-21 09:40:09 -03:00
  • b1e4683037 Feature: Implement Geziyor.Post which wraps the httpClient(POST) 1. Implement Geziyor.Post by the same style of Geziyor.Head 2. Add two examples in geziyor_test (TestPostJson, TestPostFormUrlEncoded) walker088 2021-10-21 09:06:34 -03:00
  • 6415a775f4 Fix exporter bug Musab Gültekin 2021-10-14 21:54:46 +03:00
  • b8bda36f92 JoinURL deprecated Musab Gültekin 2021-10-05 22:13:00 +03:00
  • 019fe62883 Merge pull request #37 from geziyor/proxy-support Musab Gültekin 2021-10-05 21:59:10 +03:00
  • 97ecb7f118 Proxy support Musab Gültekin 2021-09-24 16:15:20 +03:00
  • 110394a753 Add .deepsource.toml DeepSource Bot 2021-08-30 18:29:32 +00:00
  • 242b025c9a Set cookie test Musab Gültekin 2021-08-08 22:08:13 +03:00
  • 53a91d63d6 Merge pull request #29 from albertbronsky/fix-remote-allocator Musab Gültekin 2021-08-08 21:56:27 +03:00
  • fc67cec165 Update chromedp Musab Gültekin 2021-08-08 21:29:08 +03:00
  • f73f83e493 fixed empty context in call to NewRemoteAllocator Albert Bronsky 2021-08-08 14:07:08 +03:00
  • d3bdaf6240 Added documentation and tests for request.Meta Musab Gültekin 2021-05-30 10:43:54 +03:00
  • a2a91b7b2e Default allocator options are used on rendered scraping. It can be changed using custom Client or changing client options after scraper creation. Musab Gültekin 2021-05-23 23:48:55 +03:00
  • 5aa2c2540e Default client function moved to client_test.go as it's only used there. Musab Gültekin 2021-05-23 23:47:43 +03:00
  • f35d34bc02 chromedp library updated. Musab Gültekin 2021-05-23 23:14:47 +03:00
  • 16265e524d Response.JoinURL simplified. Musab Gültekin 2021-05-18 13:31:23 +03:00
  • 3c9a3849e2 Start command now waits for synchronized requests too. This fixes if requests are made using different goroutines with synchronized requests. It doesn't cause any issues on concurrent requests because we already wait for them. Musab Gültekin 2021-04-19 12:58:47 +03:00
  • d28beca57a Fix race condition on hosts semaphore Musab Gültekin 2021-04-17 14:46:45 +03:00
  • c527d0b885 SIGINT (interrupt) signal receiving refactored and fixed working on some conditions Musab Gültekin 2021-04-17 14:11:17 +03:00
  • 6a23efd175 JoinURL now returns *url.URL and error Musab Gültekin 2021-04-17 11:12:22 +03:00
  • 9ea67b3554 Use fmt.Errorf instead of errors package. This is a good convention since Go 1.13 Musab Gültekin 2021-04-17 11:11:29 +03:00
  • fbee722a38 Rate limiting per second implemented Musab Gültekin 2021-04-16 15:31:31 +03:00
  • d8252092f7 Add duplicate_requests_test.go Musab Gültekin 2021-04-16 14:43:42 +03:00
  • be4d13c0ef Retry checking refactored using util function. Musab Gültekin 2021-04-14 09:32:42 +03:00
  • 46c4db6b1a Exporters now need to return error. This is done because of simple error logging. Musab Gültekin 2021-04-14 09:30:17 +03:00
  • e3d79e2574 Added custom logger. Right now, not configurable. Musab Gültekin 2021-04-13 23:36:42 +03:00
  • 129402d754 Updated chromedp Musab Gültekin 2021-01-28 20:50:25 +03:00
  • 9b266b6cce Allocator options added Musab Gültekin 2021-01-28 20:49:01 +03:00
  • 29c29235ae Fixed response error if retrying disabled Musab Gültekin 2020-09-05 17:24:22 +03:00
  • 7a76a9b95e Allocators separated for transparency. Updated chrome library. Musab Gültekin 2020-09-05 16:14:41 +03:00
  • cfb16fe1ee Call ErrorFunc on errors. Unexport DoRequestClient and DoRequestChrome Musab Gültekin 2019-12-13 00:03:44 +03:00
  • 7d2fe57bab Added error logging for HTML parser. Musab Gültekin 2019-12-11 13:55:21 +03:00
  • cbca22fefb Updated chrome protocol library Musab Gültekin 2019-11-16 20:34:57 +03:00
  • 6645820408 Added logging on allowed domains middleware and duplicate requests Musab Gültekin 2019-11-16 20:34:09 +03:00
  • 9b8a3837bd Added response joinURL test and updated chromedp. Musab Gültekin 2019-09-13 14:34:29 +03:00
  • 3264057679 Fixed issue on JoinURL Musab Gültekin 2019-08-06 17:21:41 +03:00
  • 86d4e80596 Added user-agent test, Fixed failing test Musab Gültekin 2019-08-05 16:18:44 +03:00
  • 85597219e6 Refactored client options Fixed default User-Agent string not being set. Musab Gültekin 2019-08-05 15:42:30 +03:00
  • 0e5230eac8 Remote endpoint support added for js rendered requests. Geziyor is beta now. Musab Gültekin 2019-08-05 15:14:47 +03:00
  • c117d71fef Updated license Musab Gültekin 2019-08-05 15:01:48 +03:00
  • 32077d8433 Updated docs for rendered requests Musab Gültekin 2019-07-26 16:40:42 +03:00
  • e07ef4d66d Fixed important bug on rendering that was causing a client request to be made too. Updated chromedp dependency Musab Gültekin 2019-07-26 16:07:09 +03:00
  • 762854e511 Go 1.10 and 1.11 support added by using different methods on reflect package. Musab Gültekin 2019-07-21 12:08:41 +03:00
  • df37629d4d Disabled indenting on JSON exporter as it looks so ugly on exported data. JSONLine still supports indenting. Musab Gültekin 2019-07-14 03:37:52 +03:00
  • dfabcb84fd JSON renamed to JSONLine. JSON List support added. Musab Gültekin 2019-07-14 03:30:59 +03:00
  • d19465c44a Robotstxt metrics added. Musab Gültekin 2019-07-08 14:51:54 +03:00
  • d3c4389c46 Retrying support added for chrome. Fixed robots.txt retry issue. Fixed Meta issue Musab Gültekin 2019-07-07 19:50:15 +03:00
  • 90d2be2210 Caching policies added. We used httpcache library to implement this. As it was not possible to support different policies, I mostly copied and modified it. Musab Gültekin 2019-07-07 12:18:40 +03:00
  • 0d6c2a6864 Graceful shut down system implemented Musab Gültekin 2019-07-06 18:32:13 +03:00
  • 42faa92ece Robots.txt support implemented Musab Gültekin 2019-07-06 16:18:03 +03:00
  • 2cab68d2ce Middlewares refactored to multiple files in middleware package. Extractors removed as they introduce complexity to scraper. Both in learning and developing. Musab Gültekin 2019-07-04 21:04:29 +03:00
  • 9adff75509 Retry requests support implemented for client. Musab Gültekin 2019-07-04 13:36:10 +03:00
  • da03567fae Extractors refactored to support pass by value. Documentation added for request and response. Musab Gültekin 2019-07-04 02:13:29 +03:00
  • 71683ec6de Chardet removed as it's not good enough to detect. Built-in library is good enough. Musab Gültekin 2019-07-03 20:54:17 +03:00
  • 33238bc875 Charset detection heuristics added with chardet lib. Musab Gültekin 2019-07-03 18:08:28 +03:00
  • b355a566cf Added more tests and refactored exporter tests. Added code coverage badge. Musab Gültekin 2019-07-02 14:53:06 +03:00
  • 4ab7cfd904 Exporter and Extractor interfaces moved to its own package for simplicity of main Geziyor package Musab Gültekin 2019-07-02 13:22:23 +03:00
  • c0dd0393e6 Maximum redirection option added. Performance improvement on exports. Duplicate requests only checked on GET requests. Musab Gültekin 2019-07-01 15:44:28 +03:00
  • 80f3500a69 Fixed Chrome response not right on some sites. Musab Gültekin 2019-07-01 12:32:15 +03:00
  • fb5b4e3406 README updated according to new package names Musab Gültekin 2019-06-30 22:21:36 +03:00
  • 0eda056065 Attribute extractor added. HTML extractor added. Outer HTML Extractor added. exporter package renamed to export, extractor package renamed to extract for simplicity. Musab Gültekin 2019-06-30 22:20:17 +03:00
  • 7c383b175f Metrics Server support added for expvar. Refactored some methods. Musab Gültekin 2019-06-30 19:09:03 +03:00
  • ec4551a8a0 Making Requests and reading responses refactored to client package. Musab Gültekin 2019-06-30 16:21:18 +03:00
  • 0eac5f5f40 Fixed exporters bug that was causing last exported items not written to disk. Musab Gültekin 2019-06-29 16:11:52 +03:00
  • bd6466a5f2 http package renamed to client to reduce confusion Musab Gültekin 2019-06-29 14:18:31 +03:00
  • 1e109c555d Request and response moved to http package Musab Gültekin 2019-06-29 13:36:39 +03:00
  • 59757607eb Pretty print exporter added. Panic counter added to metrics Musab Gültekin 2019-06-29 11:20:06 +03:00
  • 276b248ebb Synchronized requests support added. Benchmarks added. Musab Gültekin 2019-06-28 17:28:16 +03:00
  • b000581c3d Extractors implemented. Exporters name simplified. README Updated for extracting data. Removed go 1.11 support Musab Gültekin 2019-06-28 13:00:30 +03:00
  • 679fd8ab7a Map support added for CSV exporter Musab Gültekin 2019-06-27 22:39:06 +03:00
  • 8fe194bd10 Added options and tests for exporters. Musab Gültekin 2019-06-27 16:54:09 +03:00
  • d20ea47390 Fix Header conversion bug. Map was not canonicalizing keys Musab Gültekin 2019-06-22 15:04:08 +03:00
  • 02df5aa4e8 Fixed issues on non-trailing URLS on rendered requests Musab Gültekin 2019-06-22 14:47:12 +03:00
  • 92e7cfefec Fixed README Doc. Musab Gültekin 2019-06-22 13:13:33 +03:00
  • a64a262554 HTTP Client can be changed now. Docs updated. Musab Gültekin 2019-06-22 13:12:05 +03:00
  • 7bc782400c Expvar metrics support added. Metrics refactored to its own package. Musab Gültekin 2019-06-21 21:37:25 +03:00
  • 88c4b1dd35 Prometheus metrics support added. Musab Gültekin 2019-06-21 20:05:28 +03:00
  • 141bab0d05 Error handling improved Musab Gültekin 2019-06-20 10:14:36 +03:00
  • f88b88986c Delays and logs refactored as middlewares. Musab Gültekin 2019-06-20 09:54:30 +03:00
  • 514fe2e8d2 Recover system refactored like middleware Musab Gültekin 2019-06-19 22:45:40 +03:00
  • c28b228a12 Response header bug fixed for Chrome Musab Gültekin 2019-06-18 16:37:06 +03:00
  • ec83a92eb3 Response header support added for Chrome Rendering Musab Gültekin 2019-06-18 16:26:40 +03:00