Musab Gültekin
|
cbca22fefb
|
Updated chrome protocol library
|
2019-11-16 20:34:57 +03:00 |
|
Musab Gültekin
|
9b8a3837bd
|
Added response joinURL test and updated chromedp.
|
2019-09-13 14:34:29 +03:00 |
|
Musab Gültekin
|
e07ef4d66d
|
Fixed an important rendering bug that caused a client request to be made as well. Updated chromedp dependency.
|
2019-07-26 16:07:09 +03:00 |
|
Musab Gültekin
|
90d2be2210
|
Caching policies added.
We used the httpcache library to implement this. As it could not support different policies, I mostly copied and modified it.
|
2019-07-07 12:18:40 +03:00 |
|
Musab Gültekin
|
42faa92ece
|
Robots.txt support implemented
|
2019-07-06 16:18:03 +03:00 |
|
Musab Gültekin
|
71683ec6de
|
Chardet removed as it's not good enough at detection. The built-in library is good enough.
|
2019-07-03 20:54:17 +03:00 |
|
Musab Gültekin
|
33238bc875
|
Charset detection heuristics added with chardet lib.
|
2019-07-03 18:08:28 +03:00 |
|
Musab Gültekin
|
b355a566cf
|
Added more tests and refactored exporter tests. Added code coverage badge.
|
2019-07-02 14:53:06 +03:00 |
|
Musab Gültekin
|
7bc782400c
|
Expvar metrics support added. Metrics refactored to its own package.
|
2019-06-21 21:37:25 +03:00 |
|
Musab Gültekin
|
88c4b1dd35
|
Prometheus metrics support added.
|
2019-06-21 20:05:28 +03:00 |
|
Musab Gültekin
|
141bab0d05
|
Error handling improved
|
2019-06-20 10:14:36 +03:00 |
|
Musab Gültekin
|
f384fc2c13
|
Try parsing HTML even if content-type is empty.
|
2019-06-18 13:00:16 +03:00 |
|
Musab Gültekin
|
7b23596a2d
|
Middleware support added. Option to disable HTML parsing added.
Goroutine leaks will be tested using leaktest lib.
|
2019-06-15 17:55:40 +03:00 |
|
Musab Gültekin
|
6caf1effd6
|
Rendered field exported to support rendered requests on Do function. Data races fixed.
|
2019-06-14 15:23:56 +03:00 |
|
Musab Gültekin
|
1a7d480b36
|
Support for JS-rendered requests with Chrome added.
|
2019-06-13 22:08:45 +03:00 |
|
Musab Gültekin
|
e4e8723426
|
Callbacks are now mandatory as almost all the scrapers use them.
|
2019-06-11 14:24:48 +03:00 |
|
Musab Gültekin
|
ca2414c5c8
|
Request callbacks added.
Recover from all panics and continue scraping.
Only parse HTML if response is HTML.
|
2019-06-09 21:13:30 +03:00 |
|
Musab Gültekin
|
a9aaf86df3
|
Automatically determine the response and decode it.
|
2019-06-09 10:46:32 +03:00 |
|
Musab Gültekin
|
54c7d3550f
|
Gezer renamed to Geziyor
|
2019-06-08 17:14:10 +03:00 |
|
Musab Gültekin
|
ca197ff06a
|
Caching added.
JSON File export will append, not truncate.
|
2019-06-08 15:29:09 +03:00 |
|
Musab Gültekin
|
6358b87472
|
Use parse function to parse responses, instead of channels.
Parse response as HTML Document using goquery.
Added simple README.
|
2019-06-06 22:48:57 +03:00 |
|
Musab Gültekin
|
1c96048082
|
Initial commit
|
2019-06-06 17:11:19 +03:00 |
|