Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others



r - RCurl Cookies on Debian

I am downloading large batches of PDFs from parliament websites. I have scraped the PDF addresses and am now trying to download the files.

To do this, I set up a Debian instance on a university cloud.

It worked fine for most of them, but for four parliaments I downloaded an error page asking me to accept cookies instead of the document. The result is an HTML page with a .pdf file extension that contains mainly the question of whether I accept cookies.

This error does not happen on either Ubuntu or Windows 10. I assume it works there because I accepted the cookies in the browser on those machines. I changed my code to use RCurl and exported the cookies as txt files, based on two answers I found on Stack Overflow.

I used the following example. As mentioned, it works on Windows and Ubuntu, but there it also works without the cookie file.


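library(RCurl)

# the pdf to dl (the line defining appURL is missing from the post)

curl <- getCurlHandle()
# The start of the next call is missing from the post; curlSetOpt() with a
# cookiefile is assumed here, and "cookies.txt" is a placeholder file name.
curlSetOpt(cookiefile = "cookies.txt"
           , curl = curl, followLocation = TRUE)
pdfData <- getBinaryURL(appURL, curl = curl)
writeBin(pdfData, "test2.pdf")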
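
Since the failed downloads are HTML pages saved under a .pdf name, one way to catch them is to check the first bytes of the response before writing it to disk. The following is a minimal sketch in base R, not part of the original code: `is_pdf` and `save_if_pdf` are hypothetical helper names, and the check relies only on PDF files starting with the bytes `%PDF`.

```r
# Return TRUE if a raw vector starts with the PDF magic number "%PDF".
is_pdf <- function(bytes) {
  magic <- charToRaw("%PDF")
  length(bytes) >= length(magic) &&
    identical(bytes[seq_along(magic)], magic)
}

# Write the download only if it really is a PDF; otherwise warn and skip it.
# Returns TRUE when the file was written, FALSE otherwise.
save_if_pdf <- function(bytes, path) {
  if (is_pdf(bytes)) {
    writeBin(bytes, path)
    TRUE
  } else {
    warning("Response for ", path,
            " is not a PDF (possibly a cookie consent page)")
    FALSE
  }
}
```

With the code above, `save_if_pdf(pdfData, "test2.pdf")` could replace the direct `writeBin()` call, so that consent pages are reported instead of being stored as broken PDFs.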

To reproduce, here is the cookie file (one cookie per line):

www.landtag-mv.de FALSE / FALSE 1641900313 cookieconsent_status dismiss
www.landtag-mv.de FALSE / FALSE 1641900313 dp_cookieconsent_status {"dp--cookie-statistics":true,"dp--cookie-marketing":true}
www.dokumentation.landtag-mv.de FALSE / FALSE 1641907216 cookieconsent_dismissed yes
www.dokumentation.landtag-mv.de FALSE / FALSE 0 ASP.NET_SessionId ejtlcpjr0saw40ahceu4akb1
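
The Netscape cookie file format that libcurl reads expects seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry as Unix time, name, value), so a file whose tabs were turned into spaces during export or copying may not be parsed. As a sketch under that assumption, the file can be regenerated from R with explicit tabs. The helper names `write_cookie_file` and `is_expired` and the output file name are hypothetical; the cookie values are the ones listed above.

```r
# Cookie fields as listed in the post; everything is kept as character
# so that the values are written out unchanged.
cookies <- data.frame(
  domain = c("www.landtag-mv.de", "www.landtag-mv.de",
             "www.dokumentation.landtag-mv.de",
             "www.dokumentation.landtag-mv.de"),
  include_subdomains = "FALSE",
  path = "/",
  secure = "FALSE",
  expires = c("1641900313", "1641900313", "1641907216", "0"),
  name = c("cookieconsent_status", "dp_cookieconsent_status",
           "cookieconsent_dismissed", "ASP.NET_SessionId"),
  value = c("dismiss",
            '{"dp--cookie-statistics":true,"dp--cookie-marketing":true}',
            "yes", "ejtlcpjr0saw40ahceu4akb1"),
  stringsAsFactors = FALSE
)

# Write one cookie per line with tab-separated fields.
write_cookie_file <- function(cookies, path) {
  lines <- do.call(paste, c(unname(as.list(cookies)), sep = "\t"))
  writeLines(c("# Netscape HTTP Cookie File", lines), path)
  invisible(path)
}

# An expiry of 0 marks a session cookie; any other value is a Unix timestamp.
is_expired <- function(expires, now = Sys.time()) {
  expires <- as.numeric(expires)
  expires != 0 & expires < as.numeric(now)
}

write_cookie_file(cookies, file.path(tempdir(), "cookies.txt"))
```

Calling `is_expired(cookies$expires)` shows which entries have an expiry timestamp in the past; such entries would need to be exported again from the browser.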

Maybe somebody has insight into where RCurl gets its cookies from...

Best regards and thank you in advance. I hope I have given all the necessary information!
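
On the question of where the cookies come from: libcurl, which RCurl wraps, does not read any browser's cookie store. A handle sends only cookies loaded through the `cookiefile` option, cookies set explicitly through the `cookie` option, or cookies received earlier on the same handle once the cookie engine is enabled. The sketch below sets the consent cookies explicitly instead of relying on a file. `cookie_header` is a hypothetical helper, and whether these particular cookies are sufficient for the affected sites is an assumption.

```r
# Build a Cookie header value ("name=value; name=value") from a named vector.
cookie_header <- function(cookies) {
  paste(paste0(names(cookies), "=", cookies), collapse = "; ")
}

# Consent cookies taken from the cookie file in the post.
consent <- c(cookieconsent_status = "dismiss",
             cookieconsent_dismissed = "yes")

if (requireNamespace("RCurl", quietly = TRUE)) {
  handle <- RCurl::getCurlHandle(
    cookie = cookie_header(consent),  # sent with every request on this handle
    cookiefile = "",                  # empty string: enable the cookie engine only
    followlocation = TRUE
  )
  # The download would then reuse the handle, for example:
  # pdfData <- RCurl::getBinaryURL(appURL, curl = handle)
}
```

Setting the cookies in code keeps the behaviour the same on every machine, which makes it easier to see whether the difference between Debian and the other systems is really about cookies.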
