Introduction To cURL, The Command Line Web Browser

 

cURL is a command line tool that lets you both download web content and gather information about that content and the server behind it. I’ll introduce it and then show you some cool tricks!


This tip was sponsored by OSTraining.com! OSTraining aims to explain websites so clearly that anyone can understand them, with jargon-free, hands-on training that clearly defines processes to simplify complex tasks. OSTraining offers online courses, best-selling books, onsite training, and online live training.

More than 100,000 students have learned website development through OSTraining. OSTraining courses focus on web development and the tools around it, including WordPress, WooCommerce, Drupal, Magento, and Joomla.


Transcription

Hey folks! Welcome to another HeroPress tip of the week. This week we’re going to talk about cURL, which is a command line tool and library for transferring data with URLs. I’m going to show you some command line tricks you can do with it, and we’re going to base some future tips on this, so you might come back to this later, but that’s why we’re doing it now!

I wanted to show you some information about it first. This is the website for it, curl.se, and I don’t particularly like this site; I brought it up because it’s the main site and you should know it’s there. But the Wikipedia page is, in my opinion, much better. You can see that there’s an alternate spelling there, and you can see it was released in 1996, had a couple of names, and it can do a lot of stuff. It works on a surprisingly high number of platforms, including FreeBSD, IRIX, OpenBSD, OS/2 Warp, DOS, etc., so it gets around. Most importantly for you, it’s pretty much present on any Unix platform, so if you log into your server, curl is probably there. If you’re running a Mac, curl is there. If you’re running Linux, curl is there. So it’s always available and can be very useful.

I want to show you some stuff that we’re not going to get into too much in this video, but you should be aware of: PHP can do cURL. These are all the cURL functions, and if you were to try to use them it would look something like this. You would assign a variable with curl_init(). You would set some options, so this is the website that we’re going to download with cURL. Then you would run curl_exec() and it would download the page, and then you close the connection with curl_close(). So this right here is a browser session. cURL is just a web browser: it gets one page at a time and none of the additional stuff. It doesn’t get any of the linked CSS or JavaScript or images or anything; it’s just the HTML.
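That whole PHP session (init, set the URL, execute, close) boils down to a single HTTP GET, and from the shell that’s one line. Here’s a minimal sketch, using example.com as a stand-in address:

```shell
# Fetch one page of raw HTML into a variable: no CSS, JS, or images,
# the same single-page "browser session" the PHP code performs.
html=$(curl -s https://example.com)

# Show the first few lines of what came back.
echo "$html" | head -n 3
```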

Now if you want to do all this stuff in WordPress, it’s even more amazing. WordPress has something called the HTTP API. So we’re going to go down here to their code example, and you could use one function called wp_remote_get() and pass it a URL, and oh look! It goes and gets Ben Lobaugh’s site and puts it in $response. That’s it! All that code is wrapped up in this single one-liner, and it gets back an array: the headers, the body, the response code, and then the actual contents. So doing this in WordPress is a lot easier than doing it in plain PHP. Now, all that said, what I’m here to show you today is some command line magic. So I’m going to come right over here to my command line and we’re going to do

curl

and give it an address. We’ll do example.com.

and it just downloaded it, and that’s it! It spits it right out to the screen, so it doesn’t save it or anything. It’s just there. Now you can use a flag called -o (for output) and give it a file name, or you can do it the Unix way and put in a little waka, the > redirect operator, and give it a name, so we’ll call this example.com.html
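The two ways of saving just described look like this (the file names are just examples):

```shell
# No flags: the page is printed to the terminal and nothing is saved.
curl https://example.com

# curl's own output flag, -o, writes the page to a file instead.
curl -o example.com.html https://example.com

# Or the Unix way: redirect stdout into a file with >.
curl https://example.com > example.com.html
```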

And there we are; it has a little progress meter. If you’re downloading something big enough, that actually updates by the second. If you’re just downloading something small like this, you don’t really get to take advantage of that. But if we do an ls, there’s example.com.html, and if we edit it you can see that it’s just a web page.

Now this page is quite small, so let’s do HeroPress.

There we go, and again it just dumped to the screen, so let’s use the waka again and put it in heropress.com.html

And here we have all the HTML for HeroPress. So this could be useful if you want to save the code of a page. cURL does not run JavaScript, so if you have a page that’s being built by JavaScript but you want the raw code, this is the perfect way to get it, because it pulls in that code before JavaScript has a chance to meddle with it in your browser, so you can see what it might be doing ahead of time. Let’s skip a bunch here; yeah, see, there’s a whole bunch of JavaScript going on at the top there. This site is built with Kadence and the Gutenberg editor, so there’s a bunch of Gutenberg stuff in there. But anyway, that’s a page.

Now, I more often use cURL to look at headers. The headers are the meta information about a page: browsers look at them, servers look at them. Browsers send them to the server and the server sends them back to the browser; it’s just information about the transaction. So if we do curl -I on example.com

We get this block of headers.

So at the top we can see that it’s using HTTP/2. 200 is the response code; you may be more familiar with 404 Not Found, but 200 is totally successful. Content-Encoding is gzip (we’ll look at that in another video), Accept-Ranges is bytes, and Age says the page is this many seconds old.
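The headers-only trick is the -I flag, which sends an HTTP HEAD request; the exact values you get back will vary by server and over time:

```shell
# -I asks for the headers only; -s hides the progress meter.
curl -sI https://example.com

# Typical lines in the response (values will differ when you run it):
#   HTTP/2 200
#   content-encoding: gzip
#   cache-control: max-age=604800
#   age: ...
```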

And Cache-Control says the max age it can get to is this many seconds. So as an academic exercise, let’s figure out how much time that is. I’ll pull up my basic calculator here and we’ll do 604,800 divided by 60, so now we have minutes. Let’s divide by 60 again, and now we have hours. Let’s do it again, but we’ll divide by 24 to get days, and it’s seven. So the max age for the cache on this is only a week. So let’s try its current age

divided by 60, divided by 60.

So it’s about 90 hours old.

This calculator doesn’t do fractions very well, so it’s somewhat over three days old, and now we know how long this page has been sitting here cached. Content-Type is HTML; there’s the date; an ETag; Expires tells when the cache expires; Last-Modified tells when it was last modified; Server says what web server this is (I’m not familiar with ECS, so now I’m curious); it tells us that it hit the cache; and it tells us how big the file is: 648 bytes. Now we’re going to rerun that, and then I’m going to do heropress.com and compare them.
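That calculator exercise works directly in the shell too; here’s the max-age conversion from 604,800 seconds down to days:

```shell
# Cache-Control said max-age=604800 (seconds). Convert step by step:
echo $((604800 / 60))            # minutes: 10080
echo $((604800 / 60 / 60))       # hours: 168
echo $((604800 / 60 / 60 / 24))  # days: 7, so the cache max age is one week
```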

Now we get a lot less information back, because this web server just doesn’t send as much. Again HTTP/2, again 200. This server is nginx, which is very common and I’m familiar with it. There’s the date, the content type, and Vary: Accept-Encoding (I’m not sure what Vary is). The x-cache-handler is called cache-enabler-engine. What happened when it tried to hit the cache is a bypass, so it didn’t hit the cache; it actually got the real page. And another pro tip here: I happen to know that nxcel is the name of the caching engine at Nexus, so just from this I can tell that the site is running on Nexus. I already knew, but if I didn’t, that’s what would tell me.
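When you’re comparing servers like this, it’s handy to pull out just one header instead of eyeballing the whole block. Here’s a small helper for that; header_value is my own name for it, not a curl feature:

```shell
# Print the value of a single header from `curl -I` output.
header_value() {
  grep -i "^$1:" | cut -d' ' -f2-
}

# Usage (needs network access), e.g. to compare Server headers:
#   curl -sI https://example.com   | header_value server
#   curl -sI https://heropress.com | header_value server
```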

So I’ve shown you two tricks here with curl: one is how to actually download a page, and one is how to look at the headers. There are thousands of things you can do with curl, and we’re not going to go over them. I’m going to use some extra curl options in future tips, so don’t forget this stuff, or come back and watch it again if you need to. But I recommend reading up on curl and finding out what all it can do; just going through the PHP options is extremely helpful, and it explains things a lot better than trying to read the raw documentation. Which, I should point out, is on every machine under man curl (man for manual) and gives a nice list of things you can do: all the protocols it handles, the progress meter, all the flags, all that stuff. So you can always find docs under the man page. I hope you find this useful!