@harshamv
Last active August 29, 2015 14:03
Get First Para from Wikipedia
<?php
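$query = isset($_GET['query']) ? $_GET['query'] : ''; // page title to look up
// URL-encode the title so spaces, "&" and similar characters cannot break the query string
$url = "https://en.wikipedia.org/w/api.php?action=parse&page=" . rawurlencode($query) . "&format=json&prop=text&section=0";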
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); // required by the wikipedia.org server; replace with YOUR user agent, including YOUR contact information (otherwise your IP might get blocked)
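$c = curl_exec($ch);
if ($c === false) {
    exit('Request to Wikipedia failed.');
}
$json = json_decode($c);
if (!isset($json->parse->text->{'*'})) {
    exit('No page content found.'); // e.g. the page does not exist or the API returned an error
}
$content = $json->parse->text->{'*'}; // main text content of section 0 (it's parsed HTML)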
// pattern for the first paragraph ("U" makes the match ungreedy, "s" lets "." match newlines)
$pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
$cont = ''; // stays empty if no paragraph is found
if (preg_match($pattern, $content, $matches))
{
    // print $matches[0]; // content of the first paragraph (including the wrapping <p> tag)
    $cont = strip_tags($matches[1]); // content of the first paragraph without the HTML tags
}
// remove [bracketed] and (parenthesized) text; (?R) recurses so nested groups are removed too
$pattern = '/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/';
echo preg_replace($pattern, '', $cont);
?>
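The recursive pattern on the last lines of the script is the least obvious part, so here is a minimal sketch of how it behaves on a fixed string, assuming PHP 7 or later. The helper name `strip_bracketed`, the whitespace clean-up step, and the sample sentence are illustrative assumptions and are not part of the gist.

```php
<?php
// Hypothetical helper (not in the original gist): removes balanced (...) and [...]
// groups from plain text using the same recursive pattern as the script above.
function strip_bracketed(string $text): string
{
    // (?R) re-enters the whole pattern, so a group nested inside another
    // group is consumed as part of the outer match.
    $pattern = '/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/';
    $stripped = preg_replace($pattern, '', $text);
    if ($stripped === null) {
        return $text; // preg_replace failed; fall back to the input
    }
    // Added step (an assumption, not in the gist): collapse the double
    // spaces that removing a group leaves behind.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

echo strip_bracketed('Ada Lovelace (born Byron (1815)) was a mathematician.[1]'), "\n";
// prints "Ada Lovelace was a mathematician."
```

Unbalanced input, such as a lone opening parenthesis, does not match the pattern and is left in place.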