Get top 100 words / keywords from a text with PHP
This is useful if you want to generate keywords dynamically from content, or simply rank the words in a text or HTML document by how often they appear, while excluding very common words such as "the", "on", "and" and "to". You can set a custom limit on the number of words returned and supply your own list of words to ignore.
<?php
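function top_words($str, $limit = 100, $ignore = "") {
    // Fall back to a short list of very common English words.
    if (!$ignore) $ignore = "the of to and a in for is The that on said with be was by";
    $ignore_arr = explode(" ", $ignore);
    $str = trim($str);
    // Replace HTML entities such as &amp; or &nbsp; with a space.
    $str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
    // Replace punctuation and special characters with a space.
    $str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
    // Collapse runs of whitespace into a single space.
    $str = preg_replace("#\s+#sim", " ", $str);
    $arraw = explode(" ", $str);
    $arr = array(); // word => count; initialised so empty input returns an empty array
    foreach ($arraw as $v) {
        $v = trim($v);
        // Skip words shorter than 3 characters and words on the ignore list.
        if (strlen($v) < 3 || in_array($v, $ignore_arr)) continue;
        // Start at 1 on first sight to avoid an "undefined index" notice.
        $arr[$v] = isset($arr[$v]) ? $arr[$v] + 1 : 1;
    }
    // Sort by count, highest first.
    arsort($arr);
    // Preserve keys so numeric words such as "2010" are not renumbered.
    return array_keys(array_slice($arr, 0, $limit, true));
}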
// usage:
// $meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>
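
The function above counts words case-sensitively, which is why its default ignore list contains both "the" and "The". The following is only a minimal sketch of a case-insensitive variant, not part of the original snippet: the name top_words_ci is hypothetical, and it swaps the regular expressions for PHP's built-in str_word_count() and array_count_values(). It assumes plain ASCII text, because str_word_count() splits on non-ASCII letters and drops digits.

```php
<?php
// Hypothetical case-insensitive variant (illustration only).
function top_words_ci($str, $limit = 100, $ignore = "the of to and a in for is that on said with be was by") {
    // Lowercase everything so "The" and "the" are counted as one word.
    $words = str_word_count(strtolower($str), 1);
    $ignore_arr = explode(" ", strtolower($ignore));
    $kept = array();
    foreach ($words as $w) {
        // Same filter as the original: drop short words and ignored words.
        if (strlen($w) < 3 || in_array($w, $ignore_arr)) continue;
        $kept[] = $w;
    }
    // array_count_values() builds the word => count map in one call.
    $counts = array_count_values($kept);
    // Sort by count, highest first, then keep the first $limit words.
    arsort($counts);
    return array_keys(array_slice($counts, 0, $limit, true));
}

// Usage with a custom limit (2) and a custom ignore list:
$text = "The cat saw the dog. The dog saw the cat and the bird. Cat!";
echo implode(", ", top_words_ci($text, 2, "the and saw")); // prints: cat, dog
```

Lowercasing up front keeps the ignore list short, at the cost of losing the original capitalisation of names and acronyms in the returned keywords.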
