Saturday, August 20, 2016

How to find ranking of conferences and journals

(1) conference rankings

(1.1) Google Scholar offers conference rankings in various fields, e.g., the rankings of conferences and journals in the Computational Linguistics field can be found here (other fields are listed in the category tree on the left).

(1.2) Microsoft Academic Search also offers conference rankings in different fields, e.g., the ranking of conferences in the Natural Language & Speech field can be found here.

(1.3) Conference rankings in the Computer Science field can also be found on the Computing Research & Education Conference Portal.

(2) journals

(2.1) free journal rankings
Scimago Journal & Country Rank

(2.2) non-free ones
Thomson Reuters Journal Citation Reports® (most universities have a subscription)

How to use sqoop to copy MySQL tables to Hive

Sqoop is useful when you need to copy MySQL tables to Hive.

Here is an example that copies a MySQL table from each of several shards into a single Hive table:

 set -x  
 set -e  
 echo "Log: `date` dropping Hive table $game.${hive_tab}"  
 hive -S -e "drop table \`$game.${hive_tab}\`;"   
 for SHARD_ID in {1..200}; do  
   mysql_conn="mysql --user=$mysql_user --password=$mysql_pwd -D${mysql_database} --host=${mysql_host} -s --skip-column-names"  
   if ping -c 1 -W 1 "$mysql_host"; then  
     echo "Log: `date` $mysql_host is alive"  
   else  
     # stop here if the shard host cannot be reached  
     echo "Log: `date` $mysql_host is not alive"  
     exit 0  
   fi  
   echo "Log: `date` dropping Hive table $game.${hive_shard_tab}"  
   hive -S -e "drop table $game.${hive_shard_tab}"   
   echo "Log: `date` removing ${hdfs_dir} on HDFS"  
   hadoop fs -rm -r -f ${hdfs_dir}  
   sql="select count(*) from \`${mysql_tab}\`"  
   mysql_row_cnt=`echo "$sql" | $mysql_conn`  
   echo "Log: `date` found ${mysql_row_cnt} rows in the MySQL table ${mysql_tab} with query: $sql"  
   sqoop import \  
    --connect jdbc:mysql://$mysql_host/${mysql_database} \  
    --table "${mysql_tab}" \  
    --username $mysql_user \  
    --password $mysql_pwd \  
    --num-mappers 1 \  
    --hive-overwrite \  
    --hive-table $game.${hive_shard_tab} \  
    --hive-import \  
    --target-dir ${hdfs_dir} \  
    --hive-delims-replacement ' '   
   hive_row_cnt=`hive -S -e "select count(*) from $game.${hive_shard_tab}"`  
   echo "Log: `date` ended up with ${hive_row_cnt} rows in the Hive table $game.${hive_shard_tab} which are copied from the MySQL table ${mysql_tab} (${mysql_row_cnt} rows)"  
   # merging  
   if [ $SHARD_ID = 1 ]; then  
      # first shard: create the merged table from this shard's data  
      sql_str="create table $game.\`$hive_tab\` as select * from $game.${hive_shard_tab};"  
      echo "Log: `date` creating the Hive table $game.${hive_tab} with the data from the first Shard with sql: $sql_str"  
      hive -S -e "$sql_str"   
   else  
      # later shards: append to the merged table  
      sql_str="insert into table $game.\`$hive_tab\` select * from $game.${hive_shard_tab};"  
      echo "Log: `date` merging into the Hive table $game.${hive_tab} the data from Shard $SHARD_ID with sql: $sql_str"  
      hive -S -e "$sql_str"   
   fi  
 done  
 exit 0  
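
The script above logs the MySQL and Hive row counts for each shard but never compares them. As a minimal sketch, a check like the following could be called right after hive_row_cnt is computed. The function names is_count and check_row_counts are hypothetical helpers, not part of the original script or of Sqoop.

```shell
# Hypothetical helper: succeed only if $1 is a plain non-negative integer.
is_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: compare the MySQL row count ($1) with the Hive row count ($2).
# Returns 0 on a match, 1 on a mismatch, 2 if either value is not a number.
check_row_counts() {
  if ! is_count "$1" || ! is_count "$2"; then
    echo "Log: non-numeric row count (MySQL='$1', Hive='$2')" >&2
    return 2
  fi
  if [ "$1" -ne "$2" ]; then
    echo "Log: row count mismatch (MySQL=$1, Hive=$2)" >&2
    return 1
  fi
  echo "Log: row counts match ($1 rows)"
}
```

Inside the loop it would be used as check_row_counts "$mysql_row_cnt" "$hive_row_cnt". Because the script runs with set -e, a mismatch would abort the run before the shard is merged.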

Wednesday, August 17, 2016

How to use ngrep

ngrep is a very useful tool on Linux for capturing network packets that match a given host, a given port number, or a given keyword.

(1) to capture packets (printed in hex format) on port 1234 that contain the keyword "my-word", using the network device bond0 (run ifconfig to pick a device)
 sudo ngrep -l -t -d bond0 -q -x my-word port 1234

(2) to capture packets to or from a host (the host name or IP address goes after the host keyword)
sudo ngrep -l -t -d bond0 -q -W byline host
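
The keyword, host, and port filters shown above can also be combined in one command, since ngrep takes a match expression followed by a BPF filter. The sketch below only prints such a command instead of running it. The function name build_ngrep_cmd and the address 192.0.2.10 are hypothetical placeholders, so substitute your own device, keyword, host, and port.

```shell
# Hypothetical helper: print (do not run) an ngrep command that matches a keyword
# in traffic to or from one host on one port.
# $1 = network device, $2 = keyword, $3 = host name or IP address, $4 = port
build_ngrep_cmd() {
  echo "sudo ngrep -l -t -d $1 -q -W byline $2 host $3 and port $4"
}

# Example with placeholder values
build_ngrep_cmd bond0 my-word 192.0.2.10 1234
```

Printing the command first makes it easy to review the filter before running it with sudo.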