Saturday, August 20, 2016

How to find rankings of conferences and journals

(1) conference rankings

(1.1) Google Scholar offers conference rankings in various fields; e.g., the rankings of conferences and journals in the Computational Linguistics field can be found here (other fields can be found in the category tree on the left)

(1.2) Microsoft Academic Search also offers conference rankings in different fields; e.g., the ranking of conferences in the Natural Language & Speech field can be found here.

(1.3) conference rankings in the Computer Science field can also be found on the Computing Research & Education (CORE) Conference Portal


(2) journal rankings

(2.1) free journal rankings
SCImago Journal & Country Rank

(2.2) subscription-based ones
Thomson Reuters Journal Citation Reports® (most universities subscribe to this)

How to use sqoop to copy MySQL tables to Hive

Sqoop comes in handy when you need to copy MySQL tables into Hive.

Here is an example that copies the same MySQL table from a set of shards into one merged Hive table:


 #!/bin/bash

 set -x   # print each command as it runs
 set -e   # abort on the first error

 game=gameexample

 mysql_host_prefix=userdb
 mysql_host_suffix=myhostname.com
 mysql_tab=user
 mysql_database=mydatabase
 mysql_user=myusername
 mysql_pwd=xxxxx

 hive_tab=user

 echo "Log: `date` dropping Hive table $game.${hive_tab}"
 hive -S -e "drop table if exists $game.\`${hive_tab}\`;"

 for SHARD_ID in {1..200}; do

   mysql_host=${mysql_host_prefix}-${SHARD_ID}-${mysql_host_suffix}
   mysql_conn="mysql --user=$mysql_user --password=$mysql_pwd -D${mysql_database} --host=${mysql_host} -s --skip-column-names"
   hive_shard_tab=${hive_tab}_shard${SHARD_ID}
   hdfs_dir=/user/mapr/${mysql_tab}_shard${SHARD_ID}

   # Shards are numbered consecutively, so stop at the first host that
   # does not respond: it means we have passed the last shard.
   if ping -c 1 -W 1 "$mysql_host"; then
     echo "Log: `date` $mysql_host is alive"
   else
     echo "Log: `date` $mysql_host is not alive"
     exit 0
   fi

   # Drop any leftovers of this shard from a previous run.
   echo "Log: `date` dropping Hive table $game.${hive_shard_tab}"
   hive -S -e "drop table if exists $game.${hive_shard_tab};"
   echo "Log: `date` removing ${hdfs_dir} on HDFS"
   hadoop fs -rm -r -f ${hdfs_dir}

   # Record the MySQL row count so it can be compared with the Hive count below.
   sql="select count(*) from \`${mysql_tab}\`"
   mysql_row_cnt=`echo "$sql" | $mysql_conn`
   echo "Log: `date` found ${mysql_row_cnt} rows in the MySQL table ${mysql_tab} with query: $sql"

   sqoop import \
    --connect jdbc:mysql://$mysql_host/${mysql_database} \
    --table "${mysql_tab}" \
    --username $mysql_user \
    --password $mysql_pwd \
    --num-mappers 1 \
    --hive-overwrite \
    --hive-table $game.${hive_shard_tab} \
    --hive-import \
    --target-dir ${hdfs_dir} \
    --hive-delims-replacement ' '

   hive_row_cnt=`hive -S -e "select count(*) from $game.${hive_shard_tab}"`
   echo "Log: `date` ended up with ${hive_row_cnt} rows in the Hive table $game.${hive_shard_tab}, copied from the MySQL table ${mysql_tab} (${mysql_row_cnt} rows)"

   # Merge: create the combined table from the first shard, then append
   # every later shard to it.
   if [ $SHARD_ID = 1 ]; then
      sql_str="create table $game.\`$hive_tab\` as select * from $game.${hive_shard_tab};"
      echo "Log: `date` creating the Hive table $game.${hive_tab} with the data from the first shard with sql: $sql_str"
      hive -S -e "$sql_str"
   else
      sql_str="insert into table $game.\`$hive_tab\` select * from $game.${hive_shard_tab};"
      echo "Log: `date` merging into the Hive table $game.${hive_tab} the data from shard $SHARD_ID with sql: $sql_str"
      hive -S -e "$sql_str"
   fi
 done
 exit 0
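
After the script finishes, it is worth checking that the merged table really holds the sum of the per-shard rows. Here is a minimal sketch of such a check, assuming the same gameexample/user naming as above (a non-zero hive exit code for a missing shard table ends the loop):

 #!/bin/bash
 # Sum the per-shard Hive row counts and compare with the merged table.
 game=gameexample
 hive_tab=user
 total=0
 for SHARD_ID in {1..200}; do
   # Stop at the first shard table that does not exist.
   cnt=$(hive -S -e "select count(*) from $game.${hive_tab}_shard${SHARD_ID};" 2>/dev/null) || break
   total=$((total + cnt))
 done
 merged=$(hive -S -e "select count(*) from $game.\`${hive_tab}\`;")
 echo "sum of shard counts: $total, merged table: $merged"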
   

Wednesday, August 17, 2016

How to use ngrep

ngrep is a very useful Linux tool for capturing TCP packets that match a given host, port number, or keyword.

(1) to capture packets (printed in hex) on port 1234 that contain the keyword "my-word", using network device bond0 (run ifconfig to pick a device)
 sudo ngrep -l -t -d bond0 -q -x my-word port 1234

(2) to capture packets to or from the host my.hostname.com
 sudo ngrep -l -t -d bond0 -q -W byline host my.hostname.com
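
(3) to capture HTTP GET or POST requests on port 80 (a hedged example: ngrep treats the first argument as an extended regular expression; adjust the device and port to your setup)
 sudo ngrep -l -t -d bond0 -q -W byline "^(GET|POST) " port 80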

Wednesday, July 27, 2016

How to find your citations

As a researcher, you may want to find all the papers that cite your papers. Some options are listed here:
(1) Google Scholar: a good place to start, but it may miss some minor sources, e.g., some workshop papers;
(2) ResearchGate: you can upload your papers, and it will find the papers that cite them;
(3) Microsoft Academic: search for your papers first, and then it will show you which papers cite them;
(4) CiteSeerX

How to split a terminal on a remote server with background sessions

Screen

screen is a useful command for leaving processes running in the background on Linux.
You can also split the screen terminal into sub-terminals with the following shortcuts:
(1) to split a terminal horizontally: press Ctrl+A, release, then press Shift+S
(2) to split a terminal vertically: press Ctrl+A, release, then press Shift+\
(3) to switch among sub-terminals: press Ctrl+A, release, then press Tab
But note that screen loses the split layout if you detach (press Ctrl+A, release, then press d) and reattach: the sub-terminals turn into plain background windows. To avoid this, you can use tmux instead. A quick screen walk-through follows.
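
As a reminder of the basic session lifecycle (the session name "work" is arbitrary):
 screen -S work      # start a new session named "work"
 # ... split with Ctrl+A Shift+S or Ctrl+A Shift+\, detach with Ctrl+A d ...
 screen -ls          # list sessions
 screen -r work      # reattach to the "work" session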

Tmux

Tmux handles this better than screen.
Type "tmux new -s sessionname" to create a new session.
In the session, you can use the following (a session walk-through follows the list):
(1) to split a terminal horizontally: press Ctrl+B, release, then press % (Shift+5);
(2) to split a terminal vertically: press Ctrl+B, release, then press " (Shift+');
(3) to detach the current session: press Ctrl+B, release, then press d;
(4) to attach to a session: "tmux attach -t sessionname";
(5) to switch to a pane: press Ctrl+B, release, then press q (this shows the pane numbers; while they are shown, type a number to jump to that pane).
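
Putting it together (the session name "work" is arbitrary):
 tmux new -s work       # create a session named "work"
 # ... split panes with Ctrl+B % or Ctrl+B ", detach with Ctrl+B d ...
 tmux ls                # list running sessions
 tmux attach -t work    # reattach to the "work" session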


References:

http://fosshelp.blogspot.com/2014/02/how-to-linux-terminal-split-screen-with.html
https://www.youtube.com/watch?v=BHhA_ZKjyxo
https://gist.github.com/MohamedAlaa/2961058

Thursday, June 9, 2016

Enable core dumps for systemd services on CentOS 7

Core dumps are very useful for debugging critical crashes in C++ programs, such as segfaults.

How to enable core dumps for systemd services on CentOS 7?

(1) change the core_pattern to a location you can write to, e.g.:
$ cat /proc/sys/kernel/core_pattern
/home/your-user-name/coredumps/core-%e-sig%s-user%u-group%g-pid%p-time%t
Note: your-user-name is the user that runs the program whose crash will generate the core dump.
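
For example, a minimal sketch using sysctl (the path is a placeholder; the directory must already exist and be writable by the crashing process):
 mkdir -p /home/your-user-name/coredumps
 sudo sysctl -w kernel.core_pattern='/home/your-user-name/coredumps/core-%e-sig%s-user%u-group%g-pid%p-time%t'
 # to persist across reboots, put the same kernel.core_pattern setting
 # into a file under /etc/sysctl.d/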


(2) create a new file /etc/security/limits.d/core.conf containing:

*       hard        core        unlimited
*       soft        core        unlimited

to enable core dumps for all users.

(3) modify /etc/systemd/system.conf
to add:
DefaultLimitCORE=infinity

(4) modify your systemd service conf
e.g., /etc/systemd/system/your-service.service
to add:
LimitCORE=infinity
in the "[Service]" section.

(5) reload the new systemd configuration and restart your service
systemctl daemon-reexec
systemctl stop your-service
systemctl start your-service

(6) how to test it
You can kill your service process by sending it signal 11 (SIGSEGV). If everything is set up correctly, you should see a new core dump at:
/home/your-user-name/coredumps/core-%e-sig%s-user%u-group%g-pid%p-time%t
(with the % placeholders filled in by the kernel).
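
For example (your-service below is a placeholder for your service's binary name):
 pid=$(pidof your-service)            # placeholder binary name
 grep 'Max core' /proc/$pid/limits    # should show unlimited
 kill -SIGSEGV $pid                   # force a segfault
 ls /home/your-user-name/coredumps/   # a new core file should appear here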


References:
http://www.kibinlabs.com/re-enabling-core-dumps-redhat-7/

yum update meets "No packages marked for update"

When you run "yum update", you may see "No packages marked for update", even though you are sure that some package updates exist.

In this case, try:
yum clean all
yum update your-package-name
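
If yum still reports nothing, you can ask it directly whether an update exists for the package; yum check-update exits with code 100 when updates are available:
 yum check-update your-package-name
 echo $?    # 100 means an update is available, 0 means none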