Sunday, April 8, 2012

Android Traffic Statistics Inside

If you're going to write an application that calculates or aggregates traffic statistics on Android devices, you should be aware of some peculiarities and issues.

TrafficStats class

Of course, you will use the TrafficStats class. It's available since API level 8 (Android 2.2) and reports bytes and network packets transmitted and received over all interfaces, over the mobile interface, and on a per-UID basis.
Note that you can't get separate stats for roaming, for example, only total mobile traffic.
Per-UID methods merely return the total stats available for that UID, without any breakdown by network interface.
Also, there's no time information: if TrafficStats.getMobileRxBytes() returns 12345 bytes, you never know whether that's only today's traffic or the last 10 days' usage (see the explanation below).

TrafficStats magic

If you check out the TrafficStats.java sources, you'll see that this class only calls native methods. So what does the C code do?
As you might know, Android's kernel is based on the Linux kernel. Linux has sysfs, a virtual filesystem that exports information about devices and drivers and is also used for configuration. Each network interface has its own directory under /sys/class/net/. If you list that directory on a Linux machine, you'll probably see something like this:

$ ls /sys/class/net/
eth0 eth1 eth2 lo


On my Android device the list is:

$ ls /sys/class/net
lo
dummy0
ifb0
ifb1
rmnet0
rmnet1
rmnet2
usb0
sit0
ip6tnl0
gannet0
tun
eth0


Of course, these lists differ slightly from device to device, and the list even changes over time on the same device.

Anyway, we're interested in these files:

/sys/class/net/[interface]/statistics/tx_packets
/sys/class/net/[interface]/statistics/rx_packets

/sys/class/net/[interface]/statistics/tx_bytes
/sys/class/net/[interface]/statistics/rx_bytes


These files hold the traffic statistics. Each one contains just a number (a long). For example, to get the number of bytes received over the mobile network, read (the interface name depends on the device):

/sys/class/net/rmnet0/statistics/rx_bytes
or
/sys/class/net/ppp0/statistics/rx_bytes


That's exactly what the native code does: it reads these files and returns the number stored in them.
To get total traffic, it sums rx_bytes/tx_bytes/rx_packets/tx_packets over all interfaces under /sys/class/net/.
Important note: when you turn off an interface, for example Wi-Fi, its directory disappears from /sys/class/net/, and when you turn it on again, the directory reappears and its stats start counting from zero.
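Here's a minimal plain-Java sketch of what the native code described above does: read one statistics file per interface and sum it over every directory under /sys/class/net/. The class name and the base-directory parameter are my own (the parameter just makes it testable off-device); missing or unreadable files are skipped, since interfaces come and go.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class SysfsTrafficReader {

    /** Reads e.g. {baseDir}/rmnet0/statistics/rx_bytes; returns -1 if missing or not a number. */
    public static long readCounter(File baseDir, String iface, String counter) {
        File f = new File(baseDir, iface + "/statistics/" + counter);
        try {
            String text = new String(Files.readAllBytes(f.toPath()), StandardCharsets.US_ASCII).trim();
            return Long.parseLong(text);
        } catch (IOException | NumberFormatException e) {
            return -1; // interface went away, or the entry is not a statistics directory
        }
    }

    /** Sums one counter (rx_bytes, tx_bytes, rx_packets, tx_packets) over all interfaces. */
    public static long sumCounter(File baseDir, String counter) {
        String[] ifaces = baseDir.list();
        if (ifaces == null) {
            return 0;
        }
        long total = 0;
        for (String iface : ifaces) {
            long value = readCounter(baseDir, iface, counter);
            if (value >= 0) { // skip unreadable entries (-1)
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        File base = new File("/sys/class/net");
        System.out.println("rx_bytes, all interfaces: " + sumCounter(base, "rx_bytes"));
        System.out.println("tx_bytes, all interfaces: " + sumCounter(base, "tx_bytes"));
    }
}
```

Because an interface's counters restart when it comes back up, two successive sums taken this way can even decrease.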

Per-UID stats are taken from the proc filesystem, a pseudo-filesystem used as an interface to kernel data structures. On an Android device (as on Linux) it is mounted at /proc.
Per-UID sent/received traffic is stored here:

/proc/uid_stat/[uid]/tcp_snd
and
/proc/uid_stat/[uid]/tcp_rcv
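A similar sketch for the per-UID files. The class name is hypothetical, and treating a missing /proc/uid_stat/[uid] directory as zero traffic is my assumption (a UID only shows up there once it has used the network since boot).

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class UidStatReader {

    /** Reads {uidStatDir}/{uid}/{name}; a missing or unparsable file counts as 0 (assumption). */
    static long readUidFile(File uidStatDir, int uid, String name) {
        File f = new File(new File(uidStatDir, Integer.toString(uid)), name);
        try {
            String text = new String(Files.readAllBytes(f.toPath()), StandardCharsets.US_ASCII).trim();
            return Long.parseLong(text);
        } catch (IOException | NumberFormatException e) {
            return 0; // no entry: no traffic recorded for this UID since boot
        }
    }

    public static long sentBytes(File uidStatDir, int uid) {
        return readUidFile(uidStatDir, uid, "tcp_snd");
    }

    public static long receivedBytes(File uidStatDir, int uid) {
        return readUidFile(uidStatDir, uid, "tcp_rcv");
    }

    public static void main(String[] args) {
        File dir = new File("/proc/uid_stat");
        int uid = 1013; // Media, as an example
        System.out.println("UID " + uid + ": sent " + sentBytes(dir, uid)
                + " B, received " + receivedBytes(dir, uid) + " B");
    }
}
```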


To gather per-interface stats for a UID, you have to listen for connectivity changes: register a broadcast receiver filtering the android.net.conn.CONNECTIVITY_CHANGE action, and then save the current traffic values as the traffic of the corresponding interface.
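One way to do that bookkeeping, sketched in plain Java 8 without Android classes: on every connectivity change, each UID's growth since the previous change is credited to the network that was active until then. PerNetworkUidTraffic, its methods and the string network names are hypothetical; the caller is assumed to pass fresh per-UID totals (for example tcp_snd + tcp_rcv) read inside the broadcast receiver.

```java
import java.util.HashMap;
import java.util.Map;

public class PerNetworkUidTraffic {
    // raw per-UID totals seen at the previous connectivity change
    private final Map<Integer, Long> lastSnapshot = new HashMap<>();
    // accumulated usage: network name -> (uid -> bytes)
    private final Map<String, Map<Integer, Long>> usage = new HashMap<>();
    private String currentNetwork; // null while disconnected

    /** Call on each CONNECTIVITY_CHANGE with current per-UID totals and the new network (or null). */
    public void onConnectivityChange(Map<Integer, Long> uidTotals, String newNetwork) {
        for (Map.Entry<Integer, Long> e : uidTotals.entrySet()) {
            long previous = lastSnapshot.getOrDefault(e.getKey(), 0L);
            long delta = e.getValue() - previous;
            // credit the growth to the network that was active until this change;
            // negative deltas (counters reset by a reboot) are skipped
            if (currentNetwork != null && delta > 0) {
                usage.computeIfAbsent(currentNetwork, k -> new HashMap<>())
                     .merge(e.getKey(), delta, Long::sum);
            }
            lastSnapshot.put(e.getKey(), e.getValue());
        }
        currentNetwork = newNetwork;
    }

    /** Bytes attributed to the given UID on the given network so far. */
    public long bytesFor(String network, int uid) {
        Map<Integer, Long> perUid = usage.get(network);
        if (perUid == null) {
            return 0L;
        }
        Long bytes = perUid.get(uid);
        return bytes == null ? 0L : bytes;
    }
}
```

The first call (for example when the app starts on an already-connected network) only records a baseline, because no network is active yet from the class's point of view.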
Important note: procfs is mounted at boot time, which means that every time your device reboots, all UID traffic values start again from 0.
You can list the /proc/uid_stat/ directory right now to see which UIDs have used traffic since the last reboot.

As you can see, if you want a clear picture of your traffic consumption over time, merely calling TrafficStats methods is not enough. You have to catch connectivity changes to save each application's usage per interface, handle device reboots, and keep an eye on network interface directories appearing and disappearing in sysfs.
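Handling counters that restart from zero (after a reboot, or when an interface directory disappears and reappears) can be sketched as a small accumulator that turns a resettable reading into a running total. This is a hypothetical helper, not part of TrafficStats. It assumes that any drop in the raw value means a reset, so whatever was counted between the last sample and the reset is lost; in a real app lastRaw and accumulated would also have to be persisted (for example in SharedPreferences) to survive the reboot itself.

```java
public class ResettableCounter {
    private long lastRaw = -1;   // -1 means "no sample yet"
    private long accumulated = 0;

    /** Feeds the latest raw reading and returns the running total since the first sample. */
    public long update(long raw) {
        if (lastRaw < 0) {
            // first sample: only establish the baseline
        } else if (raw >= lastRaw) {
            accumulated += raw - lastRaw; // normal growth
        } else {
            accumulated += raw;           // counter restarted from 0: all of raw is new traffic
        }
        lastRaw = raw;
        return accumulated;
    }
}
```

For example, readings of 1000, 1500, 200 (after a reboot) and 260 give running totals of 0, 500, 700 and 760.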

UIDs stats

Let's dig deeper into per-UID stats, since per-application traffic usage might be one of your traffic stats app's key features.
A UID (user ID) is unique for each application and stays constant as long as the app is not reinstalled. (Strictly speaking, an application can explicitly request to share a user ID with another application, but there are security restrictions around this, and that's off-topic right now.)
UIDs below 10000 are reserved for the system; UIDs from 10000 upward belong to applications.
Here's a static list of system UIDs. It may grow, but existing entries never change:

0 - Root
1000 - System
1001 - Radio
1002 - Bluetooth
1003 - Graphics
1004 - Input
1005 - Audio
1006 - Camera
1007 - Log
1008 - Compass
1009 - Mount
1010 - Wi-Fi
1011 - ADB
1012 - Install
1013 - Media
1014 - DHCP
1015 - External Storage
1016 - VPN
1017 - Keystore
1018 - USB Devices
1019 - DRM
1020 - Available
1021 - GPS
1022 - deprecated
1023 - Internal Media Storage
1024 - MTP USB
1025 - NFC
1026 - DRM RPC


There's also UID 2000, which stands for the shell user. For example, if you take a device screenshot using DDMS, UID 2000's traffic grows by roughly the size of the image.
In other words, to get system stats you should gather stats for all UIDs below 2000.

When collecting stats for installed applications, the first idea that usually comes to mind is to retrieve the list of installed apps with their UIDs and get stats for each of them:

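// needs android.content.pm.ApplicationInfo, android.content.pm.PackageManager, java.util.*
PackageManager pm = getPackageManager(); // called on a Context, e.g. inside an Activity
List<ApplicationInfo> apps = pm.getInstalledApplications(PackageManager.GET_META_DATA);
List<Integer> uids = new ArrayList<Integer>();
for (ApplicationInfo appInfo : apps) {
  uids.add(appInfo.uid); // several packages may share one UID
}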


but this can take a long time (even a couple of seconds), which is inefficient. Instead, you can list /proc/uid_stat/ to get only those UIDs that have actually used some traffic:

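// needs java.io.File, java.util.ArrayList, java.util.List
File dir = new File("/proc/uid_stat/");
String[] children = dir.list();
List<Integer> uids = new ArrayList<Integer>();
if (children != null) {
  for (String child : children) {
    int uid;
    try {
      uid = Integer.parseInt(child);
    } catch (NumberFormatException e) {
      continue; // skip anything that isn't a numeric UID directory
    }
    // keep system UIDs (below 2000) and application UIDs (10000 and up)
    if ((uid >= 0 && uid < 2000) || uid >= 10000) {
      uids.add(uid);
    }
  }
}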


Moreover, unlike fetching the installed apps list, this approach won't lose the traffic of uninstalled apps. You probably won't show uninstalled apps in the UI, but your totals will show that there's some extra traffic besides the apps shown.
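Putting the last few paragraphs together, per-UID totals can be grouped into system UIDs (below 2000), the shell user (2000), installed apps, and the extra traffic of UIDs that no longer belong to an installed app. A hypothetical sketch: the caller is assumed to supply the per-UID byte counts and the set of UIDs of currently installed apps (for example collected via getInstalledApplications()).

```java
import java.util.Map;
import java.util.Set;

public class UidTrafficSummary {
    public long system;      // UIDs 0..1999
    public long shell;       // UID 2000
    public long installed;   // UIDs >= 10000 of currently installed apps
    public long uninstalled; // UIDs >= 10000 with traffic but no installed app

    public static UidTrafficSummary of(Map<Integer, Long> bytesByUid, Set<Integer> installedUids) {
        UidTrafficSummary s = new UidTrafficSummary();
        for (Map.Entry<Integer, Long> e : bytesByUid.entrySet()) {
            int uid = e.getKey();
            long bytes = e.getValue();
            if (uid >= 0 && uid < 2000) {
                s.system += bytes;
            } else if (uid == 2000) {
                s.shell += bytes;
            } else if (uid >= 10000) {
                if (installedUids.contains(uid)) {
                    s.installed += bytes;
                } else {
                    s.uninstalled += bytes; // shown only as part of the totals
                }
            }
            // UIDs 2001..9999 are ignored in this sketch
        }
        return s;
    }
}
```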

Always keep in mind that some processes in the system use other processes to do part of their work. Thus the traffic you attribute to an app may be split between that app and the processes it used. When you examine your apps' stats, you might be really surprised. For example, watching a 10 MB video in the YouTube app adds 200 KB to YouTube's traffic and 10 MB to Media's traffic (system UID 1013). When they added the traffic stats tool in Android ICS, they fixed that "feature" around Media traffic, but on earlier API levels be ready for this surprise.

Doubling stats bug

There's another nice present for developers writing a traffic statistics tool. Some Android 2.3-based systems collect DOUBLED per-UID stats. Exactly per-UID: the totals methods (getTotalRxBytes(), getTotalTxBytes(), ...) always return correct data. You can use the totals methods to detect automatically, right after your app is installed, whether the current device has this bug: compute the total traffic as getTotalRxBytes() + getTotalTxBytes() and, separately, the sum of the traffic used by all apps; do this twice with some delay in between, then compare the two differences. If the bug is present, your apps' traffic will grow nearly twice as much as the totals methods report.
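The detection described above boils down to comparing two growth values. A hypothetical sketch: the inputs are the totals-method sum and the sum over all UIDs, each sampled twice. The 1.5 ratio threshold (halfway between "correct" and "doubled") and the minimum-growth guard are my assumptions, since the two sums rarely match exactly even on a healthy device.

```java
public class DoubledUidStatsDetector {

    public enum Verdict { NORMAL, DOUBLED, NOT_ENOUGH_TRAFFIC }

    /**
     * total1/total2:   getTotalRxBytes() + getTotalTxBytes() at the two sample times
     * uidSum1/uidSum2: sum of rx + tx over all UIDs at the same times
     * minGrowthBytes:  how much total traffic must pass before a verdict is given
     */
    public static Verdict check(long total1, long uidSum1, long total2, long uidSum2,
                                long minGrowthBytes) {
        long totalGrowth = total2 - total1;
        long uidGrowth = uidSum2 - uidSum1;
        if (totalGrowth <= 0 || totalGrowth < minGrowthBytes) {
            return Verdict.NOT_ENOUGH_TRAFFIC; // sample again later
        }
        double ratio = (double) uidGrowth / totalGrowth;
        return ratio > 1.5 ? Verdict.DOUBLED : Verdict.NORMAL;
    }
}
```

If the verdict is DOUBLED, per-UID deltas on that device can be halved before they are stored.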

Connectivity change broadcasts receiving delay

As I've already mentioned, getting applications' traffic stats per network interface requires extra work. Suppose you collect stats quite frequently (every 15 seconds, for example). While collecting, you determine the current network type:

// connectivityManager = (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
final NetworkInfo activeNetworkInfo = connectivityManager.getActiveNetworkInfo();
if (activeNetworkInfo == null) {
  // we are not connected anywhere now
} else if ((activeNetworkInfo.getType() == ConnectivityManager.TYPE_WIFI) ||
    (activeNetworkInfo.getType() == ConnectivityManager.TYPE_WIMAX)) {
  // that's wifi
} else {
  // that's mobile
}


and save the data into the corresponding storage.
Suppose the device is connected to a Wi-Fi hotspot, and traffic data is successfully collected and saved. Then connectivity changes: you receive a broadcast saying the current network is disconnected and do whatever you need on that event (stop collecting, for example); a few moments later you receive a new broadcast saying a new network (mobile, in our case) is connected, and from then on you collect and save your traffic data as mobile data. Everything seems to be OK. But sometimes you run into this situation: BEFORE you receive the "disconnected" broadcast, your device is in fact ALREADY connected to a new network, and at that moment your handler collects stats and saves them to the wrong storage:

precondition: wifi's connected
1) collect wifi traffic
2) collect wifi traffic
3) collect wifi traffic // that's all OK
4) wifi's actually disconnected
5) receive broadcast telling wifi's disconnected // do something around this
6) mobile's actually connected
7) receive broadcast telling mobile's connected // do something around this
8) collect mobile traffic
9) collect mobile traffic // that's all OK
10) mobile's actually disconnected
11) wifi's actually connected // !!!! wifi is connected before your application gets the mobile-disconnected broadcast
12) collect wifi traffic // because the current network type is already wifi, while mobile traffic is expected
13) receive broadcast telling mobile's disconnected // finally... :(
14) receive broadcast telling wifi's connected

Depending on your data gathering algorithm, you might lose (or duplicate) some data because of this delay in receiving broadcasts.
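One possible way to at least avoid filing data under the wrong network (not the only one, and not something the platform does for you) is to compare the network type announced by the last broadcast with the one getActiveNetworkInfo() reports at collection time, and to keep mismatching samples in a separate bucket. A hypothetical plain-Java sketch; the class, enum and method names are mine.

```java
public class SampleClassifier {

    public enum NetworkType { WIFI, MOBILE, NONE }

    private NetworkType lastBroadcastType = NetworkType.NONE;
    private long wifiBytes;
    private long mobileBytes;
    private long ambiguousBytes;

    /** Called from the broadcast receiver with the network type the broadcast announced. */
    public void onBroadcast(NetworkType announced) {
        lastBroadcastType = announced;
    }

    /** Called by the periodic collector with the live type and the bytes since the last sample. */
    public void onSample(NetworkType liveType, long deltaBytes) {
        if (liveType != lastBroadcastType) {
            // a switch happened that no broadcast has reported yet (like step 12 above):
            // the bytes may belong to either network, so don't file them under either
            ambiguousBytes += deltaBytes;
        } else if (liveType == NetworkType.WIFI) {
            wifiBytes += deltaBytes;
        } else if (liveType == NetworkType.MOBILE) {
            mobileBytes += deltaBytes;
        }
        // NONE confirmed by a broadcast: nothing to attribute
    }

    public long wifiBytes() { return wifiBytes; }
    public long mobileBytes() { return mobileBytes; }
    public long ambiguousBytes() { return ambiguousBytes; }
}
```

What to do with the ambiguous bucket later (split it, drop it, or assign it once the late broadcast arrives) still depends on your own algorithm.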


These are all the hazards you should be aware of before writing a traffic stats application for Android.

Saturday, October 29, 2011

Git on Amazon EC2

A very quick solution, if you have your_amazon_key.pem (requires a running ssh-agent):
ssh-add /path/to/your_amazon_key.pem
After this, git push succeeds.

And a longer way.

Friday, September 16, 2011

Number of genes in a cell

Consider an organism with one of the smallest known genomes: Mycoplasma genitalium.


This organism lives as a parasite in mammals, taking ready-made small molecules from them. At the same time, it makes on its own the large molecules (DNA, RNA, proteins) required for the basic processes of heredity.
Its genome, 580070 nucleotide pairs or 145018 bytes of information, contains 477 genes. 37 of these genes encode transfer, ribosomal, and other non-messenger RNAs. 297 genes encode proteins: of these, 153 are involved in replication, transcription, translation, and other similar processes involving DNA, RNA, and proteins; 29 in the cell membrane and surface; 33 in transporting nutrients and other molecules across the membrane; 71 in energy conversion and in the synthesis and degradation of small molecules; 11 in regulating cell division and other processes.
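The byte count matches 2 bits per nucleotide pair (four possible bases), which I assume is how it was obtained; the protein-gene categories also add up to 297:

```latex
\begin{aligned}
580\,070~\text{bp} \times 2~\text{bits/bp} &= 1\,160\,140~\text{bits} = \tfrac{1\,160\,140}{8}~\text{bytes} = 145\,017.5 \approx 145\,018~\text{bytes} \\
153 + 29 + 33 + 71 + 11 &= 297~\text{protein-coding genes}
\end{aligned}
```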

Today, the minimum number of genes in a viable cell is estimated at no fewer than 200-300, of which about 60 form a core set shared by all living species without exception.

The plasma membrane of a cell

Another universal feature of cells is that they are all surrounded by a plasma membrane. This peculiar barrier lets nutrients in, retains the products the cell synthesizes, and exports waste products. Without a plasma membrane, a cell could not maintain its integrity as an ordered chemical system.
The membrane consists of amphiphilic molecules which, when placed in water, spontaneously assemble into bilayers, hiding their hydrophobic parts from the water and thus forming small closed compartments whose internal aqueous contents are isolated from the external environment.
The hydrophobic parts of the overwhelming majority of membrane molecules are hydrocarbon polymers (-CH2-CH2-CH2-), and their spontaneous aggregation into a bilayer compartment is one of many examples of an important principle: cells produce molecules whose chemical properties cause them to self-assemble into exactly the structures the cell needs.
The cell boundary cannot be completely impermeable. Since the cell grows and reproduces, its membrane must let raw materials in and waste products out. Therefore all cells have special proteins embedded in the membrane that transport specific molecules from one side to the other.


Transport proteins in the membrane determine which molecules enter the cell, and catalytic proteins inside the cell determine the reactions in which these molecules take part.
Thus, by specifying the proteins a cell makes, the genetic information recorded in DNA dictates the course of all chemical processes in the cell.

Tuesday, September 6, 2011

Create a Git remote repository

# On local machine

cd foo_project
git init
git add .
git commit -m "My initial commit message"

# On remote machine (Git remote repository)

cd /var/git/
mkdir foo.git
cd foo.git
git --bare init

# On local machine

git remote add origin ssh://agolovatuk@example.com/var/git/foo.git
git push -u origin master
# (-u, i.e. --set-upstream, also makes local master track origin/master)

Some useful Git tips

undo the last commit as if it never happened (this also discards any uncommitted changes)
# git reset --hard HEAD^

delete a remote branch
# git push origin :branchName

remove stale remote-tracking branches (origin/...) whose branches were deleted on origin
# git remote prune origin

create a local branch that tracks a remote branch
# git checkout --track -b deployCandidate origin/deployCandidate

list git config
# git config -l

show the reflog (history of where HEAD has pointed)
# git reflog
(to return to one of the states listed there, reset to its entry, e.g. the previous one:)
# git reset --hard HEAD@{1}

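show diff between what is committed (but not pushed) and the remote branch
# git diff origin/master HEAD
(git diff HEAD^ instead compares your working tree with the commit before the last one)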

to enable colors in git, put this in .git/config (or in ~/.gitconfig for all repositories):
[color]
  branch = auto
  diff = auto
  status = auto
[color "branch"]
  current = yellow reverse
  local = yellow
  remote = green
[color "diff"]
  meta = yellow bold
  frag = magenta bold
  old = red bold
  new = green bold
[color "status"]
  added = yellow
  changed = green
  untracked = cyan

Set up MongoDB replica sets

Here is a sample of how to set up and run a MongoDB replica set with 3 members (2 data-bearing members and 1 arbiter). We'll run everything on localhost: the data members on ports 27017 and 27018, and the arbiter on 27019.

# download mongodb here
# create data and log directories for all replica members

mkdir -p /var/log/mongodb
touch /var/log/mongodb/mongo1.log
touch /var/log/mongodb/mongo2.log
touch /var/log/mongodb/mongo3.log


mkdir -p /var/lib/mongodb/data1/db
mkdir -p /var/lib/mongodb/data2/db
mkdir -p /var/lib/mongodb/data3/db

# create configuration files for members

/etc/mongo1.conf:

dbpath = /var/lib/mongodb/data1/db
bind_ip = 127.0.0.1
noauth = true
verbose = true
port=27017
logpath=/var/log/mongodb/mongo1.log
logappend=true
replSet=mySetName

/etc/mongo2.conf:

dbpath = /var/lib/mongodb/data2/db
bind_ip = 127.0.0.1
noauth = true
verbose = true
port=27018
logpath=/var/log/mongodb/mongo2.log
logappend=true
replSet=mySetName

/etc/mongo3.conf:

dbpath = /var/lib/mongodb/data3/db
bind_ip = 127.0.0.1
noauth = true
verbose = true
port=27019
logpath=/var/log/mongodb/mongo3.log
logappend=true
replSet=mySetName

# run all members (add --rest if you want the web admin UI at http://localhost:28017)

mongod --rest --config /etc/mongo1.conf &
mongod --rest --config /etc/mongo2.conf &
mongod --rest --config /etc/mongo3.conf &

# initiate the replica set in the mongo shell

$ mongo
> cfg = {_id:"mySetName", members:[
{_id:0,host:"127.0.0.1:27017"},
{_id:1,host:"127.0.0.1:27018"},
{_id:2,host:"127.0.0.1:27019",arbiterOnly:true}
]}
> rs.initiate(cfg);
> rs.status()