string table的本质是hashtable,而hashtable的性能和桶的个数密切相关,对于string table进行调优其实就是要对hashtable的桶的个数进行调节。

/**
 * 演示串池大小对性能的影响
 *  -XX:+PrintStringTableStatistics
 */
public class Demo1_24 {

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("linux.words"), "utf-8"))) {
            String line = null;
            long start = System.nanoTime();
            while (true) {
                line = reader.readLine();
                if (line == null) {
                    break;
                }
                line.intern();
            }
            System.out.println("cost:" + (System.nanoTime() - start) / 1000000);
        }


    }
}

配置并运行上列代码,打印信息如下。

cost:439
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :     13697 =    328728 bytes, avg  24.000
Number of literals      :     13697 =    609024 bytes, avg  44.464
Total footprint         :           =   1097840 bytes
Average bucket size     :     0.684
Variance of bucket size :     0.684
Std. dev. of bucket size:     0.827
Maximum bucket size     :         6
StringTable statistics:
Number of buckets       :     60013 =    480104 bytes, avg   8.000
Number of entries       :    481494 =  11555856 bytes, avg  24.000
Number of literals      :    481494 =  29750344 bytes, avg  61.788
Total footprint         :           =  41786304 bytes
Average bucket size     :     8.023
Variance of bucket size :     8.084
Std. dev. of bucket size:     2.843
Maximum bucket size     :        23

配置参数-XX:StringTableSize=200000,打印信息如下。

cost:393
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :     13697 =    328728 bytes, avg  24.000
Number of literals      :     13697 =    609024 bytes, avg  44.464
Total footprint         :           =   1097840 bytes
Average bucket size     :     0.684
Variance of bucket size :     0.684
Std. dev. of bucket size:     0.827
Maximum bucket size     :         6
StringTable statistics:
Number of buckets       :    200000 =   1600000 bytes, avg   8.000
Number of entries       :    481494 =  11555856 bytes, avg  24.000
Number of literals      :    481494 =  29750344 bytes, avg  61.788
Total footprint         :           =  42906200 bytes
Average bucket size     :     2.407
Variance of bucket size :     2.420
Std. dev. of bucket size:     1.556
Maximum bucket size     :        12

配置参数-XX:StringTableSize=1009,打印信息如下。

cost:4870
SymbolTable statistics:
Number of buckets       :     20011 =    160088 bytes, avg   8.000
Number of entries       :     16327 =    391848 bytes, avg  24.000
Number of literals      :     16327 =    698456 bytes, avg  42.779
Total footprint         :           =   1250392 bytes
Average bucket size     :     0.816
Variance of bucket size :     0.811
Std. dev. of bucket size:     0.901
Maximum bucket size     :         6
StringTable statistics:
Number of buckets       :      1009 =      8072 bytes, avg   8.000
Number of entries       :    482764 =  11586336 bytes, avg  24.000
Number of literals      :    482764 =  29845512 bytes, avg  61.822
Total footprint         :           =  41439920 bytes
Average bucket size     :   478.458
Variance of bucket size :   432.042
Std. dev. of bucket size:    20.786
Maximum bucket size     :       547

可以发现,当桶的数量更多时,哈希冲突的可能性会减少,这样入池的时间会更少。

为什么要使用string table来存储字符串呢?因为这样可以节省空间,避免重复创建字符串对象。网络上流传twitter中存储用户信息包含地址项,如果不使用string table存储,需要约30G内存,但是这些地址可能包含大量重复地址,可能很多个用户都是来自于北京市中关村,于是twitter将地址信息入池,由string table创建存储,将这个内存空间降低至数十M。

通过下面实例来感受这一过程。

/**
 * 演示 intern 减少内存占用
 * -XX:StringTableSize=200000 -XX:+PrintStringTableStatistics
 * -Xsx500m -Xmx500m -XX:+PrintStringTableStatistics -XX:StringTableSize=200000
 */
public class Demo1_25 {

    public static void main(String[] args) throws IOException {

        List<String> address = new ArrayList<>();
        System.in.read();
        for (int i = 0; i < 10; i++) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("linux.words"), "utf-8"))) {
                String line = null;
                long start = System.nanoTime();
                while (true) {
                    line = reader.readLine();
                    if(line == null) {
                        break;
                    }
                    address.add(line);
                }
                System.out.println("cost:" +(System.nanoTime()-start)/1000000);
            }
        }
        System.in.read();


    }
}

在键盘键入前,没有读进行数据读取操作,键入后则进行了数据读取并存入了address中。在这两个时间节点采用jvisualvm中sampler进行取样。结果如下。

图片说明
图片说明

修改代码address.add()

address.add(line.intern());

图片说明

图片说明