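Seeing this title, a sharp reader might think: isn't this just a deadlock? But the situation discussed today is not a deadlock in the strict sense; at most it is a deadlock in a loose, big-picture sense. What is more, jstack does not report any deadlock for it.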

Threads waiting on each other because of thread pool misuse

Today's example

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Test {
    static ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Future<String> outterFuture = threadPoolExecutor.submit(() -> {
            Future<String> innerFuture = threadPoolExecutor.submit(() -> {
                System.out.println("inner finish");
                return "inner finish";
            });
            // Blocks forever: the pool's only thread is busy running this outer task.
            String s = innerFuture.get();
            System.out.println("outter get inner finish:" + s);
            System.out.println("outter finish");
            return "outter finish";
        });
        String s = outterFuture.get();
        System.out.println("process get outter finish:" + s);
    }
}

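In other words, task 1 is submitted to the pool, task 1 submits task 2 to the same pool, and task 1 then waits for task 2's result. Some readers will spot the problem immediately; this is of course a simplified version, and real-world thread pool usage can be far less obvious. Running this method hangs at once: the main thread waits for the outer task, the outer task waits for the inner task, and the inner task waits for a pool thread that never becomes free.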

 

jstack output

2020-09-12 09:52:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode):
"Attach Listener" #11 daemon prio=9 os_prio=0 tid=0x00007fbf38001000 nid=0x37c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"pool-1-thread-1" #10 prio=5 os_prio=0 tid=0x00007fbf9819c800 nid=0x7932 waiting on condition [0x00007fbf77af9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c8e08478> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at Test.lambda$main$1(Test.java:24)
	at Test$$Lambda$1/1418481495.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007fbf980d2000 nid=0x7930 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007fbf980c7000 nid=0x792f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fbf980c4800 nid=0x792e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fbf980c3000 nid=0x792d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fbf980c0000 nid=0x792c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fbf980be800 nid=0x792b runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fbf9808b800 nid=0x792a in Object.wait() [0x00007fbf84371000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006c8e01a60> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
	- locked <0x00000006c8e01a60> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fbf98086800 nid=0x7929 in Object.wait() [0x00007fbf84472000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006c8e0f950> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x00000006c8e0f950> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007fbf98008800 nid=0x791e waiting on condition [0x00007fbf9e635000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006c8e177b8> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at Test.main(Test.java:31)
"VM Thread" os_prio=0 tid=0x00007fbf9807f000 nid=0x7928 runnable 
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fbf9801d800 nid=0x791f runnable 
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fbf9801f800 nid=0x7920 runnable 
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fbf98021800 nid=0x7921 runnable 
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fbf98023000 nid=0x7922 runnable 
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007fbf98025000 nid=0x7923 runnable 
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007fbf98027000 nid=0x7925 runnable 
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007fbf98028800 nid=0x7926 runnable 
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007fbf9802a800 nid=0x7927 runnable 
"VM Periodic Task Thread" os_prio=0 tid=0x00007fbf980d5000 nid=0x7931 waiting on condition 
JNI global references: 201

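jstack did not report any deadlock on its own. In a real system, with many business and framework threads in the dump, the problem is even harder to spot.

Thread pool parameters

Below is the ThreadPoolExecutor constructor that takes the most parameters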

public ThreadPoolExecutor(int corePoolSize,
                              int maximumPoolSize,
                              long keepAliveTime,
                              TimeUnit unit,
                              BlockingQueue<Runnable> workQueue,
                              ThreadFactory threadFactory,
                              RejectedExecutionHandler handler) {
    ......
}

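The parameters mean the following (look up the details yourself):

  • corePoolSize: the number of core threads in the pool
  • maximumPoolSize: the maximum number of threads in the pool
  • keepAliveTime: how long idle threads beyond the core size are kept alive
  • unit: the time unit of keepAliveTime
  • workQueue: the queue that holds tasks waiting to be executed
  • threadFactory: the factory the pool uses to create threads
  • handler: the policy for handling rejected tasks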
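
To make the seven parameters concrete, here is a minimal sketch that passes each one explicitly. The class name PoolConfigExample, the thread name prefix, and all sizes are assumptions chosen for illustration; they do not come from the example above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolConfigExample {
    // Builds a pool using all seven constructor arguments; the values are arbitrary.
    static ThreadPoolExecutor newPool() {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory namedThreads =
                r -> new Thread(r, "demo-worker-" + counter.incrementAndGet());
        return new ThreadPoolExecutor(
                2,                                      // corePoolSize
                4,                                      // maximumPoolSize
                30L,                                    // keepAliveTime
                TimeUnit.SECONDS,                       // unit
                new ArrayBlockingQueue<>(10),           // workQueue (bounded to 10 tasks)
                namedThreads,                           // threadFactory
                new ThreadPoolExecutor.AbortPolicy());  // handler: reject by throwing
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool();
        // Prints the name given by the thread factory.
        pool.execute(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

With a bounded queue like this one, the pool grows beyond corePoolSize only after the queue is full, and the handler is invoked only when both the queue and maximumPoolSize are exhausted.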

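Cause analysis 1

In the example, both the core pool size and the maximum pool size are 1, so the pool can run only one task at a time; tasks that cannot run yet are held in the work queue. Because that queue is an unbounded LinkedBlockingQueue, submissions are never rejected; they simply wait. The outer task is submitted first and occupies the pool's only thread. It then submits the inner task and waits for its result. The inner task never runs: it sits in the queue until a pool thread is free to take it, and the only thread is held by the outer task. So the outer task uses up the pool's entire capacity while waiting for the inner task, and the inner task waits for the pool to run it and never returns a result.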
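
The analysis above can be reproduced without hanging the JVM by waiting with a timeout. The following is a minimal sketch, not the only possible fix: the class PoolStarvationDemo and its method innerCompletes are hypothetical names, and the one-second timeout is an arbitrary choice. It also shows one common remedy, which is to run the inner task on a different pool from the outer task.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PoolStarvationDemo {
    // Runs an outer task on outerPool that submits an inner task to innerPool
    // and waits up to one second for it. Returns true if the inner task finished.
    static boolean innerCompletes(ExecutorService outerPool, ExecutorService innerPool) {
        Future<Boolean> outer = outerPool.submit(() -> {
            Future<String> inner = innerPool.submit(() -> "inner finish");
            try {
                inner.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                inner.cancel(false); // give up instead of parking forever
                return false;
            }
        });
        try {
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService shared = Executors.newFixedThreadPool(1);
        ExecutorService outerPool = Executors.newFixedThreadPool(1);
        ExecutorService innerPool = Executors.newFixedThreadPool(1);
        try {
            // Same single-thread pool for both tasks: the inner task starves.
            System.out.println("same pool: " + innerCompletes(shared, shared));
            // Separate pools: the inner task has its own thread and completes.
            System.out.println("separate pools: " + innerCompletes(outerPool, innerPool));
        } finally {
            shared.shutdown();
            outerPool.shutdown();
            innerPool.shutdown();
        }
    }
}
```

Executors.newFixedThreadPool(1) builds the same kind of pool as the example: one thread and an unbounded LinkedBlockingQueue. Enlarging the shared pool only postpones the problem, since enough concurrent outer tasks can still occupy every thread.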

Cause analysis 2

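The jstack output does in fact contain clues. One thing to rule out first: in the Reference Handler and Finalizer threads, waiting on and locked name the same object, which looks like a thread waiting for itself. That is simply the normal picture of a thread inside Object.wait(), which releases the monitor while it waits, and both are JVM housekeeping threads unrelated to the hang. The useful evidence is in pool-1-thread-1 and main: both are parked in FutureTask.get, and pool-1-thread-1 is the pool's only worker (its stack passes through ThreadPoolExecutor.runWorker), so the task it is waiting for can never be picked up.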

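Deadlock

First, a quick refresher on deadlock.

Deadlock conditions

  1. Mutual exclusion: while a resource is being used (held) by one thread, no other thread can use it.
  2. No preemption: a requester cannot forcibly take a resource from its holder; only the holder can release it voluntarily.
  3. Hold and wait: a thread keeps the resources it already holds while requesting additional ones.
  4. Circular wait: there is a cycle of waiters, for example P1 holds a resource P2 is waiting for, P2 holds a resource P3 is waiting for, and P3 holds a resource P1 is waiting for.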
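
Breaking any one of the four conditions prevents a deadlock. As a minimal sketch of removing condition 4, the code below always acquires two locks in the same fixed order, so no cycle of waiters can form. The class LockOrderingExample and its members are hypothetical names used only for illustration.

```java
class LockOrderingExample {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();
    private static int counter = 0;

    // Every caller takes LOCK_A before LOCK_B, so a circular wait is impossible.
    static void increment() {
        synchronized (LOCK_A) {
            synchronized (LOCK_B) {
                counter++;
            }
        }
    }

    // Runs two threads that each call increment() perThread times and
    // returns the final counter value once both have finished.
    static int runTwoThreads(int perThread) {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                increment();
            }
        };
        Thread t1 = new Thread(work, "worker-1");
        Thread t2 = new Thread(work, "worker-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(1000)); // prints 2000
    }
}
```

Both threads finish because neither can hold LOCK_B while waiting for LOCK_A.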

Deadlock example

public class DeadLock implements Runnable {
    // Two shared locks; the two threads acquire them in opposite orders.
    private static final Object obj1 = new Object();
    private static final Object obj2 = new Object();
    private final boolean flag;

    public DeadLock(boolean flag) {
        this.flag = flag;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " running");
        if (flag) {
            synchronized (obj1) {
                System.out.println(Thread.currentThread().getName() + " has locked obj1");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (obj2) {
                    // Never reached
                    System.out.println("After 1 second, " + Thread.currentThread().getName()
                            + " locked obj2");
                }
            }
        } else {
            synchronized (obj2) {
                System.out.println(Thread.currentThread().getName() + " has locked obj2");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (obj1) {
                    // Never reached
                    System.out.println("After 1 second, " + Thread.currentThread().getName()
                            + " locked obj1");
                }
            }
        }
    }

    public static void main(String[] args) {
        // Thread names ("thread 1", "thread 2") are kept as in the original run,
        // because they appear in the jstack output.
        Thread t1 = new Thread(new DeadLock(true), "线程1");
        Thread t2 = new Thread(new DeadLock(false), "线程2");
        t1.start();
        t2.start();
    }
}

jstack output

Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode):
"DestroyJavaVM" #13 prio=5 os_prio=0 tid=0x0000000003866000 nid=0x2ffc waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"线程2" #12 prio=5 os_prio=0 tid=0x000000001e6b8000 nid=0x20e4 waiting for monitor entry [0x000000001f8bf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.wp.security.springboot.DeadLock.run(DeadLock.java:42)
        - waiting to lock <0x000000076b47b980> (a java.lang.Object)
        - locked <0x000000076b47b990> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:748)
"线程1" #11 prio=5 os_prio=0 tid=0x000000001eec8800 nid=0x11d8 waiting for monitor entry [0x000000001f7bf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.wp.security.springboot.DeadLock.run(DeadLock.java:28)
        - waiting to lock <0x000000076b47b990> (a java.lang.Object)
        - locked <0x000000076b47b980> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:748)
"Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001e607000 nid=0x3888 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C1 CompilerThread2" #9 daemon prio=9 os_prio=2 tid=0x000000001e57c800 nid=0x1a1c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #8 daemon prio=9 os_prio=2 tid=0x000000001e56f000 nid=0x37b4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #7 daemon prio=9 os_prio=2 tid=0x000000001e56e800 nid=0x1eb0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Monitor Ctrl-Break" #6 daemon prio=5 os_prio=0 tid=0x000000001e56a800 nid=0x2298 runnable [0x000000001e9be000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        - locked <0x000000076b4cf910> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        - locked <0x000000076b4cf910> (a java.io.InputStreamReader)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:61)
"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001cf8a000 nid=0x1e84 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001cf74000 nid=0x2330 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x000000001cf4e800 nid=0x4168 in Object.wait() [0x000000001e2bf000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000076b208ed0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x000000076b208ed0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:212)
"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x0000000003956000 nid=0x3478 in Object.wait() [0x000000001e1bf000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000076b206bf8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
        - locked <0x000000076b206bf8> (a java.lang.ref.Reference$Lock)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"VM Thread" os_prio=2 tid=0x000000001cf27000 nid=0x47a4 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x000000000387b800 nid=0x1ec8 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x000000000387d000 nid=0x47a0 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x000000000387e800 nid=0x3364 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000003881800 nid=0x4848 runnable
"VM Periodic Task Thread" os_prio=2 tid=0x000000001e5e5800 nid=0x1318 waiting on condition
JNI global references: 12
Found one Java-level deadlock:
=============================
"线程2":
  waiting to lock monitor 0x000000001cf4b598 (object 0x000000076b47b980, a java.lang.Object),
  which is held by "线程1"
"线程1":
  waiting to lock monitor 0x000000001cf4ded8 (object 0x000000076b47b990, a java.lang.Object),
  which is held by "线程2"
Java stack information for the threads listed above:
===================================================
"线程2":
        at com.wp.security.springboot.DeadLock.run(DeadLock.java:42)
        - waiting to lock <0x000000076b47b980> (a java.lang.Object)
        - locked <0x000000076b47b990> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:748)
"线程1":
        at com.wp.security.springboot.DeadLock.run(DeadLock.java:28)
        - waiting to lock <0x000000076b47b990> (a java.lang.Object)
        - locked <0x000000076b47b980> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.

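Here the addresses after waiting to lock and locked for 线程1 and 线程2 make the cycle obvious at a glance: each thread is waiting for the monitor the other one holds. The end of the jstack output also reports it explicitly with "Found one Java-level deadlock".

Why jstack does not detect the thread pool hang on its own

In the thread pool example, no thread is blocked because another thread holds a lock it needs, so the hang does not count as a deadlock. The worker is parked on a FutureTask, which has no owning thread, so there is no chain of lock owners to follow. The deadlock example is explicit: two threads each hold one lock and wait for the other's. That is a true deadlock, and jstack reports it.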
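
The same difference shows up when asking the JVM programmatically. The sketch below recreates the thread pool hang and calls ThreadMXBean.findDeadlockedThreads(), which returns null when it finds no deadlocked threads. The class DeadlockDetectorDemo and its method name are hypothetical, the 500 ms sleep is a crude assumption about how long the worker needs to park, and the call assumes a JVM that supports this monitoring, as HotSpot does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DeadlockDetectorDemo {
    // Recreates the pool hang, then asks the JVM for deadlocked threads.
    // Returns the raw result of findDeadlockedThreads(); null means none found.
    static long[] detectDuringPoolStall() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "inner finish");
                return inner.get(); // parks: the only worker is running this task
            });
            Thread.sleep(500); // crude wait so the worker is parked before we look
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            return threads.findDeadlockedThreads();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the parked worker so the JVM can exit
        }
    }

    public static void main(String[] args) {
        long[] ids = detectDuringPoolStall();
        System.out.println(ids == null
                ? "no deadlock reported"
                : ids.length + " deadlocked threads");
    }
}
```

Even though the worker would wait forever without the shutdownNow() call, the detector has nothing to report, because the wait is on a task result rather than on a lock held by another thread.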

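How to recognize a deadlock-like mutual wait

When this kind of situation occurs and jstack gives no hint, it is genuinely hard to find the problem by reading through the business threads. I compared the thread dumps of the two examples and noticed the keywords waiting on, waiting to lock, parking to wait for, and locked, then looked them up online.

  • waiting on condition means a conditional wait that does not go through Object.wait, for example after calling sleep or park
  • parking to wait for means the thread has called park
  • waiting to lock means the thread is waiting to acquire a lock object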

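My guess is that jstack detects the deadlock in the deadlock example by matching waiting to lock entries against locked entries, which is deadlock in the true sense. In the thread pool example the stalled threads instead show parking to wait for a FutureTask, and no thread in the dump has a matching locked entry. To judge whether threads are stuck in either a true deadlock or this deadlock-like mutual wait, you need to go through every waiting entry and ask what would release it: for waiting to lock, which thread shows the same address under locked; for parking to wait for a FutureTask, whether any pool thread is free to run that task.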
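
One way to automate part of this check is to scan live stack traces for pool workers that are themselves blocked in FutureTask.get, the pattern seen in pool-1-thread-1 above. This is only a heuristic sketch under stated assumptions: the class PoolStallScanner, its method names, and the thread name "stalled-worker" are hypothetical, and a match is a hint rather than proof, since a worker may legitimately wait for a task running on another pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolStallScanner {
    // Heuristic: a stack containing both ThreadPoolExecutor.runWorker and
    // FutureTask.get belongs to a pool worker waiting for another task.
    static List<String> workersBlockedOnFutures() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            boolean inWorker = false;
            boolean inGet = false;
            for (StackTraceElement f : e.getValue()) {
                String cls = f.getClassName();
                String method = f.getMethodName();
                if (cls.equals("java.util.concurrent.ThreadPoolExecutor") && method.equals("runWorker")) {
                    inWorker = true;
                }
                if (cls.equals("java.util.concurrent.FutureTask") && method.equals("get")) {
                    inGet = true;
                }
            }
            if (inWorker && inGet) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    // Recreates the hang on a pool whose single thread is named "stalled-worker",
    // then polls the scanner for up to about five seconds.
    static List<String> scanDuringStall() {
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> new Thread(r, "stalled-worker"));
        try {
            pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "inner finish");
                return inner.get();
            });
            List<String> found = new ArrayList<>();
            for (int i = 0; i < 50 && found.isEmpty(); i++) {
                Thread.sleep(100);
                found = workersBlockedOnFutures();
            }
            return found;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // release the parked worker
        }
    }

    public static void main(String[] args) {
        System.out.println(scanDuringStall());
    }
}
```

A flagged worker still has to be checked by hand: the question is whether the task it waits for is queued on the same, fully occupied pool.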
