Modifying an Array While Traversing It

21 February 2014

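Today, while using the syntax-tree tool UglifyJS to batch-modify JavaScript files, I ran into a problem with modifying an array dynamically during a forEach traversal: some of the later array elements were never visited.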

A simplified version of the scenario:

//----------- Start of Background Code ------------//
function AST_Block(props){
    if(props){
        this.body = props.body; // array of statements
    }
}

function walk_body(node, walker){
    var body = node.body;
    body.forEach(function(statement, i){
        walker(statement, i, body)
    });
}

AST_Block.prototype.walk = function(walker){
    return walk_body(this, walker);
}
    
var ast_block1 = new AST_Block({
    body:  [
        "ast_var",
        "ast_assign",
        "ast_new",
        "ast_new"
    ]
});

//------------ End of Background Code  ---------------//

function insertAfterWalker(statement, i, body){
    // code here    
}

ast_block1.walk(insertAfterWalker);
console.log(ast_block1.body);

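The task is to fill in the insertAfterWalker function above so that, after calling ast_block1.walk(insertAfterWalker) on the AST_Block instance, an "ast_assign" element has been added after every "ast_new" element in the AST_Block's body array.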

My first instinct was: when the traversal reaches an element and it turns out to be a target, insert right after it:

function insertAfterWalker(statement, i, body){
    if(statement == "ast_new"){
        body.splice(i + 1, 0, "ast_assign");
    }   
}

But the output is: ["ast_var", "ast_assign", "ast_new", "ast_assign", "ast_new"]

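Only one element was inserted. Stepping through the code showed that after the condition if(statement == "ast_new") evaluated to true, the next time execution reached that line the value of statement was "ast_assign", and then the traversal ended.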

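I was puzzled for a while, then realized it probably had to do with how array traversal is implemented. I checked the corresponding code in UglifyJS: the traversal there uses forEach, just like the code above. I couldn't see the underlying implementation of the traversal, but I suspected it had the form for(var i = 0, len = this.length; i < len; i++){}. (The ECMAScript 5 specification of Array.prototype.forEach is consistent with this: the range of elements to visit is fixed before the first call to the callback.)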

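With such an implementation, the index range and the order of the traversal are fixed before the traversal begins, regardless of whether the array is modified along the way. (The scenario above includes a big chunk of Background Code precisely to make the point that we don't get to decide the traversal order.) Leaving aside for now whether traversing this way is a good idea, here are some possible solutions.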
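
To make the suspicion concrete, here is a minimal sketch of a forEach-style loop whose upper bound is read once before the loop starts. naiveForEach is a hypothetical helper written for illustration, not the engine's actual implementation; it merely reproduces the behaviour observed above.

```javascript
// Hypothetical stand-in for Array.prototype.forEach: the length is read once,
// so elements that end up beyond the original length are never visited.
function naiveForEach(array, callback) {
    for (var i = 0, len = array.length; i < len; i++) {
        callback(array[i], i, array);
    }
}

var visited = [];
var body = ["ast_var", "ast_assign", "ast_new", "ast_new"];

naiveForEach(body, function (statement, i, arr) {
    visited.push(i + ":" + statement);
    if (statement === "ast_new") {
        arr.splice(i + 1, 0, "ast_assign");
    }
});

console.log(visited);
// ["0:ast_var", "1:ast_assign", "2:ast_new", "3:ast_assign"]
console.log(body);
// ["ast_var", "ast_assign", "ast_new", "ast_assign", "ast_new"]
```

The second "ast_new" is pushed to index 4, one past the last index the loop will ever visit, which is why it never gets its "ast_assign".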

1. Insert at the end of the array

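Just replace i + 1 with body.length in insertAfterWalker. This does insert one "ast_assign" for every target element found, but at the wrong position. If the inserted element only has to come somewhere after the target element, not immediately after it, this approach fits well.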
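
A small sketch of this variant, calling forEach directly instead of going through the walk machinery (a simplification made only for brevity):

```javascript
var body = ["ast_var", "ast_assign", "ast_new", "ast_new"];

body.forEach(function (statement, i, arr) {
    if (statement === "ast_new") {
        // Append instead of inserting at i + 1. Appended elements lie beyond
        // the range forEach fixed at the start, so they are not visited and
        // no existing element changes position.
        arr.splice(arr.length, 0, "ast_assign");
    }
});

console.log(body);
// ["ast_var", "ast_assign", "ast_new", "ast_new", "ast_assign", "ast_assign"]
```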

2. Insert after the traversal has finished

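This requires recording the insertion position, the element to insert, and the array to insert into. Once the array has been fully traversed, perform the insertions one by one by position, from back to front.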

var lastArrayTraversed; // the array traversed most recently
var lastArraysInsertStack; // pending insert positions and elements for that array

function insertToLastArray(){
    var insertInfo;
    if(lastArrayTraversed && lastArraysInsertStack){
        while(lastArraysInsertStack.length){
            // The array is traversed in ascending index order, so pop() yields
            // positions in descending order; otherwise they would need sorting.
            insertInfo = lastArraysInsertStack.pop();
            lastArrayTraversed.splice(insertInfo.index, 0,
                insertInfo.statement);
        }
    }
}

function insertAfterWalker(statement, i, body){
    if(statement == "ast_new"){
        if(body != lastArrayTraversed){
            insertToLastArray();
            lastArrayTraversed = body;
            lastArraysInsertStack = [];
        }
        lastArraysInsertStack.push({
            index: i + 1,
            statement: "ast_assign"
        });
    }   
}

ast_block1.walk(insertAfterWalker);
insertToLastArray(); 

console.log(ast_block1.body)
//>["ast_var", "ast_assign", "ast_new", "ast_assign", "ast_new", "ast_assign"]

The approach above satisfies the requirement.

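In fact, if you control the traversal order yourself, you can traverse from back to front and insert directly after each target element.

If the requirement is instead to delete the target element when the traversal reaches it, the deletions also have to be applied from back to front.

Or, if you control the traversal order yourself and traverse from back to front, you can delete in place.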
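
A sketch of both reverse-order variants, assuming we own the loop. insertAfterEach and removeEach are hypothetical helper names introduced here for illustration.

```javascript
// Insert `addition` right after every `target`, iterating from the end so
// that a splice only shifts elements that have already been visited.
function insertAfterEach(body, target, addition) {
    for (var i = body.length - 1; i >= 0; i--) {
        if (body[i] === target) {
            body.splice(i + 1, 0, addition);
        }
    }
    return body;
}

// Remove every occurrence of `target` in place, again from back to front.
function removeEach(body, target) {
    for (var i = body.length - 1; i >= 0; i--) {
        if (body[i] === target) {
            body.splice(i, 1);
        }
    }
    return body;
}

var inserted = insertAfterEach(
    ["ast_var", "ast_assign", "ast_new", "ast_new"], "ast_new", "ast_assign");
console.log(inserted);
// ["ast_var", "ast_assign", "ast_new", "ast_assign", "ast_new", "ast_assign"]

var removed = removeEach(["ast_var", "ast_new", "ast_new", "ast_assign"], "ast_new");
console.log(removed);
// ["ast_var", "ast_assign"]
```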

Addendum: possible outcomes of modifying an array during traversal, and ways to deal with them

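Modifying an array while traversing it is not a huge problem in itself. But many base frameworks traverse arrays internally. Once, while listening for custom events, I found that of several handlers registered under the same custom event name, some ran and some did not. After most of a day of effort, I traced it to the event trigger/unbind mechanism in the base framework: a handler unsubscribed itself from within its own execution, and unsubscribing was implemented as a splice(i,1) directly at the handler's position. The handler immediately after it (originally at position i+1, now moved to position i, while the next position to visit is still i+1) was therefore skipped. In addition, if a handler for the same event was bound during the current round of triggering, the newly bound handler was also triggered right away in that same round.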
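
The skipped-handler bug can be reproduced with a few lines. This is a hypothetical minimal handler list, not the framework's actual code; the names handlers, off and trigger are assumptions made for the sketch.

```javascript
var handlers = [];
var calls = [];

function off(handler) {
    var i = handlers.indexOf(handler);
    if (i !== -1) {
        handlers.splice(i, 1); // shifts every later handler one slot left
    }
}

function trigger() {
    // The length is re-read on every step, so handlers bound during this
    // round would be visited too.
    for (var i = 0; i < handlers.length; i++) {
        handlers[i]();
    }
}

function first() {
    calls.push("first");
    off(first); // unbinds itself while the list is being traversed
}
function second() { calls.push("second"); }
function third() { calls.push("third"); }

handlers.push(first, second, third);
trigger();

console.log(calls); // ["first", "third"], "second" was skipped
```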

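Later our custom event system replaced splice(i,1) with list[i] = null for unbinding, and at trigger time it checks whether the element at the current position is null and skips it if so. But when should the positions holding null be cleaned up? Currently they are not. If we did clean them up, wouldn't that again mean deleting while traversing? That brings us back to the situation discussed above: clean up by traversing in reverse.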
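
A sketch of the null-marking approach. The compact function is an assumed cleanup step that the system described above does not have; it is meant to be called only when no trigger is in progress.

```javascript
var handlers = [];
var calls = [];

function off(handler) {
    var i = handlers.indexOf(handler);
    if (i !== -1) {
        handlers[i] = null; // leave a hole so later indices do not move
    }
}

function trigger() {
    for (var i = 0; i < handlers.length; i++) {
        if (handlers[i] !== null) {
            handlers[i]();
        }
    }
}

// Assumed cleanup step: walk backwards so a splice never causes a slot
// to be skipped.
function compact() {
    for (var i = handlers.length - 1; i >= 0; i--) {
        if (handlers[i] === null) {
            handlers.splice(i, 1);
        }
    }
}

function first() { calls.push("first"); off(first); }
function second() { calls.push("second"); }
function third() { calls.push("third"); }

handlers.push(first, second, third);
trigger();
compact();

console.log(calls);           // ["first", "second", "third"]
console.log(handlers.length); // 2
```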

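Some frameworks copy the handler queue when triggering an event and then traverse the copy. The native DOM event mechanism seems to guarantee that a new handler added for the same event while the event is being handled will not be triggered in the current round, and that a handler unbinding itself does not affect the handlers after it (TODO: read the DOM API standard / write a page to verify; also look at how multi-threaded languages handle trigger/bind/unbind).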
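
A minimal sketch of the copy-then-traverse idea, again with a hypothetical handler list rather than any particular framework's code:

```javascript
var handlers = [];
var calls = [];

function trigger() {
    // Traverse a shallow copy; changes to `handlers` only affect later rounds.
    var snapshot = handlers.slice();
    for (var i = 0; i < snapshot.length; i++) {
        snapshot[i]();
    }
}

function late() { calls.push("late"); }
function first() {
    calls.push("first");
    handlers.splice(handlers.indexOf(first), 1); // unbind itself
    handlers.push(late);                         // bind a new handler
}
function second() { calls.push("second"); }

handlers.push(first, second);

trigger();
console.log(calls); // ["first", "second"]

trigger();
console.log(calls); // ["first", "second", "second", "late"]
```

One consequence of a plain copy is that a handler unbound by an earlier handler in the same round still runs once more in that round; whether that is acceptable depends on the event system.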

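Whether to wrap the handler call on each traversed element in try catch also affects the traversal, and whether it should be added is a question in its own right.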

Summary of scenarios and approaches

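Scenarios where the traversal order is fixed (forward/reverse)
Scenarios where you can decide the traversal order yourself (moving the loop variable i during the traversal)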
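
For the second kind of scenario, a forward loop also works as long as the loop variable is adjusted after each modification. A sketch, with hypothetical helper names:

```javascript
function insertAfterEachForward(body, target, addition) {
    for (var i = 0; i < body.length; i++) {
        if (body[i] === target) {
            body.splice(i + 1, 0, addition);
            i++; // step over the element that was just inserted
        }
    }
    return body;
}

function removeEachForward(body, target) {
    for (var i = 0; i < body.length; i++) {
        if (body[i] === target) {
            body.splice(i, 1);
            i--; // the next element slid into slot i, so revisit it
        }
    }
    return body;
}

var insertedForward = insertAfterEachForward(
    ["ast_var", "ast_assign", "ast_new", "ast_new"], "ast_new", "ast_assign");
console.log(insertedForward);
// ["ast_var", "ast_assign", "ast_new", "ast_assign", "ast_new", "ast_assign"]

var removedForward = removeEachForward(
    ["ast_var", "ast_new", "ast_new", "ast_assign"], "ast_new");
console.log(removedForward);
// ["ast_var", "ast_assign"]
```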

--------------- 2014-07-15 ---------------

Addendum: DOM Level 2 Events, 1.3.1. Event registration interfaces

EventTarget.addEventListener

If an EventListener is added to an EventTarget while it is processing an event, it will not be triggered by the current actions but may be triggered during a later stage of event flow, such as the bubbling phase.
If multiple identical EventListeners are registered on the same EventTarget with the same parameters the duplicate instances are discarded. They do not cause the EventListener to be called twice and since they are discarded they do not need to be removed with the removeEventListener method.

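According to the DOM Event API standard, a listener registered on an EventTarget while an event is being dispatched to it will not be executed during the current phase of that dispatch.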

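TODO: if clickListenerA, registered on element A, registers another handler of the same type, clickListenerB, on A's parent element B while it runs, will the newly registered clickListenerB be executed right after clickListenerA finishes (when the event bubbles to B)?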

2014-07-16: verified in Chrome/Safari (jsFiddle link: here). The conclusions:

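1. If clickListenerA, the handler on element A, registers another handler clickListenerA2 on A while it runs, the newly registered clickListenerA2 only runs on the next click.

2. If clickListenerA, the handler on element A, registers a handler clickListenerB on its parent element B while it runs, the newly registered clickListenerB runs as soon as clickListenerA finishes and the event bubbles to B; it does not have to wait for the next click.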

Only after writing code to verify it did I really understand what "but may be triggered during a later stage of event flow, such as the bubbling phase." means.

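The DOM standard's event dispatch mechanism looks very much like copying the handler queue of the DOM element first and then traversing and executing the copy, which avoids the modify-while-traversing problem.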

As for whether to wrap execution in try catch, the W3C DOM Level 2 standard says:

Any exceptions thrown inside an EventListener will not stop propagation of the event. It will continue processing any additional EventListener in the described manner.

The W3C DOM Level 3 standard:

Exceptions thrown inside event listeners must not stop the propagation of the event or affect the propagation path. The exception itself must not propagate outside the scope of the event handler.

According to the standards, event handlers should not interfere with one another, so try catch should be added.
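
A sketch of what that could look like in a custom event system. The trigger function and the errors array are assumptions made for illustration; a real system might log or rethrow asynchronously instead of collecting errors.

```javascript
var calls = [];
var errors = [];

function trigger(handlers) {
    var snapshot = handlers.slice();
    for (var i = 0; i < snapshot.length; i++) {
        try {
            snapshot[i]();
        } catch (e) {
            // Record the error so it is not silently lost, then carry on
            // with the remaining handlers.
            errors.push(e);
        }
    }
}

trigger([
    function () { calls.push("a"); },
    function () { throw new Error("boom"); },
    function () { calls.push("c"); }
]);

console.log(calls);         // ["a", "c"]
console.log(errors.length); // 1
```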

The end.
