Readers familiar with TF-IDF will surely object: won't the TF-IDF vocabulary get huge too? With a large sample set and all kinds of parameter values, won't the feature vectors end up hopelessly sparse? My answer to this problem is to post-process the TF-IDF matrix: sum the TF-IDF entries of all tokens that share the same parameter key. Let the set of parameter keys be K = {k1, k2, …, kn} and the TF-IDF vocabulary be x = {x1, x2, …, xm}. Then the feature value for each parameter key is:
- vn = ∑ TF-IDF(x), for x ∈ {x | x startswith 'kn='}
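To see the collapse in action, here is a minimal sketch on three hypothetical request strings (the sample strings and the `key_columns` name are my own; the grouping-by-key-prefix logic is the technique described above):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Three toy request strings, each a space-separated list of key=value pairs.
samples = [
    "id=1 name=alice ",
    "id=2 name=bob ",
    "id=admin name=alice ",
]

# Tokenize whole key=value pairs: any run of 2+ non-space characters.
vectorizer = TfidfVectorizer(analyzer='word', token_pattern=r"(?u)\b\S\S+\b")
tfidf = vectorizer.fit_transform(samples)

# Group vocabulary column indices by the parameter key before '='.
key_columns = {}
for token, col in vectorizer.vocabulary_.items():
    key_columns.setdefault(token.split('=')[0], []).append(col)

# Each sample collapses from one column per token to one summed value per key.
dense = tfidf.toarray()
collapsed = np.array([[dense[i, cols].sum() for cols in key_columns.values()]
                      for i in range(dense.shape[0])])
print(collapsed.shape)  # (3, 2): three samples, two keys ('id' and 'name')
```

The five distinct tokens (`id=1`, `id=2`, `id=admin`, `name=alice`, `name=bob`) shrink to just two columns, one per key, so the dimensionality no longer grows with the variety of parameter values.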
The concrete code lives in vectorize/vectorizer.py:
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

for path, strs in path_buckets.items():
    if not strs:
        continue
    vectorizer = TfidfVectorizer(analyzer='word', token_pattern=r"(?u)\b\S\S+\b")
    try:
        tfidf = vectorizer.fit_transform(strs)
        # putting same key's indices together
        param_index = {}
        for kv, index in vectorizer.vocabulary_.items():
            k = kv.split('=')[0]
            if k in param_index:
                param_index[k].append(index)
            else:
                param_index[k] = [index]
        # shrinking tfidf vectors: one summed value per parameter key
        tfidf_vectors = []
        for vector in tfidf.toarray():
            v = []
            for param, index in param_index.items():
                v.append(np.sum(vector[index]))
            tfidf_vectors.append(v)
        # other features
        other_vectors = []
        for s in strs:
            ov = []
            kvs = s.split(' ')[:-1]
            lengths = np.array([len(kv) for kv in kvs])
            # param count
            ov.append(len(kvs))
            # mean kv length
            ov.append(np.mean(lengths))
            # max kv length
            ov.append(np.max(lengths))
            # min kv length
            ov.append(np.min(lengths))
            # kv length std
            ov.append(np.std(lengths))
            other_vectors.append(ov)
        tfidf_vectors = np.array(tfidf_vectors)
        other_vectors = np.array(other_vectors)
        vectors = np.concatenate((tfidf_vectors, other_vectors), axis=1)
    except ValueError:
        # fit_transform raises ValueError when no token survives the pattern
        continue
```
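To make the five handcrafted statistics concrete, here is a minimal sketch on one hypothetical request string of my own; the trailing space matches the `split(' ')[:-1]` convention in the loop above:

```python
import numpy as np

# One hypothetical request string: space-separated key=value pairs
# with a trailing space, as produced earlier in the pipeline.
s = "id=1024 name=alice token=abc123 "
kvs = s.split(' ')[:-1]          # ['id=1024', 'name=alice', 'token=abc123']
lengths = np.array([len(kv) for kv in kvs])  # [7, 10, 12]

# param count, mean/max/min kv length, kv length std
features = [len(kvs), lengths.mean(), lengths.max(),
            lengths.min(), lengths.std()]
print(features)  # [3, 9.67, 12, 7, 2.05] (rounded)
```

These length statistics complement the TF-IDF part: an injected payload typically inflates the max and the standard deviation of the key=value lengths even when its tokens are unseen.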