(1) CiBuildsCrawller.py: reads the builds_id values from the database, then crawls the corresponding builds files;
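(2) CiJobsCrawller.py: parses the builds files, reads the job_id of every job in each build, and crawls each job's log file by its job_id. If a crawl fails, the missing job_id is stored in a separate missing table. After all crawl tasks have finished, the job_ids in the missing table are rescanned and crawled again (a job_id is deleted from the missing table once its crawl succeeds), until every job has been crawled;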
(3) CiJobTestsParser.py: reads the job files and parses the basic test information;
(4) CiJobTestsFilesParser.py: further parses the file information associated with each test.
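Step (1) can be sketched roughly as follows. This is a minimal sketch, not the actual CiBuildsCrawller.py: SQLite stands in for MySQL, the build IDs are assumed to come from the ci_builds_yu table listed below, and crawl_builds, fetch_build (a stand-in for the real HTTP request, which the text does not specify) and the <repo_id>_<builds_id>.json file naming are all hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Callable


def crawl_builds(conn: sqlite3.Connection,
                 fetch_build: Callable[[int, int], str],
                 out_dir: str) -> list[str]:
    """Read every (repo_id, builds_id) pair and save each fetched build to a file.

    fetch_build is a hypothetical callable that returns the raw build document.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    rows = conn.execute(
        "SELECT repo_id, builds_id FROM ci_builds_yu ORDER BY Id").fetchall()
    for repo_id, builds_id in rows:
        body = fetch_build(repo_id, builds_id)
        # Hypothetical naming scheme: one file per build.
        path = os.path.join(out_dir, f"{repo_id}_{builds_id}.json")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(body)
        written.append(path)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ci_builds_yu (Id INTEGER PRIMARY KEY AUTOINCREMENT,"
                 " repo_id INTEGER, builds_id INTEGER)")
    conn.executemany("INSERT INTO ci_builds_yu (repo_id, builds_id) VALUES (?, ?)",
                     [(1, 100), (1, 101)])
    fake_fetch = lambda repo, build: f'{{"repo_id": {repo}, "id": {build}}}'
    with tempfile.TemporaryDirectory() as d:
        print([os.path.basename(p) for p in crawl_builds(conn, fake_fetch, d)])
        # ['1_100.json', '1_101.json']
```

Injecting fetch_build keeps the database loop testable without network access; a real crawler would pass a function that performs the HTTP request.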
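The retry scheme in step (2) amounts to one pass over every job followed by rescans of the missing table until it is empty. A minimal sketch under stated assumptions: the missing table's name (ci_jobs_missing) and schema are hypothetical because the text does not give them, SQLite stands in for MySQL, and fetch_log/save_log are stand-ins for the real download and storage code. The optional max_rounds guard is an addition not described in the original; by default the loop retries indefinitely, as described.

```python
import sqlite3
from typing import Callable, Iterable, Optional


def crawl_jobs(conn: sqlite3.Connection,
               job_ids: Iterable[int],
               fetch_log: Callable[[int], str],
               save_log: Callable[[int, str], None],
               max_rounds: Optional[int] = None) -> int:
    """Crawl every job log, then retry jobs recorded in the missing table.

    Returns the number of retry rounds that were needed.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS ci_jobs_missing "
                 "(job_id INTEGER PRIMARY KEY)")

    # First pass: try every job once; any exception counts as a failed crawl.
    for job_id in job_ids:
        try:
            save_log(job_id, fetch_log(job_id))
        except Exception:
            conn.execute("INSERT OR IGNORE INTO ci_jobs_missing (job_id) VALUES (?)",
                         (job_id,))
    conn.commit()

    # Retry passes: rescan the missing table until it is empty.
    rounds = 0
    while True:
        missing = [r[0] for r in conn.execute(
            "SELECT job_id FROM ci_jobs_missing ORDER BY job_id")]
        if not missing:
            return rounds
        if max_rounds is not None and rounds >= max_rounds:
            raise RuntimeError(f"still missing after {rounds} rounds: {missing}")
        rounds += 1
        for job_id in missing:
            try:
                save_log(job_id, fetch_log(job_id))
            except Exception:
                continue  # stays in the missing table for the next round
            conn.execute("DELETE FROM ci_jobs_missing WHERE job_id = ?", (job_id,))
        conn.commit()


if __name__ == "__main__":
    attempts: dict[int, int] = {}

    def flaky_fetch(job_id: int) -> str:
        attempts[job_id] = attempts.get(job_id, 0) + 1
        if job_id == 7 and attempts[job_id] < 3:
            raise ConnectionError("simulated network failure")
        return f"log of job {job_id}"

    saved: dict[int, str] = {}
    rounds = crawl_jobs(sqlite3.connect(":memory:"), [5, 6, 7],
                        flaky_fetch, saved.__setitem__)
    print(rounds, sorted(saved))  # 2 [5, 6, 7]
```

Job 7 fails in the first pass and in retry round 1, then succeeds in round 2, so two retry rounds are reported.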
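For step (3), the numeric columns of ci_jobs_tests (run_options, finished_time, runs_time, assertions_time and the five counters) line up with the summary printed by Ruby's Minitest, so one possible parser is sketched below. That the job logs contain Minitest output is an assumption inferred from the column names; the regular expressions, parse_test_blocks, and treating test_id as the 1-based position of a test run within the log are all hypothetical.

```python
import re

# Assumed Minitest summary format, e.g.
#   Run options: --seed 36413
#   # Running:
#   ..F
#   Finished in 0.001063s, 2822.2013 runs/s, 3762.9351 assertions/s.
#   3 runs, 4 assertions, 1 failures, 0 errors, 0 skips
SEED_RE = re.compile(r"Run options:.*?--seed (\d+)")
RUNNING_RE = re.compile(r"# Running:\s*\n(.*?)\n\s*Finished in", re.S)
FINISHED_RE = re.compile(
    r"Finished in ([\d.]+)s, ([\d.]+) runs/s, ([\d.]+) assertions/s")
COUNTS_RE = re.compile(
    r"(\d+) runs, (\d+) assertions, (\d+) failures, (\d+) errors, (\d+) skips")

TIME_KEYS = ("finished_time", "runs_time", "assertions_time")
COUNT_KEYS = ("runs_count", "assertions_count", "failures_count",
              "errors_count", "skips_count")


def parse_test_blocks(log: str) -> list[dict]:
    """Split a job log at each 'Run options:' line; parse one row per block."""
    starts = [m.start() for m in re.finditer(r"^Run options:", log, re.M)]
    rows = []
    for test_id, start in enumerate(starts, 1):
        # A block runs until the next 'Run options:' line or the end of the log.
        end = starts[test_id] if test_id < len(starts) else len(log)
        block = log[start:end]
        seed = SEED_RE.search(block)
        running = RUNNING_RE.search(block)
        fin = FINISHED_RE.search(block)
        cnt = COUNTS_RE.search(block)
        row = {
            "test_id": test_id,  # hypothetical: position of the run in the log
            "run_options": int(seed.group(1)) if seed else None,
            "running": running.group(1).strip() if running else None,
        }
        # Missing summary lines (e.g. a crashed run) leave the columns as None.
        row.update(zip(TIME_KEYS,
                       [float(v) for v in fin.groups()] if fin else [None] * 3))
        row.update(zip(COUNT_KEYS,
                       [int(v) for v in cnt.groups()] if cnt else [None] * 5))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = (
        "Run options: --seed 36413\n\n# Running:\n\n..F\n\n"
        "Finished in 0.001063s, 2822.2013 runs/s, 3762.9351 assertions/s.\n\n"
        "3 runs, 4 assertions, 1 failures, 0 errors, 0 skips\n"
    )
    row = parse_test_blocks(sample)[0]
    print(row["run_options"], row["running"], row["finished_time"],
          row["failures_count"])
    # 36413 ..F 0.001063 1
```

Each returned dict maps directly onto the ci_jobs_tests columns of the same name, with None becoming NULL on insert.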
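The text does not say how the running_file values of step (4) are derived. Purely for illustration, the sketch below assumes that each test run's running_command (for example a Rake test command line that loads rake_test_loader.rb followed by the test files) names the files it runs, and that each file becomes one ci_jobs_tests_file row. extract_running_files, file_rows and the *_test.rb / test_*.rb naming heuristic are hypothetical.

```python
import os
import shlex


def extract_running_files(running_command: str) -> list[str]:
    """Return the Ruby test files named on a test command line (heuristic)."""
    try:
        tokens = shlex.split(running_command)
    except ValueError:
        # Unbalanced quotes (e.g. a truncated log line): split on whitespace.
        tokens = running_command.split()
    files = []
    for tok in tokens:
        name = os.path.basename(tok)
        # Keep files named like foo_test.rb or test_foo.rb.
        if name.endswith("_test.rb") or (name.startswith("test_")
                                         and name.endswith(".rb")):
            files.append(tok)
    return files


def file_rows(repo_id: int, builds_id: int, job_id: int, test_id: int,
              running_command: str) -> list[tuple]:
    """One (repo_id, builds_id, job_id, test_id, running_file) row per file."""
    return [(repo_id, builds_id, job_id, test_id, f)
            for f in extract_running_files(running_command)]


if __name__ == "__main__":
    cmd = ('ruby -I"lib:test" "/usr/lib/rake/rake_test_loader.rb" '
           '"test/a_test.rb" "test/test_b.rb"')
    for row in file_rows(1, 10, 100, 1, cmd):
        print(row)
```

The loader script is skipped because its basename (rake_test_loader.rb) matches neither naming pattern.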
Database table schemas:
(1) ci_builds_jobs_yu:
CREATE TABLE `ci_builds_jobs_yu` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`repo_id` int(11) DEFAULT NULL,
`builds_id` int(11) DEFAULT NULL,
`job_id` int(11) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=45620 DEFAULT CHARSET=utf8;
(2) ci_builds_yu:
CREATE TABLE `ci_builds_yu` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`repo_id` int(11) DEFAULT NULL,
`builds_id` int(11) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=MyISAM AUTO_INCREMENT=7649 DEFAULT CHARSET=utf8;
(3) ci_jobs_tests:
CREATE TABLE `ci_jobs_tests` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`repo_id` int(11) DEFAULT NULL,
`builds_id` int(11) DEFAULT NULL,
`job_id` int(11) DEFAULT NULL,
`test_id` int(11) DEFAULT NULL,
`running_command` longtext,
`run_options` int(11) DEFAULT NULL,
`running` longtext,
`finished_time` float DEFAULT NULL,
`runs_time` float DEFAULT NULL,
`assertions_time` float DEFAULT NULL,
`runs_count` int(11) DEFAULT NULL,
`assertions_count` int(11) DEFAULT NULL,
`failures_count` int(11) DEFAULT NULL,
`errors_count` int(11) DEFAULT NULL,
`skips_count` int(11) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `job_id` (`job_id`),
KEY `builds_id` (`builds_id`),
KEY `test_id` (`test_id`)
) ENGINE=MyISAM AUTO_INCREMENT=714284 DEFAULT CHARSET=utf8;
(4) ci_jobs_tests_file:
CREATE TABLE `ci_jobs_tests_file` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`repo_id` int(11) DEFAULT NULL,
`builds_id` int(11) DEFAULT NULL,
`job_id` int(11) DEFAULT NULL,
`test_id` int(11) DEFAULT NULL,
`running_file` longtext,
PRIMARY KEY (`Id`),
KEY `job_id` (`job_id`),
KEY `builds_id` (`builds_id`),
KEY `test_id` (`test_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8315307 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
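ci_builds_yu and ci_builds_jobs_yu both carry repo_id and builds_id, so linking them on that pair seems natural; this is an assumption, since the schema declares no foreign keys. The sketch below counts recorded jobs per build, where a count of 0 flags a build whose jobs were never stored. SQLite approximations of the tables are used so the example runs standalone; the query itself is also valid MySQL. BUILDS_DDL and jobs_per_build are hypothetical names.

```python
import sqlite3

# SQLite approximations of ci_builds_yu and ci_builds_jobs_yu (types simplified).
BUILDS_DDL = """
CREATE TABLE ci_builds_yu (
    Id INTEGER PRIMARY KEY AUTOINCREMENT, repo_id INTEGER, builds_id INTEGER);
CREATE TABLE ci_builds_jobs_yu (
    Id INTEGER PRIMARY KEY AUTOINCREMENT, repo_id INTEGER, builds_id INTEGER,
    job_id INTEGER);
"""


def jobs_per_build(conn: sqlite3.Connection) -> list[tuple]:
    """(repo_id, builds_id, job count) for every build, including builds with 0 jobs."""
    return conn.execute("""
        SELECT b.repo_id, b.builds_id, COUNT(j.job_id)
        FROM ci_builds_yu AS b
        LEFT JOIN ci_builds_jobs_yu AS j
          ON j.repo_id = b.repo_id AND j.builds_id = b.builds_id
        GROUP BY b.repo_id, b.builds_id
        ORDER BY b.repo_id, b.builds_id
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(BUILDS_DDL)
    conn.executemany("INSERT INTO ci_builds_yu (repo_id, builds_id) VALUES (?, ?)",
                     [(1, 10), (1, 11)])
    conn.executemany("INSERT INTO ci_builds_jobs_yu (repo_id, builds_id, job_id) "
                     "VALUES (?, ?, ?)", [(1, 10, 100), (1, 10, 101)])
    print(jobs_per_build(conn))  # [(1, 10, 2), (1, 11, 0)]
```

The LEFT JOIN keeps builds without job rows, and COUNT(j.job_id) ignores the resulting NULLs, which is what yields the 0.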
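ci_jobs_tests_file stores several running_file rows per test, keyed by the same job_id and test_id as ci_jobs_tests. Assuming a test run is identified by the (job_id, test_id) pair (the schema does not state this), the sketch below lists the files belonging to test runs that reported failures or errors. Only the needed columns are recreated, in SQLite syntax; TESTS_DDL and files_of_failing_tests are hypothetical names.

```python
import sqlite3

# Column subsets of ci_jobs_tests and ci_jobs_tests_file, in SQLite syntax.
TESTS_DDL = """
CREATE TABLE ci_jobs_tests (
    Id INTEGER PRIMARY KEY AUTOINCREMENT, job_id INTEGER, test_id INTEGER,
    failures_count INTEGER, errors_count INTEGER);
CREATE TABLE ci_jobs_tests_file (
    Id INTEGER PRIMARY KEY AUTOINCREMENT, job_id INTEGER, test_id INTEGER,
    running_file TEXT);
"""


def files_of_failing_tests(conn: sqlite3.Connection) -> list[tuple]:
    """(job_id, test_id, running_file) for test runs with failures or errors."""
    return conn.execute("""
        SELECT t.job_id, t.test_id, f.running_file
        FROM ci_jobs_tests AS t
        JOIN ci_jobs_tests_file AS f
          ON f.job_id = t.job_id AND f.test_id = t.test_id
        -- NULL counts (unparsed summaries) are treated as 0.
        WHERE COALESCE(t.failures_count, 0) + COALESCE(t.errors_count, 0) > 0
        ORDER BY t.job_id, t.test_id, f.running_file
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(TESTS_DDL)
    conn.executemany("INSERT INTO ci_jobs_tests (job_id, test_id, failures_count,"
                     " errors_count) VALUES (?, ?, ?, ?)", [(1, 1, 0, 0), (1, 2, 1, 0)])
    conn.executemany("INSERT INTO ci_jobs_tests_file (job_id, test_id, running_file)"
                     " VALUES (?, ?, ?)",
                     [(1, 1, "test/a_test.rb"), (1, 2, "test/b_test.rb")])
    print(files_of_failing_tests(conn))  # [(1, 2, 'test/b_test.rb')]
```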