Copyright Infringement and Trade Secret Misappropriation of Software

Copyright infringement in the consumer entertainment market has drawn considerable media attention. Less well publicized is the growing number of cases involving infringement of copyright in business application software by former employees. Similar cases also arise between former joint venture partners.

Identifying misappropriation of intellectual property (IP) in the form of software is rather different from doing so for digital assets in the consumer market. The challenge arises primarily because software can be changed in many ways, often almost beyond recognition, while still preserving its commercial value. That is, its literal expression can be changed without affecting its underlying concept. Furthermore, such changes can be propagated throughout a program with relatively little cost or effort. This makes it more difficult to identify copyright infringement or trade secret misappropriation in software, and then to provide evidence of it to a standard acceptable in court.
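As a rough illustration of how literal expression can change while structure is preserved, the following minimal sketch assumes the code under comparison is Python and uses the standard-library tokenize module. The function name normalized_tokens and the placeholder labels (ID, NUM, STR and the layout markers) are illustrative choices, not those of any particular tool. Identifiers and literals are replaced with placeholders and comments are dropped, so two fragments that differ only in naming and commenting produce the same token stream. This is the kind of abstraction that token-based comparison tools rely on; it does not see through deeper restructuring.

```python
import io
import keyword
import tokenize

# Layout tokens kept (as named markers) because they carry statement and block structure.
LAYOUT = {tokenize.NEWLINE: "NEWLINE", tokenize.INDENT: "INDENT", tokenize.DEDENT: "DEDENT"}
# Tokens with no structural meaning for this comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source and abstract away names, literals and comments."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in LAYOUT:
            result.append(LAYOUT[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")  # any identifier, including built-ins such as sum
        elif tok.type == tokenize.NUMBER:
            result.append("NUM")
        elif tok.type == tokenize.STRING:
            result.append("STR")
        else:
            result.append(tok.string)  # keywords and operators kept verbatim
    return result


original = """
def total(prices, tax_rate):
    # sum the line items
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)
"""

suspect = """
def compute_amount(items, rate):
    s = sum(items)   # renamed and re-commented
    return s * (1 + rate)
"""

print(original == suspect)                                        # False
print(normalized_tokens(original) == normalized_tokens(suspect))  # True
```

Replacing every identifier with the same placeholder is deliberately coarse: it defeats renaming, at the cost of also matching unrelated code that happens to share a shape, so matches found this way would still need expert review.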

Twenty years of case history have established a number of different approaches to assessing copyright infringement in software. These have included expert evaluation of quantitative and qualitative evidence of similarities in the structure, sequence, and organization of the software. More widely used has been the Altai test, or abstraction-filtration-comparison test, applied in Computer Associates International, Inc. v. Altai, Inc., 982 F.2d 693, 702 (2d Cir. 1992), together with variations on it.

To what extent can the analysis required by these tests be assisted by automated tools? There has been considerable research over the past decade into tools that help identify plagiarism by students in an academic setting. Among the solutions developed for software comparison are Sherlock, MOSS, JPlag and YAP3. At a basic level these tools may be applicable to analyzing copyright infringement and trade secret misappropriation by employees in a corporate setting. However, plagiarism in academic coursework has characteristics that differ from those of the corporate setting, and these techniques have several limitations that would need to be addressed before they are relied on in litigation. These include:

The internal logic of some of these analysis tools is not publicly disclosed, so it would be difficult to demonstrate the reliability of their techniques in a court of law. However, where a tool identifies identical code and serves only as a productivity aid for finding it, this may be less of an issue, since the identical code it reports can be examined directly.
The techniques focus primarily on source code at a syntactic level and pay inadequate attention to physical data structures and to user interface and system interface designs. In practice these are sometimes harder to modify, and therefore offer a useful way to compare a suspect system with its suspected origin.
The techniques do not analyze or compare high-level logical constructs in the software, which are likewise more difficult to modify without incurring significant cost.
The techniques are designed to help an educator identify plagiarism within a potentially large body of submissions, each of which is relatively small. In litigation, by contrast, the analysis is typically between only two code bases, but each may be very large, perhaps comprising millions of lines of code.
The techniques do little or nothing to account for similarities that would be expected between programs because of the large body of standards, templates and open source software that is now widely available and commonly used.
The techniques focus on the substance of the code almost to the exclusion of its style. While substance is important, comparing stylistic attributes, which are typically embodied in programming standards, may also help identify copyright infringement.
Known obfuscation techniques can alter software code to evade detection by common tools. The analysis tools need to be extended to look specifically for the code structures that these obfuscation tools most commonly produce.
Analysis results are still presented in largely technical terms. There is a critical need to present them in a way that is meaningful to a jury.
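Several of the points above (scale, overlap with open source, and reliance on syntax) can be illustrated with a small sketch of token fingerprinting. It is a simplified version of winnowing, the fingerprinting technique that MOSS's authors have published: hash every run of k tokens, keep the minimum hash in each window of w consecutive hashes, and compare the resulting fingerprint sets. The blake2b hash (used here in place of a rolling hash), the Jaccard score, the defaults k = 5 and w = 4, and the step that discards any k-gram also found in supplied third-party code are all assumptions made for illustration, and the function names are hypothetical. Tokens are assumed to have been normalized already, for example by replacing identifiers with a placeholder.

```python
import hashlib
from collections.abc import Iterable, Sequence

# Illustrative defaults (assumptions): k tokens per k-gram, w k-gram hashes per window.
K, W = 5, 4


def kgram_hashes(tokens: Sequence[str], k: int = K) -> list[int]:
    """Hash each run of k consecutive tokens (a k-gram) to a 64-bit integer."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        gram = "\x1f".join(tokens[i:i + k]).encode("utf-8")
        hashes.append(int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "big"))
    return hashes


def winnow(tokens: Sequence[str], k: int = K, w: int = W) -> set[int]:
    """Keep the minimum hash of every window of w consecutive k-gram hashes.

    Simplified winnowing: only the hash values are kept, not their positions.
    """
    hashes = kgram_hashes(tokens, k)
    fingerprints: set[int] = set()
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        if window:  # empty only when the input is shorter than k tokens
            fingerprints.add(min(window))
    return fingerprints


def similarity(original: Sequence[str], suspect: Sequence[str],
               third_party: Iterable[Sequence[str]] = (),
               k: int = K, w: int = W) -> float:
    """Jaccard similarity of the two fingerprint sets, ignoring any k-gram that
    also occurs in the supplied third-party or open source token streams."""
    excluded: set[int] = set()
    for tokens in third_party:
        excluded.update(kgram_hashes(tokens, k))
    a = winnow(original, k, w) - excluded
    b = winnow(suspect, k, w) - excluded
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


common = "ID = ID ( ID , ID ) ; return ID * ( NUM + ID ) ;".split()
original = "class ID : def ID ( self ) : pass".split() + common
suspect = "import ID from ID import ID as ID , ID".split() + common

print(similarity(original, suspect))                        # strictly between 0 and 1
print(similarity(original, suspect, third_party=[common]))  # 0.0: the shared block is excluded
```

Because each code base is reduced to a set of fingerprints, the cost grows roughly linearly with code size for fixed k and w, which matters when both code bases are large. Winnowing also guarantees that any shared run of at least w + k - 1 tokens yields at least one shared fingerprint. Filtering by whole k-grams only removes exact token-level matches with the third-party code supplied; adapted templates or standards would still need expert judgement.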

Care is therefore required both in selecting software comparison tools for use in a court of law and in applying them. Their use then needs to be supported by a formal methodology that makes the process of software comparison repeatable. Steps that should normally be included are:

Identification of proprietary and high-complexity algorithms, including any that may be covered by patents.
Collection and analysis of the programming standards applied to the original software.
Identification of third-party or open source software used in the original software.
Identification of industry standards for data schemas used in the original software.
Interviews with the developers and architects of the original software.
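One small, concrete aid to repeatability is to record exactly what was examined. The sketch below, an assumption about how such a record might be kept rather than an established standard, walks a code base, records the SHA-256 hash of every file, and flags files that are byte-identical to known third-party or open source files, supporting the identification step above. The function names and the manifest layout are hypothetical, and the set of known hashes is assumed to be compiled separately, for example from the published releases of the libraries involved.

```python
import hashlib
import json
import tempfile
from collections.abc import Iterable
from pathlib import Path


def file_sha256(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, known_third_party: Iterable[str] = ()) -> dict:
    """List every file under root with its SHA-256 hash, sorted by relative path,
    flagging files byte-identical to a known third-party or open source file."""
    known = set(known_third_party)
    files = sorted((p for p in root.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(root).as_posix())
    entries = []
    for path in files:
        sha = file_sha256(path)
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "sha256": sha,
            "third_party": sha in known,
        })
    return {"files": entries}


if __name__ == "__main__":
    # Demonstration on a throwaway directory; in practice root would be a
    # read-only copy of each code base under examination.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "billing.py").write_bytes(b"def total(x):\n    return sum(x)\n")
        (root / "vendor.js").write_bytes(b"/* third-party library */\n")
        known = {hashlib.sha256(b"/* third-party library */\n").hexdigest()}
        print(json.dumps(build_manifest(root, known), indent=2))
```

Hash matching only catches unmodified copies; third-party code that has been edited or partially copied would still need to be found through comparison and through interviews with the original developers.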